dawsonia.ml.data.preproc#

Data preproc functions:

adjust_to_see: adjust an image for better visualization (rotate and transpose)

augmentation: apply variations to a list of images

normalization: apply normalization and variations on images (if required)

preprocess: main preprocessing function

text_standardize: preprocess and standardize a sentence

Module Contents#

Functions#

adjust_to_see

Rotate and transpose the image for visualization (cv2 method or Jupyter notebook).

augmentation

Apply variations to a list of images (rotate, width and height shift, scale, erode, dilate)

normalization

Normalize a list of images.

preprocess

Preprocess the image, scaling and resizing it according to input_size.

text_standardize

Organize/add spaces around punctuation marks.

Data#

logger

RE_DASH_FILTER

RE_APOSTROPHE_FILTER

RE_RESERVED_CHAR_FILTER

RE_LEFT_PARENTH_FILTER

RE_RIGHT_PARENTH_FILTER

RE_BASIC_CLEANER

LEFT_PUNCTUATION_FILTER

RIGHT_PUNCTUATION_FILTER

NORMALIZE_WHITESPACE_REGEX

API#

dawsonia.ml.data.preproc.logger#

‘getLogger(…)’

dawsonia.ml.data.preproc.adjust_to_see(img)#

Rotate and transpose the image for visualization (cv2 method or Jupyter notebook).

dawsonia.ml.data.preproc.augmentation(imgs, rotation_range=0, scale_range=0, height_shift_range=0, width_shift_range=0, dilate_range=1, erode_range=1)#

Apply variations to a list of images (rotate, width and height shift, scale, erode, dilate)
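
The sketch below illustrates one way the geometric variations (rotation, scale, width and height shift) could be sampled and combined into a single affine matrix. It is not the module's implementation: the helper name `sample_affine`, the `rng` argument, and the interpretation of the ranges (rotation in degrees, shifts as fractions of the image size, scale sampled between `1 - scale_range` and 1) are all assumptions. Erosion and dilation are left out.

```python
import numpy as np


def sample_affine(width, height, rotation_range=0.0, scale_range=0.0,
                  height_shift_range=0.0, width_shift_range=0.0, rng=None):
    """Hypothetical helper: draw random parameters and build a 2x3 affine matrix."""
    rng = np.random.default_rng() if rng is None else rng

    # Assumed semantics: degrees for rotation, fractions of the size for shifts.
    rotation = rng.uniform(-rotation_range, rotation_range)
    scale = rng.uniform(1.0 - scale_range, 1.0)
    dx = rng.uniform(-width_shift_range, width_shift_range) * width
    dy = rng.uniform(-height_shift_range, height_shift_range) * height

    # Rotation and scaling about the image centre, in homogeneous coordinates.
    cx, cy = width / 2.0, height / 2.0
    a = scale * np.cos(np.deg2rad(rotation))
    b = scale * np.sin(np.deg2rad(rotation))
    rotate = np.array([[a, b, (1 - a) * cx - b * cy],
                       [-b, a, b * cx + (1 - a) * cy],
                       [0.0, 0.0, 1.0]])

    # Translation by the sampled shifts.
    shift = np.array([[1.0, 0.0, dx],
                      [0.0, 1.0, dy],
                      [0.0, 0.0, 1.0]])

    # Shift first, then rotate and scale; drop the homogeneous row.
    return (rotate @ shift)[:2]


# With every range left at 0, the transform is the identity.
identity = sample_affine(100, 50)
```

A matrix of this shape is what an image-warping routine such as OpenCV's `cv2.warpAffine` expects, so the same matrix could be applied to every image in the list to keep the variation consistent across a batch.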

dawsonia.ml.data.preproc.normalization(imgs: list[numpy.typing.NDArray]) numpy.typing.NDArray[numpy.float32]#

Normalize a list of images.
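
As a rough illustration of what normalizing a list of images into a single `float32` array can look like, the sketch below stacks same-sized grayscale images and rescales them to the range [0, 1]. The function name `normalize_sketch`, the division by 255 (8-bit input), and the trailing channel axis are assumptions, not details confirmed by this page.

```python
import numpy as np


def normalize_sketch(imgs):
    """Hypothetical helper: stack same-sized images and scale them to [0, 1]."""
    batch = np.asarray(imgs, dtype=np.float32)  # shape: (n, height, width)
    batch /= 255.0                              # assumes 8-bit pixel values
    return np.expand_dims(batch, axis=-1)       # assumed trailing channel axis


imgs = [np.full((4, 8), 255, dtype=np.uint8), np.zeros((4, 8), dtype=np.uint8)]
out = normalize_sketch(imgs)
print(out.shape, out.dtype)  # (2, 4, 8, 1) float32
```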

dawsonia.ml.data.preproc.preprocess(img: str | pathlib.Path | numpy.typing.NDArray | tuple[str, list[int]], input_size: tuple[int, ...])#

Preprocess the image, scaling and resizing it according to input_size.

Preprocessing methodology based on:

H. Scheidl, S. Fiel and R. Sablatnig,
Word Beam Search: A Connectionist Temporal Classification Decoding
Algorithm, in 16th International Conference on Frontiers in Handwriting
Recognition, pp. 256-258, 2018.
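
The resize step can be pictured as fitting the image inside a target box while keeping its aspect ratio. The sketch below only computes the new size. The helper name `fit_size` and the assumption that the target width and height come from `input_size` are illustrative; the actual ordering of `input_size` is not stated on this page.

```python
def fit_size(width, height, target_width, target_height):
    """Hypothetical helper: largest size that fits the target box, same aspect ratio."""
    # The larger ratio decides which dimension touches the border of the box.
    factor = max(width / target_width, height / target_height)
    new_width = max(min(target_width, int(width / factor)), 1)
    new_height = max(min(target_height, int(height / factor)), 1)
    return new_width, new_height


print(fit_size(2000, 100, 1024, 128))  # (1024, 51)
```

The resized image would then typically be placed on a blank canvas of the full target size, so that every sample has the same shape.
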

dawsonia.ml.data.preproc.RE_DASH_FILTER#

‘compile(…)’

dawsonia.ml.data.preproc.RE_APOSTROPHE_FILTER#

‘compile(…)’

dawsonia.ml.data.preproc.RE_RESERVED_CHAR_FILTER#

‘compile(…)’

dawsonia.ml.data.preproc.RE_LEFT_PARENTH_FILTER#

‘compile(…)’

dawsonia.ml.data.preproc.RE_RIGHT_PARENTH_FILTER#

‘compile(…)’

dawsonia.ml.data.preproc.RE_BASIC_CLEANER#

‘compile(…)’

dawsonia.ml.data.preproc.LEFT_PUNCTUATION_FILTER#

‘!%&),.:;<=>?@\]^_`|}~’

dawsonia.ml.data.preproc.RIGHT_PUNCTUATION_FILTER#

‘"(/<=>@[\^_`{|~’

dawsonia.ml.data.preproc.NORMALIZE_WHITESPACE_REGEX#

‘compile(…)’

dawsonia.ml.data.preproc.text_standardize(text)#

Organize/add spaces around punctuation marks.
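
To give a feel for this kind of standardization, the sketch below pads punctuation with spaces and then collapses repeated whitespace. It is a simplification written for illustration: the names `PUNCTUATION_RE`, `WHITESPACE_RE`, and `standardize_sketch` are hypothetical, and it does not reproduce the module's separate left and right punctuation filters or its other regular expressions.

```python
import re
import string

# Hypothetical patterns: any ASCII punctuation mark, and runs of whitespace.
PUNCTUATION_RE = re.compile(r"([{}])".format(re.escape(string.punctuation)))
WHITESPACE_RE = re.compile(r"\s+")


def standardize_sketch(text):
    """Hypothetical helper: put spaces around punctuation, then tidy whitespace."""
    spaced = PUNCTUATION_RE.sub(r" \1 ", text)
    return WHITESPACE_RE.sub(" ", spaced).strip()


print(standardize_sketch("Hello,world!  (test)"))  # Hello , world ! ( test )
```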