dawsonia.ml.data.reader#

Dataset reader and processor.

Module Contents#

Classes#

Dataset

Dataset class to read images and sentences from the base (raw) files.

Data#

logger

PartitionName

API#

dawsonia.ml.data.reader.logger#

‘getLogger(…)’

dawsonia.ml.data.reader.PartitionName: typing_extensions.TypeAlias#

None

class dawsonia.ml.data.reader.Dataset(source, name)#

Dataset class to read images and sentences from the base (raw) files.

Initialization

create_unified_dataset(target: pathlib.Path, image_input_size, max_text_length)#

read_partitions()#

Read images and sentences from dataset.
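
The private reader methods on this class follow a `_<name>` naming pattern (`_iam`, `_rimes`, `_dida`, and so on), which suggests that `read_partitions()` picks a reader from the `name` given to the constructor. The following is a minimal sketch of that dispatch pattern under that assumption. `SketchDataset`, its stubbed `_iam` reader, and the `train`/`valid`/`test` partition keys are hypothetical stand-ins, not the dawsonia implementation.

```python
from pathlib import Path


class SketchDataset:
    """Hypothetical stand-in for Dataset(source, name); not the real class."""

    def __init__(self, source, name):
        self.source = Path(source)
        self.name = name
        self.partitions = None

    def read_partitions(self):
        # Assumption: the reader is looked up from the dataset name,
        # e.g. name="iam" resolves to self._iam.
        reader = getattr(self, f"_{self.name}", None)
        if reader is None:
            raise ValueError(f"no reader for dataset {self.name!r}")
        self.partitions = reader()
        return self.partitions

    def _iam(self):
        # Stub: a real reader would parse the files under self.source.
        # The partition keys used here are an assumption.
        return {
            "train": {"images": [], "labels": []},
            "valid": {"images": [], "labels": []},
            "test": {"images": [], "labels": []},
        }


ds = SketchDataset(source="raw/iam", name="iam")
partitions = ds.read_partitions()
print(sorted(partitions))
```

Looking the method up by name keeps one small reader per dataset and makes an unsupported `name` fail early with a clear error.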

save_partitions(target, image_input_size, max_text_length, debug=False, partitions=None)#

Save images and sentences from dataset.

_init_dataset()#

_shuffle(*ls)#

_hdsr14_car_a()#

ICFHR 2014 Competition on Handwritten Digit String Recognition in Challenging Datasets dataset reader.

_hdsr14_car_b()#

ICFHR 2014 Competition on Handwritten Digit String Recognition in Challenging Datasets dataset reader.

_read_orand_partitions(basedir, type_f)#

ICFHR 2014 Competition on Handwritten Digit String Recognition in Challenging Datasets dataset reader.

_hdsr14_cvl()#

ICFHR 2014 Competition on Handwritten Digit String Recognition in Challenging Datasets dataset reader.

_dida()#

DIDA dataset reader: 12k digit string images and 12k digit string labels.

Reference

Huseyin Kusetogullari, Amir Yavariabdi, Johan Hall, Niklas Lavesson, DIDA: The largest historical handwritten digit dataset with 250k digits, June 2021. Accessed on: June 13, 2021. Available: didadataset/DIDA.

_bentham()#

Bentham dataset reader.

_iam()#

IAM dataset reader.

_rimes()#

Rimes dataset reader.

_saintgall()#

Saint Gall dataset reader.

_washington()#

Washington dataset reader.

static check_text(data, max_text_length=128)#

Checks whether the text contains more characters than punctuation marks.
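
To illustrate the kind of filter this describes, here is a rough sketch, assuming `data` is a single text string and that `max_text_length` is an upper bound on its length. `check_text_sketch` is a hypothetical approximation written for this page. The actual rule in `Dataset.check_text` may differ.

```python
import string


def check_text_sketch(data, max_text_length=128):
    """Hypothetical approximation of Dataset.check_text; not the real code."""
    text = data.strip()
    # Assumption: empty or over-long strings are rejected outright.
    if not text or len(text) > max_text_length:
        return False
    # Count punctuation marks against all other non-whitespace characters.
    punctuation = sum(ch in string.punctuation for ch in text)
    others = sum(
        ch not in string.punctuation and not ch.isspace() for ch in text
    )
    return others > punctuation


print(check_text_sketch("hello, world"))
print(check_text_sketch("...!?"))
```

A check like this would let a reader drop transcriptions that are empty, too long for the model, or made up mostly of punctuation before they reach training.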