dawsonia.ml.data.reader
Dataset reader and processing.
Module Contents
Classes
Dataset: Dataset class to read images and sentences from base (raw files)
Data
API
- dawsonia.ml.data.reader.logger
‘getLogger(…)’
- dawsonia.ml.data.reader.PartitionName: typing_extensions.TypeAlias
None
- class dawsonia.ml.data.reader.Dataset(source, name)
Dataset class to read images and sentences from base (raw files)
Initialization
- create_unified_dataset(target: pathlib.Path, image_input_size, max_text_length)
- read_partitions()
Read images and sentences from dataset.
- save_partitions(target, image_input_size, max_text_length, debug=False, partitions=None)
Save images and sentences from dataset.
- _init_dataset()
- _shuffle(*ls)
- _hdsr14_car_a()
ICFHR 2014 Competition on Handwritten Digit String Recognition in Challenging Datasets dataset reader.
- _hdsr14_car_b()
ICFHR 2014 Competition on Handwritten Digit String Recognition in Challenging Datasets dataset reader.
- _read_orand_partitions(basedir, type_f)
ICFHR 2014 Competition on Handwritten Digit String Recognition in Challenging Datasets dataset reader.
- _hdsr14_cvl()
ICFHR 2014 Competition on Handwritten Digit String Recognition in Challenging Datasets dataset reader.
- _dida()
DIDA dataset reader: 12k digit string images and 12k digit string labels.
Reference
Huseyin Kusetogullari, Amir Yavariabdi, Johan Hall, Niklas Lavesson, DIDA: The largest historical handwritten digit dataset with 250k digits, June 2021. Accessed on: June 13, 2021. Available: didadataset/DIDA.
- _bentham()
Bentham dataset reader.
- _iam()
IAM dataset reader.
- _rimes()
Rimes dataset reader.
- _saintgall()
Saint Gall dataset reader.
- _washington()
Washington dataset reader.
- static check_text(data, max_text_length=128)
Checks whether the text contains more characters than punctuation marks.
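The kind of filter that check_text describes can be illustrated with a minimal sketch. This is not the dawsonia implementation: the function name looks_like_text is hypothetical, and both the use of max_text_length as an upper bound on length and the restriction to ASCII punctuation (string.punctuation) are assumptions made for illustration only.

```python
import string


def looks_like_text(data: str, max_text_length: int = 128) -> bool:
    """Hypothetical stand-in for a check_text-style filter (assumed behavior)."""
    text = data.strip()
    # Assumption: empty strings and strings longer than the limit are rejected.
    if not text or len(text) > max_text_length:
        return False
    # Count ASCII punctuation marks versus all other non-whitespace characters.
    punct = sum(ch in string.punctuation for ch in text)
    other = sum(
        ch not in string.punctuation and not ch.isspace() for ch in text
    )
    # Keep the sample only if it has strictly more characters than punctuation.
    return other > punct


print(looks_like_text("1234"))
print(looks_like_text("...,,"))
```

A filter like this would typically be applied to each label before a sample is kept, so that strings consisting mostly of punctuation do not end up in a partition.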