dawsonia.prepare#

NOTE: This used to be the change_format.py script.

Module Contents#

Functions#

new_label

Encodes a new hyphenated label in the format of the Washington dataset.

convert

Separate the file ID and generate new_label.

command

Creates new train, validation, test, and ground-truth text files in “washington” format for the HTR network, and copies in the images as input data for the model.

Data#

app

API#

dawsonia.prepare.app#

‘Typer(…)’

dawsonia.prepare.new_label(label: str) → str[source]#

Encodes a new hyphenated label in the format of the Washington dataset.
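As an illustration of what a hyphenated label can look like, here is a minimal sketch. It assumes that characters within a word are joined by "-" and words are joined by "|"; the helper name hyphenate_label and its word_sep parameter are hypothetical and are not part of dawsonia.prepare. The actual new_label may differ, for example by mapping special characters to named tokens, which is not shown here.

```python
def hyphenate_label(text: str, word_sep: str = "|") -> str:
    """Hypothetical helper, not the dawsonia implementation.

    Assumption: characters are separated by '-' and words by `word_sep`.
    """
    words = text.split()  # split on whitespace, dropping empty pieces
    return word_sep.join("-".join(word) for word in words)


print(hyphenate_label("12 mm"))  # 1-2|m-m
```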

dawsonia.prepare.convert(name: str, word_end_suffix: bool) → tuple[str, str][source]#

Separate the file ID and generate new_label.

dawsonia.prepare.command(n_train: int, n_val: int, n_test: int, label_path: pathlib.Path = Path('/local_disk/', 'data', 'ai-for-obs', 'interim', 'label_old'), model_path: pathlib.Path = Path('/local_disk/', 'data', 'ai-for-obs', 'interim', 'model_tmp'), word_end_suffix: bool = False, source: str = 'washington', config: pathlib.Path = typer.Option('dawsonia.toml', *config_cli_names, **config_kwargs))[source]#

Creates new train, validation, test, and ground-truth text files in “washington” format for the HTR network, and copies in the images as input data for the model.
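The way n_train, n_val and n_test could partition a list of files can be sketched as follows. This is not the code used by command: split_files is a hypothetical helper, the consecutive-slice ordering is an assumption, and treating n_test == -1 as "no training or validation files, everything goes to testing" is one reading of the documented behaviour.

```python
def split_files(files, n_train, n_val, n_test):
    """Hypothetical split into (train, val, test) lists of file names.

    Assumption: files are taken in sorted order as consecutive slices.
    Assumption: n_test == -1 puts every file into the test set.
    """
    files = sorted(files)
    if n_test == -1:
        return [], [], files
    needed = n_train + n_val + n_test
    if needed > len(files):
        raise ValueError(f"requested {needed} files, found {len(files)}")
    train = files[:n_train]
    val = files[n_train:n_train + n_val]
    test = files[n_train + n_val:needed]
    return train, val, test


names = [f"img_{i:02d}.png" for i in range(10)]  # made-up file names
train, val, test = split_files(names, n_train=6, n_val=2, n_test=2)
print(len(train), len(val), len(test))  # 6 2 2
```

Raising an error when more files are requested than exist is a design choice of this sketch; it makes an undersized label directory visible immediately instead of silently producing a short test set.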

Parameters


n_train: int Number of images in training set

n_val: int Number of images in validation set

n_test: int Number of images in test set. NOTE: If n_test == -1, all the files from label_path are used for testing.

label_path: Path Path to label directory (where the pictures are located).