dawsonia.prepare
NOTE: This used to be the change_format.py script.
Module Contents
Functions
new_label | Encodes a new hyphenated label in the format for the Washington dataset. |
convert | Separate file ID and generate new_label. |
command | Creates new train, validation, test and ground truth text files in “washington” format for the HTR network and copies the images in as input data for the model. |
Data
API
- dawsonia.prepare.app
‘Typer(…)’
- dawsonia.prepare.new_label(label: str) -> str
Encodes a new hyphenated label in the format for the Washington dataset.
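The exact encoding rules are not spelled out on this page. As a rough illustration only, the sketch below assumes a Washington-style transcription in which the characters of a label are joined with hyphens, digits are written as `s_0` to `s_9`, and a few punctuation marks have named tokens. The function name `encode_label_sketch` and the `SPECIAL_TOKENS` table are hypothetical and are not part of dawsonia.

```python
import string

# Hypothetical sketch, NOT the dawsonia implementation.
# Assumed mapping from punctuation to Washington-style tokens.
SPECIAL_TOKENS = {
    ".": "s_pt",  # point
    ",": "s_cm",  # comma
    "-": "s_mi",  # minus / hyphen
}


def encode_label_sketch(label: str) -> str:
    """Join the characters of ``label`` with hyphens, Washington-style."""
    tokens = []
    for char in label:
        if char in string.digits:
            # Assumption: digits are encoded as s_<digit>.
            tokens.append(f"s_{char}")
        else:
            # Assumption: unknown characters are passed through unchanged.
            tokens.append(SPECIAL_TOKENS.get(char, char))
    return "-".join(tokens)


print(encode_label_sketch("12.5"))  # prints s_1-s_2-s_pt-s_5
```

The real `new_label` may handle more characters or treat word boundaries differently, so check the module source before relying on this mapping.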
- dawsonia.prepare.convert(name: str, word_end_suffix: bool) -> tuple[str, str]
Separate file ID and generate new_label.
- dawsonia.prepare.command(n_train: int, n_val: int, n_test: int, label_path: pathlib.Path = Path('/local_disk/', 'data', 'ai-for-obs', 'interim', 'label_old'), model_path: pathlib.Path = Path('/local_disk/', 'data', 'ai-for-obs', 'interim', 'model_tmp'), word_end_suffix: bool = False, source: str = 'washington', config: pathlib.Path = typer.Option('dawsonia.toml', *config_cli_names, **config_kwargs))
Creates new train, validation, test and ground truth text files in “washington” format for the HTR network and copies the images in as input data for the model.
Parameters
n_train: int Number of images in training set
n_val: int Number of images in validation set
n_test: int Number of images in test set. NOTE: If n_test == -1, all the files from label_path are used for testing
label_path: Path Path to label directory (where the pictures are located).
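To make the `n_train`, `n_val` and `n_test` semantics concrete, here is a minimal sketch of how a list of label files might be partitioned, including the documented `n_test == -1` case. The helper `split_files_sketch` is hypothetical. Whether the real command sorts or shuffles the files, and whether the training and validation sets are left empty when `n_test == -1`, are assumptions made here for illustration.

```python
# Hypothetical sketch, NOT the dawsonia implementation.
def split_files_sketch(files, n_train, n_val, n_test):
    """Partition ``files`` into (train, val, test) lists."""
    # Assumption: a deterministic sorted order; the real code may shuffle.
    files = sorted(files)

    if n_test == -1:
        # Documented special case: every file is used for testing.
        # Assumption: train and validation sets are empty in this case.
        return [], [], files

    needed = n_train + n_val + n_test
    if needed > len(files):
        raise ValueError(
            f"requested {needed} files but only {len(files)} are available"
        )

    train = files[:n_train]
    val = files[n_train:n_train + n_val]
    test = files[n_train + n_val:needed]
    return train, val, test


names = ["c.png", "a.png", "b.png", "d.png"]
print(split_files_sketch(names, n_train=2, n_val=1, n_test=1))
print(split_files_sketch(names, n_train=0, n_val=0, n_test=-1))
```

Raising an error when the requested counts exceed the available files is a design choice of this sketch; the actual command may behave differently.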