# Miscellaneous implementation details

## Input data and neural network models

All large data files are organized in another git repository:

This also includes the trained model files, organized under `data/processed/dawsonia_model*`. This path is passed to `dawsonia digitize` via `--model-path` (see below).

## Configuration

For ease of use with command-line operations, and to encode the table formats of the documents, we use [TOML](https://toml.io/) configuration files.

## Command line parameters

Some parameters that are specific to a machine or setup are often repeated. These can be saved in a configuration file with sections named `[dawsonia.]`. The files are typically saved in the directory `cfg`, but they can be placed anywhere and passed to the commands as

```sh
dawsonia -c cfg/dawsonia.toml ...
```

This saves keystrokes by avoiding repeated entry of `--model-path`, `--output-path`, etc.

```{note}
Hyphens (`-`) in command-line options become underscores (`_`) in the TOML file.
```

## Table formats

Table formats are typically saved in the directory `table_formats`, but another directory can be specified with the `--table-fmt-dir` command-line argument. The name of each file should correspond to what is returned by {func}`dawsonia.io.get_station_name`. A file can contain:

- many `[versions.]` [section]s encoding aliases for different versions of the table format,
- a `[default]` [section] encoding the most common table format version, and
- many `[YYYY]` [section]s specifying a particular version of the table format for a particular year.

See {func}`dawsonia.label.read_specific_table_format`, which returns the format specific to a given PDF file.

### Preprocessing

The configuration file may include a `[default.preproc]` section defining the preprocessing operations. `[YYYY.preproc]` sections are also permitted. These sections are parsed into {class}`dawsonia.typing.PreprocConfig` and used in {mod}`dawsonia.image_preproc`.
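As a rough illustration, a table-format file combining the sections described above might look like the sketch below. It is hypothetical: the version aliases `a` and `b`, the column and row values, the `[rows, columns]` encoding of each entry in `tables`, and the `version` key used in `[default]` and `[1931]` are all assumptions, not dawsonia's documented schema. An existing file such as `table_formats/öland.toml` shows the real layout.

```toml
# Hypothetical sketch of a table_formats/<station>.toml file.
# Names marked "assumed" are illustrative, not taken from dawsonia.

# One layout of the table format; the alias "a" is hypothetical.
[versions.a]
columns = ["pressure", "temperature", "wind"]  # column headings
rows = ["08", "14", "20"]                      # row indices, often from the first column
tables = [[3, 3]]                              # assumed: one table of 3 rows x 3 columns

# A second hypothetical layout with an extra column.
[versions.b]
columns = ["pressure", "temperature", "wind", "clouds"]
rows = ["08", "14", "20"]
tables = [[3, 4]]

# Most common version for this station (key name "version" is assumed).
[default]
version = "a"

# Year-specific override (same assumed key name).
[1931]
version = "b"

# Preprocessing for all years; fields follow dawsonia.typing.PreprocConfig (omitted here).
[default.preproc]
```

Bare numeric keys such as `1931` are valid TOML table names, so year sections and their `[YYYY.preproc]` subsections can be written without quoting.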
### Transforms

Image transformations can be specified in a `[default.transforms]` [section]. See, for example, `table_formats/öland.toml`. The section should follow the specification in {class}`dawsonia.typing.Transforms`.

### Versions: Columns, rows and tables

Each `[versions.]` [section] should contain three keys:

- `columns`: headings of the columns
- `rows`: indices corresponding to the rows, often found in the first column of the table
- `tables`: the shapes of the tables, i.e. how many rows and columns are expected to contain handwritten text. The tables should be listed in left-to-right, top-to-bottom order as they appear on the page.

[section]: https://toml.io/en/v1.0.0#table

## Command-line interface

The primary way of using {program}`dawsonia` is from the command line. A typical pipeline executes the following commands in order:

1. `dawsonia label`
1. `dawsonia prepare`
1. `dawsonia ml --train`
1. `dawsonia ml --test`
1. `dawsonia digitize`

The commands are also sensitive to the following environment variables:

- {envvar}`DAWSONIA_DEBUG_TABLE_DETECT`
- {envvar}`DAWSONIA_DEBUG_DIGITIZE`
- {envvar}`DAWSONIA_DEBUG` (which activates all of the above debug options)

```{eval-rst}
.. click:: dawsonia.cli:typer_click_object
   :prog: dawsonia
   :nested: full
```

## Logging

See {func}`dawsonia.log.init_logger` for the logging configuration. It includes handlers for logging to the console and to the filesystem.