# {py:mod}`dawsonia.digitize`

```{py:module} dawsonia.digitize
```

```{autodoc2-docstring} dawsonia.digitize
:allowtitles:
```

## Module Contents

### Classes

````{list-table}
:class: autosummary longtable
:align: left

* - {py:obj}`_Metadata <dawsonia.digitize._Metadata>`
  - ```{autodoc2-docstring} dawsonia.digitize._Metadata
    :summary:
    ```
* - {py:obj}`Statistics <dawsonia.digitize.Statistics>`
  - ```{autodoc2-docstring} dawsonia.digitize.Statistics
    :summary:
    ```
* - {py:obj}`Page <dawsonia.digitize.Page>`
  - ```{autodoc2-docstring} dawsonia.digitize.Page
    :summary:
    ```
* - {py:obj}`TableMetadata <dawsonia.digitize.TableMetadata>`
  - ```{autodoc2-docstring} dawsonia.digitize.TableMetadata
    :summary:
    ```
* - {py:obj}`SequentialPool <dawsonia.digitize.SequentialPool>`
  -
````

### Functions

````{list-table}
:class: autosummary longtable
:align: left

* - {py:obj}`make_executor <dawsonia.digitize.make_executor>`
  - ```{autodoc2-docstring} dawsonia.digitize.make_executor
    :summary:
    ```
* - {py:obj}`all_file_paths <dawsonia.digitize.all_file_paths>`
  - ```{autodoc2-docstring} dawsonia.digitize.all_file_paths
    :summary:
    ```
* - {py:obj}`digitize_book <dawsonia.digitize.digitize_book>`
  - ```{autodoc2-docstring} dawsonia.digitize.digitize_book
    :summary:
    ```
* - {py:obj}`digitize_page_and_write_output <dawsonia.digitize.digitize_page_and_write_output>`
  - ```{autodoc2-docstring} dawsonia.digitize.digitize_page_and_write_output
    :summary:
    ```
* - {py:obj}`init_tokenizer <dawsonia.digitize.init_tokenizer>`
  - ```{autodoc2-docstring} dawsonia.digitize.init_tokenizer
    :summary:
    ```
* - {py:obj}`load_model <dawsonia.digitize.load_model>`
  - ```{autodoc2-docstring} dawsonia.digitize.load_model
    :summary:
    ```
* - {py:obj}`digitize_table_with_model <dawsonia.digitize.digitize_table_with_model>`
  - ```{autodoc2-docstring} dawsonia.digitize.digitize_table_with_model
    :summary:
    ```
* - {py:obj}`check_probability_thresh <dawsonia.digitize.check_probability_thresh>`
  - ```{autodoc2-docstring} dawsonia.digitize.check_probability_thresh
    :summary:
    ```
````

### Data

````{list-table}
:class: autosummary longtable
:align: left

* - {py:obj}`logger <dawsonia.digitize.logger>`
  - ```{autodoc2-docstring} dawsonia.digitize.logger
    :summary:
    ```
* - {py:obj}`DAWSONIA_DEBUG_DIGITIZE <dawsonia.digitize.DAWSONIA_DEBUG_DIGITIZE>`
  - ```{autodoc2-docstring} dawsonia.digitize.DAWSONIA_DEBUG_DIGITIZE
    :summary:
    ```
* - {py:obj}`all_statistics_paths <dawsonia.digitize.all_statistics_paths>`
  - ```{autodoc2-docstring} dawsonia.digitize.all_statistics_paths
    :summary:
    ```
* - {py:obj}`all_probablities_paths <dawsonia.digitize.all_probablities_paths>`
  - ```{autodoc2-docstring} dawsonia.digitize.all_probablities_paths
    :summary:
    ```
* - {py:obj}`all_table_meta_paths <dawsonia.digitize.all_table_meta_paths>`
  - ```{autodoc2-docstring} dawsonia.digitize.all_table_meta_paths
    :summary:
    ```
* - {py:obj}`app <dawsonia.digitize.app>`
  - ```{autodoc2-docstring} dawsonia.digitize.app
    :summary:
    ```
````

### API

````{py:data} logger
:canonical: dawsonia.digitize.logger
:value: >
   'getLogger(...)'

```{autodoc2-docstring} dawsonia.digitize.logger
```

````

````{py:data} DAWSONIA_DEBUG_DIGITIZE
:canonical: dawsonia.digitize.DAWSONIA_DEBUG_DIGITIZE
:value: >
   None

```{autodoc2-docstring} dawsonia.digitize.DAWSONIA_DEBUG_DIGITIZE
```

````

`````{py:class} _Metadata
:canonical: dawsonia.digitize._Metadata

```{autodoc2-docstring} dawsonia.digitize._Metadata
```

````{py:attribute} _subdir
:canonical: dawsonia.digitize._Metadata._subdir
:type: typing.ClassVar[str]
:value:

```{autodoc2-docstring} dawsonia.digitize._Metadata._subdir
```

````

````{py:attribute} _ext
:canonical: dawsonia.digitize._Metadata._ext
:type: typing.ClassVar[str]
:value: >
   '.json'

```{autodoc2-docstring} dawsonia.digitize._Metadata._ext
```

````

````{py:method} _metadata_relative_to_output_path(output_path: pathlib.Path) -> pathlib.Path
:canonical: dawsonia.digitize._Metadata._metadata_relative_to_output_path
:classmethod:

```{autodoc2-docstring} dawsonia.digitize._Metadata._metadata_relative_to_output_path
```

````

````{py:method} ensure_output_path(path) -> pathlib.Path
:canonical: dawsonia.digitize._Metadata.ensure_output_path
:classmethod:

```{autodoc2-docstring} dawsonia.digitize._Metadata.ensure_output_path
```

````

````{py:method} to_json(path: pathlib.Path)
:canonical: dawsonia.digitize._Metadata.to_json

```{autodoc2-docstring} dawsonia.digitize._Metadata.to_json
```

````

````{py:method} from_json(path: Path | str)
:canonical: dawsonia.digitize._Metadata.from_json
:classmethod:

```{autodoc2-docstring} dawsonia.digitize._Metadata.from_json
```

````

`````

`````{py:class} Statistics
:canonical: dawsonia.digitize.Statistics

Bases: {py:obj}`dawsonia.digitize._Metadata`

```{autodoc2-docstring} dawsonia.digitize.Statistics
```

````{py:attribute} tables_detected
:canonical: dawsonia.digitize.Statistics.tables_detected
:type: int
:value: >
   0

```{autodoc2-docstring} dawsonia.digitize.Statistics.tables_detected
```

````

````{py:attribute} predictions_total
:canonical: dawsonia.digitize.Statistics.predictions_total
:type: int
:value: >
   0

```{autodoc2-docstring} dawsonia.digitize.Statistics.predictions_total
```

````

````{py:attribute} predictions_above_thresh
:canonical: dawsonia.digitize.Statistics.predictions_above_thresh
:type: int
:value: >
   0

```{autodoc2-docstring} dawsonia.digitize.Statistics.predictions_above_thresh
```

````

````{py:attribute} predictions_empty_value
:canonical: dawsonia.digitize.Statistics.predictions_empty_value
:type: int
:value: >
   0

```{autodoc2-docstring} dawsonia.digitize.Statistics.predictions_empty_value
```

````

````{py:attribute} unset_values
:canonical: dawsonia.digitize.Statistics.unset_values
:type: int
:value: >
   0

```{autodoc2-docstring} dawsonia.digitize.Statistics.unset_values
```

````

````{py:attribute} _subdir
:canonical: dawsonia.digitize.Statistics._subdir
:type: typing.ClassVar[str]
:value: >
   'statistics'

```{autodoc2-docstring} dawsonia.digitize.Statistics._subdir
```

````

````{py:method} compute(result: pandas.DataFrame, probablities: pandas.DataFrame, prob_thresh: float)
:canonical: dawsonia.digitize.Statistics.compute

```{autodoc2-docstring} dawsonia.digitize.Statistics.compute
```

````

`````

`````{py:class} Page
:canonical: dawsonia.digitize.Page

Bases: {py:obj}`dawsonia.digitize._Metadata`

```{autodoc2-docstring} dawsonia.digitize.Page
```

````{py:attribute} image
:canonical: dawsonia.digitize.Page.image
:type: numpy.typing.NDArray
:value: >
   'field(...)'

```{autodoc2-docstring} dawsonia.digitize.Page.image
```

````

````{py:attribute} _subdir
:canonical: dawsonia.digitize.Page._subdir
:type: typing.ClassVar[str]
:value: >
   'pages'

```{autodoc2-docstring} dawsonia.digitize.Page._subdir
```

````

````{py:attribute} _ext
:canonical: dawsonia.digitize.Page._ext
:type: typing.ClassVar[str]
:value: >
   '.webp'

```{autodoc2-docstring} dawsonia.digitize.Page._ext
```

````

````{py:method} to_image(path)
:canonical: dawsonia.digitize.Page.to_image

```{autodoc2-docstring} dawsonia.digitize.Page.to_image
```

````

`````

`````{py:class} TableMetadata
:canonical: dawsonia.digitize.TableMetadata

Bases: {py:obj}`dawsonia.digitize._Metadata`

```{autodoc2-docstring} dawsonia.digitize.TableMetadata
```

````{py:attribute} table_sizes
:canonical: dawsonia.digitize.TableMetadata.table_sizes
:type: list[list[int]]
:value: >
   'field(...)'

```{autodoc2-docstring} dawsonia.digitize.TableMetadata.table_sizes
```

````

````{py:attribute} table_positions
:canonical: dawsonia.digitize.TableMetadata.table_positions
:type: list[list[float]]
:value: >
   'field(...)'

```{autodoc2-docstring} dawsonia.digitize.TableMetadata.table_positions
```

````

````{py:attribute} _subdir
:canonical: dawsonia.digitize.TableMetadata._subdir
:type: typing.ClassVar[str]
:value: >
   'table_meta'

```{autodoc2-docstring} dawsonia.digitize.TableMetadata._subdir
```

````

````{py:method} set(table_sizes: dawsonia.typing.TableSizes, table_pos_arrays: dawsonia.typing.TablePosArrays)
:canonical: dawsonia.digitize.TableMetadata.set

```{autodoc2-docstring} dawsonia.digitize.TableMetadata.set
```

````

`````

`````{py:class} SequentialPool(max_workers=None)
:canonical: dawsonia.digitize.SequentialPool

Bases: {py:obj}`concurrent.futures.Executor`

````{py:method} submit(func, *args, **kwargs)
:canonical: dawsonia.digitize.SequentialPool.submit

````

`````

````{py:function} make_executor(jobs: int) -> concurrent.futures.Executor
:canonical: dawsonia.digitize.make_executor

```{autodoc2-docstring} dawsonia.digitize.make_executor
```
````

````{py:function} all_file_paths(output_path: pathlib.Path, subdir, suffix)
:canonical: dawsonia.digitize.all_file_paths

```{autodoc2-docstring} dawsonia.digitize.all_file_paths
```
````

````{py:data} all_statistics_paths
:canonical: dawsonia.digitize.all_statistics_paths
:value: >
   'partial(...)'

```{autodoc2-docstring} dawsonia.digitize.all_statistics_paths
```

````

````{py:data} all_probablities_paths
:canonical: dawsonia.digitize.all_probablities_paths
:value: >
   'partial(...)'

```{autodoc2-docstring} dawsonia.digitize.all_probablities_paths
```

````

````{py:data} all_table_meta_paths
:canonical: dawsonia.digitize.all_table_meta_paths
:value: >
   'partial(...)'

```{autodoc2-docstring} dawsonia.digitize.all_table_meta_paths
```

````

````{py:function} digitize_book(path_file: pathlib.Path, first_date: str, last_date: str, size_cell: tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0), first_page: typing.Annotated[int, typer.Option('-f', '--first-page', help='the page number corresponding to first_date')] = 0, last_page: typing.Annotated[int, typer.Option('-l', '--last-page', help='the page number corresponding to last_date')] = 0, page_middle: typing.Annotated[int, typer.Option('-m', '--page-middle', help='X coordinate of middle of page to help the rotation correction')] = -1, table_fmt_dir: pathlib.Path = Path('table_formats'), model_path: pathlib.Path = Path('/local_disk', 'data', 'ai-for-obs', 'processed', 'dawsonia_model_2022-12-19'), prob_thresh: float = 0.8, output_path: pathlib.Path = Path('output', 'digitized'), output_text_fmt: bool = False, jobs: typing.Annotated[int, typer.Option(help='parallel jobs over pages in the book (default: max workers in the system)')] = -1, verbose: bool = False, config: typing.Annotated[pathlib.Path, typer.Option(*config_cli_names, **config_kwargs)] = Path('dawsonia.toml'))
:canonical: dawsonia.digitize.digitize_book

```{autodoc2-docstring} dawsonia.digitize.digitize_book
```
````

````{py:function} digitize_page_and_write_output(book: dawsonia.io.Book, init_data: list[dict[str, numpy.typing.NDArray]], page_number: int, date_str: str, model_path: pathlib.Path, model_predict: collections.abc.Callable, prob_thresh: float, output_path_page: pathlib.Path, output_text_fmt: bool, debug: bool) -> tuple[int, str, dawsonia.digitize.Statistics]
:canonical: dawsonia.digitize.digitize_page_and_write_output

```{autodoc2-docstring} dawsonia.digitize.digitize_page_and_write_output
```
````

````{py:function} init_tokenizer()
:canonical: dawsonia.digitize.init_tokenizer

```{autodoc2-docstring} dawsonia.digitize.init_tokenizer
```
````

````{py:function} load_model(model_path: pathlib.Path, vocab_size: int, source: str = 'washington', arch: str = 'flor') -> dawsonia.ml.ml.HTRModel
:canonical: dawsonia.digitize.load_model

```{autodoc2-docstring} dawsonia.digitize.load_model
```
````

````{py:function} digitize_table_with_model(book: dawsonia.io.Book, predictor: collections.abc.Callable[[numpy.typing.NDArray], tuple[dawsonia.typing.Prediction, dawsonia.typing.Probability]], image_page: numpy.typing.NDArray, table_pos_array: numpy.typing.NDArray, table_size: collections.abc.Iterable[int], init_data: dict[str, numpy.typing.NDArray], row_start: int = 0, col_start: int = 0, debug: bool = False) -> tuple[pandas.DataFrame, pandas.DataFrame]
:canonical: dawsonia.digitize.digitize_table_with_model

```{autodoc2-docstring} dawsonia.digitize.digitize_table_with_model
```
````

````{py:function} check_probability_thresh(predict, probablities, debug, err_threshold=80, prob_diff_warn_threshold=50)
:canonical: dawsonia.digitize.check_probability_thresh

```{autodoc2-docstring} dawsonia.digitize.check_probability_thresh
```
````

````{py:data} app
:canonical: dawsonia.digitize.app
:value: >
   'Typer(...)'

```{autodoc2-docstring} dawsonia.digitize.app
```

````
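
`SequentialPool` subclasses `concurrent.futures.Executor` and documents only `submit`. `make_executor(jobs)` returns an `Executor`. The sketch below shows one plausible way such a pair could behave. It is an assumption, not the `dawsonia` implementation:

- `SequentialPoolSketch` and `make_executor_sketch` are hypothetical names.
- The rule "one job runs sequentially, anything else uses a process pool" is assumed.

A sequential executor runs each call in the caller's thread and wraps the outcome in an already-completed `Future`. Code written against the `Executor` interface, including `Executor.map` and the `with` statement, then works unchanged when parallelism is turned off.

```python
import concurrent.futures
from concurrent.futures import Executor, Future


class SequentialPoolSketch(Executor):
    """Hypothetical stand-in: runs each submitted call at once, in the caller's thread."""

    def __init__(self, max_workers=None):
        # Accepted only so the signature matches the pool executors; it is ignored.
        self._max_workers = max_workers

    def submit(self, func, *args, **kwargs):
        future = Future()
        try:
            result = func(*args, **kwargs)
        except Exception as exc:  # store the error in the future, as pool executors do
            future.set_exception(exc)
        else:
            future.set_result(result)
        return future


def make_executor_sketch(jobs: int) -> Executor:
    """Hypothetical dispatch rule: one job runs sequentially, otherwise use processes."""
    if jobs == 1:
        return SequentialPoolSketch()
    # Assumption: zero or a negative value (such as the documented default of -1)
    # lets the process pool choose its default worker count, based on the CPU count.
    return concurrent.futures.ProcessPoolExecutor(max_workers=jobs if jobs > 0 else None)
```

For example, inside `with make_executor_sketch(1) as pool:`, the call `pool.submit(pow, 2, 3).result()` evaluates `pow(2, 3)` immediately and returns `8`. Running sequentially this way keeps tracebacks in the main process, which can make debugging a single page easier.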
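
`Statistics`, `Page`, and `TableMetadata` all derive from `_Metadata`, which provides `to_json` and `from_json`. `Statistics` declares five integer counters that default to `0`, and the `'field(...)'` defaults on `Page` and `TableMetadata` suggest `dataclasses.field`.

The following is a minimal sketch of such a JSON round trip. It assumes the metadata classes are dataclasses serialized field by field. `StatisticsSketch` is a hypothetical stand-in that copies the documented field names and the `_subdir`/`_ext` class variables. Because `_subdir` and `_ext` are annotated as `typing.ClassVar`, `dataclasses.asdict` skips them, and only the counters reach the file.

```python
import dataclasses
import json
from pathlib import Path
from typing import ClassVar, Union


@dataclasses.dataclass
class StatisticsSketch:
    """Hypothetical mirror of the documented Statistics counters (all int, default 0)."""

    tables_detected: int = 0
    predictions_total: int = 0
    predictions_above_thresh: int = 0
    predictions_empty_value: int = 0
    unset_values: int = 0

    # ClassVar annotations are not dataclass fields, so asdict() leaves them out.
    _subdir: ClassVar[str] = "statistics"
    _ext: ClassVar[str] = ".json"

    def to_json(self, path: Path) -> None:
        # Assumption: parent directories are created on demand.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(dataclasses.asdict(self), indent=2))

    @classmethod
    def from_json(cls, path: Union[Path, str]) -> "StatisticsSketch":
        # Accept either a Path or a str, as the documented signature does.
        return cls(**json.loads(Path(path).read_text()))
```

For instance, `StatisticsSketch(tables_detected=2).to_json(path)` writes all five counters. `StatisticsSketch.from_json(path)` then rebuilds an equal object.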
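
The values of `all_statistics_paths`, `all_probablities_paths`, and `all_table_meta_paths` are shown as `'partial(...)'`. This suggests they are `functools.partial` objects that pre-bind some arguments of `all_file_paths(output_path, subdir, suffix)`.

A minimal sketch of that pattern follows. The following details are assumptions:

- the recursive glob and the sorting;
- the bound `subdir` and `suffix` values, taken from the documented `_subdir` and `_ext` attributes;
- the name `all_file_paths_sketch`, which is hypothetical.

The probabilities variant is omitted because its subdirectory is not documented.

```python
from functools import partial
from pathlib import Path
from typing import List


def all_file_paths_sketch(output_path: Path, subdir: str, suffix: str) -> List[Path]:
    """Hypothetical: every file under output_path/subdir whose name ends in suffix, sorted."""
    directory = Path(output_path) / subdir
    if not directory.is_dir():
        return []  # nothing has been written for this kind of metadata yet
    return sorted(p for p in directory.rglob(f"*{suffix}") if p.is_file())


# Pre-bind the per-kind arguments, mirroring the documented 'partial(...)' values.
# The subdirectory names come from the documented _subdir attributes and the
# suffix from _Metadata._ext; whether dawsonia binds exactly these is assumed.
all_statistics_paths_sketch = partial(all_file_paths_sketch, subdir="statistics", suffix=".json")
all_table_meta_paths_sketch = partial(all_file_paths_sketch, subdir="table_meta", suffix=".json")
```

A call such as `all_statistics_paths_sketch(Path('output', 'digitized'))` then needs only the output root, which is the default `output_path` of `digitize_book`. This mirrors how the partial objects remove the repeated `subdir` and `suffix` arguments.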