`dawsonia.io._pdf`

Module Contents

`read_pdf_book`	Read PDF book and detect pages
`log_pdf_metadata`
`check_pdf_page_range`	Analyze PDF metadata for the maximum page number and compare it with the input parameters
`station_name_from_pdf`	Deduce weather station name from the directory name where it is stored
`year_from_pdf`
`get_pdf_pages`	Read two pages from a PDF as images using pdfplumber
`set_skip_dict`	Parse string CLI skip arguments and generate a dictionary of tables, rows and columns to skip

dawsonia.io._pdf.read_pdf_book(path_file: pathlib.Path, first_page: int = 1, last_page: int = 1000000, page_middle: int | None = None, size_cell: list[float] | None = None, table_fmt_dir: pathlib.Path = Path('table_formats')) → tuple[int, int, dawsonia.io._book.Book]

Read PDF book and detect pages

dawsonia.io._pdf.check_pdf_page_range(path_pdf, first_page=1, last_page=1000000)

Analyze PDF metadata for the maximum page number and compare it with the input parameters

Returns first_page, last_page (tuple[int, int]): the first and last page, with corrections applied if any. May raise a ValueError instead of returning.
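A minimal sketch of this kind of range check, not the actual dawsonia implementation. It assumes pages are numbered from 1, that an oversized `last_page` (such as the default `1000000`) is corrected down to the page count, and that an unusable range raises `ValueError`. The helper name `clamp_page_range` and the `n_pages` parameter are hypothetical; in practice the page count would come from the PDF metadata.

```python
def clamp_page_range(
    n_pages: int, first_page: int = 1, last_page: int = 1_000_000
) -> tuple[int, int]:
    """Hypothetical helper: validate a 1-based, inclusive page range.

    Assumed behaviour (not confirmed by the dawsonia source):
    - a last_page beyond the end of the document is corrected to n_pages;
    - a first_page outside the document, or an empty range, is an error.
    """
    if not 1 <= first_page <= n_pages:
        raise ValueError(f"first_page={first_page} is outside 1..{n_pages}")
    if last_page < first_page:
        raise ValueError(
            f"last_page={last_page} is smaller than first_page={first_page}"
        )
    # Correct an oversized last_page instead of rejecting it.
    return first_page, min(last_page, n_pages)


print(clamp_page_range(120))                               # (1, 120)
print(clamp_page_range(120, first_page=5, last_page=20))   # (5, 20)
```

Using a very large default for `last_page` lets callers say "read to the end" without knowing the page count in advance; the check then narrows it to the real value.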

dawsonia.io._pdf.station_name_from_pdf(path_pdf: pathlib.Path) → str

Deduce weather station name from the directory name where it is stored
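One way to derive a name from the containing directory is sketched below. This is an illustration under assumptions, not the dawsonia implementation: the helper name `station_from_parent_dir` and the example path are hypothetical, and the real function may normalize or validate the name further.

```python
from pathlib import Path


def station_from_parent_dir(path_pdf: Path) -> str:
    """Hypothetical helper: return the name of the directory holding the PDF."""
    # Path.parent is the containing directory; .name is its last component.
    return path_pdf.parent.name


# Hypothetical layout: one directory per station, PDFs stored inside it.
print(station_from_parent_dir(Path("archive/STATION_A/book_1930.pdf")))  # STATION_A
```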

dawsonia.io._pdf.get_pdf_pages(path_pdf: pathlib.Path, left_page: int, right_page: int) → Iterator[dawsonia.typing.NDArray[numpy.int32]]

Read two pages from a PDF as images using pdfplumber

dawsonia.io._pdf.set_skip_dict(skip_table: list[int], skip_rows, skip_cols, table_formats) → dict[str, list]

Parse string CLI skip arguments and generate a dictionary of tables, rows and columns to skip