dawsonia.io._pdf

Module Contents

Functions

read_pdf_book

Read PDF book and detect pages

log_pdf_metadata

check_pdf_page_range

Analyze the PDF metadata for the maximum page number and compare it with the input parameters

station_name_from_pdf

Deduce weather station name from the directory name where it is stored

year_from_pdf

get_pdf_pages

Read two pages from a PDF as images using pdfplumber

set_skip_dict

Parse string CLI skip arguments and generate a dictionary of tables, rows and columns to skip

Data

logger

API

dawsonia.io._pdf.logger

‘getLogger(…)’

dawsonia.io._pdf.read_pdf_book(path_file: pathlib.Path, first_page: int = 1, last_page: int = 1000000, page_middle: int | None = None, size_cell: list[float] | None = None, table_fmt_dir: pathlib.Path = Path('table_formats')) -> tuple[int, int, dawsonia.io._book.Book]

Read PDF book and detect pages

dawsonia.io._pdf.log_pdf_metadata(pdf: pdfplumber.PDF) -> None

dawsonia.io._pdf.check_pdf_page_range(path_pdf, first_page=1, last_page=1000000)

Analyze the PDF metadata for the maximum page number and compare it with the input parameters

Returns

first_page, last_page: tuple[int, int] The first and last page, with corrections applied if any. May raise a ValueError instead.
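
The range check described above can be sketched as a pure function. This is only an illustration, not the dawsonia implementation: `clamp_page_range` and its `n_pages` argument are hypothetical names, and the sketch assumes that an overly large `last_page` is clamped to the page count while an unusable range raises a ValueError.

```python
def clamp_page_range(
    n_pages: int, first_page: int = 1, last_page: int = 1_000_000
) -> tuple[int, int]:
    """Hypothetical helper: fit a 1-based, inclusive page range to a document."""
    if n_pages < 1:
        raise ValueError("document has no pages")
    if first_page < 1 or first_page > n_pages:
        raise ValueError(f"first_page={first_page} is outside 1..{n_pages}")
    # Assumption: a too-large last_page is corrected rather than rejected.
    last_page = min(last_page, n_pages)
    if last_page < first_page:
        raise ValueError(f"last_page={last_page} precedes first_page={first_page}")
    return first_page, last_page
```

With the default arguments, a 120-page document would yield the range (1, 120) under these assumptions.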

dawsonia.io._pdf.station_name_from_pdf(path_pdf: pathlib.Path) -> str

Deduce weather station name from the directory name where it is stored
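
A minimal sketch of the idea, assuming a directory layout of `<root>/<STATION>/<file>.pdf`. The function name `station_name_from_parent_dir` and the example path are hypothetical, and the real function may normalize or validate the name further.

```python
from pathlib import Path


def station_name_from_parent_dir(path_pdf: Path) -> str:
    """Hypothetical helper: use the containing directory's name as the station name."""
    # Assumption: the PDF sits directly inside a directory named after the station.
    return path_pdf.parent.name


name = station_name_from_parent_dir(Path("archive/STATION_A/book_1930.pdf"))
```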

dawsonia.io._pdf.year_from_pdf(path_pdf: pathlib.Path) -> str

dawsonia.io._pdf.get_pdf_pages(path_pdf: pathlib.Path, left_page: int, right_page: int) -> Iterator[dawsonia.typing.NDArray[numpy.int32]]

Read two pages from a PDF as images using pdfplumber
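
Rendering two pages to arrays with pdfplumber could look roughly like the sketch below. It is not the dawsonia implementation: the name `iter_page_images`, the `resolution` parameter, and the 1-based page numbering are assumptions. The third-party imports are deferred to the function body so that the module can be imported without pdfplumber installed.

```python
from collections.abc import Iterator
from pathlib import Path


def iter_page_images(
    path_pdf: Path, left_page: int, right_page: int, resolution: int = 150
) -> Iterator:
    """Hypothetical helper: yield the two requested pages as integer arrays."""
    # Deferred imports: only needed once the generator is actually consumed.
    import numpy as np
    import pdfplumber

    with pdfplumber.open(path_pdf) as pdf:
        for page_number in (left_page, right_page):
            # Assumption: page numbers are 1-based, while pdf.pages is 0-based.
            page = pdf.pages[page_number - 1]
            # PageImage.original is the rendered PIL image without annotations.
            image = page.to_image(resolution=resolution).original
            yield np.asarray(image, dtype=np.int32)
```

Because this is a generator, the PDF is opened only when iteration starts and stays open until both pages have been yielded.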

dawsonia.io._pdf.set_skip_dict(skip_table: list[int], skip_rows, skip_cols, table_formats) -> dict[str, list]

Parse string CLI skip arguments and generate a dictionary of tables, rows and columns to skip
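
The format of the skip arguments is not documented here, so the following is purely an illustrative sketch of turning string arguments into a skip dictionary. It assumes comma-separated indices with optional `a-b` ranges; `parse_indices`, `build_skip_dict`, and the dictionary keys are hypothetical and do not describe the actual `set_skip_dict` behavior (the `table_formats` argument is left out entirely).

```python
def parse_indices(arg: str) -> list[int]:
    """Hypothetical parser: '1,3-5' becomes [1, 3, 4, 5]."""
    indices: list[int] = []
    for part in arg.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty input and trailing commas
        if "-" in part:
            start, stop = part.split("-", 1)
            indices.extend(range(int(start), int(stop) + 1))  # inclusive range
        else:
            indices.append(int(part))
    return indices


def build_skip_dict(skip_table: list[int], skip_rows: str, skip_cols: str) -> dict[str, list]:
    """Hypothetical builder: the key names are assumptions, not the dawsonia schema."""
    return {
        "tables": list(skip_table),
        "rows": parse_indices(skip_rows),
        "cols": parse_indices(skip_cols),
    }


skip = build_skip_dict([0], "1,3-5", "")
```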