dawsonia.table_detect.opencv_contours

Module Contents

Functions

table_detect_opencv_contours

get_table_structure

create_bbox_array

set_nan_to_odd_bboxes

fix_missing_bboxes

page_curvature_along_x_quad

A quadratic polynomial with coefficients a0, a1 and a2, which should be determined by curve fitting.

page_curvature_along_x_cubic

A cubic polynomial with coefficients a0, a1, a2 and a3, which should be determined by curve fitting.

bboxes_from_contours

cluster_axis

Cluster bounding boxes along an axis.

Data

logger

__all__

API

dawsonia.table_detect.opencv_contours.logger

‘getLogger(…)’

dawsonia.table_detect.opencv_contours.__all__

(‘table_detect_opencv_contours’, ‘get_table_structure’, ‘cluster_axis’, ‘create_bbox_array’, ‘set_na…

dawsonia.table_detect.opencv_contours.table_detect_opencv_contours(filtered_image: numpy.typing.NDArray, thresh_value: numpy.typing.NDArray, binary_tables: numpy.typing.NDArray[numpy.bool_], table_fmt, preproc_cfg: dawsonia.typing.PreprocConfig, original_image: numpy.typing.NDArray) -> tuple[dawsonia.typing.TablePositions, dawsonia.typing.TableSizes, int]

dawsonia.table_detect.opencv_contours.get_table_structure(table_fmt, preproc_cfg: dawsonia.typing.PreprocConfig, filtered_image: numpy.typing.NDArray, filtered_image_inv: numpy.typing.NDArray, thresh_value: numpy.typing.NDArray, original_image: numpy.typing.NDArray, label_tables: numpy.typing.NDArray[numpy.int64], nb_labels: int, min_nb_pixels: int, sensibility: float, list_sizes: dawsonia.typing.TableSizes, list_positions: dawsonia.typing.TablePositions) -> None

dawsonia.table_detect.opencv_contours.create_bbox_array(bboxes: list[dawsonia.typing.BBoxTuple], column_labels: numpy.typing.NDArray[numpy.int64], column_bboxes_idxs: dict[dawsonia.typing.ClusterLabel, numpy.typing.NDArray[numpy.int64]], row_labels: numpy.typing.NDArray[numpy.int64], row_bboxes_idxs: dict[dawsonia.typing.ClusterLabel, numpy.typing.NDArray[numpy.int64]])

dawsonia.table_detect.opencv_contours.set_nan_to_odd_bboxes(all_bboxes)

dawsonia.table_detect.opencv_contours.fix_missing_bboxes(all_bboxes)
dawsonia.table_detect.opencv_contours.page_curvature_along_x_quad(xs, y_median, a0, a1, a2)

A quadratic polynomial with coefficients a0, a1 and a2. The coefficients should be determined by curve fitting.

dawsonia.table_detect.opencv_contours.page_curvature_along_x_cubic(xs, y_median, a0, a1, a2, a3)

A cubic polynomial with coefficients a0, a1, a2 and a3. The coefficients should be determined by curve fitting.
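The curve-fitting step mentioned above might look roughly like the sketch below. This page does not show the functional form of page_curvature_along_x_quad, so `quad_model` is a hypothetical stand-in that assumes a quadratic offset added to `y_median`. `fit_polynomial` is likewise an illustrative helper, not part of dawsonia: it does an ordinary least-squares fit in plain Python, where `numpy.polyfit` or `scipy.optimize.curve_fit` would be the usual choice in practice. The sample coordinates are synthetic.

```python
from typing import Sequence


def quad_model(xs: Sequence[float], y_median: float,
               a0: float, a1: float, a2: float) -> list[float]:
    """Hypothetical model (assumption): quadratic offset from the median height."""
    return [y_median + a0 + a1 * x + a2 * x * x for x in xs]


def fit_polynomial(xs: Sequence[float], ys: Sequence[float], degree: int) -> list[float]:
    """Least-squares polynomial fit via the normal equations (illustrative helper)."""
    n = degree + 1
    # Normal equations (V^T V) a = V^T y, where V[i][k] = xs[i] ** k.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution, lowest row first.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - tail) / aug[i][i]
    return coeffs


# Synthetic data: x positions normalized to [0, 1] (an assumption for this sketch).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
y_median = 50.0
ys = quad_model(xs, y_median, 2.0, -4.0, 3.0)

# Fit the offset from the median, then rebuild the curve with the fitted coefficients.
residuals = [y - y_median for y in ys]
a0, a1, a2 = fit_polynomial(xs, residuals, degree=2)
fitted = quad_model(xs, y_median, a0, a1, a2)
max_error = max(abs(f - y) for f, y in zip(fitted, ys))
print(f"a0={a0:.3f}, a1={a1:.3f}, a2={a2:.3f}, max error={max_error:.2e}")
```

Passing `degree=3` to the same helper returns four coefficients, which would correspond to the cubic variant under the same assumption about the model form.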

dawsonia.table_detect.opencv_contours.bboxes_from_contours(contours, area_range=(300, 10000), aspect_ratio_range=(0.05, 20))

dawsonia.table_detect.opencv_contours.cluster_axis(ref_image: numpy.typing.NDArray, bboxes: collections.abc.Sequence[dawsonia.typing.BBoxTuple], axis: Literal[0, 1] = 0, distance_threshold: float | None = None, min_cluster_size: int = 0) -> tuple[numpy.typing.NDArray[numpy.int64], dict[dawsonia.typing.ClusterLabel, numpy.typing.NDArray[numpy.int64]]]

Cluster bounding boxes along an axis.

Parameters

ref_image: NDArray
    Image used for debugging.

bboxes: Sequence[BBoxTuple]
    Bounding boxes to be clustered.

axis: int
    Axis along which to cluster: 0 identifies columns and 1 identifies rows.

distance_threshold: float | None
    Maximum distance in pixels between bbox centers within a cluster. This is the linkage distance threshold; clusters at or above this distance are not merged.

min_cluster_size: int
    Minimum number of bounding boxes required for a group to be considered a cluster.

Notes

A reference for this method is available at https://pyimagesearch.com/2022/02/28/multi-column-table-ocr/. See also sklearn.cluster.AgglomerativeClustering.
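To illustrate the idea behind the parameters above, here is a minimal sketch that groups bounding boxes by their center coordinate along one axis. It is a simplified stand-in, not the dawsonia implementation: it does not call sklearn.cluster.AgglomerativeClustering but splits the sorted centers wherever the gap reaches the threshold, which matches single-linkage behaviour in one dimension. The function name `cluster_centers_1d` and the `(x, y, width, height)` box layout are assumptions; the actual BBoxTuple layout may differ.

```python
from collections import defaultdict
from typing import Sequence

# Assumed layout (x, y, width, height); dawsonia.typing.BBoxTuple may differ.
BBox = tuple[int, int, int, int]


def cluster_centers_1d(
    bboxes: Sequence[BBox],
    axis: int = 0,
    distance_threshold: float = 10.0,
    min_cluster_size: int = 0,
) -> tuple[list[int], dict[int, list[int]]]:
    """Hypothetical helper: label bboxes by gaps between sorted centers."""
    if not bboxes:
        return [], {}
    # Center along the chosen axis: x + w / 2 for axis 0, y + h / 2 for axis 1.
    centers = [b[axis] + b[axis + 2] / 2 for b in bboxes]
    order = sorted(range(len(bboxes)), key=centers.__getitem__)
    labels = [-1] * len(bboxes)
    current = 0
    labels[order[0]] = current
    for prev, idx in zip(order, order[1:]):
        # Start a new cluster when the gap is at or above the threshold.
        if centers[idx] - centers[prev] >= distance_threshold:
            current += 1
        labels[idx] = current
    # Map each label to the indices of its member bboxes.
    members: dict[int, list[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        members[label].append(i)
    # Drop clusters with fewer members than min_cluster_size.
    kept = {k: v for k, v in members.items() if len(v) >= min_cluster_size}
    return labels, kept


# Synthetic boxes: two full columns and one stray box further right.
boxes = [(10, 5, 20, 10), (12, 40, 20, 10), (100, 5, 20, 10),
         (103, 40, 20, 10), (200, 5, 20, 10)]
col_labels, col_members = cluster_centers_1d(
    boxes, axis=0, distance_threshold=15, min_cluster_size=2)
row_labels, row_members = cluster_centers_1d(
    boxes, axis=1, distance_threshold=15, min_cluster_size=2)
print(col_labels, col_members)
print(row_labels, row_members)
```

In this sketch the stray box keeps its label in the label list but is absent from the returned dictionary, since its cluster falls below `min_cluster_size`. Whether cluster_axis treats undersized clusters the same way is not stated on this page.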