HTR model inference#

This notebook illustrates how to run image-to-text inference with the HTR-Flor model, which comes bundled with Dawsonia. We will use dawsonia.ml.api.ML, which is importable from dawsonia.ml.

from dawsonia.ml import ML
from pathlib import Path
model = ML(model_path=Path("data/models/dawsonia/2024-03-29/"))

Inference#

CVL-strings dataset#

Let’s load an image from the CVL-strings dataset and try to infer its text. This image was never used in training.

from PIL import Image
import numpy as np

im = Image.open("cvl-strings-eval/25000-0585-08.png")
im
../_images/c3b162afcadfba5ceb04c9ee02b2f0d7f42050b6b859fd186a942598bdcc815c.png
arr = np.array(im)
print(arr.dtype, arr.shape, f"min={arr.min()} max={arr.max()}")
uint8 (71, 284, 3) min=60 max=255

The array has three color channels, so we first need to convert the image to grayscale:

import matplotlib.pyplot as plt
from skimage.color import rgb2gray
from skimage.exposure import rescale_intensity

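def preprocess(arr):
    """Convert an image array to 8-bit grayscale spanning [0, 255]."""
    if arr.ndim == 3:
        # color image: collapse the channels into a single gray channel
        gray = rgb2gray(arr)
    else:
        # already single-channel, use as is
        gray = arr

    # rescale it to range [0, 255] and change datatype from float
    return rescale_intensity(gray, out_range='uint8').astype('uint8')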
gray = preprocess(arr)
print(gray.shape, f"min={gray.min()} max={gray.max()}")
(71, 284) min=0 max=255
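
To make the two preprocessing steps explicit, here is a minimal NumPy-only sketch of roughly what they do. It assumes the luminance weights documented for skimage.color.rgb2gray (0.2125, 0.7154, 0.0721) and a plain min-max stretch; the name preprocess_numpy and the synthetic test image are hypothetical, and the result may differ from the scikit-image version by rounding.

```python
import numpy as np

# Luminance weights documented for skimage.color.rgb2gray (assumed here)
LUMA_WEIGHTS = np.array([0.2125, 0.7154, 0.0721])

def preprocess_numpy(arr):
    """Grayscale conversion followed by a min-max stretch to uint8."""
    gray = arr.astype(float)
    if gray.ndim == 3:
        # weighted sum over the first three (RGB) channels
        gray = gray[..., :3] @ LUMA_WEIGHTS
    lo, hi = gray.min(), gray.max()
    if hi == lo:
        # constant image: nothing to stretch, return all zeros
        return np.zeros(gray.shape, dtype=np.uint8)
    # map the darkest pixel to 0 and the brightest to 255
    stretched = (gray - lo) / (hi - lo) * 255.0
    return stretched.astype(np.uint8)

# Synthetic stand-in with the same shape and value range as the sample image
rng = np.random.default_rng(0)
rgb = rng.integers(60, 256, size=(71, 284, 3), dtype=np.uint8)
out = preprocess_numpy(rgb)
print(out.dtype, out.shape, f"min={out.min()} max={out.max()}")
```

The stretch is why the minimum changes from 60 in the original array to 0 after preprocessing: the darkest pixel is mapped to 0 and the brightest to 255.
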
plt.imshow(gray, cmap='gray')
<matplotlib.image.AxesImage at 0x7f66a4053880>
../_images/ee7df5c0f152fe6ed536398ed45d1e4bbf2c0ded8655f1b01cb55c7188adab9c.png
model.predict(gray)
((('767.1n',
   '767.1n',
   '7.1n',
   '7.1n',
   '767.41n',
   '767.1n',
   '767.n',
   '767.n',
   '7.n',
   '7.n'),),
 [array([0.07069247, 0.0560309 , 0.04045768, 0.0320668 , 0.03077785,
         0.02195447, 0.0121608 , 0.0118763 , 0.00695969, 0.00679687],
        dtype=float32)])
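
Judging from the printed output, the return value appears to be a pair: a tuple holding one tuple of candidate strings per image, and a list holding one probability array per image. Under that assumption, the sketch below pairs each candidate with its probability. The helper rank_candidates is hypothetical and not part of Dawsonia, and the data is copied from the output above, shortened to three candidates.

```python
import numpy as np

def rank_candidates(prediction, image_index=0):
    """Pair each candidate string with its probability, best first.

    Assumes the layout printed above:
    (tuple of per-image candidate tuples, list of probability arrays).
    """
    candidates, probabilities = prediction
    texts = candidates[image_index]
    probs = np.asarray(probabilities[image_index], dtype=float)
    # stable sort keeps the original order among equal probabilities
    order = np.argsort(-probs, kind="stable")
    return [(texts[i], float(probs[i])) for i in order]

# Values copied from the notebook output (first three candidates only)
prediction = (
    (("767.1n", "767.1n", "7.1n"),),
    [np.array([0.07069247, 0.0560309, 0.04045768], dtype=np.float32)],
)
for text, prob in rank_candidates(prediction):
    print(f"{text!r}: {prob:.4f}")
```

Note that the same string can appear more than once among the candidates, each time with its own probability.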

From SMHI’s training data#

This file was not used for training either; it comes from the test split for the model loaded above. However, its handwriting style and background noise are very similar to those of the training data.

im = Image.open("label_old/268829_1.png")
im
../_images/654ea9f69b44114ba472902fb904425ee3ee2815dda99cb8126642a564a0056b.png
arr = np.array(im)
print(arr.dtype, arr.shape, f"min={arr.min()} max={arr.max()}")
uint8 (85, 77) min=0 max=255

This image is already preprocessed: it is a single-channel uint8 array spanning [0, 255], so we can pass it to the model as is.

gray = arr
plt.imshow(gray, cmap='gray')
<matplotlib.image.AxesImage at 0x7f669c3d90d0>
../_images/e613f7d2b953ea2f24f4e18cb2fc3c9fa7faaacb633c852b76585204a4148b52.png
model.predict(gray)
((('1n', '1n', 'n', '7n', '3n', 'n', '1n1', '1n.', '1n3', '1n8'),),
 [array([9.96923208e-01, 1.01001037e-03, 8.20051820e-04, 4.62397235e-04,
         2.17886409e-04, 8.41823348e-05, 5.58489376e-07, 1.21680941e-07,
         1.04088947e-07, 1.03685316e-07], dtype=float32)])

Conclusion#

We see that the model fails to give a reliable inference for an image sample from CVL-strings. It does not hallucinate, however: the probability of its top candidate is very low (about 0.07). For SMHI’s data, in contrast, it does the job, and the top candidate '1n' has a probability of about 0.997. Depending on the input data, you may or may not need to perform transfer learning.
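
One way to act on this observation is to reject predictions whose top probability is low. The following is a minimal sketch, not part of Dawsonia: the helper accept_top_candidate is hypothetical and the threshold of 0.5 is an arbitrary illustrative value that would need tuning on real data. The inputs are the first three candidates from each of the two outputs above.

```python
import numpy as np

def accept_top_candidate(texts, probs, threshold=0.5):
    """Return the most probable text, or None if it is below the threshold."""
    probs = np.asarray(probs, dtype=float)
    best = int(np.argmax(probs))
    if probs[best] < threshold:
        # low confidence: better to flag the sample than to trust the text
        return None
    return texts[best]

# First three candidates and probabilities from the two examples above
cvl = (["767.1n", "767.1n", "7.1n"], [0.07069247, 0.0560309, 0.04045768])
smhi = (["1n", "1n", "n"], [9.96923208e-01, 1.01001037e-03, 8.20051820e-04])

print("CVL-strings:", accept_top_candidate(*cvl))
print("SMHI:", accept_top_candidate(*smhi))
```

With this threshold, the CVL-strings sample is rejected and the SMHI sample is accepted, which matches the reading of the two outputs given above.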