# Converting PDF files to Zarr

We store the image files extracted from each PDF in Zarr files, without an additional decoding and re-encoding step. This has two advantages:

- Faster read of input data.
- Works on all platforms. This is not the case with the [`pdf2image` package](https://pypi.org/project/pdf2image/), which requires Poppler to be installed.

The following script converts every PDF file in `raw` and writes the result to
`raw_zarr`.

**Requirements**: `pypdf pymupdf tqdm`

```{literalinclude} ../../../scripts/convert_pdf_to_zarr.py
:language: py
```