# Converting PDF files to Zarr

We use Zarr files to store the image files extracted from the PDFs, without an additional decoding and re-encoding step. This has two advantages:

- Faster reads of the input data.
- It works on all platforms. This is not the case with the [`pdf2image` package](https://pypi.org/project/pdf2image/), which requires Poppler to be installed.

The following script converts all PDF files in `raw` to `raw_zarr`.

**Requirements**: `pypdf pymupdf tqdm`

```{literalinclude} ../../../scripts/convert_pdf_to_zarr.py
:language: py
```
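
For orientation, here is a minimal sketch of the idea, not the included script itself. It assumes PyMuPDF's `Page.get_images` and `Document.extract_image`, the zarr-python 2.x API (`zarr.open_group`, `Group.create_dataset`), and NumPy. The helper names `image_key`, `zarr_path_for`, and `convert_pdf`, as well as the one-array-per-image layout, are hypothetical and may differ from what `convert_pdf_to_zarr.py` actually does.

```python
from pathlib import Path


def image_key(page_index: int, image_index: int) -> str:
    """Build a sortable array name for one embedded image (hypothetical naming scheme)."""
    return f"p{page_index:05d}_i{image_index:03d}"


def zarr_path_for(pdf_path: Path, out_dir: Path) -> Path:
    """Map e.g. raw/a.pdf to raw_zarr/a.zarr (hypothetical layout)."""
    return out_dir / (pdf_path.stem + ".zarr")


def convert_pdf(pdf_path: Path, out_dir: Path) -> Path:
    """Copy the embedded images of one PDF into a Zarr group, byte for byte."""
    # Third-party imports are kept local so the path helpers above
    # remain usable without the optional dependencies.
    import fitz  # PyMuPDF
    import numpy as np
    import zarr  # assumption: zarr-python 2.x API

    out_dir.mkdir(parents=True, exist_ok=True)
    target = zarr_path_for(pdf_path, out_dir)
    root = zarr.open_group(str(target), mode="w")

    with fitz.open(str(pdf_path)) as doc:
        for page_index, page in enumerate(doc):
            for image_index, info in enumerate(page.get_images(full=True)):
                xref = info[0]  # first tuple item is the image's xref number
                extracted = doc.extract_image(xref)
                if not extracted:
                    continue  # nothing extractable for this xref
                # Store the still-encoded bytes (JPEG, PNG, ...) as a uint8 array,
                # so no decode/re-encode round trip takes place.
                raw = np.frombuffer(extracted["image"], dtype=np.uint8)
                arr = root.create_dataset(image_key(page_index, image_index), data=raw)
                arr.attrs["ext"] = extracted["ext"]  # keep the format for later decoding
    return target
```

Under these assumptions, calling `convert_pdf` for each file matched by `Path("raw").glob("*.pdf")` with `Path("raw_zarr")` as the output directory would produce one `.zarr` group per PDF. Reading an image back is then `bytes(group[key][:])`, which can be handed to any image decoder.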