pdfreader

Description

*pdfreader* is a Pythonic API for:

- extracting text, images and other data from PDF documents
- accessing different objects within PDF documents

Features

- Extracts text (plain text and formatted text objects)
- Extracts PDF form data (plain strings and formatted text objects)
- Supports all PDF encodings, CMap, predefined cmaps
- Extracts images and image masks as Pillow/PIL Images
- Lets you browse any document object or resource and extract any data you need (fonts, annotations, metadata, multimedia, etc.)
- Follows the PDF-1.7 specification
- Lazy object access makes it possible to process huge PDF documents quickly
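
As a rough illustration of the text-extraction feature, the sketch below reads the plain-text strings from a single page. It assumes the `SimplePDFViewer` class with its `navigate()` and `render()` methods and the `canvas.strings` attribute, and it assumes page numbers start at 1. The helper name `page_strings` and its parameters are hypothetical, introduced only for this example.

```python
def page_strings(path, page_number=1):
    """Return the plain-text strings found on one page of a PDF file.

    Hypothetical helper for illustration; not part of pdfreader itself.
    """
    # Imported inside the function so this module still loads
    # on systems where pdfreader is not installed.
    from pdfreader import SimplePDFViewer

    with open(path, "rb") as fd:
        viewer = SimplePDFViewer(fd)
        viewer.navigate(page_number)  # assumption: pages are numbered from 1
        viewer.render()               # parse the page's content stream
        # canvas.strings holds the plain-text strings of the rendered page
        return list(viewer.canvas.strings)


# Example call (needs pdfreader installed and a real file on disk):
#     strings = page_strings("example.pdf", page_number=1)
#     print("".join(strings))
```

The file is opened in binary mode and kept open while the page is rendered, since objects are read from it on demand rather than loaded up front.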