Description
*pdfreader* is a Pythonic API for:
- extracting text, images and other data from PDF documents
- accessing different objects within PDF documents

Features
- Extracts text (plain text and formatted text objects)
- Extracts PDF form data (plain strings and formatted text objects)
- Supports all PDF encodings, CMap, predefined cmaps
- Extracts images and image masks as Pillow/PIL Images
- Lets you browse any document objects and resources and extract any data you need (fonts, annotations, metadata, multimedia, etc.)
- Follows the PDF-1.7 specification
- Lazy object access allows huge PDF documents to be processed quickly
Repository
https://github.com/maxpmaxp/pdfreader.git
Project Slug
pdfreader
Last Built
1 year, 8 months ago (passed)
Home Page
https://github.com/maxpmaxp/pdfreader.git
Tags
extract, image, parse, pdf, python, scrape, text
Short URLs
pdfreader.readthedocs.io
pdfreader.rtfd.io
Default Version
latest
'latest' Version
master