Versions

Description

*pdfreader* is a Pythonic API for:

- extracting text, images and other data from PDF documents
- accessing different objects within PDF documents

Features

- Extracts text (plain text and formatted text objects)
- Extracts PDF form data (plain strings and formatted text objects)
- Supports all PDF encodings, CMap, predefined cmaps
- Extracts images and image masks as Pillow/PIL Images
- Allows browsing any document objects and resources, and extracting any data you need (fonts, annotations, metadata, multimedia, etc.)
- Follows the PDF-1.7 specification
- Lazy object access allows processing huge PDF documents quite fast
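
As a rough illustration of the text extraction described above, here is a minimal sketch. It assumes the `SimplePDFViewer` class, its `render()` method and the `canvas.strings` attribute as shown in the project's tutorial; the helper name `first_page_strings` and the command-line usage are hypothetical and only for illustration.

```python
import sys


def first_page_strings(pdf_path):
    """Return the plain-text strings found on the first page of a PDF."""
    # Imported lazily so this sketch can be loaded without pdfreader installed.
    # Assumption: pdfreader exposes SimplePDFViewer as in its tutorial.
    from pdfreader import SimplePDFViewer

    with open(pdf_path, "rb") as fd:
        viewer = SimplePDFViewer(fd)
        # Parse the content stream of the current (initially first) page.
        viewer.render()
        # canvas.strings holds the plain text pieces decoded from the page.
        return list(viewer.canvas.strings)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python sketch.py some_document.pdf
    print("".join(first_page_strings(sys.argv[1])))
```

The import sits inside the function so that the module itself loads even where the package is not installed; in real code a top-level import would be the more usual choice.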

Repository

https://github.com/maxpmaxp/pdfreader.git

Project Slug

pdfreader

Last Built

1 year, 8 months ago (build passed)

Maintainers

Home Page

https://github.com/maxpmaxp/pdfreader.git

Tags

extract, image, parse, pdf, python, scrape, text

Short URLs

pdfreader.readthedocs.io
pdfreader.rtfd.io

Default Version

latest

'latest' Version

master