Description

*pdfreader* is a Pythonic API for:

- extracting texts, images and other data from PDF documents
- accessing different objects within PDF documents

Features

- Extracts texts (plain text and formatted text objects)
- Extracts PDF form data (plain strings and formatted text objects)
- Supports all PDF encodings, CMaps and predefined CMaps
- Extracts images and image masks as Pillow/PIL Images
- Allows browsing any document object or resource and extracting any data you need (fonts, annotations, metadata, multimedia, etc.)
- Follows the PDF-1.7 specification
- Lazy object access allows processing huge PDF documents quite fast
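A minimal sketch of the text and image extraction listed above. It assumes, as the pdfreader tutorial shows, that iterating over a `SimplePDFViewer` yields one rendered canvas per page, that `canvas.strings` holds the page's decoded text chunks, and that each value in `canvas.images` offers `to_Pillow()`. The helper names (`canvas_plain_text`, `collect_pages_text`, `save_page_images`, `extract_text`) and the output file naming are hypothetical. The helpers accept any iterable of canvas-like objects, so they can be exercised without a real PDF.

```python
from pathlib import Path
from typing import Iterable, List


def canvas_plain_text(canvas) -> str:
    # Assumption: canvas.strings is the list of decoded text chunks of one
    # rendered page; joining them gives an approximate plain-text page.
    return "".join(canvas.strings)


def collect_pages_text(canvases: Iterable) -> List[str]:
    # One plain-text string per page, in document order.
    return [canvas_plain_text(canvas) for canvas in canvases]


def save_page_images(canvases: Iterable, out_dir: Path) -> List[Path]:
    # Save every image XObject on every page as a PNG via Pillow.
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for page_no, canvas in enumerate(canvases, start=1):
        # Assumption: canvas.images maps XObject names to image objects
        # that provide to_Pillow().
        for name, image in canvas.images.items():
            # Hypothetical naming scheme chosen for this sketch.
            target = out_dir / f"page{page_no}-{name}.png"
            image.to_Pillow().save(target)
            saved.append(target)
    return saved


def extract_text(pdf_path: str) -> List[str]:
    # Imported lazily so the helpers above work without pdfreader installed.
    from pdfreader import SimplePDFViewer

    with open(pdf_path, "rb") as fd:
        viewer = SimplePDFViewer(fd)
        # Assumption: iterating the viewer renders and yields each page's canvas.
        # The list is built before the file is closed.
        return collect_pages_text(viewer)
```

For example, `extract_text("report.pdf")` would return one string per page (the filename is illustrative). Pillow cannot write some image modes, such as CMYK, to PNG, so such images may need `.convert("RGB")` before saving.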

Repository

https://github.com/maxpmaxp/pdfreader.git

Project Slug

pdfreader

Last Built

2 weeks, 6 days ago (build passed)

Home Page

https://github.com/maxpmaxp/pdfreader.git

Tags

python, text, image, extract, scrape, pdf, parse

Short URLs

pdfreader.readthedocs.io
pdfreader.rtfd.io

Default Version

latest

'latest' Version

master