Description
Corpora is a lightweight, fast, and scalable corpus library that stores a collection of raw text documents with additional key-value headers. It uses Berkeley DB (the bsddb3 module) to manage the index, which provides speed and reliability. The text storage model is based on chunked, flat, human-readable text files. This architecture can easily scale to collections of millions of documents and hundreds of gigabytes.
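The architecture described above (a key-value index pointing into chunked flat text files) can be illustrated with a minimal sketch. This is not the Corpora API: the class name `TinyCorpus`, its methods, the `chunk_%d.txt` file naming, and the JSON header line are all hypothetical, and the standard-library `dbm.dumb` module stands in for Berkeley DB (bsddb3) so the example runs without extra dependencies.

```python
import dbm.dumb
import json
import os


class TinyCorpus:
    """Toy corpus: flat text chunks plus a key-value index (NOT the Corpora API)."""

    def __init__(self, path, chunk_size=1024):
        self.path = path
        self.chunk_size = chunk_size  # soft limit in bytes per chunk file
        os.makedirs(path, exist_ok=True)
        # Assumption: dbm.dumb replaces the Berkeley DB index of the real library.
        self.index = dbm.dumb.open(os.path.join(path, "index"), "c")
        self.chunk = 0

    def _chunk_file(self, number):
        # Hypothetical naming scheme for the human-readable chunk files.
        return os.path.join(self.path, "chunk_%d.txt" % number)

    def add(self, doc_id, text, **headers):
        # One record = a JSON header line followed by the raw document text.
        record = (json.dumps(headers, sort_keys=True) + "\n" + text + "\n").encode("utf-8")
        # Find the first chunk that is empty or still has room for this record.
        while True:
            name = self._chunk_file(self.chunk)
            size = os.path.getsize(name) if os.path.exists(name) else 0
            if size == 0 or size + len(record) <= self.chunk_size:
                break
            self.chunk += 1
        with open(name, "ab") as f:
            f.write(record)
        # The index maps the document id to (chunk number, byte offset, length).
        self.index[doc_id] = json.dumps([self.chunk, size, len(record)])

    def get(self, doc_id):
        chunk, offset, length = json.loads(self.index[doc_id])
        with open(self._chunk_file(chunk), "rb") as f:
            f.seek(offset)
            raw = f.read(length).decode("utf-8")
        header_line, text = raw.split("\n", 1)
        return json.loads(header_line), text[:-1]  # drop the trailing newline

    def close(self):
        self.index.close()
```

In this sketch a lookup costs one index read plus one seek into a single chunk file, which is the property that lets such a layout grow to many documents without scanning the text files.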
Repository
https://github.com/cypreess/corpora.git
Project Slug
corpora
Last Built
8 years ago (build passed)
Home Page
https://github.com/cypreess/corpora
Short URLs
corpora.readthedocs.io
corpora.rtfd.io
Default Version
latest
'latest' Version
master