Description
Baleen is a tool for ingesting formal natural language data from the discourse of professional and amateur writers, such as bloggers and news outlets. Rather than scraping web pages, Baleen ingests data through RSS feeds. It collects as much raw data as it can and saves it to a MongoDB document store.
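To illustrate the kind of ingestion described above, here is a minimal sketch that turns an RSS 2.0 feed into plain dictionaries suitable for a document store. It is not Baleen's actual code: the function name `rss_to_documents`, the document field names, and the sample feed are all assumptions made for this example, and it uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 GMT</pubDate>
      <description>Hello world</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
      <description>More text</description>
    </item>
  </channel>
</rss>"""


def rss_to_documents(xml_text):
    """Parse RSS 2.0 text into a list of dicts, one per <item>.

    The field names are illustrative, not Baleen's schema.
    """
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    feed_title = channel.findtext("title", default="")

    docs = []
    for item in channel.findall("item"):
        docs.append({
            "feed": feed_title,
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            # pubDate is optional in RSS 2.0, so this may be None.
            "published": item.findtext("pubDate"),
            "summary": item.findtext("description", default=""),
        })
    return docs


documents = rss_to_documents(SAMPLE_RSS)
for doc in documents:
    print(doc["title"], doc["url"])
```

Each dictionary maps directly onto a document in a store such as MongoDB. With the PyMongo driver, for example, a list like `documents` could be passed to a collection's `insert_many` method. A real ingestion tool would also need to fetch feeds over HTTP, handle Atom as well as RSS, and deduplicate posts, none of which is shown here.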
Repository
https://github.com/bbengfort/baleen.git
Project Slug
baleen-ingest
Last Built
7 years, 7 months ago (build passed)
Tags
baleen, blogs, ingestion, nlp, rss
Short URLs
baleen-ingest.readthedocs.io
baleen-ingest.rtfd.io
Default Version
latest
'latest' Version
master