Description

Baleen is a tool for ingesting formal natural language data from the discourse of professional and amateur writers, such as bloggers and news outlets. Rather than scraping web pages, Baleen ingests data through RSS feeds. It collects as much raw data as it can and saves it to a MongoDB document store.
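
The general idea of turning an RSS feed into records for a document store can be sketched as follows. This is a minimal illustration using only the Python standard library, not Baleen's actual code: the `rss_to_documents` function, the `SAMPLE_RSS` feed, and the field names in each record are all assumptions made for the example, and the step of writing to the database is left as a comment.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline feed so the sketch runs without network access.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <description>Hello, world.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
      <description>More text.</description>
    </item>
  </channel>
</rss>"""


def rss_to_documents(rss_text):
    """Parse RSS 2.0 text into a list of plain dicts, one per <item>."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    feed_title = channel.findtext("title", default="")

    docs = []
    for item in channel.findall("item"):
        # Field names here are illustrative, not Baleen's schema.
        docs.append({
            "feed": feed_title,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return docs


documents = rss_to_documents(SAMPLE_RSS)

# Each dict could then be written to a document store such as MongoDB;
# that step is omitted so the sketch has no external dependencies.
for doc in documents:
    print(doc["title"], doc["link"])
```

Plain dicts are used because they map directly onto the records a document store holds, so no intermediate schema is needed in a sketch like this.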

Repository

https://github.com/bbengfort/baleen.git

Project Slug

baleen-ingest

Last Built

3 years, 1 month ago (passed)

Tags

nlp, rss, blogs, baleen, ingestion

Project Privacy Level

Public

Short URLs

baleen-ingest.readthedocs.io
baleen-ingest.rtfd.io

Default Version

latest

'latest' Version

master