Versions

  • No active versions.

Description

Iris is the central controller for the entire OGL OCR pipeline. It oversees and automates the process of converting raw images into citable collections of digitized texts. Images can be uploaded directly via Iris' RESTful web portal or selected from preexisting images in Iris' image repository. It offers the following functionality:

* Grayscale conversion
* Binarization using [Sauvola](http://www.mediateam.oulu.fi/publications/pdf/24.p) adaptive thresholding or leptonica's [Otsu](http://www.leptonica.com/binarization.html) thresholding with background normalization
* Deskewing
* Dewarping
* Integration of the [tesseract](http://code.google.com/p/tesseract-ocr/) and ocropus OCR engines
* Merging multiple hOCR documents using scoring

Because it is designed around a common storage medium on network attached storage and the [celery](http://celeryproject.org) distributed task queue, it scales to multi-machine clusters.
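To make the grayscale conversion and binarization steps more concrete, here is a minimal, dependency-free sketch in Python. It is not Iris' code: Iris delegates to leptonica (Otsu with background normalization) or to Sauvola adaptive thresholding, whereas this sketch uses plain global Otsu thresholding with no background normalization. The function names `to_grayscale`, `otsu_threshold`, and `binarize` are hypothetical and chosen only for illustration.

```python
def to_grayscale(rgb_pixels):
    """Convert (r, g, b) tuples to 0-255 luminance (ITU-R BT.601 weights)."""
    return [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in rgb_pixels]


def otsu_threshold(gray_pixels):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form the dark class, pixels > t the bright class.
    This is plain global Otsu, an assumption made for illustration only.
    """
    hist = [0] * 256
    for v in gray_pixels:
        hist[v] += 1

    total = len(gray_pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class
    sum0 = 0  # summed intensity of the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break             # bright class empty, nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(gray_pixels, threshold):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [255 if v > threshold else 0 for v in gray_pixels]


if __name__ == "__main__":
    # Toy "image": three dark pixels and three bright ones.
    gray = [10, 12, 11, 200, 205, 198]
    t = otsu_threshold(gray)
    print(binarize(gray, t))
```

On the toy input the two intensity clusters are cleanly separated, so the dark pixels map to 0 and the bright ones to 255. A global threshold like this struggles with unevenly lit page scans, which is the usual motivation for adaptive methods such as Sauvola or for normalizing the background first.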

Repository

https://github.com/mittagessen/iris

Project Slug

ogl-iris

Last Built

9 years, 1 month ago (passed)

Home Page

https://github.com/OpenPhilology/iris

Tags

celery, ocr, ocropus, tesseract

Short URLs

ogl-iris.readthedocs.io
ogl-iris.rtfd.io

Default Version

latest

'latest' Version

master