Iris is the central controller for the entire OGL OCR pipeline. It oversees and automates the conversion of raw images into citable collections of digitized texts. Images can be uploaded directly through Iris' RESTful web portal or selected from images already stored in Iris' image repository.
It offers the following functionality:
- Grayscale Conversion
- Binarization using [Sauvola](http://www.mediateam.oulu.fi/publications/pdf/24.p) adaptive thresholding or Leptonica's [Otsu](http://www.leptonica.com/binarization.html) thresholding with background normalization
- Integration of the [Tesseract](http://code.google.com/p/tesseract-ocr/) and OCRopus OCR engines
- Merging multiple hOCR documents using scoring
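To make the first two items concrete, here is a minimal sketch of grayscale conversion followed by plain global Otsu thresholding, written in pure Python. It is not Iris' implementation: Iris delegates to Leptonica and adds background normalization, which this sketch omits. The function names (`to_gray`, `otsu_threshold`, `binarize`) and the BT.601 luma weights are assumptions chosen for illustration.

```python
def to_gray(rgb):
    """Convert one (r, g, b) pixel to an 8-bit gray level (BT.601 luma weights, an assumption)."""
    r, g, b = rgb
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def otsu_threshold(gray_pixels):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in gray_pixels:
        hist[p] += 1
    total = len(gray_pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # summed intensity of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break     # upper class is empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(rgb_pixels):
    """Map RGB pixels to 0 (dark, e.g. ink) or 1 (light, e.g. paper)."""
    gray = [to_gray(p) for p in rgb_pixels]
    t = otsu_threshold(gray)
    return [1 if g > t else 0 for g in gray]


# Hypothetical toy input: four dark pixels followed by four light ones.
pixels = [(20, 20, 20)] * 3 + [(30, 30, 30)] + [(220, 220, 220)] * 3 + [(230, 230, 230)]
bits = binarize(pixels)
```

A global threshold like this works on evenly lit pages. Adaptive methods such as Sauvola compute a threshold per neighborhood instead, which is why they cope better with stains and uneven illumination.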
Because it is designed around a common storage medium on network-attached storage and the [celery](http://celeryproject.org) distributed task queue, it scales well to multi-machine clusters.
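The reason this design scales can be sketched without a cluster: when every stage reads its input from shared storage and writes its output back, tasks only need to exchange file paths, so any worker can run any stage. The sketch below uses the standard library's `ThreadPoolExecutor` and a temporary directory as local stand-ins for Celery workers and network-attached storage. It does not use Celery's API, and the stage functions, file naming, and text "images" are all hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def binarize_stage(src: Path, dst: Path) -> Path:
    """Stand-in for an image-processing stage: read src, write the result to dst."""
    dst.write_text(src.read_text().upper())
    return dst


def ocr_stage(src: Path, dst: Path) -> Path:
    """Stand-in for an OCR stage operating on the previous stage's output."""
    dst.write_text(src.read_text() + " [ocr]")
    return dst


def run_pipeline(src: Path, store: Path) -> Path:
    """Chain the stages by passing paths on the shared store, never file contents."""
    binarized = binarize_stage(src, store / f"{src.stem}.bin.txt")
    return ocr_stage(binarized, store / f"{src.stem}.ocr.txt")


def process_batch(inputs, store: Path, workers: int = 4):
    """Run one pipeline per input concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: run_pipeline(p, store), inputs))


def demo():
    """Create three fake pages in a temporary store and push them through the pipeline."""
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        pages = []
        for i in range(3):
            page = store / f"page{i}.txt"
            page.write_text(f"page {i}")
            pages.append(page)
        outputs = process_batch(pages, store)
        return [(out.name, out.read_text()) for out in outputs]


results = demo()
```

In a real deployment the executor would be replaced by the task queue and the temporary directory by the shared mount; the point of the sketch is only that workers stay stateless because all intermediate results live on the common store.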