TF-CRNN: A TensorFlow implementation of a Convolutional Recurrent Neural Network

Quickstart

Installation

tf_crnn uses the tensorflow-gpu package, which needs the CUDA and cuDNN libraries for GPU support. The TensorFlow GPU support page lists the requirements.

Using Anaconda

When using Anaconda (or Miniconda), conda will automatically install compatible versions of CUDA and cuDNN:

conda env create -f environment.yml

You can find more information about installing CUDA and cuDNN with Anaconda here.

Using pip

Before using tf_crnn, we recommend creating a virtual environment (Python 3.5). Then install the dependencies using the GitHub repository's setup.py file:

pip install git+https://github.com/solivr/tf-crnn

You will then need to install the CUDA and cuDNN libraries manually.

Using Docker

(thanks to PonteIneptique)

The Dockerfile in the root directory allows you to run the whole program in an NVIDIA TensorFlow GPU Docker container. This can be helpful for dealing with external dependencies such as CUDA.

You can follow the nvidia-docker installation instructions here.

Once this is installed, build the container image with:

nvidia-docker build . --tag tf-crnn

The container image is now named tf-crnn. You can run it with nvidia-docker run -it tf-crnn:latest bash, which opens a bash shell inside the container. However, we recommend using

nvidia-docker run -it -p 8888:8888 -p 6006:6006 -v /absolute/path/to/here/config:/config -v $INPUT_DATA:/sources tf-crnn:latest bash

where $INPUT_DATA should be replaced by the directory containing your training and testing data; it will be mounted at /sources inside the container. We also propose mounting your local config directory at /config. Mount paths need to be absolute. We also recommend changing

//...
"output_model_dir" : "/.output/"

to

//...
"output_model_dir" : "/config/output"

Do not forget to update the paths in your training and testing csv files so that each image path starts with /sources, e.g. /sources/.../file.{png,jpg}.

Note

If you are uncomfortable with bash, you can always replace bash with ipython3 notebook --allow-root and point your browser to http://localhost:8888/. A token will be shown in the terminal.

How to train a model

The sacred package is used to manage experiments. If you are not yet familiar with it, have a quick look at its documentation.

Input data

In order to train a model, you should provide a csv file where each row contains the filename of the image (full path) and its label (plain text), separated by a delimiting character (let's say ;). In addition, each character of the label should be separated by a splitting character (let's say |), in order to deal with arbitrary alphabets (especially characters that cannot be encoded in utf-8 format).

An example of such a csv file looks like:

/full/path/to/image1.{jpg,png};|s|t|r|i|n|g|_|l|a|b|e|l|1|
/full/path/to/image2.{jpg,png};|s|t|r|i|n|g|_|l|a|b|e|l|2| |w|i|t|h| |special_char|
...
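
For reference, here is a minimal sketch of how such a csv file could be generated with Python's standard csv module (the sample paths and labels are hypothetical):

import csv

# Hypothetical (image path, plain-text label) pairs
samples = [('/full/path/to/image1.png', 'string_label1'),
           ('/full/path/to/image2.png', 'string_label2')]

with open('train_sample.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=';', quoting=csv.QUOTE_NONE)
    for path, label in samples:
        # Surround each alphabet unit with the splitting character '|'
        split_label = '|' + '|'.join(label) + '|'
        writer.writerow([path, split_label])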

Input lookup alphabet file

You also need to provide a lookup table for the alphabet that will be used. The term alphabet refers to all the symbols you want the network to learn, whether they are characters, digits, symbols, abbreviations, or any other graphical element.

The lookup table is a dictionary mapping alphabet units to integer codes (i.e. {'char': <int_code>}). Some lookup tables are already provided as examples in data/alphabet/.

For example, to transcribe words that contain only the characters ‘abcdefg’, one possible lookup table would be:

{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6}

The lookup table / dictionary needs to be saved in a json file.
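
For instance, the example table above can be saved with Python's standard json module:

import json

# Lookup table mapping each alphabet unit to an integer code
lookup = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6}

with open('lookup.json', 'w', encoding='utf8') as f:
    json.dump(lookup, f, ensure_ascii=False)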

Config file (with sacred)

Set the parameters of the experiment in config_template.json. The file looks like this:

{
  "training_params" : {
    "learning_rate" : 1e-3,
    "learning_decay_rate" : 0.95,
    "learning_decay_steps" : 5000,
    "save_interval" : 1e3,
    "n_epochs" : 50,
    "train_batch_size" : 128,
    "eval_batch_size" : 128
  },
  "input_shape" : [32, 304],
  "string_split_delimiter" : "|",
  "csv_delimiter" : ";",
  "data_augmentation_max_rotation" : 0.1,
  "input_data_n_parallel_calls" : 4,
  "lookup_alphabet_file" : "./data/alphabet/lookup_letters_digits_symbols.json",
  "csv_files_train" : ["./data/csv/train_sample.csv"],
  "csv_files_eval" : ["./data/csv/eval_sample.csv"],
  "output_model_dir" : "./output/"
}

In order to use your data, you should change the parameters csv_files_train, csv_files_eval and probably lookup_alphabet_file.

All the configurable parameters can be found in classes tf_crnn.config.Params and tf_crnn.config.TrainingParams, which can be added to the config file if needed.

Training

Once you have completed your input csv and alphabet files, and set the parameters in config_template.json, use sacred's syntax to launch the training:

python train.py with config_template.json

The saved model will then be exported to the folder specified in the config file (output_model_dir).
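
Note that sacred also allows overriding individual configuration values from the command line; assuming sacred's standard dotted-path config updates, something like the following should work:

python train.py with config_template.json training_params.n_epochs=100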

Using a saved model for prediction

During the training, the model is exported every n epochs (you can set n in the config file, by default n=5). The exported models are SavedModel TensorFlow objects, which need to be loaded in order to be used.

Assuming that the output folder is named output_dir, the exported models will be saved in output_dir/export/<timestamp> with different timestamps for each export. Each <timestamp> folder contains a saved_model.pb file and a variables folder.

The saved_model.pb contains the graph definition of your model and the variables folder contains the saved variables (where the weights are stored). You can find more information about SavedModel on the TensorFlow dedicated page.

In order to easily handle the loading of exported models, a PredictionModel class is provided. You can use the trained model to transcribe new image segments in the following way:

import tensorflow as tf
from tf_crnn.loader import PredictionModel

model_directory = 'output/export/<timestamp>/'
image_filename = 'data/images/b04-034-04-04.png'

# The 'filename' signature allows passing an image filename directly to predict()
with tf.Session() as session:
    model = PredictionModel(model_directory, signature='filename')
    prediction = model.predict(image_filename)
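
predict returns a dictionary containing the predicted transcriptions (see PredictionModel.predict in the reference guide below).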

Reference guide

The tf_crnn.data_handler module

Data handling for input function

data_loader(csv_filename, params[, labels, …]) Loads, preprocesses (data augmentation, padding) and feeds the data
padding_inputs_width(image, target_shape, …) Given an input image, will pad it to return a target_shape size padded image.
augment_data(image, max_rotation) Data augmentation on an image (padding, brightness, contrast, rotation)
random_rotation(img, max_rotation, crop) Rotates an image with a random angle.
random_padding(image, max_pad_w, max_pad_h) Given an image will pad its border adding a random number of rows and columns
serving_single_input(fixed_height, min_width) Serving input function needed for export (in TensorFlow).

Config for training

Alphabet(lookup_alphabet_file, blank_symbol) Object for alphabet / symbols units.
TrainingParams(**kwargs) Object for parameters related to the training.
Params(**kwargs) Object for general parameters
import_params_from_json(model_directory, …) Read the exported json file with parameters of the experiment.

Model

deep_cnn(input_imgs, input_channels, …) CNN part of the CRNN network.
deep_bidirectional_lstm(inputs, params, …) Recurrent part of the CRNN network.
crnn_fn(features, labels, mode, params) CRNN model definition for tf.Estimator.
get_words_from_chars(characters_list, …[, …]) Joins separated characters to form words.

Loading exported model

PredictionModel(model_dir, session, signature) Helper class to load an exported model and apply it to image segments for transcription.

tf_crnn.data_handler.augment_data(image: tf.Tensor, max_rotation: float = 0.1) → tf.Tensor

Data augmentation on an image (padding, brightness, contrast, rotation)

Parameters:
  • image – Tensor
  • max_rotation – float, maximum permitted rotation (in radians)
Returns:

Tensor

tf_crnn.data_handler.data_loader(csv_filename: Union[List[str], str], params: tf_crnn.config.Params, labels=True, batch_size: int = 64, data_augmentation: bool = False, num_epochs: int = None, image_summaries: bool = False)

Loads, preprocesses (data augmentation, padding) and feeds the data

Parameters:
  • csv_filename – filename or list of filenames
  • params – Params object containing all the parameters
  • labels – transcription labels
  • batch_size – batch_size
  • data_augmentation – flag to select or not data augmentation
  • num_epochs – feeds the data ‘num_epochs’ times
  • image_summaries – flag to show image summaries or not
Returns:

data_loader function

tf_crnn.data_handler.padding_inputs_width(image: tf.Tensor, target_shape: Tuple[int, int], increment: int) → Tuple[tf.Tensor, tf.Tensor]

Given an input image, will pad it to return a target_shape size padded image. There are 3 cases:

  • image width > target width : simple resizing to shrink the image
  • image width >= 0.5*target width : pad the image
  • image width < 0.5*target width : replicates the image segment and appends it
Parameters:
  • image – Tensor of shape [H,W,C]
  • target_shape – final shape after padding [H, W]
  • increment – reduction factor due to pooling between input width and output width, this makes sure that the final width will be a multiple of increment
Returns:

(image padded, output width)
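
For example, with target_shape = [32, 304], a 32x400 image is shrunk to 32x304, a 32x200 image is padded up to width 304, and a 32x100 image (narrower than half the target width) is replicated horizontally before padding.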

tf_crnn.data_handler.random_padding(image: tf.Tensor, max_pad_w: int = 5, max_pad_h: int = 10) → tf.Tensor

Given an image will pad its border adding a random number of rows and columns

Parameters:
  • image – image to pad
  • max_pad_w – maximum padding in width
  • max_pad_h – maximum padding in height
Returns:

a padded image

tf_crnn.data_handler.random_rotation(img: tf.Tensor, max_rotation: float = 0.1, crop: bool = True) → tf.Tensor

Rotates an image with a random angle. See https://stackoverflow.com/questions/16702966/rotate-image-and-crop-out-black-borders for formulae

Parameters:
  • img – Tensor
  • max_rotation – maximum angle to rotate (radians)
  • crop – boolean to crop or not the image after rotation
Returns:

tf_crnn.data_handler.serving_single_input(fixed_height: int = 32, min_width: int = 8)

Serving input function needed for export (in TensorFlow). Features to serve :

  • images : greyscale image
  • input_filename : filename of image segment
  • input_rgb: RGB image segment
Parameters:
  • fixed_height – height of the image to format the input data with
  • min_width – minimum width to resize the image
Returns:

serving_input_fn

class tf_crnn.config.Alphabet(lookup_alphabet_file: str = None, blank_symbol: str = '$')

Object for alphabet / symbols units.

Variables:
  • _blank_symbol (str) – Blank symbol used for CTC
  • _alphabet_units (List[str]) – list of elements composing the alphabet. The units may be a single character or multiple characters.
  • _codes (List[int]) – Each alphabet unit has a unique corresponding code.
  • _nclasses (int) – number of alphabet units.
alphabet_units
blank_symbol
check_input_file_alphabet(csv_filenames: List[str], discarded_chars: str = ';|\t\n\r\x0b\x0c', csv_delimiter: str = ';') → None

Checks if the labels of the input files contain only characters that are in the Alphabet.

Parameters:
  • csv_filenames – list of csv filenames
  • discarded_chars – discarded characters
  • csv_delimiter – character delimiting field in the csv file
Returns:

codes
classmethod create_lookup_from_labels(csv_files: List[str], export_lookup_filename: str, original_lookup_filename: str = None)

Create a lookup dictionary for csv files containing labels. Exports a json file with the Alphabet.

Parameters:
  • csv_files – list of files to get the labels from (should be of format path;label)
  • export_lookup_filename – filename to export alphabet lookup dictionary
  • original_lookup_filename – original lookup filename to update (optional)
Returns:

n_classes
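
A minimal usage sketch for building a lookup file from training labels (the file paths are hypothetical):

from tf_crnn.config import Alphabet

# Scan the labels of the training csv and export the resulting lookup table
Alphabet.create_lookup_from_labels(['./data/csv/train_sample.csv'],
                                   export_lookup_filename='./data/alphabet/my_lookup.json')
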
class tf_crnn.config.Params(**kwargs)

Object for general parameters

Variables:
  • input_shape (Tuple[int, int]) – input shape of the image to batch (this is the shape after data augmentation). The original image will either be resized or padded depending on its original size
  • input_channels (int) – number of color channels for input image
  • csv_delimiter (str) – character to delimit csv input files
  • string_split_delimiter (str) – character that delimits each alphabet unit in the labels
  • num_gpus (int) – number of gpus to use
  • lookup_alphabet_file (str) – json file that contains the mapping alphabet units <-> codes
  • csv_files_train (str) – csv filename which contains the (path;label) of each training sample
  • csv_files_eval (str) – csv filename which contains the (path;label) of each eval sample
  • output_model_dir (str) – output directory where the model will be saved and exported
  • keep_prob_dropout (float) – keep probability
  • num_beam_paths (int) – number of paths (transcriptions) to return for ctc beam search (only used when predicting)
  • data_augmentation (bool) – if True augments data on the fly
  • data_augmentation_max_rotation (float) – max permitted rotation to apply to the image during training (radians)
  • input_data_n_parallel_calls (int) – number of parallel calls to make when using Dataset.map()
keep_prob_dropout
show_experiment_params() → dict

Returns a dictionary with the variables of the class.

class tf_crnn.config.TrainingParams(**kwargs)

Object for parameters related to the training.

Variables:
  • n_epochs (int) – numbers of epochs to run the training (default: 50)
  • train_batch_size (int) – batch size during training (default: 64)
  • eval_batch_size (int) – batch size during evaluation (default: 128)
  • learning_rate (float) – initial learning rate (default: 1e-4)
  • learning_decay_rate (float) – decay rate for exponential learning rate (default: .96)
  • learning_decay_steps (int) – decay steps for exponential learning rate (default: 1000)
  • evaluate_every_epoch (int) – evaluate every ‘evaluate_every_epoch’ epoch (default: 5)
  • save_interval (int) – save the model every ‘save_interval’ step (default: 1e3)
  • optimizer (str) – which optimizer to use (‘adam’, ‘rms’, ‘ada’) (default: ‘adam’)
to_dict() → dict
tf_crnn.config.import_params_from_json(model_directory: str = None, json_filename: str = None) → dict

Read the exported json file with parameters of the experiment.

Parameters:
  • model_directory – directory where the model was exported
  • json_filename – filename of the json file
Returns:

a dictionary containing the parameters of the experiment
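
For example, a hedged one-liner (the directory is hypothetical):

from tf_crnn.config import import_params_from_json

# Reload the parameters of a previous experiment from its exported json file
params_dict = import_params_from_json(model_directory='./output/')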

tf_crnn.model.crnn_fn(features, labels, mode, params)

CRNN model definition for tf.Estimator. Combines deep_cnn and deep_bidirectional_lstm to define the model and adds loss computation and CTC decoder.

Parameters:
  • features – dictionary with keys: ‘images’, ‘images_widths’, ‘filenames’
  • labels – string containing the transcriptions. Flattened (1D) array with encoded labels (one code per character)
  • mode – TRAIN, EVAL, PREDICT
  • params – dictionary with keys: ‘Params’, ‘TrainingParams’
Returns:

tf_crnn.model.deep_bidirectional_lstm(inputs: tf.Tensor, params: tf_crnn.config.Params, summaries: bool = True) → tf.Tensor

Recurrent part of the CRNN network. Uses a bidirectional LSTM.

Parameters:
  • inputs – output of deep_cnn
  • params – parameters of the model
  • summaries – flag to enable bias and weight histograms to be visualized in Tensorboard
Returns:

Tuple : (tensor [width(time), batch, n_classes], raw transcription codes)

tf_crnn.model.deep_cnn(input_imgs: tf.Tensor, input_channels: int, is_training: bool, summaries: bool = True) → tf.Tensor

CNN part of the CRNN network.

Parameters:
  • input_imgs – input images [B, H, W, C]
  • input_channels – input channels, 1 for greyscale images, 3 for RGB color images
  • is_training – flag to indicate training or not
  • summaries – flag to enable bias and weight histograms to be visualized in Tensorboard
Returns:

tensor of shape [batch, final_width, final_height x final_features]

class tf_crnn.loader.PredictionModel(model_dir: str, session: tf.Session = None, signature: str = 'predictions')

Helper class to load an exported model and apply it to image segments for transcription.

Variables:
  • session (tf.Session) – tf.Session within which to run the loading process
  • model – loaded exported model
Parameters:
  • model_dir – directory containing the saved model files.
  • session – tf.Session to load the model
  • signature

    which signature to use to select the type of input :

    • predictions (default) : input a grayscale image
    • rgb_images : input a RGB image
    • filename : input the filename of the image segment
predict(input_to_predict: Union[numpy.ndarray, str]) → dict

Get transcription for input data.

Parameters:
  • input_to_predict – input data of the format specified in signature when instantiating the object
Returns:

a dictionary with the predictions

A TensorFlow implementation of the Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition tasks, such as scene text recognition and OCR. Original paper and code.

This implementation uses tf.estimator.Estimator to build the model and tf.data modules to handle input data.
