Welcome to TensorFlow.NET’s documentation!¶
The Definitive Guide to TensorFlow.NET¶
Front Cover
Foreword¶
One of the most nerve-wracking periods after releasing the first version of an open source project is when the Gitter community is created: you are all alone, eagerly hoping and wishing for the first user to come along. I still vividly remember those days.
TensorFlow.NET is my third open source project; BotSharp and NumSharp were the first two. Both were well received and earned plenty of stars on GitHub. Although the first two projects were difficult, I must admit that TensorFlow.NET is much more difficult than either of them, since it touches areas I had never been involved with before: GPU parallel computing, distributed computing and neural network models. When I started writing this project, I was also sorting out my ideas about the coding process. TensorFlow is a huge and complicated project, and it is easy to go beyond the scope of one person's ability. Therefore, I wanted to record my thoughts at the time as much as possible; the process of recording and sorting clears the way of thinking.
All the examples in this book can be found in the GitHub repository of TensorFlow.NET. When the source code and the code in the book are inconsistent, please refer to the source code. The sample code is typically located in the Example or UnitTest project.
Preface¶
Why did I start the TensorFlow.NET project?
It was a few days before Christmas 2018. Watching my children grow and become more sensible every day, I felt that time was passing too quickly. IT technology is being updated faster than ever, and all kinds of new technologies keep emerging: Big Data, Artificial Intelligence and Blockchain, Container technology and Microservices, Distributed Computing and Serverless technology are dazzling. Amazon claims that engineers without any machine learning experience can use its AI service interfaces, which threw cold water on my plan to simply settle down for two years and then switch to an AI architect role.
TensorFlow is an open source project for machine learning, especially for deep learning. It's used for both research and production at Google. It's designed according to the dataflow programming pattern and works across a range of tasks. TensorFlow is not just a deep learning library: as long as you can represent your calculation process as a data flow graph, you can use TensorFlow for distributed computing. TensorFlow uses a computational graph to build a computing network and operates on the graph. Users can write their own high-level models in Python based on TensorFlow, or extend the underlying layer with custom C++ operation code.
In order to avoid confusion, the classes unique to TensorFlow are not translated in this book. For example, Tensor, Graph and Shape retain their English names.
Get started with TensorFlow.NET¶
I would describe TensorFlow as an open source machine learning framework developed by Google which can be used to build neural networks and perform a variety of machine learning tasks. It works on a data flow graph where the nodes are mathematical operations and the edges are data in the form of tensors, hence the name Tensor-Flow.
Let's run a classic HelloWorld program first and see if TensorFlow is running on .NET. I can't think of a simpler HelloWorld than this one.
Install the TensorFlow.NET SDK¶
TensorFlow.NET targets .NET Standard 2.0, so your project's Target Framework can be .NET Framework or .NET Core. All the examples in this book use .NET Core 2.2 and Microsoft Visual Studio Community 2017. To start building TensorFlow programs you just need to download and install the .NET SDK (Software Development Kit). Download the latest .NET Core SDK from the official website: https://dotnet.microsoft.com/download.
Create a new project
New Project
Choose Console App (.NET Core)
Console App
### Install the TensorFlow.NET C# binding
PM> Install-Package TensorFlow.NET
### Install the TensorFlow binary
### For the CPU version
PM> Install-Package SciSharp.TensorFlow.Redist
### For the GPU version (CUDA and cuDNN are required)
PM> Install-Package SciSharp.TensorFlow.Redist-Windows-GPU
Start coding Hello World¶
After installing the TensorFlow.NET package, you can add using static Tensorflow.Binding; to import the TensorFlow library.
using System;
using static Tensorflow.Binding;

namespace TensorFlowNET.Examples
{
    /// <summary>
    /// Simple hello world using TensorFlow
    /// </summary>
    public class HelloWorld : IExample
    {
        public void Run()
        {
            /* Create a Constant op
               The op is added as a node to the default graph.
               The value returned by the constructor represents the output
               of the Constant op. */
            var hello = tf.constant("Hello, TensorFlow!");

            // Start a tf session
            using (var sess = tf.Session())
            {
                // Run the op
                var result = sess.run(hello);
                Console.WriteLine(result);
            }
        }
    }
}
Press CTRL + F5 to run it, and you will get the following output:
2019-01-05 10:53:42.145931: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Hello, TensorFlow!
Press any key to continue . . .
This sample code can be found here.
Chapter. Tensor¶
Represents one of the outputs of an Operation¶
What is Tensor?¶
A Tensor holds a multi-dimensional array of elements of a single data type, which is very similar to NumPy's ndarray. When the dimension is zero, it is called a scalar; when the dimension is 1, a vector; when the dimension is 2, a matrix; and when the dimension is greater than 2, it is usually called a tensor. If you are already familiar with NumPy, then understanding Tensor will be quite easy.
How to create a Tensor?¶
There are many ways to initialize a Tensor object in TF.NET. It can be initialized from a scalar, string, matrix or tensor.
// Create a tensor holds a scalar value
var t1 = new Tensor(3);
// Init from a string
var t2 = new Tensor("Hello! TensorFlow.NET");
// Tensor holds a ndarray
var nd = new NDArray(new int[]{3, 1, 1, 2});
var t3 = new Tensor(nd);
Console.WriteLine($"t1: {t1}, t2: {t2}, t3: {t3}");
Data Structure of Tensor¶
TF uses column-major order. If we use NumSharp to generate a 2 x 3 matrix and access the underlying data in order from index 0 to 5, we won't get the numbers 1 to 6; instead we get the sequence 1, 4, 2, 5, 3, 6.
// Generate a matrix:[[1, 2, 3], [4, 5, 6]]
var nd = np.array(1f, 2f, 3f, 4f, 5f, 6f).reshape(2, 3);
// The index will be 0 2 4 1 3 5, it's column-major order.
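To see this, we can walk the raw buffer in memory order. This is a minimal sketch; it assumes the NumSharp build in use exposes the underlying storage through Data<T>(), as the image-recognition example in a later chapter does.
// iterate the underlying storage in memory order
foreach (var v in nd.Data<float>())
    Console.Write($"{v} "); // prints: 1 4 2 5 3 6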
column-major order
row-major order
Chapter. Constant¶
In TensorFlow, a constant is a special Tensor that cannot be modified while the graph is running. Like in a linear model $\tilde{y}_i=\boldsymbol{w}x_i+b$, the constant $b$ can be represented as a Constant Tensor. Since a constant is a Tensor, it also has all the data characteristics of a Tensor, including:
- value: scalar value or constant list matching the data type defined in TensorFlow;
- dtype: data type;
- shape: dimensions;
- name: constant’s name;
How to create a Constant¶
TensorFlow provides a handy function to create a Constant. In TF.NET, you can use the same function name tf.constant to create it. TF.NET uses the same API names as the Python binding. Although this naming will feel uncomfortable to developers who are used to C# naming conventions, after careful consideration I decided to give up the C# naming convention.
Initialize a scalar constant:
var c1 = tf.constant(3); // int
var c2 = tf.constant(1.0f); // float
var c3 = tf.constant(2.0); // double
var c4 = tf.constant("Big Tree"); // string
Initialize a constant through ndarray:
// dtype=int, shape=(2, 3)
var nd = np.array(new int[][]
{
    new int[] { 3, 1, 1 },
    new int[] { 2, 3, 1 }
});
var tensor = tf.constant(nd);
Dive in Constant¶
Now let’s explore how constant
works.
Other functions to create a Constant¶
- tf.zeros
- tf.zeros_like
- tf.ones
- tf.ones_like
- tf.fill
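As a quick illustration, here is a minimal sketch of those helpers. The exact overloads vary between TF.NET versions, so the int[]-shape form used here is an assumption:
// create tensors pre-filled with fixed values
var zeros = tf.zeros(new int[] { 2, 3 });   // 2 x 3 tensor of 0s
var ones = tf.ones(new int[] { 2, 3 });     // 2 x 3 tensor of 1s
var nines = tf.fill(new int[] { 2, 3 }, 9); // 2 x 3 tensor of 9s
var zs = tf.zeros_like(nines);              // 0s with the same shape as nines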
Chapter. Variable¶
The variables in TensorFlow are mainly used to represent variable parameter values in a machine learning model. Variables can be created by the tf.Variable function. During graph computation the variables are modified by other operations. Variables exist in the session: as long as other computing nodes are in the same session, they can access the same variable value. Variables are lazily loaded and only request memory space when they are used.
var x = tf.Variable(10, name: "x");
using (var session = tf.Session())
{
    session.run(x.initializer);
    var result = session.run(x);
    Console.Write(result); // should be 10
}
The above code first creates a variable operation, initializes the variable, then runs the session, and finally gets the result. The code is very simple, but it shows the complete process of how TensorFlow operates on variables. When creating a variable, you pass a tensor as the initial value to the function Variable(). TensorFlow provides a series of operators to initialize the tensor; the initial value can be a constant or a random value.
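For example, a minimal sketch of both styles; tf.random_normal and its int[]-shape overload are assumptions that may differ by TF.NET version:
// variable initialized from a constant value
var c = tf.Variable(10, name: "c");
// variable initialized from random values
var r = tf.Variable(tf.random_normal(new int[] { 2, 2 }), name: "r");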
Chapter. Placeholder¶
In this chapter we will talk about another common data type in TensorFlow: Placeholder. It is a simplified variable whose value is supplied by the session when the graph is run; that is, when you build the graph you don't need to specify the value, but defer it until the session starts. In TensorFlow terminology, we then feed data into the graph through these placeholders. The difference between placeholders and constants is that placeholders let you specify values more flexibly without modifying the code that builds the graph. For example, mathematical constants are suitable for Constant, while some model smoothing values can be specified with Placeholder.
var x = tf.placeholder(tf.int32);
var y = x * 3;
using (var sess = tf.Session())
{
    var result = sess.run(y, feed_dict: new FeedItem[]
    {
        new FeedItem(x, 2)
    });
    // (int)result should be 6;
}
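Placeholders can also carry whole arrays. The following sketch feeds an NDArray through the params FeedItem overload of sess.run used in later chapters (treat that overload as an assumption for your TF.NET version):
var v = tf.placeholder(tf.float32);
var total = tf.reduce_sum(v);
using (var sess = tf.Session())
{
    // feed three values in one run
    var result = sess.run(total, new FeedItem(v, np.array(1f, 2f, 3f)));
    // (float)result should be 6
}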
Chapter. Graph¶
TensorFlow uses a dataflow graph to represent your computation in terms of the dependencies between individual operations. A graph defines the computation. It doesn’t compute anything, it doesn’t hold any values, it just defines the operations that you specified in your code.
Defining the Graph¶
We define a graph with a variable and three operations: variable returns the current value of our variable, initialize assigns the initial value of 31 to that variable, and assign assigns the new value of 12 to that variable.
with<Graph>(tf.Graph().as_default(), graph =>
{
    var variable = tf.Variable(31, name: "tree");
    tf.global_variables_initializer();
    variable.assign(12);
});
TF.NET simulates Python's with syntax to manage the Graph lifecycle: the graph is disposed when the instance is no longer needed. This graph is also what the sessions in the next chapter use when no graph is specified manually, because we invoked as_default().
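A small sketch of this lifecycle; the graph property on Tensor is an assumption carried over from the Python API:
with<Graph>(tf.Graph().as_default(), graph =>
{
    var c = tf.constant(3);
    // c was created inside the graph made default above
    Console.WriteLine(c.graph == graph); // True
});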
A typical graph looks like below:
image
Save Model¶
Saving the model means saving all the values of the parameters and the graph.
saver = tf.train.Saver()
saver.save(sess, './tensorflowModel.ckpt')
After saving the model there will be four files:
- tensorflowModel.ckpt.meta:
- tensorflowModel.ckpt.data-00000-of-00001:
- tensorflowModel.ckpt.index
- checkpoint
We also created a protocol buffer file .pbtxt. It is human readable; if you want to save it in binary format instead, set as_text: false.
- tensorflowModel.pbtxt:
This holds a network of nodes, each representing one operation, connected to each other as inputs and outputs.
Freezing the Graph¶
Why do we need it?¶
When we need to keep all the values of the variables and the Graph structure in a single file we have to freeze the graph.
from tensorflow.python.tools import freeze_graph

freeze_graph.freeze_graph(input_graph='logistic_regression/tensorflowModel.pbtxt',
                          input_saver="",
                          input_binary=False,
                          input_checkpoint='logistic_regression/tensorflowModel.ckpt',
                          output_node_names="Softmax",
                          restore_op_name="save/restore_all",
                          filename_tensor_name="save/Const:0",
                          output_graph='frozentensorflowModel.pb',
                          clear_devices=True,
                          initializer_nodes="")
Optimizing for Inference¶
To reduce the amount of computation needed when the network is used only for inference, we can remove the parts of the graph that are only needed for training.
Restoring the Model¶
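To restore the saved parameters into a new session, a Saver is used again. A minimal sketch, assuming TF.NET's Saver.restore mirrors the Python API:
var saver = tf.train.Saver();
using (var sess = tf.Session())
{
    // load the variable values saved in the checkpoint
    saver.restore(sess, "./tensorflowModel.ckpt");
}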
Chapter. Session¶
A TensorFlow session runs parts of the graph across a set of local and remote devices. A session allows executing graphs or parts of graphs. It allocates resources (on one or more machines) for that and holds the actual values of intermediate results and variables.
Running Computations in a Session¶
Let's complete the example from the last chapter. To run any of the operations, we need to create a session for that graph. The session will also allocate memory to store the current value of the variable.
with<Graph>(tf.Graph(), graph =>
{
    var variable = tf.Variable(31, name: "tree");
    var init = tf.global_variables_initializer();
    var sess = tf.Session(graph);
    sess.run(init);
    var result = sess.run(variable); // 31
    var assign = variable.assign(12);
    result = sess.run(assign); // 12
});
The value of our variables is only valid within one session. If we try to get the value in another session, TensorFlow will raise an error of Attempting to use uninitialized value foo. Of course, we can use the same graph in more than one session, because the session copies the graph definition to a new memory area; we just have to initialize the variables again. The values in the new session will be completely independent from the previous one.
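A minimal sketch of this behaviour (the variable names are illustrative):
var v = tf.Variable(31, name: "foo");
var init = tf.global_variables_initializer();
using (var sess1 = tf.Session())
{
    sess1.run(init);
    sess1.run(v.assign(12)); // v is 12 inside sess1
}
using (var sess2 = tf.Session())
{
    // reading v here without running init again would raise
    // "Attempting to use uninitialized value foo"
    sess2.run(init);
    var result = sess2.run(v); // 31 again, independent of sess1
}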
Chapter. Operation¶
Operation represents a Graph node that performs computation on tensors. An operation is a node in a Graph that takes zero or more Tensors (produced by other operations in the graph) as input, and produces zero or more Tensors as output.
Chapter. Queue¶
TensorFlow is capable of handling multiple threads, and queues are a powerful mechanism for asynchronous computation. If we have large datasets, this can significantly speed up the training process of our models. This functionality is especially handy when reading, pre-processing and extracting our training data in mini-batches. The secret to professional, high-performance training of a model is understanding TensorFlow's queuing operations. TensorFlow implements 4 types of Queue: FIFOQueue, PaddingFIFOQueue, PriorityQueue and RandomShuffleQueue.
Like everything in TensorFlow, a queue is a node in a computation graph. It's a stateful node, like a variable: other nodes can modify its content. In particular, nodes can enqueue new items into the queue, or dequeue existing items from the queue.
To get started with queue, let’s consider a simple example. We will create a “first in, first out” queue (FIFOQueue) and fill it with numbers. Then we’ll construct a graph that takes an item off the queue, adds one to that item, and puts it back on the end of the queue.
[TestMethod]
public void FIFOQueue()
{
    // create a first in first out queue with capacity up to 2
    // and data type set as int32
    var queue = tf.FIFOQueue(2, tf.int32);
    // init queue, push 2 elements into queue.
    var init = queue.enqueue_many(new[] { 10, 20 });
    // pop out the first element
    var x = queue.dequeue();
    // add 1
    var y = x + 1;
    // push back into queue
    var inc = queue.enqueue(y);

    using (var sess = tf.Session())
    {
        // init queue
        init.run();
        // pop out first element and push back calculated y
        (int dequeued, _) = sess.run((x, inc));
        Assert.AreEqual(10, dequeued);
        (dequeued, _) = sess.run((x, inc));
        Assert.AreEqual(20, dequeued);
        (dequeued, _) = sess.run((x, inc));
        Assert.AreEqual(11, dequeued);
        (dequeued, _) = sess.run((x, inc));
        Assert.AreEqual(21, dequeued);
        // the thread will hang or block if you run sess.run(x) again,
        // until the queue has more elements.
    }
}
Enqueue, EnqueueMany and Dequeue are special nodes. They take a pointer to the queue instead of a normal value, allowing them to change it. We first create a FIFOQueue with capacity up to 2 and enqueue two values into it. Then we immediately dequeue a value, assign it to x, and compute y by simply adding 1 to the dequeued value. Next, we start up a session and run. After we've run this operation a few times the queue will be empty; if we try to run the operation again, the main thread of the program will hang or block, because it will be waiting for another operation to put more values into the queue.
FIFOQueue¶
Creates a queue that dequeues elements in a first-in first-out order. A FIFOQueue has bounded capacity; supports multiple concurrent producers and consumers; and provides exactly-once delivery. A FIFOQueue holds a list of up to capacity elements. Each element is a fixed-length tuple of tensors whose dtypes are described by dtypes, and whose shapes are optionally described by the shapes argument.
PaddingFIFOQueue¶
A FIFOQueue that supports batching variable-sized tensors by padding. A PaddingFIFOQueue may contain components with dynamic shape, while also supporting dequeue_many. A PaddingFIFOQueue holds a list of up to capacity elements. Each element is a fixed-length tuple of tensors whose dtypes are described by dtypes, and whose shapes are described by the shapes argument.
[TestMethod]
public void PaddingFIFOQueue()
{
    var numbers = tf.placeholder(tf.int32);
    var queue = tf.PaddingFIFOQueue(10, tf.int32, new TensorShape(-1));
    var enqueue = queue.enqueue(numbers);
    var dequeue_many = queue.dequeue_many(n: 3);

    using (var sess = tf.Session())
    {
        sess.run(enqueue, (numbers, new[] { 1 }));
        sess.run(enqueue, (numbers, new[] { 2, 3 }));
        sess.run(enqueue, (numbers, new[] { 3, 4, 5 }));

        var result = sess.run(dequeue_many[0]);

        Assert.IsTrue(Enumerable.SequenceEqual(new int[] { 1, 0, 0 }, result[0].ToArray<int>()));
        Assert.IsTrue(Enumerable.SequenceEqual(new int[] { 2, 3, 0 }, result[1].ToArray<int>()));
        Assert.IsTrue(Enumerable.SequenceEqual(new int[] { 3, 4, 5 }, result[2].ToArray<int>()));
    }
}
PriorityQueue¶
A queue implementation that dequeues elements in prioritized order. A PriorityQueue has bounded capacity; supports multiple concurrent producers and consumers; and provides exactly-once delivery. A PriorityQueue holds a list of up to capacity elements. Each element is a fixed-length tuple of tensors whose dtypes are described by types, and whose shapes are optionally described by the shapes argument.
[TestMethod]
public void PriorityQueue()
{
    var queue = tf.PriorityQueue(3, tf.@string);
    var init = queue.enqueue_many(new[] { 2L, 4L, 3L }, new[] { "p1", "p2", "p3" });
    var x = queue.dequeue();

    using (var sess = tf.Session())
    {
        init.run();

        // the elements come out in priority order: 2, 3, 4
        var result = sess.run(x);
        Assert.AreEqual(result[0].GetInt64(), 2L);
        result = sess.run(x);
        Assert.AreEqual(result[0].GetInt64(), 3L);
        result = sess.run(x);
        Assert.AreEqual(result[0].GetInt64(), 4L);
    }
}
RandomShuffleQueue¶
A queue implementation that dequeues elements in a random order. A RandomShuffleQueue has bounded capacity; supports multiple concurrent producers and consumers; and provides exactly-once delivery. A RandomShuffleQueue holds a list of up to capacity elements. Each element is a fixed-length tuple of tensors whose dtypes are described by dtypes, and whose shapes are optionally described by the shapes argument.
[TestMethod]
public void RandomShuffleQueue()
{
    var queue = tf.RandomShuffleQueue(10, min_after_dequeue: 1, dtype: tf.int32);
    var init = queue.enqueue_many(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 });
    var x = queue.dequeue();
    string results = "";

    using (var sess = tf.Session())
    {
        init.run();

        foreach (var i in range(9))
            results += (int)sess.run(x) + ".";

        // the dequeued values appear in random order
    }
}
Queue methods must run on the same device as the queue. FIFOQueue and RandomShuffleQueue are important TensorFlow objects for computing tensors asynchronously in a graph. For example, a typical input architecture is to use a RandomShuffleQueue to prepare inputs for training a model:
- Multiple threads prepare training examples and push them in the queue.
- A training thread executes a training op that dequeues mini-batches from the queue.
This architecture simplifies the construction of input pipelines.
In the example above, once the queue is empty, the program blocks and you'll actually have to terminate it. This isn't very useful. What we really want is for our little program to enqueue more values whenever the queue is empty or about to become empty. We could fix this by explicitly running our enqueue op again to reload the queue with values, but for larger, more realistic programs this becomes unwieldy. Thankfully, TensorFlow has a solution.
TensorFlow provides two classes to help with multi-threaded tasks: tf.Coordinator and tf.QueueRunner. These two classes are designed to be used together. The Coordinator class helps multiple threads stop together and report exceptions to a main thread. The QueueRunner class is used to create a number of threads that cooperate to enqueue tensors into the same queue.
Chapter. Gradient¶
Register custom gradient function¶
TF.NET is extensible: custom gradient functions can be registered.
// define the gradient function
ops.RegisterGradientFunction("ConcatV2", (oper, out_grads) =>
{
    var grad = out_grads[0]; // gradient flowing in from downstream
    // compute and return the gradients with respect to each input here
    return new Tensor[] { };
});
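As a concrete illustration, a gradient for a hypothetical Square op could be registered like this. The oper.inputs accessor and the Tensor operator overloads are assumptions based on how the built-in gradients are written:
ops.RegisterGradientFunction("Square", (oper, out_grads) =>
{
    var x = oper.inputs[0]; // input of the forward op
    var dy = out_grads[0];  // gradient flowing in from above
    // d/dx (x^2) = 2x, chained with the incoming gradient
    return new Tensor[] { 2.0f * x * dy };
});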
Chapter. Eager Mode¶
Chapter. Linear Regression¶
What is linear regression?¶
Linear regression is a linear approach to modelling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables).
Consider the case of a single variable of interest y and a single predictor variable x. The predictor variables are called by many names: covariates, inputs, features; the predicted variable is often called response, output, outcome.
We have some data $D=\{x_i, y_i\}$ and we assume a simple linear model of this dataset with Gaussian noise: $y_i = w x_i + b + \epsilon_i$, where $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$.
// Prepare training Data
var train_X = np.array(3.3f, 4.4f, 5.5f, 6.71f, 6.93f, 4.168f, 9.779f, 6.182f, 7.59f, 2.167f, 7.042f, 10.791f, 5.313f, 7.997f, 5.654f, 9.27f, 3.1f);
var train_Y = np.array(1.7f, 2.76f, 2.09f, 3.19f, 1.694f, 1.573f, 3.366f, 2.596f, 2.53f, 1.221f, 2.827f, 3.465f, 1.65f, 2.904f, 2.42f, 2.94f, 1.3f);
var n_samples = train_X.shape[0];
regression dataset
Based on the given data points, we try to plot a line that models the points best. The red line can be modelled by the linear equation $y = wx + b$. The goal of the linear regression algorithm is to find the best values for $w$ and $b$. Before moving on to the algorithm, let's have a look at two important concepts you must know to better understand linear regression.
Cost Function¶
The cost function helps us to figure out the best possible values for $w$ and $b$ which would provide the best fit line for the data points. Since we want the best values for $w$ and $b$, we convert this search problem into a minimization problem where we would like to minimize the error between the predicted value and the actual value.
$J(w, b) = \frac{1}{2n}\sum_{i=1}^{n}\left(\tilde{y}_i - y_i\right)^2$
We choose the above function to minimize. The difference between the predicted values and the ground truth measures the error. We square this error difference, sum over all data points, and divide by the total number of data points (with a conventional factor of 2 that simplifies the gradient). This provides the average squared error over all the data points; therefore, this cost function is also known as the Mean Squared Error (MSE) function. Now, using this MSE function, we are going to change the values of $w$ and $b$ such that the MSE value settles at the minimum.
// tf Graph Input
var X = tf.placeholder(tf.float32);
var Y = tf.placeholder(tf.float32);
// Set model weights
var W = tf.Variable(rng.randn<float>(), name: "weight");
var b = tf.Variable(rng.randn<float>(), name: "bias");
// Construct a linear model
var pred = tf.add(tf.multiply(X, W), b);
// Mean squared error
var cost = tf.reduce_sum(tf.pow(pred - Y, 2.0f)) / (2.0f * n_samples);
Gradient Descent¶
Another important concept to understand is gradient descent. Gradient descent is a method of updating $w$ and $b$ to minimize the cost function. The idea is that we start with some random values for $w$ and $b$ and then change these values iteratively to reduce the cost. Gradient descent tells us how to update the values, i.e. in which direction to go next. Gradient descent is also known as steepest descent.
gradient-descent
To draw an analogy, imagine a U-shaped pit: you are standing at the topmost point and your objective is to reach the bottom. There is a catch: you can only take a discrete number of steps. If you take one small step at a time, you will eventually reach the bottom, but it will take longer. If you take longer steps, you will get there sooner, but there is a chance you overshoot and don't land exactly at the bottom. In the gradient descent algorithm, the size of the steps you take is the learning rate, which decides how fast the algorithm converges to the minimum.
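Written out (this is the standard formulation, not specific to TF.NET), each iteration moves the parameters against the gradient of the cost $J(w, b)$, scaled by the learning rate $\alpha$:

$w \leftarrow w - \alpha \frac{\partial J}{\partial w}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}$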
// Gradient descent
// Note, minimize() knows to modify W and b because Variable objects are trainable=True by default
var optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost);
When we visualize the graph in TensorBoard:
linear-regression
The full example is here.
Chapter. Logistic Regression¶
What is logistic regression?¶
Logistic regression is a statistical analysis method used to predict a data value based on prior observations of a data set. A logistic regression model predicts a dependent data variable by analyzing the relationship between one or more existing independent variables.
The dependent variable of logistic regression can be binary or multi-class, but the binary case is more common and easier to explain, so the most common use in practice is binary logistic regression. The example used by TensorFlow.NET is hand-written digit recognition, which is multi-class.
Softmax regression allows us to handle $y^{(i)} \in \{1, \dots, K\}$ where $K$ is the number of classes.
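In the standard formulation (not specific to TF.NET), softmax models the probability of each class $k$ as:

$P(y = k \mid x) = \frac{e^{\boldsymbol{w}_k^\top x + b_k}}{\sum_{j=1}^{K} e^{\boldsymbol{w}_j^\top x + b_j}}$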
The full example is here.
Chapter. Nearest Neighbor¶
The nearest neighbour algorithm was one of the first algorithms used to solve the travelling salesman problem. In it, the salesman starts at a random city and repeatedly visits the nearest city until all have been visited. It quickly yields a short tour, but usually not the optimal one.
The full example is here.
Chapter. Image Recognition¶
This is an example of using TensorFlow.NET and NumSharp for image recognition. It uses a pre-trained Inception model to classify an image, outputting the categories sorted by probability. The original paper is here. The Inception architecture of GoogLeNet was designed to perform well even under strict constraints on memory and computational budget. The computational cost of Inception is also much lower than that of other well-performing successors. This has made it feasible to utilize Inception networks in big-data scenarios, where huge amounts of data need to be processed at reasonable cost, or in scenarios where memory or computational capacity is inherently limited, for example mobile vision settings.
The GoogLeNet architecture conforms to below design principles:
- Avoid representational bottlenecks, especially early in the network.
- Higher dimensional representations are easier to process locally within a network.
- Spatial aggregation can be done over lower dimensional embeddings without much or any loss in representational power.
- Balance the width and depth of the network.
Let’s get started with real code.¶
1. Prepare data¶
This example downloads the dataset and uncompresses it automatically. Some external paths are omitted; please refer to the source code for the real paths.
private void PrepareData()
{
    Directory.CreateDirectory(dir);

    // get model file
    string url = "models/inception_v3_2016_08_28_frozen.pb.tar.gz";
    string zipFile = Path.Join(dir, $"{pbFile}.tar.gz");
    Utility.Web.Download(url, zipFile);
    Utility.Compress.ExtractTGZ(zipFile, dir);

    // download sample picture
    string pic = "grace_hopper.jpg";
    Utility.Web.Download($"data/{pic}", Path.Join(dir, pic));
}
2. Load image file and normalize¶
We need to load a sample image to test our pre-trained Inception model, convert it into a tensor, and normalize it. The pre-trained model takes input in the form of a 4-dimensional tensor with shape [BATCH_SIZE, INPUT_HEIGHT, INPUT_WIDTH, 3], where:
- BATCH_SIZE allows for inference of multiple images in one pass through the graph
- INPUT_HEIGHT is the height of the images on which the model was trained
- INPUT_WIDTH is the width of the images on which the model was trained
- 3 is the number of (R, G, B) channels, with pixel color values represented as floats.
private NDArray ReadTensorFromImageFile(string file_name,
    int input_height = 299,
    int input_width = 299,
    int input_mean = 0,
    int input_std = 255)
{
    return with<Graph, NDArray>(tf.Graph().as_default(), graph =>
    {
        var file_reader = tf.read_file(file_name, "file_reader");
        var image_reader = tf.image.decode_jpeg(file_reader, channels: 3, name: "jpeg_reader");
        var caster = tf.cast(image_reader, tf.float32);
        var dims_expander = tf.expand_dims(caster, 0);
        var resize = tf.constant(new int[] { input_height, input_width });
        var bilinear = tf.image.resize_bilinear(dims_expander, resize);
        var sub = tf.subtract(bilinear, new float[] { input_mean });
        var normalized = tf.divide(sub, new float[] { input_std });

        return with<Session, NDArray>(tf.Session(graph), sess => sess.run(normalized));
    });
}
3. Load pre-trained model and predict¶
Load the pre-trained Inception model, which is saved in Google's protobuf file format. Construct a new graph, then set the input and output operations in a new session. After running the session, you get a NumPy-like ndarray provided by NumSharp. With NumSharp, you can easily perform various operations on multi-dimensional arrays in the .NET environment.
public void Run()
{
    PrepareData();

    var labels = File.ReadAllLines(Path.Join(dir, labelFile));

    var nd = ReadTensorFromImageFile(Path.Join(dir, picFile),
        input_height: input_height,
        input_width: input_width,
        input_mean: input_mean,
        input_std: input_std);

    var graph = Graph.ImportFromPB(Path.Join(dir, pbFile));
    var input_operation = graph.get_operation_by_name(input_name);
    var output_operation = graph.get_operation_by_name(output_name);

    var results = with<Session, NDArray>(tf.Session(graph),
        sess => sess.run(output_operation.outputs[0],
            new FeedItem(input_operation.outputs[0], nd)));

    results = np.squeeze(results);
    var argsort = results.argsort<float>();
    var top_k = argsort.Data<float>()
        .Skip(results.size - 5)
        .Reverse()
        .ToArray();

    foreach (float idx in top_k)
        Console.WriteLine($"{picFile}: {idx} {labels[(int)idx]}, {results[(int)idx]}");
}
4. Print the result¶
The best probability is military uniform at 0.8343058, which is the correct classification.
2/18/2019 3:56:18 AM Starting InceptionArchGoogLeNet
label_image_data\inception_v3_2016_08_28_frozen.pb.tar.gz already exists.
label_image_data\grace_hopper.jpg already exists.
2019-02-19 21:56:18.684463: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
create_op: Const 'file_reader/filename', inputs: empty, control_inputs: empty, outputs: file_reader/filename:0
create_op: ReadFile 'file_reader', inputs: file_reader/filename:0, control_inputs: empty, outputs: file_reader:0
create_op: DecodeJpeg 'jpeg_reader', inputs: file_reader:0, control_inputs: empty, outputs: jpeg_reader:0
create_op: Cast 'Cast/Cast', inputs: jpeg_reader:0, control_inputs: empty, outputs: Cast/Cast:0
create_op: Const 'ExpandDims/dim', inputs: empty, control_inputs: empty, outputs: ExpandDims/dim:0
create_op: ExpandDims 'ExpandDims', inputs: Cast/Cast:0, ExpandDims/dim:0, control_inputs: empty, outputs: ExpandDims:0
create_op: Const 'Const', inputs: empty, control_inputs: empty, outputs: Const:0
create_op: ResizeBilinear 'ResizeBilinear', inputs: ExpandDims:0, Const:0, control_inputs: empty, outputs: ResizeBilinear:0
create_op: Const 'y', inputs: empty, control_inputs: empty, outputs: y:0
create_op: Sub 'Sub', inputs: ResizeBilinear:0, y:0, control_inputs: empty, outputs: Sub:0
create_op: Const 'y_1', inputs: empty, control_inputs: empty, outputs: y_1:0
create_op: RealDiv 'truediv', inputs: Sub:0, y_1:0, control_inputs: empty, outputs: truediv:0
grace_hopper.jpg: 653 military uniform, 0.8343058
grace_hopper.jpg: 668 mortarboard, 0.02186947
grace_hopper.jpg: 401 academic gown, 0.01035806
grace_hopper.jpg: 716 pickelhaube, 0.008008132
grace_hopper.jpg: 466 bulletproof vest, 0.005350832
2/18/2019 3:56:25 AM Completed InceptionArchGoogLeNet
You can find the full source code on GitHub.
Chapter. Neural Network¶
In this chapter, we'll learn how to build a graph for a neural network model. The key advantage of a neural network compared to a linear classifier is that it can separate data which is not linearly separable. We'll implement this model to classify hand-written digit images from the MNIST dataset.
The structure of the neural network we're going to build is as follows. The hand-written digit images of the MNIST dataset fall into 10 classes (0 to 9). The network has 2 hidden layers: the first layer with 200 hidden units (neurons) and the second one (known as the classifier layer) with 10 neurons.
neural network architecture
Get started with the implementation step by step:
Prepare data
MNIST is a dataset of handwritten digits which contains 55,000 examples for training, 5,000 examples for validation and 10,000 examples for testing. The digits have been size-normalized and centered in fixed-size images (28 x 28 pixels) with values from 0 to 1. Each image has been flattened and converted to a 1-D array of 784 features. It is also a kind of benchmark dataset for deep learning.
MNIST dataset
We define some variables to make them easier to modify later. It's important to note that in a linear model, we have to flatten the input images into a vector.
using System;
using NumSharp;
using Tensorflow;
using TensorFlowNET.Examples.Utility;
using static Tensorflow.Python;
const int img_h = 28;
const int img_w = 28;
int img_size_flat = img_h * img_w; // 784, the total number of pixels
int n_classes = 10; // Number of classes, one class per digit
We’ll write the function which automatically loads the MNIST data and returns it in our desired shape and format. There is an MNIST data helper to make life easier.
Datasets mnist;

public void PrepareData()
{
    mnist = MnistDataSet.read_data_sets("mnist", one_hot: true);
}
Other than a function for loading the images and corresponding labels, we still need two more functions:
randomize: randomizes the order of images and their labels. At the beginning of each epoch, we re-randomize the order of the data samples to make sure that the trained model is not sensitive to the order of the data.
private (NDArray, NDArray) randomize(NDArray x, NDArray y)
{
    var perm = np.random.permutation(y.shape[0]);
    np.random.shuffle(perm);
    // index both arrays with the shuffled permutation
    return (x[perm], y[perm]);
}
get_next_batch: selects a number of images determined by the batch_size variable (as per the Stochastic Gradient Descent method).
private (NDArray, NDArray) get_next_batch(NDArray x, NDArray y, int start, int end)
{
    var x_batch = x[$"{start}:{end}"];
    var y_batch = y[$"{start}:{end}"];
    return (x_batch, y_batch);
}
Set Hyperparameters
There are about 55,000 images in the training set, and it takes a long time to calculate the gradient of the model using all these images. Therefore we use a small batch of images in each iteration of the optimizer (Stochastic Gradient Descent).
- epoch: one forward pass and one backward pass of all the training examples.
- batch size: the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you’ll need.
- iteration: one forward pass and one backward pass of one batch of the training examples.
int epochs = 10;
int batch_size = 100;
float learning_rate = 0.001f;
int h1 = 200; // number of nodes in the 1st hidden layer
int display_freq = 100; // Frequency of displaying the training results
Building the neural network
Let's create some functions to help build the computation graph.
variables: We need to define two variables W and b to construct our linear model. We use TensorFlow variables of proper size and initialization to define them.

// weight_variable
var in_dim = x.shape[1];
var initer = tf.truncated_normal_initializer(stddev: 0.01f);
var W = tf.get_variable("W_" + name,
    dtype: tf.float32,
    shape: (in_dim, num_units),
    initializer: initer);

// bias_variable
var initial = tf.constant(0f, num_units);
var b = tf.get_variable("b_" + name,
    dtype: tf.float32,
    initializer: initial);
fully-connected layer: A neural network consists of stacks of fully-connected (dense) layers. Having the weight (W) and bias (b) variables, a fully-connected layer is defined as activation(W x X + b). The complete fc_layer function is as below:

private Tensor fc_layer(Tensor x, int num_units, string name, bool use_relu = true)
{
    var in_dim = x.shape[1];

    var initer = tf.truncated_normal_initializer(stddev: 0.01f);
    var W = tf.get_variable("W_" + name,
        dtype: tf.float32,
        shape: (in_dim, num_units),
        initializer: initer);

    var initial = tf.constant(0f, num_units);
    var b = tf.get_variable("b_" + name,
        dtype: tf.float32,
        initializer: initial);

    var layer = tf.matmul(x, W) + b;
    if (use_relu)
        layer = tf.nn.relu(layer);

    return layer;
}
inputs: Now we need to define the proper tensors to feed the input into our model. A placeholder variable is the suitable choice for the input images and corresponding labels. This allows us to change the inputs (images and labels) to the TensorFlow graph.

// Placeholders for inputs (x) and outputs (y)
x = tf.placeholder(tf.float32, shape: (-1, img_size_flat), name: "X");
y = tf.placeholder(tf.float32, shape: (-1, n_classes), name: "Y");
Placeholder x is defined for the images; the shape is set to [None, img_size_flat], where None means that the tensor may hold an arbitrary number of images, with each image being a vector of length img_size_flat. Placeholder y is the variable for the true labels associated with the images that were input in the placeholder variable x. It holds an arbitrary number of labels, and each label is a vector of length num_classes, which is 10.

network layers: After creating the proper input, we have to pass it to our model. Since we have a neural network, we can stack multiple fully-connected layers using the fc_layer method. Note that we will not use any activation function (use_relu: false) in the last layer. The reason is that we can use tf.nn.softmax_cross_entropy_with_logits to calculate the loss.

// Create a fully-connected layer with h1 nodes as hidden layer
var fc1 = fc_layer(x, h1, "FC1", use_relu: true);
// Create a fully-connected layer with n_classes nodes as output layer
var output_logits = fc_layer(fc1, n_classes, "OUT", use_relu: false);
loss function: After creating the network, we have to calculate the loss and optimize it; we also have to calculate the correct_prediction and accuracy.

// Define the loss function, optimizer, and accuracy
var logits = tf.nn.softmax_cross_entropy_with_logits(labels: y, logits: output_logits);
loss = tf.reduce_mean(logits, name: "loss");
optimizer = tf.train.AdamOptimizer(learning_rate: learning_rate, name: "Adam-op").minimize(loss);
var correct_prediction = tf.equal(tf.argmax(output_logits, 1), tf.argmax(y, 1), name: "correct_pred");
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name: "accuracy");
initialize variables: We have to invoke a variable initializer operation to initialize all variables.
var init = tf.global_variables_initializer();
The complete computation graph looks like below:
TensorBoard-nn
Train
After creating the graph, we can train our model. To train the model, we have to create a session and run the graph in the session.
// Number of training iterations in each epoch
var num_tr_iter = mnist.train.labels.len / batch_size;

with(tf.Session(), sess =>
{
    sess.run(init);

    float loss_val = 100.0f;
    float accuracy_val = 0f;

    foreach (var epoch in range(epochs))
    {
        print($"Training epoch: {epoch + 1}");
        // Randomly shuffle the training data at the beginning of each epoch
        var (x_train, y_train) = randomize(mnist.train.images, mnist.train.labels);

        foreach (var iteration in range(num_tr_iter))
        {
            var start = iteration * batch_size;
            var end = (iteration + 1) * batch_size;
            var (x_batch, y_batch) = get_next_batch(x_train, y_train, start, end);

            // Run optimization op (backprop)
            sess.run(optimizer, new FeedItem(x, x_batch), new FeedItem(y, y_batch));

            if (iteration % display_freq == 0)
            {
                // Calculate and display the batch loss and accuracy
                var result = sess.run(new[] { loss, accuracy }, new FeedItem(x, x_batch), new FeedItem(y, y_batch));
                loss_val = result[0];
                accuracy_val = result[1];
                print($"iter {iteration.ToString("000")}: Loss={loss_val.ToString("0.0000")}, Training Accuracy={accuracy_val.ToString("P")}");
            }
        }

        // Run validation after every epoch
        var results1 = sess.run(new[] { loss, accuracy }, new FeedItem(x, mnist.validation.images), new FeedItem(y, mnist.validation.labels));
        loss_val = results1[0];
        accuracy_val = results1[1];
        print("---------------------------------------------------------");
        print($"Epoch: {epoch + 1}, validation loss: {loss_val.ToString("0.0000")}, validation accuracy: {accuracy_val.ToString("P")}");
        print("---------------------------------------------------------");
    }
});
Test
After the training is done, we have to test our model to see how good it performs on a new dataset.
var result = sess.run(new[] { loss, accuracy }, new FeedItem(x, mnist.test.images), new FeedItem(y, mnist.test.labels));
loss_test = result[0];
accuracy_test = result[1];
print("---------------------------------------------------------");
print($"Test loss: {loss_test.ToString("0.0000")}, test accuracy: {accuracy_test.ToString("P")}");
print("---------------------------------------------------------");
result
Chapter. Convolutional Neural Network¶
In this chapter, we'll implement a simple Convolutional Neural Network model and use it to classify the MNIST dataset.
The structure of the neural network we're going to build is as follows. The hand-written digit images of the MNIST dataset fall into 10 classes (0 to 9). The network has 2 convolutional layers followed by 2 fully-connected layers at the end.
neural network architecture
Get started with the implementation:
Prepare data
MNIST is a dataset of handwritten digits which contains 55,000 examples for training, 5,000 examples for validation and 10,000 examples for testing. The digits have been size-normalized and centered in fixed-size images (28 x 28 pixels) with values from 0 to 1. Each image has been flattened and converted to a 1-D array of 784 features. It is also a kind of benchmark dataset for deep learning.
MNIST dataset
We define some variables to make them easier to modify later.
using System;
using NumSharp;
using Tensorflow;
using TensorFlowNET.Examples.Utility;
using static Tensorflow.Python;
const int img_h = 28;
const int img_w = 28;
int n_classes = 10; // Number of classes, one class per digit
int n_channels = 1;
We’ll write the function which automatically loads the MNIST data and returns it in our desired shape and format. There is an MNIST data helper to make life easier.
Datasets mnist;

public void PrepareData()
{
    mnist = MnistDataSet.read_data_sets("mnist", one_hot: true);
}
Other than a function for loading the images and corresponding labels, we still need three more functions:
reformat: reformats the data to a format acceptable for the convolutional layer.
private (NDArray, NDArray) Reformat(NDArray x, NDArray y)
{
    var (img_size, num_ch, num_class) = (np.sqrt(x.shape[1]), 1, len(np.unique<int>(np.argmax(y, 1))));
    var dataset = x.reshape(x.shape[0], img_size, img_size, num_ch).astype(np.float32);
    //y[0] = np.arange(num_class) == y[0];
    //var labels = (np.arange(num_class) == y.reshape(y.shape[0], 1, y.shape[1])).astype(np.float32);
    return (dataset, y);
}
randomize: randomizes the order of images and their labels. At the beginning of each epoch, we re-randomize the order of the data samples to make sure that the trained model is not sensitive to the order of the data.
private (NDArray, NDArray) randomize(NDArray x, NDArray y)
{
    var perm = np.random.permutation(y.shape[0]);
    np.random.shuffle(perm);
    // index both arrays with the shuffled permutation
    return (x[perm], y[perm]);
}
get_next_batch: selects a number of images determined by the batch_size variable (as per the Stochastic Gradient Descent method).
private (NDArray, NDArray) get_next_batch(NDArray x, NDArray y, int start, int end)
{
    var x_batch = x[$"{start}:{end}"];
    var y_batch = y[$"{start}:{end}"];
    return (x_batch, y_batch);
}
Set Hyperparameters
There are about 55,000 images in the training set, and it takes a long time to calculate the gradient of the model using all these images. Therefore we use a small batch of images in each iteration of the optimizer (Stochastic Gradient Descent).
- epoch: one forward pass and one backward pass of all the training examples.
- batch size: the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you’ll need.
- iteration: one forward pass and one backward pass of one batch of the training examples.
int epochs = 10;
int batch_size = 100;
float learning_rate = 0.001f;
int display_freq = 200; // Frequency of displaying the training results
Network configuration
1st convolutional layer:
int filter_size1 = 5;  // Convolution filters are 5 x 5 pixels.
int num_filters1 = 16; // There are 16 of these filters.
int stride1 = 1;       // The stride of the sliding window
2nd convolutional layer:
int filter_size2 = 5;  // Convolution filters are 5 x 5 pixels.
int num_filters2 = 32; // There are 32 of these filters.
int stride2 = 1;       // The stride of the sliding window
Fully-connected layer:
int h1 = 128; // Number of neurons in the fully-connected layer.
Building the neural network
Let's create some functions to help build the computation graph.
variables: We need to define two variables W and b to construct our linear model. We use TensorFlow variables of proper size and initialization to define them.

// Create a weight variable with appropriate initialization
private RefVariable weight_variable(string name, int[] shape)
{
    var initer = tf.truncated_normal_initializer(stddev: 0.01f);
    return tf.get_variable(name,
        dtype: tf.float32,
        shape: shape,
        initializer: initer);
}

// Create a bias variable with appropriate initialization
private RefVariable bias_variable(string name, int[] shape)
{
    var initial = tf.constant(0f, shape: shape, dtype: tf.float32);
    return tf.get_variable(name,
        dtype: tf.float32,
        initializer: initial);
}
2D convolution layer: This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs.

private Tensor conv_layer(Tensor x, int filter_size, int num_filters, int stride, string name)
{
    return with(tf.variable_scope(name), delegate
    {
        var num_in_channel = x.shape[x.NDims - 1];
        var shape = new[] { filter_size, filter_size, num_in_channel, num_filters };
        var W = weight_variable("W", shape);
        // tf.summary.histogram("weight", W);
        var b = bias_variable("b", new[] { num_filters });
        // tf.summary.histogram("bias", b);
        var layer = tf.nn.conv2d(x, W,
            strides: new[] { 1, stride, stride, 1 },
            padding: "SAME");
        layer += b;
        return tf.nn.relu(layer);
    });
}
max-pooling layer: Max pooling operation for spatial data.

private Tensor max_pool(Tensor x, int ksize, int stride, string name)
{
    return tf.nn.max_pool(x,
        ksize: new[] { 1, ksize, ksize, 1 },
        strides: new[] { 1, stride, stride, 1 },
        padding: "SAME",
        name: name);
}
flatten_layer: Flattens the output of the convolutional layer to be fed into the fully-connected layer.

private Tensor flatten_layer(Tensor layer)
{
    return with(tf.variable_scope("Flatten_layer"), delegate
    {
        var layer_shape = layer.TensorShape;
        var num_features = layer_shape[new Slice(1, 4)].Size;
        var layer_flat = tf.reshape(layer, new[] { -1, num_features });
        return layer_flat;
    });
}
fully-connected layer: A neural network consists of stacks of fully-connected (dense) layers. Having the weight (W) and bias (b) variables, a fully-connected layer is defined as activation(W x X + b). The complete fc_layer function is as below:

private Tensor fc_layer(Tensor x, int num_units, string name, bool use_relu = true)
{
    return with(tf.variable_scope(name), delegate
    {
        var in_dim = x.shape[1];
        var W = weight_variable("W_" + name, shape: new[] { in_dim, num_units });
        var b = bias_variable("b_" + name, new[] { num_units });
        var layer = tf.matmul(x, W) + b;
        if (use_relu)
            layer = tf.nn.relu(layer);
        return layer;
    });
}
inputs: Now we need to define the proper tensors to feed the input into our model. A placeholder variable is the suitable choice for the input images and corresponding labels. This allows us to change the inputs (images and labels) to the TensorFlow graph.

with(tf.name_scope("Input"), delegate
{
    // Placeholders for inputs (x) and outputs (y)
    x = tf.placeholder(tf.float32, shape: (-1, img_h, img_w, n_channels), name: "X");
    y = tf.placeholder(tf.float32, shape: (-1, n_classes), name: "Y");
});
Placeholder y is the variable for the true labels associated with the images that were input in the placeholder variable x. It holds an arbitrary number of labels, and each label is a vector of length num_classes, which is 10.

network layers: After creating the proper input, we have to pass it to our model. We stack two convolutional layers (each followed by max pooling), flatten the result, and then stack fully-connected layers using the fc_layer method. Note that we will not use any activation function (use_relu: false) in the last layer. The reason is that we can use tf.nn.softmax_cross_entropy_with_logits to calculate the loss.

var conv1 = conv_layer(x, filter_size1, num_filters1, stride1, name: "conv1");
var pool1 = max_pool(conv1, ksize: 2, stride: 2, name: "pool1");
var conv2 = conv_layer(pool1, filter_size2, num_filters2, stride2, name: "conv2");
var pool2 = max_pool(conv2, ksize: 2, stride: 2, name: "pool2");
var layer_flat = flatten_layer(pool2);
var fc1 = fc_layer(layer_flat, h1, "FC1", use_relu: true);
var output_logits = fc_layer(fc1, n_classes, "OUT", use_relu: false);
loss function, optimizer, accuracy, prediction: After creating the network, we have to calculate the loss and optimize it; we also have to calculate the prediction and accuracy.

with(tf.variable_scope("Train"), delegate
{
    // Loss (defined as in the previous chapter; added here so the snippet is complete)
    with(tf.variable_scope("Loss"), delegate
    {
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels: y, logits: output_logits), name: "loss");
    });
    with(tf.variable_scope("Optimizer"), delegate
    {
        optimizer = tf.train.AdamOptimizer(learning_rate: learning_rate, name: "Adam-op").minimize(loss);
    });
    with(tf.variable_scope("Accuracy"), delegate
    {
        var correct_prediction = tf.equal(tf.argmax(output_logits, 1), tf.argmax(y, 1), name: "correct_pred");
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name: "accuracy");
    });
    with(tf.variable_scope("Prediction"), delegate
    {
        cls_prediction = tf.argmax(output_logits, axis: 1, name: "predictions");
    });
});
initialize variables: We have to invoke a variable initializer operation to initialize all variables.
var init = tf.global_variables_initializer();
Train
After creating the graph, we can train our model. To train the model, we have to create a session and run the graph in the session.
// Number of training iterations in each epoch
var num_tr_iter = y_train.len / batch_size;

var init = tf.global_variables_initializer();
sess.run(init);

float loss_val = 100.0f;
float accuracy_val = 0f;

foreach (var epoch in range(epochs))
{
    print($"Training epoch: {epoch + 1}");
    // Randomly shuffle the training data at the beginning of each epoch
    (x_train, y_train) = mnist.Randomize(x_train, y_train);

    foreach (var iteration in range(num_tr_iter))
    {
        var start = iteration * batch_size;
        var end = (iteration + 1) * batch_size;
        var (x_batch, y_batch) = mnist.GetNextBatch(x_train, y_train, start, end);

        // Run optimization op (backprop)
        sess.run(optimizer, new FeedItem(x, x_batch), new FeedItem(y, y_batch));

        if (iteration % display_freq == 0)
        {
            // Calculate and display the batch loss and accuracy
            var result = sess.run(new[] { loss, accuracy }, new FeedItem(x, x_batch), new FeedItem(y, y_batch));
            loss_val = result[0];
            accuracy_val = result[1];
            print($"iter {iteration.ToString("000")}: Loss={loss_val.ToString("0.0000")}, Training Accuracy={accuracy_val.ToString("P")}");
        }
    }

    // Run validation after every epoch
    var results1 = sess.run(new[] { loss, accuracy }, new FeedItem(x, x_valid), new FeedItem(y, y_valid));
    loss_val = results1[0];
    accuracy_val = results1[1];
    print("---------------------------------------------------------");
    print($"Epoch: {epoch + 1}, validation loss: {loss_val.ToString("0.0000")}, validation accuracy: {accuracy_val.ToString("P")}");
    print("---------------------------------------------------------");
}
Test
After the training is done, we have to test our model to see how good it performs on a new dataset.
public void Test(Session sess)
{
    var result = sess.run(new[] { loss, accuracy }, new FeedItem(x, x_test), new FeedItem(y, y_test));
    loss_test = result[0];
    accuracy_test = result[1];
    print("---------------------------------------------------------");
    print($"Test loss: {loss_test.ToString("0.0000")}, test accuracy: {accuracy_test.ToString("P")}");
    print("---------------------------------------------------------");
}