Welcome to TensorFlow.NET’s documentation!

The Definitive Guide to TensorFlow.NET

[Figure: Front Cover (_images/front-cover.jpg)]

The C# binding for Google's TensorFlow

An Open Source Machine Learning Framework for Everyone

Haiping Chen
Christmas, 2018

Foreword

One of the most nerve-wracking periods after releasing the first version of an open source project is when the Gitter community is created. You are all alone, eagerly hoping and wishing for the first user to come along. I still vividly remember those days.

TensorFlow.NET is my third open source project. BotSharp and NumSharp were the first two. The response to them has been pretty good, and I also got a lot of stars on GitHub. Although the first two projects were already very difficult, I must admit that TensorFlow.NET is much more difficult than both, and it is an area I had never been involved with before: GPU parallel computing, distributed computing and neural network models. When I started writing this project, I was also sorting out my ideas about the coding process. TensorFlow is a huge and complicated project, and it is easy to go beyond the scope of one person's ability. Therefore, I want to record my thoughts along the way as much as possible; the process of recording and sorting clears the way of thinking.

All the examples in this book can be found in the GitHub repository of TensorFlow.NET. Where the source code and the code in the book are inconsistent, please refer to the source code. The sample code is typically located in the Example or UnitTest project.

Preface

Why do I start the TensorFlow.NET project?

In a few days it will be Christmas 2018. Watching my children grow up and become sensible day by day, I feel that time passes too fast. IT technology is updated faster than ever, and all kinds of new technologies keep emerging: Big Data, Artificial Intelligence and Blockchain, container technology and microservices, distributed computing and serverless technology are dazzling. The Amazon AI service interface claims that engineers without any machine learning experience can use it, which was a splash of cold water on my idea of just calming down for two years and then switching to an AI architect role.

TensorFlow is an open source project for machine learning, especially for deep learning. It is used for both research and production at Google. It is designed following the dataflow programming pattern and works across a range of tasks. TensorFlow is not just a deep learning library: as long as you can represent your calculation process as a data flow graph, you can use TensorFlow for distributed computing. TensorFlow builds a computational graph and then operates on that graph. Users can write their own upper-level models in Python on top of TensorFlow, or extend TensorFlow at the lower level with custom C++ operation code.

In order to avoid confusion, the unique classes defined in TensorFlow are not translated in this book. For example, Tensor, Graph and Shape retain their English names.

Get started with TensorFlow.NET

I would describe TensorFlow as an open source machine learning framework developed by Google which can be used to build neural networks and perform a variety of machine learning tasks. It works on a data flow graph where the nodes are mathematical operations and the edges are the data in the form of tensors, hence the name TensorFlow.

Let's run a classic HelloWorld program first and see whether TensorFlow runs on .NET. I can't think of a simpler way to start.

Install the TensorFlow.NET SDK

TensorFlow.NET targets the .NET Standard 2.0 specification, so your project's Target Framework can be .NET Framework or .NET Core. All the examples in this book use .NET Core 2.2 and Microsoft Visual Studio Community 2017. To start building TensorFlow programs you just need to download and install the .NET SDK (Software Development Kit). You can download the latest .NET Core SDK from the official website: https://dotnet.microsoft.com/download.

  1. Create a new project

    [Figure: New Project (_images/new-project.png)]

  2. Choose Console App (.NET Core)

    [Figure: Console App (_images/new-project-console.png)]

### install tensorflow C# binding
PM> Install-Package TensorFlow.NET

### Install tensorflow binary
### For CPU version
PM> Install-Package SciSharp.TensorFlow.Redist

### For GPU version (CUDA and cuDNN are required)
PM> Install-Package SciSharp.TensorFlow.Redist-Windows-GPU

Start coding Hello World

After installing the TensorFlow.NET package, add using static Tensorflow.Binding; to your source file to import the TensorFlow API.

using System;
using static Tensorflow.Binding;

namespace TensorFlowNET.Examples
{
    /// <summary>
    /// Simple hello world using TensorFlow
    /// </summary>
    public class HelloWorld : IExample
    {
        public void Run()
        {
            /* Create a Constant op
               The op is added as a node to the default graph.
            
               The value returned by the constructor represents the output
               of the Constant op. */
            var hello = tf.constant("Hello, TensorFlow!");

            // Start tf session
            using (var sess = tf.Session())
            {
                // Run the op
                var result = sess.run(hello);
                Console.WriteLine(result);
            }
        }
    }
}

Run it with Ctrl + F5 and you will get the following output:

2019-01-05 10:53:42.145931: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Hello, TensorFlow!
Press any key to continue . . .

This sample code can be found here.

Chapter. Tensor

Represents one of the outputs of an Operation

What is Tensor?

A Tensor holds a multi-dimensional array of elements of a single data type, which is very similar to NumPy's ndarray. When the dimension is zero, it is called a scalar; when the dimension is 1, a vector; when the dimension is 2, a matrix; and when the dimension is greater than 2, it is usually just called a tensor. If you are familiar with NumPy, understanding Tensor will be quite easy.

How to create a Tensor?

There are many ways to initialize a Tensor object in TF.NET. It can be initialized from a scalar, string, matrix or tensor.

// Create a tensor that holds a scalar value
var t1 = new Tensor(3);

// Init from a string
var t2 = new Tensor("Hello! TensorFlow.NET");

// A tensor that holds an ndarray
var nd = new NDArray(new int[]{3, 1, 1, 2});
var t3 = new Tensor(nd);

Console.WriteLine($"t1: {t1}, t2: {t2}, t3: {t3}");

Data Structure of Tensor

TF uses column-major order. If we use NumSharp to generate a 2 x 3 matrix and access the underlying data buffer from index 0 to 5 in order, we don't get the numbers 1 through 6; instead we get them in the order 1, 4, 2, 5, 3, 6.

// Generate a matrix:[[1, 2, 3], [4, 5, 6]]
var nd = np.array(1f, 2f, 3f, 4f, 5f, 6f).reshape(2, 3);
// The index will be   0  2  4    1  3  5, it's column-major order.

[Figure: column-major order (_images/column-major-order.png)]

[Figure: row-major order (_images/row-major-order.png)]

Chapter. Constant

In TensorFlow, a constant is a special Tensor that cannot be modified while the graph is running. For example, in a linear model $\tilde{y}_i = \boldsymbol{w} x_i + b$, the constant $b$ can be represented as a Constant Tensor. Since a constant is a Tensor, it also has all the data characteristics of Tensor, including:

  • value: scalar value or constant list matching the data type defined in TensorFlow;
  • dtype: data type;
  • shape: dimensions;
  • name: constant’s name;

How to create a Constant

TensorFlow provides a handy function to create a Constant. In TF.NET you can use the same function name, tf.constant, to create it. TF.NET keeps the same API names as the Python binding. Although this will make developers who are used to C# naming conventions feel uncomfortable, after careful consideration I decided to give up the C# naming convention.

Initialize a scalar constant:

var c1 = tf.constant(3); // int
var c2 = tf.constant(1.0f); // float
var c3 = tf.constant(2.0); // double
var c4 = tf.constant("Big Tree"); // string

Initialize a constant through an ndarray:

// dtype=int, shape=(2, 3)
var nd = np.array(new int[][]
{
    new int[]{3, 1, 1},
    new int[]{2, 3, 1}
});
var tensor = tf.constant(nd);

Dive into Constant

Now let’s explore how constant works.

Other functions to create a Constant

  • tf.zeros
  • tf.zeros_like
  • tf.ones
  • tf.ones_like
  • tf.fill
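
For illustration, here is a minimal sketch of these helpers. The exact shape argument types are an assumption and may differ slightly between TF.NET versions:

var z = tf.zeros(new int[] { 2, 3 });   // 2 x 3 tensor of zeros
var o = tf.ones(new int[] { 2, 3 });    // 2 x 3 tensor of ones
var f = tf.fill(new int[] { 2, 3 }, 9); // 2 x 3 tensor filled with 9
var zl = tf.zeros_like(o);              // zeros with the same shape and dtype as o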

Chapter. Variable

Variables in TensorFlow are mainly used to represent the trainable parameter values of a machine learning model. A variable can be created with the tf.Variable function. During graph computation, variables are modified by other operations. Variables live inside a session: as long as other computing nodes are running in the same session, they can access the same variable value. Variables are lazily loaded and only request memory space when they are used.

var x = tf.Variable(10, name: "x");
using (var session = tf.Session())
{
    session.run(x.initializer);
    var result = session.run(x);
    Console.Write(result); // should be 10
}

The above code first creates a variable operation, initializes the variable, then runs the session, and finally gets the result. The code is very simple, but it shows the complete process of how TensorFlow operates on variables. When creating a variable, you pass a tensor as the initial value to the function Variable(). TensorFlow provides a series of operators to initialize the tensor; the initial value can be a constant or a random value.
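
For example, a variable can take a random tensor as its initial value. A minimal sketch, assuming tf.random_normal is exposed under the same name as in the Python API:

var w = tf.Variable(tf.random_normal(new int[] { 2, 2 }), name: "w");
using (var session = tf.Session())
{
    // runs the random initializer and materializes the value
    session.run(w.initializer);
    var value = session.run(w);
    Console.WriteLine(value); // a random 2 x 2 matrix
}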

Chapter. Placeholder

In this chapter we will talk about another common data type in TensorFlow: Placeholder. A placeholder is a simplified variable whose value is supplied by the session when the graph is run. In other words, when you build the graph you don't need to specify the value; you delay it until the session starts. In TensorFlow terminology, we then feed data into the graph through these placeholders. The difference between placeholders and constants is that placeholders let you specify values more flexibly without modifying the code that builds the graph. For example, mathematical constants are suitable for Constant, while values such as model smoothing coefficients can be supplied with Placeholder.

var x = tf.placeholder(tf.int32);
var y = x * 3;

using (var sess = tf.Session())
{
    var result = sess.run(y, feed_dict: new FeedItem[]
    {
        new FeedItem(x, 2)
    });
    // (int)result should be 6;
}

Chapter. Graph

TensorFlow uses a dataflow graph to represent your computation in terms of the dependencies between individual operations. A graph defines the computation. It doesn’t compute anything, it doesn’t hold any values, it just defines the operations that you specified in your code.

Defining the Graph

We define a graph with a variable and three operations: variable returns the current value of our variable. initialize assigns the initial value of 31 to that variable. assign assigns the new value of 12 to that variable.

with<Graph>(tf.Graph().as_default(), graph =>
{
	var variable = tf.Variable(31, name: "tree");
	tf.global_variables_initializer();
	variable.assign(12);
});

TF.NET simulates a with syntax to manage the Graph lifecycle: the graph is disposed when the instance is no longer needed. Because we invoked as_default(), this graph is also the one the sessions in the next chapter use when no graph is manually specified.

A typical graph looks like the one below:

[Figure: _images/graph_vis_animation.gif]

Save Model

Saving the model means saving all the values of the parameters and the graph.

saver = tf.train.Saver()
saver.save(sess,'./tensorflowModel.ckpt')
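
In TF.NET the same save can be sketched as follows, assuming tf.train.Saver mirrors the Python API:

var saver = tf.train.Saver();
using (var sess = tf.Session())
{
    sess.run(tf.global_variables_initializer());
    // writes the .meta, .data, .index and checkpoint files
    var path = saver.save(sess, "./tensorflowModel.ckpt");
}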

After saving the model there will be four files:

  • tensorflowModel.ckpt.meta: holds the graph structure (the MetaGraphDef);
  • tensorflowModel.ckpt.data-00000-of-00001: holds the values of all variables;
  • tensorflowModel.ckpt.index: maps each variable name to its location in the data file;
  • checkpoint: records the most recent checkpoint files.

We also created a protocol buffer file with the .pbtxt extension. It is human readable; if you want the binary format instead, pass as_text: false when writing the graph.

  • tensorflowModel.pbtxt:

This holds a network of nodes, each representing one operation, connected to each other as inputs and outputs.

Freezing the Graph

Why do we need it?

When we need to keep all the values of the variables and the Graph structure in a single file, we have to freeze the graph.

from tensorflow.python.tools import freeze_graph

freeze_graph.freeze_graph(input_graph = 'logistic_regression/tensorflowModel.pbtxt', 
                              input_saver = "", 
                              input_binary = False, 
                              input_checkpoint = 'logistic_regression/tensorflowModel.ckpt', 
                              output_node_names = "Softmax",
                              restore_op_name = "save/restore_all", 
                              filename_tensor_name = "save/Const:0",
                              output_graph = 'frozentensorflowModel.pb', 
                              clear_devices = True, 
                              initializer_nodes = "")

Optimizing for Inference

To reduce the amount of computation needed when the network is used only for inference, we can remove the parts of the graph that are only needed for training.

Restoring the Model
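
A minimal sketch of restoring the saved variable values back into a session, again assuming the Saver API mirrors the Python one and that the graph from the previous sections has already been built:

using (var sess = tf.Session())
{
    var saver = tf.train.Saver();
    // loads the variable values from the checkpoint into this session,
    // so no initializer needs to be run
    saver.restore(sess, "./tensorflowModel.ckpt");
}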

Chapter. Session

A TensorFlow session runs parts of the graph across a set of local and remote devices. A session allows you to execute graphs or parts of graphs. It allocates resources (on one or more machines) for that and holds the actual values of intermediate results and variables.

Running Computations in a Session

Let's complete the example from the last chapter. To run any of the operations, we need to create a session for that graph. The session will also allocate memory to store the current value of the variable.

with<Graph>(tf.Graph(), graph =>
{
    var variable = tf.Variable(31, name: "tree");
    var init = tf.global_variables_initializer();

    var sess = tf.Session(graph);
    sess.run(init);

    var result = sess.run(variable); // 31

    var assign = variable.assign(12);
    result = sess.run(assign); // 12
});

The value of our variables is only valid within one session. If we try to get the value in another session, TensorFlow will raise an error of Attempting to use uninitialized value foo. Of course, we can use the same graph in more than one session, because the session copies the graph definition into its own memory area; we just have to initialize the variables again. The values in the new session will be completely independent from the previous one.
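
A sketch that makes this concrete, reusing one graph in two sessions; each session must run the initializer for its own copy of the values:

var graph = tf.Graph().as_default();
var v = tf.Variable(31, name: "v");
var init = tf.global_variables_initializer();

using (var sess1 = tf.Session(graph))
{
    sess1.run(init);
    sess1.run(v.assign(12)); // sess1 now holds 12
}

using (var sess2 = tf.Session(graph))
{
    // reading v here before init would raise
    // "Attempting to use uninitialized value"
    sess2.run(init);
    var result = sess2.run(v); // 31 again, independent of sess1
}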

Chapter. Operation

Operation represents a Graph node that performs computation on tensors. An operation is a Node in a Graph that takes zero or more Tensors (produced by other Operations in the Graph) as input, and produces zero or more Tensors as output.
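
A small sketch of the relationship between tensors and operations, assuming Tensor exposes its producing Operation via an op property as the Python binding does:

var a = tf.constant(2);
var b = tf.constant(3);
var c = tf.add(a, b); // c is an output Tensor of an Add node

// the Operation (graph node) that produced c
Console.WriteLine(c.op.name);

using (var sess = tf.Session())
    Console.WriteLine((int)sess.run(c)); // 5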

Chapter. Queue

TensorFlow is capable of handling multiple threads, and queues are a powerful mechanism for asynchronous computation. If we have large datasets, this can significantly speed up the training process of our models. This functionality is especially handy when reading, pre-processing and extracting our training data in mini-batches. The secret to professional, high-performance training of a model is understanding TensorFlow's queueing operations. TensorFlow implements 4 types of Queue: FIFOQueue, PaddingFIFOQueue, PriorityQueue and RandomShuffleQueue.

[Figure: FIFOQueue (_images/FIFOQueue-example.jpg)]

Like everything in TensorFlow, a queue is a node in a computation graph. It's a stateful node, like a variable: other nodes can modify its content. In particular, nodes can enqueue new items into the queue, or dequeue existing items from the queue.

To get started with queues, let's consider a simple example. We will create a "first in, first out" queue (FIFOQueue) and fill it with numbers. Then we'll construct a graph that takes an item off the queue, adds one to that item, and puts it back on the end of the queue.

[TestMethod]
public void FIFOQueue()
{
	// create a first in first out queue with capacity up to 2
	// and data type set as int32
	var queue = tf.FIFOQueue(2, tf.int32);
	// init queue, push 2 elements into queue.
	var init = queue.enqueue_many(new[] { 10, 20 });
	// pop out the first element
	var x = queue.dequeue();
	// add 1
	var y = x + 1;
	// push back into queue
	var inc = queue.enqueue(y);

	using (var sess = tf.Session())
	{
		// init queue
		init.run();

		// pop out first element and push back calculated y
		(int dequeued, _) = sess.run((x, inc));
		Assert.AreEqual(10, dequeued);

		(dequeued, _) = sess.run((x, inc));
		Assert.AreEqual(20, dequeued);

		(dequeued, _) = sess.run((x, inc));
		Assert.AreEqual(11, dequeued);

		(dequeued, _) = sess.run((x, inc));
		Assert.AreEqual(21, dequeued);
        
		// thread will hang or block if you run sess.run(x) again
		// until queue has more element.
	}
}

Enqueue, EnqueueMany and Dequeue are special nodes. They take a pointer to the queue instead of a normal value, allowing them to change it. In the example above we first create a FIFOQueue with capacity 2 and enqueue two values into it. Then we immediately attempt to dequeue a value from it, assign it to x, and simply add 1 to the dequeued value to produce y. Next, we start up a session and run. After we've run this operation a few times the queue will be empty; if we try to run the operation again, the main thread of the program will hang or block, because it will be waiting for another operation to be run to put more values into the queue.

FIFOQueue

Creates a queue that dequeues elements in a first-in first-out order. A FIFOQueue has bounded capacity; supports multiple concurrent producers and consumers; and provides exactly-once delivery. A FIFOQueue holds a list of up to capacity elements. Each element is a fixed-length tuple of tensors whose dtypes are described by dtypes, and whose shapes are optionally described by the shapes argument.

PaddingFIFOQueue

A FIFOQueue that supports batching variable-sized tensors by padding. A PaddingFIFOQueue may contain components with dynamic shape, while also supporting dequeue_many. A PaddingFIFOQueue holds a list of up to capacity elements. Each element is a fixed-length tuple of tensors whose dtypes are described by dtypes, and whose shapes are described by the shapes argument.

[TestMethod]
public void PaddingFIFOQueue()
{
	var numbers = tf.placeholder(tf.int32);
	var queue = tf.PaddingFIFOQueue(10, tf.int32, new TensorShape(-1));
	var enqueue = queue.enqueue(numbers);
	var dequeue_many = queue.dequeue_many(n: 3);

	using(var sess = tf.Session())
	{
		sess.run(enqueue, (numbers, new[] { 1 }));
		sess.run(enqueue, (numbers, new[] { 2, 3 }));
		sess.run(enqueue, (numbers, new[] { 3, 4, 5 }));

		var result = sess.run(dequeue_many[0]);

		Assert.IsTrue(Enumerable.SequenceEqual(new int[] { 1, 0, 0 }, result[0].ToArray<int>()));
		Assert.IsTrue(Enumerable.SequenceEqual(new int[] { 2, 3, 0 }, result[1].ToArray<int>()));
		Assert.IsTrue(Enumerable.SequenceEqual(new int[] { 3, 4, 5 }, result[2].ToArray<int>()));
	}
}

PriorityQueue

A queue implementation that dequeues elements in prioritized order. A PriorityQueue has bounded capacity; supports multiple concurrent producers and consumers; and provides exactly-once delivery. A PriorityQueue holds a list of up to capacity elements. Each element is a fixed-length tuple of tensors whose dtypes are described by types, and whose shapes are optionally described by the shapes argument.

[TestMethod]
public void PriorityQueue()
{
	var queue = tf.PriorityQueue(3, tf.@string);
	var init = queue.enqueue_many(new[] { 2L, 4L, 3L }, new[] { "p1", "p2", "p3" });
	var x = queue.dequeue();

	using (var sess = tf.Session())
	{
		init.run();

		// output will be 2, 3, 4
		var result = sess.run(x);
		Assert.AreEqual(result[0].GetInt64(), 2L);

		result = sess.run(x);
		Assert.AreEqual(result[0].GetInt64(), 3L);

		result = sess.run(x);
		Assert.AreEqual(result[0].GetInt64(), 4L);
	}
}

RandomShuffleQueue

A queue implementation that dequeues elements in a random order. A RandomShuffleQueue has bounded capacity; supports multiple concurrent producers and consumers; and provides exactly-once delivery. A RandomShuffleQueue holds a list of up to capacity elements. Each element is a fixed-length tuple of tensors whose dtypes are described by dtypes, and whose shapes are optionally described by the shapes argument.

[TestMethod]
public void RandomShuffleQueue()
{
	var queue = tf.RandomShuffleQueue(10, min_after_dequeue: 1, dtype: tf.int32);
	var init = queue.enqueue_many(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 });
	var x = queue.dequeue();

	string results = "";
	using (var sess = tf.Session())
	{
		init.run();

		foreach(var i in range(9))
			results += (int)sess.run(x) + ".";

		// the 9 dequeued values come out in random order,
		// e.g. 6.3.9.2.5.1.8.7.4.
	}
}

Queue methods must run on the same device as the queue. FIFOQueue and RandomShuffleQueue are important TensorFlow objects for computing tensor asynchronously in a graph. For example, a typical input architecture is to use a RandomShuffleQueue to prepare inputs for training a model:

  • Multiple threads prepare training examples and push them in the queue.
  • A training thread executes a training op that dequeues mini-batches from the queue.

This architecture simplifies the construction of input pipelines.

As the example above shows, once the queue is drained the program blocks and you'll actually have to terminate it. This isn't very useful. What we really want is for our little program to reload or enqueue more values whenever our queue is empty or about to become empty. We could fix this by explicitly running our enqueue op again to reload the queue with values, but for larger, more realistic programs this becomes unwieldy. Thankfully, TensorFlow has a solution.

TensorFlow provides two classes to help with multi-threaded tasks: tf.Coordinator and tf.QueueRunner. These two classes are designed to be used together. The Coordinator class helps multiple threads stop together and report exceptions to a main thread. The QueueRunner class is used to create a number of threads cooperating to enqueue tensors into the same queue.
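
Whether TF.NET exposes these two classes may vary by version, so here is a hedged sketch of the same producer/consumer pattern using plain .NET threads to drive the queue ops from the FIFOQueue example above. Only tf.FIFOQueue, enqueue, dequeue and sess.run are TensorFlow calls; everything else is standard .NET:

public void QueueWithProducerThread()
{
    var queue = tf.FIFOQueue(100, tf.int32);
    var value = tf.placeholder(tf.int32);
    var enqueue = queue.enqueue(value);
    var dequeue = queue.dequeue();

    using (var sess = tf.Session())
    {
        var cts = new System.Threading.CancellationTokenSource();

        // producer thread: keeps feeding examples into the queue
        var producer = new System.Threading.Thread(() =>
        {
            var i = 0;
            while (!cts.IsCancellationRequested)
                sess.run(enqueue, new FeedItem(value, i++));
        });
        producer.Start();

        // consumer (training) thread: dequeues elements as they arrive
        for (var step = 0; step < 10; step++)
            Console.WriteLine((int)sess.run(dequeue));

        cts.Cancel(); // note: a producer blocked on a full queue won't observe this
    }
}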

Chapter. Gradient

Register custom gradient function

TF.NET is extensible: a custom gradient function can be registered for an operation.

// Define a gradient function for the ConcatV2 operation.
// oper is the forward operation; out_grads are the gradients flowing
// in from downstream. A real implementation must return one gradient
// Tensor per input of oper.
ops.RegisterGradientFunction("ConcatV2", (oper, out_grads) =>
{
    var grad = out_grads[0];
    // ... derive the per-input gradients from grad here ...
    return new Tensor[] { /* one gradient per input */ };
});

Chapter. Trainer

Saver

The tf.train.Saver class provides methods to save and restore models.

Saver Builder

Bulk Saver Builder

Chapter. Eager Mode

Chapter. Linear Regression

What is linear regression?

Linear regression is a linear approach to modelling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables).

Consider the case of a single variable of interest y and a single predictor variable x. The predictor variables are called by many names: covariates, inputs, features; the predicted variable is often called response, output, outcome.

We have some data $D = \{(x_i, y_i)\}$ and we assume a simple linear model of this dataset with Gaussian noise:

// Prepare training Data
var train_X = np.array(3.3f, 4.4f, 5.5f, 6.71f, 6.93f, 4.168f, 9.779f, 6.182f, 7.59f, 2.167f, 7.042f, 10.791f, 5.313f, 7.997f, 5.654f, 9.27f, 3.1f);
var train_Y = np.array(1.7f, 2.76f, 2.09f, 3.19f, 1.694f, 1.573f, 3.366f, 2.596f, 2.53f, 1.221f, 2.827f, 3.465f, 1.65f, 2.904f, 2.42f, 2.94f, 1.3f);
var n_samples = train_X.shape[0];

[Figure: regression dataset (_images/regression-dataset.png)]

Based on the given data points, we try to plot a line that models the points the best. The red line can be modelled by the linear equation $y = wx + b$. The goal of the linear regression algorithm is to find the best values for $w$ and $b$. Before moving on to the algorithm, let's have a look at two important concepts you must know to better understand linear regression.

Cost Function

The cost function helps us to figure out the best possible values for $w$ and $b$ which would provide the best fit line for the data points. Since we want the best values for $w$ and $b$, we convert this search problem into a minimization problem where we would like to minimize the error between the predicted value and the actual value.

[Figure: minimize square cost (_images/minimize-square-cost.png)]

We choose the above function to minimize. The difference between the predicted values and the ground truth measures the error. We square that error difference, sum over all data points, and divide by the total number of data points. This gives the average squared error over all the data points, so this cost function is also known as the Mean Squared Error (MSE) function. Using this MSE function, we are going to change the values of $w$ and $b$ such that the MSE settles at its minimum.
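
Written out, and including the factor of 2 used in the code below, the cost being minimized is $J(w, b) = \frac{1}{2n} \sum_{i=1}^{n} \left( (w x_i + b) - y_i \right)^2$, where $n$ is the number of training samples.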

// tf Graph Input
var X = tf.placeholder(tf.float32);
var Y = tf.placeholder(tf.float32);

// Set model weights 
var W = tf.Variable(rng.randn<float>(), name: "weight");
var b = tf.Variable(rng.randn<float>(), name: "bias");

// Construct a linear model
var pred = tf.add(tf.multiply(X, W), b);

// Mean squared error
var cost = tf.reduce_sum(tf.pow(pred - Y, 2.0f)) / (2.0f * n_samples);

Gradient Descent

Another important concept to understand is gradient descent. Gradient descent is a method of updating $w$ and $b$ to minimize the cost function. The idea is that we start with some random values for $w$ and $b$ and then change these values iteratively to reduce the cost. Gradient descent tells us how to update the values, i.e. in which direction to go next. Gradient descent is also known as steepest descent.

[Figure: gradient descent (_images/gradient-descent.png)]

To draw an analogy, imagine a pit in the shape of a U; you are standing at the topmost point and your objective is to reach the bottom. There is a catch: you can only take a discrete number of steps. If you take one small step at a time, you will eventually reach the bottom, but it will take a long time. If you take longer steps, you will get there sooner, but you risk overshooting the bottom and not landing exactly on it. In the gradient descent algorithm, the size of the steps you take is the learning rate. This decides how fast the algorithm converges to the minimum.
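
Formally, each iteration moves the parameters a step, scaled by the learning rate $\eta$, against the gradient of the cost: $w \leftarrow w - \eta \frac{\partial J}{\partial w}$ and $b \leftarrow b - \eta \frac{\partial J}{\partial b}$.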

// Gradient descent
// Note, minimize() knows to modify W and b because Variable objects are trainable=True by default
var optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost);

When we visualize the graph in TensorBoard:

[Figure: linear-regression (_images/linear-regression-tensor-board.png)]

The full example is here.

Chapter. Logistic Regression

What is logistic regression?

Logistic regression is a statistical analysis method used to predict a data value based on prior observations of a data set. A logistic regression model predicts a dependent data variable by analyzing the relationship between one or more existing independent variables.

The dependent variable of logistic regression can be binary or multi-class, but the binary case is more common and easier to explain, so the most common use in practice is binary logistic regression. The example used by TensorFlow.NET is hand-written digit recognition, which is a multi-class problem.

Softmax regression allows us to handle $y^{(i)} \in \{1, \dots, K\}$ where $K$ is the number of classes.
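
For reference, softmax turns the class scores $z = Wx + b$ into probabilities $P(y = k \mid x) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}$ for $k = 1, \dots, K$.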

The full example is here.

Chapter. Nearest Neighbor

The nearest neighbour algorithm was one of the first algorithms used to solve the travelling salesman problem. In it, the salesman starts at a random city and repeatedly visits the nearest city until all have been visited. It quickly yields a short tour, but usually not the optimal one.

The full example is here.

Chapter. Image Recognition

This is an example of using TensorFlow.NET and NumSharp for image recognition: it uses a pre-trained Inception model to predict an image, and outputs the categories sorted by probability. The original paper is here. The Inception architecture of GoogLeNet was designed to perform well even under strict constraints on memory and computational budget. The computational cost of Inception is also much lower than that of other well-performing successors. This has made it feasible to utilize Inception networks in big-data scenarios, where huge amounts of data need to be processed at reasonable cost, or in scenarios where memory or computational capacity is inherently limited, for example in mobile vision settings.

The GoogLeNet architecture conforms to the design principles below:

  • Avoid representational bottlenecks, especially early in the network.
  • Higher dimensional representations are easier to process locally within a network.
  • Spatial aggregation can be done over lower dimensional embeddings without much or any loss in representational power.
  • Balance the width and depth of the network.

Let’s get started with real code.

1. Prepare data

This example downloads the dataset and uncompresses it automatically. Some external paths are omitted; please refer to the source code for the real paths.

private void PrepareData()
{
    Directory.CreateDirectory(dir);

    // get model file
    string url = "models/inception_v3_2016_08_28_frozen.pb.tar.gz";

    string zipFile = Path.Join(dir, $"{pbFile}.tar.gz");
    Utility.Web.Download(url, zipFile);

    Utility.Compress.ExtractTGZ(zipFile, dir);

    // download sample picture
    string pic = "grace_hopper.jpg";
    Utility.Web.Download($"data/{pic}", Path.Join(dir, pic));
}

2. Load image file and normalize

We need to load a sample image to test our pre-trained Inception model, convert it into a tensor and normalize it. The pre-trained model takes input in the form of a 4-dimensional tensor with shape [BATCH_SIZE, INPUT_HEIGHT, INPUT_WIDTH, 3] where:

  • BATCH_SIZE allows for inference of multiple images in one pass through the graph
  • INPUT_HEIGHT is the height of the images on which the model was trained
  • INPUT_WIDTH is the width of the images on which the model was trained
  • 3 is the (R, G, B) values of the pixel colors represented as a float.

private NDArray ReadTensorFromImageFile(string file_name,
                                int input_height = 299,
                                int input_width = 299,
                                int input_mean = 0,
                                int input_std = 255)
{
	return with<Graph, NDArray>(tf.Graph().as_default(), graph =>
    {
		var file_reader = tf.read_file(file_name, "file_reader");
        var image_reader = tf.image.decode_jpeg(file_reader, channels: 3, name: "jpeg_reader");
        var caster = tf.cast(image_reader, tf.float32);
        var dims_expander = tf.expand_dims(caster, 0);
        var resize = tf.constant(new int[] { input_height, input_width });
        var bilinear = tf.image.resize_bilinear(dims_expander, resize);
        var sub = tf.subtract(bilinear, new float[] { input_mean });
        var normalized = tf.divide(sub, new float[] { input_std });

		return with<Session, NDArray>(tf.Session(graph), sess => sess.run(normalized));
    });
}

3. Load pre-trained model and predict

Load the pre-trained Inception model, which is saved in Google's protobuf file format. Construct a new graph, then set the input and output operations in a new session. After running the session, you will get a numpy-like ndarray provided by NumSharp. With NumSharp, you can easily perform various operations on multi-dimensional arrays in the .NET environment.

public void Run()
{
	PrepareData();

	var labels = File.ReadAllLines(Path.Join(dir, labelFile));

    var nd = ReadTensorFromImageFile(Path.Join(dir, picFile),
        input_height: input_height,
        input_width: input_width,
        input_mean: input_mean,
        input_std: input_std);

    var graph = Graph.ImportFromPB(Path.Join(dir, pbFile));
    var input_operation = graph.get_operation_by_name(input_name);
    var output_operation = graph.get_operation_by_name(output_name);

    var results = with<Session, NDArray>(tf.Session(graph),
    	sess => sess.run(output_operation.outputs[0], 
        	new FeedItem(input_operation.outputs[0], nd)));

	results = np.squeeze(results);

    var argsort = results.argsort<float>();
    var top_k = argsort.Data<float>()
        .Skip(results.size - 5)
        .Reverse()
        .ToArray();

    foreach (float idx in top_k)
    	Console.WriteLine($"{picFile}: {idx} {labels[(int)idx]}, {results[(int)idx]}");
}

Chapter. Neural Network

In this chapter, we'll learn how to build a graph for a neural network model. The key advantage of a neural network compared to a linear classifier is that it can separate data which is not linearly separable. We'll implement this model to classify hand-written digit images from the MNIST dataset.

The structure of the neural network we're going to build is as follows. The MNIST data contains hand-written digit images in 10 classes (0 to 9). The network has 2 hidden layers: the first layer with 200 hidden units (neurons) and the second one (known as the classifier layer) with 10 neurons.

[Figure: neural network architecture (_images/nn.png)]

Get started with the implementation step by step:

  1. Prepare data

    MNIST is a dataset of handwritten digits which contains 55,000 examples for training, 5,000 examples for validation and 10,000 examples for testing. The digits have been size-normalized and centered in a fixed-size image (28 x 28 pixels) with values from 0 to 1. Each image has been flattened and converted to a 1-D array of 784 features. It's also a kind of benchmark dataset for deep learning.

    [Figure: MNIST dataset (_images/mnist.png)]

    We define some variables to make it easier to modify them later. It's important to note that in a linear model, we have to flatten the input images into a vector.

    using System;
    using NumSharp;
    using Tensorflow;
    using TensorFlowNET.Examples.Utility;
    using static Tensorflow.Python;
    
    const int img_h = 28;
    const int img_w = 28;
    int img_size_flat = img_h * img_w; // 784, the total number of pixels
    int n_classes = 10; // Number of classes, one class per digit
    

    We’ll write the function which automatically loads the MNIST data and returns it in our desired shape and format. There is an MNIST data helper to make life easier.

    Datasets mnist;
    public void PrepareData()
    {
        mnist = MnistDataSet.read_data_sets("mnist", one_hot: true);
    }
    

    Other than a function for loading the images and corresponding labels, we still need two more functions:

    randomize: which randomizes the order of images and their labels. At the beginning of each epoch, we will re-randomize the order of data samples to make sure that the trained model is not sensitive to the order of data.

    private (NDArray, NDArray) randomize(NDArray x, NDArray y)
    {
        // generate a random permutation of the sample indices
        var perm = np.random.permutation(y.shape[0]);
        np.random.shuffle(perm);

        // reorder the images and labels with the same permutation
        return (x[perm], y[perm]);
    }
    

    get_next_batch: which selects a number of images determined by the batch_size variable (as per the Stochastic Gradient Descent method).

    private (NDArray, NDArray) get_next_batch(NDArray x, NDArray y, int start, int end)
    {
        var x_batch = x[$"{start}:{end}"];
        var y_batch = y[$"{start}:{end}"];
        return (x_batch, y_batch);
    }
    
  2. Set Hyperparameters

    There are about 55,000 images in the training set, and it takes a long time to calculate the gradient of the model using all these images. Therefore we use a small batch of images in each iteration of the optimizer, as per Stochastic Gradient Descent.

    • epoch: one forward pass and one backward pass of all the training examples.
    • batch size: the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you’ll need.
    • iteration: one forward pass and one backward pass of one batch of images.

    int epochs = 10;
    int batch_size = 100;
    float learning_rate = 0.001f;
    int display_freq = 100; // frequency of displaying training results (value assumed; used in the training loop below)
    int h1 = 200; // number of nodes in the 1st hidden layer
    
  3. Building the neural network

    Let’s make some functions to help build computation graph.

    variables: We need to define two variables W and b to construct our linear model. We use Tensorflow Variables of proper size and initialization to define them.

    // weight_variable
    var in_dim = x.shape[1];
    
    var initer = tf.truncated_normal_initializer(stddev: 0.01f);
    var W = tf.get_variable("W_" + name,
                            dtype: tf.float32,
                            shape: (in_dim, num_units),
                            initializer: initer);
    
    // bias_variable
    var initial = tf.constant(0f, num_units);
    var b = tf.get_variable("b_" + name,
                            dtype: tf.float32,
                            initializer: initial);
    

    fully-connected layer: A neural network consists of stacks of fully-connected (dense) layers. Having the weight (W) and bias (b) variables, a fully-connected layer is defined as activation(W x X + b). The complete fc_layer function is below:

    private Tensor fc_layer(Tensor x, int num_units, string name, bool use_relu = true)
    {
        var in_dim = x.shape[1];
    
        var initer = tf.truncated_normal_initializer(stddev: 0.01f);
        var W = tf.get_variable("W_" + name,
                                dtype: tf.float32,
                                shape: (in_dim, num_units),
                                initializer: initer);
    
        var initial = tf.constant(0f, num_units);
        var b = tf.get_variable("b_" + name,
                                dtype: tf.float32,
                                initializer: initial);
    
        var layer = tf.matmul(x, W) + b;
        if (use_relu)
            layer = tf.nn.relu(layer);
    
        return layer;
    } 
    

    inputs: Now we need to define the proper tensors to feed the input to our model. A placeholder variable is the suitable choice for the input images and corresponding labels. This allows us to change the inputs (images and labels) fed to the TensorFlow graph.

    // Placeholders for inputs (x) and outputs(y)
    x = tf.placeholder(tf.float32, shape: (-1, img_size_flat), name: "X");
    y = tf.placeholder(tf.float32, shape: (-1, n_classes), name: "Y");
    

    Placeholder x is defined for the images; the shape is set to [None, img_size_flat] (None is written as -1 in TF.NET), where None means that the tensor may hold an arbitrary number of images, with each image being a vector of length img_size_flat.

    Placeholder y is the variable for the true labels associated with the images that were input in the placeholder variable x. It holds an arbitrary number of labels and each label is a vector of length num_classes which is 10.

    network layers: After creating the proper input, we have to pass it to our model. Since we have a neural network, we can stack multiple fully-connected layers using fc_layer method. Note that we will not use any activation function (use_relu = false) in the last layer. The reason is that we can use tf.nn.softmax_cross_entropy_with_logits to calculate the loss.

    // Create a fully-connected layer with h1 nodes as hidden layer
    var fc1 = fc_layer(x, h1, "FC1", use_relu: true);
    // Create a fully-connected layer with n_classes nodes as output layer
    var output_logits = fc_layer(fc1, n_classes, "OUT", use_relu: false);
    

    loss function: After creating the network, we have to calculate the loss and optimize it; we also have to calculate the correct_prediction and accuracy.

    // Define the loss function, optimizer, and accuracy
    var logits = tf.nn.softmax_cross_entropy_with_logits(labels: y, logits: output_logits);
    loss = tf.reduce_mean(logits, name: "loss");
    optimizer = tf.train.AdamOptimizer(learning_rate: learning_rate, name: "Adam-op").minimize(loss);
    var correct_prediction = tf.equal(tf.argmax(output_logits, 1), tf.argmax(y, 1), name: "correct_pred");
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name: "accuracy");
    

    initialize variables: We have to invoke a variable initializer operation to initialize all variables.

    var init = tf.global_variables_initializer();
    

    The complete computation graph looks like the one below:

    [Figure: TensorBoard-nn (_images/TensorBoard-nn.png)]

  4. Train

    After creating the graph, we can train our model. To train the model, we have to create a session and run the graph in the session.

    // Number of training iterations in each epoch
    var num_tr_iter = mnist.train.labels.len / batch_size;
    with(tf.Session(), sess =>
    {
        sess.run(init);
    
        float loss_val = 100.0f;
        float accuracy_val = 0f;
    
        foreach (var epoch in range(epochs))
        {
            print($"Training epoch: {epoch + 1}");
            // Randomly shuffle the training data at the beginning of each epoch 
            var (x_train, y_train) = randomize(mnist.train.images, mnist.train.labels);
    
            foreach (var iteration in range(num_tr_iter))
            {
                var start = iteration * batch_size;
                var end = (iteration + 1) * batch_size;
                var (x_batch, y_batch) = get_next_batch(x_train, y_train, start, end);
    
                // Run optimization op (backprop)
                sess.run(optimizer, new FeedItem(x, x_batch), new FeedItem(y, y_batch));
    
                if (iteration % display_freq == 0)
                {
                    // Calculate and display the batch loss and accuracy
                    var result = sess.run(new[] { loss, accuracy }, new FeedItem(x, x_batch), new FeedItem(y, y_batch));
                    loss_val = result[0];
                    accuracy_val = result[1];
                    print($"iter {iteration.ToString("000")}: Loss={loss_val.ToString("0.0000")}, Training Accuracy={accuracy_val.ToString("P")}");
                }
            }
    
            // Run validation after every epoch
            var results1 = sess.run(new[] { loss, accuracy }, new FeedItem(x, mnist.validation.images), new FeedItem(y, mnist.validation.labels));
            loss_val = results1[0];
            accuracy_val = results1[1];
            print("---------------------------------------------------------");
            print($"Epoch: {epoch + 1}, validation loss: {loss_val.ToString("0.0000")}, validation accuracy: {accuracy_val.ToString("P")}");
            print("---------------------------------------------------------");
        }
    });
    
  5. Test

    After the training is done, we have to test our model to see how good it performs on a new dataset.

    var result = sess.run(new[] { loss, accuracy }, new FeedItem(x, mnist.test.images), new FeedItem(y, mnist.test.labels));
    loss_test = result[0];
    accuracy_test = result[1];
    print("---------------------------------------------------------");
    print($"Test loss: {loss_test.ToString("0.0000")}, test accuracy: {accuracy_test.ToString("P")}");
    print("---------------------------------------------------------");
    

    [Figure: result (_images/nn-result.png)]

Chapter. Convolutional Neural Network

In this chapter, we'll implement a simple Convolutional Neural Network model and use it to classify the MNIST dataset.

The structure of the neural network we're going to build is as follows. The MNIST data contains hand-written digit images in 10 classes (0 to 9). The network has 2 convolutional layers followed by 2 fully-connected layers at the end.

[Figure: neural network architecture (_images/cnn.png)]

Get started with the implementation:

  1. Prepare data

    MNIST is a dataset of handwritten digits which contains 55,000 examples for training, 5,000 examples for validation and 10,000 examples for testing. The digits have been size-normalized and centered in a fixed-size image (28 x 28 pixels) with values from 0 to 1. Each image has been flattened and converted to a 1-D array of 784 features. It's also a kind of benchmark dataset for deep learning.

    [Figure: MNIST dataset (_images/mnist.png)]

    We define some variables to make it easier to modify them later.

    using System;
    using NumSharp;
    using Tensorflow;
    using TensorFlowNET.Examples.Utility;
    using static Tensorflow.Python;
    
    const int img_h = 28;
    const int img_w = 28;
    int n_classes = 10; // Number of classes, one class per digit
    int n_channels = 1;
    

    We’ll write the function which automatically loads the MNIST data and returns it in our desired shape and format. There is an MNIST data helper to make life easier.

    Datasets mnist;
    public void PrepareData()
    {
        mnist = MnistDataSet.read_data_sets("mnist", one_hot: true);
    }
    

    Other than a function for loading the images and corresponding labels, we still need three more functions:

    reformat: reformats the data into the format accepted by the convolutional layers.

    private (NDArray, NDArray) Reformat(NDArray x, NDArray y)
    {
        // infer the image side length, channel count and number of classes from the flat data
        var (img_size, num_ch, num_class) = (np.sqrt(x.shape[1]), 1, len(np.unique<int>(np.argmax(y, 1))));
        // reshape each flat 784-vector into a 28 x 28 x 1 image
        var dataset = x.reshape(x.shape[0], img_size, img_size, num_ch).astype(np.float32);
        return (dataset, y);
    }
    

    randomize: which randomizes the order of images and their labels. At the beginning of each epoch, we will re-randomize the order of data samples to make sure that the trained model is not sensitive to the order of data.

    private (NDArray, NDArray) randomize(NDArray x, NDArray y)
    {
        // generate a random permutation of the sample indices
        var perm = np.random.permutation(y.shape[0]);
        np.random.shuffle(perm);

        // reorder the images and labels with the same permutation
        return (x[perm], y[perm]);
    }
    

    get_next_batch: which selects a number of images determined by the batch_size variable (as per the Stochastic Gradient Descent method).

    private (NDArray, NDArray) get_next_batch(NDArray x, NDArray y, int start, int end)
    {
        var x_batch = x[$"{start}:{end}"];
        var y_batch = y[$"{start}:{end}"];
        return (x_batch, y_batch);
    }
    
  2. Set Hyperparameters

    There are about 55,000 images in the training set, and it takes a long time to calculate the gradient of the model using all these images. Therefore we use a small batch of images in each iteration of the optimizer, as per Stochastic Gradient Descent.

    • epoch: one forward pass and one backward pass of all the training examples.
    • batch size: the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you’ll need.
    • iteration: one forward pass and one backward pass of one batch of images.

    int epochs = 10;
    int batch_size = 100;
    float learning_rate = 0.001f;
    int display_freq = 200; // Frequency of displaying the training results
    
  3. Network configuration

    1st convolutional layer:

    int filter_size1 = 5;  // Convolution filters are 5 x 5 pixels.
    int num_filters1 = 16; //  There are 16 of these filters.
    int stride1 = 1;  // The stride of the sliding window
    

    2nd convolutional layer:

    int filter_size2 = 5; // Convolution filters are 5 x 5 pixels.
    int num_filters2 = 32;// There are 32 of these filters.
    int stride2 = 1;  // The stride of the sliding window
    

    Fully-connected layer:

    int h1 = 128; // Number of neurons in the fully-connected layer.
    
  4. Building the neural network

    Let’s make some functions to help build computation graph.

    variables: We need to define two variables W and b to construct our linear model. We use Tensorflow Variables of proper size and initialization to define them.

    // Create a weight variable with appropriate initialization
    private RefVariable weight_variable(string name, int[] shape)
    {
        var initer = tf.truncated_normal_initializer(stddev: 0.01f);
        return tf.get_variable(name,
                               dtype: tf.float32,
                               shape: shape,
                               initializer: initer);
    }
    
    // Create a bias variable with appropriate initialization
    private RefVariable bias_variable(string name, int[] shape)
    {
        var initial = tf.constant(0f, shape: shape, dtype: tf.float32);
        return tf.get_variable(name,
                               dtype: tf.float32,
                               initializer: initial);
    }
    

    2D convolution layer: This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs.

    private Tensor conv_layer(Tensor x, int filter_size, int num_filters, int stride, string name)
    {
        return with(tf.variable_scope(name), delegate {
    
            var num_in_channel = x.shape[x.NDims - 1];
            var shape = new[] { filter_size, filter_size, num_in_channel, num_filters };
            var W = weight_variable("W", shape);
            // tf.summary.histogram("weight", W);
            var b = bias_variable("b", new[] { num_filters });
            // tf.summary.histogram("bias", b);
            var layer = tf.nn.conv2d(x, W,
                                     strides: new[] { 1, stride, stride, 1 },
                                     padding: "SAME");
            layer += b;
            return tf.nn.relu(layer);
        });
    }
    

    max-pooling layer: Max pooling operation for spatial data.

    private Tensor max_pool(Tensor x, int ksize, int stride, string name)
    {
        return tf.nn.max_pool(x,
                              ksize: new[] { 1, ksize, ksize, 1 },
                              strides: new[] { 1, stride, stride, 1 },
                              padding: "SAME",
                              name: name);
    }
    

    flatten_layer: Flattens the output of the convolutional layer to be fed into fully-connected layer.

    private Tensor flatten_layer(Tensor layer)
    {
        return with(tf.variable_scope("Flatten_layer"), delegate
                    {
                        var layer_shape = layer.TensorShape;
                        var num_features = layer_shape[new Slice(1, 4)].Size;
                        var layer_flat = tf.reshape(layer, new[] { -1, num_features });
    
                        return layer_flat;
                    });
    }
    

    fully-connected layer: A neural network consists of stacks of fully-connected (dense) layers. Having the weight (W) and bias (b) variables, a fully-connected layer is defined as activation(W x X + b). The complete fc_layer function is below:

    private Tensor fc_layer(Tensor x, int num_units, string name, bool use_relu = true)
    {
        return with(tf.variable_scope(name), delegate
                    {
                        var in_dim = x.shape[1];
    
                        var W = weight_variable("W_" + name, shape: new[] { in_dim, num_units });
                        var b = bias_variable("b_" + name, new[] { num_units });
    
                        var layer = tf.matmul(x, W) + b;
                        if (use_relu)
                            layer = tf.nn.relu(layer);
    
                        return layer;
                    });
    } 
    

    inputs: Now we need to define the proper tensors to feed the input to our model. A placeholder variable is the suitable choice for the input images and corresponding labels. This allows us to change the inputs (images and labels) fed to the TensorFlow graph.

    with(tf.name_scope("Input"), delegate
         {
             // Placeholders for inputs (x) and outputs(y)
             x = tf.placeholder(tf.float32, shape: (-1, img_h, img_w, n_channels), name: "X");
             y = tf.placeholder(tf.float32, shape: (-1, n_classes), name: "Y");
         });
    

    Placeholder y is the variable for the true labels associated with the images that were input in the placeholder variable x. It holds an arbitrary number of labels and each label is a vector of length num_classes which is 10.

    network layers: After creating the proper input, we have to pass it to our model. We stack the convolutional, max-pooling and fully-connected layers defined above. Note that we will not use any activation function (use_relu = false) in the last layer. The reason is that we can use tf.nn.softmax_cross_entropy_with_logits to calculate the loss.

    var conv1 = conv_layer(x, filter_size1, num_filters1, stride1, name: "conv1");
    var pool1 = max_pool(conv1, ksize: 2, stride: 2, name: "pool1");
    var conv2 = conv_layer(pool1, filter_size2, num_filters2, stride2, name: "conv2");
    var pool2 = max_pool(conv2, ksize: 2, stride: 2, name: "pool2");
    var layer_flat = flatten_layer(pool2);
    var fc1 = fc_layer(layer_flat, h1, "FC1", use_relu: true);
    var output_logits = fc_layer(fc1, n_classes, "OUT", use_relu: false);
    

    loss function, optimizer, accuracy, prediction: After creating the network, we have to calculate the loss and optimize it; we also have to calculate the prediction and accuracy.

    with(tf.variable_scope("Train"), delegate
         {
    
    
             with(tf.variable_scope("Optimizer"), delegate
                  {
                      optimizer = tf.train.AdamOptimizer(learning_rate: learning_rate, name: "Adam-op").minimize(loss);
                  });
    
             with(tf.variable_scope("Accuracy"), delegate
                  {
                      var correct_prediction = tf.equal(tf.argmax(output_logits, 1), tf.argmax(y, 1), name: "correct_pred");
                      accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name: "accuracy");
                  });
    
             with(tf.variable_scope("Prediction"), delegate
                  {
                      cls_prediction = tf.argmax(output_logits, axis: 1, name: "predictions");
                  });
         });
    

    initialize variables: We have to invoke a variable initializer operation to initialize all variables.

    var init = tf.global_variables_initializer();

  5. Train

    After creating the graph, we can train our model. To train the model, we have to create a session and run the graph in the session.

    // Number of training iterations in each epoch
    var num_tr_iter = y_train.len / batch_size;
    
    var init = tf.global_variables_initializer();
    sess.run(init);
    
    float loss_val = 100.0f;
    float accuracy_val = 0f;
    
    foreach (var epoch in range(epochs))
    {
        print($"Training epoch: {epoch + 1}");
        // Randomly shuffle the training data at the beginning of each epoch 
        (x_train, y_train) = mnist.Randomize(x_train, y_train);
    
        foreach (var iteration in range(num_tr_iter))
        {
            var start = iteration * batch_size;
            var end = (iteration + 1) * batch_size;
            var (x_batch, y_batch) = mnist.GetNextBatch(x_train, y_train, start, end);
    
            // Run optimization op (backprop)
            sess.run(optimizer, new FeedItem(x, x_batch), new FeedItem(y, y_batch));
    
            if (iteration % display_freq == 0)
            {
                // Calculate and display the batch loss and accuracy
                var result = sess.run(new[] { loss, accuracy }, new FeedItem(x, x_batch), new FeedItem(y, y_batch));
                loss_val = result[0];
                accuracy_val = result[1];
                print($"iter {iteration.ToString("000")}: Loss={loss_val.ToString("0.0000")}, Training Accuracy={accuracy_val.ToString("P")}");
            }
        }
    
        // Run validation after every epoch
        var results1 = sess.run(new[] { loss, accuracy }, new FeedItem(x, x_valid), new FeedItem(y, y_valid));
        loss_val = results1[0];
        accuracy_val = results1[1];
        print("---------------------------------------------------------");
        print($"Epoch: {epoch + 1}, validation loss: {loss_val.ToString("0.0000")}, validation accuracy: {accuracy_val.ToString("P")}");
        print("---------------------------------------------------------");
    }
    
  6. Test

    After the training is done, we have to test our model to see how good it performs on a new dataset.

    public void Test(Session sess)
    {
        var result = sess.run(new[] { loss, accuracy }, new FeedItem(x, x_test), new FeedItem(y, y_test));
        loss_test = result[0];
        accuracy_test = result[1];
        print("---------------------------------------------------------");
        print($"Test loss: {loss_test.ToString("0.0000")}, test accuracy: {accuracy_test.ToString("P")}");
     print("---------------------------------------------------------");
    }
    

[Figure: _images/cnn-result.png]