Machine Learning with PyTorch

Frequently Anticipated Questions

On this page, you will find a list of questions that we either anticipate people will ask or that we have been asked previously. They are intended to be the first stop for any confusion or trouble that might occur.

What do I need to know?

You should come to the training session knowing the Python programming language. No knowledge of Machine Learning or Deep Learning is assumed.

In addition, you might take more away from the course if you have an idea of the kinds of data for which you would most likely be using Machine Learning.

Do I need to have a laptop?

Yes! You should bring a laptop that can run a modern web browser. Most of your work will happen inside the browser. During the training, you will connect to O’Reilly’s JupyterHub where a Jupyter instance will be created for you! You can run the code and follow along with the training using just this!

So, you could also follow along with this tutorial on an iPad or any such tablet.

Can I run this tutorial code locally?

Yes! If you want to bring your laptop to run the materials, feel free to do so! We will provide a Dockerfile that you can build and run, so you don’t have to worry about dependencies. Please be warned that building the Docker image in class takes time, and you might have trouble with the conference WiFi. We will provide less support if you don’t want to use the Dockerfile, but we understand if you prefer to get the environment working natively. We ask that you attempt the setup before the training sessions begin so that you are not held up by downloading data or by the setup itself.

In the worst case, you can always fall back to using the JupyterHub instance provided for you!

If I use my laptop, does it need to have a GPU?

While having an NVIDIA GPU-enabled laptop will make the training run faster (if running locally), it is not strictly necessary.

If you plan on working on Deep Learning in the future, a GPU-enabled laptop might be a good investment.

Tensor-Fu

Below you will find exercises to practice your tensor-fu!

Tensor-Fu-1

Exercise 1

import torch
from torch import nn

# x is a 2-dimensional tensor of shape (9, 10), filled with samples
# from a standard normal distribution
x = torch.randn(9, 10)

Exercise 2

import torch
from torch import nn

x2dim = torch.randn(9, 10)

# required and default parameters:
# fc = nn.Linear(in_features, out_features)

Task: Create a linear layer which works with x2dim
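
One possible solution (a sketch; the choice of 5 output features is arbitrary). nn.Linear expects in_features to match the last dimension of its input, which for x2dim is 10:

fc = nn.Linear(in_features=10, out_features=5)
y = fc(x2dim)
print(y.shape)  # torch.Size([9, 5])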

Exercise 3

import torch
from torch import nn

x4dim = torch.randn(9, 10, 11, 12)

# required and default parameters:
# conv1 = nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)

Task: Create a convolution which works on x4dim
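
One possible solution (a sketch; out_channels=16 and kernel_size=3 are arbitrary choices). nn.Conv2d reads a 4-dimensional input as (batch, channels, height, width), so in_channels must match the second dimension of x4dim, which is 10:

conv1 = nn.Conv2d(in_channels=10, out_channels=16, kernel_size=3)
y = conv1(x4dim)
print(y.shape)  # torch.Size([9, 16, 9, 10]); each spatial dimension shrinks by kernel_size - 1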

Exercise 4

import torch
from torch import nn

x3dim = torch.randn(9, 10, 11)

# required and default parameters:
# rnn = nn.RNN(input_size, hidden_size, batch_first=True)

Task: Create an RNN which works on x3dim.

Special note: The RNN will output 2 values. The first is the output at each timestep. The second is the final hidden state for each batch item. There is something odd (tricky) about the final hidden state. What is it?

Also, what happens if batch_first is False? Important for future headaches: batch_first is False by default.
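
One possible solution (a sketch; hidden_size=32 is an arbitrary choice). With batch_first=True, the 3-dimensional input is read as (batch, seq_len, input_size), so input_size must match the last dimension of x3dim:

rnn = nn.RNN(input_size=11, hidden_size=32, batch_first=True)
output, h_n = rnn(x3dim)
print(output.shape)  # torch.Size([9, 10, 32]): one output per timestep
print(h_n.shape)     # torch.Size([1, 9, 32]): (num_layers, batch, hidden_size)

The tricky part: the final hidden state is always returned as (num_layers, batch, hidden_size), even when batch_first=True. And when batch_first is False, the input itself must be shaped (seq_len, batch, input_size) instead.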

Exercise 5

import torch
from torch import nn

x4dim = torch.randn(9, 10, 11, 12)

# required and default parameters:
# conv1 = nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)

Task: Create a convolution that has the same in_channels as out_channels that will work with x4dim. How many times can you apply it before it’s as small as it can get? What happens at this point? Can you think of a way to solve it?
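
A sketch of one way to explore this (the kernel size is an arbitrary choice); each 3x3 convolution with no padding shrinks both spatial dimensions by 2:

conv = nn.Conv2d(in_channels=10, out_channels=10, kernel_size=3)
y = x4dim
while min(y.shape[2:]) >= 3:  # stop before the kernel no longer fits
    y = conv(y)
    print(y.shape)

Starting from (11, 12), the spatial size goes 9x10, 7x8, 5x6, 3x4, 1x2: five applications, after which another 3x3 kernel no longer fits and the convolution raises an error. One way around this is padding=1, which makes a 3x3 convolution preserve the spatial size.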

Exercise 6

import torch
from torch import nn

x4dim = torch.randn(9, 10, 11, 12)

class CustomConvolutions(nn.Module):
    def __init__(self):
        super(CustomConvolutions, self).__init__()

    def forward(self, x4dim):
        pass

Task: Once you have the series of steps you want to encapsulate, write a class that subclasses from nn.Module and does those computations in the forward pass.
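
A sketch of one possible encapsulation (the specific layers are arbitrary choices); with kernel_size=3 and padding=1 each convolution preserves the spatial size of x4dim:

class CustomConvolutions(nn.Module):
    def __init__(self):
        super(CustomConvolutions, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=10, out_channels=10, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=10, out_channels=10, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x4dim):
        # two same-size convolutions with a nonlinearity in between
        x = self.relu(self.conv1(x4dim))
        x = self.relu(self.conv2(x))
        return x

# usage sketch:
# CustomConvolutions()(x4dim).shape  ->  torch.Size([9, 10, 11, 12])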

Tensor-Fu-2

Exercise 1

import numpy as np
import torch
from torch import nn

indices = torch.arange(10).long()
indices = torch.from_numpy(np.random.randint(0, 10, size=(10,)))

emb = nn.Embedding(num_embeddings=100, embedding_dim=16)
emb(indices)

Task: Get the above code to work. Use the second indices method and change the size to a matrix (such as (10,11)).
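
A sketch of the matrix variant (continuing from the block above; the size (10, 11) is the one suggested in the task). The .long() call is there because nn.Embedding requires LongTensor indices:

indices = torch.from_numpy(np.random.randint(0, 10, size=(10, 11))).long()
embedded = emb(indices)
print(embedded.shape)  # torch.Size([10, 11, 16]): one 16-dimensional vector per index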

Exercise 2

Task: Create a MultiEmbedding class which takes two sets of indices, embeds them, and concatenates the results!

class MultiEmbedding(nn.Module):
    def __init__(self, num_embeddings1, num_embeddings2, embedding_dim1, embedding_dim2):
        pass

    def forward(self, indices1, indices2):
        # use something like
        # z = torch.cat([x, y], dim=1)

        pass
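
One possible implementation sketch; for 1-dimensional index vectors, each embedding produces a (batch, embedding_dim) matrix, so concatenating along dim=1 stacks the two embedding dimensions side by side:

class MultiEmbedding(nn.Module):
    def __init__(self, num_embeddings1, num_embeddings2, embedding_dim1, embedding_dim2):
        super(MultiEmbedding, self).__init__()
        self.emb1 = nn.Embedding(num_embeddings1, embedding_dim1)
        self.emb2 = nn.Embedding(num_embeddings2, embedding_dim2)

    def forward(self, indices1, indices2):
        x = self.emb1(indices1)
        y = self.emb2(indices2)
        # concatenate along the embedding dimension
        z = torch.cat([x, y], dim=1)
        return z

# usage sketch:
# me = MultiEmbedding(100, 50, 16, 8)
# me(torch.arange(10).long(), torch.arange(10).long()).shape  ->  torch.Size([10, 24])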

Choose Your Own Adventures

Sampling from Surnames

In this exercise, you will be writing the sampler for the surname model.

What is required for this model is the following:

  • An initial input
  • An initial hidden
  • A loop which uses these two things to get the next hidden

For creating the initial hidden and input vectors, we make use of the Vocabularies.

begin_seq_index = vectorizer.surname_vocab.begin_seq_index
index_0 = Variable(torch.LongTensor([begin_seq_index]))
x_0 = model.char_emb(index_0)

nationality_index = vectorizer.nationality_vocab.lookup_index('Irish')
hidden_0_index = Variable(torch.LongTensor([nationality_index]))
hidden_0 = model.nat_emb(hidden_0_index)

Now, the goal is to use these input and hidden vectors to compute the next hidden vector.

import torch.nn.functional as F

rnn_cell = model.rnn.rnn_cell
hidden_1 = rnn_cell(x_0, hidden_0)
fc = model.fc
relu = model.relu

y_0 = fc(relu(hidden_1))
y_0 = F.softmax(y_0, dim=1)

# sample
index_1 = torch.multinomial(y_0, 1)[:, 0]
# or argmax
# index_1 = torch.max(y_0, dim=1)[1]

What’s left: write a loop that performs these computations, aggregates the indices, and outputs them. Additionally, you will have to use the vocab to look up each index. (A full solution is given in the Surname Sampler Solution section below.)

Convert Binary RNN to Binary Predictions

Consider the following modification to the OHLCDataset class:

class OHLCDataset(Dataset):
    def __init__(self, data_matrix, history_size):
        self.data_matrix = data_matrix
        self.history_size = history_size

    def __getitem__(self, index):
        # data_matrix.shape = (time, 4)
        x_history = self.data_matrix[index:index+self.history_size]
        x_next = self.data_matrix[index+self.history_size]

        x_mean = np.mean(x_history)
        x_var = np.var(x_history)
        x_std = np.std(x_history)

        if x_next > x_mean:
            x_next_larger = 1
        else:
            x_next_larger = 0

        # alternatively:
        # if x_next > x_mean + x_std:

        return {'x_history': (x_history - x_mean) / x_var,
                'x_next': (x_next - x_mean) / x_var,
                'x_next_larger': x_next_larger,
                'x_history_mean': x_mean,
                'x_history_var': x_var}

    def __len__(self):
        return len(self.data_matrix) - self.history_size

    def get_num_batches(self, batch_size):
        return len(self) // batch_size

What would have to change for the model? Hint: the output would no longer be 4 values, but 1 value.
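
To make the idea concrete, here is a hedged sketch of a binary version of the model. The real model from the notebook is not shown on this page, so the architecture below (a single-layer RNN followed by a linear layer) is an assumption; the essential change is the final layer producing 1 value and the loss becoming a binary one:

import torch
from torch import nn

class BinaryOHLCModel(nn.Module):
    def __init__(self, input_size=4, hidden_size=32):
        super(BinaryOHLCModel, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)  # 1 value instead of 4

    def forward(self, x_history):
        # x_history: (batch, history_size, 4)
        _, h_n = self.rnn(x_history)
        return self.fc(h_n.squeeze(0))  # (batch, 1)

# hypothetical training step, using the 'x_next_larger' target from the dataset:
# loss = nn.BCEWithLogitsLoss()(model(x_history).squeeze(-1), x_next_larger.float())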

Regularize the MultiLayer Perceptron Model

For this exercise, proceed in the following steps:

  1. Make a copy of the Unit5-MLP notebook
  2. Consider adding an L2 penalty to the loss. What is the change in performance?
    • In PyTorch, the L2 penalty is simple to add. It is called “weight_decay”. If you check the documentation for each optimizer (for example, for Adam), you will see that you can set it when initializing the optimizer, as sketched below.
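
For example (a sketch; the learning rate and penalty strength are arbitrary, and model is assumed to be the MLP from the notebook):

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)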

Surname Sampler Solution

import numpy as np
import torch
import torch.nn.functional as F
from torch.autograd import Variable

class SurnameSampler(object):
    def __init__(self, model, vectorizer):
        self.model = model
        self.vectorizer = vectorizer

    def make_initial_x(self, batch_size):
        begin_seq_index = self.vectorizer.surname_vocab.begin_seq_index
        initial_x = Variable(torch.ones(batch_size) * begin_seq_index).long()
        return initial_x

    def make_initial_hidden(self, batch_size):
        nat_vocab = self.vectorizer.nationality_vocab
        chosen_indices = np.random.choice(np.arange(len(nat_vocab)),
                                          size=batch_size,
                                          replace=True)

        nationality_strings = [nat_vocab.lookup_index(index) for index in chosen_indices]

        nationality_index_variable = Variable(torch.LongTensor(chosen_indices))
        initial_hidden = self.model.nat_emb(nationality_index_variable)

        return initial_hidden, nationality_strings


    def sample(self, batch_size, max_sample_size=20, temperature=1.0):
        seq_indices = [self.make_initial_x(batch_size)]
        # todo fix random nationality selection to allow for choosing
        initial_hidden, nationality_strings = self.make_initial_hidden(batch_size)
        hiddens = [initial_hidden]

        x_t = seq_indices[0]
        hid_t = initial_hidden

        char_emb = self.model.char_emb
        rnn_cell = self.model.rnn.rnn_cell
        fc = self.model.fc
        relu = self.model.relu

        for t in range(max_sample_size):
            x_emb_t = char_emb(x_t)
            hid_t = rnn_cell(x_emb_t, hid_t)
            y_t = fc(relu(hid_t))
            y_t = F.softmax(y_t * temperature, dim=1)
            x_t = torch.multinomial(y_t, 1)[:, 0]

            hiddens.append(hid_t)
            seq_indices.append(x_t)

        seq_indices = torch.stack(seq_indices).squeeze().permute(1, 0)

        return seq_indices, nationality_strings

    def decode_one(self, indices_vector):
        surname_vocab = self.vectorizer.surname_vocab

        out = []
        for i in indices_vector:
            if surname_vocab.begin_seq_index == i:
                continue
            if surname_vocab.end_seq_index == i:
                return ''.join(out)
            out.append(surname_vocab.lookup_index(i))
        return ''.join(out)

    def decode_many(self, indices_matrix):
        if isinstance(indices_matrix, Variable):
            indices_matrix = indices_matrix.cpu().data.numpy()
        return [self.decode_one(indices_matrix[i]) for i in range(len(indices_matrix))]


sampler = SurnameSampler(model.cpu(), vectorizer)
samples, nationality_strings = sampler.sample(20)
list(zip(nationality_strings, sampler.decode_many(samples)))

Welcome! This is a directory of resources for a training tutorial to be given at the O’Reilly Strata Conference in San Jose on March 5th and 6th, 2018.

General Information

Prerequisites:

  • A working knowledge of Python.
  • Familiarity with basic linear algebra (multiplying matrices, dot products of vectors, etc.) and derivatives of simple functions.

Agenda

Machine Learning with PyTorch is split into two days. On the first day, we will first provide an introduction to the supervised training paradigm and the basics of the PyTorch library. Then, we will cover fundamental Deep Learning models and how they relate to the kinds of tasks for which Machine Learning is typically deployed.

On the second day, we will discuss how projects should be structured to avoid common pitfalls and traps. Then, we will talk about how models can typically be improved using regularization and other techniques. Next, we will walk you through two more advanced models: generating surnames using a Recurrent Neural Network and generating captions for images. Then, we provide a substantial amount of time for working through in-class exercises at your own pace so that you can practice what we’ve taught. Finally, we end the day with a brief overview of other topics in Machine Learning and provide pointers for further learning.