Machine Learning with PyTorch¶
Frequently Anticipated Questions¶
On this page, you will find a list of questions that we either anticipate people will ask or that we have been asked previously. They are intended to be the first stop for any confusion or trouble that might occur.
What do I need to know?¶
You should come to the training session knowing the Python programming language. No knowledge of Machine Learning or Deep Learning is assumed.
In addition, you might take more away from the course if you have an idea of the kinds of data for which you would most likely be using Machine Learning.
Do I need to have a laptop?¶
Yes! You should bring a laptop that can run a modern web browser. Most of your work will happen inside the browser. During the training, you will connect to O’Reilly’s JupyterHub where a Jupyter instance will be created for you! You can run the code and follow along with the training using just this!
So, you could also follow along with this tutorial on an iPad or any such tablet.
Can I run this tutorial code locally?¶
Yes! If you want to bring your laptop to run the materials, feel free to do so! We will provide a Dockerfile that you can build and run, so you don’t have to worry about dependencies. Please be warned that building the Docker image in class takes time, and you might have trouble with the conference WiFi, etc. We will provide less support if you don’t want to use the Dockerfile, but we understand if you want to get the environment working natively. We ask that you attempt the setup before the training sessions begin so that you are not held up by downloading data and configuring your environment during class.
In the worst case, you can always fall back to using the JupyterHub instance provided for you!
If I use my laptop, does it need to have a GPU?¶
While an NVIDIA GPU-enabled laptop will make the training code run faster (if running locally), it is not strictly necessary.
If you plan on working on Deep Learning in the future, a GPU-enabled laptop might be a good investment.
Tensor-Fu¶
Below you will find exercises to practice your tensor-fu!
Tensor-Fu-1¶
Exercise 1¶
import torch
from torch import nn
x = torch.randn(9, 10)
Exercise 2¶
import torch
from torch import nn
x2dim = torch.randn(9, 10)
# required and default parameters:
# fc = nn.Linear(in_features, out_features)
Task: Create a linear layer which works with x2dim
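One possible solution (a sketch; out_features=5 is an arbitrary choice, any positive integer works). The in_features must match the last dimension of x2dim, which is 10:

fc = nn.Linear(in_features=10, out_features=5)
y = fc(x2dim)
print(y.shape)  # torch.Size([9, 5])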
Exercise 3¶
import torch
from torch import nn
x4dim = torch.randn(9, 10, 11, 12)
# required and default parameters:
# conv1 = nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)
Task: Create a convolution which works on x4dim
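One way to do it (a sketch; out_channels=16 and kernel_size=3 are arbitrary choices). nn.Conv2d expects input shaped (batch, channels, height, width), so in_channels must match the second dimension of x4dim, which is 10:

conv1 = nn.Conv2d(in_channels=10, out_channels=16, kernel_size=3)
y = conv1(x4dim)
print(y.shape)  # torch.Size([9, 16, 9, 10])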
Exercise 4¶
import torch
from torch import nn
x3dim = torch.randn(9, 10, 11)
# required and default parameters:
# rnn = nn.RNN(input_size, hidden_size, batch_first=True)
Task: Create an RNN which works on x3dim.
Special note: The RNN will output 2 values. The first is the output at each timestep. The second is the final hidden state for each batch item. There is something odd (tricky) about the final hidden state. What is it?
Also, what happens if batch_first is False? Important for future headaches: batch_first is by default False.
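A minimal sketch, with hidden_size=16 chosen arbitrarily. With batch_first=True, x3dim of shape (9, 10, 11) is read as (batch, seq, features), so input_size is 11. Note the shape of the final hidden state:

rnn = nn.RNN(input_size=11, hidden_size=16, batch_first=True)
output, hidden = rnn(x3dim)
print(output.shape)  # torch.Size([9, 10, 16]): one output per timestep
print(hidden.shape)  # torch.Size([1, 9, 16]): (num_layers, batch, hidden), NOT batch first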
Exercise 5¶
import torch
from torch import nn
x4dim = torch.randn(9, 10, 11, 12)
# required and default parameters:
# conv1 = nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)
Task: Create a convolution that has the same in_channels as out_channels that will work with x4dim. How many times can you apply it before it’s as small as it can get? What happens at this point? Can you think of a way to solve it?
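A sketch of one way to experiment (kernel_size=3 is an arbitrary choice). With no padding, each application shrinks the height and width by 2, so the smaller spatial dimension (11) limits how many times the convolution can be applied before a RuntimeError; padding=1 keeps the spatial size constant and sidesteps the problem:

conv = nn.Conv2d(in_channels=10, out_channels=10, kernel_size=3)
y = x4dim
for i in range(5):
    y = conv(y)
    print(y.shape)  # spatial dims shrink from (11, 12) down to (1, 2)
# a sixth application fails: the input is now smaller than the kernel
conv_padded = nn.Conv2d(in_channels=10, out_channels=10, kernel_size=3, padding=1)
print(conv_padded(x4dim).shape)  # torch.Size([9, 10, 11, 12]): size preserved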
Exercise 6¶
import torch
from torch import nn
x4dim = torch.randn(9, 10, 11, 12)
class CustomConvolutions(nn.Module):
    def __init__(self):
        super(CustomConvolutions, self).__init__()

    def forward(self, x4dim):
        pass
Task: Once you have the series of steps you want to encapsulate, write a class that subclasses from nn.Module and does those computations in the forward pass.
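One possible encapsulation, assuming you settled on two size-preserving convolutions with a ReLU in between (the specific layers are yours to choose):

import torch.nn.functional as F

class CustomConvolutions(nn.Module):
    def __init__(self):
        super(CustomConvolutions, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=10, out_channels=10, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=10, out_channels=10, kernel_size=3, padding=1)

    def forward(self, x4dim):
        # the same sequence of steps as above, wrapped in a module
        return self.conv2(F.relu(self.conv1(x4dim)))

net = CustomConvolutions()
print(net(x4dim).shape)  # torch.Size([9, 10, 11, 12])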
Tensor-Fu-2¶
Exercise 1¶
import numpy as np
import torch
from torch import nn

indices = torch.arange(10).long()
indices = torch.from_numpy(np.random.randint(0, 10, size=(10,)))
emb = nn.Embedding(num_embeddings=100, embedding_dim=16)
emb(indices)
Task: Get the above code to work. Use the second indices method and change the size to a matrix (such as (10,11)).
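A working version using the second indices method, with the size changed to a (10, 11) matrix as suggested; the output gains a trailing embedding_dim dimension:

indices = torch.from_numpy(np.random.randint(0, 10, size=(10, 11)))
emb = nn.Embedding(num_embeddings=100, embedding_dim=16)
embedded = emb(indices)
print(embedded.shape)  # torch.Size([10, 11, 16])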
Exercise 2¶
Task: Create a MultiEmbedding class which can input two sets of indices, embed them, and concat the results!
class MultiEmbedding(nn.Module):
    def __init__(self, num_embeddings1, num_embeddings2, embedding_dim1, embedding_dim2):
        pass

    def forward(self, indices1, indices2):
        # use something like
        # z = torch.cat([x, y], dim=1)
        pass
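A minimal sketch of one way to fill in the class; the embedding sizes in the usage example are arbitrary:

class MultiEmbedding(nn.Module):
    def __init__(self, num_embeddings1, num_embeddings2, embedding_dim1, embedding_dim2):
        super(MultiEmbedding, self).__init__()
        self.emb1 = nn.Embedding(num_embeddings1, embedding_dim1)
        self.emb2 = nn.Embedding(num_embeddings2, embedding_dim2)

    def forward(self, indices1, indices2):
        # embed each set of indices, then concatenate along the embedding dimension
        return torch.cat([self.emb1(indices1), self.emb2(indices2)], dim=1)

multi_emb = MultiEmbedding(num_embeddings1=100, num_embeddings2=50,
                           embedding_dim1=16, embedding_dim2=8)
indices1 = torch.from_numpy(np.random.randint(0, 100, size=(10,)))
indices2 = torch.from_numpy(np.random.randint(0, 50, size=(10,)))
print(multi_emb(indices1, indices2).shape)  # torch.Size([10, 24])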
Choose Your Own Adventures¶
Sampling from Surnames¶
In this exercise, you will be writing the sampler for the surname model.
What is required for this model is the following:
- An initial input
- An initial hidden
- A loop which uses these two things to get the next hidden
For creating the initial hidden and input vectors, we make use of the Vocabularies.
from torch.autograd import Variable
import torch.nn.functional as F

begin_seq_index = vectorizer.surname_vocab.begin_seq_index
index_0 = Variable(torch.LongTensor([begin_seq_index]))
x_0 = model.char_emb(index_0)
nationality_index = vectorizer.nationality_vocab.lookup_index('Irish')
hidden_0_index = Variable(torch.LongTensor([nationality_index]))
hidden_0 = model.nat_emb(hidden_0_index)
Now, the goal is to use these input and hidden vectors to compute the next hidden vector.
rnn_cell = model.rnn.rnn_cell
hidden_1 = rnn_cell(x_0, hidden_0)
fc = model.fc
relu = model.relu
y_0 = fc(relu(hidden_1))
y_0 = F.softmax(y_0, dim=1)
# sample
index_1 = torch.multinomial(y_0, 1)[:, 0]
# or argmax
# index_1 = torch.max(y_0, dim=1)[1]
What’s left: write a loop that performs these computations, aggregates the indices, and outputs them. Additionally, you will have to use the vocab to lookup each index.
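A minimal sketch of such a loop, using the pieces defined above (a complete SurnameSampler class is given in the solution further down; the maximum length of 20 is arbitrary):

indices = [index_0]
x_t, hid_t = x_0, hidden_0
for t in range(20):
    hid_t = rnn_cell(x_t, hid_t)
    y_t = F.softmax(fc(relu(hid_t)), dim=1)
    index_t = torch.multinomial(y_t, 1)[:, 0]
    indices.append(index_t)
    x_t = model.char_emb(index_t)
# finally, map each index back to a character with
# vectorizer.surname_vocab.lookup_index, stopping at the end-of-sequence index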
Convert Binary RNN to Binary Predictions¶
Consider the following modification to the OHLCDataset class:
class OHLCDataset(Dataset):
    def __init__(self, data_matrix, history_size):
        self.data_matrix = data_matrix
        self.history_size = history_size

    def __getitem__(self, index):
        # data_matrix.shape = (time, 4)
        x_history = self.data_matrix[index:index+self.history_size]
        x_next = self.data_matrix[index+self.history_size]

        x_mean = np.mean(x_history)
        x_var = np.var(x_history)
        x_std = np.std(x_history)

        if x_next > x_mean:
            x_next_larger = 1
        else:
            x_next_larger = 0
        # alternatively:
        # if x_next > x_mean + x_std:

        return {'x_history': (x_history - x_mean) / x_var,
                'x_next': (x_next - x_mean) / x_var,
                'x_next_larger': x_next_larger,
                'x_history_mean': x_mean,
                'x_history_var': x_var}

    def __len__(self):
        return len(self.data_matrix) - self.history_size

    def get_num_batches(self, batch_size):
        return len(self) // batch_size
What would have to change for the model? Hint: the output would no longer be 4 values, but 1 value.
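For instance, a hedged sketch of the model-side change (layer and variable names here are illustrative, not the exact notebook code): the final linear layer outputs a single score, and a binary loss such as nn.BCEWithLogitsLoss is used against the new 'x_next_larger' target:

# final layer predicts a single score instead of 4 values
fc_binary = nn.Linear(hidden_size, 1)  # hidden_size: whatever the RNN's hidden size is
loss_func = nn.BCEWithLogitsLoss()

# inside the training loop (sketch):
# y_pred = fc_binary(last_hidden).squeeze(1)
# loss = loss_func(y_pred, batch['x_next_larger'].float())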
Regularize the MultiLayer Perceptron Model¶
For this exercise, proceed in the following steps:
- Make a copy of the Unit5-MLP notebook
- Consider adding an L2 penalty to the loss. What is the change in performance?
In PyTorch, the L2 penalty is simple to add. It is called “weight_decay”. If you check the documentation for each optimizer (for example, here for Adam), you will see that you can set it during optimizer initialization.
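For example (the learning rate and weight_decay values below are arbitrary starting points, not recommendations):

import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)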
Surname Sampler Solution¶
class SurnameSampler(object):
    def __init__(self, model, vectorizer):
        self.model = model
        self.vectorizer = vectorizer

    def make_initial_x(self, batch_size):
        begin_seq_index = self.vectorizer.surname_vocab.begin_seq_index
        initial_x = Variable(torch.ones(batch_size) * begin_seq_index).long()
        return initial_x

    def make_initial_hidden(self, batch_size):
        nat_vocab = self.vectorizer.nationality_vocab
        chosen_indices = np.random.choice(np.arange(len(nat_vocab)),
                                          size=batch_size,
                                          replace=True)
        nationality_strings = [nat_vocab.lookup_index(index) for index in chosen_indices]
        nationality_index_variable = Variable(torch.LongTensor(chosen_indices))
        initial_hidden = self.model.nat_emb(nationality_index_variable)
        return initial_hidden, nationality_strings

    def sample(self, batch_size, max_sample_size=20, temperature=1.0):
        seq_indices = [self.make_initial_x(batch_size)]
        # todo fix random nationality selection to allow for choosing
        initial_hidden, nationality_strings = self.make_initial_hidden(batch_size)
        hiddens = [initial_hidden]

        x_t = seq_indices[0]
        hid_t = initial_hidden

        char_emb = self.model.char_emb
        rnn_cell = self.model.rnn.rnn_cell
        fc = self.model.fc
        relu = self.model.relu

        for t in range(max_sample_size):
            x_emb_t = char_emb(x_t)
            hid_t = rnn_cell(x_emb_t, hid_t)
            y_t = fc(relu(hid_t))
            y_t = F.softmax(y_t * temperature, dim=1)
            x_t = torch.multinomial(y_t, 1)[:, 0]
            hiddens.append(hid_t)
            seq_indices.append(x_t)

        seq_indices = torch.stack(seq_indices).squeeze().permute(1, 0)
        return seq_indices, nationality_strings

    def decode_one(self, indices_vector):
        surname_vocab = self.vectorizer.surname_vocab
        out = []
        for i in indices_vector:
            if surname_vocab.begin_seq_index == i:
                continue
            if surname_vocab.end_seq_index == i:
                return ''.join(out)
            out.append(surname_vocab.lookup_index(i))
        return ''.join(out)

    def decode_many(self, indices_matrix):
        if isinstance(indices_matrix, Variable):
            indices_matrix = indices_matrix.cpu().data.numpy()
        return [self.decode_one(indices_matrix[i]) for i in range(len(indices_matrix))]
sampler = SurnameSampler(model.cpu(), vectorizer)
samples, nationality_strings = sampler.sample(20)
list(zip(nationality_strings, sampler.decode_many(samples)))
Welcome! This is a directory of resources for a training tutorial to be given at the O’Reilly Strata Conference in San Jose on March 5th and 6th, 2018.
General Information¶
Prerequisites:¶
- A working knowledge of Python.
- Familiarity with basic matrix math (multiplying matrices, dot products of vectors, etc.) and derivatives of simple functions.
Agenda¶
Machine Learning with PyTorch is split into two days. On the first day, we will first provide an introduction to the supervised training paradigm and the basics of the PyTorch library. Then, we will cover fundamental Deep Learning models and how they relate to the kinds of tasks for which Machine Learning is typically deployed.
On the second day, we will discuss how projects should be structured to avoid common pitfalls and traps. Then, we will talk about how models can typically be improved using regularization and other techniques. Next, we will walk you through two more advanced models: generating surnames with a Recurrent Neural Network and generating captions for images. Then, we provide a substantial amount of time for working through in-class exercises at your own pace so that you can practice what we've taught. Finally, we end the day with a brief overview of other topics in Machine Learning and provide pointers for further learning.