Touvlo documentation

Welcome to touvlo’s documentation!

Contents

Supervised learning

Linear Regression routines

touvlo.supv.lin_rg.cost_func(X, y, theta)

Computes the cost function J for Linear Regression.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • y (numpy.array) – Column vector of expected values.
  • theta (numpy.array) – Column vector of model’s parameters.
Returns:

Computed cost.

Return type:

float

touvlo.supv.lin_rg.grad(X, y, theta)

Computes the gradient for Linear Regression.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • y (numpy.array) – Column vector of expected values.
  • theta (numpy.array) – Column vector of model’s parameters.
Returns:

Gradient column vector.

Return type:

numpy.array
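The cost and gradient above correspond to the usual squared-error formulas. The following NumPy sketch illustrates them, assuming J = (1/2m)·Σ(Xθ − y)² with column vectors y and theta. The helper names lin_cost and lin_grad are hypothetical and do not call touvlo.

```python
import numpy as np


def lin_cost(X, y, theta):
    """Squared-error cost J = (1 / 2m) * sum((X @ theta - y) ** 2)."""
    m = len(y)
    residual = X @ theta - y  # column vector of prediction errors
    return float(np.sum(residual ** 2) / (2 * m))


def lin_grad(X, y, theta):
    """Gradient of J with respect to theta: (1 / m) * X^T (X @ theta - y)."""
    m = len(y)
    return X.T @ (X @ theta - y) / m


# Bias column of ones plus one feature; the targets follow y = 1 + 2x exactly.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])

print(lin_cost(X, y, np.zeros((2, 1))))          # (1 + 9 + 25) / 6
print(lin_grad(X, y, np.array([[1.0], [2.0]])))  # zero vector at the optimum
```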

touvlo.supv.lin_rg.h(X, theta)

Linear regression hypothesis.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • theta (numpy.array) – Column vector of model’s parameters.
Returns:

The projected value for each line of the dataset.

Return type:

numpy.array

touvlo.supv.lin_rg.normal_eqn(X, y)

Produces optimal theta via normal equation.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • y (numpy.array) – Column vector of expected values.
Raises:

LinAlgError

Returns:

Optimized model parameters theta.

Return type:

numpy.array
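A minimal sketch of the normal equation, assuming the routine solves (XᵀX)θ = Xᵀy. Using np.linalg.solve instead of an explicit inverse is a choice made here, not necessarily touvlo's; it also shows one way the documented LinAlgError can arise. The name normal_eqn_sketch is hypothetical.

```python
import numpy as np


def normal_eqn_sketch(X, y):
    """Closed-form least squares: solve (X^T X) theta = X^T y for theta.

    np.linalg.solve raises numpy.linalg.LinAlgError when X^T X is singular,
    for example when two feature columns are linearly dependent.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)


X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])
theta = normal_eqn_sketch(X, y)  # approximately [[1.], [2.]]

# Two identical columns make X^T X singular.
try:
    normal_eqn_sketch(np.ones((3, 2)), y)
except np.linalg.LinAlgError as exc:
    print("singular system:", exc)
```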

touvlo.supv.lin_rg.predict(X, theta)

Computes prediction vector.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • theta (numpy.array) – Column vector of model’s parameters.
Returns:

Vector with a prediction for each input row.

Return type:

numpy.array

touvlo.supv.lin_rg.reg_cost_func(X, y, theta, _lambda)

Computes the regularized cost function J for Linear Regression.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • y (numpy.array) – Column vector of expected values.
  • theta (numpy.array) – Column vector of model’s parameters.
  • _lambda (float) – The regularization hyperparameter.
Returns:

Computed cost with regularization.

Return type:

float

touvlo.supv.lin_rg.reg_grad(X, y, theta, _lambda)

Computes the regularized gradient for Linear Regression.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • y (numpy.array) – Column vector of expected values.
  • theta (numpy.array) – Column vector of model’s parameters.
  • _lambda (float) – The regularization hyperparameter.
Returns:

Regularized gradient column vector.

Return type:

numpy.array
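The regularized variants add an L2 penalty scaled by _lambda. This sketch assumes the common convention of leaving the bias parameter theta[0] unpenalized; check touvlo's source before relying on that detail. The names reg_lin_cost and reg_lin_grad are hypothetical.

```python
import numpy as np


def reg_lin_cost(X, y, theta, _lambda):
    """Squared-error cost plus (lambda / 2m) * sum(theta[1:] ** 2)."""
    m = len(y)
    residual = X @ theta - y
    penalty = _lambda / (2 * m) * np.sum(theta[1:] ** 2)  # bias theta[0] is not penalized
    return float(np.sum(residual ** 2) / (2 * m) + penalty)


def reg_lin_grad(X, y, theta, _lambda):
    """Unregularized gradient plus (lambda / m) * theta, except for the bias entry."""
    m = len(y)
    reg = (_lambda / m) * theta
    reg[0] = 0.0  # the bias gradient is left unregularized
    return X.T @ (X @ theta - y) / m + reg


X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])
theta = np.array([[1.0], [2.0]])  # fits the data exactly, so only the penalty remains
print(reg_lin_cost(X, y, theta, 3.0))  # 3 / (2 * 3) * 2**2 = 2.0
```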

Logistic Regression routines

touvlo.supv.lgx_rg.cost_func(X, y, theta)

Computes the cost function J for Logistic Regression.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • y (numpy.array) – Column vector of expected values.
  • theta (numpy.array) – Column vector of model’s parameters.
Returns:

Computed cost.

Return type:

float

touvlo.supv.lgx_rg.grad(X, y, theta)

Computes the gradient for the parameters theta.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • y (numpy.array) – Column vector of expected values.
  • theta (numpy.array) – Column vector of model’s parameters.
Returns:

Gradient column vector.

Return type:

numpy.array
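Logistic regression swaps the linear hypothesis for the sigmoid of Xθ and uses the cross-entropy cost. A minimal NumPy sketch of the standard formulas; the helper names are hypothetical, and the sketch does not guard against log(0) for saturated predictions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_cost(X, y, theta):
    """Cross-entropy cost J = -(1/m) * sum(y*log(h) + (1 - y)*log(1 - h))."""
    m = len(y)
    h = sigmoid(X @ theta)  # predicted probability of class 1 for each row
    return float(-np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m)


def logistic_grad(X, y, theta):
    """Gradient (1/m) * X^T (h - y); same form as linear regression, different h."""
    m = len(y)
    return X.T @ (sigmoid(X @ theta) - y) / m


X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[0.0], [0.0], [1.0], [1.0]])
theta = np.zeros((2, 1))
print(logistic_cost(X, y, theta))  # log(2), since every h equals 0.5
```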

touvlo.supv.lgx_rg.h(X, theta)

Logistic regression hypothesis.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • theta (numpy.array) – Column vector of model’s parameters.
Raises:

ValueError

Returns:

The probability that each entry belongs to class 1.

Return type:

numpy.array

touvlo.supv.lgx_rg.p(x, threshold=0.5)

Predicts whether a probability falls into class 1.

Parameters:
  • x (obj) – Probability that example belongs to class 1.
  • threshold (float) – Point above which a probability is deemed to belong to class 1.
Returns:

Binary value to denote class 1 or 0

Return type:

int

touvlo.supv.lgx_rg.predict(X, theta)

Classifies each entry as class 1 or class 0.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • theta (numpy.array) – Column vector of model’s parameters.
Returns:

Column vector with the classification of each entry.

Return type:

numpy.array

touvlo.supv.lgx_rg.predict_prob(X, theta)

Produces the probability that the entries belong to class 1.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • theta (numpy.array) – Column vector of model’s parameters.
Raises:

ValueError

Returns:

The probability that each entry belongs to class 1.

Return type:

numpy.array

touvlo.supv.lgx_rg.reg_cost_func(X, y, theta, _lambda)

Computes the regularized cost function J for Logistic Regression.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • y (numpy.array) – Column vector of expected values.
  • theta (numpy.array) – Column vector of model’s parameters.
  • _lambda (float) – The regularization hyperparameter.
Returns:

Computed cost with regularization.

Return type:

float

touvlo.supv.lgx_rg.reg_grad(X, y, theta, _lambda)

Computes the regularized gradient for Logistic Regression.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • y (numpy.array) – Column vector of expected values.
  • theta (numpy.array) – Column vector of model’s parameters.
  • _lambda (float) – The regularization hyperparameter.
Returns:

Regularized gradient column vector.

Return type:

numpy.array

Classification Neural Network routines

touvlo.supv.nn_clsf.back_propagation(y, theta, a, z, num_labels, n_hidden_layers=1)

Applies back propagation to compute the error terms of each layer, from which the gradient of the model’s loss is obtained.

Parameters:
  • y (numpy.array) – Column vector of expected values.
  • theta (numpy.array(numpy.array)) – array of model’s weight matrices by layer.
  • a (numpy.array(numpy.array)) – array of activation matrices by layer.
  • z (numpy.array(numpy.array)) – array of pre-activation values (inputs to the sigmoid) by layer.
  • num_labels (int) – Number of classes in multiclass classification.
  • n_hidden_layers (int) – Number of hidden layers in network.
Returns:

array of matrices of ‘error values’ by layer.

Return type:

numpy.array(numpy.array)

touvlo.supv.nn_clsf.cost_function(X, y, theta, _lambda, num_labels, n_hidden_layers=1)

Computes the regularized cost function J for the classification Neural Network.

Parameters:
  • X (numpy.array) – Features’ dataset.
  • y (numpy.array) – Column vector of expected values.
  • theta (numpy.array) – Column vector of model’s parameters.
  • _lambda (float) – The regularization hyperparameter.
  • num_labels (int) – Number of classes in multiclass classification.
  • n_hidden_layers (int) – Number of hidden layers in network.
Returns:

Computed cost.

Return type:

float

touvlo.supv.nn_clsf.feed_forward(X, theta, n_hidden_layers=1)

Applies forward propagation to calculate model’s hypothesis.

Parameters:
  • X (numpy.array) – Features’ dataset.
  • theta (numpy.array) – Column vector of model’s parameters.
  • n_hidden_layers (int) – Number of hidden layers in network.
Returns:

A 2-tuple consisting of an array of pre-activation values (inputs to the sigmoid) by layer and an array of activation matrices by layer.

Return type:

(numpy.array(numpy.array), numpy.array(numpy.array))
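Forward propagation alternates an affine step and the sigmoid, layer by layer. The sketch below is hypothetical: the weight-matrix shapes, the bias handling, and whether activations are stored with their bias column are assumptions that may differ from touvlo's feed_forward.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def feed_forward_sketch(X, theta):
    """Forward pass through a list of weight matrices.

    Assumes theta[l] has shape (units_out, 1 + units_in), the extra column
    multiplying a bias unit of ones prepended to the previous activations.
    """
    a = [X]     # activations by layer; a[0] is the raw input
    z = [None]  # pre-activation values by layer; the input layer has none
    for W in theta:
        a_prev = np.hstack([np.ones((a[-1].shape[0], 1)), a[-1]])  # add bias unit
        z.append(a_prev @ W.T)
        a.append(sigmoid(z[-1]))
    return z, a


# 5 examples, 3 input units, one hidden layer of 4 units, 2 output classes.
X = np.arange(15, dtype=float).reshape(5, 3)
theta = [np.zeros((4, 4)), np.zeros((2, 5))]
z, a = feed_forward_sketch(X, theta)
print(a[-1].shape)  # (5, 2): one probability per example and class
```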

touvlo.supv.nn_clsf.grad(X, y, nn_params, _lambda, input_layer_size, hidden_layer_size, num_labels, n_hidden_layers=1)

Calculates gradient of neural network’s parameters.

Parameters:
  • X (numpy.array) – Features’ dataset.
  • y (numpy.array) – Column vector of expected values.
  • nn_params (numpy.array) – Column vector of model’s parameters.
  • _lambda (float) – The regularization hyperparameter.
  • input_layer_size (int) – Number of units in the input layer.
  • hidden_layer_size (int) – Number of units in a hidden layer.
  • num_labels (int) – Number of classes in multiclass classification.
  • n_hidden_layers (int) – Number of hidden layers in network.
Returns:

array of gradient values by weight matrix.

Return type:

numpy.array(numpy.array)

touvlo.supv.nn_clsf.h(X, theta, n_hidden_layers=1)

Classification Neural Network hypothesis.

Parameters:
  • X (numpy.array) – Features’ dataset.
  • theta (numpy.array) – Column vector of model’s parameters.
  • n_hidden_layers (int) – Number of hidden layers in network.
Returns:

The probability that each entry belongs to class 1.

Return type:

numpy.array

touvlo.supv.nn_clsf.init_nn_weights(input_layer_size, hidden_layer_size, num_labels, n_hidden_layers=1)

Initializes the weight matrices of a network with random values.

Parameters:
  • input_layer_size (int) – Number of units in the input layer.
  • hidden_layer_size (int) – Number of units in a hidden layer.
  • num_labels (int) – Number of classes in multiclass classification.
  • n_hidden_layers (int) – Number of hidden layers in network.
Returns:

array of weight matrices of random values.

Return type:

numpy.array(numpy.array)

touvlo.supv.nn_clsf.rand_init_weights(L_in, L_out)

Initializes a weight matrix with random values.

Parameters:
  • L_in (int) – Number of units in previous layer.
  • L_out (int) – Number of units in next layer.
Returns:

Random values’ matrix of conforming dimensions.

Return type:

numpy.array
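Random initialization breaks the symmetry between hidden units, which would otherwise all learn the same function. A sketch assuming a (L_out, 1 + L_in) shape, the extra column serving the bias unit, and a uniform range of ±0.12; the shape, epsilon_init, and the rng argument are hypothetical choices, not touvlo parameters.

```python
import numpy as np


def rand_init_weights_sketch(L_in, L_out, epsilon_init=0.12, rng=None):
    """Uniform random weights in [-epsilon_init, epsilon_init).

    The extra column (1 + L_in) holds the weights of the bias unit.
    """
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(-epsilon_init, epsilon_init, size=(L_out, 1 + L_in))


W = rand_init_weights_sketch(3, 4, rng=np.random.default_rng(0))
print(W.shape)  # (4, 4)
```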

touvlo.supv.nn_clsf.unravel_params(nn_params, input_layer_size, hidden_layer_size, num_labels, n_hidden_layers=1)

Unravels a flattened array into a list of weight matrices.

Parameters:
  • nn_params (numpy.array) – Row vector of model’s parameters.
  • input_layer_size (int) – Number of units in the input layer.
  • hidden_layer_size (int) – Number of units in a hidden layer.
  • num_labels (int) – Number of classes in multiclass classification.
  • n_hidden_layers (int) – Number of hidden layers in network.
Returns:

array with model’s weight matrices.

Return type:

numpy.array(numpy.array)

Unsupervised learning

PCA

touvlo.unsupv.pca.pca(X)

Runs Principal Component Analysis on a dataset.

Parameters:
  • X (numpy.array) – Features’ dataset.
Returns:

A 2-tuple of U, the eigenvectors of the covariance matrix, and S, the eigenvalues (on the diagonal) of the covariance matrix.

Return type:

(numpy.array, numpy.array)

touvlo.unsupv.pca.project_data(X, U, k)

Computes the reduced data representation (projected data).

Parameters:
  • X (numpy.array) – Normalized features’ dataset
  • U (numpy.array) – eigenvectors of covariance matrix
  • k (int) – Number of features in reduced data representation
Returns:

Reduced data representation (projection)

Return type:

numpy.array

touvlo.unsupv.pca.recover_data(Z, U, k)

Recovers an approximation of the original data from the projected data.

Parameters:
  • Z (numpy.array) – Reduced data representation (projection)
  • U (numpy.array) – eigenvectors of covariance matrix
  • k (int) – Number of features in reduced data representation
Returns:

Approximated features’ dataset

Return type:

numpy.array
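The three PCA routines fit together as below. This NumPy sketch assumes X is already normalized and derives U and S from an SVD of the covariance matrix; whether touvlo uses SVD or an eigensolver, and whether S is returned as a vector or a diagonal matrix, are assumptions to verify. All names are hypothetical.

```python
import numpy as np


def pca_sketch(X):
    """Eigenvectors U and eigenvalues S of the covariance matrix of X.

    Assumes X is already normalized (zero-mean features). Because the
    covariance matrix is symmetric positive semi-definite, its SVD gives
    the eigen-decomposition, with S sorted in decreasing order.
    """
    m = X.shape[0]
    sigma = X.T @ X / m
    U, S, _ = np.linalg.svd(sigma)
    return U, S  # S is a 1-D array here; np.diag(S) gives the diagonal form


def project_data_sketch(X, U, k):
    return X @ U[:, :k]    # m x k reduced representation


def recover_data_sketch(Z, U, k):
    return Z @ U[:, :k].T  # m x n approximation of the original data


# Zero-mean points lying exactly on a line, so one component suffices.
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
U, S = pca_sketch(X)
Z = project_data_sketch(X, U, 1)
X_approx = recover_data_sketch(Z, U, 1)
```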

K-means

touvlo.unsupv.kmeans.compute_centroids(X, idx, K)

Computes each centroid as the mean of its cluster’s members.

Parameters:
  • X (numpy.array) – Features’ dataset
  • idx (numpy.array) – Column vector of assigned centroids’ indices.
  • K (int) – Number of centroids.
Returns:

Column vector of newly computed centroids

Return type:

numpy.array

touvlo.unsupv.kmeans.cost_function(X, idx, centroids)

Calculates the cost function for K-means.

Parameters:
  • X (numpy.array) – Features’ dataset
  • idx (numpy.array) – Column vector of assigned centroids’ indices.
  • centroids (numpy.array) – Centroids to which the examples are assigned.
Returns:

Computed cost

Return type:

float

touvlo.unsupv.kmeans.elbow_method(X, K_values, max_iters, n_inits)

Calculates the cost for each given K.

Parameters:
  • X (numpy.array) – Features’ dataset
  • K_values (list(int)) – List of possible numbers of centroids.
  • max_iters (int) – Number of iterations of each K-means run.
  • n_inits (int) – Number of random initializations.
Returns:

A 2-tuple of K_values, a list of possible numbers of centroids, and cost_values, a computed cost for each K.

Return type:

(list(int), list(float))

touvlo.unsupv.kmeans.euclidean_dist(p, q)

Calculates Euclidean distance between 2 n-dimensional points.

Parameters:
  • p (numpy.array) – First n-dimensional point.
  • q (numpy.array) – Second n-dimensional point.
Returns:

Distance between 2 points.

Return type:

float

touvlo.unsupv.kmeans.find_closest_centroids(X, initial_centroids)

Assigns to each example the index of the closest centroid.

Parameters:
  • X (numpy.array) – Features’ dataset
  • initial_centroids (numpy.array) – List of initialized centroids.
Returns:

Column vector of assigned centroids’ indices.

Return type:

numpy.array

touvlo.unsupv.kmeans.init_centroids(X, K)

Randomly picks K examples from the dataset as initial centroids.

Parameters:
  • X (numpy.array) – Features’ dataset
  • K (int) – Number of centroids.
Returns:

Column vector of centroids randomly picked from dataset

Return type:

numpy.array

touvlo.unsupv.kmeans.run_intensive_kmeans(X, K, max_iters, n_inits)

Applies K-means using multiple random initializations.

Parameters:
  • X (numpy.array) – Features’ dataset
  • K (int) – Number of centroids.
  • max_iters (int) – Number of iterations of each K-means run.
  • n_inits (int) – Number of random initializations.
Returns:

A 2-tuple of centroids, a column vector of centroids, and idx, a column vector of assigned centroids’ indices.

Return type:

(numpy.array, numpy.array)

touvlo.unsupv.kmeans.run_kmeans(X, K, max_iters)

Applies K-means using a single random initialization.

Parameters:
  • X (numpy.array) – Features’ dataset
  • K (int) – Number of centroids.
  • max_iters (int) – Number of iterations of the K-means run.
Returns:

A 2-tuple of centroids, a column vector of centroids, and idx, a column vector of assigned centroids’ indices.

Return type:

(numpy.array, numpy.array)
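A single K-means run alternates an assignment step (nearest centroid by Euclidean distance) and an update step (mean of each cluster). The sketch below is hypothetical: the names, the rule of keeping a centroid in place when its cluster is empty, and returning idx as a flat array are assumptions rather than touvlo's behavior.

```python
import numpy as np


def find_closest_sketch(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every example."""
    sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # shape (m, K)
    return sq_dist.argmin(axis=1)


def compute_centroids_sketch(X, idx, old_centroids):
    """Move each centroid to the mean of its members; keep it if the cluster is empty."""
    centroids = old_centroids.copy()
    for k in range(len(centroids)):
        members = X[idx == k]
        if len(members) > 0:
            centroids[k] = members.mean(axis=0)
    return centroids


def run_kmeans_sketch(X, K, max_iters, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Initialize with K distinct examples picked at random from the dataset.
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(max_iters):
        idx = find_closest_sketch(X, centroids)                  # assignment step
        centroids = compute_centroids_sketch(X, idx, centroids)  # update step
    return centroids, find_closest_sketch(X, centroids)


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, idx = run_kmeans_sketch(X, K=2, max_iters=10, rng=np.random.default_rng(0))
```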

Anomaly Detection

touvlo.unsupv.anmly_detc.cov_matrix(X, mu)

Calculates the covariance matrix for matrix X (m x n).

Parameters:
  • X (numpy.array) – Features’ dataset.
  • mu (numpy.array) – Mean of each feature/column of X.
Returns:

Covariance matrix (n x n)

Return type:

numpy.array

touvlo.unsupv.anmly_detc.estimate_multi_gaussian(X)

Estimates parameters for Multivariate Gaussian distribution.

Parameters:
  • X (numpy.array) – Features’ dataset.
Returns:

A 2-tuple of mu, the mean of each feature/column of X, and sigma, the covariance matrix for X.

Return type:

(numpy.array, numpy.array)

touvlo.unsupv.anmly_detc.estimate_uni_gaussian(X)

Estimates parameters for Univariate Gaussian distribution.

Parameters:
  • X (numpy.array) – Features’ dataset.
Returns:

A 2-tuple of mu, the mean of each feature/column of X, and sigma2, the variance of each feature/column of X.

Return type:

(numpy.array, numpy.array)

touvlo.unsupv.anmly_detc.is_anomaly(p, threshold=0.5)

Predicts whether an example is an anomaly (class 1) from its estimated probability.

Parameters:
  • p (numpy.array) – Estimated probability of the example under the fitted Gaussian model.
  • threshold (float) – Point below which an example is considered of class 1.
Returns:

Binary value to denote class 1 or 0

Return type:

int

touvlo.unsupv.anmly_detc.multi_gaussian(X, mu, sigma)

Evaluates the Multivariate Gaussian probability density function at each example.

Parameters:
  • X (numpy.array) – Features’ dataset.
  • mu (numpy.array) – Mean of each feature/column of X.
  • sigma (numpy.array) – Covariance matrix for X.
Returns:

Probability density for each example.

Return type:

numpy.array
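The multivariate Gaussian density is p(x) = (2π)^(-n/2) |Σ|^(-1/2) exp(-½ (x − μ)ᵀ Σ⁻¹ (x − μ)). A NumPy sketch of that formula, assuming mu is a length-n vector and sigma an n x n covariance matrix; the name multi_gaussian_sketch is hypothetical and this is not touvlo's implementation.

```python
import numpy as np


def multi_gaussian_sketch(X, mu, sigma):
    """Multivariate normal density N(x; mu, sigma) evaluated at every row of X."""
    n = X.shape[1]
    diff = X - mu.reshape(1, -1)
    # Row-wise quadratic form (x - mu)^T sigma^-1 (x - mu), without an explicit inverse.
    quad = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    norm = (2 * np.pi) ** (-n / 2) / np.sqrt(np.linalg.det(sigma))
    return norm * np.exp(-0.5 * quad)


# At the mean of a standard 2-D normal the density is 1 / (2 * pi).
p0 = multi_gaussian_sketch(np.zeros((1, 2)), np.zeros(2), np.eye(2))
```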

touvlo.unsupv.anmly_detc.predict(X, epsilon, gaussian, **kwargs)

Predicts whether examples are anomalies.

Parameters:
  • X (numpy.array) – Features’ dataset.
  • epsilon (float) – point below which an example is considered of class 1.
  • gaussian (Callable) – Function that estimates the probability density of each example.
Returns:

Column vector of classification

Return type:

numpy.array

touvlo.unsupv.anmly_detc.uni_gaussian(X, mu, sigma2)

Evaluates the Univariate Gaussian probability density for each example.

Parameters:
  • X (numpy.array) – Features’ dataset.
  • mu (numpy.array) – Mean of each feature/column of X.
  • sigma2 (numpy.array) – Variance of each feature/column of X.
Returns:

Probability density for each example.

Return type:

numpy.array
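These pieces combine into the usual Gaussian anomaly detector: fit mu and sigma2 on normal data, then flag examples whose density falls below epsilon. This sketch assumes the univariate model treats features as independent (so the per-example density is the product of per-feature densities) and uses the divide-by-m variance. The helper names are hypothetical, and unlike touvlo's predict, predict_sketch takes mu and sigma2 directly instead of a gaussian callable with **kwargs.

```python
import numpy as np


def estimate_uni_gaussian_sketch(X):
    """Per-feature mean and (maximum-likelihood, divide-by-m) variance."""
    return X.mean(axis=0), X.var(axis=0)


def uni_gaussian_sketch(X, mu, sigma2):
    """Product over features of independent 1-D normal densities, one value per row."""
    dens = np.exp(-((X - mu) ** 2) / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return dens.prod(axis=1)


def predict_sketch(X, epsilon, mu, sigma2):
    """1 (anomaly) where the density falls below epsilon, else 0."""
    return (uni_gaussian_sketch(X, mu, sigma2) < epsilon).astype(int)


X_train = np.array([[1.0], [2.0], [3.0]])
mu, sigma2 = estimate_uni_gaussian_sketch(X_train)
print(predict_sketch(np.array([[2.0], [50.0]]), 1e-3, mu, sigma2))  # [0 1]
```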

Recommender Systems

Collaborative Filtering

touvlo.rec_sys.cf.cost_function(X, Y, R, theta, _lambda)

Computes the cost function J for Collaborative Filtering.

Parameters:
  • X (numpy.array) – Matrix of product features.
  • Y (numpy.array) – Scores’ matrix.
  • R (numpy.array) – Matrix of 0s and 1s (whether there’s a rating).
  • theta (numpy.array) – Matrix of user features.
  • _lambda (float) – The regularization hyperparameter.
Returns:

Computed cost.

Return type:

float
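The collaborative filtering cost compares predicted scores X·thetaᵀ with Y only where R marks a rating, plus L2 penalties on both factor matrices. A sketch of that standard formulation; the matrix orientations noted in the docstring and the name cf_cost_sketch are assumptions, not touvlo's documented layout.

```python
import numpy as np


def cf_cost_sketch(X, Y, R, theta, _lambda):
    """Squared error over rated entries plus L2 penalties on both factor matrices.

    Assumed shapes: X (num_products, num_features), theta (num_users, num_features),
    Y and R (num_products, num_users).
    """
    err = (X @ theta.T - Y) * R  # unrated entries (R == 0) contribute nothing
    reg = (_lambda / 2) * (np.sum(theta ** 2) + np.sum(X ** 2))
    return float(np.sum(err ** 2) / 2 + reg)


X = np.array([[1.0], [2.0]])  # 2 products, 1 feature
theta = np.array([[1.0]])     # 1 user
Y = np.array([[1.0], [5.0]])  # the second score is a placeholder...
R = np.array([[1], [0]])      # ...because only the first product was rated
```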

touvlo.rec_sys.cf.grad(params, Y, R, num_users, num_products, num_features, _lambda)

Calculates gradient of Collaborative Filtering’s parameters

Parameters:
  • params (numpy.array) – Flattened product and user features.
  • Y (numpy.array) – Scores’ matrix.
  • R (numpy.array) – Matrix of 0s and 1s (whether there’s a rating).
  • num_users (int) – Number of users in this instance.
  • num_products (int) – Number of products in this instance.
  • num_features (int) – Number of features in this instance.
  • _lambda (float) – The regularization hyperparameter.
Returns:

Flattened gradient of product and user parameters.

Return type:

numpy.array

touvlo.rec_sys.cf.unravel_params(params, num_users, num_products, num_features)

Unravels a flattened array into features’ matrices.

Parameters:
  • params (numpy.array) – Row vector of coefficients.
  • num_users (int) – Number of users in this instance.
  • num_products (int) – Number of products in this instance.
  • num_features (int) – Number of features in this instance.
Returns:

A 2-tuple consisting of a matrix of product features and a matrix of user features.

Return type:

(numpy.array, numpy.array)

Utils

touvlo.utils.BGD(X, y, grad, initial_theta, alpha, num_iters, **kwargs)

Performs parameter optimization via Batch Gradient Descent.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • y (numpy.array) – Column vector of expected values.
  • grad (Callable) – Routine that generates the partial derivatives given theta.
  • initial_theta (numpy.array) – Initial value for parameters to be optimized.
  • alpha (float) – Learning rate or step size of the optimization.
  • num_iters (int) – Number of iterations of the optimization.
Returns:

Optimized model parameters.

Return type:

numpy.array
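Batch gradient descent repeats θ ← θ − α·∇J(θ), where every step uses all m examples. A minimal sketch; forwarding **kwargs (for example _lambda) to grad is an assumption about how that argument is meant to be used, and the names bgd_sketch and lin_grad are hypothetical.

```python
import numpy as np


def bgd_sketch(X, y, grad, initial_theta, alpha, num_iters, **kwargs):
    """Batch gradient descent: every step uses the gradient over all m examples."""
    theta = initial_theta.copy()
    for _ in range(num_iters):
        theta = theta - alpha * grad(X, y, theta, **kwargs)
    return theta


def lin_grad(X, y, theta):
    """Squared-error gradient used here as an example grad routine."""
    return X.T @ (X @ theta - y) / len(y)


X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])
theta = bgd_sketch(X, y, lin_grad, np.zeros((2, 1)), alpha=0.1, num_iters=5000)
```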

touvlo.utils.MBGD(X, y, grad, initial_theta, alpha, num_iters, b, **kwargs)

Performs parameter optimization via Mini-Batch Gradient Descent.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • y (numpy.array) – Column vector of expected values.
  • grad (Callable) – Routine that generates the partial derivatives given theta.
  • initial_theta (numpy.array) – Initial value for parameters to be optimized.
  • alpha (float) – Learning rate or step size of the optimization.
  • num_iters (int) – Number of iterations of the optimization.
  • b (int) – Number of examples in each mini-batch.
Returns:

Optimized model parameters.

Return type:

numpy.array
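Mini-batch gradient descent takes a step after every group of b examples instead of after the full dataset. In this sketch num_iters counts passes over the data (epochs) and the examples are reshuffled on each pass; both are assumptions about touvlo's MBGD, and the rng argument and helper names are hypothetical.

```python
import numpy as np


def mbgd_sketch(X, y, grad, initial_theta, alpha, num_iters, b, rng=None, **kwargs):
    """Mini-batch gradient descent with a fresh shuffle on every pass over the data."""
    rng = np.random.default_rng() if rng is None else rng
    theta = initial_theta.copy()
    m = len(y)
    for _ in range(num_iters):
        order = rng.permutation(m)
        for start in range(0, m, b):
            batch = order[start:start + b]  # the last batch may hold fewer than b rows
            theta = theta - alpha * grad(X[batch], y[batch], theta, **kwargs)
    return theta


def lin_grad(X, y, theta):
    """Squared-error gradient used here as an example grad routine."""
    return X.T @ (X @ theta - y) / len(y)


X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[1.0], [3.0], [5.0], [7.0]])
theta = mbgd_sketch(X, y, lin_grad, np.zeros((2, 1)), alpha=0.1, num_iters=5000, b=2,
                    rng=np.random.default_rng(0))
```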

touvlo.utils.SGD(X, y, grad, initial_theta, alpha, num_iters, **kwargs)

Performs parameter optimization via Stochastic Gradient Descent.

Parameters:
  • X (numpy.array) – Features’ dataset plus bias column.
  • y (numpy.array) – Column vector of expected values.
  • grad (Callable) – Routine that generates the partial derivatives given theta.
  • initial_theta (numpy.array) – Initial value for parameters to be optimized.
  • alpha (float) – Learning rate or step size of the optimization.
  • num_iters (int) – Number of iterations of the optimization.
Returns:

Optimized model parameters.

Return type:

numpy.array

touvlo.utils.feature_normalize(X)

Performs z-score normalization on a numeric dataset.

Parameters:
  • X (numpy.array) – Features’ dataset.
Returns:

A 3-tuple of X_norm, the normalized features’ dataset, mu, the mean of each feature, and sigma, the standard deviation of each feature.

Return type:

(numpy.array, numpy.array, numpy.array)

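Z-score normalization rescales each column to zero mean and unit standard deviation. A sketch, assuming the population standard deviation (ddof=0); the name feature_normalize_sketch is hypothetical. It also shows why a bias column is best added after normalizing: a column of ones has a standard deviation of 0.

```python
import numpy as np


def feature_normalize_sketch(X):
    """Z-score normalization: subtract each column's mean, divide by its std."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)  # population standard deviation (ddof=0), an assumption
    return (X - mu) / sigma, mu, sigma


X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_norm, mu, sigma = feature_normalize_sketch(X)
# Add the bias column only after normalizing: a column of ones has sigma == 0.
X_with_bias = np.hstack([np.ones((X_norm.shape[0], 1)), X_norm])
```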
touvlo.utils.g(x)

Applies the sigmoid function to a given value.

Parameters:
  • x (obj) – Input value or object containing values.
Returns:

Sigmoid function at value.

Return type:

obj

touvlo.utils.g_grad(x)

Calculates the sigmoid gradient at a given value.

Parameters:
  • x (obj) – Input value or object containing values.
Returns:

Sigmoid gradient at value.

Return type:

obj

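The sigmoid and its gradient are related by g′(x) = g(x)·(1 − g(x)), which is why the gradient can be computed from the sigmoid itself. A NumPy sketch that works on scalars and arrays alike; sigmoid and sigmoid_grad are hypothetical names standing in for g and g_grad.

```python
import numpy as np


def sigmoid(x):
    """g(x) = 1 / (1 + exp(-x)); works on scalars and arrays alike."""
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_grad(x):
    """g'(x) = g(x) * (1 - g(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


print(sigmoid(0.0), sigmoid_grad(0.0))  # 0.5 0.25
```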
touvlo.utils.mean_normlztn(Y, R)

Performs mean normalization on a numeric dataset.

Parameters:
  • Y (numpy.array) – Scores’ dataset.
  • R (numpy.array) – Dataset of 0s and 1s (whether there’s a rating).
Returns:

A 2-tuple of Y_norm, the normalized scores’ dataset (row-wise), and Y_mean, a column vector of the calculated means.

Return type:

(numpy.array, numpy.array)
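For ratings data, mean normalization is usually computed only over entries that were actually rated, so that missing scores (stored as 0) do not drag each row's mean down. A sketch under that assumption; giving rows without ratings a mean of 0 and leaving unrated entries of Y_norm at 0 are further assumptions, and mean_normalize_sketch is a hypothetical name.

```python
import numpy as np


def mean_normalize_sketch(Y, R):
    """Subtract each row's mean, computed over rated entries (R == 1) only."""
    rated = R == 1
    counts = rated.sum(axis=1, keepdims=True)  # number of ratings per row
    sums = np.where(rated, Y, 0.0).sum(axis=1, keepdims=True)
    # Rows with no ratings get a mean of 0 instead of dividing by zero.
    Y_mean = sums / np.maximum(counts, 1)
    Y_norm = np.where(rated, Y - Y_mean, 0.0)  # unrated entries stay at 0
    return Y_norm, Y_mean


Y = np.array([[5.0, 0.0, 3.0], [0.0, 0.0, 0.0]])
R = np.array([[1, 0, 1], [0, 0, 0]])
Y_norm, Y_mean = mean_normalize_sketch(Y, R)
```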

touvlo.utils.numerical_grad(J, theta, err)

Numerically calculates the gradient of a given cost function.

Parameters:
  • J (Callable) – Function handle that computes cost given theta.
  • theta (numpy.array) – Model parameters.
  • err (float) – distance between points where J is evaluated.
Returns:

Computed numeric gradient.

Return type:

numpy.array
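Numerical gradients are mainly useful for checking an analytic gradient such as those returned by grad or reg_grad. This sketch uses central differences, (J(θ + ε·eᵢ) − J(θ − ε·eᵢ)) / 2ε; whether touvlo uses central or one-sided differences, and how err maps to ε, is not stated here, so treat both as assumptions. The name numerical_grad_sketch is hypothetical.

```python
import numpy as np


def numerical_grad_sketch(J, theta, err=1e-4):
    """Central-difference estimate of dJ/dtheta, one parameter at a time."""
    grad = np.zeros_like(theta, dtype=float)
    for i in range(theta.size):
        step = np.zeros(theta.size)
        step[i] = err
        step = step.reshape(theta.shape)
        grad.flat[i] = (J(theta + step) - J(theta - step)) / (2 * err)
    return grad


theta = np.array([[1.0], [-2.0]])
# The exact gradient of sum(theta ** 2) is 2 * theta.
approx = numerical_grad_sketch(lambda t: float(np.sum(t ** 2)), theta)
```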
