Touvlo documentation¶
Welcome to touvlo’s documentation!
Contents¶
Supervised learning¶
Linear Regression routines¶
-
touvlo.supv.lin_rg.
cost_func
(X, y, theta)[source]¶ Computes the cost function J for Linear Regression.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- y (numpy.array) – Column vector of expected values.
- theta (numpy.array) – Column vector of model’s parameters.
Returns: Computed cost.
Return type:
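The cost described above can be sketched as follows. This is not touvlo's source; it assumes the usual half mean squared error, and `cost_func_sketch` is a hypothetical name used only for illustration.

```python
import numpy as np

def cost_func_sketch(X, y, theta):
    """Hypothetical stand-in: half mean squared error over m examples."""
    m = y.shape[0]                      # number of examples
    residual = X @ theta - y            # (m, 1) prediction errors
    return float(np.sum(residual ** 2)) / (2 * m)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column is the bias
y = np.array([[1.0], [2.0], [3.0]])
perfect_cost = cost_func_sketch(X, y, np.array([[0.0], [1.0]]))  # exact fit
baseline_cost = cost_func_sketch(X, y, np.zeros((2, 1)))         # all-zero theta
```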
-
touvlo.supv.lin_rg.
grad
(X, y, theta)[source]¶ Computes the gradient for Linear Regression.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- y (numpy.array) – Column vector of expected values.
- theta (numpy.array) – Column vector of model’s parameters.
Returns: Gradient column vector.
Return type: numpy.array
-
touvlo.supv.lin_rg.
h
(X, theta)[source]¶ Linear regression hypothesis.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- theta (numpy.array) – Column vector of model’s parameters.
Returns: The projected value for each line of the dataset.
Return type: numpy.array
-
touvlo.supv.lin_rg.
normal_eqn
(X, y)[source]¶ Produces optimal theta via normal equation.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- y (numpy.array) – Column vector of expected values.
Raises: LinAlgError
Returns: Optimized model parameters theta.
Return type: numpy.array
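A minimal sketch of the normal equation, assuming theta solves (XᵀX)·theta = Xᵀy. The name `normal_eqn_sketch` is hypothetical, and the use of `numpy.linalg.solve` is an assumption about how the documented `LinAlgError` could arise, not a statement about touvlo's implementation.

```python
import numpy as np

def normal_eqn_sketch(X, y):
    # Solve (X^T X) theta = X^T y. numpy.linalg.solve raises
    # numpy.linalg.LinAlgError when X^T X is exactly singular.
    return np.linalg.solve(X.T @ X, X.T @ y)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # bias column plus one feature
y = np.array([[1.0], [3.0], [5.0]])                 # generated by y = 1 + 2x
theta = normal_eqn_sketch(X, y)
```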
-
touvlo.supv.lin_rg.
predict
(X, theta)[source]¶ Computes prediction vector.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- theta (numpy.array) – Column vector of model’s parameters.
Returns: vector with predictions for each input line.
Return type: numpy.array
-
touvlo.supv.lin_rg.
reg_cost_func
(X, y, theta, _lambda)[source]¶ Computes the regularized cost function J for Linear Regression.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- y (numpy.array) – Column vector of expected values.
- theta (numpy.array) – Column vector of model’s parameters.
- _lambda (float) – The regularization hyperparameter.
Returns: Computed cost with regularization.
Return type:
-
touvlo.supv.lin_rg.
reg_grad
(X, y, theta, _lambda)[source]¶ Computes the regularized gradient for Linear Regression.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- y (numpy.array) – Column vector of expected values.
- theta (numpy.array) – Column vector of model’s parameters.
- _lambda (float) – The regularization hyperparameter.
Returns: Regularized gradient column vector.
Return type: numpy.array
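The regularized gradient can be sketched as below. The sketch assumes an L2 penalty scaled by `_lambda / m` and assumes the bias parameter (first row of theta) is left unregularized; both are common conventions rather than confirmed details of touvlo. `reg_grad_sketch` is a hypothetical name.

```python
import numpy as np

def reg_grad_sketch(X, y, theta, _lambda):
    m = y.shape[0]
    grad = X.T @ (X @ theta - y) / m        # unregularized gradient
    penalty = (_lambda / m) * theta         # L2 penalty term (new array)
    penalty[0] = 0.0                        # assumption: bias is not penalized
    return grad + penalty

X = np.array([[1.0, 1.0], [1.0, 2.0]])
y = np.array([[1.0], [2.0]])
theta = np.array([[1.0], [1.0]])
g = reg_grad_sketch(X, y, theta, _lambda=2.0)
```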
Logistic Regression routines¶
-
touvlo.supv.lgx_rg.
cost_func
(X, y, theta)[source]¶ Computes the cost function J for Logistic Regression.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- y (numpy.array) – Column vector of expected values.
- theta (numpy.array) – Column vector of model’s parameters.
Returns: Computed cost.
Return type:
-
touvlo.supv.lgx_rg.
grad
(X, y, theta)[source]¶ Computes the gradient for the parameters theta.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- y (numpy.array) – Column vector of expected values.
- theta (numpy.array) – Column vector of model’s parameters.
Returns: Gradient column vector.
Return type: numpy.array
-
touvlo.supv.lgx_rg.
h
(X, theta)[source]¶ Logistic regression hypothesis.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- theta (numpy.array) – Column vector of model’s parameters.
Returns: The probability that each entry belongs to class 1.
Return type: numpy.array
-
touvlo.supv.lgx_rg.
p
(x, threshold=0.5)[source]¶ Predicts whether a probability falls into class 1.
Parameters: - x (obj) – Probability that example belongs to class 1.
- threshold (float) – point above which a probability is deemed of class 1.
Returns: Binary value to denote class 1 or 0
Return type:
-
touvlo.supv.lgx_rg.
predict
(X, theta)[source]¶ Classifies each entry as class 1 or class 0.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- theta (numpy.array) – Column vector of model’s parameters.
Returns: Column vector with each entry classification.
Return type: numpy.array
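The hypothesis and classification steps can be illustrated together. The sketch assumes the hypothesis is the sigmoid of X·theta and that a probability at or above the threshold maps to class 1; the inclusive comparison and the `_sketch` names are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def h_sketch(X, theta):
    # Probability that each row of X belongs to class 1.
    return sigmoid(X @ theta)

def predict_sketch(X, theta, threshold=0.5):
    # Assumption: probabilities >= threshold are labelled class 1.
    return (h_sketch(X, theta) >= threshold).astype(int)

X = np.array([[1.0, -2.0], [1.0, 1.0], [1.0, 3.0]])  # bias column plus one feature
theta = np.array([[0.0], [1.0]])
probs = h_sketch(X, theta)
labels = predict_sketch(X, theta)
```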
-
touvlo.supv.lgx_rg.
predict_prob
(X, theta)[source]¶ Produces the probability that the entries belong to class 1.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- theta (numpy.array) – Column vector of model’s parameters.
Raises: ValueError
Returns: The probability that each entry belongs to class 1.
Return type: numpy.array
-
touvlo.supv.lgx_rg.
reg_cost_func
(X, y, theta, _lambda)[source]¶ Computes the regularized cost function J for Logistic Regression.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- y (numpy.array) – Column vector of expected values.
- theta (numpy.array) – Column vector of model’s parameters.
- _lambda (float) – The regularization hyperparameter.
Returns: Computed cost with regularization.
Return type:
-
touvlo.supv.lgx_rg.
reg_grad
(X, y, theta, _lambda)[source]¶ Computes the regularized gradient for Logistic Regression.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- y (numpy.array) – Column vector of expected values.
- theta (numpy.array) – Column vector of model’s parameters.
- _lambda (float) – The regularization hyperparameter.
Returns: Regularized gradient column vector.
Return type: numpy.array
Classification Neural Network routines¶
-
touvlo.supv.nn_clsf.
back_propagation
(y, theta, a, z, num_labels, n_hidden_layers=1)[source]¶ Applies back propagation to minimize model’s loss.
Parameters: - y (numpy.array) – Column vector of expected values.
- theta (numpy.array(numpy.array)) – array of model’s weight matrices by layer.
- a (numpy.array(numpy.array)) – array of activation matrices by layer.
- z (numpy.array(numpy.array)) – array of parameters prior to sigmoid by layer.
- num_labels (int) – Number of classes in multiclass classification.
- n_hidden_layers (int) – Number of hidden layers in network.
Returns: array of matrices of ‘error values’ by layer.
Return type: numpy.array(numpy.array)
-
touvlo.supv.nn_clsf.
cost_function
(X, y, theta, _lambda, num_labels, n_hidden_layers=1)[source]¶ Computes the cost function J for Neural Network.
Parameters: - X (numpy.array) – Features’ dataset.
- y (numpy.array) – Column vector of expected values.
- theta (numpy.array) – Column vector of model’s parameters.
- _lambda (float) – The regularization hyperparameter.
- num_labels (int) – Number of classes in multiclass classification.
- n_hidden_layers (int) – Number of hidden layers in network.
Returns: Computed cost.
Return type:
-
touvlo.supv.nn_clsf.
feed_forward
(X, theta, n_hidden_layers=1)[source]¶ Applies forward propagation to calculate model’s hypothesis.
Parameters: - X (numpy.array) – Features’ dataset.
- theta (numpy.array) – Column vector of model’s parameters.
- n_hidden_layers (int) – Number of hidden layers in network.
Returns: A 2-tuple consisting of an array of parameters prior to activation by layer and an array of activation matrices by layer.
Return type: (numpy.array(numpy.array), numpy.array(numpy.array))
-
touvlo.supv.nn_clsf.
grad
(X, y, nn_params, _lambda, input_layer_size, hidden_layer_size, num_labels, n_hidden_layers=1)[source]¶ Calculates gradient of neural network’s parameters.
Parameters: - X (numpy.array) – Features’ dataset.
- y (numpy.array) – Column vector of expected values.
- nn_params (numpy.array) – Column vector of model’s parameters.
- _lambda (float) – The regularization hyperparameter.
- input_layer_size (int) – Number of units in the input layer.
- hidden_layer_size (int) – Number of units in a hidden layer.
- num_labels (int) – Number of classes in multiclass classification.
- n_hidden_layers (int) – Number of hidden layers in network.
Returns: array of gradient values by weight matrix.
Return type: numpy.array(numpy.array)
-
touvlo.supv.nn_clsf.
h
(X, theta, n_hidden_layers=1)[source]¶ Classification Neural Network hypothesis.
Parameters: - X (numpy.array) – Features’ dataset.
- theta (numpy.array) – Column vector of model’s parameters.
- n_hidden_layers (int) – Number of hidden layers in network.
Returns: The probability that each entry belongs to class 1.
Return type: numpy.array
-
touvlo.supv.nn_clsf.
init_nn_weights
(input_layer_size, hidden_layer_size, num_labels, n_hidden_layers=1)[source]¶ Initialize the weight matrices of a network with random values.
Returns: array of weight matrices of random values.
Return type: numpy.array(numpy.array)
-
touvlo.supv.nn_clsf.
rand_init_weights
(L_in, L_out)[source]¶ Initializes weight matrix with random values.
Returns: Random values’ matrix of conforming dimensions.
Return type: numpy.array
-
touvlo.supv.nn_clsf.
unravel_params
(nn_params, input_layer_size, hidden_layer_size, num_labels, n_hidden_layers=1)[source]¶ Unravels flattened array into list of weight matrices
Parameters: - nn_params (numpy.array) – Row vector of model’s parameters.
- input_layer_size (int) – Number of units in the input layer.
- hidden_layer_size (int) – Number of units in a hidden layer.
- num_labels (int) – Number of classes in multiclass classification.
- n_hidden_layers (int) – Number of hidden layers in network.
Returns: array with model’s weight matrices.
Return type: numpy.array(numpy.array)
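One way to unravel a flat parameter vector is sketched below. The layer shapes (each matrix has one extra column for the bias unit) and the row-major reshape are assumptions; touvlo's actual memory layout is not stated in this reference, and `unravel_sketch` is a hypothetical name. A plain Python list is returned for simplicity.

```python
import numpy as np

def unravel_sketch(nn_params, input_layer_size, hidden_layer_size,
                   num_labels, n_hidden_layers=1):
    # Assumed shapes: (units in next layer, units in this layer + 1 bias).
    shapes = [(hidden_layer_size, input_layer_size + 1)]
    shapes += [(hidden_layer_size, hidden_layer_size + 1)] * (n_hidden_layers - 1)
    shapes.append((num_labels, hidden_layer_size + 1))

    flat = np.ravel(nn_params)
    matrices, start = [], 0
    for rows, cols in shapes:
        size = rows * cols
        matrices.append(flat[start:start + size].reshape(rows, cols))
        start += size
    return matrices

# 2 inputs, 3 hidden units, 2 labels: 3*3 + 2*4 = 17 parameters.
weights = unravel_sketch(np.arange(17.0), 2, 3, 2)
```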
Unsupervised learning¶
PCA¶
-
touvlo.unsupv.pca.
pca
(X)[source]¶ Runs Principal Component Analysis on dataset
Parameters: X (numpy.array) – Features’ dataset
Returns: A 2-tuple of U, eigenvectors of covariance matrix, and S, eigenvalues (on diagonal) of covariance matrix.
Return type: (numpy.array, numpy.array)
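A minimal sketch of PCA through the singular value decomposition of the covariance matrix. It assumes X has already been mean-centered and that the covariance is computed as XᵀX / m; `pca_sketch` is a hypothetical name, not touvlo's code.

```python
import numpy as np

def pca_sketch(X):
    m = X.shape[0]
    sigma = X.T @ X / m                 # covariance, assuming centered X
    U, s, _ = np.linalg.svd(sigma)      # columns of U are the eigenvectors
    return U, np.diag(s)                # eigenvalues placed on the diagonal

X = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])  # points on the line y = x
U, S = pca_sketch(X)
```

For this dataset all variance lies along the first component, so the second eigenvalue is numerically zero.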
-
touvlo.unsupv.pca.
project_data
(X, U, k)[source]¶ Computes reduced data representation (projected data)
Parameters: - X (numpy.array) – Normalized features’ dataset
- U (numpy.array) – eigenvectors of covariance matrix
- k (int) – Number of features in reduced data representation
Returns: Reduced data representation (projection)
Return type: numpy.array
-
touvlo.unsupv.pca.
recover_data
(Z, U, k)[source]¶ Recovers an approximation of original data using the projected data
Parameters: - Z (numpy.array) – Reduced data representation (projection)
- U (numpy.array) – eigenvectors of covariance matrix
- k (int) – Number of features in reduced data representation
Returns: Approximated features’ dataset
Return type: numpy.array
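Projection and recovery can be sketched as a pair of matrix products using the first k columns of U. The `_sketch` names and the hand-written orthonormal U are illustrative assumptions.

```python
import numpy as np

def project_sketch(X, U, k):
    return X @ U[:, :k]                 # (m, n) -> (m, k)

def recover_sketch(Z, U, k):
    return Z @ U[:, :k].T               # (m, k) -> (m, n)

# Orthonormal basis whose first column points along the line y = x.
U = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2.0)
X = np.array([[1.0, 1.0], [2.0, 2.0]])
Z = project_sketch(X, U, k=1)
X_rec = recover_sketch(Z, U, k=1)
```

Because these points lie exactly on the first component, the recovery is lossless here; in general it is only an approximation.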
K-means¶
-
touvlo.unsupv.kmeans.
compute_centroids
(X, idx, K)[source]¶ Computes centroids from the mean of its cluster’s members.
Parameters: - X (numpy.array) – Features’ dataset
- idx (numpy.array) – Column vector of assigned centroids’ indices.
- K (int) – Number of centroids.
Returns: Column vector of newly computed centroids
Return type: numpy.array
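The centroid update can be sketched as a per-cluster mean. `compute_centroids_sketch` is a hypothetical name; the sketch returns one row per centroid and does not handle empty clusters, whose mean would be undefined.

```python
import numpy as np

def compute_centroids_sketch(X, idx, K):
    idx = np.ravel(idx)                 # accept a column vector of indices
    # Mean of the members assigned to each cluster k.
    return np.array([X[idx == k].mean(axis=0) for k in range(K)])

X = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [12.0, 12.0]])
idx = np.array([[0], [0], [1], [1]])
centroids = compute_centroids_sketch(X, idx, K=2)
```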
-
touvlo.unsupv.kmeans.
cost_function
(X, idx, centroids)[source]¶ Calculates the cost function for K means.
Parameters: - X (numpy.array) – Features’ dataset
- idx (numpy.array) – Column vector of assigned centroids’ indices.
Returns: Computed cost
Return type:
-
touvlo.unsupv.kmeans.
elbow_method
(X, K_values, max_iters, n_inits)[source]¶ Calculates the cost for each given K.
Returns: A 2-tuple of K_values, a list of possible numbers of centroids, and cost_values, a computed cost for each K.
Return type:
-
touvlo.unsupv.kmeans.
euclidean_dist
(p, q)[source]¶ Calculates Euclidean distance between 2 n-dimensional points.
Parameters: - p (numpy.array) – First n-dimensional point.
- q (numpy.array) – Second n-dimensional point.
Returns: Distance between 2 points.
Return type:
-
touvlo.unsupv.kmeans.
find_closest_centroids
(X, initial_centroids)[source]¶ Assigns to each example the index of the closest centroid.
Parameters: - X (numpy.array) – Features’ dataset
- initial_centroids (numpy.array) – List of initialized centroids.
Returns: Column vector of assigned centroids’ indices.
Return type: numpy.array
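The assignment step can be sketched with NumPy broadcasting. This assumes squared Euclidean distance and zero-based centroid indices; `find_closest_sketch` is a hypothetical name.

```python
import numpy as np

def find_closest_sketch(X, centroids):
    # (m, 1, n) - (1, K, n) broadcasts to (m, K, n); summing over the last
    # axis gives the squared distance from every example to every centroid.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape(-1, 1)   # column vector of indices

X = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [12.0, 12.0]])
centroids = np.array([[1.0, 1.0], [11.0, 11.0]])
idx = find_closest_sketch(X, centroids)
```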
-
touvlo.unsupv.kmeans.
init_centroids
(X, K)[source]¶ Initializes centroids by picking them at random from the dataset.
Parameters: - X (numpy.array) – Features’ dataset
- K (int) – Number of centroids.
Returns: Column vector of centroids randomly picked from dataset
Return type: numpy.array
-
touvlo.unsupv.kmeans.
run_intensive_kmeans
(X, K, max_iters, n_inits)[source]¶ Applies kmeans using multiple random initializations.
Returns: A 2-tuple of centroids, a column vector of centroids, and idx, a column vector of assigned centroids’ indices.
Return type: (numpy.array, numpy.array)
Anomaly Detection¶
-
touvlo.unsupv.anmly_detc.
cov_matrix
(X, mu)[source]¶ Calculates the covariance matrix for matrix X (m x n).
Parameters: - X (numpy.array) – Features’ dataset.
- mu (numpy.array) – Mean of each feature/column of X.
Returns: Covariance matrix (n x n)
Return type:
-
touvlo.unsupv.anmly_detc.
estimate_multi_gaussian
(X)[source]¶ Estimates parameters for Multivariate Gaussian distribution.
Parameters: X (numpy.array) – Features’ dataset.
Returns: A 2-tuple of mu, the mean of each feature/column of X, and sigma, the covariance matrix for X.
Return type: (numpy.array, numpy.array)
-
touvlo.unsupv.anmly_detc.
estimate_uni_gaussian
(X)[source]¶ Estimates parameters for Univariate Gaussian distribution.
Parameters: X (numpy.array) – Features’ dataset.
Returns: A 2-tuple of mu, the mean of each feature/column of X, and sigma2, the variance of each feature/column of X.
Return type: (numpy.array, numpy.array)
-
touvlo.unsupv.anmly_detc.
is_anomaly
(p, threshold=0.5)[source]¶ Predicts whether a probability falls into class 1 (anomaly).
Parameters: - p (numpy.array) – Probability that example belongs to class 1 (is anomaly).
- threshold (float) – point below which an example is considered of class 1.
Returns: Binary value to denote class 1 or 0
Return type:
-
touvlo.unsupv.anmly_detc.
multi_gaussian
(X, mu, sigma)[source]¶ Estimates probability that examples belong to Multivariate Gaussian.
Parameters: - X (numpy.array) – Features’ dataset.
- mu (numpy.array) – Mean of each feature/column of X.
- sigma (numpy.array) – Covariance matrix for X.
Returns: Probability density function for each example
Return type: numpy.array
-
touvlo.unsupv.anmly_detc.
predict
(X, epsilon, gaussian, **kwargs)[source]¶ Predicts whether examples are anomalies.
Parameters: - X (numpy.array) – Features’ dataset.
- epsilon (float) – point below which an example is considered of class 1.
- gaussian (Callable) – Function that estimates membership probability.
Returns: Column vector of classification
Return type: numpy.array
-
touvlo.unsupv.anmly_detc.
uni_gaussian
(X, mu, sigma2)[source]¶ Estimates probability that examples belong to Univariate Gaussian.
Parameters: - X (numpy.array) – Features’ dataset.
- mu (numpy.array) – Mean of each feature/column of X.
- sigma2 (numpy.array) – Variance of each feature/column of X.
Returns: Probability density function for each example
Return type: numpy.array
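Parameter estimation and density evaluation for the univariate case can be sketched as below. The sketch assumes the population variance (division by m), treats features as independent so that per-feature densities are multiplied, and returns a column vector; these are assumptions, and the `_sketch` names are hypothetical.

```python
import numpy as np

def estimate_uni_sketch(X):
    return X.mean(axis=0), X.var(axis=0)     # per-feature mean and variance

def uni_gaussian_sketch(X, mu, sigma2):
    # Gaussian density of each feature, multiplied across features.
    dens = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return dens.prod(axis=1).reshape(-1, 1)

X = np.array([[0.0], [2.0]])
mu, sigma2 = estimate_uni_sketch(X)                         # mean 1, variance 1
p = uni_gaussian_sketch(np.array([[1.0]]), mu, sigma2)      # density at the mean
```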
Recommender Systems¶
Collaborative Filtering¶
-
touvlo.rec_sys.cf.
cost_function
(X, Y, R, theta, _lambda)[source]¶ Computes the cost function J for Collaborative Filtering.
Parameters: - X (numpy.array) – Matrix of product features.
- Y (numpy.array) – Scores’ matrix.
- R (numpy.array) – Matrix of 0s and 1s (whether there’s a rating).
- theta (numpy.array) – Matrix of user features.
- _lambda (float) – The regularization hyperparameter.
Returns: Computed cost.
Return type:
-
touvlo.rec_sys.cf.
grad
(params, Y, R, num_users, num_products, num_features, _lambda)[source]¶ Calculates gradient of Collaborative Filtering’s parameters
Parameters: - params (numpy.array) – flattened product and user features.
- Y (numpy.array) – Scores’ matrix.
- R (numpy.array) – Matrix of 0s and 1s (whether there’s a rating).
- num_users (int) – Number of users in this instance.
- num_products (int) – Number of products in this instance.
- num_features (int) – Number of features in this instance.
- _lambda (float) – The regularization hyperparameter.
Returns: Flattened gradient of product and user parameters.
Return type: numpy.array
Utils¶
-
touvlo.utils.
BGD
(X, y, grad, initial_theta, alpha, num_iters, **kwargs)[source]¶ Performs parameter optimization via Batch Gradient Descent.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- y (numpy.array) – Column vector of expected values.
- grad (Callable) – Routine that generates the partial derivatives given theta.
- initial_theta (numpy.array) – Initial value for parameters to be optimized.
- alpha (float) – Learning rate or step size of the optimization.
- num_iters (int) – Number of times the optimization will be performed.
Returns: Optimized model parameters.
Return type: numpy.array
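Batch gradient descent can be sketched as a fixed number of full-dataset updates. The calling convention `grad(X, y, theta, **kwargs)` is an assumption about how the routine is invoked, and `bgd_sketch` and `lin_grad` are hypothetical names used for illustration.

```python
import numpy as np

def bgd_sketch(X, y, grad, initial_theta, alpha, num_iters, **kwargs):
    theta = initial_theta.astype(float)      # astype returns a copy
    for _ in range(num_iters):
        # One update per iteration, using the gradient over the full dataset.
        theta = theta - alpha * grad(X, y, theta, **kwargs)
    return theta

def lin_grad(X, y, theta):
    # Gradient of the half mean squared error for linear regression.
    return X.T @ (X @ theta - y) / y.shape[0]

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])          # generated by y = 1 + 2x
theta = bgd_sketch(X, y, lin_grad, np.zeros((2, 1)), alpha=0.1, num_iters=2000)
```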
-
touvlo.utils.
MBGD
(X, y, grad, initial_theta, alpha, num_iters, b, **kwargs)[source]¶ Performs parameter optimization via Mini-Batch Gradient Descent.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- y (numpy.array) – Column vector of expected values.
- grad (Callable) – Routine that generates the partial derivatives given theta.
- initial_theta (numpy.array) – Initial value for parameters to be optimized.
- alpha (float) – Learning rate or step size of the optimization.
- num_iters (int) – Number of times the optimization will be performed.
- b (int) – Number of examples in mini batch.
Returns: Optimized model parameters.
Return type: numpy.array
-
touvlo.utils.
SGD
(X, y, grad, initial_theta, alpha, num_iters, **kwargs)[source]¶ Performs parameter optimization via Stochastic Gradient Descent.
Parameters: - X (numpy.array) – Features’ dataset plus bias column.
- y (numpy.array) – Column vector of expected values.
- grad (Callable) – Routine that generates the partial derivatives given theta.
- initial_theta (numpy.array) – Initial value for parameters to be optimized.
- alpha (float) – Learning rate or step size of the optimization.
- num_iters (int) – Number of times the optimization will be performed.
Returns: Optimized model parameters.
Return type: numpy.array
-
touvlo.utils.
feature_normalize
(X)[source]¶ Performs Z score normalization in a numeric dataset.
Parameters: X (numpy.array) – Features’ dataset plus bias column.
Returns: A 3-tuple of X_norm, normalized features’ dataset, mu, mean of each feature, and sigma, standard deviation of each feature.
Return type: (numpy.array, numpy.array, numpy.array)
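Z score normalization can be sketched as below. The sketch assumes the population standard deviation and is applied to raw features without a constant column, since a zero standard deviation would cause a division by zero here; `feature_normalize_sketch` is a hypothetical name.

```python
import numpy as np

def feature_normalize_sketch(X):
    mu = X.mean(axis=0)                  # per-feature mean
    sigma = X.std(axis=0)                # per-feature standard deviation
    return (X - mu) / sigma, mu, sigma

X = np.array([[1.0, 10.0], [3.0, 30.0]])
X_norm, mu, sigma = feature_normalize_sketch(X)
```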
-
touvlo.utils.
g
(x)[source]¶ This function applies the sigmoid function on a given value.
Parameters: x (obj) – Input value or object containing value.
Returns: Sigmoid function at value.
Return type: obj
-
touvlo.utils.
g_grad
(x)[source]¶ This function calculates the sigmoid gradient at a given value.
Parameters: x (obj) – Input value or object containing value.
Returns: Sigmoid gradient at value.
Return type: obj
-
touvlo.utils.
mean_normlztn
(Y, R)[source]¶ Performs mean normalization in a numeric dataset.
Parameters: - Y (numpy.array) – Scores’ dataset.
- R (numpy.array) – Dataset of 0s and 1s (whether there’s a rating).
Returns: A 2-tuple of Y_norm, normalized scores’ dataset (row wise), and Y_mean, column vector of calculated means.
Return type: (numpy.array, numpy.array)
-
touvlo.utils.
numerical_grad
(J, theta, err)[source]¶ Numerically calculates the gradient of a given cost function.
Parameters: - J (Callable) – Function handle that computes cost given theta.
- theta (numpy.array) – Model parameters.
- err (float) – distance between points where J is evaluated.
Returns: Computed numeric gradient.
Return type: numpy.array
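A numerical gradient is typically used to check an analytic one. The sketch below uses central differences and assumes `err` is the perturbation applied on each side of theta; that reading of `err`, and the name `numerical_grad_sketch`, are assumptions.

```python
import numpy as np

def numerical_grad_sketch(J, theta, err):
    theta = theta.astype(float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step.flat[i] = err               # perturb one parameter at a time
        grad.flat[i] = (J(theta + step) - J(theta - step)) / (2 * err)
    return grad

def J(theta):
    return float(np.sum(theta ** 2))     # analytic gradient is 2 * theta

theta = np.array([[1.0], [-2.0]])
num_grad = numerical_grad_sketch(J, theta, err=1e-4)
```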