Welcome to MetaHeuristics documentation!
This project provides implementations of metaheuristics for anyone who wishes to use them in feature selection.
API Reference
The API reference provides detailed descriptions of the MetaHeuristic classes and functions. It should be helpful if you plan to integrate these algorithms into your data mining workflow.
HarmonicSearch
This algorithm was implemented following the article:
X. Z. Gao, V. Govindasamy, H. Xu, X. Wang, and K. Zenger, “Harmony Search Method: Theory and Applications,” Computational Intelligence and Neuroscience, vol. 2015, Article ID 258491, 10 pages, 2015. doi:10.1155/2015/258491
class feature_selection.harmonic_search.HarmonicSearch(classifier=None, HMCR=0.95, indpb=0.05, pitch=0.05, number_gen=100, size_pop=50, verbose=0, repeat=1, make_logbook=False, random_state=None, parallel=False, cv_metric_fuction=None, features_metric_function=None)
Implementation of a Harmony Search algorithm for feature selection.
Parameters:
- HMCR : float in [0,1], (default=0.95)
The Harmony Memory Considering Rate
- indpb : float in [0,1], (default=0.05)
The mutation rate of each new harmony
- pitch : float in [0,1], (default=0.05)
The Pitch Adjustment factor
- number_gen : positive integer, (default=100)
Number of generations
- size_pop : positive integer, (default=50)
Size of the Harmony Memory
- verbose : boolean, (default=False)
If True, print information in every generation
- repeat : positive int, (default=1)
Number of times to repeat the fitting process
- parallel : boolean, (default=False)
Set to True to use multiple processes
- make_logbook : boolean, (default=False)
If True, a logbook from DEAP will be made
- cv_metric_fuction : callable, (default=matthews_corrcoef)
A metric score function as described in the scikit-learn documentation: http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
- features_metric_function : callable, (default=pow(sum(mask)/(len(mask)*5), 2))
A function that returns a float from the binary mask of features
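To make the roles of HMCR and pitch more concrete, the following is a minimal sketch of how one new harmony (a binary feature mask) could be improvised from the harmony memory. It follows the general Harmony Search scheme and is not the code used by HarmonicSearch: the function name improvise_harmony is hypothetical, pitch adjustment is assumed to be a bit flip, and indpb is not modeled.

```python
import random


def improvise_harmony(memory, hmcr=0.95, pitch=0.05, rng=None):
    """Build one new binary harmony from a harmony memory (list of 0/1 lists)."""
    rng = rng or random.Random()
    n_features = len(memory[0])
    new_harmony = []
    for j in range(n_features):
        if rng.random() < hmcr:
            # Memory consideration: copy bit j from a randomly chosen stored harmony.
            bit = rng.choice(memory)[j]
            if rng.random() < pitch:
                # Pitch adjustment (assumed here to be a bit flip for binary masks).
                bit = 1 - bit
        else:
            # Random selection: draw the bit uniformly from {0, 1}.
            bit = rng.randint(0, 1)
        new_harmony.append(bit)
    return new_harmony


memory = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
harmony = improvise_harmony(memory, rng=random.Random(0))
print(harmony)
```

With hmcr=1.0 and pitch=0.0 every bit is copied unchanged from the memory, which is a quick way to check the sketch.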
Methods
- fit([X, y, normalize]): Fit method.
- fit_transform(X, y[, normalize]): Fit to data, then transform it.
- get_params([deep]): Get parameters for this estimator.
- get_support([indices]): Get a mask, or integer index, of the features selected.
- plot_results(): This method plots all the statistics for each repetition in a graph.
- safe_mask(x, mask): Return a mask which is safe to use on X.
- score(X, y[, sample_weight]): Returns the mean accuracy on the given test data and labels.
- score_func_to_gridsearch(estimator[, …]): Function to be given as a scorer function to Grid Search Method.
- set_params(**params): Set the parameters of this estimator.
- transform(X[, mask]): Reduce X to the selected features.
- adaptative_binary_mutation
- predict
fit(X=None, y=None, normalize=False, **arg)
Fit method
Parameters:
- X : array of shape [n_samples, n_features]
The input samples
- y : array of shape [n_samples, 1]
The input labels
- normalize : boolean, (default=False)
If True, StandardScaler will be applied to X
- **arg : parameters
Set parameters
fit_transform(X, y, normalize=False, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
- X : numpy array of shape [n_samples, n_features]
Training set.
- y : numpy array of shape [n_samples]
Target values.
Returns:
- X_new : numpy array of shape [n_samples, n_features_new]
Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters:
- deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
- params : mapping of string to any
Parameter names mapped to their values.
get_support(indices=False)
Get a mask, or integer index, of the features selected.
Parameters:
- indices : boolean, (default=False)
If True, the return value will be an array of integers, rather than a boolean mask.
Returns:
- support : array
An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
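The relationship between the two return forms can be illustrated with plain Python lists. This is only a sketch of the mask/index correspondence described above; mask_to_indices is a hypothetical helper, not part of the library.

```python
def mask_to_indices(mask):
    """Convert a boolean feature mask into the integer indices of the kept features."""
    return [i for i, keep in enumerate(mask) if keep]


mask = [True, False, True, True, False]  # one entry per input feature
indices = mask_to_indices(mask)          # one entry per selected feature

# Either form selects the same columns from a sample.
row = [5.1, 3.5, 1.4, 0.2, 9.9]
selected = [row[i] for i in indices]
print(indices, selected)
```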
plot_results()
This method plots all the statistics for each repetition in a graph.
The curves are the minimum, average and maximum accuracy.
static safe_mask(x, mask)
Return a mask which is safe to use on X.
Parameters:
- X : {array-like, sparse matrix}
Data on which to apply mask.
- mask : array
Mask to be used on X.
Returns:
- mask
score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
Parameters:
- X : array-like, shape = (n_samples, n_features)
Test samples.
- y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like, shape = [n_samples], optional
Sample weights.
Returns:
- score : float
Mean accuracy of self.predict(X) wrt. y.
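As a small worked example of the metric described above, mean accuracy is the fraction of samples whose prediction matches the true label exactly. In the multi-label case each sample's whole label set must match, which is why it is called a harsh metric. The helper mean_accuracy below is illustrative and ignores sample_weight.

```python
def mean_accuracy(y_true, y_pred):
    """Fraction of samples whose prediction equals the true label (or label set)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Single-label case: 3 of 4 predictions are correct.
single = mean_accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"])

# Multi-label case: the second sample gets one of its two labels wrong,
# so the whole sample counts as an error (subset accuracy).
multi = mean_accuracy([(1, 0), (0, 1), (1, 1)], [(1, 0), (0, 0), (1, 1)])
print(single, multi)
```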
static score_func_to_gridsearch(estimator, X_test=None, y_test=None)
Function to be given as a scorer function to Grid Search Method. It transforms the matrix of predictions generated by the ‘all’ option into a final accuracy score. Use a high value for CV.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each component of a nested object.
Returns:
- self
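The nested naming convention can be read as "everything before the first double underscore names the component, the rest names its parameter". The sketch below only illustrates that reading; split_nested_param is a hypothetical helper, not scikit-learn's implementation, and the name classifier__C assumes the wrapped classifier exposes a parameter C.

```python
def split_nested_param(name):
    """Split '<component>__<parameter>' at the first double underscore."""
    component, sep, parameter = name.partition("__")
    if not sep:
        # No double underscore: the name refers to the estimator itself.
        return None, name
    return component, parameter


print(split_nested_param("classifier__C"))
print(split_nested_param("number_gen"))
```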
transform(X, mask=None)
Reduce X to the selected features.
Parameters:
- X : array of shape [n_samples, n_features]
The input samples.
Returns:
- X_r : array of shape [n_samples, n_selected_features]
The input samples with only the selected features.
GeneticAlgorithm
Basic GA
class feature_selection.genetic_algorithm.GeneticAlgorithm(classifier=None, cross_over_prob=0.5, individual_mut_prob=0.05, gene_mutation_prob=0.05, number_gen=10, size_pop=40, verbose=0, repeat=1, make_logbook=False, random_state=None, parallel=False, cv_metric_fuction=None, features_metric_function=None)
Implementation of a Genetic Algorithm for Feature Selection
Parameters:
- classifier : sklearn classifier, (default=SVM)
Any classifier that adheres to the scikit-learn API
- cross_over_prob : float in [0,1], (default=0.5)
Probability of a crossover occurring in an individual (chromosome)
- individual_mut_prob : float in [0,1], (default=0.05)
Probability of a mutation occurring in an individual (chromosome)
- gene_mutation_prob : float in [0,1], (default=0.05)
For each gene in the individual (chromosome) chosen for mutation, the probability of it being mutated
- number_gen : positive integer, (default=10)
Number of generations
- size_pop : positive integer, (default=40)
Number of individuals (chromosomes) in the population
- verbose : boolean, (default=False)
If True, print information in every generation
- repeat : positive int, (default=1)
Number of times to repeat the fitting process
- make_logbook : boolean, (default=False)
If True, a logbook from DEAP will be made
- parallel : boolean, (default=False)
Set to True to use multiple processes
- cv_metric_fuction : callable, (default=matthews_corrcoef)
A metric score function as described in the scikit-learn documentation: http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
- features_metric_function : callable, (default=pow(sum(mask)/(len(mask)*5), 2))
A function that returns a float from the binary mask of features
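The two mutation probabilities act at different levels, which the following sketch tries to make explicit: an individual is first chosen for mutation with one probability, and then each of its genes is flipped with the other. This is an illustration of the parameter descriptions above, not the library's operator; the function mutate is hypothetical.

```python
import random


def mutate(individual, individual_mut_prob=0.05, gene_mutation_prob=0.05, rng=None):
    """Two-level bit-flip mutation on a binary chromosome (list of 0/1)."""
    rng = rng or random.Random()
    if rng.random() >= individual_mut_prob:
        # The individual was not chosen for mutation: return an unchanged copy.
        return list(individual)
    # The individual was chosen: flip each gene independently.
    return [1 - g if rng.random() < gene_mutation_prob else g for g in individual]


print(mutate([1, 0, 1, 1, 0], individual_mut_prob=1.0, gene_mutation_prob=0.5,
             rng=random.Random(0)))
```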
Methods
- fit([X, y, normalize]): Fit method.
- fit_transform(X, y[, normalize]): Fit to data, then transform it.
- get_params([deep]): Get parameters for this estimator.
- get_support([indices]): Get a mask, or integer index, of the features selected.
- plot_results(): This method plots all the statistics for each repetition in a graph.
- safe_mask(x, mask): Return a mask which is safe to use on X.
- score(X, y[, sample_weight]): Returns the mean accuracy on the given test data and labels.
- score_func_to_gridsearch(estimator[, …]): Function to be given as a scorer function to Grid Search Method.
- set_params(**params): Set the parameters of this estimator.
- transform(X[, mask]): Reduce X to the selected features.
- adaptative_binary_mutation
- predict
fit(X=None, y=None, normalize=False, **arg)
Fit method
Parameters:
- X : array of shape [n_samples, n_features]
The input samples
- y : array of shape [n_samples, 1]
The input labels
- normalize : boolean, (default=False)
If True, StandardScaler will be applied to X
- **arg : parameters
Set parameters
fit_transform(X, y, normalize=False, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
- X : numpy array of shape [n_samples, n_features]
Training set.
- y : numpy array of shape [n_samples]
Target values.
Returns:
- X_new : numpy array of shape [n_samples, n_features_new]
Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters:
- deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
- params : mapping of string to any
Parameter names mapped to their values.
get_support(indices=False)
Get a mask, or integer index, of the features selected.
Parameters:
- indices : boolean, (default=False)
If True, the return value will be an array of integers, rather than a boolean mask.
Returns:
- support : array
An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
plot_results()
This method plots all the statistics for each repetition in a graph.
The curves are the minimum, average and maximum accuracy.
static safe_mask(x, mask)
Return a mask which is safe to use on X.
Parameters:
- X : {array-like, sparse matrix}
Data on which to apply mask.
- mask : array
Mask to be used on X.
Returns:
- mask
score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
Parameters:
- X : array-like, shape = (n_samples, n_features)
Test samples.
- y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like, shape = [n_samples], optional
Sample weights.
Returns:
- score : float
Mean accuracy of self.predict(X) wrt. y.
static score_func_to_gridsearch(estimator, X_test=None, y_test=None)
Function to be given as a scorer function to Grid Search Method. It transforms the matrix of predictions generated by the ‘all’ option into a final accuracy score. Use a high value for CV.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each component of a nested object.
Returns:
- self
transform(X, mask=None)
Reduce X to the selected features.
Parameters:
- X : array of shape [n_samples, n_features]
The input samples.
Returns:
- X_r : array of shape [n_samples, n_selected_features]
The input samples with only the selected features.
RandomSearch
A simple dummy algorithm, useful for testing the worst-case scenario in algorithms with initial populations.
class feature_selection.random_search.RandomSearch(classifier=None, number_gen=5, size_pop=40, verbose=0, repeat=1, parallel=False, make_logbook=False, random_state=None, cv_metric_fuction=None, features_metric_function=None)
Implementation of a Random Search Algorithm for Feature Selection. It is useful as a worst-case baseline.
Parameters:
- number_gen : positive integer, (default=5)
Number of generations
- size_pop : positive integer, (default=40)
Number of random samples in each iteration
- verbose : boolean, (default=False)
If True, print information in every generation
- repeat : positive int, (default=1)
Number of times to repeat the fitting process
- parallel : boolean, (default=False)
Set to True to use multiple processes
- make_logbook : boolean, (default=False)
If True, a logbook from DEAP will be made
- cv_metric_fuction : callable, (default=matthews_corrcoef)
A metric score function as described in the scikit-learn documentation: http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
- features_metric_function : callable, (default=pow(sum(mask)/(len(mask)*5), 2))
A function that returns a float from the binary mask of features
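A random search baseline of this kind can be sketched in a few lines: draw size_pop random masks in each of number_gen iterations and keep the best one seen. The code below is a simplified illustration under those assumptions, not the RandomSearch implementation; random_search and toy_score are hypothetical names, and toy_score stands in for a cross-validated classifier score.

```python
import random


def random_search(score, n_features, number_gen=5, size_pop=40, rng=None):
    """Sample random binary masks and return the best (mask, score) pair found."""
    rng = rng or random.Random()
    best_mask, best_score = None, float("-inf")
    for _ in range(number_gen):
        for _ in range(size_pop):
            mask = [rng.randint(0, 1) for _ in range(n_features)]
            value = score(mask)
            if value > best_score:
                best_mask, best_score = mask, value
    return best_mask, best_score


def toy_score(mask):
    """Toy objective: reward features 0 and 2, penalize the number of features."""
    return mask[0] + mask[2] - 0.1 * sum(mask)


best_mask, best_score = random_search(toy_score, n_features=4, rng=random.Random(0))
print(best_mask, best_score)
```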
Methods
- fit([X, y, normalize]): Fit method.
- fit_transform(X, y[, normalize]): Fit to data, then transform it.
- get_params([deep]): Get parameters for this estimator.
- get_support([indices]): Get a mask, or integer index, of the features selected.
- plot_results(): This method plots all the statistics for each repetition in a graph.
- safe_mask(x, mask): Return a mask which is safe to use on X.
- score(X, y[, sample_weight]): Returns the mean accuracy on the given test data and labels.
- score_func_to_gridsearch(estimator[, …]): Function to be given as a scorer function to Grid Search Method.
- set_params(**params): Set the parameters of this estimator.
- transform(X[, mask]): Reduce X to the selected features.
- adaptative_binary_mutation
- predict
fit(X=None, y=None, normalize=False, **arg)
Fit method
Parameters:
- X : array of shape [n_samples, n_features]
The input samples
- y : array of shape [n_samples, 1]
The input labels
- normalize : boolean, (default=False)
If True, StandardScaler will be applied to X
- **arg : parameters
Set parameters
fit_transform(X, y, normalize=False, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
- X : numpy array of shape [n_samples, n_features]
Training set.
- y : numpy array of shape [n_samples]
Target values.
Returns:
- X_new : numpy array of shape [n_samples, n_features_new]
Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters:
- deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
- params : mapping of string to any
Parameter names mapped to their values.
get_support(indices=False)
Get a mask, or integer index, of the features selected.
Parameters:
- indices : boolean, (default=False)
If True, the return value will be an array of integers, rather than a boolean mask.
Returns:
- support : array
An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
plot_results()
This method plots all the statistics for each repetition in a graph.
The curves are the minimum, average and maximum accuracy.
static safe_mask(x, mask)
Return a mask which is safe to use on X.
Parameters:
- X : {array-like, sparse matrix}
Data on which to apply mask.
- mask : array
Mask to be used on X.
Returns:
- mask
score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
Parameters:
- X : array-like, shape = (n_samples, n_features)
Test samples.
- y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like, shape = [n_samples], optional
Sample weights.
Returns:
- score : float
Mean accuracy of self.predict(X) wrt. y.
static score_func_to_gridsearch(estimator, X_test=None, y_test=None)
Function to be given as a scorer function to Grid Search Method. It transforms the matrix of predictions generated by the ‘all’ option into a final accuracy score. Use a high value for CV.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each component of a nested object.
Returns:
- self
transform(X, mask=None)
Reduce X to the selected features.
Parameters:
- X : array of shape [n_samples, n_features]
The input samples.
Returns:
- X_r : array of shape [n_samples, n_selected_features]
The input samples with only the selected features.
Binary Black Hole Algorithm
This algorithm was implemented following the article:
Elnaz Pashaei, Nizamettin Aydin, Binary black hole algorithm for feature selection and classification on biological data, Applied Soft Computing, Volume 56, 2017, Pages 94-106, ISSN 1568-4946,
(http://www.sciencedirect.com/science/article/pii/S1568494617301242)
class feature_selection.binary_black_hole.BinaryBlackHole(classifier=None, number_gen=10, size_pop=40, verbose=False, repeat=1, make_logbook=False, random_state=None, parallel=False, cv_metric_fuction=None, features_metric_function=None)
Implementation of Binary Black Hole for Feature Selection
Parameters:
- classifier : sklearn classifier, (default=SVM)
Any classifier that adheres to the scikit-learn API
- number_gen : positive integer, (default=10)
Number of generations
- size_pop : positive integer, (default=40)
Number of individuals in the population
- verbose : boolean, (default=False)
If True, print information in every generation
- repeat : positive int, (default=1)
Number of times to repeat the fitting process
- make_logbook : boolean, (default=False)
If True, a logbook from DEAP will be made
- parallel : boolean, (default=False)
Set to True to use multiple processes
- cv_metric_fuction : callable, (default=matthews_corrcoef)
A metric score function as described in the scikit-learn documentation: http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
- features_metric_function : callable, (default=pow(sum(mask)/(len(mask)*5), 2))
A function that returns a float from the binary mask of features
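The default shown for features_metric_function can be evaluated by hand to see how it grows with the number of selected features. The sketch below simply wraps the documented expression in a function; the name default_features_metric is hypothetical, and how the library combines this value with the cross-validation score is not stated here.

```python
def default_features_metric(mask):
    """The documented default: pow(sum(mask) / (len(mask) * 5), 2)."""
    return pow(sum(mask) / (len(mask) * 5), 2)


none_selected = default_features_metric([0] * 10)            # (0/50)^2
half_selected = default_features_metric([1] * 5 + [0] * 5)   # (5/50)^2
all_selected = default_features_metric([1] * 10)             # (10/50)^2
print(none_selected, half_selected, all_selected)
```

The value stays small (about 0.04 even when every feature is selected) and grows quadratically with the fraction of features kept.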
Methods
- fit([X, y, normalize]): Fit method.
- fit_transform(X, y[, normalize]): Fit to data, then transform it.
- get_params([deep]): Get parameters for this estimator.
- get_support([indices]): Get a mask, or integer index, of the features selected.
- plot_results(): This method plots all the statistics for each repetition in a graph.
- safe_mask(x, mask): Return a mask which is safe to use on X.
- score(X, y[, sample_weight]): Returns the mean accuracy on the given test data and labels.
- score_func_to_gridsearch(estimator[, …]): Function to be given as a scorer function to Grid Search Method.
- set_params(**params): Set the parameters of this estimator.
- transform(X[, mask]): Reduce X to the selected features.
- adaptative_binary_mutation
- predict
fit(X=None, y=None, normalize=False, **arg)
Fit method
Parameters:
- X : array of shape [n_samples, n_features]
The input samples
- y : array of shape [n_samples, 1]
The input labels
- normalize : boolean, (default=False)
If True, StandardScaler will be applied to X
- **arg : parameters
Set parameters
fit_transform(X, y, normalize=False, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
- X : numpy array of shape [n_samples, n_features]
Training set.
- y : numpy array of shape [n_samples]
Target values.
Returns:
- X_new : numpy array of shape [n_samples, n_features_new]
Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters:
- deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
- params : mapping of string to any
Parameter names mapped to their values.
get_support(indices=False)
Get a mask, or integer index, of the features selected.
Parameters:
- indices : boolean, (default=False)
If True, the return value will be an array of integers, rather than a boolean mask.
Returns:
- support : array
An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
plot_results()
This method plots all the statistics for each repetition in a graph.
The curves are the minimum, average and maximum accuracy.
static safe_mask(x, mask)
Return a mask which is safe to use on X.
Parameters:
- X : {array-like, sparse matrix}
Data on which to apply mask.
- mask : array
Mask to be used on X.
Returns:
- mask
score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
Parameters:
- X : array-like, shape = (n_samples, n_features)
Test samples.
- y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like, shape = [n_samples], optional
Sample weights.
Returns:
- score : float
Mean accuracy of self.predict(X) wrt. y.
static score_func_to_gridsearch(estimator, X_test=None, y_test=None)
Function to be given as a scorer function to Grid Search Method. It transforms the matrix of predictions generated by the ‘all’ option into a final accuracy score. Use a high value for CV.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each component of a nested object.
Returns:
- self
transform(X, mask=None)
Reduce X to the selected features.
Parameters:
- X : array of shape [n_samples, n_features]
The input samples.
Returns:
- X_r : array of shape [n_samples, n_selected_features]
The input samples with only the selected features.
Simulated Annealing Algorithm
Implementation of a Simulated Annealing algorithm for feature selection, as stated in the book: Fred W. Glover - Handbook of Metaheuristics.
class feature_selection.simulated_anneling.SimulatedAnneling(classifier=None, mutation_prob=0.05, initial_temp=10, repetition_schedule=10, number_gen=10, repeat=1, verbose=0, parallel=False, make_logbook=False, random_state=None, cv_metric_fuction=None, features_metric_function=None, **arg)
The decay of the temperature is given by initial_temp/number_gen.
Parameters:
- classifier : sklearn classifier, (default=SVM)
Any classifier that adheres to the scikit-learn API
- mutation_prob : float in [0,1], (default=0.05)
The probability for each value in the solution to be mutated when searching for a neighbor solution.
- number_gen : positive integer, (default=10)
Number of generations
- initial_temp : positive integer, (default=10)
The initial temperature
- verbose : boolean, (default=False)
If True, print information in every generation
- repeat : positive int, (default=1)
Number of times to repeat the fitting process
- parallel : boolean, (default=False)
Set to True to use multiple processes
- make_logbook : boolean, (default=False)
If True, a logbook from DEAP will be made
- cv_metric_fuction : callable, (default=matthews_corrcoef)
A metric score function as described in the scikit-learn documentation: http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
- features_metric_function : callable, (default=pow(sum(mask)/(len(mask)*5), 2))
A function that returns a float from the binary mask of features
- size_pop: None
It is needed to
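One way to read the temperature statement above is as a constant decrement of the initial temperature divided by the number of generations, applied once per generation. The sketch below shows that schedule together with a standard Metropolis acceptance rule. Both the linear reading and the acceptance rule are assumptions for illustration, and linear_temperatures and accept are hypothetical names, not library functions.

```python
import math
import random


def linear_temperatures(initial_temp=10, number_gen=10):
    """Temperatures for each generation, decreasing by initial_temp / number_gen."""
    step = initial_temp / number_gen
    return [initial_temp - k * step for k in range(number_gen)]


def accept(current_score, candidate_score, temperature, rng):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if candidate_score >= current_score:
        return True
    return rng.random() < math.exp((candidate_score - current_score) / temperature)


temps = linear_temperatures(initial_temp=10, number_gen=10)
print(temps)
print(accept(0.90, 0.85, temps[0], random.Random(0)))
```

At high temperature a slightly worse neighbor is accepted most of the time; as the temperature falls, the search becomes increasingly greedy.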
Methods
- fit([X, y, normalize]): Fit method.
- fit_transform(X, y[, normalize]): Fit to data, then transform it.
- get_params([deep]): Get parameters for this estimator.
- get_support([indices]): Get a mask, or integer index, of the features selected.
- plot_results(): This method plots all the statistics for each repetition in a graph.
- safe_mask(x, mask): Return a mask which is safe to use on X.
- score(X, y[, sample_weight]): Returns the mean accuracy on the given test data and labels.
- score_func_to_gridsearch(estimator[, …]): Function to be given as a scorer function to Grid Search Method.
- set_params(**params): Set the parameters of this estimator.
- transform(X[, mask]): Reduce X to the selected features.
- adaptative_binary_mutation
- predict
fit(X=None, y=None, normalize=False, **arg)
Fit method
Parameters:
- X : array of shape [n_samples, n_features]
The input samples
- y : array of shape [n_samples, 1]
The input labels
- normalize : boolean, (default=False)
If True, StandardScaler will be applied to X
- **arg : parameters
Set parameters
fit_transform(X, y, normalize=False, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
- X : numpy array of shape [n_samples, n_features]
Training set.
- y : numpy array of shape [n_samples]
Target values.
Returns:
- X_new : numpy array of shape [n_samples, n_features_new]
Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters:
- deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
- params : mapping of string to any
Parameter names mapped to their values.
get_support(indices=False)
Get a mask, or integer index, of the features selected.
Parameters:
- indices : boolean, (default=False)
If True, the return value will be an array of integers, rather than a boolean mask.
Returns:
- support : array
An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
plot_results()
This method plots all the statistics for each repetition in a graph.
The curves are the minimum, average and maximum accuracy.
static safe_mask(x, mask)
Return a mask which is safe to use on X.
Parameters:
- X : {array-like, sparse matrix}
Data on which to apply mask.
- mask : array
Mask to be used on X.
Returns:
- mask
score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
Parameters:
- X : array-like, shape = (n_samples, n_features)
Test samples.
- y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like, shape = [n_samples], optional
Sample weights.
Returns:
- score : float
Mean accuracy of self.predict(X) wrt. y.
static score_func_to_gridsearch(estimator, X_test=None, y_test=None)
Function to be given as a scorer function to Grid Search Method. It transforms the matrix of predictions generated by the ‘all’ option into a final accuracy score. Use a high value for CV.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each component of a nested object.
Returns:
- self
transform(X, mask=None)
Reduce X to the selected features.
Parameters:
- X : array of shape [n_samples, n_features]
The input samples.
Returns:
- X_r : array of shape [n_samples, n_selected_features]
The input samples with only the selected features.
General examples
Introductory examples.
Using Pickle to save models
You can use pickle to save your model.
    from feature_selection import HarmonicSearch
    from sklearn.datasets import load_breast_cancer
    from six.moves import cPickle

    # Load the data; y holds the class names rather than the integer targets
    dataset = load_breast_cancer()
    X, y = dataset['data'], dataset['target_names'].take(dataset['target'])

    # Fit the feature selector
    hs = HarmonicSearch(random_state=0, make_logbook=True, repeat=2)
    hs.fit(X, y, normalize=True)

    # Save the fitted model to disk
    file = "HarmonicSearch"
    with open(file + '.save', 'wb') as f:
        cPickle.dump(hs, f, protocol=cPickle.HIGHEST_PROTOCOL)
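A saved file can be read back with the matching load call. The round trip below is a self-contained sketch using the standard pickle module and a plain dictionary as a stand-in for a fitted selector, so it runs without the library; the dictionary contents and the temporary path are purely illustrative.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted selector object (illustrative values only).
model = {"best_mask": [1, 0, 1], "fitness": 0.98}

path = os.path.join(tempfile.mkdtemp(), "HarmonicSearch.save")

# Save the object in binary mode with the highest available protocol.
with open(path, "wb") as f:
    pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

# Load it back; the restored object compares equal to the original.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored)
```

As with any pickle file, only load files that come from a trusted source.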
Plotting MetaHeuristics - Basic Use
An example plot of feature_selection.HarmonicSearch.
Out:
    Number of Features Selected:
        HS: 0.6666666666666666 % GA: 0.5666666666666667 %
    Accuracy of the classifier:
        HS: 0.9807156598691804 GA: 0.9806848787995384

    from feature_selection import HarmonicSearch, GeneticAlgorithm
    from sklearn.datasets import load_breast_cancer
    from sklearn.svm import SVC

    dataset = load_breast_cancer()
    X, y = dataset['data'], dataset['target_names'].take(dataset['target'])

    # Classifier to be used in the metaheuristic
    clf = SVC()

    hs = HarmonicSearch(classifier=clf, random_state=0, make_logbook=True, repeat=2)
    ga = GeneticAlgorithm(classifier=clf, random_state=1, make_logbook=True, repeat=2)

    # Fit the classifier
    hs.fit(X, y, normalize=True)
    ga.fit(X, y, normalize=True)

    print("Number of Features Selected: \n \t HS: ", sum(hs.best_mask_)/X.shape[1], "% \t GA: ", sum(ga.best_mask_)/X.shape[1], "%")
    print("Accuracy of the classifier: \n \t HS: ", hs.fitness_[0], "\t GA: ", ga.fitness_[0])

    # Transformed dataset
    X_hs = hs.transform(X)
    X_ga = ga.transform(X)

    # Plot the results of each test
    hs.plot_results()
    ga.plot_results()
Total running time of the script: 0 minutes 43.391 seconds
Grid Search Example
An example of how to use Metaheuristics and GridSearch.
    from feature_selection import HarmonicSearch
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    dataset = load_breast_cancer()
    X, y = dataset['data'], dataset['target_names'].take(dataset['target'])

    sc_X = StandardScaler()
    X = sc_X.fit_transform(X)

    # Classifier to be used in the metaheuristic
    # (the SVC is immediately replaced by the random forest)
    clf = SVC()
    clf = RandomForestClassifier()
    clf.fit(X, y)
    clf.predict(X) == y

    # Parameter Grid
    param_grid = {
        "HMCR": [0, 0.5, 0.95],
        "indpb": [0.05, 0.5, 1],
        "pitch": [0.05, 0.5, 1],
        "repeat": [3]
    }

    hs = HarmonicSearch(classifier=clf, make_logbook=True)
    grid_search = GridSearchCV(hs, param_grid=param_grid, scoring=hs.score_func_to_gridsearch, cv=4, verbose=2)
    grid_search.fit(X, y)
    grid_search.best_params_
    results = pd.DataFrame.from_dict(grid_search.cv_results_)
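It can help to count how much work the grid above implies before running it. The sketch below enumerates the same parameter grid with itertools.product and multiplies by the number of folds; it uses only the standard library and does not call GridSearchCV itself.

```python
from itertools import product

param_grid = {
    "HMCR": [0, 0.5, 0.95],
    "indpb": [0.05, 0.5, 1],
    "pitch": [0.05, 0.5, 1],
    "repeat": [3],
}
cv = 4

# Every combination of one value per parameter.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_fits = len(combinations) * cv
print(len(combinations), n_fits)  # 27 108
```

Each of those fits runs the whole metaheuristic (with repeat=3), so the total cost grows quickly with the grid size.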
Parallel Example
If many cycles and tests are needed, this approach gives faster results. Instead of parallelizing the classifier, each metaheuristic runs in a separate process.
    ## Import dataset
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from feature_selection import BRKGA
    from six.moves import cPickle
    from multiprocessing import Pool
    import time
    from sklearn.datasets import load_breast_cancer

    dataset = load_breast_cancer()
    X, y = dataset['data'], dataset['target_names'].take(dataset['target'])

    # Feature Scaling in X
    sc_X = StandardScaler()
    X = sc_X.fit_transform(X)

    def f(i):
        print("Now in: ", int(i))
        a = BRKGA(size_pop=10, mutant_size=2, elite_size=2, number_gen=int(i), repeat=2, make_logbook=True, verbose=False, cxUniform_indpb=0.9).fit(X, y)
        return a

    if __name__ == "__main__":
        # Cleaning variables
        del dataset

        # Test A
        print("Test A")
        t0 = time.time()
        number_gen = np.linspace(1, 4, num=3)

        pool = Pool()  # start the worker processes (defaults to the number of CPUs)
        clfsA = list(pool.map(f, number_gen))
        pool.close()
        print("Final Time: ", time.time() - t0)

        file = open('teste_1_A2.save', 'wb')
        cPickle.dump(clfsA, file, protocol=cPickle.HIGHEST_PROTOCOL)
        file.close()
Parallel Example
An example that runs feature_selection.BRKGA first with parallel=True and then with parallel=False.
    from feature_selection import BRKGA
    from sklearn.datasets import load_breast_cancer
    from sklearn.svm import SVC

    # It is very necessary to include if __name__ == "__main__"
    if __name__ == "__main__":
        dataset = load_breast_cancer()
        X, y = dataset['data'], dataset['target_names'].take(dataset['target'])

        # Classifier to be used in the metaheuristic
        clf = SVC()

        print("Starting Algorithm")
        ga = BRKGA(classifier=clf, make_logbook=True, repeat=2, parallel=True, verbose=True, size_pop=100)

        # Fit the classifier
        ga.fit(X, y, normalize=True)
        print("Number of Features Selected: \n \t BRKGA: ", sum(ga.best_mask_)/X.shape[1], "%")
        print("Accuracy of the classifier: \n \t BRKGA: ", ga.fitness_[0])

        # Plot the results of each test
        ga.plot_results()

        print("Starting Algorithm")
        ga = BRKGA(classifier=clf, make_logbook=True, repeat=2, parallel=False, verbose=True, size_pop=100)

        # Fit the classifier
        ga.fit(X, y, normalize=True)
        print("Number of Features Selected: \n \t BRKGA: ", sum(ga.best_mask_)/X.shape[1], "%")
        print("Accuracy of the classifier: \n \t BRKGA: ", ga.fitness_[0])

        # Plot the results of each test
        ga.plot_results()
Parallel Processing
The parallel feature is provided by the multiprocessing package.
When the option parallel=True is set, the line below is called:
>>> from multiprocessing import Pool
This changes only the evaluation of individuals in population-based algorithms, replacing map with Pool().map, with default inputs. The other algorithms only have the initialization of the initial population, if there is more than one individual, implemented with map.
Check the example.
See the README for more information.
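The switch from map to Pool().map described above can be sketched as follows. This is an illustration of the idea, not the library's code: evaluate is a placeholder fitness function and evaluate_population is a hypothetical helper. The evaluated function must be defined at module level so that worker processes can import it, and the parallel call belongs under the __main__ guard.

```python
from multiprocessing import Pool


def evaluate(individual):
    """Placeholder fitness: the number of selected features in a binary mask."""
    return sum(individual)


def evaluate_population(population, parallel=False):
    """Evaluate every individual, serially with map or in parallel with Pool().map."""
    if parallel:
        with Pool() as pool:  # default inputs: one worker per available CPU
            return pool.map(evaluate, population)
    return list(map(evaluate, population))


if __name__ == "__main__":
    population = [[1, 0, 1], [0, 0, 1], [1, 1, 1]]
    print(evaluate_population(population, parallel=False))
    print(evaluate_population(population, parallel=True))
```

Both calls return the scores in the same order as the population, so the rest of an algorithm does not need to know which path was taken.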
Overview
Citing
Thanks to the open source libraries DEAP and scikit-learn, and to its scikit template.