Welcome to MetaHeuristics documentation!

This project provides implementations of metaheuristics for anyone who wishes to use them in feature selection.

API Reference

The API reference provides detailed descriptions of MetaHeuristic classes and functions. It should be helpful if you plan to integrate these algorithms into your data-mining workflow.
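
All the classes below share the scikit-learn estimator interface (fit, transform, get_support), so they can be dropped into an existing pipeline. A minimal sketch of the common workflow, mirroring the gallery examples at the end of this document:

from feature_selection import HarmonicSearch
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()
X, y = dataset['data'], dataset['target']

selector = HarmonicSearch(random_state=0)
selector.fit(X, y, normalize=True)
X_reduced = selector.transform(X)  # keep only the selected features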

HarmonicSearch

This algorithm was implemented following the article:

X. Z. Gao, V. Govindasamy, H. Xu, X. Wang, and K. Zenger, “Harmony Search Method: Theory and Applications,” Computational Intelligence and Neuroscience, vol. 2015, Article ID 258491, 10 pages, 2015. doi:10.1155/2015/258491

class feature_selection.harmonic_search.HarmonicSearch(classifier=None, HMCR=0.95, indpb=0.05, pitch=0.05, number_gen=100, size_pop=50, verbose=0, repeat=1, make_logbook=False, random_state=None, parallel=False, cv_metric_fuction=None, features_metric_function=None)

Implementation of a Harmonic Search Algorithm for Feature Selection

Parameters:
HMCR : float in [0,1], (default=0.95)

The Harmony Memory Considering Rate

indpb : float in [0,1], (default=0.05)

The mutation rate of each new harmony

pitch : float in [0,1], (default=0.05)

The pitch adjustment factor

number_gen : positive integer, (default=100)

Number of generations

size_pop : positive integer, (default=50)

Size of the Harmonic Memory

verbose : boolean, (default=False)

If True, print information in every generation

repeat : positive int, (default=1)

Number of times to repeat the fitting process

parallel : boolean, (default=False)

Set to True to use multiple processors

make_logbook : boolean, (default=False)

If True, a logbook from DEAP will be made

cv_metric_fuction : callable, (default=matthews_corrcoef)

A metric scoring function, as described in the scikit-learn documentation: http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

features_metric_function : callable, (default=pow(sum(mask)/(len(mask)*5), 2))

A function that returns a float from the binary mask of features
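
Both metric callables can be customized. A minimal sketch, assuming cv_metric_fuction follows the (y_true, y_pred) signature of the default matthews_corrcoef and features_metric_function receives the binary feature mask, as documented above:

from sklearn.metrics import f1_score
from feature_selection import HarmonicSearch

def feature_fraction_penalty(mask):
    # mirrors the documented default: fraction of selected features, scaled and squared
    return pow(sum(mask) / (len(mask) * 5), 2)

hs = HarmonicSearch(HMCR=0.9, pitch=0.1, indpb=0.05,
                    cv_metric_fuction=f1_score,
                    features_metric_function=feature_fraction_penalty)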

Methods

fit([X, y, normalize]) Fit method
fit_transform(X, y[, normalize]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
get_support([indices]) Get a mask, or integer index, of the features selected.
plot_results() This method plots all the statistics for each repetition in a graph.
safe_mask(x, mask) Return a mask which is safe to use on X.
score(X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
score_func_to_gridsearch(estimator[, …]) Function to be given as a scorer function to Grid Search Method.
set_params(**params) Set the parameters of this estimator.
transform(X[, mask]) Reduce X to the selected features.
adaptative_binary_mutation  
predict  
fit(X=None, y=None, normalize=False, **arg)

Fit method

Parameters:
X : array of shape [n_samples, n_features]

The input samples

y : array of shape [n_samples, 1]

The target labels

normalize : boolean, (default=False)

If True, StandardScaler will be applied to X

**arg : parameters

Additional parameters to set

fit_transform(X, y, normalize=False, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

get_support(indices=False)

Get a mask, or integer index, of the features selected.

Parameters:
indices : boolean, (default=False)

If True, the return value will be an array of integers, rather than a boolean mask.

Returns:
support : array

An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
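
For example, after fitting a selector such as hs above, the two return forms relate as follows (a short sketch):

import numpy as np

mask = hs.get_support()              # boolean mask of shape [n_features]
idx = hs.get_support(indices=True)   # integer indices of the selected features
assert np.array_equal(np.where(mask)[0], idx)
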
plot_results()

This method plots all the statistics for each repetition in a graph.

The curves are the minimum, average and maximum accuracy.
static safe_mask(x, mask)

Return a mask which is safe to use on X.

Parameters:
X : {array-like, sparse matrix}

Data on which to apply the mask.

mask : array

Mask to be used on X.

Returns:
mask

score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) wrt. y.

static score_func_to_gridsearch(estimator, X_test=None, y_test=None)

Function to be given as the scorer function to a grid search. It transforms the matrix of predictions generated by the ‘all’ option into a final accuracy score. Use a high number of CV folds.
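
A minimal sketch of passing this scorer to scikit-learn's GridSearchCV (the full Grid Search example appears later in this document):

from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(hs, param_grid={"repeat": [3]},
                    scoring=hs.score_func_to_gridsearch, cv=10)
grid.fit(X, y)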

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(X, mask=None)

Reduce X to the selected features.

Parameters:
X : array of shape [n_samples, n_features]

The input samples.

Returns:
X_r : array of shape [n_samples, n_selected_features]

The input samples with only the selected features.
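
After fitting, transform uses the best mask found during the search; an explicit mask can also be supplied. A short sketch:

X_selected = hs.transform(X)                     # uses the mask found during fit
X_explicit = hs.transform(X, mask=hs.best_mask_) # equivalent, passing the mask explicitly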

GeneticAlgorithm

Basic GA

class feature_selection.genetic_algorithm.GeneticAlgorithm(classifier=None, cross_over_prob=0.5, individual_mut_prob=0.05, gene_mutation_prob=0.05, number_gen=10, size_pop=40, verbose=0, repeat=1, make_logbook=False, random_state=None, parallel=False, cv_metric_fuction=None, features_metric_function=None)

Implementation of a Genetic Algorithm for Feature Selection

Parameters:
classifier : sklearn classifier , (default=SVM)

Any classifier that adheres to the scikit-learn API

cross_over_prob : float in [0,1], (default=0.5)

Probability of a cross-over occurring in an individual (chromosome)

individual_mut_prob : float in [0,1], (default=0.05)

Probability of mutation occurring in an individual (chromosome)

gene_mutation_prob : float in [0,1], (default=0.05)

For each gene in an individual (chromosome) chosen for mutation, the probability of that gene being mutated

number_gen : positive integer, (default=10)

Number of generations

size_pop : positive integer, (default=40)

Number of individuals (chromosomes) in the population

verbose : boolean, (default=False)

If True, print information in every generation

repeat : positive int, (default=1)

Number of times to repeat the fitting process

make_logbook : boolean, (default=False)

If True, a logbook from DEAP will be made

parallel : boolean, (default=False)

Set to True to use multiple processors

cv_metric_fuction : callable, (default=matthews_corrcoef)

A metric scoring function, as described in the scikit-learn documentation: http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

features_metric_function : callable, (default=pow(sum(mask)/(len(mask)*5), 2))

A function that returns a float from the binary mask of features
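
A minimal sketch instantiating the genetic algorithm with its main hyperparameters (X and y as in the examples at the end of this document):

from feature_selection import GeneticAlgorithm
from sklearn.svm import SVC

ga = GeneticAlgorithm(classifier=SVC(), cross_over_prob=0.6,
                      individual_mut_prob=0.1, gene_mutation_prob=0.05,
                      number_gen=20, size_pop=40, random_state=1)
ga.fit(X, y, normalize=True)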

Methods

fit([X, y, normalize]) Fit method
fit_transform(X, y[, normalize]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
get_support([indices]) Get a mask, or integer index, of the features selected.
plot_results() This method plots all the statistics for each repetition in a graph.
safe_mask(x, mask) Return a mask which is safe to use on X.
score(X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
score_func_to_gridsearch(estimator[, …]) Function to be given as a scorer function to Grid Search Method.
set_params(**params) Set the parameters of this estimator.
transform(X[, mask]) Reduce X to the selected features.
adaptative_binary_mutation  
predict  
fit(X=None, y=None, normalize=False, **arg)

Fit method

Parameters:
X : array of shape [n_samples, n_features]

The input samples

y : array of shape [n_samples, 1]

The target labels

normalize : boolean, (default=False)

If True, StandardScaler will be applied to X

**arg : parameters

Additional parameters to set

fit_transform(X, y, normalize=False, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

get_support(indices=False)

Get a mask, or integer index, of the features selected.

Parameters:
indices : boolean, (default=False)

If True, the return value will be an array of integers, rather than a boolean mask.

Returns:
support : array

An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

plot_results()

This method plots all the statistics for each repetition in a graph.

The curves are the minimum, average and maximum accuracy.
static safe_mask(x, mask)

Return a mask which is safe to use on X.

Parameters:
X : {array-like, sparse matrix}

Data on which to apply the mask.

mask : array

Mask to be used on X.

Returns:
mask

score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) wrt. y.

static score_func_to_gridsearch(estimator, X_test=None, y_test=None)

Function to be given as the scorer function to a grid search. It transforms the matrix of predictions generated by the ‘all’ option into a final accuracy score. Use a high number of CV folds.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(X, mask=None)

Reduce X to the selected features.

Parameters:
X : array of shape [n_samples, n_features]

The input samples.

Returns:
X_r : array of shape [n_samples, n_selected_features]

The input samples with only the selected features.

RandomSearch

A simple dummy algorithm, useful for testing the worst-case scenario of algorithms with initial populations.

class feature_selection.random_search.RandomSearch(classifier=None, number_gen=5, size_pop=40, verbose=0, repeat=1, parallel=False, make_logbook=False, random_state=None, cv_metric_fuction=None, features_metric_function=None)

Implementation of a Random Search Algorithm for Feature Selection. It is useful as a worst-case baseline.

Parameters:
number_gen : positive integer, (default=5)

Number of generations

size_pop : positive integer, (default=40)

Size of random samples in each iteration

verbose : boolean, (default=False)

If True, print information in every generation

repeat : positive int, (default=1)

Number of times to repeat the fitting process

parallel : boolean, (default=False)

Set to True to use multiple processors

make_logbook : boolean, (default=False)

If True, a logbook from DEAP will be made

cv_metric_fuction : callable, (default=matthews_corrcoef)

A metric scoring function, as described in the scikit-learn documentation: http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

features_metric_function : callable, (default=pow(sum(mask)/(len(mask)*5), 2))

A function that returns a float from the binary mask of features
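
Because random search is meant as a worst-case baseline, a common pattern is to fit it alongside a metaheuristic and compare the resulting fitness values. A sketch, assuming RandomSearch is exported at the package level like the other classes (otherwise import it from feature_selection.random_search):

from feature_selection import RandomSearch

rs = RandomSearch(number_gen=5, size_pop=40, random_state=0)
rs.fit(X, y, normalize=True)
print("Baseline fitness:", rs.fitness_[0])  # compare against a metaheuristic's fitness_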

Methods

fit([X, y, normalize]) Fit method
fit_transform(X, y[, normalize]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
get_support([indices]) Get a mask, or integer index, of the features selected.
plot_results() This method plots all the statistics for each repetition in a graph.
safe_mask(x, mask) Return a mask which is safe to use on X.
score(X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
score_func_to_gridsearch(estimator[, …]) Function to be given as a scorer function to Grid Search Method.
set_params(**params) Set the parameters of this estimator.
transform(X[, mask]) Reduce X to the selected features.
adaptative_binary_mutation  
predict  
fit(X=None, y=None, normalize=False, **arg)

Fit method

Parameters:
X : array of shape [n_samples, n_features]

The input samples

y : array of shape [n_samples, 1]

The target labels

normalize : boolean, (default=False)

If True, StandardScaler will be applied to X

**arg : parameters

Additional parameters to set

fit_transform(X, y, normalize=False, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

get_support(indices=False)

Get a mask, or integer index, of the features selected.

Parameters:
indices : boolean, (default=False)

If True, the return value will be an array of integers, rather than a boolean mask.

Returns:
support : array

An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

plot_results()

This method plots all the statistics for each repetition in a graph.

The curves are the minimum, average and maximum accuracy.
static safe_mask(x, mask)

Return a mask which is safe to use on X.

Parameters:
X : {array-like, sparse matrix}

Data on which to apply the mask.

mask : array

Mask to be used on X.

Returns:
mask

score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) wrt. y.

static score_func_to_gridsearch(estimator, X_test=None, y_test=None)

Function to be given as the scorer function to a grid search. It transforms the matrix of predictions generated by the ‘all’ option into a final accuracy score. Use a high number of CV folds.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(X, mask=None)

Reduce X to the selected features.

Parameters:
X : array of shape [n_samples, n_features]

The input samples.

Returns:
X_r : array of shape [n_samples, n_selected_features]

The input samples with only the selected features.

Binary Black Hole Algorithm

This algorithm was implemented following the article:

Elnaz Pashaei, Nizamettin Aydin, Binary black hole algorithm for feature selection and classification on biological data, Applied Soft Computing, Volume 56, 2017, Pages 94-106, ISSN 1568-4946,

(http://www.sciencedirect.com/science/article/pii/S1568494617301242)

class feature_selection.binary_black_hole.BinaryBlackHole(classifier=None, number_gen=10, size_pop=40, verbose=False, repeat=1, make_logbook=False, random_state=None, parallel=False, cv_metric_fuction=None, features_metric_function=None)

Implementation of Binary Black Hole for Feature Selection

Parameters:
classifier : sklearn classifier , (default=SVM)

Any classifier that adheres to the scikit-learn API

number_gen : positive integer, (default=10)

Number of generations

size_pop : positive integer, (default=40)

Number of individuals in the population

verbose : boolean, (default=False)

If True, print information in every generation

repeat : positive int, (default=1)

Number of times to repeat the fitting process

make_logbook : boolean, (default=False)

If True, a logbook from DEAP will be made

parallel : boolean, (default=False)

Set to True to use multiple processors

cv_metric_fuction : callable, (default=matthews_corrcoef)

A metric scoring function, as described in the scikit-learn documentation: http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

features_metric_function : callable, (default=pow(sum(mask)/(len(mask)*5), 2))

A function that returns a float from the binary mask of features
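
A minimal sketch, using the module path given in the class line above:

from feature_selection.binary_black_hole import BinaryBlackHole
from sklearn.svm import SVC

bbh = BinaryBlackHole(classifier=SVC(), number_gen=10, size_pop=40,
                      random_state=0, make_logbook=True)
bbh.fit(X, y, normalize=True)
print("Selected features:", bbh.get_support(indices=True))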

Methods

fit([X, y, normalize]) Fit method
fit_transform(X, y[, normalize]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
get_support([indices]) Get a mask, or integer index, of the features selected.
plot_results() This method plots all the statistics for each repetition in a graph.
safe_mask(x, mask) Return a mask which is safe to use on X.
score(X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
score_func_to_gridsearch(estimator[, …]) Function to be given as a scorer function to Grid Search Method.
set_params(**params) Set the parameters of this estimator.
transform(X[, mask]) Reduce X to the selected features.
adaptative_binary_mutation  
predict  
fit(X=None, y=None, normalize=False, **arg)

Fit method

Parameters:
X : array of shape [n_samples, n_features]

The input samples

y : array of shape [n_samples, 1]

The target labels

normalize : boolean, (default=False)

If True, StandardScaler will be applied to X

**arg : parameters

Additional parameters to set

fit_transform(X, y, normalize=False, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

get_support(indices=False)

Get a mask, or integer index, of the features selected.

Parameters:
indices : boolean, (default=False)

If True, the return value will be an array of integers, rather than a boolean mask.

Returns:
support : array

An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

plot_results()

This method plots all the statistics for each repetition in a graph.

The curves are the minimum, average and maximum accuracy.
static safe_mask(x, mask)

Return a mask which is safe to use on X.

Parameters:
X : {array-like, sparse matrix}

Data on which to apply the mask.

mask : array

Mask to be used on X.

Returns:
mask

score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) wrt. y.

static score_func_to_gridsearch(estimator, X_test=None, y_test=None)

Function to be given as the scorer function to a grid search. It transforms the matrix of predictions generated by the ‘all’ option into a final accuracy score. Use a high number of CV folds.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(X, mask=None)

Reduce X to the selected features.

Parameters:
X : array of shape [n_samples, n_features]

The input samples.

Returns:
X_r : array of shape [n_samples, n_selected_features]

The input samples with only the selected features.

Simulated Annealing Algorithm

Implementation of a Simulated Annealing Algorithm for Feature Selection, as
described in the book: Fred W. Glover - Handbook of Metaheuristics.
class feature_selection.simulated_anneling.SimulatedAnneling(classifier=None, mutation_prob=0.05, initial_temp=10, repetition_schedule=10, number_gen=10, repeat=1, verbose=0, parallel=False, make_logbook=False, random_state=None, cv_metric_fuction=None, features_metric_function=None, **arg)

Implementation of a Simulated Annealing Algorithm for Feature Selection, as described in the book: Fred W. Glover - Handbook of Metaheuristics.

The decay of the temperature is given by initial_temp/number_gen per generation.
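
One reading of this schedule, as a small sketch (assuming a linear decay from initial_temp down to zero over number_gen generations):

initial_temp, number_gen = 10, 10
step = initial_temp / number_gen        # decrement applied at each generation
temperatures = [initial_temp - g * step for g in range(number_gen)]
# [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]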

Parameters:
classifier : sklearn classifier , (default=SVM)

Any classifier that adheres to the scikit-learn API

mutation_prob : float in [0,1], (default=0.05)

The probability of each value in the solution being mutated when searching for a neighbor solution.

number_gen : positive integer, (default=10)

Number of generations

initial_temp : positive integer, (default=10)

The initial temperature

verbose : boolean, (default=False)

If True, print information in every generation

repeat : positive int, (default=1)

Number of times to repeat the fitting process

parallel : boolean, (default=False)

Set to True to use multiple processors

make_logbook : boolean, (default=False)

If True, a logbook from DEAP will be made

cv_metric_fuction : callable, (default=matthews_corrcoef)

A metric scoring function, as described in the scikit-learn documentation: http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

features_metric_function : callable, (default=pow(sum(mask)/(len(mask)*5), 2))

A function that returns a float from the binary mask of features

size_pop : None

Not used by this algorithm; accepted only for interface compatibility.
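
A minimal sketch, using the module path from the class line above (the package keeps the original "anneling" spelling in its identifiers):

from feature_selection.simulated_anneling import SimulatedAnneling

sa = SimulatedAnneling(mutation_prob=0.05, initial_temp=10,
                       number_gen=10, repeat=1, random_state=0)
sa.fit(X, y, normalize=True)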

Methods

fit([X, y, normalize]) Fit method
fit_transform(X, y[, normalize]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
get_support([indices]) Get a mask, or integer index, of the features selected.
plot_results() This method plots all the statistics for each repetition in a graph.
safe_mask(x, mask) Return a mask which is safe to use on X.
score(X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
score_func_to_gridsearch(estimator[, …]) Function to be given as a scorer function to Grid Search Method.
set_params(**params) Set the parameters of this estimator.
transform(X[, mask]) Reduce X to the selected features.
adaptative_binary_mutation  
predict  
fit(X=None, y=None, normalize=False, **arg)

Fit method

Parameters:
X : array of shape [n_samples, n_features]

The input samples

y : array of shape [n_samples, 1]

The target labels

normalize : boolean, (default=False)

If True, StandardScaler will be applied to X

**arg : parameters

Additional parameters to set

fit_transform(X, y, normalize=False, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

get_support(indices=False)

Get a mask, or integer index, of the features selected.

Parameters:
indices : boolean, (default=False)

If True, the return value will be an array of integers, rather than a boolean mask.

Returns:
support : array

An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

plot_results()

This method plots all the statistics for each repetition in a graph.

The curves are the minimum, average and maximum accuracy.
static safe_mask(x, mask)

Return a mask which is safe to use on X.

Parameters:
X : {array-like, sparse matrix}

Data on which to apply the mask.

mask : array

Mask to be used on X.

Returns:
mask

score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) wrt. y.

static score_func_to_gridsearch(estimator, X_test=None, y_test=None)

Function to be given as the scorer function to a grid search. It transforms the matrix of predictions generated by the ‘all’ option into a final accuracy score. Use a high number of CV folds.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(X, mask=None)

Reduce X to the selected features.

Parameters:
X : array of shape [n_samples, n_features]

The input samples.

Returns:
X_r : array of shape [n_samples, n_selected_features]

The input samples with only the selected features.

General examples

Introductory examples.

Using Pickle to save models

You can use pickle to save your model:

from feature_selection import HarmonicSearch
from sklearn.datasets import load_breast_cancer
from six.moves import cPickle

dataset = load_breast_cancer()
X, y = dataset['data'], dataset['target_names'].take(dataset['target'])

hs = HarmonicSearch(random_state=0, make_logbook=True,
                    repeat=2)

hs.fit(X, y, normalize=True)

file = "HarmonicSearch"

f = open(file + '.save', 'wb')
cPickle.dump(hs, f, protocol=cPickle.HIGHEST_PROTOCOL)
f.close()
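
To restore the model later, load it back with cPickle. A short sketch mirroring the save step above:

f = open("HarmonicSearch.save", 'rb')
hs_restored = cPickle.load(f)
f.close()

# the restored object keeps its fitted state and can transform new data
X_reduced = hs_restored.transform(X)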


Plotting MetaHeuristics - Basic Use

An example plot of :class:`feature_selection.HarmonicSearch` and :class:`feature_selection.GeneticAlgorithm`.

[Output figures: the plot_results() curves (minimum, average and maximum) for each repetition of HS and GA.]

Out:

Number of Features Selected:
         HS:  0.6666666666666666 %       GA:  0.5666666666666667 %
Accuracy of the classifier:
         HS:  0.9807156598691804         GA:  0.9806848787995384

from feature_selection import HarmonicSearch, GeneticAlgorithm
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

dataset = load_breast_cancer()
X, y = dataset['data'], dataset['target_names'].take(dataset['target'])

# Classifier to be used in the metaheuristic
clf = SVC()

hs = HarmonicSearch(classifier=clf, random_state=0, make_logbook=True,
                    repeat=2)

ga = GeneticAlgorithm(classifier=clf, random_state=1, make_logbook=True,
                      repeat=2)

# Fit the classifier
hs.fit(X, y, normalize=True)
ga.fit(X, y, normalize=True)

print("Number of Features Selected: \n \t HS: ", sum(hs.best_mask_)/X.shape[1],
      "% \t GA: ", sum(ga.best_mask_)/X.shape[1], "%")
print("Accuracy of the classifier: \n \t HS: ", hs.fitness_[0], "\t GA: ",
      ga.fitness_[0])

# Transformed dataset
X_hs = hs.transform(X)
X_ga = ga.transform(X)

# Plot the results of each test
hs.plot_results()
ga.plot_results()


Grid Search Example

An example of how to use Metaheuristics and GridSearch

from feature_selection import HarmonicSearch
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

dataset = load_breast_cancer()
X, y = dataset['data'], dataset['target_names'].take(dataset['target'])
sc_X = StandardScaler()
X = sc_X.fit_transform(X)

# Classifier to be used in the metaheuristic
clf = RandomForestClassifier()
# Parameter Grid
param_grid = {
    "HMCR": [0, 0.5, 0.95],
    "indpb": [0.05, 0.5, 1],
    "pitch": [0.05, 0.5, 1],
    "repeat": [3]
}
hs = HarmonicSearch(classifier=clf, make_logbook=True)
grid_search = GridSearchCV(hs, param_grid=param_grid, scoring=hs.score_func_to_gridsearch, cv=4,
                           verbose=2)
grid_search.fit(X, y)

print(grid_search.best_params_)
results = pd.DataFrame.from_dict(grid_search.cv_results_)


Parallel Example

If many cycles and tests are needed, this approach will lead to faster results. Instead of parallelizing the classifier, each metaheuristic will run in a different process.

## Import dataset
import numpy as np
from sklearn.preprocessing import StandardScaler
from feature_selection import BRKGA
from six.moves import cPickle
from multiprocessing import Pool
import time
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()
X, y = dataset['data'], dataset['target_names'].take(dataset['target'])
# Feature Scaling in X
sc_X = StandardScaler()
X = sc_X.fit_transform(X)

def f(i):
    print("Now in: ", int(i))
    a = BRKGA(size_pop=10, mutant_size=2, elite_size=2,
              number_gen=int(i), repeat=2, make_logbook=True,
              verbose=False, cxUniform_indpb=0.9).fit(X, y)
    return a

if __name__ == "__main__":

    # Clean up variables
    del dataset

    # Test A
    print("Test A")
    t0 = time.time()

    number_gen = np.linspace(1,4,num=3)

    pool = Pool()              # start worker processes (defaults to the CPU count)

    clfsA = list(pool.map(f, number_gen))
    pool.close()

    print("Final Time: ", time.time()- t0)
    file = open('teste_1_A2.save', 'wb')
    cPickle.dump(clfsA, file, protocol=cPickle.HIGHEST_PROTOCOL)
    file.close()


Parallel Example

An example comparing parallel and serial fitting of :class:`feature_selection.BRKGA`.

from feature_selection import BRKGA
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

# The if __name__ == "__main__" guard is required when using multiprocessing
if __name__ == "__main__":
    dataset = load_breast_cancer()
    X, y = dataset['data'], dataset['target_names'].take(dataset['target'])

    # Classifier to be used in the metaheuristic
    clf = SVC()

    print("Starting Algorithm")
    ga = BRKGA(classifier=clf, make_logbook=True, repeat=2, parallel=True,
               verbose=True, size_pop=100)

    # Fit the classifier
    ga.fit(X, y, normalize=True)

    print("Number of Features Selected: \n \t HS: " , sum(ga.best_mask_)/X.shape[1], "%")
    print("Accuracy of the classifier: \n \t HS: ", ga.fitness_[0])

    # Plot the results of each test
    ga.plot_results()

    print("Starting Algorithm")
    ga = BRKGA(classifier=clf, make_logbook=True, repeat=2, parallel=False,
               verbose=True, size_pop=100)

    # Fit the classifier
    ga.fit(X, y, normalize=True)

    print("Number of Features Selected: \n \t HS: " , sum(ga.best_mask_)/X.shape[1], "%")
    print("Accuracy of the classifier: \n \t HS: ", ga.fitness_[0])

    # Plot the results of each test
    ga.plot_results()


Parallel Processing

The parallel feature is provided by the multiprocessing package.

When the option parallel=True is set, the line below is called

>> from multiprocessing import Pool

This changes only the evaluation of individuals in population-based algorithms, replacing the built-in map with Pool().map using its default arguments. In the remaining algorithms, only the initialization of the initial population (when there is more than one individual) is implemented with map.
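
Conceptually, the change amounts to the following sketch, where evaluate is a hypothetical stand-in for the package's internal fitness evaluation:

from multiprocessing import Pool

def evaluate(individual):
    # placeholder fitness; the real code cross-validates a classifier on the masked features
    return sum(individual)

if __name__ == "__main__":
    population = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    # serial equivalent: fitnesses = list(map(evaluate, population))
    with Pool() as pool:
        fitnesses = pool.map(evaluate, population)
    print(fitnesses)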

Check the example.

See the README for more information.

Overview

Citing

Thanks to the open-source libraries DEAP and scikit-learn, and to the scikit project template.