Welcome to Crunchers’s documentation!
Contents:
Overview
Reference
crunchers package
Subpackages
crunchers.pandas_helpers package
Submodules
crunchers.pandas_helpers.transformations module
Provide functions for performing non-standard-ish column-wise transformations.
crunchers.pandas_helpers.transformations.apply_ignore_null(func, s, fillwith=None)
Apply func to the values of s that are not ‘nan’ or equivalent.
func is applied to s after the ‘nan’ values are filled with fillwith. If fillwith is None, min(s) is used.
You may prefer to use the mean or median like this:
apply_ignore_null(func, s, fillwith=np.mean(s))
Returns a reconstituted pandas.Series with ‘nan’ everywhere there was an original ‘nan’, but with the transformed values everywhere else.
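The fill, transform, and restore steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the crunchers implementation: apply_ignore_null_sketch and demean are hypothetical names, the sketch works on a list of floats rather than a pandas.Series, it assumes func takes and returns a whole sequence, and it reads the default min(s) as the minimum of the non-null values.

```python
import math


def apply_ignore_null_sketch(func, values, fillwith=None):
    """Apply func to values, keeping NaN wherever the input had NaN.

    Hypothetical stand-in for the documented behaviour; it operates on a
    list of floats instead of a pandas.Series.
    """
    null_mask = [math.isnan(v) for v in values]
    if fillwith is None:
        # Assumption: the documented default min(s) means the minimum
        # of the values that are not NaN.
        fillwith = min(v for v, is_null in zip(values, null_mask) if not is_null)
    # Fill the gaps so func sees a complete column.
    filled = [fillwith if is_null else v for v, is_null in zip(values, null_mask)]
    transformed = func(filled)
    # Put NaN back wherever the original had one.
    return [math.nan if is_null else t for t, is_null in zip(transformed, null_mask)]


def demean(xs):
    """Example column-wise transformation: subtract the mean."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


result = apply_ignore_null_sketch(demean, [1.0, math.nan, 3.0], fillwith=2.0)
```

Note that the fill value still influences the transformed values whenever func depends on the whole column (as demean does), which is presumably why the choice of fillwith is left to the caller.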
crunchers.pandas_helpers.transformations.apply_pairwise(series, func)
Apply func to the items in series pairwise and return a dataframe.
crunchers.pandas_helpers.transformations.robust_scale(df)
Return a copy of df scaled by (df - df.median()) / MAD(df), where MAD is a function returning the median absolute deviation.
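The scaling formula can be worked through on a single column in plain Python. This is a sketch of the arithmetic only, not the library's code: mad and robust_scale_column are hypothetical helpers, and the MAD here is assumed to be the raw median of absolute deviations with no consistency constant applied (the source does not say which variant is used).

```python
from statistics import median


def mad(values):
    """Median absolute deviation, without any consistency constant (assumption)."""
    center = median(values)
    return median(abs(v - center) for v in values)


def robust_scale_column(values):
    """Scale one column as (x - median) / MAD."""
    center = median(values)
    spread = mad(values)
    return [(v - center) / spread for v in values]


# The outlier (100) barely affects the center (3) or the spread (1).
scaled = robust_scale_column([1, 2, 3, 4, 100])
```

In this sketch a column whose MAD is zero (for example, one where more than half the values are identical) raises ZeroDivisionError; how the library handles that case is not stated here.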
Module contents
crunchers.sklearn_helpers package
Submodules
crunchers.sklearn_helpers.assessment module
Provide helper functions for working with scikit-learn based objects.
crunchers.sklearn_helpers.assessment.confusion_matrix_to_pandas(cm, labels)
Return the confusion matrix as a pandas dataframe.
It is created from the confusion matrix stored in cm with rows and columns labeled with labels.
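As a rough picture of what labelling the rows and columns means, the sketch below builds the same labelled layout with nested dictionaries instead of a dataframe. label_confusion_matrix is a hypothetical helper written for this illustration, not part of crunchers; with pandas, an equivalent construction would likely be pandas.DataFrame(cm, index=labels, columns=labels). It follows the scikit-learn convention that rows are true classes and columns are predicted classes.

```python
def label_confusion_matrix(cm, labels):
    """Map a square matrix to {row_label: {column_label: count}}."""
    return {
        row_label: dict(zip(labels, row))
        for row_label, row in zip(labels, cm)
    }


# Toy 2x2 matrix: rows are true classes, columns are predicted classes.
cm = [[5, 1],
      [2, 7]]
table = label_confusion_matrix(cm, ["cat", "dog"])

# Number of true "cat" samples predicted as "dog".
cat_as_dog = table["cat"]["dog"]
```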
crunchers.sklearn_helpers.exploration module
Provide functions that help quickly explore datasets with sklearn.
class crunchers.sklearn_helpers.exploration.KMeansReport(data, n_clusters, seed=None, n_jobs=-1, palette='deep')
Bases: object
Manage KMeans Clustering and exploration of results.
plot_silhouette_results(feature_names=None, feature_space=None)
Produce plots similar to those in the scikit-learn example linked below.
http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html
class crunchers.sklearn_helpers.exploration.PCAReport(data, pca=None, n_components=None, data_labels=None, color_palette=None, label_colors=None, name=None)
Bases: object
Manage PCA and exploration of results.
filter_by_loadings(kind, column, hi_thresh, lo_thresh)
Return index of row names.
kind (str): one of [‘pearsonr’, ‘spearmanr’]
column (str): which PC column to filter
hi_thresh (float): retain rows with values >= hi_thresh
lo_thresh (float): retain rows with values <= lo_thresh
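The two thresholds can be read as keeping both tails of a PC column. The sketch below illustrates that reading with a plain dictionary; filter_by_thresholds and the gene names are hypothetical, and combining the two conditions with "or" is an assumption, since the source does not state how they are combined.

```python
def filter_by_thresholds(loadings, hi_thresh, lo_thresh):
    """Return the names whose loading is >= hi_thresh or <= lo_thresh."""
    return [
        name
        for name, value in loadings.items()
        if value >= hi_thresh or value <= lo_thresh
    ]


# Made-up correlation-based loadings for one PC column.
pc1_loadings = {"gene_a": 0.91, "gene_b": 0.10, "gene_c": -0.85, "gene_d": -0.20}
kept = filter_by_thresholds(pc1_loadings, hi_thresh=0.8, lo_thresh=-0.8)
```

Here kept holds the strongly positive and strongly negative rows, while rows near zero are dropped.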
get_loading_corr(kind='pearsonr')
Return a dataframe of correlation-based “loadings” computed according to kind.
n_components
Provide access to the number of PCs.
crunchers.sklearn_helpers.misc module
Collect misc sklearn helpers here.
Module contents
crunchers.statsmodels_helpers package
Submodules
crunchers.statsmodels_helpers.lazy_stats module
Functions for streamlining analysis.
crunchers.statsmodels_helpers.lazy_stats.build_regression_models_grid(X_hyps_dicts, ctrl_coefs_dicts, outcomes_dicts)
crunchers.statsmodels_helpers.lazy_stats.do_regression(data, y_var, X_ctrls=None, X_hyp=None, kind='OLS', **kwargs)
Provide a further abstracted way to build and run multiple types of regressions.
data (pd.DataFrame): data table to use when retrieving the column headers
y_var (str): column header of the outcome variable
X_ctrls (str): formula specification of the “boring” variables, e.g. “column_header_1 + column_header_2”
X_hyp (str): formula specification of the “interesting” variables, e.g. “column_header_1 + column_header_2”
kind (str): the type of regression to run; must be one of [‘GLM’, ‘OLS’, ‘RLM’]
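One plausible way the string arguments fit together is as the two halves of a formula's right-hand side. The sketch below is an assumption about that composition, not the library's code: build_formula and check_kind are hypothetical helpers, and falling back to an intercept-only model ("1") when no predictors are given is a choice made for this illustration.

```python
ALLOWED_KINDS = ("GLM", "OLS", "RLM")


def build_formula(y_var, X_ctrls=None, X_hyp=None):
    """Join the outcome, control, and hypothesis terms into one formula string."""
    terms = [part for part in (X_ctrls, X_hyp) if part]
    # Assumption: with no predictors, fit an intercept-only model.
    rhs = " + ".join(terms) if terms else "1"
    return f"{y_var} ~ {rhs}"


def check_kind(kind):
    """Reject regression kinds outside the documented set."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"kind must be one of {ALLOWED_KINDS}, got {kind!r}")
    return kind


formula = build_formula("weight", X_ctrls="age + sex", X_hyp="treatment")
kind = check_kind("OLS")
```

Keeping the “boring” and “interesting” terms separate until this point makes it easy to fit the same controls against many different hypothesis variables.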
crunchers.statsmodels_helpers.lazy_stats.format_all_regression_models(regs, total)
Return a tuple of string-formatted versions of all regression tables in the regs object.
Parameters:
- regs (dict-like) – tree-like dict containing the regression results objects as leaves and descriptors as nodes.
- total (int) – total number of results tables to format.
Returns: tuple
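To make the “results as leaves, descriptors as nodes” layout concrete, the sketch below walks such a tree and yields each leaf with the path of descriptors leading to it. iter_leaves and the example keys are hypothetical, and plain strings stand in for real regression result objects.

```python
from collections.abc import Mapping


def iter_leaves(tree, path=()):
    """Yield (path, leaf) pairs from a nested dict-like tree."""
    for key, value in tree.items():
        if isinstance(value, Mapping):
            # Descriptor node: descend and extend the path.
            yield from iter_leaves(value, path + (key,))
        else:
            # Leaf: in the real tree this would be a results object.
            yield path + (key,), value


# Strings stand in for fitted results objects in this illustration.
regs = {
    "outcome_a": {"ctrl_1": {"hyp_x": "result 1", "hyp_y": "result 2"}},
    "outcome_b": {"ctrl_1": {"hyp_x": "result 3"}},
}
leaves = list(iter_leaves(regs))
total = len(leaves)
```

Under this reading, total corresponds to the number of leaves in the tree.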
crunchers.statsmodels_helpers.lazy_stats.identify_full_ctrl_names(X_vars, orig_ctrl_names)
Return the set of variable names actually used in the regression, tolerating mangling of categoricals.
crunchers.statsmodels_helpers.lazy_stats.regression_grid_single(grid_item, data, kind, **kwargs)
crunchers.statsmodels_helpers.lazy_stats.report_glm(formula, data, verbose=True, **kwargs)
Fit GLM, print a report, and return the fit object.
crunchers.statsmodels_helpers.lazy_stats.report_logitreg(formula, data, verbose=True, disp=1)
Fit logistic regression, print a report, and return the fit object.
crunchers.statsmodels_helpers.lazy_stats.report_ols(formula, data, fit_regularized=False, L1_wt=1, refit=False, **kwargs)
Fit OLS regression, print a report, and return the fit object.
crunchers.statsmodels_helpers.lazy_stats.report_rlm(formula, data, verbose=True, **kwargs)
Fit RLM, print a report, and return the fit object.
crunchers.statsmodels_helpers.lazy_stats.run_regressions_grid(grid, data, kind, max_workers=None, **kwargs)
crunchers.statsmodels_helpers.lazy_stats.summarize_X_vars(results, sig_thresh=0.05, X_ctrls=None, X_ignore=None)
crunchers.statsmodels_helpers.lazy_stats.summarize_grid_X_vars_OLS(regs, reg_grid, sig_thresh=0.05)
crunchers.statsmodels_helpers.lazy_stats.summarize_multi_LOGIT(results)
Return a dataframe aggregating overall stats from a dictionary-like object containing LOGIT result objects.
crunchers.statsmodels_helpers.lazy_stats.summarize_multi_OLS(results)
Return a dataframe aggregating overall stats from a dictionary-like object containing OLS result objects.
Module contents
Contributing
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
Bug reports
When reporting a bug please include:
- Your operating system name and version.
- Any details about your local setup that might be helpful in troubleshooting.
- Detailed steps to reproduce the bug.
Documentation improvements
Crunchers could always use more documentation, whether as part of the official Crunchers docs, in docstrings, or even on the web in blog posts, articles, and such.
Feature requests and feedback
The best way to send feedback is to file an issue at https://github.com/xguse/crunchers/issues.
If you are proposing a feature:
- Explain in detail how it would work.
- Keep the scope as narrow as possible, to make it easier to implement.
- Remember that this is a volunteer-driven project, and that contributions are welcome :)
Development
To set up crunchers for local development:
Clone your fork locally:
git clone git@github.com:your_name_here/crunchers.git
Create a branch for local development:
git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
When you’re done making changes, run all the checks, the doc builder, and the spell checker with one tox command:
tox
Commit your changes and push your branch to GitHub:
git add .
git commit -m "Your detailed description of your changes."
git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Pull Request Guidelines
If you need some code review or feedback while you’re developing the code, just make the pull request.
For merging, you should:
- Include passing tests (run tox) [1].
- Update documentation when there’s new API, functionality etc.
- Add a note to CHANGELOG.rst about the changes.
- Add yourself to AUTHORS.rst.
[1] If you don’t have all the necessary Python versions available locally you can rely on Travis: it will run the tests for each change you add in the pull request. It will be slower though …
Tips
To run a subset of tests:
tox -e envname -- py.test -k test_myfeature
To run all the test environments in parallel (you need to pip install detox):
detox
Authors
- Gus Dunn - https://github.com/xguse