Welcome to Physt’s documentation!

Tutorials

Get started with physt

This tutorial describes some of the basic features of physt.

[1]:
# Necessary import evil
%matplotlib inline
from physt import histogram, binnings, h1, h2, h3
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1337)     # Always use the same data

Getting physt (to run)

You can probably skip this section, but for the sake of completeness: the default way to install a relatively stable version of physt is via pip:

pip install physt

Alternatively, you can download the source code from GitHub (https://github.com/janpipek/physt).

physt requires numpy. Other packages are optional but very useful if you want to use physt at its best: matplotlib for plotting (or bokeh as a not-so-well supported alternative).

Your first histogram

If you need to create a histogram, call the histogram (or h1) function with your data (like heights of people) as the first argument. The default gives a reasonable result…

[2]:
# Basic dataset
heights = [160, 155, 156, 198, 177, 168, 191, 183, 184, 179, 178, 172, 173, 175,
           172, 177, 176, 175, 174, 173, 174, 175, 177, 169, 168, 164, 175, 188,
           178, 174, 173, 181, 185, 166, 162, 163, 171, 165, 180, 189, 166, 163,
           172, 173, 174, 183, 184, 161, 162, 168, 169, 174, 176, 170, 169, 165]

hist = histogram(heights)     # Automatically select all settings
hist
[2]:
Histogram1D(bins=(10,), total=56, dtype=int64)

…which is an object of the Histogram1D type that holds all the bin information…

[3]:
hist.bins          # All the bins
[3]:
array([[155. , 159.3],
       [159.3, 163.6],
       [163.6, 167.9],
       [167.9, 172.2],
       [172.2, 176.5],
       [176.5, 180.8],
       [180.8, 185.1],
       [185.1, 189.4],
       [189.4, 193.7],
       [193.7, 198. ]])
[4]:
hist.frequencies   # All the frequencies
[4]:
array([ 2,  6,  5, 11, 15,  7,  6,  2,  1,  1], dtype=int64)

…and provides further features and methods, like plotting for example…

[5]:
hist.plot(show_values=True);
_images/tutorial_9_0.png

…or adding new values (note that this is something numpy.histogram won’t do for you)…

[6]:
original = hist.copy()             # Store the original data to see changes

# ******* Here comes a lonely giant
hist.fill(197)

step1 = hist.copy()                # Store the intermediate value

# ******* And a bunch of relatively short people
hist.fill_n([160, 160, 161, 157, 156, 159, 162])

# See how the plot changes (you can safely ignore the following 4 lines)
ax = hist.plot(label="After fill_n");
step1.plot(color="yellow", ax=ax, label="After fill")
original.plot(color="red", ax=ax, label="Before filling")
ax.legend(loc=1)

# See the number of entries
hist
[6]:
Histogram1D(bins=(10,), total=64, dtype=int64)
_images/tutorial_11_1.png

Data representation

The primary goal of the physt library is to represent histograms as data objects with a set of methods for easy manipulation and analysis (including mathematical operations, adaptivity, summary statistics, …). The histogram classes used in the ROOT framework served as inspiration but not as a model to copy (though relevant methods often have the same names).

Based on its dimensionality, a histogram is an instance of one of the following classes (all inheriting from HistogramBase):

  • Histogram1D for univariate data
  • Histogram2D for bivariate data
  • HistogramND for data with higher dimensionality
  • …or some special dedicated class (user-provided). Currently, there is a PolarHistogram as an example (considered to be experimental, not API-stable).

However, these objects are __init__ialized with already calculated data and therefore, you typically don’t construct them yourself but call one of the facade functions:

  • histogram or h1
  • histogram2d or h2
  • histogramdd (or h3 for 3D case)

These functions try to find the best binning schema, calculate bin contents and set other properties for the histograms. In principle (if not, submit a bug report), if you call a function with arguments understood by the eponymous numpy functions (histogram, histogram2d and histogramdd), you should receive a histogram with exactly the same bin edges and bin contents. However, there are many more arguments available!

[7]:
# Back to people's parameters...
heights = np.random.normal(172, 10, 100)
weights = np.random.normal(70, 15, 100)
iqs = np.random.normal(100, 15, 100)
[8]:
# 1D histogram
h1(heights)
[8]:
Histogram1D(bins=(10,), total=100, dtype=int64)
[9]:
# 2D histogram
h2(heights, weights, [5, 7])
[9]:
Histogram2D(bins=(5, 7), total=100, dtype=int64)
[10]:
# 3D histogram
h3([heights, weights, iqs])      # Simplification with respect to numpy.histogramdd
[10]:
HistogramND(bins=(10, 10, 10), total=100, dtype=int64)
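
For comparison, the numpy functions that the facade functions mirror can be called as in the following sketch. It uses numpy only (not physt); the variable names and the random generator are illustrative assumptions, so the data differ from the cells above. Note that numpy.histogramdd expects a single array of shape (N, D), which is the inconvenience the h3 call above avoids.

```python
import numpy as np

rng = np.random.default_rng(1337)      # illustrative data, not the tutorial's exact values
heights = rng.normal(172, 10, 100)
weights = rng.normal(70, 15, 100)
iqs = rng.normal(100, 15, 100)

# 1D: returns (counts, edges); 10 bins by default
counts_1d, edges_1d = np.histogram(heights)

# 2D: returns (counts, x_edges, y_edges)
counts_2d, x_edges, y_edges = np.histogram2d(heights, weights, [5, 7])

# ND: the sample must be stacked into shape (N, D) first
sample = np.column_stack([heights, weights, iqs])
counts_3d, edges_3d = np.histogramdd(sample)

print(counts_1d.shape, counts_2d.shape, counts_3d.shape)
```

The shapes of the count arrays correspond to the bins=(…) values shown in the histogram representations above.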

So, what do these objects contain? In principle:

  • binning schema (_binning or _binnings)
  • bin contents (frequencies) together with errors (errors)
  • some statistics about the data (mean, variance, std)
  • metadata (like name and axis_name or axis_names)

In the following, properties of Histogram1D will be described. Analogous methods and data fields also exist for Histogram2D and HistogramND, possibly with plural names.

Binning schema

The structure of bins is stored in the histogram object as a hidden attribute _binning. This value is an instance of one of the binning classes that are all descendants of physt.binnings.BinningBase. You are not supposed to directly access this value because manipulating it without at the same time updating the bin contents is dangerous.

A dedicated notebook deals with the binning specifics; here we summarize only the most important features.

Histogram1D offers the following attributes to access (read-only or read-only-intended) the binning information (explicitly or implicitly stored in _binning):

[11]:
# Create a histogram with "reasonable" bins
data = np.random.normal(0, 7, 10000)
hist = histogram(data, "human", bin_count=4)
hist
[11]:
Histogram1D(bins=(6,), total=10000, dtype=int64)
[12]:
hist._binning         # Just to show, don't use it
[12]:
FixedWidthBinning(bin_width=10.0, bin_count=6, min=-30.0)
[13]:
hist.bin_count        # The total number of bins
[13]:
6
[14]:
hist.bins             # Bins as array of both edges
[14]:
array([[-30., -20.],
       [-20., -10.],
       [-10.,   0.],
       [  0.,  10.],
       [ 10.,  20.],
       [ 20.,  30.]])
[15]:
hist.numpy_bins       # Bin edges with the same semantics as the numpy.histogram
[15]:
array([-30., -20., -10.,   0.,  10.,  20.,  30.])
[16]:
hist.bin_left_edges
[16]:
array([-30., -20., -10.,   0.,  10.,  20.])
[17]:
hist.bin_right_edges
[17]:
array([-20., -10.,   0.,  10.,  20.,  30.])
[18]:
hist.bin_centers      # Centers of the bins - useful for interpretation of histograms as scatter data
[18]:
array([-25., -15.,  -5.,   5.,  15.,  25.])
[19]:
hist.bin_widths       # Widths of the bins - useful for calculating densities and also for bar plots
[19]:
array([10., 10., 10., 10., 10., 10.])
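
The attributes above are all simple functions of the bin edges. The following numpy-only sketch (it does not touch physt internals; the variable names are illustrative) shows how the same arrays follow from the edge array printed for numpy_bins.

```python
import numpy as np

# Same layout as the numpy_bins output above
edges = np.array([-30., -20., -10., 0., 10., 20., 30.])

left_edges = edges[:-1]
right_edges = edges[1:]
bins = np.column_stack([left_edges, right_edges])   # shape (6, 2), one row per bin
centers = (left_edges + right_edges) / 2
widths = right_edges - left_edges

print(bins.shape)
print(centers)
print(widths)
```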

As a simple overview of the binning schemas provided by physt, we show the bins produced by different schemas:

[20]:
list(binnings.binning_methods.keys())        # Show them all
[20]:
['numpy',
 'exponential',
 'quantile',
 'fixed_width',
 'integer',
 'human',
 'blocks',
 'knuth',
 'scott',
 'freedman']

These names can be used as the second parameter of the h1 function:

[21]:
# Fixed-width
h1(data, "fixed_width", bin_width=6).numpy_bins
[21]:
array([-30., -24., -18., -12.,  -6.,   0.,   6.,  12.,  18.,  24.,  30.])
[22]:
# Numpy-like
print("Expected:", np.histogram(data, 5)[1])

print("We got:", h1(data, "numpy", bin_count=5).numpy_bins)
Expected: [-26.89092563 -16.07128189  -5.25163815   5.56800559  16.38764933
  27.20729307]
We got: [-26.89092563 -16.07128189  -5.25163815   5.56800559  16.38764933
  27.20729307]
[23]:
# Integer - centered around integers; useful for integer data
h1(data, "integer").numpy_bins
[23]:
array([-27.5, -26.5, -25.5, -24.5, -23.5, -22.5, -21.5, -20.5, -19.5,
       -18.5, -17.5, -16.5, -15.5, -14.5, -13.5, -12.5, -11.5, -10.5,
        -9.5,  -8.5,  -7.5,  -6.5,  -5.5,  -4.5,  -3.5,  -2.5,  -1.5,
        -0.5,   0.5,   1.5,   2.5,   3.5,   4.5,   5.5,   6.5,   7.5,
         8.5,   9.5,  10.5,  11.5,  12.5,  13.5,  14.5,  15.5,  16.5,
        17.5,  18.5,  19.5,  20.5,  21.5,  22.5,  23.5,  24.5,  25.5,
        26.5,  27.5])
[24]:
# Exponential - positive numbers required
h1(np.abs(data), "exponential").numpy_bins      # We 'abs' the values
[24]:
array([1.03182494e-03, 2.03397046e-03, 4.00943579e-03, 7.90354415e-03,
       1.55797507e-02, 3.07113654e-02, 6.05393490e-02, 1.19337344e-01,
       2.35242068e-01, 4.63717632e-01, 9.14096888e-01, 1.80190069e+00,
       3.55197150e+00, 7.00177407e+00, 1.38021491e+01, 2.72072931e+01])
[25]:
# Quantile - each bin should have a similar statistical importance
h1(data, "quantile", bin_count=5).numpy_bins

[25]:
array([-26.89092563,  -5.87687499,  -1.69550961,   1.81670859,
         5.79232538,  27.20729307])
[26]:
# Human - as friendly to your plots as possible, you may set an approximate number of bins
h1(data, "human").numpy_bins
[26]:
array([-30., -25., -20., -15., -10.,  -5.,   0.,   5.,  10.,  15.,  20.,
        25.,  30.])

Bin contents

The bin contents (frequencies) and associated errors (errors) are stored as numpy arrays with a shape corresponding to the number of bins (in all dimensions). Again, you cannot manipulate these properties directly (unless you break the don’t-touch-the-underscore convention).

[27]:
hist = h1(data, "human")

hist.frequencies
[27]:
array([   2,   23,  140,  604, 1580, 2601, 2691, 1640,  573,  124,   17,
          5], dtype=int64)

Errors are calculated as \(\sqrt{N}\), which is the simplest expectation for independent values. If this does not suit your data, you can set your own errors through the _errors2 field, which contains squared errors.

Note: Filling with weights, arithmetic operations and scaling preserve correct error values under similar conditions.

[28]:
hist.errors
[28]:
array([ 1.41421356,  4.79583152, 11.83215957, 24.57641145, 39.74921383,
       51.        , 51.8748494 , 40.49691346, 23.93741841, 11.13552873,
        4.12310563,  2.23606798])
[29]:
# Doubling the histogram doubles the error
(hist * 2).errors
[29]:
array([  2.82842712,   9.59166305,  23.66431913,  49.15282291,
        79.49842766, 102.        , 103.74969879,  80.99382693,
        47.87483681,  22.27105745,   8.24621125,   4.47213595])
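
The error values above can be reproduced with plain numpy. This is a sketch of the arithmetic only, assuming the \(\sqrt{N}\) rule described above; the variable names are illustrative and it does not use physt.

```python
import numpy as np

# Frequencies of the "human"-binned histogram shown above
frequencies = np.array([2, 23, 140, 604, 1580, 2601, 2691, 1640, 573, 124, 17, 5])

errors2 = frequencies.astype(float)     # squared errors equal the counts
errors = np.sqrt(errors2)

factor = 2
scaled_errors2 = errors2 * factor ** 2  # squared errors scale with the square of the factor
scaled_errors = np.sqrt(scaled_errors2)

print(errors[:3])
print(scaled_errors[:3])
```

Storing squared errors makes scaling and addition simple: squared errors are multiplied by the squared factor, and for independent histograms they add.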

Data types

Internally, histogram bins can contain values of several types (dtype in numpy terminology). By default, this is either np.int64 (for histograms without weights) or np.float64 (for histograms with weights). Wherever possible, this distinction is preserved. If you fill with weights, multiply by a float constant, divide, … (basically whenever this is reasonable), an integer histogram is automatically converted to a float one.

[30]:
hist = h1(data)
print("Default type:", hist.dtype)

hist = h1(data, weights=np.abs(data))    # Add some random weights
print("Default type with weights:", hist.dtype)

hist = h1(data)
hist.fill(1.0, weight=.44)
print("Default type after filling with weight:", hist.dtype)

hist = h1(data)
hist *= 2
print("Default type after multiplying by an int:", hist.dtype)

hist *= 5.6
print("Default type after multiplying by a float:", hist.dtype)
Default type: int64
Default type with weights: float64
Default type after filling with weight: float64
Default type after multiplying by an int: int64
Default type after multiplying by a float: float64
[31]:
# You can specify the type in the method call
hist = h1(data, dtype="int32")
hist.dtype
[31]:
dtype('int32')
[32]:
# You can set the type of the histogram using the attribute
hist = h1(data)
hist.dtype = np.int32
hist.dtype
[32]:
dtype('int32')
[33]:
# Adding two histograms uses the broader range
hist1 = h1(data, dtype="int64")
hist2 = h1(data, dtype="float32")
(hist1 + hist2).dtype    # See the result!
[33]:
dtype('float64')
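
The conversions above resemble numpy's own type promotion. The sketch below uses plain numpy arrays (not physt histograms) to show the analogous behaviour; whether physt delegates to these exact rules internally is an assumption.

```python
import numpy as np

counts = np.array([1, 2, 3], dtype=np.int64)

print((counts * 2).dtype)                      # int64: an integer factor keeps the integer type
print((counts * 5.6).dtype)                    # float64: a float factor forces a float result
print(np.result_type(np.int64, np.float32))    # float64: common type of int64 and float32
```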

Manually create histogram instances

As mentioned, h1 and h2 are just facade functions. You can also construct the objects directly using the constructors. The first argument accepts something that can be interpreted as a binning or a list of bins; the second argument is an array of frequencies (a numpy array or something convertible to one).

[34]:
from physt.histogram1d import Histogram1D
from physt.histogram_nd import Histogram2D
hist1 = Histogram1D([0.0, 0.2, 0.4, 0.6, 0.8, 1.0], [1, 2, 3, 4, 5])
hist2 = Histogram2D([[0, 0.5, 1], [0, 1, 2, 3]], [[0.2, 2.2, 7.3], [6, 5, 3]], axis_names=["x", "y"])

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
hist1.plot(ax = axes[0])
hist2.plot(ax = axes[1])
hist1, hist2
[34]:
(Histogram1D(bins=(5,), total=15, dtype=int64),
 Histogram2D(bins=(2, 3), total=23.7, dtype=float64))
_images/tutorial_49_1.png
[35]:
# Create a physt "logo", also available as physt.examples.fist
_, ax = plt.subplots(figsize=(4, 4))
widths = np.cumsum([0, 1.2, 0.2, 1, 0.1, 1, 0.1, 0.9, 0.1, 0.8])
fingers = np.asarray([4, 1, 7.5, 6, 7.6, 6, 7.5, 6, 7.2]) + 5
hist1 = Histogram1D(widths, fingers)
hist1.plot(lw=0, ax=ax)
ax.set_xticks([])
ax.set_yticks([])
ax.set_xlabel("physt")
hist1
[35]:
Histogram1D(bins=(9,), total=97.8, dtype=float64)
_images/tutorial_50_1.png

Indexing

Supported indexing is more or less compatible with numpy arrays.

[36]:
hist.find_bin(3)             # Find a proper bin for some value (0 - based indices)
[36]:
5
[37]:
hist[3]                      # Return the bin (with frequency)
[37]:
(array([-10.66146002,  -5.25163815]), 1600)
[38]:
hist[-3:]                    # Sub-histogram (as slice)
[38]:
Histogram1D(bins=(3,), total=559, dtype=int32)
[39]:
hist[hist.frequencies > 5]   # Masked array (destroys underflow & overflow information)
[39]:
Histogram1D(bins=(10,), total=10000, dtype=int32)
[40]:
hist[[1, 3, 5]]              # Select some of the bins
[40]:
Histogram1D(bins=(3,), total=4550, dtype=int32)
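
The totals above can be cross-checked with plain numpy indexing. The sketch below copies the bin edges and frequencies (rounded, as printed by to_dataframe() later in this tutorial); find_bin here is a hypothetical helper written for illustration, not physt's implementation.

```python
import numpy as np

edges = np.array([-26.890926, -21.481104, -16.071282, -10.661460, -5.251638,
                  0.158184, 5.568006, 10.977827, 16.387649, 21.797471, 27.207293])
frequencies = np.array([10, 100, 534, 1600, 2803, 2850, 1544, 471, 75, 13])

def find_bin(value):
    """Return the 0-based index of the bin containing value, or None if outside."""
    if value < edges[0] or value > edges[-1]:
        return None
    index = int(np.searchsorted(edges, value, side="right")) - 1
    return min(index, len(frequencies) - 1)    # the last bin includes its right edge

print(find_bin(3))                             # 5
print(frequencies[-3:].sum())                  # 559
print(frequencies[[1, 3, 5]].sum())            # 4550
print(frequencies[frequencies > 5].sum())      # 10000
```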

Arithmetics

Histograms support basic arithmetic operations that preserve the bins and usually have an intuitive meaning.

[41]:
hist + hist
[41]:
Histogram1D(bins=(10,), total=20000, dtype=int32)
[42]:
hist - hist
c:\users\janpi\documents\code\my\physt\physt\histogram_base.py:852: UserWarning:

Subtracting histograms is considered to be a bad idea.

[42]:
Histogram1D(bins=(10,), total=0, dtype=int32)
[43]:
hist * 0.45
[43]:
Histogram1D(bins=(10,), total=4500.000000000001, dtype=float64)
[44]:
hist / 0.45
[44]:
Histogram1D(bins=(10,), total=22222.222222222226, dtype=float64)

Some of the operations are prohibited:

[45]:
try:
    hist * hist  # Does not make sense
except Exception as ex:
    print(repr(ex))
TypeError('Multiplication of two histograms is not supported.')
[46]:
try:
    hist + 4  # Does not make sense
except Exception as ex:
    print(repr(ex))
TypeError("Only histograms can be added together. <class 'int'> found instead.")
[47]:
try:
    (-0.2) * hist
except Exception as ex:
    print(repr(ex))
ValueError('Cannot have negative frequencies.')

Some of the above checks are dropped if you allow “free arithmetics”. You can do this by:

  1. Setting the PHYST_FREE_ARITHMETICS environment variable to 1 (note: not any other “truthy” value)
  2. Setting config.free_arithmetics to True
  3. Using the context manager config.enable_free_arithmetics():

[48]:
from physt.config import config
with config.enable_free_arithmetics():
    neg_hist = (-0.2) * hist
ax = neg_hist.plot()
ax.set_ylim((-800, 0))  # TODO: Rendering bug requires this
neg_hist
c:\users\janpi\documents\code\my\physt\physt\histogram_base.py:338: UserWarning:

Negative frequencies in the histogram.

[48]:
Histogram1D(bins=(10,), total=-1999.9999999999998, dtype=float64)
_images/tutorial_68_2.png

With this relaxation, you can also use any numpy array as the (right) operand for any of the operations:

[49]:
# Add some noise
with config.enable_free_arithmetics():
    hist_plus_array = hist + np.random.normal(800, 200, hist.shape)
hist_plus_array.plot()
hist_plus_array
[49]:
Histogram1D(bins=(10,), total=18628.188812174514, dtype=float64)
_images/tutorial_70_1.png

If you need to side-step any rules completely, just use the histogram in a numpy array:

[50]:
np.asarray(hist) * np.asarray(hist)
# Exercise: Reconstruct a histogram with original bins
[50]:
array([    100,   10000,  285156, 2560000, 7856809, 8122500, 2383936,
        221841,    5625,     169])

Statistics

When creating histograms, it is possible to keep simple statistics about the sampled distribution, like mean() and std(). The behaviour was inspired by similar features in ROOT.

This feature is yet to be refined.

[51]:
hist.mean()
[51]:
-0.0
[52]:
hist.std()
[52]:
0.0
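
When only the binned data are available, the mean and standard deviation can be approximated from bin centers weighted by frequencies. The sketch below does this with numpy for the 10-bin histogram of data (edges and frequencies copied, rounded, from the to_dataframe() output later in this tutorial). It is an approximation under that assumption and not necessarily how physt computes mean() and std().

```python
import numpy as np

edges = np.array([-26.890926, -21.481104, -16.071282, -10.661460, -5.251638,
                  0.158184, 5.568006, 10.977827, 16.387649, 21.797471, 27.207293])
frequencies = np.array([10, 100, 534, 1600, 2803, 2850, 1544, 471, 75, 13])

centers = (edges[:-1] + edges[1:]) / 2

# Treat every entry as if it sat at the center of its bin
mean = np.average(centers, weights=frequencies)
variance = np.average((centers - mean) ** 2, weights=frequencies)
std = np.sqrt(variance)

print(mean, std)
```

Since the data were drawn from a normal distribution with mean 0 and standard deviation 7, the binned estimates should land close to those values, with a small upward bias in the spread caused by the finite bin width.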

Plotting

This is currently based on matplotlib, but other tools might come later (d3.js, bokeh?)

[53]:
hist.plot();     # Basic plot
_images/tutorial_77_0.png
[54]:
hist.plot(density=True, errors=True, ecolor="red");   # Include errors
_images/tutorial_78_0.png
[55]:
hist.plot(show_stats=True, errors=True, alpha=0.3);    # Show summary statistics (not fully supported yet)
_images/tutorial_79_0.png
[56]:
hist.plot(cumulative=True, color="yellow", lw=3, edgecolor="red");           # Use matplotlib parameters
_images/tutorial_80_0.png
[57]:
hist.plot(kind="scatter", s=hist.frequencies, cmap="rainbow", density=True);    # Another plot type
_images/tutorial_81_0.png
[58]:
hist.plot(kind="step", lw=4)
[58]:
<AxesSubplot:xlabel='axis0'>
_images/tutorial_82_1.png
[59]:
# Plot different bins using different styles
axis = hist[hist.frequencies > 5].plot(label="High", alpha=0.5)
hist[1:-1][hist[1:-1].frequencies <= 5].plot(ax=axis, color="green", label="Low", alpha=0.5)
hist[[0, -1]].plot(ax=axis, color="red", label="Edge cases", alpha=0.5)
hist.plot(kind="scatter", ax=axis, s=hist.frequencies / 10, label="Scatter")
# axis.legend();     # Does not work - why?
[59]:
<AxesSubplot:xlabel='axis0'>
_images/tutorial_83_1.png
[60]:
# Bar plot with colormap (with logarithmic scale)
ax = hist.plot(cmap="Reds_r", yscale="log", show_values=True);
_images/tutorial_84_0.png

Irregular binning and densities

[61]:
figure, axes = plt.subplots(1, 3, figsize=(11, 3))

hist_irregular = histogram(heights, [160, 162, 166, 167, 175, 188, 191])
hist_irregular.plot(ax=axes[0], errors=True, cmap="rainbow");
hist_irregular.plot(ax=axes[1], density=True, errors=True, cmap="rainbow");
hist_irregular.plot(ax=axes[2], density=True, cumulative=True, cmap="rainbow");

axes[0].set_title("Absolute values")
axes[1].set_title("Densities")
axes[2].set_title("Cumulative");
_images/tutorial_86_0.png
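
With irregular bins, absolute frequencies and densities look quite different because a density divides each count by its bin width. A numpy-only sketch of that relation follows; it uses the first row of the basic dataset from the beginning of this tutorial and numpy's normalization convention, which is assumed (not verified) to match what the density=True plots show.

```python
import numpy as np

values = np.array([160, 155, 156, 198, 177, 168, 191, 183, 184, 179, 178, 172, 173, 175])
edges = np.array([160, 162, 166, 167, 175, 188, 191])

counts, _ = np.histogram(values, edges)       # values outside the edges are ignored
widths = np.diff(edges)
density = counts / (counts.sum() * widths)    # bar areas (density * width) sum to 1
cumulative = np.cumsum(counts)

print(counts)
print(density)
print(cumulative)
```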

Adding new values

Add (fill) single values
[62]:
figure, axes = plt.subplots(1, 4, figsize=(12, 3))

hist3 = histogram([], 20, range=(160, 200))

for i, ax in enumerate(axes):
    for height in np.random.normal(165 + 10 * i, 2.8, 10000):
        hist3.fill(height)
    hist3.plot(ax=ax);
    print("After {0} batches: {1}".format(i, hist3))
figure.tight_layout()
After 0 batches: Histogram1D(bins=(20,), total=9648, dtype=int64)
After 1 batches: Histogram1D(bins=(20,), total=19648, dtype=int64)
After 2 batches: Histogram1D(bins=(20,), total=29648, dtype=int64)
After 3 batches: Histogram1D(bins=(20,), total=39251, dtype=int64)
_images/tutorial_89_1.png
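
Note that the totals above are lower than the number of filled values: with a fixed range of (160, 200), values outside the range do not land in any bin. The sketch below imitates filling one value at a time with numpy. FillableHistogram is a hypothetical minimal class written for illustration, not physt's implementation.

```python
import numpy as np

class FillableHistogram:
    """Hypothetical minimal stand-in for a fillable 1D histogram (not physt's class)."""

    def __init__(self, edges):
        self.edges = np.asarray(edges, dtype=float)
        self.frequencies = np.zeros(len(self.edges) - 1, dtype=np.int64)
        self.missed = 0                          # values outside the bin range

    def fill(self, value):
        # Index of the last edge that is <= value
        index = int(np.searchsorted(self.edges, value, side="right")) - 1
        if value == self.edges[-1]:
            index = len(self.frequencies) - 1    # the last bin includes its right edge
        if 0 <= index < len(self.frequencies):
            self.frequencies[index] += 1
        else:
            self.missed += 1

hist = FillableHistogram(np.linspace(160, 200, 21))    # 20 bins on the range (160, 200)
values = np.random.default_rng(0).normal(165, 2.8, 1000)
for height in values:
    hist.fill(height)

print(hist.frequencies.sum(), hist.missed)
```

Filling value by value gives the same bin contents as histogramming all values at once with the same edges; it is only slower.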

Add histograms with same binning

[63]:
heights1 = histogram(np.random.normal(169, 10, 100000), 50, range=(150, 200))
heights2 = histogram(np.random.normal(180, 11, 100000), 50, range=(150, 200))


total = heights1 + heights2

axis = heights1.plot(label="Women", color="red", alpha=0.5)
heights2.plot(label="Men", color="blue", alpha=0.5, ax=axis)
total.plot(label="All", color="gray", alpha=0.5, ax=axis)
axis.legend();
_images/tutorial_91_0.png
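
Adding two histograms with identical bins is equivalent to histogramming the merged data. A numpy-only sketch (smaller illustrative samples, names chosen for this example) confirms this:

```python
import numpy as np

rng = np.random.default_rng(42)
women = rng.normal(169, 10, 1000)
men = rng.normal(180, 11, 1000)

bins, value_range = 50, (150, 200)
counts_women, edges = np.histogram(women, bins, range=value_range)
counts_men, _ = np.histogram(men, bins, range=value_range)
counts_all, _ = np.histogram(np.concatenate([women, men]), bins, range=value_range)

# Bin-wise sum of the two histograms equals the histogram of all values
print(np.array_equal(counts_women + counts_men, counts_all))   # True
```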

Compatibility

Note: Mostly, the compatibility is a trivial consequence of the object being convertible to a numpy array.

[64]:
# Convert to pandas dataframe
hist.to_dataframe()
[64]:
        left      right  frequency      error
0 -26.890926 -21.481104         10   3.162278
1 -21.481104 -16.071282        100  10.000000
2 -16.071282 -10.661460        534  23.108440
3 -10.661460  -5.251638       1600  40.000000
4  -5.251638   0.158184       2803  52.943366
5   0.158184   5.568006       2850  53.385391
6   5.568006  10.977827       1544  39.293765
7  10.977827  16.387649        471  21.702534
8  16.387649  21.797471         75   8.660254
9  21.797471  27.207293         13   3.605551
[65]:
# Works on xarray
import xarray as xr
arr = xr.DataArray(np.random.rand(10, 50, 100))
histogram(arr).plot(cmap="Reds_r", cmap_min=4744, cmap_max=5100, lw=1, edgecolor="red", show_values=True);
_images/tutorial_94_0.png
[66]:
# Works on pandas Series
import pandas as pd
series = pd.Series(heights, name="height [cm]")
hist = histogram(series, title="Height distribution")
hist.plot()
hist
[66]:
Histogram1D(bins=(10,), total=100, dtype=int64)
_images/tutorial_95_1.png

Export & import

[67]:
json = hist.to_json()     # add path argument to write it to file
json
[67]:
'{"histogram_type": "Histogram1D", "binnings": [{"adaptive": false, "binning_type": "NumpyBinning", "numpy_bins": [144.46274207992508, 148.91498677707023, 153.36723147421537, 157.81947617136055, 162.2717208685057, 166.72396556565084, 171.176210262796, 175.62845495994114, 180.0806996570863, 184.53294435423146, 188.9851890513766]}], "frequencies": [2, 4, 4, 15, 11, 12, 19, 17, 7, 9], "dtype": "int64", "errors2": [2, 4, 4, 15, 11, 12, 19, 17, 7, 9], "meta_data": {"name": null, "title": "Height distribution", "axis_names": ["height [cm]"]}, "missed": [0, 0, 0], "missed_keep": true, "physt_version": "0.4.12.2", "physt_compatible": "0.3.20"}'
[68]:
from physt.io import parse_json
hist = parse_json(json)
hist.plot()
hist
[68]:
Histogram1D(bins=(10,), total=100, dtype=int64)
_images/tutorial_98_1.png
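
The exported string is ordinary JSON, so it can also be inspected without physt. The sketch below parses the output shown above with the standard library; the variable name exported is chosen here to avoid shadowing the json module, and only keys visible in that output are accessed.

```python
import json

# The JSON string printed above, split across lines for readability
exported = (
    '{"histogram_type": "Histogram1D", "binnings": [{"adaptive": false, '
    '"binning_type": "NumpyBinning", "numpy_bins": [144.46274207992508, '
    '148.91498677707023, 153.36723147421537, 157.81947617136055, '
    '162.2717208685057, 166.72396556565084, 171.176210262796, '
    '175.62845495994114, 180.0806996570863, 184.53294435423146, '
    '188.9851890513766]}], "frequencies": [2, 4, 4, 15, 11, 12, 19, 17, 7, 9], '
    '"dtype": "int64", "errors2": [2, 4, 4, 15, 11, 12, 19, 17, 7, 9], '
    '"meta_data": {"name": null, "title": "Height distribution", '
    '"axis_names": ["height [cm]"]}, "missed": [0, 0, 0], "missed_keep": true, '
    '"physt_version": "0.4.12.2", "physt_compatible": "0.3.20"}'
)

payload = json.loads(exported)
edges = payload["binnings"][0]["numpy_bins"]
frequencies = payload["frequencies"]

print(payload["histogram_type"], payload["meta_data"]["title"])
print(len(edges), "edges for", len(frequencies), "bins, total", sum(frequencies))
```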

Plotting physt histograms

Some matplotlib-based plotting examples, with no exhaustive documentation.

[1]:
# Necessary import evil
import physt
from physt import h1, h2, histogramdd
from physt.plotting import matplotlib
import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline

np.random.seed(42)

from physt import plotting
[2]:
# Some data
x = np.random.normal(100, 1, 10000)
y = np.random.normal(10, 10, 10000)
[3]:
ax = h2(x, y, 15).plot(figsize=(6, 6), show_zero=False, alpha=0, text_color="black", show_values=True, cmap="BuGn_r", show_colorbar=False, transform=lambda x:1)
h2(x, y, 50).plot.image(cmap="Spectral_r", alpha=0.75, ax=ax)
[3]:
<AxesSubplot:xlabel='axis0', ylabel='axis1'>
_images/plotting_3_1.png
[4]:
h2(x, y, 40, name="Gauss").plot("image", cmap="rainbow", figsize=(5, 5))
[4]:
<AxesSubplot:title={'center':'Gauss'}, xlabel='axis0', ylabel='axis1'>
_images/plotting_5_1.png
[5]:
plotting.matplotlib.bar3d(h2(x, y, 10, name="Gauss"), figsize=(5, 5), cmap="Accent");
_images/plotting_6_0.png
[6]:
h1(x, "human", bin_count=10, name="Gauss").plot(ylim=(100, 1020), cmap="Greys", ticks="edge", errors=True);
_images/plotting_7_0.png
[7]:
h1(x, "human", bin_count=200, name="Gauss").plot.line(errors=True, yscale="log")
[7]:
<AxesSubplot:xlabel='axis0'>
_images/plotting_8_1.png
[8]:
h1(x, "human", bin_count=200, name="Gauss").plot.fill(lw=1, alpha=.4, figsize=(8, 4))
h1(x, "human", bin_count=200, name="Gauss").plot.fill(lw=1, alpha=.4, yscale="log", figsize=(8, 4), color="red")
[8]:
<AxesSubplot:xlabel='axis0'>
_images/plotting_9_1.png
_images/plotting_9_2.png
[9]:
h1(x, "human", bin_count=200, name="Gauss").plot.scatter(errors=True, xlim=(90, 100), show_stats="all")
[9]:
<AxesSubplot:xlabel='axis0'>
_images/plotting_10_1.png
[10]:
ha = h1(x, "human", bin_count=20, name="Left")
hb = h1(x + 1 * np.sin(x / 12), "human", bin_count=40, name="Right")

from physt.plotting.matplotlib import pair_bars
[11]:
pair_bars(ha, hb, density=True, errors=True, figsize=(5, 5));
c:\users\janpi\documents\code\my\physt\physt\histogram_base.py:355: UserWarning:

Negative frequencies in the histogram.

c:\users\janpi\documents\code\my\physt\physt\plotting\matplotlib.py:789: UserWarning:

FixedFormatter should only be used together with FixedLocator

_images/plotting_12_1.png

Example - histogram of time values

[12]:
# Get values close to
from physt.plotting.common import TimeTickHandler

data = np.random.normal(3600, 900, 4800)
H = h1(data, "human", axis_name="time")
H.plot(tick_handler=TimeTickHandler())
[12]:
<AxesSubplot:xlabel='time'>
_images/plotting_14_1.png

2D Histograms in physt

[1]:
# Necessary import evil
import physt
from physt import h1, h2, histogramdd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

np.random.seed(42)
[2]:
# Some data
x = np.random.normal(100, 1, 1000)
y = np.random.normal(10, 10, 1000)
[3]:
# Create a simple histogram
histogram = h2(x, y, [8, 4], name="Some histogram", axis_names=["x", "y"])
histogram
[3]:
Histogram2D('Some histogram', bins=(8, 4), total=1000, dtype=int64)
[4]:
# Frequencies are a 2D-array
histogram.frequencies
[4]:
array([[  0,   2,   4,   0],
       [  3,  26,  20,   5],
       [ 17,  78, 104,  10],
       [ 26, 163, 147,  17],
       [ 17, 136,  96,  17],
       [  6,  41,  38,   6],
       [  1,  11,   7,   0],
       [  0,   1,   0,   1]], dtype=int64)

Multidimensional binning

In most cases, binning methods that apply to 1D histograms can also be used in higher dimensions. In such cases, each parameter can be either a scalar (applied to all dimensions) or a list/tuple with independent values for each dimension. This also applies to range, which has to be a list/tuple of tuples.

[6]:
histogram = h2(x, y, "fixed_width", bin_width=[2, 10], name="Fixed-width bins", axis_names=["x", "y"])
histogram.plot();
histogram.numpy_bins
[6]:
[array([ 96.,  98., 100., 102., 104.]),
 array([-20., -10.,   0.,  10.,  20.,  30.,  40.,  50.])]
_images/2d_histograms_6_1.png
[7]:
histogram = h2(x, y, "quantile", bin_count=[3, 4], name="Quantile bins", axis_names=["x", "y"])
histogram.plot(cmap_min=0);
histogram.numpy_bins
[7]:
[array([ 96.75873266,  99.54993453, 100.40825276, 103.85273149]),
 array([-19.40388635,   3.93758311,  10.63077132,  17.28882177,
         41.93107568])]
_images/2d_histograms_7_1.png
[8]:
histogram = h2(x, y, "human", bin_count=5, name="Human-friendly bins", axis_names=["x", "y"])
histogram.plot();
histogram.numpy_bins
[8]:
[array([ 96.,  98., 100., 102., 104.]),
 array([-20., -10.,   0.,  10.,  20.,  30.,  40.,  50.])]
_images/2d_histograms_8_1.png

Plotting

2D
[ ]:
# Default is workable
ax = histogram.plot()
[9]:
# Custom colormap, no colorbar
import matplotlib.cm as cm
fig, ax = plt.subplots()
ax = histogram.plot(ax=ax, cmap=cm.copper, show_colorbar=False, grid_color=cm.copper(0.5))
ax.set_title("Custom colormap");
_images/2d_histograms_11_0.png
[10]:
# Use a named colormap + limit it to a range of values
import matplotlib.cm as cm
fig, ax = plt.subplots()
ax = histogram.plot(ax=ax, cmap="Oranges", show_colorbar=True, cmap_min=20, cmap_max=100, show_values=True)
ax.set_title("Clipped colormap");
_images/2d_histograms_12_0.png
[11]:
# Show labels (and hide zero bins), no grid (lw=0)
ax = histogram.plot(show_values=True, show_zero=False, cmap=cm.RdBu, format_value=float, lw=0)
_images/2d_histograms_13_0.png

Large histograms as images

Plotting histograms in this way gets problematic with more than roughly 50x50 bins. There is an alternative, though, partially inspired by the datashader project: plot the histogram as a bitmap, which works very fast even for very large histograms.

Note: This method does not work for histograms with irregular bins.

[12]:
x = np.random.normal(100, 1, 1000000)
y = np.random.normal(10, 10, 1000000)
[13]:
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
h2(x, y, 20, name="20 bins - map").plot("map", cmap="rainbow", lw=0, alpha=1, ax=axes[0], show_colorbar=False)
h2(x, y, 20, name="20 bins - image").plot("image", cmap="rainbow", alpha=1, ax=axes[1])
h2(x, y, 500, name="500 bins - image").plot("image", cmap="rainbow", alpha=1, ax=axes[2]);
_images/2d_histograms_16_0.png

Note that the image output is equivalent to the map output without lines.

Transformation

Sometimes, the value range is too big to show details. In that case, it can help to transform the values by a function, e.g. a logarithm.

[14]:
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
h2(x, y, 20, name="20 bins - map").plot("map", alpha=1, lw=0, show_zero=False, cmap="rainbow", ax=axes[0], show_colorbar=False, cmap_normalize="log")
h2(x, y, 20, name="20 bins - image").plot("image", alpha=1, ax=axes[1], cmap="rainbow", cmap_normalize="log")
h2(x, y, 500, name="500 bins - image").plot("image", alpha=1, ax=axes[2], cmap="rainbow", cmap_normalize="log");
_images/2d_histograms_19_0.png
[15]:
# Composition - show histogram overlaid with "points"
fig, ax = plt.subplots(figsize=(8, 7))
h_2 = h2(x, y, 30)
h_2.plot("map", lw=0, alpha=0.9, cmap="Blues", ax=ax, cmap_normalize="log", show_zero=False)
# h2(x, y, 300).plot("image", alpha=1, cmap="Greys", ax=ax, transform=lambda x: x > 0);
# Not working currently
[15]:
<AxesSubplot:xlabel='axis0', ylabel='axis1'>
_images/2d_histograms_20_1.png

3D

By this, we mean 3D bar plots of 2D histograms (not a visual representation of 3D histograms).

[16]:
histogram.plot("bar3d", cmap="rainbow");
_images/2d_histograms_22_0.png
[17]:
histogram.plot("bar3d", color="red");
_images/2d_histograms_23_0.png

Projections

[18]:
proj1 = histogram.projection("x", name="Projection to X")
proj1.plot(errors=True)
proj1
[18]:
Histogram1D('Projection to X', bins=(4,), total=1000, dtype=int64)
_images/2d_histograms_25_1.png
[19]:
proj2 = histogram.projection("y", name="Projection to Y")
proj2.plot(errors=True)
proj2
[19]:
Histogram1D('Projection to Y', bins=(7,), total=1000, dtype=int64)
_images/2d_histograms_26_1.png
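
In terms of bin contents, a projection is a sum over the axis that is dropped. The numpy-only sketch below illustrates this with the 8x4 frequency array printed near the beginning of this notebook (not the 4x7 histogram projected in the cells above); the variable names are illustrative.

```python
import numpy as np

# Frequencies of the 8x4 "Some histogram" shown earlier (rows: x bins, columns: y bins)
frequencies = np.array([[0, 2, 4, 0],
                        [3, 26, 20, 5],
                        [17, 78, 104, 10],
                        [26, 163, 147, 17],
                        [17, 136, 96, 17],
                        [6, 41, 38, 6],
                        [1, 11, 7, 0],
                        [0, 1, 0, 1]])

projection_x = frequencies.sum(axis=1)   # sum over y -> one value per x bin
projection_y = frequencies.sum(axis=0)   # sum over x -> one value per y bin

print(projection_x)   # [  6  54 209 353 266  91  19   2]
print(projection_y)   # [ 70 458 416  56]
```

Both projections keep the total of 1000 entries.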

Adaptive 2D histograms

[20]:
# Create and add two histograms with adaptive binning
height1 = np.random.normal(180, 5, 1000)
weight1 = np.random.normal(80, 2, 1000)
ad1 = h2(height1, weight1, "fixed_width", bin_width=1, adaptive=True)
ad1.plot(show_zero=False)

height2 = np.random.normal(160, 5, 1000)
weight2 = np.random.normal(70, 2, 1000)
ad2 = h2(height2, weight2, "fixed_width", bin_width=1, adaptive=True)
ad2.plot(show_zero=False)

(ad1 + ad2).plot(show_zero=False);
_images/2d_histograms_28_0.png
_images/2d_histograms_28_1.png
_images/2d_histograms_28_2.png

N-dimensional histograms

Although it is not easy to visualize them, it is possible to create histograms of any dimension that behave similarly to 2D ones. Warning: be aware that the memory consumption can be significant.

[21]:
# Create a 4D histogram
data = [np.random.rand(1000)[:, np.newaxis] for i in range(4)]
data = np.concatenate(data, axis=1)
h4 = histogramdd(data, [3, 2, 2, 3], axis_names="abcd")
h4
[21]:
HistogramND(bins=(3, 2, 2, 3), total=1000, dtype=int64)
[22]:
h4.frequencies
[22]:
array([[[[31, 28, 33],
         [21, 22, 22]],

        [[25, 29, 28],
         [29, 35, 28]]],


       [[[20, 25, 20],
         [28, 32, 31]],

        [[30, 28, 24],
         [29, 21, 27]]],


       [[[27, 26, 33],
         [21, 35, 30]],

        [[38, 30, 32],
         [25, 30, 27]]]], dtype=int64)
[23]:
h4.projection("a", "d", name="4D -> 2D").plot(show_values=True, format_value=int, cmap_min="min");
_images/2d_histograms_32_0.png
[24]:
h4.projection("d", name="4D -> 1D").plot("scatter", errors=True);
_images/2d_histograms_33_0.png
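
The same idea carries over to N dimensions: projecting means summing over all axes that are not kept. The sketch below applies this with numpy to the 4D frequencies printed above and then estimates how the number of bins grows with dimensionality, which is the source of the memory warning. The figure of 100 bins per axis is an illustrative assumption.

```python
import numpy as np

# Frequencies of the 4D histogram shown above, shape (3, 2, 2, 3) for axes a, b, c, d
frequencies = np.array([[[[31, 28, 33], [21, 22, 22]],
                         [[25, 29, 28], [29, 35, 28]]],
                        [[[20, 25, 20], [28, 32, 31]],
                         [[30, 28, 24], [29, 21, 27]]],
                        [[[27, 26, 33], [21, 35, 30]],
                         [[38, 30, 32], [25, 30, 27]]]])

projection_ad = frequencies.sum(axis=(1, 2))     # keep axes a and d
projection_d = frequencies.sum(axis=(0, 1, 2))   # keep axis d only

print(projection_ad.shape)   # (3, 3)
print(projection_d)          # [324 341 335]

# Number of bins (and memory for int64 contents) with 100 bins per axis
bins_per_axis = 100
for ndim in (1, 2, 3, 4):
    n_bins = bins_per_axis ** ndim
    print(ndim, n_bins, n_bins * 8 / 1e6, "MB")
```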

Support for pandas DataFrames (without pandas dependency ;-))

[25]:
# Load notorious example data set
iris = sns.load_dataset('iris')
[28]:
iris = sns.load_dataset('iris')
iris_hist = physt.h2(iris["sepal_length"], iris["sepal_width"], "human", bin_count=[12, 7], name="Iris")
iris_hist.plot(show_zero=False, cmap=cm.gray_r, show_values=True, format_value=int);
_images/2d_histograms_36_0.png
[29]:
iris_hist.projection("sepal_length").plot();
_images/2d_histograms_37_0.png

Binning in physt

[1]:
# Necessary import evil
%matplotlib inline
from physt import histogram, binnings
import numpy as np
import matplotlib.pyplot as plt
[2]:
# Some data
np.random.seed(42)

heights1 = np.random.normal(169, 10, 100000)
heights2 = np.random.normal(180, 6, 100000)
numbers = np.random.rand(100000)

Ideal number of bins

[3]:
X = [int(x) for x in np.logspace(0, 4, 50)]

algos = binnings.bincount_methods
Ys = { algo: [] for algo in algos}

for x in X:
    ex_dataset = np.random.exponential(1, x)
    for algo in algos:
        Ys[algo].append(binnings.ideal_bin_count(ex_dataset, algo))

figure, axis = plt.subplots(figsize=(8, 8))
for algo in algos:
    if algo == "default":
        axis.plot(X, Ys[algo], ":.", label=algo, alpha=0.5, lw=2)
    else:
        axis.plot(X, Ys[algo], "-", label=algo, alpha=0.5, lw=2)
axis.set_xscale("log")
axis.set_yscale("log")
axis.set_xlabel("Sample size")
axis.set_ylabel("Bin count")
axis.legend(loc=2);
_images/binning_4_0.png
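For orientation, three textbook rules of thumb for the bin count can be sketched in plain Python. This is only an illustration: the function names are hypothetical, and the methods actually listed in binnings.bincount_methods (and their rounding) may differ.

```python
import math


def sqrt_rule(n):
    """Square-root rule: about sqrt(n) bins."""
    return math.ceil(math.sqrt(n))


def sturges_rule(n):
    """Sturges' rule: ceil(log2(n)) + 1 bins."""
    return math.ceil(math.log2(n)) + 1


def rice_rule(n):
    """Rice rule: ceil(2 * n^(1/3)) bins."""
    return math.ceil(2 * n ** (1 / 3))


for n in (100, 10_000):
    print(n, sqrt_rule(n), sturges_rule(n), rice_rule(n))
```

All three depend only on the sample size, which is why such curves are straight or nearly straight lines on a log-log plot.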

Binning schemes

Exponential binning

Uses numpy.logspace to create bins.

[4]:
figure, axis = plt.subplots(1, 2, figsize=(10, 4))
hist1 = histogram(numbers, "exponential", bin_count=10, range=(0.0001, 1))
hist1.plot(color="green", ax=axis[0])
hist1.plot(density=True, errors=True, ax=axis[1])
axis[0].set_title("Absolute scale")
axis[1].set_title("Log scale")
axis[1].set_xscale("log");
_images/binning_7_0.png
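What "exponential" bins mean can be sketched without any library: the edges are equally spaced in the logarithm, so every bin is wider than its predecessor by the same factor. The helper log_spaced_edges is hypothetical and only mimics the idea behind log-spaced edges; it is not physt's implementation.

```python
import math


def log_spaced_edges(low, high, bin_count):
    """Return bin_count + 1 edges that are equidistant on a log10 scale."""
    log_low, log_high = math.log10(low), math.log10(high)
    step = (log_high - log_low) / bin_count
    return [10 ** (log_low + i * step) for i in range(bin_count + 1)]


edges = log_spaced_edges(0.0001, 1, 10)
# Ratio of neighbouring edges is constant (here 10 ** 0.4)
ratios = [right / left for left, right in zip(edges, edges[1:])]
print(edges[0], edges[-1], ratios[0])
```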
Integer binning

Useful for integer values (or something you want to round to integers), creates bins of width=1 around integers (i.e. 0.5-1.5, …)

[5]:
# Sum of two dice (should be triangle, right?)
dice = np.floor(np.random.rand(10000) * 6) + np.floor(np.random.rand(10000) * 6) + 2
histogram(dice, "integer").plot(ticks="center", density=True);
_images/binning_9_0.png
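The edges of such a binning are easy to write down by hand. The following sketch (hypothetical helper integer_bin_edges, not the physt code) builds them for the exact set of all 36 two-dice outcomes, which indeed gives the triangle.

```python
import math
from collections import Counter


def integer_bin_edges(values):
    """Edges at k - 0.5, so that every integer k sits in the middle of its bin."""
    low = math.floor(min(values) + 0.5)    # integer nearest to the minimum
    high = math.floor(max(values) + 0.5)   # integer nearest to the maximum
    return [k - 0.5 for k in range(low, high + 2)]


dice_sums = [a + b for a in range(1, 7) for b in range(1, 7)]
edges = integer_bin_edges(dice_sums)
counts = Counter(dice_sums)
frequencies = [counts[k] for k in range(2, 13)]

print(edges[0], edges[-1])   # 1.5 12.5
print(frequencies)           # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
```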
Quantile-based binning

Based on quantiles, this binning results in all bins containing roughly the same number of observations.

[6]:
figure, axis = plt.subplots(1, 2, figsize=(10, 4))
# bins2 = binning.quantile_bins(heights1, 40)
hist2 = histogram(heights1, "quantile", bin_count=40)
hist2.plot(ax=axis[0]);
hist2.plot(density=True, ax=axis[1]);
axis[0].set_title("Frequencies")
axis[1].set_title("Density");
hist2
[6]:
Histogram1D(bins=(40,), total=100000, dtype=int64)
_images/binning_11_1.png
[7]:
figure, axis = plt.subplots()

histogram(heights1, "quantile", bin_count=10).plot(alpha=0.3, density=True, ax=axis, label="Quantile based")
histogram(heights1, 10).plot(alpha=0.3, density=True, ax=axis, color="green", label="Equal spaced")
axis.legend(loc=2);
_images/binning_12_0.png
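The principle can be sketched in a few lines of plain Python. The helper quantile_edges is hypothetical and uses simple linear interpolation between sorted values; physt's own quantile computation may differ in details.

```python
def quantile_edges(values, bin_count):
    """Edges at the 0, 1/bin_count, ..., 1 quantiles (linear interpolation)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    edges = []
    for i in range(bin_count + 1):
        position = i * last / bin_count    # fractional index into ordered
        lower = int(position)
        upper = min(lower + 1, last)
        fraction = position - lower
        edges.append(ordered[lower] * (1 - fraction) + ordered[upper] * fraction)
    return edges


data = [x ** 2 for x in range(101)]        # strongly non-uniform values
edges = quantile_edges(data, 4)
# Count values in half-open bins [lo, hi); the maximum itself is left out here
counts = [sum(lo <= v < hi for v in data) for lo, hi in zip(edges, edges[1:])]

print(edges)    # [0.0, 625.0, 2500.0, 5625.0, 10000.0]
print(counts)   # [25, 25, 25, 25]
```

The bins have very different widths but equal content, which is why the density plot is the more informative view for this binning.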
Fixed-width bins

This binning is useful if you want “human-friendly” bin intervals.

[8]:
hist_fixed = histogram(heights1, "fixed_width", bin_width=3)
hist_fixed.plot()
hist_fixed
[8]:
Histogram1D(bins=(31,), total=100000, dtype=int64)
_images/binning_14_1.png
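One way to obtain such intervals is to place the edges at integer multiples of the bin width. The sketch below does exactly that; note that this alignment is an assumption made for illustration (fixed_width_edges is a hypothetical helper), not a statement about how physt chooses the edges.

```python
import math


def fixed_width_edges(values, bin_width):
    """Edges at multiples of bin_width that cover all values."""
    first = math.floor(min(values) / bin_width)
    last = math.floor(max(values) / bin_width) + 1
    return [k * bin_width for k in range(first, last + 1)]


edges = fixed_width_edges([155, 160.5, 198], bin_width=3)
print(edges[0], edges[-1], len(edges) - 1)   # 153 201 16
```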
“Human” bins

The width and alignment of bins are guessed from the data, with an approximate number of bins as an optional parameter.

[9]:
human = histogram(heights1, "human", bin_count=15)
human.plot()
human
[9]:
Histogram1D(bins=(19,), total=100000, dtype=int64)
_images/binning_16_1.png
Astropy binning

Astropy includes its own histogramming tools. If this package is available, we reuse its binning methods. These include:

  • Bayesian blocks
  • Knuth
  • Freedman
  • Scott

See http://docs.astropy.org/en/stable/visualization/histogram.html for more details.

[10]:
middle_sized = np.random.normal(180, 6, 5000)

for n in ["blocks", "scott", "knuth", "freedman"]:
    algo = "{0}".format(n)
    hist = histogram(middle_sized, algo, name=algo)
    hist.plot(density=True)
_images/binning_18_0.png
_images/binning_18_1.png
_images/binning_18_2.png
_images/binning_18_3.png

Adaptive histogram

This type of histogram automatically adapts its bins when new values are added. Note that only the fixed-width continuous binning scheme is currently supported.

[1]:
# Necessary import evil
import physt
from physt import h1, h2, histogramdd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
[2]:
# Create an empty histogram
h = h1(None, "fixed_width", bin_width=10, name="People height", axis_name="cm", adaptive=True)
h
[2]:
Histogram1D('People height', bins=(0,), total=0, dtype=int64)

Adding single values

[3]:
# Add a first value
h.fill(157)
h.plot()
h
[3]:
Histogram1D('People height', bins=(1,), total=1, dtype=int64)
_images/adaptive_histogram_4_1.png
[4]:
# Add a second value
h.fill(173)
h.plot();
_images/adaptive_histogram_5_0.png
[5]:
# Add a few more values, including weights
h.fill(173, 2)
h.fill(186, 5)
h.fill(188, 3)
h.fill(193, 1)
h.plot(errors=True, show_stats=True);
_images/adaptive_histogram_6_0.png
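The idea behind an adaptive fixed-width histogram can be sketched with a dictionary keyed by the bin index. This is a toy model under stated assumptions (the class AdaptiveSketch is hypothetical, and empty bins between the outermost filled ones are reported with zero content); it is not how physt stores its data.

```python
import math
from collections import defaultdict


class AdaptiveSketch:
    """Toy adaptive histogram: bins appear as soon as a value needs them."""

    def __init__(self, bin_width):
        self.bin_width = bin_width
        self.counts = defaultdict(int)

    def fill(self, value, weight=1):
        # The bin index follows directly from the fixed width
        self.counts[math.floor(value / self.bin_width)] += weight

    def bins(self):
        """Return ((left, right), content) pairs, including empty gaps."""
        if not self.counts:
            return []
        lo, hi = min(self.counts), max(self.counts)
        return [((k * self.bin_width, (k + 1) * self.bin_width),
                 self.counts.get(k, 0))
                for k in range(lo, hi + 1)]


h_sketch = AdaptiveSketch(bin_width=10)
for value, weight in [(157, 1), (173, 1), (173, 2), (186, 5), (188, 3), (193, 1)]:
    h_sketch.fill(value, weight)

print(h_sketch.bins())
```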

Adding multiple values at once

[6]:
ha = h1(None, "fixed_width", bin_width=10, adaptive=True)
ha.plot(show_stats=True);
_images/adaptive_histogram_8_0.png
[7]:
# Beginning
ha.fill_n([10, 11, 34])
ha.plot();
_images/adaptive_histogram_9_0.png
[8]:
# Add a distant value
ha.fill_n([234], weights=[10])
ha.plot(show_stats=True);
_images/adaptive_histogram_10_0.png
[9]:
# Let's create a huge dataset
values = np.random.normal(130, 20, 100000)
[10]:
%%time
# Add lots of values (no loop in Python)
hn = h1(None, "fixed_width", bin_width=10, adaptive=True)
hn.fill_n(values)
# ha.plot()
Wall time: 15.5 ms
[11]:
%%time
# Comparison with Python loop
hp = h1(None, "fixed_width", bin_width=10, adaptive=True)
for value in values:
    hp.fill(value)
Wall time: 5.2 s
[12]:
# Hopefully equal results
print("Equal?", hp == hn)
hp.plot(show_stats=True);
Equal? True
_images/adaptive_histogram_14_1.png

Adding two adaptive histograms together

[13]:
ha1 = h1(None, "fixed_width", bin_width=5, adaptive=True)
ha1.fill_n(np.random.normal(100, 10, 1000))

ha2 = h1(None, "fixed_width", bin_width=5, adaptive=True)
ha2.fill_n(np.random.normal(70, 10, 500))

ha = ha1 + ha2

fig, ax= plt.subplots()

ha1.plot(alpha=0.1, ax=ax, label="1", color="red")
ha2.plot(alpha=0.1, ax=ax, label="2")

ha.plot("scatter", label="sum", ax=ax, errors=True)

ax.legend(loc=2);   # TODO? Why don't we show the sum???
_images/adaptive_histogram_16_0.png

Interrupted workflow

This example shows that, using IO, you can easily interrupt your workflow, save it, and continue some other time.

[1]:
import numpy as np
import physt

%matplotlib inline
[2]:
histogram = physt.h1(None, "fixed_width", bin_width=0.1, adaptive=True)
histogram
[2]:
Histogram1D(bins=(0,), total=0, dtype=int64)
[3]:
# Big chunk of data
data1 = np.random.normal(0, 1, 10000000)
histogram.fill_n(data1)
histogram
[3]:
Histogram1D(bins=(106,), total=10000000, dtype=int64)
[4]:
histogram.plot()
[4]:
<AxesSubplot:xlabel='axis0'>
_images/interrupted-workflow_4_1.png

Store the histogram (and delete it to pretend we start again with a clean slate):

[5]:
histogram.to_json(path="./histogram.json");
del histogram

Turn off the machine, go for lunch, return home later…

Read the histogram:

[6]:
histogram = physt.io.load_json(path="./histogram.json")
histogram
[6]:
Histogram1D(bins=(106,), total=10000000, dtype=int64)
[7]:
histogram.plot()
[7]:
<AxesSubplot:xlabel='axis0'>
_images/interrupted-workflow_9_1.png

The same one ;-)

Continue filling:

[8]:
# Another big chunk of data
data1 = np.random.normal(3, 2, 10000000)
histogram.fill_n(data1)
histogram
[8]:
Histogram1D(bins=(205,), total=20000000, dtype=int64)
[9]:
histogram.plot()
[9]:
<AxesSubplot:xlabel='axis0'>
_images/interrupted-workflow_12_1.png
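The same save / restore / continue pattern can be shown with the standard library alone. The dictionary layout below is purely hypothetical and is not physt's JSON schema; it only illustrates that the accumulated state survives a round trip through a file.

```python
import json
import os
import tempfile

# Hypothetical accumulated state: bin index -> frequency
state = {"bin_width": 0.1, "counts": {"0": 3, "1": 5}}

path = os.path.join(tempfile.mkdtemp(), "histogram_sketch.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(state, f)
del state                                  # pretend the session ended

with open(path, encoding="utf-8") as f:
    restored = json.load(f)
restored["counts"]["1"] += 2               # continue filling

print(restored)
```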

Merging bins

[1]:
from physt.binnings import *
from physt import h1, h2
import numpy as np

np.random.seed(42)

%matplotlib inline
[2]:
data = np.random.rand(100)
[3]:
hh = h1(data, 120)
hh.plot(errors=True);
_images/merge-bins_3_0.png
[4]:
hh.merge_bins(2, inplace=True)
hh.plot(errors=True);
_images/merge-bins_4_0.png
[5]:
hh.merge_bins(2, inplace=True)
hh.plot(errors=True);
_images/merge-bins_5_0.png
[6]:
hh.merge_bins(2, inplace=True)
hh.plot(errors=True);
_images/merge-bins_6_0.png
[7]:
hh.merge_bins(2, inplace=True)
hh.plot(errors=True);
_images/merge-bins_7_0.png
[8]:
hh.merge_bins(2, inplace=True)
hh.plot(errors=True);
_images/merge-bins_8_0.png
[9]:
hh.merge_bins(2, inplace=True)
hh.plot(errors=True);
_images/merge-bins_9_0.png
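What merging a fixed number of neighbouring bins does to edges and frequencies can be sketched as follows. The helper merge_adjacent is hypothetical; in particular, the treatment of leftover bins (here they form a narrower last bin) is an assumption and may not match physt.

```python
def merge_adjacent(edges, frequencies, amount=2):
    """Merge every `amount` neighbouring bins; leftovers form the last bin."""
    merged_freq = [sum(frequencies[i:i + amount])
                   for i in range(0, len(frequencies), amount)]
    merged_edges = list(edges[::amount])
    if merged_edges[-1] != edges[-1]:      # bin count not divisible by amount
        merged_edges.append(edges[-1])
    return merged_edges, merged_freq


even_edges, even_freq = merge_adjacent([0, 1, 2, 3, 4], [1, 2, 3, 4])
odd_edges, odd_freq = merge_adjacent([0, 1, 2, 3, 4, 5], [1, 2, 3, 4, 5])

print(even_edges, even_freq)   # [0, 2, 4] [3, 7]
print(odd_edges, odd_freq)     # [0, 2, 4, 5] [3, 7, 5]
```

The total content is preserved; only the resolution is reduced.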

By min frequency

[10]:
data = np.random.normal(0, 1, 5000)
hh = h1(data, 120)
hh.plot();
_images/merge-bins_11_0.png
[11]:
hh.merge_bins(min_frequency=100, inplace=True)
hh.plot(density=True);
_images/merge-bins_12_0.png
[12]:
hh.merge_bins(min_frequency=600, inplace=True)
hh.plot(density=True);
_images/merge-bins_13_0.png
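One possible strategy for reaching a minimum frequency is a greedy pass from left to right. This is only a sketch of the idea under that assumption (merge_by_min_frequency is a hypothetical helper); the strategy physt actually uses may be different.

```python
def merge_by_min_frequency(edges, frequencies, min_frequency):
    """Greedily grow each bin until it holds at least min_frequency entries."""
    new_edges, new_freq = [edges[0]], []
    running = 0
    for right_edge, freq in zip(edges[1:], frequencies):
        running += freq
        if running >= min_frequency:
            new_edges.append(right_edge)
            new_freq.append(running)
            running = 0
    if new_edges[-1] != edges[-1]:         # leftover tail below the minimum
        if new_freq:
            new_freq[-1] += running        # fold it into the last bin
            new_edges[-1] = edges[-1]
        else:
            new_freq.append(running)       # everything ends up in one bin
            new_edges.append(edges[-1])
    return new_edges, new_freq


merged_edges, merged_freq = merge_by_min_frequency(
    [0, 1, 2, 3, 4, 5, 6], [1, 4, 10, 2, 3, 1], min_frequency=5)
print(merged_edges, merged_freq)   # [0, 2, 3, 6] [5, 10, 6]
```

Because the resulting bins have unequal widths, plotting with density=True (as in the cells above) keeps the shape comparable.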

The same can be done for 2D histograms (i.e. each column, each row should contain more than the minimum). Unfortunately, a general, irregular-shaped binning is not yet supported.

[13]:
# 2D example
data1 = np.random.normal(0, 1, 600)
data2 = np.random.rand(600)
[14]:
hh = h2(data1, data2, 23)
ax = hh.plot(show_zero=False, cmap="rainbow", show_colorbar=False);
ax.set_title("Before merging")
hh.merge_bins(min_frequency=30, inplace=True)
ax = hh.plot(density=True, show_zero=False, cmap="rainbow", show_colorbar=False)
ax.set_title("After merging");
_images/merge-bins_16_0.png
_images/merge-bins_16_1.png

Support for dask arrays

It is possible to operate on dask arrays and save memory (or perhaps even time).

[1]:
# Necessary imports
import dask
import dask.multiprocessing
import physt
import numpy as np

import dask.array as da
from physt import h1, h2
%matplotlib inline
[2]:
# Create two arrays
np.random.seed(42)

SIZE = 2 ** 21
CHUNK = int(SIZE / 16)

million = np.random.rand(SIZE)#.astype(int)
million2 = (3 * million + np.random.normal(0., 0.3, SIZE))#.astype(int)

# Chunk them for dask
chunked = da.from_array(million, chunks=(CHUNK))
chunked2 = da.from_array(million2, chunks=(CHUNK))

Create histograms

h1, h2, … have their alternatives in physt.compat.dask. They should work similarly, although they are not complete and unexpected errors may occur.

[3]:
from physt.compat.dask import h1 as d1
from physt.compat.dask import h2 as d2
[4]:
# Use chunks to create a 1D histogram
ha = d1(chunked2, "fixed_width", bin_width=0.2)
check_ha = h1(million2, "fixed_width", bin_width=0.2)
ok = (ha == check_ha)
print("Check: ", ok)
ha.plot()
ha
Check:  True
[4]:
Histogram1D(bins=(28,), total=2097152, dtype=int64)
_images/dask_5_2.png
[5]:
# Use chunks to create a 2D histogram
hb = d2(chunked, chunked2, "fixed_width", bin_width=.2, axis_names=["x", "y"])
check_hb = h2(million, million2, "fixed_width", bin_width=.2, axis_names=["x", "y"])
hb.plot(show_zero=False, cmap="rainbow")
ok = (hb == check_hb)
print("Check: ", ok)
hb
Check:  True
[5]:
Histogram2D(bins=(5, 28), total=2097152, dtype=int64)
_images/dask_6_2.png
[6]:
# And another cross-check
hh = hb.projection("y")
hh.plot()
print("Check: ", np.array_equal(hh.frequencies, ha.frequencies))   # Just frequencies
Check:  True
_images/dask_7_1.png
[8]:
# Use dask for normal arrays (will automatically split array to chunks)
d1(million2, "fixed_width", bin_width=0.2) == ha
[8]:
True
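Why chunked processing gives the same answer can be illustrated without dask: with a binning that does not depend on the data seen so far (fixed-width bins addressed by index), the histogram of the whole array is the sum of the histograms of its chunks. The helpers below are hypothetical and use only the standard library.

```python
import math
import random


def fixed_width_counts(values, bin_width):
    """Map bin index -> number of values falling into that bin."""
    counts = {}
    for v in values:
        k = math.floor(v / bin_width)
        counts[k] = counts.get(k, 0) + 1
    return counts


def add_counts(a, b):
    """Sum two index -> frequency mappings."""
    out = dict(a)
    for k, n in b.items():
        out[k] = out.get(k, 0) + n
    return out


random.seed(42)
values = [random.gauss(1.5, 0.9) for _ in range(8000)]
chunks = [values[i:i + 1000] for i in range(0, len(values), 1000)]

total = {}
for chunk in chunks:                       # each chunk could be processed separately
    total = add_counts(total, fixed_width_counts(chunk, 0.2))

print("Check: ", total == fixed_width_counts(values, 0.2))
```

Only one chunk and the (small) running total have to be in memory at a time, which is where the gain for larger-than-memory data comes from.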

Some timings

Your results may vary substantially. These numbers are just for illustration, measured on a 4-core (8-thread) machine. The real gain comes when we have data that don’t fit into memory.

Efficiency
[9]:
# Standard
%time h1(million2, "fixed_width", bin_width=0.2)
Wall time: 361 ms
[9]:
Histogram1D(bins=(28,), total=2097152, dtype=int64)
[10]:
# Same array, but using dask
%time d1(million2, "fixed_width", bin_width=0.2)
Wall time: 116 ms
[10]:
Histogram1D(bins=(28,), total=2097152, dtype=int64)
[11]:
# Most efficient: dask with already chunked data
%time d1(chunked2, "fixed_width", bin_width=0.2)
Wall time: 91.8 ms
[11]:
Histogram1D(bins=(28,), total=2097152, dtype=int64)
Different scheduling
[12]:
%time d1(chunked2, "fixed_width", bin_width=0.2)
Wall time: 76 ms
[12]:
Histogram1D(bins=(28,), total=2097152, dtype=int64)
[13]:
%%time
# Hyper-threading or not?
graph, name = d1(chunked2, "fixed_width", bin_width=0.2, compute=False)
dask.threaded.get(graph, name, num_workers=4)
Wall time: 114 ms
[13]:
Histogram1D(bins=(28,), total=2097152, dtype=int64)
[14]:
# Multiprocessing not so efficient for small arrays?
%time d1(chunked2, "fixed_width", bin_width=0.2, dask_method=dask.multiprocessing.get)
Wall time: 960 ms
[14]:
Histogram1D(bins=(28,), total=2097152, dtype=int64)

Special histograms in physt

Sometimes, it is necessary to bin values in transformed coordinates (e.g. polar). In principle, it is possible to create histograms from already transformed values (i.e. r and φ). However, this is not always the best way to go, as each set of coordinates has its own peculiarities (e.g. the typical range of values for the azimuthal angle).

Physt provides a general framework for constructing the transformed histograms (see a dedicated section of this document) and a couple of the most frequently used variants:

  • PolarHistogram
  • SphericalHistogram
  • CylindricalHistogram
[1]:
# Necessary import evil
%matplotlib inline

from physt import cylindrical, polar, spherical
import numpy as np
import matplotlib.pyplot as plt
[2]:
# Generate some points in the Cartesian coordinates
np.random.seed(42)

x = np.random.rand(1000)
y = np.random.rand(1000)
z = np.random.rand(1000)

Polar histogram

This histogram maps values to radius (r) and azimuthal angle (φ, ranging from 0 to 2π).

By default (unless you specify the phi_bins parameter), the whole azimuthal range is spanned (even if there are no values that fall in parts of the circle).

[3]:
# Create a polar histogram with default parameters
hist = polar(x, y)
ax = hist.plot.polar_map()
hist
[3]:
PolarHistogram(bins=(10, 16), total=1000, dtype=int64)
_images/special_histograms_4_1.png
[4]:
hist.bins
[4]:
[array([[0.02704268, 0.16306851],
        [0.16306851, 0.29909433],
        [0.29909433, 0.43512015],
        [0.43512015, 0.57114597],
        [0.57114597, 0.7071718 ],
        [0.7071718 , 0.84319762],
        [0.84319762, 0.97922344],
        [0.97922344, 1.11524926],
        [1.11524926, 1.25127509],
        [1.25127509, 1.38730091]]),
 array([[0.        , 0.39269908],
        [0.39269908, 0.78539816],
        [0.78539816, 1.17809725],
        [1.17809725, 1.57079633],
        [1.57079633, 1.96349541],
        [1.96349541, 2.35619449],
        [2.35619449, 2.74889357],
        [2.74889357, 3.14159265],
        [3.14159265, 3.53429174],
        [3.53429174, 3.92699082],
        [3.92699082, 4.3196899 ],
        [4.3196899 , 4.71238898],
        [4.71238898, 5.10508806],
        [5.10508806, 5.49778714],
        [5.49778714, 5.89048623],
        [5.89048623, 6.28318531]])]
[5]:
# Create a polar histogram with different binning
hist2 = polar(x+.3, y+.3, radial_bins="human", phi_bins="human")
ax = hist2.plot.polar_map(density=True)
_images/special_histograms_6_0.png
[6]:
# Default axes names
hist.axis_names
[6]:
('r', 'phi')

When working with any transformed histogram, you can fill values in either the original or the transformed coordinates. All methods working with coordinates understand the parameter transformed which, if True, says that the method parameters are already in the transformed coordinates; otherwise, all values are considered to be in the original coordinates and are transformed on inserting (creating, searching).

[7]:
# Using transformed / untransformed values
print("Non-transformed", hist.find_bin((0.1, 1)))
print("Transformed", hist.find_bin((0.1, 1), transformed=True))

print("Non-transformed", hist.find_bin((0.1, 2.7)))     # Value
print("Transformed", hist.find_bin((0.1, 2.7), transformed=True))
Non-transformed (7, 3)
Transformed (0, 2)
Non-transformed None
Transformed (0, 6)
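The four results above can be reproduced by hand. The sketch below is not the physt code: it hard-codes the outer radial edges and the 16 azimuthal bins printed by hist.bins earlier, assumes equally wide bins, and uses the hypothetical helpers to_polar, uniform_bin and find_bin_sketch.

```python
import math

# Outer edges copied from the hist.bins output shown earlier
R_MIN, R_MAX, R_BINS = 0.02704268, 1.38730091, 10
PHI_BINS = 16


def to_polar(x, y):
    """Cartesian (x, y) -> (r, phi) with phi in [0, 2*pi)."""
    return math.hypot(x, y), math.atan2(y, x) % (2 * math.pi)


def uniform_bin(value, low, high, bin_count):
    """Index of the equal-width bin containing value, or None if outside."""
    if not low <= value < high:
        return None
    return int((value - low) / (high - low) * bin_count)


def find_bin_sketch(point, transformed=False):
    r, phi = point if transformed else to_polar(*point)
    i = uniform_bin(r, R_MIN, R_MAX, R_BINS)
    j = uniform_bin(phi, 0.0, 2 * math.pi, PHI_BINS)
    return None if i is None or j is None else (i, j)


print("Non-transformed", find_bin_sketch((0.1, 1)))                 # (7, 3)
print("Transformed", find_bin_sketch((0.1, 1), transformed=True))   # (0, 2)
print("Non-transformed", find_bin_sketch((0.1, 2.7)))               # None
print("Transformed", find_bin_sketch((0.1, 2.7), transformed=True)) # (0, 6)
```

The point (0.1, 2.7) has a radius of about 2.70 when read as Cartesian coordinates, which lies outside the radial range, hence None.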
[8]:
# Simple plotting, similar to Histogram2D
hist.plot.polar_map(density=True, show_zero=False, cmap="Wistia", lw=0.5, figsize=(5, 5));
_images/special_histograms_10_0.png
Adding new values
[9]:
# Add a single, untransformed value
hist.fill((-.5, -.5), weight=12)
hist.plot.polar_map(density=True, show_zero=True, cmap="Reds", lw=0.5, figsize=(5, 5));
_images/special_histograms_12_0.png
[10]:
# Add a couple of values, transformed
data = [[.5, 3.05], [.5, 3.2], [.7, 3.3]]
weights = [1, 5, 20]

hist.fill_n(data, weights=weights, transformed=True)
hist.plot.polar_map(density=True, show_zero=True, cmap="Reds", lw=0.5, figsize=(5, 5));
_images/special_histograms_13_0.png
Projections

The projections are stored using specialized Histogram1D subclasses that keep (in the case of radial) information about the proper bin sizes.

[11]:
radial = hist.projection("r")
radial.plot(density=True, color="red", alpha=0.5).set_title("Density")
radial.plot(label="absolute", color="blue", alpha=0.5).set_title("Absolute")
radial.plot(label="cumulative", cumulative=True, density=True, color="green", alpha=0.5).set_title("Cumulative")
radial
[11]:
RadialHistogram(bins=(10,), total=1026, dtype=int64)
_images/special_histograms_15_1.png
_images/special_histograms_15_2.png
_images/special_histograms_15_3.png
[12]:
hist.projection("phi").plot(cmap="rainbow")
[12]:
<AxesSubplot:xlabel='phi'>
_images/special_histograms_16_1.png
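A plausible reading of "proper bin sizes" for the radial projection is that a radial bin covers a ring, so its size is the ring area π(r₂² − r₁²) rather than the width r₂ − r₁. The sketch below illustrates the consequence for densities under that assumption; annulus_areas is a hypothetical helper.

```python
import math


def annulus_areas(radial_edges):
    """Area of the ring between each pair of neighbouring radial edges."""
    return [math.pi * (outer ** 2 - inner ** 2)
            for inner, outer in zip(radial_edges, radial_edges[1:])]


radial_edges = [0.0, 1.0, 2.0, 3.0]
frequencies = [10, 30, 50]                 # grows with radius

areas = annulus_areas(radial_edges)        # pi, 3*pi, 5*pi
densities = [f / a for f, a in zip(frequencies, areas)]
print(densities)
```

Although the raw frequencies increase outwards, the density per unit area is the same in all three rings, which is what a uniform distribution of points in the plane should look like.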

Cylindrical histogram

To be implemented

[13]:
data = np.random.rand(100, 3)
h = cylindrical(data)
h
[13]:
CylindricalHistogram(bins=(10, 16, 10), total=100, dtype=int64)
[14]:
# %matplotlib qt
proj = h.projection("rho", "phi")
proj.plot.polar_map()
proj
[14]:
PolarHistogram(bins=(10, 16), total=100, dtype=int64)
_images/special_histograms_19_1.png
[15]:
proj = h.projection("phi", "z")
ax = proj.plot.cylinder_map(show_zero=False)
ax.view_init(50, 70)
proj
[15]:
CylindricalSurfaceHistogram(bins=(16, 10), total=100, dtype=int64)
_images/special_histograms_20_1.png

Spherical histogram

To be implemented

[16]:
n = 1000000
data = np.empty((n, 3))
data[:,0] = np.random.normal(0, 1, n)
data[:,1] = np.random.normal(0, 1.3, n)
data[:,2] = np.random.normal(1, 1.2, n)
h = spherical(data)
h
[16]:
SphericalHistogram(bins=(10, 16, 16), total=1000000, dtype=int64)
[17]:
globe = h.projection("theta", "phi")
# globe.plot()
globe.plot.globe_map(density=True, figsize=(7, 7), cmap="rainbow")
globe.plot.globe_map(density=False, figsize=(7, 7))
globe
[17]:
SphericalSurfaceHistogram(bins=(16, 16), total=1000000, dtype=int64)
_images/special_histograms_23_1.png
_images/special_histograms_23_2.png

Implementing custom transformed histogram

TO BE WRITTEN

ASCII plotting

Note: For this notebook to work properly, you need to install the asciiplotlib and xtermcolor packages.

[1]:
from physt import examples
from physt import plotting
plotting.set_default_backend("ascii")

import numpy as np
np.random.seed(42)
[2]:
examples.normal_h1().plot()
-3.92e+00 - -3.14e+00  [  10]  ▏
-3.14e+00 - -2.35e+00  [  88]  █▎
-2.35e+00 - -1.57e+00  [ 485]  ██████▉
-1.57e+00 - -7.83e-01  [1605]  ██████████████████████▋
-7.83e-01 - +1.92e-03  [2831]  ███████████████████████████████████████▉
+1.92e-03 - +7.87e-01  [2844]  ████████████████████████████████████████
+7.87e-01 - +1.57e+00  [1543]  █████████████████████▊
+1.57e+00 - +2.36e+00  [ 498]  ███████
+2.36e+00 - +3.14e+00  [  88]  █▎
+3.14e+00 - +3.93e+00  [   8]  ▏
[3]:
plotting.ascii.ENABLE_ASCIIPLOTLIB = False
examples.normal_h1().plot(show_values=True)
 13
# 143
##### 680
################# 2160
########################## 3223
#################### 2482
######## 1059
## 213
 25
 2
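The principle of such a text rendering is simple: scale every frequency to a number of characters. The following sketch is not the code of physt.plotting.ascii (hbar_lines is a hypothetical helper and its rounding is an assumption), so its bar lengths need not match the output above.

```python
def hbar_lines(frequencies, width=40, show_values=False):
    """One line of '#' per bin; the largest bin is `width` characters long."""
    top = max(frequencies)
    lines = []
    for f in frequencies:
        bar = "#" * round(f / top * width) if top else ""
        lines.append(f"{bar} {f}" if show_values else bar)
    return lines


lines = hbar_lines([2, 5, 10], width=20, show_values=True)
print("\n".join(lines))
```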
[4]:
examples.normal_h2().plot(cmap='Greys_r')
      3.69 →
+----------+
||3.73 ↑
||
||
||
||
||
||
||
||
||-4.47 ↓
+----------+
← -3.66

↓ 0

       843 ↑

Geospatial histogram visualization using folium

Note: You need to have the folium package installed to run this notebook.

“Bagging” the munros into rectangular bins

A Munro is a mountain in Scotland with a height over 3,000 feet (914 m). Munros are named after Sir Hugh Munro, 4th Baronet (1856–1919), who produced the first list of such hills, known as Munro’s Tables, in 1891… says Wikipedia; see https://en.wikipedia.org/wiki/Munro for more.

Let’s show how histograms can be plotted on maps with the help of the folium library.

[1]:
# Necessary import evil
import pandas as pd
import numpy as np
import physt
import physt.plotting
physt.plotting.set_default_backend("folium")
[2]:
# Read the data
import pandas as pd
munros = pd.read_csv("../physt/examples/munros.csv")
munros.head()
[2]:
name height long lat
0 Ben Nevis 1344.0 -5.003526 56.796834
1 Ben Macdui [Beinn Macduibh] 1309.0 -3.669100 57.070386
2 Braeriach 1296.0 -3.728581 57.078177
3 Cairn Toul 1291.0 -3.710790 57.054415
4 Sgor an Lochain Uaine 1258.0 -3.725797 57.058378
[3]:
# How many of them are there? Wikipedia says 282 (as of 2017)
munros.shape
[3]:
(282, 4)
How many munros are in each 10’ rectangle?
[4]:
hist = physt.h2(munros["lat"], munros["long"], "fixed_width", bin_width=1 / 6)
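The bin width of 1 / 6 is simply 10 arc minutes expressed in degrees. As a sanity check, the cell containing Ben Nevis can be computed by hand; the sketch assumes that the cells are aligned to multiples of the bin width, and cell_index is a hypothetical helper.

```python
import math

ARC_MINUTES = 10
bin_width = ARC_MINUTES / 60               # 10' in degrees, i.e. 1/6


def cell_index(lat, long, width=bin_width):
    """Index of the width x width degree cell containing the point."""
    return math.floor(lat / width), math.floor(long / width)


ben_nevis = cell_index(56.796834, -5.003526)
print(ben_nevis)                           # (340, -31)
# South-west corner of that cell in degrees
print(ben_nevis[0] * bin_width, ben_nevis[1] * bin_width)
```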
[5]:
map = hist.plot()
map
[6]:
# Now, let's combine this information with positions of the 20 tallest
import folium
map = hist.plot()
for i, row in munros.iloc[:20].iterrows():
    marker = folium.Marker([row["lat"], row["long"]], popup="{0} ({1} m)".format(row["name"], row["height"]))
    marker.add_to(map)
map

Vega backend examples

Note: for this notebook to work, you need to have the vega3 library installed.

pip install vega3
[1]:
from physt.examples import normal_h1, normal_h2, normal_h3, munros
from physt.plotting import set_default_backend

import numpy as np
np.random.seed(42)

set_default_backend("vega")
[2]:
H = normal_h1()
H.plot.scatter()
_images/vega-examples_2_2.png
[3]:
H.plot.bar(cumulative=True, xlabel="Other label")
_images/vega-examples_3_2.png
[4]:
H = normal_h1()
H.plot.line(cumulative=True)
_images/vega-examples_4_2.png
[5]:
H2 = munros().T
H2.plot(cmap="YellowGreen", show_values=True, height=333, width=333, value_format=".:;oO##############".__getitem__)
_images/vega-examples_5_2.png

Example of an interactive 3D histogram

Note: Unfortunately, this example does not render properly in either the GitHub renderer or the notebook viewer. A live notebook must be running.

[6]:
H3 = normal_h3()
H3.axis_names = ("first", "second", "third")
H3.plot(show_values=True, show_zero=False, cmap="Blues", density=True, show_colorbar=False, value_format=".1f")
_images/vega-examples_7_2.png

Plotting with plotly backend

[1]:
# Basic imports
from physt.examples import normal_h2, normal_h1
from physt.plotting import plotly
import physt.plotting

import numpy as np
np.random.seed(42)

# Set that we want plotly
physt.plotting.set_default_backend("plotly")
[2]:
# Define the 1-D example
H = normal_h1()

The default plot is bar.

[3]:
H.plot()  # Same as H.plot.bar()
[4]:
H.plot.line()
[5]:
H.plot.scatter()

Plotly Figure object

If you want to further manipulate the figures, you can return them from the function as-is using the raw keyword.

[6]:
figure = H.plot.scatter(raw=True)
type(figure)
[6]:
plotly.graph_objs._figure.Figure

2D histograms

[7]:
H2 = normal_h2()
[8]:
# Default is heatmap
H2.plot()

Collections

[9]:
from physt import collection

collection = collection({
    "small": np.random.normal(160, 20, 600),
    "tall": np.random.normal(180, 20, 1000),
    "huge": np.random.normal(200, 20, 400),
    "gigantic": np.random.normal(220, 20, 200)
}, "human")
[10]:
collection.plot.line()
[11]:
# Let's see normalized histograms in the collection
collection.normalize_bins().plot(barmode="overlay", alpha=0.3)
[12]:
# ...and what they look like when stacked
collection.normalize_bins().plot(barmode="stack")

API

API Reference

physt package

Subpackages
physt.compat package
Submodules
physt.compat.dask module
physt.compat.geant4 module

Support for Geant4 histograms saved in CSV format.

See https://geant4.web.cern.ch/ for the project pages.

physt.compat.geant4.load_csv(path)

Loads a histogram as output from Geant4 analysis tools in CSV format.

Parameters:path (str) – Path to the CSV file
Returns:
Return type:physt.histogram1d.Histogram1D or physt.histogram_nd.Histogram2D
Module contents

Histogram types and functions for various external libraries.

physt.examples package
Module contents

A set of examples used for demonstrating the physt capabilities / in tests.

physt.examples.fist() → physt.histogram1d.Histogram1D

A simple histogram in the shape of a fist.

physt.examples.normal_h1(size: int = 10000, mean: float = 0, sigma: float = 1) → physt.histogram1d.Histogram1D

A simple 1D histogram with normal distribution.

Parameters:
  • size (Number of points) –
  • mean (Mean of the distribution) –
  • sigma (Sigma of the distribution) –
physt.examples.normal_h2(size: int = 10000) → physt.histogram_nd.Histogram2D

A simple 2D histogram with normal distribution.

Parameters:size (Number of points) –
physt.examples.normal_h3(size: int = 10000) → physt.histogram_nd.HistogramND

A simple 3D histogram with normal distribution.

Parameters:size (Number of points) –
physt.io package
Submodules
physt.io.json module

JSON I/O

physt.io.json.load_json(path: str, encoding: str = 'utf-8') → Union[physt.histogram_base.HistogramBase, physt.histogram_collection.HistogramCollection]

Load histogram from a JSON file.

physt.io.json.parse_json(text: str) → Union[physt.histogram_base.HistogramBase, physt.histogram_collection.HistogramCollection]

Create histogram from a JSON string.

physt.io.json.save_json(histogram: Union[physt.histogram_base.HistogramBase, physt.histogram_collection.HistogramCollection], path: Optional[str] = None, **kwargs) → str

Save histogram to JSON format.

Parameters:
  • histogram (Any histogram) –
  • path (If set, also writes to the path.) –
Returns:

json

Return type:

The JSON representation of the histogram

physt.io.root module
physt.io.util module
physt.io.util.create_from_dict(data: dict, format_name: str, check_version: bool = True) → Union[physt.histogram_base.HistogramBase, physt.histogram_collection.HistogramCollection]

Once a dict has been created from the source data, turn it into a histogram.

Parameters:data (Parsed JSON-like tree.) –
Returns:histogram
Return type:A histogram (of any dimensionality)
physt.io.version module
exception physt.io.version.VersionError

Bases: Exception

physt.io.version.require_compatible_version(compatible_version, word='File')

Check that the compatible version of the input data is not too new.

Module contents

Input and output for histograms.

JSON format is included by default. Other formats are/will be available as modules.

Note: When implementing, try to work with a JSON-like
tree and reuse create_from_dict and HistogramBase.to_dict.
physt.plotting package
Submodules
physt.plotting.ascii module

ASCII plots (experimental).

The plots are printed directly to standard output.

physt.plotting.ascii.hbar(h1, width=80, show_values=False)
physt.plotting.common module

Functions that are shared by several (all) plotting backends.

class physt.plotting.common.TimeTickHandler(level: str = None)

Bases: object

Callable that creates ticks and labels corresponding to “sane” time values.

Note: This class is very experimental and subject to change or disappear.

LEVELS = {'day': 86400, 'hour': 3600, 'min': 60, 'sec': 1}
LevelType = typing.Tuple[str, typing.Union[float, int]]
classmethod deduce_level(h1: physt.histogram1d.Histogram1D, min_: float, max_: float) → Tuple[str, Union[float, int]]
format_time_ticks(ticks: List[float], level: Tuple[str, Union[float, int]]) → List[str]
get_time_ticks(h1: physt.histogram1d.Histogram1D, level: Tuple[str, Union[float, int]], min_: float, max_: float) → List[float]
classmethod parse_level(value: Union[Tuple[str, Union[float, int]], float, str, datetime.timedelta]) → Tuple[str, Union[float, int]]
classmethod split_hms(value) → Tuple[bool, int, int, Union[int, float]]
physt.plotting.common.check_ndim(ndim: Union[int, Tuple[int, ...]])

Decorator checking proper histogram dimension.

physt.plotting.common.get_data(histogram: physt.histogram_base.HistogramBase, density: bool = False, cumulative: bool = False, flatten: bool = False) → numpy.ndarray

Get histogram data based on plotting parameters.

Parameters:
  • density (Whether to divide bin contents by bin size) –
  • cumulative (Whether to return cumulative sums instead of individual) –
  • flatten (Whether to flatten multidimensional bins) –
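The meaning of the density and cumulative flags for a 1D histogram can be sketched as follows. This is an illustration, not the library code: plot_data_sketch is a hypothetical helper, and how physt combines both flags at once is not specified here, so the example uses them one at a time.

```python
from itertools import accumulate


def plot_data_sketch(frequencies, edges, density=False, cumulative=False):
    """Return bin contents, optionally divided by width or accumulated."""
    data = list(frequencies)
    if density:
        widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
        data = [f / w for f, w in zip(data, widths)]
    if cumulative:
        data = list(accumulate(data))
    return data


frequencies = [2, 6, 4]
edges = [0, 1, 3, 4]                       # the middle bin is twice as wide

print(plot_data_sketch(frequencies, edges, density=True))      # [2.0, 3.0, 4.0]
print(plot_data_sketch(frequencies, edges, cumulative=True))   # [2, 8, 12]
```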
physt.plotting.common.get_err_data(histogram: physt.histogram_base.HistogramBase, density: bool = False, cumulative: bool = False, flatten: bool = False) → numpy.ndarray

Get histogram error data based on plotting parameters.

Parameters:
  • density (Whether to divide bin contents by bin size) –
  • cumulative (Whether to return cumulative sums instead of individual) –
  • flatten (Whether to flatten multidimensional bins) –
physt.plotting.common.get_value_format(value_format: Union[Callable[[float], str], str, None]) → Callable[[float], str]

Create a formatting function from a generic value_format argument.
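Judging only from the signature, the argument may be a callable, a string or None. A minimal sketch of such a dispatch could look like the following; the handling of each case (format specification for strings, str for None) is an assumption, and make_value_formatter is a hypothetical name, not the physt function.

```python
def make_value_formatter(value_format=None):
    """Turn a callable, a format specification or None into a formatter."""
    if value_format is None:
        return str                                       # assumed default
    if isinstance(value_format, str):
        return lambda value: format(value, value_format)  # e.g. ".1f"
    if callable(value_format):
        return value_format
    raise TypeError("value_format must be a callable, a string or None")


print(make_value_formatter(".1f")(3.14159))   # 3.1
print(make_value_formatter(int)(2.7))         # 2
print(make_value_formatter()(42))             # 42
```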

physt.plotting.common.pop_kwargs_with_prefix(prefix: str, kwargs: dict) → dict

Pop all items from a dictionary that have keys beginning with a prefix.

Parameters:
  • prefix (str) –
  • kwargs (dict) –
Returns:

kwargs – Items popped from the original dictionary, with prefix removed.

Return type:

dict
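The behaviour described above can be sketched in two lines. The helper pop_with_prefix is a hypothetical re-implementation written from the description only, not the library source.

```python
def pop_with_prefix(prefix, kwargs):
    """Remove all prefixed items from kwargs and return them without prefix."""
    keys = [key for key in kwargs if key.startswith(prefix)]
    return {key[len(prefix):]: kwargs.pop(key) for key in keys}


options = {"text_color": "red", "text_alpha": 0.5, "lw": 2}
text_options = pop_with_prefix("text_", options)

print(text_options)   # {'color': 'red', 'alpha': 0.5}
print(options)        # {'lw': 2}
```

This is the pattern that lets prefixed options such as text_color or text_alpha be separated from the remaining plotting arguments.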

physt.plotting.folium module
physt.plotting.matplotlib module
physt.plotting.plotly module
physt.plotting.vega module

Vega3 backend for plotting in physt.

The JSON can be produced without any external dependency, the ability to show plots in-line in IPython requires ‘vega3’ library.

Implementation note: Values passed to JSON cannot be of type np.int64 (solution: explicit cast to float)

Common parameters

See the enable_inline_view wrapper.

physt.plotting.vega.bar(h1: physt.histogram1d.Histogram1D, **kwargs) → dict

Bar plot of 1D histogram.

Parameters:
  • lw (float) – Width of the line between bars
  • alpha (float) – Opacity of the bars
  • hover_alpha (float) – Opacity of the bars when hover on
physt.plotting.vega.display_vega(vega_data: dict, display: bool = True) → Union[Vega, dict]

Optionally display vega dictionary.

Parameters:
  • vega_data (Valid vega data as dictionary) –
  • display (Whether to try in-line display in IPython) –
physt.plotting.vega.enable_inline_view(f)

Decorator to enable in-line viewing in Python and saving to an external file.

It adds several parameters to each decorated plotting function:

Parameters:
  • write_to (str (optional)) – Path to write vega JSON/HTML to.
  • write_format ("auto" | "json" | "html") – Whether to create a JSON data file or a full-fledged HTML page.
  • display ("auto" | True | False) – Whether to try in-line display in IPython
  • indent (int) – Indentation of JSON
physt.plotting.vega.line(h1: physt.histogram1d.Histogram1D, **kwargs) → dict

Line plot of 1D histogram values.

Points are horizontally placed in bin centers.

Parameters:h1 (physt.histogram1d.Histogram1D) – Dimensionality of histogram for which it is applicable
physt.plotting.vega.map(h2: physt.histogram_nd.Histogram2D, *, show_zero: bool = True, show_values: bool = False, **kwargs) → dict

Heat-map of two-dimensional histogram.

physt.plotting.vega.map_with_slider(h3: physt.histogram_nd.HistogramND, *, show_zero: bool = True, show_values: bool = False, **kwargs) → dict

Heatmap showing slice in first two dimensions, third dimension represented as a slider.

physt.plotting.vega.scatter(h1: physt.histogram1d.Histogram1D, **kwargs) → dict

Scatter plot of 1D histogram values.

Points are horizontally placed in bin centers.

Parameters:shape (str) –
physt.plotting.vega.write_vega(vega_data, *, title: Optional[str], write_to: str, write_format: str = 'auto', indent: int = 2)

Write vega dictionary to an external file.

Parameters:
  • vega_data (Valid vega data as dictionary) –
  • write_to (Path to write vega JSON/HTML to.) –
  • write_format ("auto" | "json" | "html") – Whether to create a JSON data file or a full-fledged HTML page.
  • indent (Indentation of JSON) –
Module contents

Plotting for physt histograms.

Available backends
  • matplotlib
  • vega
  • plotly (simple wrapper around matplotlib for 1D histograms)
  • folium (just for the geographical histograms)

Calling the plotting functions

Common parameters

There are several backends (and user-defined may be added) and several plotting functions for each - we try to keep a consistent set of parameters to which all implementations should try to stick (with exceptions).

All histograms
write_to : str (optional)
Path to file where the output will be stored
title : str (optional)
String to be displayed as plot title (defaults to h.title)
xlabel : str (optional)
String to be displayed as x-axis label (defaults to the corresponding axis name)
ylabel : str (optional)
String to be displayed as y-axis label (defaults to the corresponding axis name)
xscale : str (optional)
If “log”, x axis will be scaled logarithmically
yscale : str (optional)
If “log”, y axis will be scaled logarithmically

xlim : tuple | “auto” | “keep”

ylim : tuple | “auto” | “keep”

invert_y : bool
If True, the y axis points downwards
ticks : {“center”, “edge”}, optional
If set, each bin will have a tick (either central or edge)
alpha : float (optional)
The alpha of the whole plot (default: 1)
cmap : str or list
Name of the palette or list of colors or something that the respective backend can interpret as colourmap.

cmap_normalize : {“log”}, optional

cmap_min :

cmap_max :

show_values : bool
If True, show values next to (or inside) the bins
value_format : str or Callable
How bin values (if to be displayed) are rendered.

zorder : int (optional)

text_color, text_alpha, text_* :
Other options that are passed to the formatting of values (with the text_ prefix stripped)
1D histograms
cumulative : bool
If True, show CDF instead of bin heights
density : bool
If True, show bin contents divided by bin width instead of the raw bin contents
errors : bool
Whether to show error bars (if available)
show_stats : bool
If True, display a small box with statistical info
2D heatmaps
show_zero : bool
Whether to show bins that have 0 frequency
grid_color :
Colour of line between bins
show_colorbar : bool
Whether to display a colorbar next to the plot itself
Line plots
lw (or linewidth) : int
Width of the lines
class physt.plotting.PlottingProxy(h: Union[physt.histogram_base.HistogramBase, physt.histogram_collection.HistogramCollection])

Bases: object

Proxy that enables calling plotting methods on histogram objects.

It can be used either as a method or as an object containing methods. In both cases, it only forwards the call to the universal plot() function.

The __dir__ method should offer all plotting methods supported by the currently selected backend.

Example

plotter = histogram.plot
plotter(…)       # Plots using defaults
plotter.bar(…)   # Plots as a specified plot type (“bar”)

Note

Inspired by the way pandas deals with this.
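
The forwarding idea can be sketched in plain Python. Everything below (ProxySketch, PLOT_KINDS and the stand-in plot function) is hypothetical and only mimics the behaviour described above; it is not physt's implementation.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry of plot kinds; real backends would register functions here.
PLOT_KINDS: Dict[str, Callable[..., str]] = {
    "bar": lambda h, **kwargs: f"bar plot of {h}",
    "line": lambda h, **kwargs: f"line plot of {h}",
}


def plot(h: Any, kind: str = "bar", **kwargs: Any) -> str:
    """Stand-in for a universal plotting function."""
    return PLOT_KINDS[kind](h, **kwargs)


class ProxySketch:
    """Callable object that also exposes one method per plot kind."""

    def __init__(self, h: Any) -> None:
        self.h = h

    def __call__(self, kind: str = "bar", **kwargs: Any) -> str:
        # Used as a method: proxy(...)
        return plot(self.h, kind=kind, **kwargs)

    def __getattr__(self, name: str) -> Callable[..., str]:
        # Used as an object containing methods: proxy.bar(...)
        if name not in PLOT_KINDS:
            raise AttributeError(name)
        return lambda **kwargs: plot(self.h, kind=name, **kwargs)

    def __dir__(self) -> List[str]:
        # Advertise the available plot kinds (e.g. for tab completion)
        return sorted(set(super().__dir__()) | set(PLOT_KINDS))


proxy = ProxySketch("H")
print(proxy())       # bar plot of H
print(proxy.line())  # line plot of H
```

Both call styles end up in the same plot() function, which is the point of the proxy.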

physt.plotting.get_default_backend() → Optional[str]

The backend that will be used by default with the plot function.

physt.plotting.plot(histogram: Union[physt.histogram_base.HistogramBase, physt.histogram_collection.HistogramCollection], kind: Optional[str] = None, backend: Optional[str] = None, **kwargs)

Universal plotting function.

All keyword arguments are passed to the plotting methods.

Parameters:kind (Type of the plot (like "scatter", "line", ..), similar to pandas) –
physt.plotting.set_default_backend(name: str) → None

Choose a default backend.

Submodules
physt.bin_utils module

Methods for investigation and manipulation of bin arrays.

physt.bin_utils.find_human_width(raw_width: float, kind: Optional[str] = None) → float
physt.bin_utils.find_human_width_24(raw_width: float) → int
physt.bin_utils.find_human_width_60(raw_width: float) → int
physt.bin_utils.find_human_width_decimal(raw_width: float) → float
physt.bin_utils.is_bin_subset(sub: Union[numpy.ndarray, Iterable[T_co], int, float], sup: Union[numpy.ndarray, Iterable[T_co], int, float]) → bool

Check whether all bins in one binning are also present in another.

Parameters:
  • sub (array_like) – Candidate for the bin subset
  • sup (array_like) – Candidate for the bin superset
physt.bin_utils.is_bin_superset(sup: Union[numpy.ndarray, Iterable[T_co], int, float], sub: Union[numpy.ndarray, Iterable[T_co], int, float]) → bool

Inverse of is_bin_subset

physt.bin_utils.is_consecutive(bins: Union[numpy.ndarray, Iterable[T_co], int, float], rtol: float = 1e-05, atol: float = 1e-08) → bool

Check whether the bins are consecutive (edges match).

Does not check if the bins are in rising order.

physt.bin_utils.is_rising(bins: Union[numpy.ndarray, Iterable[T_co], int, float]) → bool

Check whether the bins are in rising order.

Does not check if the bins are consecutive.

Parameters:bins (array_like) –
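
The difference between "consecutive" and "rising" can be illustrated with a small plain-Python sketch. The two helper functions are hypothetical and encode one plausible reading of the descriptions above (edges match vs. bins ordered and non-overlapping); physt's own checks may differ in detail, for example in how tolerances are applied.

```python
import math
from typing import Sequence, Tuple

Bin = Tuple[float, float]


def consecutive_sketch(bins: Sequence[Bin], rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """True if each bin's right edge matches the next bin's left edge."""
    return all(
        math.isclose(prev[1], nxt[0], rel_tol=rtol, abs_tol=atol)
        for prev, nxt in zip(bins, bins[1:])
    )


def rising_sketch(bins: Sequence[Bin]) -> bool:
    """True if every bin has left < right and no bin starts before the previous one ends."""
    if any(left >= right for left, right in bins):
        return False
    return all(nxt[0] >= prev[1] for prev, nxt in zip(bins, bins[1:]))


with_gap = [(0, 1), (2, 3)]        # rising, but there is a gap between 1 and 2
reversed_bin = [(0, 1), (1, 0.5)]  # edges match, but the second bin runs backwards

print(consecutive_sketch(with_gap), rising_sketch(with_gap))          # False True
print(consecutive_sketch(reversed_bin), rising_sketch(reversed_bin))  # True False
```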
physt.bin_utils.make_bin_array(bins: Union[numpy.ndarray, Iterable[T_co], int, float]) → numpy.ndarray

Turn bin data into array understood by HistogramXX classes.

Parameters:bins (array_like) – Array of edges or array of edge tuples

Examples

>>> make_bin_array([0, 1, 2])
array([[0, 1],
       [1, 2]])
>>> make_bin_array([[0, 1], [2, 3]])
array([[0, 1],
       [2, 3]])
physt.bin_utils.to_numpy_bins(bins: Union[numpy.ndarray, Iterable[T_co], int, float]) → numpy.ndarray

Convert physt bin format to numpy edges.

Parameters:bins (array_like) – 1-D (n) or 2-D (n, 2) array of edges
Returns:edges – All edges
Return type:numpy.ndarray
physt.bin_utils.to_numpy_bins_with_mask(bins: Union[numpy.ndarray, Iterable[T_co], int, float]) → Tuple[numpy.ndarray, numpy.ndarray]

Numpy binning edges including gaps.

Parameters:bins (1-D (n) or 2-D (n, 2) array of edges) –
Returns:
  • edges (All edges)
  • mask (List of indices that correspond to bins that have to be included)

Examples

>>> to_numpy_bins_with_mask([0, 1, 2])
(array([0.,   1.,   2.]), array([0, 1]))
>>> to_numpy_bins_with_mask([[0, 1], [2, 3]])
(array([0, 1, 2, 3]), array([0, 2]))
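
How the mask relates to the edges can be sketched without numpy. The helper below is a hypothetical re-implementation on plain lists that reproduces the two examples above; it is not the library code.

```python
from typing import List, Sequence, Tuple


def edges_with_mask(bins: Sequence[Tuple[float, float]]) -> Tuple[List[float], List[int]]:
    """Flatten (left, right) pairs into edges and record which intervals are real bins."""
    edges: List[float] = []
    mask: List[int] = []
    for left, right in bins:
        if not edges or edges[-1] != left:
            edges.append(left)        # first bin, or a gap before this bin
        mask.append(len(edges) - 1)   # the interval starting at the last edge is a real bin
        edges.append(right)
    return edges, mask


print(edges_with_mask([(0, 1), (1, 2)]))  # ([0, 1, 2], [0, 1])
print(edges_with_mask([(0, 1), (2, 3)]))  # ([0, 1, 2, 3], [0, 2])
```

In the second case the interval between edges 1 and 2 is a gap, so its index (1) is missing from the mask.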
physt.binnings module

Different binning algorithms/schemas for the histograms.

class physt.binnings.BinningBase(bins: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, numpy_bins: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, includes_right_edge: bool = False, adaptive: bool = False)

Bases: object

Abstract base class for binning schemas.

  • define at least one of the following properties: bins, numpy_bins (cached conversion exists)
  • if you modify bins, put _bins and _numpy_bins into proper state (None may be sufficient)
  • checking of proper bins should be done in __init__
  • if you want to support adaptive histogram, override _force_bin_existence
  • implement _update_dict to contain the binning representation
  • the constructor (and facade methods) must accept any kwargs (and ignore those that are not used).
adaptive_allowed

Whether it is possible to update the bins dynamically

Type:bool
inconsecutive_allowed

Whether it is possible to have bins with gaps

Type:bool
TODO: Check the last point (does it make sense?)
adapt(other: physt.binnings.BinningBase)

Adapt this binning so that it contains all bins of another binning.

Parameters:other (BinningBase) –
adaptive_allowed = False
apply_bin_map(bin_map) → physt.binnings.BinningBase
Parameters:bin_map (Iterator(tuple)) – The bins must be in ascending order
as_fixed_width(copy: bool = True) → physt.binnings.FixedWidthBinning

Convert binning to a recipe with fixed width (if possible).

Parameters:copy (bool) – If True, ensure that we receive another object.
as_static(copy: bool = True) → physt.binnings.StaticBinning

Convert binning to a static form.

Parameters:copy (bool) – Ensure that we receive another object
Returns:A new static binning with a copy of bins.
Return type:StaticBinning
bin_count

The total number of bins.

bins

Bins in the wider format (as edge pairs)

Returns:bins – shape=(bin_count, 2)
Return type:np.ndarray
copy() → BinningType

An identical, independent copy.

first_edge

The left edge of the first bin.

force_bin_existence(values)

Change schema so that there is a bin for every given value.

It is necessary to implement the _force_bin_existence template method.

Parameters:values (np.ndarray) – All values we want bins for.
Returns:bin_map –
  • None => there was no change in bins
  • int => the bins are only shifted (allows mass assignment)
  • otherwise => the iterable contains tuples (old bin index, new bin index); a new bin index can occur multiple times, which corresponds to bin merging
Return type:Iterable[tuple] or None or int
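
A consumer of such a bin_map might update its frequencies roughly as follows. This is a minimal sketch on plain lists; the function name, the new_bin_count argument and the reading of an int as "old index + shift" are assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple, Union

BinMap = Optional[Union[int, Iterable[Tuple[int, int]]]]


def apply_bin_map_sketch(frequencies: List[float], bin_map: BinMap, new_bin_count: int) -> List[float]:
    """Redistribute frequencies according to the three kinds of bin_map."""
    if bin_map is None:
        return list(frequencies)  # no change in bins
    if isinstance(bin_map, int):
        # Assumed meaning: every old bin i becomes new bin i + bin_map
        pairs: Iterable[Tuple[int, int]] = ((i, i + bin_map) for i in range(len(frequencies)))
    else:
        pairs = bin_map
    result = [0] * new_bin_count
    for old_index, new_index in pairs:
        result[new_index] += frequencies[old_index]  # repeated new_index merges bins
    return result


print(apply_bin_map_sketch([1, 2, 3], None, 3))                                   # [1, 2, 3]
print(apply_bin_map_sketch([1, 2, 3], 2, 5))                                      # [0, 0, 1, 2, 3]
print(apply_bin_map_sketch([1, 2, 3, 4], [(0, 0), (1, 0), (2, 1), (3, 1)], 2))    # [3, 7]
```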
static from_dict(a_dict)
includes_right_edge
inconsecutive_allowed = False
is_adaptive() → bool

Whether the binning can be adapted to include values not currently spanned.

is_consecutive(rtol: float = 1e-05, atol: float = 1e-08) → bool

Whether all bins are consecutive (the right edge of each bin matches the left edge of the next one).

Parameters:rtol, atol –
is_regular(*, rtol: float = 1e-05, atol: float = 1e-08) → bool

Whether all bins have the same width.

Parameters:rtol, atol –
last_edge

The right edge of the last bin.

numpy_bins

Bins in the numpy format

This might not be available for inconsecutive binnings.

Returns:edges – shape=(bin_count+1,)
Return type:np.ndarray
numpy_bins_with_mask

Bins in the numpy format, including the gaps in inconsecutive binnings.

Returns:edges, mask
Return type:np.ndarray

See also

bin_utils.to_numpy_bins_with_mask

set_adaptive(value: bool = True) → None

Set/unset the adaptive property of the binning.

This is available only for some of the binning types.

to_dict() → Dict[str, Any]

Dictionary representation of the binning schema.

This serves as a template method; please implement _update_dict.

physt.binnings.BinningLike = typing.Union[physt.binnings.BinningBase, numpy.ndarray, typing.Iterable, int, float]

Anything that can be converted to a binning.

class physt.binnings.ExponentialBinning(log_min: float, log_width: float, bin_count: int, includes_right_edge: bool = True, adaptive: bool = False, **kwargs)

Bases: physt.binnings.BinningBase

Binning schema with exponentially distributed bins.

adaptive_allowed = False
copy() → physt.binnings.ExponentialBinning

An identical, independent copy.

is_regular(**kwargs) → bool

Whether all bins have the same width.

Parameters:rtol, atol –
numpy_bins

Bins in the numpy format

This might not be available for inconsecutive binnings.

Returns:edges – shape=(bin_count+1,)
Return type:np.ndarray
class physt.binnings.FixedWidthBinning(*, bin_width, bin_count=0, bin_times_min=None, min=None, includes_right_edge=False, adaptive=False, bin_shift=None, align=True, **kwargs)

Bases: physt.binnings.BinningBase

Binning schema with predefined bin width.

adaptive_allowed = True
as_fixed_width(copy: bool = True) → physt.binnings.FixedWidthBinning

Convert binning to a recipe with fixed width (if possible).

Parameters:copy (bool) – If True, ensure that we receive another object.
bin_count

The total number of bins.

bin_width
copy()

An identical, independent copy.

first_edge

The left edge of the first bin.

is_regular(**kwargs) → bool

Whether all bins have the same width.

Parameters:rtol, atol –
last_edge

The right edge of the last bin.

numpy_bins

Bins in the numpy format

This might not be available for inconsecutive binnings.

Returns:edges – shape=(bin_count+1,)
Return type:np.ndarray
class physt.binnings.NumpyBinning(numpy_bins: Union[numpy.ndarray, Iterable[T_co], int, float], includes_right_edge=True, **kwargs)

Bases: physt.binnings.BinningBase

Binning schema working as numpy.histogram.

copy() → physt.binnings.NumpyBinning

An identical, independent copy.

numpy_bins

Bins in the numpy format

This might not be available for inconsecutive binnings.

Returns:edges – shape=(bin_count+1,)
Return type:np.ndarray
class physt.binnings.StaticBinning(bins, includes_right_edge=True, **kwargs)

Bases: physt.binnings.BinningBase

Binning defined by an array of bin edge pairs.

as_static(copy: bool = True) → physt.binnings.StaticBinning

Convert binning to a static form.

Returns:A new static binning with a copy of bins.
Return type:StaticBinning
Parameters:copy (bool) – If False, returns itself (already satisfying conditions); if True, returns a copy
copy()

An identical, independent copy.

inconsecutive_allowed = True
physt.binnings.as_binning(obj: Union[physt.binnings.BinningBase, numpy.ndarray, Iterable[T_co], int, float], copy: bool = False) → physt.binnings.BinningBase

Ensure that an object is a binning

Parameters:
  • obj (BinningBase or array_like) – Can be a binning, numpy-like bins or full physt bins
  • copy (bool) – If True, ensure that the returned object is independent
physt.binnings.binning_methods = {'exponential': <function exponential_binning>, 'fixed_width': <function fixed_width_binning>, 'human': <function human_binning>, 'integer': <function integer_binning>, 'numpy': <function numpy_binning>, 'quantile': <function quantile_binning>, 'static': <function static_binning>}

Dictionary of available binnings.

physt.binnings.calculate_bins(array, _=None, **kwargs) → physt.binnings.BinningBase

Find optimal binning from arguments.

Parameters:
  • array (arraylike) – Data from which the bins should be decided (sometimes used, sometimes not)
  • _ (int or str or Callable or arraylike or Iterable or BinningBase) – To-be-guessed parameter that specifies what kind of binning should be done
  • check_nan (bool) – Check for the presence of nan’s in array? Default: True
  • range (tuple) – Limit values to a range. Some of the binning methods also (subsequently) use this parameter for the bin shape.
Returns:

A two-dimensional array with pairs of bin edges (not necessarily consecutive).

Return type:

BinningBase
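
The "to-be-guessed" parameter can be pictured as a dispatch on the argument's type. The sketch below only classifies the argument and is purely illustrative; the function name and the returned labels are hypothetical, and the BinningBase case accepted by calculate_bins is left out.

```python
from collections.abc import Iterable
from typing import Any


def classify_bins_argument(bins: Any) -> str:
    """Say how a bins-like argument would be interpreted (illustration only)."""
    if bins is None:
        return "default"
    if isinstance(bins, str):  # must come before the Iterable check: str is iterable
        return "named method"
    if isinstance(bins, int):
        return "bin count"
    if callable(bins):
        return "binning function"
    if isinstance(bins, Iterable):
        return "explicit bins"
    raise TypeError(f"Cannot interpret {bins!r} as a binning specification")


for candidate in (None, 10, "human", [0, 1, 2], len):
    print(repr(candidate), "->", classify_bins_argument(candidate))
```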

physt.binnings.calculate_bins_nd(array: Optional[numpy.ndarray], bins=None, dim: Optional[int] = None, check_nan=True, **kwargs) → List[physt.binnings.BinningBase]

Find optimal binning from arguments (n-dimensional variant)

Usage similar to calculate_bins.

physt.binnings.exponential_binning(data=None, bin_count: Optional[int] = None, *, range: Optional[Tuple[float, float]] = None, **kwargs) → physt.binnings.ExponentialBinning

Construct exponential binning schema.

Parameters:
  • bin_count (Number of bins) –
  • range ((min, max)) –

See also

numpy.logspace()
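
For orientation, exponentially spaced edges can be written down directly from the ExponentialBinning parameters log_min, log_width and bin_count. The sketch assumes base 10 (the default base of numpy.logspace); the helper name is hypothetical.

```python
from typing import List


def exponential_edges(log_min: float, log_width: float, bin_count: int) -> List[float]:
    """Edges whose base-10 logarithms are equally spaced (assumed base: 10)."""
    return [10.0 ** (log_min + i * log_width) for i in range(bin_count + 1)]


edges = exponential_edges(log_min=0.0, log_width=1.0, bin_count=3)
print(edges)  # four edges; each one is ten times the previous
```

Such bins have equal widths on a logarithmic axis, not on a linear one.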

physt.binnings.fixed_width_binning(data=None, bin_width: Union[float, int] = 1, *, range: Optional[Tuple[float, float]] = None, includes_right_edge: bool = False, **kwargs) → physt.binnings.FixedWidthBinning

Construct fixed-width binning schema.

Parameters:
  • bin_width (float) –
  • range (Optional[tuple]) – (min, max)
  • align (Optional[float]) – Must be multiple of bin_width
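
A rough idea of what "fixed width, aligned" means, as a plain-Python sketch. It assumes that the first edge is rounded down to a multiple of bin_width and that bins are half-open, so a maximum lying exactly on an edge gets one more bin; the helper is hypothetical and not the library's algorithm.

```python
import math
from typing import List, Sequence


def aligned_fixed_width_edges(data: Sequence[float], bin_width: float) -> List[float]:
    """Edges at multiples of bin_width that cover all of the data."""
    first = math.floor(min(data) / bin_width) * bin_width
    bin_count = math.floor((max(data) - first) / bin_width) + 1
    return [first + i * bin_width for i in range(bin_count + 1)]


print(aligned_fixed_width_edges([155, 160, 198], 10))
# [150, 160, 170, 180, 190, 200]
```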
physt.binnings.human_binning(data: Optional[numpy.ndarray] = None, bin_count: Optional[int] = None, *, kind: Optional[str] = None, range: Optional[Tuple[float, float]] = None, min_bin_width: Optional[float] = None, max_bin_width: Optional[float] = None, **kwargs) → physt.binnings.FixedWidthBinning

Construct fixed-width binning schema with bins automatically optimized to human-friendly widths.

Typical widths are: 1.0, 25.0, 0.02, 500, 2.5e-7, …

Parameters:
  • bin_count (Number of bins) –
  • kind (Optional value "time" works in h,m,s scale instead of seconds) –
  • range (Tuple of (min, max)) –
  • min_bin_width (If present, the bin cannot be narrower than this.) –
  • max_bin_width (If present, the bin cannot be wider than this.) –
physt.binnings.ideal_bin_count(data: numpy.ndarray, method: str = 'default') → int

A theoretically ideal bin count.

Parameters:
  • data (Data to work on. Most methods don't use this.) –
  • method (str) –
    Name of the method to apply, available values:
    • default (~sturges)
    • sqrt
    • sturges
    • doane
    • rice

    See https://en.wikipedia.org/wiki/Histogram for the description
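
For reference, three of the listed rules in their textbook form (as given on the Wikipedia page above). The function names are hypothetical, and physt's own rounding may differ; the doane rule is omitted because it also needs the sample skewness.

```python
import math


def sturges(n: int) -> int:
    """Sturges' formula: ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1


def sqrt_rule(n: int) -> int:
    """Square-root choice: ceil(sqrt(n))."""
    return math.ceil(math.sqrt(n))


def rice(n: int) -> int:
    """Rice rule: ceil(2 * n^(1/3))."""
    return math.ceil(2 * n ** (1 / 3))


n = 56  # size of the heights dataset from the tutorial
print(sturges(n), sqrt_rule(n), rice(n))  # 7 8 8
```

All three depend only on the number of values, which is why most methods do not look at the data itself.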

physt.binnings.integer_binning(data=None, **kwargs) → physt.binnings.FixedWidthBinning

Construct fixed-width binning schema with bins centered around integers.

Parameters:
  • range (Optional[Tuple[int]]) – min (included) and max integer (excluded) bin
  • bin_width (Optional[int]) – group “bin_width” integers into one bin (not recommended)
physt.binnings.numpy_binning(data: Optional[numpy.ndarray], bin_count: int = 10, range: Optional[Tuple[float, float]] = None, **kwargs) → physt.binnings.NumpyBinning

Construct binning schema compatible with numpy.histogram called with an integer bins argument.

Parameters:
  • data (array_like, optional) – This is optional if both bins and range are set
  • bin_count (int) –
  • range (Optional[tuple]) – (min, max)
  • includes_right_edge (Optional[bool]) – default: True

See also

numpy.histogram(), static_binning()

physt.binnings.quantile_binning(data: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, bin_count: Optional[int] = None, q: Optional[Sequence[int]] = None, qrange: Optional[Tuple[float, float]] = None, **kwargs) → physt.binnings.StaticBinning

Binning schema based on quantile ranges.

This binning finds equally spaced quantiles. This should lead to all bins having roughly the same frequencies.

Note: weights are not (yet) taken into account for calculating quantiles.
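
Why quantile edges give roughly equal frequencies can be checked with the standard library. The sketch uses statistics.quantiles with method="inclusive" (linear interpolation between data points) as a stand-in; both helper functions are hypothetical, and the last bin is assumed to include its right edge.

```python
import bisect
import statistics
from typing import List, Sequence


def quantile_edges(data: Sequence[float], bin_count: int) -> List[float]:
    """Minimum, the equally spaced inner quantiles, and maximum."""
    cuts = statistics.quantiles(data, n=bin_count, method="inclusive")
    return [min(data)] + cuts + [max(data)]


def count_in_bins(data: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values per bin; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for value in data:
        index = min(bisect.bisect_right(edges, value) - 1, len(counts) - 1)
        counts[index] += 1
    return counts


data = [x * x for x in range(1, 41)]  # 40 strongly skewed values
edges = quantile_edges(data, 4)
print(edges)                       # [1, 115.75, 420.5, 915.25, 1600]
print(count_in_bins(data, edges))  # [10, 10, 10, 10]
```

The bins get wider where the data are sparse, so each of them still holds a quarter of the values.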

Parameters:
  • bin_count (Number of bins) –
  • q (Sequence of quantiles to be used as edges (a la numpy)) –
  • qrange (Two floats as minimum and maximum quantile (default: 0.0, 1.0)) –
Returns:

Return type:

StaticBinning

physt.binnings.register_binning(f=None, *, name: Optional[str] = None)

Decorator to register among available binning methods.
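
A decorator with the signature (f=None, *, name=None) can be applied both bare and with arguments. The sketch below shows that general pattern with a local, hypothetical registry; deriving the default key by stripping a "_binning" suffix is an assumption suggested by the keys of binning_methods shown above.

```python
from typing import Callable, Dict, Optional

registry: Dict[str, Callable] = {}  # hypothetical stand-in for binning_methods


def register(f: Optional[Callable] = None, *, name: Optional[str] = None):
    """Register a function under its name (or an explicit one)."""

    def decorator(func: Callable) -> Callable:
        key = name or func.__name__
        if name is None and key.endswith("_binning"):
            key = key[: -len("_binning")]  # assumed naming convention
        registry[key] = func
        return func

    if f is not None:
        return decorator(f)  # used as @register
    return decorator         # used as @register(name="...")


@register
def halves_binning(data=None, **kwargs):
    return "two bins"


@register(name="thirds")
def make_three_bins(data=None, **kwargs):
    return "three bins"


print(sorted(registry))  # ['halves', 'thirds']
```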

physt.binnings.static_binning(data=None, bins=None, **kwargs) → physt.binnings.StaticBinning

Construct static binning with whatever bins.

physt.config module
physt.facade module
physt.facade.h1(data: Union[numpy.ndarray, Iterable[T_co], int, float, None], bins=None, *, adaptive: bool = False, dropna: bool = True, dtype: Union[type, numpy.dtype, str, None] = None, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, keep_missed: bool = True, name: Optional[str] = None, title: Optional[str] = None, axis_name: Optional[str] = None, **kwargs) → physt.histogram1d.Histogram1D

Facade function to create 1D histograms.

This proceeds in three steps:

  1. Based on the magical parameter bins, construct bins for the histogram
  2. Calculate frequencies for the bins
  3. Construct the histogram object itself

Guiding principle: parameters understood by numpy.histogram should also be understood by physt.histogram and should result in a Histogram1D object with (h.numpy_bins, h.frequencies) the same as the numpy.histogram output. Additional functionality is a bonus.

Parameters:
  • data (array_like, optional) – Container of all the values (tuple, list, np.ndarray, pd.Series)
  • bins (int or sequence of scalars or callable or str, optional) –
    • If iterable => the bins themselves
    • If int => number of bins for default binning
    • If callable => use binning method (+ args, kwargs)
    • If string => use named binning method (+ args, kwargs)
  • weights (array_like, optional) – (as numpy.histogram)
  • keep_missed (bool) – Store statistics about how many values were lower than limits and how many higher than limits (default: True)
  • dropna (bool) – Whether to clear data from nan's before histogramming
  • name (str, optional) – Name of the histogram
  • title (str, optional) – What will be displayed in the title of the plot
  • axis_name (str, optional) – Name of the variable on x axis
  • adaptive (bool) – Whether we want the bins to be modifiable (useful for continuous filling of a priori unknown data)
  • dtype – Customize underlying data type: default int64 (without weight) or float (with weights)

Other numpy.histogram parameters are excluded, see the methods of the Histogram1D class itself.

See also

numpy.histogram()
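
The three steps described for h1 can be sketched in plain Python. SimpleHistogram and simple_h1 are hypothetical names; the sketch covers only the case of an integer bin count with equal-width bins (last bin closed on the right, as in numpy.histogram) and assumes the data are not all identical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SimpleHistogram:
    edges: List[float]
    frequencies: List[int]


def simple_h1(data: Sequence[float], bin_count: int = 10) -> SimpleHistogram:
    lo, hi = min(data), max(data)
    width = (hi - lo) / bin_count
    # Step 1: construct the bins
    edges = [lo + i * width for i in range(bin_count)] + [hi]
    # Step 2: calculate frequencies (the maximum goes into the last bin)
    frequencies = [0] * bin_count
    for value in data:
        index = min(int((value - lo) / width), bin_count - 1)
        frequencies[index] += 1
    # Step 3: construct the histogram object itself
    return SimpleHistogram(edges, frequencies)


# The heights dataset from the tutorial
heights = [160, 155, 156, 198, 177, 168, 191, 183, 184, 179, 178, 172, 173, 175,
           172, 177, 176, 175, 174, 173, 174, 175, 177, 169, 168, 164, 175, 188,
           178, 174, 173, 181, 185, 166, 162, 163, 171, 165, 180, 189, 166, 163,
           172, 173, 174, 183, 184, 161, 162, 168, 169, 174, 176, 170, 169, 165]

print(simple_h1(heights).frequencies)
# [2, 6, 5, 11, 15, 7, 6, 2, 1, 1]
```

The frequencies agree with hist.frequencies shown in the tutorial for the same data.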

physt.facade.h2(data1: Union[numpy.ndarray, Iterable[T_co], int, float, None], data2: Union[numpy.ndarray, Iterable[T_co], int, float, None], bins=10, **kwargs) → physt.histogram_nd.Histogram2D

Facade function to create 2D histograms.

For implementation and parameters, see histogramdd.

See also

numpy.histogram2d(), histogramdd()

physt.facade.h3(data: Union[numpy.ndarray, Iterable[T_co], int, float, None], bins=None, **kwargs) → physt.histogram_nd.HistogramND

Facade function to create 3D histograms.

Parameters:data (array_like or list[array_like] or tuple[array_like]) – Can be a single array (with three columns) or three different arrays (for each component)
physt.facade.histogram(data: Union[numpy.ndarray, Iterable[T_co], int, float, None], bins=None, *, adaptive: bool = False, dropna: bool = True, dtype: Union[type, numpy.dtype, str, None] = None, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, keep_missed: bool = True, name: Optional[str] = None, title: Optional[str] = None, axis_name: Optional[str] = None, **kwargs) → physt.histogram1d.Histogram1D

Facade function to create 1D histograms.

This proceeds in three steps:

  1. Based on the magical parameter bins, construct bins for the histogram
  2. Calculate frequencies for the bins
  3. Construct the histogram object itself

Guiding principle: parameters understood by numpy.histogram should also be understood by physt.histogram and should result in a Histogram1D object with (h.numpy_bins, h.frequencies) the same as the numpy.histogram output. Additional functionality is a bonus.

Parameters:
  • data (array_like, optional) – Container of all the values (tuple, list, np.ndarray, pd.Series)
  • bins (int or sequence of scalars or callable or str, optional) –
    • If iterable => the bins themselves
    • If int => number of bins for default binning
    • If callable => use binning method (+ args, kwargs)
    • If string => use named binning method (+ args, kwargs)
  • weights (array_like, optional) – (as numpy.histogram)
  • keep_missed (bool) – Store statistics about how many values were lower than limits and how many higher than limits (default: True)
  • dropna (bool) – Whether to clear data from nan's before histogramming
  • name (str, optional) – Name of the histogram
  • title (str, optional) – What will be displayed in the title of the plot
  • axis_name (str, optional) – Name of the variable on x axis
  • adaptive (bool) – Whether we want the bins to be modifiable (useful for continuous filling of a priori unknown data)
  • dtype – Customize underlying data type: default int64 (without weight) or float (with weights)

Other numpy.histogram parameters are excluded, see the methods of the Histogram1D class itself.

See also

numpy.histogram()

physt.facade.histogram2d(data1: Union[numpy.ndarray, Iterable[T_co], int, float, None], data2: Union[numpy.ndarray, Iterable[T_co], int, float, None], bins=10, **kwargs) → physt.histogram_nd.Histogram2D

Facade function to create 2D histograms.

For implementation and parameters, see histogramdd.

See also

numpy.histogram2d(), histogramdd()

physt.facade.histogramdd(data: Union[numpy.ndarray, Iterable[T_co], int, float, None], bins=10, *, adaptive=False, dropna: bool = True, name: Optional[str] = None, title: Optional[str] = None, axis_names: Optional[Iterable[str]] = None, dim: Optional[int] = None, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, **kwargs) → physt.histogram_nd.HistogramND

Facade function to create n-dimensional histograms.

3D variant of this function is also aliased as “h3”.

Parameters:
  • data (array_like) – Container of all the values
  • bins (Any) –
  • weights (array_like, optional) – (as numpy.histogram)
  • dropna (Whether to clear data from nan's before histogramming) –
  • name (Name of the histogram) –
  • axis_names (Names of the variable on x axis) –
  • adaptive (Whether the bins should be updated when new non-fitting values are filled) –
  • dtype (Optional[type]) – Underlying type for the histogram. If weights are specified, default is float. Otherwise int64
  • title (What will be displayed in the title of the plot) –
  • dim (Dimension - necessary if you are creating an empty adaptive histogram) –

Note: For most arguments, if a list is passed, its values are used as values for individual axes.

See also

numpy.histogramdd()

physt.facade.collection(data, bins=10, **kwargs) → physt.histogram_collection.HistogramCollection

Create histogram collection with shared binning.

physt.facade.polar(xdata: Union[numpy.ndarray, Iterable[T_co], int, float], ydata: Union[numpy.ndarray, Iterable[T_co], int, float], *, radial_bins='numpy', radial_range: Optional[Tuple[float, float]] = None, phi_bins=16, phi_range: Tuple[float, float] = (0, 6.283185307179586), dropna: bool = False, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, transformed: bool = False, **kwargs) → physt.special_histograms.PolarHistogram

Facade construction function for the PolarHistogram.

physt.facade.azimuthal(xdata: Union[numpy.ndarray, Iterable[T_co], int, float], ydata: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, bins=16, range: Tuple[float, float] = (0, 6.283185307179586), dropna: bool = False, weights=None, transformed: bool = False, **kwargs) → physt.special_histograms.AzimuthalHistogram

Facade function to create an AzimuthalHistogram.

physt.facade.radial(xdata: Union[numpy.ndarray, Iterable[T_co], int, float], ydata: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, zdata: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, bins='numpy', range: Optional[Tuple[float, float]] = None, dropna: bool = False, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, transformed: bool = False, **kwargs) → physt.special_histograms.RadialHistogram

Facade function to create a radial histogram.

physt.facade.cylindrical(data: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, rho_bins='numpy', phi_bins=16, z_bins='numpy', transformed: bool = False, dropna: bool = True, rho_range: Optional[Tuple[float, float]] = None, phi_range: Tuple[float, float] = (0, 6.283185307179586), weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, z_range: Optional[Tuple[float, float]] = None, **kwargs) → physt.special_histograms.CylindricalHistogram

Facade function to create a cylindrical histogram.

physt.facade.cylindrical_surface(data=None, *, phi_bins=16, z_bins='numpy', transformed: bool = False, radius: Optional[float] = None, dropna: bool = False, weights=None, phi_range: Tuple[float, float] = (0, 6.283185307179586), z_range: Optional[Tuple[float, float]] = None, **kwargs) → physt.special_histograms.CylindricalSurfaceHistogram

Facade function to create a cylindrical surface histogram.

physt.facade.spherical(data: Union[numpy.ndarray, Iterable[T_co], int, float], *, radial_bins='numpy', theta_bins=16, phi_bins=16, dropna: bool = True, transformed: bool = False, theta_range: Tuple[float, float] = (0, 3.141592653589793), phi_range: Tuple[float, float] = (0, 6.283185307179586), radial_range: Optional[Tuple[float, float]] = None, weights=None, **kwargs) → physt.special_histograms.SphericalHistogram

Facade function to create a spherical histogram.

physt.facade.spherical_surface(data: Union[numpy.ndarray, Iterable[T_co], int, float], *, theta_bins=16, phi_bins=16, transformed: bool = False, radius: Optional[float] = None, dropna: bool = False, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, theta_range: Tuple[float, float] = (0, 3.141592653589793), phi_range: Tuple[float, float] = (0, 6.283185307179586), **kwargs) → physt.special_histograms.SphericalSurfaceHistogram

Facade construction function for the SphericalSurfaceHistogram.

physt.histogram1d module

One-dimensional histograms.

class physt.histogram1d.Histogram1D(binning: Union[physt.binnings.BinningBase, numpy.ndarray, Iterable[T_co], int, float], frequencies: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, errors2: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, keep_missed: bool = True, stats: Optional[Dict[str, float]] = None, overflow: Optional[float] = 0.0, underflow: Optional[float] = 0.0, inner_missed: Optional[float] = 0.0, axis_name: Optional[str] = None, **kwargs)

Bases: physt.histogram1d.ObjectWithBinning, physt.histogram_base.HistogramBase

One-dimensional histogram data.

The bins can be of different widths.

The bins need not be consecutive. However, some functionality may not be available for non-consecutive bins (like keeping information about underflow and overflow).

_stats
Type:dict

These are the basic attributes that can be used in the constructor (see there). Other attributes are dynamic.

EMPTY_STATS = {'sum': 0.0, 'sum2': 0.0}
axis_name
binning

The binning.

Note: Please, do not try to update the object itself.

cumulative_frequencies

Cumulative frequencies.

Note: underflow values are not considered

fill(value: float, weight: float = 1, **kwargs) → Optional[int]

Update histogram with a new value.

Parameters:
  • value (Value to be added.) –
  • weight (Weight assigned to the value.) –
Returns:index of bin which was incremented (-1=underflow, N=overflow, None=not found)

Note: If a gap in inconsecutive bins is matched, underflow & overflow are not valid anymore.

Note: Name was selected because of the eponymous method in ROOT.

fill_n(values: Union[numpy.ndarray, Iterable[T_co], int, float], weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, dropna: bool = True) → None

Update histogram with more values at once.

It is an in-place operation.

Parameters:
  • values (Values to add) –
  • weights (Optional weights to assign to each value) –
  • dropna (If true (default), all nan's are skipped.) –

Note

This method should be overloaded with a more efficient one.

May change the dtype if weight is set.

find_bin(value: Union[numpy.ndarray, Iterable[T_co], int, float], axis: Union[int, str, None] = None) → Optional[int]

Index of bin corresponding to a value.

Returns:index of bin to which value belongs (-1=underflow, N=overflow, None=not found - inconsecutive)
Return type:Optional[int]
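
The index convention used by fill and find_bin (-1, N, None) can be sketched on a list of edge pairs. The helper is hypothetical and treats every bin as half-open [left, right), ignoring the includes_right_edge option of real binnings.

```python
from typing import Optional, Sequence, Tuple


def find_bin_sketch(value: float, bins: Sequence[Tuple[float, float]]) -> Optional[int]:
    """Index of the half-open bin containing value, following the convention above."""
    if value < bins[0][0]:
        return -1         # underflow
    if value >= bins[-1][1]:
        return len(bins)  # overflow (N)
    for index, (left, right) in enumerate(bins):
        if left <= value < right:
            return index
    return None           # the value fell into a gap between bins


bins = [(0, 1), (2, 3)]
print([find_bin_sketch(v, bins) for v in (-0.5, 0.5, 1.5, 2.5, 3.5)])
# [-1, 0, None, 1, 2]
```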
classmethod from_calculate_frequencies(data: Union[numpy.ndarray, Iterable[T_co], int, float], binning: physt.binnings.BinningBase, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, validate_bins: bool = True, already_sorted: bool = False, dtype: Union[type, numpy.dtype, str, None] = None, **kwargs) → Histogram1DType

Construct the histogram from values and bins.

classmethod from_xarray(arr: xarray.Dataset) → Histogram1D

Convert from xarray.Dataset

Parameters:arr (The data in xarray representation) –
inner_missed
mean() → Optional[float]

Statistical mean of all values entered into histogram.

This number is precise, because we keep the necessary data separate from bin contents.

numpy_like

The same result as the numpy.histogram function would return.

overflow
select(axis, index, *, force_copy: bool = False) → Union[physt.histogram1d.Histogram1D, Tuple[numpy.ndarray, float]]

Alias for [] to be compatible with HistogramND.

std() → Optional[float]

Standard deviation of all values entered into histogram.

This number is precise, because we keep the necessary data separate from bin contents.

Returns:
Return type:float
to_dataframe() → pandas.DataFrame

Convert to pandas DataFrame.

This is not a lossless conversion - (under/over)flow info is lost.

to_xarray() → xarray.Dataset

Convert to xarray.Dataset

underflow
variance() → Optional[float]

Statistical variance of all values entered into histogram.

This number is precise, because we keep the necessary data separate from bin contents.

Returns:
Return type:float
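
Why mean, std and variance can be precise regardless of the binning: it is enough to keep running sums of the values and of their squares. The class below is a hypothetical sketch using the same keys as EMPTY_STATS; it assumes the population (not sample) variance.

```python
import math
from typing import Dict, Optional


class RunningStats:
    """Keep sum and sum of squares alongside (and independent of) any bin contents."""

    def __init__(self) -> None:
        self.total = 0.0
        self.stats: Dict[str, float] = {"sum": 0.0, "sum2": 0.0}

    def fill(self, value: float, weight: float = 1.0) -> None:
        self.total += weight
        self.stats["sum"] += weight * value
        self.stats["sum2"] += weight * value ** 2

    def mean(self) -> Optional[float]:
        if self.total <= 0:
            return None
        return self.stats["sum"] / self.total

    def variance(self) -> Optional[float]:
        if self.total <= 0:
            return None
        # E[x^2] - E[x]^2, written with the accumulated sums
        return (self.stats["sum2"] - self.stats["sum"] ** 2 / self.total) / self.total

    def std(self) -> Optional[float]:
        variance = self.variance()
        return None if variance is None else math.sqrt(variance)


stats = RunningStats()
for value in [1, 2, 3, 4]:
    stats.fill(value)
print(stats.mean(), stats.variance())  # 2.5 1.25
```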
class physt.histogram1d.ObjectWithBinning

Bases: abc.ABC

Mixin with shared methods for 1D objects that have a binning.

Note: Used to share behaviour between Histogram1D and HistogramCollection.

bin_centers

Centers of all bins.

bin_left_edges

Left edges of all bins.

bin_right_edges

Right edges of all bins.

bin_sizes
bin_widths

Widths of all bins.

binning

The binning itself.

bins

Array of all bin edges.

Returns:
Return type:Wide-format [[leftedge1, rightedge1], .. [leftedgeN, rightedgeN]]
edges
get_bin_left_edges(i)
get_bin_right_edges(i)
max_edge

Right edge of the last bin.

min_edge

Left edge of the first bin.

ndim
numpy_bins

Bins in the format of numpy.

total_width

Total width of all bins.

In inconsecutive histograms, the missing intervals are not counted in.

physt.histogram1d.calculate_frequencies(data: Union[numpy.ndarray, Iterable[T_co], int, float], binning: physt.binnings.BinningBase, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, validate_bins: bool = True, already_sorted: bool = False, dtype: Union[type, numpy.dtype, str, None] = None) → Tuple[numpy.ndarray, numpy.ndarray, float, float, dict]

Get frequencies and bin errors from the data.

Parameters:
  • data (Data items to work on.) –
  • binning (A set of bins.) –
  • weights (Weights of the items.) –
  • validate_bins (If True (default), bins are validated to be in ascending order.) –
  • already_sorted (If True, the data being entered are already sorted, no need to sort them once more.) –
  • dtype (Underlying type for the histogram.) – (If weights are specified, default is float. Otherwise long.)
Returns:

  • frequencies (Bin contents)
  • errors2 (Error squares of the bins)
  • underflow (Weight of items smaller than the first bin)
  • overflow (Weight of items larger than the last bin)
  • stats (dict) – { sum: …, sum2: …}

Note

Checks that the bins are in a correct order (not necessarily consecutive). Does not check for numerical overflows in bins.
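
A minimal sketch of the five returned quantities, written with plain numpy for consecutive bins only. This is not the physt implementation: the name calculate_frequencies_sketch is hypothetical, and computing the stats over all items (including those outside the bins) is an assumption.

```python
import numpy as np

def calculate_frequencies_sketch(data, edges, weights=None):
    """Frequencies, squared errors, underflow, overflow and stats."""
    data = np.asarray(data, dtype=float)
    edges = np.asarray(edges, dtype=float)
    if weights is None:
        weights = np.ones_like(data)
    else:
        weights = np.asarray(weights, dtype=float)
    # Bin contents are sums of weights, squared errors are sums of squared weights
    frequencies, _ = np.histogram(data, bins=edges, weights=weights)
    errors2, _ = np.histogram(data, bins=edges, weights=weights ** 2)
    underflow = weights[data < edges[0]].sum()
    overflow = weights[data > edges[-1]].sum()
    stats = {"sum": (weights * data).sum(), "sum2": (weights * data ** 2).sum()}
    return frequencies, errors2, underflow, overflow, stats

freqs, errs2, under, over, stats = calculate_frequencies_sketch(
    [-1, 0.5, 1.5, 1.7, 3.0, 10], edges=[0, 1, 2, 3]
)
```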

physt.histogram_base module

HistogramBase - base for all histogram classes.

class physt.histogram_base.HistogramBase(binnings: Iterable[Union[physt.binnings.BinningBase, numpy.ndarray, Iterable[T_co], int, float]], frequencies: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, errors2: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, axis_names: Optional[Iterable[str]] = None, dtype: Union[type, numpy.dtype, str, None] = None, keep_missed: bool = True, **kwargs)

Bases: abc.ABC

Histogram base class.

Behaviour shared by all histogram classes.

The most important daughter classes are:

  • Histogram1D
  • HistogramND

There are also special histogram types that are modifications of these classes.

The methods you should override:

  • fill
  • fill_n (optional)
  • copy
  • _update_dict (optional)

Underlying data type is int64 / float or an explicitly specified other type (dtype).

_binnings
Type:Schema for binning(s)
frequencies

Bin contents

Type:np.ndarray
errors2

Square errors associated with the bin contents

Type:np.ndarray
_meta_data

All meta-data (names, user-custom values, …). Anything can be put in. When exported, all information is kept.

Type:dict
_dtype

Type of the frequencies and also errors (int64, float64 or user-overridden)

Type:np.dtype
_missed

Various storage for missed values in different histogram types (1 value for multi-dimensional, 3 values for one-dimensional)

Type:array_like
Invariants
----------
- Frequencies in the histogram should always be non-negative.
Many operations rely on that, but it is not always enforced.
(If you set config.free_arithmetics (see below), negative frequencies are also
allowed.)
Arithmetics
-----------
Histograms offer standard arithmetic operators that by default allow only
meaningful application (i.e. addition / subtraction of two histograms
with matching or mutually adaptable bin sets, multiplication and division by a constant).
If you relax the criteria by setting `config.free_arithmetics` or by working inside
the config.enable_free_arithmetics() context manager, you are in addition
allowed to use any array-like with matching shape.

See also

histogram1d, histogram_nd, special

adaptive
axis_names

Names of axes (stored in meta-data).

bin_count

Total number of bins.

bin_sizes
binnings

The binnings.

Note: Please, do not try to update the objects themselves.

bins
copy(*, include_frequencies: bool = True) → HistogramType

Copy the histogram.

Parameters:include_frequencies (If false, all frequencies are set to zero.) –
default_axis_names

Axis names to be used when an instance does not define them.

default_init_values = {}
densities

Frequencies normalized by bin sizes.

Useful when bins are not of the same size.
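
The relation between frequencies and densities can be sketched with plain numpy. The numbers below are made up for illustration.

```python
import numpy as np

frequencies = np.array([2.0, 6.0, 3.0])
bin_sizes = np.array([1.0, 2.0, 3.0])   # unequal bin widths

# Density = bin content divided by bin size
densities = frequencies / bin_sizes
```

The middle bin has the highest frequency only because it is wide; in terms of density, the second bin is still the highest but the third drops to the lowest.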

dtype

Data type of the bin contents.

errors

Bin errors.

errors2

Squares of the bin errors.

fill(value: float, weight: float = 1, **kwargs) → Union[None, int, Tuple[int, ...]]

Update histogram with a new value.

It is an in-place operation.

Parameters:
  • value (Value to be added. Can be scalar or array depending on the histogram type.) –
  • weight (Weight of the value) –

Note

May change the dtype if weight is set

fill_n(values: Union[numpy.ndarray, Iterable[T_co], int, float], weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, dropna: bool = True)

Update histogram with more values at once.

It is an in-place operation.

Parameters:
  • values (Values to add) –
  • weights (Optional weights to assign to each value) –
  • dropna (If true (default), all nan's are skipped.) –

Note

This method should be overloaded with a more efficient one.

May change the dtype if weight is set.

find_bin(value: Union[numpy.ndarray, Iterable[T_co], int, float], axis: Union[int, str, None] = None) → Union[None, int, Tuple[int, ...]]

Index(-ices) of bin corresponding to a value.

Parameters:
  • value (Value with dimensionality equal to histogram) –
  • axis (If set, find bin along an axis. Otherwise, find bins along all axes.) – None = outside the bins
Returns:

Return type:

If axis is specified (or the histogram is 1D), a number. Otherwise, a tuple. If not available, None.
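
For the one-dimensional case with consecutive bins, the lookup can be sketched with numpy.searchsorted. The function find_bin_sketch is hypothetical. Treating the last bin as right-closed (as numpy.histogram does) and returning None for values outside the bins are assumptions based on the description above, not a statement about the actual implementation.

```python
import numpy as np

def find_bin_sketch(edges, value):
    """Index of the bin containing value, or None if outside all bins."""
    edges = np.asarray(edges, dtype=float)
    if value < edges[0] or value > edges[-1]:
        return None
    if value == edges[-1]:
        return len(edges) - 2          # assumption: last bin includes its right edge
    # Bins are left-closed: an edge value belongs to the bin starting there
    return int(np.searchsorted(edges, value, side="right") - 1)

edges = [0.0, 1.0, 2.0, 3.0]
indices = [find_bin_sketch(edges, v) for v in (0.5, 1.0, 3.0, -1.0, 3.5)]
```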

frequencies

Frequencies (values, contents) of the histogram bins.

classmethod from_dict(a_dict: Mapping[str, Any]) → physt.histogram_base.HistogramBase

Create an instance from a dictionary.

If customization is necessary, override the _from_dict_kwargs template method, not this one.

has_same_bins(other: physt.histogram_base.HistogramBase) → bool

Whether two histograms share the same binning.

is_adaptive() → bool

Whether the binning can be changed with operations.

merge_bins(amount: Optional[int] = None, *, min_frequency: Optional[float] = None, axis: Union[int, str, None] = None, inplace: bool = False) → HistogramType

Reduce the number of bins by merging adjacent bins and summing their contents.

Parameters:
  • amount (How many adjacent bins to join together.) –
  • min_frequency (Try to have at least this value in each bin) – (this is not enforced, e.g. for minima between high bins)
  • axis (On which axis to do this (None => all)) –
  • inplace (Whether to modify this histogram or return a new one) –
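
The effect of the amount parameter on a 1D histogram can be sketched as follows. The helper merge_bins_sketch is hypothetical and assumes that the number of bins is divisible by amount; how the library handles a remainder is not covered here.

```python
import numpy as np

def merge_bins_sketch(edges, frequencies, amount):
    """Join every `amount` adjacent bins and sum their contents."""
    edges = np.asarray(edges)
    frequencies = np.asarray(frequencies)
    if len(frequencies) % amount != 0:
        raise ValueError("This sketch needs a bin count divisible by amount.")
    new_frequencies = frequencies.reshape(-1, amount).sum(axis=1)
    new_edges = edges[::amount]        # keep every amount-th edge, including the last
    return new_edges, new_frequencies

new_edges, new_freqs = merge_bins_sketch(
    edges=[0, 1, 2, 3, 4, 5, 6], frequencies=[1, 2, 3, 4, 5, 6], amount=2
)
```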
meta_data

A dictionary of non-numerical information about the histogram.

It contains several pre-defined ones, but you can add any other. These are preserved when saving and also in operations.

missed

Total number (weight) of entries that missed the bins.

name

Name of the histogram (stored in meta-data).

ndim

Dimensionality of histogram’s data.

i.e. the number of axes along which we bin the values.

normalize(inplace: bool = False, percent: bool = False) → physt.histogram_base.HistogramBase

Normalize the histogram, so that the total weight is equal to 1.

Parameters:
  • inplace (If True, updates itself. If False (default), returns copy) –
  • percent (If True, normalizes to percent instead of 1. Default: False) –
Returns:

Return type:

either modified copy or self

See also

densities(), HistogramND.partial_normalize()
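
In terms of the bin contents, normalization amounts to a single division. The following numpy sketch (with the hypothetical helper normalize_sketch) shows the difference between the default and percent=True; it always returns a new array and ignores the inplace option.

```python
import numpy as np

def normalize_sketch(frequencies, percent=False):
    """Scale frequencies so that they sum to 1 (or to 100 if percent)."""
    frequencies = np.asarray(frequencies, dtype=float)
    factor = 100.0 if percent else 1.0
    return frequencies * factor / frequencies.sum()

fractions = normalize_sketch([1, 1, 2])
percents = normalize_sketch([1, 1, 2], percent=True)
```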

plot

Proxy to plotting.

This attribute is a special proxy to plotting. In the most simple cases, it can be used as a method. For more sophisticated use, see the documentation for physt.plotting package.

select(axis: Union[int, str], index: Union[int, slice], *, force_copy: bool = False) → Any

Select in an axis.

Parameters:
  • axis (Axis, in which we select.) –
  • index (Index of bin (as in numpy)) –
  • force_copy (If True, an identity slice forces a copy to be made.) –
set_adaptive(value: bool = True)

Change the histogram binning to (non)adaptive.

This requires the binnings in all dimensions to support it.

set_dtype(value: Union[type, numpy.dtype, str], *, check: bool = True) → None

Change data type of the bin contents.

Allowed conversions:

  • from integral to float types
  • between the same category of type (float/integer)
  • from float types to integer if weights are trivial

Parameters:
  • value (np.dtype or something convertible to it.) –
  • check (If True (default), all values are checked against the limits) –
shape

Shape of histogram’s data.

Returns:
Return type:Tuple with the number of bins along each axis.
title

Title of the histogram to be displayed when plotted (stored in meta-data).

If not specified, defaults to name.

to_dict() → Dict[str, Any]

Dictionary with all data in the histogram.

This is used for export into various formats (e.g. JSON). If a descendant class needs to update the dictionary in some way (e.g. to add more information), override the _update_dict method.

to_json(path: Optional[str] = None, **kwargs) → str

Convert to JSON representation.

Parameters:path (Where to write the JSON.) –
Returns:
Return type:The JSON representation.
total

Total number (sum of weights) of entries excluding underflow and overflow.

physt.histogram_collection module
class physt.histogram_collection.HistogramCollection(*histograms, binning: Union[physt.binnings.BinningBase, numpy.ndarray, Iterable[T_co], int, float, None] = None, title: Optional[str] = None, name: Optional[str] = None)

Bases: collections.abc.Container, typing.Generic, physt.histogram1d.ObjectWithBinning

Experimental collection of histograms.

It contains (potentially name-addressable) 1-D histograms with a shared binning.

add(histogram: physt.histogram1d.Histogram1D) → None

Add a histogram to the collection.

axis_name
axis_names
binning

The binning itself.

copy() → physt.histogram_collection.HistogramCollection
create(name: str, values, *, weights=None, dropna: bool = True, **kwargs) → physt.histogram1d.Histogram1D
classmethod from_dict(a_dict: Dict[str, Any]) → physt.histogram_collection.HistogramCollection
classmethod multi_h1(a_dict: Dict[str, Union[numpy.ndarray, Iterable[T_co], int, float]], bins=None, **kwargs) → physt.histogram_collection.HistogramCollection

Create a collection from multiple datasets.

normalize_all(inplace: bool = False) → physt.histogram_collection.HistogramCollection

Normalize all histograms so that total content of each of them is equal to 1.0.

normalize_bins(inplace: bool = False) → physt.histogram_collection.HistogramCollection

Normalize each bin in the collection so that the sum is 1.0 for each bin.

Note: If a bin is zero in all collections, the result will be inf.
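
The difference between normalize_all and normalize_bins can be sketched by stacking the frequencies of all histograms in the collection into one array (one row per histogram, one column per bin). This is a plain numpy illustration with made-up numbers, not the library code.

```python
import numpy as np

# Two histograms sharing three bins
stack = np.array([[1.0, 3.0, 2.0],
                  [3.0, 1.0, 2.0]])

# normalize_all: each histogram (row) sums to 1.0
normalized_all = stack / stack.sum(axis=1, keepdims=True)

# normalize_bins: each bin (column) sums to 1.0 across the collection
normalized_bins = stack / stack.sum(axis=0, keepdims=True)
```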

plot

Proxy to plotting.

This attribute is a special proxy to plotting. In the most simple cases, it can be used as a method. For more sophisticated use, see the documentation for physt.plotting package.

sum() → physt.histogram1d.Histogram1D

Return the sum of all contained histograms.

to_dict() → Dict[str, Any]
to_json(path: Optional[str] = None, **kwargs) → str

Convert to JSON representation.

Parameters:path (Where to write the JSON.) –
Returns:
Return type:The JSON representation.
physt.histogram_nd module

Multi-dimensional histograms.

class physt.histogram_nd.Histogram2D(binnings, frequencies=None, **kwargs)

Bases: physt.histogram_nd.HistogramND

Specialized 2D variant of the general HistogramND class.

In contrast to general HistogramND, it is plottable.

T

Histogram with swapped axes.

Returns:
Return type:Histogram2D - a copy with swapped axes
numpy_like

Same result as the numpy.histogram function would return.

partial_normalize(axis: Union[int, str] = 0, inplace: bool = False)

Normalize in rows or columns.

Parameters:
  • axis (int or str) – Along which axis to sum (numpy-sense)
  • inplace (bool) – Update the object itself
Returns:

hist

Return type:

Histogram2D

class physt.histogram_nd.HistogramND(binnings: Iterable[Union[physt.binnings.BinningBase, numpy.ndarray, Iterable[T_co], int, float]], frequencies: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, dimension: Optional[int] = None, missed=0, **kwargs)

Bases: physt.histogram_base.HistogramBase

Multi-dimensional histogram data.

accumulate(axis: Union[int, str]) → physt.histogram_base.HistogramBase

Calculate cumulative frequencies along a certain axis.

Returns:new_hist
Return type:Histogram of the same type & size
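
On the level of the frequency array, accumulating along an axis corresponds to a cumulative sum. A minimal numpy sketch with made-up data:

```python
import numpy as np

frequencies = np.array([[1, 2],
                        [3, 4]])

# Cumulative frequencies along the first and the second axis
along_first = np.cumsum(frequencies, axis=0)
along_second = np.cumsum(frequencies, axis=1)
```

The shape stays the same; only the contents change.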
bin_sizes
bins

List of bin matrices.

edges
fill(value, weight=1, **kwargs)

Update histogram with a new value.

It is an in-place operation.

Parameters:
  • value (Value to be added. Can be scalar or array depending on the histogram type.) –
  • weight (Weight of the value) –

Note

May change the dtype if weight is set

fill_n(values: Union[numpy.ndarray, Iterable[T_co], int, float], weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, dropna: bool = True, columns: bool = False)

Add more values at once.

Parameters:
  • values (array_like) – Values to add. Can be array of shape (count, ndim) or array of shape (ndim, count) [use columns=True] or something convertible to it
  • weights (array_like) – Weights for values (optional)
  • dropna (bool) – Whether to remove NaN values. If False and such a value is encountered, an exception is raised.
  • columns (bool) – Signal that the data are transposed (in columns, instead of rows). This allows passing a list of arrays in values.
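
The two accepted data layouts can be illustrated with numpy.histogramdd, which expects the (count, ndim) layout. The sketch below only demonstrates the shapes and does not call fill_n itself; the data are made up.

```python
import numpy as np

# Row layout: one row per entry, shape (count, ndim)
rows = np.array([[0.5, 0.5],
                 [0.7, 0.3],
                 [1.5, 0.5],
                 [1.5, 1.5],
                 [0.2, 1.8]])

# Column layout: one array per axis, shape (ndim, count)
columns = [rows[:, 0], rows[:, 1]]

edges = [np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0, 2.0])]

freq_from_rows, _ = np.histogramdd(rows, bins=edges)
# Column data have to be transposed to get one row per entry
freq_from_columns, _ = np.histogramdd(np.asarray(columns).T, bins=edges)
```

Both calls produce the same 2x2 array of frequencies.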
find_bin(value, axis: Union[int, str, None] = None) → Union[None, int, Tuple[int, ...]]

Index(-ices) of bin corresponding to a value.

Parameters:
  • value (Value with dimensionality equal to histogram) –
  • axis (If set, find bin along an axis. Otherwise, find bins along all axes.) – None = outside the bins
Returns:

Return type:

If axis is specified (or the histogram is 1D), a number. Otherwise, a tuple. If not available, None.

classmethod from_calculate_frequencies(data, binnings, weights=None, *, dtype=None, **kwargs)
get_bin_centers(axis: Union[int, str, None] = None) → numpy.ndarray
get_bin_edges(axis: Union[int, str, None] = None) → Tuple[numpy.ndarray, ...]
get_bin_left_edges(axis: Union[int, str, None] = None) → numpy.ndarray
get_bin_right_edges(axis: Union[int, str, None] = None) → numpy.ndarray
get_bin_widths(axis: Union[int, str, None] = None) → numpy.ndarray
numpy_bins

Numpy-like bins (if available).

numpy_like

Same result as the numpy.histogram function would return.

projection(*axes, **kwargs) → physt.histogram_base.HistogramBase

Reduce dimensionality by summing along axis/axes.

Parameters:
  • axes (Iterable[int or str]) – List of axes for the new histogram. Could be either numbers or names. Must contain at least one axis.
  • name (Optional[str] # TODO: Check) – Name for the projected histogram (default: same)
  • type (Optional[type] # TODO: Check) – If set, predefined class for the projection
Returns:

Return type:

HistogramND or Histogram2D or Histogram1D (or others in special cases)
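
Summing along the axes that are not kept can be sketched with numpy as follows. The helper project_sketch is hypothetical; it accepts axis numbers only (no names) and keeps the retained axes in their original order.

```python
import numpy as np

def project_sketch(frequencies, *axes):
    """Keep the given axes and sum the frequencies over all others."""
    frequencies = np.asarray(frequencies)
    dropped = tuple(i for i in range(frequencies.ndim) if i not in axes)
    return frequencies.sum(axis=dropped)

frequencies = np.arange(24).reshape(2, 3, 4)   # a 3D histogram with 2x3x4 bins
proj_0 = project_sketch(frequencies, 0)        # 1D projection
proj_02 = project_sketch(frequencies, 0, 2)    # 2D projection
```

The total content is unchanged by the projection.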

select(axis: Union[int, str], index: Union[int, slice], *, force_copy: bool = False) → physt.histogram_base.HistogramBase

Select in an axis.

Parameters:
  • axis (Axis, in which we select.) –
  • index (Index of bin (as in numpy)) –
  • force_copy (If True, an identity slice forces a copy to be made.) –
total_size

The total size of the bin space.

Note

Perhaps not optimized, but should work also with transformed axes

physt.histogram_nd.calculate_frequencies(data: Union[numpy.ndarray, Iterable[T_co], int, float, None], binnings: Iterable[physt.binnings.BinningBase], weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, dtype: Union[type, numpy.dtype, str, None] = None) → Tuple[Optional[numpy.ndarray], Optional[numpy.ndarray], float]

Get frequencies and bin errors from the data (n-dimensional variant).

Parameters:
  • data (2D array with ndim columns and row for each entry.) –
  • binnings (Binnings to apply in all axes.) –
  • weights (1D array of weights to assign to values.) – (If present, must have same length as the number of rows.)
  • dtype (Underlying type for the histogram.) – (If weights are specified, default is float. Otherwise int64.)
Returns:

  • frequencies (Frequencies (if data supplied))
  • errors2 (Errors squared if different from frequencies)
  • missing (scalar[dtype])

physt.special module
physt.special_histograms module

Transformed histograms.

These histograms use a transformation from input values to bins in a different coordinate system.

There are three basic classes:

  • PolarHistogram
  • CylindricalHistogram
  • SphericalHistogram

Apart from these, there are their projections into lower dimensions.

And of course, it is possible to re-use the general transforming functionality by adding TransformedHistogramMixin among the custom histogram class superclasses.

class physt.special_histograms.AzimuthalHistogram(binning: Union[physt.binnings.BinningBase, numpy.ndarray, Iterable[T_co], int, float], frequencies: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, errors2: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, keep_missed: bool = True, stats: Optional[Dict[str, float]] = None, overflow: Optional[float] = 0.0, underflow: Optional[float] = 0.0, inner_missed: Optional[float] = 0.0, axis_name: Optional[str] = None, **kwargs)

Bases: physt.special_histograms.TransformedHistogramMixin, physt.histogram1d.Histogram1D

Projection of polar histogram to 1D with respect to phi.

This is a special case of a 1D histogram with transformed coordinates.

bin_sizes
default_axis_names = ['phi']
default_init_values = {'radius': 1}
radius

Radius of the surface.

Useful for calculating densities.

source_ndim = 2
class physt.special_histograms.CylindricalHistogram(binnings: Iterable[Union[physt.binnings.BinningBase, numpy.ndarray, Iterable[T_co], int, float]], frequencies: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, dimension: Optional[int] = None, missed=0, **kwargs)

Bases: physt.special_histograms.TransformedHistogramMixin, physt.histogram_nd.HistogramND

3D histogram in cylindrical coordinates.

This is a special case of a 3D histogram with transformed coordinates:

  • r as radius projection to xy plane in the (0, +inf) range
  • phi as azimuthal angle (in the xy projection) in the (0, 2*pi) range
  • z as the last direction without modification, in (-inf, +inf) range

bin_sizes
default_axis_names = ['rho', 'phi', 'z']
projection(*axes, **kwargs)

Projection to lower-dimensional histogram.

The inheriting class should implement the _projection_class_map class attribute to suggest class for the projection. If the arguments don’t match any of the map keys, HistogramND is used.

source_ndim = 3
class physt.special_histograms.CylindricalSurfaceHistogram(binnings: Iterable[Union[physt.binnings.BinningBase, numpy.ndarray, Iterable[T_co], int, float]], frequencies: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, dimension: Optional[int] = None, missed=0, **kwargs)

Bases: physt.special_histograms.TransformedHistogramMixin, physt.histogram_nd.HistogramND

2D histogram in coordinates on cylinder surface.

This is a special case of a 2D histogram with transformed coordinates:

  • phi as azimuthal angle (in the xy projection) in the (0, 2*pi) range
  • z as the last direction without modification, in (-inf, +inf) range

radius

The radius of the surface. Useful for plotting

Type:float
bin_sizes
default_axis_names = ['rho', 'phi', 'z']
default_init_values = {'radius': 1}
radius

Radius of the cylindrical surface.

Useful for calculating densities.

source_ndim = 3
class physt.special_histograms.PolarHistogram(binnings: Iterable[Union[physt.binnings.BinningBase, numpy.ndarray, Iterable[T_co], int, float]], frequencies: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, dimension: Optional[int] = None, missed=0, **kwargs)

Bases: physt.special_histograms.TransformedHistogramMixin, physt.histogram_nd.HistogramND

2D histogram in polar coordinates.

This is a special case of a 2D histogram with transformed coordinates:

  • r as radius in the (0, +inf) range
  • phi as azimuthal angle in the (0, 2*pi) range

bin_sizes
default_axis_names = ['r', 'phi']
source_ndim = 2
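
The coordinate transformation behind the polar histogram can be sketched with numpy. The function to_polar is hypothetical and only illustrates the stated ranges (phi wrapped into the 0 to 2*pi interval); it is not taken from the physt source.

```python
import numpy as np

def to_polar(x, y):
    """Convert cartesian (x, y) to (r, phi) with phi in [0, 2*pi)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.hypot(x, y)
    phi = np.arctan2(y, x) % (2 * np.pi)   # arctan2 gives (-pi, pi], wrap it
    return r, phi

# One point in each quadrant
r, phi = to_polar([1, -1, -1, 1], [1, 1, -1, -1])

# Binning phi into 4 equal bins puts one point into each of them
phi_counts, _ = np.histogram(phi, bins=4, range=(0, 2 * np.pi))
```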
class physt.special_histograms.RadialHistogram(binning: Union[physt.binnings.BinningBase, numpy.ndarray, Iterable[T_co], int, float], frequencies: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, errors2: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, keep_missed: bool = True, stats: Optional[Dict[str, float]] = None, overflow: Optional[float] = 0.0, underflow: Optional[float] = 0.0, inner_missed: Optional[float] = 0.0, axis_name: Optional[str] = None, **kwargs)

Bases: physt.special_histograms.TransformedHistogramMixin, physt.histogram1d.Histogram1D

Projection of polar histogram to 1D with respect to radius.

This is a special case of a 1D histogram with transformed coordinates.

bin_sizes
default_axis_names = ['r']
source_ndim = (2, 3)
class physt.special_histograms.SphericalHistogram(binnings: Iterable[Union[physt.binnings.BinningBase, numpy.ndarray, Iterable[T_co], int, float]], frequencies: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, dimension: Optional[int] = None, missed=0, **kwargs)

Bases: physt.special_histograms.TransformedHistogramMixin, physt.histogram_nd.HistogramND

3D histogram in spherical coordinates.

This is a special case of a 3D histogram with transformed coordinates:

  • r as radius in the (0, +inf) range
  • theta as angle between z axis and the vector, in the (0, pi) range
  • phi as azimuthal angle (in the xy projection) in the (0, 2*pi) range

bin_sizes
default_axis_names = ['r', 'theta', 'phi']
source_ndim = 3
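
A sketch of the corresponding transformation from cartesian to spherical coordinates, using numpy. The function to_spherical is hypothetical. Theta is measured from the z axis and therefore runs from 0 to pi, in line with the default theta_range of the spherical facade function; the origin (r = 0) is not handled.

```python
import numpy as np

def to_spherical(x, y, z):
    """Convert cartesian (x, y, z) to (r, theta, phi)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    theta = np.arccos(z / r)                 # angle from the z axis, 0..pi
    phi = np.arctan2(y, x) % (2 * np.pi)     # azimuth in the xy plane, 0..2*pi
    return r, theta, phi

# Points on the +z axis, the +y axis and the -z axis
r, theta, phi = to_spherical([0, 0, 0], [0, 1, 0], [2, 0, -1])
```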
class physt.special_histograms.SphericalSurfaceHistogram(binnings: Iterable[Union[physt.binnings.BinningBase, numpy.ndarray, Iterable[T_co], int, float]], frequencies: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, dimension: Optional[int] = None, missed=0, **kwargs)

Bases: physt.special_histograms.TransformedHistogramMixin, physt.histogram_nd.HistogramND

2D histogram in spherical coordinates.

This is a special case of a 2D histogram with transformed coordinates:

  • theta as angle between z axis and the vector, in the (0, pi) range
  • phi as azimuthal angle (in the xy projection) in the (0, 2*pi) range

bin_sizes
default_axis_names = ['theta', 'phi']
default_init_values = {'radius': 1}
radius

Radius of the surface.

Useful for calculating densities.

source_ndim = 3
class physt.special_histograms.TransformedHistogramMixin

Bases: abc.ABC

Histogram with non-cartesian (or otherwise transformed) axes.

This is a mixin, providing transform-aware find_bin, fill and fill_n.

When implementing, you are required to provide the following:

  • _transform_correct_dimension method to convert rectangular coordinates (it must be a classmethod)
  • bin_sizes property

In certain cases, you may want to have default axis names + projections. Look at PolarHistogram / SphericalHistogram / CylindricalHistogram as an example.

bin_sizes
fill(value: Union[numpy.ndarray, Iterable[T_co], int, float], weight: Union[numpy.ndarray, Iterable[T_co], int, float, None] = 1, *, transformed: bool = False, **kwargs)
fill_n(values: Union[numpy.ndarray, Iterable[T_co], int, float], weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, dropna: bool = True, transformed: bool = False, **kwargs)
find_bin(value, axis: Union[int, str, None] = None, transformed: bool = False)
Parameters:
  • value (array_like) – Value with dimensionality equal to histogram.
  • transformed (bool) – If true, the value is already transformed and has same axes as the bins.
projection(*axes, **kwargs)

Projection to lower-dimensional histogram.

The inheriting class should implement the _projection_class_map class attribute to suggest class for the projection. If the arguments don’t match any of the map keys, HistogramND is used.

classmethod transform(value) → Union[numpy.ndarray, float]

Convert cartesian (general) coordinates into internal ones.

Parameters:
  • value (array_like) – This method should accept both scalars and numpy arrays. If multiple values are to be transformed, it should be of (nvalues, ndim) shape.
  • Note (Implement _) –
physt.special_histograms.azimuthal(xdata: Union[numpy.ndarray, Iterable[T_co], int, float], ydata: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, bins=16, range: Tuple[float, float] = (0, 6.283185307179586), dropna: bool = False, weights=None, transformed: bool = False, **kwargs) → physt.special_histograms.AzimuthalHistogram

Facade function to create an AzimuthalHistogram.

physt.special_histograms.azimuthal_histogram(xdata: Union[numpy.ndarray, Iterable[T_co], int, float], ydata: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, bins=16, range: Tuple[float, float] = (0, 6.283185307179586), dropna: bool = False, weights=None, transformed: bool = False, **kwargs) → physt.special_histograms.AzimuthalHistogram

Facade function to create an AzimuthalHistogram.

physt.special_histograms.cylindrical(data: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, rho_bins='numpy', phi_bins=16, z_bins='numpy', transformed: bool = False, dropna: bool = True, rho_range: Optional[Tuple[float, float]] = None, phi_range: Tuple[float, float] = (0, 6.283185307179586), weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, z_range: Optional[Tuple[float, float]] = None, **kwargs) → physt.special_histograms.CylindricalHistogram

Facade function to create a cylindrical histogram.

physt.special_histograms.cylindrical_histogram(data: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, rho_bins='numpy', phi_bins=16, z_bins='numpy', transformed: bool = False, dropna: bool = True, rho_range: Optional[Tuple[float, float]] = None, phi_range: Tuple[float, float] = (0, 6.283185307179586), weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, z_range: Optional[Tuple[float, float]] = None, **kwargs) → physt.special_histograms.CylindricalHistogram

Facade function to create a cylindrical histogram.

physt.special_histograms.cylindrical_surface(data=None, *, phi_bins=16, z_bins='numpy', transformed: bool = False, radius: Optional[float] = None, dropna: bool = False, weights=None, phi_range: Tuple[float, float] = (0, 6.283185307179586), z_range: Optional[Tuple[float, float]] = None, **kwargs) → physt.special_histograms.CylindricalSurfaceHistogram

Facade function to create a cylindrical surface histogram.

physt.special_histograms.cylindrical_surface_histogram(data=None, *, phi_bins=16, z_bins='numpy', transformed: bool = False, radius: Optional[float] = None, dropna: bool = False, weights=None, phi_range: Tuple[float, float] = (0, 6.283185307179586), z_range: Optional[Tuple[float, float]] = None, **kwargs) → physt.special_histograms.CylindricalSurfaceHistogram

Facade function to create a cylindrical surface histogram.

physt.special_histograms.polar(xdata: Union[numpy.ndarray, Iterable[T_co], int, float], ydata: Union[numpy.ndarray, Iterable[T_co], int, float], *, radial_bins='numpy', radial_range: Optional[Tuple[float, float]] = None, phi_bins=16, phi_range: Tuple[float, float] = (0, 6.283185307179586), dropna: bool = False, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, transformed: bool = False, **kwargs) → physt.special_histograms.PolarHistogram

Facade construction function for the PolarHistogram.

physt.special_histograms.polar_histogram(xdata: Union[numpy.ndarray, Iterable[T_co], int, float], ydata: Union[numpy.ndarray, Iterable[T_co], int, float], *, radial_bins='numpy', radial_range: Optional[Tuple[float, float]] = None, phi_bins=16, phi_range: Tuple[float, float] = (0, 6.283185307179586), dropna: bool = False, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, transformed: bool = False, **kwargs) → physt.special_histograms.PolarHistogram

Facade construction function for the PolarHistogram.

physt.special_histograms.radial(xdata: Union[numpy.ndarray, Iterable[T_co], int, float], ydata: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, zdata: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, bins='numpy', range: Optional[Tuple[float, float]] = None, dropna: bool = False, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, transformed: bool = False, **kwargs) → physt.special_histograms.RadialHistogram

Facade function to create a radial histogram.

physt.special_histograms.radial_histogram(xdata: Union[numpy.ndarray, Iterable[T_co], int, float], ydata: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, zdata: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, *, bins='numpy', range: Optional[Tuple[float, float]] = None, dropna: bool = False, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, transformed: bool = False, **kwargs) → physt.special_histograms.RadialHistogram

Facade function to create a radial histogram.

physt.special_histograms.spherical(data: Union[numpy.ndarray, Iterable[T_co], int, float], *, radial_bins='numpy', theta_bins=16, phi_bins=16, dropna: bool = True, transformed: bool = False, theta_range: Tuple[float, float] = (0, 3.141592653589793), phi_range: Tuple[float, float] = (0, 6.283185307179586), radial_range: Optional[Tuple[float, float]] = None, weights=None, **kwargs) → physt.special_histograms.SphericalHistogram

Facade function to create a spherical histogram.

physt.special_histograms.spherical_histogram(xdata: Union[numpy.ndarray, Iterable[T_co], int, float], ydata: Union[numpy.ndarray, Iterable[T_co], int, float], *, radial_bins='numpy', radial_range: Optional[Tuple[float, float]] = None, phi_bins=16, phi_range: Tuple[float, float] = (0, 6.283185307179586), dropna: bool = False, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, transformed: bool = False, **kwargs) → physt.special_histograms.PolarHistogram

Facade construction function for the PolarHistogram.

physt.special_histograms.spherical_surface(data: Union[numpy.ndarray, Iterable[T_co], int, float], *, theta_bins=16, phi_bins=16, transformed: bool = False, radius: Optional[float] = None, dropna: bool = False, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, theta_range: Tuple[float, float] = (0, 3.141592653589793), phi_range: Tuple[float, float] = (0, 6.283185307179586), **kwargs) → physt.special_histograms.SphericalSurfaceHistogram

Facade construction function for the SphericalSurfaceHistogram.

physt.special_histograms.spherical_surface_histogram(xdata: Union[numpy.ndarray, Iterable[T_co], int, float], ydata: Union[numpy.ndarray, Iterable[T_co], int, float], *, radial_bins='numpy', radial_range: Optional[Tuple[float, float]] = None, phi_bins=16, phi_range: Tuple[float, float] = (0, 6.283185307179586), dropna: bool = False, weights: Union[numpy.ndarray, Iterable[T_co], int, float, None] = None, transformed: bool = False, **kwargs) → physt.special_histograms.PolarHistogram

Facade construction function for the PolarHistogram.

physt.time module
physt.typing_aliases module

Definitions for type hints.

physt.util module

Various utility functions to support physt implementation.

These functions are mostly general Python functions, not specific for numerical computing, histogramming, etc.

physt.util.all_subclasses(cls: type) → Tuple[type, ...]

All subclasses of a class.

From: http://stackoverflow.com/a/17246726/2692780
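
One common way to collect all subclasses is a recursion over __subclasses__(). The sketch below is a possible implementation under that assumption and is not copied from physt; with diamond inheritance it may list a class more than once.

```python
def all_subclasses_sketch(cls):
    """Direct and indirect subclasses of cls, depth-first."""
    found = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses_sketch(sub))
    return tuple(found)

class Base:
    pass

class Child(Base):
    pass

class GrandChild(Child):
    pass

subclasses = all_subclasses_sketch(Base)
```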

physt.util.deprecation_alias(f, deprecated_name: str)

Provide a deprecated copy of a function.

Parameters:
  • f (The correct function) –
  • deprecated_name (The name the function will be given) –

Examples

>>> def new(x): return 1
>>> old = deprecation_alias(new, "old")
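
What such an alias typically does can be sketched with the standard warnings module. The function deprecation_alias_sketch and the wording of the warning are assumptions for illustration; the actual implementation may differ.

```python
import functools
import warnings

def deprecation_alias_sketch(f, deprecated_name):
    """Return a copy of f that warns about its deprecated name when called."""
    @functools.wraps(f)
    def inner(*args, **kwargs):
        warnings.warn(
            f"{deprecated_name} is deprecated, use {f.__name__} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return f(*args, **kwargs)
    inner.__name__ = deprecated_name   # the alias carries the old name
    return inner

def new(x):
    return 1

old = deprecation_alias_sketch(new, "old")
```

Calling old(...) returns the same result as new(...) and emits a DeprecationWarning.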
physt.util.find_subclass(base: type, name: str) → type

Find a named subclass of a base class.

Uses only the class name without namespace.

physt.util.pop_many(a_dict: Dict[str, Any], *args, **kwargs) → Dict[str, Any]

Pop multiple items from a dictionary.

Parameters:
  • a_dict (Dictionary from which the items will be popped) –
  • args (Keys which will be popped (and not included if not present)) –
  • kwargs (Keys + default value pairs (if key not found, this default is included)) –
Returns:

Return type:

A dictionary of collected items.
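
The behaviour described by the parameters can be sketched in pure Python. The function pop_many_sketch is a hypothetical re-implementation based only on the description above; the keys and values in the example are made up.

```python
def pop_many_sketch(a_dict, *args, **kwargs):
    """Pop several keys from a_dict and return them as a new dictionary."""
    result = {}
    for key in args:
        if key in a_dict:                       # missing positional keys are skipped
            result[key] = a_dict.pop(key)
    for key, default in kwargs.items():
        result[key] = a_dict.pop(key, default)  # missing keyword keys get the default
    return result

options = {"color": "red", "alpha": 0.5, "bins": 10}
picked = pop_many_sketch(options, "color", "missing", alpha=1.0, lw=2)
```

After the call, only the keys that were not requested remain in options.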

physt.version module

Package information.

Module contents
physt

P(i/y)thon h(i/y)stograms. Inspired by (and based on) numpy.histogram, but designed for humans(TM) on steroids(TM).

(C) Jan Pipek, 2016-2021, MIT licence. See https://github.com/janpipek/physt

Indices and tables