Welcome to histlite!¶
histlite is a histogram calculation and plotting library that is “lite” on data structures but rich in statistics and visualization features.
About this project¶
Introduction¶
histlite is a histogram calculation and plotting library that tries to be “lite” on data structures but rich in statistics and visualization features. So far, development has taken place during my (Mike Richman) time as a graduate student and post-doctoral researcher in the field of particle astrophysics — specifically, working with the IceCube Neutrino Observatory. Histlite is intended both to facilitate high-paced exploratory data analysis as well as to serve as a building block for potentially very complex maximum likelihood data analysis implementations.
The core design considerations are:
- It must be trivial to work with and interchange between 1D, 2D, or ND histograms.
- It should be as simple as possible to perform bin-wise arithmetic operations on one or more histograms; to perform sums, integrals, etc. and thus normalizations along one or more axes simultaneously; and to perform spline or user-defined functional fits
- It should be as simple as possible to achieve publication-quality plots.
The primary histogramming functionality consists of a thin wrapper around
numpy.histogramdd()
. Statistical tools leverage scipy but include
custom solutions for some use cases. (Importantly, error propagation is
currently handled manually but may be migrated to the uncertainties
package in the future.) Plotting is done using matplotlib.
Motivation¶
Many problems in statistics involve understanding the behavior of distributions of observables. Histograms are one of the best-understood and most commonly used tools for discerning the properties of those distributions. A key reason for preferring histograms over other tools such as KDE is that per-bin error (or uncertainty) estimates are robust and straightforward to reason about.
Numpy is a fantastic and an extremely general numerical library. Scipy provides many excellent statistical methods. Matplotlib offers beautiful and highly flexible visualizations suitable for publication-quality plots. However, as far as I know, to date there is no standard, widely used histogramming package for synthesizing functionality from these baseline tools.
Numpy provides reasonable (if not particularly optimal) methods for
producing histograms in the form of (values,edges
). Matplotlib is
capable of plotting histograms, but that is about as far as it goes. For
applications that require publication-quality plots only after non-trivial
histogram combinations or manipulations, matplotlib offers little support.
Scipy methods are happy to operate on bin centers and weights, but you’ll
have to keep track of those yourself.
This package is my attempt to simplify histogram calculation and plotting operations, primarily by bringing together functionality that is currently spread across the scientific Python software stack.
Alternatives¶
Alternatives include at least the following:
- plain numpy/matplotlib/scipy: You can get done what you want to get done, but you’ll almost certainly violate the DRY (don’t repeat yourself) guideline.
- physt: I only very recently became aware of this library which seeks to solve generally the same problems. Perhaps someday we will join forces. What I gather from the physt documentation is that it places much more emphasis on flexible (even optionally automatically-optimized) binning, but much less emphasis on working with histograms as probability density functions (PDFs) as required by much of my work.
- ROOT: Many particle physicists use this CERN product for data analysis, but… to be brief, even with its Python bindings it will remain unacceptable for serious work, especially but not only outside of particle physics, until it is rewritten or replaced in its entirety.
Features¶
- ND histograms: all histograms are N-dimensional, which keeps things simple and generic in the case of operations that involve mixed dimensionality (the same as numpy broadcasting).
- easy, flexible plotting: we don’t cover every use case, but you have sane defaults and fine-grained control over 1D and 2D histogram presentation.
- histogram arithmetic: histograms can be added, subtracted, multiplied, divided, log’d, sqrt’d, raised to powers, etc.
- normalizations and projections: histograms can be normalized to
integrate to 1; integrate to 1 accounting for varying bin volume
(
density=True
); or sum to (integrate=False
). Histograms can be projected to lower dimensionality by integrating or summing over one or more axes; or by computing quantiles along a specified axis; or by finding a specified containment interval along a specified axis, centered on the median and expressing the containment interval as errorbars. - error/uncertainty tracking per-bin errors are tracked in all but a handful of cases where their meaning is unclear (e.g. uncertainty on a quantile computed from some oother histogram). Errors are propagated in the standard way under histogram arithmetic.
- simple (“lite”) data model: a
histlite.Hist
is really just any curve with both values and errors defined spanning some specified bins. You don’t need to think about whether(h1 - h2) / (h1 + h2)
is a histogram or some other thing; all the same methods, operators, and plotting functions will work the same. - fitting interface: 1D and 2D (ND still a work in progress) splines can
be computed easily, producing
histlite.SplineFit
functors that know whether the underlying scipy splines were fitted in linear or log space on each axis. For convenience, arbitrary functions can be fit analogously toscipy.optimize.curve_fit()
, using that function internally. - evaluation interface:
histlite.Hist
’s can be generated by evaluating arbitrary functions, or instances ofhistlite.Hist
orhistlite.SplineFit
, on a specified grid. The resultinghistlite.Hist
’s can then of course be used in histogram arithmetic or plotting methds. - smoothing: histograms can be smoothed using
histlite.Hist.gaussian_filter()
orhistlite.Hist.gaussian_filter1d()
. Smooth “histograms” can be used to approximate Kernel Density Estimation (KDE) usinghistlite.kde()
.
Installing histlite¶
As of June, 2019, histlite is hosted on github, with releases accessible via PyPI. For a basic user installation, simply use:
# install from PyPI
pip install --user histlite
For a development install (i.e. you want to modify histlite yourself), navigate
to an appropriate development directory (e.g. $HOME/src
), and then:
# using SSH
git clone git@github.com:zgana/histlite.git
# using HTTPS
git clone https://github.com/zgana/histlite.git
# finally, install
pip install --user --editable ./histlite
Tutorials¶
Getting Started With Histlite¶
In this brief tutorial, we introduce the basics of creating, manipulating, and plotting histograms using histlite.
[1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import histlite as hl
%matplotlib inline
[2]:
# I prefer a reordered color cycle
from cycler import cycler
mpl_colors_orig = np.array (plt.matplotlib.rcParamsDefault['axes.prop_cycle'].by_key()['color'])
mpl_colors = mpl_colors_orig[[0, 3, 2, 1, 4, 5, 6, 7]]
plt.rc ('axes', prop_cycle=cycler ('color', mpl_colors))
First histogram¶
For our first histogram, let’s use some normally-distributed random data and default binning in one dimension:
[3]:
x = np.random.normal (size=10**4)
[4]:
h = hl.hist (x)
[5]:
fig, ax = plt.subplots ()
hl.plot1d (ax, h)

We immediately notice that for this type of data, it may be useful to select symmetric binning including a central bin at 0:
[6]:
h = hl.hist (x, bins=51, range=(-5, 5))
[7]:
fig, ax = plt.subplots ()
hl.plot1d (ax, h)

Histlite keeps track of per-bin error estimates, and we can display them with many different styles. Let’s try plotting with errorbars, errorbands, errorbands with smooth lines rather than steps, and crosses:
[8]:
fig, ax = plt.subplots (2, 2, figsize=(8, 8))
hl.plot1d (ax[0][0], h, errorbars=True)
hl.plot1d (ax[0][1], h, errorbands=True)
hl.plot1d (ax[1][0], h, errorbands=True, drawstyle='default')
hl.plot1d (ax[1][1], h, crosses=True)

You have probably noticed that the plotting style differs from matplotlib’s standard:
[9]:
fig, ax = plt.subplots (1, 2, figsize=(8, 4))
hl.plot1d (ax[0], h)
ax[1].hist (x, bins=h.bins[0]);

As we will see, much of the design of histlite is centered on simply dealing with binned rather than smooth curves. However, the classic matplotlib style (and more) can still be obtained:
[10]:
fig, ax = plt.subplots (1, 2, figsize=(8, 4))
hl.fill_between (ax[0], 0, h)
ax[0].set_ylim (0)
ax[1].hist (x, bins=h.bins[0]);

Data model¶
The “light” in histlite refers to the deliberately simple data structure. A histlite histogram is essentially just a bundle of bins, values, and error estimates. These are the required initialization arguments which are determined from data and arguments to factory functions such as histlite.hist()
. They are also read-only properties of each Hist
:
[11]:
print (h.bins)
print (h.values)
print (h.errors)
[array([-5. , -4.80392157, -4.60784314, -4.41176471, -4.21568627,
-4.01960784, -3.82352941, -3.62745098, -3.43137255, -3.23529412,
-3.03921569, -2.84313725, -2.64705882, -2.45098039, -2.25490196,
-2.05882353, -1.8627451 , -1.66666667, -1.47058824, -1.2745098 ,
-1.07843137, -0.88235294, -0.68627451, -0.49019608, -0.29411765,
-0.09803922, 0.09803922, 0.29411765, 0.49019608, 0.68627451,
0.88235294, 1.07843137, 1.2745098 , 1.47058824, 1.66666667,
1.8627451 , 2.05882353, 2.25490196, 2.45098039, 2.64705882,
2.84313725, 3.03921569, 3.23529412, 3.43137255, 3.62745098,
3.82352941, 4.01960784, 4.21568627, 4.41176471, 4.60784314,
4.80392157, 5. ])]
[ 0. 0. 0. 0. 0. 1. 1. 1. 4. 11. 15. 27. 24. 38.
75. 132. 159. 230. 313. 386. 461. 542. 641. 706. 777. 788. 818. 700.
666. 619. 448. 381. 325. 240. 166. 108. 85. 39. 29. 23. 10. 4.
3. 3. 0. 0. 1. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 1.
1. 1. 2. 3.31662479 3.87298335 5.19615242
4.89897949 6.164414 8.66025404 11.48912529 12.60952021 15.16575089
17.69180601 19.6468827 21.47091055 23.28089345 25.3179778 26.57066051
27.87471973 28.0713377 28.60069929 26.45751311 25.8069758 24.87971061
21.16601049 19.5192213 18.02775638 15.49193338 12.88409873 10.39230485
9.21954446 6.244998 5.38516481 4.79583152 3.16227766 2.
1.73205081 1.73205081 0. 0. 1. 0.
0. 0. 0. ]
Note that bins
is a list of per-axis ndarrays. Here we have a 1D histogram, and so a single axis, but in general histlite histograms are N-dimensional. Note also that the errors are \(\sqrt{N}\) per bin for unweighted histograms, which is a special case for a general implementation where the errors are \(\sqrt{\left(\sum_i w_i^2\right)}\) for events \(\{i\}\) per bin for weighted histograms.
On construction, it is determined whether bins are log-spaced (which can be conveniently requested of histlite.hist()
, as we will see later). This information is exposed as an array of per-axis bools:
[12]:
print (h.log)
[False]
We also access to the per-axis number of bins, binning ranges, and bin centers:
[13]:
print (h.n_bins)
print (h.range)
print (h.centers)
[51]
[(-5.0, 5.0)]
[array([-4.90196078, -4.70588235, -4.50980392, -4.31372549, -4.11764706,
-3.92156863, -3.7254902 , -3.52941176, -3.33333333, -3.1372549 ,
-2.94117647, -2.74509804, -2.54901961, -2.35294118, -2.15686275,
-1.96078431, -1.76470588, -1.56862745, -1.37254902, -1.17647059,
-0.98039216, -0.78431373, -0.58823529, -0.39215686, -0.19607843,
0. , 0.19607843, 0.39215686, 0.58823529, 0.78431373,
0.98039216, 1.17647059, 1.37254902, 1.56862745, 1.76470588,
1.96078431, 2.15686275, 2.35294118, 2.54901961, 2.74509804,
2.94117647, 3.1372549 , 3.33333333, 3.52941176, 3.7254902 ,
3.92156863, 4.11764706, 4.31372549, 4.50980392, 4.70588235,
4.90196078])]
Working with multiple histograms¶
Let’s generate data for a second histogram:
[14]:
x2 = np.random.normal (loc=1, scale=2, size=int (x.size/2))
We would like a second histogram, h2
, to be used with the original one, h
. For this, we require the binning to match. One way this can be achieved is to bin x2
using h.bins
:
[15]:
h2 = hl.hist (x2, bins=h.bins)
[16]:
fig, ax = plt.subplots ()
hl.plot1d (ax, h)
hl.plot1d (ax, h2)

This can also be expressed as follows:
[17]:
h2 = hl.hist_like (h, x2)
[18]:
fig, ax = plt.subplots ()
hl.plot1d (ax, h)
hl.plot1d (ax, h2)

We can perform a number of intuitive operations using these two histograms. The most obvious application is to sum them:
[19]:
fig, axs = plt.subplots (1, 2, figsize=(8, 4))
ax = axs[0]
hl.plot1d (ax, h)
hl.plot1d (ax, h2)
hl.plot1d (ax, h + h2, color='k')
ax = axs[1]
hl.stack1d (ax, [h, h2], colors='C0 C1'.split (), alpha=.67, lw=0)
hl.plot1d (ax, h + h2, color='k')

We can also create more elaborate constructs — and standard error propagation will be applied throughout:
[20]:
fig, ax = plt.subplots ()
hl.plot1d (ax, h / (h + h2), crosses=True)

An observant reader may notice that in this particular case, the errors on h
and h + h2
are correlated, and thus naive error propagation overestimates the errors. Histlite provides a convenient way to obtain error estimates with proper coverage for the special case of \(\text{denominator} = \text{numerator} + \text{something else}\):
[21]:
fig, ax = plt.subplots ()
hl.plot1d (ax, h.efficiency (h + h2), crosses=True)

We also see from the above examples that histlite.Hist
is not strictly required to be meaningfully interpretted as a “histogram” in the classic sense. For example, it is perfectly happy with negative values:
[22]:
fig, ax = plt.subplots ()
hl.plot1d (ax, h - h2, crosses=True)

One can imagine constructing a highly descriptive class hierarchy to represent all manner of data structures that boil down to “values and errors per bin”. For histlite, we made the design decision to represent all of these with histlite.Hist
to avoid the sort of complexity we see in, for example, integer to float promotion under division.
First 2D histogram¶
As mentioned above, histlite is fully generalized to N-dimensions throughout, with no special-casing for specific numbers of dimensions. Let’s try constructing a 2D histogram:
[23]:
y = np.random.normal (loc=0, scale=2, size=x.size)
[24]:
hh = hl.hist ((x, y), bins=51, range=(-5, 5))
We can plot this histogram, hh
, easily like so:
[25]:
fig, ax = plt.subplots ()
ax.set_aspect ('equal')
hl.plot2d (ax, hh)
[25]:
<matplotlib.collections.QuadMesh at 0x7fbb357285c0>

We have a number of options for customizing this plot. Let’s try adding a colorbar, log-scaling the colormap, and specifying the color limits:
[26]:
fig, ax = plt.subplots ()
ax.set_aspect ('equal')
hl.plot2d (ax, hh, cbar=True, log=True, vmin=1, vmax=1e2)
[26]:
{'colormesh': <matplotlib.collections.QuadMesh at 0x7fbb35681358>,
'colorbar': <matplotlib.colorbar.Colorbar at 0x7fbb356a7940>}

Unlike the histlite.plot1d()
case, it is often convenient to customize the underlying matplotlib objects after histlite.plot2d()
is finished. For example, we may want to label the colorbar. We can do this by digging into the dictionary returned by histlite.plot2d()
:
[27]:
fig, ax = plt.subplots ()
ax.set_aspect ('equal')
d = hl.plot2d (ax, hh, cbar=True, log=True, vmin=1, vmax=1e2)
d['colorbar'].set_label ('counts')

Brief survey of additional features¶
In this tutorial, we try to cover the basics that will let you start making plots as quickly as possible. Histlite has much more to offer. We will cover the details in other notebooks, but here we offer a teaser with little to no explanation.
Normalization¶
[28]:
fig, ax = plt.subplots ()
hn = h.normalize ()
print (hn)
print ('integral of hn is', hn.integrate ().values)
hl.plot1d (ax, hn, crosses=True)
Hist(51 bins in [-5.0,5.0]) with sum 5.1, 11 empty bins, and 0 non-finite values
integral of hn is 1.0

[29]:
fig, ax = plt.subplots ()
ax.set_aspect ('equal')
hl.plot2d (ax, hh.normalize (), cbar=True)
[29]:
{'colormesh': <matplotlib.collections.QuadMesh at 0x7fbb35c7a9e8>,
'colorbar': <matplotlib.colorbar.Colorbar at 0x7fbb35937320>}

Pseudo-KDE (kernel density estimation)¶
[30]:
fig, (ax1, ax2) = plt.subplots (1, 2, figsize=(8, 4), gridspec_kw=dict (width_ratios=(2,1)))
hl.plot1d (ax1, hl.kde (x))
hl.plot2d (ax2, hl.kde ((x, y)), cbar=True)
ax2.set_aspect ('equal')
plt.tight_layout ()

Axis transformation¶
[31]:
fig, ax = plt.subplots ()
h_exp = h.transform_bins (lambda x: 10**x)
ax.semilogx ()
hl.plot1d (ax, h_exp, crosses=True)

Cumulative sums¶
[32]:
fig, (ax1, ax2) = plt.subplots (1, 2, figsize=(8, 4))
hl.plot1d (ax1, h.cumsum () / h.sum ())
hl.plot2d (ax2, hh.cumsum (), cbar=True)
plt.tight_layout ()

Projections¶
[33]:
fig, (ax1, ax2, ax3) = plt.subplots (1, 3, figsize=(12,4))
hl.plot1d (ax1, hh.project ([0]))
hl.plot1d (ax1, h, ls=':') # slightly more elements since some y's were under/overflow
for frac in (.16, .5, .84):
hl.plot1d (ax2, hh.contain (-1, frac))
hl.plot1d (ax3, hh.contain_project (-1), # 1σ errors by default
drawstyle='default', errorbands=True)

[34]:
fig, ax = plt.subplots ()
x3 = np.random.uniform (-5, 5, size=10**4)
y3 = np.random.normal (loc=x3**2, scale=1 + np.abs (x3))
hl.plot1d (ax, hl.hist ((x3, y3), bins=100).contain_project (-1),
errorbands=True, drawstyle='default')

Values transformation¶
[35]:
fig, ax = plt.subplots ()
kw = dict (crosses=True)
hl.plot1d (ax, h**2, label='squared', **kw)
hl.plot1d (ax, h, label='original', color='k', lw=3, **kw)
hl.plot1d (ax, h.sqrt (), label='sqrt', **kw)
hl.plot1d (ax, h.ln (), label='natural log', **kw)
hl.plot1d (ax, h.log10 (), label='log10', **kw)
ax.semilogy (nonposy='clip')
ax.legend ();

Transpose¶
[36]:
fig, ax = plt.subplots (1, 2, figsize=(8, 8))
hl.plot2d (ax[0], hh)
hl.plot2d (ax[1], hh.T)
for a in ax:
a.set_aspect ('equal')

Smoothing¶
[37]:
fig, ax = plt.subplots (1, 2, figsize=(8, 8))
hl.plot2d (ax[0], hh)
hl.plot2d (ax[1], hh.gaussian_filter (1))
for a in ax:
a.set_aspect ('equal')

Spline Fit¶
[38]:
fig, ax = plt.subplots ()
hl.plot1d (ax, h, crosses=True)
X = np.linspace (-5, 5, 1000)
s = h.spline_fit ()
ax.plot (X, s(X)) # naturally, overshoots in region without data...
[38]:
[<matplotlib.lines.Line2D at 0x7fbb3580d710>]

Function fit¶
[39]:
fig, ax = plt.subplots ()
hl.plot1d (ax, hn, crosses=True)
params, cov = hn.curve_fit (lambda x, loc, scale: stats.norm.pdf (x, loc, scale))
ax.plot (X, stats.norm.pdf (X, *params))
[39]:
[<matplotlib.lines.Line2D at 0x7fbb358a2630>]

Slicing in binning units¶
[40]:
fig, ax = plt.subplots (1, 2, figsize=(8, 4))
hl.plot1d (ax[0], h[0:])
hl.plot1d (ax[0], hh[0])
hl.plot2d (ax[1], hh[0:,0:])
[40]:
<matplotlib.collections.QuadMesh at 0x7fbb35849160>

PDFs and Normalization¶
In this tutorial, we discuss the treatment of histograms as probality density functions (PDFs).
[1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import histlite as hl
%matplotlib inline
[2]:
# I prefer a reordered color cycle
from cycler import cycler
mpl_colors_orig = np.array (plt.matplotlib.rcParamsDefault['axes.prop_cycle'].by_key()['color'])
mpl_colors = mpl_colors_orig[[0, 3, 2, 1, 4, 5, 6, 7]]
plt.rc ('axes', prop_cycle=cycler ('color', mpl_colors))
Histogram as PDF¶
We begin by designing artificial “data” distributions with which to test various features. First, a non-trivial 1D distribution:
[3]:
props1d = dict(loc=np.r_[20, 33, 60], scale=np.r_[5, 8, 13], norm=np.r_[.2, .3, 0.5])
[4]:
def pdf1d(x):
pdf0 = props1d['norm'][0] * stats.norm.pdf(x, props1d['loc'][0], props1d['scale'][0])
pdf1 = props1d['norm'][1] * stats.norm.pdf(x, props1d['loc'][1], props1d['scale'][1])
pdf2 = props1d['norm'][2] * stats.norm.pdf(x, props1d['loc'][2], props1d['scale'][2])
return pdf0 + pdf1 + pdf2
[5]:
fig, ax = plt.subplots()
x = np.linspace(0, 100, 300)
ax.plot(x, pdf1d(x), 'k:')
ax.set_xlabel('$x$')
ax.set_ylabel('PDF');

We can obtain random numbers distributed according to this distribution like so:
[6]:
def random1d(N):
loc = props1d['loc']
scale = props1d['scale']
norm = props1d['norm']
which = np.random.choice(3, p=norm, size=N)
return np.random.normal(loc[which], scale[which])
[7]:
np.random.seed(1)
X = random1d(10000)
The histogram of these values looks like so:
[8]:
h1d = hl.hist(X, bins=50, range=(0, 100))
h1d
[8]:
Hist(50 bins in [0.0,100.0], with sum 9993.0, 1 empty bins, and 0 non-finite values)
[9]:
fig, ax = plt.subplots()
hl.plot1d(h1d, crosses=True)
ax.set_xlabel('$x$')
ax.set_ylabel('counts per bin');

We can obtain a normalized histogram like so:
[10]:
p1d = h1d.normalize()
[11]:
fig, ax = plt.subplots()
hl.plot1d(p1d, crosses=True)
ax.set_xlabel('$x$')
ax.set_ylabel('PDF');

The difference between h1d
and p1d
here is simply a scalar factor; p1d
represents the PDF \(p(x)\). We can compare the integrals of each histogram:
[12]:
print(h1d.integrate())
print(p1d.integrate())
Hist(with value 19986.0)
Hist(with value 1.0)
Notice that integrating p1d
yields a value of 1. Notice also that the result of the integral is still a Hist
instance, similar to certain dimensionality-agnostic numpy operations which yield 0-dimensional ndarray
instances. This generality is useful because, as we will see below, integration and normalization can be performed along specified axes.
We can also compute the probability mass function (PMF) by normalizing such that the sum, rather than integral, will be unity:
[13]:
pmf1d = h1d.normalize(integrate=False)
pmf1d
[13]:
Hist(50 bins in [0.0,100.0], with sum 1.0, 1 empty bins, and 0 non-finite values)
So while the value for each bin in p1d
represents the probability density across that bin, \(p(x)\), the values of pmf1d
represent the total probability within each bin, \(\text{pmf}(i)=\int_{x_i}^{x_{i+1}}p(x)\,dx\).
We can see that the normalized histogram is a decent representation of the underlying distribution:
[14]:
fig, ax = plt.subplots()
x = np.linspace(0, 100, 300)
ax.plot(x, pdf1d(x), 'k:', label='True PDF')
hl.plot1d(p1d, crosses=True, label='50 bins')
ax.set_xlabel('$x$')
ax.set_ylabel('PDF')
ax.legend();

PDF Modeling¶
We might be interested in inferring a smooth PDF given a histogram of some real (in this case, “real”) data. One simple solution is to fit a spline:
[15]:
s1d = p1d.spline_fit()
[16]:
fig, ax = plt.subplots()
x = np.linspace(0, 100, 300)
ax.plot(x, pdf1d(x), 'k:', label='True PDF')
hl.plot1d(p1d, crosses=True, label='50 bins')
ax.plot(x, s1d(x), label='Spline Fit')
ax.set_xlabel('$x$')
ax.set_ylabel('PDF')
ax.legend();

What is s1d
?
[17]:
type(s1d)
[17]:
histlite.SplineFit
histlite spline fits are thin wrappers around scipy.interpolate
objects. In this case, we have a UnivariateSpline
:
[18]:
s1d.spline
[18]:
<scipy.interpolate.fitpack2.UnivariateSpline at 0x7f9b7a537278>
One nice feature of this wrapper is that it is readily interoperable with ordinary Hist
objects:
[19]:
fig, ax = plt.subplots()
hl.plot1d(s1d.eval(bins=p1d.bins) / p1d, crosses=True)
ax.axhline(1, color='k', ls=':', zorder=-1)
ax.set_xlabel('$x$')
ax.set_ylabel('spline/hist');

It can sometimes be convenient to fit the spline in log-value space, which can be very handy e.g. when we know the underlying function is strictly positive. Meanwhile, we also have access to the constructor kwargs for the internally-used scipy.interpolate
object:
[20]:
s1d_pos = p1d.spline_fit(log=True, s=30)
[21]:
fig, ax = plt.subplots()
x = np.linspace(0, 100, 300)
ax.plot(x, pdf1d(x), 'k:', label='True PDF')
hl.plot1d(p1d, crosses=True, label='50 bins')
ax.plot(x, s1d_pos(x), label='Spline Fit (log-value fit)')
ax.set_xlabel('$x$')
ax.set_ylabel('PDF')
ax.legend();

We might, however, notice (or perhaps know a priori) that the function is in fact a combination of three Gaussians. Then we can perform a fit to an analytic model:
[22]:
def model_p1d(x, m0, m1, m2, s0, s1, s2, f0, f1):
f2 = 1 - f0 - f1
pdf0 = f0 * stats.norm.pdf(x, m0, s0)
pdf1 = f1 * stats.norm.pdf(x, m1, s1)
pdf2 = f2 * stats.norm.pdf(x, m2, s2)
return pdf0 + pdf1 + pdf2
[23]:
params1d, cov = p1d.curve_fit(model_p1d, p0=(20, 40, 60, 5, 5, 5, .3, .3))
params1d
[23]:
array([19.70170103, 32.76340277, 60.96181126, 4.83091852, 9.26921554,
12.88940782, 0.16090333, 0.36418666])
How do these compare with the true values?
[24]:
print(params1d)
true_params1d = np.r_[props1d['loc'], props1d['scale'], props1d['norm'][:-1]]
print(true_params1d)
print(params1d / true_params1d)
[19.70170103 32.76340277 60.96181126 4.83091852 9.26921554 12.88940782
0.16090333 0.36418666]
[20. 33. 60. 5. 8. 13. 0.2 0.3]
[0.98508505 0.99283039 1.01603019 0.9661837 1.15865194 0.99149291
0.80451667 1.21395554]
And how does the model fit actually look?
[25]:
fig, ax = plt.subplots()
x = np.linspace(0, 100, 300)
ax.plot(x, pdf1d(x), 'k:', label='True PDF')
hl.plot1d(p1d, crosses=True, label='50 bins')
ax.plot(x, model_p1d(x, *params1d), label='Model Fit')
ax.set_xlabel('$x$')
ax.set_ylabel('PDF')
ax.legend();

Finally, note that we can often get a reasonable representation of a distribution using a sort of approximate KDE provided by histlite:
[26]:
# kernel expressed as fraction of axis range
k1d = hl.kde(X, kernel=.02)
k1d
[26]:
Hist(1000 bins in [2.820370949977928,106.97260109208092], with sum 9.601330654520044, 0 empty bins, and 0 non-finite values)
[27]:
fig, ax = plt.subplots()
x = np.linspace(0, 100, 300)
ax.plot(x, pdf1d(x), 'k:', label='True PDF')
hl.plot1d(p1d, crosses=True, label='50 bins')
hl.plot1d(k1d, label='KDE')
ax.set_xlabel('$x$')
ax.set_ylabel('PDF')
ax.legend();

The approximate KDE implemented by histlite is achieved by starting with a densely binned histogram and then using scipy.ndimage
to apply a Gaussian smoothing with a kernel \(\sigma\) expressed as a percent of the axis range — and then finally (by default) normalized to integrate to unity.
It is up to you, of course, to determine the best way to model your distributions. Let’s wrap up this section by comparing the various models we came up with — by the way, notice that Hist
instances are callable, accepting one positional argument per dimension, returning the value of the corresponding bin(s):
[28]:
fig, (ax, rax) = plt.subplots(2, 1, figsize=(6,5), sharex=True, gridspec_kw=dict(height_ratios=[2,1]))
x = np.linspace(0, 100, 300)
# with multiple axes in play, we need to specify ax (so, not rax) here
hl.plot1d(ax, p1d, color='k', crosses=True, label='50 bins')
ax.plot(x, s1d(x), label='Spline Fit')
ax.plot(x, s1d_pos(x), label='Spline Fit (log-value fit)')
ax.plot(x, model_p1d(x, *params1d), label='Model Fit')
ax.plot(x, k1d(x), label='KDE')
ax.plot(x, pdf1d(x), 'k:', label='True PDF')
rax.plot(x, s1d(x) / pdf1d(x))
rax.plot(x, model_p1d(x, *params1d) / pdf1d(x))
rax.plot(x, k1d(x) / pdf1d(x))
rax.axhline(1, color='k', ls=':', zorder=-1)
ax.legend(ncol=2)
ax.set_ylim(-0.001, .035)
rax.set_ylim(.5, 1.5)
ax.set_ylabel('PDF')
rax.set_xlabel('$x$')
rax.set_ylabel(r'ratio wrt true');

PDF Sampling¶
For many applications, it is useful to be able to randomly sample from a histogram. You may specify a seed
for the random number generator (this uses a local random state and does not affect the behavior of np.random.*
functions). The histogram need not, but may, already be normalized in some way:
[29]:
N, seed = 10, 1
print(h1d.sample(N, seed=seed))
print(p1d.sample(N, seed=seed))
print(pmf1d.sample(N, seed=seed))
[array([36.83838903, 59.370439 , 2.4089045 , 31.75623487, 20.05477519,
19.34093502, 22.8346096 , 33.11737966, 34.28077388, 44.39620298])]
[array([36.83838903, 59.370439 , 2.4089045 , 31.75623487, 20.05477519,
19.34093502, 22.8346096 , 33.11737966, 34.28077388, 44.39620298])]
[array([36.83838903, 59.370439 , 2.4089045 , 31.75623487, 20.05477519,
19.34093502, 22.8346096 , 33.11737966, 34.28077388, 44.39620298])]
If you already have a np.random.RandomState
instance, it can also be used:
[30]:
random = np.random.RandomState(seed)
print(h1d.sample(N, random=random))
[array([36.83838903, 59.370439 , 2.4089045 , 31.75623487, 20.05477519,
19.34093502, 22.8346096 , 33.11737966, 34.28077388, 44.39620298])]
Notice that, similar to Hist.bins
, the return here is a sequence; in the case of higher dimensionality histograms, the result is one array per dimension.
At present, there is no implementation for randomly sampling directly from a SplineFit
. However, sampling is of course possible after evaluating the spline on a grid:
[31]:
print(s1d_pos.eval(bins=1e3).sample(N, seed=seed))
[array([36.74191945, 58.16852195, 2.42044522, 29.98781174, 21.60273876,
18.76704675, 23.64173048, 32.55586898, 35.51403869, 44.71981015])]
And of course sampling also works for the densely binned pseudo-KDE:
[32]:
print(k1d.sample(N, seed=seed))
[array([36.81765802, 58.19657229, 3.1541218 , 30.09556091, 21.57062485,
18.72134062, 23.59012797, 32.56194533, 35.53879283, 44.81435241])]
You may have noticed that the random samples are very similar too each other, even though the details of the distributions differ, as long as the same random state is used. This is because, internally, inverse transform sampling is used, with only a single random.uniform(0, 1, N)
call driving the randomness. In this case, all the CDFs are very similar.
Speaking of CDFs, these are straightforward to compute and plot as well:
[33]:
fig, ax = plt.subplots()
hl.plot1d(pmf1d.cumsum(), color='k', label='50 bins')
hl.plot1d(k1d.cumsum(normalize=True), label='KDE')
ax.set_xlabel(r'$x$')
ax.set_ylabel('CDF')
ax.legend();

[34]:
from importlib import reload
reload(hl)
[34]:
<module 'histlite' from '/home/mike/src/histlite/histlite/__init__.py'>
2D Histograms¶
histlite provides all of the above functionality for 2D histograms — and then some! Let’s walk through each feature, using an 2D elliptical Gaussian as our example:
[35]:
np.random.seed(1)
N = 10000
X0 = np.random.normal(0, np.exp(1), N)
Y0 = np.random.normal(0, 1, N)
theta = np.radians(30)
X = X0 * np.cos(theta) - Y0 * np.sin(theta)
Y = X0 * np.sin(theta) + Y0 * np.cos(theta)
[36]:
h2d = hl.hist((X, Y), bins=50, range=((-6,6), (-6,6)))
h2d
[36]:
Hist(50 bins in [-6.0,6.0], 50 bins in [-6.0,6.0], with sum 9876.0, 1382 empty bins, and 0 non-finite values)
[37]:
fig, ax = plt.subplots()
hl.plot2d(h2d, cbar=True, clabel='counts per bin')
ax.set_xlabel('$x$')
ax.set_ylabel('$y$');

Joint and Conditional PDFs
We can compute the joint PDF \(p(x,y)\):
[38]:
p2d = h2d.normalize()
[39]:
fig, ax = plt.subplots()
hl.plot2d(p2d, cbar=True, clabel='$p(x,y)$')
ax.set_xlabel('$x$')
ax.set_ylabel('$y$');

Or the joint PMF:
[40]:
pmf2d = p2d.normalize(integrate=False)
[41]:
fig, ax = plt.subplots()
hl.plot2d(pmf2d, cbar=True, clabel='$\mathsf{pmf}(x,y)$')
ax.set_xlabel('$x$')
ax.set_ylabel('$y$');

For multi-dimensional histograms, we can also work with conditional PDFs:
[42]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,4))
hl.plot2d(ax1, h2d.normalize(0), cbar=True, vmax=.5, clabel='$p(x|y)$')
hl.plot2d(ax2, h2d.normalize(1), cbar=True, vmax=.5, clabel='$p(y|x)$')
for ax in (ax1, ax2):
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
plt.tight_layout();

Depending on the application, the presence of empty bins in \(p(x|y)\) may be a problem. One way to address these is with the to_finite()
method:
[43]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15,4))
hl.plot2d(ax1, h2d.normalize(0).to_finite(), cbar=True, vmax=.5,
clabel='$p(x|y)$ (nan $\to$ 0)')
hl.plot2d(ax2, h2d.normalize(0).to_finite(nan=np.min), cbar=True, vmax=.5,
clabel='$p(x|y)$ (nan $\to$ min of finite values)')
hl.plot2d(ax3, h2d.normalize(0).to_finite(nan=np.mean), cbar=True, vmax=.5,
clabel='$p(x|y)$ (nan $\to$ mean of finite values)')
for ax in (ax1, ax2, ax3):
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
plt.tight_layout();

Alternatively, if you are sure you know what you’re doing, you can modify the values directly:
[44]:
fig, ax = plt.subplots()
p2d_x_y = h2d.normalize(0)
p2d_x_y.values[~np.isfinite(p2d_x_y.values)] = 0.2
hl.plot2d(p2d_x_y, cbar=True, vmax=.5,
clabel='$p(x|y)$ (nan $\to$ 0.2)')
ax.set_xlabel('$x$')
ax.set_ylabel('$y$');

Modeling
We have spline, curve fitting, and KDE here as well. Here’s a spline fit:
[45]:
s2d = p2d.spline_fit(s=.03, kx=2, ky=2)
[46]:
fig, ax = plt.subplots()
hl.plot2d(s2d.eval(bins=200), cbar=True, clabel='$p(x,y)$: spline')
ax.set_xlabel('$x$')
ax.set_ylabel('$y$');

KDE:
[47]:
k2d = hl.kde((X,Y), range=p2d.range, kernel=.03)
[48]:
fig, ax = plt.subplots()
hl.plot2d(k2d, cbar=True, clabel='$p(x,y)$: spline')
ax.set_xlabel('$x$')
ax.set_ylabel('$y$');

Analytical model:
[49]:
def model_p2d(XY, loc1, scale1, loc2, scale2, theta):
X, Y = XY
X0 = X * np.cos(theta) + Y * np.sin(theta)
Y0 = -X * np.sin(theta) + Y * np.cos(theta)
return stats.norm.pdf(X0, loc1, scale1) * stats.norm.pdf(Y0, loc2, scale2)
[50]:
params2d, cov = p2d.curve_fit(model_p2d, p0=(0, 3, 0, 1, np.radians(10)))
params2d
[50]:
array([0.09721215, 2.64869243, 0.01828913, 0.99518638, 0.5387843 ])
[51]:
np.degrees(params2d[-1])
[51]:
30.870066424450698
Now might be a good time to point out that we can also construct a histogram from a callable function. Let’s try it with this model, and then plot it:
[52]:
h_model_p2d = hl.hist_from_eval(
(lambda x, y: model_p2d((x,y), *params2d)),
vectorize=False, bins=k2d.bins)
[53]:
fig, ax = plt.subplots()
hl.plot2d(h_model_p2d, cbar=True, clabel='$p(x,y)$: model fit')
ax.set_xlabel('$x$')
ax.set_ylabel('$y$');

We can use this to plot the “truth” as well:
[54]:
h_truth_p2d = hl.hist_from_eval(
(lambda x, y: model_p2d((x,y), 0, np.exp(1), 0, 1, np.radians(30))),
vectorize=False, bins=k2d.bins)
[55]:
fig, ax = plt.subplots()
hl.plot2d(h_truth_p2d, cbar=True, clabel='$p(x,y)$: truth')
ax.set_xlabel('$x$')
ax.set_ylabel('$y$');

Let’s sum this up by plotting the various parameterizations:
[56]:
fig, axs = plt.subplots(2, 3, figsize=(15, 8))
axs = np.ravel(axs)
kw = dict(cbar=True, vmax=.06)
bins = p2d.bins
hl.plot2d(axs[0], p2d,
clabel='$p(x,y)$: hist', **kw)
hl.plot2d(axs[1], k2d,
clabel='$p(x,y)$: KDE', **kw)
hl.plot2d(axs[3], s2d.eval(bins=200),
clabel='$p(x,y)$: spline fit', **kw)
hl.plot2d(axs[4], h_model_p2d,
clabel='$p(x,y)$: model fit', **kw)
hl.plot2d(axs[5], h_truth_p2d,
clabel='$p(x,y)$: truth', **kw)
for ax in axs:
ax.set_xlabel(r'$x$')
ax.set_ylabel(r'$y$')
axs[2].set_visible(False)

and their ratios with respect to the truth:
[57]:
fig, axs = plt.subplots(2, 2, figsize=(10, 8))
axs = np.ravel(axs)
kw = dict(cbar=True, log=True, vmin=1e-1, vmax=1e1, cmap='RdBu_r')
bins = p2d.bins
hl.plot2d(axs[0], p2d.eval(bins=bins) / h_truth_p2d.eval(bins=bins),
clabel='$p(x,y)/\mathsf{truth}$: hist', **kw)
hl.plot2d(axs[1], k2d.eval(bins=bins) / h_truth_p2d.eval(bins=bins),
clabel='$p(x,y)/\mathsf{truth}$: KDE', **kw)
hl.plot2d(axs[2], s2d.eval(bins=bins) / h_truth_p2d.eval(bins=bins),
clabel='$p(x,y)/\mathsf{truth}$: spline fit', **kw)
hl.plot2d(axs[3], h_model_p2d.eval(bins=bins) / h_truth_p2d.eval(bins=bins),
clabel='$p(x,y)/\mathsf{truth}$: model fit', **kw)
for ax in axs:
ax.set_xlabel(r'$x$')
ax.set_ylabel(r'$y$')

Whoa, the outliers here look pretty bad. Comparing by difference is a little less pessimistic:
[58]:
fig, axs = plt.subplots(2, 2, figsize=(10, 8))
axs = np.ravel(axs)
kw = dict(cbar=True, vmin=-.03, vmax=.03, cmap='RdBu_r')
bins = p2d.bins
ref = h_truth_p2d.eval(bins=bins)
hl.plot2d(axs[0], p2d.eval(bins=bins) - ref,
clabel='$p(x,y)-\mathsf{truth}$: hist', **kw)
hl.plot2d(axs[1], k2d.eval(bins=bins) - ref,
clabel='$p(x,y)-\mathsf{truth}$: KDE', **kw)
hl.plot2d(axs[2], s2d.eval(bins=bins) - ref,
clabel='$p(x,y)-\mathsf{truth}$: spline fit', **kw)
hl.plot2d(axs[3], h_model_p2d.eval(bins=bins) - ref,
clabel='$p(x,y)-\mathsf{truth}$: model fit', **kw)
for ax in axs:
ax.set_xlabel(r'$x$')
ax.set_ylabel(r'$y$')

Random Sampling
Histlite allows random sampling in any number of dimensions. As an example, consider:
[59]:
N, seed = 5, 1
print(h2d.sample(N, seed=seed))
print(p2d.sample(N, seed=seed))
print(s2d.eval(bins=100).sample(N, seed=seed))
[array([-0.45783874, 1.24470245, -5.91706543, -1.34477581, -2.51068398]), array([-2.05939332, 1.60445268, -4.03093146, 2.37074818, 0.24657302])]
[array([-0.45783874, 1.24470245, -5.91706543, -1.34477581, -2.51068398]), array([-2.05939332, 1.60445268, -4.03093146, 2.37074818, 0.24657302])]
[array([-0.46891937, 1.34235123, -5.95853271, -1.1523879 , -2.45534199]), array([-2.94969666, 0.68222634, -3.69546573, -3.01462591, -0.59671349])]
Note, however, that with more dimensions, far increased statistics are required in the input data for the sampling to be stable with respect to modeling choices.
It is also possible to sample from a subset of dimensions, after specifying some of the parameters:
[60]:
for x in [-3, 0, 3]:
print(h2d.sample(N, x, seed=seed))
[array([-2.25825073, -1.30681426, -1.95666546, -0.71195175, -3.20477583])]
[array([-0.49287041, -0.57318781, 0.07838039, -0.0958898 , 0.03371492])]
[array([0.88990424, 3.48293473, 2.40766102, 2.91407529, 2.77436452])]
[61]:
for y in [-3, 0, 3]:
print(h2d.T.sample(N, y, seed=seed))
[array([-2.0405207 , -0.4614405 , -2.43781501, -1.76153886, -4.42869307])]
[array([-3.41787792, -3.53784644, -2.12928192, 1.98962526, -0.6510304 ])]
[array([4.864174 , 4.7018449 , 2.37746693, 4.06817974, 4.8371715 ])]
Conclusions¶
One of the primary design goals of histlite is to serve as a convenient foundation for various higher-level analysis — especially maximum likelihood analysis. It was originally written to support astrophysical neutrino sources in IceCube. Recently, histlite is also being investigated as a building block in neutrinoless double-beta (0νββ) decay searches by nEXO.
Here we’ve surveyed features supporting 1D and 2D PDFs. It’s no problem to work with higher dimensional PDFs, with the exception that SciPy spline routines become extremely RAM-intensive as the dimensionality increases.
What’s next? Contact me if you have any requests — I think the next tutorial will cover slicing, projections, and more details on histogram arithmetic.
API reference¶
Calculate and plot histograms, easily.
-
class
histlite.
Hist
(bins, values, errors=None, data=None, weights=None)[source]¶ A histogram.
-
T
¶ A transposed version of the Hist.
-
bins
¶ A list of bin edge arrays, one for each dimension.
-
centers
¶ A list of bin center arrays for each dimension.
-
contain
(axis, frac=0.6826894921370859)[source]¶ Project the histogram onto a subset of its dimensions by taking the containment interval along
axis
.Parameters: - axis (int) – the axis along which to measure containment and thus the dimensions no longer present in the resulting Hist.
- frac (float) – the containment interval, measured from
self.range[axis][0]
and moving to the “right”
Returns:
-
contain_project
(axis, frac=0.6826894921370859, n_sigma=None)[source]¶ Project the histogram taking median along one dimension, with errorbars reflecting the eliminated axis.
Parameters: - axis (int) – the axis along which to measure containment and thus the dimensions no longer present in the resulting Hist.
- frac (float) – the containment fraction, which will be centered on the median value along the given axis
- n_sigma (float) – the containment fraction, specified as a number of sigmas
Returns: If given,
n_sigma
overridesfrac
.
-
cumsum
(axes=[-1], normalize=False)[source]¶ Calculate the cumulative sum along specified axes (in order).
-
curve_fit
(func, **kw)[source]¶ Fit a function to the histogram.
Parameters: func (function) – model function as in scipy.optimize.curve_fit(). Returns: popt, pcov as in scipy.optimize.curve_fit() This function unravels the values and bin centers into
n_dim
1D arrays which are then passed, along with any keyword arguments, to scipy.optimize.curve_fit().
-
data
¶ The data used to construct the histogram (if given upon construction).
-
efficiency
(base_hist)[source]¶ Get an efficiency Hist for this Hist divided by base_hist.
Parameters: base_hist ( Hist
) – The base histogram, of which this one should be a subset.This method differs from __div__ in the way that errors are propagated.
-
errors
¶ An nD array of bin errors (sqrt(sum(squares of weights))) in each bin.
-
gaussian_filter
(*a, **kw)[source]¶ Smooth both values and errors with
scipy.ndimage.gaussian_filter()
.
-
gaussian_filter1d
(*a, **kw)[source]¶ Smooth both values and errors with
scipy.ndimage.gaussian_filter1d()
.
-
integrate
(axes=None)[source]¶ Project the histogram onto a subset of its dimensions by integrating over
axes
.Parameters: axes (sequence of int) – the axes along which to integrate and thus the dimensions no longer present in the resulting Hist. Returns: Hist
-
log
¶ A list of bools describing whether each dimenion is binned linearly or logarithmically.
-
median
(axis)[source]¶ Project the histogram onto a subset of its dimensions by taking the median along
axis
.Parameters: axis (int) – the axis along which to find the median and thus the dimensions no longer present in the resulting Hist. Returns: Hist
-
n_bins
¶ A list of the number of bins in each dimension.
-
n_dim
¶ The number of dimensions in the histogram.
-
normalize
(axes=None, integrate=True, density=False)[source]¶ Return a histogram normalized so the sums (or integrals) over the given axes are unity.
Parameters: - axes (sequence of int, optional) – the axes that will sum (or integrate) to unity for the normalized histogram
- integrate (bool) – if True, normalize so the integral is unity; otherwise, normalize so the sum is unity
- density (bool) – if True, normalize so the integral is unity, but
as though the binning were linspaced, even if it is actually not.
This option supersedes the
integrate
argument.
Returns: The norm is found by summing over all axes other than the ones specified, or by summing over all axes if
axis
is not given. Note that settingdensity=True
should obtain equivalent behavior tonumpy.histogram(..., density=True)
.
-
project
(axes=[-1], integrate=False)[source]¶ Project the histogram onto a subset of its dimensions by summing over axes other than those listed in
axes
.Parameters: axes (sequence of int) – the axes along which NOT to sum or integrate, and thus the dimensions no longer present in the resulting Hist. Returns: Hist
-
range
¶ A list of (min_value,max_value) tuples for each dimension.
-
ravel_indices
(*xs)[source]¶ Get the indices into values.ravel() for the specified coordinates.
Index of -1 indicates out-of-bounds
-
rebin
(axis, bins, tol=0.0001)[source]¶ Produce coarser binning along the given axis.
Parameters: - axis (int) – the axis along which to rebin
- bins (sequence) – the new bin edges
- tol (float) – the absolute error between the given
bins
and ones found in the original histogram
Returns: Each bin in bins should be contained in the existing bins, and the endpoints should match. Tolerance for bin agreement is given as an absolute error by tol.
-
sample
(n_samples=1, *values, **kw)[source]¶ Draw n samples from the data.
Parameters: n_samples (int) – the number of samples Any given values select bins in the first len(values) dimensions, such that sampling is done only from the remaining dimensions.
Returns: tuple of arrays of length n_dim
-
spline_fit
(log=False, floor=None, *a, **kw)[source]¶ Get a scipy spline fit to the histogram.
Parameters: - log (bool) – whether to fit in log-value or linear-value
- floor (float) – 10**floor is the arbitrary small number to stand in for
zeros when
log
is true. If not set, log10 of 0.1 times the smallest nonzero value will be used.
Returns: This method produces a spline fit to the histogram values for 1D or 2D histograms. The domain of the fitted spline will be the same as that of the histogram.
-
sum
(axes=None, integrate=False)[source]¶ Project the histogram onto a subset of its dimensions by summing over
axes
.Parameters: - axes (sequence of int) – the axes along which to sum and thus the dimensions no longer present in the resulting Hist.
- integrate (bool) – if True, evaluate sum of (value * width) rather than just value.
Returns:
-
values
¶ An nD array of bin values (sum of weights in each bin).
-
volumes
¶ An nD array of bin volumes (product of bin widths in each dimension).
-
weights
¶ The weights used to construct the histogram (if given upon construction).
-
widths
¶ A list of bin width arrays for each dimension.
-
-
class
histlite.
LineStyle
(line=True, marker='None', errorbars=False, errorcaps=False, errorbands=False, xerrorbars=False, crosses=False, poisson=False, **kwargs)[source]¶ Style object for 1D
Hist
s.-
copy
(**kwargs)[source]¶ Get a copy of this LineStyle, updating the given keyword args.
All arguments accepted by the
LineStyle
constructor may be given, including line, markers, errorbars, errorcaps, and arbitrary matplotlib arguments.
-
errorbands
¶ Whether to draw error bands.
-
errorbars
¶ Whether to draw error bars.
-
errorcaps
¶ Whether to draw error bar caps.
-
kwargs
¶ Keyword args for matplotlib.axes.Axes.errorbar().
-
line
¶ Whether to draw a line.
-
marker
¶ The marker to use.
-
markers
¶ Whether to draw point markers.
-
poisson
¶ Whether to draw Poisson error bars.
-
xerrorbars
¶ Whether to draw error bars.
-
-
class
histlite.
SplineFit
(hist, spline, bin_logs, log, floor)[source]¶ Wrapper for spline fits to histograms.
-
histlite.
fill_between
(ax, h1, h2, interpolate=False, drawstyle='steps', **kw)[source]¶ Fill the region between histograms h1 and h2.
Parameters: - ax (matplotlib Axes) – the axes on which to plot
- h1 (number or
Hist
) – a number or the first histogram - h2 (number or
Hist
) – a number or the second histogram - where – see
pyplot.fill_between()
. - interpolate – see
pyplot.fill_between()
. - drawstyle (str) – if ‘line’, plot smooth curves; otherwise, plot with histogram steps
Other keyword arguments are passed to
pyplot.fill_between()
.
-
histlite.
hist
(data, weights=None, bins=10, range=None, log=False, round_int_bins=False, keep_data=False)[source]¶ Factory function for
Hist
.Parameters: - data (array-like or tuple of array-like) – the source data for the histogram: a tuple of array-like each of length (number of samples), or just a single array of that length
- weights (array-like) – the weights of the source datapoints
- bins (sequence or int, optional) – a numpy.histogramdd() bins specification
- range (sequence, optional) – a numpy.histogramdd() range specification
- log (sequence or bool) – if bins gives bin counts rather than edges, log=True causes logarithmic bin edges to be chosen (can be given per-dimension)
- keep_data (bool) – whether to keep the data and weights for the histogram
Returns: the
Hist
-
histlite.
hist_bootstrap
(N, data, weights=None, *args, **kwargs)[source]¶ Like
hist()
, but for N iterations of sampling fromdata
with replacement.Parameters: - stacked (bool) – if True, output is an ndim+1 dimensional Hist where the first dimension has n_bins=``N`` and values are sorted along this first dimension. otherwise output is ndim dimensional
- errors (str) – ‘original’ to obtain errorbars from standard
(non-bootstrapped) Hist; ‘bootstrap’ to obtain errors from containment
interval along the Hist that would have been returned if
stacked=True
. default is ‘bootstrap’ - frac (float in [0,1]) – fraction of samples to use in each iteration. default is 1.0
- slide_Ns (float or sequence of float) – if given, use hist_slide() instead of hist(), with the given number(s) of slided binnings
-
histlite.
hist_direct
(data, weights=None, bins=None, range=None, keep_data=False)[source]¶ Fast factory function for
Hist
.Parameters: - data (array-like) – the data to be histogrammed. For multidimensional
histograms,
data
will be transposed for input intonp.histogramdd()
. - weights (array-like) – the weights
- bins (sequence or int, optional) – a
np.histogramdd()
bins specification - bins – a
np.histogramdd()
range specification - keep_data (bool) – whether to keep the data and weights for the histogram
Returns: the
Hist
This method creates a
Hist
by callingnp.histogramdd()
as directly as possible. Note that this requires a slightly more constrained format fordata
compared withhist()
.- data (array-like) – the data to be histogrammed. For multidimensional
histograms,
-
histlite.
hist_from_eval
(f, vectorize=True, err_f=None, ndim=None, **kwargs)[source]¶ Create a
Hist
by evaluating a function.Parameters: - f (callable) – the function to evaluate
- ndim (int) – number of arguments to function
- vectorize (bool) – whether
numpy.vectorize
is needed to evaluatef
over many sets of values - err_f (callable) – if given, function to evaluate to obtain “errors”
All other keyword arguments define the binning the same as for
hist()
.This function supersedes
hist_from_function()
.
-
histlite.
hist_from_function
(bins, func, *args, **kwargs)[source]¶ [Deprecated] Create a
Hist
by evaluatingfunc
at the centers of specifiedbins
.Parameters: - bins (list of np.array) – the bin edges
- func (function) – the function to evaluate. it should return floats and respect
numpy broadcasting rules. if
splat
is true, it should takelen(bins)
(that is, n_dim) arguments; otherwise, it should accept a single argument containing all values of all independent variables - err_func (function) – if given, the function that returns the uncertainty at each part of the parameter space
- splat (bool) – determines the signature of
func
as described above (default: True)
Returns: the
Hist
Any additional positional or keyword arguments are passed to
func
.Note: This function is now deprecated; use
hist_from_eval()
instead.
-
histlite.
hist_like
(other, data, weights=None, keep_data=False)[source]¶ Create a
Hist
using the same binning asother
.Parameters: - other (
Hist
) – the other Hist - data (array-like) – the data to be histogrammed. For multidimensional
histograms,
data
will be transposed for input intonp.histogramdd()
. - weights (array-like) – the weights
Returns: the
Hist
- other (
-
histlite.
hist_like_indices
(other, ravel_indices, weights=None)[source]¶ Create a
Hist
using pre-computed per-sample indices fromother
.Parameters: - other (
Hist
) – the pre-existing histogram which defines the binning - ravel_indices (array-like) – result of a
Hist.ravel_indices()
call for the samples. - weights (array-like) – the weights of the source datapoints
- other (
-
histlite.
hist_slide
(Ns, data, weights=None, *args, **kwargs)[source]¶ Construct a “histogram” from
N
partially-overlapping Hist’s.Parameters: - Ns (int or sequence of int) – number of sliding iterations, optionally per-axis
- data (array-like or tuple of array-like) – the source data for the histogram: a tuple of array-like each of length (number of samples), or just a single array of that length
- weights (array-like) – the weights of the source datapoints
- indices (sequence of array-like) – ravel_indices for each individual histogram, obtained from a previous hist_slide call
- get_indices (bool) – if True, obtain ravel_indices values for use in later hist_slide calls
-
histlite.
plot1d
(ax, h=None, style=None, **kwargs)[source]¶ Plot 1D
Hist
h
on matplotlib Axesax
.Parameters: Other keyword arguments are propagated to pyplot.errorbar() and pyplot.plot() as appropriate.
-
histlite.
plot2d
(ax, h=None, log=False, cbar=False, levels=None, **kwargs)[source]¶ Plot 1D
Hist
h
onax
on a color scale.Parameters: - ax (matplotlib Axes) – The main axes on which to plot.
- h (
Hist
) – The 2D histogram to plot. - log (bool) – Whether to use a log color scale
- cbar (bool) – If true, draw colorbar.
- levels (int or array of float) – if given, plot with contourf rather than pcolormesh. if a number is given, automatically select that many levels between vmin and vmax.
- zmin (float) – Minimum value to plot with color; bins below the minimum value will be white.
Returns: If cbar, a dict containing a matplotlib.collection.QuadMesh and a matplotlib.colorbar.Colorbar as values; otherwise, just the QuadMesh.
Other keyword arguments are passed to ax.pcolormesh().
-
histlite.
reindex
(a, order)[source]¶ Rearrange the axes of a multidimensional array.
Parameters: - a (ndarray) – the input array
- order (sequence of int) – the axis that should wind up in each ordinal position
Returns: reindexed array
Note: this is useful for implementing
Hist.sum()
, etc., but you probably should prefernp.swapaxes
, possibly using multiple applications, instead.
-
histlite.
stack1d
(ax, hs, colors=None, labels=None, kws=None, interpolate=False, drawstyle='steps', ymin=0, **morekw)[source]¶ Stack histograms
hs
usingfill_between()
.Parameters: - ax (matplotlib Axes) – the axes on which to plot
- hs (sequence of
Hist
) – the histograms - colors (sequence) – the fill colors
- labels (sequence of str) – the labels
- kws (sequence of str to value mappings) – keyword arguments for individual fills
- interpolate – see
pyplot.fill_between()
- drawstyle (str) – if ‘line’, plot smooth curves; otherwise, plot with histogram steps
- ymin (number) – minimum value (useful for semilogy plots)
Other keyword arguments are passed to each
fill_between()
call.
-
histlite.
unreindex
(a, order)[source]¶ Reverse the effects of
reindex()
.Parameters: - a (ndarray) – the already reindexed array
- order (sequence of int) – order previously applied to
reindex()
Returns: unreindexed array
Note: this is useful for implementing
Hist.sum()
, etc., but you probably should prefernp.swapaxes
, possibly using multiple applications, instead.