Distribution

Download this notebook from GitHub (right-click to download).


Title: Distribution Element
Dependencies: Bokeh, SciPy
Backends: Bokeh, Matplotlib, Plotly
In [1]:
import numpy as np
import holoviews as hv
from holoviews import opts, Cycle
hv.extension('bokeh')

Distribution provides a convenient way to visualize a 1D distribution of values as a kernel density estimate (KDE). Kernel density estimation is a non-parametric way to estimate the probability density function of a random variable.

The KDE works by placing a Gaussian kernel with the supplied bandwidth at each sample; the kernels are then summed to produce the density estimate. By default the bandwidth is determined using Scott's method, which usually produces good results, but it may be overridden by an explicit value.
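
The summation described above can be sketched in plain NumPy. This is an illustration only, not the code HoloViews runs: the helper name gaussian_kde_sketch is hypothetical, and the default bandwidth uses the textbook form of Scott's rule (sample standard deviation times n^(-1/5)), which is an assumption about what "Scott's method" means here.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth=None):
    """Hypothetical helper: average of Gaussian kernels centred on each sample."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Assumption: textbook Scott's rule for 1D data.
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # One row per grid point, one column per sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # Summing the kernels and dividing by n keeps the total area at 1.
    return kernels.mean(axis=1)

rng = np.random.default_rng(0)
samples = rng.standard_normal(1000)
grid = np.linspace(-6, 6, 601)
density = gaussian_kde_sketch(samples, grid)

# Riemann-sum check that the estimate behaves like a probability density.
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"area under the estimate: {area:.4f}")
```

Dividing the summed kernels by the number of samples is what makes the result a density rather than a count.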

To start with, we will create a Distribution from 1,000 normally distributed samples:

In [2]:
normal = np.random.randn(1000)
hv.Distribution(normal)
Out[2]:

We can set explicit values for the bandwidth to see the effect and also declare whether we want the plot to be filled:

In [3]:
overlay = hv.NdOverlay({bw: hv.Distribution(normal).opts(bandwidth=bw) for bw in [0.05, 0.1, 0.5, 1]})
overlay.opts(opts.Distribution(filled=False, line_color=Cycle()))
Out[3]:
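
The effect of the bandwidth can also be checked numerically. The sketch below reuses the same four bandwidth values and reports the peak of each estimate. It assumes the bandwidth is the standard deviation of each Gaussian kernel, which may not be exactly how the plotting option is interpreted internally, and kde_on_grid is a hypothetical helper rather than part of HoloViews.

```python
import numpy as np

def kde_on_grid(samples, grid, h):
    """Hypothetical helper: mean of Gaussian kernels with standard deviation h."""
    z = (grid[:, None] - samples[None, :]) / h
    return (np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))).mean(axis=1)

rng = np.random.default_rng(1)
samples = rng.standard_normal(1000)
grid = np.linspace(-5, 5, 1001)

# Peak height of the estimate for each bandwidth used in the overlay above.
peaks = {h: float(kde_on_grid(samples, grid, h).max()) for h in [0.05, 0.1, 0.5, 1]}
for h, peak in peaks.items():
    print(f"bandwidth={h}: peak density ~ {peak:.3f}")
```

Small bandwidths follow the individual samples closely and give a tall, noisy curve, while large bandwidths spread each kernel out and flatten the estimate.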

The Distribution element is also useful for visualizing the marginal distributions of a set of points. Here we will declare distributions for the x- and y-values of two sets of Points with slightly different spreads and means, and then adjoin these plots:

In [4]:
points = hv.Points(np.random.randn(100,2))
points2 = hv.Points(np.random.randn(100,2)*2+1)

xdist, ydist = ((hv.Distribution(points2, kdims=[dim]) *
                 hv.Distribution(points, kdims=[dim])).redim.range(x=(-5, 5), y=(-5, 5))
                for dim in 'xy')
(points2 * points) << ydist.opts(width=125) << xdist.opts(height=125)
Out[4]:

Underlying the Distribution element is the univariate_kde operation, which computes the KDE automatically when the element is plotted. We can also call this operation directly; displaying its output shows that it simply returns an Area or Curve element. Calling it directly also gives more control over the parameters: besides the bandwidth and cut values, we can set a bin_range, a bw_method and the number of samples (n_samples) at which to evaluate the KDE:

In [5]:
from holoviews.operation.stats import univariate_kde
dist = hv.Distribution(normal)
kde = univariate_kde(dist, bin_range=(-4, 4), bw_method='silverman', n_samples=20)
kde
Out[5]:
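
To give a feel for what these parameters control, here is a NumPy-only sketch. It assumes the bandwidth factors documented for SciPy's gaussian_kde in the 1D case (Scott: n^(-1/5), Silverman: (3n/4)^(-1/5)) and assumes that bin_range and n_samples describe an evenly spaced evaluation grid. How univariate_kde maps its parameters internally is not shown here, so treat this as an illustration rather than its implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.standard_normal(1000)
n = samples.size

# Bandwidth factors for 1D data (assumed to follow SciPy's documented rules).
scott_factor = n ** (-1 / 5)
silverman_factor = (n * 3 / 4) ** (-1 / 5)
print(f"Scott factor: {scott_factor:.4f}, Silverman factor: {silverman_factor:.4f}")

# Assumed meaning of bin_range=(-4, 4) and n_samples=20: 20 evenly spaced points.
grid = np.linspace(-4, 4, 20)

# Kernel width = factor times the sample standard deviation.
h = silverman_factor * samples.std(ddof=1)
z = (grid[:, None] - samples[None, :]) / h
density = (np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))).mean(axis=1)
print(density.shape)
```

With only 20 evaluation points the resulting curve is visibly coarse, which is why a small n_samples is mainly useful for inspecting the output rather than for final plots.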

For full documentation and the available style and plot options, use hv.help(hv.Distribution).

