Data Processing Pipelines

In [1]:
import pandas as pd
import holoviews as hv

from holoviews import opts
from bokeh.sampledata import stocks
from holoviews.operation.timeseries import rolling, rolling_outlier_std

hv.extension('bokeh')

opts.defaults(opts.Curve(width=600, framewise=True))

In the previous guides we discovered how to load and declare dynamic, live data and how to transform elements using dim expressions and operations. In this guide we will discover how to combine dynamic data with operations to declare lazy and declarative data processing pipelines, which can be used for interactive exploration but can also drive complex dashboards or even bokeh apps.

Declaring dynamic data

We will begin by declaring a function which loads some data. In this case we will just load some stock data from the bokeh but you could imagine querying this data using REST interface or some other API or even loading some large collection of data from disk or generating the data from some simulation or data processing job.

In [2]:
def load_symbol(symbol, **kwargs):
    df = pd.DataFrame(getattr(stocks, symbol))
    df['date'] = df.date.astype('datetime64[ns]')
    return hv.Curve(df, ('date', 'Date'), ('adj_close', 'Adjusted Close'))

stock_symbols = ['AAPL', 'FB', 'GOOG', 'IBM', 'MSFT']
dmap = hv.DynamicMap(load_symbol, kdims='Symbol').redim.values(Symbol=stock_symbols)

We begin by displaying our DynamicMap to see what we are dealing with. Recall that a DynamicMap is only evaluated when you request the key so the load_symbol function is only executed when first displaying the DynamicMap and whenever we change the widget dropdown:

In [3]:
dmap
Out[3]:

Processing data

It is very common to want to process some data, for this purpose HoloViews provides so-called Operations, which are described in detail in the Transforming Elements. Operations are simply parameterized functions, which take HoloViews objects as input, transform them in some way and then return the output.

In combination with Dimensioned Containers such as HoloMap and GridSpace they are a powerful way to explore how the parameters of your transform affect the data. We will start with a simple example. HoloViews provides a rolling function which smoothes timeseries data with a rolling window. We will apply this operation with a rolling_window of 30, i.e. roughly a month of our daily timeseries data:

In [4]:
smoothed = rolling(dmap, rolling_window=30)
smoothed
Out[4]:

As you can see the rolling operation applies directly to our DynamicMap, smoothing each Curve before it is displayed. Applying an operation to a DynamicMap keeps the data as a DynamicMap, this means the operation is also applied lazily whenever we display or select a different symbol in the dropdown widget.

Dynamically evaluating parameters on operations and transforms with .apply

The .apply method allows us to automatically build a dynamic pipeline given an object and some operation or function along with parameter, stream or widget instances passed in as keyword arguments. Internally it will then build a Stream to ensure that whenever one of these changes the plot is updated. To learn more about streams see the Responding to Events.

This mechanism allows us to build powerful pipelines by linking parameters on a user defined class or even an external widget, e.g. here we import an IntSlider widget from panel:

In [5]:
import panel as pn

slider = pn.widgets.IntSlider(name='rolling_window', start=1, end=100, value=50)

Using the .apply method we could now apply the rolling operation to the DynamicMap and link the slider to the operation's rolling_window parameter (which also works for simple functions as will be shown below). However, to further demonstrate the features of dim expressions and the .transform method, which we first introduced in the Transforming elements user guide, we will instead apply the rolling mean using the .df namespace accessor on a dim expression:

In [6]:
rolled_dmap = dmap.apply.transform(adj_close=hv.dim('adj_close').df.rolling(slider).mean())

rolled_dmap
Out[6]:

The rolled_dmap is another DynamicMap that defines a simple two-step pipeline, which calls the original callback when the symbol changes and reapplies the expression whenever the slider value changes. Since the widget's value is now linked to the plot via a Stream we can display the widget and watch the plot update:

In [7]:
slider
Out[7]:

The power of building pipelines is that different visual components can share the same inputs but compute very different things from that data. The part of the pipeline that is shared is only evaluated once making it easy to build efficient data processing code. To illustrate this we will also apply the rolling_outlier_std operation which computes outliers within the rolling_window and again we will supply the widget value:

In [8]:
outliers = dmap.apply(rolling_outlier_std, rolling_window=slider.param.value)

rolled_dmap * outliers.opts(color='red', marker='triangle')
Out[8]:

We can chain operations like this indefinitely and attach parameters or explicit streams to each stage. By chaining we can watch our visualization update whenever we change a stream value anywhere in the pipeline and HoloViews will be smart about which parts of the pipeline are recomputed, which allows us to build complex visualizations very quickly.

The .apply method is also not limited to operations. We can just as easily apply a simple Python function to each object in the DynamicMap. Here we define a function to compute the residual between the original dmap and the rolled_dmap.

In [9]:
def residual_fn(overlay):
    # Get first and second Element in overlay
    el1, el2 = overlay.get(0), overlay.get(1)

    # Get x-values and y-values of curves
    xvals  = el1.dimension_values(0)
    yvals  = el1.dimension_values(1)
    yvals2 = el2.dimension_values(1)

    # Return new Element with subtracted y-values
    # and new label
    return el1.clone((xvals, yvals-yvals2),
                     vdims='Residual')

If we overlay the two DynamicMaps we can then dynamically broadcast this function to each of the overlays, producing a new DynamicMap which responds to both the symbol selector widget and the slider:

In [10]:
residual = (dmap * rolled_dmap).apply(residual_fn)

residual
Out[10]:

In later guides we will see how we can combine HoloViews plots and Panel widgets into custom layouts allowing us to define complex dashboards. For more information on how to deploy bokeh apps from HoloViews and build dashboards see the Deploying Bokeh Apps and Dashboards guides. To get a quick idea of what this might look like let's compose all the components we have no built: