Interaction
“A graphic is not ‘drawn’ once and for all; it is ‘constructed’ and reconstructed until it reveals all the relationships constituted by the interplay of the data. The best graphic operations are those carried out by the decision-maker themself.” — Jacques Bertin
Visualization provides a powerful means of making sense of data. A single image, however, typically provides answers to, at best, a handful of questions. Through interaction we can transform static images into tools for exploration: highlighting points of interest, zooming in to reveal finer-grained patterns, and linking across multiple views to reason about multi-dimensional relationships.
At the core of interaction is the notion of a selection: a means of indicating to the computer which elements or regions we are interested in. For example, we might hover the mouse over a point, click multiple marks, or draw a bounding box around a region to highlight subsets of the data for further scrutiny.
Alongside visual encodings and data transformations, Vega-Lite provides a selection abstraction for authoring interactions. These selections encompass three aspects:
- Input event handling to select points or regions of interest, such as mouse hover, click, drag, scroll, and touch events.
- Generalizing from the input to form a selection rule (or predicate) that determines whether or not a given data record lies within the selection.
- Using the selection predicate to dynamically configure a visualization by driving conditional encodings, filter transforms, or scale domains.
This notebook introduces interactive selections and explores how to use them to author a variety of interaction techniques, such as dynamic queries, panning & zooming, details-on-demand, and brushing & linking.
Datasets
We will visualize a variety of datasets from the vega-datasets collection:
- A dataset of
cars
from the 1970s and early 1980s, - A dataset of
movies
, previously used in the Data Transformation notebook, - A dataset containing ten years of S&P 500 (
sp500
) stock prices, and - A dataset of
flights
, including departure time, distance, and arrival delay.
const cars = vega_datasets['cars.json'].url // use URL
const movies = vega_datasets['movies.json']() // load dataset
const sp500 = vega_datasets['sp500.csv'].url // use URL
const flights = vega_datasets['flights-5k.json'].url // use URL
Introducing Selections
Let’s start with a basic selection: simply clicking a point to highlight it. Using the cars
dataset, we’ll start with a scatter plot of horsepower versus miles per gallon, with a color encoding for the number cylinders in the car engine.
In addition, we’ll create a selection instance by creating an interactive parameter with select: { type: 'point' }
, indicating we want a selection defined over a point value. By default, the selection uses a mouse click to determine the selected value. To register a selection with a visualized mark, we must add it to the params
list. (Why params? Selections are one example of a broader class of interactive parameters supported by Vega-Lite.)
Once our selection has been defined, we can use it as a parameter for conditional encodings, which apply a different encoding depending on whether a data record lies in or out of the selection. For example, consider the following snippet:
color: {
condition: { param: 'highlight', field: 'Cylinders', type: 'O' },
value: 'grey'
}
This encoding definition states that data points contained within the selection
named 'highlight'
should be colored according to the Cylinder
field. Non-selected data points, meanwhile, should use the color grey
. An empty selection includes all data points, and so initially all points will be colored.
Try clicking different points in the chart below. What happens? (Click the background to clear the selection state and return to an “empty” selection.)
render({
mark: { type: 'circle' },
data: { url: cars },
params: [
{ name: 'highlight', select: { type: 'point' } }
],
encoding: {
x: { field: 'Horsepower', type: 'Q' },
y: { field: 'Miles_per_Gallon', type: 'Q' },
color: {
condition: { param: 'highlight', field: 'Cylinders', type: 'O' },
value: 'grey'
},
opacity: {
condition: { param: 'highlight', value: 0.8 },
value: 0.1
}
}
})
Of course, highlighting individual data points one-at-a-time is not particularly exciting… As we’ll see, point value selections provide a useful building block for more powerful interactions. Moreover, there are additional selection types:
point
- select one or more discrete values. The first value is selected on mouse click and additional values toggled using shift-click.interval
- select a continuous range of values, initiated by mouse drag.
Let’s compare each of these selection types side-by-side. To keep our code tidy we’ll first define a function (plot
) that generates a scatter plot specification just like the one above. We can pass a selection to the plot
function to apply it to the chart:
function plot(selection, title) {
const name = selection.name;
return {
mark: { type: 'circle' },
data: { url: cars },
params: [ selection ],
encoding: {
x: { field: 'Horsepower', type: 'Q' },
y: { field: 'Miles_per_Gallon', type: 'Q' },
color: {
condition: { param: name, field: 'Cylinders', type: 'O' },
value: 'grey'
},
opacity: {
condition: { param: name, value: 0.8 },
value: 0.1
}
},
title,
width: 300,
height: 225
};
}
Let’s use our plot
function to create three chart variants, one per selection type.
The first (point
) chart replicates our earlier example, and supports shift-click interactions to toggle inclusion of multiple points within the selection. The second (interval
) chart generates a selection region (or brush) upon mouse drag. Once created, you can drag the brush around to select different points, or scroll when the cursor is inside the brush to scale (zoom) the brush size.
Try interacting with each of the charts below!
render({
hconcat: [
plot({ name: 'click', select: { type: 'point' } }, 'Point (Click)'),
plot({ name: 'drag', select: { type: 'interval' } }, 'Interval (Drag)')
]
})
The examples above use default interactions (click, shift-click, drag), depending on the selection type. We can further customize the interactions by providing input event specifications using Vega event selector syntax. For example, we can modify our point
chart to trigger upon mouseover
events instead of click
events.
Hold down the shift key to “paint” with data!
render({
hconcat: [
plot({ name: 'click', select: { type: 'point', on: 'mouseover' } }, 'Point (Mouseover)'),
]
})
Now that we’ve covered the basics of Vega-Lite selections, let’s take a tour through the various interaction techniques they enable!
Dynamic Queries
Dynamic queries enable rapid, reversible exploration of data to isolate patterns of interest. As defined by Ahlberg, Williamson, & Shneiderman, a dynamic query:
- represents a query graphically,
- provides visible limits on the query range,
- provides a graphical representation of the data and query result,
- gives immediate feedback of the result after every query adjustment,
- and allows novice users to begin working with little training.
A common approach is to manipulate query parameters using standard user interface widgets such as sliders, radio buttons, and drop-down menus. To generate dynamic query widgets, we can apply a bind
operation to one or more data fields we wish to query.
Let’s build an interactive scatter plot that uses a dynamic query to filter the display. Given a scatter plot of movie ratings (from Rotten Tomates and IMDB), we can add a selection over the Major_Genre
field to enable interactive filtering by film genre.
To start, let’s extract the unique (non-null) genres from the movies
data:
const genres = Array
.from(new Set(movies.map(d => d.Major_Genre).filter(d => d != null)))
.sort(d3.ascending)
For later use, let’s also define a list of unique MPAA_Rating
values:
const mpaa = ['G', 'PG', 'PG-13', 'R', 'NC-17', 'Not Rated']
Now let’s create a point
selection bound to a drop-down menu.
Use the dynamic query menu below to explore the data. How do ratings vary by genre? How would you revise the code to filter MPAA_Rating
(G, PG, PG-13, etc.) instead of Major_Genre
?
render({
mark: { type: 'circle' },
data: { values: movies },
params: [
{
name: 'Select', // name the selection 'Select'
select: { type: 'point', fields: ['Major_Genre'] }, // limit selection to the Major_Genre field
value: { Major_Genre: genres[0] }, // use first genre entry as initial value
bind: {
input: 'select', options: genres, // bind to a menu of unique genre values
name: 'Major Genre' // set the menu label text
}
}
],
encoding: {
x: { field: 'Rotten_Tomatoes_Rating', type: 'Q' },
y: { field: 'IMDB_Rating', type: 'Q' },
tooltip: { field: 'Title', type: 'N' },
opacity: {
condition: { param: 'Select', value: 0.75 },
value: 0.05
}
}
})
Our construction above leverages multiple aspects of selections:
- We constrain the selection to a specific data field (
Major_Genre
). Earlier when we used apoint
selection, the selection mapped to individual data points. By limiting the selection to a specific field, we can select all data points whoseMajor_Genre
field value matches the single selected value. - We initialize
value
the selection to a starting value. - We
bind
the selection to an interface widget, in this case a drop-downselect
menu. Theinput
property (hereselect
) indicates an underlying HTMLinput
element type. - As before, we then use a conditional encoding to control the opacity channel.
Binding Selections to Multiple Inputs
One selection instance can be bound to multiple dynamic query widgets. Let’s modify the example above to provide filters for both Major_Genre
and MPAA_Rating
, using radio buttons instead of a menu. Our point
selection is now defined over a pair of genre and MPAA rating values
Look for surprising conjunctions of genre and rating. Are there G or PG-rated horror films?
render({
mark: { type: 'circle' },
data: { values: movies },
params: [
{
name: 'Select',
// point-value selection over [Major_Genre, MPAA_Rating] pairs
select: { type: 'point', fields: ['Major_Genre', 'MPAA_Rating'] },
// use specific hard-wired values as the initial selected values
value: { Major_Genre: 'Drama', MPAA_Rating: 'R' },
bind: {
Major_Genre: { input: 'select', options: genres, name: 'Major Genre' },
MPAA_Rating: { input: 'radio', options: mpaa, name: 'MPAA Rating' }
}
}
],
// scatter plot, modify opacity based on selection
encoding: {
x: { field: 'Rotten_Tomatoes_Rating', type: 'Q' },
y: { field: 'IMDB_Rating', type: 'Q' },
tooltip: { field: 'Title', type: 'N' },
opacity: {
condition: { param: 'Select', value: 0.75 },
value: 0.05
}
}
})
Fun facts: The PG-13 rating didn’t exist when the movies Jaws and Jaws 2 were released. The first film to receive a PG-13 rating was 1984’s Red Dawn.
Using Visualizations as Dynamic Queries
Though standard interface widgets show the possible query parameter values, they do not visualize the distribution of those values. We might also wish to use richer interactions, such as multi-value or interval selections, rather than input widgets that select only a single value at a time.
To address these issues, we can author additional charts to both visualize data and support dynamic queries. Let’s add a histogram of the count of films per year and use an interval selection to dynamically highlight films over selected time periods.
Interact with the year histogram to explore films from different time periods. Do you see any evidence of sampling bias across the years? (How do year and critics’ ratings relate?)
The years range from 1930 to 2040! Are future films in pre-production, or are there “off-by-one century” errors? Also, depending on which time zone you’re in, you may see a small bump in either 1969 or 1970. Why might that be? (See the end of the notebook for an explanation!)
render({
vconcat: [
{
// dynamic query histogram
mark: { type: 'bar', width: 4 },
data: { values: movies },
params: [
// interval selection, limited to x-axis (year) values
{ name: 'brush', select: { type: 'interval', encodings: ['x'] } }
],
encoding: {
x: { timeUnit: 'year', field: 'Release_Date', title: 'Films by Release Year' },
y: { aggregate: 'count', title: null }
},
width: 600,
height: 50
},
{
mark: { type: 'circle' },
data: { values: movies },
encoding: {
x: { field: 'Rotten_Tomatoes_Rating', type: 'Q' },
y: { field: 'IMDB_Rating', type: 'Q' },
tooltip: { field: 'Title', type: 'N' },
opacity: {
condition: { param: 'brush', value: 0.75 },
value: 0.05
}
},
width: 600,
height: 400
}
],
spacing: 5
})
The example above provides dynamic queries using a linked selection between charts:
- We add an
interval
selection (namedbrush
) underparams
in our histogram of films. - We use
encodings: ['x']
to limit the selection to the x-axis only, resulting in a one-dimensional selection interval. - We use
brush
in a conditional encoding to adjust the scatter plotopacity
.
This interaction technique of selecting elements in one chart and seeing linked highlights in one or more other charts is known as brushing & linking.
Navigation: Panning & Zooming
The movie rating scatter plot is a bit cluttered in places, making it hard to examine points in denser regions. Using the interaction techniques of panning and zooming, we can inspect dense regions more closely.
Let’s start by thinking about how we might express panning and zooming using Vega-Lite selections. What defines the “viewport” of a chart? Axis scale domains!
We can change the scale domains to modify the visualized range of data values. To do so interactively, we can bind an interval
selection to scale domains using bind: 'scales'
. The result is that instead of an interval brush that we can drag and zoom, we instead can drag and zoom the entire plotting area!
In the chart below, click and drag to pan (translate) the view, or scroll to zoom (scale) the view. What can you discover about the precision of the provided rating values?
render({
mark: { type: 'circle' },
data: { values: movies },
params: [
// bind interval selection to scale domains
{ name: 'sel', select: { type: 'interval' }, bind: 'scales' }
],
encoding: {
x: { field: 'Rotten_Tomatoes_Rating', type: 'Q' },
y: {
field: 'IMDB_Rating', type: 'Q',
axis: { minExtent: 30 } // add min extent to stabilize axis title placement
},
tooltip: [
{ field: 'Title', type: 'N' },
{ field: 'Release_Date', type: 'T' },
{ field: 'IMDB_Rating', type: 'Q' },
{ field: 'Rotten_Tomatoes_Rating', type: 'Q' }
]
},
width: 600,
height: 450
})
Zooming in, we can see that the rating values have limited precision! The Rotten Tomatoes ratings are integers, while the IMDB ratings are truncated to tenths. As a result, there is overplotting even when we zoom in, with multiple movies sharing the same rating values.
Reading the code above, you may notice axis: { minExtent: 30 }
in the y
encoding channel. The minExtent
parameter ensures a minimum amount of space is reserved for axis ticks and labels. Why do this? When we pan and zoom, the axis labels may change and cause the axis title position to shift. By setting a minimum extent we can reduce distracting movements in the plot. Try removing the minExtent
value, and then zoom out to see what happens when longer axis labels enter the view.
By default, scale bindings for selections include both the x
and y
encoding channels. What if we want to limit panning and zooming along a single dimension? We can invoke encodings: ['x']
to constrain the selection to the x
channel only:
render({
mark: { type: 'circle' },
data: { values: movies },
params: [
// limit interval selection to x-axis scale only
{ name: 'sel', select: { type: 'interval', encodings: ['x'] }, bind: 'scales' }
],
encoding: {
x: { field: 'Rotten_Tomatoes_Rating', type: 'Q' },
y: {
field: 'IMDB_Rating', type: 'Q',
axis: { minExtent: 30 } // add min extent to stabilize axis title placement
},
tooltip: [
{ field: 'Title', type: 'N' },
{ field: 'Release_Date', type: 'T' },
{ field: 'IMDB_Rating', type: 'Q' },
{ field: 'Rotten_Tomatoes_Rating', type: 'Q' }
]
},
width: 600,
height: 450
})
When zooming along a single axis only, the shape of the visualized data can change, potentially affecting our perception of relationships in the data. Choosing an appropriate aspect ratio is an important visualization design concern!
Navigation: Overview + Detail
When panning and zooming, we directly adjust the “viewport” of a chart. The related navigation strategy of overview + detail instead uses an overview display to show all of the data, while supporting selections that pan and zoom a separate focus display.
Below we have two area charts showing a decade of price fluctuations for the S&P 500 stock index. Initially both charts show the same data range. Click and drag in the bottom overview chart to update the focus display and examine specific time spans.
render({
data: { url: sp500 },
vconcat: [
{
mark: { type: 'area' },
encoding: {
x: {
field: 'date', type: 'T', title: null,
scale: { domain: { param: 'brush' } }
},
y: { field: 'price', type: 'Q' }
},
width: 700
},
{
mark: { type: 'area' },
params: [
{ name: 'brush', select: { type: 'interval', encodings: ['x'] } }
],
encoding: {
x: { field: 'date', type: 'T', title: null },
y: { field: 'price', type: 'Q' }
},
width: 700,
height: 60
}
]
})
Unlike our earlier panning & zooming case, here we don’t want to bind a selection directly to the scales of a single interactive chart. Instead, we want to bind the selection to a scale domain in another chart. To do so, we update the x
encoding channel for our focus chart, setting the scale domain
property to be our brush
selection. If no interval is defined (the selection is empty), Vega-Lite ignores the brush and uses the underlying data to determine the domain. When a brush interval is created, Vega-Lite instead uses that as the scale domain
for the focus chart.
Details on Demand
Once we spot points of interest within a visualization, we often want to know more about them. Details-on-demand refers to interactively querying for more information about selected values. Tooltips are one useful means of providing details on demand. However, tooltips typically only show information for one data point at a time. How might we show more?
The movie ratings scatterplot includes a number of potentially interesting outliers where the Rotten Tomatoes and IMDB ratings disagree. Let’s create a plot that allows us to interactively select points and show their labels.
Mouse over points in the scatter plot below to see a highlight and title label. Shift-click points to make annotations persistent and view multiple labels at once. Which movies are loved by Rotten Tomatoes critics, but not the general audience on IMDB (or vice versa)? See if you can find possible errors, where two different movies with the same name were accidentally combined!
// scatter plot encodings shared by all marks
const xy = {
x: { field: 'Rotten_Tomatoes_Rating', type: 'Q' },
y: { field: 'IMDB_Rating', type: 'Q' }
};
const hover = {
name: 'hover',
select: {
type: 'point',
on: 'mouseover', // select on mouseover
nearest: true, // select nearest point to mouse cursor
toggle: false // do not toggle on shift-hover
}
};
const click = { name: 'click', select: { type: 'point' } };
// logical 'or' to select points in either selection
// empty selections should match nothing
const filter = {
filter: {
or: [
{ param: 'click', empty: false },
{ param: 'hover', empty: false }
]
}
};
render({
// layer scatter plot points, halo annotations, and title labels
layer: [
{
mark: { type: 'circle' },
params: [ hover, click ],
encoding: xy
},
{
mark: { type: 'point', size: 100, stroke: 'firebrick', strokeWidth: 1 },
transform: [filter],
encoding: xy
},
{
mark: { type: 'text', dx: 4, dy: -8, align: 'right', stroke: 'white', strokeWidth: 2 },
transform: [filter],
encoding: { ...xy, text: { field: 'Title', type: 'N' } }
},
{
mark: { type: 'text', dx: 4, dy: -8, align: 'right' },
transform: [filter],
encoding: { ...xy, text: { field: 'Title', type: 'N' } }
}
],
data: { values: movies },
width: 600,
height: 450
})
The example above adds three new layers to the scatter plot: a circular annotation, white text to provide a legible background, and black text showing a film title. In addition, this example uses two selections in tandem:
- A point selection (
hover
) that includesnearest: true
to automatically select the nearest data point as the mouse moves. We settoggle: false
to prevent shift-hover from toggling the selection status. - A point selection (
click
) to create persistent selections via shift-click.
We then combine these selections into a single or
filter predicate to include points that reside in either selection. In constructing this predicate, we also specify empty: false
to indicate that no points should be included if a selection is empty. We use this predicate to filter the new layers to show annotations and labels for selected points only.
Using selections and layers, we can realize a number of different designs for details on demand! For example, here is a log-scaled time series of technology stock prices, annotated with a guideline and labels for the date nearest the mouse cursor:
To see the code for this line chart example, visit the Annotated Time Series example.
Putting into action what we’ve learned so far: can you modify the movie scatter plot above (the one with the dynamic query over years) to include a rule
mark that shows the average IMDB (or Rotten Tomatoes) rating for the data contained within the year interval
selection?
Brushing & Linking, Revisited
Earlier in this notebook we saw an example of brushing & linking: using a dynamic query histogram to highlight points in a movie rating scatter plot. Here, we’ll visit some additional examples involving linked selections.
Returning to the cars
dataset, we can use the repeat
operator to build a scatter plot matrix (SPLOM) that shows associations between mileage, acceleration, and horsepower. We can define an interval
selection and include it within our repeated scatter plot specification to enable linked selections among all the plots.
Moreover, we can combine our brushing selection with a point
selection that is bound to our legend, allowing us to select or de-select points corresponding to specific legend entries. We can combine these two selections into a single selection predicate using the code { and: [{param: 'brush'}, {param: 'legend'}] }
.
Click and drag in any of the plots below to perform brushing & linking. Shift-click legend entries to isolate specific cylinder groups.
const brush = {
name: 'brush',
// resolve all selections to a single global instance
select: { type: 'interval', resolve: 'global' }
};
const legend = {
name: 'legend',
select: { type: 'point', fields: ['Cylinders'] },
bind: 'legend' // bind to interactions with the color legend
};
const brushAndLegend = { and: [ { param: 'brush' }, { param: 'legend' } ] };
return render({
repeat: {
column: ['Acceleration', 'Horsepower', 'Miles_per_Gallon'],
row: ['Miles_per_Gallon', 'Horsepower', 'Acceleration']
},
spec: {
mark: { type: 'circle' },
data: { url: cars },
params: [ brush, legend ],
encoding: {
x: { field: { repeat: 'column' }, type: 'Q' },
y: { field: { repeat: 'row' }, type: 'Q' },
color: {
condition: { test: brushAndLegend, field: 'Cylinders', type: 'O' },
value: 'grey'
},
opacity: {
condition: { test: brushAndLegend, value: 0.8 },
value: 0.1
}
},
width: 140,
height: 140
}
})
Note above the use of the resolve
property on the interval
selection. The default setting of 'global'
indicates that across all plots only one brush can be active at a time. However, in some cases we might want to define brushes in multiple plots and combine the results. If we use resolve: 'union'
, the selection will be the union of all brushes: if a point resides within any brush it will be selected. Alternatively, if we use resolve: 'intersect'
, the selection will consist of the intersection of all brushes: only points that reside within all brushes will be selected.
Try changing the resolve
parameter to 'union'
and 'intersect'
and see how it changes the resulting selection logic.
Cross-Filtering
The brushing & linking examples we’ve looked at all use conditional encodings, for example to change opacity values in response to a selection. Another option is to use a selection defined in one view to filter the content of another view.
Let’s build a collection of histograms for the flights
dataset: arrival delay
(how early or late a flight arrives, in minutes), distance
flown (in miles), and time
of departure (hour of the day). We’ll use the repeat
operator to create the histograms, and add an interval
selection for the x
axis with brushes resolved via intersection.
In particular, each histogram will consist of two layers: a gray background layer and a blue foreground layer, with the foreground layer filtered by our intersection of brush selections. The result is a cross-filtering interaction across the three charts!
Drag out brush intervals in the charts below. As you select flights with longer or shorter arrival delays, how do the distance and time distributions respond?
By cross-filtering you can observe that delayed flights are more likely to depart at later hours. This phenomenon is familiar to frequent fliers: a delay can propagate through the day, affecting subsequent travel by that plane. For the best odds of an on-time arrival, book an early flight!
The combination of multiple views and interactive selections can enable valuable forms of multi-dimensional reasoning, turning even basic histograms into powerful input devices for asking questions of a dataset!
Summary
For more information about the supported interaction options in Vega-Lite, please consult the Vega-Lite interactive selection documentation. For example, you might want to use customized event handlers to compose multiple interaction techniques or support touch-based input on mobile devices.
Interested in learning more?
- The selection abstraction was introduced in the paper Vega-Lite: A Grammar of Interactive Graphics, by Satyanarayan, Moritz, Wongsuphasawat, & Heer.
- The PRIM-9 system (for projection, rotation, isolation, and masking in up to 9 dimensions) is one of the earliest interactive visualization tools, built in the early 1970s by Fisherkeller, Tukey, & Friedman. A retro demo video survives!
- The concept of brushing & linking was crystallized by Becker, Cleveland, & Wilks in their 1987 article Dynamic Graphics for Data Analysis.
- For a comprehensive summary of interaction techniques for visualization, see Interactive Dynamics for Visual Analysis by Heer & Shneiderman.
- Finally, for a treatise on what makes interaction effective, read the classic Direct Manipulation Interfaces paper by Hutchins, Hollan, & Norman.
Appendix: On The Representation of Time
Earlier we observed a small bump in the number of movies in either 1969 and 1970. Where does that bump come from? And why 1969 or 1970? The answer stems from a combination of missing data and how your computer represents time.
Internally, dates and times are represented relative to the UNIX epoch, in which time “zero” corresponds to the stroke of midnight on January 1, 1970 in UTC time, which runs along the prime meridian. It turns out there are a few movies with missing (null
) release dates. Those null
values get interpreted as time 0
, and thus map to January 1, 1970 in UTC time. If you live in the Americas – and thus in “earlier” time zones – this precise point in time corresponds to an earlier hour on December 31, 1969 in your local time zone. On the other hand, if you live near or east of the prime meridian, the date in your local time zone will be January 1, 1970.
The takeaway? Always be skeptical of your data, and be mindful that how data is represented (whether as date times, or floating point numbers, or latitudes and longitudes, etc.) can sometimes lead to artifacts that impact analysis!