A Brave New World
Laboratory 4: Data Mining and Visualization
Due Thursday, October 31.
This week is all about how we can use
computers, and the data visualization techniques they enable, to see,
understand and explore raw data.
You'll have a much better understanding of the power of data
visualization after seeing it in use and exploring some existing
visualizations. Then, you'll have a
chance to create and share your own visualizations.
After completing this assignment, you'll
write a blog post on the course blog with a sample visualization you created.
(More details below.)
Part 1: The Power of Visualization
To show how visualization techniques can improve the way we present and talk
about data, we'll have you watch a skilled presenter use an interactive tool to
present some data, and then answer some questions about data using the same
tool.
1. Watch this TED talk by Hans Rosling. Pay
attention to how he uses the data visualization tools he has available to show
data changing over time, to track and follow points of interest, and to
emphasize the points he wants to make in his talk.
2. Visit http://www.gapminder.org to play
around with the same tool used by Hans in his talk.
On the home page, on the left, under
`Explore' (also under Gapminder World / Open Graph Menu / Climate),
you'll see several sample visualizations available for
exploration. Click the link to view CO2 emissions since 1820. Study the axes,
and note that the X-axis uses a logarithmic scale. Try changing it to linear
and back, and note the difference. You can change what the axes represent (you
can always go back to the original visualization). You can also change the
playback speed so that you can see what's going on more easily, and you can
pause the visualization at any time. Now find out which country had the
greatest *yearly* CO2 emission in 1955 (hint: this is represented by the size
of the circle). Call it country A. Next, see which country had the greatest
yearly CO2 emission in 2010. Call it country B. In which year did country B
surpass country A in yearly emission? Now look at emission per person (Y-axis):
how do country A and country B compare? Can you think of a reason? Hint:
note what the X-axis represents.
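The comparisons asked for above can also be written down as simple arithmetic. The following is a minimal Python sketch, not part of the Gapminder tool: the function names and all of the numbers are made up for illustration and are not Gapminder data. It shows where a value lands on a logarithmic versus a linear axis, how to find the first year in which one series overtakes another, and how a yearly total is turned into a per-person figure.

```python
import math

# Invented yearly totals (arbitrary units) for two placeholder countries.
# These values are NOT Gapminder data; they only illustrate the arithmetic.
YEARS = [1, 2, 3, 4]
A_TOTALS = [50, 55, 60, 65]
B_TOTALS = [20, 40, 70, 90]


def linear_position(value, lo, hi):
    """Fraction (0 to 1) of the way along a linear axis running from lo to hi."""
    return (value - lo) / (hi - lo)


def log_position(value, lo, hi):
    """Fraction (0 to 1) of the way along a base-10 logarithmic axis from lo to hi."""
    return (math.log10(value) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))


def first_year_surpassed(years, a_totals, b_totals):
    """Return the first year in which B's total exceeds A's, or None if it never does."""
    for year, a, b in zip(years, a_totals, b_totals):
        if b > a:
            return year
    return None


def per_person(total, population):
    """Convert a country's total into a per-person figure."""
    return total / population


if __name__ == "__main__":
    # The same value sits mid-axis on a log scale but near the left edge on a linear one.
    print(log_position(1000, 100, 10000), linear_position(1000, 100, 10000))
    print(first_year_surpassed(YEARS, A_TOTALS, B_TOTALS))
    # A larger total can still mean a smaller per-person figure.
    print(per_person(60, 3), per_person(70, 14))
```

On a logarithmic axis, each factor of ten takes up the same amount of space, which is why small and large values can be read on the same graph.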
Next, click the Open graph menu button in the upper left of the page to explore
other visualizations on your own.
Pick one or more graphs that interest you and explore them.
You should be able to rewind and play data evolving over time,
and you can select data points of interest so you can see the `history' over
time for that particular data point.
Make a note of at least two interesting
facts or observations, which you will include in your blog post. Include the
name of the Gapminder data set you used to discover them.
Part 2: Online Visualization Tools
There are online tools available to help
users upload their own data and create visualizations for it. We will be using a tool called Many
Eyes to explore data and create visualizations.
Many Eyes is an online tool designed to
help users collaboratively explore the potential of data visualization to spark
insight. It allows users to upload
their own data sets, from which they can create and share visualizations using
the tools supported by Many Eyes. Check out this
article about it from the New York Times.
Visit Many Eyes. You may want to start by checking out
the `quick start' page.
On the left-hand side, under `Explore', click `Data Sets'. Here you will see
many different data sets and the existing visualizations that have been created
for them. There are a great many data sets, and anybody can upload their own
data. As a result, the vast majority of the data sets are of poor quality: too
small, not real, or lacking informative titles, for example. Spend a few
minutes looking for a data set that looks good. You might find it helpful to
sort the data sets by rating: the ones with the highest ratings tend to be of
higher quality (although not always). You can also check out this list
of data set suggestions. A final possibility is for you to upload your own
data set.
Find a data set that has at least 2 existing visualizations.
You can tell how many visualizations a data set has by counting the
number of icons under `visualizations of this data set' in the bottom
right-hand corner of the page.
Explore the visualizations. Once you have found a data set with
multiple existing visualizations, play with them. The visualizations are
interactive: you can use them to highlight certain types of data or change the
axes. Try to see how some visualizations are
more effective than others at making a particular point.
Make your own visualizations. Now,
find another data set on Many Eyes that has at most one existing visualization.
Look over the raw data to understand it
better before creating your own.
Then do the following:
Create two new visualizations of two different data sets. Try out different
visualization styles. Click on the `visualize' button to create your own
visualization. For each one, you will need to select a style of visualization
from the list provided and fine-tune it by changing how the data is presented,
for example by flipping the axes or selecting particular data series to
display.
Share one of your visualizations on the blog.
Of the visualizations you've created, select the one that you think most
effectively conveys the meaning of the data and include it in your blog post.
Be sure to include a description of the data and the meaning your visualization
conveys. First you must publish your visualization by clicking the `publish'
button at the bottom. Then you can click the `share this' button and copy the
code for either the live visualization or the static image and paste it into
your blog post.
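Before fine-tuning a visualization, it can help to picture what "flipping the axes" and "selecting particular data series" do to the underlying table. The sketch below is only one possible reading of those two operations, written in plain Python rather than in Many Eyes itself. The table, its labels, and the function names are hypothetical and invented for illustration.

```python
# Hypothetical table: one row per region, one column per period.
# All labels and values are invented for illustration.
HEADER = ["region", "P1", "P2", "P3"]
ROWS = [
    ["North", 10, 12, 15],
    ["South", 7, 9, 8],
]


def flip_axes(header, rows):
    """Transpose the table so that the former columns become rows."""
    table = [header] + rows
    flipped = [list(column) for column in zip(*table)]
    return flipped[0], flipped[1:]


def select_series(header, rows, wanted):
    """Keep the label column plus only the named data columns."""
    keep = [0] + [header.index(name) for name in wanted]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


if __name__ == "__main__":
    print(flip_axes(HEADER, ROWS))
    print(select_series(HEADER, ROWS, ["P3"]))
```

After flipping, each period becomes a row and each region becomes a column, so a chart that previously compared periods within a region would instead compare regions within a period.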
Part 3: Blogging
To complete this lab, we ask that you
make a short posting on the course blog.
Your blog entry should be posted using the `Lab 4' category.
List some of the insights you gained from exploring the Gapminder data sets.
Tell us what data set you explored, what you found, and what
was interesting about it.
Post the visualization you created, along with a few sentences describing the
data and what you think the visualization highlights. You can post the
visualization by clicking `share this' under the visualization and then
copying and pasting the code provided into the blog entry.
Click `Publish' to post your blog entry. You
can also click `Save Draft' to save a draft of your post before you are ready
to publish it.