A Brave New World

Laboratory 4: Data Mining and Visualization

Due Thursday, October 31.

This week is all about how we can use computers, and the data visualization techniques they enable, to see, understand and explore raw data.  You'll have a much better understanding of the power of data visualization after seeing it in use and exploring some existing visualizations.  Then, you'll have a chance to create and share your own visualizations.

After completing this assignment, you'll write a blog post on the course blog with a sample visualization you created. (More details below.)

 

Part 1: The Power of Visualization

To show how much visualization techniques can improve the way we present and talk about data, you will first watch a skilled presenter use an interactive tool to present data, and then answer some questions about data using the same tool.

1. Watch this TED talk by Hans Rosling. Pay attention to how he uses the data visualization tools available to him to show data changing over time, to track and follow points of interest, and to emphasize the points he wants to make in his talk.

2. Visit http://www.gapminder.org to experiment with the same tool Hans Rosling used in his talk.

On the home page, on the left under `Explore' (also available under Gapminder World / Open Graph Menu / Climate), you'll see several sample visualizations. Click the link to view CO2 emissions since 1820. Study the axes, and note that the X-axis uses a logarithmic scale. Try switching it to linear and back again, and note the difference. You can also change what each axis represents (you can always return to the original visualization). To make it easier to follow what is happening when you play the visualization, you can slow down the playback speed, and you can pause the visualization at any time.

Now find out which country had the greatest *yearly* CO2 emission in 1955 (hint: this is represented by the size of the circle). Call it country A. Next, find which country had the greatest yearly CO2 emission in 2010; this will be country B. In which year did country B surpass country A in yearly emission? What about emission per person (the Y-axis): what is the relationship between country A and country B? Can you think of a reason? Hint: note what the X-axis represents.
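The difference between the two scales comes down to how a value is mapped to a position along the axis. The short Python sketch below is only an illustration of that idea: the function name `axis_position` and the income values are hypothetical, chosen for the example, and are not Gapminder data or part of the Gapminder tool.

```python
import math

def axis_position(value, scale):
    """Return where a value lands along an axis, in arbitrary axis units.

    scale="linear": position grows in proportion to the value itself.
    scale="log": position grows in proportion to log10(value), so every
    tenfold increase moves the point the same distance along the axis.
    """
    if scale == "linear":
        return value
    if scale == "log":
        return math.log10(value)
    raise ValueError(f"unknown scale: {scale}")

# Hypothetical income-per-person values, for illustration only
# (these are NOT Gapminder figures).
incomes = [500, 5000, 50000]

for scale in ("linear", "log"):
    positions = [axis_position(x, scale) for x in incomes]
    # Distance between each pair of neighbouring points on this axis.
    gaps = [round(b - a, 2) for a, b in zip(positions, positions[1:])]
    print(scale, "gaps between neighbouring points:", gaps)
```

On the linear axis the second gap is ten times the first, so low-income countries get squeezed together near the origin; on the logarithmic axis both gaps are equal, which is why it is easier to compare countries whose incomes differ by orders of magnitude.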

Next, click the `Open Graph Menu' button in the upper left of the page to explore other visualizations on your own. Look at one or more graphs that interest you and explore their visualizations. You should be able to rewind and replay data evolving over time, and you can select data points of interest to see the `history' of a particular data point over time.

Make a note of at least two interesting facts or observations you make, which you will include in your blog post. Include the name of the Gapminder data set you used to discover these facts or observations.

 

Part 2: Online Visualization Tools

There are online tools available to help users upload their own data and create visualizations for it.  We will be using a tool called Many Eyes to explore data and create visualizations. 

Many Eyes is an online tool designed to help users collaboratively explore the potential of data visualization to spark insight.  It allows users to upload their own data sets, from which they can create and share visualizations using the tools supported by Many Eyes. Check out this article about it from the New York Times.

Visit Many Eyes. You may want to start by checking out the `quick start' page.

On the left-hand side, under `Explore', click `Data Sets'. Here you will see many different data sets and the existing visualizations that have been created for them. Note that there are a great many data sets, and anybody can upload their own data. As a result, the vast majority of the data sets are of poor quality: too small, not real, lacking informative titles, and so on. Spend a few minutes looking for a data set that looks promising. You might find it helpful to sort the data sets by rating; the highest-rated ones tend to be of higher quality, though not always. You can also check out this list of data set suggestions. A final possibility is to upload your own data set.

Find a data set that has at least two existing visualizations. You can tell how many visualizations a data set has by counting the icons under `visualizations of this data set' in the bottom right-hand corner of the page.

Explore the visualizations. Once you have found a data set with multiple existing visualizations, experiment with them. You can interact with these visualizations, for example to highlight certain types of data or to change the axes. Try to see how some visualizations are more effective than others at making a particular point.

Make your own visualizations. Now find data sets on Many Eyes that have zero or one visualizations. Look over the raw data to understand it better before creating your own visualizations. Then do the following:

Create two new visualizations of two different data sets, and try out different visualization styles. Click the `visualize' button to create your own visualization. For each one, you will need to select a visualization style from the list provided, and then fine-tune it by adjusting how the data is presented, for example by flipping the axes or selecting particular data series to display.

Share one of your visualizations on the blog. Of the visualizations you've created, select the one that you think most effectively conveys the meaning of the data, and include it in your blog post. Be sure to include a description of the data and the meaning your visualization conveys. First, you must `publish' your visualization by clicking the publish button at the bottom. Then click the `share this' button, and copy and paste the code for either the live visualization or the static image into your blog post.

 

Part 3: Blogging  

To complete this lab, we ask that you make a short posting on the course blog.

Your blog entry should be posted using the `Lab 4' category.

List some of the insights you gained from exploring the Gapminder data sets. Tell us which data set you explored, what you found, and what was interesting about it.

Post the visualization you created, along with a few sentences describing the data and what you think the visualization highlights. You can post the visualization by clicking `share this' under the visualization and then copying and pasting the code provided into the blog entry.

Click `Publish' to post your blog entry. You can also click `Save Draft' to save a draft of your post before you are ready to publish it.