
Analyzing Quantitative Data

Lauren Bricker

CSE 340 Spring 23

Slide 1 of 42

Today's Agenda

  • Administrivia
    • Thank you to Thevina for covering for me!
    • If you can't complete Menus with your custom menu by Fri 5-May, reach out on Ed and we'll send you an .apk that you can load and test with.
  • Questions on Menus experiment design
  • Analyzing data
    • Dependent and independent variables
    • Discuss how we determine causality
    • Practice onboarding participants
    • Practice data analysis
Slide 2 of 42

Reminder on Academic Conduct

  • You are expected to turn in your own work that demonstrates your understanding
    • You may not copy your code or reflection answers directly from another student or any other source on the internet.
    • You may not use/copy any part of your course work from AI assistance tools such as GitHub Copilot, ChatGPT, etc.
    • You will be asked, in each reflection, to acknowledge any help you received from another person or internet resource. It is also good practice to cite your sources with a comment in your code.
    • Students are expected not to share code/solutions with the broader public, and not to plagiarize or cheat, as described in the Allen School conduct guidelines.
Slide 3 of 42

Determining Causality

thinking theoretically...

Slide 4 of 42

Dependence/independence

Events that are independent

  • Flipping heads and then tails
  • Day of week and whether a patient had a heart attack (probably?)

Events that are dependent

  • Vice presidential candidate and presidential nominee
  • Diagnostic test being positive and whether patient has a disease
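
To make the idea of independence concrete, here is a minimal sketch (not part of the original demo) that enumerates two fair coin flips and checks the usual definition: two events are independent when P(A and B) equals P(A) times P(B). The event names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair coin flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))


def probability(event):
    """Exact probability of an event (a predicate over outcomes)."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


def first_is_heads(outcome):
    return outcome[0] == "H"


def second_is_tails(outcome):
    return outcome[1] == "T"


p_a = probability(first_is_heads)
p_b = probability(second_is_tails)
p_both = probability(lambda o: first_is_heads(o) and second_is_tails(o))

# Independent events satisfy P(A and B) == P(A) * P(B).
is_independent = p_both == p_a * p_b
print(p_a, p_b, p_both, is_independent)
```

For dependent events (such as a diagnostic test and the disease it detects) the same check would fail, because knowing one outcome changes the probability of the other.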
Slide 5 of 42

What are some dependent and independent relationships in the Menus study?

Slide 6 of 42

Think about a coin flip: one flip does not depend on another. What if we collect this kind of data, what might be true about it?

What might we expect to see is true of dependent variables?

  • When one changes, the other changes at the same rate
  • When one changes, the other changes at a faster rate
  • When one changes, the other does the opposite
Slide 7 of 42

Called a correlation. We can see it in a scatter plot. Go look at the data and make one.

This is called Correlation

We can see it in a scatterplot

Example of correlations for height vs age (positively correlated) and
height vs birth month (uncorrelated)
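
A scatterplot shows correlation visually; the Pearson correlation coefficient puts a number on it (near +1 or -1 for a strong linear relationship, near 0 for none). The sketch below is one way to compute it by hand. The ages, heights, and birth months are made-up illustrative values, not real measurements.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data for illustration only.
ages = [2, 4, 6, 8, 10, 12]
heights_cm = [86, 102, 115, 128, 138, 149]
birth_months = [7, 1, 11, 3, 9, 5]

r_age = pearson_r(ages, heights_cm)            # strongly positive
r_month = pearson_r(birth_months, heights_cm)  # close to zero
print(round(r_age, 3), round(r_month, 3))
```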

Slide 8 of 42

Correlation (active) Demo

Active demo:

Data courtesy of Deepti Ramani from a CSE 163 assignment

  • Based on the TikTok API and Spotify API
  • Data gathered monthly from Jan 2020 - Mar 2021
    • The rank is the ranking (top 100) of the audio on TikTok for that month
    • No Spotify data for many songs.
Slide 9 of 42

Correlation demo

TikTok vs Spotify data set

  • Select two columns (use command/control click to select 2 columns)
  • Use Insert->Chart - Make sure this is a scatterplot
Slide 10 of 42

Correlation of number of TikTok views vs. rank

Is there a correlation?

Slide 11 of 42

Correlation != Causation

Correlation of number of people who drowned per year and films
Nicolas Cage appeared in

Slide 12 of 42

Correlation != Causation

Two people talking. Says one: I used to think correlation
implied causation. Then I took a statistics
course. Now I don't. Says the other: Sounds like the class
helped. Says the first: Well, maybe.

XKCD

Slide 13 of 42

Filtering for analysis

Select column B, then select Data->Create A Filter. Click on the newly shown green downward facing triangle, then select a month (like June 2020). This filters so that only the data from June 2020 is shown.

Create a Rank vs Views and Videos chart with this filtered data (columns E, G, I)
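
The same filter step can be sketched outside the spreadsheet. This is a rough equivalent only: the rows and the column names (`month`, `rank`, `views`) are hypothetical stand-ins, not the actual headers of the data set.

```python
# Made-up rows standing in for the spreadsheet; column names are hypothetical.
rows = [
    {"month": "2020-05", "rank": 1, "views": 9_000_000},
    {"month": "2020-06", "rank": 1, "views": 12_000_000},
    {"month": "2020-06", "rank": 2, "views": 7_500_000},
    {"month": "2020-07", "rank": 1, "views": 10_000_000},
]


def filter_by_month(rows, month):
    """Keep only the rows whose month matches, like a spreadsheet filter."""
    return [row for row in rows if row["month"] == month]


june_rows = filter_by_month(rows, "2020-06")

# (rank, views) pairs, ready to plot as a scatterplot.
rank_vs_views = [(row["rank"], row["views"]) for row in june_rows]
print(rank_vs_views)
```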

Slide 14 of 42

It might be helpful for this chart to use a log scale for the vertical axis. In the chart editor, select the Customize tab, then open up the Vertical Axis. Toggle the Log Scale checkbox.

TikTok vs

Slide 14 of 42

Grouping for analysis

TikTok vs Spotify data set

Pivot Tables can help to sort out a lot of data. Example

  • Select Insert->Pivot Table and make it go on a new sheet
  • Values: add Number of TikTok Views, and Number of TikTok Videos
    • try changing from Sum to Average
  • Chart the resulting table
  • Try changing to a line chart (use the hamburger menu on the chart to edit the chart)
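
A pivot table is essentially "group rows by a key, then summarize each group". The sketch below shows that idea with an average per month. The rows and column names are made up for illustration and do not match the real sheet.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows; column names are hypothetical.
rows = [
    {"month": "2020-05", "views": 100, "videos": 10},
    {"month": "2020-05", "views": 300, "videos": 30},
    {"month": "2020-06", "views": 200, "videos": 40},
]


def average_by_month(rows, column):
    """Group rows by month and average one column, like a pivot table."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["month"]].append(row[column])
    return {month: mean(values) for month, values in sorted(groups.items())}


avg_views = average_by_month(rows, "views")
avg_videos = average_by_month(rows, "videos")
print(avg_views, avg_videos)
```

Swapping `mean` for `sum` mirrors changing the pivot table from Average back to Sum.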
Slide 15 of 42

AVERAGE of Number of TikTok views and AVERAGE of Number of TikTok videos per month

Slide 15 of 42


Grouping and charting

What do you see in the difference between tasks on each of these menu types?

Image of bar chart comparing tasks to menu type showing that
normal menus get progressively slower as items become nonlinear while
pie menus are about the same

Slide 16 of 42

Charting gives you a place to start and suggests what questions you might want to ask.

Slide 16 of 42

Grouping and charting helps you check your assumptions

23sp Sample data

Notes

  • This is based on only one person doing all three sessions.
  • You will likely get better results with your data because you're testing with 3+ distinct users
  • We will get even better results with ALL of the data merged together
    • (Remember to turn in your CSV to the Canvas assignment so we can combine)
Slide 17 of 42

Data Collection

Image of bar chart comparing tasks to menu type showing that
normal menus get progressively slower as items become nonlinear while
pie menus are about the same

Click on the Example Chart tab. Here you can

  • Analyze and chart data: Simple Statistics
    • Min, Max, Mean (Sum/#), Median (Middle #), Mode (Most Common #)

Demo

Do this for speed and error.
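
For reference, the simple statistics listed above can be sketched with Python's standard library. The selection times are made-up values for a single condition, not data from the study.

```python
import statistics

# Made-up selection times in seconds for one condition.
times = [0.52, 0.48, 0.61, 0.48, 0.95, 0.55, 0.48]

summary = {
    "min": min(times),
    "max": max(times),
    "mean": statistics.mean(times),      # sum / count
    "median": statistics.median(times),  # middle value after sorting
    "mode": statistics.mode(times),      # most common value
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Note how the single slow trial (0.95) pulls the mean above the median; that is one reason to look at more than one summary number.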

Slide 18 of 42

Comparing groups to see if they are different

Bar charts are not enough to assess a difference, though. We need to see the distribution.

A distribution describes how the people we are studying are spread over a specific variable we care about.

A histogram graphically shows a distribution.
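
Under the hood, a histogram just counts how many values fall into each bin. Here is a minimal text-only sketch of that idea; the times (in milliseconds) and the bin width are made up for illustration.

```python
def histogram(values, bin_width, start=0):
    """Count how many values fall into each fixed-width bin.

    Returns a dict mapping bin index to count; bin i covers
    [start + i * bin_width, start + (i + 1) * bin_width).
    """
    counts = {}
    for value in values:
        index = (value - start) // bin_width
        counts[index] = counts.get(index, 0) + 1
    return counts


# Made-up selection times in milliseconds.
times_ms = [420, 480, 510, 550, 580, 610, 630, 720, 950]
counts = histogram(times_ms, bin_width=200, start=400)

for index in sorted(counts):
    low = 400 + index * 200
    print(f"{low}-{low + 200} ms | {'#' * counts[index]}")
```

The bar lengths show the shape of the distribution: most trials cluster at the fast end, with one slow outlier.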

Slide 19 of 42

Histogram shows you a distribution

TikTok vs Spotify data set

Pivot table

  • Rows are TikTok rank
  • Values are
    • Acousticness
    • Danceability
    • Energy
    • (pick your own!)

Select Histogram

Histogram of AVERAGE of Acousticness (confidence 0.0 - 1.0), AVERAGE of Danceability (0.0 - 1.0), and AVERAGE of Energy (0.0 - 1.0)

Slide 20 of 42


But having the right chart matters

Histogram of data about speed of pie menu selection and linear
menu selection for each task type

Slide 21 of 42

Normal Vs Pie

Normal: Histogram of data about speed of pie menu selection and linear menu selection for each task type
Pie: Histogram of data about speed of pie menu selection and linear menu selection for each task type

The cause of the difference only shows here.

Slide 22 of 42

What do we learn from a histogram?

two distributions

Slide 23 of 42

Histograms

What do we learn from a histogram?

  • shows a distribution
  • helps us tell if things are INDEPENDENT
Slide 24 of 42

Comparing two groups

two distributions showing overlapping 95% confidence intervals

Slide 25 of 42

Comparing two groups

two distributions with less overlapping intervals (because more
data is present)
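
Why does more data shrink the intervals? A rough sketch, assuming the normal approximation (z = 1.96) for a 95% confidence interval around a mean: the half-width is proportional to 1 / sqrt(n), so quadrupling the sample size halves the interval. The standard deviation and sample sizes below are made up.

```python
import math


def ci95_half_width(sample_stdev, n):
    """Approximate half-width of a 95% confidence interval for a mean.

    Uses the normal approximation (z = 1.96); for small n, a
    t-distribution critical value would give a somewhat wider interval.
    """
    return 1.96 * sample_stdev / math.sqrt(n)


small_sample = ci95_half_width(sample_stdev=0.2, n=10)
large_sample = ci95_half_width(sample_stdev=0.2, n=40)
print(round(small_sample, 4), round(large_sample, 4))
```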

Slide 26 of 42

Comparing two groups

two distributions with more data but overlapping histograms
thus hard to differentiate

Slide 27 of 42

Comparing two groups

two distributions with less overlapping intervals (because more
data is present)

Slide 28 of 42

Comparing two groups

two distributions not overlapping at all

Slide 29 of 42

Comparing two groups

same two distributions with difference between means marked as
effect size
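
Effect size as marked on the slide is the difference between the two group means. The sketch below computes that, plus Cohen's d, a common standardized version (the mean difference divided by the pooled standard deviation). Cohen's d is an addition here for illustration and is not named in the slides; the times are made up.

```python
import math
import statistics


def mean_difference(a, b):
    """Raw effect size: difference between the two group means."""
    return statistics.mean(a) - statistics.mean(b)


def cohens_d(a, b):
    """Standardized effect size using the pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    return mean_difference(a, b) / math.sqrt(pooled_var)


# Made-up selection times in seconds.
normal_times = [0.80, 0.85, 0.90, 0.78, 0.82]
pie_times = [0.50, 0.45, 0.52, 0.48, 0.47]

diff = mean_difference(normal_times, pie_times)
d = cohens_d(normal_times, pie_times)
print(round(diff, 3), round(d, 2))
```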

Slide 30 of 42

Common Statistical Test for comparison: t-test

Tests for difference between two samples

Best used to determine what is ‘worthy of a second look’

Limited in its applicability to normal, independent data

Does not help to document effect size [the actual difference between groups], just effect likelihood
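
As a sketch of what a t-test computes, here is Welch's version of the t statistic (which does not assume equal variances) with its approximate degrees of freedom. This is one common variant, not necessarily the one used in the course spreadsheet. Turning t into a p-value requires the t distribution, which is left to a statistics package or the spreadsheet.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    # Squared standard error of each sample mean.
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Made-up samples for illustration.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_stat, dof)
```

A larger absolute t means the difference in means is large relative to the noise; it says nothing by itself about how big the difference is in practical terms.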

Slide 31 of 42

Problems with t-tests

The more implausible the hypothesis, the greater chance that it is a ‘false alarm’

A picture of the likelihood of a positive t-test as influenced by
priors (the expected likelihood of an outcome)

Slide 32 of 42

Top row: prior (what's known to be true before the experiment). Bottom row: calculated p-value.

Notice the middle column, where something that is a toss-up has higher plausibility than we would expect. This is a "Type I error".

Alternatively, a small sample may cause a Type II error (failure to detect a true difference) due to random sampling error.

18 separate comparisons (3x3 conditions, 2 measures)

ANOVA (Analysis Of Variance): Fancy t-test that accounts for the whole group effect before doing pairwise comparisons
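
To give a feel for what ANOVA computes, here is a sketch of the F statistic for a one-way ANOVA: variation between group means divided by variation within groups. This is a simplification; the Menus study has two factors (menu and task), which calls for a two-way analysis. The groups below are made up.

```python
import statistics


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups of numbers."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of each observation around its own group mean.
    ss_within = sum(
        (value - statistics.mean(group)) ** 2 for group in groups for value in group
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up groups for illustration.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_stat)
```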

Slide 32 of 42
  • Doesn’t take prior knowledge into account
  • Susceptible to ‘data dredging’
    • The more tests you conduct the more likely you will find a result even if one is not there
    • Adjustments mid experiment
  • Gives a yes or no answer: Either the null hypothesis is rejected (the result would be unlikely in a world where the null hypothesis was true) or it cannot be rejected
  • Based on assumed ‘average’ sample
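
The data dredging concern can be quantified. Assuming the tests are independent and each uses a threshold of 0.05 (both assumptions made here for illustration), the chance of at least one false alarm grows quickly with the number of tests. The Bonferroni correction shown is one simple, conservative adjustment and is not named in the slides.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false alarm, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_alpha(alpha, num_tests):
    """Per-test threshold that keeps the overall false-alarm rate near alpha."""
    return alpha / num_tests


# 18 comparisons, as in 3x3 conditions with 2 measures.
fwer = familywise_error_rate(0.05, 18)
per_test_alpha = bonferroni_alpha(0.05, 18)
print(round(fwer, 3), round(per_test_alpha, 5))
```

With 18 comparisons the false-alarm chance is roughly 0.60, which is why a single omnibus test such as ANOVA is run before pairwise comparisons.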

Demo of t-tests in our spreadsheet

23wi Sample data

Slide 33 of 42

Document all of this in your report

Speed Results

Describe your thoughts about overall speed in different conditions. Use at least one chart to illustrate what you say. Here is an example chart generated using our data. When you paste your data into the spreadsheet, you’ll see that it updates to reflect your data.

Error Results

Describe what happened in terms of errors -- provide at least one chart showing what you learned about errors in different conditions

Slide 34 of 42

Document all of this in your report

  • Describe your hypothesis
  • Illustrate with graphs
  • Optional: use Table of results found in Speed Analysis and Error Analysis to describe Statistical Significance:

Pie menus were nearly twice as fast as normal menus (M=.48s vs M=.83s), F(1,43)=295.891, p < .05. Unclassified menu items were harder to find than linear and relative ones (M=.84s, .59s, and .59s respectively), F(2,43)=93.778, p < .05. We also found an interaction effect between menu and task (as illustrated in the chart above), F(5, 43) = 51.945, p < .001.

Slide 35 of 42

Using the right chart matters

Histogram of data about speed of pie menu selection and linear
menu selection for each task type

Normal: Histogram of data about speed of pie menu selection and linear menu selection for each task type
Pie: Histogram of data about speed of pie menu selection and linear menu selection for each task type
Slide 36 of 42

Can we determine causality?

We'd like to be able to argue the task/menu type influenced the speed and error results.

Slide 37 of 42

This implies dependence between speed/error, task, and menu type

Slide 38 of 42

And it assumes you have measured the right variables!

Slide 39 of 42

Drawing Conclusions

Diagram of the study pipeline: Study Design, Run Study, Clean and Prep, Analysis, Conclusions. Annotations: Hypothesis: decreased seek time and errors; 3 menus x 3 task conditions; Data.
  • Describe your hypothesis
  • Illustrate with graphs
  • Optional: Statistical Significance

Draw Conclusions

  • Were there fewer errors?
  • Was selection time shorter?

Describe your conclusions. Do you think we should use pie menus more? What can we conclude from your data?

Slide 40 of 42

Limitations of Laboratory Studies

Slide 41 of 42

Simulate real world environments

  • Location and equipment may be unfamiliar to participant [Coyne & Nielsen 2001]
  • Observation may affect performance - “Hawthorne Effect” [Mayo 1933]
  • Participant may become fatigued and not take necessary rest - “Demand Effect” [Orne 1962]
  • Tasks frequently artificial and repetitive, which may bore participants and negatively affect performance

Studying real world use removes these limitations

Slide 42 of 42
