Running a Quantitative Study

name: inverse
layout: true
class: center, middle, inverse
---
# Running a Quantitative Study

Jennifer Mankoff
CSE 340 Spring 2019 
---
layout:false

[//]: # (Outline Slide)
.title[Today's goals]
.body[
- Discuss steps of running a study 
- Practice onboarding participants
- Practice data analysis
- Go over Assignment 1
]
---
.left-column[
## Experiment Design
]
.right-column[

<div class="mermaid">
graph LR
S((.)) --> Hypothesis((Hypothesis))
Hypothesis -- "Study Design" --> Method((Method))
Method -- "Run Study" --> Data((Data))
Data -- "Clean and Prep" --> Analysis((Analysis))
Analysis --> Conclusions((Conclusions))

classDef finish outline-style:double,fill:#d1e0e0,stroke:#333,stroke-width:2px;
classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px;
classDef start fill:#d1e0e0,stroke:#333,stroke-width:4px;
classDef invisible fill:#FFFFFF,stroke:#FFFFFF,color:#FFFFFF

linkStyle 0 stroke-width:4px;
linkStyle 1 stroke-width:4px;
linkStyle 2 stroke-width:4px;
linkStyle 3 stroke-width:4px;
linkStyle 4 stroke-width:4px;

class S invisible
class Hypothesis,Conclusions start
class Method,Data,Analysis normal
</div>

What is the Hypothesis for the Menus assignment?
]
---
.left-column[
## Method
]
.right-column[

<div class="mermaid">
graph LR
S((.)) --> Hypothesis((Hypothesis:<br>Decreased seek <br>time and errors))
Hypothesis -- "Study Design" --> Method((Method))
Method -- "Run Study" --> Data((Data))
Data -- "Clean and Prep" --> Analysis((Analysis))
Analysis --> Conclusions((Conclusions))

linkStyle 0 stroke-width:4px;
linkStyle 1 stroke-width:4px;
linkStyle 2 stroke-width:4px;
linkStyle 3 stroke-width:4px;
linkStyle 4 stroke-width:4px;

class S invisible
class Hypothesis,Conclusions start
class Method,Data,Analysis normal
</div>

3 tasks x 2 menu types = 6 *conditions*:

| | Normal | Pie |
|--|--|--|
| **Linear** |  |  |
| **Relational** |  |   |
| **Unclassified** |  |  |

- In each *condition* we test  `ITEM_MAX` different menu items
- For each menu item, we repeat `NUM_REPEATS` times
]

---
.left-column[
## Other Method considerations
]
.right-column[
An experimental *session* consists of 3 tasks x 2 menu types x
`ITEM_MAX` items x `NUM_REPEATS` repetitions = 72 *trials* in your homework

You have to run four participants through a complete session: 72 x 4
= 288 data points.
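
The arithmetic above can be checked with a short sketch. The values given to `ITEM_MAX` and `NUM_REPEATS` here are assumptions for illustration: the slide only implies that their product is 12, so check the constants in your own starter code.

```python
# Design constants. ITEM_MAX and NUM_REPEATS are assumed values;
# the slides only imply ITEM_MAX * NUM_REPEATS == 12.
NUM_TASKS = 3         # linear, relational, unclassified
NUM_MENU_TYPES = 2    # normal, pie
ITEM_MAX = 4          # assumed for illustration
NUM_REPEATS = 3       # assumed for illustration
NUM_PARTICIPANTS = 4

conditions = NUM_TASKS * NUM_MENU_TYPES
trials_per_session = conditions * ITEM_MAX * NUM_REPEATS
total_data_points = trials_per_session * NUM_PARTICIPANTS

print(conditions, trials_per_session, total_data_points)  # 6 72 288
```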

]

--
.right-column[
Participants do *all* trials (in some designs, participants only do some conditions)

Order of presentation of conditions and items is randomized (why?)
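
One way to picture randomized presentation is the sketch below. It is not the assignment's actual implementation: the function name `build_session`, the two-level shuffle, and the values of `ITEM_MAX` and `NUM_REPEATS` are all assumptions. A common reason for shuffling is to keep practice and fatigue from lining up with any one condition.

```python
import itertools
import random

TASKS = ["linear", "relational", "unclassified"]
MENUS = ["normal", "pie"]
ITEM_MAX = 4      # assumed for illustration
NUM_REPEATS = 3   # assumed for illustration

def build_session(rng):
    """Return every (task, menu, item) trial with conditions and items shuffled."""
    conditions = list(itertools.product(TASKS, MENUS))
    rng.shuffle(conditions)                  # randomize the order of the 6 conditions
    session = []
    for task, menu in conditions:
        items = list(range(ITEM_MAX)) * NUM_REPEATS
        rng.shuffle(items)                   # randomize item order within a condition
        session.extend((task, menu, item) for item in items)
    return session

# A seeded generator makes the order reproducible for debugging.
session = build_session(random.Random(340))
print(len(session))  # 72
```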
]
---
.left-column[
## Document all of this in your [report](/interaction/assignments/menu-report)
]
.right-column[
Introduce study purpose
`Write two sentences describing the purpose of the experiment. This
can be the same text you use in your consent form`

Introduce study method - tasks
`Describe the 6 conditions of the study. Explain how many
items were selected per menu, and how many times each item was
repeated. Describe how many trials each participant completed. This
should be at most one paragraph`
]
---
.left-column[
## Study Ethics
]
.right-column[
<div class="mermaid">
graph LR
S((.)) --> Hypothesis((Hypothesis:<br>Decreased seek <br>time and errors))
Hypothesis -- "Study Design" --> Method((2 menu x <br> 3 task conditions ))
Method -- "Run Study" --> Data((Data))
Data -- "Clean and Prep" --> Analysis((Analysis))
Analysis --> Conclusions((Conclusions))

linkStyle 0 stroke-width:4px;
linkStyle 1 stroke-width:4px;
linkStyle 2 stroke-width:4px;
linkStyle 3 stroke-width:4px;
linkStyle 4 stroke-width:4px;

class S invisible
class Hypothesis,Conclusions start
class Method,Data,Analysis normal
</div>

Ethical principles for research with human participants were driven by
[criminal, racist, and harmful studies](https://www.nytimes.com/2017/05/22/science/social-science-research-institutional-review-boards-common-rule.html):
 - Nazi war crimes
 - Tuskegee Syphilis study
 - Epilepsy studies of institutionalized children
 - [16,000 people involuntarily included in radiation
   studies](https://www.nytimes.com/1995/08/20/us/count-of-subjects-in-radiation-experiments-is-raised-to-16000.html?module=inline) 
 - [Milgram's study of electric shocking](https://www.simplypsychology.org/milgram.html)
 - [Stanford prison experiment](https://www.simplypsychology.org/zimbardo.html)
]

---
.left-column[
## Study Ethics
]
.right-column[

<div class="mermaid">
graph LR
S((.)) --> Hypothesis((Hypothesis:<br>Decreased seek <br>time and errors))
Hypothesis -- "Study Design" --> Method((2 menu x <br> 3 task conditions ))
Method -- "Run Study" --> Data((Data))
Data -- "Clean and Prep" --> Analysis((Analysis))
Analysis --> Conclusions((Conclusions))

linkStyle 0 stroke-width:4px;
linkStyle 1 stroke-width:4px;
linkStyle 2 stroke-width:4px;
linkStyle 3 stroke-width:4px;
linkStyle 4 stroke-width:4px;

class S invisible
class Hypothesis,Conclusions start
class Method,Data,Analysis normal
</div>

Basic ethics ([Belmont Report](https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/read-the-belmont-report/index.html))
- Beneficence --> 
 - Value of research higher than risks
 - Do no harm
- Respect for Persons --> 
 - Fully informed of intent and purpose
 - Informed consent
 - May opt out at any time, for any reason
- Justice
 - equitable, representative selection of participants
]
---
.left-column[
## Study Ethics
]
.right-column[

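<div class="mermaid">
graph LR
S((.)) --> Hypothesis((Hypothesis:<br>Decreased seek <br>time and errors))
Hypothesis -- "Study Design" --> Method((2 menu x <br> 3 task conditions ))
Method -- "Run Study" --> Data((Data))
Data -- "Clean and Prep" --> Analysis((Analysis))
Analysis --> Conclusions((Conclusions))

linkStyle 0 stroke-width:4px;
linkStyle 1 stroke-width:4px;
linkStyle 2 stroke-width:4px;
linkStyle 3 stroke-width:4px;
linkStyle 4 stroke-width:4px;

class S invisible
class Hypothesis,Conclusions start
class Method,Data,Analysis normal
</div>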

Basic ethics ([Belmont Report](https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/read-the-belmont-report/index.html))
- Beneficence --> 
 - Value of research higher than risks
 - Do no harm
- Respect for Persons --> 
 - **Fully informed of intent and purpose**
 - **Informed consent**
 - **May opt out at any time, for any reason**
- Justice
 - equitable, representative selection of participants
]

---
.left-column[
## Consent
]
.right-column[
Write your [consent](/interaction/assignments/consent) form

- Purpose of study (Beneficence)
- Requirements for participation (Respect for Persons)
- Study procedures (Respect for Persons)
- Voluntariness (Respect for Persons)
- Benefits to society (Beneficence)
- Contact (typically the IRB; me in this case)

]

???
- Beneficence --> 
 - Value of research higher than risks
 - Do no harm
- Respect for Persons --> 
 - Fully informed of intent and purpose
 - Informed consent
 - May opt out at any time, for any reason
- Justice
 - equitable, representative selection of participants
---
.left-column[
## Document all of this in your [report](/interaction/assignments/menu-report)
]
.right-column[
Method

- Participants

`Describe your participants (without identifying
them). How were they recruited? How many were there? Did
they give consent? You can also add
some optional information such as: What was their average age? What
genders were present? How experienced were they with Android?`
]
---
.left-column[
## Data Collection
]
.right-column[

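<div class="mermaid">
graph LR
S((.)) --> Hypothesis((Hypothesis:<br>Decreased seek <br>time and errors))
Hypothesis -- "Study Design" --> Method((2 menu x <br> 3 task conditions ))
Method -- "Run Study" --> Data((Data))
Data -- "Clean and Prep" --> Analysis((Analysis))
Analysis --> Conclusions((Conclusions))

linkStyle 0 stroke-width:4px;
linkStyle 1 stroke-width:4px;
linkStyle 2 stroke-width:4px;
linkStyle 3 stroke-width:4px;
linkStyle 4 stroke-width:4px;

class S invisible
class Hypothesis,Conclusions start
class Method,Data,Analysis normal
</div>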

**Clear your data** file before you start the **first participant
only**

Have participant read and sign the consent form

Emphasize key points verbally

Be consistent

Download the results
- You can use the Android Studio tool window called `Device File Explorer`.

]
???
**voluntariness**

---
.left-column[
## Document all of this in your [report](/interaction/assignments/menu-report)
]
.right-column[
Method - Setting
`What device was used? Was it an emulator? Where did the experiment take place?`

Method - Data Collected
`What information was collected (time, errors, etc.)?`
]
---
.left-column[
## Data Collection

![:img Picture of a dialogue box called Import file showing that you should replace current sheet and automatically detect separator type and convert text to numbers dates and formulas,100%](img/studies/import.png)

]
.right-column[

<div class="mermaid">
graph LR
S((.)) --> Hypothesis((Hypothesis:<br>Decreased seek <br>time and errors))
Hypothesis -- "Study Design" --> Method((2 menu x <br> 3 task conditions ))
Method -- "Run Study" --> Data((Data<br>Consent<br>Consistency))
Data -- "Clean and Prep" --> Analysis((Analysis))
Analysis --> Conclusions((Conclusions))

linkStyle 0 stroke-width:4px;
linkStyle 1 stroke-width:4px;
linkStyle 2 stroke-width:4px;
linkStyle 3 stroke-width:4px;
linkStyle 4 stroke-width:4px;

class S invisible
class Hypothesis,Conclusions start
class Method,Data,Analysis normal
</div>

Select the 'raw' sheet of your spreadsheet
![:img Picture of a spreadsheet with the tab titled 'Raw' selected](img/studies/raw.png)

Load your file into the spreadsheet

]

---
.left-column[
## Data Collection

![:img Image of bar chart comparing tasks to menu type showing that
normal menus get progressively slower as items become nonlinear while
pie menus are about the same, 100%](img/studies/chart.png)
]
.right-column[

<div class="mermaid">
graph LR
S((.)) --> Hypothesis((Hypothesis:<br>Decreased seek <br>time and errors))
Hypothesis -- "Study Design" --> Method((2 menu x <br> 3 task conditions ))
Method -- "Run Study" --> Data((Data<br>Consent<br>Consistency))
Data -- "Clean and Prep" --> Analysis((Analysis))
Analysis --> Conclusions((Conclusions))

linkStyle 0 stroke-width:4px;
linkStyle 1 stroke-width:4px;
linkStyle 2 stroke-width:4px;
linkStyle 3 stroke-width:4px;
linkStyle 4 stroke-width:4px;

class S invisible
class Hypothesis,Conclusions start
class Method,Data,Analysis normal
</div>

Now click on `Example Chart`. Here you can analyze and chart data using simple statistics:
- Min, Max, Mean (Sum/#), Median (Middle #), Mode (Most Common #)

Demo

Do this for speed *and* error.
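
The same summary statistics can be computed outside the spreadsheet. This is a minimal sketch using Python's standard `statistics` module; the `times` list is made-up data standing in for one condition's selection times, not results from the study.

```python
import statistics

# Hypothetical selection times in seconds for a single condition.
times = [0.52, 0.48, 0.61, 0.48, 0.75, 0.44, 0.48]

summary = {
    "min": min(times),
    "max": max(times),
    "mean": statistics.mean(times),      # sum divided by count
    "median": statistics.median(times),  # middle value once sorted
    "mode": statistics.mode(times),      # most common value
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Running the same code on error counts gives the matching summary for errors.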

]
---
.left-column[
## Document all of this in your [report](/interaction/assignments/menu-report)
]
.right-column[
Speed Results

`Describe your thoughts about overall speed in different
conditions. Use at least one chart to illustrate what you say. Here is
an example chart generated using our data, when you paste your data
into the spreadsheet you’ll see that it updates to reflect your data`

Error Results

`Describe what happened in terms of errors -- provide at least one chart showing
what you learned about errors in different conditions`
]
---
.title[Can we determine *causality*?]
.body[
Causality implies *dependence* between the outcome (speed/error) and the conditions (task and menu type)

Assumes you have measured the right variables!
]
---
.title[Dependence/independence]
.body[
Events that are independent
- Flipping heads and then tails
- Day of week and whether a patient had a heart attack (probably?)

Events that are dependent
- Vice presidential candidate and presidential nominee
- Diagnostic test being positive and whether patient has a disease
]
---
.title[Correlation]
.body[
![:img Example of correlations for height vs age (positively correlated) and
height vs birth month (uncorrelated),80%](img/studies/correlation.png)
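
To make the idea concrete, here is a sketch that computes the Pearson correlation coefficient by hand. The helper `pearson_r` and all of the numbers are invented for illustration; they are not the data behind the figure.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both spreads, in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical children: age in years, height in cm, birth month (1-12).
ages = [4, 6, 8, 10, 12, 14]
heights_cm = [102, 115, 128, 138, 149, 163]
birth_months = [3, 11, 7, 1, 9, 5]

r_age = pearson_r(ages, heights_cm)            # close to +1
r_month = pearson_r(birth_months, heights_cm)  # close to 0

print(f"age vs height: {r_age:.2f}")
print(f"birth month vs height: {r_month:.2f}")
```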
]
---
.title[Correlation != Causation]
.body[
![:img Correlation of number of people who drowned per year and films
nicolas cage appeared in, 80%](img/studies/cagelation.png)
![:img Two people talking. Says one: I used to think correlation
implied causation. Then I took a statistics class. Now I don't.
Says the other: Sounds like the class helped. Says the first: Well
maybe, 50%](img/studies/correlation-cartoon.png)
]
---
.title[Charting helps you check your assumptions
]
.body[
![:img Image of bar chart comparing tasks to menu type showing that
normal menus get progressively slower as items become nonlinear while
pie menus are about the same, 100%](img/studies/chart.png)

]
---
.title[But having the right chart matters
]
.body[
![:img Histogram of data about speed of pie menu selection and linear
menu selection for each task type, 50%](img/studies/histogram.png)

]
---
.title[Normal Vs Pie
]
.body[
|Normal | Pie|
|--|--|
|![:img Histogram of data about speed of pie menu selection and linear menu selection for each task type,100%](img/studies/normal-only.png)|![:img Histogram of data about speed of pie menu selection and linear menu selection for each task type, 100%](img/studies/pie-only.png)|

]
---
.title[Comparing two groups to see if they are different]
.body[
![:img two distributions, 80%](img/studies/samples1.png)
]
---
.title[Comparing two groups]
.body[
![:img two distributions showing overlapping 95% confidence intervals,
80%](img/studies/samples2.png)
]
---
.title[Comparing two groups]
.body[
![:img two distributions with less overlapping intervals (because more
data is present), 80%](img/studies/samples3.png)
]
---
.title[Comparing two groups]
.body[
![:img two distributions with more data but overlapping histograms
thus hard to differentiate, 80%](img/studies/samples4.png)
]
---
.title[Comparing two groups]
.body[
![:img two distributions with less overlapping intervals (because more
data is present), 80%](img/studies/samples5.png)
]
---
.title[Comparing two groups]
.body[
![:img two distributions not overlapping at all, 80%](img/studies/samples6.png)
]
---
.title[Comparing two groups]
.body[
![:img same two distributions with difference between means marked as
effect size, 80%](img/studies/samples7.png)
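
The pictures above can be approximated numerically. This sketch computes a rough 95% confidence interval for each group's mean (using the normal approximation, 1.96 standard errors) and two views of effect size: the raw difference between means, as marked in the figure, and Cohen's d, a standardized version that the slides do not mention and that is included here only as one common choice. The sample data is made up.

```python
import math
import statistics

def mean_ci95(sample):
    """Return (mean, low, high) for a rough 95% interval around the mean."""
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    half_width = 1.96 * sem   # normal approximation; small samples need a t value
    return mean, mean - half_width, mean + half_width

def cohens_d(a, b):
    """Difference between means in units of the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

# Hypothetical selection times in seconds.
normal = [0.81, 0.85, 0.79, 0.88, 0.83, 0.80]
pie = [0.50, 0.47, 0.52, 0.45, 0.49, 0.46]

normal_mean, normal_lo, normal_hi = mean_ci95(normal)
pie_mean, pie_lo, pie_hi = mean_ci95(pie)
raw_difference = normal_mean - pie_mean
d = cohens_d(normal, pie)

print(f"normal: {normal_mean:.3f} [{normal_lo:.3f}, {normal_hi:.3f}]")
print(f"pie:    {pie_mean:.3f} [{pie_lo:.3f}, {pie_hi:.3f}]")
print(f"raw difference: {raw_difference:.3f}s, Cohen's d: {d:.1f}")
```

More data shrinks the standard error, which is why the intervals in the later figures overlap less.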
]
---
.title[Our data]
.body[
Not that different in my sample set (just me doing it 3 times). We
hope to see better results in your data, and even better if we merge
all your data!

![:img Image of bar chart comparing tasks to menu type showing that
normal menus get progressively slower as items become nonlinear while
pie menus are about the same, 50%](img/studies/chart.png)

]
---
.title[Common Statistical Test for comparison: t-test]
.body[
Tests for difference between two samples

Best used to determine what is ‘worthy of a second look’

Limited in its applicability to normal, independent data

Does not help to document effect size (the actual difference between groups), only how unlikely the observed difference would be if there were no real effect
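
As a rough illustration of what the test computes, the sketch below calculates the t statistic by hand. The slides do not say which variant of the t-test is meant; this uses Welch's unequal-variance version as an assumption, and the helper name `welch_t` and the data are hypothetical. Turning t into a p-value needs the t distribution, which a statistics library such as SciPy (`scipy.stats.ttest_ind(a, b, equal_var=False)`) provides.

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, degrees of freedom) for Welch's unequal-variance t-test."""
    va = statistics.variance(a) / len(a)   # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)   # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical selection times in seconds.
normal = [0.81, 0.85, 0.79, 0.88, 0.83, 0.80]
pie = [0.50, 0.47, 0.52, 0.45, 0.49, 0.46]

t, df = welch_t(normal, pie)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large absolute t means the difference between means is large relative to the noise in the samples.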
]
---
.left-column[#Problems with t-tests

The more implausible the hypothesis, the greater chance that it is a
‘false alarm’
]
.right-column[
![:img A picture of the likelihood of a positive t-test as influenced by
priors (the expected likelihood of an outcome), 80%](img/studies/priors.png)

]
???
Top row: Prior (what's known to be true before the experiment)
Bottom row: Calculated p-value

Notice the middle column, where something that is a toss-up has higher
plausibility than we would expect. This is a "Type I error"

Alternatively, a small sample may cause a Type II error (failure to
detect a true difference) due to random sampling error

---
.left-column[#Problems with t-tests]
.right-column[
- Doesn’t take prior knowledge into account
- Susceptible to ‘data dredging’ 
 - The more tests you conduct the more likely you will find a result
   even if one is not there
 - Adjustments mid experiment
- Gives a yes or no answer: Either the null hypothesis is rejected (the result would be unlikely in a world where the null hypothesis was true) or it cannot be rejected
- Based on assumed ‘average’ sample
]
---
.title[Which problems might affect our study?]
???
Too many comparisons
--
.body[
12 separate comparisons (2x3 conditions, 2 measures)

ANOVA (**An**alysis **O**f **Va**riance): Fancy t-test that accounts for the whole group effect before
doing pairwise comparisons
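
A quick calculation shows why 12 comparisons are a problem. This sketch assumes the tests are independent and each uses a 0.05 threshold, neither of which is stated in the slides; it also shows the Bonferroni correction, a simple alternative to ANOVA that is named here only for comparison.

```python
ALPHA = 0.05           # assumed per-test significance threshold
NUM_COMPARISONS = 12   # 2x3 conditions, 2 measures

# Chance of at least one false alarm if every null hypothesis is true
# and the tests are independent.
familywise_error = 1 - (1 - ALPHA) ** NUM_COMPARISONS

# Bonferroni correction: divide the threshold by the number of tests.
bonferroni_alpha = ALPHA / NUM_COMPARISONS

print(f"{familywise_error:.2f}")  # 0.46
print(f"{bonferroni_alpha:.4f}")  # 0.0042
```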
]
???
- Doesn’t take prior knowledge into account
- Susceptible to ‘data dredging’ 
 - The more tests you conduct the more likely you will find a result
   even if one is not there
 - Adjustments mid experiment
- Gives a yes or no answer: Either the null hypothesis is rejected (the result would be unlikely in a world where the null hypothesis was true) or it cannot be rejected
- Based on assumed ‘average’ sample
---
.left-column[
## Document all of this in your [report](/interaction/assignments/menu-report)
]
.right-column[
Statistical Significance: Something like
`Pie menus were nearly twice as fast as normal menus (M=.48s vs M=.83s), F(1,43)=295.891, p<.05. Unclassified menu items were harder to find than linear and relational ones (M=.84s, .59s, and .59s respectively), F(2,43)=93.778, p<.05. We also found an interaction effect between menu and task (as illustrated in the chart above), F(5, 43) = 51.945, p<.001.`

Table of results found in `Speed Analysis` and `Error Analysis`

Demo

]

---
.title[Data Collection]
.body[

<div class="mermaid">
graph LR
S((.)) --> Hypothesis((Hypothesis:<br>Decreased seek <br>time and errors))
Hypothesis -- "Study Design" --> Method((2 menu x <br> 3 task conditions ))
Method -- "Run Study" --> Data((Consent<br>Consistency))
Data -- "Clean and Prep" --> Analysis((Clean<br>Compute))
Analysis --> Conclusions((Conclusions))

linkStyle 0 stroke-width:4px;
linkStyle 1 stroke-width:4px;
linkStyle 2 stroke-width:4px;
linkStyle 3 stroke-width:4px;
linkStyle 4 stroke-width:4px;

class S invisible
class Hypothesis,Conclusions start
class Method,Data,Analysis normal
</div>

Draw Conclusions
- Were there fewer errors?
- Was selection faster?

`Describe your conclusions. Do you think we should use pie menus more? What can we conclude from your data?`
]

---
.title[Limitations of Laboratory Studies]

???
Simulate real world environments
- Location and equipment may be unfamiliar to participant [Coyne & Nielsen 2001]
- Observation may affect performance - “Hawthorne Effect” [Mayo 1933] 
- Participant may become fatigued and not take necessary rest - “Demand Effect” [Orne 1962]
- Tasks frequently artificial and repetitive, which may bore participants and negatively affect performance

Studying real world use removes these limitations

--
.body[
Simulate real world environments
- Location and equipment may be unfamiliar to participant [Coyne & Nielsen 2001]
- Observation may affect performance - “Hawthorne Effect” [Mayo 1933] 
- Participant may become fatigued and not take necessary rest - “Demand Effect” [Orne 1962]
- Tasks frequently artificial and repetitive, which may bore participants and negatively affect performance

Studying real world use removes these limitations
]
