name: inverse
layout: true
class: center, middle, inverse
---
# Running a Quantitative Study

Jennifer Mankoff

CSE 340 Spring 2019
---
layout:false

[//]: # (Outline Slide)
.title[Today's goals]
.body[
- Discuss steps of running a study
- Practice onboarding participants
- Practice data analysis
- Go over Assignment 1
]
---
.left-column[
## Experiment Design
]
.right-column[
graph LR S((.)) --> Hypothesis((Hypothesis)) Hypothesis -- "Study Design" --> Method((Method)) Method -- "Run Study" --> Data((Data)) Data -- "Clean and Prep" --> Analysis((Analysis)) Analysis --> Conclusions((Conclusions)) classDef finish outline-style:double,fill:#d1e0e0,stroke:#333,stroke-width:2px; classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px; classDef start fill:#d1e0e0,stroke:#333,stroke-width:4px; classDef invisible fill:#FFFFFF,stroke:#FFFFFF,color:#FFFFFF linkStyle 0 stroke-width:4px; linkStyle 1 stroke-width:4px; linkStyle 2 stroke-width:4px; linkStyle 3 stroke-width:4px; linkStyle 4 stroke-width:4px; class S invisible class Hypothesis,Conclusions start class Method,Data,Analysis normal
What is the Hypothesis for the Menus assignment?
]
---
.left-column[
## Method
]
.right-column[
graph LR S((.)) --> Hypothesis((Hypothesis:
Decreased seek
time and errors)) Hypothesis -- "Study Design" --> Method((Method)) Method -- "Run Study" --> Data((Data)) Data -- "Clean and Prep" --> Analysis((Analysis)) Analysis --> Conclusions((Conclusions)) classDef finish outline-style:double,fill:#d1e0e0,stroke:#333,stroke-width:2px; classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px; classDef start fill:#d1e0e0,stroke:#333,stroke-width:4px; classDef invisible fill:#FFFFFF,stroke:#FFFFFF,color:#FFFFFF linkStyle 0 stroke-width:4px; linkStyle 1 stroke-width:4px; linkStyle 2 stroke-width:4px; linkStyle 3 stroke-width:4px; linkStyle 4 stroke-width:4px; class S invisible class Hypothesis,Conclusions start class Method,Data,Analysis normal
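As a rough sketch of the design on this slide, the conditions and trial counts can be enumerated in a few lines of Python. The values of `ITEM_MAX` and `NUM_REPEATS` below are assumptions, chosen only so that one session comes to the 72 trials used in the homework; the assignment code defines the real values and may randomize differently.

```python
import itertools
import random

# Task and menu names taken from the slides.
TASKS = ["Linear", "Relational", "Unclassified"]
MENUS = ["Normal", "Pie"]

# Hypothetical values: the slides only fix the product (72 trials per session).
ITEM_MAX = 4
NUM_REPEATS = 3
NUM_PARTICIPANTS = 4


def build_session(rng):
    """Return one participant's trials as (task, menu, item) tuples in random order."""
    conditions = list(itertools.product(TASKS, MENUS))
    rng.shuffle(conditions)  # randomize the order of the conditions
    trials = []
    for task, menu in conditions:
        # Each item appears NUM_REPEATS times within a condition.
        items = [i for i in range(ITEM_MAX) for _ in range(NUM_REPEATS)]
        rng.shuffle(items)  # randomize the order of items within the condition
        trials.extend((task, menu, item) for item in items)
    return trials


session = build_session(random.Random(340))
num_conditions = len({(task, menu) for task, menu, _ in session})
trials_per_session = len(session)
total_data_points = trials_per_session * NUM_PARTICIPANTS
print(num_conditions, trials_per_session, total_data_points)  # 6 72 288
```

Shuffling per participant is one simple way to keep practice and fatigue effects from lining up with a single condition.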
3 tasks x 2 menu types = 6 *conditions*:

| | Normal | Pie |
|--|--|--|
| **Linear** | | |
| **Relational** | | |
| **Unclassified** | | |

- In each *condition* we test `ITEM_MAX` different menu items
- For each menu item, we repeat `NUM_REPEATS` times
]
---
.left-column[
## Other Method considerations
]
.right-column[
An experimental *session* consists of 3 tasks x 2 menu types x `ITEM_MAX` items x `NUM_REPEATS` repetitions = 72 *trials* in your homework

You have to run four participants through a complete session: 72 x 4 = 288 data points.
]
--
.right-column[
Participants do *all* trials (in some designs, participants only do some conditions)

Order of presentation of conditions and items is randomized (why?)
]
---
.left-column[
## Document all of this in your [report](/interaction/assignments/menu-report)
]
.right-column[
Introduce study purpose

`Write two sentences describing the purpose of the experiment. This can be the same text you use in your consent form`

Introduce study method - tasks

`Describe the 6 conditions of the study. Explain how many items were selected per menu, and how many times each item was repeated. Describe how many trials each participant completed. This should be at most one paragraph`
]
---
.left-column[
## Study Ethics
]
.right-column[
graph LR S((.)) --> Hypothesis((Hypothesis:
Decreased seek
time and errors)) Hypothesis -- "Study Design" --> Method((2 menu x
3 task conditions )) Method -- "Run Study" --> Data((Data)) Data -- "Clean and Prep" --> Analysis((Analysis)) Analysis --> Conclusions((Conclusions)) classDef finish outline-style:double,fill:#d1e0e0,stroke:#333,stroke-width:2px; classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px; classDef start fill:#d1e0e0,stroke:#333,stroke-width:4px; classDef invisible fill:#FFFFFF,stroke:#FFFFFF,color:#FFFFFF linkStyle 0 stroke-width:4px; linkStyle 1 stroke-width:4px; linkStyle 2 stroke-width:4px; linkStyle 3 stroke-width:4px; linkStyle 4 stroke-width:4px; class S invisible class Hypothesis,Conclusions start class Method,Data,Analysis normal
Ethical Principles for running participants.

Driven by [Criminal/Racist/Harmful studies](https://www.nytimes.com/2017/05/22/science/social-science-research-institutional-review-boards-common-rule.html)
- Nazi war crimes
- Tuskegee Syphilis study
- Epilepsy studies of institutionalized children
- [16,000 people involuntarily included in radiation studies](https://www.nytimes.com/1995/08/20/us/count-of-subjects-in-radiation-experiments-is-raised-to-16000.html?module=inline)
- [Milgram's study of electric shocking](https://www.simplypsychology.org/milgram.html)
- [Stanford prison experiment](https://www.simplypsychology.org/zimbardo.html)
]
---
.left-column[
## Study Ethics
]
.right-column[
graph LR S((.)) --> Hypothesis((Hypothesis:
Decreased seek
time and errors)) Hypothesis -- "Study Design" --> Method((2 menu x
3 task conditions )) Method -- "Run Study" --> Data((Data)) Data -- "Clean and Prep" --> Analysis((Analysis)) Analysis --> Conclusions((Conclusions)) classDef finish outline-style:double,fill:#d1e0e0,stroke:#333,stroke-width:2px; classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px; classDef start fill:#d1e0e0,stroke:#333,stroke-width:4px; classDef invisible fill:#FFFFFF,stroke:#FFFFFF,color:#FFFFFF linkStyle 0 stroke-width:4px; linkStyle 1 stroke-width:4px; linkStyle 2 stroke-width:4px; linkStyle 3 stroke-width:4px; linkStyle 4 stroke-width:4px; class S invisible class Hypothesis,Conclusions start class Method,Data,Analysis normal
Basic ethics ([Belmont Report](https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/read-the-belmont-report/index.html))

- Beneficence -->
  - Value of research higher than risks
  - Do no harm
- Respect for Persons -->
  - Fully informed of intent and purpose
  - Informed consent
  - May opt out at any time, for any reason
- Justice
  - equitable, representative selection of participants
]
---
.left-column[
## Study Ethics
]
.right-column[
graph LR S((.)) --> Hypothesis((Hypothesis:
Decreased seek
time and errors)) Hypothesis -- "Study Design" --> Method((2 menu x
3 task conditions )) Method -- "Run Study" --> Data((Data)) Data -- "Clean and Prep" --> Analysis((Analysis)) Analysis --> Conclusions((Conclusions)) classDef finish outline-style:double,fill:#d1e0e0,stroke:#333,stroke-width:2px; classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px; classDef start fill:#d1e0e0,stroke:#333,stroke-width:4px; classDef invisible fill:#FFFFFF,stroke:#FFFFFF,color:#FFFFFF linkStyle 0 stroke-width:4px; linkStyle 1 stroke-width:4px; linkStyle 2 stroke-width:4px; linkStyle 3 stroke-width:4px; linkStyle 4 stroke-width:4px; class S invisible class Hypothesis,Conclusions start class Method,Data,Analysis normal
Basic ethics ([Belmont Report](https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/read-the-belmont-report/index.html))

- Beneficence -->
  - Value of research higher than risks
  - Do no harm
- Respect for Persons -->
  - **Fully informed of intent and purpose**
  - **Informed consent**
  - **May opt out at any time, for any reason**
- Justice
  - equitable, representative selection of participants
]
---
.left-column[
## Consent
]
.right-column[
Write your [consent](/interaction/assignments/consent) form
- Purpose of study (Beneficence)
- Requirements for participation (Respect for Persons)
- Study procedures (Respect for Persons)
- Voluntariness (Respect for Persons)
- Benefits to Society (Beneficence)
- Contact (of IRB typically; Me in this case)
]
???
- Beneficence -->
  - Value of research higher than risks
  - Do no harm
- Respect for Persons -->
  - Fully informed of intent and purpose
  - Informed consent
  - May opt out at any time, for any reason
- Justice
  - equitable, representative selection of participants
---
.left-column[
## Document all of this in your [report](/interaction/assignments/menu-report)
]
.right-column[
Method - Participants

`Describe your participants (without identifying them). How were they recruited? How many were there? Were they consented? You can also add some optional information such as: What was their average age? What genders were present? How experienced were they with Android?`
]
---
.left-column[
## Data Collection
]
.right-column[
graph LR S((.)) --> Hypothesis((Hypothesis:
Decreased seek
time and errors)) Hypothesis -- "Study Design" --> Method((2 menu x
3 task conditions )) Method -- "Run Study" --> Data((Data)) Data -- "Clean and Prep" --> Analysis((Analysis)) Analysis --> Conclusions((Conclusions)) classDef finish outline-style:double,fill:#d1e0e0,stroke:#333,stroke-width:2px; classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px; classDef start fill:#d1e0e0,stroke:#333,stroke-width:4px; classDef invisible fill:#FFFFFF,stroke:#FFFFFF,color:#FFFFFF linkStyle 0 stroke-width:4px; linkStyle 1 stroke-width:4px; linkStyle 2 stroke-width:4px; linkStyle 3 stroke-width:4px; linkStyle 4 stroke-width:4px; class S invisible class Hypothesis,Conclusions start class Method,Data,Analysis normal
**Clear your data** file before you start the **first participant only**

Have each participant read and sign the consent form

Emphasize key points verbally

Be Consistent

Download the results - you can use a tool window called `Device File Explorer`.
]
???
**voluntariness**
---
.left-column[
## Document all of this in your [report](/interaction/assignments/menu-report)
]
.right-column[
Method - Setting

`What device was used? Was it an emulator? Where did the experiment take place?`

Method - Data Collected

`What information was collected (time, errors, etc)`
]
---
.left-column[
## Data Collection
![:img Picture of a dialogue box called Import file showing that you should replace current sheet and automatically detect separator type and convert text to numbers dates and formulas,100%](img/studies/import.png)
]
.right-column[
graph LR S((.)) --> Hypothesis((Hypothesis:
Decreased seek
time and errors)) Hypothesis -- "Study Design" --> Method((2 menu x
3 task conditions )) Method -- "Run Study" --> Data((Data
Consent
Consistency)) Data -- "Clean and Prep" --> Analysis((Analysis)) Analysis --> Conclusions((Conclusions)) classDef finish outline-style:double,fill:#d1e0e0,stroke:#333,stroke-width:2px; classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px; classDef start fill:#d1e0e0,stroke:#333,stroke-width:4px; classDef invisible fill:#FFFFFF,stroke:#FFFFFF,color:#FFFFFF linkStyle 0 stroke-width:4px; linkStyle 1 stroke-width:4px; linkStyle 2 stroke-width:4px; linkStyle 3 stroke-width:4px; linkStyle 4 stroke-width:4px; class S invisible class Hypothesis,Conclusions start class Method,Data,Analysis normal
Select the 'raw' sheet of your spreadsheet

![:img Picture of a spreadsheet with the tab titled 'Raw' selected](img/studies/raw.png)

Load your file into the spreadsheet
]
---
.left-column[
## Data Collection
![:img Image of bar chart comparing tasks to menu type showing that normal menus get progressively slower as items become nonlinear while pie menus are about the same, 100%](img/studies/chart.png)
]
.right-column[
graph LR S((.)) --> Hypothesis((Hypothesis:
Decreased seek
time and errors)) Hypothesis -- "Study Design" --> Method((2 menu x
3 task conditions )) Method -- "Run Study" --> Data((Data
Consent
Consistency)) Data -- "Clean and Prep" --> Analysis((Analysis)) Analysis --> Conclusions((Conclusions)) classDef finish outline-style:double,fill:#d1e0e0,stroke:#333,stroke-width:2px; classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px; classDef start fill:#d1e0e0,stroke:#333,stroke-width:4px; classDef invisible fill:#FFFFFF,stroke:#FFFFFF,color:#FFFFFF linkStyle 0 stroke-width:4px; linkStyle 1 stroke-width:4px; linkStyle 2 stroke-width:4px; linkStyle 3 stroke-width:4px; linkStyle 4 stroke-width:4px; class S invisible class Hypothesis,Conclusions start class Method,Data,Analysis normal
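The simple statistics listed on this slide can also be computed outside the spreadsheet. Here is a minimal Python sketch; the selection times are made up for illustration, and real values would come from your downloaded data file.

```python
import statistics

# Hypothetical selection times in seconds for one condition.
times = [0.52, 0.61, 0.48, 0.61, 0.95, 0.57, 0.61]

summary = {
    "min": min(times),
    "max": max(times),
    "mean": sum(times) / len(times),     # Sum / #
    "median": statistics.median(times),  # middle value after sorting
    "mode": statistics.mode(times),      # most common value
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The mode is usually more informative for error counts than for raw times, since measured times rarely repeat exactly.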
Now click on `Example Chart`. Here you can analyze and chart data:

Simple Statistics
- Min, Max, Mean (Sum/#), Median (Middle #), Mode (Most Common #)

Demo

Do this for speed *and* error.
]
---
.left-column[
## Document all of this in your [report](/interaction/assignments/menu-report)
]
.right-column[
Speed Results

`Describe your thoughts about overall speed in different conditions. Use at least one chart to illustrate what you say. Here is an example chart generated using our data. When you paste your data into the spreadsheet you’ll see that it updates to reflect your data`

Error Results

`Describe what happened in terms of errors -- provide at least one chart showing what you learned about errors in different conditions`
]
---
.title[Can we determine *causality*?]
.body[
Implies *dependence* between speed/error, task, and menu type

Assumes you have measured the right variables!
]
---
.title[Dependence/independence]
.body[
Events that are independent
- Flipping heads and then tails
- Day of week and whether a patient had a heart attack (probably?)

Events that are dependent
- Vice presidential candidate and presidential nominee
- Diagnostic test being positive and whether patient has a disease
]
---
.title[Correlation]
.body[
![:img Example of correlations for height vs age (positively correlated) and height vs birth month (uncorrelated),80%](img/studies/correlation.png)
]
---
.title[Correlation != Causation]
.body[
![:img Correlation of number of people who drowned per year and films Nicolas Cage appeared in, 80%](img/studies/cagelation.png)

![:img Two people talking. Says one: I used to think correlation implied causation. Then I took a statistics course. Now I don't. Says the other: Sounds like the class helped. 
Says the first: Well maybe, 50%](img/studies/correlation-cartoon.png)
]
---
.title[Charting helps you check your assumptions]
.body[
![:img Image of bar chart comparing tasks to menu type showing that normal menus get progressively slower as items become nonlinear while pie menus are about the same, 100%](img/studies/chart.png)
]
---
.title[But having the right chart matters]
.body[
![:img Histogram of data about speed of pie menu selection and linear menu selection for each task type, 50%](img/studies/histogram.png)
]
---
.title[Normal Vs Pie]
.body[
|Normal | Pie|
|--|--|
|![:img Histogram of data about speed of pie menu selection and linear menu selection for each task type,100%](img/studies/normal-only.png)|![:img Histogram of data about speed of pie menu selection and linear menu selection for each task type, 100%](img/studies/pie-only.png)|
]
---
.title[Comparing two groups to see if they are different]
.body[
![:img two distributions, 80%](img/studies/samples1.png)
]
---
.title[Comparing two groups]
.body[
![:img two distributions showing overlapping 95% confidence intervals, 80%](img/studies/samples2.png)
]
---
.title[Comparing two groups]
.body[
![:img two distributions with less overlapping intervals (because more data is present), 80%](img/studies/samples3.png)
]
---
.title[Comparing two groups]
.body[
![:img two distributions with more data but overlapping histograms thus hard to differentiate, 80%](img/studies/samples4.png)
]
---
.title[Comparing two groups]
.body[
![:img two distributions with less overlapping intervals (because more data is present), 80%](img/studies/samples5.png)
]
---
.title[Comparing two groups]
.body[
![:img two distributions not overlapping at all, 80%](img/studies/samples6.png)
]
---
.title[Comparing two groups]
.body[
![:img same two distributions with difference between means marked as effect size, 80%](img/studies/samples7.png)
]
---
.title[Our data]
.body[
Not that different in my sample set (just me doing it 3 times).
We hope to see better results in your data, and even better if we merge all your data!

![:img Image of bar chart comparing tasks to menu type showing that normal menus get progressively slower as items become nonlinear while pie menus are about the same, 50%](img/studies/chart.png)
]
---
.title[Common Statistical Test for comparison: t-test]
.body[
Tests for difference between two samples

Best used to determine what is ‘worthy of a second look’

Only applicable to normally distributed, independent data

Does not help to document effect size [the actual difference between groups], just effect likelihood
]
---
.left-column[
## Problems with t-tests

The more implausible the hypothesis, the greater the chance that it is a ‘false alarm’
]
.right-column[
![:img A picture of the likelihood of a positive t-test as influenced by priors (the expected likelihood of an outcome), 80%](img/studies/priors.png)
]
???
Top row: Prior (what's known to be true before the experiment)

Bottom row: Calculated p-value

Notice the middle column, where something that is a toss-up has higher plausibility than we would expect. This is a "Type I error"

Alternatively, a small sample may cause a Type II error (failure to detect a true difference) due to random sampling bias
---
.left-column[
## Problems with t-tests
]
.right-column[
- Doesn’t take prior knowledge into account
- Susceptible to ‘data dredging’
  - The more tests you conduct the more likely you will find a result even if one is not there
  - Adjustments mid experiment
- Gives a yes or no answer: Either the null hypothesis is rejected (the result would be unlikely in a world where the null hypothesis was true) or it cannot be rejected
- Based on assumed ‘average’ sample
]
---
.title[Which problems might affect our study?]
???
Too many comparisons
--
.body[
12 separate comparisons (2x3 conditions, 2 measures)

ANOVA (**An**alysis **O**f **Va**riance): Fancy t-test that accounts for the whole group effect before doing pairwise comparisons
]
???
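To put a number on "too many comparisons", here is a small Python sketch of how the chance of at least one false alarm grows with the number of tests. It assumes every test is independent and uses the conventional threshold of 0.05; neither assumption comes from the slides, and the real comparisons in this study share data, so treat the result as a rough illustration only.

```python
ALPHA = 0.05          # assumed per-test false-alarm rate
NUM_COMPARISONS = 12  # 2x3 conditions, 2 measures (from the slide)


def chance_of_false_alarm(num_tests, alpha=ALPHA):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


for n in (1, 2, 6, NUM_COMPARISONS):
    print(f"{n:2d} tests: {chance_of_false_alarm(n):.2f}")

family_wise_rate = chance_of_false_alarm(NUM_COMPARISONS)  # about 0.46
```

Under these assumptions, running all 12 comparisons separately gives close to a coin flip's chance of at least one spurious "significant" result, which is the motivation for testing the whole group effect first.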
- Doesn’t take prior knowledge into account
- Susceptible to ‘data dredging’
  - The more tests you conduct the more likely you will find a result even if one is not there
  - Adjustments mid experiment
- Gives a yes or no answer: Either the null hypothesis is rejected (the result would be unlikely in a world where the null hypothesis was true) or it cannot be rejected
- Based on assumed ‘average’ sample
---
.left-column[
## Document all of this in your [report](/interaction/assignments/menu-report)
]
.right-column[
Statistical Significance: Something like

`Pie menus were twice as fast as normal menus (M=.48s vs M=.83s), F(1,43)=295.891, p<.05. Unclassified menu items were harder to find than linear and relational ones (M=.84s, .59s, and .59s respectively), F(2,43)=93.778, p<.05. We also found an interaction effect between menu and task (as illustrated in the chart above), F(5, 43) = 51.945, p<.001.`

Table of results found in `Speed Analysis` and `Error Analysis`

Demo
]
---
.title[Data Collection]
.body[
graph LR S((.)) --> Hypothesis((Hypothesis:
Decreased seek
time and errors)) Hypothesis -- "Study Design" --> Method((2 menu x
3 task conditions )) Method -- "Run Study" --> Data((Consent
Consistency)) Data -- "Clean and Prep" --> Analysis((Clean
Compute)) Analysis --> Conclusions((Conclusions)) classDef finish outline-style:double,fill:#d1e0e0,stroke:#333,stroke-width:2px; classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px; classDef start fill:#d1e0e0,stroke:#333,stroke-width:4px; classDef invisible fill:#FFFFFF,stroke:#FFFFFF,color:#FFFFFF linkStyle 0 stroke-width:4px; linkStyle 1 stroke-width:4px; linkStyle 2 stroke-width:4px; linkStyle 3 stroke-width:4px; linkStyle 4 stroke-width:4px; class S invisible class Hypothesis,Conclusions start class Method,Data,Analysis normal
Draw Conclusions
- Were there fewer errors?
- Was selection faster?

`Describe your conclusions. Do you think we should use pie menus more? What can we conclude from your data?`
]
---
.title[Limitations of Laboratory Studies]
???
Simulate real world environments
- Location and equipment may be unfamiliar to participant [Coyne & Nielsen 2001]
- Observation may affect performance
  - “Hawthorne Effect” [Mayo 1933]
- Participant may become fatigued and not take necessary rest
  - “Demand Effect” [Orne 1962]
- Tasks frequently artificial and repetitive, which may bore participants and negatively affect performance

Studying real world use removes these limitations
--
.body[
Simulate real world environments
- Location and equipment may be unfamiliar to participant [Coyne & Nielsen 2001]
- Observation may affect performance
  - “Hawthorne Effect” [Mayo 1933]
- Participant may become fatigued and not take necessary rest
  - “Demand Effect” [Orne 1962]
- Tasks frequently artificial and repetitive, which may bore participants and negatively affect performance

Studying real world use removes these limitations
]
layout: true