Using Mobile Phones for Machine Learning

name: inverse
layout: true
class: center, middle, inverse
---
# Google maps

![:img google maps,100%](img/ml/google.jpg)

.footnote[Picture from [Machine Learning on your Phone](https://www.appypie.com/top-machine-learning-mobile-apps)]

???
fame/shame
neighborhood traffic...
shorter commutes...
---
# Machine Learning and your Phone

Jennifer Mankoff

CSE 340 Spring 2019

.footnote[Slides credit: Jason Hong, Carnegie Mellon University,
[Are my Devices Spying on Me? Living in a World of Ubiquitous Computing](https://www.slideshare.net/jas0nh0ng/are-my-devices-spying-on-me-living-in-a-world-of-ubiquitous-computing)
]

---
layout: false

.left-column[
## Smartphones are Intimate: Fun Facts about Millennials

![:fa thumbs-down] 83% sleep with phones
]
.right-column[
![:img Millennial with phone in bed, 100%](img/ml/phone-bed.jpg)
]
---
.left-column[
## Smartphones are Intimate: Fun Facts about Millennials

![:fa thumbs-down] 83% sleep with phones

![:fa thumbs-down]  90% check first thing in morning
]
.right-column[
![:img Millennial with phone in bed, 100%](img/ml/phone-bed.jpg)
]
---
.left-column[
## Smartphones are Intimate: Fun Facts about Millennials

![:fa thumbs-down] 83% sleep with phones

![:fa thumbs-down] 90% check first thing in morning

![:fa thumbs-down]  1 in 3 use in bathroom

]
.right-column[
![:img Millennial with phone in bed, 100%](img/ml/phone-bed.jpg)

]
---
.title[Smartphone Data is Intimate]
.body[
![:img Picture of smart phone screens with phone numbers; map; and sensor data, 100%](img/ml/personal.png)

| Who we know           | Sensors               | Where we go   |
|-----------------------|-----------------------|---------------|
| (contacts + call log) | (accel, sound, light) | (gps, photos) |

]
---
.title[Some useful applications of this data]
.body[
![:img Picture of LeafSnap app, 100%](img/ml/leafsnap.jpg)
]
.footnote[[LeafSnap](http://leafsnap.com/) uses computer vision to
identify trees by their leaves]

---
.title[Some useful applications of this data]
.body[
![:img Picture of Aipoly app, 100%](img/ml/aipoly.jpg)
]
.footnote[[Vision AI](https://www.aipoly.com/) uses computer vision to
identify images for the Blind and Visually Impaired]

---
.title[Some useful applications of this data]
.body[
![:img Picture of Carat app, 100%](img/ml/Carat.jpg)
]
.footnote[[Carat: Collaborative Energy
Diagnosis](http://carat.cs.helsinki.fi/) uses machine learning to save
battery life]
---
.title[Some useful applications of this data]
.body[
![:img Picture of Imprompdo app, 100%](img/ml/imprompdo.jpg)
]
.footnote[[Imprompdo](http://imprompdo.webflow.io/) uses machine learning to recommend activities to do, both fun and to-dos]
---
.title[How do these systems work?]
.body[

Machine Learning is used to make these kinds of predictions 
- Machine learning is one area of Artificial Intelligence 
- This is the kind that’s been getting lots of press

The goal of machine learning is to develop systems that can improve
performance with more experience 
- Can use "example data" as "experience"
- Uses these examples to discern patterns 
- And to make predictions
]
---
.title[Two main approaches]
.body[
![:fa eye] *Supervised learning* (we have lots of examples of what should be
 predicted)
 
![:fa eye-slash] *Unsupervised learning* (e.g. clustering into groups and inferring what
they are about)

![:fa low-vision] Can combine these (semi-supervised)

![:fa history]  Can learn over time or train up front
]

---
.left-column[
## In class exercise

![:fa bed, fa-7x] 
]
.right-column[
How might you recognize sleep?

- What recognition question
- What sensors
]
???

(sleep quality? length?...)

How to interpret sensors?

---
.left-column[
## In class exercise

- What recognition question (sleep quality? length?...)
- What sensors
- How to interpret sensors?
]
.right-column[
![:img Sleep trace for accelerometer and sound, 80%](img/ml/sleep.png)
]

---
.title[How do we program this?]
.body[
Write down some rules

Implement them
]
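
A minimal sketch of the "write down some rules" approach for the sleep exercise. The thresholds, units, and the function name `looks_asleep` are assumptions made up for illustration, not values from a real system.

```python
# Hypothetical thresholds; real values would have to be tuned on recorded data.
MOVEMENT_THRESHOLD = 0.05  # variance of accelerometer magnitude (assumed units)
SOUND_THRESHOLD = 30.0     # average sound level (assumed to be in dB)


def looks_asleep(accel_variance, sound_level, hour_of_day):
    """Hand-written rule: still, quiet, and at night means 'asleep'."""
    is_still = accel_variance < MOVEMENT_THRESHOLD
    is_quiet = sound_level < SOUND_THRESHOLD
    is_night = hour_of_day >= 22 or hour_of_day < 7
    return is_still and is_quiet and is_night


print(looks_asleep(0.01, 22.0, 2))  # True: still and quiet at 2am
print(looks_asleep(0.40, 22.0, 2))  # False: too much movement
```

Rules like these are quick to write but hard to tune by hand for every person and phone.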
---
.title[ML is a major shift in thinking]
.body[
Old Approach: Create software by hand
- Use libraries (like jQuery) and frameworks
- Create content, do layout, code up functionality
- Deterministic (code does what you tell it to)

New Approach: Collect data and train algorithms
- Will still do the above, but will also have some functionality based
on ML 
- *Collect lots of examples and train a ML algorithm*
- *Statistical way of thinking*
]

---
.title[How Machine Learning is Typically Used]
.body[
Step 1: Gather lots of data (easy on a phone!)
]
--
.body[

Step 2: Figure out useful features
- Convert data to information (not knowledge!)
- (typically) Collect labels

]
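
One way Step 2 might look for accelerometer data, as a rough sketch. The feature names, the window contents, and the "awake" label are invented for illustration; which features are useful depends on the recognition question.

```python
import math
from statistics import mean, pvariance


def window_features(samples):
    """Summarize a window of (x, y, z) accelerometer samples as a few numbers."""
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    return {
        "mean_magnitude": mean(magnitudes),
        "variance": pvariance(magnitudes),
        "peak": max(magnitudes),
    }


# Made-up readings for illustration only.
window = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 3.0, 4.0)]
features = window_features(window)

# The label usually comes from a person (e.g. a diary entry), not from a sensor.
labeled_example = (features, "awake")
print(labeled_example)
```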
---
.title[How Machine Learning is Typically Used]
.body[
Step 1: Gather lots of data (easy on a phone!)

Step 2: Figure out useful features

Step 3: Select and train the ML algorithm
- Lots of toolkits for this
- Lots of algorithms to choose from
- Mostly treat as a "black box"

]
---
.left-column[
## Regression

![:img Example of regression, 100%](img/ml/regression.png)
]
.right-column[
Predicting a *continuous value* based on inputs
- Ex. House price based on #rooms, #bathrooms, etc
- Ex. #views based on page content

Simple example: linear regression
- Same as in statistics
- Seeks to minimize error in predictions

Lots of algorithms
- See Wikipedia
]
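
A small sketch of linear regression with one input, using the standard least-squares formulas. The room counts and prices are invented numbers, and `fit_line` is a name chosen here; a real project would more likely use a toolkit.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


rooms = [1, 2, 3, 4]
prices = [150, 200, 250, 300]  # invented, in thousands of dollars

slope, intercept = fit_line(rooms, prices)
predicted_price = slope * 5 + intercept  # prediction for a 5-room house
print(slope, intercept, predicted_price)
```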
---
.left-column[
## Classification
]
.right-column[
Predicting from a *set of categories*
- Ex. {Spam, Ham}?
- Ex. {Chalupa, Taco, Burrito}?

Lots of variants
- Multi-class (the examples above)
- One-class (identifies all objects in that class)
- Multi-label (it’s both a Chalupa and a Burrito)

Also lots of algorithms
- See Wikipedia
]
---

.left-column[
## Example classification algorithms]
.right-column[
Naïve Bayes (probabilities)

Neural Networks / Deep Learning (human brain)

**Decision Tree (workflow)**

Support Vector Machine (analogy / similarity)

]
---
.body[
![:img decision tree, 80%](img/ml/decisiontree.png)
]
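
To give a feel for how a decision tree is learned, here is a sketch of the simplest possible case: a one-question tree (a "decision stump") on a single feature. The movement values and labels are invented, and `train_stump` is a hypothetical helper; real toolkits choose splits with measures such as information gain and build deeper trees.

```python
def train_stump(values, labels):
    """Pick the threshold on one feature that misclassifies the fewest examples.

    The resulting rule predicts True when value >= threshold.
    """
    best_threshold, best_errors = None, len(values) + 1
    for threshold in sorted(set(values)):
        errors = sum((v >= threshold) != label for v, label in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


# Invented training data: amount of movement, and whether the person was awake.
movement = [0.01, 0.02, 0.03, 0.30, 0.45, 0.50]
awake = [False, False, False, True, True, True]

threshold = train_stump(movement, awake)
print(threshold)          # the learned split point
print(0.40 >= threshold)  # prediction for a new reading
```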
---
.title[Recent advances: Deep Learning]
.body[
![:img Captioning Images. Note the errors,
60%](img/ml/captioning.png)
]
.footnote[[Captioning images. Note the
errors.](http://cs.stanford.edu/people/karpathy/deepimagesent/) Deep
learning now
[available on your phone!](https://www.tensorflow.org/lite)]

---
.title[Training process]
.body[
![:img ML Training Process, 100%](img/ml/training.png)
]
---
.title[How Machine Learning is Typically Used]
.body[
Step 1: Gather lots of data (easy on a phone!)

Step 2: Figure out useful features

Step 3: Select and train the ML algorithm

Step 4: Evaluate metrics (and iterate)
]
???
See how well algorithm does using several metrics
Error analysis: what went wrong and why
Iterate: get new data, make new features
---
.title[Evaluation Concerns]
.body[
Accuracy: Might be too error-prone

]
---
.left-column[
## Assessing Accuracy]
.right-column[

Prior probabilities
- Probability before any observations (i.e., just guessing)
- Ex. ML classifier to guess if a person is male or female based on name
 - Just assume all names are female (50% will be right)
- Your trained model needs to do better than prior

Other baseline approaches
- Cheap and dumb algorithms
- Ex. Names that end in vowel are female
- Your model needs to do better than these too
]
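
A sketch of the two baselines above, scored on a tiny invented list of names. The names and labels are made up for illustration; the point is only that a trained model should beat both numbers.

```python
# Invented data for illustration only.
names = ["Maria", "Anna", "Julia", "Sarah", "Peter", "Marco", "John", "David"]
labels = ["F", "F", "F", "F", "M", "M", "M", "M"]


def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


# Baseline 1: the prior. Guess the same class for everyone.
always_female = ["F" for _ in names]

# Baseline 2: a cheap rule. Names ending in a vowel are guessed female.
vowel_rule = ["F" if name[-1].lower() in "aeiou" else "M" for name in names]

print(accuracy(always_female, labels))  # 0.5
print(accuracy(vowel_rule, labels))     # 0.75
```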

---
.left-column[
## Assessing Accuracy]

.right-column[
Don't just measure accuracy (percent right)

Sometimes we care about *False positives* vs *False negatives*
]
---
.left-column[
## Assessing Accuracy

## Confusion matrix helps show this]

.right-column[

|             |              | .red[Prediction]     |                      |
|-------------|--------------|----------------------|----------------------|
|             |              | **Positive**         | **Negative**         |
| .red[Label] | **Positive** | True Positive (good) | False Negative (bad) |
|             | **Negative** | False Positive (bad) | True Negative (good) |

Accuracy is (TP + TN) / (TP + FP + TN + FN)

]
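
The four cells of the confusion matrix can be counted directly from labels and predictions. A minimal sketch with invented data; `confusion_counts` is a name chosen here.

```python
def confusion_counts(labels, predictions):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for label, predicted in zip(labels, predictions):
        if label and predicted:
            tp += 1
        elif label and not predicted:
            fn += 1
        elif not label and predicted:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Invented example: 3 actual positives, 5 actual negatives.
labels = [True, True, True, False, False, False, False, False]
predictions = [True, True, False, True, False, False, False, False]

tp, fp, tn, fn = confusion_counts(labels, predictions)
accuracy = (tp + tn) / (tp + fp + tn + fn)
print(tp, fp, tn, fn, accuracy)  # 2 1 4 1 0.75
```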
---
.left-column[
## Assessing Accuracy

## Precision
]

.right-column[

|             |              | .red[Prediction]           |                      |
|-------------|--------------|----------------------------|----------------------|
|             |              | **Positive**               | **Negative**         |
| .red[Label] | **Positive** | .red[True Positive (good)] | False Negative (bad) |
|             | **Negative** | .red[False Positive (bad)] | True Negative (good) |

Precision = TP / (TP+FP)

Intuition: Of the items predicted positive, how many are actually positive?

]
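
A worked instance of the precision formula, as a sketch. The spam counts are invented, and returning 0.0 when nothing is predicted positive is a convention chosen here (some toolkits report it as undefined).

```python
def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP)."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        return 0.0  # assumed convention for "no positive predictions"
    return true_positives / predicted_positive


# Invented example: a filter flags 10 messages as spam, and 8 really are spam.
spam_precision = precision(true_positives=8, false_positives=2)
print(spam_precision)  # 0.8
```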

---
.left-column[
## Assessing Accuracy

## Recall
]
.right-column[

|             |              | .red[Prediction]           |                            |
|-------------|--------------|----------------------------|----------------------------|
|             |              | **Positive**               | **Negative**               |
| .red[Label] | **Positive** | .red[True Positive (good)] | .red[False Negative (bad)] |
|             | **Negative** | False Positive (bad)       | True Negative (good)       |

Recall = TP / (TP+FN)

Intuition: Of all the items that are actually positive, how many did the model label as positive?
]
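
Precision and recall pull in different directions, which a small sketch can show. The data is invented and `precision_recall` is a name chosen here: a classifier that says "positive" to everything gets perfect recall but poor precision.

```python
def precision_recall(labels, predictions):
    """Return (precision, recall) for a binary classifier."""
    tp = sum(1 for l, p in zip(labels, predictions) if l and p)
    fp = sum(1 for l, p in zip(labels, predictions) if not l and p)
    fn = sum(1 for l, p in zip(labels, predictions) if l and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


labels = [True, True, False, False, False]  # invented ground truth
say_yes_to_everything = [True] * len(labels)

print(precision_recall(labels, say_yes_to_everything))  # (0.4, 1.0)
```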

---
.title[Evaluation Concerns]
.body[
Accuracy: Might be too error-prone

Overfitting: Your ML model is too specific for data you have
- Might not generalize well

![:img overfitting, 100%](img/ml/overfitting.png)
]
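
An extreme, deliberately silly sketch of overfitting: a "model" that just memorizes its training examples. The names are invented and the fallback guess of "F" is an arbitrary assumption. It is perfect on the data it has seen and no better than guessing on new data.

```python
# Invented training data, memorized as a lookup table.
training = {"Maria": "F", "Peter": "M", "Anna": "F", "John": "M"}
unseen = [("Julia", "F"), ("David", "M"), ("Sarah", "F"), ("Marco", "M")]


def predict(name):
    """Return the memorized label, or a fixed guess for names never seen."""
    return training.get(name, "F")


def accuracy(examples):
    """Fraction of (name, label) pairs that predict() gets right."""
    return sum(predict(name) == label for name, label in examples) / len(examples)


print(accuracy(list(training.items())))  # 1.0 on the training data
print(accuracy(unseen))                  # 0.5 on new data
```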

---
.title[Avoiding Overfitting]
.body[

To avoid overfitting, typically split data into training set and test set

Train model on training set, and test on test set

Often do this through cross validation

![:img cross validation, 60%](img/ml/cross-validation.png)
]
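
A minimal sketch of how k-fold cross validation divides the data. `k_fold_splits` is a hypothetical helper that works on example indices; in practice the data is usually shuffled first, and most toolkits provide this directly.

```python
def k_fold_splits(n_examples, k):
    """Yield (train_indices, test_indices) once for each of k folds."""
    indices = list(range(n_examples))
    fold_size, remainder = divmod(n_examples, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds get one extra example.
        end = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, test
        start = end


for train, test in k_fold_splits(6, 3):
    # Train on `train`, evaluate on `test`, then average the k scores.
    print(train, test)
```

Every example is used for testing exactly once, and never in the same round in which it was used for training.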

---
.title[How Machine Learning is Typically Used]
.body[
Step 1: Gather lots of data (easy on a phone!)

Step 2: Figure out useful features

Step 3: Select and train the ML algorithm

Step 4: Evaluate metrics (and iterate)

Step 5: Deploy
]
---
.title[What makes this work well?]
.body[
Typically more data is better

Accurate labels important

Quality of features determines quality of results

.red[*NOT* as sophisticated as the media makes out]
]
--
.body[
.red[*BUT* can infer all sorts of things]
]
---
.title[AI / Machine Learning Not As Sophisticated as in Media]
.body[

A lot of people outside of computer science often ascribe human
behaviors to AI systems 
- Especially desires and intentions 
- Works well for sci-fi, but not for today or near future

These systems only do: 
- What we program them to do 
- What they are trained to do (based on the (possibly biased) data) 
]
---
.title[Concerns]

.body[
Significant Societal Challenges for Privacy
]
---
.left-column[
## Wide Range of Privacy Risks]
.right-column[

| Everyday Risks     | Medium Risk         | Extreme Risks     |
|--------------------|---------------------|-------------------|
| Friends, Family    | Employer/Government | Stalkers, Hackers |
| Over-protection    | Over-monitoring     | Well-being        |
| Social obligations | Discrimination      | Personal safety   |
| Embarrassment      | Reputation          | Blackmail         |
|                    | Civil Liberties     |                   |

- It's not just Big Brother 
- It's not just corporations
- Privacy is about our relationships with every other individual and
  organization out there
  
]
---
.title[Five Reasons Why Privacy is Hard]
.body[
###1 Strong Incentives for Companies to Collect Data

- Barriers to collecting data are also really low 
- More data means better predictive models

Data has strong potential to affect bottom line 
- Increasing relevance of online ads worth millions 
- "Post-purchase monetization"
]
---
.title[Five Reasons Why Privacy is Hard]
.body[
###2 Low Knowledge, Awareness, Motivation by Devs 
- Even those with CS degrees have little knowledge 
- In surveys and interviews, vast majority of app developers knew
  little about what privacy issues there were and how to address them 
- Many developers don’t realize how much data their app is collecting
(or that it is collecting data at all)
 - App developers often use third-party libraries 
 - In one study, over 40% of apps collect data only because of these
 libraries
 - Some apps use several libraries, which means your data is being sent
   to lots of third parties
]
---
.title[Five Reasons Why Privacy is Hard]
.body[
###3 Companies Get Little Pushback on Privacy

Let’s say you want to purchase a web cam 
- Go into store, can compare price, color, features
- But can’t easily compare privacy (or security) 
- So, privacy does not influence customer purchases 
- So, companies not incentivized to improve

Less than 0.1% of reviews on Google Play mention privacy concerns

This is a market failure – This is why companies assign privacy a low
priority
]

---
.title[Five Reasons Why Privacy is Hard]
.body[
###4 Unclear What the Right Thing To Do Is

Even if a company wants to be privacy-sensitive, it’s not always clear
what the right thing to do is 
]
???
privacy policies very long

Still state of the art for privacy notices

No one reads these
---
.title[Five Reasons Why Privacy is Hard]
.body[
###4 Unclear What the Right Thing To Do Is

Even if a company wants to be privacy-sensitive, it’s not always clear
what the right thing to do is
- What is the best way of informing people?
- What is the best way of storing data? 
- How to best assess what is / isn’t acceptable?
]
---
.title[Five Reasons Why Privacy is Hard]
.body[
###5 Burden on End-Users is Too High

Individuals have to be constantly vigilant

Individuals also have to make too many decisions
- Is this device good with respect to privacy?
- Should I install this app?
- What are all the settings I need to know?
- What are all the terms and conditions?
- Trackers, cookies, VPNs, anonymizers, etc.
]
---
.title[Concerns]

.body[
Significant Societal Challenges for Privacy

Who should have the initiative?

]
---
.title[Mixed-initiative interfaces]
.body[
Basically, who is in charge?
- Does person initiate things? Or computer?
- How much does computer system do on your behalf?

Example: Autonomous vehicles
- Some people think Tesla Autopilot is fully autonomous, which leads to risky actions

Why initiative matters
- Potential major shift: instead of direct manipulation, some smarts (intelligent agent) for automation
]
---
.left-column[
## Mixed-initiative best practices
- Significant value-added automation
- Considering uncertainty
- Socially appropriate interaction w/ agent
- Consider cost, benefit, uncertainty
- Use dialog to resolve uncertainty
- Support direct invocation and termination
- Remember recent interactions
]
.right-column[

![:img mixed initiative figure, 100%](img/ml/mixed-initiative.png)
]
---
.left-column[
## Mixed-initiative best practices
- Significant value-added automation
- Considering uncertainty
- Socially appropriate interaction w/ agent
- Consider cost, benefit, uncertainty
- Use dialog to resolve uncertainty
- Support direct invocation and termination
- Remember recent interactions
]
.right-column[

![:img mixed initiative figure, 100%](img/ml/mixed2.png)
]
???
Can see what agent is suggesting, in terms of scheduling a meeting

---
.left-column[
## Mixed-initiative best practices
- Significant value-added automation
- Considering uncertainty
- Socially appropriate interaction w/ agent
- Consider cost, benefit, uncertainty
- Use dialog to resolve uncertainty
- Support direct invocation and termination
- Remember recent interactions
]
.right-column[

![:img mixed initiative figure, 100%](img/ml/mixed3.png)
]
???
Uses anthropomorphized agent
Uses speech for input
Uses mediation to help resolve conflict
---
.left-column[
## Mixed-initiative best practices
]

.right-column[
Built-in cost-benefit model in system
- If perceived benefit >> cost, then do the action
- Otherwise wait

Note that this is just one point in design space (1999), and still lots of open questions
- Ex. Should “intelligence” be anthropomorphized?
- Ex. How to learn what system can and can’t do?
- Ex. What kinds of tasks should be automated / not?
- Ex. What are strategies for showing state of system?
- Ex. What are strategies for preventing errors?
]
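
One possible reading of the cost-benefit rule, as a sketch only. The probability input, the utility numbers, and the `margin` parameter (standing in for ">>") are all assumptions made here for illustration, not the model from the 1999 system.

```python
def should_act(p_user_wants_action, benefit, cost, margin=2.0):
    """Act only when expected benefit clearly outweighs expected cost.

    p_user_wants_action: the system's estimate that the user wants this action
    benefit: value gained if the system acts and the user wanted it
    cost: annoyance if the system acts and the user did not want it
    margin: how much larger the benefit must be (assumed stand-in for '>>')
    """
    expected_benefit = p_user_wants_action * benefit
    expected_cost = (1.0 - p_user_wants_action) * cost
    return expected_benefit > margin * expected_cost


print(should_act(0.9, benefit=5.0, cost=3.0))  # True: confident, so act
print(should_act(0.3, benefit=5.0, cost=3.0))  # False: uncertain, so wait
```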
---

.title[Concerns]

.body[
Significant Societal Challenges for Privacy

Who should have the initiative?

Bias in Machine Learning
]
---
background-image: url(img/ml/gma.png)

.body[

.quote[Johnson says his jaw dropped when he read one of the reasons American
Express gave for lowering his credit limit:

![:fa quote-left] Other customers who have used their card at establishments
where you recently shopped have a poor repayment history with American
Express.
]
]
---
.right-column[

![:img bias figure, 100%](img/ml/bias.png)
]
---

.title[Concerns]

.body[
Significant Societal Challenges for Privacy

Who should have the initiative?

Bias in Machine Learning

Understanding ML
]
---
.title[Understanding what is going on: Forming Mental Models]
.body[
How does a system know I am addressing it?

How do I know a system is attending to me?

When I issue a command/action, how does the system know what it relates to?

How do I know that the system correctly understands my command and correctly executes my intended action?

]
.footnote[
Bellotti et al., CHI 2002 ‘Making Sense of Sensing’
]
---
.title[Wrong location-based rec]
.body[

![:img wrong, 100%](img/ml/wrong.png)
]
???
Why did it not tell me about the Museum? How does it determine my location? 
Providing explanations to these questions can make intelligent systems intelligible

other examples: caregiving hours by insurance company, etc

---
.title[Types of feedback]
.body[

Feedback: crucial to user’s understanding of how a system works and helping guide future action 
- What did the system do?
- What if I do W, what will the system do? 
- Why did the system do X?
- Why did the system not do Y?
- How do I get the system to do Z? 
]
---
.title[Summary ML and ethics]
.body[

ML is powerful (but not perfect), often better than heuristics

Basic approach is collect data, train, test, deploy

Hard to understand what algorithms are doing (transparency)
- ML algorithms just try to optimize, but might end up finding a proxy for race, gender, computer, etc
- But hard to inspect these algorithms
- Still a huge open question

Privacy
- How much data should be collected about people?
- How to communicate this to people?
- What kinds of inferences are ok?
]
---

layout: true