It is your first day of work as a new college graduate. You recently accepted a position at an on-demand office supply company, OfficeSwipe. During your first meeting, your manager briefs you on what your role will be: you are in charge of outreach, specifically finding new engineers. Your background is in computer science, but you have never trained machine learning models to vet potential candidates, nor do you have any experience in management. This is far from what you expected to do; where do you even start?
[[Ask a coworker for advice->Coworker advice]]
[[Do a Google search->Google search]]
(set: $searchesLeft to 3)
(set: $advanced to false)
(set: $lookedAtFAIR to false)You squash that idea pretty quickly. You don't want to be known as the guy who doesn't know what he is doing.
[[Google it is->Google search]] <img src="https://i.imgur.com/gGHKMPf.png">
[[The LinkedIn Dataset: Metadata of LinkedIn profiles->The LinkedIn Dataset]]
Easy to use. Quick to train.
[[Machine Learning for Dummies->Useless]]
Pick up basic ML skills in minutes
<a href="http://elitedatascience.com/machine-learning-impact/" target="_blank">How you can be using ML to improve your life</a>
[[Advanced ML, third edition->Tutorial]]
Everything you need to know about ML in one book
[[FAIR sets its sights on a new target, Supp.ly->FAIR arc]]
How the once up-and-coming "UBER for office supplies" ran itself into the ground
[[ML in the Real World: An insider's guide to how the Big Five hire->ClickBait]]
How Google, Facebook, Twitter, and others sift through millions of candidates
(set: $searchesLeft to $searchesLeft - 1)Without any clear way forward, your work continues unsuccessfully for a few days before you begrudgingly resign.
[[The world continued spinning, and eventually you needed to face the truth. What could you have done differently?->Uneasy Beginnings]] It was a complete disaster. Your boss came to you with grim news, but you had already seen it trending online. FAIR had run its evaluations on OfficeSwipe and found that 79% of recent hires were male, and that 55% of those hires were Caucasian. What had slipped past your watchful eye? Embarrassingly, almost everything had. Within a week, OfficeSwipe was gone, leaving only the lessons to be learned from your poor choices.
(if: $typeUsed is 3) [
As you're leaving the office for the last time, a coworker begrudgingly mentions remembering something similar happening to another on-demand office supply company...
]
[[The world continued spinning, and eventually you needed to face the truth. What could you have done differently?->Uneasy Beginnings]]
You could have written this book. You skim through the first chapter but give up after that. Are you simply looking for a confidence booster?
(if: $searchesLeft is 0)[
[[Time's up->Ending A]]
]
(else:) [
[[Go back and look at a different result->Google search]]
]Microsoft has released its latest benchmark dataset, composed of LinkedIn profiles, each annotated with:
* Resume (in the form of a .txt file)
* Profile picture (as a .png)
* Number of connections the profile had
* Number of views of the profile as of March 3rd, 2019
Names are omitted to preserve privacy and anonymity. Microsoft hopes the dataset will help close the gap in automated recruitment.
(if: $advanced is true) [
(if: $searchesLeft is 0)[
[[This is getting nowhere...->Ending A]]
]
(else:) [
[[Go back and look at a different result->Google search]]
]
]
(else:) [
[[This reminds me of another article...->Dataset Arc]]
]
(set: $advanced to true)You scroll to a chapter that looks useful. Most of it covers things you already know, but one thing you take away in particular is how to use the LinkedIn dataset.
The tutorial teaches you how to build complex regression models with deep neural networks.
Using the LinkedIn dataset, the chapter walks you through how to predict a profile's number of LinkedIn connections and views from nothing more than its resume and picture.
(if: $advanced is true) [
(if: $searchesLeft is 0)[
[[This is getting nowhere...->Ending A]]
]
(else:) [
[[Go back and look at a different result->Google search]]
]
]
(else:) [
[[This reminds me of another article...->Dataset Arc]]
]
(set: $advanced to true)You decide to use the LinkedIn dataset, treating the number of connections and views as an indicator of a good potential hire.
Given your machine learning background, you can easily implement deep neural networks to get the job done.
You tell your boss about your plan and he approves!
[[Download the dataset and start implementing your model->Implement Model]]
(if: $lookedAtFAIR is 1) [
[[Download the dataset and inspect the data->Download the dataset and inspect]]
]It has been two days since you downloaded the dataset. You've looked at several dozen training instances, and you have a plan for implementing your model.
Your boss notices that it's been two days and you haven't started. He presses you to begin implementing the model.
[[Start your implementation->Implement Model]]
(if: $lookedAtFAIR is 1) [
[[Continue inspecting the dataset]]
]You begin implementing the model. Which part do you want to start with?
[[Work with the resume data->Working with resume]]
[[Work with the image data->Working with images]]Using state-of-the-art methods, you create a feature extraction pipeline for text data. You train a model to predict connections and views from the text features.
Now you have a model that can predict the number of LinkedIn connections and views, and you can find the best candidates using just their resumes.
[[Turn in this model to your boss->Your Job is over]]
(if: $typeUsed is 0) [
[[Go back and work with the image data->Working with images]]
]
(set: $typeUsed to $typeUsed + 1)Using state-of-the-art methods, you create a feature extraction pipeline for images. You train a model to predict connections and views from the image features.
Now your model can predict the number of LinkedIn connections and views from images, and your algorithm is ready to be deployed.
[[Turn in this model to your boss->Your Job is over]]
(if: $typeUsed is 0) [
[[Go back and work with the text data->Working with resume]]
]
(set: $typeUsed to $typeUsed + 2)(if: $typeUsed is 1) [
You trained your model solely on the text data. It performed perfectly during testing on the LinkedIn dataset.
Your boss ordered it deployed into the company's recruiting system. It performed very well in the real world and surfaced some very impressive candidates for your company. Your colleagues were very happy with your work, and you earned good standing with your boss.
Your company now holds a patent on a model that can predict a candidate's potential from resume data, and this marks the start of a sudden surge in your company's growth.
]
(elseif: $typeUsed is 2)[
You trained your model solely on the image data. It performed perfectly during testing on the LinkedIn dataset.
Your boss ordered it deployed into the company's recruiting system. It performed very well in the real world and surfaced some very impressive candidates for your company. Your colleagues were very happy with your work, and you earned good standing with your boss.
Your company now holds a patent on a model that can predict a candidate's potential from image data, and this marks the start of a sudden surge in your company's growth.
]
(else: ) [
You trained your model on both the image and text data. It performed perfectly during testing on the LinkedIn dataset.
Your boss ordered it deployed into the company's recruiting system. It performed very well in the real world and surfaced some very impressive candidates for your company. Your colleagues were very happy with your work, and you earned good standing with your boss.
Your company now holds a patent on a model that can predict a candidate's potential from image and resume data. This is better than anything your competition has access to. Your company grows exponentially and now holds 70% of the market share.
]
[[Accept Ending->Ending B]]You curse under your breath, yet another victim of clickbait.
(if: $searchesLeft is 0)[
[[Time's up->Ending A]]
]
(else:) [
[[Go back and look at a different result->Google search]]
]You come across an article depicting the downfall of *Supp.ly*, a company that would have rivaled OfficeSwipe. Unfortunately, you've already read your last free article of the month on this website...
**Supp.ly, the once up-and-coming "UBER for office supplies" startup was set to break even later this year...**
The start of the article has piqued your curiosity, but you currently don't have access to the rest of it.
[[Email a FAIR representative for the rest of the story->Email Rep]]
(if: $searchesLeft is 0)[
[[Should probably get back to work...->Ending A]]
]
(else:) [
[[Go back and look at a different result->Google search]]
]You email the nonprofit FAIR (Fairly Automating In Reality). A representative replies:
*
I realize that the article is not freely available, which is a shame, because I feel it's a story that many new companies should know about, especially with the advent of artificial intelligence.
Thank you for reaching out. I've attached the full-length article below.
Best,
...
*
You read the article in its entirety. You are shocked to learn that **Supp.ly** went under due to public backlash. **Supp.ly** had implemented an algorithm to determine which clients get discounts in order to maximize future orders. They trained a model to predict the number of purchases from each user's history and profile information.
This led to a severe imbalance in who received discounts.
(set: $lookedAtFAIR to 1)
[[Go back to the Google Search->Google search]] Your boss reluctantly allows you to spend more time dissecting the dataset.
After combing through the data, you realize that there *might* be inherent bias:
* Resumes with male names tend to correlate with more profile views than resumes with female names
* Profile pictures of people of Asian and Caucasian descent tend to correlate with more connections than those of people of African or Hispanic descent
You bring this up at your next meeting, and your peers reassure you that similar datasets have been used with no issues. There is significant pressure for you to begin implementing and training your models.
[[Continue to your implementation while using this dataset->Implement Model]]
[[Continue looking into your doubts about the dataset]]You train preliminary models that predict the number of connections and views from resume and profile picture data.
After doing so, you notice some troubling trends...
* Identical resumes that differ only in the gender of the name confirm your suspicion that the model assigns more LinkedIn views to men
* You isolate profile pictures in which every candidate wears an overcoat with either a blouse or a button-up shirt and tie. Even with attire equalized, pictures of Asian or Caucasian candidates are still predicted to have more LinkedIn connections.
You decide to balance the dataset by removing names from resumes and no longer using profile pictures.
[[Implement your model using this balanced dataset]]You've spent a lot of time scrutinizing the dataset. You have only just finished implementing and training your model, yet a competing company has already deployed its own candidate-vetting algorithm.
They are now hiring earlier and faster than OfficeSwipe, snapping up all of the best candidates.
You are in bad standing with your boss, and your coworkers are frustrated that you delayed the deployment of the system they worked so hard on with you.
You are afraid that within the next few weeks, you may lose your job.
[[Accept this reality]]
[[What would you do differently after discovering the dataset?->Dataset Arc]]
[[What would you do differently if you had to start all over?->Uneasy Beginnings]]You hold onto the beliefs you had from the very beginning. You learn from the mistakes of *Supp.ly*, and despite pressure from your coworkers and your boss to deliver, you refuse to sacrifice integrity for performance.
It's been three years since other companies deployed their own automated vetting process. Despite having deployed your system last, OfficeSwipe is set to make a profit within the next two months.
Your vetting process ultimately became more tempered, more balanced, and more cognizant of the real candidate pool. As other companies became mercilessly scrutinized by FAIR, they began to struggle, losing market share and confidence from their user base.
You are now more steadfast in your beliefs and quicker to start difficult conversations. The pressure from your boss and coworkers has taught you the importance of communication, and that the friction came from a disconnect in ethical standards.
[[Play again?->Uneasy Beginnings]]