
  Could this be a direct indicator of a powerful
  subconscious bias in Amazon's existing hiring
  process?
Maybe - but maybe not.

Imagine a company with 2 men in HR, 2 women in HR, 40 men in engineering, and 10 women in engineering. That's with gender-blind hiring, reflecting only the 4:1 ratio of male to female CS graduates.

If you picked a random male hire, there's a 40/42=95% chance they're an engineer whereas if you picked a random female hire, there's a 10/12=83% chance they're an engineer.

Thus if you look across all hires' CVs, by Bayes' law the dataset says that being male raises the conditional probability of meeting engineering hiring requirements - and the ML system picks up on that.
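The arithmetic above can be sketched directly; the company numbers are the hypothetical ones from this example, not real data:

```python
# Hypothetical, gender-blind company from the example above:
# 2 men in HR, 2 women in HR, 40 male engineers, 10 female engineers.
hires = {
    ("male", "engineering"): 40,
    ("male", "hr"): 2,
    ("female", "engineering"): 10,
    ("female", "hr"): 2,
}

def p_engineer_given(gender):
    """P(engineer | gender) over all hires."""
    total = sum(n for (g, _), n in hires.items() if g == gender)
    return hires[(gender, "engineering")] / total

print(p_engineer_given("male"))    # 40/42 ~ 0.95
print(p_engineer_given("female"))  # 10/12 ~ 0.83
```

So even with gender-blind hiring, a model fit on all hires' CVs sees "male" as evidence for "engineer".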



So it depends on how you set up the experiment, right? I would assume the question you are posing to the AI is not how closely the resume in question resembles the set of resumes of hired engineers, but rather: given a resume, what is the probability that the candidate will eventually be hired?

So the classification function should take into account the resumes of rejected engineering candidates, rather than the pool of resumes of employees hired at Amazon. If someone is seeking a position as an engineer, it is not relevant how much their resume resembles those of HR people, but it is very relevant how much it resembles those of rejected engineering candidates.

If that's the case, then something like having the phrase "women's chess club" in one's resume should not be a meaningful factor for the classifier unless it disproportionately leads to rejection in the current process.
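One way to make that claim concrete is to check whether a phrase actually correlates with rejection in the hired-vs-rejected training data. A toy sketch, with entirely invented resumes and labels:

```python
# Invented data: each resume is a set of phrases plus an outcome label.
resumes = [
    ({"python", "chess club"}, "hired"),
    ({"java", "women's chess club"}, "hired"),
    ({"python", "women's chess club"}, "rejected"),
    ({"cobol"}, "rejected"),
]

def rejection_rate(phrase):
    """P(rejected | resume contains phrase) in the training data."""
    subset = [label for feats, label in resumes if phrase in feats]
    return sum(1 for label in subset if label == "rejected") / len(subset)

print(rejection_rate("women's chess club"))  # 0.5 in this toy data
```

A classifier trained on hired-vs-rejected labels only learns to penalize the phrase to the extent this rate exceeds the base rejection rate.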


Why would one train on "all hires' CVs"? It'd be "engineering CVs" - moreover, "engineering applicants' CVs", not "engineering hires' CVs".


In the article I noticed: "Problems with the data that underpinned the models’ judgments meant that unqualified candidates were often recommended for all manner of jobs"

That language is a bit ambiguous: it could just mean that the algorithm failed on a wide variety of jobs beyond engineering. But another reading suggests the algorithm was not asked "is this person a good fit for this role?" but instead "what, if anything, is this person qualified for?"

If that's the case, then the problem starts to make more sense: the algorithm learned a correlation between male-sounding resumes and being hired for engineering roles. That could produce a biased approach even if the decisions in the training data were gender-neutral but position-specific. Of course, it would also mean that an Amazon ML team trained an algorithm on inputs that didn't match its eventual task, which makes me wonder what they used as a test set...

(Anecdotally, Amazon spent quite a while recruiting me for SysEng work I'm wildly unqualified for and uninterested in, even suggesting I switch to applying for that team when I was already in the funnel for something I'm more qualified at. When my resume eventually made it to a syseng engineer, they were rightly baffled that I had landed in their stack, giving me the sense that something was screwy with how Amazon decides who heads towards which role.)


Because you were trying to build a system that would perform CV filtering for your entire company, and you figured Deep Learning would just kinda take care of everything.

You're right that you'd want to look at applicants' CVs - I skipped over that to keep the numbers readily comprehensible.


Because we are guessing at the problem, we don't have complete data.


Maybe, if they naively fed in the entire dataset without any inspection. But I doubt the people working on this model stopped at the most basic level. What you describe is a common class imbalance issue; I would expect they accounted for and addressed it (at a minimum by oversampling the underrepresented class, for example) while working on the model.
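Random oversampling, the minimal mitigation mentioned above, just duplicates minority-class examples until the classes match. A self-contained sketch with made-up CV labels:

```python
import random

# Made-up dataset mirroring the 40:10 split from the earlier example.
random.seed(0)
data = [("cv_m%d" % i, "male") for i in range(40)] + \
       [("cv_f%d" % i, "female") for i in range(10)]

# Group examples by class label.
by_label = {}
for item in data:
    by_label.setdefault(item[1], []).append(item)

# Resample each minority class (with replacement) up to the majority size.
target = max(len(items) for items in by_label.values())
balanced = []
for label, items in by_label.items():
    balanced.extend(items)
    balanced.extend(random.choices(items, k=target - len(items)))

print(len(balanced))  # 80: 40 examples per class
```

Libraries like imbalanced-learn wrap the same idea (plus smarter variants like SMOTE), but the effect is the same: the model no longer sees class frequency itself as signal.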

So I doubt it's enough to explain their issue here. I agree that we can't really draw any conclusions about their broader hiring patterns from this experiment.



