sprinkles120 t1_j1rqgga wrote

Basically, the raw data can be biased. If you just take all your company's hiring data and feed it into a model, the model will learn to replicate any discriminatory practices that historically existed at your company. (And there are plenty of studies that suggest such bias exists even among well-meaning hiring managers who attempt to be race/gender neutral.)

Suppose you have a raw dataset where 20% of white applicants are hired and only 10% of applicants of color are hired. Even if you exclude the applicants' race from the features used by the model, you will likely end up with a system that is half as likely to hire applicants of color as white applicants. AI is extremely good at extracting patterns from disparate data points, so it will find other, subtler indicators of race and learn to penalize them. Maybe it decides that degrees from historically black universities are less valuable than degrees from predominantly white liberal arts schools. Maybe it decides that guys named DeSean are less qualified than guys named Sean. You get the picture.

Correcting these biases in the raw data isn't quite the same as filling quotas. The idea is that two equally qualified applicants have the same likelihood of getting hired. You could have a perfectly unbiased model and still fail to meet a quota because no people of color apply in the first place.
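To make the proxy point concrete, here's a toy sketch in Python. Everything in it is made up for illustration: the counts, the `school_a`/`school_b` labels, and the "model", which just scores each applicant by the historical hire rate for their school. It never sees the group column, but school is correlated with group in this invented data, so the gap leaks back in.

```python
from collections import defaultdict


def make_history():
    """Build hypothetical records as (group, school, hired) tuples."""
    # (group, school, applicants, hires); all counts are invented.
    # Within each group the hire rate is the same at both schools
    # (20% for white applicants, 10% for applicants of color).
    cells = [
        ("white", "school_a", 90, 18),
        ("white", "school_b", 10, 2),
        ("poc", "school_a", 10, 1),
        ("poc", "school_b", 90, 9),
    ]
    rows = []
    for group, school, n, hires in cells:
        rows += [(group, school, i < hires) for i in range(n)]
    return rows


def rate_by(rows, key_index):
    """Historical hire rate for each value of the chosen column."""
    totals, hires = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row[key_index]] += 1
        hires[row[key_index]] += row[2]  # True counts as 1
    return {k: hires[k] / totals[k] for k in totals}


def mean_score_by_group(rows, school_score):
    """Average model score per group when the model only sees the school."""
    totals, sums = defaultdict(int), defaultdict(float)
    for group, school, _ in rows:
        totals[group] += 1
        sums[group] += school_score[school]
    return {g: sums[g] / totals[g] for g in totals}


history = make_history()
raw_rates = rate_by(history, 0)     # hire rate by group in the raw data
school_score = rate_by(history, 1)  # toy "model": score = school's historical rate
model_rates = mean_score_by_group(history, school_score)

print("raw hire rate by group:", raw_rates)
print("model score by group:  ", model_rates)
print("ratio (poc / white):   ", model_rates["poc"] / model_rates["white"])
```

With these invented numbers the ratio comes out around 0.65 instead of the raw 0.5, because school is only a 90/10 proxy for group here. The stronger the proxy (or the more proxies a real model can combine), the closer it gets to reproducing the original gap.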