
SakanaToDoubutsu t1_j1rtpbv wrote

I work in data science and I have absolutely zero idea how you'd actually implement this. Granted, NLP isn't my forte, but other than adding a bunch of words that could identify someone's demographics to your stop-word list, I don't see what else you could really do without undermining the integrity of the technique.
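And that stop-word idea is about as far as it goes, and it's trivially shallow. A minimal sketch (the word list here is a made-up example, not anything real):

```python
# Minimal sketch: strip words from a hand-maintained "demographic" stop-word
# list before the text reaches the model. The list is a toy example; a real one
# would be much longer and still wouldn't catch proxies like school names,
# zip codes, or phrasing patterns.
DEMOGRAPHIC_STOPWORDS = {"he", "she", "his", "her", "mr", "mrs", "ms",
                         "male", "female", "black", "white", "asian", "hispanic"}

def scrub(text: str) -> str:
    tokens = text.lower().split()
    return " ".join(t for t in tokens if t.strip(".,") not in DEMOGRAPHIC_STOPWORDS)

print(scrub("Ms. Jane Doe, female, graduated from Howard University"))
# -> "jane doe, graduated from howard university" (the school name, a proxy, survives)
```

Proxies like school names, zip codes, or phrasing survive the scrub, which is exactly the "undermining the integrity" problem.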

78

Armoogeddon t1_j1s60ej wrote

It’s been a few years, but also a data scientist with experience in NLP.

NLP would only be one component of any such model (or models), but even if you somehow standardized on it - and that would be really, really hard to do - it would be virtually impossible to create something the author(s) of any bill would deem unbiased.

I suspect these people hear “bias” in machine learning and presume it’s a pejorative. It’s not; models trained by humans (“supervised machine learning”) are intentionally “biased” by the experience they’re trained on. Training models isn’t some Klan rally to go after people, at least not in my experience. I have serious qualms about how this stuff gets used, and that’s partly why I left the field, but lawyers and career politicians aren’t helping by passing laws regulating a field they understand no better than they understand rocket science.

25

SakanaToDoubutsu t1_j1s9eay wrote

>I suspect these people hear “bias” in machine learning and presume it’s a pejorative. It’s not; models trained by humans (“supervised machine learning”) are intentionally “biased” by the experience they’re trained on. Training models isn’t some Klan rally to go after people, at least not in my experience.

This is exactly it. This reminds me of a project my thesis advisor did where they were looking at retention rates and trying to limit freshman dropouts. One of the best predictors of dropping out they found was self-identifying as black or mixed race, and as a result anyone entering the university who identified as black or mixed race was automatically placed on a sort of academic probation.

Under this program dropout rates went down pretty substantially, but once the student body found out about it they protested, and the statistics department could no longer use demographic data to identify students at risk of dropping out. Once they couldn't use that data, dropout rates went back up again, so you're damned if you do and damned if you don't.
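To make the bind concrete, here's a toy sketch with entirely synthetic data and scikit-learn (not my advisor's actual model): when the real socioeconomic drivers aren't in the data, a correlated demographic flag is the only predictive signal left, and dropping it costs accuracy.

```python
# Toy version of the bind: dropout is driven by socioeconomic factors that are
# NOT in the data, while a correlated demographic flag IS. All data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
first_gen = rng.binomial(1, 0.3, n)      # unobserved: first-generation student
low_income = rng.binomial(1, 0.4, n)     # unobserved: low family income
gpa = rng.normal(3.0, 0.4, n) - 0.1 * low_income                     # observed, weak signal
demo = rng.binomial(1, 0.15 + 0.3 * first_gen + 0.25 * low_income)   # observed proxy
dropout = rng.binomial(1, 0.05 + 0.25 * first_gen + 0.25 * low_income)

for name, X in [("GPA + demographic flag", np.column_stack([gpa, demo])),
                ("GPA only", gpa.reshape(-1, 1))]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, dropout, random_state=0)
    auc = roc_auc_score(y_te, LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```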

17

myassholealt t1_j1spejk wrote

Anecdote here, but thinking back on my own experience in college: I was a working-class, first-generation college student. The college environment was so different from the world I knew through 12th grade, and my new classmates were so different. I remember overhearing a guy complaining about being too poor, then in the same breath saying he couldn't wait for spring break because he just wanted to go somewhere with a beach and tan the whole time. Meanwhile I was planning to pick up extra hours at my minimum-wage part-time job.

Or the dorm experience. I couldn't afford dorming. I commuted 90 minutes each way to class, would go home, and would try to take a 40-minute nap before heading out to my 6-10 shift at my job. I had no time for clubs or events, and didn't have a lot of social interaction outside of classes and class work. Hell, I didn't even know my advisor until my senior year; as a first-generation student I had no clue about that stuff. This and so many other experiences all tied together to make college very hard at times. And yes, I am a minority. But it's not my being a minority that triggered this. It was my socioeconomic status, my family history, and the access to experiences I had or didn't have before getting to college.

So when someone sees the correlation framed as black = high dropout rate, the immediate reaction may be to feel offended or object. But while on the data side that correlation is the easiest identifier, it doesn't actually identify what the real issues may be. And let's face it, with the history of this country, lots of black people have gone through life with obstacles intentionally erected to make sure this is their reality and the reality of their children. For example, while white WWII veterans were coming home to buy houses with the GI Bill and pay for educations that rooted their families and future generations solidly in the middle class, black veterans were not allowed those same privileges in many areas, rooting them and their future generations in the working class.

9

supermechace t1_j1sqmce wrote

I wouldn't say that's the correct conclusion. Techies and academics tend to be weak at understanding optics and cultural/racial issues, falling back on the claim that "data is king." Academic probation is a negative term, and automatically dumping people into that bucket is a head-slapping PR decision. Twenty years ago there were equal opportunity programs at colleges that basically used income level and minority status as a filter to qualify applicants for additional aid, work placement, and mentoring, all without statistical modeling. The correct takeaway in your advisor's case is to bring the findings to a holistic, cross-discipline, cross-cultural committee to examine the root causes, such as minorities coming from underfunded school districts that poorly prepared them for college. Decisions made in secret, especially without representative racial input, just perpetuate blindness.

7

IIAOPSW t1_j1sdlh6 wrote

I guess this is inherently unknowable, but I am itching to know if the dropout stats were meaningfully different for black people who chose not to self-identify as black on the form. For that matter, what fraction of black people pick "prefer not to say" on these sorts of forms, and is that fraction higher or lower than for any other racial demographic?
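If a university ever shared that data, the first half of the check would be trivial; something like this (file and column names are hypothetical):

```python
import pandas as pd

# Hypothetical enrollment extract; the file and column names are invented.
df = pd.read_csv("enrollment.csv")   # assumed columns: race, dropped_out (0/1)

# Dropout rate for each self-reported category, including the non-disclosers
print(df.groupby("race")["dropped_out"].mean())

# How common opting out is overall (who those students "really" are stays unknowable)
print((df["race"] == "prefer not to say").mean())
```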

5

DifficultyNext7666 t1_j1tk95v wrote

I was just told a model wasn't inclusive enough. It didn't choose enough DEI vendors.

I was like, the model doesn't even look at that. Finally I just asked: do you want this to be the most efficient or the blackest?

I eventually just flagged the black vendors, filled the top 25% of the list with them, and only then ranked the better-scoring vendors.

"Why did costs go up?" Was the next question. I don't hate what I do but God damn do I hate other people

8

myassholealt t1_j1snzt5 wrote

>Training models isn’t some Klan rally to go after people, at least not in my experience.

In all that I've read about the biases, I never came away with the impression that it was this, or that any biases that exist were maliciously built in. But they nevertheless exist, and when this tech is deployed in daily life, those biases have the potential to negatively affect members of the public. That's not a good thing.

4

Armoogeddon t1_j1spw46 wrote

I agree wholeheartedly with your last sentence, but it goes way beyond “bias” in models. Models are only one piece of an ever more complex system.

As for the impressions you've drawn, we could talk about that for hours. Maybe five or six years ago, it came to light that visual recognition models performed inherently worse on people with dark skin. The tech companies (I was at a big, prominent one at the time) decided to get ahead of the bad press by condemning themselves and promising to do better. The media fallout was negligible.

It was bunk. Did AI models perform generally worse on photos of black people/people of African descent? In some cases, yes. Was the training data cribbed from the US, where black people make up, what, 13% of the population? Yes. Of course the models performed worse: there was roughly a tenth of the data available to train on! It wasn’t racist; it wasn’t some bias built into the models by the human trainers - there was simply less data. But nobody bothered to have what should have been a nuanced conversation, and the prevailing opinion jumped to the wrong perception and the wrong remediation. It kicked off an idiotic path we still find ourselves on, or watch others traversing.
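You can reproduce the effect on a napkin: train one model on two groups where one has a tenth of the data (and a slightly different distribution), and the per-group accuracy gap falls out. Everything below is synthetic and illustrative, not the real face data:

```python
# Synthetic illustration: one model, two groups, one group with ~10x less
# training data and a slightly different feature distribution. Numbers invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift):
    y = rng.binomial(1, 0.5, n)                       # binary labels
    X = rng.normal(shift + y[:, None], 1.5, (n, 5))   # group-specific feature means
    return X, y

X_maj, y_maj = make_group(9000, shift=0.0)   # majority group dominates training
X_min, y_min = make_group(900,  shift=2.0)   # minority group, 10x less data
model = LogisticRegression().fit(np.vstack([X_maj, X_min]),
                                 np.concatenate([y_maj, y_min]))

X_maj_t, y_maj_t = make_group(2000, 0.0)
X_min_t, y_min_t = make_group(2000, 2.0)
print("accuracy, majority group:", round(model.score(X_maj_t, y_maj_t), 3))
print("accuracy, minority group:", round(model.score(X_min_t, y_min_t), 3))
```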

The real problem is that nobody understands what’s behind these models. We understand the approaches generally - the “convolutions” applied at the various layers - but nobody understands the logic inside the resulting models any better than we understand the machinery behind human reasoning. We can infer things, but nothing is known in a binary, truly understood way.

Yet everybody keeps racing ahead to apply these models in ever more profound and - if you’re in the space - unnerving ways. It’s getting scary, and it’s way worse than the stuff that’s being discussed here, which is also a bad idea.

I guess what I’m saying is it’s so much worse than these idiot politicians realize. They’re fighting a battle that was lost ten years ago.

0

supermechace t1_j1sp1sa wrote

Tech bros used to say the same thing about data privacy, social media, and other disruptors.

0

fafalone t1_j1t884o wrote

Given how people like to define "unbiased," this is going to wind up needing to be explicitly targeted at enforcing equity... because thanks to centuries of deliberate oppression, we simply don't have a country where there are no actual, empirical differences between demographics. And the people who pass laws like this believe that's solved by pretending those differences don't exist and enforcing equal outcomes, e.g. by simply adding or subtracting points from scores to make all groups come out equal.

Because that's something that can be done now. Why spend decades doing the actual hard work of building an equitable society when you can prove how virtuous you are right now by simply rigging the numbers?

−1

supermechace t1_j1so0rp wrote

In all honesty, if this resume-screening software is the typical rush-to-market product built at the cheapest possible cost, the "AI" is probably some hack job duct-taped together from googled code, APIs, and Stack Exchange posts. Even if there was a data scientist on the project, the programmer probably gave up on understanding the requirements in order to finish the code on time, leaving the resume/interview screener as basically a glorified keyword-scoring filter. If the bill allows the source code to be audited, it will be easy to spot inherent keyword bias like demographics or colleges. I haven't heard of interviews being recorded to run through software, but it would also be easy to spot that the programmer took shortcuts, such as training the model on the same demographic over and over again just to get through QA. QA is usually the lowest on the totem pole. Look at the lack of regulation around social media and data privacy; the current laws in America are already behind.
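A caricature of what that kind of "screener" often amounts to (keywords, weights, and cutoff invented purely for illustration):

```python
# Caricature of the kind of "AI" screener described above: a keyword scorer
# dressed up as a model. Keywords, weights, and the cutoff are invented.
KEYWORD_WEIGHTS = {
    "python": 3, "sql": 2, "kubernetes": 2,
    "ivy league": 5,          # an auditor reading this source would flag
    "stanford": 5,            # school names as obvious proxies for demographics
}

def score_resume(text: str) -> int:
    text = text.lower()
    return sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in text)

def passes_screen(text: str, cutoff: int = 5) -> bool:
    return score_resume(text) >= cutoff

print(passes_screen("Python and SQL developer, City College"))   # True (score 5)
print(passes_screen("Kubernetes admin, community college"))      # False (score 2)
```

An auditor reading that source would flag the school keywords immediately, which is the point: the bias lives in plain sight.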

3

DifficultyNext7666 t1_j1tlo7e wrote

It shouldn't be that hard. It's an imbalanced class problem. Either adjust the weighting or oversample.
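A minimal sketch of both options with scikit-learn, on synthetic data:

```python
# Minimal sketch of the two standard fixes: class weighting vs. oversampling.
# Data is synthetic; nothing here is from any real screening system.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (900, 4)),    # majority class
               rng.normal(1, 1, (100, 4))])   # minority class
y = np.array([0] * 900 + [1] * 100)

# Option 1: reweight the loss so minority examples count proportionally more
weighted = LogisticRegression(class_weight="balanced").fit(X, y)

# Option 2: resample the minority class up to parity before fitting
X_min_up, y_min_up = resample(X[900:], y[900:], n_samples=900, random_state=0)
oversampled = LogisticRegression().fit(np.vstack([X[:900], X_min_up]),
                                       np.concatenate([y[:900], y_min_up]))

print("minority recall, weighted:   ", (weighted.predict(X[900:]) == 1).mean())
print("minority recall, oversampled:", (oversampled.predict(X[900:]) == 1).mean())
```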

The issue is the system will do a worse job. Is that trade-off worth it? I think the powers that be would say yes.

The bigger issue, though, is how these idiots would enforce it. We'd have to open up code and training data to some third-party bias police.

1