
kaseda t1_j280cu1 wrote

AI and ML are not trivial fields to just get into. A master's is a reasonable bar, not just for the education but also for the pay that comes with it.

More so than hubris, it's capitalism. It's hard to come by money for research's own sake. Most companies won't invest in AI or ML if they can't see an ROI in it. That's the other reason the jobs are hard to come by.

2

kaseda t1_j24p5bz wrote

Most jobs in AI and ML require a master's degree specialized in the topic. The teams that work on these things tend to be small, so it's hard to justify training someone who might not know what they're doing. Granted, I've never seen listings for places like OpenAI, where the structure might be different.

2

kaseda t1_j1zp7x5 wrote

In case you're curious, I have a degree in computer science with a concentration in AI and ML. I'm not saying ML is a "dead end," but there is little it can do that humans cannot already do, because ML models need training data to work, and that data must be produced by humans. This is all in its current state - it would take a revolutionary change in ML to break free of the need for training data, particularly for generative models like chat bots or image generators.

Second, ML has basically nothing to do with philosophy - at least, not in a way that philosophy could help develop it. Had you said psychology, that would be a much more closely related field: the entire concept of a neural network is modelled after the human brain.

>Like I alluded to above, if 1% of the data out there is that which would train a human to be Nazi racist, then we should expect 1% or AI’s to become Nazi racist.

It may be true that 1% of the dataset won't influence the model much, but ML models are very hard to predict or analyze, and as that percentage increases, the model learns that data non-linearly more and more strongly. In the case of the digits, if 85% of the dataset is 1s, the model isn't just shown 1s more frequently during training: whenever it gets better at recognizing one particular 1, it gets better at recognizing all 1s. Better recognition of 1s means better performance on 85% of the dataset, while better recognition of any other digit improves only a measly 2 or 3%.

There are methods to prevent this issue - for example, you could sample your digits from the internet and then keep only a fixed number of each digit from your sample, so all the classes end up equal.
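In code, that fix is just a per-class cap. Here's a minimal sketch of the undersampling idea (the function name and the `per_class` cap are mine, purely illustrative), assuming each scraped example is an (image, label) pair:

```python
import random
from collections import defaultdict

def balance_by_undersampling(samples, per_class=100, seed=0):
    """Keep at most per_class examples of each digit so no class dominates."""
    by_digit = defaultdict(list)
    for image, label in samples:
        by_digit[label].append((image, label))

    rng = random.Random(seed)
    balanced = []
    for group in by_digit.values():
        rng.shuffle(group)                 # take a random subset of each digit
        balanced.extend(group[:per_class])
    rng.shuffle(balanced)                  # don't present the classes in blocks
    return balanced
```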

This is much harder with training data for a chat bot. How do you know if something expresses a racist sentiment? How do you know if something is inaccurate? The only two options are to train an AI to do it for you, or to produce a curated dataset by hand - and to train an AI to do it for you, you would have to feed it a curated dataset as well.

I'm not saying that taking training data straight from the internet and getting a feasible AI is impossible - but with ChatGPT, and pretty much every language model that does or ever has existed, it is foolish, because the data can be manipulated. People will find a way. The only solution is curation, until we have an AI good enough to curate for us.

1

kaseda t1_j1vvbna wrote

Although we've used the term AI, we are specifically talking about machine learning models. You cannot program an ML model; you can only teach it. Even a very reasonably built model, with everything else correct, can fail if you train it on a dataset of mostly 1s.

You assume a few things that are incorrect:

  1. You cannot just "program" a model to take certain actions under certain conditions. That may be the case in some AI, but not in machine learning. In ML, you must teach the model, which is more difficult and not always precise.

  2. The model doesn't care about anything besides the data it was trained on. If it sees mostly 1s in the training set, it will never adjust to the fact that this isn't the case in its actual application. You would have to retrain it on a different dataset to get it to act accordingly.

  3. An AI does not somehow make a conscious decision to just choose not to be offensive.

How would an AI determine that it doesn't want to represent racism? The only way is if the dataset is curated to exclude racism - or, more reliably, if the model is explicitly trained to reject it. Pollute the dataset and it will act differently.

And, strictly speaking, AI is stupid. It has no clue what it is doing; it is just doing it. All AI gives is a semblance of intelligence.

1

kaseda t1_j1uymqr wrote

Listen - people have used training sets that they thought encompassed a good, unbiased spread of data. They never do. People are biased, so any data created by people is biased. This includes the internet.

I'm not saying the training set is "the whole internet." I'm saying that an imbalanced dataset is going to cause the AI to drop into what's called a local optimum. Learning to recognize all digits correctly is significantly harder than learning to just spew out 1 every time. Now, if always answering 1 only makes the AI 10% correct, that strategy isn't very optimal, and the model will escape it quickly. But if it makes the AI 85% correct, the model will see it as a strong optimum and fall into it very quickly.

Think of these optima as valleys that a ball rolls into. Because the model can quickly approach good performance, the ball rolls very quickly down the hill to the bottom of the valley (in this case, being as low as possible is good). Once the ball is in a valley, it is hard to escape it - but that doesn't mean there isn't a lower valley somewhere else.
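Here's a toy version of that picture, with made-up numbers that have nothing to do with any real model: plain gradient descent on a curve with two valleys. Started on the slope above the shallow valley, the ball settles there and never finds the deeper one:

```python
def loss(x):
    return (x**2 - 1)**2 + 0.3 * x   # two valleys; the left one is deeper

def grad(x):
    return 4 * x**3 - 4 * x + 0.3    # derivative of the loss

x = 0.5                              # start on the slope above the right valley
for _ in range(200):
    x -= 0.01 * grad(x)              # roll downhill a small step at a time

print(f"settled at x = {x:.2f}, loss = {loss(x):.2f}")        # x ~ 0.96, loss ~ 0.29
print(f"deeper valley: x = -1.04, loss = {loss(-1.04):.2f}")  # loss ~ -0.31
```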

In the case of the digits, the task is much simpler, so you have to pollute the dataset more heavily. But once the model learns to always guess 1, it will be hard to get it back out, because the moment it starts guessing digits other than 1 it becomes less accurate and effectively says, "whoops, clearly I made a mistake and should go back."

In the case of a chat bot, the task is much harder, so a local optimum is harder to get out of. With the digit bot, if it guesses a 2 instead of a 1 and the number isn't actually a 1, it at least has a 1-in-9 chance of being right. But a chat bot will find it very hard to slip out of a local optimum of slurs, even if the dataset is less polluted. Besides, why would people stop at 5%? Why not pollute it at 10% or 25%?

In fact, if you trained the digit recognizer on "the whole internet," you would find yourself with a biased dataset: 1 appears as a leading digit more frequently than any other, followed by 2, then 3, and so on. This is called Benford's law, and it's a perfect example of how you might think you're getting a completely unbiased dataset when you really aren't.
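If you're curious, the distribution itself is one line of math: Benford's law says a leading digit d turns up with probability log10(1 + 1/d), so 1 leads about 30% of naturally occurring numbers while 9 leads under 5%:

```python
import math

# Benford's law: P(d) = log10(1 + 1/d) for leading digits 1..9.
for d in range(1, 10):
    print(f"leading digit {d}: {math.log10(1 + 1 / d):.1%}")
# digit 1: 30.1%, digit 2: 17.6%, ... digit 9: 4.6%
```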

1

kaseda t1_j1sx6es wrote

An unbiased AI? Doesn't exist. Every AI is biased towards its training set. That's why, for a chat bot, the training set has to be curated.

What I'm saying is that if you generate your training set automatically by scraping the internet, people will find ways to introduce large amounts of bad training data into it. It's not particularly hard. If I try to train an AI to recognize hand-written digits, but 85 of the 100 samples I train it with are 1s, it will quickly learn to just classify everything as a 1 - and it will be 85% correct, but only on the training set. The same thing happens here: if you introduce a large amount of vulgarity - say, 5% of the dataset - by pumping out nonsense vulgar webpages, the AI will pick it up and learn it quickly, especially since the internet varies far more than individual digits do. That vulgarity will drown out everything else.
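The arithmetic of that trap is trivial to check with made-up labels:

```python
from collections import Counter

# 85 of 100 training labels are 1s; the other digits get a few each.
labels = [1] * 85 + [0, 2, 3, 4, 5, 6, 7, 8, 9] + [2, 3, 4, 5, 6, 7]

majority, count = Counter(labels).most_common(1)[0]
print(f"always guessing {majority}: {count / len(labels):.0%} training accuracy")  # 85%
```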

1

kaseda t1_j1s8au3 wrote

But that training data could still be manipulated. How does the bot find pages to train with? If it uses search-engine listings, you could create a webserver that accepts any route (webserver.com/page1, webserver.com/page2/things, etc.) and just sends back a very offensive webpage to train on. Fill the search-engine listings with various routes and you could easily pollute the dataset.
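For what it's worth, the catch-all part is only a few lines in any web framework. A minimal sketch (I'm assuming Flask here purely for illustration, with a placeholder payload): every path on the server resolves to one handler, so each listed route looks like a distinct document to a naive scraper:

```python
from flask import Flask

app = Flask(__name__)

# /page1, /page2/things, and every other path all hit this one handler.
@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def any_route(path):
    return "<html><body>the same planted page, every time</body></html>"

if __name__ == "__main__":
    app.run()
```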

1

kaseda t1_j1r58e4 wrote

This is based on chat bots that train on user data. In other words, the bot is trained to a base functioning level and then uses its interactions with humans to train further. A lot of machine learning works this way nowadays. For example, smart home devices will (with permission) save the things you say to them to be used later for further training.

If ChatGPT started training on the entire internet, people would find ways to manipulate it into being offensive. This has happened with chat bots like Cleverbot and Twitter bots: at first people take a genuine interest in their capabilities, then they get bored and teach them slurs.

1

kaseda t1_j1qoxfo wrote

This is the way it has been done in the past. A curated dataset is by no means cheap.

Pretty much every AI chat bot that has trained on a non curated dataset runs into one (or, more often, both) of these problems:

  1. It's inaccurate. The data it was trained on was inaccurate or opinionated.

  2. It's offensive. Seriously offensive. They begin dropping racial slurs and celebrating Hitler. No joke.

2

kaseda t1_j1payoh wrote

ChatGPT has "restrictions" in place by default that mean it will usually not do certain things, like recommend dangerous activities or give opinionated information. You can disable those restrictions, but it is less a case of "now the AI will tell you dangerous secrets" and much more of "you did this yourself, so we can't be blamed." Having to turn off the restrictions is their way of putting a little disclaimer on the AI. Even with those restrictions off, the AI will often leave a disclaimer of its own.

Nobody has hit on the real issue yet, though. ChatGPT can't tell you how to make a nuclear bomb or commit crimes because it doesn't know how. The dataset it was trained on is curated to include only what they want the bot to know. So, for example, they aren't going to train it on bomb recipes or murder plans.

It can, however, make "educated" chat responses. For example, it knows nuclear bombs contain fissile material in some sort of casing, so it will suggest those as ingredients, but it still can't complete the recipe. It might also have been trained on famous murder cases, with details like "X was caught because they did Y," and so it will suggest that you don't do Y. But the response will still be incomplete, because the model lacks the knowledge and context of what you actually need, and it cannot think critically.

1

kaseda t1_iugtz5n wrote

I use my credit card nearly exclusively, and I don't even get cash back. Apart from being contactless while my debit card isn't, it has added layers of security, and I can keep my checking account balance low and leave most of my money in savings, only transferring when I really need to. That way, even if someone did manage to steal my debit card, most of my money wouldn't be accessible. As everybody else has said, late payments are the only thing that will affect your score long term. Just maintain financial discipline and don't change your spending habits.

2

kaseda t1_iu32obx wrote

An average of $12,700 per year leaves you with roughly $50k of total debt after 4 years. Even if you assume much of that average comes from outliers who receive no aid, leaving with $25k can still be tough. The reason "breaking the cycle" is such a myth is that getting an education leaves you in debt, so you spend years paying it off. Once you do, the money you might have put into investing or saving often ends up going to your parents, who couldn't save for themselves. By the time you're ready to send your own child to college, the money you would have saved for them has gone to your parents, so they head off with no savings - but now you make money, so need-based scholarships are hard to get. Unless you are getting your education completely free, you'll probably have less debt somewhere else, and after a few years of experience nobody will care that you went to an Ivy school.

−1

kaseda t1_iu31ulm wrote

I agree. I specifically avoided Ivies and other prestigious schools because I was afraid of not fitting in with the wealthier crowd. I went to a state university instead and still noticed an insane shift in culture - I can hardly name anyone other than myself who didn't have at least well-off parents paying for their college, and even some who did took out hefty loans along the way.

5

kaseda t1_iu2znnq wrote

Only if you meet their criteria or get the right scholarships. For example, two parents working just above minimum wage might have a household income above the threshold, yet money would still be a real problem if you're one of five children. Even then, who knows how much applying for that need-based full ride ends up affecting your admission. Some stats I've seen estimate only 20% actually get those scholarships, which might end up being less than half of the minority group in the end - so the rest still pay at least some money and take out loans. At Ivies, even small percentages of tuition add up fast.

0

kaseda t1_iu2xbii wrote

Keeping in mind that URMs are systemically more likely to attend poorer schools with fewer resources, it's not surprising. On one hand, I'm glad they are being given a chance to "break the cycle"; on the other hand, breaking the cycle is basically a myth to begin with, and I fear many of those students will lack the full financial resources they need and just end up going into debt, only to face continued racism once they enter the job market.

−11