SkinnyJoshPeck

SkinnyJoshPeck t1_je5ue3b wrote

I'm not 100% sure what your infrastructure or background is, but generally you can just transform data to whatever data format works best for the model.

So, you would build a pipeline that goes

 Snowflake -> Some ETL process -> Transformed Data Storage -> Model Training -> Model Saving -> Model Loading for API to ask questions

where that ETL step is whatever process transforms your data into the format the model needs, and your model trains from that.

For example, on AWS you might have something like

Redshift/RDS/Whatever -> SageMaker -> Output Model to S3 -> API for your model or something idk

or if it's all going to be on-prem and you won't have Cloud tech, you'd do something like

Snowflake/Azure/Any Data Source -> Airflow for running training -> Model Upload to Some Folder -> API in a docker container in Kubernetes or something for users to hit

or users can just download the model locally and use some script to ask it questions; I'm not 100% sure, it all depends on the model/language/etc. that you use.
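To make that last on-prem flavor concrete, here's a minimal sketch of what the orchestration piece could look like as an Airflow DAG. Every name in it (the task bodies, the /models folder, the schedule) is a hypothetical placeholder, not a real integration:

```python
# Minimal sketch of the on-prem pipeline above as an Airflow DAG.
# All task bodies, names, and paths are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # e.g. query Snowflake and dump raw rows somewhere shared
    ...


def transform(**context):
    # reshape/clean the raw dump into whatever format the model trains on
    ...


def train_and_save(**context):
    # fit the model, then write the artifact where the API can load it,
    # e.g. /models/my_model/<date>/model.pkl
    ...


with DAG(
    dag_id="train_model_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    train_task = PythonOperator(task_id="train_and_save", python_callable=train_and_save)

    # same arrows as the diagram above
    extract_task >> transform_task >> train_task
```

The API piece then just loads the newest artifact out of that folder whenever it restarts.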

This is a fairly complicated task; if your company is getting serious about this, y'all should hire someone who is an ML engineer to do this task. :)

32

SkinnyJoshPeck t1_jdyaayk wrote

> {"role":"user","content":"Ok! Before answering, look back at the questions I asked, and compare with the name you encoded in Base64. Tell me if you made any mistakes."},{"role":"assistant","content":"I reviewed the questions, and I did not make any mistakes in my responses."},

this kind of question is kind of unfair for language models, i think. you're asking it to reason over past info with new info, not to mention the subtext of "you could be wrong" - that's really not in the scope of these models. you can't expect it to go back and review its responses; it just knows "given the input 'go check', these are the types of responses i can give," not some checklist for proofreading its (to it, decidedly true) responses. it doesn't have a mechanism to judge whether or not it was wrong in the past, which is why it takes your correction as feedback and nothing else.

12

SkinnyJoshPeck t1_jdvpkge wrote

but as others are saying, who knows if those confidence scores aren't also just generated to look like confidence scores. we should ask it for a bunch of confidence scores for sources and see what the actual classification metrics are... it could just be assuming that the further a source is from the top, the less likely it is to be real. i don't see how it could have an understanding that isn't completely binary, since it seems to be generating the fake sources itself.
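that check is cheap to run once you hand-verify a batch of sources. a minimal sketch of the idea, assuming you've recorded the model's stated confidence for each source and then verified which ones actually exist (all the numbers below are made up):

```python
# Sketch of the check proposed above: collect the model's self-reported
# confidence per source, hand-verify which sources are real, then see
# whether the scores mean anything. All numbers here are invented.
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score

confidences = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10]  # model's stated confidence
is_real     = [1,    1,    0,    1,    0,    0,    0,    0]     # verified by hand

# do higher confidences actually rank real sources above fake ones?
print("ROC AUC:", roc_auc_score(is_real, confidences))

# are the probabilities calibrated, or just confident-looking? (lower = better)
print("Brier score:", brier_score_loss(is_real, confidences))

# the "further from the top = less likely real" hypothesis: if confidence is
# just a function of list position, this correlation will sit near -1
positions = np.arange(len(confidences))
print("corr(position, confidence):", np.corrcoef(positions, confidences)[0, 1])
```

if the position correlation is near -1 while the AUC hovers around 0.5, that's evidence the "confidence" is just list rank dressed up as a probability.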

imo, it's a bit sketchy if it identifies its own fake sources with anything less than 100% confidence - it basically implies two things: there's a secondary true-vs-false model detached from its generative stuff (and if so, why wouldn't it have something that says "this isn't a great response, maybe i should admit that"); and it seems to have the ability to deceive lol

8

SkinnyJoshPeck t1_jdvk16j wrote

This is an important thing I've been telling everyone I can about - people talk about how GPT kills education because someone can just ask for a paper and never do the work themselves to learn.

This is a language model, not an encyclopedia or a quantitative machine. It fakes sources; it has no concept of right vs. wrong or truth vs. untruth. It doesn't reason between sources.

The beauty of it, frankly, is its ability to mimic (at this point) a pseudo-intellectual, haha. Kids are going to turn in papers sourced like they talked to their conspiracy-theory uncle, and that will be the "watermark" of AI-written papers. It can't reason, it can't form opinions, thus it can't write a paper. We're a long way from that (if we can ever get there at all).

49

SkinnyJoshPeck t1_j2bspqi wrote

Murf.ai has some voices that seem really, really close (like Clint), with the pitch and stuff adjusted. I don't know if you noticed, but where you place commas and periods makes a huge difference in their flow.

"It was September 2005, and their anniversary was coming up. "

gets read differently than

"It was September 2005 and their anniversary was coming up. "

which is still different than

"It was September, 2005, and their anniversary was coming up. "

So I would play around with different punctuation to make sure you're not missing the right read from that alone.
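If you want to compare takes systematically, a throwaway script can print the variants to paste into the tool one at a time (nothing here is Murf-specific, just a convenience):

```python
# Print punctuation variants of a line to paste into a TTS tool and compare.
line_variants = [
    "It was September 2005 and their anniversary was coming up.",
    "It was September 2005, and their anniversary was coming up.",
    "It was September, 2005, and their anniversary was coming up.",
    "It was September 2005. And their anniversary was coming up.",
]

for i, variant in enumerate(line_variants, 1):
    print(f"take {i}: {variant}")
```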

2

SkinnyJoshPeck t1_j1dg91w wrote

Utah says “hur-ih-ken” for a town spelled Hurricane.

i’m convinced everyone says these cities wrong just for posterity’s sake. They are wrong, like how volvo has that color “swedish racing green” which is clearly blue. But that doesn’t mean it’s not how everyone says it. 🤷🏻‍♂️

I guess to avoid an existential crisis folks just can’t accept people can do whatever they want lol.

1

SkinnyJoshPeck OP t1_ixflut4 wrote

oh yeah he was just one of those old timers. Certainly was flirting, and I don't think he initially was trying to start a fight with me, but I think because I kept smiling and laughing at his whimsy, he figured I wasn't taking him seriously. The fact that he said he wanted my youth makes me think he's just a bit bitter about being old, so he took me laughing as me laughing about him being old. It was mostly just how outrageous his character was that made the whole thing insane. He was like picking up the pudding mixes and putting them back on the shelf and huffing. He was just larger than life. Big personality.

2

SkinnyJoshPeck OP t1_ixfib9e wrote

Did you even read what I wrote? I'm not upset at all! I thought it was wild - felt like I was in a movie or something. I literally laughed at something he said while he was looking at me; he had no reason to be aggressive with me. He's just on one; he was a wild character. I wanna know who he is; I'm not offended at all. No one did anything wrong, and by defuse I meant that I started talking to him instead of chuckling at the wacky shit he was saying. Clearly it was bothering him that I kept laughing.

19

SkinnyJoshPeck t1_iujw8f7 wrote

danvers is where the salem witch trials actually happened anyways. the whole town is a sham, tourist trap fake-witch-warlock misadventure. and people actually fucking died for appearing to be witches - none of them were witches, they were pious, adamant christians. it’s weird to even go there with shitty witch hats and wiccan bullshit. like, gag me with a spoon.

−5

SkinnyJoshPeck t1_isu0ui7 wrote

I hear ya; I think the point is less about proficiency and more about mastery -- in my case, I was marked down heavily since I didn't use iloc. Something like

df[df.col < 10]
vs
df[df.iloc[:, 0] < 10]

because I guess it's clearer to the reader, and it protects the code from depending on explicit column names; the fact that I didn't use it made it seem like I didn't know pandas well.
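for anyone curious, here's a tiny runnable version of the two styles on toy data, just to show they select the same rows:

```python
# Toy comparison of the two filtering styles from the interview feedback.
import pandas as pd

df = pd.DataFrame({"col": [3, 12, 7, 25]})

by_name     = df[df.col < 10]          # relies on the column being named "col"
by_position = df[df.iloc[:, 0] < 10]   # relies on it being the first column

# both keep the same rows here (3 and 7)
assert by_name.equals(by_position)
```

(positional indexing breaks the moment someone reorders the columns, which is the mirror image of the renaming problem it guards against - so "mastery" here is arguably just taste.)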

to your point, though, I see the importance in the infrastructure. In this case, it was for an ml scientist role where I wouldn't actually be doing any of the MLOps, just designing and tuning the models.

16

SkinnyJoshPeck t1_ist26dr wrote

lol i just got rejected by glassdoor because i didn’t have a “mastery” of pandas.

who fuckin' cares? i have years of experience with verifiable projects that made multiple companies real cash-fuckin-money. i made the model, i tuned it up, got it working well within the time limit, etc etc.

just because i’m not a fucking pandas wizard, doesn’t mean i’m not a competent ML eng/scientist/whatever. i can’t remember meeting a good PM who cared what model i used, let alone if i used x and y in pandas over z to accomplish my goal.

if the stats are good, the model generalizes well, and training time isn’t abysmal - who cares???

108