ReginaldIII t1_jb9goco wrote

Link to your code? It needs to be GPLv3 to be compliant with LLaMA's licensing.

How are you finding the quality of the output? I've had a little play around with the model but wasn't overly impressed. That said, a big parameter set like this makes a nice test bed for looking at things like pruning methods.

−4

ReginaldIII t1_j8e9sc3 wrote

It's been going downhill for a lot longer than that, and it's not something that can be solved with better moderation.

The people who are engaging with the sub at ever higher frequencies simply do not know anything substantive about this field.

How many times will we have people asininely arguing about a model's "rights", or about how "they" (the model) have "learned just like a person does", when the discussion should really have been about data licensing law, intellectual property, and research ethics?

People just don't understand what it is that we actually do anymore.

25

ReginaldIII t1_j6ybiju wrote

This isn't being used for autocomplete or any user-facing text generation purposes though.

They're using it to summarize and make todo lists from the Whisper-extracted transcripts of video meetings. Users aren't getting a frontend to run arbitrary stuff through the model. Seems like a pretty legitimate use case.
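
Roughly this shape of pipeline, sketched below. The Whisper calls are the real openai-whisper API, but `summarize` is just a hypothetical stand-in for whatever LLM call their product actually makes:

```python
import whisper  # openai-whisper

def meeting_todos(audio_path, summarize):
    """Transcribe a recorded meeting, then ask an LLM for a summary
    and todo list. No user-facing free-form generation involved."""
    model = whisper.load_model("base")
    transcript = model.transcribe(audio_path)["text"]
    prompt = "Summarize this meeting and extract a todo list:\n" + transcript
    return summarize(prompt)  # hypothetical stand-in for the LLM call
```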

20

ReginaldIII t1_j61nlno wrote

Trying to force these things into a pure hierarchy sounds nothing short of an exercise in pedantry.

And to what end? You make up your own distinctions that no one else agrees with and you lose your ability to communicate ideas to people because you're talking a different language to them.

If you are so caught up on the "is a" part: have you studied any programming languages that support multiple inheritance?
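
For instance, a minimal Python sketch of why a pure tree breaks down (class names made up for illustration):

```python
# One class can legitimately "be" several things at once, which a
# pure single-inheritance hierarchy cannot express.
class Serializable:
    def to_bytes(self):
        return repr(self.__dict__).encode()

class Comparable:
    def __lt__(self, other):
        return self.key() < other.key()

class Record(Serializable, Comparable):  # "is a" both, simultaneously
    def __init__(self, key):
        self._key = key

    def key(self):
        return self._key

a, b = Record(1), Record(2)
print(a < b, a.to_bytes())  # True b"{'_key': 1}"
```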

2

ReginaldIII t1_j5zzhj1 wrote

Pick the tools that work for the problems you have. If you are online training a model on an embedded device, you need something optimized for that hardware.

I gave you a generic example of a problem domain where this applies. You can search for online training on embedded devices if you are interested, but I can't talk about specific applications because they are not public.
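
Purely as an illustration of the setting (no real application implied), online training just means updating the weights one sample at a time as data streams in; `sensor_stream` below is a hypothetical on-device data source:

```python
import numpy as np

w = np.zeros(4)   # model weights living on the device
lr = 0.01

def online_step(w, x, y):
    """One streaming least-squares SGD step. Nothing is batched or
    stored, so memory stays constant however long the device runs."""
    err = w @ x - y
    return w - lr * err * x

# for x, y in sensor_stream():  # hypothetical on-device data source
#     w = online_step(w, x, y)
```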

All I'm saying is that drawing a line in the sand and saying you'd never use X if it doesn't have Y is silly. What if you end up working on something in the future where the constraints are different?

5

ReginaldIII t1_j5zv9gz wrote

That's such a tenuous distinction, and you're wrong anyway, because you can pose any learning-from-data problem as a generic optimization problem.

They're very useful when your loss function is not differentiable but you still want to fit a model to input+output data pairs.

They're also useful when your model parameters have domain-specific meaning and you can derive rules for how two parameter sets can be meaningfully combined with one another.
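
A minimal sketch of what I mean: a toy genetic algorithm fitting a line under a non-differentiable (counting) loss, with crossover defined as a domain-meaningful averaging of parameter sets. The problem and all the constants are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(params, xs, ys):
    """Non-differentiable loss: how many predictions are off by > 0.5."""
    preds = params[0] * xs + params[1]
    return np.sum(np.abs(preds - ys) > 0.5)

def crossover(a, b):
    """Domain-meaningful combination rule: average two parameter sets."""
    return (a + b) / 2

xs = np.linspace(0, 10, 50)
ys = 2.0 * xs + 1.0                      # ground truth: w=2, b=1

pop = rng.normal(0, 3, size=(32, 2))     # population of (w, b) candidates
for _ in range(200):
    fitness = np.array([loss(p, xs, ys) for p in pop])
    parents = pop[np.argsort(fitness)[:8]]            # keep the 8 fittest
    pop = np.array([crossover(parents[rng.integers(8)],
                              parents[rng.integers(8)])
                    + rng.normal(0, 0.1, size=2)      # mutation
                    for _ in range(32)])

best = pop[np.argmin([loss(p, xs, ys) for p in pop])]
print(best)  # should land near [2.0, 1.0]
```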

Decision trees and random forests are ML too. What you probably mean is Deep Learning. But even that has a fuzzy boundary with surrounding methods.

Being a prescriptivist with these definitions is a waste of time because the research community as a whole cannot draw clear lines in the sand.

10

ReginaldIII t1_j3epizn wrote

No need to downvote; it was an honest question, not an attack. Have you studied the literature and background mathematics of this area much?

"Regime" is a well-established term in mathematics and many other fields; one example of a regime (a domain under rules or constraints) is what you are likely familiar with as a political regime.

With respect to "punchline", I'm going to assume you didn't look at the video at the timestamp listed? Here it is (https://youtu.be/1aXOXHA7Jcw?t=6105). All he is saying is that, after a few-minutes-long tangent, the "punchline" is him circling back around to the point he was trying to make.

It isn't a literal haha punchline, and it's not a mathematical term. The punchline comes at the end of a joke, and a joke often takes you on a journey before circling back to some kind of point. He used the word to mean that here too.

Timothy Nguyen, OP of this post and the host of the video, made a light-hearted chapter title within a long video, based on a term Greg Yang used on his whiteboard.

18

ReginaldIII t1_j33ff9r wrote

Except there is an ecosystem monopoly at the cluster level too, because some of the most established, scalable, and reliable software (such as the packages used in fields like bioinformatics) only provides CUDA implementations of key algorithms, and being able to accurately reproduce results computed by them is vital.

This essentially limits that software to running only on large CUDA clusters. You can't reproduce the results without the scale of a cluster.

Consider software for processing cryo-electron microscopy and ptychography data. Very, very few people actually develop those software packages, but thousands of researchers around the world use them at scale to process their micrographs. Those microscopists are not programmers, or really even cluster experts; they just don't have the skillsets to develop on these code bases. They just need it to work reliably and reproducibly.

I've been working in HPC on a range of large-scale clusters for a long time, and there has been a massive demographic shift in the skillsets our cluster users have. A decade ago you wouldn't dream of letting someone who wasn't an HPC expert anywhere near your cluster; if a team of non-HPC people needed HPC, you'd hire HPC experts into the team to handle it, tune the workloads onto the cluster, and develop the code to make it work well. Now we have an environment where non-HPC people can pay for access and run their workloads directly, because they leverage these pre-tinned software packages.

9

ReginaldIII t1_j0i67uc wrote

Linear regression / logistic regression is all just curve fitting.

> A picture is just a number, but in higher dimensions.

Yes... It literally is. A 10x10 RGB 24bpp image is just a point in a 300-dimensional hypercube (100 pixels x 3 channels), with each axis bounded 0-255 in 256 discrete steps. At each of the 10x10 spatial locations there are 256^3 == 2^24 possible colours, meaning there are (256^3)^100 possible images in that entire domain. Any one image you can come up with or randomly generate is a unique point in that space.
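
To make that concrete, here is the same arithmetic in numpy:

```python
import numpy as np

img = np.random.randint(0, 256, size=(10, 10, 3), dtype=np.uint8)
point = img.reshape(-1)        # one coordinate per channel value
assert point.size == 300       # 100 pixels x 3 channels

# Each of the 100 pixels takes one of 256**3 == 2**24 colours, so the
# domain holds (256**3)**100 distinct images.
n_images = (256 ** 3) ** 100
print(len(str(n_images)))      # 723 digits
```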

I'm not sure what you are trying to argue...

When a GAN is trained to map points on some input manifold (a 512-dimensional unit hypersphere) to points on some output manifold (natural-looking images of cats embedded within the 256x256x3-dimensional space, bounded 0-255 and discretized into 256 distinct intensity values), then yes: the GAN has learned a projection from one high-dimensional manifold to points on another.

It is quite literally just a deterministic function from one space to the other.
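
Schematically, something like the sketch below: a stand-in one-layer "generator", shrunk to 32x32 so it stays small. A trained GAN just learns a far better version of this same shape of map:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(dim=512):
    """A point on the 512-dimensional unit hypersphere."""
    z = rng.standard_normal(dim)
    return z / np.linalg.norm(z)

def generator(z, W):
    """Deterministic map from latent space to image space."""
    x = np.tanh(W @ z)                          # values in (-1, 1)
    return ((x + 1) / 2 * 255).astype(np.uint8).reshape(32, 32, 3)

W = rng.standard_normal((32 * 32 * 3, 512)) * 0.05
img = generator(sample_latent(), W)  # the same z always yields the same img
```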

1

ReginaldIII t1_j0gdsis wrote

This tool is such an unbelievably bad idea.

It really upsets me when I see people using unrestrained models to do what only a safety-critical system should do.

With no clinical study or oversight. No ethics review before work on the project can start. No consideration for the collateral damage that can be caused.

Really really unethical behaviour.

If someone hooked up a bare CNN trained via RL to a real car and put it on the roads, everyone would rightfully be screaming that OP is an unethical fool for endangering the public. But somehow people think it's okay to screw around with medical data... The mind boggles.

0

ReginaldIII t1_j0cxciw wrote

That we have the ability to project concepts into the scaffold of other concepts? Imagine a puppy wearing a sailor hat. Yup, we definitely can do that.

f(x) = 2x

I can put x=1 in, I can put x=2 in, but if I don't put anything in then it just exists as a mathematical construct; it doesn't sit there pondering its own existence or the nature of what x even is. "I mean, why 2x?!"

If I write an equation c(Φ, ω) = (Φ ω Φ), do you zoomorphise it because it looks like a cat?

What about this function, which plots out Simba? Is it aware of how cute it is?

x(t) = ((-1/12 sin(3/2 - 49 t) - 1/4 sin(19/13 - 44 t) - 1/7 sin(37/25 - 39 t) - 3/10 sin(20/13 - 32 t) - 5/16 sin(23/15 - 27 t) - 1/7 sin(11/7 - 25 t) - 7/4 sin(14/9 - 18 t) - 5/3 sin(14/9 - 6 t) - 31/10 sin(11/7 - 3 t) - 39/4 sin(11/7 - t) + 6/5 sin(2 t + 47/10) + 34/11 sin(4 t + 19/12) + 83/10 sin(5 t + 19/12) + 13/3 sin(7 t + 19/12) + 94/13 sin(8 t + 8/5) + 19/8 sin(9 t + 19/12) + 9/10 sin(10 t + 61/13) + 13/6 sin(11 t + 13/8) + 23/9 sin(12 t + 33/7) + 2/9 sin(13 t + 37/8) + 4/9 sin(14 t + 19/11) + 37/16 sin(15 t + 8/5) + 7/9 sin(16 t + 5/3) + 2/11 sin(17 t + 47/10) + 3/4 sin(19 t + 5/3) + 1/20 sin(20 t + 24/11) + 11/10 sin(21 t + 21/13) + 1/5 sin(22 t + 22/13) + 2/11 sin(23 t + 11/7) + 3/11 sin(24 t + 22/13) + 1/9 sin(26 t + 17/9) + 1/63 sin(28 t + 43/13) + 3/10 sin(29 t + 23/14) + 1/45 sin(30 t + 45/23) + 1/7 sin(31 t + 5/3) + 3/7 sin(33 t + 5/3) + 1/23 sin(34 t + 9/2) + 1/6 sin(35 t + 8/5) + 1/7 sin(36 t + 7/4) + 1/10 sin(37 t + 8/5) + 1/6 sin(38 t + 16/9) + 1/28 sin(40 t + 4) + 1/41 sin(41 t + 31/7) + 1/37 sin(42 t + 25/6) + 3/14 sin(43 t + 12/7) + 2/7 sin(45 t + 22/13) + 1/9 sin(46 t + 17/10) + 1/26 sin(47 t + 12/7) + 1/23 sin(48 t + 58/13) - 55/4) θ(111 π - t) θ(t - 107 π) + (-1/5 sin(25/17 - 43 t) - 1/42 sin(1/38 - 41 t) - 1/9 sin(17/11 - 37 t) - 1/5 sin(4/3 - 25 t) - 10/9 sin(17/11 - 19 t) - 1/6 sin(20/19 - 17 t) - 161/17 sin(14/9 - 2 t) + 34/9 sin(t + 11/7) + 78/7 sin(3 t + 8/5) + 494/11 sin(4 t + 33/7) + 15/4 sin(5 t + 51/11) + 9/4 sin(6 t + 47/10) + 123/19 sin(7 t + 33/7) + 49/24 sin(8 t + 8/5) + 32/19 sin(9 t + 17/11) + 55/18 sin(10 t + 17/11) + 16/5 sin(11 t + 29/19) + 4 sin(12 t + 14/9) + 77/19 sin(13 t + 61/13) + 29/12 sin(14 t + 14/3) + 13/7 sin(15 t + 29/19) + 13/4 sin(16 t + 23/15) ...

1

ReginaldIII t1_j0cuujj wrote

It mimics statistical trends from the training data. It uses embeddings that place related semantics and concepts near one another, and unrelated ones far apart. Therefore, when it regurgitates structures and logical templates observed in the training data, it is able to project other, similar concepts and semantics into those structures, making them look convincingly like entirely novel and intentional responses.
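
A toy illustration of the embedding part (made-up three-dimensional vectors, obviously nothing like a real embedding table):

```python
import numpy as np

emb = {
    "cat":        np.array([0.90, 0.10, 0.05]),
    "kitten":     np.array([0.85, 0.15, 0.05]),
    "carburetor": np.array([0.05, 0.10, 0.95]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["cat"], emb["kitten"]))      # high: related concepts
print(cosine(emb["cat"], emb["carburetor"]))  # low: unrelated concepts
```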

1

ReginaldIII t1_j0b9rwb wrote

RL is being used to apply weight updates during fine-tuning. The resulting LLM is still just a static LLM with the same architecture.

It has no intent and no awareness. It is just a model being shown some prior and asked to sample the next token.

It is just an LLM. The fine-tuning method just produces an LLM that looks high quality for the specific task of conversationally structured inputs and outputs.

You would never take a linear regression model that happens to fit the data perfectly, feed it a new prior of some X value, see that it gives a sensible Y value, and conclude, "Look, my linear regression is really aware of the problem domain!"

Nope. Your linear regression model fit the data well, and you were able to sample something from it that was on the manifold the training data also lived on. That's all that's going on. Just in higher dimensions.
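
The whole analogy in a dozen lines of numpy (toy numbers throughout):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 0.1, size=100)   # data on a noisy line

w, b = np.polyfit(x, y, deg=1)   # fit the model
x_new = 7.5                      # a new "prior" to condition on
y_new = w * x_new + b            # a perfectly sensible-looking output

# y_new looks right because x_new sits on the manifold (here, a line)
# the training data lived on, not because the model is "aware".
print(y_new)
```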

4

ReginaldIII t1_j06nan5 wrote

> Though blockchains would probably be too slow for something like this.

This is the key point. Blockchains give a confidence bound on trustworthiness by being too slow-moving and computationally expensive to manipulate. This is vital when proving a historical audit trail is correct and immutable.

It just isn't important or applicable for high-throughput applications where you only care about the local, immediate correctness of intermediate results.

To quote one of my other comments in this thread:

> Blockchains also don't present a solution to trustworthiness here. In the same way that a wallet being present in a transaction on the blockchain says nothing about the real identity of the parties, nor does it say anything about whether the goods or services the transaction was for were carried out honestly.

We care about whether or not you got ripped off by the guy you gave money to (the GPU you gave data to). We don't care about proving you did actually give them the money at a specific point in time.

2

ReginaldIII t1_j06mqeo wrote

In rendertoken's scenario we don't have a requirement for high throughput of one job feeding into another.

The individual units of work are expensive and long-lived. Rendering a frame of a film takes roughly the same amount of time it did a few years ago; we just get higher-fidelity output for that same render budget. All the frames can be processed lazily by the compute farm, and the results just go into a pool for later collection.

Because the collation of the results is decoupled from the actual computation, you have the time and resources to encode the results on a blockchain. Auditing that your requested work was actually processed is a desirable quality, and so a blockchain does provide a benefit.

In the case of distributed model training the scenario is different: we have high throughput of comparatively small chunks of work. Other than passing the results to the next immediate worker for the next part of the computation, we have no desire (or storage capacity) to keep any of the intermediate results. And because we have high throughput of many small chunks, a blockchain encoding those chunks could only afford a small proof of work per chunk, and so would not be a reliable source of truth anyway.

Then consider that we don't even care about an audit trail proving historical chunks really were processed when we think they were. We only care about checking that results are valid on the fly, as we do the compute.

We just need a vote by agreement on the immediate results so they can be handed off to the next workers. Yes, blockchains often include a vote-by-agreement mechanism for deciding the actual state of the chain, but we need only that part. We don't actually need the blockchain itself.
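
That part on its own is tiny. Something like this sketch, where each chunk goes to a few redundant workers and is accepted once enough of them agree (`agreed_result` and the hex values are hypothetical, just for illustration):

```python
from collections import Counter

def agreed_result(replica_outputs, quorum=2):
    """Accept a chunk's result only when at least `quorum` untrusted
    workers returned the same value. A real system would compare
    checksums of the intermediate tensors, not the tensors themselves."""
    value, n = Counter(replica_outputs).most_common(1)[0]
    if n < quorum:
        raise ValueError("no quorum: reissue the chunk to fresh workers")
    return value

print(agreed_result(["0xabc", "0xabc", "0xdef"]))  # -> "0xabc"
```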

2