
cdsmith t1_j8gq1gt wrote

Imagine you just didn't invest those millions of dollars, then, and instead someone else developed the idea and didn't want to freeze the rest of the world out of using it.

Patents only make sense if you assume that the alternative to you inventing something is no one inventing it. Experience shows that's very rarely the case; in general, when an idea's time has come (the base knowledge is there to understand it, the infrastructure is in place to use it effectively, etc.), there is a race between many parties to develop the idea. This applies to everything from machine learning models to the light bulb and the telephone, both of which were famously being developed by multiple inventors simultaneously before one person got lucky, often by a matter of mere days, and was granted exclusive rights to the invention, while everyone else who had the same idea was out of luck.

1

cdsmith t1_j8gpcrf wrote

We're off-topic for this forum, but since we're here anyway...

Patents are tricky when it comes to stuff like this. To successfully patent something software related, you must be able to convince the patent office that what you're patenting counts as a "process" and not as an "idea", "concept", "principle", or "algorithm", all of which are explicitly not patentable. The nuances of how you draw the lines between these categories are fairly complex, but in practice it often comes down to being able to patent engineering details of HOW you do something in the face of a bunch of real-world constraints, but not WHAT you are doing or any broad generalization of the bigger picture.

It's likely that Swype didn't just screw up and write their patent poorly, but rather wrote the only patent their legal team could succeed in getting approved. If it didn't apply to what other companies did later because they used a different "process" (for nuanced lawyer meanings of that word) to accomplish the same goal, that is an intentional feature of the patent system, not a failure by Swype.

1

cdsmith t1_j6tg9z7 wrote

Awesome question! I definitely laughed.

The serious answer, which the GitHub link clarifies, is that the model is semi-unsupervised: they have a lot of data, but only some of it is labeled. Presumably the labeled data is all negative, because we understand its natural origin. So effectively this becomes almost an anomaly-detection problem, looking for data that is least like the known natural signals.

Even if it just directs scientists to look at new natural phenomena, that sounds valuable.
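
To make that framing concrete, here's a minimal sketch of one-class anomaly detection: fit a detector only on the labeled "known natural" data, then rank everything else by how unlike it the detector thinks it is. This is not the repo's actual pipeline; the data is made up and the isolation forest is just an off-the-shelf stand-in.

```python
# Hedged sketch: NOT the repo's pipeline. Random data stands in for real signals.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_natural = rng.normal(0.0, 1.0, size=(5000, 16))    # stand-in for labeled "known natural" signals
X_unlabeled = rng.normal(0.0, 1.2, size=(1000, 16))  # stand-in for everything else

# Fit only on the signals we believe we understand...
detector = IsolationForest(random_state=0).fit(X_natural)

# ...then rank the unlabeled signals by how unlike the known-natural data they look.
scores = detector.score_samples(X_unlabeled)  # lower score = more anomalous
candidates = np.argsort(scores)[:20]          # the 20 most anomalous, for a human to inspect
print(candidates)
```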

17

cdsmith t1_j60q0bs wrote

I can only answer about Groq. I'm not trying to sell you Groq hardware, honestly; I just don't know the answers for other accelerator chips.

Groq very likely increases inference speed and power efficiency over GPUs; that's actually its main purpose. How much depends on the model, though. I'm not in marketing so I probably don't have the best resources here, but there are some general performance numbers (unfortunately no comparisons) in this article, and this one talks about a very specific case where a Groq chip gets you a 1000x inference performance advantage over the A100.

To run a model on a Groq chip, you would typically start before CUDA enters the picture at all, converting a model from PyTorch, TensorFlow, or several other common formats into a Groq program using https://github.com/groq/groqflow. If you have custom-written CUDA code, then you likely have some programming work ahead of you to run on anything besides a GPU.
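
For the common case of a plain PyTorch model, the conversion is roughly this shape. The entry point name is how I remember the groqflow README, so treat the exact API as an assumption and check the repo:

```python
# Rough sketch of the groqflow path for a plain PyTorch model. The `groqit`
# entry point is an assumption based on my reading of the repo's README;
# verify against https://github.com/groq/groqflow before relying on it.
import torch

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))

from groqflow import groqit  # assumed import, per the README

example_inputs = {"x": torch.randn(1, 16)}
gmodel = groqit(TinyModel(), example_inputs)  # compile the PyTorch model into a Groq program
print(gmodel(**example_inputs))               # run the compiled program
```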

7

cdsmith t1_j460nf2 wrote

Tree search means precisely that: searching a tree. In the context of AlphaZero, the tree is the game tree. That is:

  • I can move my pawn to e4. Then:
    • You could move your knight to c6
      • ...
    • Or you could move your pawn to e6
      • ...
    • Or ...
  • Or, I could move my pawn to d4. Then:
    • You could move your pawn to c5, attacking my pawn.
      • ...
    • Or you could move your knight to c6.
      • ...
    • Or you could move your pawn to d5.
      • ...
    • Or ...
  • Or, I could ...

That's it. The possible moves at each game state, and the game states they lead to, form a tree. (Actually more like a DAG, since transpositions are possible, but it's usually simplified by calling it a tree.) Searching that tree up to a certain depth amounts to thinking forward that many moves in the game. The way you search the tree is some variation on minimax: you want to choose the best move for yourself now, but at the next level down you pessimistically assume your opponent plays their best move (which is the worst one for you), and so on.

The variations differ in what order you visit the nodes of the tree. You could do a straightforward depth-first traversal up to a certain depth, in which case this is traditional minimax search. You can refuse to ever visit some nodes because you know they can't possibly matter, and that's alpha-beta pruning. You could even visit nodes in a random order, adjusting the likelihood of visiting each node based on a constantly updated estimate of how likely it is to matter, and that's roughly what happens in Monte Carlo tree search. Either way, you're just traversing that tree in some order.
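
For reference, the plain depth-limited version with alpha-beta pruning looks something like this. The `moves`, `apply_move`, and `evaluate` functions are hypothetical game-specific hooks you'd supply; nothing here is specific to chess or AlphaZero.

```python
# Hedged sketch of depth-limited minimax with alpha-beta pruning over a generic game tree.
import math

def minimax(state, depth, moves, apply_move, evaluate,
            alpha=-math.inf, beta=math.inf, maximizing=True):
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)  # static score from the maximizing player's point of view
    if maximizing:
        best = -math.inf
        for m in legal:
            best = max(best, minimax(apply_move(state, m), depth - 1,
                                     moves, apply_move, evaluate, alpha, beta, False))
            alpha = max(alpha, best)
            if beta <= alpha:   # this branch can't change the result, so prune it
                break
        return best
    else:
        best = math.inf
        for m in legal:
            best = min(best, minimax(apply_move(state, m), depth - 1,
                                     moves, apply_move, evaluate, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:
                break
        return best
```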

AlphaZero combines this with machine learning by using two trained models: one tweaks the traversal order of the tree by identifying moves that seem likely to be good, and the other evaluates partially completed games to estimate how good they look for each player. But ultimately, the learned models just plug into certain holes in the tree search algorithm.
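
Roughly, those holes are (a) which child to visit next during selection and (b) how to score a leaf without playing the game out. Here's a sketch loosely in the spirit of AlphaZero's PUCT selection rule; `policy_net` and `value_net` are hypothetical stand-ins for the trained networks, not the real implementation.

```python
# Hedged sketch of where the two learned models plug into the tree search.
import math
from dataclasses import dataclass

@dataclass
class Child:
    prior: float              # policy network's guess that this move is promising
    visits: int = 0
    total_value: float = 0.0  # sum of value estimates backed up through this child

def select_child(children, c_puct=1.5):
    """Pick the next child to explore, biased by the policy prior (PUCT-style)."""
    total_visits = sum(ch.visits for ch in children)
    def score(ch):
        exploit = ch.total_value / ch.visits if ch.visits else 0.0
        explore = c_puct * ch.prior * math.sqrt(total_visits + 1) / (1 + ch.visits)
        return exploit + explore
    return max(children, key=score)

def evaluate_leaf(state, policy_net, value_net):
    """Instead of rolling the game out, ask the learned models about this position."""
    priors = policy_net(state)  # distribution over legal moves: priors for new children
    value = value_net(state)    # scalar estimate of the outcome from this position
    return priors, value
```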

16

cdsmith t1_j45e09w wrote

Sort of. The promise of differentiable programming is to be able to implement discrete algorithms in ways that are transparent to gradient descent, but it's really only the numerical values of the inputs that are transparent to gradient descent, not the structure itself. The key idea here is the use of so-called TPRs (tensor product representations) to encode not just values but structure as well in a continuous way, so that one has an entire continuous deformation from the representation of one discrete structure to another. (Obviously, this deformation has to pass through intermediate states that are not directly interpretable as a single discrete structure, but the article argues that even these can represent valid states in some situations.)
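
A tiny numeric illustration of the TPR idea, with made-up filler and role vectors (not the paper's construction): the structure is the sum of outer products of fillers with roles, and orthonormal roles let you unbind each filler again.

```python
# Hedged sketch of a tensor product representation: encode (filler, role) pairs
# as a sum of outer products, then unbind with the role vectors.
import numpy as np

rng = np.random.default_rng(0)
d_filler, d_role, n_slots = 8, 4, 3

fillers = rng.normal(size=(n_slots, d_filler))  # e.g. embeddings of three symbols
roles = np.linalg.qr(rng.normal(size=(d_role, d_role)))[0][:n_slots]  # orthonormal role vectors

# Bind each filler to its role and superpose: T = sum_i f_i (outer) r_i
T = sum(np.outer(f, r) for f, r in zip(fillers, roles))

# Because the roles are orthonormal, unbinding recovers each filler: f_i = T @ r_i
print(np.allclose(T @ roles[1], fillers[1]))  # True, up to floating point

# A point halfway between two encodings is still a valid tensor, even though it
# doesn't correspond to a single discrete structure: that's the continuous deformation.
T_swapped = sum(np.outer(f, r) for f, r in zip(fillers[::-1], roles))
T_mid = 0.5 * T + 0.5 * T_swapped
```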

9

cdsmith t1_j3heb7r wrote

I'm not at all up to speed on this, but I followed most of the presentation. I was left with this question, though.

Through most of the video, I had the impression that this was building a rigorous theory of what happens if you forget to train your neural network. That is, the assumption was that all the weights were sampled independently from Gaussian distributions, and the "master theorem" as stated here definitely assumed that all the weights in the network were random. But then, about 2.5 hours in, they are suddenly talking about the behavior of the network under training, and as far as I can tell there's no discussion at all of how the theorems they painstakingly established for random weights tell you anything about learning behavior.

Did I miss something, or was this just left out of the video? They do seem to have switched by this point from covering proofs to just stating results... which is fine, the video is long enough already, but I'd love to have some intuition for how this model treats training, as opposed to inference with random weights.

3

cdsmith t1_j3ev3je wrote

This is definitely a theory presentation, though it does end with some applications to hyperparameter transfer when scaling model size. But if your main experience with ML is building models and applications, I'm not surprised it looks unfamiliar.

That being said, though, give it a chance if you're interested. Some parts of the outline didn't look familiar to me either, but the video is well-made and stops to explain most of the background knowledge. And you can always gloss over the bits you don't understand.

1

cdsmith t1_j2yk9jb wrote

I think the best way to answer your question is to ask you to be more precise about what, exactly, you mean by "outperform".

There's some limited sense in which your reasoning works the way you seem to have envisioned. A generative model like GPT or a GAN is typically built at least partly to produce output that's indistinguishable from what a human would produce, via an autoregressive training objective or an adversarial one. It cannot do better at that goal, because a human, by definition, has a 100% success rate at producing something indistinguishable from what a human produces.

But there are limitations to this reasoning:

  1. Producing any arbitrary human-like output is not actually the goal. People don't evaluate generative models on how human-like they are, but rather on how useful their results are. There are lots of ways their results can be more useful even if they aren't quite as "human-like". In fact, the motivation for trying to keep the results human-like is mainly that allowing a generative model too much freedom to generate samples that are very different from its training set decreases accuracy, not that it's a goal in its own right.
  2. That's not all of machine learning anyway. Another very common task is, for example, Netflix predicting what movies you will want to watch in order to build its recommendations. Humans are involved in producing that data, but the model isn't learning from what other humans predicted users would watch; it's learning directly from observations of what humans really did watch. Such a system isn't aiming to emulate humans at all. Some machine learning isn't even trained on human-generated data: the objective it optimizes is either directly observed and measured, or directly computed.
  3. Even in cases where a supervised model is learning to predict human labeling, which is where your reasoning best applies, the quantity of data can overcome human accuracy. Imagine this simpler scenario: I am learning to predict which President is on a U.S. bill, given the denomination. This is an extremely simple function to learn, of course, but let's say I only have access to data with a rather poor accuracy rate of 60%, with errors occurring uniformly. Well, with enough of that data, I can still learn to be 100% accurate, simply by noting which answer is the most common for each input (there's a quick numerical check of this after the list). That's only a theoretical argument, and in a realistic ML context it's very difficult to get better-than-human performance on a supervised human-labeled task like this. But it's not impossible.
  4. And, of course, if you look at more than just accuracy, ML can be "better" than humans in many ways. They can be cheaper, faster, more easily accessible, more deterministic, etc.
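
Here's the quick numerical check promised in point 3, with made-up denominations and an assumed 60% labeler accuracy: the majority answer per input recovers the true labels even though every individual labeler is wrong 40% of the time.

```python
# Hedged toy check of the "majority vote beats noisy labelers" argument.
import random
from collections import Counter

random.seed(0)
true_label = {1: "Washington", 5: "Lincoln", 20: "Jackson", 50: "Grant"}
labels = sorted(set(true_label.values()))

def noisy_label(denomination, accuracy=0.6):
    """Return the right answer 60% of the time, a uniformly wrong one otherwise."""
    if random.random() < accuracy:
        return true_label[denomination]
    return random.choice([l for l in labels if l != true_label[denomination]])

# Collect many noisy "human" labels per input, then take the majority answer.
dataset = [(d, noisy_label(d)) for d in true_label for _ in range(1000)]
majority = {
    d: Counter(lbl for dd, lbl in dataset if dd == d).most_common(1)[0][0]
    for d in true_label
}
print(majority == true_label)  # True: the learner beats its 60%-accurate teachers
```
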
13

cdsmith t1_j2uzks4 wrote

The idea is that there's an inflection point: at first you are mainly removing (masking with zeros) dimensions whose values are extremely small anyway and don't make much difference to the output, so you don't lose much accuracy. But after you've removed those dimensions, the remaining ones are specifically the ones that do matter, so you can't just go find more non-impactful dimensions; they're already gone.

As for what would happen if you over-pruned a model trained with a large number of parameters, I'd naively expect it to do much worse. If you train on more parameters and then zero out significant weights, not only do you have a lower-dimensional space to model in (which is unavoidable), but you also lose the information those weights were carrying, because at training time the model relied on the parameters you've now zeroed out to capture it.
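
A toy illustration of that inflection point, using magnitude pruning on a made-up weight matrix (not any particular paper's setup): the output error stays small while pruning removes only the tiny weights, then jumps once it starts hitting the weights that carry the signal.

```python
# Hedged sketch: magnitude pruning on a synthetic weight matrix.
import numpy as np

rng = np.random.default_rng(0)

# A toy weight matrix: roughly 10% of the entries carry real signal, the rest are tiny.
small = rng.normal(scale=0.01, size=(128, 128))
big = rng.normal(scale=1.0, size=(128, 128))
W = np.where(rng.random((128, 128)) < 0.1, big, small)

x = rng.normal(size=(128, 256))
reference = W @ x

for fraction in (0.5, 0.8, 0.9, 0.95, 0.99):
    threshold = np.quantile(np.abs(W), fraction)
    pruned = np.where(np.abs(W) >= threshold, W, 0.0)  # mask the smallest-magnitude weights
    error = np.linalg.norm(pruned @ x - reference) / np.linalg.norm(reference)
    print(f"pruned {fraction:.0%} of weights -> relative output error {error:.3f}")
```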

4