
hayAbhay t1_j9i9nfv wrote

If you have the hardware, and if you have a lot of those input-output examples, you can use smaller alternative models in the GPT family.

This should work reasonably well, especially if there isn't too much variance in the input-output pairs. (A lot depends on your dataset here.)

There are definite tradeoffs here in terms of model development, inference, and maintenance. If the expected costs aren't too high, I'd strongly recommend GPT-3 as a base.
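For the self-hosted route, here's a minimal sketch of tuning a smaller GPT-family model with HuggingFace transformers; the model choice, file name, and example format are assumptions for illustration, not fixed choices.

```python
# Minimal causal-LM fine-tune sketch (hypothetical data file "pairs.txt").
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token                      # gpt2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# pairs.txt: one example per line, e.g. "<input> -> <output><|endoftext|>"
ds = load_dataset("text", data_files="pairs.txt")["train"]
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=512),
            remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tuned-gpt",
                           num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```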

1

hayAbhay t1_j9duoda wrote

Create a corpus C like this:

```
<source text from corpus A> <human generated text from corpus B>
<source text from corpus A> <human generated text from corpus B>
...
```

Make sure you add some unique tokens marking the start and end of each example, as well as the input and output within it.

Then, take any pretrained LLM and fine-tune it on C (tuning GPT-3 is trivial with ~10-20 lines of code).

For inference, give the tuned model the input and let it complete the output. Use the "end" marker token as a stop sequence so generation terminates cleanly. A sketch of the full flow is below.
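Here's what that "~10-20 lines" flow looked like with the legacy (pre-1.0) openai Python client, which is roughly the era this advice comes from; the current API differs, and the file name and marker tokens here are illustrative.

```python
import openai

# data.jsonl: one {"prompt": "<A text> ->", "completion": " <B text> END"}
# per line -- " ->" separates input from output, " END" marks completion end.
f = openai.File.create(file=open("data.jsonl", "rb"), purpose="fine-tune")
job = openai.FineTune.create(training_file=f.id, model="davinci")

# Once the job finishes, generate with the tuned model, stopping at the
# end marker so the completion terminates cleanly.
resp = openai.Completion.create(
    model=job.fine_tuned_model,        # populated when the job succeeds
    prompt="<new source text> ->",
    stop=[" END"],
    max_tokens=256,
)
print(resp.choices[0].text)
```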

[Source: trained/tuned several language models including gpt3]

3

hayAbhay t1_j2m7l6n wrote

If you're a complete beginner and you're okay with a specific domain, I highly recommend the UMich Deep Learning for Computer Vision course by Justin Johnson. This is an excellent introductory course since it assumes no prior knowledge, but most importantly Justin does an excellent job of providing solid foundational intuitions for deep learning (he taught CS231n with Karpathy). If you don't like computer vision, I still recommend the first 6-7 lectures.

I'll always recommend Andrew Ng's course for some broad basics alongside it as well. After that, you can jump into NYU's DL course by Yann and Alfredo. Imo, Yann provides some of the best and most concise abstractions for some very complex concepts. If you're a beginner, some of it might go over your head, but once you have a general sense of the lay of the land and some hands-on experience, his abstractions are profound.

2

hayAbhay t1_iz8xbk0 wrote

I'm not entirely sure what those different categorizations entail, but they really seem to be applications of reasoning. At its core, everything we do is based on logical reasoning. There are paradoxes, but it's the best we have. Within this, there are three core categories (see the sketch after the list):

  1. Deductive reasoning - This is the core of how we reason. If we know "If A, then B" as a "rule" and we observe "A", then it follows that B is definitely true - Premise: A, A => B; Conclusion: B
  2. Inductive reasoning - This is coming up with the rule itself from observation, i.e. if you observe many different instances (you notice the grass gets wet every time after it rains - observation), you conclude that "if it rains, then grass gets wet", or "A => B"
  3. Abductive reasoning - This is a sort of reverse reasoning where you observe something and "hypothesize" the cause. This is inherently uncertain and makes a lot of assumptions (closed world). So here - Premise: A => B, B; Conclusion: A? (yes if it's a closed world and no other rule entails B, uncertain otherwise)
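A toy sketch of all three modes over the "rain => wet grass" example above; the one-rule world here is purely hypothetical.

```python
rules = {"rain": "wet_grass"}               # known rule: A => B
observations = ["rain"]

# Deduction: from A and A => B, conclude B.
if "rain" in observations:
    print("deduce:", rules["rain"])         # -> wet_grass

# Induction: from many paired observations, propose the rule itself.
pairs = [("rain", "wet_grass")] * 10        # grass wet after rain, every time
if all(b == "wet_grass" for a, b in pairs if a == "rain"):
    print("induce: rain => wet_grass")

# Abduction: observe B, hypothesize A; only safe in a closed world
# where no other rule entails B.
observed = "wet_grass"
hypotheses = [a for a, b in rules.items() if b == observed]
print("abduce (uncertain):", hypotheses)    # -> ['rain']
```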

There are several variations of these as well. Everything you mentioned is really an application of these. Natural language is inherently uncertain, and so is reality itself! The closest any natural language comes to capturing logic is legal documents (and we know the semantic games that happen there :) )

In terms of AI, logic-based systems got pretty popular in the 80s; they're very brittle given our reality, but they do have their place. This is the knowledge-based/logical reasoning you mentioned. Knowledge bases are simply a format in which "knowledge", or in other words some textual representation of real-world concepts, lives in a structure that you can apply logic-based rules over.

With LLMs, they're probabilistic in a weird sort of way. Their optimization task is largely predicting the next word, essentially modeling the language underneath (which is inherently filled with ambiguities). Given large repetitions in text, they can easily do what appears to be reasoning, largely from high-probability occurrence. But they won't be able to, say, systematically pick concepts and trace through the reasoning like a human can. However, their biggest advantage is general utility. They can, as a single algorithm, solve a wide range of problems that would otherwise require a lot of bespoke systems. And LLMs over the past 5-6 years have consistently hammered bespoke, special-purpose systems built from scratch. After all, for a human to apply crisp reasoning, they need some language :)
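As a concrete illustration of that "predict the next word" objective, here's a minimal sketch with HuggingFace transformers; GPT-2 stands in for the larger models purely for the demo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The grass gets wet when it", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]           # scores for the next token
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, 5)
print([tok.decode([int(i)]) for i in top.indices])  # likely continuations
```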

If you're curious, look up "Markov Logic Networks". It's from Pedro Domingos (his book "The Master Algorithm" is also worth a read), and it tried to tie logic & probability too, but had this intense expectation maximization over a combinatorial explosion. Also, check out Yann LeCun's talk at Berkeley last month (he shared some of that at NeurIPS from what I heard).

5

hayAbhay t1_iz7vok7 wrote

It's important to note here that LLMs are NOT very good at reasoning, but they are perhaps the best when you consider a "generic" algorithm, i.e. without a lot of domain-specific work.

For logical reasoning, you'll usually need to resort to symbolic representations underneath and apply the rules of logic. ChatGPT may appear to do that well, especially with first- and even second-order logic, but longer chains will make it stumble (see the chaining sketch below).
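For instance, here's a toy forward-chaining sketch of the kind of systematic multi-step deduction meant here; the rule set is hypothetical.

```python
# Follow a chain of "A => B" rules by applying modus ponens repeatedly --
# the kind of long chain that trips up LLMs.
rules = {"a": "b", "b": "c", "c": "d", "d": "e"}

fact = "a"
chain = [fact]
while fact in rules:
    fact = rules[fact]
    chain.append(fact)
print(" => ".join(chain))       # a => b => c => d => e
```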

8

hayAbhay t1_iysq7vu wrote

"Vectors to vectors" is extremely abstract, and it's hard to tell what those vectors are. ML/DL models are, at the end of the day, functions learned from observing a lot of examples of vector -> vector transformations (inductive learning).

If these transformations are fairly easy to model, you might be able to solve for the function directly, or you might only need simple ML algorithms. If they're very complex, you might need stronger models and/or more data.

For instance, image -> vector of probabilities over possible categories requires some powerful models, whereas a vector with one feature (height) -> probability of not hitting your head on the roof requires only a basic model (see the sketch below).
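A toy sketch of that one-feature case: logistic regression mapping height to P(not hitting your head), with a hypothetical 190cm door frame and synthetic data, purely to show how basic the model can be.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
heights = rng.normal(175, 10, size=(500, 1))            # cm, one feature
clears = heights[:, 0] + rng.normal(0, 2, 500) < 190    # noisy binary label

model = LogisticRegression().fit(heights, clears)
print(model.predict_proba([[185.0]])[0, 1])   # P(clears the frame) at 185cm
```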

1

hayAbhay t1_iysnyi1 wrote

Again, unless you describe your actual problem, it's very hard to help. ML is a vast field with a lot of different approaches that come with their tradeoffs depending on the specific problem.

Simply throwing diffusers at it without understanding the space is like trying to cut a cake with a random household object, like a chair.

3