Submitted by currentscurrents t3_10adz19 in MachineLearning

Paper: https://arxiv.org/abs/2205.01128

TL;DR: The paper is about designing systems that generalize. The authors argue there are two forms of computing: Compositional and Continuous.

Continuous computation is what neural networks are traditionally good at - creating a function that approximates a solution to a problem. Compositional computation is directly manipulating symbols, logic, ideas, etc. - and unlike continuous computation, it's capable of generalizing from small datasets. But so far it's only been useful inside carefully constructed formal systems.

The authors believe research should be focused on combining the two, and implementing Compositionality fully with neural networks. They suggest some ways to do this. They also believe that the success of architectures like CNNs and Transformers comes from implementing a limited form of Compositionality.

This is a very interesting idea, but I have a bit of skepticism:

  • This paper is heavy on theory and less so on practice. Has any followup work in this direction produced measurable results?

  • The lead author seems to have been saying things like this for a while. Sometimes older researchers have pet theories that are not broadly accepted in the field. What do other researchers think about this?

Thoughts?

90

Comments


navillusr t1_j43zaqk wrote

I think this is a very common belief. Symbolic systems can do many things that neural networks struggle with, and can do them very sample-efficiently. But they’ve failed to scale with more data as well as neural networks for most tasks, and are harder to train. If we could magically combine the reasoning ability of symbolic systems with the pattern recognition and generalization of neural networks, we would be getting very close to AGI imo. That being said, idk much about recent research in symbolic reasoning so my knowledge might be outdated.

41

Farconion t1_j44ebwm wrote

this is why neuro-symbolic computing hasn't gotten much traction, right?

4

currentscurrents OP t1_j44ycdz wrote

From what I've seen, it's a promising direction that should be possible. But so far nobody's made it work for more than toy problems.

3

throwaway2676 t1_j47m2r9 wrote

> If we could magically combine the reasoning ability of symbolic systems with the pattern recognition and generalization of neural networks, we would be getting very close to AGI imo.

I must be misunderstanding your meaning, because I don't see why this is particularly difficult. Train an AI to recognize deductive/mathematical reasoning and translate it into symbolic or mathematical logic. Run an automated proof assistant or computer algebra system on the result. Use the AI to translate back into natural language. Shouldn't be much more difficult than creating code, which ChatGPT can already do, and it would instantly eliminate 95% of the goofy problems LLMs get wrong.
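Something like this sketch, say (the `llm_*` functions are hypothetical placeholders for the translation steps; only the sympy call is a real API):

```python
# Sketch of the proposed pipeline. The llm_* functions are hypothetical
# stand-ins for an LLM translation step; only the sympy call is real.
import sympy as sp

def llm_to_symbolic(question: str) -> str:
    # Hypothetical: an LLM maps the question to a formal expression.
    # Hard-coded here for illustration.
    return "Eq(2*x + 3, 11)"

def llm_to_english(solutions) -> str:
    # Hypothetical: an LLM verbalizes the solver's output.
    return f"The answer is {solutions[0]}."

question = "What number doubled plus three gives eleven?"
equation = sp.sympify(llm_to_symbolic(question))
solutions = sp.solve(equation)        # the symbolic engine does the reasoning
print(llm_to_english(solutions))      # -> "The answer is 4."
```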

4

navillusr t1_j47wc3c wrote

It’s definitely a hard problem. The challenge isn’t a pipeline problem of “solve this reasoning task” where you can just take the English task -> convert to code -> run code -> convert to English answer. We could probably do that with some degree of accuracy in some contexts.

The hard part is having the agent solve reasoning tasks, without prompt engineering, when they appear, without telling it that it’s a reasoning task. In essence, it should be able to combine reasoning and planning seamlessly with the generative side of intelligence, not just piece them together when you tell it to outsource the task to a reasoning engine (assuming it could even do this accurately).

For example, ask ChatGPT to play rock paper scissors, but to choose the option that beats the option that beats the option that you pick (i.e., if I pick Rock, you pick Scissors, because scissors beats paper, which beats rock). It can't plan that far ahead.

> Let’s play a modified version of Rock Paper Scissors, but to win, you have to pick the option that beats the option that beats the option that I pick.

> Sure, I'd be happy to play a modified version of Rock Paper Scissors with you. Please go ahead and make your selection, and I'll pick the option that beats the option that beats it.

> Rock

> In that case, I will pick paper.

Since this game requires 2 steps of thinking and goes against the statistically likely answer in this scenario, it fails. As you described, you could maybe write code that identifies a rock paper scissors game, generates and runs code, then answers in English, but there are many real-world tasks requiring more than 1 step of planning that the agent needs to seamlessly identify and work through. (For the record, it also outputs incorrect Python code for this game when prompted.)
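The funny thing is that the two-step rule itself is trivial once written symbolically - something like this sketch:

```python
# The two-step "beats what beats your pick" rule is a one-liner once
# expressed symbolically; the hard part is getting the model to notice
# that this is the computation required.
beats = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # value beats key

def winning_move(my_pick: str) -> str:
    return beats[beats[my_pick]]  # apply "what beats X" twice

print(winning_move("rock"))  # -> "scissors", as in the example above
```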

I don’t do research in this specific area, so again I could be off base here, but I think that’s why it’s harder than you’re imagining.

Fwiw, there was a recent paper (the method was called the Mind’s Eye) where they used an LLM to generate physics simulator code to answer physics questions, similar to what you described.

9

actualsnek t1_j44m1z9 wrote

Compositionality is increasingly a significant area of concern across many subfields of deep learning. Winoground recently showed that all state-of-the-art vision-language models drastically fail to comprehend compositional structure, a feature which many linguists would argue is fundamental to the expressive power of language.

Smolensky is also a great guy and was affiliated with the PDP group that developed backprop in the 80's. The best path to neurosymbolic computing & compositional reasoning remains unclear, but Smolensky and his student Tom McCoy have done some great work over the last few years exploring how symbolic structures are implicitly represented in neural nets.

12

giga-chad99 t1_j45pa30 wrote

Regarding that Winoground paper: isn't compositionality what DALL-E 2, Imagen, Parti, etc. are famous for? Like the avocado chair, or some very specific images like "a raccoon in a spacesuit playing poker". SOTA vision-language models are the only models that actually show convincing compositionality, or am I wrong?

4

chaosmosis t1_j45vdll wrote

With enough scale we get crude compositionality, yes. That trend will probably continue, but I don't think it'll take us to the moon.

3

yldedly t1_j45ycm8 wrote

>With enough scale we get crude compositionality, yes.

Depends on exactly what we mean. To take a simple example, if you have cos(x) and x^2, you can compose these to produce cos(x)^2 (or cos(x^2)). You can approximate the composition using a neural network if you have enough data on some interval x ∈ [a, b]. It will work well even for x that weren't part of the training set, as long as they are in the interval; outside the interval, the approximation will be bad. But if you take cos(x), x^2 and compose(f, g) as building blocks, and search for a combination of these that approximates the data, the approximation will be good for all real numbers.
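A rough way to see this, using sklearn's MLPRegressor as a stand-in for "a neural network" (an illustration of the interval point, not a benchmark):

```python
# Fit cos(x)^2 on [-3, 3], then compare predictions inside vs. outside
# that interval. Illustration only; any regressor would show the pattern.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, size=(2000, 1))      # training interval [a, b]
y_train = np.cos(x_train).ravel() ** 2            # target: cos(x)^2

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(x_train, y_train)

for x in (1.5, 8.0):                              # in-interval vs. out-of-interval
    print(f"x={x}: net={net.predict([[x]])[0]:.3f}, true={np.cos(x)**2:.3f}")
# The in-interval prediction is close; at x=8.0 the fit degrades, while
# the symbolic composition cos(x)**2 stays exact everywhere.
```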

In the same way, you can learn a concept like "subject, preposition, object A, transitive verb, object B", where e.g. subject = "raccoon", preposition = "in a", object A = "spacesuit", transitive verb = "playing" and object B = "poker", by approximating it with a neural network, and it will work well if you have enough data in some high-dimensional subspace. But it won't work for arbitrary substitutions. Is it fair to call that crude compositionality?

4

actualsnek t1_j4931de wrote

Text2image generation models do anecdotally appear to be better than image-text matching models at compositional tasks, but if you look closely at some generated images, you'll notice compositional failures. They often attach properties to entities the text didn't ascribe them to, or misread the described relation between entities as a more common relation between those entities.

Try a prompt like "man with dog ears running in the park", and it'll generate images of a man with a dog (sometimes with amplified ears) running in the park. Why? Because these models don't have the underlying ability to build compositional representations; they simply approximate their training data distribution.

Examples like "a raccoon in a spacesuit playing poker" often do well because spacesuits are only ever worn and poker is only ever played (i.e. relations that are common in the training distribution). Try a prompt like "a raccoon sitting on a poker chip and holding a spacesuit" and you'll see pretty drastic failures.

All this being said, generative models *still* appear better than discriminative models for vision-language compositionality tasks, and our current work is exploring approaches to impart this ability onto discriminative models to solve tasks like Winoground.

3

visarga t1_j46a4x8 wrote

Would a dataset engineering approach work here? Generate and solve training problems with compositional structure; after sufficient examples, the model should generalise.

2

actualsnek t1_j493haq wrote

We're exploring some data augmentation approaches right now (see my response to u/giga-chad99) but how would you propose generating those problems with compositional structure?

1

visarga t1_j4cqkkb wrote

Sometimes you can exploit asymmetrical difficulty. For example, factorising polynomials is hard, but multiplying a bunch of degree-1 polynomials is easy. So you can generate data for free, and it will be very diverse. Because the data has compositional structure, solving the task necessitates applying rules correctly rather than overfitting.

Taking derivatives and integrals is similar - easy one way, hard the other way. And solving the task will teach the model something about symbolic manipulation.

More generally you can use an external process, a simulator, an algorithm or a search engine to obtain a transformation of input X to Y, then learn to predict Y from X or X from Y. "Given this partial game of chess, predict who wins" and such. If X has compositional structure, solving the task would teach the model how to generalise, because you can generate as much data as necessary to force it not to overfit.
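For the polynomial example, a minimal sketch of the generator with sympy (illustrative only):

```python
# Generate (expanded, factored) pairs for free: multiplying degree-1
# polynomials is the easy direction, factoring back is the hard
# direction we want the model to learn.
import random
import sympy as sp

x = sp.symbols("x")

def make_example(n_factors: int = 3):
    roots = [random.randint(-5, 5) for _ in range(n_factors)]
    factored = sp.prod([x - r for r in roots])    # easy direction
    expanded = sp.expand(factored)                # model input
    return str(expanded), str(factored)           # (input, target) pair

for _ in range(3):
    src, tgt = make_example()
    print(f"{src}  ->  {tgt}")
# e.g. "x**3 - 2*x**2 - 5*x + 6  ->  (x - 3)*(x - 1)*(x + 2)"
```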

2

omniron t1_j44m4s1 wrote

I think LLMs and large transformer networks are basically finding structure and composition in raw data.

And there’s some (as yet unknown) way to get rote symbolic manipulation by stacking some similar system on top of them — similar to how LLM-guided diffusion models work.

6

currentscurrents OP t1_j44nngb wrote

The paper does talk about this and calls transformers "first generation compositional systems" - but limited ones.

>Transformers, on the other hand, use graphs, which in principle can encode general, abstract structure, including webs of inter-related concepts and facts.

> However, in Transformers, a layer’s graph is defined by its data flow, yet this data flow cannot be accessed by the rest of the network—once a given layer’s data-flow graph has been used by that layer, the graph disappears. For the graph to be a bona fide encoding, carrying information to the rest of the network, it would need to be represented with an activation vector that encodes the graph’s abstract, compositionally-structured internal information.

>The technique we introduce next—NECST computing—provides exactly this type of activation vector.

They then talk about a more advanced variant called NECSTransformers, which they consider a 2nd generation compositional system. But I haven't heard of this system before and I'm not clear if it actually performs better.
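To make the quoted point concrete, here's a minimal single-head attention sketch in numpy (my own illustration, nothing from the paper):

```python
# Minimal single-head attention. The attention matrix A is the layer's
# data-flow "graph": it is built, used to mix the values, and then
# discarded -- only A @ V flows on to the rest of the network.
import numpy as np

def attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A = A / A.sum(axis=-1, keepdims=True)   # A: who attends to whom
    return A @ V                            # A itself never leaves the layer

d = 8
rng = np.random.default_rng(0)
X = rng.normal(size=(5, d))                 # 5 tokens
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(attention(X, Wq, Wk, Wv).shape)       # (5, 8)
```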

10

Diffeologician t1_j43qshg wrote

Isn’t that the whole point of differentiable programming?

5

cdsmith t1_j45e09w wrote

Sort of. The promise of differentiable programming is to be able to implement discrete algorithms in ways that are transparent to gradient descent, but it's really only the numerical values of the inputs that are transparent to gradient descent, not the structure itself. The key idea here is the use of so-called TPRs (tensor product representations) to encode not just values but structure as well in a continuous way, so that one has an entire continuous deformation from the representation of one discrete structure to another. (Obviously, this deformation has to pass through intermediate states that are not directly interpretable as a single discrete structure, but the article argues that even these can represent valid states in some situations.)
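For intuition, a toy TPR sketch (my simplification of the idea, assuming orthonormal role vectors):

```python
# Toy tensor product representation: bind fillers to roles with outer
# products and sum. With orthonormal roles, contracting the sum with a
# role vector recovers its filler exactly -- discrete structure encoded
# in one continuous tensor that gradient descent can act on.
import numpy as np

rng = np.random.default_rng(0)
fillers = {"john": rng.normal(size=4), "mary": rng.normal(size=4)}
roles = {"subject": np.eye(3)[0], "object": np.eye(3)[1]}  # orthonormal

# Encode the structure subject=john, object=mary
T = (np.outer(fillers["john"], roles["subject"])
     + np.outer(fillers["mary"], roles["object"]))

recovered = T @ roles["subject"]                 # unbind the subject role
print(np.allclose(recovered, fillers["john"]))   # True
```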

9

Diffeologician t1_j45j7c4 wrote

So, there’s a trick where you write a differentiable program and swap out expensive bits with a neural network, which I think is probably related to this. Looking at the article, I think you would very quickly run into some hard problems in differential geometry if you tried to make this formal.

3

currentscurrents OP t1_j43s8ki wrote

In the paper they talk about "first generation compositional systems" and I believe they would include differentiable programming in that category. It has some compositional structure, but the structure is created by the programmer.

Ideally the system would be able to create its own arbitrarily complex structures and systems to understand abstract ideas, like humans can.

4

ReasonablyBadass t1_j455zjq wrote

So they invented a new term for neuro symbolic computing?

3

actualsnek t1_j495737 wrote

Crazy that this has even a single upvote; proof that this subreddit is no longer the community for academic discourse it once was. Do you know who Paul Smolensky is? He practically invented the term "neuro-symbolic" and was virtually the only researcher seriously working on it in the 20 years leading up to the deep learning revolution. Harmonic Grammar, Optimality Theory, Tensor Product Representations. Please pick up almost any article on connectionism before 2010.

No, this is not a new term for neuro-symbolic computing (which is now just a buzzword applicable to half of the field); it's a specific theoretical take on how compositional structure could be captured by vectorial representations.

2

[deleted] t1_j44g0ov wrote

[deleted]

2

cdsmith t1_j45f4sd wrote

Aside from a general similarity of goals, do you really think the paper you linked makes this one non-novel? I have trouble seeing that. As far as I can tell, there's absolutely nothing comparable to tensor product representations or NECST in your link.

2