yldedly t1_j9k5n8n wrote on February 22, 2023 at 4:06 PM

Reply to comment by GraciousReformer in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

It depends a lot on what you mean by works. You can get a low test error with NNs on tabular data if you have enough of it. For smaller datasets, you'll get a lower test error using tree ensembles. For low out-of-distribution error neither will work.

yldedly t1_j9k3orr wrote on February 22, 2023 at 3:52 PM

Reply to comment by GraciousReformer in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

Not sure what you're asking. CNNs have inductive biases suited for images.

yldedly t1_j9judc7 wrote on February 22, 2023 at 2:17 PM

Reply to comment by [deleted] in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

Any interval [a; b] where a and b are numbers. In practice, it means that the approximation will be good in the parts of the domain where there is training data. I have a concrete example in a blog post of mine: https://deoxyribose.github.io/No-Shortcuts-to-Knowledge/

yldedly t1_j9jtuzy wrote on February 22, 2023 at 2:13 PM

Reply to comment by [deleted] in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

I'll link you to an old comment: https://www.reddit.com/r/MachineLearning/comments/z12zxj/comment/ix9t149/?utm_source=share&utm_medium=web2x&context=3

yldedly t1_j9jr821 wrote on February 22, 2023 at 1:53 PM

Reply to comment by GraciousReformer in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

This one is pretty good: https://arxiv.org/abs/2207.08815

yldedly t1_j9jpuky wrote on February 22, 2023 at 1:43 PM

Reply to comment by GraciousReformer in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

There are two aspects, scalability and inductive bias. DL is scalable because compositions of differentiable functions make backpropagation fast, and those functions being mostly matrix multiplications make GPU acceleration effective. Combine this with stochastic gradients, and you can train on very large datasets very quickly.
Inductive biases make DL effective in practice, not just in theory. While the universal approximation theorem guarantees that an architecture and weight-setting exist that approximate a given function, the bias of DL towards low-dimensional smooth manifolds reflects many real-world datasets, meaning that SGD will easily find a local optimum with these properties (and when it doesn't, for example on tabular data where discontinuities are common, DL performs worse than alternatives, even if with more data it would eventually approximate a discontinuity).

yldedly t1_j9jorh1 wrote on February 22, 2023 at 1:34 PM

Reply to comment by ewankenobi in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

It's not from a paper, but it's pretty uncontroversial I think - though people like to forget about the "bounded interval" part, or at least what it implies about extrapolation.

yldedly t1_j9j6gk8 wrote on February 22, 2023 at 10:19 AM

Reply to [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

>discover arbitrary functions

Uh, no. Not even close. DL can approximate arbitrary functions on a bounded interval given enough data, parameters and compute.

yldedly t1_j75rw5b wrote on February 4, 2023 at 7:34 AM

Reply to [R] Topologically evolving new self-modifying multi-task learning algorithms by Feeling_Card_4162

Speaking as someone also working on an ambitious project that deviates a lot from mainstream ML, I encourage you to do the same thing I'm struggling with:

Try to implement the simplest possible version of your idea and test it on some toy problem to quickly get some insight.

Maybe start with one type of modulatory node and see how NEAT ends up using it?

yldedly t1_j5ybg0x wrote on January 26, 2023 at 12:19 PM

Reply to Machine learning and black box numerical solver[D] by Due-Wall-915

https://docs.sciml.ai/DiffEqFlux/stable/

yldedly t1_j5xun55 wrote on January 26, 2023 at 8:40 AM

Reply to comment by merlinsbeers in Researchers unveil the least costly carbon capture system to date - down to $39 per metric ton. by PNNL

Yea, their nuts are shriveled

yldedly t1_j4atech wrote on January 14, 2023 at 11:12 AM

Reply to comment by AImSamy in [D] Unpopular opinion : It's not always a good idea to train a model from by AImSamy

So far, mlflow + docker + torchserve has been enough. Soon I'll have to implement training, active learning and maintenance in the cloud as well, which will probably require more tools.

yldedly t1_j4asn14 wrote on January 14, 2023 at 11:02 AM

Reply to [D] Unpopular opinion : It's not always a good idea to train a model from by AImSamy

In my job, I solve almost everything with pre-trained model + a few hours of labeling + active learning.

yldedly t1_j45ycm8 wrote on January 13, 2023 at 11:54 AM

Reply to comment by chaosmosis in [D] What's your opinion on "neurocompositional computing"? (Microsoft paper from April 2022) by currentscurrents

>With enough scale we get crude compositionality, yes.

Depends on exactly what we mean. To take a simple example, if you have cos(x) and x^2, you can compose these to produce cos(x)^2 (or cos(x^2)). You can approximate the composition using a neural network if you have enough data on some interval x \in [a;b]. It will work well even for x that weren't part of the training set, as long as they are in the interval. Outside the interval the approximation will be bad though. But if you take cos(x), x^2 and compose(f, g) as building blocks, and search for a combination of these that approximate the data, the approximation will be good for all real numbers.

In the same way, you can learn a concept like "subject, preposition, object A, transitive verb, object B", where e.g. subject = "raccoon", preposition = "in a", object A = "spacesuit", transitive verb = "playing" and object B = "poker", by approximating it with a neural network, and it will work well if you have enough data in some high-dimensional subspace. But it won't work with any substitutions. Is it fair to call that crude compositionality?

yldedly t1_j3dwdv6 wrote on January 7, 2023 at 9:44 PM

Reply to comment by IntelArtiGen in [Discussion] Is there any alternative of deep learning ? by sidney_lumet

I agree of course, you can't compress more than some hard limit, even in lossy compression. I just think DL finds very poor compression schemes compared to what's possible (compare DL for that handwriting problem above to the solution constructed by human experts).

yldedly t1_j3ds4h2 wrote on January 7, 2023 at 9:16 PM

Reply to comment by IntelArtiGen in [Discussion] Is there any alternative of deep learning ? by sidney_lumet

Imo there's no reason why we can't have much smaller models that do well on these tasks, but I admit it's just a hypothesis at this point. Specifically for images, an inverse graphics approach wouldn't require nearly as many parameters: http://sunw.csail.mit.edu/2015/papers/75_Kulkarni_SUNw.pdf

yldedly t1_j3dn5mb wrote on January 7, 2023 at 8:43 PM

Reply to comment by IntelArtiGen in [Discussion] Is there any alternative of deep learning ? by sidney_lumet

>Any alternative which would be able
to solve the same problems would probably require a similar
architecture: lot of parameters, deep connections.

If handwritten character recognition (and generation) counts as one such problem, then here is a model that solves it with a handful of parameters: https://www.cs.cmu.edu/~rsalakhu/papers/LakeEtAl2015Science.pdf

yldedly t1_iydiq69 wrote on November 30, 2022 at 4:29 PM

Reply to comment by currentscurrents in [D] Other than data what are the common problems holding back machine learning/artificial intelligence by BadKarma-18

>The goal isn't to pass as human, it's to solve whatever problem is in front of you.

It's worth disambiguating between solving specific business problems, and creating intelligent (meaning broadly generalizing) programs that can solve problems. For the former, what Francois Chollet calls cognitive automation is often sufficient, if you can get enough data, and we're making great progress. For the latter, we haven't made much progress, and few people are even working on it. Lots of people are working on the former, and deluding themselves that one day it will magically become the latter.

yldedly t1_ixigekm wrote on November 23, 2022 at 6:05 PM

Reply to comment by Matsarj in [R] Category Theory for AI,AI for Category theory by FresckleFart19

I stumbled on this thesis some time ago, where the author formulates a category of causal model, where arrows are structure-preserving transformations between models. Seems like it would be useful for causal model discovery.

yldedly t1_iwpdtne wrote on November 17, 2022 at 10:26 AM

Reply to comment by dat_cosmo_cat in [R] The Near Future of AI is Action-Driven by hardmaru

"Wow, this test accuracy is way better!" "Ok, how does it do on OOD data?" "Hmm, not great. Let's train a bigger model."

"Wow, this test accuracy is way better!" "Ok, how does it do on OOD data?" "Hmm, not great. Let's..."

yldedly t1_isecsiy wrote on October 15, 2022 at 9:39 AM

Reply to comment by evanthebouncy in [P] a minimalist guide to program synthesis by evanthebouncy

>I think my work "communicating natural programs to humans and machines" will entertain you for hours. Give it a go.

I will, looks super interesting. I'm so jealous of you guys at MIT working on all this fascinating stuff :D

>It's my belief that we should program computers using natural utterances such as language, demonstration, doodles, ect. These "programs" are fundamentally probablistic and admits multiple interpretations/executions.

That's an ambitious vision. I can totally see how that's the way to go if we want "human compatible" AI, in Stuart Russell's sense where AI is learning what the human wants to achieve, by observing their behavior (including language, demonstrations, etc).

yldedly t1_isb5nsi wrote on October 14, 2022 at 4:54 PM

Reply to comment by evanthebouncy in [P] a minimalist guide to program synthesis by evanthebouncy

What evocative examples :P
I know probmods.org well, it's excellent. I wrote a blogpost about program synthesis. I stumbled on the area during my phd where I did structure learning for probabilistic programs, and realized (a bit late) that I was actually trying to do program synthesis. So I'm very interested in it, wish I had the chance to work with it more professionally. Looking forward to reading your blog!

yldedly t1_isaznkg wrote on October 14, 2022 at 4:14 PM

Reply to [P] a minimalist guide to program synthesis by evanthebouncy

Have you tried synthesizing probabilistic programs and inference programs? Any general thoughts on the topic?

yldedly t1_irvmn2v wrote on October 11, 2022 at 11:46 AM

Reply to comment by Empty-Painter-3868 in [D] What are your thoughts about weak supervision? by ratatouille_artist

>The correct answer nobody wants to hear is: "I should have spent a week labelling data"

... with active learning?

yldedly t1_irvfafm wrote on October 11, 2022 at 10:18 AM

Reply to comment by Competitive-Rub-1958 in [R] Self-Programming Artificial Intelligence Using Code-Generating Language Models by Ash3nBlue

There's a lot to unpack here. I agree that a large part of creating AGI is building in the right priors ("learning priors" is a bit of an oxymoron imo, since a prior is exactly the part you don't learn, but it makes sense that a posterior for a pre-trained model is a prior for a fine-tuned model).

Invariance and equivariance are a great example. Expressed mathematically, using symbols, it makes no sense to say a model is more or less equivariant - it either is or it isn't. If you explicitly build equivariance into a model (and apparently it's not as straightforward as e.g. just using convolutions), then this is really what you get. For example, the handwriting model from my blogpost has real translational equivariance (because the location of a character is sampled).

If you instead learn the equivariance, you will only ever learn a shortcut - something that works on training and test data, but not universally, as the paper from the twitter thread shows. Just like the networks that can solve the LEGO task for 6 variables don't generalize to any number of variables, learning "equivariance" on one dataset (even if it's a huge one) doesn't guarantee equivariance on another. A neural network can't represent an algorithm like "for all variables, do x", or constraints like "f(g(x)) = g(f(x)), for all x" - you can't represent universal quantifiers using finite dimensional vectors.

That being said, you can definitely learn some useful priors by training very large networks on very large data. An architecture like the Transformer allows for some very general-purpose priors, like "do something for pairs of tokens 4 tokens apart".