yldedly t1_j9k3orr wrote
Reply to comment by GraciousReformer in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
Not sure what you're asking. CNNs have inductive biases suited for images.
yldedly t1_j9judc7 wrote
Reply to comment by [deleted] in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
Any interval [a; b] where a and b are real numbers. In practice, it means that the approximation will be good in the parts of the domain where there is training data. I have a concrete example in a blog post of mine: https://deoxyribose.github.io/No-Shortcuts-to-Knowledge/
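A quick numpy sketch of what this looks like, with polynomial regression standing in for the network (purely illustrative):

```python
import numpy as np

# Fit a degree-8 polynomial (standing in for any flexible approximator,
# e.g. a neural network) to cos(x), using training data from [-3, 3] only.
x_train = np.linspace(-3, 3, 200)
coeffs = np.polyfit(x_train, np.cos(x_train), deg=8)

# Inside the training interval, the approximation is excellent...
err_inside = abs(np.polyval(coeffs, 1.5) - np.cos(1.5))

# ...outside it, the error explodes.
err_outside = abs(np.polyval(coeffs, 10.0) - np.cos(10.0))
print(err_inside, err_outside)
```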
yldedly t1_j9jtuzy wrote
Reply to comment by [deleted] in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
I'll link you to an old comment: https://www.reddit.com/r/MachineLearning/comments/z12zxj/comment/ix9t149/?utm_source=share&utm_medium=web2x&context=3
yldedly t1_j9jr821 wrote
Reply to comment by GraciousReformer in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
This one is pretty good: https://arxiv.org/abs/2207.08815
yldedly t1_j9jpuky wrote
Reply to comment by GraciousReformer in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
There are two aspects: scalability and inductive bias. DL is scalable because compositions of differentiable functions make backpropagation fast, and because those functions are mostly matrix multiplications, which makes GPU acceleration effective. Combine this with stochastic gradients, and you can train on very large datasets very quickly.
Inductive biases make DL effective in practice, not just in theory. The universal approximation theorem only guarantees that an architecture and weight-setting exist which approximate a given function. But the bias of DL towards low-dimensional smooth manifolds matches the structure of many real-world datasets, which means SGD will easily find a local optimum with these properties. When it doesn't - for example on tabular data, where discontinuities are common - DL performs worse than alternatives, even if with more data it would eventually approximate a discontinuity.
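The scalability half can be illustrated with a minimal stochastic-gradient loop (plain numpy, a linear model standing in for a network - just a sketch):

```python
import numpy as np

# The model is a composition of differentiable, mostly matrix-multiply ops,
# so gradients are cheap; stochastic minibatches touch only a slice of the
# data per step. Here: linear regression trained by minibatch SGD.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
true_w = np.arange(1.0, 6.0)
y = X @ true_w + 0.1 * rng.normal(size=10_000)

w = np.zeros(5)
lr, batch = 0.1, 64
for _ in range(500):
    idx = rng.integers(0, len(X), size=batch)  # random minibatch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch    # gradient of mean squared error
    w -= lr * grad

print(w)  # close to true_w, despite touching only a slice of the data per step
```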
yldedly t1_j9jorh1 wrote
Reply to comment by ewankenobi in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
It's not from a paper, but I think it's pretty uncontroversial - though people like to forget about the "bounded interval" part, or at least what it implies about extrapolation.
yldedly t1_j9j6gk8 wrote
>discover arbitrary functions
Uh, no. Not even close. DL can approximate arbitrary continuous functions on a bounded interval, given enough data, parameters and compute.
yldedly t1_j75rw5b wrote
Reply to [R] Topologically evolving new self-modifying multi-task learning algorithms by Feeling_Card_4162
Speaking as someone also working on an ambitious project that deviates a lot from mainstream ML, I encourage you to do the same thing I'm struggling with:
Try to implement the simplest possible version of your idea and test it on some toy problem to quickly get some insight.
Maybe start with one type of modulatory node and see how NEAT ends up using it?
yldedly t1_j5xun55 wrote
Reply to comment by merlinsbeers in Researchers unveil the least costly carbon capture system to date - down to $39 per metric ton. by PNNL
Yea, their nuts are shriveled
yldedly t1_j4atech wrote
Reply to comment by AImSamy in [D] Unpopular opinion : It's not always a good idea to train a model from by AImSamy
So far, mlflow + docker + torchserve has been enough. Soon I'll have to implement training, active learning and maintenance in the cloud as well, which will probably require more tools.
yldedly t1_j4asn14 wrote
In my job, I solve almost everything with pre-trained model + a few hours of labeling + active learning.
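For the curious, the labeling + active learning loop looks roughly like this - a hypothetical 1-D toy with a one-parameter threshold "model" and uncertainty sampling, not my actual stack:

```python
import numpy as np

# Toy loop: an unlabeled pool, an "oracle" (the human labeler), and
# uncertainty sampling: always query the pool point the current model is
# least sure about, i.e. closest to the decision boundary.
rng = np.random.default_rng(0)
pool = rng.uniform(-1, 1, size=1000)   # unlabeled pool
oracle = lambda x: float(x > 0.1)      # true threshold, unknown to the model

labeled_x = list(pool[:5])             # a few seed labels
labeled_y = [oracle(x) for x in labeled_x]
threshold = 0.0

for _ in range(20):
    xs, ys = np.array(labeled_x), np.array(labeled_y)
    # Refit: midpoint between the closest oppositely-labeled points
    lo = xs[ys == 0].max() if (ys == 0).any() else -1.0
    hi = xs[ys == 1].min() if (ys == 1).any() else 1.0
    threshold = (lo + hi) / 2
    # Query the most uncertain pool point and have the oracle label it
    query = pool[np.argmin(np.abs(pool - threshold))]
    labeled_x.append(query)
    labeled_y.append(oracle(query))

print(threshold, len(labeled_x))  # near the true 0.1, with only 25 labels
```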
yldedly t1_j45ycm8 wrote
Reply to comment by chaosmosis in [D] What's your opinion on "neurocompositional computing"? (Microsoft paper from April 2022) by currentscurrents
>With enough scale we get crude compositionality, yes.
Depends on exactly what we mean. To take a simple example, if you have cos(x) and x^2, you can compose these to produce cos(x)^2 (or cos(x^2)). You can approximate the composition using a neural network if you have enough data on some interval x \in [a;b]. It will work well even for x that weren't part of the training set, as long as they are in the interval; outside the interval, the approximation will be bad. But if you take cos(x), x^2 and compose(f, g) as building blocks, and search for a combination of these that approximates the data, the approximation will be good for all real numbers.
In the same way, you can learn a concept like "subject, preposition, object A, transitive verb, object B", where e.g. subject = "raccoon", preposition = "in a", object A = "spacesuit", transitive verb = "playing" and object B = "poker", by approximating it with a neural network, and it will work well if you have enough data in some high-dimensional subspace. But it won't work for arbitrary substitutions. Is it fair to call that crude compositionality?
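The cos example above as a runnable sketch (again with polynomial regression as a stand-in for the network):

```python
import numpy as np

# Approximate cos(x)^2 on [-2, 2] with a flexible fit, versus composing
# the building blocks cos and square.
x_train = np.linspace(-2, 2, 200)
coeffs = np.polyfit(x_train, np.cos(x_train) ** 2, deg=8)

def compose(f, g):
    return lambda x: f(g(x))

square = lambda x: x ** 2
cos_sq = compose(square, np.cos)  # correct for ALL real x, by construction

x_far = 50.0  # far outside the training interval
err_fit = abs(np.polyval(coeffs, x_far) - np.cos(x_far) ** 2)
err_compose = abs(cos_sq(x_far) - np.cos(x_far) ** 2)
print(err_fit, err_compose)  # the fit is wildly off; the composition is exact
```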
yldedly t1_j3dwdv6 wrote
Reply to comment by IntelArtiGen in [Discussion] Is there any alternative of deep learning ? by sidney_lumet
I agree of course, you can't compress more than some hard limit, even in lossy compression. I just think DL finds very poor compression schemes compared to what's possible (compare DL for that handwriting problem above to the solution constructed by human experts).
yldedly t1_j3ds4h2 wrote
Reply to comment by IntelArtiGen in [Discussion] Is there any alternative of deep learning ? by sidney_lumet
Imo there's no reason why we can't have much smaller models that do well on these tasks, but I admit it's just a hypothesis at this point. Specifically for images, an inverse graphics approach wouldn't require nearly as many parameters: http://sunw.csail.mit.edu/2015/papers/75_Kulkarni_SUNw.pdf
yldedly t1_j3dn5mb wrote
Reply to comment by IntelArtiGen in [Discussion] Is there any alternative of deep learning ? by sidney_lumet
>Any alternative which would be able to solve the same problems would probably require a similar architecture: lot of parameters, deep connections.
If handwritten character recognition (and generation) counts as one such problem, then here is a model that solves it with a handful of parameters: https://www.cs.cmu.edu/~rsalakhu/papers/LakeEtAl2015Science.pdf
yldedly t1_iydiq69 wrote
Reply to comment by currentscurrents in [D] Other than data what are the common problems holding back machine learning/artificial intelligence by BadKarma-18
>The goal isn't to pass as human, it's to solve whatever problem is in front of you.
It's worth disambiguating between solving specific business problems, and creating intelligent (meaning broadly generalizing) programs that can solve problems. For the former, what Francois Chollet calls cognitive automation is often sufficient, if you can get enough data, and we're making great progress. For the latter, we haven't made much progress, and few people are even working on it. Lots of people are working on the former, and deluding themselves that one day it will magically become the latter.
yldedly t1_ixigekm wrote
Reply to comment by Matsarj in [R] Category Theory for AI,AI for Category theory by FresckleFart19
I stumbled on this thesis some time ago, in which the author formulates a category of causal models, where arrows are structure-preserving transformations between models. Seems like it would be useful for causal model discovery.
yldedly t1_iwpdtne wrote
Reply to comment by dat_cosmo_cat in [R] The Near Future of AI is Action-Driven by hardmaru
"Wow, this test accuracy is way better!" "Ok, how does it do on OOD data?" "Hmm, not great. Let's train a bigger model."
"Wow, this test accuracy is way better!" "Ok, how does it do on OOD data?" "Hmm, not great. Let's..."
yldedly t1_isecsiy wrote
Reply to comment by evanthebouncy in [P] a minimalist guide to program synthesis by evanthebouncy
>I think my work "communicating natural programs to humans and machines" will entertain you for hours. Give it a go.
I will, looks super interesting. I'm so jealous of you guys at MIT working on all this fascinating stuff :D
>It's my belief that we should program computers using natural utterances such as language, demonstration, doodles, etc. These "programs" are fundamentally probabilistic and admit multiple interpretations/executions.
That's an ambitious vision. I can totally see how that's the way to go if we want "human compatible" AI, in Stuart Russell's sense where AI is learning what the human wants to achieve, by observing their behavior (including language, demonstrations, etc).
yldedly t1_isb5nsi wrote
Reply to comment by evanthebouncy in [P] a minimalist guide to program synthesis by evanthebouncy
What evocative examples :P
I know probmods.org well, it's excellent. I wrote a blog post about program synthesis. I stumbled on the area during my PhD, where I did structure learning for probabilistic programs, and realized (a bit late) that I was actually trying to do program synthesis. So I'm very interested in it, and wish I had the chance to work with it more professionally. Looking forward to reading your blog!
yldedly t1_isaznkg wrote
Have you tried synthesizing probabilistic programs and inference programs? Any general thoughts on the topic?
yldedly t1_irvmn2v wrote
Reply to comment by Empty-Painter-3868 in [D] What are your thoughts about weak supervision? by ratatouille_artist
>The correct answer nobody wants to hear is: "I should have spent a week labelling data"
... with active learning?
yldedly t1_irvfafm wrote
Reply to comment by Competitive-Rub-1958 in [R] Self-Programming Artificial Intelligence Using Code-Generating Language Models by Ash3nBlue
There's a lot to unpack here. I agree that a large part of creating AGI is building in the right priors ("learning priors" is a bit of an oxymoron imo, since a prior is exactly the part you don't learn, but it makes sense that a posterior for a pre-trained model is a prior for a fine-tuned model).
Invariance and equivariance are a great example. Expressed mathematically, using symbols, it makes no sense to say a model is more or less equivariant - it either is or it isn't. If you explicitly build equivariance into a model (and apparently it's not as straightforward as e.g. just using convolutions), then this is really what you get. For example, the handwriting model from my blogpost has real translational equivariance (because the location of a character is sampled).
If you instead learn the equivariance, you will only ever learn a shortcut - something that works on training and test data, but not universally, as the paper from the twitter thread shows. Just like the networks that can solve the LEGO task for 6 variables don't generalize to any number of variables, learning "equivariance" on one dataset (even if it's a huge one) doesn't guarantee equivariance on another. A neural network can't represent an algorithm like "for all variables, do x", or constraints like "f(g(x)) = g(f(x)), for all x" - you can't represent universal quantifiers using finite dimensional vectors.
That being said, you can definitely learn some useful priors by training very large networks on very large data. An architecture like the Transformer allows for some very general-purpose priors, like "do something for pairs of tokens 4 tokens apart".
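To make the equivariance point concrete: a circular convolution satisfies the universally quantified constraint exactly, by construction, for inputs it has never seen (toy numpy sketch):

```python
import numpy as np

# A circular convolution is exactly translation-equivariant:
# conv(shift(x)) == shift(conv(x)) for EVERY input, because the constraint
# holds by construction, not because it was fit to data.
rng = np.random.default_rng(0)
kernel = rng.normal(size=8)

def conv(x):
    # circular convolution via FFT
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(kernel, len(x))))

def shift(x, s=3):
    return np.roll(x, s)

x = rng.normal(size=64)  # an arbitrary input, not from any training set
print(np.allclose(conv(shift(x)), shift(conv(x))))  # True, for any x
```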
yldedly t1_j9k5n8n wrote
Reply to comment by GraciousReformer in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
It depends a lot on what you mean by "works". You can get a low test error with NNs on tabular data if you have enough of it. For smaller datasets, you'll get a lower test error using tree ensembles. For low out-of-distribution error, neither will work.
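A toy version of the tabular-data point - a decision stump versus a smooth approximator on a discontinuous target (purely illustrative):

```python
import numpy as np

# A discontinuous target, common in tabular data: a single decision stump
# fits it exactly, while a smooth approximator (a polynomial here, standing
# in for a NN's smoothness bias) overshoots around the jump.
x = np.linspace(-1, 1, 200)
y = (x > 0).astype(float)

smooth = np.polyval(np.polyfit(x, y, deg=9), x)  # smooth fit
stump = np.where(x > 0, 1.0, 0.0)                # the split a tree would find

err_smooth = np.max(np.abs(smooth - y))
err_stump = np.max(np.abs(stump - y))
print(err_smooth, err_stump)  # large error near the jump vs. an exact fit
```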