Zermelane

Zermelane t1_jedt8ps wrote

Yep. Matt Levine already coined the Elon Markets Hypothesis, but the Elon Media Hypothesis is even more powerful: Media stories are interesting not based on their significance or urgency, but based on their proximity to Elon Musk.

Even OpenAI still regularly gets called Musk's AI company, despite his having had no involvement for half a decade. Not because anyone's intentionally trying to spread a narrative that it's still his company, but either because writers are just chasing clicks, or because they genuinely believe it themselves, since those clickbait stories are the only ones they've seen.

2

Zermelane t1_je8lss0 wrote

Better parallelism in training, and a more direct way to reference past information, than in RNNs (recurrent neural networks), which seemed like the "obvious" way to process text before transformers came along.

These days we have RNN architectures that can achieve transformer-like training parallelism, the most interesting-looking one being RWKV. They are still badly disadvantaged when they need information directly from the past, for instance to repeat a name that's been mentioned before, but they have other advantages, and their performance gets close enough to transformers that which architecture ends up winning out could come down to a question of scaling exponents.
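
To make the contrast concrete, here's a tiny numpy sketch (a toy of my own, not any real model's code) of the two ways of remembering what came before: an RNN squeezing the whole past into one fixed-size hidden state, one step at a time, versus attention letting every position look up any earlier token directly, with all positions computed at once.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 16                      # sequence length, embedding size
x = rng.normal(size=(T, d))       # embeddings for 8 tokens

# RNN-style: the past is compressed into a single hidden state h.
# To repeat a name mentioned at step 2, the model must have kept it alive in h.
W = rng.normal(size=(d, d)) * 0.1
U = rng.normal(size=(d, d)) * 0.1
h = np.zeros(d)
for t in range(T):                # inherently sequential: step t needs h from step t-1
    h = np.tanh(W @ h + U @ x[t])

# Attention-style: every position attends directly to every earlier position.
# All rows are computed in one shot, which is what parallelizes so well in training.
scores = x @ x.T / np.sqrt(d)                      # similarity of each token to each other
mask = np.tril(np.ones((T, T)))                    # causal mask: only look at the past
scores = np.where(mask == 1, scores, -np.inf)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the past
attended = weights @ x                             # direct, content-based lookup of past tokens
```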

3

Zermelane t1_j6cqvb1 wrote

No overall disagreement, but a couple of points that I thought were worth sharpening.

> I'm not going to lie, I didn't expect this. Even 6 months ago, I was of the mind that once I had a magic media machine, I would eschew all human-created media and leave that to the hipsters. But now I'm increasingly feeling like this fear that all human-created art is dying is a very, very premature call.

Consider the possibility that you were previously seeing the situation from afar, and thinking about the long term, but now that you're seeing it from up close, it's harder to emotionally see past the limitations of current technology.

We don't make people dig ditches without excavators, not just because it's economically inefficient, but because making people do work that could be automated very cheaply and easily is not compatible with human dignity. The idea of paying artists when you have that magic media machine should feel the same. Maybe it just doesn't right now because the machine used to seem like an abstraction, but a possible one, whereas now it seems like a reality, but an unachievably distant one.

> I've noticed on DeviantArt and ArtStation, 90% to 95% of people using AI-generation tools are actually kind enough to mark their creations as AI-generated. The fear that sinister and lazy techbros will pretend they themselves created Midjourney and DALL-E 2 generations to trick consumers and rob from hard-working artists is just that: a relatively unfounded fear.

The stakes are very low there. We weren't really worried about people freely uploading stuff to DeviantArt being cheated out of anything, as they weren't being paid in the first place. The place you should be looking is how concept artists, visual designers, commissioned artists etc. are doing.

1

Zermelane t1_j562q6r wrote

I like your mentality, OP. I have absolutely no idea how it works and I think I disagree with it on a really fundamental level, but it's at least very different from the usual redditor fare, and it's making me think really damn hard.

> they exceed others and these “dummies” pretending to be real people with boring lives or bad jobs give YOU a sense of superiority, jobbers as they are called in wrestling

I think you are overestimating how much people think about each other's lives.

I can see humans retreating into video games for other reasons, like, maybe you want to be able to hit on anyone and always be accepted or something. But purely to see other people have sadder lives than you? No, that in itself isn't a fantasy that I've ever seen video games sell. A better, more significant life than you have right now, absolutely, but not comparing your character to NPCs.

But okay, you sure do mow down a lot of mooks in some games, and that means that your life is clearly more important than theirs. Fair enough: A multiplayer game where one player is the hero and the others are grunts that go down in a few seconds wouldn't be very interesting.

... and I guess I'll grant you that. I'd have to do some work to probe out the distinction and really figure out the difference, but I do think it's there. Well, at least for now. Maybe future video game technology will change what experiences are fun, and it'll turn out that people do enjoy games where they compare themselves to NPCs with shitty jobs.

3

Zermelane t1_j4pj1oe wrote

> So the question is what is going on with AI art compared to what a human does to create CGI images that makes them seem different. Like I kind of get how CGI is done, it's like modelling and adding textures and all that different stuff but AI doesn't do that. It isn't building up a model from a sketch to a complete design it's doing something different.

This question is unfortunately both technical and deep, and it takes a lot of background to answer it well. It doesn't help that the technical details are changing fast, and the diffusion model architectures that are popular now are completely different from the GANs that were popular a few years ago; and maybe in the next year we'll have completely different models again.

But for a taste, look at the grid of horse images in this post or the sequence of drawing the beach in this one. It's a little bit misleading to show those as a description of the process, as it doesn't explain anything about what happens inside the U-Net to get from one step to another. But it does show that there is at least a sort of iterative process, and it does add detail over time.
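
For intuition, here's a deliberately schematic sketch of that kind of sampling loop; the denoiser below is a stand-in for a trained U-Net and the step sizes are made up, so read it as a cartoon of the coarse-to-fine process rather than as Stable Diffusion's actual code.

```python
import numpy as np

def denoiser(noisy_image, step):
    """Placeholder for the trained network that predicts the noise to remove."""
    return noisy_image * 0.1      # purely illustrative; a real model predicts structured noise

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64, 3))   # start from pure noise

num_steps = 50
for step in reversed(range(num_steps)):
    predicted_noise = denoiser(image, step)
    image = image - predicted_noise                          # strip away some predicted noise...
    if step > 0:
        image = image + 0.01 * rng.normal(size=image.shape)  # ...and re-inject a smaller amount

# Early iterations settle the rough composition; later iterations fill in fine detail.
```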

At least with this architecture, anyway. GANs were different. Well, they probably still had internal representations that started off at a more sketch-like level, but that would have been harder to see in action. Recent models like MaskGIT do the process of adding detail in yet another completely different way.

3

Zermelane t1_j36fufw wrote

I don't think this thing is going to get popular. It'd only be used by people who:

  • know it exists (so probably internet-savvy, highly literate English speakers)
  • have a refund or cancellation/etc. to ask for
  • ... that's either so low-value or inconvenient to them that they can't or won't ask for it themselves
  • ... but important enough that they do bother using a service for it

And I am really glad of that, because otherwise this could mean the end of being able to get any decent customer support at all. It's already an ugly equilibrium between awful customers, and companies putting in the absolute minimum resources they can get away with.

If you give the customers a let-the-AI-go-bother-customer-support-for-me button, and they decide to use it, that does not mean that the customers win, it means the equilibrium is going to swing somewhere completely different.

3

Zermelane t1_j2d8bms wrote

> This week, Philip Wang, the developer responsible for reverse-engineering closed-sourced AI systems including Meta’s Make-A-Video, released PaLM + RLHF, a text-generating model that behaves similarly to ChatGPT

Oh, yeah, he does that, quite a lot, too. Basically, most of the cool ML things that come out have a simple enough core that, if you are a brilliant enough engineer (and Phil Wang is) and familiar enough with the common concepts they tend to be built on, you can reimplement them on top of an ML framework in an evening or two.

Think of it as basically being an executable version of the mathematics describing the model. It would take not just GPUs and data, but also still a whole bunch of engineering and code, to actually get from this to a trained model.
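
For a sense of scale, here's roughly what the core of such a reimplementation looks like (my own minimal sketch, not Phil Wang's code): one standard pre-norm transformer block in PyTorch, with causal masking, embeddings, and everything else that makes it a full model left out.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x):
        # Pre-norm residual attention, then a residual feed-forward layer.
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.ff(self.norm2(x))
        return x

# The math is all here; getting from this to something like a trained PaLM is
# "just" data, GPUs, and a great deal of training and engineering on top.
x = torch.randn(1, 16, 512)          # batch of 1, 16 tokens, 512-dim embeddings
print(TransformerBlock()(x).shape)   # torch.Size([1, 16, 512])
```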

Unrelatedly, the same guy made This Person Does Not Exist a few years back, so that might be what you also know him for.

10

Zermelane t1_j29mja2 wrote

Sounds scammy. ChatGPT itself does not have an official API at all, and the research preview is free to use, while the other models that OpenAI does charge for (and that do have official APIs) are billed by usage, so purchasing lifetime access from a third party is... not likely to be a good deal, is the best I can say.

7

Zermelane t1_j291pkq wrote

Oh hey, for once, a topic I have a strong opinion about:

I think we already fucked up, hard. Friendlessness is already growing at a fast pace. Why, I'm not sure. Social media probably has something to do with it. Personally, I think the growth of obesity and the way it messes with your sense of self-worth might have a bigger effect than people consider.

What that does to you is, you just get no positive experiences of friendship at all. You don't get to have people be present with you, pay attention to you, laugh with you, and so on. And my view of human psychology is that this makes it way, way harder to have the energy and confidence to seek out friendship or companionship at all, especially as an adult, when you tend to have to be intentional and even strategic about it if you really want it to work.

I don't see friendship with AI chatbots as humankind's coming savior or anything, but I do think that the order of effects it has in terms of size is:

  1. People get more positive experiences from interacting with their AI friends
  2. People seek out more friendship with other humans, to get more of the good stuff and to seek out some of the friction
  3. A few people seek out less of it, because they get enough from the AI

... and even if you disagree, I think the first effect is just overwhelmingly larger than the other two, and the thing you might quibble about is switching 2 and 3's places.

13

Zermelane t1_j282ln7 wrote

> So the amount of computational resources required to emulate a brain is orders of magnitude higher than that suggested by the model of a neuron as a dumb transistor and the brain as a network of switches.

It is very popular to look at how biological neurons and artificial neurons are bad at modelling each other, and immediately, without a second thought, assume that it means biological neurons must be a thousand, no, ten thousand times more powerful than artificial ones.

It is astonishingly unpopular to actually do the count, and notice that something like Stable Diffusion contains the gist of all of art history and the personal styles of basically all famous artists, thousands of celebrities, the appearance of all sorts of objects, etc., in a model that in a synapse-for-parameter count matches the brain of a cockroach.

(Same with backprop: backpropagation does things that biology can't do, so people just... assume that biology must be doing something even better, and nobody seems to want to entertain the thought that backprop might be using its biologically implausible feedback mechanism to do things better than biology does.)

11

Zermelane t1_j1svt44 wrote

> I’m quite skeptical of Elon Musk being on the board for OpenAI

Plenty of reason to be skeptical indeed, because he isn't! Hasn't been for almost half a decade.

> there are efforts to make the singularity awful for everyone by people lobbying to ban the use of AI for some things like art, but let’s be honest, the rich will still be able to use it, just not the general population

This has some plausibility, but let me throw a couple of random points at it:

It depends on the government being really awful. Which it is, but only so awful, and generally less so in liberal societies. Governments might do things like kill babies by banning parenteral nutrition with a healthy distribution of fatty acids, but most didn't ban cellphones, the Internet, CRISPR, etc.

Also: these days governments have foresight, in the sense that they might well ban things ahead of time, preventing even the rich from getting them. For instance, maybe you saw that recent story about trying to finally get the FDA to allow aging to be treated as a disease. If you're considering working on an aging treatment and you are greedy, you want to sell it to everyone; but if the government says you won't be allowed to sell it to anyone, the result won't be that you work on it and then sell it only to the rich, you just won't work on it at all.

2

Zermelane t1_j0o59yu wrote

There won't be a point where everything is mostly like now except everyone has 50% more non-work time; things are going to get wild and crazy and bizarre even while most people do still work.

So, on the theory that "when AI automates all the jobs" in fact refers to a large amount of progress ahead, I'm going to say, grow more into being myself, solve interesting problems that I never could before, and be my fursona.

1

Zermelane t1_izvvg1v wrote

I enjoy all the comments completely failing to get that OP wasn't making an argument from fast capability gain post-AGI.

FWIW, I don't really 100% agree with the argument myself. Integration and generalization have costs. If for instance you just want to generate random images of human faces, our best text-to-image diffusion models are much, much more expensive to train and run than an unconditional StyleGAN2 trained on FFHQ, and still have a hard time matching how well it does at that task. These costs might turn out very large once we're really trying to do AGI.

That said, you can take the fast capability gain argument and make it relevant here again: having an AGI should make it a lot easier to take all the research we've done into reaching superhuman capability in all sorts of narrow domains, and integrate it into one agent.

If nothing fancier, that might simply mean doing the programming to, say, set up an AlphaGo instance and call out to it when someone wants you to play Go, etc., and that does indeed get you an agent that, as far as you can tell from the outside, is an AGI and also superhuman at Go.
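
Something like this toy dispatcher, say, where GeneralAgent and GoEngine are purely hypothetical stand-ins rather than anyone's real API:

```python
class GoEngine:
    """Stand-in for a superhuman narrow system along the lines of AlphaGo."""
    def pick_move(self, board_state: str) -> str:
        return "D4"   # a real engine would search; this is just a placeholder

class GeneralAgent:
    def __init__(self):
        self.specialists = {"go": GoEngine()}

    def handle(self, request: str) -> str:
        # The general agent only has to recognize the task and route it;
        # seen from the outside, it is now also superhuman at Go.
        if "go move" in request.lower():
            return self.specialists["go"].pick_move(board_state="empty board")
        return "I'll reason about this one myself."

print(GeneralAgent().handle("What's a good opening Go move?"))
```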

3

Zermelane t1_iwvefy4 wrote

This question really deserves a great answer, and I've tried to write an okay one over two days now, but there's an intuition that I don't really know how to express. Or that might just be wrong, I'm not a ML researcher. But even if it's just disjointed chunks of an argument, here goes anyway:

You can run GPT-2 on a weak GPU and it'll do a great job dealing with language and text as such. On the one hand, that's a non-obvious accomplishment in its own right; see nostalgebraist on GPT-2 being able to write for more on that. But on the other hand, well, when was the last time you actually used GPT-2 for anything?

And the reason why you don't do that is... text models long ago stopped being about text. By far most of what they model is just, well, everything else. Stories, logic, physical intuition, theory of mind, etc. GPT-2 can do language, and language is pretty straightforward, but all that other stuff is general intelligence, and general intelligence is very, very hard.

But if you're going to do general intelligence, text is a really great modality. It comes pre-processed by language evolution to have a nice, even, and high rate of communicated information, so that if you just compress it a tiny bit, you get a lot of structure and meaning in a tiny amount of input bits. Which in turn means that you can process those with a model that can just focus right away on the hard parts, and use an even amount of computation for everything, and still not really leave much performance on the table.

Image models, on the other hand, model far less - just the visual universe of pictures on the internet, no big deal - and you probably aren't trying to get them to pull off anything like the feats of reasoning that you expect from language models. Hence, they can seemingly do a lot with little. I've seen someone have Stable Diffusion outpaint the right side of a blackboard with "1+1=" written on the left side, and I think it did pull off putting in a 2, but that's probably just about the extent of reasoning that people expect from image models right now.

Audio I don't really have much of a handle on. One issue with audio models is that if you really want to represent most audio you find online well, you kind of need to be a great language model as well, considering how much audio is speech or song. But at the same time, audio is a far heavier way to represent that language than text is, so it's far harder to learn all of language from audio.
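
A rough back-of-the-envelope comparison, with my own ballpark assumptions for the numbers, of how much heavier a minute of speech is as raw audio than as text:

```python
# Assumed, typical figures; not measurements from any particular dataset.
sample_rate_hz = 16_000          # common sampling rate for speech audio
bytes_per_sample = 2             # 16-bit samples
seconds = 60
audio_bytes = sample_rate_hz * bytes_per_sample * seconds    # ~1.9 MB of raw audio

words_per_minute = 150           # typical speaking pace
bytes_per_word = 6               # ~5 letters plus a space
text_bytes = words_per_minute * bytes_per_word               # ~0.9 KB of text

print(audio_bytes / text_bytes)  # on the order of 2000x more raw bytes for the audio
```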

2

Zermelane t1_iw1nx4n wrote

High-confidence predictions:

  • GPT-4, unless it comes out this year. The rumors about its capabilities and architecture have been so all over the place that I have no idea what to expect of it, but the part I'm confident about is, it's coming.

  • Publicly available text-to-image models conditioned on a good text encoder's embeddings and not CLIP (or, with eDiff-I's example, not only CLIP). We will collectively realize just how frustratingly vague and gist-based our current text-to-image models really were.

  • H100s go brrr. 2-3x cost decreases in workloads doing anything A100s were already good at, more if you can make use of stuff like fp8, with matching improvements in AI services.

  • Some crazy thing will happen in BioML that nobody will be able to agree on whether it's a huge breakthrough or an insignificant increment.

...And some spicy low-confidence ones:

  • Some cool architectural improvement to diffusion models turns out to work really well and make them significantly cheaper, don't know what. Pyramidal diffusion? Maybe someone figures out how to do StyleGAN3's equivariances on a U-Net? Maybe some trick that's particularly good for video?

  • Someone figures out how to get text-to-image to competently use references when drawing, without textual inversion's crazy overfitting problems.

  • One of the big labs gets a LLM to usefully critique and correct its own chain-of-thought reasoning, bumps MMLU results by some scary number in the 5-10% range. (Bonus points if they also apply that to codegen)

  • Someone trains a TTS to use T5 embeddings, and suddenly it just gets emotional prosody right because it actually has some idea of what it's saying.

4

Zermelane t1_ivdkpjk wrote

That paper is a fun read, if only for some of the truly galaxy-brained takes in it. My favorite is this:

> ◦ We may have a special relationship with the precursors of very powerful AI systems due to their importance to society and the accompanying burdens placed upon them.
>
> ■ Misaligned AIs produced in such development may be owed compensation for restrictions placed on them for public safety, while successfully aligned AIs may be due compensation for the great benefit they confer on others.
>
> ■ The case for such compensation is especially strong when it can be conferred after the need for intense safety measures has passed—for example, because of the presence of sophisticated AI law enforcement.
>
> ■ Ensuring copies of the states of early potential precursor AIs are preserved to later receive benefits would permit some separation of immediate safety needs and fair compensation.

Ah, yes, just pay the paperclip maximizer.

Not to cast shade on Nick Bostrom, he's absolutely a one-of-a-kind visionary and the one who came up with these concepts in the first place, and the paper is explicitly just him throwing out a lot of random ideas. But that's still a funny quote.

36

Zermelane t1_itfr3j9 wrote

There are so, so many incremental steps between here and straight-out text-to-movie that will each be mind-blowing advances on their own.

  • Much more controllable text-to-image, that actually consistently stays on model, not to mention consistently giving people the right number of limbs
  • Voice synthesis that can actually stay convincing and express different emotions through hours of generated audio
  • Audio synthesis to generate all of the sounds of a movie, in addition to the voices
  • Video synthesis that has all of those above properties, not to mention having far greater detail, resolution and accuracy than what we have now
  • Text generation that can maintain plot coherence and develop a plot through the length of an entire movie script
  • Either an amazing amount of engineering work to put together a system using separate models for all of the above (at least prompt-to-script and script-to-video), or maybe even more astonishingly, a single system somehow doing it all end-to-end
  • All of the above as tools integrated into existing workflows
  • Systems that can critique and edit the text, image, audio and video outputs of other AIs, the way a workflow with an image generation system right now might involve a human doing cherry-picking and inpainting

I'm not saying we mightn't get all the way to text-to-movie fast. I am saying that even if it took several decades to happen, those would still be decades full of astonishing advances, most of which I couldn't even predict here.

51

Zermelane t1_itf46c7 wrote

You're not going to stop the global maritime shipping industry by peeing in the ocean.

The datasets that large language models are trained on are already full of absolute junk. My favorite example, from Deduplicating Training Data Makes Language Models Better, is a sentence that was repeated more than 60,000 times in a version of the Common Crawl used to train some significant models at Google:

> by combining fantastic ideas, interesting arrangements, and follow the current trends in the field of that make you more inspired and give artistic touches. We’d be honored if you can apply some or all of these design in your wedding. believe me, brilliant ideas would be perfect if it can be applied in real and make the people around you amazed!

... not to mention, I've heard stories of training instabilities caused by entire batches consisting of backslashes, or bee emojis. The former I can at least understand how you'd end up with (backslash escapes grow exponentially if you keep re-escaping them), but the bee emojis, I don't know, someone just wanted to put a lot of bee emojis online, and they ended up messing with someone's language model training.
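
The backslash part, at least, is easy to demonstrate; this is just an illustration of the escaping mechanism, not the actual pipeline that produced those batches:

```python
s = "\\"                              # one literal backslash
for i in range(5):
    print(i, len(s))                  # 1, 2, 4, 8, 16: doubling every pass
    s = s.replace("\\", "\\\\")       # each re-escaping pass doubles every backslash
# Scrape-and-repost text through a few of these passes and you get lines that are
# nothing but backslashes.
```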

1