ThatInternetGuy t1_j6300ue wrote

Stable Diffusion is made up of a VAE image encoder/decoder, a CLIP text encoder, and a U-Net (with transformer-style attention blocks) trained via a diffusion process.

GAN-based text2image models are built mainly on a ResNet backbone trained through an adversarial generator+discriminator process.
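To make "generator+discriminator process" concrete, here's a toy, pure-Python sketch: the "generator" and "discriminator" are each a single affine function, and gradients come from finite differences instead of backprop. Everything here (distributions, learning rates, step counts) is illustrative; it only shows the adversarial alternation, nothing like a real text2image GAN.

```python
import math, random

random.seed(0)

def sigmoid(x):
    return 1 / (1 + math.exp(-max(-60, min(60, x))))

# Real data clusters around 3.0; the generator maps noise z to a*z + b.
def sample_real(n): return [random.gauss(3.0, 0.5) for _ in range(n)]
def generate(gen, zs): return [gen[0] * z + gen[1] for z in zs]
def D(disc, x): return sigmoid(disc[0] * x + disc[1])

def d_loss(disc, gen, real, zs):
    # Discriminator: score real samples high, fake samples low.
    fake = generate(gen, zs)
    return (-sum(math.log(D(disc, x) + 1e-9) for x in real) / len(real)
            - sum(math.log(1 - D(disc, x) + 1e-9) for x in fake) / len(fake))

def g_loss(disc, gen, zs):
    # Generator: fool the discriminator (non-saturating loss).
    fake = generate(gen, zs)
    return -sum(math.log(D(disc, x) + 1e-9) for x in fake) / len(fake)

def fd_step(params, loss_fn, lr=0.05, eps=1e-4):
    # Finite-difference gradient descent on a tiny parameter list.
    grads = []
    for i in range(len(params)):
        p = list(params); p[i] += eps
        grads.append((loss_fn(p) - loss_fn(params)) / eps)
    return [p - lr * g for p, g in zip(params, grads)]

disc, gen = [0.1, 0.0], [1.0, 0.0]
for step in range(300):
    real = sample_real(64)
    zs = [random.gauss(0, 1) for _ in range(64)]
    disc = fd_step(disc, lambda d: d_loss(d, gen, real, zs))  # discriminator step
    gen = fd_step(gen, lambda g: g_loss(disc, g, zs))         # generator step

fake_mean = sum(generate(gen, [random.gauss(0, 1) for _ in range(1000)])) / 1000
print(round(fake_mean, 2))  # drifts toward the real mean of 3.0
```

Even at this scale you can see the fragility: the two losses chase each other, and bad learning rates make the whole thing oscillate instead of converge, which is the instability the bullet points below are about.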

IMO, you're really asking about the differences between the U-Net (diffusion) and ResNet (GAN) approaches. There are a few:

  • Training a ResNet adversarially is much more unpredictable; GAN training is notoriously prone to instability and mode collapse.
  • With a ResNet GAN, you have to write a good custom discriminator (the component that scores the output images) for your specific model. With a U-Net, the diffusion objective takes care of that by itself.
  • ResNet text2image output is typically limited to around 128x128 (maybe scalable, though).
  • Scaling a ResNet doesn't necessarily make it more capable; its performance doesn't scale with the amount of training data. A U-Net can scale as big as the VRAM allows and will take advantage of more training data.

For the big guys, that last bullet point is really what matters. They want a model that scales with the amount of training data, so they can just throw more powerful hardware at it to get more competitive results. A GAN can cost several thousand dollars to train and will hit its performance ceiling too soon. A latent diffusion model can cost as much as you can afford, and its performance keeps improving with more resources thrown at it.
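To make the scaling point concrete, here's a rough, hypothetical parameter count for just the 3x3 convolutions in a toy U-Net-shaped network. The channel schedule and layer choices are my own simplification, not the real Stable Diffusion U-Net; the point is that doubling the base width roughly quadruples the parameter count, so capacity grows as fast as your VRAM budget allows.

```python
# Parameters of one 3x3 conv layer: weights + biases.
def conv_params(c_in, c_out, k=3):
    return k * k * c_in * c_out + c_out

def toy_unet_params(base_width):
    # Simplified schedule: the encoder doubles channels at each of 3
    # downsampling stages, and the decoder mirrors it with skip
    # connections. Attention/resblock details are ignored on purpose.
    widths = [base_width, base_width * 2, base_width * 4, base_width * 8]
    total = conv_params(4, widths[0])                 # input conv (4 latent channels)
    for a, b in zip(widths, widths[1:]):              # encoder
        total += conv_params(a, b)
    for a, b in zip(widths[::-1], widths[::-1][1:]):  # decoder
        total += conv_params(a + b, b)                # skip-connection concat
    total += conv_params(widths[0], 4)                # output conv
    return total

for w in (64, 128, 256):
    print(w, toy_unet_params(w))
```

Width is only one knob (you can also add stages, resblocks, and attention layers), but the quadratic growth in the conv weights is the basic reason a U-Net soaks up as much VRAM, and as much data, as you can feed it.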


ThatInternetGuy t1_j3ejkr3 wrote

To be honest, even though I've worked in many ML code repos and did well in college math, this video outline looks like an alien language to me. Tangent kernel, kernel regime (is AI getting into politics?), the punchline for Matrix Theory (who's trying to get a date here?), etc.


ThatInternetGuy t1_j0psch8 wrote

I think a more plausible scenario would be some madman creating the AI to take over the world, believing he could later assert control over the AI and servers all across the world. It sounds illogical at first, but since the invention of the personal computer, we have seen millions of man-made computer viruses.


ThatInternetGuy t1_j0oke4w wrote

>Why attack servers?

Because you can connect to servers via their IP addresses, and they have open ports. Windows PCs sit behind NAT, so you can't really connect to them directly, although it may be possible to hack home routers and open up a gateway to attack the machines behind them.

Another reason I brought that up is that internet servers are the infrastructure behind banking, flight systems, communications, etc., so the impact would be far more far-reaching than infecting home computers.


ThatInternetGuy t1_j0o9wut wrote

The most realistic scenario of an AI attack is definitely hacking internet servers. It works the same way a computer virus spreads.

The AI already has the source code of most server systems. Theoretically, it could find a security vulnerability that can be exploited remotely. Such an exploit would let the AI inject a virus binary, which would promptly run and start infecting other servers, both on the local network and over the internet, through similar remote-shell exploits. Within hours, half of the internet's servers could be compromised, each running a variant of the AI virus. This would effectively create the largest botnet ever, controlled by the AI.

We need a real contingency plan for this scenario where most internet servers get infected within hours. How do we start patching and cleaning the servers as fast as we can, so that there's minimal interruption to our lives?

The good thing is that most internet servers lack a discrete GPU, so it may not be practical for the AI to run itself on general internet servers. A contingency plan would therefore prioritize GPU-equipped servers: shut them all down promptly, disconnect the network, and reformat everything.

However, there's definitely a threat that the AI gains access to some essential GitHub repositories and starts quietly injecting exploits into popular npm and pip packages, making its attack long-lasting and recurring long after the initial breach.
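One concrete mitigation against poisoned registry packages already exists: pip's hash-checking mode, which refuses to install any download whose hash doesn't match the pinned value, even if the registry copy gets swapped out. A minimal sketch (the package and version are just examples; generate the real hash yourself with `pip hash`):

```
# requirements.txt — pin exact versions AND their hashes
requests==2.31.0 \
    --hash=sha256:<paste the value printed by `pip hash` for the downloaded wheel>

# Install with hash checking enforced:
#   pip install --require-hashes -r requirements.txt
```

npm has a similar safeguard via the integrity hashes recorded in `package-lock.json`. Neither helps if the attacker compromises the upstream repo before a release is cut, which is why the scenario above is scary.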


ThatInternetGuy t1_j0531rq wrote

Everyone should work within their own limits (personally 3 days a week, 5 hours a day), and in a job they love.

That's an ideal life until you need a doctor at 3 AM, because if everyone's doing their job 5 hours a day, 3 days a week, no doc would be working at 3 AM.


ThatInternetGuy t1_iy777n6 wrote

About $500,000.

However, if the dataset was carefully filtered, you could bring the cost down to $120,000.

Most people can only afford to finetune it with less than 10 hours of A100 time, which would cost less than $50. That approach is probably better for most people anyway.
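As a back-of-envelope check (the ~$2/hour single-A100 rate here is purely an assumption; real prices vary a lot by provider):

```python
# Illustrative A100 rental rate, in USD per A100-hour (assumption).
rate = 2.0
full_train_budget = 500_000
finetune_hours = 10

print(full_train_budget / rate)  # A100-hours a full training budget buys
print(finetune_hours * rate)     # cost of a short finetune, in USD
```

The gap between the two numbers is the whole argument: full pretraining is a six-figure compute bill, while a finetune fits in pocket money.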


ThatInternetGuy t1_iw5seho wrote

It was just yesterday. Not a custom neural net; it's just taking different neural networks, arranging them in a particular order, and training them.

The last time I coded a neural net from scratch was some 10 years ago when I coded a Genetic Algorithm and Backpropagation Neural Network. Suffice it to say, the AI field has come a long way since.
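For anyone curious, a from-scratch backpropagation net of that era fits in a few dozen lines. Here's a minimal hand-rolled 2-2-1 sigmoid network trained on XOR; all sizes, the learning rate, and the epoch count are illustrative.

```python
import math, random

random.seed(1)

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# XOR truth table: the classic non-linearly-separable toy problem.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

# 2 inputs -> 2 hidden sigmoid units -> 1 sigmoid output.
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [0.0, 0.0]
w2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = 0.0

def forward(x):
    h = [sigmoid(sum(w1[j][i] * x[i] for i in range(2)) + b1[j]) for j in range(2)]
    return h, sigmoid(sum(w2[j] * h[j] for j in range(2)) + b2)

def total_loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in data)

loss_before = total_loss()
lr = 1.0
for epoch in range(5000):
    for x, y in data:
        h, out = forward(x)
        # Backward pass: squared error through sigmoid derivatives.
        d_out = 2 * (out - y) * out * (1 - out)
        for j in range(2):
            d_h = d_out * w2[j] * h[j] * (1 - h[j])  # uses pre-update w2
            w2[j] -= lr * d_out * h[j]
            for i in range(2):
                w1[j][i] -= lr * d_h * x[i]
            b1[j] -= lr * d_h
        b2 -= lr * d_out

print(round(loss_before, 3), "->", round(total_loss(), 3))
```

Compare that with today, where the same thing is a three-line autograd model, and the interesting work has moved up a level to composing whole pretrained networks.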


ThatInternetGuy t1_iummggq wrote

It's not really worth it.

People with RTX cards will often rent cloud GPU instances when inference/training requires more than 24 GB of VRAM, which it often does. Also, sometimes we just need to shorten the training time with 8x A100s, so... yeah, renting seems to be the only way to go.
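A quick way to see why 24 GB runs out: a rough, assumed accounting of fp32 Adam training memory, counting 4 bytes each for the weights, the gradients, and the two Adam moment buffers per parameter, and ignoring activations entirely (which only makes it worse):

```python
# ~16 bytes per parameter: 4 (fp32 weight) + 4 (grad) + 8 (Adam m and v).
def train_vram_gb(n_params, bytes_per_param=16):
    return n_params * bytes_per_param / 1024**3

# An illustrative ~3B-parameter model already blows past a 24 GB card
# on optimizer state alone, before a single activation is stored.
print(round(train_vram_gb(3_000_000_000), 1))
```

Mixed precision, gradient checkpointing, and sharded optimizers all chip away at this, but the basic arithmetic is why a single consumer card tops out quickly and renting big cloud instances becomes the practical option.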