arg_max

arg_max t1_j9rt2ew wrote

The thing is that the theory behind diffusion models is at least 40-50 years old. Forward diffusion is a discretization of a stochastic differential equation that transforms the data distribution into a normal distribution. People figured out back in the 1970s that it is possible to reverse this process, i.e. to go from the normal distribution back to the data distribution using another SDE. The catch is that this reverse SDE contains the score function, the gradient of the log density of the data, and people just didn't know how to estimate that from data. Then some smart guys came along, picked up the ideas about denoising score matching from the 2000s and did the necessary engineering to make it work with deep nets.
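
For reference, a hedged sketch of the two equations in play, written in the standard notation of the score-based SDE literature (not quoted from any specific paper):

```latex
% Forward (noising) SDE: data distribution -> (approximately) normal distribution
\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w

% Reverse-time SDE: runs from noise back to data, but needs the score \nabla_x \log p_t(x)
\mathrm{d}x = \big[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \big]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{w}
```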

The point I am making is that this problem was theoretically well understood a long time ago; it just took humanity many years to actually be able to compute it. But for AGI, we don't have such a recipe. There isn't one equation hidden in some old math book that will suddenly get us AGI. Reinforcement learning really is the only approach I can think of, but even there I just don't see how we would get there with the algorithms we are currently using.

7

arg_max t1_j6mg664 wrote

I think diffusion models are kind of a bad example. The SDE paper from Yang Song has shown that it's all about modeling the score function, and this can't be done with simple models. Apart from that, the big text2img models work inside the latent space of a deep VAE, make use of conditioning via cross attention, which isn't a thing in traditional ML, and use large language models to process the text input. All of their components are very DL-based.

13

arg_max t1_j6ald1n wrote

I think the thing that will come first is automated art. It starts with concept art creation, but I believe we will soon see usable 3D mesh generators, so you put in a prompt like "creepy alien with claws and a tail" and get a 3D mesh out of it. AI chats like many people here suggested are obviously possible with things like GPT, but in the end all of this has to be linked back to game logic. When an NPC tells you that it saw something at some place in the world, the game also has to generate something interesting for you to discover in that location. I don't think there are solutions for this yet, but I don't see why it couldn't happen. The problem is always that you need huge training sets to create those generative models, and training sets for things like levels, quests and so on just don't exist, so we will have to see if people figure out smart ways to solve this.

3

arg_max t1_j69zq9x wrote

The issue is that GPT is trained on previously collected data and is not kept up to date. It might be able to tell you if an article from 2020 is fake news because it might know what actually happened that day from news articles of the time. But GPT has no idea what happened today, so it won't be able to tell what is real and what is fake. You'd need some sort of continuous online learning to do this properly. It might still be able to detect the really crazy stuff, but it might also misclassify real news if the events are unexpected. For example, GPT probably has no idea that there is currently a war going on in Ukraine, so how should it know whether or not an article about this topic is fake?

1

arg_max t1_j60qav1 wrote

Typically, if your solver is not written in PyTorch/TensorFlow itself, you can't easily calculate gradients through it, as your computational graph doesn't capture the solver. If your solver is written in the framework and is differentiable, you might be able to just backpropagate through it. Otherwise, the Neural ODE paper that was linked here a few times has an adjoint formulation that gives you the gradient through the solver as the solution of another ODE, but this is specific to their problem and won't apply to things that aren't differential equations.
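
To make the second case concrete, here is a hedged sketch of a toy explicit-Euler solver written directly in PyTorch (the dynamics and loss are made up), so autograd captures every step and you can just backpropagate through the solve:

```python
import torch

# Toy explicit-Euler ODE solver built from torch ops only, so the whole
# solve ends up in the autograd graph and gradients flow through it.

def euler_solve(f, x0, t0, t1, steps=100):
    """Integrate dx/dt = f(x, t) from t0 to t1 with explicit Euler."""
    dt = (t1 - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        x = x + dt * f(x, t)
        t = t + dt
    return x

# Example: dx/dt = -theta * x, and we want d(loss)/d(theta).
theta = torch.tensor(0.5, requires_grad=True)
x0 = torch.tensor(1.0)

x1 = euler_solve(lambda x, t: -theta * x, x0, 0.0, 1.0)
loss = (x1 - 0.3) ** 2
loss.backward()
print(theta.grad)  # gradient flowed through all 100 Euler steps
```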

1

arg_max t1_j60jz1r wrote

Iterative refinement seems to be a big part of it. In a GAN, your network has to produce the image in a single forward pass. In diffusion models, the model actually sees the intermediate steps over and over and can make gradual improvements. Also, if you think about what the noise does: in the first few steps it removes all small details and only keeps low-frequency, large structures. So in the first steps, the model kind of has to focus on the overall composition. Then, as the noise level goes down, it can gradually start adding all the small details. On a more mathematical level, the noise smooths the distribution and widens its support in the [0,1]^D cube (D = image dimension, like 256x256x3). Typically people assume that the data lies on a low-dimensional manifold inside that cube, which can make sampling from it hard.
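
A purely schematic sketch of that contrast; the callables are placeholders and the update rule is only illustrative, not a real DDPM/DDIM step:

```python
import torch

# Schematic contrast between single-pass and iterative generation.
# "generator" and "denoiser" are placeholder callables.

def gan_sample(generator, z):
    # One forward pass: the network never gets to revisit its own output.
    return generator(z)

def iterative_sample(denoiser, shape=(1, 3, 64, 64), steps=50):
    x = torch.randn(shape)                  # start from pure noise
    for t in reversed(range(1, steps + 1)):
        level = t / steps                   # crude linear noise schedule
        x_hat = denoiser(x, level)          # current guess at the clean image
        # Keep part of the guess, re-inject some noise, refine again:
        # early iterations fix coarse structure, later ones add fine detail.
        x = (1 - level) * x_hat + level * torch.randn_like(x)
    return x
```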

Some support for this claim is that people were able to improve other generative models, like autoregressive models, using similarly noised distributions. Also, you can train GANs to sample from the intermediate distributions, which works better than standard GANs.

9

arg_max t1_j5r8qe6 wrote

What do you mean by "function represented by a neural network"? If you are hinting in the direction of universal approximation, then yes, you can approximate any continuous function arbitrarily well with a single hidden layer, sigmoid activations and unbounded width. There are also results showing an analogous statement for width-limited, arbitrarily deep networks (the required depth is not infinite, but it depends on the function you want to approximate and is, afaik, unbounded over the space of continuous functions). In practice, we are far away from either infinite width or infinite depth, so the specific configuration can matter.
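
For reference, a hedged sketch of the classical width-based statement (Cybenko/Hornik-style; the exact conditions vary between papers):

```latex
% For any continuous f on a compact set K \subset \mathbb{R}^n and any \varepsilon > 0,
% there exist a width N, weights w_i, vectors a_i and biases b_i such that
\sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} w_i \, \sigma(a_i^\top x + b_i) \Big| < \varepsilon
% where \sigma is a fixed sigmoidal activation; N depends on f and \varepsilon
% and is not bounded a priori over the space of continuous functions.
```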

1

arg_max t1_j2hw6f8 wrote

That is an interesting paper, BUT their method relies heavily on the structure of the task. In general, if you want to create a method that outputs algorithms, even choosing the output format is non-trivial. For humans, pseudo-code is probably the most natural way to present algorithms, but then you need some kind of language model, or at least a recurrent architecture, that can output solutions of different lengths (as not every program has the same length). And even once you get an output from the model, you first have to make sure that it is a valid program and, more importantly, that it solves the task. This means you have to verify the correctness of every method your model creates before you can even measure runtime.

But matrix multiplication is different. If you read the paper, you will see that every matrix multiplication algorithm can be written as a higher-order tensor, and given a tensor decomposition it's trivial to check the correctness of the corresponding matrix multiplication algorithm. This is not even a super novel insight; people knew that you can formulate the task of finding better matrix multiplication algorithms as a tensor decomposition optimization problem, BUT that problem is super hard to solve.
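
A hedged sketch of that tensor view, using the trivial rank-n^3 decomposition purely for illustration (Strassen-like algorithms correspond to lower-rank decompositions of the same tensor):

```python
import numpy as np

n = 2
# Matmul tensor T, indexed by (vec(A), vec(B), vec(C)): c_ij = sum_k a_ik * b_kj
T = np.zeros((n * n, n * n, n * n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            T[i * n + k, k * n + j, i * n + j] = 1.0

# Trivial rank-n^3 decomposition T = sum_r u_r (x) v_r (x) w_r
U, V, W = [], [], []
for i in range(n):
    for j in range(n):
        for k in range(n):
            u = np.zeros(n * n); u[i * n + k] = 1.0
            v = np.zeros(n * n); v[k * n + j] = 1.0
            w = np.zeros(n * n); w[i * n + j] = 1.0
            U.append(u); V.append(v); W.append(w)
U, V, W = map(np.array, (U, V, W))

# Correctness check = exact reconstruction of T (the "trivial to verify" part).
T_rec = np.einsum('rp,rq,rs->pqs', U, V, W)
assert np.array_equal(T_rec, T)

# Any valid decomposition yields a matmul algorithm, one multiplication per rank-1 term.
def matmul_from_decomposition(A, B):
    m = (U @ A.reshape(-1)) * (V @ B.reshape(-1))   # R scalar products
    return (W.T @ m).reshape(n, n)

A, B = np.random.randn(n, n), np.random.randn(n, n)
assert np.allclose(matmul_from_decomposition(A, B), A @ B)
```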

But not many real-world tasks are like this. For most problems you don't have such a nice output space, and at that point it becomes much, much harder to learn algorithms. I guess once people figure out a way to make models that can output verifiably correct pseudo-code, we will start to see tons of papers on new AI-generated heuristics for NP-hard problems and other problems that cannot yet be solved in optimal time.

6

arg_max t1_j136y5q wrote

Just to give you an idea about "optimal configuration" though, this is way beyond desktop PC levels:
> You will need at least 350GB GPU memory on your entire cluster to serve the OPT-175B model. For example, you can use 4 x AWS p3.16xlarge instances, which provide 4 (instance) x 8 (GPU/instance) x 16 (GB/GPU) = 512 GB memory.

https://alpa.ai/tutorials/opt_serving.html

9

arg_max t1_j136nbo wrote

CPU implementations are going to be very slow. I'd probably try renting an A100 VM, running some experiments, and measuring VRAM and RAM usage. But I'd be surprised if anything below a 24GB 3090 Ti does the job. The issue is that going above 24GB means you have to go to an A6000, which costs as much as four 3090s.
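
A hedged sketch of that measurement in PyTorch; model and batch are placeholders for whatever you actually want to run:

```python
import os
import psutil
import torch

def report_memory(model, batch, device="cuda"):
    """Run one forward pass and report peak VRAM plus current process RAM."""
    torch.cuda.reset_peak_memory_stats(device)
    model = model.to(device).eval()
    with torch.no_grad():
        model(batch.to(device))
    peak_vram_gb = torch.cuda.max_memory_allocated(device) / 1e9
    ram_gb = psutil.Process(os.getpid()).memory_info().rss / 1e9
    print(f"peak VRAM: {peak_vram_gb:.1f} GB, process RAM: {ram_gb:.1f} GB")
```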

18

arg_max t1_j0z1p30 wrote

When? Probably now, if someone decides to put enough money into it.
All the big text-to-image models like DALL-E, Imagen and Stable Diffusion are not very novel in terms of methodology. They all rely heavily on existing ideas and combine them with more compute, bigger datasets and some tweaks.

Videos are not much more than 3D images with certain temporal constraints. There are already small-scale diffusion models for video, and I'm not saying it's trivial to get longer videos (recurrent learning is often a bit tricky), but I don't see why it would be impossible. It will probably take a few years before consumer hardware can run video generation, though; after all, we only just manage images at the moment.

1

arg_max t1_izymwa9 wrote

You basically need some kind of value function that estimates how good an assignment of teams is. For example, if each player has a score between 1 and 100, your value function could simply be the difference between the strongest and the weakest team, which you then minimize. Typically you design this by hand. Then you run a constrained optimization method that makes sure each player gets assigned to exactly one team and probably also takes team size into account. It's not really ML but more of an optimization problem. If you really want to, you could try to learn a player score, although it might be hard to collect training data for that.
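
A minimal sketch of the idea, using a greedy heuristic rather than a proper constrained optimizer (the score dict is made up):

```python
import math
import random

def balance_teams(scores, n_teams):
    """scores: dict of player -> score in [1, 100]. Each player joins exactly one team."""
    capacity = math.ceil(len(scores) / n_teams)   # rough team-size constraint
    teams = [[] for _ in range(n_teams)]
    totals = [0] * n_teams
    # Strongest players first, each one to the currently weakest non-full team.
    for player, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        open_teams = [t for t in range(n_teams) if len(teams[t]) < capacity]
        i = min(open_teams, key=lambda t: totals[t])
        teams[i].append(player)
        totals[i] += score
    return teams, max(totals) - min(totals)   # gap = the value we want to minimize

players = {f"p{i}": random.randint(1, 100) for i in range(12)}
teams, gap = balance_teams(players, n_teams=3)
print(teams, "gap:", gap)
```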

2

arg_max t1_izpadl8 wrote

I think the most prominent use case in CNNs is as a very simple, localised and fast operation that changes the number of channels without touching the spatial dimensions.

For example, deep ResNets have a bottleneck design. The input is something like an N x 256 x H x W tensor (N batch size, H and W spatial dimensions) with 256 channels. To save compute and memory, we might not want to apply the 3x3 conv to all 256 channels. So we use a 1x1 conv first to reduce the number of channels from 256 to 64. On this smaller tensor, we then apply a 3x3 conv that doesn't change the number of channels. Finally, we use another 1x1 conv to go back from 64 to 256 channels. So the first 1x1 conv decreases the number of channels while the second one restores the output to the original shape with 256 channels.
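
A minimal sketch of that block in PyTorch (shapes as in the example above; a real ResNet bottleneck would also include BatchNorm, ReLU and the residual connection):

```python
import torch
from torch import nn

bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),             # 1x1: squeeze 256 -> 64 channels
    nn.Conv2d(64, 64, kernel_size=3, padding=1),   # 3x3: spatial mixing on the cheaper 64 channels
    nn.Conv2d(64, 256, kernel_size=1),             # 1x1: expand 64 -> 256 channels
)

x = torch.randn(8, 256, 32, 32)                    # N x 256 x H x W
print(bottleneck(x).shape)                         # torch.Size([8, 256, 32, 32])
```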

2

arg_max t1_iwcn3y0 wrote

ImageNet-1k pretraining might not be the best for this, as it contains few plant classes. The bigger ImageNet-21k has a much larger selection of plants and might be better suited for you. timm has EfficientNetV2, BEiT, ViT and ConvNeXt models pretrained on it. I don't use Keras, but you might be able to find similar weights for that framework.
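
A hedged sketch of the timm route (PyTorch rather than Keras; the wildcard patterns and num_classes=120 are placeholders, so check the printed list for the names your timm version actually uses):

```python
import timm

# Model names with 21k-class ImageNet weights are tagged in21k or in22k
# depending on the timm version, so collect both.
candidates = timm.list_models('*in21k*', pretrained=True) + timm.list_models('*in22k*', pretrained=True)
print(candidates[:10])

# num_classes=120 stands in for however many plant classes you have.
model = timm.create_model(candidates[0], pretrained=True, num_classes=120)
```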

1

arg_max t1_iv2cb5u wrote

I think it kind of depends on what you want to do in the end. Machine learning can be complex, and learning how to implement state-of-the-art methods and understanding how they work can take years. If you want to do rather simple stuff like linear regression, you can probably just use a Java linear algebra library and implement it yourself. But more complex stuff like deep learning is done using specialised libraries like TensorFlow, PyTorch and so on, and I don't think you want to reimplement those in Java. So you could either use PyTorch in C++, wrap it and call it from Java, or write the ML stuff in Python, which has the best framework support, and then pass the data from Java to your Python program, compute there and send the results back to Java. There is also the Deep Java Library (DJL), but I have no experience with it and can't tell you how well it works. But yeah, ML is mostly done in Python or C++ these days.
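
A hedged sketch of the Python half of that Java-to-Python setup: the Java program would launch this script and exchange one JSON object per line over stdin/stdout, and predict() is a placeholder for whatever model you actually load:

```python
import json
import sys

def predict(features):
    # Stand-in for a real PyTorch/TensorFlow model call.
    return sum(features)

# Read one JSON request per line from Java, write one JSON response per line back.
for line in sys.stdin:
    request = json.loads(line)
    result = predict(request["features"])
    print(json.dumps({"id": request["id"], "prediction": result}), flush=True)
```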

1