nutpeabutter t1_iy3z9lc wrote

There is indeed a non-zero gradient. However, symmetric initialization introduces a plethora of problems:

  1. The only way to break the symmetry is through the random biases. A fully symmetric network effectively means that each layer acts as though it were a single weight (a 1-input, 1-output layer), so it cannot learn complex functions until the symmetry is broken. Learning is therefore heavily delayed, because the network must first break the symmetry before it can learn a useful function. This explains the plateau at the start (see the sketch after this list).
  2. Similar weights at the start, even once symmetry is broken, lead to poor performance. It is easy to get trapped in local minima when your outputs are constrained by weights that lack sufficient variance; there is a reason weights are typically randomly initialized.
  3. Random weights also allow more "learning pathways" to be established: by pure chance alone, certain combinations of weights will be slightly more correct than others, and the network can exploit this to speed up its learning by adjusting its other weights to support those pathways. Symmetric weights do not have this advantage.
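
Here's a minimal sketch of point 1 (PyTorch; the toy layer sizes and the 0.1/0.01 init values are made up purely for illustration): with constant weights and only the biases randomized, every hidden unit receives nearly the same gradient, so the layer effectively learns as a single unit until the tiny bias differences break the tie.

```python
# Toy demo: compare gradient diversity under symmetric vs. random init.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_net(symmetric: bool) -> nn.Sequential:
    net = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
    if symmetric:
        for m in net:
            if isinstance(m, nn.Linear):
                nn.init.constant_(m.weight, 0.1)   # identical weights everywhere
                nn.init.normal_(m.bias, std=0.01)  # only the biases differ
    return net

x, y = torch.randn(32, 4), torch.randn(32, 1)

for symmetric in (True, False):
    net = make_net(symmetric)
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    grad = net[0].weight.grad  # gradient of the first layer's weight matrix
    # Spread between the per-unit gradient rows: near zero when symmetric,
    # meaning all hidden units are being pushed in the same direction.
    print("symmetric" if symmetric else "random   ",
          "row-to-row gradient spread:", grad.std(dim=0).mean().item())
```

The symmetric network prints a row-to-row spread close to zero, which is exactly the "acts like a single weight" behaviour described above.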
6

nutpeabutter t1_iqpxj3a wrote

Kinda frustrating how half of the help posts here are requests for laptops. Like, have they not bothered to do even the tiniest bit of research? At the same price you could get an equivalently specced desktop/server AND an additional laptop, with the added bonus of being able to run long training sessions without needing to interrupt them.

2