nutpeabutter
nutpeabutter t1_j7rxvb8 wrote
Reply to comment by DMLearn in Is there any AI-distinguishing models? by Such_Share8197
Your argument falls apart when you realize that generated outputs carry training artifacts. Ever wonder why FID scales inversely with model size?
nutpeabutter t1_j7os2xj wrote
Reply to comment by levand in Is there any AI-distinguishing models? by Such_Share8197
Just because it can imitate doesn't mean it can do so perfectly.
nutpeabutter t1_j6n2eaf wrote
Reply to Best practice for capping a softmax by neuralbeans
Taking a leaf out of RL, you can add an additional entropy loss term.
Alternatively, clip the logits but apply a straight-through estimator (STE, i.e. copy the gradients) on the backward pass. A sketch of both is below.
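A minimal PyTorch sketch of both ideas (the function names and the `beta` / `clip_val` values are illustrative, not from the thread):

```python
import torch
import torch.nn.functional as F

def entropy_bonus(logits, beta=0.01):
    # RL-style entropy regularizer: adding -beta * H(p) to the task loss
    # pushes the softmax away from saturated (near one-hot) outputs.
    log_p = F.log_softmax(logits, dim=-1)
    entropy = -(log_p.exp() * log_p).sum(dim=-1).mean()
    return -beta * entropy  # add this term to the task loss

def clip_logits_ste(logits, clip_val=10.0):
    # Forward pass uses the clipped logits; backward pass copies the
    # gradient straight through, as if no clipping had happened.
    clipped = logits.clamp(-clip_val, clip_val)
    return logits + (clipped - logits).detach()
```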
nutpeabutter t1_j2snx76 wrote
Reply to comment by Taenk in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
From my personal interactions it gave off this vibe of having been trained on raw websites, whereas the GPT-3 models (both base and chat) felt much more natural. Something to do with having to learn too many languages?
nutpeabutter t1_iyh6f22 wrote
Reply to RTX 2060 or RTX 3050 by democracyab
The 2060 has 3x the tensor cores and 1.5x the memory bandwidth. That said, 6 and 8 GB really ain't much either way.
nutpeabutter t1_iy3z9lc wrote
Reply to comment by Own-Archer7158 in Can someone pls help me with this question and explain to me why you chose your answers? Many thanks! by CustardSignificant24
There is indeed a non-zero gradient. However, symmetric initialization introduces a plethora of problems:
- The only way to break the symmetry is through the random biases. A fully symmetric network effectively means that each layer acts as though it were a single neuron (a 1-input, 1-output layer), so it cannot learn complex functions until the symmetry is broken. Learning is thus highly delayed, since the network first has to break the symmetry before it can learn a useful function. This explains the plateau at the start (see the sketch after this list).
- Near-identical weights at the start, even once symmetry is broken, lead to poor performance. It is easy to get trapped in local minima when your outputs are constrained by weights without sufficient variance; there is a reason weights are typically randomly initialized.
- Random weights also allow more "learning pathways" to be established: by pure chance alone, a certain combination of weights will be slightly more correct than others. The network can then exploit this to speed up its learning, adjusting its other weights to support these pathways. Symmetric weights offer no such advantage.
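A small sketch illustrating the first point (the architecture and the constant 0.5 are made up for demonstration; here the biases are zeroed too, so the symmetry is never broken at all):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
with torch.no_grad():
    for layer in net:
        if isinstance(layer, nn.Linear):
            layer.weight.fill_(0.5)  # symmetric: every weight identical
            layer.bias.zero_()

x = torch.randn(16, 4)
net(x).pow(2).mean().backward()

# Every row of the first layer's gradient is identical, so after a gradient
# step the hidden units are still clones of each other: the 8-unit layer
# behaves like a single neuron.
print(net[0].weight.grad)
```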
nutpeabutter t1_iy3kb5n wrote
Reply to comment by Own-Archer7158 in Can someone pls help me with this question and explain to me why you chose your answers? Many thanks! by CustardSignificant24
>Bad initialization is rarely a problem
What if all weights are the same?
nutpeabutter t1_iv50p8e wrote
Reply to Are AMD GPUs an option? by xyrlor
If you are a masochist, then yes.
nutpeabutter t1_iuvl7on wrote
Reply to comment by Niu_Davinci in Can someone help me to create a STYLEGAN (1/2 or 3) with a dataset of my psychedelic handrawn/ A.I. Colored artworks? (280 in dataset, I have more iterations, maybe 600 total) by Niu_Davinci
I really can't give a solid answer, but 600 is definitely too few (a few thousand is already on the low end).
nutpeabutter t1_iuvkakt wrote
Reply to Can someone help me to create a STYLEGAN (1/2 or 3) with a dataset of my psychedelic handrawn/ A.I. Colored artworks? (280 in dataset, I have more iterations, maybe 600 total) by Niu_Davinci
Try DreamBooth. 600 images is far too few for training a StyleGAN from scratch (heavy augmentation could help, but I doubt it).
nutpeabutter t1_iu3v2bd wrote
There is currently no easy way of pooling VRAM. If the model can't fit into a single card's VRAM, I suggest you check out https://huggingface.co/transformers/v4.9.2/parallelism.html#tensor-parallelism. A sketch of a simpler alternative is below.
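If the model only slightly exceeds one GPU, a simpler starting point than full tensor parallelism is Hugging Face Transformers' layer-wise sharding via `device_map="auto"` (naive model parallelism: it splits whole layers across devices rather than splitting individual matmuls; the model name here is just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" (requires the `accelerate` package) spreads the model's
# layers across all visible GPUs, spilling to CPU RAM if needed.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
```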
nutpeabutter t1_iqpxj3a wrote
Reply to comment by incrediblediy in New Laptop for Deep/Machine Learning by MyActualUserName99
Kinda frustrating how half of the help posts here are requests for laptops. Like, have they not bothered to do even the tiniest bit of research? At the same price you could get an equivalently specced desktop/server AND an additional laptop, with the added bonus of being able to run long training sessions without interruption.
nutpeabutter t1_j7wxagp wrote
Reply to [D] Using LLMs as decision engines by These-Assignment-936
https://arxiv.org/abs/2302.01560
Using language models for long-term planning in Minecraft.