Dankmemexplorer
Dankmemexplorer t1_jb9xjl9 wrote
-stable diffusion would be fun to play with
-you can try simple computer vision tasks / finetune a model to detect your cat or something
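if you want to try the cat idea, a minimal fine-tuning sketch with torchvision could look something like this (the resnet18 choice, folder layout, and hyperparameters are all just assumptions to show the shape of it, not a tested recipe):

```python
# fine-tune a pretrained resnet18 as a cat / not-cat classifier
# assumes images laid out as data/train/cat/*.jpg and data/train/not_cat/*.jpg
import torch
from torch import nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_ds = datasets.ImageFolder("data/train", transform=tfm)
loader = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():  # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)  # fresh 2-class head

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(5):  # a few epochs is usually plenty with a frozen backbone
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```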
Dankmemexplorer t1_j3mpmt0 wrote
Reply to comment by rockpooperscissors in Building an NBA game prediction model - failing to improve between epochs by vagartha
this is likely the problem
Dankmemexplorer t1_j27hf6g wrote
Reply to comment by artoftheproblem in [R] LAMBADA: Backward Chaining for Automated Reasoning in Natural Language - Google Research 2022 - Significantly outperforms Chain of Thought and Select Inference in terms of prediction accuracy and proof accuracy. by Singularian2501
that was like 4 months ago right???
Dankmemexplorer t1_j13k11f wrote
Reply to comment by farmingvillein in [R] Nonparametric Masked Language Modeling - MetaAi 2022 - NPM - 500x fewer parameters than GPT-3 while outperforming it on zero-shot tasks by Singularian2501
ain't that just the way
Dankmemexplorer t1_j123o1b wrote
Reply to [R] Nonparametric Masked Language Modeling - MetaAi 2022 - NPM - 500x fewer parameters than GPT-3 while outperforming it on zero-shot tasks by Singularian2501
time to train gpt-4 on my mom's laptop
Dankmemexplorer t1_iymsbgo wrote
Reply to comment by Deep-Station-1746 in [D] What advances need to happen for something like gpt3 to be able to run on consumer devices and laptops locally? Is it even a possibility? by aero_oliver2
my current gpu is 4 years old 😖
the state of the art has gotten a lot better since then, but not that much better
Dankmemexplorer t1_iymjsty wrote
Reply to comment by aero_oliver2 in [D] What advances need to happen for something like gpt3 to be able to run on consumer devices and laptops locally? Is it even a possibility? by aero_oliver2
running the full gpt-3 on a laptop would be like running crysis 3 on a commodore 64. you can't pare it down enough to run without ruining it
Dankmemexplorer t1_iymieav wrote
Reply to [D] What advances need to happen for something like gpt3 to be able to run on consumer devices and laptops locally? Is it even a possibility? by aero_oliver2
for a sense of scale, GPT-NeoX, a 20 billion parameter model, requires ~45GB of vram to run. gpt-3 davinci is 175 billion parameters.
unless these models can be pared down somehow (unlikely: the whole point of training models this big is that their performance scales with size), we will have to wait a decade or two for consumer electronics to catch up
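napkin math, if anyone wants it (just a sketch: assumes fp16 weights at 2 bytes per parameter and ignores activations, kv-cache, and other runtime overhead):

```python
# rough VRAM needed just to hold a model's weights in fp16
def weight_vram_gb(n_params, bytes_per_param=2):
    return n_params * bytes_per_param / 1024**3

print(weight_vram_gb(20e9))   # GPT-NeoX-20B: ~37 GB of weights, hence ~45GB with overhead
print(weight_vram_gb(175e9))  # gpt-3 davinci: ~326 GB, way beyond any consumer gpu
```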
Dankmemexplorer t1_iwqebgp wrote
Reply to comment by dat_cosmo_cat in [R] The Near Future of AI is Action-Driven by hardmaru
true, they do keep getting gooder and people are like "we solved it this year"
i think it got good enough for most things back in 2020 with gpt-3, the same way dall-e/SD is good enough for most things now
Dankmemexplorer t1_jchlw3t wrote
Reply to comment by currentscurrents in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
man it's funny that 250M params is a toy now
how far we've come...