satireplusplus
satireplusplus t1_jczz8e6 wrote
Reply to comment by currentscurrents in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
VRAM is the limiting factor to run these things though, not tensor cores
satireplusplus t1_jcp6bu4 wrote
Reply to comment by FallUpJV in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
This model uses a "trick" to efficiently train RNNs at scale, and I still have to take a closer look to understand how it works. Hopefully the paper is out soon!
Otherwise, size is what matters! Getting there is a combination of factors: the transformer architecture scales well and was the first architecture that allowed these LLMs to be trained at enormous sizes, plus enterprise GPU hardware with lots of memory (40GB, 80GB) and frameworks like PyTorch that make parallelizing training across multiple GPUs easy.
And OP's 14B model might be "small" by today's standards, but it's still gigantic compared to a few years ago. It's ~27GB of FP16 weights.
Having access to 1TB of preprocessed text data that you can download right away, without doing your own crawling, is also neat (The Pile).
satireplusplus t1_jcbq2ik wrote
Reply to comment by OptimizedGarbage in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
> most notably dropout.
Probably unenforceable, and math shouldn't be patentable. Might as well try to patent matrix multiplication (I'm sure someone tried). Also, dropout isn't even complex math: it's an elementwise multiplication with randomized 1's and 0's, that's all it is.
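A minimal sketch of that elementwise operation in plain Python (function name and the usual inverted-dropout rescaling by 1/(1-p) are my own illustration):

```python
import random

def dropout(x, p=0.5, seed=None):
    """Zero each element with probability p, rescale survivors by 1/(1-p)."""
    rng = random.Random(seed)
    mask = [0.0 if rng.random() < p else 1.0 for _ in x]
    return [xi * mi / (1.0 - p) for xi, mi in zip(x, mask)]

out = dropout([1.0, 2.0, 3.0, 4.0], p=0.5, seed=42)
```

That's the whole technique: a random 0/1 mask and a multiply.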
satireplusplus t1_jcbpgio wrote
Reply to comment by ScientiaEtVeritas in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
They should rename themselves to ClosedAI. Would be a better name for what the company is doing now.
satireplusplus t1_jaiwxlo wrote
Reply to [P] ChatRWKV v2 (can run RWKV 14B with 3G VRAM), RWKV pip package, and finetuning to ctx16K by bo_peng
Wow, nice, I will try it out!
Btw: if you want to format the code in your post, you need to add 4 spaces in front of every line. Otherwise all newlines are lost.
Lines starting with four spaces are treated like code:
    if 1 * 2 < 3:
        print("hello, world!")
satireplusplus t1_ja9vh89 wrote
Reply to comment by HegemonNYC in Revealed: Europe's Oldest Humans had Surprisingly Frequent Intermingling with Neanderthals by OptimalCrew7992
It's not the same, but somewhat similar. While they were not entirely sterile, it's likely that first-generation Neanderthal + Sapiens hybrids had trouble producing (male) offspring as well:
satireplusplus t1_ja9nry8 wrote
Reply to comment by HegemonNYC in Revealed: Europe's Oldest Humans had Surprisingly Frequent Intermingling with Neanderthals by OptimalCrew7992
Could be very similar to the concept of a mule, these offspring would be called hybrids:
> The mule is a domestic equine hybrid between a donkey and a horse. It is the offspring of a male donkey (a jack) and a female horse (a mare).[1][2] The horse and the donkey are different species, with different numbers of chromosomes; of the two possible first-generation hybrids between them, the mule is easier to obtain and more common than the hinny, which is the offspring of a female donkey (a jenny) and a male horse (a stallion).
satireplusplus t1_j9o6ojn wrote
Reply to comment by [deleted] in NVIDIA Free Cash Flow Massively Down. -53% for Twelve Months Ended by FI_investor
stock: up 8%
satireplusplus t1_j5v24u2 wrote
Reply to comment by Paedor in [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
If you don't have 8 GPUs, you can always run the same computation 8 times in series on one GPU, then merge the results the same way the parallel implementation would. In most cases that ends up being a form of gradient accumulation. Think of it this way: you compute your distances on a subset of n, but since there are far fewer pairs of distances, the gradient will be noisy. So you run it a couple of times and average the results to get an approximation of the real thing. Very likely this is what the parallel implementation does too.
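A toy illustration of that averaging idea (plain Python, with a hypothetical mean-squared-error loss standing in for the contrastive loss): for a loss that is a mean over examples, averaging per-chunk gradients recovers the full-batch gradient exactly.

```python
def grad(batch, t):
    """Gradient of L = mean((x - t)^2) with respect to t."""
    return sum(-2.0 * (x - t) for x in batch) / len(batch)

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
t = 0.5

# One big "parallel" batch.
full = grad(data, t)

# Same data in series as two micro-batches, gradients averaged
# (gradient accumulation).
chunks = [data[:4], data[4:]]
accumulated = sum(grad(c, t) for c in chunks) / len(chunks)

assert abs(full - accumulated) < 1e-12
```

For a contrastive loss it's only an approximation, since pairs that span two chunks are never seen, hence the noisier gradient mentioned above.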
satireplusplus t1_j3xkvn2 wrote
Reply to comment by SwitchOrganic in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
What ChatGPT does really well is dialog, and it's useful for programming as well. You ask it to write a bash script, but it messes up a line. You tell it line number 9 didn't work and ask it to fix it. It comes up with a fixed solution that runs. Really cool.
satireplusplus t1_j3i1grq wrote
Reply to comment by uoftsuxalot in [P] I built Adrenaline, a debugger that fixes errors and explains them with GPT-3 by jsonathan
LLMs are our new overlords, it's crazy
satireplusplus t1_j2bjsc5 wrote
Reply to comment by gypsies232 in Decided to YOLO my Christmas bonus on 0DTE TSLA puts instead of doubling down on my calls today by gypsies232
going all in selling call spreads on a meme ticker isn't really thetagang
satireplusplus t1_j1afqub wrote
Reply to comment by londons_explorer in [D] When chatGPT stops being free: Run SOTA LLM in cloud by _underlines_
This ^^
Compared to GPT-3, ChatGPT is a huge step up. There is basically an entire new reward network, as large as the LM itself, that judges the quality of the answers. See https://cdn.openai.com/chatgpt/draft-20221129c/ChatGPT_Diagram.svg
That said, I'd welcome a community effort to build an open source version of this.
satireplusplus t1_iznuvy5 wrote
Reply to comment by ReginaldIII in [P] I made a command-line tool that explains your errors using ChatGPT (link in comments) by jsonathan
Generating this takes a couple of seconds and can probably be done on a single high-end GPU (the eleuther.ai models, for example, run just fine on one GPU). Ever played a video game? You probably "wasted" 1000x as much energy in just one hour.
The real advantage is that this can really speed up your programming, and it can write small functions all by itself. It is much better than Stack Overflow.
satireplusplus t1_izntr6m wrote
Reply to comment by ReginaldIII in [P] I made a command-line tool that explains your errors using ChatGPT (link in comments) by jsonathan
Amusing question. It's a tool like any other; you're using a computer, too, to avoid doing basic tasks by hand. Inference actually isn't that energy-expensive for GPT-type models. And the way I used it, it's probably more useful than generating AI art.
satireplusplus t1_iznkxx5 wrote
Reply to comment by GavinBelson3077 in [P] I made a command-line tool that explains your errors using ChatGPT (link in comments) by jsonathan
I've actually had it explain an obscure warning. It was faster than googling, and it already tells you what to do to get rid of the warning.
I've also found ChatGPT super useful for mundane stuff too: create a regex for a certain pattern given just a description and one example, create a Flask API endpoint from a description of what it does, etc. The code often works out of the box and sometimes needs minor tweaks. But it's much easier to correct a regex with one minor issue than to write one from scratch.
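As a hypothetical example of the kind of regex request this works well for (the pattern and test string below are my own illustration, not actual ChatGPT output): "extract all ISO-style dates from a text, given one example like 2022-11-30".

```python
import re

# Match ISO-style dates (YYYY-MM-DD) on word boundaries.
date_re = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

text = "Released 2022-11-30, patched 2023-01-05."
dates = ["-".join(m) for m in date_re.findall(text)]
```

If the generated pattern misses an edge case (say, dates at the start of a line), pointing out that one failing example is usually enough to get a corrected version back.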
satireplusplus t1_iukba1h wrote
Reply to comment by calguy1955 in A moth on my mini Buddha statue makes it look like Buddha's wearing a coat or cape by ScruffyTree
I don't know. I only see a moth that's really good at blending in.
satireplusplus t1_je9fxei wrote
Reply to comment by [deleted] in BlackRock warns that investors are making a mistake by betting on the Fed to cut rates by uslvdslv
Market: "No more rate hikes!"
Powell: listen, I said "No, more rate hikes!"
Market: "No more rate hikes!"