visarga
visarga t1_j46b2po wrote
Reply to comment by Chemont in [R] Is there any research on allowing Transformers to spent more compute on more difficult to predict tokens? by Chemont
No but if you use a decoder model (autoregressive) you can generate more tokens for the same task, depending on its difficulty. Chain-of-thought makes use of this trick.
visarga t1_j46af21 wrote
Reply to comment by ml-research in [D] Bitter lesson 2.0? by Tea_Pearce
Exfiltrate the large language models - get them to (pre)label your data. Then use this data to fine-tune a small and efficient HF model. You only pay for the training data.
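The label-then-distill pattern can be sketched roughly like this (all names are illustrative; a keyword stub stands in for the real LLM API call, and in practice the collected pairs would be used to fine-tune a small Hugging Face model rather than printed):

```python
def teacher_label(text: str) -> str:
    """Stand-in for an LLM API call that (pre)labels raw text.
    A real pipeline would call a hosted model here and pay per request."""
    return "positive" if "good" in text.lower() else "negative"

def build_training_set(unlabeled: list) -> list:
    """Pay once for teacher labels, then reuse them to train a small student."""
    return [(text, teacher_label(text)) for text in unlabeled]

corpus = ["This product is good", "Terrible experience", "Really good value"]
dataset = build_training_set(corpus)
print(dataset[0])  # ('This product is good', 'positive')
```

The point is that the expensive model is only queried during dataset construction; inference cost afterwards belongs to the cheap student.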
visarga t1_j46a4x8 wrote
Reply to comment by actualsnek in [D] What's your opinion on "neurocompositional computing"? (Microsoft paper from April 2022) by currentscurrents
Would a dataset engineering approach work here? Generate and solve training problems with compositional structure; after sufficient examples the model should generalise.
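A minimal sketch of that generate-and-solve idea, assuming nested arithmetic as the compositional domain (the function names are made up for illustration):

```python
import random

def gen_expr(depth: int, rng: random.Random) -> str:
    """Recursively build a nested arithmetic expression.
    The nesting gives the data its compositional structure."""
    if depth == 0:
        return str(rng.randint(0, 9))
    left = gen_expr(depth - 1, rng)
    right = gen_expr(depth - 1, rng)
    op = rng.choice(["+", "*"])
    return f"({left} {op} {right})"

def make_dataset(n: int, depth: int, seed: int = 0):
    """Each example pairs an expression with its ground-truth answer,
    computed for free by evaluating our own generated expression."""
    rng = random.Random(seed)
    return [(e, eval(e)) for e in (gen_expr(depth, rng) for _ in range(n))]

for expr, answer in make_dataset(3, 2):
    print(expr, "=", answer)
```

Because generation is cheap and labels are exact, the dataset can be made as large and diverse as needed.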
visarga t1_j41dn2n wrote
Reply to comment by ayoubmtd2 in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
For Amazon it was just a speaker and an ordering system; the company was never truly passionate about the chatbot part.
visarga t1_j41d1c5 wrote
Reply to comment by gamingyesterday in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
You do the analysis, paste your raw notebook into chatGPT and ask it to write the report for you in business language. It can be very skilled at corporate speak.
visarga t1_j41cfzx wrote
Reply to comment by zeidrich in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
I assume they have more/better task demonstrations for the multi-task finetuning phase. But that kind of data would be very easy to generate by calling their APIs. It's also possible to use a LLM to generate this kind of data from scratch, and even to do without RLHF by using Constitutional AI.
visarga t1_j41bvdy wrote
Reply to comment by TheLexoPlexx in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
Most companies already have their mail in Microsoft Office. They already trust MS.
visarga t1_j41avq0 wrote
Reply to comment by starstruckmon in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
Many smaller models give good results on classification and extractive tasks. But when they need to get creative they don't sound so great. I don't know if Chinchilla is as creative as the latest from OpenAI, but my gut feeling says it isn't.
visarga t1_j41aj3a wrote
Reply to comment by 42gether in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
> meanwhile Google fired a guy that mass mailed people saying their own ai was sentient.
Never imagined things would turn out so badly for Google that it would need Lemoine's testimony.
visarga t1_j419sn0 wrote
Reply to comment by starstruckmon in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
MS failed at search, abandoned the browser, missed mobile; now they want a hit. It's about not fucking up again.
I don't think the GPT-3 model itself is a moat, someone will surpass it and make a free version soon enough. But the long term strategy is to become a preferred hosting provider. In a gold rush, sell shovels.
visarga t1_j4196t8 wrote
Reply to comment by dogs_like_me in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
Yes, that's probably it - they will rent tons and tons of GPUs and make profit on datacenters.
visarga t1_j418rlj wrote
Reply to comment by BlobbyMcBlobber in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
They are pre-trained as language models, but can later be used in genetic programming or RL to learn from outcomes. They could iterate on problem solving.
visarga t1_j415y3g wrote
Reply to comment by krali_ in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
Yes, just try searching "What is the world record for crossing the English Channel entirely on foot?" and enjoy the litany of unrelated answers, mostly about swimming across.
visarga t1_j4157w3 wrote
Reply to comment by Agreeable-Tomatillo2 in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
Of course the code fails on the first run. My code fails on the first run, too. But I can iterate. If MS allows feedback from the debugger, the model could fix most of its errors.
And when you want to solve a quantitative question the best way is to ask for a Python script that would print the answer when executed.
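The run-the-script-and-read-the-answer loop can be sketched like this (the model's reply is a hard-coded stand-in here; a real version would call an LLM API and should sandbox the execution):

```python
import contextlib
import io

def run_and_capture(script: str) -> str:
    """Execute a (model-written) script and capture what it prints.
    In practice this should run in a sandbox, not a bare exec()."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(script, {})
    return buf.getvalue().strip()

# Stand-in for code the model might return for a quantitative question,
# e.g. "What is the sum of the integers from 1 to 100?"
model_script = "print(sum(range(1, 101)))"
print(run_and_capture(model_script))  # 5050
```

Any error raised here could be fed back to the model as the "debugger feedback" mentioned above.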
visarga t1_j414x9n wrote
Reply to comment by SwitchOrganic in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
Copilot is not prompt-tuned; chatGPT would pick up new tasks much more easily.
visarga t1_j414pbh wrote
Reply to comment by Lulonaro in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
You didn't think 175B parameters would make a difference, did you?
visarga t1_j3j58vc wrote
Reply to comment by turnip_burrito in Organic AI by Dramatic-Economy3399
I'd like to have an AI chatbot or assistant in the web browser to summarise, search, answer and validate stuff. Especially when the search results are full of useless ads and crap, I don't want to see them anymore. But I want validation.
This AI assistant will be my "kid" (run on my own machine) and listen to my instructions, not Google's or anyone else's. Any interaction with it remains private unlike web search. It should run efficiently on a normal desktop like Stable Diffusion - that will be the hardest part. Go Stability.ai!
visarga t1_j3ecrx8 wrote
Yes, I agree traditional NLP tasks are mostly solved, with a possibly large number of new skills unlocked at once. And they work so well without fine-tuning, just from the prompt.
So take your task to chatGPT (or text-davinci-003), label your dataset or generate more data. Then you finetune a slender transformer from Huggingface. You get an efficient and cheap model.
visarga t1_j39xs2x wrote
Reply to comment by Scarlet_pot2 in We need more small groups and individuals trying to build AGI by Scarlet_pot2
No, this concept is older, it predates Google. Hinton was working on it in 1986 and Schmidhuber in the 1990s. By the way, "next token prediction" is not necessarily state of the art. The UL2 paper showed it is better to use a mixture of denoising objectives, including masked spans.
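Span corruption, the denoising objective behind UL2/T5, can be sketched like this (a toy version with illustrative sentinel names; real implementations sample span lengths and corruption rates differently):

```python
import random

def mask_spans(tokens, n_spans=2, span_len=2, seed=0):
    """Replace random non-overlapping spans with sentinel tokens, T5/UL2 style.
    Returns the corrupted input and the target sequence the model must predict."""
    rng = random.Random(seed)
    candidates = list(range(len(tokens) - span_len + 1))
    rng.shuffle(candidates)
    starts = []
    for c in candidates:  # greedily keep non-overlapping span starts
        if all(abs(c - s) >= span_len for s in starts):
            starts.append(c)
        if len(starts) == n_spans:
            break
    starts.sort()
    corrupted, targets, prev = [], [], 0
    for i, s in enumerate(starts):
        corrupted += tokens[prev:s] + [f"<extra_id_{i}>"]
        targets += [f"<extra_id_{i}>"] + tokens[s:s + span_len]
        prev = s + span_len
    corrupted += tokens[prev:]
    return corrupted, targets

inp, tgt = mask_spans("the quick brown fox jumps over the lazy dog".split())
print(inp)
print(tgt)
```

The model sees the corrupted sequence and learns to emit the targets, which forces it to reconstruct multiple spans from bidirectional context rather than only predict the next token.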
If you follow the new papers, there are a thousand ideas floating around. How to make models learn better, how to make them smaller, how to teach the network to compose separate skills, why training on code improves reasoning skills, how to generate problem solutions as training data... we just don't know which are going to matter down the line. It takes a lot of time to try them out.
Here's a weird new idea: StitchNet: Composing Neural Networks from Pre-Trained Fragments. (link) People try anything and everything.
Or this one: Massive Language Models Can Be Accurately Pruned in One-Shot. (link) - maybe it means we will be able to run GPT-3 size models on a gaming desktop instead of a $150,000 computer
visarga t1_j39x1lv wrote
Reply to comment by Scarlet_pot2 in We need more small groups and individuals trying to build AGI by Scarlet_pot2
The code, yes, but the dataset will be the entire internet plus loads of generated data. We have the people; what's needed is access to compute.
visarga t1_j36ih4y wrote
Reply to comment by datsmamail12 in 2022 was the year AGI arrived (Just don't call it that) by sideways
You have no basis to tell what GPT 5 or 6 will be like. Not even OpenAI knows yet.
My prediction is that AI models will make fewer hallucinations and mistakes, and that they will be trained on massive problem sets. Most of these problems will be completely generated, solved and tested by AI.
visarga t1_j36i9o1 wrote
Reply to comment by marvinthedog in 2022 was the year AGI arrived (Just don't call it that) by sideways
> I almost worry more about the overall level of conscious happiness throughout all of time and space throughout all dimensions/simulations/realities because that is the ONLY thing that ultimately matters in the end
This doesn't make sense from an evolutionary point of view. There's no big brotherhood of conscious entities, it's competition for resources.
visarga t1_j36ccg4 wrote
Reply to comment by currentscurrents in [D] Special-purpose "neuromorphic" chips for AI - current state of the art? by currentscurrents
> Innatera claims 10000x lower power usage with their chip.
Unfortunately it's just a toy. Not gonna run GPT-3 on edge.
Googled for you: Innatera's third-generation AI chip has 256 neurons and 65,000 synapses and runs inference at under 1 milliwatt, which doesn't sound like a lot compared to the human brain, which has 86 billion neurons and operates at around 20 watts.
visarga t1_j36by5o wrote
Reply to comment by hateboresme in [News] AMD Instinct MI300 APU for AI and HPC announced by samobon
No, it's: train models to solve problems to generate more data to train models. That's how it will go.
visarga t1_j4cqkkb wrote
Reply to comment by actualsnek in [D] What's your opinion on "neurocompositional computing"? (Microsoft paper from April 2022) by currentscurrents
Sometimes you can exploit asymmetric difficulty. For example, factorising polynomials is hard, but multiplying a bunch of degree-1 polynomials is easy. So you can generate data for free, and it will be very diverse. Because the data has compositional structure, solving it forces the model to apply rules correctly rather than overfit.
Taking derivatives and integrals is similar - easy one way, hard the other way. And solving the task will teach the model something about symbolic manipulation.
More generally you can use an external process, a simulator, an algorithm or a search engine to obtain a transformation of input X to Y, then learn to predict Y from X or X from Y. "Given this partial game of chess, predict who wins" and such. If X has compositional structure, solving the task would teach the model how to generalise, because you can generate as much data as necessary to force it not to overfit.
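The polynomial example above can be sketched concretely (function names are illustrative; a real pipeline would render the coefficient lists as strings for a seq2seq model, training it to recover the roots from the expanded form):

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists, highest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            out[i + j] += ca * cb
    return out

def make_factoring_example(roots):
    """The easy direction: expand prod (x - r) from its roots.
    The model is then trained on the hard inverse: recovering the roots."""
    poly = [1]
    for r in roots:
        poly = poly_mul(poly, [1, -r])
    return poly

# (x - 1)(x - 2) = x^2 - 3x + 2
print(make_factoring_example([1, 2]))  # [1, -3, 2]
```

Every (roots, expanded polynomial) pair is an exact, free training example, and sampling more roots gives unlimited diversity.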