
EquipmentStandard892 t1_je7xyd9 wrote

I read your paper and was thinking about something interesting: I wonder if it's possible to use this method to fine-tune the model to query a vector database without running into its context length limitations. It may sound stupid, but humans don't just say things. I'm not talking about CoT specifically, but I was curious whether, as our brains do, we could use another instance of the same LLM to generate little hypotheses about the ongoing conversation, store those in a vector database, and then use those generated hypotheses during reasoning. We humans also have limited cognitive memory, and how do we overcome this? Great paper btw.
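For what it's worth, here's a toy sketch of what I mean. The `generate()` and `embed()` functions are just stand-ins for a real LLM and embedding model (the actual calls would depend on whatever stack you use), and the "vector database" is an in-memory list:

```python
# Toy sketch: a side-channel LLM pass produces short "hypotheses" about the
# conversation, which are embedded and stored; later turns retrieve the closest
# ones instead of carrying the full history in the prompt.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

def generate(prompt: str) -> str:
    # Placeholder: a real system would call the second LLM instance here.
    return f"hypothesis about: {prompt[:40]}"

class HypothesisMemory:
    def __init__(self):
        self.texts, self.vectors = [], []

    def add(self, conversation_chunk: str) -> None:
        hypothesis = generate(conversation_chunk)      # side-channel LLM pass
        self.texts.append(hypothesis)
        self.vectors.append(embed(hypothesis))

    def recall(self, query: str, k: int = 3) -> list[str]:
        if not self.vectors:
            return []
        sims = np.stack(self.vectors) @ embed(query)   # cosine sim (unit vectors)
        return [self.texts[i] for i in np.argsort(sims)[::-1][:k]]

memory = HypothesisMemory()
memory.add("User is debugging a CUDA out-of-memory error while fine-tuning.")
print(memory.recall("What was the user's training problem?"))
```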

30

saintshing t1_je9fciu wrote

> I was curious whether, as our brains do, we could use another instance of the same LLM to generate little hypotheses about the ongoing conversation, store those in a vector database, and then use those generated hypotheses during reasoning.

I just learned about LangChain recently. If I understand correctly, they have agents that integrate LLMs with external tools like internet search, SQL queries, and vector store queries, and there is also a memory module to store the ongoing dialog and intermediate results.

They use the ReAct or MRKL framework to break a task into subproblems, decide which tools to use, and react to the results returned by those tools. A stripped-down sketch of that loop is below.
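Stripped of the framework, the loop is roughly: the LLM emits a Thought / Action / Action Input block, the chosen tool runs, and the Observation is fed back until the model emits a Final Answer. This is a minimal framework-free sketch, not LangChain's actual API; `call_llm` and its canned replies are placeholders for a real model call:

```python
# Minimal ReAct-style agent loop. The LLM is prompted to answer in the
# "Thought / Action / Action Input" format, and the loop feeds Observations back.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only
    "search":     lambda q: f"(pretend search results for: {q})",
}

def call_llm(transcript: str) -> str:
    # Placeholder: a real agent would send the transcript to the model here.
    if "Observation:" not in transcript:
        return "Thought: I should compute this.\nAction: calculator\nAction Input: 17 * 23"
    return "Thought: I have the result.\nFinal Answer: 391"

def run_agent(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = call_llm(transcript)
        transcript += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        # Parse the tool call and append its result as an Observation.
        action = reply.split("Action:", 1)[1].split("\n", 1)[0].strip()
        arg = reply.split("Action Input:", 1)[1].split("\n", 1)[0].strip()
        transcript += f"Observation: {TOOLS[action](arg)}\n"
    return "gave up"

print(run_agent("What is 17 * 23?"))
```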

example: https://tsmatz.files.wordpress.com/2023/03/20230307_paper_example.jpg?w=446&zoom=2

https://python.langchain.com/en/latest/getting_started/getting_started.html

https://tsmatz.wordpress.com/2023/03/07/react-with-openai-gpt-and-langchain/

https://twitter.com/yoheinakajima/status/1640934493489070080

7

EquipmentStandard892 t1_je9gc9y wrote

I've already seen LangChain and it's truly amazing. The issue I've encountered and was trying to overcome is really an architectural problem: the token context span limit. I was looking to add a layer on top of the transformer architecture to bypass this limitation. I've seen claims that MRKL can handle longer context lengths, even an unlimited context span, although I need to study it more. I wasn't thinking about prompt engineering at all.

7

saintshing t1_je9iw85 wrote

Jeremy Howard tweeted about this new model that is an RNN but can be trained in parallel. I haven't read the details, but it seems people are hyped that it can bypass the context length limit.

>RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.

>So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).

https://github.com/BlinkDL/RWKV-LM#the-rwkv-language-model-and-my-tricks-for-lms
https://twitter.com/BlinkDL_AI/status/1638555109373378560
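The practical upshot, as I understand it: in RNN mode, generation is just a (token, state) -> (logits, state) step, so the per-token memory cost stays constant no matter how long the stream gets, instead of growing with the context. A toy stand-in (not RWKV itself) just to show the shape of it:

```python
# Why RNN-mode inference sidesteps the context window: the model is a function
# (token, state) -> (logits, state) with a fixed-size state carried forward.
import numpy as np

HIDDEN, VOCAB = 64, 1000
rng = np.random.default_rng(0)
W_in  = rng.normal(scale=0.1, size=(VOCAB, HIDDEN))   # token embedding
W_h   = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))  # state mixing
W_out = rng.normal(scale=0.1, size=(HIDDEN, VOCAB))   # readout

def step(token_id: int, state: np.ndarray):
    new_state = np.tanh(W_in[token_id] + state @ W_h)  # fixed-size carry-over
    logits = new_state @ W_out
    return logits, new_state

state = np.zeros(HIDDEN)
for token_id in [1, 42, 7, 999]:           # arbitrarily long stream, same memory
    logits, state = step(token_id, state)
print(logits.shape, state.shape)           # (1000,) (64,)
```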

4

EquipmentStandard892 t1_je9kmvi wrote

This is exactly what I was talking about. I'm studying llama.cpp to understand how this whole ML/LLM world works, and I've found it's pretty "simple" in terms of the programming itself. I'm a software engineer from outside the ML field, and it was pretty interesting to do this deep dive. I'll take a deeper look into this RWKV proposal and maybe build something on top of it to test. If I find something interesting I'll comment here 😊

3

JustOneAvailableName t1_jea2dzf wrote

A software engineer's perspective on attention (quoting myself):

> You have to think about searching. If you search, you have a query (the search term), some way to correlate the query to the knowledge base (whose size is unknown/irrelevant), and the knowledge base itself. If you have to write this as a mathematical function, you need something that matches a query against keys by similarity and then returns the value corresponding to the matching key. The transformer equation is a pretty straightforward formula from that perspective. Each layer learns what it searches for, how it can be found, and which value it wants to transfer when requested.
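In code, that search reads roughly like this (a minimal numpy sketch, not any particular library's implementation):

```python
# The "search" analogy in code: each query row asks "which keys look like me?",
# softmax turns the match scores into weights, and the answer is a weighted
# mix of the corresponding values. Shapes: (seq, d).
import numpy as np

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    scores = Q @ K.T / np.sqrt(Q.shape[-1])             # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ V                                   # pull the matching values

rng = np.random.default_rng(0)
seq, d = 5, 16
x = rng.normal(size=(seq, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))  # what each layer learns
out = attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)                                          # (5, 16)
```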

RWKV changes this by removing the query, so data is no longer requested, only pushed. I'm frankly surprised it seems to work thus far. Pushing data (each position deciding for itself how important it is for everything else) doesn't depend on the other states, which is what lets it be an RNN.

Edit: one thing I need to mention: in RWKV, importance also fades over time, so it has a recency bias.
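To make that concrete, here's a caricature of "push-only with fading importance" (not RWKV's actual time-mixing equations): every position pushes its value with a self-declared weight, older pushes decay each step, and the running mix only needs the previous state, which is what makes it RNN-friendly.

```python
# Toy illustration of push-only mixing with recency decay. Each step keeps a
# decayed running numerator/denominator, so the output at t depends only on the
# state carried from t-1 plus the current push; no query over past positions.
import numpy as np

def push_mix(values: np.ndarray, importances: np.ndarray, decay: float = 0.9):
    num = np.zeros(values.shape[-1])   # decayed sum of importance * value
    den = 0.0                          # decayed sum of importance
    outputs = []
    for v, w in zip(values, np.exp(importances)):
        num = decay * num + w * v      # old pushes fade, new push is added
        den = decay * den + w
        outputs.append(num / den)      # current mix, no query involved
    return np.stack(outputs)

rng = np.random.default_rng(0)
out = push_mix(rng.normal(size=(6, 8)), rng.normal(size=6))
print(out.shape)  # (6, 8)
```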

3

EquipmentStandard892 t1_jeaqt6u wrote

I already had that in mind. I found an interesting article about integrating LLMs in a way designed to handle autonomous task execution given a direct objective/goal. Combining that with this RNN approach seems like the way to go for increasing the cognitive capability of the whole system. Using the RNN the way our subconscious works, and indexing its output into a vector space capable of hybrid search (something like SPLADE-style search engines), or even building a neural attention graph network to store the rules that aggregate the raw tokens into the vector space, could drastically improve the performance of small language models, maybe leading to further optimization beyond the token limit.

Article about integrating memory and task/objectives using multiple LLM instances: https://yoheinakajima.com/task-driven-autonomous-agent-utilizing-gpt-4-pinecone-and-langchain-for-diverse-applications/

1

A_Light_Spark t1_jeaim48 wrote

The real VIP is in the comments again. TIL about RWKV!
Now I just need to read up on it and see if it can do sequence classification...

3

saintshing t1_jeaowjz wrote

I almost missed it too. There are too many new results.

The craziest thing is that it's all done by one person, while the big tech companies all work on transformer models.

3

unkz t1_je9wuzm wrote

Practically speaking, it does have a context limit — that RNN issue has not really been solved. It is a lot of fun to play with though.

2

ghostfaceschiller t1_je8habj wrote

Could you elaborate on what you mean here? I'm not sure I'm following.

1

hailfire27 t1_je8l7id wrote

I think he's talking about how, during conversations, there are different cognitive levels at play. You're basically having a conversation with yourself about what to say and remembering things to talk about, while at the same time considering the context of the situation, such as the environment or activity.

So he's asking whether, for a model like this, it would be possible to tune the model so that it gives better answers in a conversation.

13