GoodluckH OP t1_j4espcc wrote on January 15, 2023 at 4:53 AM

Reply to comment by m98789 in [D]: Are there models like CODEX but work in a reversed way? by GoodluckH

Wow, that's really cool. But I can actually ask things like "what does XYZ do?", and it can give me some explanations like ChatGPT.

Clearly, they are using more than OpenAI's embeddings to make this possible. I read on Twitter that GPTDuck also uses LangChain, which I'm not very familiar with.

Any idea how they're able to go from advanced search to conversational answers?

Thank you for your insight!

m98789 t1_j4eutfz wrote on January 15, 2023 at 5:13 AM

Can you please link me to the tweet you are referring to?

My understanding of Q&A in LangChain is that it can answer “what” questions like “What did XYZ say…” but not “why” questions, because the “what” questions are really just text-similarity search.

But maybe there is more to it, so I’d like to see the tweet.

GoodluckH OP t1_j4evyhl wrote on January 15, 2023 at 5:23 AM

https://twitter.com/hwchase17/status/1611071272301260801?s=20&t=WFa0awEG43KTXfwV-Mb49Q

m98789 t1_j4f135j wrote on January 15, 2023 at 6:16 AM

Got it, this is how I believe it was implemented:

Stage 0: All the code is split into chunks, each chunk is embedded, and the results are saved in one table for lookups, e.g., the code in one field and its embedding in the adjacent field.
Stage 1: Semantic search to find code. Take your query and encode it into an embedding. Then take its dot product with every code embedding in the table to find semantically similar code chunks.
Stage 2: Combine the top-K most similar chunks into one string or list we can call the “context”.
Stage 3: Stuff the context into a prompt as a preamble, then append the actual question you want to ask.
Stage 4: Send the prompt to an LLM like GPT-3, collect the answer, and show it to the user.
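
To make the stages concrete, here is a minimal sketch in Python. It is only my guess at the general shape, not GPTDuck's or LangChain's actual code. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and `complete` is a hypothetical callable that you would wire up to whatever LLM API you use; all function names here are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption, not a real API):
    an L2-normalised sparse bag-of-words vector stored as a dict."""
    counts = Counter(re.findall(r"[a-z_]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def dot(a, b):
    """Dot product of two sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def build_index(chunks):
    """Stage 0: store each code chunk next to its embedding."""
    return [(code, embed(code)) for code in chunks]


def top_k(index, query, k=2):
    """Stage 1: embed the query and rank chunks by dot product."""
    q = embed(query)
    ranked = sorted(index, key=lambda row: dot(q, row[1]), reverse=True)
    return [code for code, _ in ranked[:k]]


def build_prompt(context_chunks, question):
    """Stages 2 and 3: join the chunks into a context and prepend it."""
    context = "\n\n".join(context_chunks)
    return (
        "Use the following code to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(index, question, complete, k=2):
    """Stage 4: send the prompt to an LLM via the caller-supplied
    `complete` function (hypothetical) and return its reply."""
    return complete(build_prompt(top_k(index, question, k), question))
```

With a real setup you would swap `embed` for calls to an embedding model and pass a function that calls your LLM as `complete`; the rest of the flow stays the same.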

GoodluckH OP t1_j4hiypg wrote on January 15, 2023 at 7:45 PM

Ahh, this makes a lot of sense. Regarding stage 0, how do you split the code? Just by lines, or is there some method to extract functions and classes?

I wrote a script that extracts Python functions using regex, but this definitely doesn't scale to other languages…
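
For illustration, a regex-based extractor of that kind could look roughly like the sketch below. This is a simplified stand-in, not the actual script: `extract_functions` is a made-up name, and it assumes consistently indented code with single-line `def` signatures. It finds each `def` line with a regex, then keeps every following line that is blank or indented deeper than the `def`.

```python
import re

# Matches a (possibly indented, possibly async) function definition line.
DEF_RE = re.compile(r"^([ \t]*)(?:async[ \t]+)?def[ \t]+(\w+)[ \t]*\(")


def extract_functions(source):
    """Return {function_name: source_text} for each `def` in `source`.

    Limitations of this sketch: decorators are not captured, signatures
    that span several lines may be cut short, and functions that share
    a name overwrite each other.
    """
    lines = source.splitlines()
    functions = {}
    for i, line in enumerate(lines):
        match = DEF_RE.match(line)
        if not match:
            continue
        indent = len(match.group(1))
        body = [line]
        for nxt in lines[i + 1:]:
            # A non-blank line at the same or shallower indent ends the body.
            if nxt.strip() and len(nxt) - len(nxt.lstrip()) <= indent:
                break
            body.append(nxt)
        # Drop trailing blank lines picked up after the function.
        while not body[-1].strip():
            body.pop()
        functions[match.group(2)] = "\n".join(body)
    return functions
```

The indentation rule is exactly what ties this to Python, which is why the approach does not carry over to brace-delimited languages.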