Submitted by darkbluetwilight t3_123j77g in MachineLearning
Step 1 in my efforts to have a robot do my job for me :P has led to a successful implementation of LlamaIndex. I used "GPTSimpleVectorIndex" to read a folder of 140 procedures (about 1 million tokens) into a single JSON file, which I can then query with "index.query". It works flawlessly, giving me excellent responses. However, it costs quite a bit - anywhere from 0 to 30c per query. I think this comes down to it using text-davinci-003 rather than GPT-3.5 Turbo, which does not appear to be implemented in LlamaIndex yet. It also appears to always use the full whack of 4,096 tokens.
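For context, the build-and-persist step looks roughly like this (a sketch against the llama-index version I'm on; 'procedures' is a stand-in for my real folder name):
import spacy  # not needed here; see next sketch
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# Build once: embed every document and save the whole index to a single JSON file.
docs = SimpleDirectoryReader('procedures').load_data()
index = GPTSimpleVectorIndex(docs)
index.save_to_disk('index.json')

# Later runs: reload the index instead of paying for the embeddings again.
index = GPTSimpleVectorIndex.load_from_disk('index.json')
print(index.query("How to create an engineering drawing?"))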
Just wondering if there is a way of keeping the price down without imposing a smaller max-token limit? I was thinking of maybe using some form of lemmatization or part-of-speech (POS) filtering to condense the context as much as possible, but I'm not sure if this would hurt accuracy. Any suggestions appreciated!
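Roughly what I have in mind with the condensing idea, as an untested sketch using spaCy (whether stripping function words like this hurts retrieval quality is exactly what I don't know):
import spacy

nlp = spacy.load("en_core_web_sm")

def condense(text):
    # Keep only lemmatized content words; drop determiners, auxiliaries, etc.
    keep = {"NOUN", "PROPN", "VERB", "ADJ", "NUM"}
    return " ".join(tok.lemma_ for tok in nlp(text) if tok.pos_ in keep)

print(condense("The drawing must be reviewed by the lead engineer before release."))
# -> roughly "drawing review lead engineer release"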
Update: thanks to @supreethrao, GPT-3.5-Turbo is in fact implemented in llama-index. Price per request was instantly cut to one tenth of the cost. Just use these lines in Python when building your index:
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader, LLMPredictor
from langchain.llms import OpenAIChat

data = SimpleDirectoryReader('database').load_data()  # 'database' is the folder that contains your documents
llm_predictor = LLMPredictor(llm=OpenAIChat(temperature=0.7, model_name="gpt-3.5-turbo"))  # set the model parameters
index = GPTSimpleVectorIndex(data, llm_predictor=llm_predictor)  # create the index
response = index.query("How to create an engineering drawing?")  # query the index
print(response)
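If you want to push the cost down further, there are a couple of knobs worth trying (a sketch against the same llama-index/langchain versions as above; I'm assuming max_tokens gets passed through OpenAIChat to the API, and that similarity_top_k/response_mode are query parameters in this version):
llm_predictor = LLMPredictor(
    llm=OpenAIChat(temperature=0.7, model_name="gpt-3.5-turbo", max_tokens=256)  # cap the completion length
)
index = GPTSimpleVectorIndex(data, llm_predictor=llm_predictor)
response = index.query(
    "How to create an engineering drawing?",
    similarity_top_k=1,       # only send the single best-matching chunk as context
    response_mode="compact",  # pack that context into as few LLM calls as possible
)
print(response)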
Update 2: After using the robot for a while, I've found the responses from GPT-3.5-Turbo to be very basic and unhelpful. It often says "yes, the context contains the information you are asking about". Other times it just says "the context does not have the information to answer that question", which is untrue: I have the program print the context to the console, and it always contains very apt information for answering the query. Not sure if it's just not getting enough tokens to answer, or if there is something more fundamental about GPT-3.5-Turbo that makes it poorly suited to this task. Will have to do a bit more trial and error to figure it out.
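For anyone hitting the same thing, this is the kind of trial and error I mean (a sketch against the same llama-index version as above; the prompt wording is just an illustration, not a tested fix): print the retrieved chunks from the response object, and override the default QA prompt to push the model to actually use the context.
from llama_index import QuestionAnswerPrompt

QA_TEMPLATE = QuestionAnswerPrompt(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Using only the context above, write a detailed step-by-step answer to: {query_str}\n"
)
response = index.query("How to create an engineering drawing?", text_qa_template=QA_TEMPLATE)
print(response)
print(response.get_formatted_sources())  # shows which chunks were retrieved as context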
supreethrao t1_jdv1whe wrote
Hi, there’s already support for ‘gpt-3.5-turbo’ in llama-index; the examples can be found in the git repo. You can also switch from GPTSimpleVectorIndex to a GPTTreeIndex, which could lower your cost. A minimal sketch of that switch is below.
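Sketch of the TreeIndex switch, assuming the same llama-index API as above (note the tree is built from LLM-generated summaries, so construction itself costs tokens up front):
from llama_index import GPTTreeIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('database').load_data()
# GPTTreeIndex builds a hierarchy of summaries over the documents at
# construction time; a query then walks down the tree instead of
# embedding-matching and stuffing chunks into the prompt.
index = GPTTreeIndex(documents)
response = index.query("How to create an engineering drawing?")
print(response)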