m98789 t1_j4f135j wrote

Got it, this is how I believe it was implemented:

  • Stage 0: All the code was split into chunks, an embedding was computed for each chunk, and the pairs were saved into one table for lookups, e.g., code in one field and its embedding in the adjacent field.
  • Stage 1: Semantic search to find code. Encode your query into an embedding, then take the dot product against every code embedding in the table to rank the chunks by semantic similarity.
  • Stage 2: Combine the top-K most similar chunks into one string we can call the “context”.
  • Stage 3: Stuff the context into a prompt as a preamble, then append the actual question you want to ask.
  • Stage 4: Send the prompt to an LLM like GPT-3, collect the answer, and show it to the user (see the sketch below).
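
A minimal sketch of those stages in Python; the embed() and complete() helpers are hypothetical placeholders for real embedding and LLM API calls:

    import numpy as np

    def embed(text):
        # Placeholder: wrap your embedding API here (e.g., OpenAI embeddings);
        # a random vector stands in so the sketch runs end to end.
        return np.random.rand(768)

    def complete(prompt):
        # Placeholder: wrap your LLM completion API here (e.g., GPT-3).
        return "(LLM answer for: " + prompt[:40] + "...)"

    # Stage 0: chunk the code and build the (code, embedding) lookup table.
    chunks = ["def foo(): ...", "class Bar: ..."]  # illustrative chunks
    table = [(c, embed(c)) for c in chunks]

    def answer(question, top_k=3):
        # Stage 1: embed the query and rank chunks by dot product.
        q = embed(question)
        ranked = sorted(table, key=lambda row: float(np.dot(q, row[1])), reverse=True)
        # Stage 2: combine the top-K chunks into the "context".
        context = "\n\n".join(code for code, _ in ranked[:top_k])
        # Stage 3: stuff the context into the prompt as a preamble.
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        # Stage 4: send the prompt to the LLM and return the answer.
        return complete(prompt)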
2

m98789 t1_j4eutfz wrote

Can you please link me to the tweet you are referring to?

From my understanding, Q&A in LangChain can answer “what” questions like “What did XYZ say…” but not “why” questions, because the “what” questions are really just text similarity searches.

But maybe there is more to it, so I’d like to see the tweet.

1

m98789 t1_j4e5du6 wrote

Gptduck is a cool project, but it only extracts embeddings of portions of the code, and embeddings alone are typically just used for search, clustering, or recommendation.

That is, the system converts your question into an embedding, then simply takes something like a dot product against all the code embeddings to rank them by semantic similarity to your query. The top result would be presented as the answer.

So it would feel more like an advanced search than a ChatGPT-like Q&A experience.
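
Roughly, that search-only flow looks like this (a sketch; the chunks and model name are illustrative, and get_embedding() wraps the embeddings endpoint described in the docs linked below):

    import numpy as np
    import openai

    def get_embedding(text):
        # Embeddings endpoint per the docs linked below; model name is illustrative.
        resp = openai.Embedding.create(input=[text], model="text-embedding-ada-002")
        return np.array(resp["data"][0]["embedding"])

    code_chunks = ["def parse(cfg): ...", "class Cache: ..."]  # illustrative
    chunk_embeddings = [get_embedding(c) for c in code_chunks]

    def search(query):
        q = get_embedding(query)
        scores = [float(np.dot(q, e)) for e in chunk_embeddings]
        # The single top-ranked chunk is what gets presented as the "answer".
        return code_chunks[int(np.argmax(scores))]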

More info on OpenAI’s GPT embeddings:

https://beta.openai.com/docs/guides/embeddings/what-are-embeddings

4

m98789 t1_j3xxyvm wrote

You are right that the trend is for costs to go down. It was originally reported that it took $12M in compute costs for a single training run of GPT-3 (source).

H100s, along with all the optimization techniques, will make a significant difference. So I agree prices will drop a lot, but for the foreseeable future they will still be out of reach for mere mortals.

2

m98789 t1_j3x653d wrote

The three main AI innovation ingredients are talent, data, and compute. Microsoft has all three, but of the three, world-class top talent is the scarcest. Microsoft has amazing talent in MSR, but it is spread across multiple areas with different agendas. OpenAI’s talent is probably near or on par with MSR’s, but it has focus, experience, and a dream team dedicated to world-class generative AI. They will be collaborating with MSR researchers too, and leveraging the immense compute and data resources at Microsoft.

3

m98789 t1_j3wtx3g wrote

I think you may be underestimating the compute cost. It’s about $6M of compute (A100 servers) to train a GPT-3-level model from scratch, so a billion dollars buys roughly 166 training runs. Considering experimentation, scaling upgrades, etc., that money will go quickly. Additionally, hosting the model to perform inference at scale is also very expensive. So it may be that the $10B investment isn’t all cash but is partially paid in Azure compute credits, considering they are already running on Azure.

2

m98789 t1_iug9ma9 wrote

I think the simplest approach is just to set up GPU-enabled VMs with your cloud provider’s auto-scale option (like scale sets), which can respond to HTTP traffic “triggers” by adding or removing identical VMs in a pool.

When a VM comes online, it has an auto-start action to pull and run your container, joining the load-balanced pool of workers.
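
For the auto-start piece, something like this cloud-init snippet in the scale set’s VM profile would do (a sketch; the registry, image, and ports are placeholders, and it assumes the VM image already has Docker and the NVIDIA drivers installed):

    #cloud-config
    runcmd:
      # Pull the inference container and run it on boot; the load balancer
      # starts routing traffic once the VM's health probe passes.
      - docker pull myregistry.azurecr.io/inference:latest
      - docker run -d --restart=always --gpus all -p 80:8000 myregistry.azurecr.io/inference:latest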

As a starting point to learn more of this approach (Azure link, but they are all similar):

https://azure.microsoft.com/en-us/products/virtual-machine-scale-sets/#overview

I suggest VMs as the simplest approach, rather than your cloud provider’s serverless container-instance infrastructure, because the latter usually lacks or limits GPU support, or is more experimental and complex. A VM-based approach is about as simple as it gets.

1

m98789 t1_ittly1y wrote

There are several strategies for combining multimodal data. Here are some simple approaches:

  1. First train the CNN classifier. Then use it as a feature extractor by taking the feature vector from the penultimate layer. Augment those image features with the features from your tabular data, and train a classifier like XGBoost on the combined features (see the sketch after this list).

  2. If you want to train both your feature extractor and classifier end to end, you could try different strategies for encoding the tabular data into the input tensor. A simple and fun one is to encode it visually into the images themselves, such as adding a few extra pixel rows at the bottom of each image. One row can represent country (uniquely colored by a country index), and so on.
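
Here’s a rough sketch of approach 1, with a pretrained torchvision ResNet-18 standing in for your trained CNN; the data shapes are placeholders:

    import numpy as np
    import torch
    import torchvision.models as models
    import xgboost as xgb

    # CNN with its final fc layer replaced by identity, so the forward
    # pass returns the penultimate-layer feature vector (512-d).
    cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    cnn.fc = torch.nn.Identity()
    cnn.eval()

    images = torch.randn(32, 3, 224, 224)      # placeholder image batch
    tabular = np.random.rand(32, 10)           # placeholder tabular features
    labels = np.random.randint(0, 2, size=32)  # placeholder labels

    with torch.no_grad():
        img_feats = cnn(images).numpy()        # (32, 512) penultimate features

    # Augment the image features with the tabular features, then train.
    X = np.concatenate([img_feats, tabular], axis=1)
    clf = xgb.XGBClassifier(n_estimators=100)
    clf.fit(X, labels)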

6

m98789 t1_it8rax1 wrote

Early on, it was a popular approach to use a DNN essentially as a feature extractor and then feed those features to a sophisticated classifier such as an SVM, i.e., to separate the process into two distinct steps.

Generally speaking, this approach fell out of favor when it became evident that “end-to-end” learning performed better. That is, you learn not just the feature extractor but also the classifier, together.

As the E2E approach took favor, folks did try more sophisticated designs for the final layers to simulate various kinds of classical classifiers. Ultimately, it was found that a simple choice of final layers yielded just as performant results.
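
For reference, the classic two-step recipe looked something like this (a sketch; the extracted features and labels are random placeholders):

    import numpy as np
    from sklearn.svm import SVC

    # Step 1: features taken from a DNN's penultimate layer (placeholder data).
    features = np.random.rand(100, 512)
    labels = np.random.randint(0, 10, size=100)

    # Step 2: train a separate classical classifier on those features.
    svm = SVC(kernel="rbf")
    svm.fit(features, labels)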

7