m98789 t1_j4f135j wrote

Got it, this is how I believe it was implemented:

  • Stage 0: All the code was split into chunks, an embedding was computed for each chunk, and the pairs were saved into one table for lookups, e.g., code in one field and its embedding in the adjacent field.
  • Stage 1: Semantic search to find code. Encode your query into an embedding, then take the dot product against every code embedding in the table to rank the chunks by semantic similarity.
  • Stage 2: Combine the top-K most similar chunks into one string we can call the “context”.
  • Stage 3: Stuff the context into a prompt as a preamble, then append the actual question you want to ask.
  • Stage 4: Send the prompt to an LLM like GPT-3, collect the answer, and show it to the user (see the sketch below).
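
A minimal sketch of those stages in Python; the embed() and complete() helpers are hypothetical placeholders for real embedding and LLM API calls:

    import numpy as np

    def embed(text):
        # Placeholder: wrap your embedding API here (e.g., OpenAI embeddings);
        # a random vector stands in so the sketch runs end to end.
        return np.random.rand(768)

    def complete(prompt):
        # Placeholder: wrap your LLM completion API here (e.g., GPT-3).
        return "(LLM answer for: " + prompt[:40] + "...)"

    # Stage 0: chunk the code and build the (code, embedding) lookup table.
    chunks = ["def foo(): ...", "class Bar: ..."]  # illustrative chunks
    table = [(c, embed(c)) for c in chunks]

    def answer(question, top_k=3):
        # Stage 1: embed the query and rank chunks by dot product.
        q = embed(question)
        ranked = sorted(table, key=lambda row: float(np.dot(q, row[1])), reverse=True)
        # Stage 2: combine the top-K chunks into the "context".
        context = "\n\n".join(code for code, _ in ranked[:top_k])
        # Stage 3: stuff the context into the prompt as a preamble.
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        # Stage 4: send the prompt to the LLM and return the answer.
        return complete(prompt)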
2

m98789 t1_j4eutfz wrote

Can you please link me to the tweet you are referring to?

From my understanding, Q&A in LangChain can answer “what” questions like “What did XYZ say…” but not “why” questions, because the “what” questions are really just text similarity searches.

But maybe there is more to it, so I’d like to see the tweet.

1

m98789 t1_j4e5du6 wrote

Gptduck is a cool project, but it only extracts embeddings of portions of the code, and embeddings alone are typically just used for search, clustering, or recommendation.

That is, the system converts your question into an embedding, then simply takes something like a dot product against all the code embeddings to rank them by semantic similarity to your query. The top result would be presented as the answer.

So it would feel more like an advanced search than a ChatGPT-like Q&A experience.
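
Roughly, that search-only flow looks like this (a sketch; the chunks and model name are illustrative, and get_embedding() wraps the embeddings endpoint described in the docs linked below):

    import numpy as np
    import openai

    def get_embedding(text):
        # Embeddings endpoint per the docs linked below; model name is illustrative.
        resp = openai.Embedding.create(input=[text], model="text-embedding-ada-002")
        return np.array(resp["data"][0]["embedding"])

    code_chunks = ["def parse(cfg): ...", "class Cache: ..."]  # illustrative
    chunk_embeddings = [get_embedding(c) for c in code_chunks]

    def search(query):
        q = get_embedding(query)
        scores = [float(np.dot(q, e)) for e in chunk_embeddings]
        # The single top-ranked chunk is what gets presented as the "answer".
        return code_chunks[int(np.argmax(scores))]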

More info on OpenAI’s GPT embeddings:

https://beta.openai.com/docs/guides/embeddings/what-are-embeddings

4

m98789 t1_j3xxyvm wrote

You are right that the trend is for costs to go down. It was originally reported that it took $12M in compute costs for a single training run of GPT-3 (source).

H100s, along with all the optimization techniques, will make a significant difference. So I agree prices will drop a lot, but for the foreseeable future they will still be out of reach for mere mortals.

2

m98789 t1_j3x653d wrote

The three main AI innovation ingredients are talent, data, and compute. Microsoft has all three, but of the three, world-class top talent is the scarcest. Microsoft has amazing talent in MSR, but it is spread across multiple areas with different agendas. OpenAI’s talent is probably near or on par with MSR’s, but it has focus, experience, and a dream team dedicated to world-class generative AI. They will be collaborating with MSR researchers too, and leveraging the immense compute and data resources at Microsoft.

3

m98789 t1_j3wtx3g wrote

I think you may be underestimating the compute cost. It’s about $6M of compute (A100 servers) to train a GPT-3-level model from scratch, so a billion dollars buys roughly 166 training runs. Considering experimentation, scaling upgrades, etc., that money will go quickly. Additionally, hosting the model to perform inference at scale is also very expensive. So it may be that the $10B investment isn’t all cash but is partially paid in Azure compute credits, considering they are already running on Azure.

2

m98789 t1_iug9ma9 wrote

I think the simplest approach is just to set up GPU-enabled VMs with your cloud provider’s auto-scale option (like scale sets), which can respond to HTTP traffic “triggers” by adding or removing identical VMs in a pool.

When a VM comes online, it has an auto-start action to pull and run your container, joining the load-balanced pool of workers.
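
For the auto-start piece, something like this cloud-init snippet in the scale set’s VM profile would do (a sketch; the registry, image, and ports are placeholders, and it assumes the VM image already has Docker and the NVIDIA drivers installed):

    #cloud-config
    runcmd:
      # Pull the inference container and run it on boot; the load balancer
      # starts routing traffic once the VM's health probe passes.
      - docker pull myregistry.azurecr.io/inference:latest
      - docker run -d --restart=always --gpus all -p 80:8000 myregistry.azurecr.io/inference:latest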

As a starting point to learn more of this approach (Azure link, but they are all similar):

https://azure.microsoft.com/en-us/products/virtual-machine-scale-sets/#overview

I suggest VMs as the simplest approach, rather than your cloud provider’s serverless container-instance infrastructure, because the latter usually lacks or limits GPU support, or is more experimental and complex. A VM-based approach is about as simple as it gets.

1

m98789 t1_ittly1y wrote

There are several strategies for combining multimodal data. Here are some simple approaches:

  1. First train the CNN classifier. Then use it as a feature extractor by taking the feature vector from the penultimate layer. Augment those image features with the features from your tabular data, and train a classifier like XGBoost on the combined features (see the sketch after this list).

  2. If you want to train both your feature extractor and classifier end to end, you could try different strategies for encoding the tabular data into the input tensor. A simple and fun one is to encode it visually into the images themselves, such as adding a few extra pixel rows at the bottom of each image. One row can represent country (uniquely colored by a country index), and so on.
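
Here’s a rough sketch of approach 1, with a pretrained torchvision ResNet-18 standing in for your trained CNN; the data shapes are placeholders:

    import numpy as np
    import torch
    import torchvision.models as models
    import xgboost as xgb

    # CNN with its final fc layer replaced by identity, so the forward
    # pass returns the penultimate-layer feature vector (512-d).
    cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    cnn.fc = torch.nn.Identity()
    cnn.eval()

    images = torch.randn(32, 3, 224, 224)      # placeholder image batch
    tabular = np.random.rand(32, 10)           # placeholder tabular features
    labels = np.random.randint(0, 2, size=32)  # placeholder labels

    with torch.no_grad():
        img_feats = cnn(images).numpy()        # (32, 512) penultimate features

    # Augment the image features with the tabular features, then train.
    X = np.concatenate([img_feats, tabular], axis=1)
    clf = xgb.XGBClassifier(n_estimators=100)
    clf.fit(X, labels)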

6

m98789 t1_it8rax1 wrote

Early on, it was a popular approach to use a DNN essentially as a feature extractor and then feed those features to a sophisticated classifier such as an SVM, i.e., to separate the process into two distinct steps.

Generally speaking, this approach fell out of favor when it became evident that “end-to-end” learning performed better. That is, you learn not just the feature extractor but also the classifier, together.

As the E2E approach took favor, folks did try more sophisticated designs for the final layers to simulate various kinds of classical classifiers. Ultimately, it was found that a simple choice of final layers yielded just as performant results.
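
For reference, the classic two-step recipe looked something like this (a sketch; the extracted features and labels are random placeholders):

    import numpy as np
    from sklearn.svm import SVC

    # Step 1: features taken from a DNN's penultimate layer (placeholder data).
    features = np.random.rand(100, 512)
    labels = np.random.randint(0, 10, size=100)

    # Step 2: train a separate classical classifier on those features.
    svm = SVC(kernel="rbf")
    svm.fit(features, labels)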

7