_Arsenie_Boca_ t1_j3kzllo wrote

Your laptop will not begin to suffice, not for inference and especially not for fine-tuning. You would need something like an A100 GPU in a server that handles requests, and even then the results will be much worse than GPT-3. If you don't already have AI infrastructure, go with an API; it will save you more than a bit of money (unless you are certain you will use it at scale long-term). If you are worried about relying on OpenAI, there are other companies that serve LMs.

15

_Arsenie_Boca_ t1_j2xonjk wrote

Yes, I believe there are two factors at play here:

  1. Models can potentially correct some of the human labeler's errors through their generalization power, provided the model is not overfitted.

  2. You should differentiate between outperforming a human and outperforming humans. Labels usually represent the collective knowledge of a number of people, not just one.

3

_Arsenie_Boca_ t1_ixgjhjp wrote

As interesting as weak supervision is, the main takeaway is that using an LLM's few-shot predictions as labels to train a small model is a great way to save labeling costs. Using Snorkel on top means you have to query multiple LLMs and carry Snorkel as additional complexity, all for only a few extra points. Perhaps those extra points could also have been achieved by letting the LLM label a few more samples, or by giving it a few more shots to produce better labels.
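
A minimal sketch of that basic recipe, assuming a hypothetical `query_llm_few_shot` helper in place of a real LLM API (replaced here by a trivial heuristic so the snippet runs end to end):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def query_llm_few_shot(text: str) -> int:
    # Hypothetical stand-in: in practice, prompt an LLM with a few labeled
    # examples and parse its answer into a class id.
    return int("good" in text.lower())

unlabeled = ["good movie", "bad acting", "good plot", "terrible pacing"]
pseudo_labels = [query_llm_few_shot(t) for t in unlabeled]

# Train a cheap student model on the LLM-generated pseudo-labels.
vectorizer = TfidfVectorizer()
student = LogisticRegression().fit(vectorizer.fit_transform(unlabeled), pseudo_labels)
```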

2

_Arsenie_Boca_ t1_ivvfzfs wrote

In classification, you usually have a single correct class, i.e. a hard label. However, you might also have soft labels, where multiple classes have non-zero target probabilities. Label smoothing is a technique that artificially derives soft labels from hard labels: if your hard label was [0 0 1 0], it might become [0.05 0.05 0.85 0.05]. You can use the strength of the smoothing to represent uncertainty.
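
A minimal sketch of the computation (numpy; `eps` is the smoothing strength, and the values are illustrative):

```python
import numpy as np

def smooth_labels(hard: np.ndarray, eps: float = 0.2) -> np.ndarray:
    """Turn a one-hot hard label into a soft label.

    Every class receives eps / K probability mass; the remaining 1 - eps
    stays on the true class, so the vector still sums to 1.
    """
    k = hard.shape[-1]
    return (1.0 - eps) * hard + eps / k

# [0 0 1 0] with eps=0.2 -> [0.05 0.05 0.85 0.05]
print(smooth_labels(np.array([0.0, 0.0, 1.0, 0.0])))
```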

1

_Arsenie_Boca_ t1_iusvc0e wrote

Parameter sharing across layers would achieve just that. In the ALBERT paper, the authors show that repeating a single layer multiple times actually leads to performance similar to having separate parameter matrices per layer. I haven't heard a lot about this technique, but I assume that is because people mostly care about speed, which it does not improve (while it is a good match for your use case).
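
A minimal PyTorch sketch of this kind of cross-layer sharing (sizes and names are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    """Applies one transformer layer num_passes times (ALBERT-style
    cross-layer parameter sharing) instead of stacking separate layers."""

    def __init__(self, d_model: int = 256, nhead: int = 4, num_passes: int = 6):
        super().__init__()
        # A single layer's parameters, reused at every "depth".
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.num_passes = num_passes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.num_passes):
            x = self.layer(x)  # same weights on every pass
        return x

encoder = SharedLayerEncoder()
out = encoder(torch.randn(2, 10, 256))  # (batch, seq_len, d_model)
```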

2