elbiot

elbiot t1_jdlgxnz wrote

As I understand it, if you have text, training on next-word prediction isn't hard. Just keep the learning rate low. The focus is on instruction-based fine-tuning because that data is harder to come by.

My only experience is with a sentence embedding model (using sbert): I trained on a 50/50 mix of my new text and the original training data, and it got better at embedding my text without forgetting what it was originally trained to do.
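
A rough sketch of one way to build such a 50/50 mix, in plain Python. The comment doesn't say how the mixing was done, so the per-batch split, the `mixed_batches` helper, and the toy string data are all assumptions made for illustration. Each batch would then be passed to whatever training loop the model uses.

```python
import random

def mixed_batches(new_examples, original_examples, batch_size, steps, seed=0):
    """Yield batches drawn half from the new data and half from the original data.

    Hypothetical helper: it only illustrates the 50/50 idea, not sbert's API.
    """
    rng = random.Random(seed)
    half = batch_size // 2
    for _ in range(steps):
        # Sample without replacement within a batch, separately from each pool.
        batch = rng.sample(new_examples, half)
        batch += rng.sample(original_examples, batch_size - half)
        rng.shuffle(batch)  # avoid a fixed new/original ordering inside the batch
        yield batch

# Toy stand-ins for the two corpora (assumed data).
new_texts = [f"new-{i}" for i in range(10)]
orig_texts = [f"orig-{i}" for i in range(10)]
batches = list(mixed_batches(new_texts, orig_texts, batch_size=8, steps=3))
```

Mixing inside every batch, rather than alternating whole epochs, keeps the original data in front of the model at every update.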

5

elbiot t1_j1tjpg7 wrote

The source I found this post through also referenced Retrieval Augmented Generation (https://ai.facebook.com/blog/retrieval-augmented-generation-streamlining-the-creation-of-intelligent-natural-language-processing-models/), and it seems they've integrated document selection into the backpropagation of the model training. You couldn't do this with ChatGPT, but maybe a smaller pretrained LLM that can be fine-tuned on consumer hardware would be enough for just that part.

1

elbiot t1_j0m3a67 wrote

Huh?

idx = np.unravel_index(indices, shape)
values = arr[idx]

No loop required. If you're referring to the same loop you were using to get the argmax, you can just adjust your indices first so they apply to the unstrided array
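
A fuller sketch of the loop-free version, assuming the windows come from `sliding_window_view` over the first two axes of a (H, W, D) array. The array sizes, the window size, and the variable names are made up for the example. The "adjust your indices" step adds each window's origin to the offset inside the window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
arr = rng.random((6, 5, 2))  # assumed (H, W, D) data
wx, wy = 3, 2                # assumed window size

# Windows over the first two axes: shape (H-wx+1, W-wy+1, D, wx, wy).
windows = sliding_window_view(arr, (wx, wy), axis=(0, 1))

# Flat argmax inside each window: shape (H-wx+1, W-wy+1, D).
local = windows.reshape(*windows.shape[:3], wx * wy).argmax(axis=-1)

# Turn the flat in-window index into (dx, dy) offsets, all at once.
dx, dy = np.unravel_index(local, (wx, wy))

# Window origins; adding the offsets gives coordinates in the unstrided array.
i0, j0, d = np.indices(local.shape)
rows, cols = i0 + dx, j0 + dy

values = arr[rows, cols, d]  # the max of every window, no Python loop
```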

1

elbiot t1_j0k3evv wrote

I'm away from a computer for a while, but I assume you could cast the tuple to an array. And since creating an array is expensive and you'll need an array of the same shape at every step, you could hold onto it and assign values into it instead of re-creating it each time.
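
A small sketch of the reuse idea. The thread doesn't say which tuple is meant, so this assumes it is the tuple of coordinate arrays returned by `np.unravel_index`. The shapes and the flat indices are invented for the example.

```python
import numpy as np

shape = (4, 5)  # assumed shape of the array being indexed
n = 3           # assumed number of indices per step

# Allocate the coordinate buffer once: one row per dimension.
buf = np.empty((len(shape), n), dtype=np.intp)

for step in range(2):
    flat = np.array([step, step + 7, 19])  # hypothetical flat indices for this step
    # Write each coordinate array into the existing buffer row in place,
    # instead of building a new (ndim, n) array every step.
    for row, coords in zip(buf, np.unravel_index(flat, shape)):
        row[...] = coords
```

Whether this saves anything measurable depends on the array sizes, so it is worth timing before committing to it.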

1

elbiot t1_j0ivmop wrote

Yeah, I was just thinking in 1D. I'm not at a computer so I can't try anything, but roughly what I'm thinking is: you have a (H, W, D) array and use stride tricks to get a (H, W, D, wx, wy) view. If you could get that to be (H, W, D, wx*wy), then argmax could give you a (H, W, D) array of indices. I dunno if you can reshape a strided array or use strides to get the shape in question.
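
A sketch of that idea with `sliding_window_view`, using made-up sizes. Two caveats: without padding, the leading dimensions shrink to (H-wx+1, W-wy+1, D) rather than staying (H, W, D), and reshaping the overlapping-window view works but generally forces NumPy to copy the data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
H, W, D, wx, wy = 6, 5, 2, 3, 2  # assumed sizes
arr = rng.random((H, W, D))

# Strided view of all (wx, wy) windows over the first two axes.
windows = sliding_window_view(arr, (wx, wy), axis=(0, 1))

# Merge the two window axes; this is a copy, not a view, for overlapping windows.
flat = windows.reshape(*windows.shape[:3], wx * wy)

# One flat in-window index per window position and depth channel.
local = flat.argmax(axis=-1)
```

The copy holds wx*wy times as many elements as there are window positions, so memory can become the limiting factor for large windows.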

1

elbiot t1_j0fitwk wrote

Can't you just reshape the array and use argmax (so no as_strided)? Reshaping is often free. You'd have to do some arithmetic to get the indices for the original shape, but it would just be one operation.

I.e. you can take a shape (99,) array, reshape it to (3, 33), and then take the argmax along the first axis to get 33 maxes.
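
The (99,) example can be sketched as follows, with random data standing in for the real array. The index arithmetic is the single `rows * 33 + cols` line.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random(99)  # assumed data

# For a contiguous array this reshape is a view, so it costs nothing.
blocks = arr.reshape(3, 33)

# Row index (0, 1 or 2) of the max in each of the 33 columns.
rows = blocks.argmax(axis=0)
cols = np.arange(33)

# blocks[r, c] is arr[r * 33 + c], so this maps back to the original indices.
flat_idx = rows * 33 + cols
maxes = arr[flat_idx]
```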

1