
ephemeralentity t1_jdm6wkc wrote

Playing around with this. Running BaseModel.create("llama_lora") seems to return "Killed". I'm running it on WSL2 from Windows 11 so I'm not sure if that could be the issue. Running on my RTX 3070 with only 8GB VRAM so maybe that's the issue ...

EDIT - Side note, I first tried directly on Windows 11, but it seems the deepspeed dependency is not fully supported there: https://github.com/microsoft/DeepSpeed/issues/1769

2

machineko t1_jdnmg8l wrote

Right, 8GB won't be enough for LLaMA 7B. You should try the GPT-2 model instead. That should work on 8GB of VRAM.
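
For a rough sense of why, here is a back-of-the-envelope sketch of the memory needed just to hold the weights. The parameter counts are approximations (about 7 billion for LLaMA 7B, about 124 million for the smallest GPT-2, assuming that is the checkpoint "gpt2" refers to), and the helper name `weights_gib` is made up for illustration. It ignores activations, gradients, and optimizer state, so real usage during finetuning is higher.

```python
def weights_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


# Approximate parameter counts (assumptions, not exact figures).
LLAMA_7B_PARAMS = 7e9
GPT2_SMALL_PARAMS = 124e6

VRAM_GIB = 8  # RTX 3070 from the question above

llama_fp16 = weights_gib(LLAMA_7B_PARAMS, 2)   # 2 bytes per parameter in fp16
gpt2_fp32 = weights_gib(GPT2_SMALL_PARAMS, 4)  # 4 bytes per parameter in fp32

print(f"LLaMA 7B, fp16 weights: {llama_fp16:.1f} GiB (fits in 8 GiB: {llama_fp16 < VRAM_GIB})")
print(f"GPT-2, fp32 weights:    {gpt2_fp32:.2f} GiB (fits in 8 GiB: {gpt2_fp32 < VRAM_GIB})")
```

Even with LoRA, which shrinks the number of trainable parameters, the base weights still have to be loaded, so the fp16 LLaMA 7B weights alone already exceed 8GB.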

2

ephemeralentity t1_jdp2pu8 wrote

Thanks, looks like gpt2 worked! Sorry, stupid question, but how do I save/re-use the results of my model finetune? When I re-finetune for 0:2 epochs it gives a reasonable response, but if I try to skip model.finetune, it responds with newlines only (\n\n\n\n\n\n\n\n ...).

1

machineko t1_jdqzmyq wrote

model.save("path/to/your/weights") saves the finetuned weights to that directory.
After that, you can load them with:
model = BaseModel.create("gpt2", "path/to/your/weights")
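
Put together, the save-then-reload flow might look like the sketch below. It only relies on the two calls quoted above (`model.save(path)` and `BaseModel.create(name, path)`); the helper name `save_and_reload` is hypothetical, and creating the directory first is a precaution I am assuming is harmless, not something the package is known to require.

```python
import os


def save_and_reload(model, model_cls, model_name, weights_dir):
    """Save finetuned weights, then build a fresh model from them.

    `model` is the already-finetuned model, `model_cls` is the class that
    exposes `create` (BaseModel in this thread), and `model_name` is the
    key used when the model was first created (e.g. "gpt2").
    """
    # Assumption: making sure the target directory exists does no harm.
    os.makedirs(weights_dir, exist_ok=True)

    # Persist the finetuned weights.
    model.save(weights_dir)

    # Recreate the model from the saved weights instead of finetuning again.
    return model_cls.create(model_name, weights_dir)
```

With the package from this thread, usage would be along the lines of `model = save_and_reload(model, BaseModel, "gpt2", "path/to/your/weights")`; in a later session only the `BaseModel.create("gpt2", "path/to/your/weights")` call is needed.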

Can you share the input text you used? It is possible that GPT-2 is too small and needs custom generation parameters.

2

ephemeralentity t1_jdt1krp wrote

Thanks a lot! To be honest, I need to spend a bit more time familiarising myself with pytorch / this package. I'll see if I can figure it out from here.

1

machineko t1_jdtv8jv wrote

If you need help, come find us on our Discord channel.

2