
Scarlet_pot2 t1_je937zq wrote

Going from scratch to a finished model takes six steps. The first step is data gathering: there are huge open-source datasets available, such as "The Pile" by EleutherAI. The second step is data cleaning, which basically means preparing the data to be trained on. The third step is designing the architecture. The advanced AI models we know of are all based on the transformer architecture, which is a type of neural network. The paper "Attention Is All You Need" explains how to design a basic transformer. There have been improvements since then, so you would need to read more papers if you want a very good model.
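To make the third step a bit more concrete, here is a minimal sketch of scaled dot-product attention, the core operation described in "Attention Is All You Need". It is plain Python for readability only. The function names `softmax` and `attention` are illustrative, and a real transformer would use a tensor library and add learned projections, multiple heads, and masking on top of this.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output row is the attention-weighted average of the value rows.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out
```

With two identical keys, the weights are equal, so the output is just the average of the two value rows.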

The fourth step is training the model. The architecture designed in step three is trained on the data from steps one and two. You need GPUs to do this. Once you start it, training runs automatically; you just wait until it's done.
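The "start it and wait" part is a loop that repeatedly predicts, measures the error, and nudges the parameters. Here is a toy sketch of that loop, assuming a two-parameter linear model instead of a transformer and plain stochastic gradient descent. The `train` function, the data, and the hyperparameters are made up for illustration; real training uses a framework such as PyTorch or JAX running on GPUs.

```python
def train(data, epochs=500, lr=0.05):
    # Parameters of a toy model: prediction = w * x + b.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            # Gradient step on the squared error 0.5 * err**2.
            w -= lr * err * x
            b -= lr * err
    return w, b

# Toy dataset generated from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(data)
```

After the loop finishes, `w` and `b` end up close to 2 and 1, the values used to generate the toy data.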

Now you have a baseline AI. The fifth step is fine-tuning the model. You can fine-tune your model on outputs from a more advanced model to improve it; this was shown by Stanford's Alpaca project a few weeks ago. After that, the sixth step is RLHF (reinforcement learning from human feedback). The ranking part can be done by people without technical knowledge. The model is asked a question (by the user or auto-generated), it generates multiple answers, and the user ranks them from worst to best. This teaches the model which answers are good and which aren't. This is basically aligning the model.
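As a rough sketch of how those rankings turn into a training signal: a common approach (not spelled out above, so treat it as an assumption) is to split each worst-to-best ranking into "preferred vs. rejected" pairs and train a separate reward model with a pairwise loss. The reward model's scores then guide the reinforcement learning step. The function names below are hypothetical.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_worst_to_best):
    # combinations() keeps input order, so in each (worse, better) tuple
    # the second item was ranked higher by the human.
    return [
        (better, worse)
        for worse, better in combinations(ranked_worst_to_best, 2)
    ]

def pairwise_loss(reward_preferred, reward_rejected):
    # -log(sigmoid(diff)), written as log(1 + exp(-diff)).
    # Small when the preferred answer scores well above the rejected one.
    diff = reward_preferred - reward_rejected
    return math.log1p(math.exp(-diff))

pairs = ranking_to_pairs(["bad answer", "ok answer", "great answer"])
```

Each pair is `(preferred, rejected)`. The loss shrinks as the reward model learns to score the preferred answer higher.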

After those six steps, you have a finished AI model.
