ustainbolt

ustainbolt t1_je7plqi wrote

For a 65B model you will probably have to parallelise the model parameters. See this link. As for training, it would be best to use a VM (any provider will work; Lambda and vast.ai are cheap). I would recommend a 4x (or 8x) A100 machine. I'm sure you can find more information about all of this.
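A rough back-of-the-envelope sketch of why the parameters have to be split across GPUs. It only counts the weights, assuming 2 bytes per parameter (fp16/bf16) and the 40 GB and 80 GB A100 variants. The function names are made up for illustration. Gradients, optimizer state and activations come on top of this, so treat the result as a lower bound rather than a sizing guide.

```python
import math

def weights_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

def min_gpus_for_weights(n_params: float, bytes_per_param: int, gpu_gb: float) -> int:
    """Smallest number of GPUs whose combined memory fits the weights.

    Lower bound only: ignores gradients, optimizer state and activations.
    """
    return math.ceil(weights_gb(n_params, bytes_per_param) / gpu_gb)

N_PARAMS = 65e9        # 65B parameters
BYTES_PER_PARAM = 2    # assumption: fp16/bf16 weights

print(weights_gb(N_PARAMS, BYTES_PER_PARAM))                # GB for weights only
print(min_gpus_for_weights(N_PARAMS, BYTES_PER_PARAM, 80))  # 80 GB A100
print(min_gpus_for_weights(N_PARAMS, BYTES_PER_PARAM, 40))  # 40 GB A100
```

Since the fp16 weights alone exceed the memory of a single A100, some form of model parallelism (or offloading) is unavoidable before training overhead is even considered.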

31

ustainbolt t1_ixrqhoy wrote

Could anyone help me by suggesting an architecture that might work for my problem? I've tried many (neural networks and otherwise) but haven't made much progress.

My data consists of groups of 10 people. Each person has attached to them a number x_1 (identifying which person they are; ~1000 possibilities) and an integer x_2 (which can take one of ~150 different values). The group of 10 people then attempts to complete a task; if they are successful the data is labelled with 1, otherwise it is labelled with 0.

You could think of the task as playing a football match (5v5) and x_2 as the position that they choose to play on the field.
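To make the data layout concrete, here is a minimal sketch of how one sample could be represented. The class and function names are hypothetical, and the sizes (1000 and 150) are the approximate counts mentioned above, with both x_1 and x_2 assumed to be re-indexed to start at 0.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_PEOPLE = 1000  # approximate number of distinct x_1 values
NUM_X2 = 150       # approximate number of distinct x_2 values
GROUP_SIZE = 10

@dataclass(frozen=True)
class GroupSample:
    """One labelled example: 10 (x_1, x_2) pairs plus a binary outcome."""
    members: Tuple[Tuple[int, int], ...]
    label: int

    def __post_init__(self) -> None:
        if len(self.members) != GROUP_SIZE:
            raise ValueError(f"expected {GROUP_SIZE} members, got {len(self.members)}")
        for x1, x2 in self.members:
            if not (0 <= x1 < NUM_PEOPLE and 0 <= x2 < NUM_X2):
                raise ValueError(f"index out of range: ({x1}, {x2})")
        if self.label not in (0, 1):
            raise ValueError("label must be 0 or 1")

def to_index_lists(sample: GroupSample) -> Tuple[List[int], List[int], int]:
    """Split a sample into parallel x_1 and x_2 index lists and its label."""
    x1 = [m[0] for m in sample.members]
    x2 = [m[1] for m in sample.members]
    return x1, x2, sample.label

sample = GroupSample(members=tuple((i, 2 * i) for i in range(GROUP_SIZE)), label=1)
x1_ids, x2_ids, y = to_index_lists(sample)
```

Both x_1 and x_2 are categorical here, so the integers are identifiers rather than magnitudes.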

Does this remind anyone of a particular class of problem?

1