
currentscurrents OP t1_j674tf3 wrote

They have some downsides though. HOGWILD! requires a single shared-memory machine, and Horovod requires every machine to hold a copy of the entire model.
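Roughly what I mean by the HOGWILD! pattern (a PyTorch-style sketch, not their actual code, and the toy model/data here are just stand-ins): every worker writes lock-free into one shared copy of the parameters, which is exactly why it only works inside a single machine's memory.

```python
import torch
import torch.multiprocessing as mp
import torch.nn as nn

def worker(model, steps=100):
    # each process takes optimizer steps on the SAME shared parameters, no locks
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(steps):
        x = torch.randn(32, 10)   # stand-in for a real minibatch
        y = torch.randn(32, 1)
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()                # lock-free write into shared memory

if __name__ == "__main__":
    model = nn.Linear(10, 1)
    model.share_memory()          # parameters live in shared memory on one box
    procs = [mp.Process(target=worker, args=(model,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```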

A truly local training method would mean your model could be as big as all the machines put together. That order-of-magnitude increase in size could outweigh the poorer performance of forward-forward learning.
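To show why local learning changes the picture, here's a rough forward-forward-style sketch (heavily simplified from Hinton's paper, e.g. the input normalization between layers is omitted): each layer trains against its own "goodness" objective with its own optimizer, gradients never cross layer boundaries, so in principle each layer could sit on a different machine.

```python
import torch
import torch.nn as nn

class LocalLayer(nn.Module):
    def __init__(self, d_in, d_out, threshold=2.0):
        super().__init__()
        self.fc = nn.Linear(d_in, d_out)
        self.opt = torch.optim.SGD(self.parameters(), lr=0.03)
        self.threshold = threshold

    def forward(self, x):
        return torch.relu(self.fc(x))

    def local_step(self, x_pos, x_neg):
        # "goodness" = mean squared activation; push it above the threshold
        # for positive data and below it for negative data
        g_pos = self.forward(x_pos).pow(2).mean(dim=1)
        g_neg = self.forward(x_neg).pow(2).mean(dim=1)
        loss = torch.log1p(torch.exp(torch.cat([
            self.threshold - g_pos,   # positive samples: want high goodness
            g_neg - self.threshold,   # negative samples: want low goodness
        ]))).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # detach so no gradient ever flows to the next layer
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()

layers = [LocalLayer(784, 256), LocalLayer(256, 256)]
x_pos, x_neg = torch.randn(64, 784), torch.randn(64, 784)  # stand-in data
for layer in layers:
    x_pos, x_neg = layer.local_step(x_pos, x_neg)           # purely local updates
```

Since the only thing passed between layers is detached activations, that hand-off could just as well be a network message between machines instead of a function call.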

No idea how you'd handle machines coming and going; you'd have to dynamically resize the network somehow. There are still other unsolved problems to work out before we could have a GPT@home.
