Comments


SeucheAchat9115 t1_izdbkkz wrote

Try using smaller subsets of your data. It is very likely that performance will then scale with the amount of data.
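
For example, a minimal sketch of carving out a fixed random subset with PyTorch (the 10% fraction and the Dataset setup are my assumptions, not something from the thread):

```python
# Hypothetical sketch: a fixed random subset for fast iteration.
import torch
from torch.utils.data import Subset

def make_subset(dataset, fraction=0.1, seed=0):
    # Fix the seed so every experiment sees the same subset and
    # results stay comparable across runs.
    g = torch.Generator().manual_seed(seed)
    n = int(len(dataset) * fraction)
    idx = torch.randperm(len(dataset), generator=g)[:n]
    return Subset(dataset, idx.tolist())
```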

11

fasttosmile t1_izgxj4n wrote

Careful. There are literally dozens of language-modeling papers that report an improvement on PTB (Penn Treebank) which does not scale to larger datasets.

3

farmingvillein t1_izi021q wrote

True, but no one has really come up with a better methodology.

The best you can do is train on smaller data + make sure that you can tell yourself a story about how the new technique will still help when data is scaled up (and then hope that you are right).

(The latter is certainly an argument for staying at least semi-current with the literature, as it will help you get an intuition for what might scale up and what probably won't.)

2

SeucheAchat9115 t1_izdbmzj wrote

Or you could compare your training runs after, e.g., two epochs and only run the best ones for the full 500 epochs.
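
A minimal sketch of that two-stage triage (run_experiment is an assumed helper that trains a config for a given number of epochs and returns validation loss; keeping the top 3 is an arbitrary choice):

```python
# Hypothetical sketch: screen every config cheaply, then train only the best.
def triage(configs, run_experiment, screen_epochs=2, full_epochs=500, keep=3):
    # Stage 1: a cheap screening run for every candidate.
    scores = {name: run_experiment(cfg, epochs=screen_epochs)
              for name, cfg in configs.items()}
    # Stage 2: train only the `keep` lowest-loss candidates to completion.
    best = sorted(scores, key=scores.get)[:keep]
    return {name: run_experiment(configs[name], epochs=full_epochs)
            for name in best}
```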

1

VirtualHat t1_izczlg3 wrote

I have a system where I can go from idea to initial results in two hours and full results by the next day. I've found a short loop like this critical for testing the hundreds of ideas that come to mind.

4

1bir t1_izelhl5 wrote

>I have a system where I can go from idea to initial results in 2-hours

I think the OP is asking for a description of that...

10

VirtualHat t1_izfu724 wrote

I use three scripts.

train.py (which trains my model)

worker.py (which picks up the next job and runs it using train.py)

runner.py (which is basically a list of jobs and code to display what's happening).

I then have multiple machines running multiple instances of worker.py. When a new job is created, the workers see it and start processing it. Work is broken into 5-epoch blocks, and at the end of each block, a new job from the priority queue is selected.

This way I can simply add a new job, and within 30 minutes or so one of the workers will finish its current block and pick it up. Also, because of the chunking, I get early results on all the jobs rather than having to wait for them to finish. This is important, as I often know early on whether a job is worth finishing.

I evaluate the results in a Jupyter notebook using the logs that each job creates.
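
A simplified sketch of the worker loop (the JSON job-file format and train.py's --config/--epochs flags here are illustrative, not the exact code):

```python
# Simplified worker.py sketch: job files are JSON with a priority and an
# epoch counter; train.py is assumed to checkpoint and resume on its own.
import json
import pathlib
import subprocess
import time

PENDING = pathlib.Path("jobs/pending")
BLOCK_EPOCHS = 5     # work is chunked into 5-epoch blocks
TOTAL_EPOCHS = 500

def next_job():
    """Return the path of the highest-priority pending job, or None."""
    jobs = sorted(PENDING.glob("*.json"),
                  key=lambda p: json.loads(p.read_text())["priority"],
                  reverse=True)
    return jobs[0] if jobs else None

while True:
    path = next_job()
    if path is None:
        time.sleep(60)   # queue is empty; poll again shortly
        continue
    job = json.loads(path.read_text())
    # Run one 5-epoch block; train.py resumes from its latest checkpoint.
    subprocess.run(["python", "train.py",
                    "--config", job["config"],
                    "--epochs", str(job["epoch"] + BLOCK_EPOCHS)],
                   check=True)
    job["epoch"] += BLOCK_EPOCHS
    if job["epoch"] >= TOTAL_EPOCHS:
        path.unlink()                     # finished; drop it from the queue
    else:
        path.write_text(json.dumps(job))  # requeue for the next block
```

A real multi-machine setup also needs some form of job locking so two workers don't grab the same file at once.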

edit: fixed links.

5

moyle t1_izgsce9 wrote

Guild.ai can easily automate this process. I really recommend checking it out.

3

RSchaeffer t1_izgxqod wrote

These links don't work for me. Can you double check them?

2

thundergolfer t1_izgyu6x wrote

They're not actually links; they've just been formatted like they are. They point to train.py, which is not a website.

3

VirtualHat t1_izjmbm0 wrote

Oh my bad, didn't realise Reddit automatically created links when writing abc.xyz. I've edited the reply to include links to my code.

2

AmalgamDragon t1_izfizm8 wrote

Pics or it didn't happen (i.e. please share the details of this system).

2

iamr0b0tx t1_izgdmkj wrote

Check out Weights & Biases; I believe it can help you manage multiple experiments. As for speed, you may be able to run them concurrently once you have them all set up separately. And as someone already mentioned, you can use a smaller dataset to make the process faster.
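
For instance, a minimal W&B logging sketch (project name and the training stub are placeholders; assumes you've run `wandb login`):

```python
# Minimal Weights & Biases sketch (pip install wandb).
import wandb

def train_one_epoch():
    return 1.0  # stand-in for a real training step

run = wandb.init(project="idea-triage", config={"lr": 3e-4, "subset": 0.1})
for epoch in range(5):
    wandb.log({"epoch": epoch, "val_loss": train_one_epoch()})
run.finish()
```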

2

mlisnifty t1_izk4hvw wrote

Yeah, I'd keep the data lineage for each project stored in something like CometML. I'd probably create a different project for each idea, so multiple training runs would live in each project. Then you've got all the graphics you need to compare models in the same project, plus hyperparameters, code, dependencies, and data, all ready for you if you decide to come back to one of the projects after chasing something else for a month.
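
A minimal sketch of that per-idea layout with CometML (project and metric names are placeholders; assumes COMET_API_KEY is set in the environment):

```python
# Minimal CometML sketch (pip install comet_ml); one project per idea.
from comet_ml import Experiment

def train_one_epoch():
    return 1.0  # stand-in for a real training step

exp = Experiment(project_name="idea-attention-variant")
exp.log_parameters({"lr": 3e-4, "batch_size": 64})
for epoch in range(5):
    exp.log_metric("val_loss", train_one_epoch(), epoch=epoch)
exp.end()
```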

2

thundergolfer t1_izgiaa4 wrote

I'm sorry to shill, but Modal.com is easily the best thing for this. Here's a demo video showing how fast you can edit code, run it in the cloud, and then edit it some more, all in a handful of seconds.

I was the ML Platform lead at Canva and quick iteration was the #1 pain point of our data scientists and MLEs. I left Canva to join Modal because it can do heavy serverless compute and keep your inner dev loop tight.

Again, sorry to shill, but I've been in this sub for like 8 years and think tools like Modal and Metaflow are finally getting us to a place where ML development isn't a painful mess.
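
To give a sense of the shape of it, a minimal sketch against Modal's Python API (simplified; check the current docs for exact details):

```python
# Minimal Modal sketch: a local entrypoint dispatching work to a cloud GPU.
import modal

app = modal.App("quick-experiments")

@app.function(gpu="T4")
def train(lr: float) -> float:
    # Your training code runs here, in the cloud, on a GPU.
    return lr * 2  # stand-in result

@app.local_entrypoint()
def main():
    # `modal run thisfile.py` runs this locally and `train` remotely.
    print(train.remote(lr=3e-4))
```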

1

GinoAcknowledges t1_izl699d wrote

This is great. I would encourage my organization to use this, except the restriction to T4 GPUs renders this somewhat unusable for us. What’s the ETA on more modern GPUs?

1