Submitted by alkibijad t3_10raouh in MachineLearning

I'm using Hugging Face's transformers library regularly for experimentation, but I plan to deploy some of the models to iOS.

I found Apple's ml-ane-transformers repo, which shows how transformers can be rewritten to run much faster on Apple devices. It includes an example of DistilBERT implemented in that optimized way.
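For context, the core change described in Apple's write-up accompanying that repo is a layout one: linear projections are expressed as 1x1 convolutions over activations stored as (batch, channels, 1, sequence) instead of (batch, sequence, channels). Here's a minimal NumPy sketch of that equivalence; the names and shapes are illustrative, not Apple's actual API.

```python
# Sketch of the core layout change behind Apple's ANE-optimized transformers:
# linear layers become 1x1 convolutions over tensors in (B, C, 1, S) layout.
# Plain NumPy stand-in; names and shapes here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
B, S, C_in, C_out = 2, 8, 16, 32          # batch, sequence, in/out channels

x = rng.standard_normal((B, S, C_in))     # usual (B, S, C) activation layout
W = rng.standard_normal((C_out, C_in))    # nn.Linear-style weight
b = rng.standard_normal(C_out)

# Reference: ordinary linear projection, (B, S, C_in) -> (B, S, C_out)
y_linear = x @ W.T + b

# ANE-friendly version: move channels to axis 1, add a singleton height axis,
# then apply the same weights as a 1x1 "convolution" (an einsum here).
x_ane = x.transpose(0, 2, 1)[:, :, None, :]                    # (B, C_in, 1, S)
y_ane = np.einsum("oc,bchs->bohs", W, x_ane) + b[None, :, None, None]

# Both layouts compute the same projection.
assert np.allclose(y_linear, y_ane[:, :, 0, :].transpose(0, 2, 1))
```

The point is that the rewrite changes how the computation is expressed (so the Neural Engine's preferred data layout is used), not what is computed.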

As I plan to deploy transformers to iOS, I started thinking about this. I'm hoping some of you already have experience with this, so we can discuss:

  • Has anyone tried this themselves? Do they actually see the improvements in performance on iOS?
  • I'm using Hugging Face's transformer models in my experiments. How much work do you think it is to rewrite a model in this optimized way?
  • It's very difficult to train transformers from scratch (especially if they're big :) ), so I'm fine-tuning on top of pretrained models from Hugging Face. Is it possible to use weights from pretrained Hugging Face models with Apple's reference code? How difficult is it?
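On the last bullet: Apple's repo README does show loading a pretrained Hugging Face DistilBERT checkpoint into its optimized model class. The reason this can work at all is that the rewrite mostly swaps nn.Linear for 1x1 nn.Conv2d, so each pretrained (out, in) linear weight maps to a (out, in, 1, 1) conv kernel by a reshape. A hedged NumPy sketch of just that mapping (key names below are made up for illustration, not the actual DistilBERT state dict):

```python
# Illustrative sketch: mapping Linear-style pretrained weights into the
# 1x1-Conv2d layout an ANE-oriented rewrite expects. Key names are
# hypothetical, not the real Hugging Face DistilBERT state dict.
import numpy as np

def linear_to_conv2d(state_dict):
    """Reshape 2-D Linear weights into 1x1 Conv2d kernels; everything else passes through."""
    converted = {}
    for key, value in state_dict.items():
        if key.endswith(".weight") and value.ndim == 2:
            converted[key] = value[:, :, None, None]   # (O, C) -> (O, C, 1, 1)
        else:
            converted[key] = value
    return converted

hf_like = {
    "attention.q_lin.weight": np.zeros((64, 64)),   # Linear-style weight
    "attention.q_lin.bias": np.zeros(64),           # bias is unchanged
}
ane_like = linear_to_conv2d(hf_like)
assert ane_like["attention.q_lin.weight"].shape == (64, 64, 1, 1)
assert ane_like["attention.q_lin.bias"].shape == (64,)
```

In practice you'd rely on Apple's own classes and loading code rather than a hand-rolled converter like this; the sketch is only meant to show why no retraining is needed to reuse the weights.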

Comments


TheDeviousPanda t1_j6vv0my wrote

I hate to do this to you, but I have been in your position and I have answers to all your questions.

  • Yes, yes
  • A lot
  • Yes, very

alkibijad OP t1_j6w7lo3 wrote

That was not the answer I was hoping for, but it's very helpful :)
Do you have any code/repo to share? I've only been able to find the DistilBERT implementation in Apple's repo; I'd like to see some other examples.


alkibijad OP t1_j6wuvnc wrote

Can you please elaborate on your answers and quantify?
I'm most interested in the effort for bullets 2 and 3. In your own experience, did it take hours, days, or weeks?


Competitive-Rub-1958 t1_j6z8a7t wrote

For someone who simply wants to use the ANE (I haven't bought one, just considering it) to test out bare-bones models locally for research purposes before finally training them in the cloud (I find remote debugging quite frustrating): how good is the support in containerization solutions like Singularity? Does it even leverage the ANE?

I know the speedup won't be anything drastic, but if it helps (i.e., it's faster and more resource-efficient than the CPU/GPU), that just translates to a lower time-to-iterate anyway...

So for someone using plain PyTorch (with a few bells and whistles), how much of a pain would it be?


vade t1_j6xeylg wrote

FWIW, a colleague of mine is working on this and is also hitting some hiccups. I've pointed them to this thread :)


alkibijad OP t1_j7f1b2e wrote

Looking forward to hearing their experiences!
