
askingforhelp1111 t1_j81ggm0 wrote

Sure, I have a few links. All of them have an inference time of 4-9 seconds.

https://huggingface.co/poom-sci/WangchanBERTa-finetuned-sentiment

https://huggingface.co/ayameRushia/bert-base-indonesian-1.5G-sentiment-analysis-smsa

I call each checkpoint like this:

    from transformers import pipeline

    # checkpoint is one of the model IDs linked above
    nlp = pipeline('sentiment-analysis',
                   model=checkpoint,
                   tokenizer=checkpoint)

Thank you!


coolmlgirl t1_j8fmfpi wrote

I used the OctoML platform (https://octoml.ai/) to optimize your model and got the average inference latency down to 2.14 ms on an AWS T4 GPU. On an Ice Lake CPU I can get the latency down to 27.47 ms. I'm assuming shapes of [1, 128] for your inputs "input_ids", "attention_mask", and "token_type_ids", but I want to confirm your actual shapes so that we're comparing apples to apples. Do you know what shapes you're using?
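
If it helps for comparing numbers, here's a minimal stand-alone sketch of how you could measure average latency on your end (this is just a generic timing helper, not the OctoML tooling; the function name and defaults are my own):

    import time

    def average_latency_ms(fn, *args, warmup=3, runs=20):
        """Return the average wall-clock latency of fn(*args) in milliseconds."""
        # Warm-up calls so one-time costs (model loading, caches) don't skew the average.
        for _ in range(warmup):
            fn(*args)
        start = time.perf_counter()
        for _ in range(runs):
            fn(*args)
        return (time.perf_counter() - start) * 1000.0 / runs

You could then call something like `average_latency_ms(nlp, "some sample text")` on your pipeline to get a number measured the same way.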
