
WonderFactory t1_jd2vrsy wrote

I've been testing this approach today and it works well. My aim is to reduce the number of tokens used, and therefore the cost, when calling the API. Punctuation counts as tokens, which is annoying, so all the : and , characters add to the cost.
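If it helps, here's a rough sketch of how you could compare token counts with and without punctuation using OpenAI's tiktoken library (the model name and example strings are just placeholders; use whatever you're actually sending):

```python
# Rough sketch: comparing token counts of a punctuation-heavy string
# versus a stripped-down one, using OpenAI's tiktoken (pip install tiktoken).
import tiktoken

# Pick the encoding for the model you actually call.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

with_punct = "name: Alice, age: 30, city: Paris"
without_punct = "name Alice age 30 city Paris"

print(len(enc.encode(with_punct)))     # token count with punctuation
print(len(enc.encode(without_punct)))  # token count without punctuation
```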

2

Lesterpaintstheworld OP t1_jd33hvc wrote

Yep, overall we found the approach very effective. I'm wondering how long it will stay relevant with prices falling, though.

1

KerfuffleV2 t1_jd57jq9 wrote

Be sure you're looking at the number of tokens when you're considering conciseness, since that's what actually matters. I.e., an emoji may have a compact representation on screen, but that doesn't necessarily mean it tokenizes efficiently.

For example, "🧑🏾‍🚀" from one of the other comments is actually 11 tokens, while the word "person" is just one token.

You can experiment here: https://platform.openai.com/tokenizer (non-OpenAI models will likely use a different tokenizer or tokenize text differently, but it will at least help you get an idea).
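If you'd rather script it than paste into the web page, something like this works with tiktoken (this assumes the cl100k_base encoding; the exact counts depend on which encoding your model uses):

```python
# Small sketch: checking token counts programmatically with tiktoken.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI chat models

for text in ["person", "🧑🏾‍🚀"]:
    tokens = enc.encode(text)
    print(f"{text!r}: {len(tokens)} tokens")
```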

Also relevant: these models are trained to autocomplete text according to the probabilities in the text they were trained on. If you start using, or asking them to generate, text in an unusual format, it may well cause them to produce much lower quality answers (or to understand less of what the user says).

3