Tgs91 t1_irxr8oa wrote

Depends on the subject matter. I was playing with summarizers recently and they are VERY sensitive to writing style. For example, models trained on news articles will perform poorly on fiction and be nearly unreadable on technical documents.

Idk what models are out there for Turkish, but finding a Turkish model that is also trained on the right writing style will be difficult. If you only need to summarize a paragraph, that opens up possibilities for you. Regular transformers (like all of the models based on BERT or BART) can only handle a short maximum input (512 tokens for BERT, 1024 for BART; tokens are words or pieces of words). The length you can feed them is limited, but they've been around since 2017, so there are a lot of options. If you need longer inputs, you'll want a newer architecture like Longformer, which handles up to 4,096 tokens.
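Rough sketch of what that looks like with the Huggingface pipeline. The model name here is just an English placeholder; for Turkish you'd swap in whatever mT5/mBART-style checkpoint you can find on the Hub:

```python
# pip install transformers torch
from transformers import pipeline

# "facebook/bart-large-cnn" is an English news-trained model, used here
# only as a placeholder -- replace with a Turkish checkpoint from the Hub.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = "..."  # your paragraph; BART's input window is 1024 tokens

result = summarizer(
    text,
    max_length=130,   # cap on generated summary length (in tokens)
    min_length=30,
    truncation=True,  # silently drops input past the model's window
)
print(result[0]["summary_text"])
```

Note the `truncation=True`: anything past the model's window just gets cut off, which is exactly why a single paragraph is a much easier target than a long document.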

But no matter what you use, factual accuracy is a major problem for abstractive summarization models. They compress the input into an internal representation, then generate the output from that representation, a bit like handing someone a writing prompt and having them write a story. There's really no way to guarantee the output is faithful to the input. Extractive summarization isn't as state-of-the-art, but it's more reliable if accuracy matters for your use case: it looks for sentences within the input that work well as a summary and returns them verbatim. So if you feed it 2 pages of text, it can find the 4-5 sentences that best summarize those 2 pages. This also might be a more robust approach if you can't find a Turkish model that's a good fit for your use case. Huggingface doesn't have good support for extractive summarization, but I've been using the bert-extractive-summarizer package, which is built on top of Huggingface and can import Huggingface models (sketch below).
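Here's roughly how you'd wire a Turkish BERT into it. I'm assuming "dbmdz/bert-base-turkish-cased" (BERTurk) as the encoder; check the Hub for whatever checkpoint best fits your domain:

```python
# pip install bert-extractive-summarizer transformers torch
from summarizer import Summarizer
from transformers import AutoConfig, AutoModel, AutoTokenizer

# BERTurk is an assumption here -- any Hub encoder for your language works.
model_name = "dbmdz/bert-base-turkish-cased"

config = AutoConfig.from_pretrained(model_name)
config.output_hidden_states = True  # the summarizer clusters hidden states
tokenizer = AutoTokenizer.from_pretrained(model_name)
bert = AutoModel.from_pretrained(model_name, config=config)

model = Summarizer(custom_model=bert, custom_tokenizer=tokenizer)

document = "..."  # your 2 pages of text
# Pull out the 5 sentences that best represent the whole document
print(model(document, num_sentences=5))
```

Since the output sentences are copied verbatim from the input, they stay in Turkish and can't hallucinate facts; the encoder only needs to embed Turkish well enough to pick good sentences.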

8

CeFurkan OP t1_irxsv3j wrote

Thanks very much for the detailed reply. I guess in that case I should try each one and see which generates the best summary.

1