bortlip t1_j2f16du wrote

That has not been my experience.

If you expect it to hand you 100% working code from minimal instruction, it might not deliver (although sometimes it does). But if you work with it even a little, it's pretty amazing.

Example: I was reading this article about text summarization in Python. The article listed this code:

import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
from heapq import nlargest

def summarize(text, per):
    # Tokenize the document with spaCy
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    # Count how often each non-stopword, non-punctuation token appears
    word_frequencies = {}
    for word in doc:
        if word.text.lower() not in STOP_WORDS:
            if word.text.lower() not in punctuation:
                if word.text.lower() not in word_frequencies:
                    word_frequencies[word.text.lower()] = 1
                else:
                    word_frequencies[word.text.lower()] += 1
    # Normalize the frequencies to the range [0, 1]
    max_frequency = max(word_frequencies.values())
    for word in word_frequencies:
        word_frequencies[word] = word_frequencies[word] / max_frequency
    # Score each sentence by summing its words' normalized frequencies
    sentence_tokens = [sent for sent in doc.sents]
    sentence_scores = {}
    for sent in sentence_tokens:
        for word in sent:
            if word.text.lower() in word_frequencies:
                if sent not in sentence_scores:
                    sentence_scores[sent] = word_frequencies[word.text.lower()]
                else:
                    sentence_scores[sent] += word_frequencies[word.text.lower()]
    # Keep the top-scoring fraction (per) of sentences and join them
    select_length = int(len(sentence_tokens) * per)
    summary = nlargest(select_length, sentence_scores, key=sentence_scores.get)
    final_summary = [sent.text for sent in summary]
    return ' '.join(final_summary)

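If you want to try it, here's a minimal usage sketch (the sample text is mine, not the article's, and it assumes spaCy plus the en_core_web_sm model are installed):

text = (
    "Machine learning is a field of inquiry devoted to understanding "
    "and building methods that learn. It is seen as a part of artificial "
    "intelligence. Machine learning algorithms build a model based on "
    "sample data. That data is known as training data."
)

# per=0.5 keeps the top-scoring half of the sentences
print(summarize(text, per=0.5))
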
I asked ChatGPT what it did:

At a high level, this code defines a function that takes in a piece of text and a percentage value, and produces a summary of the text by:

  1. Tokenizing the text and calculating the frequency of each word in the document
  2. Normalizing the word frequencies to the range [0, 1]
  3. Tokenizing the text into sentences and calculating a score for each sentence based on the normalized word frequencies of the words it contains
  4. Selecting the highest-scoring sentences based on the specified percentage of the total number of sentences
  5. Returning the selected sentences as a single summary string

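To make steps 2 and 3 concrete, here's a toy walkthrough with made-up counts (my illustration, not from the article):

# Step 2: normalize made-up word counts to [0, 1]
word_frequencies = {'cat': 4, 'sat': 2, 'mat': 1}
max_frequency = max(word_frequencies.values())
word_frequencies = {w: f / max_frequency for w, f in word_frequencies.items()}
# -> {'cat': 1.0, 'sat': 0.5, 'mat': 0.25}

# Step 3: score a sentence by summing its words' normalized frequencies
sentence = ['the', 'cat', 'sat']
score = sum(word_frequencies.get(w, 0) for w in sentence)
# -> 1.5 ('the' is a stopword with no entry, so it contributes 0)
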
I then asked it to write the same code in C#, and it gave me nearly perfect LINQ (I only had to add one cast):

// 'text', 'per', and 'stopWords' come from the surrounding method.
// Steps 1-2: count word frequencies (skipping stop words and punctuation)
// and normalize them to [0, 1]; the Length check guards against empty
// tokens from the split
var words = Regex.Split(text, @"\s+")
    .Where(x => x.Length > 0 && !stopWords.IsStopWord(x.ToLower()) && !char.IsPunctuation(x[0]))
    .GroupBy(x => x)
    .ToDictionary(g => g.Key, g => (double)g.Count());

var maxCount = words.Max(x => x.Value);
words = words.ToDictionary(x => x.Key, x => (double)x.Value / maxCount);

// Step 3: split the text into sentences and score each one by summing
// the normalized frequencies of its words
var sentences = Regex.Split(text, @"(?<=[\.!\?])\s+")
    .Select(x => x.Trim())
    .ToArray();

var scores = sentences
    .Select(x => x.Split(' ')
        .Select(y => words.ContainsKey(y) ? words[y] : 0)
        .Sum())
    .ToArray();

// Steps 4-5: keep the top-scoring fraction of sentences and join them
var summary = string.Join(" ", sentences
    .Zip(scores, (s, c) => new { Sentence = s, Score = c })
    .OrderByDescending(x => x.Score)
    .Take((int)(per * sentences.Length))
    .Select(x => x.Sentence));

To me, that's a pretty high level of complexity for it to handle.
