Submitted by What_The_Hex t3_105d2w5 in MachineLearning

Let's say there's a whole paragraph of text, 90% of which is irrelevant fluff for my needs. What I'm specifically looking to do is isolate the one key piece of information that meets certain criteria, which I can somehow specify.

As an example, let's say I have 1,000 paragraphs that are brief biographies of famous people from history. Is there any kind of AI tool I can use to say something like: "For each of these biographies, IF it includes information about where this famous person was born, isolate this piece of information and output only that as the result"?

Then it just runs through every single paragraph, conducts the analysis, finds the paragraphs that DO contain this information -- then outputs ONLY that as the result, for each paragraph?

For example, full paragraph 1:

"Teddy Roosevelt was an American President who had a variety of notable accomplishments. He was famous for his daring spirit of adventurism, and his boldness in all areas of life. Renowned for his boundless energy, he is commonly cited as a classic example of a political figure whose work ethic was truly unrelenting. He was born in Antarctica, and died on planet Neptune."

AI output:

"Teddy Roosevelt was born in Antarctica."

Or another possible application. Say I have another pool of paragraphs, which are full abstracts of scientific studies. Imagine I want to use an AI tool to ONLY extract a very concise conclusion of each study, and output just that: "This study concluded that fruit flies, if exposed to a diet of Cheetos and Pepsi Cola, will grow to be 8 meters long."

The best I've found so far are a bunch of text tools that merely allow you to summarize larger chunks of text. None of them, from what I've seen, allow you to put in place a criterion that says: "Don't just summarize; scan the text for the bits of information that meet a certain criterion (e.g., it's the conclusion of the study; it's where this person was born; etc.), then output THAT as the summarized result."

If such a tool exists, it would be extremely valuable for some projects I'm currently working on.

Thanks!

7

Comments

What_The_Hex OP t1_j3a8bz8 wrote

I've seen a lot of memes lately about ChatGPT. Believe it or not, I'm testing it for this specific task, and holy fuck, it is absolutely nailing it. I can even go micro-specific with my requests, and ask stuff like: "Summarize this novel in fewer than 10 words", and it absolutely fucking nails it. VERY fucking cool!

4

Bart-o-Man t1_j3agboz wrote

Yea, it gets even better.

Ask it to: Summarize quantum mechanics in a short Shakespearian Sonnet.

Or ask it to: Write a 10 paragraph screenplay in which people argue over which programming language is better: C++ or Python. And make the dialog rhyme while using words and sentences that sound like they are from southern Texas.

It's mindblowing.

5

What_The_Hex OP t1_j3ah4z9 wrote

Absolutely bonkers what's possible these days. I see shit like this and I wonder how every computer programmer is not a millionaire XD

2

Bart-o-Man t1_j3kowba wrote

LOL. No kidding. But then sometimes I look at it and get a little bit nervous, since these tools also write code!

1

What_The_Hex OP t1_j3l3fe5 wrote

Dude it is just such a valuable tool. Sky's the fucking limit, truly. Computer programming + AI = like The PROMISED Land of just, massive, massive leverage and automation. Like, click one button, run one program, and you can move fucking mountains with what those two working in tandem are capable of.

2

Bart-o-Man t1_j40r189 wrote

Yea, that's no exaggeration. The hardest part is continually reminding yourself to keep trying new things to push it further.

I asked GitHub Copilot to write a couple of Python functions, and I was pretty impressed. I don't mean "write a function to add two numbers" or "parse some text". I defined an Nx3 and an Mx3 NumPy array (a & b) and told it (in comments) that a & b were two arrays of 3D points. I asked it to write a very fast function to compute all distances between the points in a and b and return them. It did it immediately and the results were correct, so it's a start.

I had already written and optimized my own function to do it with vectorized NumPy math (no slow loops), and I wanted to know which was faster. When a and b are the same set of points, the result is a symmetric matrix with zeros on the diagonal, so I knew my full outer-difference matrix had to be wasteful.

I benchmarked Copilot's code against my own: its code was 3x faster in large matrix tests.
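For anyone curious what the two approaches can look like, here is a minimal sketch. It is not Copilot's output or my original function; the names `pairwise_distances` and `pairwise_distances_naive` are made up for illustration. The first version avoids building the full difference tensor by expanding the squared distance into dot products, which is one common way to speed this up.

```python
import numpy as np

def pairwise_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distances between rows of a (N, 3) and rows of b (M, 3).

    Returns an (N, M) array. Uses the identity
    |a_i - b_j|^2 = |a_i|^2 + |b_j|^2 - 2 * (a_i . b_j),
    so the heavy lifting is a single matrix product.
    """
    a_sq = np.sum(a * a, axis=1)[:, None]   # shape (N, 1)
    b_sq = np.sum(b * b, axis=1)[None, :]   # shape (1, M)
    sq = a_sq + b_sq - 2.0 * (a @ b.T)      # shape (N, M)
    # Rounding can push tiny values slightly below zero; clamp before sqrt.
    np.maximum(sq, 0.0, out=sq)
    return np.sqrt(sq)

def pairwise_distances_naive(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Same result via an explicit (N, M, 3) outer-difference tensor."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt(np.sum(diff * diff, axis=2))
```

The dot-product version trades a little numerical accuracy for much less memory traffic, so it is worth checking it against the naive version with `np.allclose` before trusting it on your own data.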

The second example: I told it I have a laser with a 700 nm wavelength. I gave it some specs, like the diameter of the laser and an aperture size, and told Copilot to write a function to compute and plot the laser image projection on a plane that was X mm away. It did it first try. It looks something like this image:

Yea... amazing is just the start.

1

WikiSummarizerBot t1_j40r2n0 wrote

Diffraction

>Diffraction is defined as the interference or bending of waves around the corners of an obstacle or through an aperture into the region of geometrical shadow of the obstacle/aperture. The diffracting object or aperture effectively becomes a secondary source of the propagating wave. Italian scientist Francesco Maria Grimaldi coined the word diffraction and was the first to record accurate observations of the phenomenon in 1660. In classical physics, the diffraction phenomenon is described by the Huygens–Fresnel principle that treats each point in a propagating wavefront as a collection of individual spherical wavelets.

0

bearberry21 t1_j3a7cp2 wrote

Prompting summarization could be something to look into

3

PicaPaoDiablo t1_j3ach5f wrote

Tbh this could be done algorithmically pretty easily. There are only a handful of synonyms and phrases that would mean "born" ("grew up", etc.). I think just tokenizing on sentences would suffice, and depending on the qualification criteria it could be expanded pretty easily.
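A rough sketch of that idea, using only the standard library. The keyword list and the function names (`split_sentences`, `extract_matching`) are assumptions for illustration, and the sentence splitter is deliberately naive (it will trip on abbreviations like "Dr.").

```python
import re

# Hypothetical keyword list for the "where was this person born" criterion;
# swap in a different pattern for other criteria.
BIRTH_PATTERN = re.compile(r"\b(born|birthplace|native of|grew up)\b", re.IGNORECASE)

def split_sentences(text):
    """Naive sentence tokenizer: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extract_matching(text, pattern=BIRTH_PATTERN):
    """Return only the sentences that contain one of the keywords."""
    return [s for s in split_sentences(text) if pattern.search(s)]
```

Note that this returns the whole matching sentence, so for the Roosevelt example from the post you would get the full "He was born in Antarctica, and died on planet Neptune." sentence rather than a trimmed-down answer.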

1

ndemir t1_j3ad5f8 wrote

I know it may sound like hype, but it is worth trying GPT-3.

1

What_The_Hex OP t1_j3ae4vf wrote

Is that different from ChatGPT? AI noob here. ChatGPT is CRUSHING it for my needs. What if I could have a version of that which I could use programmatically (versus one-off prompts via the chatbox), via Python or JavaScript? Like, batch the same request 1,000 - 10,000 times? That would be 100% perfect for my needs.
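The batching part can be sketched independently of any particular model. In the snippet below, `complete` is a stand-in for whatever function sends a prompt to a language model and returns its text reply; it is an assumption, not a real library call, and the prompt wording and the "NONE" convention are made up for illustration.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical prompt; {text} is replaced with each paragraph.
PROMPT_TEMPLATE = (
    "If the following biography says where the person was born, "
    "reply with only that fact in one sentence. Otherwise reply NONE.\n\n"
    "Biography:\n{text}"
)

def build_prompt(text: str, template: str = PROMPT_TEMPLATE) -> str:
    """Fill the paragraph into the instruction template."""
    return template.format(text=text)

def batch_extract(
    texts: Iterable[str],
    complete: Callable[[str], str],
    template: str = PROMPT_TEMPLATE,
) -> List[Optional[str]]:
    """Run the same request over every paragraph.

    Returns one entry per input: the model's answer, or None when the
    model replied NONE (the paragraph did not contain the information).
    """
    results: List[Optional[str]] = []
    for text in texts:
        answer = complete(build_prompt(text, template)).strip()
        results.append(None if answer.upper() == "NONE" else answer)
    return results
```

Passing the model call in as a function keeps the loop testable with a fake and lets you add retries or rate limiting around the real call without touching the batching logic.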

1

alivebliss t1_j3ars5u wrote

You might want to check LangChain.

1

nildeea t1_j3betb4 wrote

Put “Summarize this text using only excerpts from the text itself” into ChatGPT along with your text.

1

Just_CurioussSss t1_j3choa8 wrote

Have you tried Named Entity Recognition (NER)? NER identifies and classifies named entities (such as people, organizations, and locations) in text. You could use it to extract the named entities from each paragraph and then filter them based on specific criteria, such as location or birthplace. If you're feeling a bit ambitious, why not try semantic search?

1