I'd like to extract named entities, something like this:

[Text]: Microsoft (the word being a portmanteau of "microcomputer software") was founded by Bill Gates on April 4, 1975, to develop and sell BASIC interpreters for the Altair 8800. Steve Ballmer replaced Gates as CEO in 2000, and later envisioned a "devices and services" strategy.

[Name]: Steve Ballmer

[Position]: CEO

[Company]: Microsoft

I tried it on GPT-NeoX with 20B parameters with mixed success. Is there anything better out there to try for few-shot learning (without fine-tuning)?
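For reference, the few-shot setup described above can be sketched in plain Python. This is only an illustration of the prompt layout, not the exact prompt used in the post: the helper names `build_prompt` and `parse_completion` and the `###` separator between examples are assumptions, and the call to the model itself is left out.

```python
import re

# Field labels come from the example in the post; everything else here
# (function names, the "###" separator) is a hypothetical choice.
FIELDS = ("Name", "Position", "Company")
SEPARATOR = "###"

def build_prompt(examples, text):
    """Format solved examples plus one unsolved text as a few-shot prompt."""
    blocks = []
    for ex in examples:
        lines = [f"[Text]: {ex['Text']}"]
        lines += [f"[{field}]: {ex[field]}" for field in FIELDS]
        blocks.append("\n".join(lines))
    # Leave the first label open so the model continues from there.
    blocks.append(f"[Text]: {text}\n[{FIELDS[0]}]:")
    return f"\n{SEPARATOR}\n".join(blocks)

def parse_completion(completion):
    """Turn the model's continuation back into a dict of fields."""
    # The prompt ended with the first label, so restore it before matching.
    restored = f"[{FIELDS[0]}]:" + completion
    # Ignore anything after the separator in case the model keeps generating.
    restored = restored.split(SEPARATOR)[0]
    found = {}
    for label, value in re.findall(r"\[(\w+)\]:[ \t]*(.*)", restored):
        if label in FIELDS and label not in found:
            found[label] = value.strip()
    return found
```

Passing the string returned by `build_prompt` to any text-completion model and feeding the continuation to `parse_completion` gives one dict per input text. Fields the model did not produce are simply missing from the dict.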

Comments

TankAttack OP t1_j63scac wrote on January 27, 2023 at 2:47 PM

#1,528,493

I also tried pre-trained tools like spaCy, but they only detect a few fixed entity types.

thatphotoguy89 t1_j63zz5q wrote on January 27, 2023 at 3:38 PM

#1,529,573

GPT-J is supposed to be quite good. Do you have a list of the types of entities you'd like to detect?

TankAttack OP t1_j64p2oj wrote on January 27, 2023 at 6:15 PM

#1,533,140

Replying to thatphotoguy89 (#1,529,573)

At this point I would like to imitate the example with position and company. It was taken from GPT-J, btw. I thought NeoX is 3 times bigger, so I tried that first. Will run GPT-J and compare the results now.

Thank you

thatphotoguy89 t1_j64q5hm wrote on January 27, 2023 at 6:22 PM

#1,533,272

Replying to TankAttack (#1,533,140)

You can try extractive QA if you don't want to fine-tune it. Basically, create a QA pipeline and ask the same questions for each text.
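A minimal sketch of that idea, assuming the Hugging Face `transformers` question-answering pipeline is what is meant here. The question wording, the `extract_fields` helper, and the `min_score` cutoff are all hypothetical; the function only relies on the QA callable accepting `question=` and `context=` and returning a dict with `"answer"` and `"score"`, which is what that pipeline returns for a single input.

```python
# Hypothetical question set: one fixed question per field to extract.
QUESTIONS = {
    "Name": "Who became CEO?",
    "Position": "What position is mentioned?",
    "Company": "Which company is the text about?",
}

def extract_fields(text, qa, questions=QUESTIONS, min_score=0.0):
    """Ask every question against one text and collect the answers.

    `qa` is any callable taking question= and context= keyword arguments
    and returning a dict with "answer" and "score" keys.
    """
    fields = {}
    for field, question in questions.items():
        result = qa(question=question, context=text)
        # Drop low-confidence answers instead of returning a wrong span.
        fields[field] = result["answer"] if result["score"] >= min_score else None
    return fields

# Usage with a real model (needs `pip install transformers` and a backend
# such as PyTorch; the default model is downloaded on first use):
#
#   from transformers import pipeline
#   qa = pipeline("question-answering")
#   print(extract_fields(some_text, qa))
```

Because the questions stay fixed, the same dict of questions can be reused across every document. An extractive model can only return a span that appears in the text, so this returns one answer per question.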

LetMeGuessYourAlts t1_j64t0lv wrote on January 27, 2023 at 6:40 PM

#1,533,674

I'm doing something similar to your task. My plan is to use GPT-3's text-davinci-003, as it can do this in Instruct mode without modification. Once I have hundreds to thousands of examples, I'll fine-tune GPT-J on Forefront.ai using what GPT-3 generated, to hopefully cut costs by about 75%.
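One way the generated examples might be stored for later fine-tuning is as JSON Lines, one prompt/completion pair per line. This is a sketch under assumptions: the `to_jsonl` helper and the `prompt`/`completion` key names are illustrative, and the format a particular fine-tuning service expects should be checked against its own documentation.

```python
import json

def to_jsonl(records, fields=("Name", "Position", "Company")):
    """Serialize labeled records as one JSON object per line.

    Each record is a dict with a "Text" key plus one key per field.
    The "prompt"/"completion" key names are an assumption, not a
    requirement of any specific service.
    """
    lines = []
    for rec in records:
        prompt = f"[Text]: {rec['Text']}\n"
        completion = "\n".join(f"[{field}]: {rec[field]}" for field in fields)
        lines.append(json.dumps({"prompt": prompt, "completion": completion}))
    return "".join(line + "\n" for line in lines)
```

Writing the returned string to a file gives a dataset that can be inspected or deduplicated before any fine-tuning run.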

visarga t1_j65iwit wrote on January 27, 2023 at 9:24 PM

#1,537,524

I am using GPT-3 for this kind of stuff, and fine-tuning small models on the data.

TankAttack OP t1_j660bm9 wrote on January 27, 2023 at 11:21 PM

#1,539,855

Replying to thatphotoguy89 (#1,533,272)

Do you mean free text questions? Like zero shot learning? Are there any examples of this?

TankAttack OP t1_j660efo wrote on January 27, 2023 at 11:21 PM

#1,539,861

Replying to visarga (#1,537,524)

How many samples do you use for fine-tuning?

bubudumbdumb t1_j66a4kw wrote on January 28, 2023 at 12:31 AM

#1,541,161

The way you prompt assumes there is a single entity for "name", so you catch "Ballmer" but not "Bill Gates".

Why not BIO tagging each token for each of the entity types?
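To make the BIO suggestion concrete, here is a small sketch of decoding per-token tags into entity spans. The tag names (`NAME`, `COMPANY`, `POSITION`) and the `bio_to_spans` helper are hypothetical, and the tags in the usage example are written by hand; producing them would still need a tagging model.

```python
def bio_to_spans(tokens, tags):
    """Collect (entity_type, text) spans from parallel token and BIO tag lists.

    "B-X" begins an entity of type X, "I-X" continues it, "O" is outside.
    An "I-X" that does not continue the current entity is treated as a start.
    """
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        starts_new = prefix == "B" or (prefix == "I" and etype != current_type)
        if prefix == "O" or starts_new:
            # Close whatever entity was open before this token.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if prefix in ("B", "I"):
            current_type = etype
            current_tokens.append(token)
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans

tokens = ["Bill", "Gates", "founded", "Microsoft", ";", "Steve", "Ballmer", "became", "CEO"]
tags = ["B-NAME", "I-NAME", "O", "B-COMPANY", "O", "B-NAME", "I-NAME", "O", "B-POSITION"]
print(bio_to_spans(tokens, tags))
```

With tags like these, both "Bill Gates" and "Steve Ballmer" come out as separate `NAME` spans, which a single `[Name]:` slot cannot represent.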

thatphotoguy89 t1_j66n16v wrote on January 28, 2023 at 2:10 AM

#1,543,122

Replying to TankAttack (#1,539,855)

Yeah. Here's an HF tutorial on how to do this: QA tutorial

TankAttack OP t1_j66s7js wrote on January 28, 2023 at 2:51 AM

#1,543,837

Replying to thatphotoguy89 (#1,543,122)

Cool, will have a look! Do they list any models as being good at question answering?

janck12 t1_j67d4qt wrote on January 28, 2023 at 6:08 AM

#1,546,805

I am not sure there are huge differences from one model to another. It depends heavily on the training data that you can get.

I would suggest using some existing NER models and possibly fine-tuning them on your own data. Have a look at GENRE: https://github.com/facebookresearch/GENRE

visarga t1_j67sivp wrote on January 28, 2023 at 9:32 AM

#1,548,790

Replying to TankAttack (#1,539,861)

My task uses sentence pairs, and I have an efficient prompt that makes many pairs in one go. So in 5 hours I managed to generate 230K pairs. Cost $10. I plan to generate millions to "exfiltrate" more domain knowledge for the small and efficient models I am training downstream.