Let's say I have a dataset of real estate listings. One column contains free text describing the listing, and another contains the number of rooms, for example. In most cases, the number of rooms appears in both places: in the description text and in the dedicated column.

But for some observations, the number of rooms is in the description text but not in the column "number of rooms". So I have missing data.

I could try to fill in the missing data by applying regex to the description text, but the number of possible phrasings seems too big.

Is there a machine learning technique in NLP that allows me to do that? Since the data is present in both columns for most observations, the dataset is "naturally labelled".

If there is, what are these techniques called? I would like to read up on them, but I don't know the proper keywords to google.

Comments

Ok-Cartoonist8114 t1_j4zrur1 wrote on January 19, 2023 at 12:05 PM

It is called slot filling. Extractive QA may also work :)

Kebet-Mendez OP t1_j50tn1o wrote on January 19, 2023 at 4:46 PM

Thank you!

Acceptable-Cress-374 t1_j50aej9 wrote on January 19, 2023 at 2:41 PM

You could also look up Named Entity Recognition (NER)
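To make the NER / slot filling idea concrete, here is a minimal sketch of how the "naturally labelled" rows could be turned into token-level training labels. It assumes each complete row gives you a description string and an integer room count; the function name `label_tokens`, the `ROOMS` tag, and the small number-word table are made up for illustration and are not from any library.

```python
import re

# Hypothetical lookup for spelled-out numbers; extend as needed.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

def label_tokens(description, n_rooms):
    """Tag every token whose numeric value equals n_rooms as 'ROOMS'.

    All other tokens get the label 'O'. This is weak supervision:
    any number equal to n_rooms is tagged, even if it refers to
    something else (for example bathrooms or a distance).
    """
    # Split into word tokens and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", description)
    labels = []
    for tok in tokens:
        low = tok.lower()
        if re.fullmatch(r"[0-9]+", low):
            value = int(low)
        else:
            value = NUMBER_WORDS.get(low)  # None if not a number word
        labels.append("ROOMS" if value == n_rooms else "O")
    return tokens, labels

tokens, labels = label_tokens("Bright flat with three rooms and 2 baths.", 3)
for tok, lab in zip(tokens, labels):
    print(tok, lab)
```

The resulting (tokens, labels) pairs are the usual input format for training a token classifier. The labels will be noisy, so it is probably worth spot-checking a sample before training on them.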

Kebet-Mendez OP t1_j50to3u wrote on January 19, 2023 at 4:46 PM

Thank you!

Dear-Acanthisitta698 t1_j4zqkv8 wrote on January 19, 2023 at 11:51 AM

Text QA might work. Give the description as the passage and ask a question such as "How many rooms are in this house?".
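A rough sketch of that QA approach, under some assumptions: `qa` is any callable that accepts `question=` and `context=` keyword arguments and returns a dict with `answer` and `score` keys. The Hugging Face `pipeline("question-answering")` object follows this convention, but nothing below depends on it. The helper name `extract_rooms` and the `min_score` threshold are hypothetical, and the threshold would need tuning against the rows where the column is already filled in.

```python
import re

def extract_rooms(description, qa, min_score=0.3):
    """Ask a QA model for the room count and parse an integer from its answer.

    Returns None when the model is not confident enough or when the
    answer span contains no digits (e.g. the count is spelled out).
    """
    question = "How many rooms are in this house?"
    result = qa(question=question, context=description)
    if result["score"] < min_score:
        return None
    match = re.search(r"\d+", result["answer"])
    return int(match.group()) if match else None
```

Since most rows already have the column filled in, you can run this on those rows first and compare the output against the known values to see how well it does before using it to fill the gaps.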

Kebet-Mendez OP t1_j50tplz wrote on January 19, 2023 at 4:47 PM

Thanks a lot!