Submitted by Ggronne t3_10tbfjq in MachineLearning

Maybe not a Machine Learning question, but I'm searching for good books about information retrieval.

The two primary ones I can find are:

- Introduction to Information Retrieval (2008)

- Information Retrieval: Implementing and Evaluating Search Engines (2016)

They seem a bit old for 2023, but they may still be useful?

Do you have any good book recommendations?

10

Comments

larswl1 t1_j7688oq wrote

I don't know about newer books, but these seem like important ones to start with. They lay out the main tasks of information retrieval. For specific problems there are many papers, for example from conferences such as SIGIR.

2

matth0x01 t1_j76dt6k wrote

Depends a bit on your skill level and what you want to achieve.

I started with the Introduction to Information Retrieval (2008) book, which was quite math-heavy back then. But I learned a lot and found it a good starting point.

You get the concepts of decompounding, the inverted index, ranking functions, etc.
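
To make the index and ranking ideas concrete, here is a minimal sketch of a term-to-documents index (usually called an inverted index) with a simple TF-IDF score. The function names, the whitespace tokenizer, and the toy documents are assumptions for illustration; they are not taken from the book.

```python
import math
from collections import defaultdict


def tokenize(text):
    # Naive whitespace tokenizer (assumption); real systems also stem,
    # decompound, and remove stop words.
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(query, index, n_docs):
    """Rank matching documents by a simple sum of tf * idf over query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=scores.get, reverse=True)


# Toy corpus (hypothetical).
docs = [
    "information retrieval with inverted indexes",
    "ranking functions for search engines",
    "cooking pasta at home",
]
index = build_inverted_index(docs)
print(search("search ranking", index, len(docs)))
```

The ranking function here is deliberately crude; weighting schemes such as BM25 refine the same idea.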

Newer IR strategies use word2vec-style embeddings for item representation instead of handcrafted features, or learn the search ranking function directly, which is a different beast compared to traditional search engines.
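
As a rough sketch of the embedding idea: items and queries become vectors, and ranking is done by vector similarity rather than term overlap. The 3-dimensional vectors below are made-up toy values, not real word2vec output, and averaging word vectors is only one simple way to represent an item.

```python
import math

# Hypothetical toy embeddings; a real system would load trained vectors.
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "auto": [0.85, 0.15, 0.0],
    "engine": [0.7, 0.3, 0.1],
    "pasta": [0.0, 0.1, 0.9],
    "recipe": [0.1, 0.0, 0.8],
}


def embed(text):
    """Average the vectors of known words; None if no word is known."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return document indices ordered by cosine similarity to the query."""
    q = embed(query)
    if q is None:
        return []
    scored = []
    for i, doc in enumerate(docs):
        v = embed(doc)
        if v is not None:
            scored.append((cosine(q, v), i))
    return [i for _, i in sorted(scored, reverse=True)]


docs = ["car engine", "pasta recipe"]
# "auto" shares no term with either document, yet the first one ranks higher.
print(rank("auto", docs))
```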

1

cruddybanana1102 t1_j76eb6v wrote

Manning, Raghavan, and Schütze's book Introduction to Information Retrieval is your best guide.

2

VectorSpaceModel t1_j76zubh wrote

The IR basics are timeless. I’ve read parts of the first textbook and it’s really good.

2

Ggronne OP t1_j7aj3co wrote

I have written small web scrapers for different applications, but none were based on theory. An upcoming project requires more extensive information retrieval and I would therefore like to get a better foundation.

I will start with Introduction to Information Retrieval, thanks!

1

matth0x01 t1_j7ayc9e wrote

Seems that you are more interested in the crawling and ETL side.

Maybe you should look more into data warehouse or data lake literature, especially the paradigm shift from ETL (extract, transform, load) to ELT (extract, load, transform) and schema-on-read.
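
A small sketch of the difference, assuming scraped pages arrive as JSON strings: with ETL the schema is applied before storing, while with ELT the raw records are stored and the schema is applied at read time. The field names, the `transform` function, and the in-memory lists standing in for a warehouse and a data lake are all hypothetical.

```python
import json

# Hypothetical raw scraper output; note the second record lacks "words".
raw_records = [
    '{"url": "https://example.com/a", "title": " Page A ", "words": "120"}',
    '{"url": "https://example.com/b", "title": "Page B"}',
]


def transform(record):
    """Apply a fixed schema: trimmed title, integer word count (default 0)."""
    return {
        "url": record["url"],
        "title": record.get("title", "").strip(),
        "words": int(record.get("words", 0)),
    }


# ETL: transform first, then load only the cleaned rows.
etl_store = [transform(json.loads(r)) for r in raw_records]

# ELT: load the raw records untouched.
elt_store = list(raw_records)


def read_with_schema(store):
    """Schema-on-read: parse and transform only when the data is queried."""
    return [transform(json.loads(r)) for r in store]


# Both routes yield the same rows; ELT keeps the raw data around, so the
# schema can change later without scraping again.
print(etl_store == read_with_schema(elt_store))
```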

2

matth0x01 t1_j7c3smm wrote

Sorry, my library seems a bit outdated on that side.

But the one from Wikipedia looks great at first sight: Kimball, Ralph (2004). The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data.

1