Submitted by LesleyFair t3_11alh40 in singularity


https://preview.redd.it/xqjhajxw83ka1.png?width=540&format=png&auto=webp&v=enabled&s=ec55cf0a63deffb1c0cdc2041633ac434adefd9b

The news is filled with reports of search engine wars but people are missing the bigger picture!

Today, it seems obvious that search engines will have some sort of chat interface in the future. No matter how flawed large language models (LLMs) still are [5], the benefits will simply be too great.

But the integration will also improve the models themselves beyond our wildest dreams!

LLM technology will profit in three distinct ways from this integration. Though Google has the greatest search engine and its teams are at the cutting edge of AI, it will have a hard time competing with Microsoft. We will look at how Google’s very profitable advertising monopoly might actually become its Achilles’ heel in the AI arms race.

Microsoft, on the other hand, has set itself up to create truly transformational technology, disrupt all knowledge work, and potentially outrank Google (pun intended).

I know, I know! These are a lot of buzzwords, but wait for it:

You’ll be surprised: it is actually an amazing story! And by the end, Microsoft might make Google look like a rubber duck in a bow wave.

Let’s jump in!

LLMs And Search Engines - A Match Made In Heaven

Currently, there are three obvious ways in which LLMs can be improved and search engines can help with all of them!

The first way is to train the models on more data. The latest research on scaling LLMs has shown that our current models are significantly undertrained. To properly fit a model the size of GPT-3, about 10 times more training data should have been used. Such a dataset does not exist yet.
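Here is a quick back-of-the-envelope check of that claim. The parameter and token counts below are my own assumptions based on the published GPT-3 and Chinchilla figures, not numbers from this post:

```python
# Rough sanity check of the "~10x more data" claim.
# Assumptions (not from the post): GPT-3 has ~175B parameters and was trained
# on ~300B tokens; the Chinchilla scaling results suggest roughly 20 training
# tokens per parameter for a compute-optimal model.
GPT3_PARAMS = 175e9
GPT3_TRAINING_TOKENS = 300e9
TOKENS_PER_PARAM = 20  # compute-optimal rule of thumb

optimal_tokens = GPT3_PARAMS * TOKENS_PER_PARAM    # ~3.5 trillion tokens
shortfall = optimal_tokens / GPT3_TRAINING_TOKENS  # ~11.7x

print(f"Compute-optimal dataset: ~{optimal_tokens / 1e12:.1f}T tokens")
print(f"That is roughly {shortfall:.0f}x what GPT-3 was actually trained on")
```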

Search engines regularly download the entire internet. This makes them a formidable partner to collect such a dataset.

The second way is to give models access to an external knowledge base. Normally during training, many of the facts contained in the training data are stored in the model’s weights. Once the training is finished, the model stops learning and immediately begins to become outdated. Let’s say we wanted to make a model that is aware of breaking news. The naive approach would be to retrain the model every few seconds as new data comes in. This would be totally impractical and prohibitively expensive with a model like GPT-3.

There is an elegant solution!

We integrate the model with an external information retrieval system. Every time the model generates an output, the retrieval system helps the model by providing it with relevant information. In recent years, there has been extensive research into these so-called retrieval-enhanced networks. So far, most approaches have been retrieving information from a static database of text [10]. This obviously has the problem that the database has to be kept up to date.
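To make the idea concrete, here is a minimal sketch of what retrieval-enhanced generation can look like. The `embed` and `llm_complete` functions are toy placeholders I made up for illustration, not a real API:

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for a real embedding model: hashed bag-of-words."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def llm_complete(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an API request)."""
    return f"[model answer based on a prompt of {len(prompt)} characters]"

# Static knowledge base: text passages embedded ahead of time.
passages = [
    "The Eiffel Tower is 330 metres tall.",
    "GPT-3 was released by OpenAI in 2020.",
    "Search engines crawl and index the web continuously.",
]
passage_vectors = np.stack([embed(p) for p in passages])

def retrieval_enhanced_answer(question: str, k: int = 2) -> str:
    # 1. Embed the question and find the k most similar passages.
    scores = passage_vectors @ embed(question)
    top_k = [passages[i] for i in np.argsort(scores)[-k:]]
    # 2. Prepend the retrieved passages so the model answers from them
    #    instead of relying only on facts frozen into its weights.
    context = "\n".join(top_k)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm_complete(prompt)

print(retrieval_enhanced_answer("How tall is the Eiffel Tower?"))
```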

However, search engines already are incredibly powerful retrievers of up-to-date information.

In December 2021, OpenAI published a paper in which they integrated GPT-3 with Bing. They called the model WebGPT [6], and it serves as a basis for Bing's chat. The model was able to access the web, scroll, and search within websites using “Ctrl + f”. This way, the model can always access the latest information and never becomes outdated.
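For intuition, here is what such a browsing loop could look like in spirit. The `search`, `fetch_page`, and `llm` helpers are invented placeholders that return dummy data; the real WebGPT gives the model a much richer action space and lets it decide when to search, scroll, and quote:

```python
def search(query: str) -> list[str]:
    """Placeholder for a search-engine API call; returns candidate URLs."""
    return ["https://example.com/a", "https://example.com/b"]

def fetch_page(url: str) -> str:
    """Placeholder for downloading and cleaning a web page."""
    return "Electric bikes in the EU are limited to 25 km/h.\nUnrelated text."

def llm(prompt: str) -> str:
    """Placeholder for the language model itself."""
    return f"[answer composed from {prompt.count(chr(10))} lines of context]"

def browse_and_answer(question: str) -> str:
    quotes = []
    for url in search(question)[:3]:
        page = fetch_page(url)
        # Crude stand-in for the model's "Ctrl + f": keep lines that
        # mention words from the question.
        for line in page.splitlines():
            if any(w.lower() in line.lower() for w in question.split()):
                quotes.append(f"{url}: {line}")
    context = "\n".join(quotes[:10])
    return llm(f"Quotes:\n{context}\n\nQuestion: {question}\nAnswer:")

print(browse_and_answer("How fast does an electric bike go?"))
```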

Last but not least, search engines help to make LLMs smaller and cheaper.

If a model uses a search engine to retrieve relevant information, fewer facts need to be stored in its weights. Research has shown that, as a result, the model itself can be at least 25 times smaller without losing performance [10]. (Side note: sparsity is obviously also a driver of efficiency [16], but it is unrelated to search engines.)

These three trends alone will soon make ChatGPT look like an exceptionally ungifted, hallucinating toddler!

Okay, Google is also pretty decent at AI stuff. On top of that, they have the greatest search engine out there. So why don’t they just slap a language model on top of it?

Why Google Can Easily Replicate Bing’s Chat But It Would Be Terrible For Business

In theory, they are fully capable of replicating Bing's chat.

In January of last year, Google already released a paper outlining its own chat model, LaMDA [9]. The model also had access to “external knowledge sources” and even made headlines because an engineer at Google started to believe it was sentient.

Heck, Google even invented many of the underlying technologies that went into ChatGPT [8].

Many people argue that Google's size and complacency keep them from innovating.

Nah, there is a different reason why they haven't created their own chat!

It is actually a real double whammy for their advertising business!

On the one hand, a chat model erodes Google’s ability to monetize search queries. Google makes money when users click on search results. In a future where people get answers in plain, human language, fewer of them will visit the underlying websites.

Fewer visits mean fewer clicks and that means less money for Google.

Sridhar Ramaswamy, who used to run Google’s ad division, predicted that this reduction in clicks would disrupt Google’s business model in a “massive way” [1].

But it gets worse!

The other problem for Google is that it is also much more expensive to serve search results through LLMs. Currently, it costs Google ~1.06 cents to retrieve the results for a query. Each query generates on average 1.61 cents in advertising revenue. This means Google earns roughly 0.5 cents per query.

So, how much would it cost to have a model such as ChatGPT summarize these search results?

The answer is about 0.36 cents. Hence, every time Google uses such a model, the margin on that query shrinks by roughly 72%. As a result, Google’s business model would become wildly less profitable.
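For the number crunchers, here is the arithmetic behind that figure, using the per-query numbers quoted above (the ~72% compares the extra LLM cost to the rounded 0.5-cent margin):

```python
# Per-query economics as quoted above (all values in US cents).
revenue_per_query = 1.61
serving_cost = 1.06
llm_cost = 0.36

margin_today = revenue_per_query - serving_cost  # ~0.55 cents, "roughly 0.5"
margin_with_llm = margin_today - llm_cost        # ~0.19 cents

# 0.36 extra cents against the rounded 0.5-cent margin: 0.36 / 0.5 = 72%.
reduction = llm_cost / 0.5

print(f"Margin today:           {margin_today:.2f} cents/query")
print(f"Margin with LLM on top: {margin_with_llm:.2f} cents/query")
print(f"Margin reduction:       ~{reduction:.0%}")
```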

But these are just the variable costs per query!

Let’s assume for a second that Google served all of its ~320K queries per second through a chat interface. In that case, Google would need to build an additional $100B worth of computing infrastructure to handle the workload [11].

This is of course slightly oversimplified.

Google would not need to run the chat models for all queries. Further, as we saw above, these models will get cheaper in the future. Even so, chat models will make serving answers to users more expensive.

So far so good, but what will happen now?

Will Google just create its own chat model, which they are probably capable of doing? Then they will crush Bing as they always have, right? They might make a bit less money than they used to, but who cares?

Unfortunately for Google, it is not that simple!

Google’s Monopoly Could Crumble

Not all search queries are created equal. They can be divided into three categories:

  • Navigational: People search for “bbc” because it is more convenient than going to the BBC website directly by typing out “bbc.com” [4].
  • Informational: “How fast does an electric bike go”.
  • Transactional: “Buy cheap computer online”

It is surprisingly difficult to find reliable numbers for the share of each type. However, most sources estimate informational queries to make up 60-80% of all queries. Navigational and transactional queries share the rest [12].

This is a huge problem for Google!

What made Google superior to Bing in the past is its ability to interpret search intent and return useful answers to even the fringiest of questions. The data show that more people choose Google over other search engines when their queries are longer and more complex [13]. However, this is exactly where chat models shine the most. As a result, some users find two-thirds of chat-based answers more useful than Google’s results [2].

This is crazy! Google is no longer the best at serving the largest category of search queries!

This could hold true even if Google manages to create an equally good chat engine. If Google wanted to restore its competitive edge, it would need to create a chat model that is significantly better than OpenAI’s models. Creating one with similar performance is possible, but blowing OpenAI out of the water seems very unlikely at this point.

It will be a while before this plays out because Google’s monopoly is very sticky!

Believe it or not, the average internet user simply searches with whatever their browser’s default search engine happens to be. Google is the default in most of the common browsers, and they will fight tooth and nail to keep it that way. In 2021, for example, they reportedly paid Apple $15B to stay the default on Safari [14].

This is wild!

Google has forever been the prime example of a tech company that achieved escape velocity. Its network-effect-driven business has long been seen as untouchable.

But this is not even the craziest part of the story! Microsoft is not even after Google. Shaking up the search giant is only a sideshow.

Microsoft Is Winning Big Even Without Beating Google

The real value for Microsoft does not lie in the search business.

We saw above that the integration of LLMs with search engines is mutually beneficial. This offers a clear path for Microsoft and OpenAI to breed foundation models that are wildly more powerful than anything we have today. Microsoft can then use these models across all of its products from Word to the Edge browser [7]. This will create an ungodly amount of value and has the potential to transform all knowledge work.

What’s more, they can pay for a lot of this development with the money they poach from Google’s search business.

Here is how this could play out!

From the graphic below, you can see that Google has had a tight grip on almost 90% of the search market [3].

https://preview.redd.it/m1zki34093ka1.png?width=766&format=png&auto=webp&v=enabled&s=d182610dc35efb50cd572fd63454f6bf5c743a9a

In 2021, Google’s parent company Alphabet made $208B, 81% of its revenue, through Google’s advertising business [1].

Microsoft however, is in a completely different situation!

Bing’s advertising business makes up a measly 6% of Microsoft’s total revenue. For comparison, Microsoft’s Office products account for 22% [15]. Hence, Microsoft is much less reliant on Bing’s advertising business.

This is a huge opportunity for Bing!

Whereas Google’s parent company Alphabet will be screaming if its advertising margins shrink even a little, Microsoft does not much care. If Bing’s profit went away tomorrow, they would probably still be fine. However, for every 1% of market share that Bing is able to steal away from Google, it has the opportunity to increase its revenue by $2B annually [11].

If you are at the top like Google, the only way you can go is down!

By contrast, if Bing manages to capture just 10% more of the search market, that adds another $20B in annual revenue. Microsoft can then spend all of that money on ever more powerful foundation models. In turn, it can use these models to cross-pollinate all of its other businesses.

Wow!

This AI-fueled race for the top might negatively influence the overall willingness to open-source research findings.

Regardless, I marvel at the master class in strategy that Microsoft and OpenAI are giving us here. I am also deeply excited for the future and the amazing technologies that will come from this!

Thank you for reading!

As always, I really enjoyed making this for you and I sincerely hope you found it useful!

I send out an essay like this every week. Join 1,800 subscribers from top universities like MIT, Harvard, and Yale, as well as organizations like HubSpot, a16z, and Google. Click here to subscribe!

References

[1] https://www.bloomberg.com/opinion/articles/2022-12-07/chatgpt-should-worry-google-and-alphabet-why-search-when-you-can-ask-ai

[2] https://www.bloomberg.com/opinion/articles/2022-12-07/chatgpt-should-worry-google-and-alphabet-why-search-when-you-can-ask-ai

[3] https://www.statista.com/statistics/216573/worldwide-market-share-of-search-engines/

[4] https://www.siegemedia.com/seo/most-popular-keywords

[5] https://twitter.com/vladquant/status/1624996869654056960?s=46&t=oAzVIB-avPf-JbQAnhcbtA

[6] https://arxiv.org/pdf/2112.09332.pdf

[7] https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/

[8] https://arxiv.org/abs/1706.03762

[9] https://arxiv.org/abs/2201.08239

[10] https://arxiv.org/abs/2112.04426

[11] https://www.semianalysis.com/p/the-inference-cost-of-search-disruption

[12] https://www.quora.com/What-percentage-of-web-search-queries-are-navigational

[13] https://www.statista.com/statistics/413229/search-query-size-search-engine-share/

[14] https://www.forbes.com/sites/johanmoreno/2021/08/27/google-estimated-to-be-paying-15-billion-to-remain-default-search-engine-on-safari/?sh=40cbbfcf669b

[15] https://businessquant.com/microsoft-revenue-by-product

[16] https://arxiv.org/abs/2209.01667


Comments


PM_ME_A_STEAM_GIFT t1_j9zbnag wrote

Great read! Your subscription link isn't working though.


LesleyFair OP t1_ja3ak5e wrote

Thank you! I am glad you liked it.
I fixed the link. Thanks for pointing it out!


WarAndGeese t1_j9zlf9t wrote

Why is this framed in the context of corporate worship? If things go as they should, then Microsoft and Google would cease to exist.

It's a well-written article so sorry for the contrarian comment. My comment isn't a response to your article, but of the regular framing people have about "this technology means X company will beat Y company". Who cares about these companies or about the people in them? Again, if things go as they should, the companies would functionally cease to exist.


WarAndGeese t1_j9zllpi wrote

Again the article is well-researched and links sources and everything, I guess my comment belongs elsewhere in cases where other people keep framing things in terms of companies and corporations.


LesleyFair OP t1_ja3b59e wrote

Thanks for your input.
I was hoping to paint a picture of what I think is the competitive landscape that will be coloring the developments over the next years.
I am glad you found the article helpful and took the time to share your thoughts! I appreciate it!


Izzy187 t1_ja3nb5d wrote

This entire post, although it took a lot of time to make and write, is sadly written by someone who doesn't have much knowledge on the topic and speculated/assumed the majority of it, unfortunately not in the correct way. I appreciate the sources, but Bloomberg and Forbes... Really? The data you are trying to reference isn't something that a company is going to release to the public, hence why the mentioned 'news' websites speculated some clickbait crap. You did good on the effort portion though.


LesleyFair OP t1_ja3uls8 wrote

I might very well be wrong. I am merely exploring this fascinating topic and putting together the pieces I find. What part do you think I got wrong? I would love to hear your take on it.


Izzy187 t1_ja778z1 wrote

I cannot tell you, to be honest. I can just tell that the data isn't credible and not something that can be proven or found. However, most news nowadays is speculation anyway. But if you want a direct answer, it is this: the data and information you reference in your thread have zero reason to be tossed around on the internet, and there is zero reason for any company to disclose any such info.
