By Leah Nylen
(Bloomberg) — ChatGPT doesn’t know whether Taylor Swift is dating Kansas City Chiefs tight end Travis Kelce.
That example was used by Microsoft Corp. executive Mikhail Parakhin this week at the US Justice Department’s landmark antitrust trial to illustrate how Alphabet Inc.’s market-dominant Google search engine can’t be easily replaced or challenged by new technologies, such as chatbots.
The OpenAI chatbot allows users to type in a query and receive a written response, but the data used to train the artificial intelligence system is based on older information culled from the web. Without fresh data – the type provided by users searching for new topics like the pop singer’s latest beau – it’s unlikely to provide an accurate answer.
Swift’s rumoured new boyfriend Kelce, the two-time Super Bowl-winning US football player, won’t show up in ChatGPT, but he will in Microsoft’s Bing search engine, Parakhin told US District Judge Amit Mehta, who is overseeing the case in Washington DC.
The chatbot “is used for reasoning and for providing the answer, but the base information is coming from search,” said Parakhin, who joined Microsoft in 2019 after a stint as the chief technology officer for Russian search engine Yandex NV.
The Justice Department’s antitrust lawsuit against Google involves conduct as far back as 2002. But antitrust enforcers say the case is likely to impact the future of the internet as tech companies begin to incorporate artificial intelligence into products.
A key disagreement at the trial has been over a search engine’s “scale,” a term that refers to the amount of data it collects from websites and users. Search engines crawl the web to create an index, a map that makes it easier for the search engine to quickly provide relevant links in response to a query. Google’s index, the Justice Department said, is the largest in the world; printed out on paper, it would form a stack reaching to the moon and back 12 times.
Because allowing crawlers costs websites money, they often limit which search engines may gather their data. For example, the popular question-and-answer website Quora Inc. only permits crawlers from Google, not from Bing or other search engines, Parakhin said.
“Websites won’t let you index them if you aren’t a big search engine,” he said. “It doesn’t matter if you can index the data if websites don’t let you.”
Earlier in the trial, Google chief economist Hal Varian and engineer Eric Lehman testified that the user data gathered by a search engine is less important today, and that newer technologies, including the large language models on which ChatGPT is based, don’t need it.
“I thought user data would be essential to helping machines learn language. It turned out that these very large machine learning systems can learn simply from text,” said Lehman, who was involved in Google search for 17 years before leaving in 2022. “There will still be a role for user data but I think it will be much diminished.”
Microsoft’s Parakhin, however, said that even new technologies can’t fully overcome the data disadvantage. Bing’s data matters to more than just Microsoft. Other search engines, including Yahoo and DuckDuckGo, whose Chief Executive Gabriel Weinberg testified in the trial last week, rely on Bing’s data to build out their own results.
“You can mitigate the effects of scale to some degree. We haven’t been able to reverse the effects,” Parakhin said. “We’ve seen companies trying. We haven’t seen anybody succeed.”
During Parakhin’s testimony, the judge asked him whether a company could build a “high quality search engine” solely with a large language model like ChatGPT.
“It is very easy to build a search engine that would do reasonably well on a certain segment of queries,” Parakhin said, “just like it’s easy to build a self-driving car that can drive around an empty parking lot.”
“Even with the best algorithms, even with large language models, to build a competitive fully functioning search engine is extremely hard,” he said.
©2023 Bloomberg L.P.