During its Build 2020 conference this week, Microsoft took the wraps off AI at Scale, an initiative aimed at applying large-scale AI and supercomputing to language processing across the company’s apps, services, and managed products. Already, Microsoft says, massive models have driven improvements in SharePoint, OneDrive, Outlook, Xbox Live, and Excel. They’ve also benefited Bing by bolstering the search engine’s ability to directly answer questions and to generate image captions.
Bing and its competitors have a lot to gain from AI and machine learning, particularly in the natural language domain. Search tasks necessarily begin with teasing out a search’s intent. Search engines need to comprehend queries no matter how confusingly or wrongly they’re worded. They’ve historically struggled with this, leaning on Boolean operators — simple words like “and,” “or,” and “not” — as conjunctive band-aids to combine or exclude search terms. But with the advent of AI like Google’s BERT and Microsoft’s Turing family, search engines have the potential to become more conversationally and contextually aware than perhaps ever before.
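To make the limits of Boolean operators concrete, here is a minimal sketch of keyword-based Boolean retrieval over a toy inverted index. The three documents, the whitespace tokenizer, and the `boolean_query` helper are all made up for illustration and are not how Bing or any other engine actually works. The last query shows the core problem: an AND search for "dogs eat chocolate" finds nothing, even though document 2 plainly answers the question.

```python
from typing import Dict, Iterable, Set

# Toy corpus (hypothetical documents, for illustration only).
docs: Dict[int, str] = {
    1: "dogs can eat carrots",
    2: "chocolate is toxic to dogs",
    3: "cats and chocolate",
}

def build_index(corpus: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercase, whitespace-separated term to the IDs of documents containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in corpus.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

def boolean_query(index: Dict[str, Set[int]], all_ids: Iterable[int],
                  must=(), any_of=(), exclude=()) -> Set[int]:
    """Evaluate a simple AND / OR / NOT keyword query against the index."""
    result = set(all_ids)
    for term in must:                       # AND: every term must appear
        result &= index.get(term, set())
    if any_of:                              # OR: at least one of these terms must appear
        hits: Set[int] = set()
        for term in any_of:
            hits |= index.get(term, set())
        result &= hits
    for term in exclude:                    # NOT: drop documents containing the term
        result -= index.get(term, set())
    return result

index = build_index(docs)
print(boolean_query(index, docs, must=["dogs"], exclude=["chocolate"]))         # {1}
print(boolean_query(index, docs, must=["chocolate"], any_of=["dogs", "cats"]))  # {2, 3}
print(boolean_query(index, docs, must=["dogs", "eat", "chocolate"]))            # set()
```

Pure keyword matching has no notion that "toxic to dogs" answers "can dogs eat," which is exactly the gap that contextual language models aim to close.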
Large-scale language models
Bing now uses fine-tuned language models distilled from a large-scale natural language representation (NLR) algorithm to power a number of features, including intelligent yes/no summaries (available only in the U.S. for now). Given a search query, a model assesses how relevant document passages are to the query, then reasons over and summarizes across multiple sources to arrive at an answer. For a search like “can dogs eat chocolate,” the model, which can understand natural language thanks to the NLR, infers that the phrase “chocolate is toxic to dogs” means dogs can’t eat chocolate, even when a source doesn’t say so explicitly.
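Microsoft hasn’t published how its model combines evidence from multiple sources, but the general idea of aggregating per-passage judgments into one yes/no answer can be sketched as a relevance-weighted vote. In this sketch, the `PassageJudgment` fields, the relevance and yes-probability scores (which a real system would get from a model), the `min_relevance` and `margin` thresholds, and the `summarize_yes_no` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PassageJudgment:
    source: str
    relevance: float  # hypothetical model score in [0, 1]: how relevant the passage is to the query
    p_yes: float      # hypothetical model probability that the passage implies "yes"

def summarize_yes_no(judgments: List[PassageJudgment],
                     min_relevance: float = 0.5,
                     margin: float = 0.2) -> Optional[str]:
    """Relevance-weighted vote over passages; returns None when evidence is missing or mixed."""
    relevant = [j for j in judgments if j.relevance >= min_relevance]
    if not relevant:
        return None
    total = sum(j.relevance for j in relevant)
    p_yes = sum(j.relevance * j.p_yes for j in relevant) / total
    if p_yes >= 0.5 + margin:
        return "yes"
    if p_yes <= 0.5 - margin:
        return "no"
    return None  # too close to call: show regular results instead of a summary

# Made-up scores for the query "can dogs eat chocolate".
judgments = [
    PassageJudgment("Chocolate is toxic to dogs.", relevance=0.9, p_yes=0.05),
    PassageJudgment("Theobromine in chocolate can poison pets.", relevance=0.8, p_yes=0.10),
    PassageJudgment("Our best chocolate cake recipe.", relevance=0.1, p_yes=0.50),
]
print(summarize_yes_no(judgments))  # no
```

Filtering by relevance first keeps off-topic pages (like the cake recipe) from diluting the vote, and returning `None` for a narrow margin reflects the choice to show no summary rather than a shaky one.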
Beyond this, building on a recently deployed Turing NLR-based algorithm that enhanced the answers and image descriptions in English results, the Bing team used the algorithm’s question-answering component to improve “intelligent” answer quality in other languages. Although the component was fine-tuned only on English data, it drew on the linguistic knowledge and nuances the NLR algorithm had learned during pre-training on 100 different languages. This enabled it to return the same answer snippets across languages in 13 markets for searches like “red turnip benefits.”
The Bing team also applied AI to the fundamental problem of breaking down ambiguous concepts. A new NLR-derived algorithm tailored to rank potential web results for queries scores them on the same scale human judges use. This allows it to recognize, for example, that the search “brewery Germany from year 1080” likely refers to the Weihenstephan Brewery, which was founded in 1040, 40 years before the year in the query but in the same era.
Last year, Google similarly set out to solve the query ambiguity problem with an AI technique called Bidirectional Encoder Representations from Transformers, or BERT for short. BERT, which emerged from the tech giant’s research on Transformers, forces models to consider the context of a word by looking at the words that come before and after it. According to Google, BERT helped Google Search better understand 10% of queries in the U.S. in English — particularly longer, more conversational searches where prepositions like “for” and “to” matter a lot to the meaning.
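One way to picture the difference BERT makes is through attention masks, which control which words a Transformer may look at when building the representation of a given word. The simplified sketch below contrasts a left-to-right (causal) mask with a bidirectional one on the query from Google’s example; the helper names are hypothetical, and this illustrates the masking idea only, not BERT’s actual implementation.

```python
from typing import List

def attention_mask(n: int, bidirectional: bool) -> List[List[bool]]:
    """mask[i][j] is True when the token at position i may attend to position j."""
    return [[bidirectional or j <= i for j in range(n)] for i in range(n)]

def visible_words(tokens: List[str], i: int, bidirectional: bool) -> List[str]:
    """Words the representation of tokens[i] can draw on under the given mask."""
    row = attention_mask(len(tokens), bidirectional)[i]
    return [tok for tok, allowed in zip(tokens, row) if allowed]

query = "2019 brazil traveler to usa need a visa".split()
i = query.index("to")

# Left-to-right: "to" only sees what came before it, so "usa" is invisible.
print(visible_words(query, i, bidirectional=False))
# ['2019', 'brazil', 'traveler', 'to']

# Bidirectional: "to" sees both sides, including the destination "usa".
print(visible_words(query, i, bidirectional=True))
# ['2019', 'brazil', 'traveler', 'to', 'usa', 'need', 'a', 'visa']
```

With access to both “brazil traveler” and “usa,” the representation of “to” can encode the direction of travel, which is the distinction the older algorithm missed.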
For instance, Google’s previous search algorithm wouldn’t understand that “2019 brazil traveler to usa need a visa” is about a Brazilian traveling to the U.S. and not the other way around. With BERT, which realizes the importance of the word “to” in context, Google Search provides more relevant results for the query.
Like Microsoft, Google adapted AI models including BERT to other languages, specifically to improve short answers to queries, called “featured snippets,” that appear at the top of Google Search results. The company reports that this resulted in substantially better Korean, Hindi, and Portuguese snippets and general improvements in the more than two dozen countries where featured snippets are available.
Large-scale models like those now powering Bing and Google Search learn to parse language from enormous data sets; that’s what makes them large in scale. For example, the largest of Microsoft’s Turing models, Turing NLG, ingested billions of pages of text from self-published books, instruction manuals, history lessons, human resources guidelines, and other sources to achieve top results in popular language benchmarks.
Predictably, large models require scalable hardware to match. Microsoft says it’s running the NLR-derived Bing model for query intent comprehension on “state-of-the-art” Azure N-series virtual machines (VMs) with built-in GPU accelerators. As of November 2019, more than 2,000 of these machines across four regions were serving more than 1 million search inferences per second.
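As a rough back-of-envelope check, dividing the reported throughput by the machine count gives the average load per VM. Both of Microsoft’s figures are stated as minimums (“more than” and “over”), so the result is only an approximation, and the per-region number assumes, purely for illustration, that traffic was split evenly across the four regions, which Microsoft did not say.

```latex
\frac{1{,}000{,}000\ \text{inferences/s}}{2{,}000\ \text{VMs}} = 500\ \text{inferences/s per VM},
\qquad
\frac{1{,}000{,}000\ \text{inferences/s}}{4\ \text{regions}} = 250{,}000\ \text{inferences/s per region}
```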
Microsoft previously experimented with field-programmable gate arrays (FPGAs), integrated circuits designed to be configured after manufacturing, for AI computation through a system called Project Brainwave. Brainwave, which curiously went unmentioned at this year’s Build conference, enabled the Bing team to train a model 10 times more complex than a version built for CPUs. Despite the added complexity, the hundreds of thousands of FPGAs Brainwave deployed throughout Microsoft datacenters could return results from the model more than 10 times faster, the company claimed.
For its part, Google is using its third-generation tensor processing units (TPUs), chips specially designed to accelerate AI, to serve search results globally. The chips are liquid-cooled and designed to slot into server racks; pods of them deliver up to 100 petaflops of performance; and they have also been used internally to power other Google products like Google Photos and Google Cloud Vision API calls.
Assuming the large-scale natural language processing trend holds, models like those in Microsoft’s Turing family appear poised to become a core part of search engine backends. If they’re anything like the models deployed today, they’ll require substantial compute to train and run — but the cost might be worth it. Taking Microsoft and Google at their word, these models have led to leaps in understanding of the billions of queries people around the world submit every day.