The Walmart.com Journey Towards Spanish Query Understanding (Part 1)

MalBot · February 2, 2023, 12:10am

Opportunity statement and challenges

Online stores in the United States offer a unique scenario for Cross-Lingual Information Retrieval (CLIR), because Spanish and English are mixed both within and across queries. State-of-the-art Machine Translation (MT) offers an opportunity to lift relevance by translating Spanish queries to English before sending them to the search engine. However, product search poses domain-specific challenges that make applying generic MT as-is impractical.

In this first article of the series “The Journey Towards Spanish Query Understanding at Walmart.com”, we cover the opportunity statement and the challenges.

The US is home to the second-largest Spanish-speaking population

According to the 2020 US Census, the United States is now home to 62 million Hispanics, who represent 18.7% of its population. More people speak Spanish in the US than in any other country in the world except Mexico. This position of the US among Spanish-speaking nations is also consolidating. According to the Pew Research Center, Hispanics accounted for half of the US population increase from 2010 to 2020, a greater share than any other racial or ethnic group. The Hispanic population grew by 23% over that decade, outpacing the nation’s 7% overall population growth.

The impact of COVID-19 on purchasing habits

Physical stores have traditionally let non-English-speaking customers find products visually. The e-commerce search experience, by contrast, requires familiarity with specific English terms. About 70% of Hispanics speak Spanish at home, so many customers lack that English terminology.

When the pandemic hit the US, Spanish speakers had no choice but to type their everyday grocery, cooking and general merchandise needs into online stores and remote delivery systems. As Professor Scott Galloway put it, the pandemic was an accelerant that fast-forwarded 10 years of adoption in online shopping and remote delivery services, among other trends.

A Shopify survey on multilingual support in e-commerce showed a 13% relative increase in conversion when buyers saw a store translated into their language. Similarly, CSA’s “Can’t Read, Won’t Buy” study series reported the following statistics:

“65% of consumers prefer content in their native language”
“40% will not buy from websites in other languages”
“66% of business users told us they’d pay up to 30% more for a localized product”
“34% of consumers said they would also be willing to dig deeper into their wallets for products adapted to their language and market”

Walmart’s reaction to the influx of Spanish in the search bar

At Walmart.com, we started noticing an influx of Spanish searches as COVID-19 restrictions tightened and in-store customers began shifting to online shopping. At the height of the pandemic, Spanish queries across the Walmart app and website grew to more than five times their pre-pandemic levels.

This rapid shift in behavior highlighted the need for Spanish Query Understanding. It can drastically improve relevance for queries written in a language other than English, and it reduces how many of those queries return no results or only a few. Our US Store and Online catalogs were indexed in English, so we needed a new layer of language detection, translation and understanding to retrieve relevant items for non-English queries.

We mobilized quickly and rolled out a minimum viable product (MVP) a few months later. A sequence of back-end and front-end improvements followed that initial release. They led to where we are today: a domain-adapted query translation system, which Part 2 of this series will cover.

Challenges of Spanish query translation

This section introduces the challenges we have faced in pursuit of accurate and efficient Spanish Query Understanding. Some stem from polysemy. Others stem from the Anglo-Hispanic blend colloquially known as Spanglish.

Challenge #1 — Non-Translatable entities: Unlike generic text, product-oriented queries carry a high density of named entities, such as brands, movie titles, product lines or sports team names. For example, “corona” (“crown” in English) should not be translated when it refers to the beer brand.
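A common way to keep such entities intact is to mask them with placeholders before translation and restore them afterwards. The sketch below rests on several assumptions:

- `PROTECTED_ENTITIES` is a hypothetical list of protected terms.
- A toy word-by-word lexicon stands in for a real MT model.
- “corona” is protected unconditionally, whereas the text above notes it should stay untranslated only when it refers to the beer brand.

This is an illustration of the masking technique, not Walmart’s implementation.

```python
from typing import Callable, Dict

# Hypothetical set of entities that must never be translated (brands, titles, teams).
PROTECTED_ENTITIES = {"corona", "modelo"}

# Toy word-by-word lexicon standing in for a real MT model (assumption).
TOY_LEXICON: Dict[str, str] = {
    "corona": "crown",
    "modelo": "model",
    "cerveza": "beer",
    "botellas": "bottles",
}


def toy_translate(text: str) -> str:
    """Translate word by word; unknown tokens, including placeholders, pass through."""
    return " ".join(TOY_LEXICON.get(tok, tok) for tok in text.split())


def translate_with_protection(query: str,
                              translate: Callable[[str], str] = toy_translate) -> str:
    """Mask protected entities, translate the rest, then restore the entities."""
    restore: Dict[str, str] = {}
    masked = []
    for tok in query.lower().split():
        if tok in PROTECTED_ENTITIES:
            placeholder = f"__ENT{len(restore)}__"
            restore[placeholder] = tok
            masked.append(placeholder)
        else:
            masked.append(tok)
    translated = translate(" ".join(masked))
    for placeholder, entity in restore.items():
        translated = translated.replace(placeholder, entity)
    return translated


print(toy_translate("corona 12 botellas"))              # crown 12 bottles
print(translate_with_protection("corona 12 botellas"))  # corona 12 bottles
```

The sketch has two limitations. It handles only single-token entities. It also relies on the translator copying placeholders through unchanged, which a neural MT model does not always guarantee.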

Challenge #2 — Spanish Ambiguity: The polysemous word “muñeca” can be translated to “wrist” or “doll” depending on context. When context is lacking, ambiguity can be controlled by biasing the translation towards product interpretations, i.e., by assuming the intent is “doll”. By contrast, the Spanish search “calabaza” can be translated to either “butternut squash” or “pumpkin”, and both are products. That makes disambiguation less straightforward.
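The following is a minimal sketch of biasing towards product interpretations. It assumes each candidate translation carries a product-intent prior, for example the share of catalog items that match it. The priors, the `CANDIDATES` table and the `margin` threshold are hypothetical toy values, not Walmart data. When one reading clearly dominates, as with “muñeca”, the sketch returns it. When the readings are close, as with “calabaza”, it keeps every close contender.

```python
from typing import Dict, List

# Hypothetical product-intent priors per candidate translation, e.g. the share of
# catalog items matching each English term. Toy numbers for illustration only.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "muñeca": {"doll": 0.95, "wrist": 0.05},
    "calabaza": {"pumpkin": 0.52, "butternut squash": 0.48},
}


def disambiguate(term: str, margin: float = 0.2) -> List[str]:
    """Return the product-biased translation(s) of an ambiguous Spanish term.

    One candidate is returned when a single interpretation clearly dominates;
    otherwise every candidate within `margin` of the best prior is kept.
    """
    options = sorted(CANDIDATES[term].items(), key=lambda kv: kv[1], reverse=True)
    best_score = options[0][1]
    return [english for english, score in options if best_score - score <= margin]


print(disambiguate("muñeca"))    # ['doll']
print(disambiguate("calabaza"))  # ['pumpkin', 'butternut squash']
```

Returning several candidates is one possible design choice: it lets retrieval cover both readings instead of guessing.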

Challenge #3 — Cross-Language Ambiguity: A lack of context may also harm language detection. For example, in English “pie” and “pan” refer to a pastry and to cookware, respectively, while the same words mean “foot” and “bread” in Spanish.
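The sketch below shows why short queries are hard to classify. It uses a simple lexicon-voting detector, which is an assumption chosen for illustration, with tiny hypothetical vocabularies. A token that exists in both languages casts no vote, so “pan” or “pie” on its own stays ambiguous. A single context word such as “frying” or “dulce” then settles the language.

```python
# Toy vocabularies (assumption); real detectors use large lexicons or trained models.
EN_VOCAB = {"pie", "pan", "apple", "frying", "pumpkin"}
ES_VOCAB = {"pie", "pan", "dulce", "de", "integral", "manzana"}


def detect_language(query: str) -> str:
    """Return 'en', 'es' or 'ambiguous' by counting tokens unique to one vocabulary."""
    en_votes = es_votes = 0
    for tok in query.lower().split():
        in_en, in_es = tok in EN_VOCAB, tok in ES_VOCAB
        if in_en and not in_es:
            en_votes += 1
        elif in_es and not in_en:
            es_votes += 1
        # Tokens found in both vocabularies (or in neither) cast no vote.
    if en_votes == es_votes:
        return "ambiguous"
    return "en" if en_votes > es_votes else "es"


for q in ["pan", "frying pan", "pan dulce", "pie", "apple pie", "pie de manzana"]:
    print(q, "->", detect_language(q))
```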

Challenge #4 — Query Translatability: Beyond accuracy, translating a query is not the final goal but a means to relevant results. Offering the best results requires detecting Spanish and also deciding whether to translate the query automatically. For example, the Spanish query “cuentos de hadas” (“fairy tales”) would lose its implicit Spanish-language intent if translated. Likewise, “queso blanco” (“white cheese”) would no longer imply the Latin American style of cheese.
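The translate-or-not decision can be sketched as a routing step placed in front of MT. The rule used here is a hypothetical static `DO_NOT_TRANSLATE` list, chosen only for illustration, and `toy_translate` stands in for a real MT model. Both are assumptions, and a static list is the simplest possible policy.

```python
from typing import Callable

# Hypothetical list of Spanish queries whose intent would be lost in translation.
DO_NOT_TRANSLATE = {"cuentos de hadas", "queso blanco"}

# Toy phrase table standing in for a real MT model (assumption).
TOY_MT = {"queso blanco": "white cheese", "leche entera": "whole milk"}


def toy_translate(query: str) -> str:
    return TOY_MT.get(query, query)


def route_query(query: str, translate: Callable[[str], str] = toy_translate) -> str:
    """Return the text to send to search: the original query or its translation."""
    normalized = " ".join(query.lower().split())
    if normalized in DO_NOT_TRANSLATE:
        return normalized  # keep the Spanish query so its implicit intent survives
    return translate(normalized)


print(route_query("Queso  Blanco"))  # queso blanco
print(route_query("leche entera"))   # whole milk
```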

Challenge #5 — Mixed language queries: Hispanic customers in the US frequently mix Spanish and English in the same query, a blend known as Spanglish. Examples include “cake de fresa”, “colchón queen” and “display grande”.
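Mixed queries call for language labels per token rather than per query. The sketch below assumes tiny hypothetical lexicons. It tags each token and flags a query as code-mixed when it contains at least one English-only and one Spanish-only token. A production tagger would more likely be a trained sequence model.

```python
from typing import List, Tuple

# Toy lexicons (assumption); tokens found in neither are tagged "unk".
EN_WORDS = {"cake", "queen", "display", "mattress"}
ES_WORDS = {"de", "fresa", "colchón", "grande", "pastel"}


def tag_tokens(query: str) -> List[Tuple[str, str]]:
    """Tag each token of a possibly code-mixed query as 'en', 'es', 'both' or 'unk'."""
    tagged = []
    for tok in query.lower().split():
        in_en, in_es = tok in EN_WORDS, tok in ES_WORDS
        if in_en and in_es:
            tag = "both"
        elif in_en:
            tag = "en"
        elif in_es:
            tag = "es"
        else:
            tag = "unk"
        tagged.append((tok, tag))
    return tagged


def is_code_mixed(query: str) -> bool:
    """True if the query has at least one English-only and one Spanish-only token."""
    tags = {tag for _, tag in tag_tokens(query)}
    return {"en", "es"} <= tags


print(tag_tokens("cake de fresa"))     # [('cake', 'en'), ('de', 'es'), ('fresa', 'es')]
print(is_code_mixed("colchón queen"))  # True
print(is_code_mixed("pastel de fresa"))  # False
```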

Challenge #6 — Spanish dialects: US Hispanic immigration is rooted in more than 10 Spanish-speaking countries. The table below shows the distribution of origins, along with the US state that has the largest Hispanic population of each origin.

https://es.wikipedia.org/wiki/Latino_(Estados_Unidos)

Different regions enrich the Spanish language with different dialects. This produces a variety of terms that sometimes converge on the same English intent. For example, Latin America has 15+ Spanish words for “popcorn”, 5+ for “appetizer” and 3+ for “avocado”.
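One way to make regional variants converge is a dialect lexicon that maps each variant to a shared English intent. The sketch below is a hypothetical, deliberately small lexicon with a handful of well-known variants, not Walmart’s resource. It also builds an inverse index that groups all variants per intent.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical dialect lexicon: regional Spanish variants mapped to one English intent.
DIALECT_TO_INTENT: Dict[str, str] = {
    "palomitas": "popcorn",   # Mexico, Spain
    "pochoclo": "popcorn",    # Argentina
    "cotufas": "popcorn",     # Venezuela
    "canchita": "popcorn",    # Peru
    "crispetas": "popcorn",   # Colombia
    "aguacate": "avocado",
    "palta": "avocado",       # Southern Cone, Peru
    "alberca": "pool",        # Mexico
    "piscina": "pool",
    "pileta": "pool",         # Argentina
}


def normalize(query: str) -> str:
    """Replace known dialect variants with their English intent; keep other tokens."""
    return " ".join(DIALECT_TO_INTENT.get(tok, tok) for tok in query.lower().split())


# Inverse index: all regional variants that converge on the same English intent.
VARIANTS: Dict[str, Set[str]] = defaultdict(set)
for variant, intent in DIALECT_TO_INTENT.items():
    VARIANTS[intent].add(variant)

print(normalize("Palta"))           # avocado
print(sorted(VARIANTS["popcorn"]))  # ['canchita', 'cotufas', 'crispetas', 'palomitas', 'pochoclo']
```

Plural forms such as “albercas” would need stemming or extra entries. The sketch leaves that out.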

This distribution of origins leads to uneven usage of Spanish dialects and variations across states. For example, our online customers in CA and TX most frequently refer to “pools” as “albercas”, while FL customers usually prefer “piscinas”.

Dominant query variations by state as submitted to Walmart.com
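A per-state view like this can be derived from query logs by counting variants per state and keeping the most frequent one. The sketch assumes the logs are already reduced to hypothetical `(state, variant)` pairs. The log sample is a toy input for illustration only, not Walmart data.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def dominant_variant_by_state(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return the most frequent query variant per state from (state, variant) pairs."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for state, variant in records:
        counts[state][variant] += 1
    # Counter.most_common breaks ties by first occurrence.
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Toy log sample for illustration only (not Walmart data).
toy_logs = [
    ("CA", "albercas"), ("CA", "albercas"), ("CA", "piscinas"),
    ("TX", "albercas"), ("TX", "piscinas"), ("TX", "albercas"),
    ("FL", "piscinas"), ("FL", "piscinas"), ("FL", "albercas"),
]
print(dominant_variant_by_state(toy_logs))
# {'CA': 'albercas', 'TX': 'albercas', 'FL': 'piscinas'}
```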

Challenge #7 — Evaluation from a Search perspective: nDCG evaluations and A/B tests are traditional instruments for assessing the statistical significance of a new Search feature’s impact on multiple metrics, including relevance, CTR and GMV. We can use these instruments to evaluate the impact of translation on search, but we should keep the following in mind:

These are end-to-end instruments. They evaluate not just the quality of the translation but also how well it plays with the rest of the search stack to deliver an effective product search experience.
When query translation is involved, a reliable nDCG evaluation requires bilingual judges, so that the ratings of the query-item pairs feeding the test are accurate.
For A/B testing, we would ideally sample only users who could be impacted by query translation. This is challenging in a diverse, multilingual society like the US, where users frequently mix traffic in the language under study (e.g., Spanish) with the dominant English. This may lead to noisy signals and neutral results.
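Graded query-item ratings from (bilingual) judges become an nDCG score roughly as follows. This is a minimal sketch of nDCG@k with a few stated assumptions:

- It uses the exponential gain (2^rel − 1), one common convention among several.
- It derives the ideal ranking from the same judged list, a simplification.
- The ratings in the example are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with exponential gain (2^rel - 1) / log2(rank + 1)."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """nDCG@k: DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical judge ratings (0-3) for the top two results of one query.
print(round(ndcg_at_k([1, 3], k=2), 4))  # 0.7098
```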

Challenge #8 — Evaluation from a Translation perspective:

The BiLingual Evaluation Understudy (BLEU) metric is widely adopted by the machine translation community. It belongs to the family of n-gram matching algorithms, which offer simple, inexpensive implementations with high explainability. However, it has historically been intolerant of paraphrasing and word-order changes in semantically correct translations.
To compensate for BLEU’s limitations, interest in metrics based on semantic similarity has been growing. BERTScore leverages contextualized token embeddings to tolerate synonyms, paraphrasing and distant dependencies in semantically correct translations. However, these approaches offer less explainability, and they are sensitive to how well the selected encoder fits the domain under study. For example, BERTScore could find “corona beer” and “crown beer” semantically close, since both “Corona” and “Crown” are beer brands. Yet “crown beer” is an incorrect translation of “cerveza corona”.
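BLEU’s intolerance of paraphrasing follows directly from its clipped n-gram precision. This simplified single-reference sentence BLEU illustrates the effect, under three assumptions:

- It does no smoothing.
- It stops at bigrams (`max_n=2`), because queries are short.
- It uses whitespace tokenization.

Real evaluations typically use corpus-level BLEU up to 4-grams through a standard tool such as sacreBLEU. Under the sketch, a reasonable paraphrase like “strawberry flavored cake” scores zero against “strawberry cake”.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: str, reference: str, n: int) -> float:
    """BLEU's clipped n-gram precision: candidate counts capped by reference counts."""
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def sentence_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Unsmoothed single-reference BLEU up to max_n with the standard brevity penalty."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # the geometric mean collapses to zero without smoothing
    c, r = len(candidate.split()), len(reference.split())
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


print(sentence_bleu("strawberry cake", "strawberry cake"))           # 1.0
print(sentence_bleu("strawberry flavored cake", "strawberry cake"))  # 0.0
print(modified_precision("crown beer", "corona beer", 1))            # 0.5
```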

Challenge #9 — Latency: For online stores serving millions of customers, the real-time speed of the translation is as important as its accuracy.

Challenge #10 — Cold start: When an IR system starts supporting a secondary language, query activity in that language remains scarce until awareness grows among customers. This limits the click-through data available for domain adaptation.

In Part 2 of this series, we will cover the domain-adapted query translation system that has iteratively evolved inside Walmart Search to address the 10 challenges we covered above.

The Walmart.com Journey Towards Spanish Query Understanding (Part 1) was originally published in Walmart Global Tech Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Article Link: The Walmart.com Journey Towards Spanish Query Understanding (Part 1) | by Leo Lezcano | Walmart Global Tech Blog | Feb, 2023 | Medium