For years, the review app Yelp has been a go-to resource for consumers seeking information and insights. Yelp has been experimenting with machine learning for quite some time, but during the recent surge in AI it ran into challenges integrating modern large language models (LLMs) to power new features.
Yelp found itself facing some unexpected roadblocks. Customers, especially those less familiar with the platform, found its AI features, such as its AI-powered assistant, difficult to use.
“One of the obvious lessons that we saw is that it’s very easy to build something that looks cool, but very hard to build something that looks cool and is very useful,” said Craig Saldanha, chief product officer at Yelp, in an interview with VentureBeat.
In April 2024, Yelp launched Yelp Assistant, its AI-powered service search assistant, to a broader customer base. However, usage figures for its AI tools began to decline, which presented a puzzle for the company. Saldanha explained, “The one that took us by surprise was when we launched this as a beta to consumers — a few users and folks who are very familiar with the app — [and they] loved it. We got such a strong signal that this would be successful, and then we rolled it out to everyone, [and] the performance just fell off. It took us a long time to figure out why.”
The decline in usage highlighted a key issue. Casual Yelp users, those who only occasionally open the app to find a new service provider, did not expect to be greeted immediately by an AI representative. The user experience needed some serious fine-tuning.
From Simple to More Involved AI Features
Most people know Yelp as a website and app for finding restaurant reviews and menu photos. Indeed, the platform offers plenty of value beyond that: pictures of food at new restaurants, ratings for a particular dish, and practical details such as whether a coffee shop provides WiFi, power outlets, and seating, a boon for those looking for a workspace.
Saldanha noted that Yelp has been investing in AI for a decade. In the past, Yelp focused on developing its own models for tasks like query understanding. “Part of the job of making a meaningful connection is helping people refine their own search intent,” he said.
As machine learning has evolved, so has Yelp's application of the technology. It began to use AI to recognize food in photos and identify popular dishes. Later, it launched new ways to connect users with tradespeople and service providers, helping guide user searches on the platform.

Yelp Assistant is designed to find the right “Pro” to work with. Users can use the chatbox and either use the prompts or type out the task they need done. The assistant then asks follow-up questions to narrow down potential service providers before drafting a message to Pros who might want to bid for the job.
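The flow the article describes — collect answers to follow-up questions, narrow the pool of Pros, then draft an outreach message — can be sketched in a few lines. Everything here (provider data, question wording, function names) is an invented illustration, not Yelp's implementation:

```python
# Hypothetical sketch of a service-search assistant flow: ask follow-up
# questions until enough is known, filter candidate Pros, draft a message.

CANDIDATE_PROS = [
    {"name": "A1 Plumbing", "category": "plumbing", "area": "downtown"},
    {"name": "PipeWorks", "category": "plumbing", "area": "uptown"},
    {"name": "BrightSpark Electric", "category": "electrical", "area": "downtown"},
]

FOLLOW_UPS = {
    "category": "What kind of work do you need done?",
    "area": "What part of town are you in?",
}

def next_question(answers):
    """Return the next follow-up question, or None once enough is known."""
    for field, question in FOLLOW_UPS.items():
        if field not in answers:
            return question
    return None

def matching_pros(answers):
    """Filter providers on every answer collected so far."""
    return [p for p in CANDIDATE_PROS
            if all(p.get(k) == v for k, v in answers.items())]

def draft_message(task, pro):
    """Draft the first message a user could send to a matching Pro."""
    return (f"Hi {pro['name']}, I'm looking for help with: {task}. "
            "Are you available to quote this job?")

answers = {}
answers["category"] = "plumbing"   # user's reply to next_question(answers)
answers["area"] = "downtown"
pros = matching_pros(answers)
print([p["name"] for p in pros])                      # ['A1 Plumbing']
print(draft_message("a leaking kitchen sink", pros[0]))
```

In a real system an LLM would generate the follow-up questions and the draft message; the fixed dictionary here just makes the control flow visible.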
Saldanha stated that Yelp encourages Pros to respond to users themselves, but acknowledges that larger brands often use call centers to handle messages generated by Yelp’s AI Assistant.
In addition to Yelp Assistant, Yelp also introduced Review Insights and Highlights. For Review Insights, LLMs analyze user and reviewer sentiment, which Yelp aggregates into sentiment scores: a detailed GPT-4o prompt generates a labeled dataset covering a list of topics, and that dataset is then used to fine-tune a GPT-4o-mini model. The Review Highlights feature, which surfaces information from reviews, follows a similar pattern, using a GPT-4 prompt to generate the dataset and fine-tuning GPT-3.5 Turbo on it. Yelp has said it will update the feature to use GPT-4o and o1.
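The pattern described here — a large "teacher" model labels data via a detailed prompt, and a smaller model is fine-tuned on the result — is a common distillation setup. Below is a minimal sketch of assembling such a dataset in the JSONL chat format typically used for fine-tuning; the reviews, topics, and labels are invented, and this is an assumption about the general shape of the pipeline, not Yelp's actual code:

```python
import json

# Hypothetical topics and teacher-model labels. In the pipeline the article
# describes, a large model (e.g. GPT-4o) would produce these labels from a
# detailed prompt, and a smaller model would then be fine-tuned on them.
TOPICS = ["food quality", "service", "wait time"]

teacher_labels = [
    {"review": "The pasta was incredible but we waited forever.",
     "labels": {"food quality": "positive", "wait time": "negative"}},
    {"review": "Friendly staff, average food.",
     "labels": {"food quality": "neutral", "service": "positive"}},
]

def to_finetune_record(example):
    """Convert one teacher-labeled review into a chat-style training record."""
    return {
        "messages": [
            {"role": "system",
             "content": ("Classify the review's sentiment for these topics: "
                         + ", ".join(TOPICS))},
            {"role": "user", "content": example["review"]},
            {"role": "assistant", "content": json.dumps(example["labels"])},
        ]
    }

# One JSON object per line -- the usual fine-tuning dataset layout.
jsonl = "\n".join(json.dumps(to_finetune_record(ex)) for ex in teacher_labels)
print(len(jsonl.splitlines()))  # 2
```

The smaller model trained on such records learns to reproduce the teacher's topic-sentiment judgments at a fraction of the inference cost.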
Yelp, like many other companies, is using LLMs to improve review usefulness by adding better search functions based on customer comments.
Big Models and Performance Needs
For many new AI features, including the AI assistant, Yelp has turned to OpenAI’s GPT-4o and other models. Saldanha pointed out that Yelp’s proprietary data remains the key to its assistants’ effectiveness, regardless of the specific model. Yelp keeps an open mind regarding which LLMs provide the best service for customers and does not want to be locked into a single model.
“We use models from OpenAI, Anthropic and other models on AWS Bedrock,” Saldanha noted.
Saldanha said that Yelp had created a rubric to test model performance across several categories, including correctness, relevance, conciseness, customer safety, and compliance. He said "it's really the top-end models" that performed best. The company runs a small pilot with each model before weighing iteration cost and response latency.
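An evaluation like the one Saldanha describes — score each pilot model on a fixed rubric, then weigh cost and latency — can be sketched as a simple weighted scorecard. The categories come from the article; the weights, scores, and model names below are invented for illustration:

```python
# Rubric categories from the article; the weights are illustrative assumptions.
WEIGHTS = {
    "correctness": 0.30,
    "relevance": 0.25,
    "conciseness": 0.15,
    "customer_safety": 0.15,
    "compliance": 0.15,
}

def rubric_score(scores):
    """Weighted average of per-category scores (each on a 0-10 scale)."""
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

def rank_models(pilot_results, max_cost=None, max_latency_ms=None):
    """Rank piloted models by rubric score within cost/latency budgets."""
    eligible = [
        (name, r) for name, r in pilot_results.items()
        if (max_cost is None or r["cost_per_1k"] <= max_cost)
        and (max_latency_ms is None or r["latency_ms"] <= max_latency_ms)
    ]
    return sorted(eligible, key=lambda nr: rubric_score(nr[1]["scores"]),
                  reverse=True)

# Invented pilot results for two hypothetical models.
pilot = {
    "model-large": {"scores": {"correctness": 9, "relevance": 9, "conciseness": 7,
                               "customer_safety": 9, "compliance": 9},
                    "cost_per_1k": 0.02, "latency_ms": 1200},
    "model-small": {"scores": {"correctness": 7, "relevance": 7, "conciseness": 8,
                               "customer_safety": 8, "compliance": 8},
                    "cost_per_1k": 0.002, "latency_ms": 300},
}

print([name for name, _ in rank_models(pilot)])
# Tightening the latency budget can flip the ranking toward the smaller model:
print([name for name, _ in rank_models(pilot, max_latency_ms=500)])
```

A scorecard like this makes the "top-end models win on quality, smaller models win on cost and latency" trade-off explicit rather than anecdotal.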
Teaching Users
Yelp has also worked hard to help both casual and power users get comfortable with the new AI features. Saldanha said one of the first things the team realized, especially with the AI assistant, is that the AI's tone needed to feel human: it should not respond too quickly or too slowly, and it could not be overly encouraging or too brusque.
“We put a bunch of effort into helping people feel comfortable, especially with that first response. It took us almost four months to get this second piece right. And as soon as we did, it was very obvious and you could see that hockey stick in engagement,” Saldanha said.
This training and fine-tuning resulted in higher usage numbers for Yelp's AI features. After all, the goal is to develop AI features that consumers will embrace, and by adapting to the needs of its users, Yelp's latest generation of AI tools is providing the kind of utility that keeps people coming back to the platform.