Effective AI: How to choose the right generative AI features – and build them fast
Generative AI has huge potential to change computing, and tech companies are scrambling to add AI features and products. But it’s hard to know exactly what to build: even as generative AI gives us new capabilities, it changes customer behavior and expectations and rewrites the rules of what makes a good experience. This guide describes those problems and opportunities to help you get your AI development right on the first try.
Tech companies are under intense pressure to deliver AI-enhanced products and features, both to seize new business opportunities and to head off competitors.
History shows that big changes in computing platforms usually invalidate existing business practices and make market-leading software obsolete. Yet most of us are approaching generative AI with the same product assumptions and design processes that we used for the previous generation of computing. That blind spot puts us at risk of missing the opportunity.
Part 1 of this guide describes how AI is changing the rules for software success, and how those changes challenge us to rethink the way we work.
Even before generative AI emerged, AI software projects were notoriously risky. Fortune magazine reported that the failure rate of AI projects was 83% to 92% (link).
However, just knowing about the risks is not enough; we urgently need to understand what to do about them. In Part 2 we describe ten emerging principles of AI product development, each covered in its own section below.
Part 3 of the guide gives five examples of companies that created successful new AI features.
If you follow the principles, you’ll improve the odds that your company will thrive in the new business world being created by generative AI.
Microsoft co-founder Bill Gates calls AI “the most important advance in technology since the graphical user interface … as fundamental as the creation of the microprocessor, the personal computer, the Internet, and the mobile phone.”
This sort of generational change in computing is often an extinction event for tech products and companies. The leading tech products of the day can be obsoleted rapidly, often along with the companies that made them. For example, Nokia was the world leader in mobile phones until smartphones emerged. Lotus was the leader in command-line spreadsheets, and WordPerfect dominated command-line word processors, but neither survived the change to graphical interfaces.
You might find that exhilarating if you’re creating a challenger product, or disturbing if you’re defending an industry leader. But either way, you need to understand why transitions like this are so disruptive. Tech transitions don’t just enable new products; they also invalidate many of the assumptions and business practices that made a company successful in the previous generation. Suddenly the foundations of your success become liabilities.
You can throw yourself into adopting the new technology, but unless you’ve also changed the way you think and work, you’re still at high risk of failing. Startups do well in transitions because they have fewer assumptions and processes to unlearn.
Here are some key ways the generative AI transition is changing the rules:
Most of us take for granted the tapping and swiping we do to control a smartphone, but it’s actually the end result of seventy years of disruptive generational change.
The conversational interface is the next fundamental change in the way people interact with technology: instead of clicking and dragging, we’re talking with the computer. Conversational interfaces have been in development for decades, but they didn’t break through into mainstream products because they were too limited and inflexible. Only with generative AI did the conversations become fluid and responsive enough that they could be used for generalized computing tasks.
The conversational interface is a breakthrough not only because it uses verbal commands, but also because the user is now specifying the outcome they want rather than giving commands on how to do it. There’s a great discussion of this change here.
Feelings are becoming more important in computing. Generative AI is adding a new emotional dimension to the computer-human relationship. Our user tests of generative AI products show that humans judge a computer conversation the same way they judge a human conversation, and those emotional reactions can make or break an AI product.
Conversational AI is also driving new user behaviors. Interface consultancy Nielsen Norman Group studied how people use AI and identified two new behaviors, which they call accordion editing (repeatedly expanding and condensing the AI’s output) and apple picking (copying selected pieces of the output into one’s own work) (link).
NNG wrote: “Users almost always engage in multistep iteration because the AI doesn’t deliver exactly what the user wants — it can only guess at the intent. At this point, the conversational user interface stops being easy. Users must perform significant extra work to modify the output to suit their needs.” They said they found signs of this problem in every AI conversation they studied.
We’re just at the start of understanding how AI conversations change user behaviors, but even now it’s clear that we’ll need different design approaches to deal with these new user behaviors.
Just as generative AI is changing the human-computer interface, it’s also changing the tasks that people do with computers. Generative AI can accomplish tasks that were previously tedious or impossible on a computer or smartphone. This is similar to what happened with previous generations of computing. For example, desktop publishing on a PC replaced much of the manual print production work previously done by local printshops, and spreadsheets displaced much of the data entry and calculation work previously done by data entry clerks.
It usually takes time for those new usages to emerge, and because generative AI is so recent, we’re just starting to realize which tasks it can automate.
There’s an enormous amount of speculation online about the future effects of generative AI on the economy, much of it focused on employment in particular jobs. It’s beyond the scope of this guide to say what AI will do to employment, but it’s worthwhile to point out that none of the previous generations of computing produced mass unemployment – but they all produced massive turnover in tech products and companies that were built around older work processes. So while generative AI might be a risk for individual workers, it definitely is a big danger to tech companies.
Even before the release of ChatGPT, there were widespread predictions that AI would transform customer journeys, primarily through real-time customized messages and offers (link). What generative AI has added is a relationship element – rather than just customizing traditional marketing communications, AI can present itself as a companion and coach that the user can refer to at any time.
This potentially enables a much deeper, more emotional relationship between the customer and brand. In our research on user attitudes toward AI, the heaviest adopters of generative AI were intrigued by the idea of using it as a personal assistant that learned more and more about them and served their particular needs, like an information butler. We’re seeing companies explore ways to position their bots as a friend or servant who accompanies the user on the journey of buying or using a product. This is likely to become an important area for competitive differentiation for any company using generative AI.
Generative AI is probably the fastest-growing new software product in history. It took about 20 years for email to become ubiquitous on computers and about six years for web browsers to do the same. In just under two years, ChatGPT and its competitors have reached almost all knowledge workers in English-speaking developed countries. (The source for this and other statistics on AI adoption in this guide is UserTesting’s Generative AI Benchmark study of more than 2,500 knowledge workers around the world, conducted in late 2024. The study included an extensive survey, more than 100 in-depth video interviews, and a competitive benchmark of ChatGPT, Anthropic Claude, Google Gemini, and Microsoft Copilot [link].)
This incredibly rapid adoption has created a big training and education gap for any company working on AI products. In the normal adoption curve for a new tech product, people try it and learn what they are supposed to use it for at the same time. AI chatbot adoption got out far ahead of that understanding. Our survey of AI chatbot users showed that the majority of them got it because they were curious or because they wanted to try it for a single task. Only about 20% of them are using it as a generalized personal information assistant, the way it was intended to be used.
This means companies creating AI features and products face a much bigger training and education burden than they would usually expect. There’s also a much higher than usual risk that AI products will fail because customers don’t understand what to use them for.
Because many AI chatbots are being used as an interface to the customer, and because people respond emotionally to those bots, a generative AI chatbot can affect the customer’s emotional response to a brand in ways that companies don’t expect. Traditionally, the brand’s voice and personality are created and carefully managed by specialists in the marketing team. Generative AI turns your product into a default brand spokesperson, but often without any coordination with the marketing team.
The AI transition is especially tough on tech companies because all of these challenges need to be dealt with at the same time, in a coordinated fashion. It’s rare to find a company that’s focusing on all of them. For example, some companies are doing a great job of exploring how a bot can change the customer journey, but not considering how that same bot can undermine the brand’s personality. It’s possible for a company to have a successful AI feature but still fail in the overall AI transition. To avoid that, companies need to systematically adjust their processes and mental models for AI. That’s what we cover in the next section.
It’s going to take years to define all the best practices for developing with generative AI, but even now some priorities are becoming clear. Here are the priorities we’re hearing from companies that are succeeding in AI:
Even in the best of times, needs discovery can be challenging for tech companies. It can be slow and expensive, and often it’s hard to get access to your exact target customers. Tech companies tell us that finding the time to do discovery for AI features is especially challenging because there’s so much pressure to ship something quickly.
Ironically, AI products benefit from discovery more than usual because there are so many unknowns about customer needs and reactions. Chris Carreiro, CTO of Park Place Technologies, said, “Choosing the right use cases from the start requires careful thought and planning. … Rushing this process can lead to misaligned projects that veer off course. … A measured approach … paves the way for successful adoption and utilization of the AI solution.” (link)
When you do discovery for an AI feature, push to understand not just customer needs but also expectations, since customers are likely to be confused about what AI can and can’t do.
It’s also very important to have a back-and-forth dialog between the product team and senior management about the findings and implications of the discovery process. The RAND Corporation studied failures in AI projects and found that disconnects between senior management and the product team were a common cause of failure – specifically, targeting a problem that’s not compelling to customers, overestimating what AI can achieve, and underestimating the time needed to produce a satisfying solution (link).
How UserTesting helps
Using a human insight system like UserTesting, you can do in-depth discovery much faster than traditional processes, often in a single sprint. That enables you to focus your AI project without slowing it down. You also get videos of customers talking about their problems, which makes it much easier to communicate their needs to senior management and other stakeholders. For more details, see our guide to quick problem discovery here.
In the tech industry we’re used to designing products for ease of use and effectiveness. Those issues are still important in an AI product, but for a conversational interface you also need to design for likability and credibility. Think about the AI user experience exactly like you would a human conversation: How do you introduce yourself? What signals can you pick up from the other person, and how do you adjust to them? How do you create a connection and trust?
That may sound like a daunting task, but most of us do it instinctively every day. It’s how human beings are wired to communicate. If your AI product initiates a conversation with the user, you will trigger the user’s natural human reactions automatically. You should anticipate and use them in your product design. Here are some specific issues to consider:
Be mindful of the baggage carried by particular words. In our tests of AI chatbots, people reacted strongly to particular words used by the bot. Those could be terms that some people find offensive, but more often it’s word choices that have double meanings, or that some people associate with particular attitudes or cultures. Users sometimes tell us that a bot sounds like a particular type of person, or that they feel it’s trying to imply something. It’s fine if you wanted to create that impression, but otherwise it’s a problem.
Other research has shown that details like use of punctuation and emojis can also change how a chatbot is perceived (link).
Think about the overall vocabulary. In a comparative test between AI chatbots, we asked them to write a response using “terms that the average person can understand.” One of the bots chose very simplified language that included two repetitions of the adjective “super.” Participants mocked this response heavily, with one saying scornfully that it was appropriate for his daughter but not for him.
Adjust the length and structure of the answer to the context. Prior to generative AI, a rule of thumb among chatbot designers was to keep bot responses to the length of a Twitter message (link). That rule was driven by the limited capabilities of those bots; longer responses could be off target or intimidating. Generative AI’s flexibility has made this rule obsolete. Although the initial prompt should be short (something like “how can I help you?”), the new best practice is to adjust the length and structure of the response to the type of question the user asked.
But how do you adapt it? According to Nielsen Norman Group, there are at least six different types of generative AI conversation (link).
Their recommendation is to “use the length and the structure of the user’s prompt, as well as the complexity of the answer to determine the conversation type early in the exchange and adjust behavior accordingly.”
A bot can use a number of specific tactics to make those adjustments.
Don’t put yourself under pressure to shorten the conversation arbitrarily. NNG found no correlation between a conversation’s length (the number of queries and responses) and the users’ ratings of its helpfulness and trustworthiness. Just like a human conversation, as long as the user is feeling rewarded it’s OK to continue talking.
There is, though, a correlation between the length of an individual response and user satisfaction. For an informational query (“how do I do X” or “explain X”), our tests have shown that a single paragraph response is usually viewed as not enough. People gave higher ratings to responses with several short paragraphs, preferably including brief context on why the AI chose that response.
On the other hand, a very long answer (several long paragraphs) can be perceived as overwhelming. So it’s usually best for the bot to give a moderately detailed response and give the user opportunities to guide the conversation to the next step. Remember, there’s no penalty for having several back-and-forth interactions as long as the user feels they are making progress.
In our tests, we’ve found that the structure of a response can also drive higher user satisfaction. Bulleted and numbered lists are often valued if they make information easier to absorb, and users often say they appreciate it if the answer includes some follow-up web links the user can click on. However, users do not respond well if the response looks too much like a web search result. They tell us that they chose to use a chatbot instead of web search because they’re looking for information and understanding rather than just links they have to follow and interpret on their own.
As with many issues around generative AI, it’s all about striking the right balance.
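To make those guidelines concrete, here is a minimal sketch of how a bot might adapt its response length and structure to the query, in the spirit of NNG’s advice to infer the conversation type from the prompt. The conversation types, the thresholds, and the injected `call_llm` helper are our own illustrative assumptions, not a published algorithm.

```python
# A sketch of length-adaptive responses. The classification thresholds
# and the injected `call_llm` helper are illustrative assumptions.

def classify_query(prompt: str) -> str:
    """Crudely infer the conversation type from the prompt itself."""
    words = prompt.split()
    if len(words) <= 6 and prompt.rstrip().endswith("?"):
        return "quick_fact"        # short, direct question
    if any(kw in prompt.lower() for kw in ("explain", "how do i", "why")):
        return "informational"     # wants context, not just an answer
    return "open_ended"            # exploration or creation task

STYLE_HINTS = {
    # Short questions get short answers with no preamble.
    "quick_fact": "Answer in one or two sentences.",
    # Our tests favored several short paragraphs plus brief rationale.
    "informational": ("Answer in two to four short paragraphs, briefly "
                      "explaining why this answer fits the question."),
    # Moderate detail, then invite the user to steer the next step.
    "open_ended": ("Give a moderately detailed answer, then suggest two "
                   "or three follow-up directions the user could take."),
}

def respond(prompt: str, call_llm) -> str:
    """`call_llm(system, user) -> str` stands in for your model call."""
    style = STYLE_HINTS[classify_query(prompt)]
    return call_llm(system=f"You are a helpful assistant. {style}", user=prompt)
```

In a real product the classification would more likely be done by the model itself, but even a crude heuristic like this makes the design intent explicit and testable.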
How UserTesting helps
You can test all of these wording, message structure, and length issues through user tests. If you have already shipped your AI product, you can bring the participants to it, give them a scenario on what to do, and then ask them to narrate their thinking and reactions as they use the bot. You can see what they do on screen, watch their faces, and listen to their tone of voice as they use the product. Also ask them at the end of the test how they felt about the task. It’s often very helpful to compare what they did online to how they felt about it. Sometimes people might appear to struggle but feel good about it, or vice versa.
Be sure to ask what they thought of the result they got, not just how easy or difficult the product was to use.
Competitive testing can be a little harder, since most AI chatbots require a personal login verified through email. That prevents you from having participants log into a competitive account you set up. You could recruit people who have accounts with your competitors, but that’s problematic if they’re using an account paid for by their employer, since you could expose confidential information from that company.
An alternative approach we’ve used is to run the same query on your bot and the competition, and then show users the anonymized responses from the bots. Give the participant context on the query you asked and anything else you think is relevant, and then have them compare and rank the responses. This won’t tell you how people feel about the overall experience of using the bot, but it can give you good feedback on its responses.
If you want to test the overall experience of using the bot but can’t work out how to have participants log in, you can record screen video of yourself using the bot, and have the participants watch and respond to that video.
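If you script that comparison, the mechanics are simple. Here is a sketch of the anonymizing step; however you collect each bot’s answer (API call or manual copy-paste), the point is to shuffle the responses and strip the bot identities before participants see them.

```python
# A sketch of the anonymized side-by-side comparison described above.
# The bot names and answers are placeholders for your own data.
import random

def build_comparison(answers_by_bot: dict) -> list:
    """Strip bot identities and randomize order for participants."""
    items = list(answers_by_bot.items())
    random.shuffle(items)
    return [
        {"label": f"Response {chr(65 + i)}",  # "Response A", "Response B", ...
         "text": text,
         "_bot": bot}                         # kept for your analysis only
        for i, (bot, text) in enumerate(items)
    ]

comparison = build_comparison({
    "our_bot": "...",        # paste each bot's answer to the same query here
    "competitor_1": "...",
    "competitor_2": "...",
})
for item in comparison:
    print(item["label"], item["text"])  # show participants labels and text, never "_bot"
```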
In our user research on AI chatbots, the bot’s “personality” emerged as an important factor in user affinity. Based solely on the wording and structure of a bot’s responses, people made sweeping emotional judgments about a bot – “it’s friendly,” “it’s credible,” “it’s arrogant,” “it’s formal,” etc. In one case, a user said that a bot was “not respectful.” It’s striking that people are looking for a complex behavior like respect from a bot.
These judgments affected the users’ feelings about the chatbot’s brand, and also their willingness to recommend the product to others. In essence, the chatbot is functioning as a representative of the brand. Its personality affects the brand’s personality.
If your company hasn’t crafted a brand personality, there’s a lot of information on how to use AI to build one (examples here, here, and here). There’s much less information on how to infuse a brand’s personality into an AI chatbot. Current best practice is to write a description of the brand personality and include that in the training for the chatbot. When testing the chatbot, you can also use some of the standard techniques used by branding agencies – for example, ask users to describe the type of person they imagine the bot to be.
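In practice, one common way to give a bot that written-down personality is to supply it as a system prompt on every request. Below is a minimal sketch using the OpenAI chat API; the “Acme” brand description and the model choice are invented examples to be replaced with your own brand guidelines.

```python
# A minimal sketch of feeding a written brand personality to a chatbot
# as a system prompt. The Acme description is an invented example.
from openai import OpenAI

BRAND_PERSONALITY = (
    "You speak for the Acme brand. Voice: warm, plain-spoken, quietly "
    "confident. Never sarcastic, never slangy. Admit uncertainty rather "
    "than bluffing, and keep jargon to a minimum."
)

client = OpenAI()

def brand_reply(user_message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # substitute whichever model you use
        messages=[
            {"role": "system", "content": BRAND_PERSONALITY},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
```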
How UserTesting helps
Many people don’t realize that you can use a human insight system to test for things other than usability. You can show participants anything that can be displayed on a computer—including text, images, and videos—and ask for their reactions. Because you can record participants’ faces, you can get insights into how they feel, and built-in AI analysis can identify emotions in the things they say.
You can also use a smartphone test to record how people react to a product while they are out and about in public or at work. This can be very helpful for testing AI bots that are intended to help with real-world customer journeys, such as shopping.
Generative AI changes the role of the designer. Designing a conversational interface is less about manipulating visual elements in a preferred sequence of steps, and more about facilitating a conversation that can move in many different directions. Nielsen Norman Group recommends coaching the AI on the outcomes you’re trying to produce:
“For example, we might prioritize different types of user actions or categorizing information into must show, should show, never show. We may not need to specify individual design details, but we’ll need to help a genUI system understand our user and business goals.”
They call this process “outcome-oriented design” (link). It’s a major mindset change for your design team.
Although a conversational interface is very flexible, it’s not ideal for all human-computer interactions. Clicking and tapping are superior for doing some types of work. Designer Maximillian Piras has a great perspective on this:
“Interactions should remain outside of an input field when words are less efficient…Describing which part of an image to manipulate is much more cumbersome than clicking it…Sliders seem like a better fit for sizing, as saying “make it bigger” leaves too much room for subjectivity. Settings like colors and aspect ratios are easier to select than describe.”
For many AI-enhanced products, a combination of conversation and traditional UI will probably be most efficient (link).
How UserTesting helps
You should set up comparative tests for different combinations of traditional and conversational UI. For the early stages of development, when you don’t have a working bot, you can show the users mockups and ask them what they would do first with the interface and what they think each design element on the screen would do.
If you repeat these tests as you work on the design, you can get the broad outlines of the interaction well verified before you start to build it. This is likely to spare you a lot of rework later.
For people using a computer, typing to interact with an AI chatbot feels natural and convenient. When we surveyed chatbot users, most people said they prefer typing over speech to have an AI conversation. Typing a query keeps it private, and most people can read faster than they can listen to a spoken response.
But on a mobile phone or tablet, the balance changes. Typing is much slower and more awkward, so about half of users say they prefer spoken conversations when using AI on a smartphone. The other half still prefers the privacy of typing.
This has important implications for chatbot design. Different modes of conversation will be preferred on different devices, and for different types of queries. A well-designed chatbot will need to accommodate these differences. One size doesn’t fit all.
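A sketch of what that accommodation can look like in code is below: default to typing on desktop, default to voice on mobile, and always let a user’s saved choice win. The user-agent check is deliberately crude and purely illustrative.

```python
# Device-aware input defaults, based on the survey findings above.
# The user-agent heuristic is a deliberately crude illustration.

def default_input_mode(user_agent: str, saved_preference=None) -> str:
    if saved_preference:        # an explicit user choice always wins
        return saved_preference
    is_mobile = any(t in user_agent for t in ("Mobile", "Android", "iPhone"))
    # Roughly half of mobile users still prefer typing for privacy,
    # so "voice" is only a default here, never the only option.
    return "voice" if is_mobile else "text"

print(default_input_mode("Mozilla/5.0 (iPhone; ...)"))      # voice
print(default_input_mode("Mozilla/5.0 (Windows NT 10.0)"))  # text
```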
How UserTesting helps
You can easily specify that a test is to be run on computers or smartphones. Do the same test of your AI bot with both types of devices and compare the results.
We also recommend running real-world mobile tests so you can evaluate how well the system performs when the user is surrounded by distractions (we call those “destination tests”).
You can work with our participant recruiting team to have participants go to a particular location before they take the test. Often, an app that appears easy in a quiet setting is much harder to use and understand “in the wild.”
There has been a lot of discussion of racial stereotyping in AI. For example, there’s evidence that ethnically associated language patterns may skew the job referrals made by an AI bot (link). But the problem is even more widespread and subtle than most people realize. In our research, people living outside the US said AI struggles to generate the standard English used in their countries. For example, an Australian said that when he asked a chatbot to use Australian English, the bot inserted references to kangaroos and other stereotypical Australian elements.
Any group is potentially vulnerable to this sort of stereotyping. The publication Rest of World ran a study on image generation that showed how every group they tested was homogenized and stereotyped (link).
Since the problem is embedded in the data sets used to train AI, it’s not easy to fix. The best practice is to use a combination of techniques to whittle away at the harm; Adobe has published suggestions on how to do this (link).
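As one lightweight example of that combination-of-techniques approach, here is a sketch of an automated spot check: run the same prompt across locale variants and flag replies containing known stereotype terms for human review. The watchlist and the `call_llm` helper are our own assumptions; a real review also needs human raters from the affected groups, as discussed below.

```python
# A sketch of a stereotype spot check. The watchlist and the injected
# `call_llm` helper are illustrative assumptions, not a vetted tool.

STEREOTYPE_WATCHLIST = {
    "Australian English": ["kangaroo", "outback", "g'day"],  # from the example above
    # ...extend per locale, ideally with input from native speakers
}

def flag_stereotypes(prompt: str, call_llm) -> dict:
    flagged = {}
    for locale, terms in STEREOTYPE_WATCHLIST.items():
        reply = call_llm(f"Respond in {locale}: {prompt}").lower()
        hits = [t for t in terms if t in reply]
        if hits:
            flagged[locale] = hits   # route these replies to human review
    return flagged
```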
How UserTesting helps
It’s easy to say that a company should move slower when developing AI, but many companies are unlikely to take the advice. There’s just too much pressure on companies to “do something” with AI. A human insight system lets you get fast feedback without slowing down the project, even on contentious issues like bias. You can recruit participants by their demographic background, and then ask them to read various AI responses to identify insensitive wording or other problems.
If you’re testing with people who have common demographic characteristics (age, gender, even particular job types), those tests will usually fill within hours. If you’re looking for rare types of people (an uncommon nationality or unusual job role), testing may take one or more days. You may also want recruitment help (link).
One of the greatest strengths of the graphical user interface and its desktop metaphor is that users can easily discover what the system is capable of doing. Menus are usually grouped into common themes so it is easy to dig through them, and icons like a printer and a file folder are pretty much self-explanatory.
The text input box of an AI chatbot is almost the opposite of the desktop metaphor: Although it’s obvious that you should type something, there’s no indication of what the system can or can’t do. AI chatbot companies are starting to augment the text box with examples of prompts and tasks, selectable personas for the bot, options for next steps after a response, and supplemental icons and buttons.
There’s a lot of experimentation going on, but some best practices are emerging (link).
How UserTesting helps
It’s easy to test interface elements like icons and buttons. Just display them and ask participants what they would expect each element to do.
Or you can give the participants a task and a mockup of the interface, and ask them how they would approach the task—where they would click, what they would type, etc. In all of these cases you can record both the screen and the participant’s face, letting you evaluate both the simplicity and the emotional content of the design.
Discussions of “sustainability” in AI usually focus on managing total costs across the economy – for example, the energy consumption of all the data centers running AI models. Those issues are indeed important for society, but there’s another form of sustainability that comes a lot closer to home for most companies: making sure you don’t lose money on your AI products.
For most software, the cost of development dominates the total cost of the product. When SaaS software is deployed, the developer pays a bit for storage, processing, and networking, but that’s usually minimal. But because of the massive amount of computation a generative AI model performs, the cost of answering a query can add up significantly. A single moderately complicated query to a leading-edge third-party AI chatbot could cost a penny, and if the answer is complicated it could cost several cents or more. Generating a simple image could cost four cents.
Those may sound like minor costs, but the more successful an AI product is, the more it will be used, and the more the costs add up. Gartner estimates that the recurring calculation cost per user for an enterprise AI product is between $280 and $21,000 a year, depending on the complexity of the task (link).
As Maximillian Piras put it, “Each interaction requires intense calculation, so costs scale linearly with usage. Without a zero-marginal cost of reproduction, the common software subscription model becomes less tenable.” (link)
On top of that expense are the costs of training and maintaining the model, which you’ll need to amortize across those queries. Overall, the cost to run a generative AI product can range from a few hundred dollars to $190,000 or more per month (link). IBM estimates that generative AI will increase the average company’s computing costs by 89% between 2023 and 2025 (link).
If you’re charging customers per query, or monetizing your AI bot in some other way, the additional costs may not be too burdensome. But if you’re giving away AI services, as many companies do today, the costs can add up surprisingly fast. The more popular your free AI product, the bigger your financial exposure. It’s best to think through your long-term AI economic model now, before you’re forced to start charging for a service that you previously told customers was free.
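A quick way to pressure-test that economic model is simple arithmetic. The sketch below uses the per-query figures mentioned above; every input is an assumption to be replaced with your own telemetry and vendor pricing.

```python
# A back-of-the-envelope cost model for a free AI feature, using the
# per-query figures cited above. All inputs are assumptions.

COST_PER_TEXT_QUERY = 0.01   # ~1 cent for a moderately complex query
COST_PER_IMAGE = 0.04        # ~4 cents per simple generated image

def monthly_inference_cost(active_users, text_queries_per_user, images_per_user):
    per_user = (text_queries_per_user * COST_PER_TEXT_QUERY
                + images_per_user * COST_PER_IMAGE)
    return active_users * per_user

# 50,000 free users averaging 60 queries and 5 images per month:
# 50,000 * (60 * $0.01 + 5 * $0.04) = $40,000 per month, before the
# costs of training, hosting, and maintaining the model are amortized in.
print(f"${monthly_inference_cost(50_000, 60, 5):,.0f} per month")
```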
Even if you’re creating an enterprise product that customers expect to pay for, you still need to check that the customer will be willing to pay enough to make the product profitable. That means not just verifying that you’re addressing a valid pain point but that the buyer is willing to sign off on the size of contract you need. This is an issue for all enterprise software, but it’s especially true for AI products because of the high running costs of AI, and the high opportunity cost of assigning scarce AI development resources to the wrong project.
At the Wall Street Journal’s CIO Summit in late 2024, the Chief Digital Officer at Harman (a custom engineering subsidiary of Samsung) said AI projects should be expected to prove their financial worth within 12 months, with worth measured by either higher employee productivity or revenue generation. The VP of generative AI at Databricks, a cloud data firm, recommended breaking AI initiatives into smaller chunks that are more easily proven out (link).
How UserTesting helps
It’s easy to collect end-user feedback on pricing scenarios—just show them the product, tell them what you’ll charge, and have them respond. But there’s a catch—a testing situation biases people to give positive feedback because they are very focused on your product, and most people subconsciously prefer to give nice feedback.
So a pricing scheme that’s only OK in reality may get fairly favorable feedback in a test. Generally, you should interpret anything less than enthusiastic support for a pricing scheme as a sign of trouble.
If you’re creating an enterprise product sold through subscription, test the pricing on buyers, not users. Those senior participants can be difficult to source independently, but UserTesting’s professional services team can help you recruit them.
Because generative AI is so new, experience testing is even more important than it is for traditional software. As Nielsen Norman Group put it, “Studying users will become even more crucial as traditional design principles and assumptions are challenged and user behavior shifts. Testing will ensure that the dynamically generated interfaces effectively meet diverse user needs and preferences.” (link)
Best practices for testing an AI product are still emerging. Nielsen Norman Group wrote a thorough description of how it tested AI chatbots; you can see it here.
How UserTesting helps
Because AI development has so many unknowns, it’s advisable to test frequently in small tranches. Rather than running a test with 100 participants every quarter, run small-scale tests with just a handful of participants whenever you have something new for them to look at, even if it’s a small change.
The right human insight system makes those tests easy and quick to manage. Incremental testing helps you find problems before they get deeply embedded in the product, when fixing them becomes much more costly. Over time, the small tests add up to a larger sample, so you can also look across them to spot more subtle issues.
The right testing program enables you to move faster because you have high confidence that your solution will be on target.
Once you’ve put the ten principles into practice, you’ll need to choose a set of key performance indicators (KPIs) to track your progress. Many of the KPIs used for AI products are familiar holdovers from other software products—for example, adoption, frequency of use, session length, and abandonment rate. Other technical metrics are specific to AI. Google has written two excellent overviews of all of these metrics, which you can find here and here.
In addition to Google’s metrics, leading-edge companies are also starting to measure aspects of the AI system’s personality and emotional impact, with a special focus on likability and credibility.
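As one illustration, here is a sketch of computing a few of the familiar product KPIs named above from a raw event log. The event schema is our own assumption; adapt the field names to your analytics pipeline.

```python
# Computing basic chatbot KPIs from an assumed event-log schema.
from collections import defaultdict

def session_kpis(events):
    """events: [{"user": ..., "session": ..., "type": "query" | "abandon"}, ...]"""
    sessions = defaultdict(list)
    for e in events:
        sessions[e["session"]].append(e)

    users = {e["user"] for e in events}
    queries = sum(1 for e in events if e["type"] == "query")
    abandoned = sum(1 for evs in sessions.values()
                    if any(e["type"] == "abandon" for e in evs))
    return {
        "active_users": len(users),
        "sessions_per_user": len(sessions) / max(len(users), 1),
        "queries_per_session": queries / max(len(sessions), 1),
        "abandonment_rate": abandoned / max(len(sessions), 1),
    }
```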
How UserTesting helps
You can structure user tests to measure the personality of AI chatbots. In our AI chatbot benchmark study, we had the participants read responses from the various bots, anonymized so they could not tell which bot was which.
We then asked the participants to rate the bots’ responses on various attributes, such as ease of understanding and trustworthiness. For example, we asked participants, “Do you trust the answer?” Participants were given a 1-7 scale, with 1 = “definitely no” and 7 = “definitely yes.” The participants were also asked to explain out loud why they chose that rating. This step is very important so you’ll understand what you need to fix in order to improve the score.
We also had participants stack rank the responses, from best to worst, and had them explain why they made their choices. That helped illuminate the sometimes subtle distinctions they made between bots.
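For readers who want to replicate that analysis, the arithmetic is straightforward: mean attribute ratings on the 1-7 scale, plus a simple Borda-style aggregate of the stack rankings. The sketch below uses invented data shapes and values purely for illustration.

```python
# Mean ratings plus a Borda-style rank aggregate. Data are invented.
from statistics import mean

trust_ratings = {      # bot -> participants' "Do you trust the answer?" scores
    "bot_a": [6, 5, 7, 6],
    "bot_b": [4, 5, 3, 4],
}
rankings = [           # each participant's best-to-worst ordering
    ["bot_a", "bot_b"],
    ["bot_a", "bot_b"],
    ["bot_b", "bot_a"],
]

mean_trust = {bot: mean(scores) for bot, scores in trust_ratings.items()}

borda = {bot: 0 for bot in trust_ratings}
for order in rankings:
    for position, bot in enumerate(order):
        borda[bot] += len(order) - 1 - position   # first place scores highest

print(mean_trust)   # {'bot_a': 6.0, 'bot_b': 4.0}
print(borda)        # {'bot_a': 2, 'bot_b': 1}
```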
Kendra is Amazon Web Services’ AI-enhanced tool that helps companies create their own search products. Kendra scans common data formats, both structured and unstructured, analyzes the contents for context, and integrates that information into search results.
The Kendra team used user tests to discover the pain points of search developers and then tested prototypes as they iterated on the design.
Based on what was learned from UserTesting, Amazon added a dashboard to Kendra that helps search developers track the effectiveness of user searches, fine-tune the search models, and improve them over time. Customers also use the platform to fix or remove dead links and track user behavior automatically.
Streaming service Deezer uses its Flow feature to curate music playlists for its customers. The company wanted to use AI to further personalize the experience, and used UserTesting to discover customer needs. The company found that casual listeners had a different music consumption pattern from its core users. While the core users tended to pick particular music genres, casual users wanted to customize their playlists by emotion.
The data team classified 90 million songs by their emotional content, while the user research team explored ways to classify user emotions. The result was a new feature, Flow Moods. Deezer iterated on its prototype rapidly through UserTesting tests in six countries.
Within the first month of its release, one million listeners began to use Flow Moods. Less than a year later, 700,000 users accessed Flow Moods per week. The new feature also helped with customer retention—listeners who use Flow Moods are more likely to reconnect than listeners who don’t, and they listen to more music than non-users of the feature. “UserTesting helps us to really get into the mind of the user in a way that quantitative data can’t,” said Fernanda Senko, User Researcher at Deezer.
Canada’s second-largest airline was developing a voice-driven AI assistant, “Ask WestJet,” to help travelers and reduce their frustrations. The company used UserTesting to discover friction points in the current customer journey and understand how people would ask questions of an AI assistant.
They also explored whether customers would ask questions differently on different platforms (for example, Alexa vs. Google). The research found that travelers sometimes get inaccurate information on things like baggage fees, size restrictions, and other pre-flight information.
So WestJet tailored the assistant to respond to those issues and built a baggage size calculator for Google Voice Assistant. The changes increased accuracy and brand trust while decreasing complaints about customer service. Learn more about WestJet's success here.
The Zigzag mobile app (supported by Purina) is a puppy-raising coaching app that gives breed-specific advice and guides people through the puppy-raising journey. Through discovery research on the UserTesting platform, Zigzag discovered that puppy owners hesitated to ask the company’s live coaches for advice on issues that they considered to be simple or trivial.
At the same time, Zigzag realized that it couldn’t scale the number of live puppy coaches to meet all of its customer needs. So the company added a generative AI coach to its offering.
The company used user tests to confirm user willingness to pay for the solution, measure user satisfaction with the answers, evaluate whether users understand the feature, and validate that users would ultimately prefer the AI chat over searching with Google. It also did multiple rounds of tests on prompts and prototypes during development.
The result: There were 80,000 interactions with the bot in the first six months, comments on it were 96% positive, and there was a 27% reduction in support requests to human puppy coaches.
Aimee is the personal assistant chatbot of EE, the leading consumer telecoms brand in the UK (EE is owned by BT Group, formerly British Telecom). The goal of Aimee is to help consumers with whatever they need, but the system was being overwhelmed with questions about roaming services. Only 16% of roaming questions were resolved by the bot; the rest were routed to humans in customer support.
EE added AI features to the bot to predict where people would travel and the cost of roaming in those locations, then tested the improvements through UserTesting to ensure that it would meet customer needs. The launch of the new bot was very successful.
Chatbot containment rates increased by 75%, clickthroughs to the roaming calculator increased by 50%, and total call volume to the contact centers decreased by 2.4%.
Designing for AI. The best article we’ve seen on the challenges of designing an AI product. Required reading for practitioners, and highly recommended for executives who want to understand what they’re getting into.
Microsoft cofounder Bill Gates on the age of AI. Written in 2023, this is a good overview of the potential benefits from AI, informed by the decades of tech industry history that Gates participated in. The article is a good starting point for an exec who wants to think about how AI might change their particular industry.
Impact of generative AI on conversational design. A chat designer who predates generative AI describes how her field used to work and how generative AI is changing it. Great perspective.
Generative UI. Nielsen Norman Group describes how interface design is changing in the age of generative AI. NNG has done a series of articles and tests exploring the effect of generative AI on customer experience, with practical advice on what to do about it. Throughout our guide we’ve inserted links to the best articles. The NNG material is very good reading for practitioners.
Pitfalls of rushing AI implementations and Should businesses hold back on AI adoption? These two articles make the case for slowing down a bit and planning before you rush into an AI project.
Customer experience in the age of AI. From the Harvard Business Review in 2022, this article predates the release of ChatGPT. But it’s still very useful because it highlights the people and organizational issues that can hinder a company’s success in AI. Those issues are still completely relevant today.
UserTesting resources
How to test AI software. Practical advice for practitioners.
Generative AI chatbots: Overhyped but still underestimated. Our research on knowledge worker attitudes and adoption of AI, plus a competitive comparison of user reactions to the top four AI chatbots. Learn what does and doesn’t work, via real-world feedback from users.
Discover the hidden ROI of your customers' insights. Book a meeting with our Sales team today to learn more.