Effective AI: How to choose the right generative AI features – and build them fast
Generative AI has huge potential to change computing, and tech companies are scrambling to add AI features and products. But it’s hard to know exactly what to build: even as generative AI gives us new capabilities, it changes customer behavior and expectations and rewrites the rules of what makes a good experience. This guide describes those problems and opportunities to help you get your AI development right on the first try.
Tech companies are under intense pressure to deliver AI-enhanced products and features, both to seize new business opportunities and to head off competitors.
History shows that big changes in computing platforms usually invalidate existing business practices and make market-leading software obsolete. Yet most of us are approaching generative AI with the same product assumptions and design processes that we used for the previous generation of computing. That blind spot puts us at risk of missing the opportunity.
Part 1 of this guide describes how AI is changing the rules for software success, and how those changes challenge us to rethink the way we work.
Even before generative AI emerged, AI software projects were notoriously risky. Fortune magazine reported that the failure rate of AI projects was 83% to 92% (link).
However, just knowing about the risks is not enough; we urgently need to understand what to do about them. In Part 2 we describe ten emerging principles of AI product development, each covered in its own section below.
Part 3 of the guide gives five examples of companies that created successful new AI features.
If you follow the principles, you’ll improve the odds that your company will thrive in the new business world being created by generative AI.
Microsoft co-founder Bill Gates calls AI “the most important advance in technology since the graphical user interface … as fundamental as the creation of the microprocessor, the personal computer, the Internet, and the mobile phone.”
This sort of generational change in computing is often an extinction event for tech products and companies. The leading tech products of the day can be obsoleted rapidly, often along with the companies that made them. For example, Nokia was the world leader in mobile phones until smartphones emerged. Lotus was the leader in command-line spreadsheets, and WordPerfect dominated command-line word processors, but neither survived the change to graphical interfaces.
You might find that exhilarating if you’re creating a challenger product, or disturbing if you’re defending an industry leader. But either way, you need to understand why transitions like this are so disruptive. Tech transitions don’t just enable new products; they also invalidate many of the assumptions and business practices that made a company successful in the previous generation. Suddenly the foundations of your success become liabilities.
You can throw yourself into adopting the new technology, but unless you’ve also changed the way you think and work, you’re still at high risk of failing. Startups do well in transitions because they have fewer assumptions and processes to unlearn.
Here are some key ways the generative AI transition is changing the rules:
Most of us take for granted the tapping and swiping we do to control a smartphone, but it’s actually the end result of seventy years of disruptive generational change.
The conversational interface is the next fundamental change in the way people interact with technology: instead of clicking and dragging, we’re talking with the computer. Conversational interfaces have been in development for decades, but they didn’t break through into mainstream products because they were too limited and inflexible. Only with generative AI did the conversations become fluid and responsive enough that they could be used for generalized computing tasks.
The conversational interface is a breakthrough not only because it uses verbal commands, but also because the user is now specifying the outcome they want rather than giving commands on how to do it. There’s a great discussion of this change here.
Feelings are becoming more important in computing. Generative AI is adding a new emotional dimension to the computer-human relationship. Our user tests of generative AI products show that humans judge a computer conversation the same way they judge a human conversation, and those emotional reactions can make or break an AI product.
Conversational AI is also driving new user behaviors. Interface consultancy Nielsen Norman Group studied how people use AI and identified two new behaviors, which they call accordion editing (repeatedly expanding and condensing the AI’s output) and apple picking (copying selected pieces of the output into one’s own work) (link).
NNG wrote: “Users almost always engage in multistep iteration because the AI doesn’t deliver exactly what the user wants — it can only guess at the intent. At this point, the conversational user interface stops being easy. Users must perform significant extra work to modify the output to suit their needs.” They said they found signs of this problem in every AI conversation they studied.
We’re just at the start of understanding how AI conversations change user behaviors, but even now it’s clear that we’ll need different design approaches to deal with these new user behaviors.
Just as generative AI is changing the human-computer interface, it’s also changing the tasks that people do with computers. Generative AI can accomplish tasks that were previously tedious or impossible on a computer or smartphone. This is similar to what happened with previous generations of computing. For example, desktop publishing on a PC replaced much of the manual print production work previously done by local printshops, and spreadsheets displaced much of the data entry and calculation work previously done by data entry clerks.
It usually takes time for those new usages to emerge, and because generative AI is so recent, we’re just starting to realize which tasks it can automate.
There’s an enormous amount of speculation online about the future effects of generative AI on the economy, much of it focused on employment in particular jobs. It’s beyond the scope of this guide to say what AI will do to employment, but it’s worthwhile to point out that none of the previous generations of computing produced mass unemployment – but they all produced massive turnover in tech products and companies that were built around older work processes. So while generative AI might be a risk for individual workers, it definitely is a big danger to tech companies.
Even before the release of ChatGPT, there were widespread predictions that AI would transform customer journeys, primarily through real-time customized messages and offers (link). What generative AI has added is a relationship element – rather than just customizing traditional marketing communications, AI can present itself as a companion and coach that the user can refer to at any time.
This potentially enables a much deeper, more emotional relationship between the customer and brand. In our research on user attitudes toward AI, the heaviest adopters of generative AI were intrigued by the idea of using it as a personal assistant that learned more and more about them and served their particular needs, like an information butler. We’re seeing companies explore ways to position their bots as a friend or servant who accompanies the user on the journey of buying or using a product. This is likely to become an important area for competitive differentiation for any company using generative AI.
Generative AI is probably the fastest-growing new software product in history. It took about 20 years for email to become ubiquitous on computers and about six years for web browsers to do the same. In just under two years, ChatGPT and its competitors have reached almost all knowledge workers in English-speaking developed countries. (The source for this and other statistics on AI adoption in this guide is UserTesting’s Generative AI Benchmark study of more than 2,500 knowledge workers around the world, conducted in late 2024. The study included an extensive survey, more than 100 in-depth video interviews, and a competitive benchmark of ChatGPT, Anthropic Claude, Google Gemini, and Microsoft Copilot [link].)
This incredibly rapid adoption has created a big training and education gap for any company working on AI products. In the normal adoption curve for a new tech product, people try it and learn what they are supposed to use it for at the same time. AI chatbot adoption got out far ahead of that understanding. Our survey of AI chatbot users showed that the majority of them got it because they were curious or because they wanted to try it for a single task. Only about 20% of them are using it as a generalized personal information assistant, the way it was intended to be used.
This means companies creating AI features and products face a much bigger training and education burden than they would usually expect. There’s also a much higher than usual risk that AI products will fail because customers don’t understand what to use them for.
Because many AI chatbots are being used as an interface to the customer, and because people respond emotionally to those bots, a generative AI chatbot can affect the customer’s emotional response to a brand in ways that companies don’t expect. Traditionally, the brand’s voice and personality are created and carefully managed by specialists in the marketing team. Generative AI turns your product into a default brand spokesperson, but often without any coordination with the marketing team.
The AI transition is especially tough on tech companies because all of these challenges need to be dealt with at the same time, in a coordinated fashion. It’s rare to find a company that’s focusing on all of them. For example, some companies are doing a great job of exploring how a bot can change the customer journey, but not considering how that same bot can undermine the brand’s personality. It’s possible for a company to have a successful AI feature but still fail in the overall AI transition. To avoid that, companies need to systematically adjust their processes and mental models for AI. That’s what we cover in the next section.
It’s going to take years to define all the best practices for developing with generative AI, but even now some priorities are becoming clear. Here are the priorities we’re hearing from companies that are succeeding in AI:
Even in the best of times, needs discovery can be challenging for tech companies. It can be slow and expensive, and often it’s hard to get access to your exact target customers. Tech companies tell us that finding the time to do discovery for AI features is especially challenging because there’s so much pressure to ship something quickly.
Ironically, AI products benefit from discovery more than usual because there are so many unknowns about customer needs and reactions. Chris Carreiro, CTO of Park Place Technologies, said, “Choosing the right use cases from the start requires careful thought and planning. … Rushing this process can lead to misaligned projects that veer off course. … A measured approach … paves the way for successful adoption and utilization of the AI solution.” (link)
When you do discovery for an AI feature, push to understand not just customer needs but also expectations, since customers are likely to be confused about what AI can and can’t do.
It’s also very important to have a back-and-forth dialog between the product team and senior management about the findings and implications of the discovery process. The RAND Corporation studied failures in AI projects and found that disconnects between senior management and the product team were a common cause of failure – specifically, targeting a problem that’s not compelling to customers, overestimating what AI can achieve, and underestimating the time needed to produce a satisfying solution (link).
How UserTesting helps
Using a human insight system like UserTesting, you can do in-depth discovery much faster than traditional processes, often in a single sprint. That enables you to focus your AI project without slowing it down. You also get videos of customers talking about their problems, which makes it much easier to communicate their needs to senior management and other stakeholders. For more details, see our guide to quick problem discovery here.
In the tech industry we’re used to designing products for ease of use and effectiveness. Those issues are still important in an AI product, but for a conversational interface you also need to design for likability and credibility. Think about the AI user experience exactly like you would a human conversation: How do you introduce yourself? What signals can you pick up from the other person, and how do you adjust to them? How do you create a connection and trust?
That may sound like a daunting task, but most of us do it instinctively every day. It’s how human beings are wired to communicate. If your AI product initiates a conversation with the user, you will trigger the user’s natural human reactions automatically. You should anticipate and use them in your product design. Here are some specific issues to consider:
Be mindful of the baggage carried by particular words. In our tests of AI chatbots, people reacted strongly to particular words used by the bot. Those could be terms that some people find offensive, but more often it’s word choices that have double meanings, or that some people associate with particular attitudes or cultures. Users sometimes tell us that a bot sounds like a particular type of person, or that they feel it’s trying to imply something. It’s fine if you wanted to create that impression, but otherwise it’s a problem.
Other research has shown that details like use of punctuation and emojis can also change how a chatbot is perceived (link).
Think about the overall vocabulary. In a comparative test between AI chatbots, we asked them to write a response using “terms that the average person can understand.” One of the bots chose very simplified language that included two repetitions of the adjective “super.” Participants mocked this response heavily, with one saying scornfully that it was appropriate for his daughter but not for him.
Adjust the length and structure of the answer to the context. Prior to generative AI, a rule of thumb among chatbot designers was to keep bot responses to the length of a Twitter message (link). That rule was driven by the limited capabilities of those bots; longer responses could be off target or intimidating. Generative AI’s flexibility has made this rule obsolete. Although the initial prompt should be short (something like “how can I help you?”), the new best practice is to adjust the length and structure of the response to the type of question the user asked.
But how do you adapt it? According to Nielsen Norman Group, there are at least six different types of generative AI conversation (link).
Their recommendation is to “use the length and the structure of the user’s prompt, as well as the complexity of the answer to determine the conversation type early in the exchange and adjust behavior accordingly.”
A bot can use a number of specific tactics to make those adjustments.
Don’t put yourself under pressure to shorten the conversation arbitrarily. NNG found no correlation between a conversation’s length (the number of queries and responses) and the users’ ratings of its helpfulness and trustworthiness. Just like a human conversation, as long as the user is feeling rewarded it’s OK to continue talking.
There is, though, a correlation between the length of an individual response and user satisfaction. For an informational query (“how do I do X” or “explain X”), our tests have shown that a single paragraph response is usually viewed as not enough. People gave higher ratings to responses with several short paragraphs, preferably including brief context on why the AI chose that response.
On the other hand, a very long answer (several long paragraphs) can be perceived as overwhelming. So it’s usually best for the bot to give a moderately detailed response and give the user opportunities to guide the conversation to the next step. Remember, there’s no penalty for having several back-and-forth interactions as long as the user feels they are making progress.
In our tests, we’ve found that the structure of a response can also drive higher user satisfaction. Bulleted and numbered lists are often valued if they make information easier to absorb, and users often say they appreciate it if the answer includes some follow-up web links the user can click on. However, users do not respond well if the response looks too much like a web search result. They tell us that they chose to use a chatbot instead of web search because they’re looking for information and understanding rather than just links they have to follow and interpret on their own.
As with many issues around generative AI, it’s all about striking the right balance.
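To make those guidelines concrete, here is a minimal sketch of how a bot might adapt its response length and structure to the query, in the spirit of NNG’s advice to infer the conversation type from the prompt. The conversation types, the thresholds, and the injected `call_llm` helper are our own illustrative assumptions, not a published algorithm.

```python
# A sketch of length-adaptive responses. The classification thresholds
# and the injected `call_llm` helper are illustrative assumptions.

def classify_query(prompt: str) -> str:
    """Crudely infer the conversation type from the prompt itself."""
    words = prompt.split()
    if len(words) <= 6 and prompt.rstrip().endswith("?"):
        return "quick_fact"        # short, direct question
    if any(kw in prompt.lower() for kw in ("explain", "how do i", "why")):
        return "informational"     # wants context, not just an answer
    return "open_ended"            # exploration or creation task

STYLE_HINTS = {
    # Short questions get short answers with no preamble.
    "quick_fact": "Answer in one or two sentences.",
    # Our tests favored several short paragraphs plus brief rationale.
    "informational": ("Answer in two to four short paragraphs, briefly "
                      "explaining why this answer fits the question."),
    # Moderate detail, then invite the user to steer the next step.
    "open_ended": ("Give a moderately detailed answer, then suggest two "
                   "or three follow-up directions the user could take."),
}

def respond(prompt: str, call_llm) -> str:
    """`call_llm(system, user) -> str` stands in for your model call."""
    style = STYLE_HINTS[classify_query(prompt)]
    return call_llm(system=f"You are a helpful assistant. {style}", user=prompt)
```

In a real product the classification would more likely be done by the model itself, but even a crude heuristic like this makes the design intent explicit and testable.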
How UserTesting helps
You can test all of these wording, message structure, and length issues through user tests. If you have already shipped your AI product, you can bring the participants to it, give them a scenario on what to do, and then ask them to narrate their thinking and reactions as they use the bot. You can see what they do on screen, watch their faces, and listen to their tone of voice as they use the product. Also ask them at the end of the test how they felt about the task. It’s often very helpful to compare what they did online to how they felt about it. Sometimes people might appear to struggle but feel good about it, or vice versa.
Be sure to ask what they thought of the result they got, not just how easy or difficult the product was to use.
Competitive testing can be a little harder, since most AI chatbots require a personal login verified through email. That prevents you from having participants log into a competitive account you set up. You could recruit people who have accounts with your competitors, but that’s problematic if they’re using an account paid for by their employer, since you could expose confidential information from that company.
An alternative approach we’ve used is to run the same query on your bot and the competition, and then show users the anonymized responses from the bots. Give the participant context on the query you asked and anything else you think is relevant, and then have them compare and rank the responses. This won’t tell you how people feel about the overall experience of using the bot, but it can give you good feedback on its responses.
If you want to test the overall experience of using the bot but can’t work out how to have participants log in, you can record screen video of yourself using the bot, and have the participants watch and respond to that video.
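If you script that comparison, the mechanics are simple. Here is a sketch of the anonymizing step; however you collect each bot’s answer (API call or manual copy-paste), the point is to shuffle the responses and strip the bot identities before participants see them.

```python
# A sketch of the anonymized side-by-side comparison described above.
# The bot names and answers are placeholders for your own data.
import random

def build_comparison(answers_by_bot: dict) -> list:
    """Strip bot identities and randomize order for participants."""
    items = list(answers_by_bot.items())
    random.shuffle(items)
    return [
        {"label": f"Response {chr(65 + i)}",  # "Response A", "Response B", ...
         "text": text,
         "_bot": bot}                         # kept for your analysis only
        for i, (bot, text) in enumerate(items)
    ]

comparison = build_comparison({
    "our_bot": "...",        # paste each bot's answer to the same query here
    "competitor_1": "...",
    "competitor_2": "...",
})
for item in comparison:
    print(item["label"], item["text"])  # show participants labels and text, never "_bot"
```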
In our user research on AI chatbots, the bot’s “personality” emerged as an important factor in user affinity. Based solely on the wording and structure of a bot’s responses, people made sweeping emotional judgments about a bot – “it’s friendly,” “it’s credible,” “it’s arrogant,” “it’s formal,” etc. In one case, a user said that a bot was “not respectful.” It’s striking that people are looking for a complex behavior like respect from a bot.
These judgments affected the users’ feelings about the chatbot’s brand, and also their willingness to recommend the product to others. In essence, the chatbot is functioning as a representative of the brand. Its personality affects the brand’s personality.
If your company hasn’t crafted a brand personality, there’s a lot of information on how to use AI to build one (examples here, here, and here). There’s much less information on how to infuse a brand’s personality into an AI chatbot. Current best practice is to write a description of the brand personality and include that in the training for the chatbot. When testing the chatbot, you can also use some of the standard techniques used by branding agencies – for example, ask users to describe the type of person they imagine the bot to be.
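In practice, one common way to give a bot that written-down personality is to supply it as a system prompt on every request. Below is a minimal sketch using the OpenAI chat API; the “Acme” brand description and the model choice are invented examples to be replaced with your own brand guidelines.

```python
# A minimal sketch of feeding a written brand personality to a chatbot
# as a system prompt. The Acme description is an invented example.
from openai import OpenAI

BRAND_PERSONALITY = (
    "You speak for the Acme brand. Voice: warm, plain-spoken, quietly "
    "confident. Never sarcastic, never slangy. Admit uncertainty rather "
    "than bluffing, and keep jargon to a minimum."
)

client = OpenAI()

def brand_reply(user_message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # substitute whichever model you use
        messages=[
            {"role": "system", "content": BRAND_PERSONALITY},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
```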
How UserTesting helps
Many people don’t realize that you can use a human insight system to test for things other than usability. You can show participants anything that can be displayed on a computer—including text, images, and videos—and ask for their reactions. Because you can record participants’ faces, you can get insights into how they feel, and built-in AI analysis can identify emotions in the things they say.
You can also use a smartphone test to record how people react to a product while they are out and about in public or at work. This can be very helpful for testing AI bots that are intended to help with real-world customer journeys, such as shopping.
Generative AI changes the role of the designer. Designing a conversational interface is less about manipulating visual elements in a preferred sequence of steps, and more about facilitating a conversation that can move in many different directions. Nielsen Norman Group recommends coaching the AI on the outcomes you’re trying to produce:
“For example, we might prioritize different types of user actions or categorizing information into must show, should show, never show. We may not need to specify individual design details, but we’ll need to help a genUI system understand our user and business goals.”
They call this process “outcome-oriented design” (link). It’s a major mindset change for your design team.
Although a conversational interface is very flexible, it’s not ideal for all human-computer interactions. Clicking and tapping are superior for doing some types of work. Designer Maximillian Piras has a great perspective on this:
“Interactions should remain outside of an input field when words are less efficient…Describing which part of an image to manipulate is much more cumbersome than clicking it…Sliders seem like a better fit for sizing, as saying “make it bigger” leaves too much room for subjectivity. Settings like colors and aspect ratios are easier to select than describe.”
For many AI-enhanced products, a combination of conversation and traditional UI will probably be most efficient (link).
How UserTesting helps
You should set up comparative tests for different combinations of traditional and conversational UI. For the early stages of development, when you don’t have a working bot, you can show the users mockups and ask them what they would do first with the interface and what they think each design element on the screen would do.
If you repeat these tests as you work on the design, you can get the broad outlines of the interaction well verified before you start to build it. This is likely to spare you a lot of rework later.
For people using a computer, typing to interact with an AI chatbot feels natural and convenient. When we surveyed chatbot users, most people said they prefer typing over speech to have an AI conversation. Typing a query keeps it private, and most people can read faster than they can listen to a spoken response.
But on a mobile phone or tablet, the balance changes. Typing is much slower and more awkward, so about half of users say they prefer spoken conversations when using AI on a smartphone. The other half still prefers the privacy of typing.
This has important implications for chatbot design. Different modes of conversation will be preferred on different devices, and for different types of queries. A well-designed chatbot will need to accommodate these differences. One size doesn’t fit all.
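A sketch of what that accommodation can look like in code is below: default to typing on desktop, default to voice on mobile, and always let a user’s saved choice win. The user-agent check is deliberately crude and purely illustrative.

```python
# Device-aware input defaults, based on the survey findings above.
# The user-agent heuristic is a deliberately crude illustration.

def default_input_mode(user_agent: str, saved_preference=None) -> str:
    if saved_preference:        # an explicit user choice always wins
        return saved_preference
    is_mobile = any(t in user_agent for t in ("Mobile", "Android", "iPhone"))
    # Roughly half of mobile users still prefer typing for privacy,
    # so "voice" is only a default here, never the only option.
    return "voice" if is_mobile else "text"

print(default_input_mode("Mozilla/5.0 (iPhone; ...)"))      # voice
print(default_input_mode("Mozilla/5.0 (Windows NT 10.0)"))  # text
```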
How UserTesting helps
You can easily specify that a test is to be run on computers or smartphones. Do the same test of your AI bot with both types of devices and compare the results.
We also recommend running real-world mobile tests so you can evaluate how well the system performs when the user is surrounded by distractions (we call those “destination tests”).
You can work with our participant recruiting team to have participants go to a particular location before they take the test. Often, an app that appears easy in a quiet setting is much harder to use and understand “in the wild.”
There has been a lot of discussion of racial stereotyping in AI. For example, there’s evidence that ethnically associated language patterns may skew the job referrals made by an AI bot (link). But the problem is even more widespread and subtle than most people realize. In our research, people living outside the US said AI struggles to generate the standard English used in their countries. For example, an Australian said that when he asked a chatbot to use Australian English, the bot inserted references to kangaroos and other stereotypical Australian elements.
Any group is potentially vulnerable to this sort of stereotyping. The publication Rest of World ran a study on image generation that showed how every group they tested was homogenized and stereotyped (link).
Since the problem is embedded in the data sets used to train AI, it’s not easy to fix. The best practice is to use a combination of techniques to whittle away at the harm; Adobe has published suggestions on how to do this (link).
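As one lightweight example of that combination-of-techniques approach, here is a sketch of an automated spot check: run the same prompt across locale variants and flag replies containing known stereotype terms for human review. The watchlist and the `call_llm` helper are our own assumptions; a real review also needs human raters from the affected groups, as discussed below.

```python
# A sketch of a stereotype spot check. The watchlist and the injected
# `call_llm` helper are illustrative assumptions, not a vetted tool.

STEREOTYPE_WATCHLIST = {
    "Australian English": ["kangaroo", "outback", "g'day"],  # from the example above
    # ...extend per locale, ideally with input from native speakers
}

def flag_stereotypes(prompt: str, call_llm) -> dict:
    flagged = {}
    for locale, terms in STEREOTYPE_WATCHLIST.items():
        reply = call_llm(f"Respond in {locale}: {prompt}").lower()
        hits = [t for t in terms if t in reply]
        if hits:
            flagged[locale] = hits   # route these replies to human review
    return flagged
```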
How UserTesting helps
It’s easy to say that a company should move slower when developing AI, but many companies are unlikely to take the advice. There’s just too much pressure on companies to “do something” with AI. A human insight system lets you get fast feedback without slowing down the project, even on contentious issues like bias. You can recruit participants by their demographic background, and then ask them to read various AI responses to identify insensitive wording or other problems.
If you’re testing with people who have common demographic characteristics (age, gender, even particular job types), those tests will usually fill within hours. If you’re looking for rare types of people (an uncommon nationality or unusual job role), testing may take one or more days. You may also want recruitment help (link).
One of the greatest strengths of the graphical user interface and its desktop metaphor is that users can easily discover what the system is capable of doing. Menus are usually grouped into common themes so it is easy to dig through them, and icons like a printer and a file folder are pretty much self-explanatory.
The text input box of an AI chatbot is almost the opposite of the desktop metaphor: Although it’s obvious that you should type something, there’s no indication of what the system can or can’t do. AI chatbot companies are starting to augment the text box with examples of prompts and tasks, selectable personas for the bot, options for next steps after a response, and supplemental icons and buttons.
There’s a lot of experimentation going on, but some best practices are emerging (link).
How UserTesting helps
It’s easy to test interface elements like icons and buttons. Just display them and ask participants what they would expect each element to do.
Or you can give the participants a task and a mockup of the interface, and ask them how they would approach the task—where they would click, what they would type, etc. In all of these cases you can record both the screen and the participant’s face, letting you evaluate both the simplicity and the emotional content of the design.
Discussions of “sustainability” in AI usually focus on managing total costs across the economy – for example, the energy consumption of all the data centers running AI models. Those issues are indeed important for society, but there’s another form of sustainability that comes a lot closer to home for most companies: making sure you don’t lose money on your AI products.
For most software, the cost of development dominates the total cost of the product. When SaaS software is deployed, the developer pays a bit for storage, processing, and networking, but that’s usually minimal. But because of the massive amount of computation a generative AI model performs, the cost of answering a query can add up significantly. A single moderately complicated query to a leading-edge third-party AI chatbot could cost a penny, and if the answer is complicated it could cost several cents or more. Generating a simple image could cost four cents.
Those may sound like minor costs, but the more successful an AI product is, the more it will be used, and the more the costs add up. Gartner estimates that the recurring calculation cost per user for an enterprise AI product is between $280 and $21,000 a year, depending on the complexity of the task (link).
As Maximillian Piras put it, “Each interaction requires intense calculation, so costs scale linearly with usage. Without a zero-marginal cost of reproduction, the common software subscription model becomes less tenable.” (link)
On top of that expense are the costs of training and maintaining the model, which you’ll need to amortize across those queries. Overall, the cost to run a generative AI product can range from a few hundred dollars to $190,000 or more per month (link). IBM estimates that generative AI will increase the average company’s computing costs by 89% between 2023 and 2025 (link).
If you’re charging customers per query, or monetizing your AI bot in some other way, the additional costs may not be too burdensome. But if you’re giving away AI services, as many companies do today, the costs can add up surprisingly fast. The more popular your free AI product, the bigger your financial exposure. It’s best to think through your long-term AI economic model now, before you’re forced to start charging for a service that you previously told customers was free.
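A quick way to pressure-test that economic model is simple arithmetic. The sketch below uses the per-query figures mentioned above; every input is an assumption to be replaced with your own telemetry and vendor pricing.

```python
# A back-of-the-envelope cost model for a free AI feature, using the
# per-query figures cited above. All inputs are assumptions.

COST_PER_TEXT_QUERY = 0.01   # ~1 cent for a moderately complex query
COST_PER_IMAGE = 0.04        # ~4 cents per simple generated image

def monthly_inference_cost(active_users, text_queries_per_user, images_per_user):
    per_user = (text_queries_per_user * COST_PER_TEXT_QUERY
                + images_per_user * COST_PER_IMAGE)
    return active_users * per_user

# 50,000 free users averaging 60 queries and 5 images per month:
# 50,000 * (60 * $0.01 + 5 * $0.04) = $40,000 per month, before the
# costs of training, hosting, and maintaining the model are amortized in.
print(f"${monthly_inference_cost(50_000, 60, 5):,.0f} per month")
```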
Even if you’re creating an enterprise product that customers expect to pay for, you still need to check that the customer will be willing to pay enough to make the product profitable. That means not just verifying that you’re addressing a valid pain point but that the buyer is willing to sign off on the size of contract you need. This is an issue for all enterprise software, but it’s especially true for AI products because of the high running costs of AI, and the high opportunity cost of assigning scarce AI development resources to the wrong project.
At the Wall Street Journal’s CIO Summit in late 2024, the Chief Digital Officer at Harman (a custom engineering subsidiary of Samsung) said AI projects should be expected to prove their financial worth within 12 months, with worth measured by either higher employee productivity or revenue generation. The VP of generative AI at Databricks, a cloud data firm, recommended breaking AI initiatives into smaller chunks that are more easily proven out (link).
How UserTesting helps
It’s easy to collect end-user feedback on pricing scenarios—just show them the product, tell them what you’ll charge, and have them respond. But there’s a catch—a testing situation biases people to give positive feedback because they are very focused on your product, and most people subconsciously prefer to give nice feedback.
So a pricing scheme that’s only OK in reality may get fairly favorable feedback in a test. Generally, you should interpret anything less than enthusiastic support for a pricing scheme as a sign of trouble.
If you’re creating an enterprise product sold through subscription, test the pricing on buyers, not users. Those senior participants can be difficult to source independently, but UserTesting’s professional services team can help you recruit them.
Because generative AI is so new, experience testing is even more important than it is for traditional software. As Nielsen Norman Group put it, “Studying users will become even more crucial as traditional design principles and assumptions are challenged and user behavior shifts. Testing will ensure that the dynamically generated interfaces effectively meet diverse user needs and preferences.” (link)
Best practices for testing an AI product are still emerging. Nielsen Norman Group wrote a thorough description of how it tested AI chatbots; you can see it here.
How UserTesting helps
Because AI development has so many unknowns, it’s advisable to test frequently in small tranches. Rather than running a test with 100 participants every quarter, run small-scale tests with just a handful of participants whenever you have something new for them to look at, even if it’s a small change.
The right human insight system makes those tests easy and quick to manage. Incremental testing helps you find problems before they get deeply embedded in the product, when fixing them becomes much more costly. Over time, the small tests add up to a larger sample, so you can also look across them to spot more subtle issues.
The right testing program enables you to move faster because you have high confidence that your solution will be on target.
Once you’ve put the ten principles into practice, you’ll need to choose a set of key performance indicators (KPIs) to track your progress. Many of the KPIs used for AI products are familiar holdovers from other software products—for example, adoption, frequency of use, session length, and abandonment rate. Other technical metrics are specific to AI. Google has written two excellent overviews of all of these metrics, which you can find here and here.
In addition to Google’s metrics, leading-edge companies are also starting to measure aspects of the AI system’s personality and emotional impact, with a special focus on likability and credibility.
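As one illustration, here is a sketch of computing a few of the familiar product KPIs named above from a raw event log. The event schema is our own assumption; adapt the field names to your analytics pipeline.

```python
# Computing basic chatbot KPIs from an assumed event-log schema.
from collections import defaultdict

def session_kpis(events):
    """events: [{"user": ..., "session": ..., "type": "query" | "abandon"}, ...]"""
    sessions = defaultdict(list)
    for e in events:
        sessions[e["session"]].append(e)

    users = {e["user"] for e in events}
    queries = sum(1 for e in events if e["type"] == "query")
    abandoned = sum(1 for evs in sessions.values()
                    if any(e["type"] == "abandon" for e in evs))
    return {
        "active_users": len(users),
        "sessions_per_user": len(sessions) / max(len(users), 1),
        "queries_per_session": queries / max(len(sessions), 1),
        "abandonment_rate": abandoned / max(len(sessions), 1),
    }
```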
How UserTesting helps
You can structure user tests to measure the personality of AI chatbots. In our AI chatbot benchmark study, we had the participants read responses from the various bots, anonymized so they could not tell which bot was which.
We then asked the participants to rate the bots’ responses on various attributes, such as ease of understanding and trustworthiness. For example, we asked participants, “Do you trust the answer?” Participants were given a 1-7 scale, with 1 = “definitely no” and 7 = “definitely yes.” The participants were also asked to explain out loud why they chose that rating. This step is very important so you’ll understand what you need to fix in order to improve the score.
We also had participants stack rank the responses, from best to worst, and had them explain why they made their choices. That helped illuminate the sometimes subtle distinctions they made between bots.
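For readers who want to replicate that analysis, the arithmetic is straightforward: mean attribute ratings on the 1-7 scale, plus a simple Borda-style aggregate of the stack rankings. The sketch below uses invented data shapes and values purely for illustration.

```python
# Mean ratings plus a Borda-style rank aggregate. Data are invented.
from statistics import mean

trust_ratings = {      # bot -> participants' "Do you trust the answer?" scores
    "bot_a": [6, 5, 7, 6],
    "bot_b": [4, 5, 3, 4],
}
rankings = [           # each participant's best-to-worst ordering
    ["bot_a", "bot_b"],
    ["bot_a", "bot_b"],
    ["bot_b", "bot_a"],
]

mean_trust = {bot: mean(scores) for bot, scores in trust_ratings.items()}

borda = {bot: 0 for bot in trust_ratings}
for order in rankings:
    for position, bot in enumerate(order):
        borda[bot] += len(order) - 1 - position   # first place scores highest

print(mean_trust)   # {'bot_a': 6.0, 'bot_b': 4.0}
print(borda)        # {'bot_a': 2, 'bot_b': 1}
```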
Kendra is Amazon Web Services’ AI-enhanced tool that helps companies create their own search products. Kendra scans common data formats, both structured and unstructured, analyzes the contents for context, and integrates that information into search results.
The Kendra team used user tests to discover the pain points of search developers and then tested prototypes as they iterated on the design.
Based on what was learned from UserTesting, Amazon added a dashboard to Kendra that helps search developers track the effectiveness of user searches, fine-tune the search models, and improve them over time. Customers also use the platform to fix or remove dead links and track user behavior automatically.
Streaming service Deezer uses its Flow feature to curate music playlists for its customers. The company wanted to use AI to further personalize the experience, and used UserTesting to discover customer needs. The company found that casual listeners had a different music consumption pattern from its core users. While the core users tended to pick particular music genres, casual users wanted to customize their playlists by emotion.
The data team classified 90 million songs by their emotional content, while the user research team explored ways to classify user emotions. The result was a new feature, Flow Moods. Deezer iterated on its prototype rapidly through UserTesting tests in six countries.
Within the first month of its release, one million listeners began to use Flow Moods. Less than a year later, 700,000 users accessed Flow Moods per week. The new feature also helped with customer retention—listeners who use Flow Moods are more likely to reconnect than listeners who don’t, and they listen to more music than non-users of the feature. “UserTesting helps us to really get into the mind of the user in a way that quantitative data can’t,” said Fernanda Senko, User Researcher at Deezer.
Canada’s second-largest airline was developing a voice-driven AI assistant, “Ask WestJet,” to help travelers and reduce their frustrations. The company used UserTesting to discover friction points in the current customer journey and understand how people would ask questions of an AI assistant.
They also explored whether customers would ask questions differently on different platforms (for example, Alexa vs. Google). The research found that travelers sometimes get inaccurate information on things like baggage fees, size restrictions, and other pre-flight information.
So WestJet tailored the assistant to respond to those issues and built a baggage size calculator for Google Voice Assistant. The changes increased accuracy and brand trust while decreasing complaints about customer service. Learn more about WestJet's success here.
The Zigzag mobile app (supported by Purina) is a puppy-raising coaching app that gives breed-specific advice and guides people through the puppy-raising journey. Through discovery research on the UserTesting platform, Zigzag discovered that puppy owners hesitated to ask the company’s live coaches for advice on issues that they considered to be simple or trivial.
At the same time, Zigzag realized that it couldn’t scale the number of live puppy coaches to meet all of its customer needs. So the company added a generative AI coach to its offering.
The company used user tests to confirm user willingness to pay for the solution, measure user satisfaction with the answers, evaluate whether users understand the feature, and validate that users would ultimately prefer the AI chat over searching with Google. It also did multiple rounds of tests on prompts and prototypes during development.
The result: There were 80,000 interactions with the bot in the first six months, comments on it were 96% positive, and there was a 27% reduction in support requests to human puppy coaches.
Aimee is the personal assistant chatbot of EE, the leading consumer telecoms brand in the UK (EE is owned by BT Group, formerly British Telecom). The goal of Aimee is to help consumers with whatever they need, but the system was being overwhelmed with questions about roaming services. Only 16% of roaming questions were resolved by the bot; the rest were routed to humans in customer support.
EE added AI features to the bot to predict where people would travel and the cost of roaming in those locations, then tested the improvements through UserTesting to ensure that it would meet customer needs. The launch of the new bot was very successful.
Chatbot containment rates increased by 75%, clickthroughs to the roaming calculator increased by 50%, and total call volume to the contact centers decreased by 2.4%.
Designing for AI. The best article we’ve seen on the challenges of designing an AI product. Required reading for practitioners, and highly recommended for executives who want to understand what they’re getting into.
Microsoft cofounder Bill Gates on the age of AI. Written in 2023, this is a good overview of the potential benefits from AI, informed by the decades of tech industry history that Gates participated in. The article is a good starting point for an exec who wants to think about how AI might change their particular industry.
Impact of generative AI on conversational design. A chat designer who predates generative AI describes how her field used to work and how generative AI is changing it. Great perspective.
Generative UI. Nielsen Norman Group describes how interface design is changing in the age of generative AI. NNG has done a series of articles and tests exploring the effect of generative AI on customer experience, with practical advice on what to do about it. Throughout our guide we’ve inserted links to the best articles. The NNG material is very good reading for practitioners.
Pitfalls of rushing AI implementations and Should businesses hold back on AI adoption? These two articles make the case for slowing down a bit and planning before you rush into an AI project.
Customer experience in the age of AI. From the Harvard Business Review in 2022, this article predates the release of ChatGPT. But it’s still very useful because it highlights the people and organizational issues that can hinder a company’s success in AI. Those issues are still completely relevant today.
UserTesting resources
How to test AI software. Practical advice for practitioners.
Generative AI chatbots: Overhyped but still underestimated. Our research on knowledge worker attitudes and adoption of AI, plus a competitive comparison of user reactions to the top four AI chatbots. Learn what does and doesn’t work, via real-world feedback from users.
Discover the hidden ROI of your customers' insights. Book a meeting with our Sales team today to learn more.