A practical guide to building taxonomies that make data truly useful to your organization

Posted on June 15, 2023

With so many competitors in any industry, businesses are becoming increasingly customer- and user-focused. And with good reason: According to a recent white paper from Econsultancy, organizations that put the customer at the center of all initiatives are nearly twice as likely to exceed their top business goals as those that do not. The report also notes that 65% of respondents say that improving data analysis capabilities to better understand the customer experience is an important internal prerequisite for customer experience success.

As business people, we're pretty good at counting things — how many page views we get, how many people click on an ad, how many times a customer signs in to our app — but quantitative data is flat. The real insights are in the qualitative data we collect, the things that customers tell us.

So the question is, how do you quantify what matters to your customers?

Fortunately, there are more channels than ever to collect data about what your customers want and what they struggle with. With surveys, emails, live chat logs, support calls, and even social media, you have more opportunities than ever to listen to what your customers have to say.

But all of that data is meaningless unless you can make sense of it. And the sheer volume of data available can be overwhelming.

The information is there, but knowing how to classify the data is critical to generating meaningful insights from it.

Once you get a handle on the data, you’re in a position to analyze it, and to uncover themes and hidden opportunities. You can use the information to serve your customers better, stay ahead of your competition, and head off any areas of dissatisfaction that could lead to churn.

The key is creating a vocabulary — a taxonomy — that allows you to classify all that qualitative data in a meaningful way.

In this guide, we’re going to walk you through the basics of creating a taxonomy for your qualitative data — a system for tagging and organizing the information you have — so that you can uncover themes and patterns, and ultimately generate important insights about your customers.

What is a taxonomy and why do you need one?

Most people associate the word "taxonomy" with their high school biology class or some obscure academic field. Depending on the information set, some taxonomies are extremely complex. In fact, the science of taxonomy is a discipline unto itself.

But it doesn’t have to be complicated.

At its simplest, a taxonomy is just a way of naming and classifying things according to their similarities and differences.

A taxonomy allows you to divide and group a large set of information into more manageable chunks. It helps you visualize how you want to organize your data.

You can apply a taxonomy to anything. Dog breeds. Musical genres. Website content. Anywhere that items of any kind need to be grouped and classified in order to be better understood.

Whenever you see a category of items, and sub-categories, you’re looking at a taxonomy.

The names that you give to each category or sub-category don’t really matter, as long as they make sense to whoever is going to use the data — and as long as they’re applied consistently.

The true value of a taxonomy isn’t just that it gives you a way to identify something; it’s that everyone using the information understands a particular classification to mean the same thing.

Taxonomies in some disciplines are fairly straightforward and hierarchical, but business data is “blurrier.” With business data, you need a solid framework to classify information in a useful way, such that you can see trends, identify risks, or seize opportunities.

Finding order in chaos

The single biggest benefit to having a taxonomy to describe your data is that it provides a way for people to digest the information and make sense of it.

A taxonomy makes information accessible.

Think about it. When you’re facing a massive amount of data (especially qualitative data), how can you possibly determine what’s relevant and what isn’t if you don’t have a way to identify what the individual pieces are and how they relate to each other?

In a business setting, the stakes are even higher because there are so many potential sources of data. For example, your company likely has a wide variety of channels for collecting customer feedback, like:

  • Live chat on your website
  • A call center staffed by customer support specialists
  • Customer surveys managed by your marketing team
  • Social media accounts
  • Email

That’s a lot of customer touchpoints, and a lot of data to sort through.

All this information — directly from your customers — is invaluable, but only if you know how to organize it and access it.

Let's say you’re a Product Manager, and you want to understand the product features that customers have the most trouble with so you can improve the user experience. Or maybe you want to know which features are requested most often, so you can prioritize them on your product roadmap.

How are you going to determine that if all of that customer feedback is spread around the organization, siloed in support tickets and chat logs? You need a way to consolidate, organize and tag all that raw data, from all those channels, in a way that allows you to analyze it and distill larger patterns.

You can apply the same logic to other use cases that involve qualitative data. For example, if you're a User Experience (UX) Researcher, you might be more interested in uncovering a customer’s motivations — the triggers that led her to try your product, or the points of friction in her onboarding experience that led her to cancel her account.

You could have dozens of interview transcripts and open-ended survey answers, all with key information that gives you invaluable details about your user personas. But without a way to highlight and classify that data, you won’t be able to leverage the information and turn it into actionable insights.

A taxonomy helps you do that. When designed properly, a taxonomy makes a large set of data easier to digest and analyze.

But remember: The data is only useful if everyone groups and tags it in the same way. Using an established taxonomy that is tailored to your own circumstances and business needs will help ensure that your data is classified consistently.
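
If your team tags feedback by hand, one lightweight way to keep everyone consistent is to check each tag against the agreed list before it is saved. The Python sketch below is only an illustration: the label set, the `SYNONYMS` shorthand, and the `normalize_tag` function are all made-up names, not part of any particular tool.

```python
# Hypothetical controlled vocabulary; swap in your own agreed labels.
CANONICAL_LABELS = {"feature request", "bug report", "billing"}

# Assumed shorthand that teammates might type instead of the agreed label.
SYNONYMS = {
    "bug": "bug report",
    "defect": "bug report",
    "feature req": "feature request",
    "invoice": "billing",
}


def normalize_tag(raw):
    """Map a hand-typed tag to its agreed label, or fail loudly."""
    tag = " ".join(raw.lower().split())  # ignore case and stray spaces
    tag = SYNONYMS.get(tag, tag)         # translate known shorthand
    if tag not in CANONICAL_LABELS:
        raise ValueError(f"'{raw}' is not in the taxonomy")
    return tag


print(normalize_tag("  Bug "))           # bug report
print(normalize_tag("Feature Request"))  # feature request
```

Rejecting unknown tags outright is a design choice. A softer version could park them in a review queue instead, so nobody is blocked while the vocabulary is still settling.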

Getting started: Begin with the end in mind

If you don’t know where you’re going, you’ll end up someplace else. – Yogi Berra

Now that we’re clear on what a taxonomy is, and how it can help you in your business, let’s look at how to go about building one.

But before you scrawl a single category name on a single post-it note, let’s get clear on one thing: You have to know where you’re going before you start mapping the route to get there.

A set of data can tell many stories, and not all of them may be relevant to your business goals. You need clarity on what you want to extract from the data — what broader themes and patterns you want to look for — before you decide how to organize the information.

The data will not magically tell you what you need to know. You need to identify what you want to learn from the data before you decide how to classify it.

So ask yourself (and your team, too):

  • What do we want to learn from all of this data?
  • What can we do with that information?
  • What impact do we want to make?

Taxonomy creation starts with identifying relevant use cases. You need to know the business questions that you want to answer, and then work backwards from there.

Those questions will depend on your business context. Are you trying to identify problem areas in customer support? Are you trying to understand why trial users don't convert to paid subscriptions? Are you trying to uncover themes or characteristics of certain user personas? The questions you ask will help you understand what type of data classification will be useful.

For example, as a Product Manager, you might consider the following questions:

  • What are the top feature requests by enterprise customers in the last 30 days?
  • Which enterprise customers reported bugs in the last 7 days?
  • What are the new features requested by freemium customers?
  • What information do we need to give new customers to help them be successful?
  • What do customers who cancelled or failed to upgrade to a paid account say?

Questions like these might yield a classification model that organizes customer feedback by type (for example, feature request, bug report, help, churn feedback, etc.), and then drills down.
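
To see how a business question shapes the classification, here is a rough Python sketch that answers the first question in the list once feedback has been tagged by type, customer tier, and topic. The `Feedback` record, its field names, the `top_requests` function, and the sample data are all invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Feedback:
    text: str
    feedback_type: str  # top-level category, e.g. "feature request"
    customer_tier: str  # e.g. "enterprise" or "freemium"
    topic: str          # drill-down label under the feedback type
    received: date


def top_requests(items, tier, today, days=30, n=3):
    """Count feature-request topics from one customer tier in a recent window."""
    cutoff = today - timedelta(days=days)
    topics = Counter(
        item.topic
        for item in items
        if item.feedback_type == "feature request"
        and item.customer_tier == tier
        and item.received >= cutoff
    )
    return topics.most_common(n)


# Made-up sample data.
today = date(2023, 6, 15)
items = [
    Feedback("Please add SSO", "feature request", "enterprise", "single sign-on", date(2023, 6, 1)),
    Feedback("We need SSO to roll out", "feature request", "enterprise", "single sign-on", date(2023, 6, 10)),
    Feedback("CSV export would help", "feature request", "enterprise", "export", date(2023, 5, 30)),
    Feedback("Dark mode?", "feature request", "freemium", "dark mode", date(2023, 6, 12)),
    Feedback("Export crashes", "bug report", "enterprise", "export", date(2023, 6, 14)),
    Feedback("Audit logs please", "feature request", "enterprise", "audit log", date(2023, 3, 1)),
]

print(top_requests(items, "enterprise", today))
# [('single sign-on', 2), ('export', 1)]
```

Notice that the question dictated the tags: without a feedback type, a customer tier, and a date on every record, there would be nothing to filter on.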

On the other hand, if you’re a UX Researcher, you’ll have a different set of concerns. You might want to know things like:

  • Why do people sign up? What “jobs” do people hire the product to do?
  • What alternative solutions have people used or considered?
  • Why do people cancel?
  • Why don’t people upgrade to a paid account? What are their objections?
  • Where are people getting stuck in their onboarding experience?

To answer questions like these, a classification system that tags qualitative data to identify information like desired outcomes, buying approach, customer journey stage, and objections or areas of friction might be more appropriate (with more specific sub-categories beneath each high-level group).

The point is, you need to assess your own situation and decide what stories you want to uncover in the data. Your chosen taxonomy has to serve your business goals or it won’t be of much use to anyone.

Top-down or bottom-up?

While it’s critical that you identify your business goals as a starting point, you also need to understand the data that you have before you design your taxonomy. Once you understand the questions you want to answer, it’s useful to scan a sample of your existing data to get a sense of the broad categories that might capture what’s there in a useful way. 

Building your model: Keep it simple

Make everything as simple as possible, but not simpler. - Albert Einstein

Creating a taxonomy is part art and part science. You're characterizing information, which is inherently subjective, but you're also drawing lines around certain kinds of data and applying rules to their classification.

Every business is different, and no two business taxonomies will be the same. And while there’s no one-size-fits-all formula for taxonomy creation, the key is simplicity.

While it may be tempting to drill down to ever-more granular levels of detail in an effort to capture everything (more hierarchy, more labels and categories), that approach can quickly become overwhelming. Whether you’re taking a top-down or a bottom-up approach (or a combination of both), it’s important not to over-think your classification system.

When we asked taxonomy expert Heather Hedden how she would recommend getting started building a taxonomy, she responded:

"Any taxonomy must reflect both (1) the topics of the queries of the taxonomy users, such as the employees who are using it to analyze data, and (2) the scope and detail of the specific set of content that will be tagged with the taxonomy. Designing a taxonomy requires both talking to the intended users to get their input about topics of interest and looking at representative samples of content to discern common topics and issues. The taxonomy is then built with a combination of a top-down (desired topics of the users) and bottom-up (specific topics in the content) approach." - Heather Hedden, author of The Accidental Taxonomist

In other words, it’s important to take into account both the intended users of the information and the content of the data itself when designing your classification system.

Clarity rules

Keep in mind that when you’re designing a system to classify your qualitative data, you aren’t just designing for the output. Your team members are going to have to implement it, so your categories and labels should be unambiguous and clearly understood.

Let's take our Product Manager example. You've determined that you want to know what customers are looking for when they reach out to your company. So you might start by creating high-level categories that group the data by feedback type — for example, feature request, bug report, user education, UX issue, and billing.

From there, you’d get more specific — for example, you might define sub-labels (with predefined values) for product area, priority, status, and customer size.

If you’re a UX Researcher, you can take the same approach. Let’s say you want to understand a new user’s experience with your product, so you can increase conversions from trial to paying customers. You might begin by creating categories that represent the different inflection points in the customer journey — for example, pre-sales, sign-up, onboarding, upgrade, and cancellation.

Once you’ve established those broad areas, you would get more specific to characterize the feedback you want to analyze. You could label certain feedback as reasons for signing up, the job that the customer hired your product to do, objections or friction, reasons for cancelling, and so on.

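In this example, your taxonomy would pair high-level categories for each stage of the customer journey (pre-sales, sign-up, onboarding, upgrade, and cancellation) with labels such as reasons for signing up, the job the customer hired your product to do, objections or friction, and reasons for cancelling.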

Neither of the sample taxonomies cited above is overly complex, but they provide a way to frame the raw data that makes it easier to tell a story and generate insights.
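
Written down as plain data, the two sample taxonomies might look like the Python sketch below. The category and label names come from the Product Manager and UX Researcher examples. The variable names, the `allowed_values` helper, and the predefined values under each sub-label are assumptions, since the examples name the sub-labels but not their values.

```python
# Hypothetical sketch of the two sample taxonomies as plain Python data.

PRODUCT_FEEDBACK_TAXONOMY = {
    "categories": [
        "feature request", "bug report", "user education", "UX issue", "billing",
    ],
    "sub_labels": {
        # The values below are illustrative assumptions.
        "product area": ["onboarding", "reporting", "integrations"],
        "priority": ["low", "medium", "high"],
        "status": ["new", "in progress", "resolved"],
        "customer size": ["freemium", "small business", "enterprise"],
    },
}

UX_RESEARCH_TAXONOMY = {
    "categories": ["pre-sales", "sign-up", "onboarding", "upgrade", "cancellation"],
    "labels": [
        "reason for signing up",
        "job to be done",
        "objection or friction",
        "reason for cancelling",
    ],
}


def allowed_values(taxonomy, sub_label):
    """Return the predefined values for a sub-label, or an empty list if undefined."""
    return taxonomy.get("sub_labels", {}).get(sub_label, [])


print(allowed_values(PRODUCT_FEEDBACK_TAXONOMY, "priority"))
# ['low', 'medium', 'high']
```

Both structures are only two levels deep, which keeps them easy to scan and easy for a team to apply.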

Let’s try something more specific: Let's say you want to answer the question "Why do people cancel?" You create a high-level category of "Churn Feedback" for your customer feedback data, and you put anything related to cancellations in that category.

But people could have lots of reasons for canceling — some of which you can address, and some that you can't. For example, if someone cancels because their business closes, that's pretty much beyond your control — and much lower priority than feedback from a customer who cancels because of unexpected data loss. In this second case, you'd want to tag that feedback as much higher priority.

By creating a taxonomy that lets you group feedback into categories and then further tag items by priority, you very quickly get a set of actionable insights from your qualitative data.
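
As a rough illustration of the churn example, the Python sketch below puts every cancellation under one category and then adds a priority tag based on the reason. The reasons, the priority levels, and the `tag_churn` function are hypothetical, chosen only to mirror the two cases described above.

```python
# Hypothetical mapping from cancellation reason to priority.
CHURN_PRIORITY = {
    "data loss": "high",       # something you can and should fix
    "missing feature": "medium",
    "business closed": "low",  # largely beyond your control
}


def tag_churn(reason):
    """Return the tags for one piece of cancellation feedback."""
    return {
        "category": "Churn Feedback",
        "reason": reason,
        # Unknown reasons default to "medium" so they still get reviewed.
        "priority": CHURN_PRIORITY.get(reason, "medium"),
    }


feedback = [tag_churn(r) for r in ["business closed", "data loss", "missing feature"]]

# List the most urgent feedback first.
order = {"high": 0, "medium": 1, "low": 2}
for item in sorted(feedback, key=lambda f: order[f["priority"]]):
    print(item["priority"], "-", item["reason"])
```

In practice a person would usually assign the priority while reading the feedback. The lookup table just makes the rule explicit so that everyone applies it the same way.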

Take another example. Let's say your company uses Net Promoter Score (NPS) to capture customer feedback on your product or service. Each response ranks the respondent's likelihood of recommending your product to a friend or colleague on a scale of 0 to 10, and includes an open-ended question that asks the respondent to explain why they ranked your product the way they did. Simple, right?

So you know how many Promoters (9-10), Passives (7-8), and Detractors (0-6) you have. But what's driving their assessment? That's where the really interesting data is.

To drill down more, you create broad categories that describe various attributes of your product, like onboarding, UI, ease of use, features, pricing, customer support, and so on. Once you've classified the responses into those broad categories, you can apply the NPS label that identifies whether the feedback was provided by a Promoter, a Passive, or a Detractor.

The result? You can get a more granular view of what product areas come up most frequently in those groups of users. Maybe your Detractors have a lot to say about ease of use, and maybe your Promoters sing the praises of your customer support. Either way, you get a sense very quickly of where your customers' priorities lie, and where you can take action.
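
Here is a minimal Python sketch of that cross-tabulation. The score ranges follow the Promoter, Passive, and Detractor bands above. The `nps_group` function name and the sample responses are made up for illustration.

```python
from collections import Counter


def nps_group(score):
    """Map a 0-10 NPS score to Promoter, Passive, or Detractor."""
    if not 0 <= score <= 10:
        raise ValueError(f"NPS score must be between 0 and 10, got {score}")
    if score >= 9:
        return "Promoter"
    if score >= 7:
        return "Passive"
    return "Detractor"


# Each response: (score, category assigned to the open-ended comment).
responses = [
    (10, "customer support"),
    (9, "customer support"),
    (8, "pricing"),
    (3, "ease of use"),
    (5, "ease of use"),
    (6, "onboarding"),
]

# Count how often each product area comes up within each NPS group.
counts = Counter((nps_group(score), category) for score, category in responses)

for (group, category), n in counts.most_common():
    print(f"{group:<10} {category:<17} {n}")
```

With this invented data, "ease of use" shows up twice among Detractors and "customer support" twice among Promoters, which is exactly the kind of pattern the category-plus-NPS tagging is meant to surface.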

One data set, many dimensions

We’ve been talking about SaaS-specific scenarios so far, but business data taxonomies are applicable in any industry that collects qualitative data.

Think about a financial institution. A big bank has thousands of interactions with their customers every day, and a lot of data about those interactions. If they want to do a root-cause analysis of customer complaints, for example, to better serve their clientele, they need a way to classify that data to identify problem areas.

So they might begin with a taxonomy that identifies the “what” (the product a customer was using), the “where” (the point in their customer journey when they interacted with the bank) and the “how” (the way that something went wrong).

In practical terms, that might translate into a set of high-level categories for customer complaints by products: credit cards, checking accounts, mortgages, and so on.

Next, they’d want to understand where the customer was in their customer journey when the problem occurred. Were they opening a new checking account? Withdrawing money from an ATM? Disputing a charge on their credit card statement? Trying to access their accounts online? Tagging the “where” would be valuable information for uncovering problem areas in the customer experience.

And of course, they’d want some way to characterize the nature of the complaint. Was it a technical issue? Was a delay in service unacceptable? Was staff rude or unhelpful? Was product information misleading? Understanding — even at a high level — what customers are complaining about would help the bank prioritize areas of common concern in their business operations.

With this kind of classification, they could tag customer complaints in a more meaningful way, and begin to identify the most obvious points of friction in their customer service.

The thing to remember is that qualitative data is multi-dimensional, and it can have as many tags as you need to distill the essence of what’s being communicated. If you force your data down one linear or hierarchical classification path, you're bound to lose vital context and insights.

Again, the key is not to limit yourself in how you think about your data — you don’t need to apply a rigid structure, like a filing system. Rather, you can think about the data relationally, in terms of how different pieces of data might connect in the larger story you’re hoping to uncover.
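
One way to picture this relational view is to give each complaint several independent tags rather than a single slot in a hierarchy. The Python sketch below uses the bank's "what", "where", and "how" dimensions. The `Complaint` record, the `count_by` helper, and the sample complaints are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Complaint:
    text: str
    what: str   # product involved
    where: str  # point in the customer journey
    how: str    # nature of the problem


# Made-up complaints for illustration only.
complaints = [
    Complaint("Card declined at checkout", "credit cards", "making a purchase", "technical issue"),
    Complaint("Dispute took six weeks", "credit cards", "disputing a charge", "delay in service"),
    Complaint("Login page kept timing out", "checking accounts", "accessing accounts online", "technical issue"),
    Complaint("Teller was dismissive", "checking accounts", "opening an account", "unhelpful staff"),
]


def count_by(items, *facets):
    """Count complaints by any combination of the what/where/how dimensions."""
    return Counter(tuple(getattr(item, facet) for facet in facets) for item in items)


print(count_by(complaints, "how"))          # slice by one dimension
print(count_by(complaints, "what", "how"))  # or combine two
```

Because the dimensions are independent, the same four complaints can be sliced by product, by journey stage, or by problem type without restructuring anything.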

A few things to remember …

  • Broad and shallow is better than narrow and deep: If you go to too many levels, you could end up burying useful categories and labels.
  • Labels should be unique and distinct: Make sure that the labels you use are distinct from all other labels and won’t be confused with anything else. This will help avoid confusion for anyone who’s applying the classification and ensure that data isn’t mislabeled.
  • Not every piece of data will fit: If you have data that’s hard to categorize, consider creating a category to park it in (like “Other”). You can go back and reclassify it later as themes or patterns emerge.
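
The last two reminders can be partly automated. The Python sketch below is a simple illustration with hypothetical function names: it flags labels that differ only in case, hyphens, or spacing, and it parks anything outside the taxonomy in "Other". It does not catch labels that are spelled differently but mean the same thing, which still takes human judgment.

```python
def find_confusable_labels(labels):
    """Group labels that become identical once case, hyphens, and spacing are ignored."""
    groups = {}
    for label in labels:
        key = " ".join(label.lower().replace("-", " ").split())
        groups.setdefault(key, []).append(label)
    return [group for group in groups.values() if len(group) > 1]


def assign_category(proposed, categories, parking="Other"):
    """Keep a proposed category if the taxonomy defines it; otherwise park it."""
    return proposed if proposed in categories else parking


labels = ["Sign-up", "sign up", "Onboarding", "Billing"]
print(find_confusable_labels(labels))
# [['Sign-up', 'sign up']]

categories = {"Sign-up", "Onboarding", "Billing"}
print(assign_category("Pricing", categories))
# Other
```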

Planning for the future: Adapt and iterate

The only thing that is constant is change. - Heraclitus

Finally, keep in mind that a taxonomy is not set in stone. As you get deeper into your data analysis with your initial taxonomy, new categories and attributes will almost certainly emerge. Qualitative data is rich, and uncovering patterns and themes is an iterative process.

Be flexible. You won’t really know if your taxonomy is working for you until you and your team start to use it, and you see the output you can generate from it.

In reality, a taxonomy is never really finished, and that’s a good thing. It allows you to adjust your model according to changing business environments and requirements. Be prepared for your taxonomy to evolve over time and with use.

There are a few common scenarios where you’ll find that your taxonomy may need some adjustments.

If you’re finding it hard to classify some of the data into your organization’s predefined categories, you may need to revisit your taxonomy model to see what’s missing.

For example, as a Product Manager, you could have categories for feature requests and bugs, but some portion of your customer feedback might be related to an existing feature that a customer is struggling with. It’s neither a request for new functionality nor a defect in existing functionality, but it could point to a usability issue or a gap in documentation. If your model only allows for feature requests and bugs, you’re missing an opportunity to address an area of customer confusion and dissatisfaction.
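
A simple signal that your model has a gap like this is the share of feedback that ends up in a catch-all category. The Python sketch below assumes you park hard-to-classify items under a label such as "Other". The `other_share` function and the 25% threshold are arbitrary choices for illustration, not a recommended benchmark.

```python
from collections import Counter


def other_share(tags, parking_label="Other"):
    """Return the fraction of tagged items that landed in the parking category."""
    if not tags:
        return 0.0
    return Counter(tags)[parking_label] / len(tags)


# Made-up tags for five pieces of feedback.
tags = ["feature request", "bug report", "Other", "Other", "feature request"]

share = other_share(tags)
if share > 0.25:  # arbitrary threshold, chosen only for this example
    print(f"{share:.0%} of feedback is unclassified; time to revisit the taxonomy")
```

Reading through the parked items is usually the fastest way to spot the missing category, such as a usability issue or a documentation gap.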

If you suspect (based on other business activities or research) that there’s another opportunity to leverage in your data, you should adjust your taxonomy to capture it.

For example, as a UX Researcher, you might identify another user persona and want to extract relevant profile information from the raw data. Or you might have adjusted your onboarding workflow, and now have an additional data point that you can measure. Your taxonomy should be updated to reflect such changes.

If you integrate new sources of data, you might find that your qualitative data set gets richer, so you’ll want to adjust your taxonomy further to make sure you have a way to capture any nuggets of insight.

Of course, if the reports you’re generating from your classified data aren’t telling a coherent story, you should go back to your taxonomy framework and validate it against your business goals.

Bottom line: Your taxonomy model should serve your business needs, not force your data into a box. If it isn’t working for you, keep working at it until it does.

In a nutshell: Create the taxonomy that serves your needs best

Taxonomies can be notoriously complex, and all too often they’re driven by the data instead of being designed to serve larger business goals. As we hope you can see, they don’t need to be overly complicated, and should always be tied to a larger vision of what insights you want to draw from the data.

If you’re clear on your goals, design for elegant simplicity, and are prepared to adjust your taxonomy as circumstances require, you’ll be able to uncover hidden insights and opportunities in your qualitative data. Those insights will allow you to serve your customers better, reduce churn, and innovate.
