One of the first questions we ask when setting up brand monitoring in GenAI Answer Tracking is: What questions should we track? It’s a deceptively simple question that can quickly spiral into a rabbit hole of options.
Our GenAI Answer Tracker helps brands monitor their presence in AI-driven answers, offering insights into share of voice, competitors, and opportunities for visibility across large language models (LLMs).
Historically, SEO research and tools like Google’s Keyword Planner have guided us here. They taught us to obsess over phrasing, synonyms, and variations because those nuances could make or break a search strategy.
But when it comes to LLMs, the game has changed. LLMs don’t give us monthly search volume, click-through rates, or granular metrics to guide our choices.
This got me thinking: Do we really need to sweat the small stuff? To find out, I ran an experiment.
How To Generate Questions
Before diving into the test, let’s talk about how we usually come up with the questions we track:
- Audience Research: We start by understanding what our users are curious about. This might come from focus groups, surveys, or direct feedback.
- SEO Research: Features like Google’s “People Also Ask” (PAA) boxes and related queries help us map out the broader intent landscape.
- Content Strategy: Finally, we align the questions with the goals of brand visibility, authority, and relevance.
This approach ensures we’re covering the spectrum of what users might ask. For example, the query “Who are the best brands for composite decking?” could spin off dozens of variations (a small script for enumerating them follows this list):
- What are the top composite decking brands?
- Best companies for composite decking?
- Recommendations for composite decking brands?
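To make this concrete, here’s a minimal sketch of how you might enumerate variations from a handful of templates. The templates, adjectives, and topic below are hypothetical placeholders; in practice you’d pull them from your audience and SEO research:

```python
from itertools import product

# Hypothetical phrasing templates; in practice, pull these from
# audience research or People Also Ask data.
TEMPLATES = [
    "What are the {adj} {topic} brands?",
    "Who are the {adj} brands for {topic}?",
    "Recommendations for {topic} brands?",
]
ADJECTIVES = ["best", "top", "leading"]
TOPIC = "composite decking"

# str.format ignores unused keyword arguments, and the set
# dedupes templates that don't use {adj}.
variations = sorted({
    template.format(adj=adj, topic=TOPIC)
    for template, adj in product(TEMPLATES, ADJECTIVES)
})

for question in variations:
    print(question)
```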
In SEO, these variations matter because synonyms can behave wildly differently. “Blue jeans” and “denim jeans” may seem interchangeable, but they often have different search volumes, click-through rates, and rankings. With LLMs, I wanted to test whether that same specificity still matters.
[TIP] Use questions from your existing audience research or SEO research, such as People Also Ask queries, as a starting point for tracking.
Hypothesis: LLMs no longer require the same level of specificity in phrasing that traditional SEO demands.
The Test
Here’s what I did to find out whether semantically similar questions produce the same (or similar) answers; a rough sketch of the pipeline follows the list:
- Picked a Question: I started with a simple one—Who are the best brands for composite decking?
- Created Variations: I wrote 1,000 semantically similar versions of the same question, changing only the phrasing.
- Ran the Test: Each question was fed into an LLM, and I analyzed the results.
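A minimal sketch of the collection step, assuming the OpenAI Python SDK (the model name and output file are illustrative placeholders, not necessarily the setup used in the original test):

```python
import json

from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

def ask(question: str) -> str:
    """Send one question to the model and return the answer text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any chat model works here
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# `variations` would hold all 1,000 semantically similar questions.
variations = [
    "Who are the best brands for composite decking?",
    "What are the top composite decking brands?",
    # ...998 more
]

answers = {question: ask(question) for question in variations}

with open("answers.json", "w") as f:
    json.dump(answers, f, indent=2)
```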
The results were compelling:
- Only 74 unique brands showed up across the answers to these 1,000 questions.
- On average, six brands appeared per answer.
- While there were slight variations in brand mentions, the overlap was significant: with an average of six brands per answer, those 1,000 questions could have surfaced up to 6,000 distinct brands, yet only 74 appeared. (The counting itself is simple; see the sketch after this list.)
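Once brand names have been extracted from each answer (say, by matching against a known brand list), the tally takes only a few lines. The brand lists below are illustrative placeholders, not the actual test data:

```python
from collections import Counter

# Illustrative placeholder data: one list of extracted brand names
# per answer. The real test had 1,000 such lists.
brands_per_answer = [
    ["Trex", "TimberTech", "Fiberon", "Deckorators", "MoistureShield", "Envision"],
    ["Trex", "TimberTech", "Fiberon", "Deckorators", "Wolf", "Cali Bamboo"],
    # ...one list per answer
]

mentions = Counter(
    brand for answer in brands_per_answer for brand in answer
)

unique_brands = len(mentions)
avg_per_answer = sum(len(a) for a in brands_per_answer) / len(brands_per_answer)

print(f"Unique brands across all answers: {unique_brands}")
print(f"Average brands per answer: {avg_per_answer:.1f}")
print("Most-mentioned brands:", mentions.most_common(5))
```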
What This Means
The big takeaway? Today, we don’t need to stress over minor variations in phrasing. Here’s why:
- Focus on Intent: LLMs prioritize semantic intent over specific wording. Whether users ask about the “best brands” or “top brands,” they’re essentially asking the same thing.
- Spot Key Synonyms: While minor differences like “best” versus “top” don’t matter much, synonyms with more distinct meanings, such as “composite” versus “engineered,” could still make a difference.
- Simplify Your Approach: Instead of chasing every possible variation, focus on the core themes and concepts behind the questions.
In this case, obsessing over tiny wording changes doesn’t add enough value to be worth the effort.
[TIP] When deciding which questions to track in LLMs, focus on core themes and concepts instead of getting caught up in slight variations in wording.
Why This Matters
Without metrics like search volume to guide us, LLM optimization can feel like guesswork. But this test shows we don’t need to overcomplicate it. If we focus on intent and the bigger picture, we can spend less time nitpicking and more time creating meaningful content.
For instance, when tracking questions about composite decking, we don’t need 50 variations of “Who are the best brands?” Instead, we can focus on the broader intent—materials, durability, comparisons—and call it a day.
Your audience is asking questions—are you part of the answer?