
Can’t I just ask ChatGPT myself? It’s the question I hear most when brands discover they need to monitor their AI visibility.
Many people ask me why they can’t just ask ChatGPT how their brand is doing in AI answers. “Why do we need a whole tool for this? Can’t I just try it myself?” It’s a fair question.
But the more you work with AI systems, the clearer it becomes: what ChatGPT says about your brand depends entirely on how you ask, where you’re asking from, and what answer mode the model decides to use. A single run doesn’t tell you what your customers see. And it definitely doesn’t tell you what the AI knows. To understand your visibility in AI results, you need structure, repetition, and coverage.
In other words – you need a tool.
In short: AI visibility isn’t static. To truly understand where your brand stands, you need to track how AI perceives it over time and across different conditions.
As we explained in “From SEO to AEO,” AI agents don’t just “answer questions.” They make decisions. Is this an informational request or a commercial one? Should I answer from memory or pull fresh info via web search? Should I recommend a product list or write a narrative? These aren’t trivial choices. Each one affects which brands show up – and which don’t.
Let’s break it down:
The combinations of these decisions produce wildly different outputs. So when someone says “Just ask ChatGPT,” I want to respond: which prompt, from which country, in which mode, with which UI? Because all of those change what you’ll see.
To test this systematically, we ran a controlled experiment using six variations of a single user intent – finding the best running shoes for a beginner training for a half-marathon. Three of those questions were phrased as neutral, informational queries (“natural”); the other three had explicit shopping intent. Here’s what we asked:
Natural variants:
Shopping-intent variants:
The meaning is identical. But as you’ll see throughout this post, small changes in phrasing create massive shifts in brand visibility. That’s the core problem.
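To make the coverage problem concrete, here is a minimal sketch of how such a test grid could be enumerated. The phrasing labels, region labels, and repeat count are placeholders chosen for illustration; they are not the prompts, markets, or run counts used in the experiment.

```python
from itertools import product

# Placeholder labels only: the real prompt wording and markets are not reproduced here.
PHRASINGS = [f"natural_{i}" for i in range(1, 4)] + [f"shopping_{i}" for i in range(1, 4)]
GEOS = ["geo_a", "geo_b", "geo_c"]  # hypothetical region labels
RUNS_PER_CELL = 5  # assumed repeat count, not taken from the experiment


def build_test_grid(phrasings, geos, runs_per_cell):
    """Return one dict per planned query: every phrasing in every geo, repeated."""
    return [
        {"phrasing": p, "geo": g, "run": r}
        for p, g, r in product(phrasings, geos, range(1, runs_per_cell + 1))
    ]


grid = build_test_grid(PHRASINGS, GEOS, RUNS_PER_CELL)
print(len(grid))  # 6 phrasings x 3 geos x 5 runs = 90 planned queries
```

Even this small grid produces 90 queries before UI state is taken into account, which is why a single manual check covers so little of the space.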
In my latest experiment, I tested variations of a core running shoe query using six different phrasings across three geographies and multiple UI states. And I found something critical:
If ChatGPT answers without citations and without a product carousel, it’s answering from memory, not from a live web search.
That’s a big deal. It means that:
Nearly 60% of memory-only answers return the same top 3 brands, in the same order: Nike, Brooks, and ASICS. That’s a remarkably consistent pattern, especially given that no external search was used.
So if your brand isn’t in that internal set, you’re unlikely to appear in these answers, no matter how good your product is. And if you are in that set, you might get a false sense of dominance. Either way, you’re missing the real picture.
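The memory-versus-search rule and the top-3 consistency figure can be expressed as a short sketch. The record fields (`brands`, `has_citations`, `has_carousel`) are hypothetical names for whatever your own logging captures, and the sample records are made up for illustration; they are not data from the experiment.

```python
from collections import Counter


def answer_mode(has_citations: bool, has_carousel: bool) -> str:
    """Heuristic from the text: no citations and no carousel means a memory answer."""
    return "memory" if not has_citations and not has_carousel else "web_search"


def top3_consistency(answers):
    """Return the most common top-3 sequence among memory answers and its share."""
    memory_top3 = [
        tuple(a["brands"][:3])
        for a in answers
        if answer_mode(a["has_citations"], a["has_carousel"]) == "memory"
    ]
    if not memory_top3:
        return None, 0.0
    sequence, count = Counter(memory_top3).most_common(1)[0]
    return sequence, count / len(memory_top3)


# Made-up records, for illustration only.
answers = [
    {"brands": ["Nike", "Brooks", "ASICS", "Hoka"], "has_citations": False, "has_carousel": False},
    {"brands": ["Nike", "Brooks", "ASICS"], "has_citations": False, "has_carousel": False},
    {"brands": ["Brooks", "Nike", "ASICS"], "has_citations": False, "has_carousel": False},
    {"brands": ["Hoka", "On", "Nike"], "has_citations": True, "has_carousel": True},
]
sequence, share = top3_consistency(answers)
print(sequence, round(share, 2))
```

Order matters in this comparison: a run that returns the same three brands in a different order counts as a different sequence.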
Once the model decides to run a web search, the dynamics shift:
This is where most of the variance in our experiment appeared. When citations and carousels were present, answers diverged significantly – even for the same question in the same region.

To really see how much visibility shifts depending on whether ChatGPT uses memory or web search, I broke it down into two things: how brands rank, and how often they show up.
The top row shows the average rank of each brand, with confidence lines to capture how stable or volatile that rank is. The bottom row shows how many answers each brand appears in.
When ChatGPT pulls from memory, a few familiar names dominate. But once web search kicks in, the brand mix opens up, and so does the variability.
A few examples:
You’d never catch that if you only ran one variant. And that’s the point: visibility isn’t static. It depends on how the AI was triggered and what data it pulled.
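For readers who want to reproduce this kind of breakdown on their own data, here is a rough sketch of the two measures: average rank with an uncertainty band, and appearance count. It assumes a normal-approximation 95% interval; the method behind the confidence lines in the charts is not stated here, so treat that choice as an assumption. The sample answers are invented.

```python
from math import sqrt
from statistics import mean, stdev


def brand_rank_stats(answers):
    """answers: list of ranked brand lists, one per answer.

    Returns, per brand, the mean rank, an approximate 95% CI half-width
    (None when the brand appears only once), and the number of appearances.
    """
    ranks = {}
    for brands in answers:
        for position, brand in enumerate(brands, start=1):
            ranks.setdefault(brand, []).append(position)

    stats = {}
    for brand, positions in ranks.items():
        n = len(positions)
        # Normal approximation; crude for small n, so read it as a rough band.
        half_width = 1.96 * stdev(positions) / sqrt(n) if n > 1 else None
        stats[brand] = {
            "mean_rank": mean(positions),
            "ci_half_width": half_width,
            "appearances": n,
        }
    return stats


# Invented answers, for illustration only.
memory_answers = [
    ["Nike", "Brooks", "ASICS"],
    ["Nike", "Brooks", "ASICS"],
    ["Brooks", "Nike", "ASICS"],
]
stats = brand_rank_stats(memory_answers)
for brand, s in stats.items():
    print(brand, s)
```

Running the same function separately on memory answers and web-search answers gives the two columns to compare: a narrow band means a stable rank, a wide one means the brand moves around between runs.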
Every change you make to the question creates a new surface area:
Across all runs, we saw 9 unique brands and dozens of product combinations. Some brands were dominant across all conditions (Nike, Brooks, ASICS). Others were only visible under specific slices (adidas, On, Altra).

This chart shows how phrasing alone reshapes visibility. Each dot is a different way of asking the same basic question. Notice how brands like Nike or Hoka shift position depending on the wording – sometimes ranking near the top, other times dropping entirely. It’s a clear example of how language cues influence the AI’s output, even before we factor in geo or UI.

This chart shows how brand rankings shift by geography. Some brands, like ASICS and Brooks, rank consistently across regions. Others, like On, swing up or down depending on where the query originates. These differences reflect local availability, retailer partnerships, and citation patterns. Geo isn’t just a display layer; it’s part of the AI’s decision space.
This isn’t an edge case. This is the reality of AI-native product discovery.
One of the most common failure modes we see is brands testing a single query once, in a single market, with a single phrasing – and thinking that’s the full story.
It’s not.
Even with the exact same question, we saw ChatGPT change its top recommendation in over 50% of cases. In 83% of question–geo combinations, the set of brands changed between runs.
Sometimes it’s subtle (Nike drops from #1 to #2). Other times it’s dramatic (an entirely new brand appears out of nowhere). If you don’t test for that, you’ll never see it.
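Here is one way such run-to-run stability could be measured, as a sketch. It assumes both figures are computed per question–geo combination and that a "change" means any difference across repeated runs of that combination; the exact definitions behind the percentages above may differ. The sample runs are invented.

```python
def stability_metrics(runs_by_cell):
    """runs_by_cell: {(question, geo): [ranked brand list for each run, ...]}

    Returns the share of cells where the top pick differed between runs and
    the share where the set of brands differed between runs.
    """
    total = len(runs_by_cell)
    if total == 0:
        return {"top_pick_changed": 0.0, "brand_set_changed": 0.0}

    top_changed = 0
    set_changed = 0
    for runs in runs_by_cell.values():
        top_picks = {run[0] for run in runs if run}
        brand_sets = {frozenset(run) for run in runs}  # order is ignored here
        top_changed += len(top_picks) > 1
        set_changed += len(brand_sets) > 1

    return {
        "top_pick_changed": top_changed / total,
        "brand_set_changed": set_changed / total,
    }


# Invented runs, for illustration only.
cells = {
    ("q1", "geo_a"): [["Nike", "Brooks", "ASICS"], ["Nike", "Brooks", "ASICS"]],
    ("q1", "geo_b"): [["Nike", "Brooks", "ASICS"], ["Brooks", "Nike", "ASICS"]],
    ("q2", "geo_a"): [["Nike", "Brooks", "ASICS"], ["Nike", "Hoka", "ASICS"]],
    ("q2", "geo_b"): [["Nike", "Brooks"], ["Hoka", "On"]],
}
metrics = stability_metrics(cells)
print(metrics)
```

Neither number can be computed from a single run, which is the practical argument for repeating every question in every market.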

What we learned in this experiment is simple: ChatGPT doesn’t give you the answer. It gives you an answer based on a specific slice of context, mode, and intent.
If you care about AI visibility, you don’t want one answer. You want all of them.
You want to understand how your brand is positioned:
And you don’t just want to check once. You need to track that over time, because the inputs behind these answers – sources, rankings, citations – change daily.
That’s not a one-time audit. That’s a continuous monitoring challenge.
If you’re building or buying an AI visibility product, here’s what it needs to cover:
And it needs to do this:
That’s the only way to answer the real question: What do AI models believe about my brand today, across the buying journey?
ChatGPT doesn’t answer the way Google ranks. It decides what kind of assistant it wants to be (trusted expert, helpful shopper, or local guide) and then builds a response around that.
Sometimes that means running a search. Sometimes it means quoting a Reddit thread. Sometimes it just means giving you a safe, familiar answer.
So next time someone says “Just ask ChatGPT,” remember: you’re not just asking for an answer. You’re asking how that answer was built.
And if you’re asking as a shopper, not a researcher, it helps to make that intent clear. Say “to buy” or “which should I get” if you want the model to run a web search and return fresher, more product-driven recommendations.
And if you want to monitor that across all the ways customers might ask, it’s time to stop asking manually and start tracking systematically.
Does the new Shopping Research experience change all types of queries?
It primarily reshapes discovery-style questions, where ChatGPT pulls shoppers into the guided, long-tail flow before showing results. Consideration and decision questions still behave closer to traditional Q&A.
Why did the recommendations differ between the three modes?
Because each mode processes intent differently: normal chat stays broad, the structured prompt leans on testing sources, and the Shopping Research experience builds its own reasoning path based on the questions it asks upfront.
Why were there so many more citations in the new flow?
The Shopping Research experience draws from a far wider evidence graph. Instead of a dozen sources, it pulls from more than a hundred across PDPs, experts, videos, and communities, which reshapes how the brand is understood.
Can memory really change product visibility?
Yes. Stored preferences influence the questions ChatGPT asks and the attributes it prioritizes, which means two shoppers with the same query can see different products based on their history.
Why is this shift so important for AEO?
Because visibility is no longer anchored to a single PDP or a single answer. It is shaped by guided long-tail questions, personalized memory profiles, expanded citation surfaces, and the evolving context of each conversation.
Do these changes create opportunities for brands?
They do. By aligning with the attributes the assistant now asks about upfront (stability, cushioning, materials, surface, and fit), brands can win moments of early inclusion in dozens of emerging long-tail micro-intents.