Good Decisions Over Perfect Data
Redefining Product Analytics in the Age of AI
There’s a number in the deck. A percentage, a trend, a claim about user behavior. It’s formatted correctly. It has the confidence of something that came from somewhere real.
And someone in the room looks at it and thinks: I don’t know where that came from. That doesn’t match what I know. But everyone else is nodding.
If you work in product analytics, you’ve had that experience recently. Probably more than once. And if you’ve been treating it as a minor annoyance, I’d argue you’re misreading what it actually is.
It’s the signal. Not that someone made a mistake. That something structural is changing about how information moves through organizations, and about what the analytical function is actually supposed to do.
AI isn’t just changing what we measure. It’s changing what measurement is for.
The question that used to be tractable
For most of the history of product analytics, we answered a clear question: did people use the feature, and how often? Count sessions. Forecast adoption. Run an A/B test against a usage metric. Ship a recommendation.
The question was hard to answer well. But at least it was clear.
AI-powered features break the question entirely. Usage volume is still relevant, but it no longer tells you whether the product is working. A user who got a confidently wrong answer from a generative feature and never came back looks identical in the data to one who got a great answer and didn’t need to return. The signal and the noise have the same shape.
We now have to assess interaction quality, which requires different instrumentation, different evaluation frameworks, and different skills than traditional funnel or experiment analysis. There is no industry-standard metric for AI answer quality. NPS doesn’t capture it. Session length doesn’t capture it. Thumbs up/down is noisy and gameable.
The field hasn’t standardized, but some signals are proving more useful than others. Task completion rate, measured by whether the user took a meaningful next action after an AI interaction rather than abandoning or immediately repeating the query, captures something session length doesn’t. Downstream outcome correlation, tracking whether users who heavily engaged with an AI feature behaved differently on the metrics that actually matter to the business, is harder to build but closer to the truth. And explicit confidence signals, asking users not just whether they liked the answer but whether they acted on it, can surface quality problems that usage data buries entirely. None of these is a complete solution. But any of them is more honest than a session count.
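To make the first of those concrete, here is a minimal sketch of a task-completion signal computed from a raw event log. The event names, the set of “meaningful” follow-up actions, and the five-minute window are all assumptions that would need to be defined per product; the point is the shape of the measurement, not the specific thresholds.

```python
from datetime import datetime, timedelta

# Hypothetical event log: (user_id, timestamp, event_name). The event names
# and thresholds below are illustrative assumptions, not a standard taxonomy.
events = [
    ("u1", datetime(2024, 5, 1, 9, 0), "ai_answer_shown"),
    ("u1", datetime(2024, 5, 1, 9, 2), "report_exported"),   # meaningful next action
    ("u2", datetime(2024, 5, 1, 9, 5), "ai_answer_shown"),
    ("u2", datetime(2024, 5, 1, 9, 6), "ai_answer_shown"),    # immediate retry
    ("u3", datetime(2024, 5, 1, 9, 10), "ai_answer_shown"),   # no follow-up at all
]

MEANINGFUL_NEXT_ACTIONS = {"report_exported", "dashboard_saved", "shared_with_team"}
FOLLOW_UP_WINDOW = timedelta(minutes=5)

def task_completion_rate(events):
    """Share of AI answers followed by a meaningful next action within the
    window, rather than an abandonment or an immediate repeat of the query."""
    events = sorted(events, key=lambda e: (e[0], e[1]))
    answers = completed = 0
    for i, (user, ts, name) in enumerate(events):
        if name != "ai_answer_shown":
            continue
        answers += 1
        for other_user, other_ts, other_name in events[i + 1:]:
            if other_user != user or other_ts - ts > FOLLOW_UP_WINDOW:
                break
            if other_name in MEANINGFUL_NEXT_ACTIONS:
                completed += 1
                break
    return completed / answers if answers else None

print(f"Task completion rate: {task_completion_rate(events):.0%}")  # 25% on this toy log
```

The same scaffolding extends to the other two signals: downstream outcome correlation is a join between interactions like these and the business metrics you already report, and explicit confidence prompts simply become one more event in the log.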
Product analytics teams are being asked to measure something the field hasn’t solved yet. The teams that build credible frameworks for this first are going to matter enormously to their organizations. The teams that wait for a standard to emerge are going to spend years retrofitting measurement onto decisions already made.
The instrumentation window is closing
There’s a second shift happening that’s less visible but equally consequential.
As development teams move toward AI-assisted formats where code is generated from plain-English specifications, the window for defining measurement is shrinking. Requirements that were informal can now produce large, complex codebases quickly. If you haven’t specified what you’re tracking and why before generation begins, retrofitting instrumentation becomes expensive or sometimes impossible.
Here’s what that looks like in practice. A team generates a feature from a specification in eleven days. It would have taken six weeks the traditional way. Leadership is thrilled. The feature goes to QA. Someone asks, “What does success look like for this, and how will we measure it?” Silence. The spec never defined it. The code generation didn’t require it. Now you’re being asked to retrofit tracking onto something already in staging. You can measure that it ran. You cannot measure whether it worked.
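For contrast, here is a minimal sketch of what measurement defined up front can look like: a small artifact attached to the spec before any code is generated. The feature name, events, properties, and guardrails are hypothetical, not a template; what matters is that the success definition and the tracking it requires exist before generation begins.

```python
# Hypothetical measurement block agreed alongside the feature spec, before
# code generation starts. All names and thresholds here are illustrative.
MEASUREMENT_SPEC = {
    "feature": "ai_summary_panel",
    "success_definition": (
        "User acts on the summary (exports, shares, or drills in) without "
        "re-running the query in the same session."
    ),
    "required_events": [
        {"name": "summary_requested", "properties": ["query_id", "source_doc_count"]},
        {"name": "summary_shown", "properties": ["query_id", "latency_ms"]},
        {"name": "summary_acted_on", "properties": ["query_id", "action_type"]},
        {"name": "summary_retried", "properties": ["query_id", "seconds_since_shown"]},
    ],
    "guardrails": [
        "p95 latency_ms under 4000",
        "retry rate under 15% of summaries shown",
    ],
    "owner": "product-analytics",
}
```

Writing a block like that down is the easy half.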
This is a cultural problem as much as a technical one. It means analytics teams need to be in conversations earlier, willing to say “we can’t measure that as written” before a feature is built rather than after. That’s a different posture than analytics has traditionally taken. It requires being upstream of decisions in a way most teams haven’t organized themselves for.
Getting upstream is as much a political challenge as a technical one. Most analytics teams have been trained, implicitly or explicitly, to receive work rather than shape it. Changing that posture requires organizational permission, and earning that permission usually means demonstrating value in the current model before being granted a seat at an earlier table. The teams that have moved successfully have typically done it by solving one upstream instrumentation problem visibly, not by announcing a new strategic position.
Instrumentation is not a technical detail. It is the mechanism by which we either close the information gap or leave it open. And in AI-transformed development, the window to close it is earlier and shorter than it used to be.
The trust infrastructure argument
I want to make a structural argument for why any of this matters, because I think it’s more durable than “analytics is useful” or “data-driven decisions are good.”
In 1970, economist George Akerlof described what happens when buyers and sellers transact with unequal information and no mechanism to close the gap. He showed how used car markets, left without trust infrastructure, collapse toward the worst-quality transactions as good participants exit. He called it the market for lemons. He won the Nobel Prize for the insight.
The reason this matters to analytics is that most of the organizations we work inside are marketplace operators in one form or another. They exist to connect people to things, or to other people, or to decisions. And in every case, the value of the exchange is downstream of the trust underlying it.
Every analytical output we produce either closes an information gap or leaves it open. When it closes one honestly, it builds trust. When it creates a false impression of certainty, it erodes trust in ways that are hard to see until they compound.
AI changes the scale at which trust can be built or destroyed. A single confidently wrong AI output, cited in a leadership meeting and forwarded to three more, spreads in minutes. The organizations that win in an AI-transformed industry will be the ones whose outputs can be relied upon. The ones that lose will be the ones that moved fast and faked it.
Product analytics teams are trustworthiness infrastructure, and the distinction between trustworthiness and trust matters.
Trust is what other people decide. Trustworthiness is what we control. Philosopher Onora O’Neill makes this distinction precisely: we tend to focus on building trust when we should be focused on building the conditions that make trust a rational response. We can’t make leadership trust an analytical output. We can make sure our work is verifiable, that we show up before things break rather than after, and that we represent confidence accurately rather than performing certainty we don’t have. When those three things are true, trust follows.
This applies even outside traditional marketplace businesses. Any organization where people make decisions based on information they didn’t personally generate is operating with the same dynamics. The information gap and the trust problem are universal. The scale at which AI can widen or close them is what’s new.
AI is very good at projecting confidence. It is very bad at earning it. Product analytics is the part of the system that knows the difference.
The new class of question nobody owns
There is a third shift, one that shows up in the calendar rather than the roadmap.
Leaders now have direct access to AI-powered analytics interfaces that can surface data, generate summaries, and answer questions without going through an analyst. Most of the time that’s a genuine productivity gain. But AI tools produce answers that sound equally authoritative whether the underlying data supports high confidence or almost none, and the end user often cannot tell the difference.
Here’s what that looks like. An executive presents a slide showing a specific metric is up eighteen percent. The number came from a natural language query to a BI tool. Three people write it down. After the meeting, an analyst pulls the actual data and gets eleven percent. The discrepancy traces back to the AI tool using a broader event definition than the organization’s standard taxonomy. The slide has already been forwarded. The new metric is already in a press release.
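To make the mechanics concrete, here is a minimal sketch of how a gap like that arises. The event names and counts are invented to mirror the example; the point is that both numbers are internally consistent, they just answer against different definitions.

```python
# Hypothetical weekly event counts. The governed metric counts only
# "checkout_completed"; the natural-language query matched a broader pattern
# that also pulled in "checkout_started".
last_week = {"checkout_completed": 1000, "checkout_started": 1400}
this_week = {"checkout_completed": 1110, "checkout_started": 1722}

def growth(then, now, event_names):
    before = sum(then[e] for e in event_names)
    after = sum(now[e] for e in event_names)
    return (after - before) / before

governed = growth(last_week, this_week, ["checkout_completed"])
broad = growth(last_week, this_week, ["checkout_completed", "checkout_started"])

print(f"Governed definition: {governed:.0%}")  # 11%
print(f"Broader definition:  {broad:.0%}")     # 18%
```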
The reactive correction work that follows doesn’t show up in any project plan. It’s hard to staff for because it’s urgent and requires enough context to both reproduce the error and explain it in terms the original audience can trust.
But framing this as a burden misses the more important point about what the analytical function actually does.
Our job is not to achieve perfect data before anyone acts. It is to make sure the confidence level of the information matches the weight of the decision. An 80% confident answer, honestly represented, is often exactly right. What is dangerous is a 60% answer dressed up as certainty. That is precisely what AI tools produce when nobody is calibrating the output.
We are not the team that catches mistakes. We are the team that makes sure the confidence behind a decision matches the actual evidence. Those are different jobs, and the second one is harder to automate.
The jagged frontier problem
Most of the public conversation about AI and knowledge work focuses on the average case: AI raises the floor for average work, so average workers are at risk. That’s roughly true and worth taking seriously.
But Ethan Mollick at Wharton describes something more specific that I think applies directly here: the jagged frontier. AI outperforms expectations on some tasks and falls short on others, and the boundary is not intuitive. You cannot look at a task and reliably predict which side of the line it falls on.
For product analytics, this means the teams that thrive are not necessarily the ones who adopt AI tools most aggressively. They’re the ones who develop a calibrated sense of where those tools can be trusted and where they need a human check. That judgment requires deep familiarity with the underlying data, the business context, and the specific ways AI systems fail.
That is not a skill you can hire in or buy with a tool license. It accumulates over time, in teams that stay close to the data and the decisions simultaneously.
The research from Harvard and BCG on AI in knowledge work makes one more point worth sitting with: AI raises the floor for average work dramatically. A mediocre analyst with good AI tools can now produce output that would have required a strong analyst two years ago. What that means for product analytics teams is that the bar is rising, not staying flat. The work that used to differentiate a strong team is now table stakes. The ceiling still requires human expertise: the judgment about what question to ask, whether the hypothesis is correctly formed, and whether the instrumentation actually captures the behavior you care about.
The answer to “are we being replaced” is not yes or no. It’s: the floor is rising, the bar is rising with it, and the teams that treat that as a clarifying moment rather than a threat are going to be in a fundamentally different position than the ones that wait for someone to tell them what to do.
A note for people earlier in their careers
Most of what gets written on this topic assumes you already have organizational standing to implement it. This section is for people who don’t yet.
The skill that matters most right now is not tool fluency. It’s the ability to look at an AI-generated output and ask the right skeptical questions: where did this come from, what assumptions are baked in, and what would have to be true for this to be wrong? That’s a judgment skill, not a technical one, and it’s built through practice more than training. The analysts who develop it deliberately, who make a habit of tracing outputs back to their sources and stress-testing conclusions before they circulate, are building something that compounds. The ones who optimize for tool adoption alone are building something that depreciates.
The bar is rising for everyone. But the ceiling is still human, and it’s reachable from wherever you’re starting.
A new identity
I’ve been spending time thinking about how to describe what product analytics teams should be orienting toward, and the frame that keeps coming back is this: calibrators, not perfectionists.
Perfectionism in analytics manifests as waiting for certainty before acting, building complete frameworks before shipping anything, treating every error as a failure rather than a signal. That instinct made more sense when the data environment was slower and the stakes of each decision were lower.
Calibration means something different. It means knowing when 80% confidence is enough and when it isn’t. It means reading the terrain of AI tool outputs well enough to know which ones to trust and which ones to audit. It means being upstream of decisions enough to shape the questions before the wrong answers circulate. It means being honest about uncertainty rather than performing certainty you don’t have.
That last piece, honest representation of confidence, is the core of the trustworthiness argument. An organization that knows exactly how confident to be in its data, and acts accordingly, is more trustworthy than one that always claims certainty or always hedges. The calibration is what closes the information gap and makes trustworthiness visible enough for others to act on.
The analytical function that endures the AI transition is the one that leans into that identity. Less query writing, more hypothesis design. Less dashboard building, more instrumentation strategy. Less explaining results, more auditing the explanations AI generates. And underneath all of it: the ability to break ambiguous problems into testable hypotheses, which is the skill that has always been at the center of the work and becomes more valuable as the volume of AI-generated claims in the environment increases.
Every one of those claims is either a hypothesis worth testing or a mistake waiting to propagate. The team that knows the difference is the one the organization cannot afford to automate.
Where to start
If you’re trying to figure out what to do with any of this, three things are worth doing now regardless of where your team sits organizationally.
Audit one AI feature your team currently supports and ask honestly: what would a bad answer look like, and would we know? If the answer is “we’d see it in churn eventually,” that’s the gap. Work backward from there to what signal you’d actually need.
Find the next AI feature on your product roadmap and get into the spec review before the build starts. You don’t need a formal invitation. Show up with two questions: what does a successful interaction look like, and what would we track to know? That posture, showing up early with measurement questions, is how analytics teams earn the upstream role.
The next time an AI-generated number circulates in a leadership meeting, trace it. Not to correct it publicly, but to understand the methodology. Build that habit before it’s urgent. The first time you can say “I’ve actually verified that number and here’s the confidence level” will be more valuable than any dashboard you’ve built this year.
A note on these frameworks: they’re directional, not proven. The field is genuinely unsettled, and anyone claiming a complete solution to measuring AI answer quality or defining the evolved analytics role is ahead of the evidence. What’s here is a working model, built from watching these challenges show up in real work. Treat it that way.
The calibration frame
The calibration frame changes what you optimize for. Not fewer errors. Not more certainty. A more honest representation of what you actually know, and a clearer signal about where the uncertainty lives. In an environment where AI tools are actively producing the opposite, that honesty is not a soft skill. It is the technical contribution.
If someone in your organization made a significant decision tomorrow based on an AI-generated insight, would you know? Would you have any way to assess whether the confidence behind that insight matched the weight of the decision? If the answer is no, that’s where the work starts.