Vendors promise 90% cost reduction and 30-minute turnaround. The research tells a different story. Synthetic participants are too consistent, too agreeable, and systematically blind to the messy contradictions that make real user insights valuable.
A startup founder messages me on LinkedIn. They have built a product that replaces user interviews with AI-generated personas. Feed it your product description, your target demographic, and your research questions, and in thirty minutes you get back twenty-five simulated interview transcripts complete with quotes, pain points, and behavioral insights. The price is a tenth of what a traditional research panel costs. The turnaround is hours instead of weeks. The pitch deck claims 94 percent correlation with real user research findings. I ask a simple question: correlated on what metric? The founder pauses. 'Our internal benchmarks,' they say. I ask to see the benchmark methodology. They change the subject.
This interaction captures the state of synthetic user research in 2026. The promise is extraordinary. The marketing is confident. The evidence is thin. And the gap between what synthetic participants can do and what vendors claim they can do is wide enough to swallow research budgets, product decisions, and careers. That does not mean synthetic users are worthless — far from it. It means the conversation needs to move from hype to evidence. What do synthetic participants actually do well? Where do they systematically fail? And when should a research team use them versus invest in real human participants?

Before evaluating the evidence, we need a clear definition. Synthetic users — also called AI-generated participants, simulated respondents, or digital twins — are outputs from large language models prompted to behave as specific user personas. The LLM receives a demographic profile, a behavioral context, and a research question, then generates responses as if it were that person. The sophistication ranges from simple persona prompting ('You are a 35-year-old working mother who uses food delivery apps three times per week') to complex multi-agent simulations where synthetic users interact with each other and with product prototypes.
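At the simple end of that range, a synthetic participant is little more than a prompt template. The sketch below assembles the three inputs described above (demographic profile, behavioral context, research question) into one prompt string. The `Persona` fields and the prompt wording are illustrative assumptions, not any vendor's format, and the call to an actual LLM is deliberately left out because every model API differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona fields; real tools may structure these differently."""
    demographic_profile: str
    behavioral_context: str


def build_persona_prompt(persona: Persona, research_question: str) -> str:
    """Assemble a basic persona prompt. Only builds text; no model is called."""
    return (
        f"You are {persona.demographic_profile} who {persona.behavioral_context}. "
        "Answer the interviewer's question in the first person, as that person would.\n\n"
        f"Interviewer: {research_question}\n"
        "You:"
    )


if __name__ == "__main__":
    mom = Persona(
        demographic_profile="a 35-year-old working mother",
        behavioral_context="uses food delivery apps three times per week",
    )
    print(build_persona_prompt(mom, "How do you decide what to order for dinner?"))
```

Everything the model "knows" about this person is in those two short fields, which is worth keeping in mind when reading the failure modes that follow.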
The technology behind synthetic users is not new. Market researchers have used agent-based modeling for decades. What changed is the fidelity. Modern LLMs produce responses that are fluent, contextually appropriate, and superficially indistinguishable from human interview transcripts. A researcher reading a synthetic interview transcript often cannot tell it was AI-generated. This surface-level quality is both the technology's greatest strength and its most dangerous feature — because sounding like a real user and being a valid substitute for a real user are entirely different things.
The most robust finding in synthetic user research is also the most damning: synthetic participants are too consistent. Real humans are messy. They contradict themselves within the same interview. They say they want one thing and do another. They express strong preferences in one context and reverse them in another. They get confused, misunderstand questions, go on tangents, and surface unexpected insights precisely because their thinking is not perfectly organized.
Synthetic users do none of this. When prompted as a working mother who values convenience, the synthetic user values convenience in every answer. It does not suddenly reveal that this mother has been cooking more lately because her daughter wants to learn recipes. It does not contradict its stated preference for speed by spending twenty minutes describing an elaborate weekend meal-prep ritual. It does not get emotional about a childhood memory triggered by a food-related question. These contradictions, tangents, and unexpected revelations are not noise in qualitative research; they are the signal. They are where insights live.
Empirical studies confirm this at scale. When researchers compare response distributions from synthetic panels to real human panels answering the same questions, the synthetic distributions are consistently narrower. The variance is lower. The outliers are fewer. The responses cluster more tightly around prototypical answers. This means synthetic users systematically underrepresent the diversity of real human experience. They produce a flattened, idealized version of your user base — one that is easier to analyze and harder to learn from.
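The comparison described above can be run on your own pilot data with a few lines of code. This sketch compares dispersion (population standard deviation) and how often each panel uses the extreme points of a 1-5 scale. The panel answers are invented toy numbers chosen to illustrate the computation; they are not data from any study.

```python
from statistics import mean, pstdev

# Toy 1-5 Likert answers to one question. Invented for illustration only.
human_panel = [1, 2, 5, 3, 4, 5, 1, 2, 4, 3]
synthetic_panel = [4, 4, 3, 4, 4, 3, 4, 4, 3, 4]


def spread_report(human, synthetic, scale_min=1, scale_max=5):
    """Compare central tendency, dispersion, and endpoint use between two panels."""
    def endpoint_share(xs):
        # Fraction of answers sitting at either extreme of the scale.
        return sum(x in (scale_min, scale_max) for x in xs) / len(xs)

    return {
        "human_mean": mean(human),
        "synthetic_mean": mean(synthetic),
        # A ratio below 1 means the synthetic panel is narrower than the human one.
        "sd_ratio": pstdev(synthetic) / pstdev(human),
        "human_endpoint_share": endpoint_share(human),
        "synthetic_endpoint_share": endpoint_share(synthetic),
    }


if __name__ == "__main__":
    for key, value in spread_report(human_panel, synthetic_panel).items():
        print(f"{key}: {value:.2f}")
```

With these toy numbers the synthetic panel's spread is roughly a third of the human panel's, and it never touches either end of the scale: the flattened, prototypical distribution in miniature.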

LLMs are trained on internet text, which overrepresents English-speaking, college-educated, Western, relatively affluent perspectives. When you ask an LLM to simulate a user from an underrepresented demographic, it draws on a thinner, more stereotypical slice of its training data. The result is synthetic participants that perform demographic representation rather than embody it.
Ask an LLM to simulate a low-income user in rural India and you get a character that hits every expected marker — price sensitivity, limited connectivity, family-oriented decision making — without any of the specific, surprising, real details that make ethnographic research valuable. The synthetic rural Indian user does not mention the specific workaround they invented using WhatsApp Business to run their shop. They do not describe the social dynamics of shared phone usage in their household. They do not express the ambivalence that many real users feel about technology — simultaneously excited and suspicious. They produce a plausible stereotype, not a lived experience.
The critique above is not a dismissal. Synthetic users have legitimate, valuable applications — but they are narrower than vendors claim. Based on the available evidence, synthetic participants perform adequately to well in the following scenarios.
Using synthetic users to pre-test a survey, an interview guide, or a usability test protocol before investing in real participants is one of the strongest use cases. The goal here is not to get valid user insights — it is to catch problems with your research instrument. Are the questions clear? Does the task flow make sense? Are there ambiguities that will waste real participants' time? Synthetic users are excellent at identifying these issues because the failures they find are in the instrument, not in the interpretation. A confusing question confuses an LLM for similar reasons it confuses a human.
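One way to triage a pilot run is to flag questions where many synthetic answers ask for clarification. The sketch below assumes you already have synthetic responses grouped by question; the cue phrases and the 30 percent threshold are hypothetical choices, and plain substring matching is a crude stand-in for a human reviewer reading the transcripts.

```python
# Hypothetical cue phrases suggesting a respondent found the question unclear.
CONFUSION_CUES = (
    "not sure what you mean",
    "could you clarify",
    "do you mean",
    "which one",
    "it depends what",
)


def flag_confusing_questions(responses_by_question, threshold=0.3):
    """Return questions where at least `threshold` of answers signal confusion."""
    flagged = []
    for question, answers in responses_by_question.items():
        if not answers:
            continue  # nothing to judge
        confused = sum(
            any(cue in answer.lower() for cue in CONFUSION_CUES) for answer in answers
        )
        if confused / len(answers) >= threshold:
            flagged.append(question)
    return flagged


if __name__ == "__main__":
    pilot = {
        "How often do you order delivery in a typical week?": [
            "About three times.", "Twice, sometimes more.", "Maybe four times.",
        ],
        "How satisfied are you with the app and the restaurants?": [
            "Do you mean the app or the restaurants?",
            "It depends what you count as the app.",
            "Pretty satisfied overall.",
        ],
    }
    print(flag_confusing_questions(pilot))
```

Here only the double-barreled second question is flagged. The output is a list of questions to rewrite, not a finding about users.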
Synthetic users can help generate hypotheses that you then validate with real users. If you are exploring a new market segment, running synthetic interviews can help you identify potential pain points, feature priorities, and behavioral patterns worth investigating. The key word is 'worth investigating.' Treat synthetic findings as hypotheses, not conclusions. They narrow the search space for real research — they do not replace it.
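The hypothesis-versus-conclusion distinction tends to erode once findings land in a shared repository. A team could make it explicit by recording where each finding came from. The `Finding` record and its status labels below are a hypothetical schema, sketched to show a synthetic finding staying a hypothesis until at least one real-user study backs it.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """A research finding tagged with its origin (hypothetical schema)."""
    statement: str
    source: str  # "synthetic" or "human"
    validated_by: list = field(default_factory=list)  # IDs of real-user studies

    @property
    def status(self) -> str:
        if self.source == "human":
            return "observed"
        # Synthetic findings remain hypotheses until real users confirm them.
        return "validated" if self.validated_by else "hypothesis"


if __name__ == "__main__":
    f = Finding("Parents want a one-tap reorder option", source="synthetic")
    print(f.status)
    f.validated_by.append("interview-round-2")
    print(f.status)
```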
When you need to test twenty variations of a notification message or fifty different label options, running all of them past real users is impractical. Synthetic users can effectively filter down to the top five candidates, which you then test with real users. They are better than your internal team at spotting confusing language (because they have no insider knowledge to fill gaps) while being faster and cheaper than a full user panel.
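The filtering step itself is a simple rank-and-cut. This sketch assumes each variant has been rated by several synthetic respondents on a hypothetical 1-5 clarity scale and keeps the top `k` by mean rating; ties keep their input order because Python's sort is stable even with `reverse=True`.

```python
from statistics import mean


def shortlist_variants(ratings, k=5):
    """Keep the k variants with the highest mean synthetic clarity rating.

    `ratings` maps each candidate label or message to its list of scores.
    """
    ranked = sorted(ratings, key=lambda variant: mean(ratings[variant]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    labels = {
        "Save": [4, 5, 4],
        "Keep for later": [3, 4, 3],
        "Stash": [2, 2, 3],
        "Bookmark": [5, 4, 5],
    }
    print(shortlist_variants(labels, k=2))  # ['Bookmark', 'Save']
```

The synthetic scores only decide which candidates reach real users; the final choice among the shortlist still comes from the human test.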

The failures are more consequential than the successes, because the research activities where synthetic users fail are precisely the activities where research matters most.
Discovery research — understanding problems you do not know about yet — is fundamentally incompatible with synthetic users. LLMs can only produce responses that are plausible given their training data. They cannot surface genuinely novel insights because they have no novel experiences. The entire value of discovery research is encountering the unexpected. A real user tells you something you could not have predicted. A synthetic user tells you something the model's training data already contains. Using synthetic users for discovery is like mining for gold in a museum gift shop — everything you find has already been found.
Emotional and experiential research is another failure zone. Users' emotional responses to products — frustration, delight, anxiety, trust — are grounded in real experiences, real stakes, and real consequences. A synthetic user asked how they feel about a medical diagnosis tool will produce plausible emotional language without any of the somatic, contextual, or relational dimensions that make emotional research meaningful. They describe the concept of frustration without being frustrated.
Behavioral research — observing what users actually do, not what they say they do — is impossible with synthetic participants for an obvious reason: they have no behavior to observe. They can report what they would hypothetically do, which is precisely the say-do gap that behavioral research exists to close. If you want to know how users actually navigate your product, you need real users navigating your product. There is no simulation shortcut.
The way forward is neither wholesale adoption nor blanket rejection. It is a tiered approach that matches the tool to the task. Here is the framework I recommend to research teams considering synthetic users.
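Matching the tool to the task can be made concrete as a lookup. The sketch below is a hypothetical encoding of the verdicts from the preceding sections, not a reproduction of the recommended framework; the task names are illustrative labels, and defaulting unlisted tasks to real participants is an assumed conservative choice.

```python
from enum import Enum


class Role(Enum):
    SYNTHETIC_OK = "synthetic users are adequate for this step"
    SYNTHETIC_THEN_HUMAN = "use synthetic users to narrow, then validate with real users"
    HUMAN_ONLY = "real participants only"


# Verdicts drawn from the use cases discussed above; labels are illustrative.
TASK_ROLES = {
    "instrument_pretest": Role.SYNTHETIC_OK,
    "hypothesis_generation": Role.SYNTHETIC_THEN_HUMAN,
    "variant_filtering": Role.SYNTHETIC_THEN_HUMAN,
    "discovery": Role.HUMAN_ONLY,
    "emotional_experiential": Role.HUMAN_ONLY,
    "behavioral": Role.HUMAN_ONLY,
}


def recommend(task: str) -> Role:
    # Anything not explicitly listed falls back to real participants.
    return TASK_ROLES.get(task, Role.HUMAN_ONLY)


if __name__ == "__main__":
    for task in ("instrument_pretest", "discovery", "pricing_study"):
        print(f"{task}: {recommend(task).value}")
```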
The synthetic user conversation is evolving fast, but three things would help the field mature. First, we need independent benchmarks. Vendor claims of 90-plus percent correlation are meaningless without transparent methodology, third-party replication, and task-specific breakdowns. The research community should establish standard benchmark tasks for evaluating synthetic user validity, the way NLP has standard benchmarks for model evaluation.
Second, we need better failure mode documentation. When synthetic users get it wrong, how do they get it wrong? Understanding the systematic biases — the consistency bias, the demographic flattening, the absence of genuine novelty — helps researchers compensate for them when synthetic users are used appropriately.
Third, we need ethical guidelines for synthetic user research. When a synthetic participant 'represents' a real demographic, what ethical obligations does the researcher have? Can synthetic user research be used to justify decisions that affect real populations? Should IRBs review synthetic research protocols? These questions are not hypothetical — they are being navigated ad hoc by teams right now, with no shared framework.
Synthetic users are a powerful tool miscast as a revolution. They will not replace user research any more than stock photography replaced photojournalism. What they will do — when used with methodological discipline — is make the research process more efficient by handling the preparatory and filtering work that consumes disproportionate time and budget. That is genuinely valuable. But only if we resist the temptation to extend them beyond their validity boundaries. The most expensive research mistake is not spending too much on participants. It is making a confident decision based on evidence that was never valid in the first place.