AI Research Methods · 22 min read · March 9, 2026

Synthetic Users: What the Evidence Actually Shows About AI-Generated Research Participants

Vendors promise 90% cost reduction and 30-minute turnaround. The research tells a different story. Synthetic participants are too consistent, too agreeable, and systematically blind to the messy contradictions that make real user insights valuable.

Viktor Bezdek · Engineering / Product Leadership

A startup founder messages me on LinkedIn. They have built a product that replaces user interviews with AI-generated personas. Feed it your product description, your target demographic, and your research questions. In thirty minutes you get back twenty-five simulated interview transcripts complete with quotes, pain points, and behavioral insights. The pricing is a tenth of a traditional research panel. The turnaround is hours instead of weeks. The pitch deck claims 94 percent correlation with real user research findings. I ask a simple question: correlated on what metric? The founder pauses. 'Our internal benchmarks,' they say. I ask to see the benchmark methodology. They change the subject.

This interaction captures the state of synthetic user research in 2026. The promise is extraordinary. The marketing is confident. The evidence is thin. And the gap between what synthetic participants can do and what vendors claim they can do is wide enough to swallow research budgets, product decisions, and careers. That does not mean synthetic users are worthless — far from it. It means the conversation needs to move from hype to evidence. What do synthetic participants actually do well? Where do they systematically fail? And when should a research team use them versus invest in real human participants?

[Figure: The validity spectrum. Synthetic participants perform well for some research activities and dangerously poorly for others. Image: a horizontal spectrum of research activities running from left (high synthetic validity) to right (low synthetic validity), with survey pre-testing on the left and emotional response discovery on the right.]

What Synthetic Users Actually Are

Before evaluating the evidence, we need a clear definition. Synthetic users — also called AI-generated participants, simulated respondents, or digital twins — are outputs from large language models prompted to behave as specific user personas. The LLM receives a demographic profile, a behavioral context, and a research question, then generates responses as if it were that person. The sophistication ranges from simple persona prompting ('You are a 35-year-old working mother who uses food delivery apps three times per week') to complex multi-agent simulations where synthetic users interact with each other and with product prototypes.
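To make the simplest end of that range concrete, here is a minimal sketch of how the three inputs (demographic profile, behavioral context, research question) might be assembled into a persona prompt. The `Persona` class, the `build_persona_prompt` function, and the prompt wording are all hypothetical names and choices made for illustration; no particular vendor's product is being described, and no model is called.

```python
from dataclasses import dataclass


@dataclass
class Persona:
    """Hypothetical container for the persona inputs described above."""
    demographic_profile: str
    behavioral_context: str


def build_persona_prompt(persona: Persona, research_question: str) -> str:
    """Assemble a plain-text persona prompt; the wording is illustrative only."""
    return (
        f"You are {persona.demographic_profile} "
        f"who {persona.behavioral_context}.\n"
        "Answer the interviewer's question in the first person, "
        "as this person would.\n\n"
        f"Interviewer: {research_question}"
    )


persona = Persona(
    demographic_profile="a 35-year-old working mother",
    behavioral_context="uses food delivery apps three times per week",
)
prompt = build_persona_prompt(persona, "What makes you abandon an order?")
print(prompt)
```

Everything the synthetic participant "knows" about this person is contained in those few lines of text, which is worth keeping in mind when reading the sections that follow.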

The idea of simulating respondents is not new: market researchers have used agent-based modeling for decades. What changed is the fidelity. Modern LLMs produce responses that are fluent, contextually appropriate, and superficially indistinguishable from human interview transcripts. A researcher reading a synthetic interview transcript often cannot tell it was AI-generated. This surface-level quality is both the technology's greatest strength and its most dangerous feature, because sounding like a real user and being a valid substitute for a real user are entirely different things.

The Consistency Problem

The most robust finding in synthetic user research is also the most damning: synthetic participants are too consistent. Real humans are messy. They contradict themselves within the same interview. They say they want one thing and do another. They express strong preferences in one context and reverse them in another. They get confused, misunderstand questions, go on tangents, and surface unexpected insights precisely because their thinking is not perfectly organized.

Synthetic users do none of this. When prompted as a working mother who values convenience, the synthetic user consistently values convenience across every question. It does not suddenly reveal that actually, this particular user has been cooking more lately because her daughter started expressing interest in learning recipes. It does not contradict its stated preference for speed by spending twenty minutes describing an elaborate weekend meal prep ritual. It does not get emotional about a childhood memory triggered by a food-related question. These contradictions, tangents, and unexpected revelations are not noise in qualitative research — they are the signal. They are where insights live.

Empirical studies confirm this at scale. When researchers compare response distributions from synthetic panels to real human panels answering the same questions, the synthetic distributions are consistently narrower. The variance is lower. The outliers are fewer. The responses cluster more tightly around prototypical answers. This means synthetic users systematically underrepresent the diversity of real human experience. They produce a flattened, idealized version of your user base — one that is easier to analyze and harder to learn from.
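A team that has both a synthetic and a real panel answering the same question can check for this narrowing directly. The sketch below is one simple way to do it, assuming responses on a 1-to-7 rating scale. The numbers are made up purely to illustrate the calculation and are not results from any study, and the function names are hypothetical.

```python
from statistics import pstdev

# Made-up 1-to-7 ratings, for illustration only (not real study data).
real_panel = [1, 2, 3, 4, 4, 5, 5, 6, 7, 7]
synthetic_panel = [4, 4, 5, 5, 5, 5, 5, 5, 6, 6]


def spread_ratio(synthetic, real):
    """Synthetic standard deviation divided by real; below 1 means narrower."""
    return pstdev(synthetic) / pstdev(real)


def missing_values(synthetic, real):
    """Scale points real respondents used that synthetic ones never did."""
    return sorted(set(real) - set(synthetic))


ratio = spread_ratio(synthetic_panel, real_panel)
uncovered = missing_values(synthetic_panel, real_panel)
print(f"spread ratio: {ratio:.2f}")
print(f"scale points absent from the synthetic panel: {uncovered}")
```

A ratio well below 1, together with scale points that only real respondents used, is the pattern described above: the synthetic panel covers the middle of the distribution and misses the edges.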

[Figure: Response distributions. Synthetic users cluster around prototypical answers while real users show the variance where insights hide. Image: two overlaid bell curves, the synthetic user curve narrow and tall, the real user curve wider and shorter, with labeled outlier regions that the synthetic curve misses entirely.]

The Demographic Representation Gap

LLMs are trained on internet text, which overrepresents English-speaking, college-educated, Western, relatively affluent perspectives. When you ask an LLM to simulate a user from an underrepresented demographic, it draws on a thinner, more stereotypical slice of its training data. The result is synthetic participants that perform demographic representation rather than embody it.

Ask an LLM to simulate a low-income user in rural India and you get a character that hits every expected marker — price sensitivity, limited connectivity, family-oriented decision making — without any of the specific, surprising, real details that make ethnographic research valuable. The synthetic rural Indian user does not mention the specific workaround they invented using WhatsApp Business to run their shop. They do not describe the social dynamics of shared phone usage in their household. They do not express the ambivalence that many real users feel about technology — simultaneously excited and suspicious. They produce a plausible stereotype, not a lived experience.

Where Synthetic Users Actually Work

The critique above is not a dismissal. Synthetic users have legitimate, valuable applications — but they are narrower than vendors claim. Based on the available evidence, synthetic participants perform adequately to well in the following scenarios.

Pre-testing and Pilot Studies

Using synthetic users to pre-test a survey, an interview guide, or a usability test protocol before investing in real participants is one of the strongest use cases. The goal here is not to get valid user insights — it is to catch problems with your research instrument. Are the questions clear? Does the task flow make sense? Are there ambiguities that will waste real participants' time? Synthetic users are excellent at identifying these issues because the failures they find are in the instrument, not in the interpretation. A confusing question confuses an LLM for similar reasons it confuses a human.

Hypothesis Generation

Synthetic users can help generate hypotheses that you then validate with real users. If you are exploring a new market segment, running synthetic interviews can help you identify potential pain points, feature priorities, and behavioral patterns worth investigating. The key word is 'worth investigating.' Treat synthetic findings as hypotheses, not conclusions. They narrow the search space for real research — they do not replace it.

Volume Testing of UI Copy and Labels

When you need to test twenty variations of a notification message or fifty different label options, running all of them past real users is impractical. Synthetic users can effectively filter down to the top five candidates, which you then test with real users. They are better than your internal team at spotting confusing language (because they have no insider knowledge to fill gaps) while being faster and cheaper than a full user panel.

Taken together, the use cases where synthetic participants hold up are:
Pre-testing research instruments: catch confusing questions, unclear tasks, and protocol gaps before spending real participant budgets
Hypothesis generation: identify potential pain points and behavioral patterns worth validating with real research
Copy and label testing at scale: filter dozens of variations down to a shortlist for human validation
Stress-testing information architectures: identify navigation confusion points across many synthetic users before tree testing with real ones
Edge case exploration: generate a wide range of possible user scenarios to find edge cases your team has not considered

[Figure: The synthetic user decision framework, a practical guide for when to use AI participants and when real humans are non-negotiable. Image: a decision tree flowchart with branches for research type, stakes level, and demographic requirements.]

Where Synthetic Users Fail

The failures are more consequential than the successes, because the research activities where synthetic users fail are precisely the activities where research matters most.

Discovery research — understanding problems you do not know about yet — is fundamentally incompatible with synthetic users. LLMs can only produce responses that are plausible given their training data. They cannot surface genuinely novel insights because they have no novel experiences. The entire value of discovery research is encountering the unexpected. A real user tells you something you could not have predicted. A synthetic user tells you something the model's training data already contains. Using synthetic users for discovery is like mining for gold in a museum gift shop — everything you find has already been found.

Emotional and experiential research is another failure zone. Users' emotional responses to products — frustration, delight, anxiety, trust — are grounded in real experiences, real stakes, and real consequences. A synthetic user asked how they feel about a medical diagnosis tool will produce plausible emotional language without any of the somatic, contextual, or relational dimensions that make emotional research meaningful. They describe the concept of frustration without being frustrated.

Behavioral research — observing what users actually do, not what they say they do — is impossible with synthetic participants for an obvious reason: they have no behavior to observe. They can report what they would hypothetically do, which is precisely the say-do gap that behavioral research exists to close. If you want to know how users actually navigate your product, you need real users navigating your product. There is no simulation shortcut.

A Responsible Adoption Framework

The way forward is neither wholesale adoption nor blanket rejection. It is a tiered approach that matches the tool to the task. Here is the framework I recommend to research teams considering synthetic users.

Never use synthetic users as the sole evidence for a product decision. They can supplement real research, not replace it. If a finding matters enough to act on, it matters enough to validate with real participants.
Always disclose when synthetic participants were used. Label synthetic findings clearly in research reports. Stakeholders deserve to know the provenance of the evidence informing their decisions.
Use synthetic users for research about your research: pre-testing instruments, generating hypotheses, exploring edge cases. This is where the cost-efficiency gains are real and the validity risks are low.
Do not use synthetic users for any research involving underrepresented populations. The training data gap means synthetic representations of marginalized users are unreliable at best and harmful at worst.
Establish a validation ratio: for every synthetic finding you plan to act on, validate it with real participants at a ratio appropriate to the decision stakes. Low-stakes UI copy decisions might need a 3:1 synthetic-to-real ratio. High-stakes product strategy decisions should not rely on synthetic data at all.
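For teams that want to turn these rules into a checklist, the sketch below encodes them as a small function. It is one possible reading of the framework, not a definitive implementation. The activity labels, the function names, and the conservative default for unlisted activities are assumptions made for illustration. The 3:1 ratio is read here as one real validation for every three synthetic findings, which is only one way to interpret it.

```python
import math

# Hypothetical labels for the activity categories discussed in this article.
SYNTHETIC_SUITABLE = {
    "instrument_pretest",
    "hypothesis_generation",
    "copy_filtering",
    "ia_stress_test",
    "edge_case_exploration",
}
REAL_ONLY = {"discovery", "emotional", "behavioral"}


def recommend(activity: str, stakes: str, underrepresented: bool) -> str:
    """Return 'real_only' or 'synthetic_then_validate' for a planned study.

    `stakes` describes the decision the findings will feed ('low' or 'high').
    """
    if activity in REAL_ONLY or underrepresented or stakes == "high":
        return "real_only"
    if activity in SYNTHETIC_SUITABLE:
        return "synthetic_then_validate"
    # Assumption: anything not listed falls back to the conservative choice.
    return "real_only"


def real_checks_needed(synthetic_findings: int, synthetic_per_real: int = 3) -> int:
    """Real validations for low-stakes findings at an N:1 synthetic-to-real ratio."""
    return math.ceil(synthetic_findings / synthetic_per_real)


print(recommend("copy_filtering", "low", underrepresented=False))
print(recommend("discovery", "low", underrepresented=False))
print(real_checks_needed(10))
```

Note that the function never returns a "synthetic only" answer. That reflects the first rule: synthetic findings can supplement real research, not stand in for it.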

What the Field Needs Next

The synthetic user conversation is evolving fast, but three things would help the field mature. First, we need independent benchmarks. Vendor claims of 90-plus percent correlation are meaningless without transparent methodology, third-party replication, and task-specific breakdowns. The research community should establish standard benchmark tasks for evaluating synthetic user validity, the way NLP has standard benchmarks for model evaluation.

Second, we need better failure mode documentation. When synthetic users get it wrong, how do they get it wrong? Understanding the systematic biases — the consistency bias, the demographic flattening, the absence of genuine novelty — helps researchers compensate for them when synthetic users are used appropriately.

Third, we need ethical guidelines for synthetic user research. When a synthetic participant 'represents' a real demographic, what ethical obligations does the researcher have? Can synthetic user research be used to justify decisions that affect real populations? Should IRBs review synthetic research protocols? These questions are not hypothetical — they are being navigated ad hoc by teams right now, with no shared framework.

Key Takeaways

Synthetic users produce responses that sound authentic but are systematically too consistent — they lack the contradictions, tangents, and surprises that make real qualitative research valuable
LLM training data biases mean synthetic participants from underrepresented demographics reproduce stereotypes rather than authentic experiences
Legitimate use cases include pre-testing research instruments, generating hypotheses, and volume-filtering UI copy — tasks where validity risks are low and cost savings are real
Discovery research, emotional research, and behavioral observation are fundamentally incompatible with synthetic participants — these activities require the irreducible complexity of real human experience
Adopt a tiered approach: use synthetic users for research about your research, and real participants for research that drives product decisions
Always disclose synthetic participant usage in research reports — stakeholders making decisions deserve to know the provenance of the evidence

Synthetic users are a powerful tool miscast as a revolution. They will not replace user research any more than stock photography replaced photojournalism. What they will do — when used with methodological discipline — is make the research process more efficient by handling the preparatory and filtering work that consumes disproportionate time and budget. That is genuinely valuable. But only if we resist the temptation to extend them beyond their validity boundaries. The most expensive research mistake is not spending too much on participants. It is making a confident decision based on evidence that was never valid in the first place.

Tags: Synthetic Users · AI Research · Research Methods · Research Validity · LLM Research · UX Research
