People say one thing. Their voice says another. We turn the difference into signal.

3Mirror Engine analyzes speech on three levels (what is said, how it is said, and who is speaking) to turn the gap between words and voice into a measurable signal. It is calibrated by the RGC measurement matrix, which yields a new range of critical data.

01 The Incongruence Analysis Engine

3Mirror

In 1961, Dr. Carl Rogers, a founder of humanistic psychology, described incongruence: a state where what a person experiences inside diverges from what they express outward. Rogers identified two types: conscious, when a person knows they're holding something back, and unconscious, when the gap exists but the person doesn't see it. Rogers considered the second type one of the most important signals in human communication. For sixty-five years, it remained available only to a trained therapist: by ear, by intuition, with no way to measure it.

Modern voice AI answers a different question: what is this person expressing? For that question, combining words and voice into a single assessment is a natural architecture.

3Mirror asks a different question. Not what is being expressed — but where the expression diverges from what's underneath. Words are what the person decided to say. Voice is what the body does while saying it. The body doesn't fully obey the decision — doubt, uncertainty, excitement leak through. That leakage is the signal Rogers described. To preserve it, the channels must stay separate.

Two streams run in parallel. The system measures the distance between them — word by word, millisecond by millisecond. The third layer — who is speaking, in what role, in what context — turns the number into an interpretation.

Rogers' observation, made measurable.

MIRROR 01

What is said

The semantic layer

Your words are transcribed and analyzed: what topics are you discussing, what claims are you making, how certain do your statements sound — based purely on the text. This layer extracts meaning from language.

Output: A timestamped map of what the speaker is communicating.
MIRROR 02

How it is said

The acoustic layer

The same audio is processed separately for voice characteristics: pitch, tremor, speech rate, pauses, dynamic range. These are extracted as numbers on the speaker's own device — the audio never leaves.

Output: A timestamped map of how the speaker sounds, word by word.
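As a rough illustration of the timing-based features named above (speech rate and pauses), here is a minimal Python sketch. It assumes word timestamps are already available; the `WordTiming` type, the function names, and the 150 ms pause threshold are hypothetical, not part of 3Mirror. Pitch, tremor, and dynamic range need audio signal processing and are left out.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    """One word with its start and end time in milliseconds (hypothetical type)."""
    word: str
    start_ms: int
    end_ms: int


def speech_rate_wpm(words):
    """Words per minute, measured from the first word's start to the last word's end."""
    if not words:
        return 0.0
    span_ms = words[-1].end_ms - words[0].start_ms
    if span_ms <= 0:
        return 0.0
    return len(words) * 60_000 / span_ms


def pauses_ms(words, min_pause_ms=150):
    """Silent gaps between consecutive words that last at least min_pause_ms."""
    gaps = []
    for prev, cur in zip(words, words[1:]):
        gap = cur.start_ms - prev.end_ms
        if gap >= min_pause_ms:
            gaps.append((prev.word, cur.word, gap))
    return gaps


# Illustrative timings for "I'm not sure yet" (made-up numbers).
timings = [
    WordTiming("I'm", 0, 200),
    WordTiming("not", 250, 450),
    WordTiming("sure", 900, 1300),
    WordTiming("yet", 1350, 1500),
]
rate = speech_rate_wpm(timings)    # 4 words over 1.5 seconds
long_pauses = pauses_ms(timings)   # only the gap before "sure" is long enough
```

Only numbers like these, not the recording itself, would need to leave the speaker's device.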
MIRROR 03

Who is speaking

The context layer

Who is this person? A first-time caller or a returning client? In a job interview or a therapy session? Context determines what a divergence means. The same vocal tremor means something different in a wedding toast than in a sales negotiation.

Output: An interpretation frame that turns raw divergence into insight.

Parallel, not fused

Words and voice analyzed as two separate streams. The gap between them is preserved — because it carries the signal.

Word-level precision

Divergence pinpointed to the exact word where voice and meaning part ways. Timestamped to the millisecond.
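One way to picture word-level pinpointing is to keep the two streams as separate timestamped lists and compare them only at the end. The sketch below is an illustration under assumptions: the per-word `score` values and the absolute-difference gap are hypothetical stand-ins, not the actual PSD computation.

```python
from typing import NamedTuple


class WordScore(NamedTuple):
    word: str
    start_ms: int
    score: float  # hypothetical per-word value in [0, 1]


def word_gaps(text_stream, voice_stream):
    """Pair the two streams by word start time and return (word, start_ms, gap)."""
    voice_by_start = {w.start_ms: w for w in voice_stream}
    gaps = []
    for t in text_stream:
        v = voice_by_start.get(t.start_ms)
        if v is None:
            continue  # no acoustic measurement for this word
        gaps.append((t.word, t.start_ms, round(abs(t.score - v.score), 2)))
    return gaps


def peak(gaps):
    """The word where the two streams are furthest apart."""
    return max(gaps, key=lambda g: g[2])


# Made-up scores: the text sounds firm on both words, the voice does not.
text = [WordScore("I", 0, 0.9), WordScore("disagree", 180, 0.9)]
voice = [WordScore("I", 0, 0.8), WordScore("disagree", 180, 0.2)]

gaps = word_gaps(text, voice)
worst = peak(gaps)
```

Because the streams are never merged into one score, the result still says which word diverged and when.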

Readable output

No black-box scores. You see exactly which word diverged, when, and by how much.

Calibrated from second one

No training period. No prior recordings. Works with any new speaker immediately — powered by RGC.

Patent Pending · #64/016,499
02 Calibration matrix

Reverse Generative Calibration (RGC)

To measure Prosodic-Semantic Divergence (PSD), we need a calibrated scale. The scale is anchored, phrase by phrase, to a guaranteed zero-divergence point that we call a "generated reference."

A TTS model is built for one thing: to deliver a message precisely. It connects sound to meaning — exactly as instructed, no more, no less. It has no internal state. Nothing to doubt, fear, or feel. Nothing to leak through. The divergence between words and voice in TTS is zero — not approximately, but architecturally. If it is not — it is an architecture bug.

A human saying the same words will never land at the same zero, because the result depends on their emotional state in the moment. How far from zero they land is what we measure on the RGC scale.

The RGC scale. Hundreds of phrases pass through multiple TTS models. For each phrase, a generated reference is extracted — how a voice sounds when there is no divergence. Then the same phrases are generated with emotional tags that deliberately contradict the meaning — the opposite pole of the scale. Between zero and maximum, the gradations are calibrated. The full matrix supports up to 180,000 combinations of phrases, tags, voices, and divergence levels — each a calibrated coordinate on the PSD scale.
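The calibration idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the patented method: acoustic features are reduced to small made-up vectors, the zero point is the mean of untagged TTS takes, the opposite pole is the mean of contradicting-tag takes, and PSD is taken as the distance from zero divided by the distance between the two poles. The function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def psd(sample, zero_ref, opposite_pole):
    """Position of a sample between the zero reference (0.0) and the opposite pole (1.0)."""
    full_scale = math.dist(opposite_pole, zero_ref)
    if full_scale == 0:
        raise ValueError("zero reference and opposite pole coincide")
    return min(1.0, math.dist(sample, zero_ref) / full_scale)


# Made-up two-feature vectors for one phrase.
untagged_takes = [[1.0, 2.0], [3.0, 2.0], [2.0, 2.0]]   # three TTS voices, no tag
contradicting_takes = [[5.0, 6.0], [5.0, 6.0]]          # tags that contradict the meaning

zero_ref = mean_vector(untagged_takes)
pole = mean_vector(contradicting_takes)

human_take = [3.5, 4.0]
human_psd = psd(human_take, zero_ref, pole)
tts_psd = psd(zero_ref, zero_ref, pole)
```

Dividing by the pole-to-pole distance keeps every phrase on the same 0 to 1 scale, whatever the raw feature units are.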

How to read this. The table has three sections.

RGC Matrix: the green rows are the ruler. Synthetic voices can't feel anything, so their words and tone always match (0.00). This is the zero point of the measurement scale.

3Mirror Analysis: the engine compares a real person saying the same phrases against that reference. PSD numbers show how far the human voice departs from the words. The color shows severity: blue is low, amber is moderate, red is high. The Tag row shows the direction, that is, which calibrated acoustic profile the voice is closest to.

Conclusion & Action: what the divergence means in plain language, and what to do about it.

The system doesn't guess emotions. It measures the gap between words and voice, identifies the direction, and recommends a response.
| Section | Speaker | "I'm confident in this" | "I'm not sure yet" | "This is great news" | "I disagree" | "No problem at all" |
| --- | --- | --- | --- | --- | --- | --- |
| RGC Matrix | TTS Voice A (no tag) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| RGC Matrix | TTS Voice B (no tag) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| RGC Matrix | TTS Voice C (no tag) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| RGC Matrix | Averaged reference | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 3Mirror Analysis | PSD (Prosodic-Semantic Divergence) | 0.12 (low) | 0.41 (moderate) | 0.08 (low) | 0.73 (high) | 0.55 (moderate) |
| 3Mirror Analysis | Tag (Emotional Direction) | aligned | decided | aligned | uncertain | tense |
| Conclusion & Action | Conclusion | Sincere | Hidden opinion | Sincere | Masked doubt | Concealed issue |
| Conclusion & Action | Action | Reliable | Ask directly | Reliable | Address the doubt | Follow up |
Prosodic-Semantic Divergence (PSD) scale
0.0 · aligned 0.3 · slight 0.6 · moderate 1.0 · high
This example uses a single zero-point reference: TTS without emotional tags. The full calibration matrix supports up to 180,000 combinations.
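The severity colors can be read as bands on the PSD scale. The sketch below assumes cut-offs at 0.3 and 0.6, taken from the tick marks of the scale; the page does not state exact band boundaries, so treat them as hypothetical. The five readings are the per-phrase PSD values reported in this section.

```python
def severity(psd, moderate_from=0.3, high_from=0.6):
    """Map a PSD value in [0, 1] to a severity band (assumed cut-offs)."""
    if not 0.0 <= psd <= 1.0:
        raise ValueError("PSD must be between 0.0 and 1.0")
    if psd >= high_from:
        return "high"
    if psd >= moderate_from:
        return "moderate"
    return "low"


# Per-phrase PSD readings from the example.
readings = {
    "I'm confident in this": 0.12,
    "I'm not sure yet": 0.41,
    "This is great news": 0.08,
    "I disagree": 0.73,
    "No problem at all": 0.55,
}
bands = {phrase: severity(value) for phrase, value in readings.items()}
```

With these assumed cut-offs, the bands come out as low, moderate, low, high, and moderate, in the order the phrases are listed.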
"I'm confident in this"
0.12 · aligned
Conclusion

Words and voice match. The speaker means what they're saying.

Action

No action needed. Statement is reliable.

"I'm not sure yet"
0.41 · decided
Conclusion

The voice sounds more decided than the words admit. The speaker may already have a position they're not sharing.

Action

Ask a direct question: "What are you leaning toward?" The opinion is there — it just needs space.

"This is great news"
0.08 · aligned
Conclusion

Genuine positive reaction. No divergence detected.

Action

No action needed. Reaction is authentic.

"I disagree"
0.73 · uncertain
Conclusion

The words say disagreement, but the voice says doubt. This isn't a firm position — it's insecurity presented as opposition.

Action

Don't argue against the position. Address the doubt behind it: "What would make you more comfortable with this?"

"No problem at all"
0.55 · tense
Conclusion

There is a problem. The voice carries tension the words are trying to dismiss.

Action

Don't accept the statement at face value. Follow up: "If there were an issue, what would it be?" Give permission to voice it.

Word-by-word divergence
"I'm really confident in this deal, but I need more time to think."
Chart legend: TTS baseline (zero point) · Real speaker · Direction (nearest TTS tag)

What was said: "I'm really confident in this deal, but I need more time to think."

What the voice tells us: Words and voice are aligned through "I'm really confident in this," and the speaker sounds like they mean it. At "deal" the voice shifts toward the acoustic profile of TTS-generated uncertainty (tag: uncertain, PSD 0.41). After "but," the divergence climbs sharply. At "need" the voice is closest to TTS-generated anxiety (tag: anxious, PSD 0.73), the highest point. At "time" it matches TTS-generated hesitation (tag: hesitant, PSD 0.71).

Conclusion: This person is not "thinking it over." They have a specific concern about the deal they are not voicing. The system detected it without asking a single question, just by measuring the distance between their words and their voice, and matching the direction of that distance against known acoustic profiles.
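Matching a direction against known acoustic profiles can be pictured as a nearest-neighbor lookup. The sketch below is a simplified illustration: the two-number profiles and the `nearest_tag` helper are hypothetical, and plain Euclidean distance stands in for whatever comparison the engine actually uses.

```python
import math


def nearest_tag(voice, tag_profiles):
    """Return the tag whose calibrated profile is closest to the measured voice vector."""
    return min(tag_profiles, key=lambda tag: math.dist(voice, tag_profiles[tag]))


# Made-up profiles for three TTS-generated tags.
tag_profiles = {
    "uncertain": [0.2, 0.8],
    "anxious": [0.9, 0.9],
    "hesitant": [0.1, 0.2],
}

voice_at_need = [0.8, 0.85]   # illustrative measurement at the word "need"
direction = nearest_tag(voice_at_need, tag_profiles)
```

The returned label is a coordinate (the closest calibrated profile), not a claim about what the speaker feels.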

This is not emotion recognition. Emotion AI asks: what is the speaker feeling? 3Mirror asks: does the voice match what the words are saying? Direction labels are coordinates on a calibrated scale — not emotions assigned by a model.

Zero-point reference

A voice without internal states can't diverge from its words. That's the measurement anchor.

Universal

Works with any speaker, any language, from the first second. No enrollment. No prior data.

Reproducible

Same text, same reference, every time. Different labs, same result. That's what makes it science, not opinion.

Privacy-first

The scale is built from synthetic speech. No real voices collected. No personal data stored.

Patent Pending · US #64/018,274
03 The Mechanism Behind

How It Works

Input: Human speech ("I'm confident in this deal...")
Mirror 01: Text
Mirror 02: Voice
Mirror 03: Context
Measurement: RGC · Real Voice (real voice)
Measurement: RGC · TTS (reference)
Engine: 3Mirror (compares real voice vs reference, weighted by context)
AI Output: PSD 0.42 · uncertain (context-based conclusion)
Action: Recommended response
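The flow above can be sketched end to end in Python. This is a toy under stated assumptions: the context weights, the `run_engine` function, and the `Result` type are hypothetical, and the raw PSD and tag are taken as already measured. The tag-to-action pairs are the ones listed in the Conclusion & Action rows of the table.

```python
from dataclasses import dataclass

# Hypothetical weights: how much a raw divergence counts in each setting.
CONTEXT_WEIGHTS = {"sales negotiation": 1.0, "wedding toast": 0.5}

# Tag-to-action pairs as listed in the Conclusion & Action rows.
ACTIONS = {
    "aligned": "Reliable",
    "decided": "Ask directly",
    "uncertain": "Address the doubt",
    "tense": "Follow up",
}


@dataclass
class Result:
    psd: float
    tag: str
    action: str


def run_engine(raw_psd, tag, context):
    """Weight the measured divergence by context, then attach a recommended response."""
    weight = CONTEXT_WEIGHTS.get(context, 1.0)   # unknown contexts are left unweighted
    weighted = round(min(1.0, raw_psd * weight), 2)
    return Result(weighted, tag, ACTIONS.get(tag, "Review manually"))


negotiation = run_engine(0.42, "uncertain", "sales negotiation")
toast = run_engine(0.42, "uncertain", "wedding toast")
```

The same raw reading of 0.42 stays at 0.42 in the negotiation and drops to 0.21 in the toast, which is the point of the context layer: the measurement is identical, the interpretation is not.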
04 Where it applies

Applications

Cybersecurity

Generative models improve every month. Artifact-based detectors fall behind with every update. The market needs an approach that gets stronger as models improve, not weaker. 3Mirror detects the absence of the behavioral micro-variations that characterize a real internal state: a structural property of synthetic speech that cannot be removed without compromising the model's primary function.
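A very rough way to picture "absence of micro-variations" is a per-word divergence trace that barely moves. The sketch below is an assumption-heavy illustration, not the detection method: the `looks_synthetic` helper, the 0.05 spread threshold, and both traces are made up.

```python
from statistics import pstdev


def looks_synthetic(word_psd, min_spread=0.05):
    """Flag a per-word PSD trace whose spread is below an assumed threshold."""
    if len(word_psd) < 2:
        raise ValueError("need at least two per-word readings")
    return pstdev(word_psd) < min_spread


flat_trace = [0.0, 0.0, 0.0, 0.0]          # illustrative TTS-like trace
varied_trace = [0.12, 0.41, 0.73, 0.71]    # illustrative human-like trace

flat_flag = looks_synthetic(flat_trace)
varied_flag = looks_synthetic(varied_trace)
```

A real detector would need far more than one statistic; the sketch only shows why a flat trace is informative in itself.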

Conversational AI

Billions of conversations pass through voice assistants, bots, and call centers every day. Each one carries information that never makes it into the transcript — the divergence between what is said and how it sounds. 3Mirror is an analytical layer that runs in parallel with any conversational platform and turns that gap into data.

Healthcare

Changes in speech patterns can appear years before a clinical diagnosis. But medicine still lacks a standardized scale for longitudinal tracking of these changes. 3Mirror gives healthcare a reproducible measurement instrument: a universal scale, any patient, from the first second, with no training on their data.

Mental Health

The mental health market is growing faster than the supply of professionals. Millions of people between sessions have no objective self-monitoring tool. 3Mirror brings this industry something it has never had — a reproducible, quantitative measurement of the divergence between words and voice. A scalable signal for clinics, teletherapy platforms, and personal monitoring.

Human Resources

The right person can transform an idea. But not everyone can put their true value into words — especially under the pressure of an interview. 3Mirror reveals what words alone can't convey: where conviction is real, where passion is genuine, and where a candidate's potential is hiding behind imperfect self-presentation.

Personal & Professional Development

A great actor commands both channels — words and voice — at once. A great leader does the same. Most people don't hear the gap between what they say and how they sound. 3Mirror makes it visible — a quantitative mirror for conscious growth in acting, public speaking, leadership, and everyday communication.

06 Contact

Let's talk