3Mirror Engine analyzes speech on three levels (what is said, how it is said, and who is speaking) to turn the gap between words and voice into a measurable signal. It is calibrated by the RGC measurement matrix, which generates a new range of critical data.
In 1961, Dr. Carl Rogers, a founder of humanistic psychology, described incongruence: a state where what a person experiences inside diverges from what they express outward. Rogers identified two types: conscious, when a person knows they're holding something back, and unconscious, when the gap exists but the person doesn't see it. Rogers considered the second type one of the most important signals in human communication. For sixty-five years, it remained available only to a trained therapist, by ear and by intuition, with no way to measure it.
Modern voice AI answers one question: what is this person expressing? For that question, combining words and voice into a single assessment is a natural architecture.
3Mirror asks a different question. Not what is being expressed — but where the expression diverges from what's underneath. Words are what the person decided to say. Voice is what the body does while saying it. The body doesn't fully obey the decision — doubt, uncertainty, excitement leak through. That leakage is the signal Rogers described. To preserve it, the channels must stay separate.
Two streams run in parallel. The system measures the distance between them — word by word, millisecond by millisecond. The third layer — who is speaking, in what role, in what context — turns the number into an interpretation.
Rogers' observation, made measurable.
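The two-stream idea can be sketched in a few lines. This is a minimal illustration, not 3Mirror's implementation: the `WordScore` fields, the 0-to-1 score ranges, and the absolute-difference gap are all assumptions made for the example. The point it shows is that each word keeps two separate scores and a timestamp, and only the distance between them is computed.

```python
from dataclasses import dataclass


@dataclass
class WordScore:
    word: str
    start_ms: int    # word onset in the audio, in milliseconds
    semantic: float  # hypothetical text-only certainty score in [0, 1]
    acoustic: float  # hypothetical voice-only certainty score in [0, 1]


def word_divergence(scores):
    """Return (word, start_ms, gap) per word; the two scores are never merged."""
    return [
        (s.word, s.start_ms, round(abs(s.semantic - s.acoustic), 2))
        for s in scores
    ]


def widest_gap(scores):
    """Return the (word, start_ms, gap) entry with the largest gap."""
    return max(word_divergence(scores), key=lambda item: item[2])


# Illustrative values only; they are not measurements.
stream = [
    WordScore("no", 0, 0.9, 0.8),
    WordScore("problem", 180, 0.9, 0.4),
    WordScore("at", 610, 0.9, 0.7),
    WordScore("all", 720, 0.9, 0.6),
]
```

With these made-up values, `widest_gap(stream)` points at "problem" and its onset time, which is the kind of word-level, timestamped output the text describes.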
The semantic layer
Your words are transcribed and analyzed: what topics are you discussing, what claims are you making, how certain do your statements sound — based purely on the text. This layer extracts meaning from language.
The acoustic layer
The same audio is processed separately for voice characteristics: pitch, tremor, speech rate, pauses, dynamic range. These are extracted as numbers on the speaker's own device; the audio itself never leaves it.
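As a rough sketch of what "extracted as numbers" could look like, the snippet below derives three of the listed characteristics (speech rate, pauses, dynamic range) from word timings and per-frame loudness. The input shapes, the 150 ms pause threshold, and the function names are assumptions for illustration; pitch and tremor are left out because they need real signal processing. Only the resulting `features` dictionary would need to leave the device.

```python
import math


def speech_rate(words):
    """Words per second over the span from first onset to last offset."""
    duration_s = (words[-1][2] - words[0][1]) / 1000.0
    return len(words) / duration_s


def pauses_ms(words, min_pause_ms=150):
    """Silent gaps between consecutive words of at least min_pause_ms (assumed threshold)."""
    gaps = [nxt[1] - cur[2] for cur, nxt in zip(words, words[1:])]
    return [g for g in gaps if g >= min_pause_ms]


def dynamic_range_db(frame_rms):
    """Ratio of loudest to quietest non-silent frame, in decibels."""
    voiced = [r for r in frame_rms if r > 0]
    return 20.0 * math.log10(max(voiced) / min(voiced))


# (word, start_ms, end_ms) tuples, as a hypothetical aligner might emit them.
words = [("I", 0, 120), ("need", 400, 650), ("more", 700, 900), ("time", 1500, 2000)]

# Numbers only: this dictionary, not the audio, is what gets passed on.
features = {
    "speech_rate_wps": speech_rate(words),
    "pauses_ms": pauses_ms(words),
    "dynamic_range_db": dynamic_range_db([0.01, 0.1, 0.05, 0.0]),
}
```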
The context layer
Who is this person? A first-time caller or a returning client? In a job interview or a therapy session? Context determines what a divergence means. The same vocal tremor means something different in a wedding toast than in a sales negotiation.
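One simple way to picture the context layer is a lookup keyed by situation and vocal cue. The rule table, the context names, and the 0.4 notice threshold below are invented for this sketch and are not taken from 3Mirror; the example only shows how the same cue and the same number can map to different readings.

```python
# Hypothetical rules: (context, cue) -> reading. Not a real 3Mirror rule set.
CONTEXT_NOTES = {
    ("wedding_toast", "tremor"): "expected emotion; probably not a concealment signal",
    ("sales_negotiation", "tremor"): "possible unvoiced concern; worth a follow-up",
}


def interpret(context, cue, psd, notice_above=0.4):
    """Turn a divergence value into a reading that depends on context."""
    if psd < notice_above:  # assumed threshold below which nothing is reported
        return "no notable divergence"
    # Without a rule for this context, fall back to reporting the number alone.
    return CONTEXT_NOTES.get((context, cue), "no rule for this context; report the number only")


toast = interpret("wedding_toast", "tremor", 0.6)
deal = interpret("sales_negotiation", "tremor", 0.6)
```

Here `toast` and `deal` come from the same cue and the same value but produce different readings, which is the role the text assigns to context.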
Words and voice analyzed as two separate streams. The gap between them is preserved — because it carries the signal.
Divergence pinpointed to the exact word where voice and meaning part ways. Timestamped to the millisecond.
No black-box scores. You see exactly which word diverged, when, and by how much.
No training period. No prior recordings. Works with any new speaker immediately — powered by RGC.
To measure Prosodic and Semantic Divergence (PSD), we need a targeted scale. Each value on this scale is a guaranteed zero divergence point — what we call a "generated reference."
A TTS model is built for one thing: to deliver a message precisely. It connects sound to meaning exactly as instructed, no more, no less. It has no internal state. Nothing to doubt, fear, or feel. Nothing to leak through. The divergence between words and voice in TTS is zero, not approximately but architecturally. If it is not zero, that is an architecture bug.
A human saying the same words will never land at exactly that zero, because their delivery depends on their emotional state in the moment. How far from zero they land is what we measure on the RGC scale.
The RGC scale. Hundreds of phrases pass through multiple TTS models. For each phrase, a generated reference is extracted — how a voice sounds when there is no divergence. Then the same phrases are generated with emotional tags that deliberately contradict the meaning — the opposite pole of the scale. Between zero and maximum, the gradations are calibrated. The full matrix supports up to 180,000 combinations of phrases, tags, voices, and divergence levels — each a calibrated coordinate on the PSD scale.
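To make the two poles concrete, here is one possible way to place a sample between them. It is a sketch under stated assumptions, not the RGC calibration itself: it assumes every utterance is already reduced to a normalized feature vector, averages several untagged TTS voices into the zero reference, and uses a clamped ratio of Euclidean distances as the score. The names `mean_vector` and `psd` are illustrative.

```python
import math


def mean_vector(vectors):
    """Average feature vectors from several TTS voices into one reference."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def psd(sample, zero_ref, max_ref):
    """Place a sample on a 0..1 scale between the two poles (assumed formula)."""
    span = math.dist(zero_ref, max_ref)
    if span == 0:
        raise ValueError("poles coincide; the scale is undefined")
    # Distance from the zero pole, as a fraction of the pole-to-pole distance.
    return min(1.0, math.dist(sample, zero_ref) / span)


# Illustrative, pre-normalized feature vectors for one phrase.
untagged_voices = [[0.25, 0.5], [0.75, 0.5]]  # same phrase, no emotional tag
zero_ref = mean_vector(untagged_voices)       # averaged generated reference
max_ref = [0.5, 1.5]                          # same phrase, contradicting tag

human_sample = [0.5, 0.75]
score = psd(human_sample, zero_ref, max_ref)
```

A calibrated scale would presumably use intermediate gradations rather than a straight line between the poles; the linear ratio is only the simplest stand-in.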
| Speaker | "I'm confident in this" | "I'm not sure yet" | "This is great news" | "I disagree" | "No problem at all" |
|---|---|---|---|---|---|
| TTS Voice A (no tag) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| TTS Voice B (no tag) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| TTS Voice C (no tag) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Averaged reference | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 3Mirror Analysis | | | | | |
| Prosodic Semantic Divergence (PSD) | 0.00 (low) | 0.00 (moderate) | 0.00 (low) | 0.00 (high) | 0.00 (moderate) |
| Tag (Emotional Direction) | aligned | decided | aligned | uncertain | tense |
| Conclusion & Action | | | | | |
| Conclusion | Sincere | Hidden opinion | Sincere | Masked doubt | Concealed issue |
| Action | Reliable | Ask directly | Reliable | Address the doubt | Follow up |
Words and voice match. The speaker means what they're saying.
No action needed. Statement is reliable.
The voice sounds more decided than the words admit. The speaker may already have a position they're not sharing.
Ask a direct question: "What are you leaning toward?" The opinion is there — it just needs space.
Genuine positive reaction. No divergence detected.
No action needed. Reaction is authentic.
The words say disagreement, but the voice says doubt. This isn't a firm position — it's insecurity presented as opposition.
Don't argue against the position. Address the doubt behind it: "What would make you more comfortable with this?"
There is a problem. The voice carries tension the words are trying to dismiss.
Don't accept the statement at face value. Follow up: "If there were an issue, what would it be?" Give permission to voice it.
What was said: "I'm really confident in this deal, but I need more time to think."
What the voice tells us: Words and voice are aligned through "I'm really confident in this"; the speaker sounds like they mean it. At "deal" the voice shifts toward the acoustic profile of TTS-generated uncertainty (tag: uncertain, PSD 0.41). After "but," the divergence climbs sharply. At "need" the voice is closest to TTS-generated anxiety (tag: anxious, PSD 0.73), the highest point. At "time" it matches TTS-generated hesitation (tag: hesitant, PSD 0.71).
Conclusion: This person is not "thinking it over." They have a specific concern about the deal that they are not voicing. The system detected it without asking a single question, just by measuring the distance between their words and their voice and matching the direction of that distance against known acoustic profiles.
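The readings in this example can be summarized programmatically. The sketch below uses only the three word-level values quoted above; the `band` cut-offs (0.3 and 0.6) are assumptions chosen for illustration, since the page does not state where low, moderate, and high begin.

```python
# Word-level readings quoted in the example above: (word, tag, PSD).
readings = [
    ("deal", "uncertain", 0.41),
    ("need", "anxious", 0.73),
    ("time", "hesitant", 0.71),
]


def band(psd, moderate_from=0.3, high_from=0.6):
    """Map a PSD value to low / moderate / high (cut-offs are assumptions)."""
    if psd >= high_from:
        return "high"
    if psd >= moderate_from:
        return "moderate"
    return "low"


# Word where the divergence is largest.
peak = max(readings, key=lambda r: r[2])

# First word where the divergence leaves the "low" band.
onset = next(r for r in readings if band(r[2]) != "low")

# Per-word report: which word diverged, in which direction, and by how much.
report = [(word, tag, psd, band(psd)) for word, tag, psd in readings]
```

Under these assumed cut-offs, the divergence first becomes notable at "deal" and peaks at "need", matching the narrative reading of the sentence.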
A voice without internal states can't diverge from its words. That's the measurement anchor.
Works with any speaker, any language, from the first second. No enrollment. No prior data.
Same text, same reference, every time. Different labs, same result. That's what makes it science, not opinion.
The scale is built from synthetic speech. No real voices collected. No personal data stored.
Wherever the gap between words and voice carries information — from clinical signal to behavioral integrity.
Generative models improve every month. Artifact-based detectors fall behind with every update. The market needs an approach that gets stronger as models improve, not weaker. 3Mirror detects the absence of the behavioral micro-variations that characterize a real internal state: a structural property of synthetic speech that cannot be removed without compromising the model's primary function.
Billions of conversations pass through voice assistants, bots, and call centers every day. Each one carries information that never makes it into the transcript — the divergence between what is said and how it sounds. 3Mirror is an analytical layer that runs in parallel with any conversational platform and turns that gap into data.
Changes in speech patterns appear years before a clinical diagnosis. But medicine still lacks a standardized scale for longitudinal tracking of these changes. 3Mirror gives healthcare a reproducible measurement instrument — a universal scale, any patient, from the first second, with no training on their data.
The mental health market is growing faster than the supply of professionals. Millions of people between sessions have no objective self-monitoring tool. 3Mirror brings this industry something it has never had — a reproducible, quantitative measurement of the divergence between words and voice. A scalable signal for clinics, teletherapy platforms, and personal monitoring.
The right person can transform an idea. But not everyone can put their true value into words — especially under the pressure of an interview. 3Mirror reveals what words alone can't convey: where conviction is real, where passion is genuine, and where a candidate's potential is hiding behind imperfect self-presentation.
A great actor commands both channels — words and voice — at once. A great leader does the same. Most people don't hear the gap between what they say and how they sound. 3Mirror makes it visible — a quantitative mirror for conscious growth in acting, public speaking, leadership, and everyday communication.