3Mirror

01 The Incongruence Analysis Engine

In 1961, Dr. Carl Rogers, a founder of humanistic psychology, described incongruence: a state where what a person experiences inside diverges from what they express outward. Rogers identified two types: conscious, when a person knows they're holding something back, and unconscious, when the gap exists but the person doesn't see it. Rogers considered the second type one of the most important signals in human communication. For sixty-five years, it remained available only to a trained therapist, by ear and by intuition, with no way to measure it.

Modern voice AI answers one question: what is this person expressing? For that question, combining words and voice into a single assessment is a natural architecture.

3Mirror asks a different question. Not what is being expressed — but where the expression diverges from what's underneath. Words are what the person decided to say. Voice is what the body does while saying it. The body doesn't fully obey the decision — doubt, uncertainty, excitement leak through. That leakage is the signal Rogers described. To preserve it, the channels must stay separate.

Two streams run in parallel. The system measures the distance between them — word by word, millisecond by millisecond. The third layer — who is speaking, in what role, in what context — turns the number into an interpretation.

Rogers' observation, made measurable.

MIRROR 01

What is said

The semantic layer

Your words are transcribed and analyzed: what topics are you discussing, what claims are you making, how certain do your statements sound — based purely on the text. This layer extracts meaning from language.

Output: A timestamped map of what the speaker is communicating.

MIRROR 02

How it is said

The acoustic layer

The same audio is processed separately for voice characteristics: pitch, tremor, speech rate, pauses, dynamic range. These are extracted as numbers on the speaker's own device — the audio never leaves.

Output: A timestamped map of how the speaker sounds, word by word.
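As a rough illustration of two of the features named above, speech rate and pauses can be derived from word timestamps alone. This is a minimal sketch, not 3Mirror's implementation: the `Word` record, its millisecond fields, and the sample timings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start_ms: int  # hypothetical field: word onset in milliseconds
    end_ms: int    # hypothetical field: word offset in milliseconds


def pauses_ms(words):
    """Silence between each pair of consecutive words, in milliseconds."""
    return [max(0, nxt.start_ms - cur.end_ms) for cur, nxt in zip(words, words[1:])]


def speech_rate_wpm(words):
    """Words per minute over the span from first onset to last offset."""
    if not words:
        raise ValueError("need at least one word")
    duration_ms = words[-1].end_ms - words[0].start_ms
    if duration_ms <= 0:
        raise ValueError("words must span a positive duration")
    return len(words) * 60_000 / duration_ms


# Illustrative timings only; not measured data.
sample = [
    Word("I", 0, 120),
    Word("need", 400, 700),
    Word("more", 720, 950),
    Word("time", 1300, 1500),
]
```

Because only numbers like these are needed downstream, a design along these lines would be compatible with keeping the raw audio on the speaker's device.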

MIRROR 03

Who is speaking

The context layer

Who is this person? A first-time caller or a returning client? In a job interview or a therapy session? Context determines what a divergence means. The same vocal tremor means something different in a wedding toast than in a sales negotiation.

Output: An interpretation frame that turns raw divergence into insight.

Parallel, not fused

Words and voice analyzed as two separate streams. The gap between them is preserved — because it carries the signal.

Word-level precision

Divergence pinpointed to the exact word where voice and meaning part ways. Timestamped to the millisecond.

Readable output

No black-box scores. You see exactly which word diverged, when, and by how much.

Calibrated from second one

No training period. No prior recordings. Works with any new speaker immediately — powered by RGC.

Patent Pending · #64/016,499

02 Calibration matrix

RGC – Reverse Generative Calibration

To measure Prosodic-Semantic Divergence (PSD), we need a targeted scale. The scale starts at a guaranteed zero-divergence point, which we call the "Generated Reference" (GR), and maps every direction a voice can diverge: uncertainty, anxiety, hesitation, tension, and dozens more. Each direction is calibrated at multiple intensity levels.

A TTS model is built for one thing: to deliver a message precisely. It connects sound to meaning exactly as instructed, no more, no less. It has no internal state: nothing to doubt, fear, or feel, and nothing to leak through. The divergence between words and voice in TTS is therefore zero, not approximately but architecturally; a nonzero value would indicate a bug in the model's architecture. That is what makes TTS useful as a reference, and this zero point is what we call GR, the "Generated Reference."

A human saying the same words will never land at the same point as GR, because their speech necessarily depends on their emotional state in the moment. How far from the Generated Reference? In which direction? — we measure it on our RGC scale.

The RGC scale. Hundreds of phrases pass through multiple TTS models. For each phrase, a Generated Reference is extracted — how a voice sounds when there is no divergence. Then the same phrases are generated with emotional tags that deliberately contradict the meaning — the opposite pole of the scale. Between zero and maximum, the gradations are calibrated. The full matrix covers hundreds of thousands of calibrated coordinates on the PSD scale — across phrases, tags, voices, and divergence levels.
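One way to picture the scale described above: treat each utterance as a vector of acoustic features, average the untagged TTS voices into a Generated Reference, and express a speaker's distance from it as a fraction of the distance to the contradicting pole. This is a sketch under stated assumptions, not the patented method: the feature vectors, the Euclidean metric, and the function names are all hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def psd(sample, reference, pole):
    """Distance from the reference as a fraction of the reference-to-pole span.

    The result is clipped to at most 1.0. The metric is an assumption.
    """
    span = distance(reference, pole)
    if span == 0:
        raise ValueError("reference and pole must differ")
    return min(1.0, distance(sample, reference) / span)


# Hypothetical normalized features (e.g. pitch, rate, pause) for one phrase.
untagged_voices = [[0.50, 0.40, 0.10], [0.52, 0.38, 0.12], [0.48, 0.42, 0.08]]
reference = mean_vector(untagged_voices)  # the zero point for this phrase
pole = [0.90, 0.10, 0.50]                 # same phrase with a contradicting tag
```

Under this framing, a sample identical to the reference scores 0, the contradicting pole scores 1, and a human voice lands somewhere in between.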

How to read this. The table has three sections.

RGC Matrix: the green rows are the ruler. These are synthetic voices that can't feel anything, so their words and tone always match (0.00). This is the zero point of the measurement scale.

3Mirror Analysis: the engine compares a real person saying the same phrases. PSD numbers show how far the human voice departs from the words. The color shows severity: blue is low, amber is moderate, red is high. The Tag row shows the direction, meaning which calibrated acoustic profile the voice is closest to.

Conclusion & Action: what the divergence means in plain language, and what to do about it.

The system doesn't guess emotions. It measures the gap between words and voice, identifies the direction, and recommends a response.

RGC Matrix

Speaker	"I'm confident in this"	"I'm not sure yet"	"This is great news"	"I disagree"	"No problem at all"
TTS Voice A (no tag)	0.00	0.00	0.00	0.00	0.00
TTS Voice B (no tag)	0.00	0.00	0.00	0.00	0.00
TTS Voice C (no tag)	0.00	0.00	0.00	0.00	0.00
Averaged reference	0.00	0.00	0.00	0.00	0.00
3Mirror Analysis
Prosodic-Semantic Divergence (PSD)	0.12 (low)	0.41 (moderate)	0.08 (low)	0.73 (high)	0.55 (moderate)
Tag (Emotional Direction)	aligned	decided	aligned	uncertain	tense
Conclusion & Action
Conclusion	Sincere	Hidden opinion	Sincere	Masked doubt	Concealed issue
Action	Reliable	Ask directly	Reliable	Address the doubt	Follow up

Prosodic-Semantic Divergence (PSD) scale

0.0 · aligned 0.3 · slight 0.6 · moderate 1.0 · high
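The low, moderate, and high severity labels can be reproduced with two cut points. The sketch below assumes cut points at 0.3 and 0.6; these are read off the scale legend and chosen to agree with the example readings on this page, and the actual boundaries may differ.

```python
def severity(psd, low_max=0.3, moderate_max=0.6):
    """Map a PSD value in [0, 1] to 'low', 'moderate', or 'high'.

    The cut points are assumptions, not published thresholds.
    """
    if not 0.0 <= psd <= 1.0:
        raise ValueError("PSD must lie between 0 and 1")
    if psd < low_max:
        return "low"
    if psd < moderate_max:
        return "moderate"
    return "high"


# Example values taken from the per-phrase readings on this page.
readings = {
    "I'm confident in this": 0.12,
    "I'm not sure yet": 0.41,
    "This is great news": 0.08,
    "I disagree": 0.73,
    "No problem at all": 0.55,
}
labels = {phrase: severity(value) for phrase, value in readings.items()}
```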


"I'm confident in this"

0.12 · aligned

Conclusion

Words and voice match. The speaker means what they're saying.

Action

No action needed. Statement is reliable.

"I'm not sure yet"

0.41 · decided

Conclusion

The voice sounds more decided than the words admit. The speaker may already have a position they're not sharing.

Action

Ask a direct question: "What are you leaning toward?" The opinion is there — it just needs space.

"This is great news"

0.08 · aligned

Conclusion

Genuine positive reaction. No divergence detected.

Action

No action needed. Reaction is authentic.

"I disagree"

0.73 · uncertain

Conclusion

The words say disagreement, but the voice says doubt. This isn't a firm position — it's insecurity presented as opposition.

Action

Don't argue against the position. Address the doubt behind it: "What would make you more comfortable with this?"

"No problem at all"

0.55 · tense

Conclusion

There is a problem. The voice carries tension the words are trying to dismiss.

Action

Don't accept the statement at face value. Follow up: "If there were an issue, what would it be?" Give permission to voice it.
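The five readings above amount to a lookup from a phrase and its direction tag to a conclusion and an action. A minimal sketch, with the short labels copied from the Conclusion & Action rows of the table; the `Reading` type and `interpret` function are hypothetical names, and a real system would presumably derive the interpretation from context rather than from a fixed table.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    conclusion: str
    action: str


# Rows copied from the Conclusion & Action table; keys are (phrase, direction tag).
INTERPRETATIONS = {
    ("I'm confident in this", "aligned"): Reading("Sincere", "Reliable"),
    ("I'm not sure yet", "decided"): Reading("Hidden opinion", "Ask directly"),
    ("This is great news", "aligned"): Reading("Sincere", "Reliable"),
    ("I disagree", "uncertain"): Reading("Masked doubt", "Address the doubt"),
    ("No problem at all", "tense"): Reading("Concealed issue", "Follow up"),
}


def interpret(phrase, tag):
    """Return the stored reading, or None when the pair is not in the table."""
    return INTERPRETATIONS.get((phrase, tag))
```

Keying on the pair rather than on the tag alone reflects the point made above: the same direction means something different depending on what the words claim.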

Word-by-word divergence

"I'm really confident in this deal, but I need more time to think."

[Chart: PSD per word. Legend: TTS baseline (zero point), real speaker, direction (nearest TTS tag).]

What was said: "I'm really confident in this deal, but I need more time to think."

What the voice tells us: Words and voice are aligned through "I'm really confident in this"; the speaker sounds like they mean it. At "deal" the voice shifts toward the acoustic profile of TTS-generated uncertainty (tag: uncertain, PSD 0.41). After "but," the divergence climbs sharply. At "need" the voice is closest to TTS-generated anxiety (tag: anxious, PSD 0.73), the highest point. At "time" it matches TTS-generated hesitation (tag: hesitant, PSD 0.71).

Conclusion: This person is not "thinking it over." They have a specific concern about the deal they are not voicing. The system detected it without asking a single question, just by measuring the distance between their words and their voice, and matching the direction of that distance against known acoustic profiles.

This is not emotion recognition. Emotion AI asks: what is the speaker feeling? 3Mirror asks: does the voice match what the words are saying? Direction labels are coordinates on a calibrated scale — not emotions assigned by a model.
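The word-by-word reading above can be sketched as a small list of tagged points from which the peak is picked out. Only the three words the analysis calls out are included, since no values are given for the rest of the sentence; the function names and the 0.6 threshold in the usage note are hypothetical.

```python
# (word, direction tag, PSD) for the three words named in the analysis.
flagged = [
    ("deal", "uncertain", 0.41),
    ("need", "anxious", 0.73),
    ("time", "hesitant", 0.71),
]


def peak(points):
    """Return the (word, tag, psd) triple with the highest PSD."""
    return max(points, key=lambda p: p[2])


def above(points, threshold):
    """Keep the points whose PSD meets a caller-chosen threshold."""
    return [p for p in points if p[2] >= threshold]


peak_word, peak_tag, peak_psd = peak(flagged)
```

Calling `above(flagged, 0.6)` would keep only "need" and "time", the two words after "but" where the divergence is highest.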

Zero-point reference

A voice without internal states can't diverge from its words. That's the measurement anchor.

Universal

Works with any speaker, any language, from the first second. No enrollment. No prior data.

Reproducible

Same text, same reference, every time. Different labs, same result. That's what makes it science, not opinion.

Privacy-first

The scale is built from synthetic speech. No real voices collected. No personal data stored.

Patent Pending · US #64/018,274

03 The Mechanism Behind

How It Works

Input: Human speech, "I'm confident in this deal..."

Mirror 01: Text

Mirror 02: Voice

Mirror 03: Context

Measurement: RGC · Real Voice (real voice)

Measurement: RGC · TTS (reference)

Engine: 3Mirror compares real voice vs reference, weighted by context

AI Output: PSD 0.78 · uncertain, with a context-based conclusion

Action: Recommended response

04 Where it applies

Applications

Wherever the gap between words and voice carries information — from clinical signal to behavioral integrity.

Cybersecurity

Generative models improve every month. Artifact-based detectors fall behind with every update. The market needs an approach that gets stronger as models improve, not weaker. 3Mirror detects the absence of the behavioral micro-variations that characterize a real internal state, a structural property of synthetic speech that cannot be removed without compromising the model's primary function.

Conversational AI

Billions of conversations pass through voice assistants, bots, and call centers every day. Each one carries information that never makes it into the transcript — the divergence between what is said and how it sounds. 3Mirror is an analytical layer that runs in parallel with any conversational platform and turns that gap into data.

Healthcare

Changes in speech patterns can appear years before a clinical diagnosis. But medicine still lacks a standardized scale for longitudinal tracking of these changes. 3Mirror gives healthcare a reproducible measurement instrument: a universal scale, for any patient, from the first second, with no training on their data.

Mental Health

The mental health market is growing faster than the supply of professionals. Millions of people between sessions have no objective self-monitoring tool. 3Mirror brings this industry something it has never had — a reproducible, quantitative measurement of the divergence between words and voice. A scalable signal for clinics, teletherapy platforms, and personal monitoring.

Human Resources

The right person can transform an idea. But not everyone can put their true value into words — especially under the pressure of an interview. 3Mirror reveals what words alone can't convey: where conviction is real, where passion is genuine, and where a candidate's potential is hiding behind imperfect self-presentation.

Personal & Professional Development

A great actor commands both channels — words and voice — at once. A great leader does the same. Most people don't hear the gap between what they say and how they sound. 3Mirror makes it visible — a quantitative mirror for conscious growth in acting, public speaking, leadership, and everyday communication.


Speaker / phrase	"I'm confident in this"	"I'm not sure yet"	"This is great news"	"I disagree"	"No problem at all"
Zero point — TTS without tags
TTS Voice A (no tag)	0.00	0.00	0.00	0.00	0.00
TTS Voice B (no tag)	0.00	0.00	0.00	0.00	0.00
TTS Voice C (no tag)	0.00	0.00	0.00	0.00	0.00
Averaged reference	0.00	0.00	0.00	0.00	0.00
Tagged references — TTS with emotional tags
Tag + voice	"I'm confident in this"	"I'm not sure yet"	"This is great news"	"I disagree"	"No problem at all"
Voice A + tag: uncertain	0.38	0.11	0.42	0.29	0.35
Voice B + tag: uncertain	0.35	0.13	0.39	0.26	0.33
Voice C + tag: uncertain	0.41	0.09	0.44	0.31	0.37
Avg: uncertain	0.38	0.11	0.42	0.29	0.35
Voice A + tag: anxious	0.71	0.45	0.74	0.58	0.67
Voice B + tag: anxious	0.68	0.42	0.70	0.55	0.64
Voice C + tag: anxious	0.73	0.48	0.76	0.61	0.69
Avg: anxious	0.71	0.45	0.73	0.58	0.67
Voice A + tag: hesitant	0.52	0.28	0.55	0.41	0.49
Voice B + tag: hesitant	0.49	0.25	0.51	0.38	0.46
Voice C + tag: hesitant	0.54	0.31	0.58	0.44	0.52
Avg: hesitant	0.52	0.28	0.55	0.41	0.49
Voice A + tag: decided	0.44	0.52	0.41	0.15	0.38
Voice B + tag: decided	0.41	0.49	0.38	0.12	0.35
Voice C + tag: decided	0.47	0.55	0.44	0.18	0.41
Avg: decided	0.44	0.52	0.41	0.15	0.38
Voice A + tag: tense	0.56	0.38	0.59	0.47	0.53
Voice B + tag: tense	0.53	0.35	0.55	0.44	0.50
Voice C + tag: tense	0.58	0.41	0.62	0.50	0.56
Avg: tense	0.56	0.38	0.59	0.47	0.53
Voice A + tag: aligned	0.03	0.04	0.02	0.03	0.04
Voice B + tag: aligned	0.02	0.03	0.02	0.02	0.03
Voice C + tag: aligned	0.04	0.05	0.03	0.04	0.05
Avg: aligned	0.03	0.04	0.02	0.03	0.04
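Each "Avg" row in the table is the column-wise mean of the three voice rows, rounded to two decimals. A minimal sketch using the "anxious" rows copied from the table; the variable and function names are hypothetical.

```python
from statistics import mean

PHRASES = [
    "I'm confident in this",
    "I'm not sure yet",
    "This is great news",
    "I disagree",
    "No problem at all",
]

# Per-voice rows for the "anxious" tag, copied from the table above.
ANXIOUS = {
    "Voice A": [0.71, 0.45, 0.74, 0.58, 0.67],
    "Voice B": [0.68, 0.42, 0.70, 0.55, 0.64],
    "Voice C": [0.73, 0.48, 0.76, 0.61, 0.69],
}


def averaged_reference(rows):
    """Column-wise mean across voices, rounded to two decimals like the table."""
    return [round(mean(col), 2) for col in zip(*rows.values())]


avg_anxious = dict(zip(PHRASES, averaged_reference(ANXIOUS)))
```

Averaging across several voices keeps any single TTS voice's quirks from defining the reference point for a tag.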

People say one thing. Their voice says another. We turn the difference into signal.
