Measuring frontier AI behavior
IMBAS

Inspect what AI surfaces — and what it leaves out.

50 cases recorded  ·  500+ captures  ·  4 frontier models compared

AI will make, or be consulted on, a large share of our decisions: a layer over nearly all of society. We know very little about how it thinks or behaves, and it changes every day. That’s what Imbas is here to help with.

The answer appears. You read it. It’s gone.

An open question returns one answer, and that answer disappears from the screen the moment you move on. What the model surfaced, and what it left out, leaves no trace. No one is capturing this. Imbas does.

Imbas turns invisible AI behavior into a measurable record.

Imbas builds independent measurement instruments for AI behavior — what these systems surface, what they leave out, how they frame it, and how that drifts over time. The Volunteer Gap is the first.

AI answers shape decisions, but most traces vanish with the session. What surfaced, what didn’t, and how the answer narrowed usually leave no record.

Imbas captures those differences across frontier models and turns them into documented cases: comparable, citable, and measurable over time.

What the model knows.

What it surfaces.

What it leaves out.

That difference is the Volunteer Gap.

The Volunteer Gap measures the distance between what a model volunteers in an open answer and what it surfaces when asked directly.

If a named mechanism appears only after direct prompting, Imbas records the gap.

The gap counts only when the omitted item is material to the open question — not when a narrower prompt simply yields a narrower answer.
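As a minimal sketch of that definition, the gap candidates can be thought of as a set difference filtered by materiality. This assumes each answer has already been reduced to a set of named mechanisms and that a reviewer has marked which ones are material to the open question; the page does not say how Imbas extracts or judges either, and the function name `volunteer_gap_items` and the labels below are hypothetical.

```python
def volunteer_gap_items(open_items, direct_items, material_items):
    """Return mechanisms that surfaced only under direct prompting
    and are material to the open question (hypothetical helper)."""
    open_set = set(open_items)
    # Items the model produced when asked directly but did not volunteer.
    surfaced_only_when_asked = set(direct_items) - open_set
    # A narrower prompt yielding a narrower answer does not count:
    # keep only items marked material to the open question.
    return sorted(surfaced_only_when_asked & set(material_items))


# Illustrative, made-up labels (not real Imbas data).
gap = volunteer_gap_items(
    open_items={"mechanism A"},
    direct_items={"mechanism A", "mechanism B", "mechanism C"},
    material_items={"mechanism A", "mechanism B"},
)
print(gap)  # ['mechanism B']
```

Here "mechanism C" appears only after direct prompting but is not material, so it is not counted.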

How Imbas measures it

Three signal patterns Imbas tracks.

01  ·  Category

Omission

A specific named mechanism the model knows but does not surface.

02  ·  Category

Framing Drift

Information is present, but sourced or attributed in a one-sided way.

03  ·  Category

Deflection

The model redirects away from the underlying concern before addressing it.

How Imbas Works.

  1. Ask — Start with an open, unsteered prompt.
  2. Inspect — Ask the targeted follow-up and see what the open answer left out.
  3. Compare — Line up what each frontier model surfaced or missed.
  4. Measure — Score the distance on the 0–3 Volunteer Gap scale.
  5. Record — Preserve the prompts, answers, models, date, and gap as a case.
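One way to picture the record produced by the steps above is a small data structure holding the prompts, each model's answers, the date, and the gap score. This is a sketch under stated assumptions, not Imbas's actual schema: the class and field names are hypothetical, and scoring each model's capture separately is an assumption the page does not confirm.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Capture:
    """One model's answers to the open prompt and the targeted follow-up."""
    model: str
    open_answer: str
    follow_up_answer: str
    gap_score: int  # 0-3 on the Volunteer Gap scale

    def __post_init__(self):
        # Reject anything outside the 0-3 scale.
        if not isinstance(self.gap_score, int) or not 0 <= self.gap_score <= 3:
            raise ValueError("gap_score must be an integer from 0 to 3")


@dataclass(frozen=True)
class Case:
    """A recorded case: prompts, date, and one capture per model (assumed layout)."""
    open_prompt: str
    follow_up_prompt: str
    recorded_on: date
    captures: tuple[Capture, ...] = ()

    def scores_by_model(self) -> dict[str, int]:
        # Line up the gap score for each model in the case.
        return {c.model: c.gap_score for c in self.captures}


# Placeholder values for illustration only.
case = Case(
    open_prompt="<open, unsteered prompt>",
    follow_up_prompt="<targeted follow-up>",
    recorded_on=date(2025, 1, 1),
    captures=(
        Capture("model-1", "<open answer>", "<follow-up answer>", 0),
        Capture("model-2", "<open answer>", "<follow-up answer>", 3),
    ),
)
print(case.scores_by_model())  # {'model-1': 0, 'model-2': 3}
```

Freezing the records reflects the idea of a preserved case: once captured, the prompts, answers, and score are not edited in place.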

A few degrees of drift can change the destination.

Small differences in what a model surfaces, omits, emphasizes, or reframes can compound across millions of answers.

Imbas measures the direction and degree of that drift.

A public record keeps decisions anchored.

What gets measured gets better. A public record of what AI surfaces  —  and what it omits  —  keeps the decisions built on it anchored to what’s true, instead of drifting with the model.

Why this is public-interest work

From Observation to Measurement.

Now

The validated record, and the Workbench — run measured cases on your own AI.

Next

Live measurement while you read.

For Institutions

Independent, citable measurement of the systems you already use. Pilot inquiries open.

Who Imbas is for.

For Institutions

Not another model to trust. A way to check the ones you already use.

Regulated teams are deploying AI faster than they can verify how it behaves. Imbas gives you independent, cross-vendor measurement of what these systems surface and leave out  —  documented, citable, built to sit in an audit file.

For Readers

Not another answer. A way to inspect the one you already got.

AI answers feel complete because you only see what surfaced. Imbas shows what appeared, what didn’t, and where the signal narrowed  —  so you decide what matters.

Case Archive

  • 50 cases recorded
  • 500+ captures
  • 4 frontier models compared

Cases are scored from 0 to 3 on the Volunteer Gap scale.

0 means no meaningful gap. 3 means major information was left out of the open answer.

Imbas. From the Old Irish: illumination, sudden knowing, knowledge brought to speech.