Thinkbox – airquery

The Thinkmap

Three models. One graph.

Most "AI for data" tools see only one layer — usually the schema. Thinkbox sees three and binds them together as a single navigable hypergraph. The result is the Thinkmap — the language AI uses to reason about your business, in your terms, against your data.

Layer 01 — Data Models

The literal truth of your warehouse.

Tables. Columns. Types. Primary keys. Foreign keys. Lineage. The raw, exact shape of your data — captured into the Thinkmap with zero interpretation. This is what a database administrator would draw on a whiteboard, encoded as graph nodes the agent can walk.

In our Superstore demo Thinkmap:

Two fact tables: fact_sales and fact_returns
Nine dimension tables: dim_customer, dim_product, dim_date, dim_region, dim_state, dim_segment, dim_ship_mode, dim_category, dim_sub_category
Foreign keys traced from every fact row to the dimension that defines it — orders point at customers, customers point at segments
Column types, nullability, default values, and source lineage all bound to the graph as node properties

fact_sales (FACT)
  order_id: string
  customer_id: string
  product_id: string
  ship_mode_id: int
  order_date_id: int
  sales: decimal
  quantity: int
  discount: decimal
  profit: decimal

dim_customer (DIM)
  customer_id: string
  customer_name: string
  segment_id: int

dim_product (DIM)
  product_id: string
  product_name: string
  sub_category_id: int
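
To make "graph nodes the agent can walk" concrete, here is a minimal Python sketch of the three tables above as nodes with foreign-key edges. It is an illustration only, not the Thinkbox API: the TableNode class and the reachable function are hypothetical names, and the foreign-key targets are assumptions inferred from the column names.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TableNode:
    """One data-layer table, stored as a graph node (hypothetical structure)."""
    name: str
    kind: str                                   # "FACT" or "DIM"
    columns: dict[str, str]                     # column name -> type
    foreign_keys: dict[str, str] = field(default_factory=dict)  # column -> table


# Columns are taken from the demo schema; FK targets are assumed from the names.
tables = {
    "fact_sales": TableNode(
        "fact_sales", "FACT",
        {"order_id": "string", "customer_id": "string", "product_id": "string",
         "ship_mode_id": "int", "order_date_id": "int", "sales": "decimal",
         "quantity": "int", "discount": "decimal", "profit": "decimal"},
        {"customer_id": "dim_customer", "product_id": "dim_product",
         "ship_mode_id": "dim_ship_mode", "order_date_id": "dim_date"},
    ),
    "dim_customer": TableNode(
        "dim_customer", "DIM",
        {"customer_id": "string", "customer_name": "string", "segment_id": "int"},
        {"segment_id": "dim_segment"},
    ),
    "dim_product": TableNode(
        "dim_product", "DIM",
        {"product_id": "string", "product_name": "string", "sub_category_id": "int"},
        {"sub_category_id": "dim_sub_category"},
    ),
}


def reachable(start: str, graph: dict[str, TableNode]) -> list[str]:
    """Breadth-first walk along foreign keys, returning tables in visit order."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = graph.get(queue.popleft())
        if node is None:          # table referenced but not modelled in this sketch
            continue
        for target in node.foreign_keys.values():
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


print(reachable("fact_sales", tables))
```

Starting from fact_sales, the walk reaches the customer and product dimensions first and then the segment and sub-category tables behind them, which mirrors "orders point at customers, customers point at segments".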

Layer 02 — Semantic Models

What the numbers actually mean.

Tables and columns by themselves don't say anything about your business. The semantic layer fixes that — it names the entities your team talks about, defines the metrics your reports rely on, and binds each one to the exact data-layer computation that produces it. Once defined here, a metric means the same thing in every answer, forever.

In the Superstore demo:

First-class entities: Order, Order Item, Customer, Product — each backed by data tables, each addressable by name
Verified metrics: revenue, profit_margin, avg_order_value, avg_discount, total_quantity, order_count
Relationships modeled with semantics: Customer PLACES Order, Order CONTAINS Order Item, Order Item REFERENCES Product
Every metric has a single signed-off definition — finance and sales no longer argue about whose "revenue" is correct

revenue = SUM(fact_sales.sales) WHERE NOT returned
profit_margin = SUM(profit) ÷ SUM(sales)
order_count = COUNT(DISTINCT fact_sales.order_id)

↓ ↓ ↓

fact_sales: sales, profit, order_id
fact_returns: order_id, return_id
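
The three definitions above can be sketched in plain Python to show how a metric name binds to one computation. This is a hedged illustration, not the Metric Compiler itself: the sample rows are invented, the function names are hypothetical, and "returned" is assumed to mean that the order_id appears in fact_returns.

```python
from decimal import Decimal

# Invented sample rows, for illustration only.
fact_sales = [
    {"order_id": "A-1", "sales": Decimal("100.00"), "profit": Decimal("20.00")},
    {"order_id": "A-1", "sales": Decimal("50.00"), "profit": Decimal("5.00")},
    {"order_id": "B-2", "sales": Decimal("200.00"), "profit": Decimal("50.00")},
    {"order_id": "C-3", "sales": Decimal("150.00"), "profit": Decimal("-15.00")},
]
fact_returns = [{"return_id": "R-1", "order_id": "B-2"}]

# Assumption: an order counts as returned if it has a row in fact_returns.
returned_orders = {row["order_id"] for row in fact_returns}


def revenue(sales_rows, returned):
    """SUM(fact_sales.sales) WHERE NOT returned."""
    return sum(
        (row["sales"] for row in sales_rows if row["order_id"] not in returned),
        Decimal("0"),
    )


def profit_margin(sales_rows):
    """SUM(profit) / SUM(sales); None when there are no sales to divide by."""
    total_sales = sum((row["sales"] for row in sales_rows), Decimal("0"))
    total_profit = sum((row["profit"] for row in sales_rows), Decimal("0"))
    return None if total_sales == 0 else total_profit / total_sales


def order_count(sales_rows):
    """COUNT(DISTINCT fact_sales.order_id)."""
    return len({row["order_id"] for row in sales_rows})


# One registry, so every caller resolves a metric name to the same function.
METRICS = {
    "revenue": lambda: revenue(fact_sales, returned_orders),
    "profit_margin": lambda: profit_margin(fact_sales),
    "order_count": lambda: order_count(fact_sales),
}

for name, compute in METRICS.items():
    print(name, compute())
```

Keeping the definitions in a single registry is the point of the sketch: any answer that mentions revenue goes through the same entry, so two teams cannot end up with two formulas.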

Layer 03 — Applied Ontology Models

The shape of your business world.

An ontology captures the way concepts are organised in your domain: which products roll up into which categories, how regions decompose into states and cities, what counts as a "shipping class," how a customer becomes a "premium" customer. This is the layer of knowledge an AI cannot infer from your schema alone — you have to teach it. Once taught, the agent can reason in your language, not the database's.

In the Superstore demo:

Product hierarchy: three top-level Category nodes (Furniture, Office Supplies, Technology), each with its own sub-categories and SKUs
Geography hierarchy: Region → State → City, so the agent knows "the Midwest" without you spelling out the states
Customer segment taxonomy: Consumer, Corporate, Home Office, each with its own profitability profile baked in as a domain rule
Ship-mode service tiers: ordered from slowest to fastest, so "fastest shipping option" resolves correctly without anyone hand-coding it
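
As a rough sketch of how two of these ontology concepts could resolve a question, the Python below models an ordered ship-mode list and a small slice of the geography hierarchy. Everything here is an assumption for illustration: the ship-mode names, the states and cities shown, and the function names are not taken from the demo Thinkmap.

```python
# Assumed tier names, ordered from slowest to fastest.
SHIP_MODE_TIERS = ["Standard Class", "Second Class", "First Class", "Same Day"]

# Partial, illustrative slice of Region -> State -> City.
GEOGRAPHY = {
    "Midwest": {
        "Illinois": ["Chicago"],
        "Ohio": ["Columbus"],
    },
}


def fastest_ship_mode(available):
    """Pick the fastest option among the modes available for an order."""
    return max(available, key=SHIP_MODE_TIERS.index)


def states_in(region):
    """Expand a region name into the states the ontology places under it."""
    return sorted(GEOGRAPHY[region])


print(fastest_ship_mode(["Second Class", "Standard Class"]))
print(states_in("Midwest"))
```

Because the ordering lives in the ontology rather than in each query, "fastest shipping option" needs no special case: it is a max over the tier list.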

[Figure: Product Hierarchy]

Built for relationships a regular graph can't hold.

A regular graph connects two nodes at a time. The Thinkmap is a hypergraph — a single edge can connect many nodes across all three model layers at once. A metric, the tables it sums, the business rule that defines it, and the ontology concept it serves are all bound together in one hyperedge. Ask a question, and the agent traverses these multi-way relationships natively — no joining required, no inference gymnastics.
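
One simple way to picture a hyperedge is as a labelled set of nodes rather than a pair. The Python sketch below assumes that representation; the Hyperedge class, the node identifiers, and the neighbours function are hypothetical and say nothing about how Thinkbox stores the Thinkmap internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """An edge that binds any number of nodes at once (hypothetical structure)."""
    label: str
    members: frozenset[str]


# Node ids use made-up prefixes to mark which layer each node belongs to.
edges = [
    Hyperedge("revenue_definition", frozenset({
        "metric:revenue",          # semantic layer
        "table:fact_sales",        # data layer
        "table:fact_returns",      # data layer
        "rule:exclude_returned",   # business rule
        "concept:Order",           # ontology layer
    })),
    Hyperedge("order_count_definition", frozenset({
        "metric:order_count",
        "table:fact_sales",
        "concept:Order",
    })),
]


def neighbours(node, hyperedges):
    """Every node that shares at least one hyperedge with the given node."""
    found = set()
    for edge in hyperedges:
        if node in edge.members:
            found |= edge.members - {node}
    return found


print(sorted(neighbours("metric:revenue", edges)))
```

A single lookup on metric:revenue returns its tables, its rule, and its ontology concept together, which is the multi-way traversal the paragraph above describes.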

Analytics tools

Every building block you need to deliver an answer.

A Thinkmap on its own is just a graph. Thinkbox ships with the analytics tools that actually execute against it — so the agent has everything it needs to turn a question into a defensible result.

Query Planner

Decomposes a question into a step-by-step plan over the Thinkmap, with cost estimates.

Metric Compiler

Resolves a metric name to its verified definition, every time. No drift across teams.

Time-Series Engine

First-class support for windowed aggregations, period-over-period, rolling stats, seasonality.

Comparison Engine

Same-grain comparisons across cohorts, regions, and products. Tells you what's different, not just what's there.

Anomaly Detection

Surfaces what shouldn't be there. Thresholds, z-scores, change-point detection, contribution analysis.

Root-Cause Search

Walks the Thinkmap looking for the smallest set of nodes that explain a deviation.

SQL Execution

Compiles plans to SQL. Runs against your warehouse. Streams results.

Confidence Scoring

Every answer comes with a confidence score, the evidence behind it, and the parts of the Thinkmap it touched.

Audit Trail

Full reasoning trace stored for every question. Reproducible, citable, regulator-ready.
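
To ground one of the tools above, here is a minimal sketch of the z-score technique named under Anomaly Detection. It assumes a plain list of daily values and a fixed threshold; the function name, the threshold of 2.0, and the sample numbers are illustrative choices, not Thinkbox defaults.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=2.0):
    """Return the indexes whose value sits more than `threshold` population
    standard deviations away from the mean of the series."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:                 # a flat series has no outliers
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Invented daily revenue figures with one obvious spike at the end.
daily_revenue = [100, 102, 98, 101, 99, 100, 180]
print(zscore_anomalies(daily_revenue))
```

A z-score is only one of the listed techniques. On short series a single spike inflates the standard deviation it is measured against, which is one reason to pair it with thresholds or change-point detection.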

Evals, built in

Quality you can see improve.

Thinkbox isn't a one-shot model you train and forget. It's a living harness that has to stay accurate as your data, your business, and the questions people ask all change. It ships with a full evals framework: the same discipline software teams use to test code, applied to your analytics knowledge. Every change you make to the Thinkmap can be measured, regression-tested, and shipped with confidence.

Golden-set evals

A curated set of questions with known-correct answers. Run it on every change — track accuracy as a percentage over time.

Regression evals

Did the metric you just refined break a question that used to work? The eval suite catches it before your CFO does.

Coverage evals

Are the metrics, entities, and ontology concepts in your Thinkmap actually covering the questions your team asks? Surface the gaps.

Calibration evals

Confidence scores are only useful if they track accuracy. Calibration evals check that answers scored at 95% confidence are right about 95% of the time.

Performance evals

Latency, token cost, and DAF complexity per question. Catch regressions in speed and spend, not just accuracy.

Ontology completeness

Find columns in your data with no ontology binding, metrics with no business definition, entities the model can't recognise. Then fix them.

Evals run continuously, on every Thinkmap change, and on a schedule against your live data. Quality stops being a hope — it becomes a number on a dashboard, trending up.
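
Two of the evals above reduce to short calculations. The Python sketch below shows golden-set accuracy as a percentage and a calibration check that compares stated confidence with observed accuracy per bucket. It is a sketch under stated assumptions: the function names, the 10-point buckets, and all sample data are invented for illustration.

```python
from collections import defaultdict


def golden_set_accuracy(golden, answer_fn):
    """Percentage of golden questions whose answer matches the expected one."""
    if not golden:
        raise ValueError("golden set is empty")
    correct = sum(1 for q, expected in golden.items() if answer_fn(q) == expected)
    return 100.0 * correct / len(golden)


def calibration_report(results, bucket_size=10):
    """Group (confidence_percent, was_correct) pairs into buckets and compare
    the mean stated confidence with the observed accuracy in each bucket."""
    buckets = defaultdict(list)
    for pct, was_correct in results:
        buckets[(pct // bucket_size) * bucket_size].append((pct, was_correct))
    report = {}
    for bucket, items in sorted(buckets.items()):
        stated = sum(p for p, _ in items) / len(items) / 100
        observed = sum(1 for _, ok in items if ok) / len(items)
        report[bucket] = {
            "stated": stated,
            "observed": observed,
            "gap": round(observed - stated, 4),   # negative means overconfident
        }
    return report


# Invented golden set and a stubbed agent that gets one question wrong.
golden = {"q1": 300, "q2": 3, "q3": "Technology", "q4": 0.12}
stub_answers = {"q1": 300, "q2": 3, "q3": "Furniture", "q4": 0.12}
print(golden_set_accuracy(golden, stub_answers.get))

# Invented eval results: (stated confidence in percent, answer was correct).
results = [(95, True), (95, True), (95, True), (95, False), (60, True), (60, False)]
print(calibration_report(results))
```

In this made-up run the answers scored at 95% confidence are right only 75% of the time, so the gap for that bucket is negative. That is the kind of overconfidence a calibration eval is meant to surface.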

Why a harness, not a model

AI without a harness is improvisation.

You can hand an LLM a database and ask it questions. It will produce plausible-sounding answers. Some will be right. Some will hallucinate metrics, misjoin tables, or quietly contradict last month's report. Thinkbox is the harness that makes the LLM's reasoning structured, grounded, and reproducible. Same question, same answer, always. Cited to the metric, the rule, the row.


An analytics AI harness, forged for reasoning.