Thinkbox is the framework that turns raw data into a structure AI can actually reason over. At its core is the Thinkmap — a hypergraph that fuses three first-class models into one. Around it: every analytics tool your agent needs to deliver an answer worth trusting.
Most "AI for data" tools see only one layer — usually the schema. Thinkbox sees three and binds them together as a single navigable hypergraph. The result is the Thinkmap — the language AI uses to reason about your business, in your terms, against your data.
Tables. Columns. Types. Primary keys. Foreign keys. Lineage. The raw, exact shape of your data — captured into the Thinkmap with zero interpretation. This is what a database administrator would draw on a whiteboard, encoded as graph nodes the agent can walk.
In our Superstore demo Thinkmap:
fact_sales and fact_returns
dim_customer, dim_product, dim_date, dim_region, dim_state, dim_segment, dim_ship_mode, dim_category, dim_sub_category
Tables and columns by themselves don't say anything about your business. The semantic layer fixes that: it names the entities your team talks about, defines the metrics your reports rely on, and binds each one to the exact data-layer computation that produces it. Once defined here, a metric means the same thing in every answer, forever.
In the Superstore demo:
Order, Order Item, Customer, Product: each backed by data tables, each addressable by name
revenue, profit_margin, avg_order_value, avg_discount, total_quantity, order_count
Customer PLACES Order, Order CONTAINS Order Item, Order Item REFERENCES Product
An ontology captures the way concepts are organised in your domain: which products roll up into which categories, how regions decompose into states and cities, what counts as a "shipping class," how a customer becomes a "premium" customer. This is the layer of knowledge an AI cannot infer from your schema alone; you have to teach it. Once taught, the agent can reason in your language, not the database's.
In the Superstore demo:
Category nodes (Furniture, Office Supplies, Technology), each with their own sub-categories and SKUs
Region → State → City, so the agent knows "the Midwest" without you spelling out the states
A regular graph connects two nodes at a time. The Thinkmap is a hypergraph: a single edge can connect many nodes across all three model layers at once. A metric, the tables it sums, the business rule that defines it, and the ontology concept it serves are all bound together in one hyperedge. Ask a question, and the agent traverses these multi-way relationships natively: no joining required, no inference gymnastics.
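To make the hyperedge idea concrete, here is a minimal sketch in Python. It is not Thinkbox's actual data structure or API: the `Node` and `Hypergraph` classes, the `bind` and `neighbours` methods, and the `revenue_by_category` label are all hypothetical names chosen for illustration. It only shows how one edge can hold nodes from several layers at once.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    layer: str  # "data", "semantic", or "ontology" (illustrative layer names)
    name: str


@dataclass
class Hypergraph:
    # Each hyperedge is a label mapped to the set of nodes it connects.
    edges: dict[str, frozenset[Node]] = field(default_factory=dict)

    def bind(self, label: str, *nodes: Node) -> None:
        """Create one hyperedge connecting any number of nodes."""
        self.edges[label] = frozenset(nodes)

    def neighbours(self, node: Node) -> set[Node]:
        """Every node that shares at least one hyperedge with `node`."""
        found: set[Node] = set()
        for members in self.edges.values():
            if node in members:
                found |= members
        found.discard(node)
        return found


# Nodes borrowed from the Superstore demo names; the binding itself is assumed.
revenue = Node("semantic", "revenue")
fact_sales = Node("data", "fact_sales")
dim_product = Node("data", "dim_product")
technology = Node("ontology", "Technology")

tm = Hypergraph()
tm.bind("revenue_by_category", revenue, fact_sales, dim_product, technology)

# One hop from the metric reaches both of the other layers.
layers = sorted({n.layer for n in tm.neighbours(revenue)})
print(layers)  # ['data', 'ontology']
```

In an ordinary graph the same binding would need several pairwise edges and a traversal to reassemble them; here a single lookup returns the whole group.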
A Thinkmap on its own is just a graph. Thinkbox ships with the analytics tools that actually execute against it — so the agent has everything it needs to turn a question into a defensible result.
Decomposes a question into a step-by-step plan over the Thinkmap, with cost estimates.
Resolves a metric name to its verified definition, every time. No drift across teams.
First-class support for windowed aggregations, period-over-period, rolling stats, seasonality.
Same-grain compare across cohorts, regions, products. Tells you what's different, not just what's there.
Surfaces what shouldn't be there. Thresholds, z-scores, change-point detection, contribution analysis.
Walks the Thinkmap looking for the smallest set of nodes that explain a deviation.
Compiles plans to SQL. Runs against your warehouse. Streams results.
Every answer comes with a confidence score, the evidence behind it, and the parts of the Thinkmap it touched.
Full reasoning trace stored for every question. Reproducible, citable, regulator-ready.
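Of the tools listed above, the z-score check used for anomaly detection is the easiest to illustrate. The sketch below is a generic textbook version, not Thinkbox's implementation: the function name `zscore_outliers`, the default threshold of 2.0, and the sample revenue figures are all assumptions made for the example.

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=2.0):
    """Return (index, value, z) for every point whose |z| exceeds threshold."""
    if len(values) < 2:
        return []  # stdev needs at least two points
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged


# Made-up daily revenue with one obvious spike at index 6.
daily_revenue = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 180.0, 101.0]
flagged = zscore_outliers(daily_revenue)
print([i for i, _, _ in flagged])  # [6]
```

A single extreme value inflates the standard deviation it is measured against, which is one reason a z-score is usually paired with other checks such as fixed thresholds or change-point detection.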
A Thinkbox isn't a one-shot model you train and forget. It's a living harness that has to stay accurate as your data, your business, and the questions people ask all change. Thinkbox ships with a full evals framework — the same discipline software teams use to test code, applied to your analytics knowledge. Every change you make to the Thinkmap can be measured, regression-tested, and shipped with confidence.
A curated set of questions with known-correct answers. Run it on every change — track accuracy as a percentage over time.
Did the metric you just refined break a question that used to work? The eval suite catches it before your CFO does.
Are the metrics, entities, and ontology concepts in your Thinkmap actually covering the questions your team asks? Surface the gaps.
Confidence scores are only useful if they correlate with accuracy. Calibration evals verify a 95%-confidence answer is right 95% of the time.
Latency, token cost, and DAF complexity per question. Catch regressions in speed and spend, not just accuracy.
Find columns in your data with no ontology binding, metrics with no business definition, entities the model can't recognise. Then fix them.
Evals run continuously, on every Thinkmap change, and on a schedule against your live data. Quality stops being a hope — it becomes a number on a dashboard, trending up.
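The calibration claim above (a 95%-confidence answer should be right 95% of the time) can be checked with a simple binning routine. This is a sketch under stated assumptions, not the Thinkbox evals API: the function names, the `(confidence, is_correct)` input shape, and the sample results are hypothetical, and the summary number used here, expected calibration error, is a common general-purpose measure rather than one the framework is known to report.

```python
from collections import defaultdict


def calibration_table(results, n_bins=10):
    """Group (confidence, is_correct) pairs into equal-width bins.

    Returns {bin_lower_bound: (mean_confidence, accuracy, count)}.
    """
    bins = defaultdict(list)
    for confidence, is_correct in results:
        # Clamp so that confidence == 1.0 falls into the top bin.
        idx = min(int(confidence * n_bins), n_bins - 1)
        bins[idx].append((confidence, is_correct))
    table = {}
    for idx in sorted(bins):
        items = bins[idx]
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        table[idx / n_bins] = (mean_conf, accuracy, len(items))
    return table


def expected_calibration_error(results, n_bins=10):
    """Count-weighted average gap between stated confidence and accuracy."""
    total = len(results)
    table = calibration_table(results, n_bins)
    return sum(abs(mc - acc) * n / total for mc, acc, n in table.values())


# Made-up eval results: 20 answers at 0.95 confidence (19 correct),
# 5 answers at 0.65 confidence (3 correct).
results = (
    [(0.95, True)] * 19 + [(0.95, False)]
    + [(0.65, True)] * 3 + [(0.65, False)] * 2
)
table = calibration_table(results)
ece = expected_calibration_error(results)
```

In this sample the 0.95 bin is well calibrated (19 of 20 correct) while the 0.65 bin is slightly overconfident (3 of 5 correct), so the overall error is small but not zero.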
You can hand an LLM a database and ask it questions. It will produce plausible-sounding answers. Some will be right. Some will hallucinate metrics, misjoin tables, or quietly contradict last month's report. Thinkbox is the harness that makes the LLM's reasoning structured, grounded, and reproducible. Same question, same answer, always. Cited to the metric, the rule, the row.