axm-core is the runtime between sealed shards and spoken questions. Spectra translates natural language to SQL. Forge compiles raw documents into shards. DuckDB executes cross-shard joins in memory. No inference at query time.
axm-core is the hub package. It neither signs shards (that's axm-genesis) nor produces them from conversations (that's a spoke). It does one thing: turns a pile of sealed Parquet shard directories into a queryable relational database, and lets you ask it questions in English.
Natural language query engine. Parses intent, routes to a named query in the INTENT_ROUTES table, executes parameterized SQL against DuckDB. No LLM call at query time — the LLM runs once at distill time to build the index.
Document compiler. Takes PDFs, markdown, plain text, CSV, XLSX. Extracts structured claims at four tiers. Delegates signing and Merkle construction to axm-genesis via compile_generic_shard. Never reimplements the kernel.
The query substrate. Mounts every shard as a glob-scanned Parquet view. Union by name. Cross-shard JOINs on episode_id and claim_id. All in memory. No separate database server. No schema migrations.
Spectra is deliberately not a RAG system. It doesn't embed your query, search for similar chunks, and ask an LLM to synthesize them. That path is expensive and non-deterministic.
Instead: your natural language query is classified into a named intent. The intent maps to a parameterized SQL template. DuckDB executes it. The output is a result set — structured, reproducible, auditable.
The intent classifier runs locally via Ollama at distill time. At query time the only thing that runs is SQL.
"What decisions have we made about the Merkle format?"
Regex + keyword rules classify into a named intent. INTENT_ROUTES maps each intent to a SQL template. No LLM call.
Named parameters substituted safely. Query executes against glob-mounted Parquet views. Zero injection surface.
Rows returned. Same query. Same shards. Same output. Always.
```python
# Intent routes — no raw string interpolation
INTENT_ROUTES = {
    "decisions": {
        "match": lambda q: "decision" in q or "decided" in q,
        "sql": """
            SELECT episode_id, decision_text, rationale, shard_id
            FROM decisions
            WHERE lower(decision_text) LIKE lower(:term)
            ORDER BY created_at DESC
        """,
        "params": lambda q: {"term": extract_term(q)},
    },
    "failures": {
        "match": lambda q: "fail" in q or "graveyard" in q or "tried" in q,
        "sql": """
            SELECT episode_id, problem, graveyard, solution
            FROM read_parquet(:glob)
            WHERE graveyard IS NOT NULL
            ORDER BY created_at DESC
        """,
        "params": lambda q: {"glob": "~/.axm/shards/*/ext/engineering@1.parquet"},
    },
    # ... additional routes
}

def route(query: str) -> tuple[str, dict]:
    q = query.lower()
    for name, spec in INTENT_ROUTES.items():
        if spec["match"](q):
            return spec["sql"], spec["params"](q)
    raise ValueError(f"No route for: {query}")
```
Forge is the document ingestion pipeline. You point it at a document. It extracts structured claim candidates at four tiers — from lossless schema lift to LLM-assisted extraction — then delegates all signing and Merkle construction to compile_generic_shard. Forge never reimplements the kernel.
The output is structurally identical to a chat shard. Same manifest.json. Same Parquet schema. Same verification path. Forge doesn't know it's compiling a field manual versus a legal statute versus a design document. It applies the same protocol to all of them.
Extract text blocks from PDF, markdown, or plain text. Segment into claim-sized units.
BLAKE3 each claim. Build Merkle tree from leaf hashes. Compute root hash.
axm-genesis signs the manifest via compile_generic_shard. Forge delegates — it never calls the signing primitives directly.
Claims written to graph/claims.parquet. Manifest written. Shard directory sealed.
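The hash-and-tree step can be sketched in a few lines. This is illustrative only: the real kernel uses BLAKE3 inside axm-genesis, so `hashlib.blake2b` here is a stand-in that runs on a stock Python install, and the odd-leaf-promotion rule is an assumption (the actual padding rule belongs to axm-genesis):

```python
import hashlib

def leaf(claim: bytes) -> bytes:
    # Stand-in for BLAKE3: hash each claim block to a 32-byte leaf
    return hashlib.blake2b(claim, digest_size=32).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # Pair adjacent nodes and hash upward until one root remains
    level = leaves
    while len(level) > 1:
        nxt = [
            hashlib.blake2b(level[i] + level[i + 1], digest_size=32).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # odd leaf promoted unpaired (assumed rule)
        level = nxt
    return level[0]

claims = [b"apply direct pressure", b"elevate the limb", b"apply tourniquet"]
root = merkle_root([leaf(c) for c in claims])
print(root.hex())
```

Changing any single claim changes its leaf, which changes every hash on the path to the root; that is what makes the signed root a commitment to every claim in the shard.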
```shell
# Compile a PDF into a sealed Knowledge Shard
axm-forge compile ./fm21-11-first-aid.pdf \
  --signing-key keys/publisher.pem \
  --shard-id fm21-11-hemorrhage-v1

# Output
✓ Parsed 312 claim blocks
✓ Merkle root: a3f9c2...d4e1
✓ Signed ML-DSA-44 · FIPS 204
✓ Written ~/.axm/shards/fm21-11-hemorrhage-v1/

# Verify immediately
axm-verify shard ~/.axm/shards/fm21-11-hemorrhage-v1/ \
  --trusted-key keys/publisher.pub
# → {"status":"PASS","error_count":0,"errors":[]}
```
DuckDB runs in-process. There is no separate database server to start, configure, or migrate. When a query runs, axm-core mounts every shard directory as a glob-scanned Parquet view using read_parquet(glob, union_by_name=True).
New shards become queryable the moment they're written to disk. No ingestion step. No index rebuild. The glob just picks them up.
```python
import duckdb

conn = duckdb.connect()

# Mount all episodes across all shards
conn.execute("""
    CREATE VIEW episodes AS
    SELECT * FROM read_parquet(
        '~/.axm/shards/*/ext/episodes@1.parquet',
        union_by_name = true
    )
""")

# Mount all engineering lenses
conn.execute("""
    CREATE VIEW engineering AS
    SELECT * FROM read_parquet(
        '~/.axm/shards/*/ext/engineering@1.parquet',
        union_by_name = true
    )
""")

# Cross-shard join: decisions + their lineage
conn.execute("""
    CREATE VIEW decisions AS
    SELECT d.*, l.superseded_by, l.reason
    FROM read_parquet(
        '~/.axm/shards/*/ext/decisions@1.parquet',
        union_by_name = true
    ) d
    LEFT JOIN read_parquet(
        '~/.axm/shards/*/ext/lineage@1.parquet',
        union_by_name = true
    ) l USING (episode_id)
""")

# New shards are picked up automatically on next query.
# No re-ingestion required.
```
The references@1.parquet table in each decision shard carries pointers into other shards by shard_id and episode_id. DuckDB can resolve these at query time without a central registry.
```sql
-- One row per cross-shard link
episode_id         VARCHAR  -- local episode
target_shard_id    VARCHAR  -- foreign shard
target_episode_id  VARCHAR  -- foreign episode
claim_text         VARCHAR  -- the linked claim
reference_type     VARCHAR  -- supports | supersedes
                            -- contradicts | cites
```
```sql
-- Find all decisions and their source conversations
SELECT
    d.decision_text,
    r.target_shard_id AS source_shard,
    r.claim_text      AS supporting_claim,
    r.reference_type
FROM decisions d
JOIN read_parquet(
    '~/.axm/shards/*/ext/references@1.parquet',
    union_by_name = true
) r ON d.episode_id = r.episode_id
WHERE r.reference_type = 'supersedes'
ORDER BY d.created_at DESC
```
```sql
-- Walk the supersession chain for the Merkle format decision
WITH RECURSIVE chain AS (
    -- anchor: the current decision (never itself superseded)
    SELECT episode_id, decision_text, superseded_by, 0 AS depth
    FROM decisions
    WHERE lower(decision_text) LIKE '%merkle%'
      AND superseded_by IS NULL

    UNION ALL

    -- recurse: each decision the chain member superseded
    SELECT d.episode_id, d.decision_text, d.superseded_by, chain.depth + 1
    FROM decisions d
    JOIN chain ON d.superseded_by = chain.episode_id
)
SELECT * FROM chain ORDER BY depth
```
Spectra classifies these queries and executes parameterized SQL. No LLM call at query time.
```shell
# Install axm-genesis first (required dep)
pip install -e ./axm-genesis

# Install axm-core
pip install -e ./axm-core

# Optional: install a spoke to generate shards
pip install -e ./axm-chat
```
```shell
# Compile a document into a Knowledge Shard
axm-forge compile ./document.pdf \
  --signing-key keys/publisher.pem

# Verify it immediately
axm-verify shard ~/.axm/shards/document-v1/ \
  --trusted-key keys/publisher.pub
# → {"status":"PASS","error_count":0}
```
```shell
# Natural language via Spectra
axm-core spectra "what decisions have we made"
axm-core spectra "what failed last week"
axm-core spectra "what changed since january"

# Raw DuckDB SQL directly
axm-core query --sql "SELECT * FROM episodes LIMIT 5"

# All shards. All parquet. In memory.
```
```shell
# axm-genesis — cryptographic kernel
#   BLAKE3 + ML-DSA-44 + axm-verify

# axm-core — this package
#   Spectra + Forge + DuckDB runtime

# axm-chat     — spoke: conversation shards
# axm-show     — spoke: drone show telemetry
# axm-embodied — spoke: robot sensor streams

# All shards. Same format. Same verifier.
```