axm-core is the runtime between sealed shards and spoken questions. Spectra translates natural language to SQL. Forge compiles raw documents into shards. DuckDB executes cross-shard joins in memory. No inference at query time.
axm-core is the hub package. It neither signs shards (that's axm-genesis) nor produces them from conversations (that's a spoke). It does one thing: turns a pile of sealed Parquet shard directories into a queryable relational database, and lets you ask it questions in English.
Natural language query engine. Parses intent, routes to a named query in the INTENT_ROUTES table, executes parameterized SQL against DuckDB. No LLM call at query time — the LLM runs once at distill time to build the index.
Document compiler. Takes PDFs, markdown, plain text, CSV, XLSX. Extracts structured claims at four tiers. Delegates signing and Merkle construction to axm-genesis via compile_generic_shard. Never reimplements the kernel.
The query substrate. Mounts every shard as a glob-scanned Parquet view. Union by name. Cross-shard JOINs on episode_id and claim_id. All in memory. No separate database server. No schema migrations.
Spectra is deliberately not a RAG system. It doesn't embed your query, search for similar chunks, and ask an LLM to synthesize them. That path is expensive and non-deterministic.
Instead: your natural language query is classified into a named intent. The intent maps to a parameterized SQL template. DuckDB executes it. The output is a result set — structured, reproducible, auditable.
The intent classifier runs locally via Ollama at distill time. At query time the only thing that runs is SQL.
"What decisions have we made about the Merkle format?"
Regex + keyword rules classify into a named intent. INTENT_ROUTES maps each intent to a SQL template. No LLM call.
Named parameters substituted safely. Query executes against glob-mounted Parquet views. Zero injection surface.
Rows returned. Same query. Same shards. Same output. Always.
```python
# Intent routes — no raw string interpolation
INTENT_ROUTES = {
    "decisions": {
        "match": lambda q: "decision" in q or "decided" in q,
        "sql": """
            SELECT episode_id, decision_text, rationale, shard_id
            FROM decisions
            WHERE lower(decision_text) LIKE lower(:term)
            ORDER BY created_at DESC
        """,
        "params": lambda q: {"term": extract_term(q)},
    },
    "failures": {
        "match": lambda q: "fail" in q or "graveyard" in q or "tried" in q,
        "sql": """
            SELECT episode_id, problem, graveyard, solution
            FROM read_parquet(:glob)
            WHERE graveyard IS NOT NULL
            ORDER BY created_at DESC
        """,
        "params": lambda q: {"glob": "~/.axm/shards/*/ext/engineering@1.parquet"},
    },
    # ... additional routes
}

def route(query: str) -> tuple[str, dict]:
    q = query.lower()
    for name, spec in INTENT_ROUTES.items():
        if spec["match"](q):
            return spec["sql"], spec["params"](q)
    raise ValueError(f"No route for: {query}")
```
Forge is the document ingestion pipeline. You point it at a document. It extracts structured claim candidates at four tiers — from lossless schema lift to LLM-assisted extraction — then delegates all signing and Merkle construction to compile_generic_shard. Forge never reimplements the kernel.
The output is structurally identical to a chat shard. Same manifest.json. Same Parquet schema. Same verification path. Forge doesn't know it's compiling a field manual versus a legal statute versus a design document. It applies the same protocol to all of them.
Extract text blocks from PDF, markdown, or plain text. Segment into claim-sized units.
BLAKE3 each claim. Build Merkle tree from leaf hashes. Compute root hash.
axm-genesis signs the manifest via compile_generic_shard. Forge delegates — it never calls the signing primitives directly.
Claims written to graph/claims.parquet. Manifest written. Shard directory sealed.
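The hash-and-tree step can be sketched in a few lines. This is illustrative only: the real kernel uses BLAKE3 inside axm-genesis, so `hashlib.blake2b` here is a stand-in that runs on a stock Python install, and the odd-leaf-promotion rule is an assumption (the actual padding rule belongs to axm-genesis):

```python
import hashlib

def leaf(claim: bytes) -> bytes:
    # Stand-in for BLAKE3: hash each claim block to a 32-byte leaf
    return hashlib.blake2b(claim, digest_size=32).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # Pair adjacent nodes and hash upward until one root remains
    level = leaves
    while len(level) > 1:
        nxt = [
            hashlib.blake2b(level[i] + level[i + 1], digest_size=32).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # odd leaf promoted unpaired (assumed rule)
        level = nxt
    return level[0]

claims = [b"apply direct pressure", b"elevate the limb", b"apply tourniquet"]
root = merkle_root([leaf(c) for c in claims])
print(root.hex())
```

Changing any single claim changes its leaf, which changes every hash on the path to the root; that is what makes the signed root a commitment to every claim in the shard.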
```shell
# Compile a PDF into a sealed Knowledge Shard
axm-forge compile ./fm21-11-first-aid.pdf \
  --signing-key keys/publisher.pem \
  --shard-id fm21-11-hemorrhage-v1

# Output
✓ Parsed 312 claim blocks
✓ Merkle root: a3f9c2...d4e1
✓ Signed ML-DSA-44 · FIPS 204
✓ Written ~/.axm/shards/fm21-11-hemorrhage-v1/

# Verify immediately
axm-verify shard ~/.axm/shards/fm21-11-hemorrhage-v1/ \
  --trusted-key keys/publisher.pub
# → {"status":"PASS","error_count":0,"errors":[]}
```
DuckDB runs in-process. There is no separate database server to start, configure, or migrate. When a query runs, axm-core mounts every shard directory as a glob-scanned Parquet view using read_parquet(glob, union_by_name=True).
New shards become queryable the moment they're written to disk. No ingestion step. No index rebuild. The glob just picks them up.
```python
import duckdb

conn = duckdb.connect()

# Mount all episodes across all shards
conn.execute("""
    CREATE VIEW episodes AS
    SELECT * FROM read_parquet(
        '~/.axm/shards/*/ext/episodes@1.parquet',
        union_by_name = true
    )
""")

# Mount all engineering lenses
conn.execute("""
    CREATE VIEW engineering AS
    SELECT * FROM read_parquet(
        '~/.axm/shards/*/ext/engineering@1.parquet',
        union_by_name = true
    )
""")

# Cross-shard join: decisions + their lineage
conn.execute("""
    CREATE VIEW decisions AS
    SELECT d.*, l.superseded_by, l.reason
    FROM read_parquet(
        '~/.axm/shards/*/ext/decisions@1.parquet',
        union_by_name = true
    ) d
    LEFT JOIN read_parquet(
        '~/.axm/shards/*/ext/lineage@1.parquet',
        union_by_name = true
    ) l USING (episode_id)
""")

# New shards are picked up automatically on next query.
# No re-ingestion required.
```
The references@1.parquet table in each decision shard carries pointers into other shards by shard_id and episode_id. DuckDB can resolve these at query time without a central registry.
```sql
-- One row per cross-shard link
episode_id         VARCHAR  -- local episode
target_shard_id    VARCHAR  -- foreign shard
target_episode_id  VARCHAR  -- foreign episode
claim_text         VARCHAR  -- the linked claim
reference_type     VARCHAR  -- supports | supersedes
                            -- contradicts | cites
```
```sql
-- Find all decisions and their source conversations
SELECT
    d.decision_text,
    r.target_shard_id AS source_shard,
    r.claim_text      AS supporting_claim,
    r.reference_type
FROM decisions d
JOIN read_parquet(
    '~/.axm/shards/*/ext/references@1.parquet',
    union_by_name = true
) r ON d.episode_id = r.episode_id
WHERE r.reference_type = 'supersedes'
ORDER BY d.created_at DESC
```
```sql
-- Walk the supersession chain for the Merkle format decision
WITH RECURSIVE chain AS (
    -- anchor: the current decision (never itself superseded)
    SELECT episode_id, decision_text, superseded_by, 0 AS depth
    FROM decisions
    WHERE lower(decision_text) LIKE '%merkle%'
      AND superseded_by IS NULL

    UNION ALL

    -- recurse: each decision the chain member superseded
    SELECT d.episode_id, d.decision_text, d.superseded_by, chain.depth + 1
    FROM decisions d
    JOIN chain ON d.superseded_by = chain.episode_id
)
SELECT * FROM chain ORDER BY depth
```
Spectra classifies these queries and executes parameterized SQL. No LLM call at query time.
```shell
# Install axm-genesis first (required dep)
pip install -e ./axm-genesis

# Install axm-core
pip install -e ./axm-core

# Optional: install a spoke to generate shards
pip install -e ./axm-chat
```
```shell
# Compile a document into a Knowledge Shard
axm-forge compile ./document.pdf \
  --signing-key keys/publisher.pem

# Verify it immediately
axm-verify shard ~/.axm/shards/document-v1/ \
  --trusted-key keys/publisher.pub
# → {"status":"PASS","error_count":0}
```
```shell
# Natural language via Spectra
axm-core spectra "what decisions have we made"
axm-core spectra "what failed last week"
axm-core spectra "what changed since january"

# Raw DuckDB SQL directly
axm-core query --sql "SELECT * FROM episodes LIMIT 5"

# All shards. All parquet. In memory.
```
```shell
# axm-genesis — cryptographic kernel
#   BLAKE3 + ML-DSA-44 + axm-verify

# axm-core — this package
#   Spectra + Forge + DuckDB runtime

# axm-chat     — spoke: conversation shards
# axm-show     — spoke: drone show telemetry
# axm-embodied — spoke: robot sensor streams

# All shards. Same format. Same verifier.
```