Sovereign inference wrapper — Marina decides, Aamir builds

card_id: 40y_sim_bitnet_wrapper · cluster: IT / engineering · ~75 min
simulated data · code is real

Long-form card prose

For visitors who'd rather read than walk.

# Sovereign inference wrapper — Marina decides, Aamir builds

Minutes 0–2 — Landing (Track A: Marina)

You're Marina, IT director at a clinic. Between meetings. The
recurring API bill, the privacy boundary your data crosses every
time, the dependency on someone else's uptime — three pressures
that have been pushing toward the same answer for six months.

The hook: a sovereign inference wrapper that runs on hardware you
control, against a model that fits, with attestation that proves
what ran.

Minutes 2–5 — The four paths, IT-flavored

For a clinic IT director evaluating local AI infrastructure, the
order is GitHub-first, mesh-second, operator-third, paste-fourth.
The path Marina is evaluating is GitHub: clone, build, deploy on the
clinic's existing on-prem rack.

Minutes 5–35 — Track A: Marina's three decisions

The deployment-shape comparison loads as a three-row table:

```
═══════════════════════════════════════════════
DEPLOYMENT-SHAPE COMPARISON
═══════════════════════════════════════════════

CLOUD-ONLY
Tok/sec: high
API$/mo: $$$$
Privacy boundary crossed: yes (every call)
Uptime: vendor's

LOCAL OLLAMA (vanilla)
Tok/sec: medium
API$/mo: 0
Privacy: local
Attestation: none

LOCAL OLLAMA + WRAPPER + BITNET ← target
Tok/sec: medium-high (BitNet i2_s)
API$/mo: 0
Privacy: local
Attestation: every call sealed

═══════════════════════════════════════════════
```

Marina makes three decisions and seals a handoff to the contract
engineer:

1. Which model. BitNet 7B i2_s on the rack. The math fits; the
benchmarks hold; the paper is public.
2. Fallback policy. If the local model fails, the wrapper logs
the failure, surfaces it on the ops dashboard, and does not
silently route to a cloud provider. Failed inferences are visible
failures.
3. Compliance wrapper. Every inference call sealed: input hash,
model binary hash, wrapper config hash, output hash. Soul-chain
verifies seven files at every boot.
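
Decisions 2 and 3 can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not the wrapper's actual code: the names `sealed_infer`, `Seal`, `LocalInferenceError`, and `failure_log` are hypothetical, SHA-256 is assumed as the hash, and the model binary hash is assumed to be computed once at boot and passed in.

```python
import hashlib
import json
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class LocalInferenceError(RuntimeError):
    """Raised when the local model fails; the caller sees it, nothing reroutes."""


@dataclass(frozen=True)
class Seal:
    # The four hashes named in decision 3 (field names are hypothetical).
    input_hash: str
    model_binary_hash: str
    wrapper_config_hash: str
    output_hash: str


def sealed_infer(prompt, run_local, model_binary_hash, wrapper_config, failure_log):
    """Run one local inference and return (output, Seal).

    On failure: record the error where ops can see it and re-raise.
    There is deliberately no cloud branch here (decision 2).
    """
    try:
        output = run_local(prompt)
    except Exception as exc:
        failure_log.append(
            {"input_hash": sha256_hex(prompt.encode("utf-8")), "error": repr(exc)}
        )
        raise LocalInferenceError("local inference failed; no cloud fallback") from exc

    # Canonical JSON so the same config always hashes to the same value.
    config_bytes = json.dumps(wrapper_config, sort_keys=True).encode("utf-8")
    seal = Seal(
        input_hash=sha256_hex(prompt.encode("utf-8")),
        model_binary_hash=model_binary_hash,
        wrapper_config_hash=sha256_hex(config_bytes),
        output_hash=sha256_hex(output.encode("utf-8")),
    )
    return output, seal
```

The design point is that the failure path has exactly two effects, a log entry and a raised exception, so a failed inference cannot be mistaken for a successful one.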

The handoff packet seals to the case-file ceremony. Aamir's name is
on it.

Minutes 35–60 — Track B: Aamir's verification trio

Aamir picks up the handoff. Three sessions, roughly five hours of
build work spread across an afternoon and an evening.

He clones the substrate and builds bitnet.cpp from source on the M4
rack. Microsoft's setup_env.py pins torch~=2.2.1, which has no
Python 3.14 wheels yet, so he bypasses the script via the manual
path documented in ADR-031. cmake invocation: -DBITNET_ARM_TL1=ON.
The build produces the i2_s-quantized binary.

Boot test. The wrapper hashes the seven soul-chain files, and each
one verifies. The wrapper logs:

```
═══════════════════════════════════════════════
WRAPPER VERIFICATION — three checks
═══════════════════════════════════════════════

✓ i2_s quantization confirmed
(not Q4_K_M; correct ternary path)

✓ model_dispatch_backend == bitnet.cpp
(not llama.cpp; correct runtime)

✓ 1.7× tokens/sec on Path B vs Path A
(vanilla → wrapped benchmark)

Boot complete. Soul-chain verified.

═══════════════════════════════════════════════
```
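
The soul-chain check at boot amounts to comparing file hashes against a recorded set. The sketch below assumes SHA-256 and a plain mapping of relative path to expected hex digest; `verify_soul_chain` and the mapping format are hypothetical, and the actual seven file names are not listed here.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large binaries do not load into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_soul_chain(expected: dict[str, str], root: Path) -> list[str]:
    """Return the relative paths that are missing or whose hash differs.

    An empty list means every file verified and boot may continue.
    """
    failures = []
    for rel_path, want in sorted(expected.items()):
        target = root / rel_path
        if not target.is_file() or file_sha256(target) != want:
            failures.append(rel_path)
    return failures
```

A boot routine would refuse to start the wrapper whenever the returned list is non-empty, and log which files failed.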

The 1.7× speed-up compares the same model with and without the
wrapper: the wrapper's batching and the i2_s path together deliver
throughput that the vanilla runtime did not.
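
For readers who want the arithmetic behind a figure like 1.7×, here is a small sketch. The function names are hypothetical, and the token counts and timings are made-up values chosen only so the division lands on 1.7; they are not measurements from the benchmark.

```python
def tokens_per_second(tokens_generated: int, elapsed_seconds: float) -> float:
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return tokens_generated / elapsed_seconds


def speedup(path_a_tps: float, path_b_tps: float) -> float:
    """Ratio of Path B (wrapped) throughput to Path A (vanilla) throughput."""
    if path_a_tps <= 0:
        raise ValueError("path_a_tps must be positive")
    return path_b_tps / path_a_tps


# Illustrative inputs only, not benchmark data.
path_a = tokens_per_second(1000, 50.0)  # vanilla run
path_b = tokens_per_second(1700, 50.0)  # wrapped run
ratio = speedup(path_a, path_b)
```

Both runs should use the same prompts and the same generation length, otherwise the ratio mixes workload differences into the result.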

The .md button puts a summary of both tracks into your tag-along
bundle. The comment field routes a build question to your own
claude.ai session.

Minutes 60–70 — Ceremony: the wrapper manifest

Aamir files BITNET_CPP_BUILD_DESCRIPTOR.md with each
<TO_FILL_AT_BUILD_SUCCESS> slot replaced. He updates
silos/Ember/active_personality.json to reflect the new backend.
He runs the wrapper's first sealed inference on real input.

WRAPPER_MANIFEST.json gets Ember's inference_backend: bitnet.cpp
and the binary_sha256 recorded. EMBER_BENCHMARK.md carries the
1.7× number with the test corpus referenced.
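
Recording the backend and binary hash could look roughly like this. The key names inference_backend and binary_sha256 come from the text above; the nesting under an "Ember" key, the function name `record_backend`, and the file layout are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def record_backend(manifest_path: Path, binary_path: Path,
                   backend: str = "bitnet.cpp") -> dict:
    """Write the backend name and the binary's SHA-256 into the manifest."""
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    else:
        manifest = {}

    # Assumed layout: one object per silo, keyed by name.
    ember = manifest.setdefault("Ember", {})
    ember["inference_backend"] = backend
    ember["binary_sha256"] = hashlib.sha256(binary_path.read_bytes()).hexdigest()

    manifest_path.write_text(
        json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return manifest
```

Existing keys in the manifest are preserved; only the two Ember fields are overwritten.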

The 30-day attestation history starts tonight. Every inference from
this moment on is sealed: input hash, model hash, wrapper hash,
output hash, into the case-file ceremony.
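
One way to make a 30-day history tamper-evident is to chain each sealed record to the one before it. Nothing above says the case-file ceremony works this way; the hash chain, the `append_entry` and `verify_chain` names, and the all-zero genesis value are assumptions for this sketch.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous entry"


def _entry_hash(prev: str, seal: dict) -> str:
    body = json.dumps({"prev": prev, "seal": seal}, sort_keys=True).encode("utf-8")
    return hashlib.sha256(body).hexdigest()


def append_entry(chain: list, seal: dict) -> dict:
    """Append a seal, linking it to the hash of the previous entry."""
    prev = chain[-1]["entry_hash"] if chain else GENESIS
    entry = {"prev": prev, "seal": seal, "entry_hash": _entry_hash(prev, seal)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks it."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if _entry_hash(entry["prev"], entry["seal"]) != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True
```

With this shape, auditing the history at day 30 is a single pass over the log rather than a comparison against a separate copy.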

Minutes 70–75 — Marina closes the loop

Marina greenlights Aamir's deliverable. The 30-day shadow run starts
tomorrow. If the benchmarks hold and no failures surface, the
wrapper goes to production. If anything breaks, the wrapper logs the
failure visibly and the team decides; nothing silently falls back to
a cloud provider.

The handoff loop closes. The package, the cert, the recovery seed —
all ride along.

<!-- finish_text -->

Finish text

That was the simulated path through a two-track build: a
decision-maker's three decisions and an engineer's three
verification checks. The full card breaks out the build descriptor
template, the soul-chain pattern, the ADR-031 doctrine, and a
30-day attestation prediction that's yours to test.