CANAL

Private llama.cpp branch

Long-Context Local Inference

CANAL moves old KV cache data out of GPU memory, stores it in RAM or SSD, and retrieves useful older context when the model needs it.

KV overflow + retrieval + RAM/SSD-backed KV storage + RAM compression.
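
The overflow-and-retrieve idea described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not CANAL's implementation: the class `TieredKVStore`, its methods, the per-block `summary` vector, and the dot-product relevance score are all hypothetical names and choices made for illustration. It models the "GPU" tier as a small ordered map, uses `zlib` as a stand-in for RAM compression, and leaves out the SSD tier entirely.

```python
import zlib
from collections import OrderedDict


class TieredKVStore:
    """Toy KV overflow model: a small fast tier backed by a compressed RAM tier.

    Hypothetical illustration only; it does not reflect CANAL's internals.
    """

    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity  # max blocks kept in the fast tier
        self.gpu = OrderedDict()          # block_id -> (summary, raw bytes), oldest first
        self.ram = {}                     # block_id -> (summary, compressed bytes)

    def append(self, block_id, summary, kv_bytes):
        """Add a block; move the oldest blocks to RAM once the fast tier is full."""
        self.gpu[block_id] = (summary, kv_bytes)
        while len(self.gpu) > self.gpu_capacity:
            old_id, (old_summary, old_bytes) = self.gpu.popitem(last=False)
            self.ram[old_id] = (old_summary, zlib.compress(old_bytes))

    def most_relevant(self, query, k=1):
        """Rank overflowed blocks by dot product between query and block summary."""
        def score(item):
            summary = item[1][0]
            return sum(q * s for q, s in zip(query, summary))

        ranked = sorted(self.ram.items(), key=score, reverse=True)
        return [block_id for block_id, _ in ranked[:k]]

    def retrieve(self, block_id):
        """Return a block's bytes, moving it back into the fast tier if it overflowed."""
        if block_id in self.gpu:
            return self.gpu[block_id][1]
        summary, compressed = self.ram.pop(block_id)
        kv_bytes = zlib.decompress(compressed)
        self.append(block_id, summary, kv_bytes)  # may evict another old block
        return kv_bytes
```

In this sketch, a caller appends blocks as the context grows, asks `most_relevant` which overflowed blocks match the current query, and calls `retrieve` on those IDs. Eviction is oldest-first, which matches the "old KV cache data" wording above but is otherwise an assumption.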

Context Overview
Tested context cap: 1,010,000
Qwen 3.6 near-1M MRCR: 100/100
Codebase QA (CANAL): 5/5 pass
Control: 0/5 (prompt too long)
Package binary proof: PASS

Current proof snapshot

Evidence Before Hype

Qwen 3.6 near-1M MRCR: 100/100
Mistral Small near-1M MRCR: 20/20
Gemma 4 26B near-1M MRCR: 20/20
Opus-MoE near-1M MRCR: 20/20
Qwen 3.6 codebase QA: 5/5
No-CANAL control: 0/5
Real-document QA: PASS
Binary package proof: PASS

Precise claims

What CANAL Is Not

Controlled access

Private Binary Preview

The first preview is binary-only. Testers can run CANAL without receiving the private C++ source code.

Good-fit testers should have Linux, an NVIDIA GPU, local GGUF models, and a real long-context workflow.
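
The machine-checkable parts of that list can be verified with a short preflight script. This is a sketch under assumptions, not part of the CANAL preview: the function names `is_gguf` and `preflight` are hypothetical, the presence of `nvidia-smi` on the PATH is used only as a proxy for an NVIDIA GPU with a driver installed, and a model counts as GGUF if the file starts with the 4-byte `GGUF` magic.

```python
import platform
import shutil
from pathlib import Path


def is_gguf(path):
    """Return True if the file starts with the 4-byte GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"


def preflight(model_dir):
    """Report which machine-checkable tester requirements appear to be met."""
    models = [p for p in Path(model_dir).glob("*.gguf") if is_gguf(p)]
    return {
        "linux": platform.system() == "Linux",
        # Assumption: nvidia-smi on PATH implies an NVIDIA driver is installed.
        "nvidia_gpu": shutil.which("nvidia-smi") is not None,
        "gguf_models": len(models) > 0,
    }
```

Calling `preflight("/path/to/models")` returns a dictionary of three booleans. Whether a tester has a real long-context workflow cannot be checked this way.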

Preview Details