Private llama.cpp branch
Long-Context Local Inference
CANAL moves old KV cache data out of GPU memory, stores it in RAM or on SSD, and retrieves useful older context when the model needs it.
It combines KV overflow, retrieval, RAM/SSD-backed KV storage, and RAM compression.
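To make the overflow-and-retrieve idea concrete, here is a minimal toy sketch in Python. It is not CANAL's implementation, which is private C++. The class name `TieredKVStore`, the per-block summary vectors, and the cosine-similarity ranking are all assumptions made for illustration; the page does not say how CANAL selects which older context to bring back, and RAM compression is not modeled here.

```python
import math
from collections import OrderedDict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TieredKVStore:
    """Hypothetical two-tier store: a small 'hot' tier and an unbounded 'cold' tier.

    'hot' stands in for GPU memory, 'cold' for RAM or SSD. Each block carries
    a summary vector (an assumed retrieval key) and an opaque payload.
    """

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # block_id -> (summary, payload), oldest first
        self.cold = {}            # block_id -> (summary, payload)

    def append(self, block_id, summary, payload):
        """Add a new block; spill the oldest hot blocks to the cold tier on overflow."""
        self.hot[block_id] = (summary, payload)
        while len(self.hot) > self.hot_capacity:
            old_id, entry = self.hot.popitem(last=False)
            self.cold[old_id] = entry

    def retrieve(self, query, top_k=1):
        """Return ids of the top_k cold blocks whose summaries best match the query."""
        ranked = sorted(
            self.cold.items(),
            key=lambda item: cosine(query, item[1][0]),
            reverse=True,
        )
        return [block_id for block_id, _ in ranked[:top_k]]


store = TieredKVStore(hot_capacity=2)
store.append(0, [1.0, 0.0], "block 0")
store.append(1, [0.0, 1.0], "block 1")
store.append(2, [1.0, 1.0], "block 2")
store.append(3, [0.5, 0.5], "block 3")
closest = store.retrieve([0.0, 1.0], top_k=1)
```

With a hot capacity of 2, the two oldest blocks spill to the cold tier, and the query selects whichever cold block has the most similar summary. A real system would also have to move the selected block back into GPU memory and account for positional information, which this sketch leaves out.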
Context Overview
1,010,000 tested context cap
Qwen 3.6: 100/100 near-1M MRCR
Codebase QA: 5/5 CANAL pass
Control: 0/5, prompt too long
Package: PASS, binary proof
Current proof snapshot
Evidence Before Hype
Precise claims
What CANAL Is Not
- Not unlimited context.
- Not perfect recall.
- Not a replacement for model quality.
- Not a public production release today.
Controlled access
Private Binary Preview
The first preview is binary-only. Testers can run CANAL without receiving the private C++ source code.
Good-fit testers should have Linux, an NVIDIA GPU, local GGUF models, and a real long-context workflow.
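A prospective tester could sanity-check a machine against the first three requirements with a small script such as the sketch below. This is not part of the CANAL preview. The function names are hypothetical, and treating the presence of `nvidia-smi` on the PATH as evidence of an NVIDIA GPU is an assumption; it shows only that the driver tools are installed.

```python
import platform
import shutil
from pathlib import Path


def preflight(system, has_nvidia_smi, gguf_paths):
    """Return a list of unmet requirements (an empty list means all checks passed)."""
    problems = []
    if system != "Linux":
        problems.append("operating system is not Linux")
    if not has_nvidia_smi:
        # Assumption: nvidia-smi on PATH is used as a proxy for an NVIDIA GPU.
        problems.append("nvidia-smi not found on PATH")
    if not gguf_paths:
        problems.append("no .gguf model files found")
    return problems


def preflight_here(model_dir="."):
    """Run the checks against the current machine and a local model directory."""
    return preflight(
        platform.system(),
        shutil.which("nvidia-smi") is not None,
        sorted(Path(model_dir).glob("*.gguf")),
    )
```

Calling `preflight_here("/path/to/models")` returns the list of problems for the current machine. The fourth requirement, a real long-context workflow, cannot be checked by a script.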
Preview Details