Private llama.cpp branch

Long-Context Local Inference

CANAL moves old KV cache data out of GPU memory, stores it in RAM or on SSD, and retrieves useful older context when the model needs it.

KV overflow + retrieval + RAM/SSD-backed KV storage + RAM compression.
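CANAL's source is private, so the following is not its implementation. It is a minimal toy sketch of the general idea described above, under stated assumptions: a small "hot" tier stands in for GPU memory, overflowed blocks are zlib-compressed into a RAM tier, the oldest RAM blocks spill to files on disk, and retrieval picks the offloaded blocks whose summary vector has the highest dot product with a query. The class name `TieredKVStore`, its parameters, the summary-vector scoring, and the use of pickle and zlib are all hypothetical choices made for illustration.

```python
import os
import pickle
import tempfile
import zlib
from collections import OrderedDict


class TieredKVStore:
    """Toy three-tier block store: hot ('gpu'), compressed RAM, and disk.

    Hypothetical illustration only; not CANAL's design or API.
    """

    def __init__(self, gpu_blocks=2, ram_blocks=2, spill_dir=None):
        self.gpu_blocks = gpu_blocks
        self.ram_blocks = ram_blocks
        self.gpu = OrderedDict()  # block_id -> (summary, payload)
        self.ram = OrderedDict()  # block_id -> (summary, compressed bytes)
        self.disk = {}            # block_id -> (summary, file path)
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="kv_spill_")

    def append(self, block_id, summary, payload):
        """Add the newest block to the hot tier and overflow older blocks."""
        self.gpu[block_id] = (summary, payload)
        # Overflow: oldest hot block moves to RAM in compressed form.
        while len(self.gpu) > self.gpu_blocks:
            old_id, (old_summary, old_payload) = self.gpu.popitem(last=False)
            blob = zlib.compress(pickle.dumps(old_payload))
            self.ram[old_id] = (old_summary, blob)
        # Spill: oldest RAM block is written to a file on disk.
        while len(self.ram) > self.ram_blocks:
            old_id, (old_summary, blob) = self.ram.popitem(last=False)
            path = os.path.join(self.spill_dir, f"block_{old_id}.bin")
            with open(path, "wb") as fh:
                fh.write(blob)
            self.disk[old_id] = (old_summary, path)

    def tier_of(self, block_id):
        """Return 'gpu', 'ram', 'disk', or None for an unknown block."""
        for name in ("gpu", "ram", "disk"):
            if block_id in getattr(self, name):
                return name
        return None

    def _load(self, block_id):
        """Read and decompress an offloaded block from RAM or disk."""
        if block_id in self.ram:
            blob = self.ram[block_id][1]
        else:
            with open(self.disk[block_id][1], "rb") as fh:
                blob = fh.read()
        return pickle.loads(zlib.decompress(blob))

    def retrieve(self, query, top_k=1):
        """Return (block_id, payload) for the top_k best-matching offloaded blocks."""
        def score(summary):
            return sum(q * s for q, s in zip(query, summary))

        candidates = [(score(s), bid) for bid, (s, _) in self.ram.items()]
        candidates += [(score(s), bid) for bid, (s, _) in self.disk.items()]
        candidates.sort(key=lambda c: c[0], reverse=True)
        return [(bid, self._load(bid)) for _, bid in candidates[:top_k]]


# Usage: six blocks with one-hot summaries, capacity 2 per in-memory tier.
store = TieredKVStore(gpu_blocks=2, ram_blocks=2)
for i in range(6):
    one_hot = [1.0 if j == i else 0.0 for j in range(6)]
    store.append(i, one_hot, {"tokens": [i] * 4})

# Blocks 4-5 stay hot, 2-3 sit compressed in RAM, 0-1 have spilled to disk.
hits = store.retrieve([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], top_k=1)
```

In this sketch, retrieval only reads an offloaded block back; it does not promote it into the hot tier, and the spill directory is not cleaned up. A real system would also need to decide what a block's summary is and when retrieval is triggered, which the page above does not specify.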

Context Overview

Tested context cap: 1,010,000

Qwen 3.6: 100/100 (near-1M MRCR)

Codebase QA: 5/5 (CANAL pass)

Control: 0/5 (prompt too long)

Package: PASS (binary proof)

Current proof snapshot

Evidence Before Hype

Qwen 3.6 near-1M MRCR: 100/100

Mistral Small near-1M MRCR: 20/20

Gemma 4 26B near-1M MRCR: 20/20

Opus-MoE near-1M MRCR: 20/20

Qwen 3.6 codebase QA: 5/5

No-CANAL control: 0/5

Real-document QA: PASS

Binary package proof: PASS

Precise claims

Controlled access

The first preview is binary-only. Testers can run CANAL without receiving the private C++ source code.

Good-fit testers should have Linux, an NVIDIA GPU, local GGUF models, and a real long-context workflow.

Preview Details