HAQ_ENGINE // 0.1.0

Compress without
hallucinations.

LeanRoute is deterministic compression infrastructure for high-scale AI. Deploy 72B models on a single GPU while preserving mathematical reasoning and factual accuracy.

Supported Architectures

LLaMA 3, Qwen 2.5, Mistral, Gemma, Command R

THE_FRICTION

Quantization shouldn't break your reasoning. Or your math eval.

Explore the architecture

Standard quantization methods clip weights uniformly, with no notion of which weights carry a model's reasoning. Critical reasoning circuits fail. Facts turn into hallucinations. Large models become unpredictable.

LeanRoute replaces flat point solutions with a single, deterministic pipeline. It measures Hessian trace sensitivity per block, protecting the weights that matter most while still compressing aggressively. Trust the weights. Empower the hardware.

ANALYSIS // CORE

Compression at high velocity.


LLaMA 3 (70B) Analysis

FITS ON 1x A6000

Method                   Size (VRAM)   Accuracy   KL Shift
Baseline 16-bit FP       138.0 GB      92.4%      0.000
Standard 4-bit AWQ       34.5 GB       82.4%      1.450
LeanRoute Adaptive HAQ   38.6 GB       92.1%      0.041
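
The sizes above imply an average bit-width per weight, which is easy to check by hand. The sketch below is back-of-the-envelope arithmetic, not LeanRoute output: it assumes VRAM scales linearly with bits per weight, that every weight is stored at either 4 or 8 bits, and that quantization metadata (scales, zero-points) is negligible. The function names are illustrative.

```python
def average_bits(compressed_gb: float, baseline_gb: float, baseline_bits: int = 16) -> float:
    """Average bits per weight implied by a compressed size (linear-scaling assumption)."""
    return baseline_bits * compressed_gb / baseline_gb


def high_precision_share(avg_bits: float, low: int = 4, high: int = 8) -> float:
    """Share of weights stored at `high` bits if every weight is either `low` or `high` bits."""
    return (avg_bits - low) / (high - low)


# Sizes taken from the LLaMA 3 (70B) figures above.
awq_bits = average_bits(34.5, 138.0)  # uniform 4-bit
haq_bits = average_bits(38.6, 138.0)  # mixed precision

print(f"AWQ: {awq_bits:.2f} bits/weight")
print(f"HAQ: {haq_bits:.2f} bits/weight")
print(f"Implied 8-bit share: {high_precision_share(haq_bits):.1%}")
```

Under these assumptions, roughly 12% of the weights would sit at 8-bit, which accounts for the 4.1 GB gap between the adaptive result and uniform 4-bit.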

Post-Compression MMLU Accuracy

LLaMA 3 (70B)    92.1%
Qwen 2.5 (72B)   90.8%
Mixtral 8x7B     88.5%

CAPABILITIES

Engineered for
precision.

Complex neural optimization reduced to elegant primitives.

Sensitivity Analysis

Computes the Hessian trace and activation outliers for each block to identify and mathematically protect the reasoning pathways most sensitive to quantization.
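
For readers unfamiliar with Hessian trace as a sensitivity signal, a common way to estimate it without building the full Hessian is Hutchinson's estimator, which averages z^T H z over random +/-1 probe vectors. The sketch below is a toy illustration of that general technique, not LeanRoute's implementation: the block names and matrices are hypothetical, and an explicit matrix stands in for the Hessian-vector product that would normally come from automatic differentiation.

```python
import random


def hutchinson_trace(hvp, dim, num_probes=2000, seed=0):
    """Estimate trace(H) as the mean of z^T (H z) over random +/-1 probes z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hz = hvp(z)  # Hessian-vector product
        total += sum(zi * hzi for zi, hzi in zip(z, hz))
    return total / num_probes


def make_hvp(matrix):
    """Toy stand-in: multiply by an explicit symmetric matrix instead of using autodiff."""
    def hvp(v):
        return [sum(a * b for a, b in zip(row, v)) for row in matrix]
    return hvp


# Two hypothetical blocks with different loss curvature.
blocks = {
    "ffn.0": [[4.0, 1.0], [1.0, 6.0]],   # true trace: 10.0
    "attn.0": [[0.5, 0.1], [0.1, 0.3]],  # true trace: 0.8
}
sensitivity = {name: hutchinson_trace(make_hvp(h), dim=2) for name, h in blocks.items()}

for name, score in sorted(sensitivity.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: estimated trace {score:.2f}")
```

A larger trace means the loss curves more sharply around that block's weights, so rounding error there is expected to cost more accuracy.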

Adaptive Allocation

Allocates bit-widths dynamically: sensitive FFN layers keep rigorous 8-bit precision, while redundant attention heads drop to a robust 4-bit.
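
One simple way to turn sensitivity scores into a bit-width plan is a greedy pass under an average-bits budget. The sketch below is an assumption about how such an allocator could look, not a description of LeanRoute's internals; the block names, parameter counts, sensitivity scores, and the `allocate_bits` function are all hypothetical.

```python
def allocate_bits(blocks, avg_bit_budget, low=4, high=8):
    """Greedy mixed-precision plan.

    blocks: list of (name, num_params, sensitivity) tuples.
    Every block starts at `low` bits. Blocks are then visited from most to
    least sensitive per parameter and promoted to `high` bits whenever the
    average bit-width stays within the budget.
    """
    total_params = sum(n for _, n, _ in blocks)
    bit_budget = avg_bit_budget * total_params
    used = low * total_params
    plan = {name: low for name, _, _ in blocks}

    ranked = sorted(blocks, key=lambda b: b[2] / b[1], reverse=True)
    for name, n, _ in ranked:
        extra = (high - low) * n
        if used + extra <= bit_budget:
            plan[name] = high
            used += extra
    return plan, used / total_params


# Hypothetical blocks: (name, parameter count, sensitivity score).
blocks = [
    ("layer0.ffn", 300, 9.0),
    ("layer0.attn", 100, 0.5),
    ("layer1.ffn", 300, 1.2),
    ("layer1.attn", 100, 0.4),
]
plan, avg = allocate_bits(blocks, avg_bit_budget=5.5)
print(plan)
print(f"average bits per weight: {avg:.2f}")
```

With this toy input only the most sensitive FFN block is promoted to 8-bit; everything else stays at 4-bit and the average lands exactly on the 5.5-bit budget.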

Extensible CLI

Unix-philosophy inspired. Start compression jobs natively in your terminal with one simple `leanroute compress` command.

VLLM_COMPATIBLE

Zero-Loss Execution

Compiled artifacts export cleanly and are optimized for direct execution. Boot compressed models locally, with logits that match to four decimal places.
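
A claim like this can be checked by comparing logits from two runs on the same prompt. The sketch below shows one way to do that in plain Python: an agreement test at four decimal places, plus a KL divergence between the resulting next-token distributions. Which reference LeanRoute compares against, and whether its KL Shift metric is computed this way, is not stated here; the helper names and sample logits are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(ref_logits, test_logits):
    """KL(P_ref || P_test) in nats over the next-token distribution."""
    p, q = softmax(ref_logits), softmax(test_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def logits_match(ref_logits, test_logits, decimals=4):
    """True if every pair of logits differs by at most half a unit in the last decimal place."""
    tol = 0.5 * 10 ** (-decimals)
    return all(abs(a - b) <= tol for a, b in zip(ref_logits, test_logits))


reference = [2.0, 1.0, 0.1]
close_run = [2.00001, 1.0, 0.1]   # differs in the fifth decimal place
drifted_run = [2.0, 1.3, 0.1]     # visibly different

print(logits_match(reference, close_run), kl_divergence(reference, close_run))
print(logits_match(reference, drifted_run), kl_divergence(reference, drifted_run))
```

The first pair passes the four-decimal check with a KL divergence near zero; the second fails it and shows a clearly positive divergence.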

Factuality Guard

Internal evaluation sweeps confirm zero hallucinations injected via precision degradation. Bulletproof for RAG pipelines.

V_0.1 // BETA

Ready to compress?

Join the category leaders building deterministic neural networks. Get early API access to LeanRoute.

Limited slots available for Enterprise testing.
