How It Works
Learn how the WebLLM Benchmark Suite measures AI model performance entirely in your browser using WebGPU.
Overview
This benchmark runs large language models (LLMs) completely in your browser using WebGPU — no server, no cloud, no data leaving your device. We measure how fast your hardware can generate AI responses across different types of tasks.
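To make the measurement concrete, here is a minimal sketch of the two moving parts: detecting WebGPU support and timing token generation. The generate callback is a hypothetical stand-in for the suite's actual generation call; only the detection and timing patterns are the point.

```ts
// Minimal sketch: WebGPU detection plus throughput timing.
async function hasWebGPU(): Promise<boolean> {
  const gpu = (navigator as any).gpu;            // cast avoids needing @webgpu/types
  if (!gpu) return false;                        // API not exposed (unsupported browser)
  return (await gpu.requestAdapter()) !== null;  // null adapter = no usable GPU
}

async function tokensPerSecond(
  generate: (prompt: string) => Promise<number>  // hypothetical: resolves to tokens produced
): Promise<number> {
  const t0 = performance.now();
  const tokens = await generate("Explain WebGPU in one paragraph.");
  return tokens / ((performance.now() - t0) / 1000);
}
```

A fuller harness would also report prefill (prompt processing) speed separately from decode speed, since the two stress the hardware differently.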
Technical Deep Dive
Model Quantization
We use 4-bit quantization (the q4f16_1 format: 4-bit weights with 16-bit float activations) to compress models whose weights are normally stored as 16-bit floats. This cuts model size by roughly 75% while retaining 95%+ of output quality, and it is what makes browser deployment practical.
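The ~75% figure is just arithmetic on bits per weight. A sketch, ignoring the small per-group scale overhead that 4-bit formats add on top of the raw weights:

```ts
// Back-of-envelope weight sizes for a model with `params` parameters.
// fp16 stores each weight in 2 bytes; q4 stores it in ~0.5 bytes.
function weightBytes(params: number, bitsPerWeight: number): number {
  return params * (bitsPerWeight / 8);
}

const params = 8e9;                    // e.g. an 8B-parameter model
const fp16 = weightBytes(params, 16);  // ~16 GB
const q4 = weightBytes(params, 4);     // ~4 GB, a ~75% reduction
console.log(`fp16: ${(fp16 / 1e9).toFixed(1)} GB, q4: ${(q4 / 1e9).toFixed(1)} GB`);
```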
WebLLM & MLC Framework
Built on the MLC (Machine Learning Compilation) stack, which is based on Apache TVM. Model graphs (e.g., from PyTorch or ONNX) are compiled ahead of time into WebGPU compute shaders, with aggressive optimizations such as operator fusion applied for maximum throughput.
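For orientation, loading a compiled model from the browser looks roughly like this with WebLLM's published API. The model id is an assumption for illustration and must match an entry in WebLLM's prebuilt model list:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function demo(): Promise<void> {
  // Assumed model id for illustration; pick one from WebLLM's prebuilt list.
  const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f16_1-MLC", {
    // Reports weight download and shader compilation progress.
    initProgressCallback: (report) => console.log(report.text),
  });

  // OpenAI-compatible chat API; inference runs on the local GPU.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Say hello in five words." }],
  });
  console.log(reply.choices[0].message.content);
}
```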
Context Window
Dynamically adjusted to available memory: 4096 tokens on devices with 8GB+ RAM, 2048 on 4-8GB, and 1024 on low-memory devices. Larger windows improve long-context understanding, but the KV cache grows with the window, so they also cost more memory and slow generation.
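A sketch of that tiering, assuming detection via navigator.deviceMemory (Chrome-only, approximate, and capped at a reported value of 8, so this is illustrative rather than the suite's exact logic):

```ts
// Pick a context window from reported device RAM (in GB).
// Chrome caps deviceMemory at 8, so "8GB+" and "exactly 8GB"
// are indistinguishable here; other browsers omit the API entirely.
function pickContextWindow(): number {
  const ram =
    (navigator as Navigator & { deviceMemory?: number }).deviceMemory ?? 4; // assume mid-range if unknown
  if (ram >= 8) return 4096;
  if (ram >= 4) return 2048;
  return 1024;
}
```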