VAI: Zero-Overhead Model Switching for AI Inference
published: true
Description: "Why we treat model weights like ROM, not malloc()"

The Problem

Every time you switch models in a typical inference setup:

1. Unload weights from GPU memory
2. Load new weights from disk
3. Rebuild execution state
4. Warm ...
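The first three steps listed above can be sketched as a small timing harness. This is only an illustration under stated assumptions: `FakeDevice`, `switch_model`, and the in-memory `disk` dictionary are hypothetical stand-ins, not part of VAI or any real inference framework, and the fourth step is left out because the excerpt cuts off before describing it.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeDevice:
    """Hypothetical stand-in for GPU memory (a plain dict, no real device)."""
    resident: dict = field(default_factory=dict)


def switch_model(device, disk, new_name):
    """Conventional switch path; returns seconds spent in each step."""
    timings = {}

    # 1. Unload whatever is currently resident on the device.
    start = time.perf_counter()
    device.resident.clear()
    timings["unload"] = time.perf_counter() - start

    # 2. Load the new weights from "disk" (here just a dict of lists).
    start = time.perf_counter()
    weights = list(disk[new_name])
    device.resident["weights"] = weights
    timings["load"] = time.perf_counter() - start

    # 3. Rebuild execution state that depends on the new weights.
    start = time.perf_counter()
    device.resident["state"] = {"model": new_name, "n_params": len(weights)}
    timings["rebuild"] = time.perf_counter() - start

    return timings


# Assumed toy data: two "models" of different sizes.
disk = {"model_a": [0.0] * 1000, "model_b": [1.0] * 2000}
device = FakeDevice()
switch_model(device, disk, "model_a")
timings = switch_model(device, disk, "model_b")
```

Every call to `switch_model` repeats all of the steps, even when the same model was resident a moment earlier. That repeated work is the cost the list describes.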
Jan 25, 2026 · 4 min read