RequestScopedPipeline: Concurrent Inference in Diffusers without Race Conditions or Memory Duplication
Diffusers pipelines were not designed for concurrency: calling pipe() from multiple threads at once causes race conditions in the scheduler's mutable state and 'Already borrowed' errors in Rust-backed tokenizers, while the obvious workaround of deep-copying the pipeline per request duplicates entire models in memory. My contribution (#12328) introduces RequestScopedPipeline, which solves these issues by creating a lightweight per-request view of the pipeline: heavy model weights stay shared, only small mutable components are cloned, and tokenizer access is automatically guarded with locks. The result is a server that handles multiple concurrent users without exploding GPU memory.
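The core idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the actual implementation from #12328: the class and attribute names here (view, MUTABLE_ATTRS, _LockedTokenizer) are hypothetical, and a real pipeline has more per-request state than just the scheduler. The pattern is: shallow-copy the pipeline so weights are shared, deep-copy only the small stateful parts, and serialize access to the shared tokenizer.

```python
import copy
import threading


class _LockedTokenizer:
    """Hypothetical wrapper: serializes calls so a Rust-backed tokenizer
    is never borrowed by two threads at the same time."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        with self._lock:
            return self._tokenizer(*args, **kwargs)


class RequestScopedPipeline:
    """Sketch: share heavy weights, clone only small mutable state."""

    # Assumption: per-request mutable state lives in these attributes.
    MUTABLE_ATTRS = ("scheduler",)

    def __init__(self, pipeline):
        self._base = pipeline
        self._tokenizer = _LockedTokenizer(pipeline.tokenizer)

    def view(self):
        # Shallow copy: unet/vae/text-encoder references are shared,
        # so no model weights are duplicated in memory.
        local = copy.copy(self._base)
        for name in self.MUTABLE_ATTRS:
            # Deep-copy only the small stateful objects (e.g. the
            # scheduler, whose timestep state mutates during denoising).
            setattr(local, name, copy.deepcopy(getattr(self._base, name)))
        # The tokenizer stays shared but is lock-protected.
        local.tokenizer = self._tokenizer
        return local
```

Each request then calls view() and runs inference on its own copy: two concurrent requests see the same weights but independent scheduler state, so neither can corrupt the other's denoising loop.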