researchactiveJune 2025

Dendrite

O(1) fork LLM inference engine for tree-structured reasoning

◆ What

O(1) fork LLM inference engine for tree-structured reasoning

500nsfork_latency

10000xspeedup

◆ How

RustCUDAFlashInferCandlePagedAttention

Built on: DGX Spark Platform

◆ Why

Pioneer + The Lab

Hero: Flagship projects with major impact

Overview

Agent-native inference engine with constant-time forking for tree search. 1000-10000x faster branching than vLLM/SGLang using copy-on-write KV cache. Built in Rust with FlashInfer integration. Enables efficient MCTS, beam search, and speculative decoding.