Engineers, product leaders, and researchers obsessed with making AI inference faster than anyone thought possible.
James led inference infrastructure at Google Brain before founding Inferex in 2022. He has filed 8 patents in distributed systems and ML optimization. Under his leadership, Inferex has grown to 45+ enterprise customers and 12B+ inferences served.
The story of Inferex told through the people who built it.
James Liu and Sarah Kim left their respective roles at Google Brain and NVIDIA to tackle the biggest unsolved problem in production ML: inference at scale. They wrote the first Inferex kernel optimizer over a long weekend in James's garage.
Marcus Webb joined as VP of Engineering, bringing deep expertise in distributed systems from his time at AWS. The first 5 enterprise customers saw 60%+ latency reductions in production within weeks of deployment.
Priya Patel joined as Head of Product, transforming Inferex from a powerful but rough SDK into a full platform with monitoring, auto-scaling, and a self-service dashboard. Customer count tripled.
45+ enterprise customers. 12B+ inferences served. SOC 2 Type II certified. Sub-8ms average P99 latency. The journey has only just begun.
Sarah Kim: Former NVIDIA research engineer. Architect of Inferex's core kernel optimization layer and hardware abstraction stack.
Marcus Webb: Ex-AWS distributed systems lead. Built the auto-scaling infrastructure that powers Inferex's 1M+ req/s throughput.
Priya Patel: Former product lead at Databricks. Transformed Inferex from an SDK into a full platform used by 45+ enterprise teams.
We're looking for engineers who are obsessed with performance. If you think in nanoseconds and dream in distributed systems, we want to hear from you.
Get in Touch