luis-gasparschroeder
Website Google Scholar LinkedIn

I'm a founding member of technical staff at UniversalAGI, building ML models for physics from first principles. I did research at UC Berkeley's Sky Computing Lab with Joseph E. Gonzalez and Matei Zaharia on efficient inference and systems for LLMs (semantic caching with error guarantees, compound AI orchestration, agentic systems, database query optimization, and sparse attention). Before that, I built production systems at Snowflake and Microsoft and studied computer science at TU Munich and UC Berkeley.

How do you build efficient and reliable systems when the components underneath are probabilistic?

Selected Papers

Optimizing LLM Queries in Relational Workloads

Row and column reordering algorithms to maximize KV cache reuse in batch analytics with LLMs. 3.4× faster job completion, 32% cost reduction.

MLSys 2025

vCache: Semantic Caching with Error Rate Guarantees | Code

First production-ready semantic caching with mathematical error guarantees. Outperforms all baselines on error rate and cache hit rate.

ICLR 2026

vAttention: Dynamic Sparse Attention for Efficient Inference | Code

First practical sparse attention with mathematical accuracy guarantees. Matches full model quality at up to 20× sparsity.

ICLR 2026

ALTO: Compound AI System Orchestration

Automatic optimization of compound AI systems through streaming and parallelism via nested ancestry abstraction. 10-30% latency improvements over LangGraph.

arXiv

The Danger of Overthinking in Agentic Systems

Identifies overthinking in reasoning models, where they favor extended reasoning over environmental interaction. Our mitigation strategies improve performance by 30% and reduce costs by 43%.

arXiv

Pinned

  1. vcache-project/vCache (Public)

    Reliable and Efficient Semantic Prompt Caching with vCache

    Python · 60 stars · 3 forks

  2. Azure/arm-ttk (Public)

    Azure Resource Manager Template Toolkit

    PowerShell · 460 stars · 203 forks

  3. clouditor/clouditor (Public)

    The Clouditor is a tool to support continuous cloud assurance. Developed by Fraunhofer AISEC.

    Go · 85 stars · 21 forks

  4. ls1intum/Artemis (Public)

    Artemis - Interactive Learning with Automated Feedback

    Java · 729 stars · 357 forks

  5. skylight-org/sparse-attention-hub (Public)

    Advancing the frontier of efficient AI

    Python · 54 stars · 5 forks