Asbestos is a high-performance cross-platform research project demonstrating the integration of the llama.cpp C++ inference engine into native mobile and desktop environments, as well as a fully autonomous local agent accessible via the web. This project showcases the ability to run Large Language Models (LLMs) and Multimodal Vision models locally on consumer-grade hardware (CPU) with minimal latency and high privacy, while securely exposing these capabilities remotely using direct port forwarding.
This repository provides a professional-grade implementation of on-device AI for Android, iOS, and Desktop. It leverages a unified C++ backend (llama.cpp) while providing idiomatic native experiences in Kotlin, Swift, and a specialized Vision CLI.
- Cross-Platform Inference Core: Integrated llama.cpp as a high-performance engine across different architectures (ARM64, x86_64).
- Android Native Excellence: Built using the Android NDK and Kotlin. Includes a custom model downloader with real-time feedback and direct storage management.
- iOS Metal Acceleration: Custom-built XCFramework that leverages Apple's Metal and Accelerate frameworks for hardware-accelerated LLM inference.
- Multimodal Vision Support: Integrated LLaVA-style multimodal projection (mmproj) to enable image description and analysis locally via the CLI.
- Memory and Storage Optimization: Implemented efficient memory mapping (mmap) and disk space verification to handle large model weights on resource-constrained devices.
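The memory-mapping idea above can be sketched in Python: map the model file read-only so the OS pages weights in on demand instead of copying the whole file into RAM. This is an illustrative analogue (llama.cpp does this in C++ via `mmap`); the tiny stand-in file here is not a real GGUF model, though real GGUF files do begin with the `GGUF` magic bytes.

```python
import mmap
import os
import tempfile

def map_model(path: str) -> mmap.mmap:
    """Memory-map a model file read-only so pages load lazily on access
    instead of being copied into RAM up front."""
    fd = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size
    mm = mmap.mmap(fd, size, access=mmap.ACCESS_READ)
    os.close(fd)  # the mapping holds its own reference to the file
    return mm

# Demo with a tiny stand-in "model" file (real GGUF files start with b"GGUF").
with tempfile.NamedTemporaryFile(delete=False, suffix=".gguf") as f:
    f.write(b"GGUF" + b"\x00" * 1024)
    path = f.name

mm = map_model(path)
print(mm[:4])  # b'GGUF' -- the magic bytes, read through the mapping
mm.close()
os.unlink(path)
```

Because pages are loaded on demand and remain backed by the file, the same multi-hundred-megabyte weights can be "opened" on a memory-constrained device without an equally large allocation.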
The Android implementation uses JNI (the Java Native Interface) to call the C++ engine.
- Language: Kotlin, C++.
- Build System: Gradle with CMake for the native layer.
- Optimization: Compiled with NDK 26+ and optimized for ARM64-v8a.
- Features: Asynchronous model downloading from Hugging Face with background task management and disk space safety checks.
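The downloader's core loop, stream a response body to disk in chunks while reporting progress, can be sketched platform-neutrally in Python (the app's actual implementation is Kotlin; the function name and callback shape here are illustrative):

```python
import io
from typing import BinaryIO, Callable

def stream_download(src: BinaryIO, dst: BinaryIO, total_bytes: int,
                    on_progress: Callable[[float], None],
                    chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst in fixed-size chunks, reporting fractional
    progress after each chunk so a UI can update in real time."""
    done = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        done += len(chunk)
        on_progress(done / total_bytes)
    return done

# Usage with in-memory streams standing in for an HTTP body and a file.
payload = b"x" * 200_000
updates = []
n = stream_download(io.BytesIO(payload), io.BytesIO(), len(payload), updates.append)
print(n, updates[-1])  # 200000 1.0
```

Running this copy loop on a background task (a coroutine on Android, a Task on iOS) is what keeps the UI responsive during multi-hundred-megabyte model pulls.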
The iOS implementation is split into a reusable Swift Package and a demonstration SwiftUI app.
- Language: Swift.
- Acceleration: Uses Metal for GPU-based inference and Accelerate for CPU-based BLAS operations.
- Build System: XCFramework containing slices for both physical devices and simulators (arm64/x86_64).
- Features: SwiftUI-based chat interface with MainActor-safe state management for real-time token streaming.
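The streaming pattern used here, a background producer emits tokens while a single consumer owns all UI state, can be sketched in Python with asyncio. This is an analogue of the Swift @MainActor approach, not the app's actual code:

```python
import asyncio

async def generate_tokens(out: asyncio.Queue) -> None:
    """Stand-in for the inference callback producing tokens off the main thread."""
    for tok in ["Hello", ",", " world", "!"]:
        await asyncio.sleep(0)  # yield control, as real inference would
        await out.put(tok)
    await out.put(None)         # end-of-stream sentinel

async def ui_actor(inbox: asyncio.Queue) -> str:
    """Single consumer that owns the displayed text. Only this task
    mutates it -- the guarantee @MainActor isolation provides on iOS."""
    text = ""
    while (tok := await inbox.get()) is not None:
        text += tok
    return text

async def main() -> str:
    q: asyncio.Queue = asyncio.Queue()
    _, text = await asyncio.gather(generate_tokens(q), ui_actor(q))
    return text

print(asyncio.run(main()))  # Hello, world!
```

Funneling every state mutation through one actor is what eliminates the race conditions described in the Challenges section below.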
Includes a dedicated runner for multimodal vision tasks. This component bridges the core C++ engine to perform complex visual analysis on-device.
- Tools: Custom `vision.sh` bridge using `llama-cli`.
- Capabilities: Local image understanding, OCR, and descriptive analysis without external APIs.
Includes a fully-featured local AI assistant capable of tool execution (shell commands, file operations, system actions) against the host machine. This component turns the local llama.cpp engine into a proactive agent that can "do things" for the user, while maintaining strict privacy and security through direct, secure port forwarding (via Cloudflare Tunnels, VS Code Dev Tunnels, or ngrok).
- Approach: A Python/FastAPI server wraps the local `llama-server` process to expose a standard OpenAI-compatible API (`/v1/chat/completions`). Public HTTPS tunnels forward this local port to the internet directly, without a central broker.
- Multimodal Support: Fully integrated VLM (Vision Engine) within the agent loop, allowing the agent to "see" and reason about attached images in the chat UI.
- Capabilities: Proactive tool execution with smart error recovery (e.g., automatic `mkdir -p` for missing directories during file-write operations).
- Project Insight: Dedicated codebase comprehension mode providing interactive flowcharts (Mermaid.js), side-by-side code previews, and active recall quizzes with interactive grading. This feature was added to combat "knowledge atrophy" by helping developers maintain a deep mental model of their code rather than just delegating logic to AI. You can test this by scanning the asbestos/test directory within the Insight UI or using the CLI.
- Explain CLI: A terminal-based insight tool (`asbestos-cli/explain.sh`) that provides structured logic analysis and interactive comprehension checks directly in your shell.
- Safety: Robust human-in-the-loop intercept mechanism. The agent pauses its execution loop and requests user approval via a persistent confirmation-ID system for any destructive commands or file writes.
- Accessibility: Securely control your local computer from anywhere (e.g., a smartphone) via a self-hosted, responsive Web Chat UI.
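The confirmation-ID intercept and the automatic `mkdir -p` recovery described above can be sketched together in Python. Function and variable names here are hypothetical; the real agent lives in asbestos-agent:

```python
import os
import tempfile
import uuid

PENDING: dict[str, dict] = {}  # confirmation_id -> parked tool call

def request_write(path: str, content: str) -> str:
    """File writes are destructive, so the agent parks the call under a
    confirmation ID and waits for explicit human approval."""
    cid = uuid.uuid4().hex[:8]
    PENDING[cid] = {"path": path, "content": content}
    return cid

def approve(cid: str) -> str:
    """Human approved: execute the parked write, with smart recovery."""
    call = PENDING.pop(cid)
    # Recovery step: create missing parent directories (mkdir -p)
    # rather than failing the write with FileNotFoundError.
    os.makedirs(os.path.dirname(call["path"]) or ".", exist_ok=True)
    with open(call["path"], "w") as f:
        f.write(call["content"])
    return call["path"]

# Usage: nothing touches the disk until the human approves the ID.
root = tempfile.mkdtemp()
cid = request_write(os.path.join(root, "notes", "todo.txt"), "ship it")
path = approve(cid)
print(os.path.exists(path), cid in PENDING)  # True False
```

Keeping the pending call keyed by a persistent ID is what lets approval arrive later, e.g. from a phone over the tunnel, without the agent losing its place in the loop.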
Read more: asbestos-agent/README.md
This project is optimized for the Qwen series of models due to their exceptional performance-to-size ratio.
- Primary Model: Qwen 3.5 0.8B (Quantized GGUF)
- Vision Projector: mmproj Qwen 3.5 0.8B (bf16)
The Vision component can be executed directly to analyze images:
```sh
./asbestos-cli/vision.sh <path_to_image> "Describe this image in detail."
```

Building on-device AI requires overcoming several hardware and software limitations.
- Challenge: The initial Android build failed during native compilation because the llama.cpp project required newer CMake features than the default Android Studio environment provided.
- Solution: Manually configured the app's build.gradle.kts to use CMake 3.22.1 and disabled incompatible optimization flags (GGML_CPU_KLEIDIAI) that were causing library fetch errors.
- Challenge: Transferring 800MB+ model files to a mobile device often resulted in IOExceptions or system crashes when the device reached capacity.
- Solution: Implemented pre-download disk-space verification using Android's StatFs and Swift's FileManager. The app now checks the available free space before beginning a download.
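The same pre-download check can be sketched platform-neutrally in Python with `shutil.disk_usage` (the Kotlin version uses StatFs and the Swift version FileManager; the function name and 10% headroom figure here are illustrative, not the app's exact policy):

```python
import shutil

def has_room_for(path: str, model_bytes: int, headroom: float = 1.1) -> bool:
    """Refuse to start a download unless free space covers the model
    plus ~10% headroom, so the device never fills mid-transfer."""
    free = shutil.disk_usage(path).free
    return free >= int(model_bytes * headroom)

# An 800 MB model needs roughly 880 MB free before the download begins.
print(has_room_for(".", 800 * 1024 * 1024))
```

Running this check before the first byte is written turns a mid-download IOException into an up-front, user-visible error message.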
- Challenge: Updating the UI during real-time token generation caused race conditions and thread-safety warnings when coming from background C++ callbacks.
- Solution: Refactored the LlamaState manager to use Swift's @MainActor isolation. All background token updates are now wrapped in Task { @MainActor in } blocks.
- Challenge: Creating a unified bundle for iOS and Simulators initially failed because modern llama.cpp headers were missing from the search path during module-map generation.
- Solution: Wrote a custom build-ios-xcframework.sh script that explicitly handles header copying and module-map linking.
- Multimodal Vision for iOS and Android: Bring the image understanding capabilities (currently CLI-only) to the mobile apps by integrating the mmproj vision projector, an image picker UI, and combined image+text tokenization.
- Conversation Persistence: Save and restore chat history across app sessions using local storage (Core Data on iOS, Room on Android).
- Voice Input: Add speech-to-text support for hands-free interaction with the AI assistant.
- Model Selection: Allow users to switch between different downloaded models from within the app.
Developed as a technical showcase for local, private, and high-performance mobile AI.

