Skip to content

used to get information from database using natural language

License

Notifications You must be signed in to change notification settings

behindSolution/laragrep

Repository files navigation

LaraGrep

Latest Version Tests

Transform natural language questions into safe, parameterized SQL queries using AI.
LaraGrep uses an agent loop — the AI executes queries, sees the results,
and iteratively reasons until it can provide a final answer.


Quick Start

1. Install

composer require behindsolution/laragrep

2. Publish config and migrations

php artisan vendor:publish --tag=laragrep-config
php artisan vendor:publish --tag=laragrep-migrations

3. Create the SQLite database and run migrations

LaraGrep stores conversations, monitor logs, and recipes in a separate SQLite database by default, keeping everything isolated from your main database.

Create the file and run migrations:

# Linux / macOS
touch database/laragrep.sqlite

# Windows
type nul > database\laragrep.sqlite

Add a laragrep connection to your config/database.php:

'connections' => [
    // ... your existing connections

    'laragrep' => [
        'driver' => 'sqlite',
        'database' => database_path('laragrep.sqlite'),
        'foreign_key_constraints' => true,
    ],
],

Then point LaraGrep to it in your .env:

LARAGREP_CONVERSATION_CONNECTION=laragrep
LARAGREP_MONITOR_CONNECTION=laragrep
LARAGREP_RECIPES_CONNECTION=laragrep

Run the migrations:

php artisan migrate

Already using SQLite as your main database? You can skip the connection setup — LaraGrep will use the default sqlite connection as-is.

4. Add your API key to .env

LARAGREP_PROVIDER=openai
LARAGREP_API_KEY=sk-...
LARAGREP_MODEL=gpt-4o-mini

5. Define your tables in config/laragrep.php

use LaraGrep\Config\Table;
use LaraGrep\Config\Column;
use LaraGrep\Config\Relationship;

'contexts' => [
    'default' => [
        // ...
        'tables' => [
            Table::make('users')
                ->description('Registered users.')
                ->columns([
                    Column::id(),
                    Column::string('name'),
                    Column::string('email'),
                    Column::timestamp('created_at'),
                ]),

            Table::make('orders')
                ->description('Customer orders.')
                ->columns([
                    Column::id(),
                    Column::bigInteger('user_id')->unsigned(),
                    Column::decimal('total', 10, 2),
                    Column::enum('status', ['pending', 'paid', 'cancelled']),
                    Column::timestamp('created_at'),
                ])
                ->relationships([
                    Relationship::belongsTo('users', 'user_id'),
                ]),
        ],
    ],
],

6. Ask your first question

curl -X POST http://localhost/laragrep \
  -H "Content-Type: application/json" \
  -d '{"question": "How many users registered this week?"}'
{
    "summary": "There were 42 new registrations this week.",
    "conversation_id": "550e8400-e29b-41d4-a716-446655440000"
}

That's it. LaraGrep validates, executes, and answers automatically.


Monitor

LaraGrep includes a built-in monitoring dashboard. Enable it to track every query, error, token usage, and performance metric.

Enable

LARAGREP_MONITOR_ENABLED=true

Access the dashboard at GET /laragrep/monitor:

  • Logs — Filterable list of all queries with status, duration, iterations, and token estimates
  • Overview — Aggregate stats: success rate, errors, token usage, daily charts, top scopes
  • Detail — Full agent loop trace for each query: SQL, bindings, results, AI reasoning

Protect it with middleware:

// config/laragrep.php
'monitor' => [
    'enabled' => true,
    'middleware' => ['auth:sanctum'],
],

Async Mode

The agent loop can take 30-100+ seconds with multiple iterations, easily exceeding PHP or Nginx timeouts. Async mode dispatches the processing to a queue job and returns immediately.

Enable

LARAGREP_ASYNC_ENABLED=true
LARAGREP_ASYNC_QUEUE_CONNECTION=redis

Requires a real queue driver (redis, database, sqs, etc.). LaraGrep will throw an exception at boot if the queue connection uses the sync driver.

When enabled, all requests become async — the frontend doesn't decide, the backend does.

How It Works

POST /laragrep { "question": "..." }

-> 202 Accepted
{
    "query_id": "550e8400-e29b-41d4-a716-446655440000",
    "channel": "laragrep.550e8400-e29b-41d4-a716-446655440000"
}

The agent loop runs in a background job. When it finishes, the result is delivered via broadcasting (WebSocket) and/or polling (GET endpoint).

Polling

GET /laragrep/queries/{query_id}

Returns the current status:

{ "status": "processing" }

While the AI is working, the response includes a progress message describing the current step:

{ "status": "processing", "progress": "Counting users registered this week" }

Or when completed:

{
    "status": "completed",
    "summary": "There were 42 new registrations this week.",
    "conversation_id": "...",
    "recipe_id": 42
}

Or on failure:

{ "status": "failed", "error": "Sorry, something went wrong..." }

Broadcasting (Optional)

If you have Laravel broadcasting configured (Reverb, Pusher, Soketi, Ably), LaraGrep broadcasts two events on the channel returned in the response:

Event Payload
laragrep.answer.progress queryId, iteration, message
laragrep.answer.ready queryId, summary, conversationId, recipeId
laragrep.answer.failed queryId, error

Frontend example (Laravel Echo):

Echo.channel(response.channel)
    .listen('.laragrep.answer.progress', (e) => {
        showProgress(e.message); // "Counting users registered this week"
    })
    .listen('.laragrep.answer.ready', (e) => {
        showAnswer(e.summary);
    })
    .listen('.laragrep.answer.failed', (e) => {
        showError(e.error);
    });

For private channels, set LARAGREP_ASYNC_PRIVATE=true and register the channel authorization in your routes/channels.php:

Broadcast::channel('laragrep.{queryId}', function ($user, $queryId) {
    return true; // your authorization logic
});

Broadcasting is entirely optional — polling via GET works without any broadcasting setup. If you only want polling, make sure broadcasting is disabled in your .env:

BROADCAST_CONNECTION=null

Completed records cleanup

Async records are automatically cleaned up after 24 hours (configurable via LARAGREP_ASYNC_RETENTION_HOURS).


How It Works

Unlike simple text-to-SQL tools, LaraGrep uses an agent loop:

  1. You ask a question in natural language
  2. The AI analyzes the schema and decides which queries to run
  3. LaraGrep validates and executes the queries safely
  4. The AI sees the results and decides: run more queries, or provide the final answer
  5. Repeat until the AI has enough data to answer (up to max_iterations)

This means the AI can build on previous results, self-correct, break down complex analysis into steps, and batch independent queries in a single iteration.

"How many users and how many orders do I have?"

  -> AI: Sends 2 queries in one batch (independent)        (1 API call)
  -> AI: Sees both results, provides the final answer       (1 API call)

Configuration

AI Provider

OpenAI:

LARAGREP_PROVIDER=openai
LARAGREP_API_KEY=sk-...
LARAGREP_MODEL=gpt-4o-mini

Anthropic:

LARAGREP_PROVIDER=anthropic
LARAGREP_API_KEY=sk-ant-...
LARAGREP_MODEL=claude-sonnet-4-20250514

Ollama (local):

LARAGREP_PROVIDER=openai
LARAGREP_API_KEY=ollama
LARAGREP_MODEL=qwen3-coder:30b
LARAGREP_BASE_URL=http://localhost:11434/v1/chat/completions

Ollama exposes an OpenAI-compatible API, so it works with the openai provider. The API key can be any non-empty string. This keeps your data fully local.

Fallback Provider

If the primary provider fails (timeout, rate limit, API down), LaraGrep can automatically retry with a fallback:

LARAGREP_FALLBACK_PROVIDER=anthropic
LARAGREP_FALLBACK_API_KEY=sk-ant-...
LARAGREP_FALLBACK_MODEL=claude-sonnet-4-20250514

Works in any direction — OpenAI primary with Anthropic fallback, or vice versa. When the primary succeeds, the fallback is never called. No cooldown, no circuit breaker — just tries in order.

Schema Loading Mode

Mode Behavior
manual Only use tables defined in config (default)
auto Auto-load from information_schema (MySQL/MariaDB/PostgreSQL)
merged Auto-load first, then overlay config definitions
LARAGREP_SCHEMA_MODE=manual
  • manual is the safest — no accidental schema exposure.
  • auto is ideal for quick setup when all tables are fair game.
  • merged lets you auto-load and then add descriptions, relationships, or extra tables on top.

Table Definitions

Define tables using fluent classes with IDE autocomplete:

use LaraGrep\Config\Table;
use LaraGrep\Config\Column;
use LaraGrep\Config\Relationship;

Table::make('orders')
    ->description('Customer orders.')
    ->columns([
        Column::id(),
        Column::bigInteger('user_id')->unsigned()->description('FK to users.id.'),
        Column::decimal('total', 10, 2)->description('Order total.'),
        Column::enum('status', ['pending', 'paid', 'cancelled']),
        Column::json('metadata')
            ->description('Order metadata')
            ->template(['shipping_method' => 'express', 'tracking_code' => 'BR123456789']),
        Column::timestamp('created_at'),
    ])
    ->relationships([
        Relationship::belongsTo('users', 'user_id'),
    ]),

Supported column types: id(), bigInteger(), integer(), smallInteger(), tinyInteger(), string(), text(), decimal(), float(), boolean(), date(), dateTime(), timestamp(), json(), enum().

Modifiers: ->unsigned(), ->nullable(), ->description().

For JSON columns, ->template() provides an example structure so the AI knows how to query with JSON_EXTRACT.

Organizing Large Schemas

For projects with many tables, extract each definition into its own class:

// app/LaraGrep/Tables/OrdersTable.php
namespace App\LaraGrep\Tables;

use LaraGrep\Config\Table;
use LaraGrep\Config\Column;
use LaraGrep\Config\Relationship;

class OrdersTable
{
    public static function define(): Table
    {
        return Table::make('orders')
            ->description('Customer orders.')
            ->columns([
                Column::id(),
                Column::bigInteger('user_id')->unsigned(),
                Column::decimal('total', 10, 2),
                Column::timestamp('created_at'),
            ])
            ->relationships([
                Relationship::belongsTo('users', 'user_id'),
            ]);
    }
}
// config/laragrep.php
'tables' => [
    \App\LaraGrep\Tables\UsersTable::define(),
    \App\LaraGrep\Tables\OrdersTable::define(),
    \App\LaraGrep\Tables\ProductsTable::define(),
],

Multi-Connection Tables

When some tables live in a different database, use ->connection() to tell LaraGrep which connection to use for queries on that table:

'tables' => [
    Table::make('users')
        ->description('Registered users.')
        ->columns([
            Column::id(),
            Column::string('name'),
            Column::string('email'),
        ]),

    Table::make('analytics_events')
        ->description('Columnar analytics store.')
        ->connection('clickhouse', 'ClickHouse')
        ->columns([
            Column::string('event_name'),
            Column::timestamp('event_time'),
            Column::bigInteger('user_id'),
        ]),
],

The second parameter is optional and describes the database engine. This is important when the external database uses a different SQL dialect (e.g., ClickHouse, PostgreSQL, SQLite) — the AI will generate compatible syntax for each table.

// Connection only (same engine as the primary database)
->connection('replica')

// Connection + engine (different SQL dialect)
->connection('clickhouse', 'ClickHouse')

When the AI encounters tables on different connections, it will:

  1. Generate engine-compatible SQL for each table
  2. Include the connection name in each query entry so the executor runs it on the right database
  3. Avoid cross-connection JOINs — instead, it queries each database separately and combines the results in the final answer

Multi-Tenant / Dynamic Connections

In multi-tenant applications where each tenant has its own database, the connection name is only known at runtime. Pass a Closure instead of a string to resolve the connection dynamically:

'contexts' => [
    'default' => [
        'connection' => fn () => 'tenant_' . tenant()->id,
        'tables' => [
            Table::make('users')->columns([...]),
            Table::make('orders')->columns([...]),
        ],
    ],
],

The closure is evaluated per-request, so it works in HTTP (middleware sets the tenant), queue jobs, and artisan commands — as long as your tenant context is available.

You can mix dynamic and static connections. For example, tenant tables on a dynamic connection and shared tables on a fixed central database:

'contexts' => [
    'default' => [
        'connection' => fn () => app('tenant')->getConnectionName(),
        'tables' => [
            Table::make('orders')->columns([...]),

            Table::make('plans')
                ->connection('central')
                ->columns([...]),
        ],
    ],
],

Table-level connections also accept closures:

Table::make('orders')->connection(fn () => 'tenant_' . tenant()->id)

Named Scopes (Contexts)

Work with multiple databases or table sets:

'contexts' => [
    'default' => [
        'connection' => env('LARAGREP_CONNECTION'),
        'tables' => [...],
    ],
    'analytics' => [
        'connection' => 'analytics_db',
        'schema_mode' => 'auto',
        'database' => ['type' => 'MariaDB 10.6', 'name' => 'analytics'],
        'exclude_tables' => ['migrations', 'jobs'],
    ],
],

Select a scope via the URL: POST /laragrep/analytics

Query Protection

LARAGREP_MAX_ROWS=20
LARAGREP_MAX_QUERY_TIME=3
  • max_rows — Automatically injects LIMIT into queries that don't have one. Default: 20. Set to 0 to disable.
  • max_query_time — Maximum execution time per query in seconds. Kills slow queries before they block the database. Default: 3. Supports MySQL, MariaDB, PostgreSQL, and SQLite.

Agent Loop

LARAGREP_MAX_ITERATIONS=10

Simple questions typically resolve in 1-2 iterations. Complex analytical questions may need more. Higher values increase capability but also cost.

Smart Schema

For large databases, LaraGrep can make an initial AI call to identify only the relevant tables, reducing token usage across all iterations.

LARAGREP_SMART_SCHEMA=20

Activates automatically when the table count reaches the threshold. With 200 tables and only 5 relevant, this reduces token usage by ~60%.

Conversation Persistence

Multi-turn conversations are enabled by default. Previous questions and answers are sent as context for follow-ups.

LARAGREP_CONVERSATION_ENABLED=true
LARAGREP_CONVERSATION_CONNECTION=sqlite
LARAGREP_CONVERSATION_MAX_MESSAGES=10
LARAGREP_CONVERSATION_RETENTION_DAYS=10

Route Protection

'route' => [
    'prefix' => 'laragrep',
    'middleware' => ['auth:sanctum'],
],

Usage

API Endpoint

POST /laragrep/{scope?}

Basic request:

curl -X POST http://localhost/laragrep \
  -H "Content-Type: application/json" \
  -d '{"question": "How many users registered this week?"}'

With authentication and options:

curl -X POST http://localhost/laragrep \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "question": "How many users registered this week?",
    "conversation_id": "optional-uuid-for-follow-ups",
    "debug": true
  }'

Using a named scope:

curl -X POST http://localhost/laragrep/analytics \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the top 5 products by revenue?"}'

Debug response (when debug: true):

{
    "summary": "There were 42 new registrations this week.",
    "conversation_id": "550e8400-e29b-41d4-a716-446655440000",
    "steps": [
        {
            "query": "SELECT COUNT(*) as total FROM users WHERE created_at >= ?",
            "bindings": ["2025-01-20"],
            "results": [{"total": 42}],
            "reason": "Counting users registered in the current week"
        }
    ],
    "debug": {
        "queries": [
            {"query": "SELECT COUNT(*) ...", "bindings": ["..."], "time": 1.23}
        ],
        "iterations": 1
    }
}

Programmatic Usage

use LaraGrep\LaraGrep;

$laraGrep = app(LaraGrep::class);

$answer = $laraGrep->answerQuestion(
    question: 'How many orders were placed today?',
    scope: 'default',
);

echo $answer['summary'];

Formatting Results

Use formatResult() to transform raw query results into structured formats via AI.

Query format — a single consolidated SQL query for export:

$answer = $laraGrep->answerQuestion('Weekly sales by region');

$result = $laraGrep->formatResult($answer, 'query');
// [
//     'title' => 'Weekly Sales by Region',
//     'headers' => ['Region', 'Total Sales', 'Order Count'],
//     'query' => 'SELECT r.name as region, SUM(o.total) ... GROUP BY r.name',
//     'bindings' => ['2026-02-01'],
// ]

Returns the SQL itself, no LIMIT. Use it with Laravel's streaming tools:

// Stream with cursor
foreach (DB::cursor($result['query'], $result['bindings']) as $row) {
    // process row
}

// Chunk for batch processing
DB::table(DB::raw("({$result['query']}) as sub"))
    ->setBindings($result['bindings'])
    ->chunk(1000, function ($rows) {
        // process chunk
    });

Notification format — ready-to-render content for email, Slack, or webhooks:

$notification = $laraGrep->formatResult($answer, 'notification');
// [
//     'title' => 'Weekly Sales Report',
//     'html' => '<p>Sales this week totaled...</p><table>...</table>',
//     'text' => 'Sales this week totaled...\nProduct | Revenue...',
// ]

Saved Queries (Recipes)

Auto-save a "recipe" after each answer — the question, scope, and queries that worked. The response includes a recipe_id for exports, notifications, or scheduled re-execution.

Enable:

LARAGREP_RECIPES_ENABLED=true

After enabling, publish and run the migration for the laragrep_recipes table.

API response with recipe:

{
    "summary": "Sales this week totaled...",
    "conversation_id": "uuid",
    "recipe_id": 42
}

Dispatch a recipe:

curl -X POST http://localhost/laragrep/recipes/42/dispatch \
  -H "Content-Type: application/json" \
  -d '{"format": "notification", "period": "now"}'

The period parameter controls timing:

  • "now" — immediate execution (default)
  • "2026-02-10 08:00:00" — scheduled for a specific date/time

LaraGrep fires a RecipeDispatched event. Your app handles the rest via a listener:

// app/Listeners/HandleRecipeDispatch.php
use LaraGrep\Events\RecipeDispatched;

public function handle(RecipeDispatched $event)
{
    $job = new ProcessRecipeJob($event->recipe, $event->format, $event->userId);

    if ($event->period === 'now') {
        dispatch($job);
    } else {
        dispatch($job)->delay(Carbon::parse($event->period));
    }
}
// app/Jobs/ProcessRecipeJob.php
use LaraGrep\LaraGrep;

public function handle(LaraGrep $laraGrep)
{
    $answer = $laraGrep->replayRecipe($this->recipe);
    $result = $laraGrep->formatResult($answer, $this->format);

    // Send email, generate Excel, post to Slack, etc.
}

Programmatic usage:

use LaraGrep\LaraGrep;

$laraGrep = app(LaraGrep::class);

// First run
$answer = $laraGrep->answerQuestion('Weekly sales by region');
$recipe = $laraGrep->extractRecipe($answer, 'Weekly sales by region', 'default');

// Later — replay with fresh data
$freshAnswer = $laraGrep->replayRecipe($recipe);
$notification = $laraGrep->formatResult($freshAnswer, 'notification');

With monitor enabled? Inject LaraGrep\Monitor\MonitorRecorder instead of LaraGrep. It wraps the same methods (answerQuestion, replayRecipe, formatResult) and automatically records every execution in the dashboard. When the monitor is disabled, MonitorRecorder resolves to null — so use LaraGrep as the safe default.


Extending

Custom AI Client

Implement LaraGrep\Contracts\AiClientInterface and rebind in a service provider:

$this->app->singleton(AiClientInterface::class, fn () => new MyCustomClient());

Custom Metadata Loader

LaraGrep auto-detects MySQL/MariaDB and PostgreSQL. For other databases, implement LaraGrep\Contracts\MetadataLoaderInterface:

$this->app->singleton(MetadataLoaderInterface::class, fn ($app) => new MySqliteSchemaLoader($app['db']));

Custom Conversation Store

Implement LaraGrep\Contracts\ConversationStoreInterface for Redis, file-based storage, etc.:

$this->app->singleton(ConversationStoreInterface::class, fn () => new RedisConversationStore());

Environment Variables

Variable Default Description
LARAGREP_PROVIDER openai AI provider (openai, anthropic)
LARAGREP_API_KEY API key for the AI provider
LARAGREP_MODEL gpt-4o-mini Model identifier
LARAGREP_BASE_URL Override API endpoint URL
LARAGREP_MAX_TOKENS 1024 Max response tokens
LARAGREP_TIMEOUT 300 HTTP timeout in seconds
LARAGREP_FALLBACK_PROVIDER Fallback AI provider
LARAGREP_FALLBACK_API_KEY Fallback API key
LARAGREP_FALLBACK_MODEL Fallback model identifier
LARAGREP_FALLBACK_BASE_URL Fallback API endpoint URL
LARAGREP_MAX_ITERATIONS 10 Max query iterations per question
LARAGREP_MAX_ROWS 20 Max rows per query (auto LIMIT)
LARAGREP_MAX_QUERY_TIME 3 Max query execution time (seconds)
LARAGREP_SMART_SCHEMA Table count threshold for smart filtering
LARAGREP_SCHEMA_MODE manual Schema loading mode
LARAGREP_USER_LANGUAGE en AI response language
LARAGREP_CONNECTION Database connection name
LARAGREP_DATABASE_TYPE DB type hint for AI
LARAGREP_DATABASE_NAME DB_DATABASE DB name hint for AI
LARAGREP_EXCLUDE_TABLES Comma-separated tables to hide
LARAGREP_DEBUG false Enable debug mode
LARAGREP_ROUTE_PREFIX laragrep API route prefix
LARAGREP_CONVERSATION_ENABLED true Enable conversation persistence
LARAGREP_CONVERSATION_CONNECTION sqlite DB connection for conversations
LARAGREP_CONVERSATION_MAX_MESSAGES 10 Max messages per conversation
LARAGREP_CONVERSATION_RETENTION_DAYS 10 Auto-delete conversations after days
LARAGREP_MONITOR_ENABLED false Enable monitoring dashboard
LARAGREP_MONITOR_CONNECTION sqlite DB connection for monitor logs
LARAGREP_MONITOR_TABLE laragrep_logs Table name for monitor logs
LARAGREP_MONITOR_RETENTION_DAYS 30 Auto-delete logs after days
LARAGREP_RECIPES_ENABLED false Enable recipe auto-save
LARAGREP_RECIPES_CONNECTION sqlite DB connection for recipes
LARAGREP_RECIPES_TABLE laragrep_recipes Table name for recipes
LARAGREP_RECIPES_RETENTION_DAYS 30 Auto-delete recipes after days
LARAGREP_ASYNC_ENABLED false Enable async mode
LARAGREP_ASYNC_CONNECTION laragrep DB connection for async table
LARAGREP_ASYNC_TABLE laragrep_async Table name for async records
LARAGREP_ASYNC_RETENTION_HOURS 24 Auto-delete records after hours
LARAGREP_ASYNC_QUEUE default Queue name for async jobs
LARAGREP_ASYNC_QUEUE_CONNECTION Queue connection (falls back to default)
LARAGREP_ASYNC_CHANNEL_PREFIX laragrep Broadcasting channel prefix
LARAGREP_ASYNC_PRIVATE false Use private broadcasting channels

Security

  • Only SELECT queries are generated and executed — mutations are rejected.
  • All queries use parameterized bindings to prevent SQL injection.
  • Table references are validated against the known schema metadata.
  • The agent loop is capped at max_iterations to prevent runaway costs.
  • Protect the endpoint with middleware (e.g., auth:sanctum).

Testing

./vendor/bin/phpunit

License

MIT

About

used to get information from database using natural language

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published