
DeepSer – Building My Own Perplexity Web Search

DeepSer is a weekend system-design project in which I built a Perplexity-like web search engine from scratch.
Given a user query, the system uses an LLM to generate search queries, fetches relevant web pages, scrapes them with browser automation, summarizes them with an LLM, and finally produces a structured AI-generated report.

This project focuses on async systems, browser-based scraping, queues, and AI pipelines, and is built purely for learning and experimentation.

Detailed blog post:

https://medium.com/@yashraj504300/building-my-own-perplexity-web-search-f6ce5cfa5d7c


Architecture Diagram

🔍 What This Project Does (Architecture in 5 Steps)

  1. User enters a query in the frontend.
  2. An LLM converts the query into multiple web search queries.
  3. Brave Search API fetches URLs → URLs are pushed to RabbitMQ.
  4. An async Playwright-based scraper consumes URLs from the queue, extracts page content, and summarizes it using LLMs.
  5. Redis tracks progress, and the final report is served back to the frontend.
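
The flow in steps 3 to 5 can be sketched with the standard library alone. This is only an illustration of the shape of the pipeline, not the project's actual code: an `asyncio.Queue` stands in for RabbitMQ, a plain dict stands in for Redis, and `run_pipeline`, `fake_summarize`, and the `workers` parameter are hypothetical names.

```python
import asyncio


async def run_pipeline(urls, summarize, workers=2):
    """Push URLs through a queue, summarize each one, and track progress."""
    queue = asyncio.Queue()                      # stand-in for RabbitMQ
    progress = {"total": len(urls), "done": 0}   # stand-in for Redis counters
    summaries = {}

    for url in urls:
        queue.put_nowait(url)

    async def worker():
        while True:
            url = await queue.get()
            try:
                summaries[url] = await summarize(url)
            except Exception as exc:
                # Record the failure instead of killing the worker.
                summaries[url] = f"error: {exc}"
            finally:
                progress["done"] += 1
                queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await queue.join()                           # wait until every URL is handled
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return summaries, progress


async def fake_summarize(url):
    """Placeholder for the scrape + LLM summary step."""
    await asyncio.sleep(0)
    return f"summary of {url}"
```

Calling `asyncio.run(run_pipeline(["https://example.com"], fake_summarize))` returns the summaries together with the final progress counters, which is roughly what the frontend would read back while polling.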

🧠 Why Playwright (Not Selenium)

  • Native async support (perfect for FastAPI + asyncio)
  • Faster startup and execution via DevTools Protocol
  • Official Docker images with browsers preinstalled
  • Lower memory usage when running parallel scraping tasks

The scraper container keeps the browser warm, reusing it across tasks for performance.
Concurrency is controlled using semaphores to limit the number of open tabs and avoid crashes in Docker.
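
A minimal sketch of the semaphore idea, assuming each scrape opens one tab on a shared, already-launched browser. The names `scrape_all`, `scrape_one`, and `max_tabs` are hypothetical; in a real worker, `scrape_one` would open a page on the warm browser (for example with Playwright's `browser.new_page()`), load the URL, and close the page afterwards.

```python
import asyncio


async def scrape_all(urls, scrape_one, max_tabs=5):
    """Run scrape_one over all URLs with at most max_tabs running at once."""
    semaphore = asyncio.Semaphore(max_tabs)

    async def guarded(url):
        # Only max_tabs coroutines can be inside this block at the same time,
        # so the number of open tabs never exceeds the limit.
        async with semaphore:
            return await scrape_one(url)

    # return_exceptions=True keeps one failing page from cancelling the rest;
    # results come back in the same order as the input URLs.
    return await asyncio.gather(
        *(guarded(url) for url in urls), return_exceptions=True
    )
```

Because the limit is enforced in one place, tuning it for a given container's memory is a one-line change.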


🏗️ Tech Stack

  • Frontend: Next.js
  • Backend: FastAPI (API layer)
  • Scraping: Playwright (async, Chromium)
  • LLMs: Groq APIs
  • Search: Brave Search API
  • Queue: RabbitMQ
  • State Tracking: Redis
  • Database: PostgreSQL
  • Infra: Docker & Docker Compose

⚙️ Setup Instructions

1️⃣ Clone the Repository

git clone https://github.com/YashRaj1506/DeepSer.git
cd DeepSer

2️⃣ Create a .env File

Create a .env file in the root directory and paste the following:

BRAVE_API_KEY=
GROQ_API_KEY_1=
GROQ_API_KEY_2=
GROQ_API_KEY_3=
GROQ_API_KEY_4=
GROQ_API_KEY_5=

# Database
POSTGRES_HOST=postgres
POSTGRES_PORT=5432
POSTGRES_DB=marketresearch
POSTGRES_USER=postgres
POSTGRES_PASSWORD=password

# RabbitMQ
RABBITMQ_HOST=rabbitmq
RABBITMQ_USER=guest
RABBITMQ_PASS=guest

# Frontend / Backend
NEXT_PUBLIC_API_URL=http://localhost:8000
FRONTEND_URL=http://localhost:3000

# Django
DJANGO_SECRET_KEY=your-django-secret-key-here
DJANGO_DEBUG=True

# Google OAuth (optional)
GOOGLE_CLIENT_ID=
GOOGLE_CLIENT_SECRET=

⚠️ Make sure you add valid Brave Search and Groq API keys.
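
The README does not say how the five Groq keys are used. One plausible reading is that requests rotate across them to spread rate limits; the sketch below shows that approach under that assumption. `load_groq_keys` and `key_cycle` are hypothetical helpers, not functions from the repository.

```python
import itertools
import os


def load_groq_keys(env=None, prefix="GROQ_API_KEY_", max_keys=5):
    """Collect the non-empty numbered keys (GROQ_API_KEY_1 .. GROQ_API_KEY_5)."""
    env = os.environ if env is None else env
    keys = []
    for i in range(1, max_keys + 1):
        value = env.get(f"{prefix}{i}", "").strip()
        if value:                      # skip slots left blank in .env
            keys.append(value)
    if not keys:
        raise RuntimeError(f"no {prefix}N variables are set")
    return keys


def key_cycle(keys):
    """Return an endless round-robin iterator over the keys (assumed strategy)."""
    return itertools.cycle(keys)
```

A caller would take `next(cycle)` before each LLM request, so a blank slot in `.env` simply shrinks the rotation instead of causing an authentication error.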

3️⃣ Build & Run with Docker

docker compose up --build

🚧 Improvements / TODO

  • Batch URLs in RabbitMQ and perform atomic bulk upserts to reduce DB costs.
  • Replace frontend polling with Server-Sent Events (SSE).
  • Reuse RabbitMQ connections instead of creating one per publish.
  • Improve retries, failure handling, and observability.
  • Smarter deduplication and ranking of sources.
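
As an illustration of the SSE item, the sketch below formats progress snapshots as Server-Sent Events frames. This is not implemented in the project; `sse_event` and `progress_stream` are hypothetical names, and the event names `progress` and `done` are assumptions.

```python
import json


def sse_event(data, event=None):
    """Format one SSE frame: optional event name, a data line, blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data line.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def progress_stream(snapshots):
    """Yield one frame per progress snapshot, then a final 'done' frame."""
    for snapshot in snapshots:
        yield sse_event(snapshot, event="progress")
    yield sse_event({"status": "complete"}, event="done")
```

In FastAPI, a generator like this could be returned through `StreamingResponse` with `media_type="text/event-stream"`, with the snapshots read from Redis instead of a list.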
