Research - OCR & RAG for Indic languages #607

@PritamSGB

Is your feature request related to a problem? Please describe.
As part of the CBC project, we have use cases in several Indic languages, e.g. Telugu, Gujarati, Assamese, and Odia, some of which are low-resource languages. The knowledge base documents are also poorly scanned and mix English with local languages. We therefore need to evaluate more models and providers to find the best balance of accuracy, latency, and cost.

Describe the solution you'd like

  • Evaluate different OCR models, or combinations of them.
  • Evaluate OpenAI and Gemini for improving RAG accuracy.
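
One possible way to compare OCR candidates on accuracy and latency is sketched below. It is only an illustration, not an agreed evaluation plan: the `benchmark` helper, the engine callables, and the sample format (image path plus hand-typed ground truth) are all assumptions. Accuracy is measured as character error rate (CER) over Unicode code points after NFC normalization, which matters for Indic scripts where the same text can be encoded in more than one way.

```python
import time
import unicodedata
from typing import Callable, Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def benchmark(
    engines: Dict[str, Callable[[str], str]],
    samples: List[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Run each (hypothetical) OCR callable on every sample.

    engines: name -> function taking an image path and returning text.
    samples: list of (image_path, ground_truth_text) pairs.
    """
    results = {}
    for name, run_ocr in engines.items():
        total_cer = 0.0
        start = time.perf_counter()
        for image_path, truth in samples:
            total_cer += cer(truth, run_ocr(image_path))
        elapsed = time.perf_counter() - start
        results[name] = {
            "mean_cer": total_cer / len(samples),
            "sec_per_page": elapsed / len(samples),
        }
    return results
```

Each real provider would be wrapped in a small function with the same `path -> text` shape, so that accuracy and latency are measured the same way for every candidate; cost per page would have to be tracked separately from provider pricing.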

Describe alternatives you've considered

  • Marker, Google Vision, Xerox, Tesseract for OCR
  • OpenAI for retrieval

Labels: parent (top-level issue for Roadmap view), research

Status: In Progress