
NoahHellen/ocr


Financial Information Extraction from Text

Project Goal

The core of the project is a Named Entity Recognition (NER) model that combines three components: a Bidirectional Long Short-Term Memory (BLSTM) network, a Convolutional Neural Network (CNN) that produces character-level embeddings, and a Conditional Random Field (CRF) output layer.
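A CRF output layer scores whole tag sequences instead of scoring each token independently, so at inference time the best sequence is recovered with Viterbi decoding. Below is a minimal pure-Python sketch of that decoding step. It assumes per-token emission scores (in this architecture they would come from the BLSTM) and a tag-to-tag transition matrix. The function name, the three-tag scheme, and all scores are hypothetical and are not taken from solution.ipynb. The start and end transition scores that many CRF implementations include are left out for brevity.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score (hypothetical helper).

    emissions[t][j]   -- score of tag j at token t (e.g. a BLSTM output)
    transitions[i][j] -- score of moving from tag i to tag j
    Start/end transition scores are omitted in this sketch.
    """
    if not emissions:
        return [], 0.0
    num_tags = len(emissions[0])
    # best[j]: score of the best path that ends in tag j at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(num_tags):
            # Previous tag i that maximizes best[i] + transitions[i][j]
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Choose the best final tag, then follow the back-pointers to the start
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical example: the tag set and scores are illustrative only
TAGS = ["O", "B-ORG", "I-ORG"]
NEG = -10.0
# Strongly penalize the invalid transition O -> I-ORG
transitions = [[0.0, 0.0, NEG], [0.0] * 3, [0.0] * 3]
emissions = [[1.0, 0.5, 0.0], [0.0, 0.0, 2.0]]
path, score = viterbi_decode(emissions, transitions)
print([TAGS[i] for i in path], score)  # ['B-ORG', 'I-ORG'] 2.5
```

For the first token, O has the higher emission score. A per-token argmax would therefore produce the invalid sequence O, I-ORG. The transition penalty leads the decoder to choose B-ORG, I-ORG instead, which is why a CRF layer is typically stacked on top of the BLSTM.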

How to Run

1. Environment Setup

This project uses uv for environment management. To set up the required Python environment, run the following command from the repository's root directory:

uv sync

2. Running the Code

The main logic for data processing, model training, and evaluation is contained in the solution.ipynb notebook. To run the project, open this notebook in a Jupyter environment and execute the cells sequentially.

Hardware and Software

Software

  • Python 3.11
  • uv for environment management
  • The Python dependencies are listed in the pyproject.toml file.

Hardware

  • CPU/GPU: Apple M4 (MacBook Air)
  • Memory: 16 GB
  • Parallelization: None
  • Runtime: Training takes approximately 35 minutes

About

Extract time series data from images.
