📈 A machine learning pipeline for SaaS revenue forecasting using time-series regression, Optuna hyperparameter tuning, and SHAP-based model explainability.

M-R-Saad/SaasRevCast

SaaSRevCast: SaaS Revenue Forecasting Pipeline

A comprehensive machine learning project for forecasting SaaS company revenues using advanced regression techniques, hyperparameter optimization, and model explainability.

📋 Project Overview

SaaSRevCast is a predictive analytics pipeline designed to forecast revenue for SaaS (Software as a Service) companies using historical financial and market data. The project implements multiple regression models, performs rigorous hyperparameter tuning with Optuna, and provides interpretable insights through SHAP (SHapley Additive exPlanations) values.

🎯 Key Features

  • Time-Series Revenue Forecasting: Predicts SaaS company revenues using temporal data (2020-2024)
  • Multiple ML Models: Implements and compares Linear Regression, Random Forest, XGBoost, and Support Vector Regression
  • Automated Hyperparameter Tuning: Uses Optuna to search hyperparameters over 30+ trials
  • Model Explainability: SHAP analysis for feature importance and impact visualization
  • Feature Engineering: Advanced lag features, growth rates, profit margins, and customer metrics
  • Comprehensive Evaluation: RMSE, MAE, and MAPE metrics for model comparison

πŸ“ Project Structure

SaasRevCast/
│
├── data/                                    # Dataset and results
│   ├── saas_financial_market_dataset.csv   # Main dataset (2,500+ records)
│   ├── results_saas_revcast.csv            # Model predictions and results
│   ├── model_comparison.csv                # Performance metrics comparison
│   └── feature_importance.csv              # SHAP feature importance values
│
├── notebooks/                               # Jupyter notebooks
│   └── MLPipeline.ipynb                    # Complete ML pipeline implementation
│
├── paper/                                   # Research documentation
│   └── Final Term Paper ML.pdf             # Complete research paper
│
└── README.md                                # Project documentation (this file)

📊 Dataset

The dataset contains financial and operational metrics for multiple SaaS companies across different industries and regions:

  • Size: 2,500+ records
  • Time Period: 2020-2024
  • Companies: Multiple SaaS companies across various industries
  • Features:
    • Revenue (USD)
    • Expenses (USD)
    • Profit (USD)
    • Customer Count
    • Churn Rate
    • ARPU (Average Revenue Per User)
    • Market Share (%)
    • Industry, Region, Founded Year

πŸ”§ Engineered Features

  1. Lag Features: revenue_lag_1, market_share_lag_1, customer_count_lag_1, churn_rate_lag_1
  2. Growth Metrics: revenue_growth (percentage change)
  3. Financial Ratios: profit_margin, expenses_per_customer
  4. Log Transformation: log_revenue (target variable) to reduce skewness
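
The engineered features listed above could be derived roughly as follows. This is a minimal sketch, not the notebook's actual code: the input column names (`company`, `year`, `revenue`, `expenses`, `profit`, `customer_count`, `churn_rate`, `market_share`), the toy values, the use of the natural log, and expressing `revenue_growth` as a fraction are all assumptions. Only the output feature names come from the list above.

```python
import numpy as np
import pandas as pd

# Toy data; column names and values are assumptions for illustration only.
df = pd.DataFrame({
    "company": ["A", "A", "A", "B", "B", "B"],
    "year": [2020, 2021, 2022, 2020, 2021, 2022],
    "revenue": [100.0, 120.0, 150.0, 200.0, 210.0, 252.0],
    "expenses": [80.0, 90.0, 100.0, 150.0, 160.0, 180.0],
    "profit": [20.0, 30.0, 50.0, 50.0, 50.0, 72.0],
    "customer_count": [10, 12, 15, 40, 42, 45],
    "churn_rate": [0.05, 0.04, 0.04, 0.08, 0.07, 0.06],
    "market_share": [1.0, 1.2, 1.5, 2.0, 2.1, 2.4],
})


def engineer_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Add lag, growth, ratio, and log features per company."""
    out = frame.sort_values(["company", "year"]).copy()

    # Lag features: previous year's value within the same company.
    for col in ["revenue", "market_share", "customer_count", "churn_rate"]:
        out[f"{col}_lag_1"] = out.groupby("company")[col].shift(1)

    # Year-over-year growth as a fraction (0.2 means +20%).
    out["revenue_growth"] = out["revenue"] / out["revenue_lag_1"] - 1

    # Financial ratios.
    out["profit_margin"] = out["profit"] / out["revenue"]
    out["expenses_per_customer"] = out["expenses"] / out["customer_count"]

    # Log-transformed target to reduce skewness (natural log assumed).
    out["log_revenue"] = np.log(out["revenue"])
    return out


features = engineer_features(df)
```

Grouping by company before shifting keeps one company's last year from leaking into another company's first year. The first year of each company has no lag value, so those rows would need to be dropped or imputed before training.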

🤖 Models Implemented

  1. Linear Regression (Baseline)
  2. Random Forest Regressor (with Optuna tuning)
  3. XGBoost Regressor
  4. Support Vector Regression (SVR)

📈 Model Performance

Models are evaluated using:

  • RMSE (Root Mean Squared Error)
  • MAE (Mean Absolute Error)
  • MAPE (Mean Absolute Percentage Error)

Results are saved in data/model_comparison.csv for detailed comparison.
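
For reference, the three metrics can be written out in plain Python as below. The sample values are made up, and whether the notebook computes the metrics on `log_revenue` or on back-transformed revenue is not stated here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: average error magnitude in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Undefined if any true value is 0."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values for illustration.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]

scores = {
    "RMSE": rmse(y_true, y_pred),
    "MAE": mae(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
}
```

RMSE and MAE share the target's units, while MAPE is scale-free, which makes it easier to compare across companies of very different sizes.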

πŸ” Explainability

The project uses SHAP (SHapley Additive exPlanations) to provide:

  • Feature importance rankings
  • Feature impact visualization
  • Individual prediction explanations

🚀 Getting Started

Prerequisites

pip install pandas numpy matplotlib seaborn scikit-learn xgboost optuna shap

Running the Pipeline

  1. Navigate to the notebooks/ directory
  2. Open MLPipeline.ipynb in Jupyter Notebook or VS Code
  3. Run all cells sequentially to:
    • Load and preprocess data
    • Engineer features
    • Train models
    • Evaluate performance
    • Generate SHAP visualizations

📑 Research Paper

A complete research paper documenting the methodology, experiments, results, and insights is available in the paper/ folder:

📄 Final Term Paper ML.pdf

The paper includes:

  • Literature review
  • Detailed methodology
  • Experimental setup and results
  • Model comparison and analysis
  • Conclusions and future work

📊 Results & Outputs

All results are automatically saved to the data/ folder:

  • Model predictions
  • Performance metrics
  • Feature importance scores
  • Comparison tables

πŸ› οΈ Technical Stack

  • Python 3.x
  • Data Processing: Pandas, NumPy
  • Visualization: Matplotlib, Seaborn
  • Machine Learning: Scikit-learn, XGBoost
  • Optimization: Optuna
  • Explainability: SHAP
  • Environment: Jupyter Notebook

πŸ“ Data Splits

  • Training Set: 2020-2022
  • Validation Set: 2023 (for hyperparameter tuning)
  • Test Set: 2024 (for final evaluation)
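
A year-based split like the one above can be expressed with simple boolean masks. The sketch assumes a `year` column and uses made-up rows; the real dataset's column name may differ.

```python
import pandas as pd

# Made-up rows; the "year" column name is an assumption.
df = pd.DataFrame({
    "year": [2020, 2021, 2022, 2023, 2024, 2024],
    "revenue": [100.0, 120.0, 150.0, 170.0, 200.0, 90.0],
})

train = df[df["year"].between(2020, 2022)]  # bounds are inclusive by default
val = df[df["year"] == 2023]                # used for hyperparameter tuning
test = df[df["year"] == 2024]               # held out for final evaluation
```

Splitting by year rather than at random keeps later observations out of the training set, so the evaluation reflects forecasting future periods.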

πŸŽ“ Use Cases

This project is suitable for:

  • Revenue forecasting for SaaS businesses
  • Financial planning and budgeting
  • Investor analysis and due diligence
  • Academic research in time-series forecasting
  • Learning ML pipelines and model explainability

📧 Contact & Contributions

This project was developed as a machine learning research project. For questions or contributions, please refer to the research paper in the paper/ folder for detailed methodology and references.

📄 License

This project is for educational and research purposes.


Note: The complete technical details, mathematical formulations, and experimental results are documented in the research paper located in the paper/ folder.
