Databricks Python Environment Setup

A robust, automated makefile-based tool for creating and managing local Python environments that match specific Databricks Serverless Environment Versions.

🚀 Quick Start

# Install prerequisites
# Install uv: https://github.com/astral-sh/uv

# Create environment for Databricks version 4
make env ENV_VER=4

# Activate the environment
source .venv-db4/bin/activate

That's it! You now have a local Python environment matching Databricks Environment Version 4.

✨ Features

  • Automatic Python Version Detection: Dynamically fetches the correct Python version from Databricks documentation
  • Smart Package Management:
    • Removes Ubuntu-specific/system packages that won't work on macOS
    • Handles binary-only packages gracefully (installs if available, skips if not)
    • Cleans Databricks-specific version suffixes from packages
  • Version Validation: Only allows creation of environments for valid Databricks versions
  • Modular Pipeline: Separate targets for each step (requirements, Python install, venv setup, dependencies)
  • Lock File Generation: Creates requirements-env-X.lock for reproducible environments
  • Clean Management: Easy cleanup for specific versions or all environments
  • Comprehensive Testing: Built-in test suite to validate functionality

📋 Prerequisites

  • uv - Fast Python package installer and environment manager
    # Install uv (macOS/Linux)
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # Add to PATH
    export PATH="$HOME/.local/bin:$PATH"
  • make - Usually pre-installed on macOS/Linux
  • curl - For downloading requirements files
  • Internet connection - For fetching Databricks documentation and packages

📖 Usage

Main Commands

# Show all available commands
make help

# List available Databricks environment versions
make list-versions

# Create complete environment (default: version 4)
make env ENV_VER=4

# Clean up specific version
make clean ENV_VER=4

# Clean up all environments
make clean-all

Incremental Commands

For more control over the process:

# 1. Download and process requirements
make requirements ENV_VER=4

# 2. Detect and install Python version
make python-version ENV_VER=4
make install-python ENV_VER=4

# 3. Create virtual environment
make setup-venv ENV_VER=4

# 4. Install dependencies
make install-deps ENV_VER=4

# 5. Generate lock file
make create-lockfile ENV_VER=4

Using the Environment

Once created, activate and use your environment:

# Activate
source .venv-db4/bin/activate

# Verify Python version
python --version

# Test imports
python -c "import pandas, numpy, pyspark; print('All imports successful!')"

# Deactivate
deactivate

🗂️ Generated Files

When you run make env ENV_VER=4, the following files are created:

.venv-db4/                      # Virtual environment directory
requirements-env-4.txt          # Processed requirements file
requirements-env-4.txt.binary   # Binary-only packages (internal use)
requirements-env-4.lock         # Lock file with installed packages
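
After a build, you can sanity-check that everything was produced. A minimal sketch, assuming the naming convention above (the check itself is not part of the makefile):

```shell
# Hedged sketch: verify the expected artifacts exist after `make env ENV_VER=4`.
ENV_VER=4
for f in ".venv-db${ENV_VER}" \
         "requirements-env-${ENV_VER}.txt" \
         "requirements-env-${ENV_VER}.lock"; do
  if [ -e "$f" ]; then
    echo "ok: $f"
  else
    echo "missing: $f"
  fi
done
```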

🔧 Configuration

Default Environment Version

Change the default version by modifying ENV_VER in the makefile:

# Change to your preferred default
ENV_VER ?= 4

Excluded Packages

The makefile automatically excludes packages that won't work on macOS. To modify the list, edit:

EXCLUDED_PACKAGES = unattended-upgrades|ssh-import-id|...
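
Exclusion amounts to a single inverted `grep -E` pass over the downloaded requirements. A minimal sketch, with an abbreviated pattern (the real `EXCLUDED_PACKAGES` list in the makefile is longer):

```shell
# Hedged sketch: filter excluded packages out of a requirements list.
# The pattern is abbreviated; the real EXCLUDED_PACKAGES list is longer.
EXCLUDED_PACKAGES='unattended-upgrades|ssh-import-id|dbus-python|psycopg2'

filter_requirements() {
  # Anchor at line start and require a version specifier (or end of line)
  # after the name, so e.g. "psycopg2" is dropped but a hypothetical
  # "psycopg2-binary" would survive.
  grep -Ev "^(${EXCLUDED_PACKAGES})([=<>!~ ]|$)"
}

printf 'pandas==2.1.0\npsycopg2==2.9.9\nnumpy==1.26.0\n' | filter_requirements
```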

Binary-Only Packages

Packages that require binary wheels (will be skipped if unavailable):

BINARY_ONLY_PACKAGES = pyodbc
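
The "install if available, skip if not" handling boils down to attempting each package with a binary-only constraint and continuing on failure. A minimal sketch; the flags shown are pip-compatible, but the real makefile recipe may differ:

```shell
# Hedged sketch of binary-only package handling: attempt the install,
# skip gracefully if no wheel is available for this platform.
BINARY_ONLY_PACKAGES="pyodbc"

try_binary_install() {
  # $1 = package name; succeeds only if a prebuilt wheel can be installed.
  if uv pip install --only-binary :all: "$1" >/dev/null 2>&1; then
    echo "installed: $1"
  else
    echo "skipped (no wheel): $1"
  fi
}

for pkg in $BINARY_ONLY_PACKAGES; do
  try_binary_install "$pkg"
done
```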

🎯 Available Databricks Versions

Currently supported versions (automatically fetched from Microsoft documentation):

  • Version 1
  • Version 2
  • Version 3
  • Version 4

Run make list-versions to see the latest available versions.

📦 Package Handling

Automatically Excluded Packages

The following packages are removed because they're Ubuntu/system-specific or lack ARM64 macOS wheels:

  • unattended-upgrades - Ubuntu system package
  • ssh-import-id - Ubuntu utility
  • dbus-python - Requires D-Bus system library
  • psycopg2 - PostgreSQL library requiring system dependencies
  • psutil - No compatible wheels for some versions
  • PyGObject, pycairo - GTK bindings
  • wadllib, lazr.uri, lazr.restfulclient - Launchpad utilities
  • google-api-core - Compatibility issues

Binary-Only Packages

Packages attempted with --only-binary (skipped if no wheel available):

  • pyodbc - ODBC database connector

Version Cleaning

Databricks-specific version suffixes are automatically removed:

pyspark==4.0.0+databricks.connect.17.0.1  →  pyspark==4.0.0
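
This cleaning is a one-line `sed` over the requirements file. A minimal sketch, assuming the suffix always follows the local-version `+...` form shown above:

```shell
# Hedged sketch: strip a Databricks-specific local-version suffix
# (everything from '+' to end of line) from pinned requirements.
clean_versions() {
  sed 's/+[A-Za-z0-9.]*$//'
}

printf 'pyspark==4.0.0+databricks.connect.17.0.1\n' | clean_versions
# → pyspark==4.0.0
```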

🧪 Testing

Quick Start

Run the full test suite:

make test

Or run the test script directly:

./test_makefile.sh

For quick validation (30 seconds):

./quick_test.sh

What Gets Tested

1. Prerequisites Checks

  • uv is installed and accessible
  • make is installed
  • makefile exists

2. Information Targets

  • make help displays correctly
  • make list-versions returns available versions

3. Validation

  • make validate-version accepts valid versions (e.g., 4)
  • make validate-version rejects invalid versions (e.g., 999)
  • make check-uv verifies uv installation

4. Requirements Processing

  • make requirements downloads requirements file
  • ✓ Requirements file is created and not empty
  • ✓ Excluded packages are removed (psycopg2, psutil, dbus-python, etc.)
  • ✓ Databricks version suffixes are cleaned from pyspark
  • ✓ Binary-only packages are separated

5. Python Version Detection

  • make python-version extracts correct Python version from docs

6. Full Environment Creation

  • make env creates complete environment
  • ✓ Virtual environment directory is created
  • ✓ Python executable exists and is functional
  • ✓ Lock file is created and contains packages
  • ✓ Key packages are installed (pandas, numpy, pyspark)

7. Cleanup

  • make clean removes files for specific version
  • make clean-all removes all generated files

Test Output

The test suite provides colored output:

  • 🟡 YELLOW: Test being run
  • 🟢 GREEN: Test passed
  • 🔴 RED: Test failed

Example output:

TEST: Testing 'make requirements' target (ENV_VER=4)
✓ PASS: requirements target executes successfully
✓ PASS: requirements-env-4.txt file created
✓ PASS: requirements-env-4.txt is not empty

========================================================================
TEST RESULTS SUMMARY
========================================================================
Total tests run:    25
Tests passed:       25
Tests failed:       0
========================================================================
All tests passed! ✓

Test Options

Quick Tests (30 seconds) ⚡

./quick_test.sh

Validates basic functionality without creating full environment.

Full Test Suite (5-10 minutes) 🔍

make test

Comprehensive tests including full environment creation and validation.

Manual Testing

Test individual targets:

make help
make list-versions
make validate-version ENV_VER=4
make requirements ENV_VER=4

Running Individual Tests

You can modify test_makefile.sh to run specific tests by commenting out sections you don't want to run.

Continuous Integration

GitHub Actions Workflows

This repository includes two automated workflows:

1. Test Makefile (.github/workflows/test.yml)

Runs on:

  • Push to main branch → Quick tests only (~30 sec)
  • Pull requests to main → Full test suite (~5-10 min)

What it does:

  • ✅ Installs uv
  • ✅ Runs quick tests (always)
  • ✅ Runs full test suite (PRs only)
  • ✅ Uploads logs on failure
  • ✅ Reports results in GitHub summary

2. Validate Pull Request (.github/workflows/validate-pr.yml)

Runs on:

  • Pull request opened/updated

What it does:

  • ✅ Quick validation (syntax, version detection, requirements)
  • ✅ Posts comment on PR with results
  • ✅ Fast feedback (~1 minute)

Manual CI/CD Integration

For other CI/CD systems:

# Example GitHub Actions
- name: Test Makefile
  run: |
    curl -LsSf https://astral.sh/uv/install.sh | sh
    export PATH="$HOME/.local/bin:$PATH"
    make test

# Example GitLab CI
test:
  script:
    - curl -LsSf https://astral.sh/uv/install.sh | sh
    - export PATH="$HOME/.local/bin:$PATH"
    - make test

Test Troubleshooting

Test Hangs

If the full environment creation test hangs, it will timeout after 10 minutes. Check /tmp/make_env_output.log for details.

PATH Issues

If tests fail with "uv is not installed", ensure uv is in your PATH:

export PATH="$HOME/.local/bin:$PATH"
make test

Cleanup Between Tests

The test script automatically runs make clean-all between major tests to ensure a clean state.

Adding New Tests

To add new tests to test_makefile.sh:

  1. Use print_test "Test description" to start a new test
  2. Run your test command
  3. Use pass "message" for successful assertions
  4. Use fail "message" for failed assertions

Example:

print_test "Testing custom target"
if make custom-target >/dev/null 2>&1; then
    pass "custom-target executed successfully"
else
    fail "custom-target failed"
fi
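
The helpers themselves are small shell functions. A minimal sketch of what `print_test`, `pass`, and `fail` could look like; the actual definitions live in test_makefile.sh and may differ (colors, counters, exit handling):

```shell
# Hedged sketch of the test helpers; illustrative, not the real definitions.
PASS_COUNT=0
FAIL_COUNT=0

print_test() { printf 'TEST: %s\n' "$1"; }
pass() { PASS_COUNT=$((PASS_COUNT + 1)); printf '✓ PASS: %s\n' "$1"; }
fail() { FAIL_COUNT=$((FAIL_COUNT + 1)); printf '✗ FAIL: %s\n' "$1"; }

print_test "Testing custom target"
pass "custom-target executed successfully"
```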

Test Performance

Full test suite typically takes:

  • Fast tests (validation, help, etc.): ~10 seconds
  • Full environment creation: 3-5 minutes
  • Total runtime: ~5-10 minutes

To skip the slow full environment test, comment out that section in test_makefile.sh.


🐛 Troubleshooting

"uv is not installed"

Install uv and add it to your PATH:

curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

"Could not extract Python version from documentation"

This usually means:

  1. Network connectivity issues
  2. Databricks documentation format changed
  3. Invalid environment version number

Try running make list-versions to see available versions.
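
To understand what the extraction step is doing, here is a minimal sketch of parsing a Python version out of fetched documentation text. The sample line is purely illustrative; the real makefile parses the live Databricks docs, whose layout may change:

```shell
# Hedged sketch of version extraction. The sample input is illustrative;
# the real target fetches and parses the live documentation page.
extract_python_version() {
  sed -n 's/.*Python[: ]*\([0-9][0-9.]*\).*/\1/p'
}

printf '%s\n' 'Python: 3.11' | extract_python_version
# → 3.11
```

If the pattern matches nothing, the output is empty, which is how a documentation format change would surface as this error.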

"No such file or directory: .venv-dbX/bin/uv"

This error shouldn't occur with the current version. If you see it, ensure you're using the latest makefile.

Package Installation Failures

If specific packages fail to install:

  1. Check if it's a binary-only package that lacks ARM64 wheels
  2. Add it to EXCLUDED_PACKAGES or BINARY_ONLY_PACKAGES in the makefile
  3. The environment will still be created with other packages

Virtual Environment Activation Issues

Make sure you're using the correct command for your shell:

# bash/zsh
source .venv-db4/bin/activate

# fish
source .venv-db4/bin/activate.fish

# csh/tcsh
source .venv-db4/bin/activate.csh

🏗️ Project Structure

.
├── makefile                 # Main automation script
├── README.md               # This file
├── test_makefile.sh        # Full test suite
├── quick_test.sh           # Quick validation tests
├── .venv-db{X}/            # Virtual environments (generated)
├── requirements-env-{X}.txt        # Processed requirements (generated)
├── requirements-env-{X}.txt.binary # Binary-only packages (generated)
└── requirements-env-{X}.lock       # Lock files (generated)

🔄 Workflow

┌─────────────────────────────────────────────────────────┐
│                     make env ENV_VER=4                  │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
            ┌────────────────────────┐
            │   Check Prerequisites  │
            │   - Validate uv        │
            │   - Validate version   │
            └───────────┬────────────┘
                        │
                        ▼
            ┌────────────────────────┐
            │  Download Requirements │
            │  - Fetch from MS docs  │
            │  - Process packages    │
            │  - Exclude system pkgs │
            └───────────┬────────────┘
                        │
                        ▼
            ┌────────────────────────┐
            │   Detect Python Ver    │
            │  - Parse from docs     │
            │  - Install with uv     │
            └───────────┬────────────┘
                        │
                        ▼
            ┌────────────────────────┐
            │   Create Virtual Env   │
            │  - uv venv with ver    │
            └───────────┬────────────┘
                        │
                        ▼
            ┌────────────────────────┐
            │  Install Dependencies  │
            │  - Main packages       │
            │  - Binary-only (skip)  │
            └───────────┬────────────┘
                        │
                        ▼
            ┌────────────────────────┐
            │   Generate Lock File   │
            │  - uv pip freeze       │
            └───────────┬────────────┘
                        │
                        ▼
                   ✅ Done!

💡 Tips & Best Practices

  1. Always specify the version: make env ENV_VER=4 is clearer than relying on defaults
  2. Check available versions first: Run make list-versions before creating an environment
  3. Test incrementally: Use individual targets (make requirements, make setup-venv) when debugging
  4. Keep environments separate: Use different ENV_VER values for different projects
  5. Use lock files: Commit requirements-env-X.lock to ensure reproducible environments
  6. Clean regularly: Run make clean-all to remove old environments and free up disk space

🤝 Contributing

To contribute improvements:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-improvement
  3. Make your changes to the makefile
  4. Run the test suite: make test (locally)
  5. Ensure all tests pass
  6. Commit your changes: git commit -m "Add my improvement"
  7. Push to your fork: git push origin feature/my-improvement
  8. Create a Pull Request
    • The Validate PR workflow will run automatically
    • Review the validation results posted as a comment
  9. Once approved and merged, the Test Makefile workflow runs on main

Setting Up Your Repository

After cloning/creating this repository:

  1. Update badge URLs in README.md:

    Replace YOUR_USERNAME/YOUR_REPO with your actual GitHub username and repository name
  2. Enable GitHub Actions:

    • Go to repository Settings → Actions → General
    • Ensure "Allow all actions and reusable workflows" is selected
  3. First Push:

    git add .
    git commit -m "Initial commit"
    git push origin main

    The workflows will run automatically!

📝 License

This tool is provided as-is for use with Databricks environments.

❓ FAQ

Q: Why use this instead of pip/conda?
A: This tool automatically matches Databricks environments exactly, handling version-specific quirks and platform differences.

Q: Can I use this on Linux/Windows?
A: It's designed for macOS but should work on Linux. Windows support via WSL2 is untested.

Q: What if a package I need was excluded?
A: You can manually install it in the venv, or modify EXCLUDED_PACKAGES in the makefile if you know how to handle dependencies.

Q: How do I update an existing environment?
A: Run make clean ENV_VER=X followed by make env ENV_VER=X to rebuild from scratch.

Q: Can I use this for multiple Databricks workspace versions?
A: Yes! Create separate environments: make env ENV_VER=1, make env ENV_VER=4, etc.


Made with ❤️ for Databricks developers
