Automated enzyme-constrained genome-scale model reconstruction with kinetic parameter prediction.
kinGEMs combines CPI-Pred kinetic parameter prediction with genome-scale metabolic models to create enzyme-constrained models with optimized kcat values through simulated annealing.
# Run full pipeline (data processing → optimization → tuning)
python scripts/run_pipeline.py configs/iML1515_GEM.json
# Force regenerate all intermediate files
python scripts/run_pipeline.py configs/iML1515_GEM.json --force
# Include flux variability analysis
python scripts/run_pipeline.py configs/iML1515_GEM_parallel_fva.json

FVA Ablation Study - Compare constraint levels systematically:
# With biomass constraint (90-100% optimal)
python scripts/run_fva_ablation.py configs/iML1515_GEM_fva_ablation.json --parallel
# Without biomass constraint (explore full flux space)
python scripts/run_fva_ablation.py configs/iML1515_GEM_fva_ablation_no_biomass_constraint.json --parallel --no-biomass-constraint

SLURM Users:
sbatch slurm_jobs/run_pipeline.sh configs/iML1515_GEM.json
sbatch slurm_jobs/run_fva_ablation.sh
sbatch slurm_jobs/run_fva_ablation_no_biomass_constraint.sh

Test a new model:
# 1. Place your model.xml in data/raw/
# 2. Create config file (copy from configs/iML1515_GEM.json)
# 3. Run pipeline
python scripts/run_pipeline.py configs/my_model.json

Tune existing model:
# Enable simulated annealing in config, then:
python scripts/run_pipeline.py configs/my_model.json
# Results in: results/tuning_results/{model}_{date}_{id}/

Run maintenance parameter sweep:
# Set "enable_maintenance_sweep": true in config, then:
python scripts/run_pipeline.py configs/iML1515_GEM.json
# Finds optimal NGAM/GAM values automatically

All configuration is done through JSON files in configs/. Key parameters:
{
"model_name": "iML1515_GEM",
"organism": "E coli",
"enzyme_upper_bound": 0.25, // Enzyme pool constraint
"enable_fva": true, // Run flux variability analysis
"enable_maintenance_sweep": true, // Optimize NGAM/GAM parameters
"simulated_annealing": {
"biomass_goal": 0.87, // Target biomass
"n_top_enzymes": 500, // Number of enzymes to tune
"max_iterations": 1000 // Optimization iterations
},
"fva": {
"parallel": true, // Use parallel FVA
"workers": 8, // Number of CPU cores
"chunk_size": 50 // Reactions per task
}
}

Example configs:
- iML1515_GEM.json - Basic E. coli pipeline
- iML1515_GEM_parallel_fva.json - With parallel FVA
- iML1515_GEM_fva_ablation.json - FVA ablation study
- iML1515_GEM_fva_ablation_no_biomass_constraint.json - Unconstrained FVA
See docs/CONFIG_GUIDE.md for full details.
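
The config excerpt above is annotated with `//` comments, which strict JSON parsers reject. If you keep such annotations in your own copies, they need to be stripped before parsing. The following is a minimal sketch, not the pipeline's own loader: `strip_line_comments`, `load_config`, and the choice of required keys are hypothetical and only illustrate one way to read an annotated config.

```python
import json

# Assumption: these keys are treated as required for illustration only;
# docs/CONFIG_GUIDE.md is the authoritative reference.
REQUIRED_KEYS = {"model_name", "organism", "enzyme_upper_bound"}


def strip_line_comments(text: str) -> str:
    """Remove // comments that sit outside of JSON string literals."""
    cleaned = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for i, ch in enumerate(line):
            if escaped:
                escaped = False          # character after a backslash is literal
            elif ch == "\\" and in_string:
                escaped = True
            elif ch == '"':
                in_string = not in_string
            elif ch == "/" and not in_string and line[i:i + 2] == "//":
                cut = i                  # comment starts here
                break
        cleaned.append(line[:cut])
    return "\n".join(cleaned)


def load_config(text: str) -> dict:
    """Parse an annotated config string and check a few basic fields."""
    config = json.loads(strip_line_comments(text))
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    if config["enzyme_upper_bound"] <= 0:
        raise ValueError("enzyme_upper_bound must be positive")
    return config


example = """{
  "model_name": "iML1515_GEM",
  "organism": "E coli",
  "enzyme_upper_bound": 0.25, // Enzyme pool constraint
  "fva": {"parallel": true, "workers": 8, "chunk_size": 50}
}"""
config = load_config(example)
print(config["model_name"], config["fva"]["chunk_size"])
```

The stripper tracks whether it is inside a string so that values containing `//` (for example a URL) are left intact.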
After running the pipeline, find results in results/tuning_results/{model}_{date}_{id}/:
| File | Description |
|---|---|
| final_model_info.csv | Complete enzyme data with tuned kcat values |
| kcat_dict.csv | Mapping of reaction-gene pairs to tuned kcats |
| annealing_progress.png | Biomass optimization over iterations |
| model_config_summary.json | Run configuration and optimal parameters |
| *_fva_results.csv | Flux variability analysis (if enabled) |
| maintenance_sweep_results.csv | NGAM/GAM sweep (if enabled) |
Final enzyme-constrained model saved to: models/{model}_{date}_{id}.xml
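
Because each run writes to its own `{model}_{date}_{id}` directory, it can be handy to locate the most recent one programmatically. The sketch below uses only the directory layout and file names listed above; the helper names `latest_run_dir` and `missing_outputs` are hypothetical, and picking the newest run by modification time is an assumption rather than something the pipeline guarantees.

```python
from pathlib import Path
from typing import List, Optional

# Files listed in the table above that every tuning run is expected to produce
# (the FVA and maintenance-sweep files are optional, so they are left out).
EXPECTED_FILES = [
    "final_model_info.csv",
    "kcat_dict.csv",
    "annealing_progress.png",
    "model_config_summary.json",
]


def latest_run_dir(results_root: Path, model_name: str) -> Optional[Path]:
    """Return the most recently modified run directory for a model, if any."""
    candidates = [p for p in results_root.glob(f"{model_name}_*") if p.is_dir()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def missing_outputs(run_dir: Path) -> List[str]:
    """List expected output files that are absent from a run directory."""
    return [name for name in EXPECTED_FILES if not (run_dir / name).exists()]


if __name__ == "__main__":
    run_dir = latest_run_dir(Path("results/tuning_results"), "my_model")
    if run_dir is None:
        print("No runs found for my_model")
    else:
        print(f"Latest run: {run_dir}")
        print(f"Missing outputs: {missing_outputs(run_dir)}")
```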
- One-Command Pipeline: run_pipeline.py handles everything from data processing to optimization
- Smart Caching: Reuses intermediate files (use --force to regenerate)
- Automated Tuning: Simulated annealing optimizes kcat values for target biomass
- Flexible Constraints: Handles enzyme complexes (AND), isoenzymes (OR), and promiscuous reactions
- Parallel FVA: Dask-based parallelization with chunking for large models
- Maintenance Optimization: Automatic NGAM/GAM parameter sweep
- Model Agnostic: Works with standard models and ModelSEED (auto-detected)
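
To give a feel for the automated tuning step, here is a toy simulated-annealing loop that nudges kcat values until a biomass estimate approaches a target. It is a sketch of the general technique under stated assumptions, not the kinGEMs implementation: `anneal_kcats` and `toy_biomass` are hypothetical, the linear cooling schedule and log-normal perturbation are illustrative choices, and `toy_biomass` is a stand-in for solving the enzyme-constrained model. Only the parameter names `biomass_goal` and `max_iterations` mirror the config.

```python
import math
import random


def toy_biomass(kcats):
    """Stand-in for a model solve: biomass saturates as total kcat grows."""
    total = sum(kcats.values())
    return total / (total + 100.0)


def anneal_kcats(kcats, biomass_fn, biomass_goal, max_iterations=1000,
                 start_temp=1.0, seed=0):
    """Search for kcat values whose predicted biomass is close to the goal."""
    rng = random.Random(seed)
    current = dict(kcats)                      # work on a copy of the input
    current_err = abs(biomass_fn(current) - biomass_goal)
    best, best_err = dict(current), current_err
    keys = sorted(current)

    for step in range(max_iterations):
        # Linear cooling; the small offset avoids division by zero.
        temp = start_temp * (1 - step / max_iterations) + 1e-9

        # Perturb one kcat multiplicatively so it always stays positive.
        candidate = dict(current)
        key = rng.choice(keys)
        candidate[key] *= math.exp(rng.gauss(0, 0.3))
        err = abs(biomass_fn(candidate) - biomass_goal)

        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if err < current_err or rng.random() < math.exp((current_err - err) / temp):
            current, current_err = candidate, err
            if err < best_err:
                best, best_err = dict(candidate), err

    return best, best_err


start = {"r1": 10.0, "r2": 20.0, "r3": 5.0}
tuned, error = anneal_kcats(start, toy_biomass, biomass_goal=0.87)
print(f"biomass after tuning: {toy_biomass(tuned):.3f} (error {error:.3f})")
```

Accepting occasional worse moves early on is what lets the search escape poor local solutions; as the temperature falls, the loop behaves more and more like greedy descent.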
Essential Guides:
- Configuration Guide - Complete config reference
- Pipeline Summary - Step-by-step workflow
- Constraint Types - Understanding enzyme constraints
Troubleshooting:
- CPLEX Setup - Commercial solver configuration
- Parallel FVA - Performance tuning
- Solver Guide - GLPK vs CPLEX comparison
# Clone repository
git clone https://github.com/your-username/kinGEMs.git
cd kinGEMs
# Create environment with mamba (faster) or conda
mamba env create -f environment.yml
# OR: conda env create -f environment.yml
# Activate environment
conda activate kingems
# Verify installation
python scripts/run_pipeline.py --help

# Alternative: create the environment manually
conda create -n kingems python=3.11 -y
conda activate kingems
# Install dependencies
pip install -r requirements.txt
# Install GLPK solver
conda install -c conda-forge glpk
# Verify
glpsol --version
python -c "import cobra; print(cobra.__version__)"

Solver Options:
- GLPK (free, default): Good for most models
- CPLEX (commercial): Faster for large models, parallel FVA
- See docs/CPLEX_SETUP.md for installation
# 1. Place your model in data/raw/
cp my_model.xml data/raw/
# 2. Create config (copy and edit existing one)
cp configs/iML1515_GEM.json configs/my_model.json
# Edit: model_name, organism, enzyme_upper_bound, biomass_goal
# 3. Run pipeline
python scripts/run_pipeline.py configs/my_model.json
# 4. Check results
ls results/tuning_results/my_model_*/
# View: final_model_info.csv, annealing_progress.png
# 5. Load optimized model
# Your enzyme-constrained model is in: models/my_model_*.xml

kinGEMs/
├── configs/ # JSON configuration files
├── data/
│ ├── raw/ # Input models (.xml)
│ ├── interim/ # CPI-Pred predictions, substrates, sequences
│ └── processed/ # Processed enzyme data
├── scripts/
│ ├── run_pipeline.py # 🚀 Main pipeline (START HERE)
│ └── run_fva_ablation.py # FVA ablation study
├── slurm_jobs/ # SLURM batch scripts
├── results/
│ ├── tuning_results/ # Optimization outputs
│ └── fva_ablation/ # FVA analysis results
├── models/ # Final enzyme-constrained GEMs
└── kinGEMs/ # Source code
└── modeling/ # Optimization, FVA, tuning modules
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see LICENSE for details.
For questions or issues, please open an issue on GitHub.