DiffNovo

A Transformer-Diffusion Model for De Novo Peptide Sequencing.

DiffNovo is a transformer-diffusion model for de novo peptide sequencing from MS/MS spectra. This guide covers installation, dataset preparation, and the main workflows: model training, evaluation, and prediction.

Ebrahimi, Shiva, et al. 'DiffNovo: A Transformer-Diffusion Model for De Novo Peptide Sequencing.'

Installation

To manage dependencies efficiently, we recommend using conda. Start by creating a dedicated conda environment:

conda create --name diffnovo_env python=3.10

Activate the environment:

conda activate diffnovo_env

Install DiffNovo and its dependencies via pip:

pip install diffnovo

To verify a successful installation, check the command-line interface:

diffnovo --help
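The same check can be scripted, for example at the top of a notebook. The sketch below uses only the Python standard library. It assumes the distribution is registered under the name diffnovo (taken from the pip command above) and that the diffnovo executable is on the PATH of the active environment; installed_version is a hypothetical helper, not part of DiffNovo.

```python
import shutil
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "diffnovo" is assumed to be both the distribution name and the CLI name.
    print("package version:", installed_version("diffnovo"))
    print("CLI location:", shutil.which("diffnovo"))
```

If either value prints as None, the environment that is active is probably not the one DiffNovo was installed into.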

Dataset Preparation

Download DIA Datasets

Annotated DIA datasets can be downloaded from the datasets page.

Download Pretrained Model Weights

DiffNovo requires pretrained model weights for predictions in denovo or eval modes. Compatible weights (in .ckpt format) can be found on the pretrained models page.

Specify the model file during execution using the --model parameter.

Usage

Predict Peptide Sequences

DiffNovo predicts peptide sequences from MS/MS spectra stored in MGF files. Predictions are saved as a CSV file:

diffnovo --mode=denovo --model=pretrained_checkpoint.ckpt --peak_path=path/to/spectra.mgf
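To run prediction over several MGF files, the command can be wrapped in a small script. This is a minimal sketch that only reuses the flags shown above; denovo_command and run_denovo_on_directory are hypothetical helpers, and the sketch assumes diffnovo is on the PATH.

```python
import subprocess
from pathlib import Path


def denovo_command(model, peak_path):
    """Build the argument list for one denovo run, using the flags shown above."""
    return [
        "diffnovo",
        "--mode=denovo",
        f"--model={model}",
        f"--peak_path={peak_path}",
    ]


def run_denovo_on_directory(model, spectra_dir):
    """Run DiffNovo once per .mgf file found directly inside spectra_dir."""
    for mgf in sorted(Path(spectra_dir).glob("*.mgf")):
        # check=True stops the loop as soon as one run exits with an error.
        subprocess.run(denovo_command(model, mgf), check=True)
```

This guide does not state where the CSV file is written, so confirm that consecutive runs do not overwrite each other's output before relying on a loop like this.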

Evaluate de novo Sequencing Performance

To assess the performance of de novo sequencing against known annotations:

diffnovo --mode=eval --model=pretrained_checkpoint.ckpt --peak_path=path/to/test/annotated_spectra.mgf

Annotations in the MGF file must include peptide sequences in the SEQ field.
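A quick way to confirm that a file meets this requirement before starting a run is to scan it for spectra without a SEQ line. The sketch below is a plain-text check, not DiffNovo's own parser; it assumes the usual MGF layout in which each spectrum sits between BEGIN IONS and END IONS, and spectra_missing_seq is a hypothetical helper.

```python
def spectra_missing_seq(mgf_lines):
    """Return the 0-based indices of spectra that have no non-empty SEQ= line."""
    missing = []
    index = -1
    in_block = False
    has_seq = False
    for raw in mgf_lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            in_block = True
            has_seq = False
            index += 1
        elif line == "END IONS":
            if in_block and not has_seq:
                missing.append(index)
            in_block = False
        elif in_block and line.upper().startswith("SEQ="):
            # An empty value ("SEQ=") still counts as missing.
            has_seq = has_seq or len(line) > len("SEQ=")
    return missing
```

An open file handle can be passed directly, for example spectra_missing_seq(open("annotated_spectra.mgf")); an empty list means every spectrum carries an annotation.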

Train a New Model

To train a new DiffNovo model from scratch, provide labeled training and validation datasets in MGF format:

diffnovo --mode=train --peak_path=path/to/train/annotated_spectra.mgf --peak_path_val=path/to/validation/annotated_spectra.mgf

MGF files must include peptide sequences in the SEQ field.
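If only a single annotated MGF file is available, it has to be divided into training and validation files first. The following is a minimal sketch of a random per-spectrum split, assuming the usual BEGIN IONS / END IONS block layout; the function names, the 10% validation fraction, and the output paths are illustrative choices and not part of DiffNovo.

```python
import random


def read_spectra(mgf_lines):
    """Group MGF lines into spectra, one list of lines per BEGIN IONS/END IONS block."""
    spectra, current = [], None
    for raw in mgf_lines:
        line = raw.rstrip("\n")
        if line.strip() == "BEGIN IONS":
            current = [line]
        elif current is not None:
            current.append(line)
            if line.strip() == "END IONS":
                spectra.append(current)
                current = None
    return spectra


def split_spectra(spectra, val_fraction=0.1, seed=0):
    """Shuffle spectra reproducibly and return (train, validation) lists."""
    shuffled = list(spectra)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def write_spectra(spectra, path):
    """Write spectra back out as MGF text, one blank line between blocks."""
    with open(path, "w") as fh:
        for block in spectra:
            fh.write("\n".join(block) + "\n\n")
```

A random split can place spectra of the same peptide in both files, which may make validation scores look better than they are; grouping by the SEQ value before splitting is one way to avoid that.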

Fine-Tune an Existing Model

To fine-tune a pretrained DiffNovo model, set the --train_from_scratch parameter to false and supply the pretrained checkpoint with --model:

diffnovo --mode=train --model=pretrained_checkpoint.ckpt \
 --peak_path=path/to/train/annotated_spectra.mgf \
 --peak_path_val=path/to/validation/annotated_spectra.mgf

For further details, refer to our documentation or open an issue on our GitHub repository (Biocomputing-Research-Group/DiffNovo).
