This document describes the general structure of the package and provides helpful references to code and files for contributors. Preferably read the full document.
What is this package good for?
-
The Spectra package (and the `Spectra` class) provides a powerful infrastructure for mass spectrometry (MS) data in R (see the SpectraTutorials for more information, in particular the Spectra-backends vignette for a description of the data structure).
-
Powerful MS data algorithms are also available in Python, e.g. provided by the matchms and spectrum_utils libraries.
-
Why re-implement what's already available?
-
This package translates an R `Spectra` object into Python MS data structures, lets you call the similarity scoring and processing/filtering functions of the matchms package, and translates the results back into R data objects.
Where to find what?
-
The R folder contains all R source files.
-
R/conversion.R contains functions to convert between R and Python data structures (e.g. between `Spectra::Spectra` and `matchms.Spectrum`). The conversion of the Python result into an R data type is handled by R's reticulate package, which can convert all basic data types between R and Python.
-
R/compareSpectriPy.R contains the mass spectral similarity calculation functions. The core function is the internal `.compare_spectra_python()` function, which manages the Anaconda environment, translates the data to Python data structures and calls the Python command using `py_run_string()`. The Python command itself is generated by the `python_command()` method called on the parameter object `CosineGreedyParam`. To support a new similarity calculation function or other Python functionality, ideally implement a new param object with a `python_command()` method that returns the Python command specific to that algorithm.
-
-
The tests folder contains all unit tests: a general testthat.R file that configures and sets up the tests, and, within the testthat folder, one unit test file for each R source file (named test_.R).
-
The vignettes folder contains a Quarto document that explains the use of the SpectriPy package using examples. This is a good starting point to explore the package and its functionality.
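The param-object pattern described for R/compareSpectriPy.R can be illustrated with a small plain-Python analogue. This is only a sketch under stated assumptions: the package itself implements the pattern in R and runs the command through reticulate, while here `ToySimilarityParam`, `toy_similarity()`, `run_command()` and the variable names `py_x`/`py_y` are all hypothetical, and `exec()` stands in for `py_run_string()`.

```python
from dataclasses import dataclass


def toy_similarity(queries, references, tolerance):
    """Stand-in scoring routine: count query peaks found in each reference."""
    return [
        [
            sum(any(abs(mz_q - mz_r) <= tolerance for mz_r in ref) for mz_q in query)
            for ref in references
        ]
        for query in queries
    ]


@dataclass
class ToySimilarityParam:
    """Hypothetical parameter object playing the role of CosineGreedyParam."""
    tolerance: float = 0.1

    def python_command(self, query_var="py_x", reference_var="py_y"):
        # Build the command text from the parameters stored in the object.
        return (
            f"res = toy_similarity({query_var}, {reference_var}, "
            f"tolerance={self.tolerance!r})"
        )


def run_command(param, queries, references):
    """Run the generated command in a private namespace and return `res`."""
    namespace = {
        "toy_similarity": toy_similarity,
        "py_x": queries,
        "py_y": references,
    }
    exec(param.python_command(), namespace)  # analogue of py_run_string()
    return namespace["res"]


# Each spectrum is reduced to a list of m/z values for this illustration.
queries = [[100.0, 150.0], [200.0]]
references = [[100.05, 300.0]]
scores = run_command(ToySimilarityParam(tolerance=0.1), queries, references)
```

With this design, supporting another algorithm only requires a new parameter class with its own `python_command()`; the code that runs the command stays unchanged.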
Where are Python libraries defined?
-
SpectriPy uses the R reticulate package for conversion between (basic) R and Python data types.
-
The reticulate `r_to_py()` and `py_to_r()` functions convert basic data types from R to Python and back. To use these functions, a Python environment with the matchms library must be available.
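To make the idea of a round-trip conversion concrete, the following sketch converts a column-oriented record (roughly how the peak data and metadata of one spectrum might arrive from R as a named list) into a Python object and back. It is an illustration only: `ToySpectrum`, `to_python()` and `to_record()` are hypothetical names, the class is not the matchms spectrum class, and the field names in the example record are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ToySpectrum:
    """Hypothetical stand-in for a Python spectrum object."""
    mz: list
    intensities: list
    metadata: dict = field(default_factory=dict)


def to_python(record):
    """Convert a column-oriented record into a ToySpectrum."""
    mz = [float(v) for v in record["mz"]]
    intensities = [float(v) for v in record["intensity"]]
    if len(mz) != len(intensities):
        raise ValueError("mz and intensity must have the same length")
    # Everything that is not peak data is carried along as metadata.
    metadata = {k: v for k, v in record.items() if k not in ("mz", "intensity")}
    return ToySpectrum(mz, intensities, metadata)


def to_record(spectrum):
    """Convert a ToySpectrum back into a column-oriented record."""
    return {
        "mz": list(spectrum.mz),
        "intensity": list(spectrum.intensities),
        **spectrum.metadata,
    }


record = {
    "mz": [100.1, 150.2],
    "intensity": [10.0, 200.0],
    "precursorMz": 250.3,
    "msLevel": 2,
}
spectrum = to_python(record)
round_trip = to_record(spectrum)
```

A round-trip check of this kind (convert, convert back, compare) is a cheap way to confirm that no peak data or metadata is lost in translation.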
What data could be used in tests?
-
The package contains two test data files. The "test" and "spectra2" example data were created manually by defining m/z and intensity values of MS peaks. Data files can be added (e.g. in MGF format) if needed and put into an inst/extdata folder.
-
Alternatively, example files in mzML format are available in Bioconductor's msdata package.
-
To test the package and newly created functionality, add the respective unit tests to the tests/testthat folder and evaluate them, e.g. by running `rcmdcheck::rcmdcheck(args = "--no-manual")` in an R session started within the package folder.
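Hand-defined peak lists are useful for tests because the expected result can be worked out by hand. The sketch below shows this with a simplified, pure-Python greedy cosine score used as a stand-in. It is not matchms code and not one of the package's unit tests (those live in tests/testthat and are written in R); the function name `greedy_cosine()` and the peak values are made up for illustration.

```python
import math


def greedy_cosine(peaks_a, peaks_b, tolerance=0.1):
    """Return (score, n_matched) for two lists of (mz, intensity) tuples."""
    # Candidate pairs: peaks whose m/z values differ by at most `tolerance`.
    candidates = [
        (int_a * int_b, i, j)
        for i, (mz_a, int_a) in enumerate(peaks_a)
        for j, (mz_b, int_b) in enumerate(peaks_b)
        if abs(mz_a - mz_b) <= tolerance
    ]
    # Greedy assignment: largest intensity products first, each peak used once.
    candidates.sort(reverse=True)
    used_a, used_b, total = set(), set(), 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            total += product
    norm_a = math.sqrt(sum(v * v for _, v in peaks_a))
    norm_b = math.sqrt(sum(v * v for _, v in peaks_b))
    norm = norm_a * norm_b
    return (total / norm if norm > 0 else 0.0), len(used_a)


# Hand-defined test spectra: only the peak at m/z 100 is shared.
spectrum_a = [(100.0, 0.7), (150.0, 0.2), (200.0, 0.1)]
spectrum_b = [(100.0, 0.4), (140.0, 0.2), (190.0, 0.1)]

score_self, n_self = greedy_cosine(spectrum_a, spectrum_a)
score_ab, n_ab = greedy_cosine(spectrum_a, spectrum_b)

# Expected values derived by hand from the peak lists above.
assert n_self == 3 and math.isclose(score_self, 1.0)
assert n_ab == 1 and math.isclose(score_ab, 0.28 / math.sqrt(0.54 * 0.21))
```

Comparing a spectrum with itself and with a spectrum that shares exactly one peak gives two cases whose expected scores follow directly from the definition, which keeps the test independent of any reference implementation.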
What could be implemented?
See the open issues; some major topics are listed below.
-
Integrate other Python libraries? More a discussion - see issue #24.
-
Integrate functionality for spectra processing, downstream analysis (e.g. cleaning), ... See also issue #20.
-
Ability to translate additional data structures. See also issue #18.
-
Define a use case analysis (or ideally several): show how data can be analyzed with the SpectriPy package using a Quarto document that directly combines R and Python code. See also issue #21.
How to contribute?
-
Ideally fork the GitHub repository, implement extensions and make a pull request to the main branch.
-
Follow the coding style guidelines and adhere to the code of conduct.