HAPLOEXPLORE PRESENTATION AND USE
SHORT DESCRIPTION
Haplotype blocks (haploblocks) in the genome are informative of evolutionary processes and are instrumental in describing genomic variability across human populations and susceptibility or resistance to diseases. Several software tools have been developed for haplotype block detection, but they do not distinguish between the impacts of major and minor SNP alleles. Here, we present HaploExplore, a haploblock detection software specifically designed to identify haploblocks associated with SNP minor alleles (MiA-haploblocks). MiA-haploblocks are particularly important as they can significantly influence phenotypic traits, offering a novel approach for studying genetic associations and complex traits.
HaploExplore operates on VCF files containing phased data, processes them rapidly, and generates user-friendly outputs. Its results converge for populations of 100 individuals or more. A comparative analysis against other haploblock detection software showed that HaploExplore outperforms each of them in at least one of simplicity, flexibility, or speed, and that it has the unique capability to target minor alleles. These features make HaploExplore applicable to evolutionary genomics studies and to GWAS contexts where association effects may accumulate within specific haploblocks.
HaploExplore provides multiple modes and configurable parameters to define haploblocks within a genomic region (whole chromosome or a specified interval) for a given population. For example:
- Exhaustive mode: haploblocks are defined for each SNP present in the region.
- List mode: haploblocks are defined only for a user-provided list of SNPs.
Key parameters determining whether a tested SNP is included in a haploblock defined from a core SNP include:
- Minimal MAF: minimum minor allele frequency for SNPs considered in the region.
- D′ threshold: minimum D′ between the tested SNP and the core SNP required for inclusion.
- r2 threshold: minimum r2 between the tested SNP and the core SNP required for inclusion.
- CP (carrier percentage): the minimum percentage of individuals carrying the core SNP minor allele who must also carry the tested SNP minor allele for that SNP to be included in the haploblock.
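The three inclusion criteria above can be illustrated with a minimal Python sketch. It is not HaploExplore's own code: the function names, the data layout (one phased genotype per individual, with 1 marking the minor allele), and the use of the absolute value of D′ are assumptions made for illustration. The default thresholds are the ones listed under DEFAULT SETTINGS AND PARAMETERS.

```python
from typing import List, Tuple

# Phased genotype of one individual: (haplotype 1, haplotype 2), 1 = minor allele.
Genotype = Tuple[int, int]


def ld_stats(core: List[Genotype], tested: List[Genotype]) -> Tuple[float, float]:
    """Return (|D'|, r2) between two biallelic SNPs from phased genotypes."""
    a = [allele for g in core for allele in g]
    b = [allele for g in tested for allele in g]
    n = len(a)
    p_a = sum(a) / n                                  # minor allele frequency, core SNP
    p_b = sum(b) / n                                  # minor allele frequency, tested SNP
    p_ab = sum(x & y for x, y in zip(a, b)) / n       # haplotypes carrying both minor alleles
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


def carrier_percentage(core: List[Genotype], tested: List[Genotype]) -> float:
    """Percentage of core-SNP minor allele carriers who also carry the tested minor allele."""
    carriers = [i for i, g in enumerate(core) if 1 in g]
    if not carriers:
        return 0.0
    both = sum(1 for i in carriers if 1 in tested[i])
    return 100.0 * both / len(carriers)


def passes_thresholds(core: List[Genotype], tested: List[Genotype],
                      d_prime_min: float = 0.7, r2_min: float = 0.1,
                      cp_min: float = 80.0) -> bool:
    """Hypothetical inclusion test combining the D', r2 and CP thresholds."""
    d_prime, r2 = ld_stats(core, tested)
    return (d_prime >= d_prime_min and r2 >= r2_min
            and carrier_percentage(core, tested) >= cp_min)
```

For example, with four individuals of whom two carry the core minor allele and only one of those also carries the tested minor allele, carrier_percentage gives 50%, so the tested SNP would be rejected at the default CP of 80%.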
Further information, including methodological details, usage examples, modes, parameters, and output descriptions, is provided in Manetti et al. (NAR Genomics and Bioinformatics, 2025). We recommend reading this publication and the supplementary material examples carefully to make the best use of the software.
INSTALLATION RECOMMENDATIONS
A Streamlit-based web application has been developed for an interactive and user-friendly experience. Users can easily upload input files, adjust parameters, and visualize results without running command-line scripts. The app is accessible via Docker or can be launched locally (via App folder).
Running with Docker:
1. Set the volume paths: in the .env file, list the folders that Docker is allowed to access.
2. Build the Docker image (app), or run the "docker-compose.yml" script:
docker compose -f docker-compose.yml up -d --build
3. Run the Docker image (app), or start it from Docker Desktop:
docker run -p 8501:8501 haploexplore
To run the application easily, we recommend building it in Visual Studio and using Docker Desktop.
Running without Docker (requires bcftools):
To run HaploExplore only with Streamlit, ensure you have Python 3.10.12 installed along with bcftools, which is required for processing VCF files. To use the graphical interface of HaploExplore, install the required dependencies and launch the Streamlit app:
1. Install dependencies:
pip install -r requirements.txt
2. Run the application:
streamlit run app/application.py
3. Open the displayed localhost link in a web browser.
Warning (Docker): if the app cannot access the provided folders, open Docker Desktop, go to Settings, then File Sharing, and add the directory path.
A version of HaploExplore exists without the application (program-only) and can be run directly from the terminal with the same functionalities as the application.
An "execution.py" file is provided to show how to use the different functions, and a config.json file is provided to set the parameters.
This version requires Python 3.10.12 and bcftools, which is needed for processing VCF files.
Running the software without the graphical interface:
1. Download the "Program" folder.
2. Run the "executable.py" file, or call the functions from your own program as shown in "executable.py".
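Before running the program-only version, it can help to confirm that the two stated requirements (Python 3.10 and bcftools) are met. The following is a minimal sketch and not part of HaploExplore; the function name check_environment is hypothetical, and the check only looks for a bcftools executable on the PATH without verifying its version.

```python
import shutil
import sys
from typing import List, Tuple


def check_environment(min_python: Tuple[int, int] = (3, 10)) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or newer is expected, found {found}")
    # shutil.which returns None when no executable of that name is on the PATH.
    if shutil.which("bcftools") is None:
        problems.append("bcftools was not found on the PATH")
    return problems


# Print any problem found, or a short confirmation.
for problem in check_environment() or ["Environment looks fine"]:
    print(problem)
```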
AVAILABILITY
Both the app and the software are available on Windows, macOS and Linux.
CURRENT FUNCTIONALITIES
- Haploblock creation with LD (Find haploblocks in a region)
- Building modes:
  - Standard mode: if a SNP cannot be added to any existing haploblock, a new haploblock is generated.
  - Exhaustive mode: build a haploblock for each SNP.
  - ListSNP mode: build haploblocks only for the SNPs of the provided list. Options allow the listed SNPs to be integrated into other haploblocks or not (useful for gene-level studies, for example), and allow haploblocks to be generated, as in Standard mode, for SNPs that are not in the list in addition to those that are.
- Removal of SNPs with a MAF < 0.01 (this parameter can be changed)
- Extract from a result file the haploblocks that contain one or more given SNPs. (Find Haploblocks for Given SNPs)
- Generate a table with the SNPs that are carried by the SNPs of the list. (List SNPs Carrying Given SNPs)
- Generate a graphic to visualize haploblocks on a chromosome. Each haploblock is represented as a rectangle, with its core SNP marked by a star. (Plot haploblocks)
- All parameters can be changed by the user.
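The MAF-based removal listed above can be sketched as follows. This is an illustration under stated assumptions, not the software's implementation: haplotypes are given as lists of 0/1 alleles per SNP (0 = reference, 1 = alternate), the SNP identifiers are placeholders, and the function names are hypothetical. In HaploExplore itself the filter is applied when the VCF file is loaded.

```python
from typing import Dict, List


def minor_allele_frequency(alleles: List[int]) -> float:
    """MAF of a biallelic SNP: the frequency of the rarer of the two alleles."""
    alt_freq = sum(alleles) / len(alleles)
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(haplotypes: Dict[str, List[int]],
                  min_maf: float = 0.01) -> Dict[str, List[int]]:
    """Keep only SNPs whose MAF is at least min_maf (0.01 by default)."""
    return {snp: alleles for snp, alleles in haplotypes.items()
            if minor_allele_frequency(alleles) >= min_maf}


# 100 individuals = 200 haplotypes per SNP (placeholder data).
example = {
    "snp_rare": [1] * 1 + [0] * 199,      # MAF 0.005, removed
    "snp_common": [1] * 50 + [0] * 150,   # MAF 0.25, kept
    "snp_flipped": [1] * 190 + [0] * 10,  # alternate allele is major, MAF 0.05, kept
}
print(sorted(filter_by_maf(example)))
```

Taking the minimum of the alternate allele frequency and its complement matters for SNPs such as snp_flipped, where the reference allele is the minor one.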
INPUT FILES
- Find haploblocks in a region function:
  - A VCF file containing the region to analyze (no limit on the number of individuals).
  - A list of SNP IDs (Chromosome:Position:Position:RefAllele:AltAllele format) in a text file, for ListSNP mode only.
- Find Haploblocks for Given SNPs function:
  - A list of SNP IDs (Chromosome:Position:Position:RefAllele:AltAllele format) in a text file.
  - A composition file generated by the "Find haploblocks in a region" function.
- List SNPs Carrying Given SNPs function:
  - A VCF file (phased; compressed or uncompressed).
  - A list of SNP IDs (Chromosome:Position:Position:RefAllele:AltAllele format).
- Plot Haploblocks function:
  - A composition file generated by the "Find haploblocks in a region" function.
  - A SNP information file generated by the "Find haploblocks in a region" function.
DEFAULT SETTINGS AND PARAMETERS
HaploExplore includes several default parameters that can be modified through the app:
- LD thresholds: r²: 0.1 (default); D′: 0.7 (default)
- Carrier Percentage (CP): 80% (default)
- MAF percentage cut: 0.8 (default)
- Maximum SNP Gap Within a Block: 200 SNPs (default)
- Maximum Haploblock Size: 5,000,000 base pairs (default)
- Region Size for Splitting Datasets: 10,000,000 base pairs (default)
- Minimum MAF Threshold to Include a SNP: 1% (default), applied when loading the VCF file
- Extend mode: 0.9 (default). A haploblock is extended without size limit once its size reaches x% (90% by default) of the Maximum Haploblock Size.
- Pruning mode: 0.95 (default). A haploblock is deleted when at least x% (95% by default) of its SNPs are contained in another haploblock of the same size or bigger.
Note: The Carrier Percentage (CP) is the proportion of individuals carrying the minor allele of a core SNP who also carry the minor allele of another SNP within the same haploblock. This ensures that only SNPs with strong biological relevance and statistical correlation are grouped together.
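The pruning rule described in the parameter list can be sketched as below. This is one possible reading, not HaploExplore's implementation: the function name is hypothetical, haploblock size is taken to be the number of SNPs (the source does not say whether size is measured in SNPs or base pairs), and when two haploblocks are equally large the one encountered first is kept so that identical blocks do not delete each other.

```python
from typing import Dict, Set


def prune_haploblocks(blocks: Dict[str, Set[str]],
                      threshold: float = 0.95) -> Dict[str, Set[str]]:
    """Drop blocks whose SNPs are at least `threshold` contained in a kept, larger-or-equal block."""
    kept: Dict[str, Set[str]] = {}
    # Visit the largest blocks first, so each block is only compared with
    # blocks of the same size or bigger (assumption: size = number of SNPs).
    for name, snps in sorted(blocks.items(), key=lambda item: len(item[1]), reverse=True):
        if not snps:
            continue  # an empty block carries no information
        redundant = any(len(snps & other) / len(snps) >= threshold
                        for other in kept.values())
        if not redundant:
            kept[name] = snps
    return kept


# Placeholder haploblocks keyed by a hypothetical block name.
block_a = {f"s{i}" for i in range(20)}
block_b = {f"s{i}" for i in range(19)}   # fully contained in block_a
block_c = {"s0", "s1", "z1", "z2"}       # only half of its SNPs are in block_a
print(sorted(prune_haploblocks({"A": block_a, "B": block_b, "C": block_c})))
```

With the default threshold, block B is removed because all of its SNPs are in block A, while block C is kept because only 50% of its SNPs are shared.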
OUTPUT FILES
- Find haploblocks in a region function:
  - Tables (.txt):
    - Basic haploblock information
    - Haploblock SNP composition
    - Haploblock sequences
    - SNP information (MAF and BP)
  - Histograms (.pdf):
    - Haploblock distribution by size in bp
    - Haploblock distribution by size in SNPs
- Find Haploblocks for Given SNPs function:
  - A text file listing, for each SNP, the haploblocks in which it is found.
- List SNPs Carrying Given SNPs function:
  - Tables containing SNPs and their Carrier Percentage (.txt)
- Plot Haploblocks function:
  - Graphics (.png)
The software is built with Python 3.10.12 and utilizes Streamlit for an interactive graphical interface.
CITATION
If you use HaploExplore in your research, please cite the following publication :
Manetti M, Hiet S, et al. HaploExplore: a software specifically designed for the detection of minor allele (MiA-) Haploblocks, NAR Genomics and Bioinformatics, 2025. DOI:
LICENSE
HaploExplore is distributed under the MIT License.