Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
131 commits
Select commit Hold shift + click to select a range
ee3eb60
Updated `phab_cm`
TimD1 Mar 22, 2024
3dc8fda
Updated FNR/FDR plot for `small_sv_all`
TimD1 Mar 22, 2024
329f542
Shifted `vs_prior_work` to FDR, updated plot visuals
TimD1 Mar 22, 2024
ec324c6
Ignoring JSON.
TimD1 Mar 22, 2024
67cbbfd
Removed obsolete `reverse_ptrs_strs()`
TimD1 Mar 22, 2024
9e95ff5
Removed edit distance from vcfdist entirely.
TimD1 Mar 22, 2024
a7a111d
Started shifting to graph alignment
TimD1 Mar 25, 2024
fd54fe3
Minor plot updates.
TimD1 Mar 25, 2024
84233a8
Removed edit.*
TimD1 Mar 25, 2024
69be7c2
Removed edit.* from Makefile.
TimD1 Mar 25, 2024
b9e95af
Variant information is merged across haplotypes
TimD1 Apr 5, 2024
02a5f57
Started making graph data structure.
TimD1 May 8, 2024
b462854
Added graph initialization
TimD1 Jul 14, 2024
c5c8266
Added htslib dependency installation.
TimD1 Jul 14, 2024
56063a1
Fixed edge case for cluster merging
TimD1 Jul 17, 2024
da3ec09
Added forward-pass alignment
TimD1 Jul 17, 2024
5d95876
Updated graph definition to make truth a graph.
TimD1 Jul 21, 2024
6c5cd1d
Forward alignment uses new algorithm
TimD1 Jul 22, 2024
b3e0581
Added precision-recall calculation.
TimD1 Jul 24, 2024
cc4a810
Updated phasing and printing
TimD1 Aug 3, 2024
c32a338
Continued bug fixes.
TimD1 Sep 26, 2024
5ac1532
fix: included bugfixes from v2.5.3
TimD1 Sep 26, 2024
232036e
cleanup: removed obsolete unreachable functions
TimD1 Sep 26, 2024
0021eb3
uncommented and fixed initial per-VCF NG50 calculation
TimD1 Sep 26, 2024
6540a36
docs: added v2.5.3 to cache
TimD1 Sep 29, 2024
674a9b4
feat: added Google Test framework
TimD1 Oct 14, 2024
a5bf84a
cleanup: removed obsolete constants
TimD1 Oct 14, 2024
6c599ae
ci: added tests
TimD1 Oct 14, 2024
25a6d4c
ci: updated test.yml
TimD1 Oct 14, 2024
6eb7fd2
wip: updating credit calc
TimD1 Oct 14, 2024
2843395
feat: updated README
TimD1 Oct 14, 2024
6d2e383
fix: time linter warning
TimD1 Oct 25, 2024
42a751e
feat: added overview diagram to assets
TimD1 Oct 25, 2024
fed1a82
removed g.phase_threshold
TimD1 Oct 25, 2024
b5b448c
removed --advanced flag, hiding clustering options
TimD1 Oct 25, 2024
5519868
fix(wip): resolving diffs between v253
TimD1 Nov 2, 2024
3c0d8bf
feat: added bioconda downloads badge
TimD1 Nov 2, 2024
b180918
fix: print order by query variant
TimD1 Nov 2, 2024
2ca6582
printing mostly fixed, just switch errors
TimD1 Nov 4, 2024
8f56110
feat: finally fixed printing summary.vcf!
TimD1 Nov 5, 2024
91c349a
fix: update simple_gt if corresponding variant filtered
TimD1 Nov 6, 2024
76338e8
fix: no warning on sync groups with zero ED
TimD1 Nov 7, 2024
3585d3a
fix: set phase set for 1|1 before PS found
TimD1 Nov 7, 2024
c50215d
fix: continued VCF equivalency work on chr20 dataset
TimD1 Nov 7, 2024
0fd7543
fix: reword filter command-line option
TimD1 Nov 7, 2024
a1a2852
feat: print switch/flip error rates to summary TSV
TimD1 Nov 11, 2024
dd67202
feat: track genotype errors, print to summary.vcf
TimD1 Nov 16, 2024
237d0e7
wip: fixing sync points, genotype errors, phase sets
TimD1 Nov 16, 2024
a69d0bd
fix: ref_dist calculated using wf_ed, not variants.
TimD1 Nov 18, 2024
eca0b12
fix: disallow non-ref sync points unless both submatrices change
TimD1 Nov 19, 2024
4c05b31
fix: fix GT fixing tie for FP variants
TimD1 Nov 21, 2024
3713ec0
chore: removed realignment option
TimD1 Nov 24, 2024
835833d
feat: phase-blocks counts variants, not superclusters
TimD1 Nov 24, 2024
a286190
fix: decrement TRUTH_TP count for AC_ERR_2_TO_1
TimD1 Nov 26, 2024
991c247
feat: allele count errors are now printed
TimD1 Nov 29, 2024
637ed7c
feat: superclusters with FNs are re-evaluated
TimD1 Nov 29, 2024
da6020b
fix: typo
TimD1 Nov 29, 2024
4b309fa
wip: choose best FN when redoing
TimD1 Nov 29, 2024
b2bd1c2
fix: fixed bug introduced when adding ignore errors option
TimD1 Nov 30, 2024
eaa2107
fix: ref/alt bugfix, added some debug printing
TimD1 Nov 30, 2024
1aa755d
feat: genotype errors now written for each variant type
TimD1 Nov 30, 2024
aa0ccea
fix: RAM calculation
TimD1 Dec 2, 2024
aed0a1c
fix: loop bug in query graph
TimD1 Dec 4, 2024
10eda91
fix: align dist equal to max succeeds
TimD1 Dec 7, 2024
d9e5aae
chore: updated CLI params
TimD1 Dec 7, 2024
4d84c73
feat: large variants pulled out for second round eval
TimD1 Dec 11, 2024
f0d02c1
wip: added skeleton for second-round eval
TimD1 Dec 13, 2024
f752863
feat: removed all usage of vars->clusters[]
TimD1 Dec 20, 2024
11978e5
feat: added basic FP/FN variant extraction
TimD1 Dec 21, 2024
c4d7139
feat: merge failed small vars with unevaluated large vars
TimD1 Dec 21, 2024
6ae84b1
feat: updated main pipeline printing/timers
TimD1 Dec 21, 2024
0d46c53
fix(ci): github runner to ubuntu-24.04
TimD1 Dec 21, 2024
ef66479
fix(ci): apt install lib-bz2
TimD1 Dec 21, 2024
880d897
fix(ci): apt install liblzma
TimD1 Dec 21, 2024
82d313e
wip: merge superclusters (var eval results for small/large)
TimD1 Dec 21, 2024
c695ccc
wip: remove supercluster metadata
TimD1 Dec 22, 2024
80b68be
fix: several bug fixes related to superclusters and start/end
TimD1 Dec 22, 2024
af4e3e3
fix: off-by-one supercluster counting error
TimD1 Dec 23, 2024
3e2f251
feat: print each variant subset
TimD1 Dec 23, 2024
ec6fbb7
fix: phase set tags fixed right before phasing
TimD1 Dec 23, 2024
71a0156
fix: off-by-one error in merging clusters
TimD1 Dec 23, 2024
cb206ad
feat: added WFA forward pass
TimD1 Dec 27, 2024
bfb2d29
feat: added parse_wfa_path()
TimD1 Dec 31, 2024
a838720
feat: single move on diagonals
TimD1 Jan 1, 2025
01da73c
fix: remove printing
TimD1 Jan 1, 2025
24f9cb8
fix: parameters now output to TSV
TimD1 Jan 12, 2025
4a92e2b
chore: remove set_path.sh script
TimD1 Jan 12, 2025
e4663ef
feat: timing results now output to TSV
TimD1 Jan 12, 2025
223e9f1
feat: added pytest-workflow to project
TimD1 Jan 15, 2025
d640a5d
feat: added pytest to CI
TimD1 Jan 15, 2025
d7d5731
fix: install pytest and pytest-workflow in CI
TimD1 Jan 15, 2025
dc198cc
fix: added --git-aware to pytest
TimD1 Jan 15, 2025
54d6b77
fix: added integration test data
TimD1 Jan 15, 2025
cb8b30a
fix: added chr20 of ref FASTA to repo test data
TimD1 Jan 19, 2025
ddf5b30
fix: clustering bugs
TimD1 Jan 19, 2025
5284f2f
fix: memory error on empty contig
TimD1 Feb 9, 2025
1a4c6d5
feat: update Makefile and .gitignore
TimD1 Feb 14, 2025
3a4bc57
chore: update to v3.0.0-b0
TimD1 Feb 14, 2025
34ee6bd
fix: gitignore test binary
TimD1 Feb 14, 2025
c6d48cc
feat: automated run diff from known version
TimD1 Feb 14, 2025
4516f5a
fix: run apt-get update during build
TimD1 Apr 5, 2025
9fe9abf
fix: avoid simple_cluster() failure on empty hap
TimD1 Apr 5, 2025
11efcb4
fix: remove git-aware from pytest
TimD1 Apr 6, 2025
e50a05c
fix: WFA error when variants are adjacent to contig ends
TimD1 Apr 16, 2025
099c255
feat: WFA only precision/recall alignment!
TimD1 Apr 19, 2025
e4ec98c
fix: calculate WFA RAM usage
TimD1 Apr 19, 2025
a0ebe8e
perf: clear WFA matrices ASAP
TimD1 Apr 19, 2025
6673744
feat: added memory efficient ED gWFA (w/o backtracking)
TimD1 Apr 19, 2025
3fd06be
fix: skip FN redo if type is sub, not length 1
TimD1 May 3, 2025
d350ca1
feat: pack WFA pointers
TimD1 May 3, 2025
df1fa44
wip: draft condensing alignment memory, segfaults
TimD1 May 3, 2025
0e8f67b
fix: if vs else if, continue if unallocated
TimD1 May 4, 2025
159d6bc
feat: revert to single-pass evaluation
TimD1 May 4, 2025
1797c35
fix: updated timer printing
TimD1 May 4, 2025
c062b8c
fix: WFA pointers now packed into uint64_t
TimD1 May 4, 2025
e48a43f
feat: revert from gWFA -> Dijkstra
TimD1 May 5, 2025
a87b7a7
feat: multi-match in Dijkstra
TimD1 May 5, 2025
d66f155
perf: remove backtrack option from wf_ed() and wf_swg_aln()
TimD1 May 30, 2025
ed4888b
chore: remove obsolete calc_cig_swg_score() and extract_errors()
TimD1 May 30, 2025
1ae14b9
chore: removed cluster and variant merging
TimD1 Jun 3, 2025
41cc0a0
feat: added --max-supercluster-size from v2
TimD1 Jun 8, 2025
d7313ec
fix: rounding error in NG50 calculation
TimD1 Jun 21, 2025
26d7aa4
fix: replaced obsolete CIGAR score test with NG50 test
TimD1 Jun 21, 2025
2be7a41
chore(test): reorganized unit test src
TimD1 Jun 22, 2025
31a8f78
chore: update citation
TimD1 Jun 22, 2025
c945b34
chore: remove obsolete print_cigar()
TimD1 Jun 22, 2025
60925cd
style: rename c -> ci for callset index
TimD1 Jun 22, 2025
c23dc2d
fix: increment nc (number of clusters) when splitting cluster
TimD1 Jun 22, 2025
8afc267
feat: better genotype error summary format
TimD1 Jul 4, 2025
946646b
fix: force VCF REF/ALT and ref FASTA to use upper-case
TimD1 Jul 4, 2025
1588ed0
fix: reaching end of query before end of reference isn't an error
TimD1 Jul 4, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 18 additions & 8 deletions .github/workflows/build.yml → .github/workflows/test.yml
Original file line number Diff line number Diff line change
@@ -1,20 +1,24 @@
name: build
name: Tests

on:
push:
branches: [ "master" ]
branches: [ "master", "dev" ]
pull_request:
branches: [ "master" ]
branches: [ "master", "dev" ]

jobs:
build:
Tests:

runs-on: ubuntu-latest
runs-on: ubuntu-24.04

steps:
- uses: actions/checkout@v3
- name: install htslib
- uses: actions/checkout@v4
with:
ref: ${{ github.ref }}
- name: Install htslib
run: |
sudo apt-get update
sudo apt-get install -y libbz2-dev liblzma-dev
wget https://github.com/samtools/htslib/releases/download/1.17/htslib-1.17.tar.bz2
tar -xvf htslib-1.17.tar.bz2
cd htslib-1.17
Expand All @@ -23,11 +27,17 @@ jobs:
make install
echo LIBRARY_PATH=`pwd`:$LIBRARY_PATH >> $GITHUB_ENV
echo LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH >> $GITHUB_ENV
- name: make vcfdist
- name: Make vcfdist
run: |
cp -r htslib-1.17/htslib src
echo $LIBRARY_PATH
echo $LD_LIBRARY_PATH
cd src
make
./vcfdist -v
- name: Test vcfdist
run: |
sudo apt-get install -y libgtest-dev
pip install pytest pytest-workflow
cd tests
pytest -vv
10 changes: 9 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
# ignore certain file types
*.a
.*.swp
*.o
*.log
Expand All @@ -13,8 +14,15 @@
*.gz
*.gprof
*.png
src/vcfdist
*.json
*.log
*.aux
src/test
__pycache__

# ignore binaries
src/vcfdist
tests/unit/build/test_vcfdist

# ignore analysis
analysis/*/*/
Expand Down
24 changes: 14 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
# vcfdist: benchmarking phased variant calls
![build](https://github.com/timd1/vcfdist/actions/workflows/build.yml/badge.svg)
![Tests](https://github.com/timd1/vcfdist/actions/workflows/test.yml/badge.svg)
![Downloads](https://anaconda.org/bioconda/vcfdist/badges/downloads.svg)
[![DOI](https://zenodo.org/badge/472945373.svg)](https://zenodo.org/badge/latestdoi/472945373)

## Overview
Expand All @@ -14,8 +15,7 @@
## Introduction
vcfdist is a distance-based **germline variant calling evaluation tool** that:
- simultaneously evaluates **SNPS, INDELs, and SVs** up to 10Kb
- **requires local phasing** information for truth and query variants
- can **standardize** query and truth VCF **variant representations**
- requires **local phasing information** for truth and query variants
- can evaluate **flip** and **switch phasing errors**

vcfdist provides more accurate SNP, INDEL, and SV precision-recall curves than previous work, particularly when complex variants are involved.
Expand All @@ -33,8 +33,8 @@ Please cite the following works if you use vcfdist:

<pre>
@article{dunn2023vcfdist,
author={Dunn, Tim and Narayanasamy, Satish},
title={vcfdist: Accurately benchmarking phased small variant calls in human genomes},
author={Dunn, Tim and Narayanasamy, Satish},
journal={Nature Communications},
year={2023},
volume={14},
Expand All @@ -49,18 +49,21 @@ Please cite the following works if you use vcfdist:

<details>
<summary>
<a href="https://doi.org/10.1101/2024.01.23.575922" target="_blank"><b>[bioRxiv]</b> Jointly benchmarking small and structural variant calls with vcfdist</a>
<a href="https://doi.org/10.1186/s13059-024-03394-5" target="_blank"><b>[Genome Biology]</b> Jointly benchmarking small and structural variant calls with vcfdist</a>
</summary>

<pre>
@article{dunn2024vcfdist,
author={Dunn, Tim and Zook, Justin M and Holt, James M and Narayanasamy, Satish},
title={Jointly benchmarking small and structural variant calls with vcfdist},
journal={bioRxiv},
author={Dunn, Tim and Zook, Justin M and Holt, James M and Narayanasamy, Satish},
journal={Genome Biology},
volume={25},
number={1},
pages={253},
year={2024},
publisher={Cold Spring Harbor Laboratory},
doi={10.1101/2024.01.23.575922},
URL={https://doi.org/10.1101/2024.01.23.575922}
publisher={Springer},
doi={10.1186/s13059-024-03394-5},
URL={https://doi.org/10.1186/s13059-024-03394-5}
}
</pre>
</details>
Expand All @@ -85,6 +88,7 @@ sudo docker run -it timd1/vcfdist:latest vcfdist --help
### Option 3: GitHub source
vcfdist is developed for Linux and its only dependencies are GCC v9.1+ and HTSlib v1.17. If you don't have HTSlib already, please set it up as follows:
```bash
> sudo apt install wget bzip2 g++ zlib1g-dev libbz2-dev liblzma-dev
> wget https://github.com/samtools/htslib/releases/download/1.17/htslib-1.17.tar.bz2
> tar -xvf htslib-1.17.tar.bz2
> cd htslib-1.17
Expand Down
88 changes: 45 additions & 43 deletions analysis-v2/phab_cm/1_phab_eval_vcfs.sh
Original file line number Diff line number Diff line change
Expand Up @@ -2,59 +2,60 @@

source globals.sh

# # truvari phab VCF normalization v4.2.1
# source ~/software/Truvari-4.2.1/venv3.10/bin/activate
# mkdir -p $out/phab
# for i in "${!query_names[@]}"; do
# echo "phab: mafft for '${query_names[i]}'"
# truvari phab \
# -r $data/giab-tr-v4.20/GIABTR.HG002.benchmark.chr20.bed \
# -b $data/${truth_name}-${truth_version}/split/${truth_name}.most.vcf.gz \
# -c $data/${query_names[i]}-${query_versions[i]}/split/${query_names[i]}.most.vcf.gz \
# --bSamples HG002 \
# --cSamples HG002 \
# -f $data/refs/$ref_name \
# -t 64 \
# -o $out/phab/${query_names[i]}.vcf.gz \
# 2> $out/phab/${query_names[i]}.log
# done
# deactivate
# truvari phab VCF normalization v4.2.1
source ~/software/Truvari-4.2.1/venv3.10/bin/activate
mkdir -p $out/phab
for i in "${!query_names[@]}"; do
echo "phab: mafft for '${query_names[i]}'"
truvari phab \
-r $data/giab-tr-v4.20/GIABTR.HG002.benchmark.chr20.bed \
-b $data/${truth_name}-${truth_version}/split/${truth_name}.most.vcf.gz \
-c $data/${query_names[i]}-${query_versions[i]}/split/${query_names[i]}.most.vcf.gz \
--bSamples HG002 \
--cSamples HG002 \
-f $data/refs/$ref_name \
-t 64 \
-o $out/phab/${query_names[i]}.vcf.gz \
2> $out/phab/${query_names[i]}.log
done
deactivate

# # split phab normalized VCF into truth/query
# in_samples=(
# "HG002"
# "p:HG002"
# )
# out_samples=(
# "TRUTH"
# "QUERY"
# )
# for s in "${out_samples[@]}"; do
# echo $s > "${s}.txt"
# done
# for i in "${!query_names[@]}"; do
# for s in "${!in_samples[@]}"; do
# bcftools view -c1 \
# -s ${in_samples[s]} \
# $out/phab/${query_names[i]}.vcf.gz \
# | bcftools reheader -s ${out_samples[s]}.txt \
# | sed "s/\//|/g" \
# | bcftools norm -a -m-any \
# | bcftools view -Oz \
# &> $out/phab/${query_names[i]}.${out_samples[s]}.vcf.gz
# tabix -f -p vcf $out/phab/${query_names[i]}.${out_samples[s]}.vcf.gz
# done
# done
# split phab normalized VCF into truth/query
in_samples=(
"HG002"
"p:HG002"
)
out_samples=(
"TRUTH"
"QUERY"
)
for s in "${out_samples[@]}"; do
echo $s > "${s}.txt"
done
for i in "${!query_names[@]}"; do
for s in "${!in_samples[@]}"; do
bcftools view -c1 \
-s ${in_samples[s]} \
$out/phab/${query_names[i]}.vcf.gz \
| bcftools reheader -s ${out_samples[s]}.txt \
| sed "s/\//|/g" \
| bcftools norm -a -m-any \
| bcftools view -Oz \
&> $out/phab/${query_names[i]}.${out_samples[s]}.vcf.gz
tabix -f -p vcf $out/phab/${query_names[i]}.${out_samples[s]}.vcf.gz
done
done

# vcfdist evaluation
mkdir -p $out/phab-vcfdist
for i in "${!query_names[@]}"; do
echo "phab-vcfdist: evaluating '${query_names[i]}'"
$timer -v /home/timdunn/vcfdist/src/vcfdist \
$timer -v /data-share/timdunn/vcfdist/src/vcfdist \
$out/phab/${query_names[i]}.QUERY.vcf.gz \
$out/phab/${query_names[i]}.TRUTH.vcf.gz \
$data/refs/$ref_name \
--bed $data/giab-tr-v4.20/GIABTR.HG002.benchmark.chr20.bed \
-t 64 -r 64 \
-l 1000 \
-p $out/phab-vcfdist/${query_names[i]}. \
2> $out/phab-vcfdist/${query_names[i]}.log
Expand All @@ -75,6 +76,7 @@ for i in "${!query_names[@]}"; do
-m ga4gh \
--ref-overlap \
--all-records \
--vcf-score-field=QUAL \
-o $out/phab-vcfeval/${query_names[i]} \
2> $out/phab-vcfeval/${query_names[i]}.log
gunzip $out/phab-vcfeval/${query_names[i]}/output.vcf.gz
Expand Down
3 changes: 2 additions & 1 deletion analysis-v2/phab_cm/2_gen_cm.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
phab = True
ds = "pav"
colors = {"hprc": "Greens", "pav": "Reds", "giab-tr": "Blues"}
names = {"t2t-q100": "Q100-dipcall", "hprc": "hifiasm-dipcall", "pav": "Q100-PAV", "giab-tr": "hifiasm-GIAB-TR"}
if phab: phab_prefix = "phab-"
else: phab_prefix = ""
truvari_prefix = f"./{phab_prefix}truvari/{ds}/result_"
Expand Down Expand Up @@ -166,7 +167,7 @@ def match(record1, record2):
for size_idx in range(SZ_DIMS):
fig, ax = plt.subplots(figsize=(2,4))
ax.matshow(np.log(counts[size_idx] + 0.1), cmap=colors[ds])
plt.title(f"{ds.upper()} {sizes[size_idx]} Confusion Matrix", fontsize=7)
plt.title(f"{names[ds]}\n{sizes[size_idx]} Confusion Matrix", fontsize=7)
plt.ylabel("vcfdist", fontsize=7)
ax.set_yticks(list(range(VD_DIMS-1)))
ax.set_yticklabels(vd_cats[1:], fontsize=5)
Expand Down
20 changes: 7 additions & 13 deletions analysis-v2/phab_cm/3_sum_cm.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@
skips = np.zeros((2,2,2), dtype=int)

def record_string(record):
return f"{record.CHROM}:{record.POS} {record.REF} {record.ALT[0]} {record.genotype('QUERY')['GT']}"
return f"{record.CHROM}:{record.POS}\t{record.REF}\t{record.ALT[0]}\t{record.genotype('QUERY')['GT']}"

def get_size(record):
ref = record.REF
Expand Down Expand Up @@ -169,9 +169,6 @@ def hap2(var):

loc, ref, alt, gt = var.split()

if vd_type and not ve_type and not tv_type:
print(f"{ds} {var} VD={cats[vd_type]}:{vd_bk}")

if vd_type and ve_type and tv_type: # all tools compare
i = vd_type-1
j = (ve_type-1)*2 + tv_type-1
Expand All @@ -181,29 +178,29 @@ def hap2(var):
am += 1
# print(f"{ds} {var} VD={cats[vd_type]}:{vd_bk} VE={cats[ve_type]}:{ve_bk} TV={cats[tv_type]}:{tv_bk}")
else:
# print(f"{ds} {var} VD={cats[vd_type]}:{vd_bk} VE={cats[ve_type]}:{ve_bk} TV={cats[tv_type]}:{tv_bk}")
# print(f"truvari TP\t{ds}\t{var}\tVD={cats[vd_type]}:{vd_bk} VE={cats[ve_type]}:{ve_bk} TV={cats[tv_type]}:{tv_bk}")
pass
if i == 0 and j == 2:
# vcfeval considers two INSs at same location
# print(f"{ds} {var} VD={cats[vd_type]}:{vd_bk} VE={cats[ve_type]}:{ve_bk} TV={cats[tv_type]}:{tv_bk}")
pass
# print(f"vcfeval TP\t{ds}\t{var}\tVD={cats[vd_type]}:{vd_bk} VE={cats[ve_type]}:{ve_bk} TV={cats[tv_type]}:{tv_bk}")
if i == 0 and j == 3:
pass
# print(f"{ds} {var} VD={cats[vd_type]}:{vd_bk} VE={cats[ve_type]}:{ve_bk} TV={cats[tv_type]}:{tv_bk}")
# print(f"vcfdist FP\t{ds}\t{var}\tVD={cats[vd_type]}:{vd_bk} VE={cats[ve_type]}:{ve_bk} TV={cats[tv_type]}:{tv_bk}")
if i == 1 and j == 0:
# print(f"vcfdist TP\t{ds}\t{var}\tVD={cats[vd_type]}:{vd_bk} VE={cats[ve_type]}:{ve_bk} TV={cats[tv_type]}:{tv_bk}")
pass
# print(f"{ds} {var} VD={cats[vd_type]}:{vd_bk} VE={cats[ve_type]}:{ve_bk} TV={cats[tv_type]}:{tv_bk}")
if i == 1 and j == 1:
if vd_bk == "lm":
pp += 1
elif vd_bk == "gm" and ve_bk == "am" and tv_bk == "am":
am_diff += 1
else:
# print(f"{ds} {var} VD={cats[vd_type]}:{vd_bk} VE={cats[ve_type]}:{ve_bk} TV={cats[tv_type]}:{tv_bk}")
# print(f"vcfeval FP\t{ds}\t{var}\tVD={cats[vd_type]}:{vd_bk} VE={cats[ve_type]}:{ve_bk} TV={cats[tv_type]}:{tv_bk}")
pass
if i == 1 and j == 2:
print(f"truvari FP\t{ds}\t{var}\tVD={cats[vd_type]}:{vd_bk} VE={cats[ve_type]}:{ve_bk} TV={cats[tv_type]}:{tv_bk}")
pass
# print(f"{ds} {var} VD={cats[vd_type]}:{vd_bk} VE={cats[ve_type]}:{ve_bk} TV={cats[tv_type]}:{tv_bk}")

elif size == SZ_SNP and vd_type and ve_type and not tv_type: # Truvari skips SNPs
counts[size][vd_type-1][(ve_type-1)*2] += 1
Expand All @@ -215,9 +212,6 @@ def hap2(var):
# dels += 1
pass

elif ve_type and tv_type and not vd_type:
print(f"{ds} {var} VD={cats[vd_type]}:{vd_bk} VE={cats[ve_type]}:{ve_bk} TV={cats[tv_type]}:{tv_bk}")


sum_counts = counts[SZ_INDEL][:][:] + counts[SZ_SV][:][:]
print("[truvari-only TP] more lenient allele match:", am)
Expand Down
14 changes: 7 additions & 7 deletions analysis-v2/phab_cm/globals.sh
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
#!/bin/bash

data="/home/timdunn/vcfdist/data"
out="/home/timdunn/vcfdist/analysis-v2/phab_cm"
parallel="/home/timdunn/parallel-20211222/src/parallel"
vcfdist="/home/timdunn/vcfdist/src/vcfdist"
rtg="/home/timdunn/software/rtg-tools-3.12.1/rtg"
tabix="/home/timdunn/software/htslib-1.16/tabix"
bgzip="/home/timdunn/software/htslib-1.16/bgzip"
data="/data-share/timdunn/vcfdist/data"
out="/data-share/timdunn/vcfdist/analysis-v2/phab_cm"
parallel="/data-share/timdunn/parallel-20211222/src/parallel"
vcfdist="/data-share/timdunn/vcfdist/src/vcfdist"
rtg="/data-share/timdunn/software/rtg-tools-3.12.1/rtg"
tabix="/data-share/timdunn/software/htslib-1.16/tabix"
bgzip="/data-share/timdunn/software/htslib-1.16/bgzip"
timer="/usr/bin/time"

# define reference FASTA
Expand Down
Loading