

@d-laub
Collaborator

@d-laub commented Mar 11, 2025

Closes #24.

  • docs: fix version format to be vX.Y.Z

  • feat: initial prototype for splicing.

  • Splice regions together

  • Allow an alternative definition of an overlapping variant: one that is fully exonic and does not overlap splice sites, à la Haplosaurus.

  • Update Dataset API (or maybe a new class) to reflect different shape and definition of a row.

  • Tests against Haplosaurus on 1kGP chr22 @bschilder

  • Performance issues, possibly from a slow reverse complement (RC)
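The splicing items above can be sketched roughly as follows. This is a minimal illustration, not GenVarLoader's implementation: it assumes exons are given as half-open `(start, end)` coordinates on a plain reference string, and `splice` and `reverse_complement` are hypothetical helper names.

```python
# Hypothetical helpers illustrating exon splicing; NOT GenVarLoader's API.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the whole string."""
    return seq.translate(_COMPLEMENT)[::-1]


def splice(ref: str, exons: list[tuple[int, int]], strand: str = "+") -> str:
    """Concatenate half-open [start, end) exon intervals in genomic order.

    For minus-strand transcripts the spliced sequence is reverse-complemented
    once, after concatenation, so the result reads 5'->3' on the transcript.
    """
    spliced = "".join(ref[start:end] for start, end in sorted(exons))
    return reverse_complement(spliced) if strand == "-" else spliced


ref = "AAACCCGGGTTT"
print(splice(ref, [(0, 3), (6, 9)]))              # plus strand
print(splice(ref, [(0, 3), (6, 9)], strand="-"))  # minus strand
```

Reverse-complementing once on the concatenated sequence keeps exon order correct for minus-strand transcripts. Reverse-complementing each exon separately and then joining them in genomic order would leave the exons in the wrong order.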

@d-laub marked this pull request as ready for review April 6, 2025 01:17
@d-laub marked this pull request as draft April 6, 2025 01:18
@d-laub added the type: enhancement New feature or request label Apr 8, 2025
@bschilder mentioned this pull request Apr 15, 2025
@d-laub marked this pull request as ready for review April 30, 2025 03:56
@d-laub
Collaborator Author

d-laub commented May 12, 2025

Hey Brian, this looks good! Thanks to @BradBalderson I just caught and fixed a bug in the reverse complement function. I think it would make sense that ~half of the sequences are way off with that bug. Can you try again with the latest commits?

@bschilder
Collaborator

bschilder commented May 13, 2025


Without VCF normalisation

Using the VCF directly to create the GVL db, without any normalisation with bcftools.

Nucleotide-level

Much better sequence similarity at the nucleotide level.


Amino acid-level

Unfortunately, there are still lots of stop codons.

And the amino acid sequence similarity is actually a bit lower than before.
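The amino acid-level checks discussed here can be approximated with a short sketch. It assumes the standard genetic code (NCBI translation table 1) and uses a naive position-wise identity with no alignment. `translate`, `premature_stops`, and `identity` are hypothetical helpers, not the metrics used for the comparisons in this thread.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AAS)}


def translate(cds: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    cds = cds.upper()
    return "".join(
        CODON_TABLE.get(cds[i : i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


def premature_stops(protein: str) -> int:
    """Count stop codons ('*') anywhere before the final residue."""
    return protein[:-1].count("*")


def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the longer length (no alignment)."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


protein = translate("ATGTAAGCC")
print(protein, premature_stops(protein))
```

A position-wise identity like this penalises every residue downstream of a frameshift, so an alignment-based score would be the fairer choice when haplotypes carry indels.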

With VCF normalisation

[placeholder]



Development

Successfully merging this pull request may close these issues.

Subsetting sequences by coordinates

2 participants