heyfastq

FASTQ sequence file utilities, written in pure Python, with no dependencies.

Summary

The package comes with one program, heyfastq, which provides utilities for single or paired FASTQ files.

Installation

Install from PyPI with:

pip install heyfastq

Or install the development version from GitHub:

git clone https://github.com/kylebittinger/heyfastq.git
cd heyfastq
pip install .

Usage

Run heyfastq -h to learn more about usage options.

Dev

Heyfastq is built around the idea of piping reads (or read pairs) through filter and map functions. The fundamental unit that moves through a heyfastq pipeline is the R object, which can be either a Read or a ReadPair. Pipeline steps take a ReadPipe (an iterable of R objects) as input and return another ReadPipe, so steps compose easily: the output of one step is passed directly as the input to the next.
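To make the ReadPipe-in, ReadPipe-out idea concrete, here is a self-contained sketch that does not use heyfastqlib at all. The Read dataclass, the compose helper, and the two stages (drop_short, to_upper) are hypothetical illustrations, and read pairs are left out for brevity; the real library's types and helpers may look different.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Iterator


@dataclass(frozen=True)
class Read:
    # Hypothetical stand-in; heyfastqlib's Read may use different field names.
    desc: str
    seq: str
    qual: str


ReadPipe = Iterator[Read]               # a stream of reads (pairs omitted here)
Stage = Callable[[ReadPipe], ReadPipe]  # a pipeline step: ReadPipe in, ReadPipe out


def compose(*stages: Stage) -> Stage:
    """Chain ReadPipe -> ReadPipe stages, applied left to right."""
    def pipeline(reads: ReadPipe) -> ReadPipe:
        return reduce(lambda pipe, stage: stage(pipe), stages, reads)
    return pipeline


def drop_short(reads: ReadPipe) -> ReadPipe:
    # Keep reads with at least 4 bases (an arbitrary illustrative threshold).
    return (r for r in reads if len(r.seq) >= 4)


def to_upper(reads: ReadPipe) -> ReadPipe:
    # Upper-case each sequence; quality strings pass through unchanged.
    return (Read(r.desc, r.seq.upper(), r.qual) for r in reads)


clean = compose(drop_short, to_upper)
reads = iter([Read("1", "acgt", "HHHH"), Read("2", "AC", "HH")])
result = list(clean(reads))
print([r.seq for r in result])  # ['ACGT']
```

Because each stage returns a generator expression, nothing is computed until the final list() call pulls reads through the chain.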

from heyfastqlib.read import Read, ReadPair, R, ReadPipe
from heyfastqlib.pipelines import filter_reads, map_reads

# A filter that keeps every read
def unit_filter(r: R) -> bool:
    return True

# A map that returns every read unchanged
def unit_map(r: R) -> R:
    return r

# Three example reads as a generator (id, sequence, quality)
input_fastq = (r for r in [
    Read("1", "ACTG", "HHHH"),
    Read("2", "GTCA", "HHHH"),
    Read("3", "AAAA", "####"),
])

# One counter per step, tracking reads and bases going in and coming out
filter_counter = {"input_reads": 0, "input_bases": 0, "output_reads": 0, "output_bases": 0}
map_counter = {"input_reads": 0, "input_bases": 0, "output_reads": 0, "output_bases": 0}

# Filter first, then map
filtered = filter_reads(input_fastq, unit_filter, filter_counter)
output_fastq = map_reads(filtered, unit_map, map_counter)
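The counter dictionaries suggest that each step tallies reads and bases on the way in and on the way out. The sketch below shows one plausible way a counting filter and map could work. filter_reads_sketch, map_reads_sketch, high_quality, and trim_last_base are hypothetical names, and heyfastqlib's actual filter_reads and map_reads may count differently (for example, for read pairs).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator


@dataclass(frozen=True)
class Read:
    # Hypothetical stand-in for heyfastqlib's Read.
    desc: str
    seq: str
    qual: str


Counter = Dict[str, int]


def new_counter() -> Counter:
    # Same keys as the counters in the example above.
    return {"input_reads": 0, "input_bases": 0, "output_reads": 0, "output_bases": 0}


def filter_reads_sketch(reads: Iterator[Read], keep: Callable[[Read], bool],
                        counter: Counter) -> Iterator[Read]:
    """Hypothetical counting filter: yields reads for which keep(r) is True."""
    for r in reads:
        counter["input_reads"] += 1
        counter["input_bases"] += len(r.seq)
        if keep(r):
            counter["output_reads"] += 1
            counter["output_bases"] += len(r.seq)
            yield r


def map_reads_sketch(reads: Iterator[Read], fn: Callable[[Read], Read],
                     counter: Counter) -> Iterator[Read]:
    """Hypothetical counting map: yields fn(r) for every read."""
    for r in reads:
        counter["input_reads"] += 1
        counter["input_bases"] += len(r.seq)
        out = fn(r)
        counter["output_reads"] += 1
        counter["output_bases"] += len(out.seq)
        yield out


def high_quality(r: Read) -> bool:
    # Drop reads containing '#', which is Phred quality 2 in Phred+33 encoding.
    return "#" not in r.qual


def trim_last_base(r: Read) -> Read:
    # Remove the final base and its quality score.
    return Read(r.desc, r.seq[:-1], r.qual[:-1])


fc, mc = new_counter(), new_counter()
reads = (r for r in [Read("1", "ACTG", "HHHH"), Read("2", "GTCA", "HHHH"),
                     Read("3", "AAAA", "####")])
out = map_reads_sketch(filter_reads_sketch(reads, high_quality, fc), trim_last_base, mc)
untouched = dict(fc)  # still all zeros: no read has been pulled through yet
result = list(out)    # consuming the generator runs both steps
print(fc)  # {'input_reads': 3, 'input_bases': 12, 'output_reads': 2, 'output_bases': 8}
print(mc)  # {'input_reads': 2, 'input_bases': 8, 'output_reads': 2, 'output_bases': 6}
```

In this sketch the counters stay at zero until the output generator is consumed, which is worth keeping in mind when reading counters from a lazy pipeline.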

This is all well and good, but how do we work with actual FASTQ files rather than objects that already exist in Python?

from heyfastqlib.io import parse_fastq, write_fastq

# Single-end: parse one file and write the reads back out
with open("r1.fq") as f_in, open("o1.fq", "w") as f_out:
    write_fastq(f_out, parse_fastq(f_in))

# Paired-end: pass tuples of file handles (R1, R2) instead of single handles
with open("r1.fq") as f1_in, open("r2.fq") as f2_in, \
     open("o1.fq", "w") as f1_out, open("o2.fq", "w") as f2_out:
    write_fastq((f1_out, f2_out), parse_fastq((f1_in, f2_in)))
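parse_fastq and write_fastq hide the file format itself. For readers unfamiliar with it, the sketch below shows the standard four-line FASTQ record (@description, sequence, +, quality) and one way to walk R1 and R2 files in lockstep. The function names and the plain-tuple record type are hypothetical and are not heyfastqlib's implementation; the parser also assumes unwrapped records (exactly one sequence line and one quality line per record).

```python
import io
from itertools import zip_longest
from typing import Iterable, Iterator, TextIO, Tuple

# (description, sequence, quality); a hypothetical record type for this sketch
Record = Tuple[str, str, str]


def parse_fastq_sketch(f: TextIO) -> Iterator[Record]:
    """Yield records from unwrapped, four-line FASTQ text."""
    while True:
        header = f.readline()
        if not header:  # end of file
            return
        seq = f.readline().rstrip("\r\n")
        plus = f.readline()
        qual = f.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at {header!r}")
        yield header[1:].rstrip("\r\n"), seq, qual


def write_fastq_sketch(f: TextIO, records: Iterable[Record]) -> None:
    """Write records back out as four-line FASTQ."""
    for desc, seq, qual in records:
        f.write(f"@{desc}\n{seq}\n+\n{qual}\n")


def parse_paired_sketch(f1: TextIO, f2: TextIO) -> Iterator[Tuple[Record, Record]]:
    """Walk R1 and R2 in lockstep, failing if one file runs out first."""
    missing = object()
    for r1, r2 in zip_longest(parse_fastq_sketch(f1), parse_fastq_sketch(f2),
                              fillvalue=missing):
        if r1 is missing or r2 is missing:
            raise ValueError("R1 and R2 have different numbers of records")
        yield r1, r2


# In-memory files stand in for r1.fq and r2.fq
r1_text = "@a/1\nACGT\n+\nHHHH\n@b/1\nGG\n+\nII\n"
r2_text = "@a/2\nTTTT\n+\nHHHH\n@b/2\nCC\n+\nII\n"
pairs = list(parse_paired_sketch(io.StringIO(r1_text), io.StringIO(r2_text)))

out = io.StringIO()
write_fastq_sketch(out, (r1 for r1, _ in pairs))
print(out.getvalue() == r1_text)  # True
```

Using zip_longest with a sentinel (rather than plain zip) means a truncated R2 file raises an error instead of silently dropping the unmatched R1 reads.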

Putting it all together (assuming the imports, functions, and counters defined above), we can read a pair of FASTQ files, apply the filter and then the map, and write the results to new FASTQ files:

with open("r1.fq") as f1_in, open("r2.fq") as f2_in, \
     open("o1.fq", "w") as f1_out, open("o2.fq", "w") as f2_out:
    reads = parse_fastq((f1_in, f2_in))
    filtered = filter_reads(reads, unit_filter, filter_counter)
    mapped = map_reads(filtered, unit_map, map_counter)
    write_fastq((f1_out, f2_out), mapped)
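unit_filter and unit_map are placeholders. Because an R can be either a Read or a ReadPair, a real filter may need to handle both cases. The sketch below builds a minimum-length filter under the assumption that a read exposes its sequence as seq and a pair exposes its mates as r1 and r2; those field names, the stand-in dataclasses, and min_length_filter itself are hypothetical, so check the actual Read and ReadPair definitions in heyfastqlib.read before adapting it.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Read:
    # Hypothetical stand-in for heyfastqlib's Read.
    desc: str
    seq: str
    qual: str


@dataclass(frozen=True)
class ReadPair:
    # Hypothetical stand-in for heyfastqlib's ReadPair.
    r1: Read
    r2: Read


R = Union[Read, ReadPair]


def min_length_filter(min_len: int) -> Callable[[R], bool]:
    """Build an (R) -> bool filter; a pair passes only if both mates are long enough."""
    def keep(r: R) -> bool:
        if isinstance(r, ReadPair):
            return len(r.r1.seq) >= min_len and len(r.r2.seq) >= min_len
        return len(r.seq) >= min_len
    return keep


keep_4 = min_length_filter(4)
print(keep_4(Read("1", "ACGT", "HHHH")))  # True
print(keep_4(ReadPair(Read("1", "ACGT", "HHHH"), Read("1", "AC", "HH"))))  # False
```

A function like keep_4 could then be passed to filter_reads in place of unit_filter. Requiring both mates to pass keeps R1 and R2 output files in sync, since a pair is kept or dropped as a unit.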
