Open
Description
Hi there,
Your tool works perfectly but is pretty slow: 10 minutes on a GTF of 1,756,105 rows.
=> And that was on a filtered GTF; the real one has tens of millions of rows...
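For a rough sense of scale, the numbers above imply a throughput of about 2,900 rows per second. The sketch below extrapolates from that, assuming runtime grows linearly with row count (an assumption, not something measured here); the 20-million-row size is a hypothetical stand-in for "tens of millions".

```python
# Throughput implied by the reported run: 1,756,105 rows in 10 minutes.
rows_tested = 1_756_105
seconds_tested = 10 * 60

rows_per_second = rows_tested / seconds_tested


def estimated_minutes(n_rows: int) -> float:
    """Extrapolated runtime in minutes, ASSUMING linear scaling with row count."""
    return n_rows / rows_per_second / 60


if __name__ == "__main__":
    print(f"{rows_per_second:.0f} rows/s")
    # 20M rows is a hypothetical full-GTF size, not a figure from the issue.
    print(f"{estimated_minutes(20_000_000):.0f} min for 20M rows")
```

Under that linear assumption, a 20-million-row GTF would take close to two hours.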
Could this be due to using Python regexes to parse the "attributes" column?
I read that it could be worth compiling the regex first, outside the main loop (see: https://stackoverflow.com/questions/452104/is-it-worth-using-pythons-re-compile).
=> In practice, NO time gain.
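A likely reason pre-compiling does not help: Python's `re` module already caches the compiled form of recently used patterns, including those passed to module-level functions such as `re.findall`, so calling them inside a loop does not recompile every time. The sketch below compares the two styles on a single attributes string. The pattern `ATTR_RE` and both function names are hypothetical (this is not the tool's actual code), and the pattern only captures double-quoted values, so unquoted values are skipped.

```python
import re
import timeit

# Hypothetical pattern for GTF attribute pairs such as: gene_id "ENSG0001";
# Group 1 is the key, group 2 the quoted value without its quotes.
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_attributes_regex(attributes: str) -> dict[str, str]:
    """Parse with a pattern compiled once, outside any loop."""
    return dict(ATTR_RE.findall(attributes))


def parse_attributes_uncompiled(attributes: str) -> dict[str, str]:
    """Same parse via the module-level function; re reuses its cached
    compiled pattern, so this does not recompile on each call."""
    return dict(re.findall(r'(\w+)\s+"([^"]*)"', attributes))


if __name__ == "__main__":
    line = 'gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1";'
    for fn in (parse_attributes_regex, parse_attributes_uncompiled):
        t = timeit.timeit(lambda: fn(line), number=100_000)
        print(f"{fn.__name__}: {t:.3f} s")
```

Both timings are typically close, which is consistent with the cache doing the work; the remaining per-row cost is the matching itself plus building a dict for every line.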
There are other regex reimplementations out there: https://github.com/mrabarnett/mrab-regex
Otherwise, simply replace the regex with a plain attributes.split("; ")
=> But that may be less robust
=> And in practice, the time gain is not huge either
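A minimal sketch of the split-based alternative, assuming standard GTF attributes of the form `key "value"; key value;` (a hypothetical helper, not the tool's code). It splits on `;` and strips whitespace, which tolerates both `"; "` and a trailing `;`. The robustness concern is real: a quoted value that itself contains `;` would be cut in two, which a quote-aware regex avoids.

```python
def parse_attributes_split(attributes: str) -> dict[str, str]:
    """Split-based GTF attribute parser (illustrative sketch).

    Assumes pairs are separated by ';' and each pair is 'key value',
    with optional double quotes around the value. Breaks if a quoted
    value contains ';'. Repeated keys (e.g. several 'tag' entries)
    keep only the last value.
    """
    result = {}
    for field in attributes.split(";"):
        field = field.strip()
        if not field:
            continue  # skip the empty field after the final ';'
        key, _, value = field.partition(" ")
        result[key] = value.strip().strip('"')
    return result


if __name__ == "__main__":
    print(parse_attributes_split('gene_id "G1"; transcript_id "T1"; level 2;'))
```

Unlike the quoted-value regex, this also picks up unquoted values such as `level 2`; the gain per row is modest because the Python-level loop and dict construction still dominate.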
Or maybe switch to Polars (pola.rs), which is written in Rust?
Thanks!
Felix.