… refactoring for new datatypes
…ike. Now need to update engine.cpp logic
…ill need to finish the output tensor part of infer
freeman94 (Collaborator) approved these changes on Oct 6, 2025, leaving a comment:

Mostly notes for future PRs
Comment on lines +22 to +24
```nix
pkgs-unstable = import nixpkgs-unstable {
  inherit system;
  config.allowUnfree = true;
```
Collaborator
Prefer to keep on a stable release
```rust
struct TensorInfo {
    name: String,
    dims: Vec<u32>,
    shape: Vec<i64>, // -1 for dynamic dimensions
```
Collaborator
Would be nice if we could convert this into an enum at some point in the future, rather than having to check for -1 and then interpret the min/max/opt fields.
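A minimal sketch of what that enum could look like, assuming we fold the min/opt/max profile values into the dynamic variant. The names (`Dim`, `resolve`) are illustrative only, not the actual libinfer API:

```rust
// Hypothetical sketch: replacing the -1 sentinel in `shape: Vec<i64>` with an
// enum, so a dynamic dimension carries its optimization-profile range directly
// instead of requiring a separate lookup of min/max/opt fields.
#[derive(Debug, Clone, PartialEq)]
enum Dim {
    /// A fixed dimension with a known extent.
    Static(u64),
    /// A dynamic dimension described by its optimization profile range.
    Dynamic { min: u64, opt: u64, max: u64 },
}

impl Dim {
    /// Resolve to a concrete extent: the static value, or the profile's
    /// `opt` value for dynamic dimensions.
    fn resolve(&self) -> u64 {
        match self {
            Dim::Static(n) => *n,
            Dim::Dynamic { opt, .. } => *opt,
        }
    }
}

fn main() {
    // A shape like [-1, 3, 640, 640] would become:
    let shape = vec![
        Dim::Dynamic { min: 1, opt: 4, max: 8 },
        Dim::Static(3),
        Dim::Static(640),
        Dim::Static(640),
    ];
    let resolved: Vec<u64> = shape.iter().map(Dim::resolve).collect();
    assert_eq!(resolved, vec![4, 3, 640, 640]);
}
```

Callers then match on the variant rather than comparing against -1, and the type system prevents reading profile fields on a static dimension.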
```rust
    name: String,
    data: Vec<u8>,
    shape: Vec<i64>, // this should always be positive, just i64 for convenience
    dtype: TensorDataType,
```
Collaborator
This could also be genericized by dtype so that the data Vec is appropriately cast without the user having to do so.
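One possible shape for that, sketched under the assumption that output buffers arrive as little-endian bytes; `Tensor<T>` and `f32_from_bytes` are hypothetical names, not the current libinfer API:

```rust
// Hypothetical sketch: a tensor genericized over its element type, so the
// caller receives a typed Vec instead of casting a raw `Vec<u8>` themselves.
struct Tensor<T> {
    name: String,
    data: Vec<T>,
    shape: Vec<i64>,
}

/// Reinterpret a little-endian byte buffer as f32 elements.
fn f32_from_bytes(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

fn main() {
    let a = 1.0f32.to_le_bytes();
    let b = 2.0f32.to_le_bytes();
    let raw: Vec<u8> = a.iter().chain(b.iter()).copied().collect();
    let t = Tensor {
        name: "logits".to_string(),
        data: f32_from_bytes(&raw),
        shape: vec![2],
    };
    assert_eq!(t.data, vec![1.0, 2.0]);
    assert_eq!(t.name, "logits");
}
```

A fuller version would presumably dispatch on `TensorDataType` (f32, f16, i32, ...) behind a trait so each dtype gets its own decoding, but the byte-to-typed-Vec conversion above is the core of it.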
```cpp
// ASSUMPTION: we always use optimization profile 0
// set the optimization profile to 0 so we can query output shapes after setting input shapes
mContext->setOptimizationProfileAsync(0, stream);
```
Collaborator
Probably want to make this configurable in the future.
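One way to make it configurable without breaking current callers is a small options struct that defaults to profile 0. This is purely a sketch; `EngineOptions` is a hypothetical name, not an existing libinfer type:

```rust
// Hypothetical sketch: surfacing the hard-coded optimization profile index
// as a configurable option, defaulting to the current behavior (profile 0).
struct EngineOptions {
    /// Which TensorRT optimization profile the execution context selects.
    profile_index: u32,
}

impl Default for EngineOptions {
    fn default() -> Self {
        EngineOptions { profile_index: 0 }
    }
}

fn main() {
    // Existing callers keep the old behavior via the default...
    let opts = EngineOptions::default();
    assert_eq!(opts.profile_index, 0);

    // ...while new callers can opt into another profile explicitly.
    let custom = EngineOptions { profile_index: 1 };
    assert_eq!(custom.profile_index, 1);
}
```

The engine would then pass `opts.profile_index` through to `setOptimizationProfileAsync` at model load, which matches the note below that this should only take effect on load and unload.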
…d only affect it on load and unload of the model
Description
Added support for any number of dynamic axes on any dimension of a tensor; a dynamic axis no longer has to be the first dimension.
Testing
All the examples run.
Also integrated into a certain Saronic downstream program, and it works.
Notes
Finished adding support for dynamic axes in libinfer, wherever and whenever. Some comparisons:
- libinfer 0.0.4 DETR benchmark.rs
- libinfer 0.0.5 (dynamic axes of death) DETR benchmark.rs
- libinfer 0.0.4 yolov8
- libinfer 0.0.5 yolov8
I found a major optimization bug where we were prematurely synchronizing the CUDA stream; I introduced it in 0.0.4. Removing it gives a pretty massive performance improvement on larger models. Strangely, I am getting better performance on the new tracker-trained DETR model than on yolov8, which surprises me since the DETR model is quite a bit larger and has two transformers. Not complaining, though; this is nearly a 2x performance improvement.
We are still I/O-bound on f32 output tensors. Will save that for 0.0.6.