General matrix multiplication of f32 and f64 matrices in Rust. Supports matrices with general strides.
I wanted to use the integer gemm code from 0430cf0 to compute `A^t A`, and realized that there is currently no way of performing an operation on transposed matrices. In the BLAS context, the transpose or conjugate transpose of a matrix is usually expressed as `Op(A)`, where `Op` is selected through the parameter `TRANSA`, given by the character `'N'`, `'T'`, or `'C'`.

I realize that since we actually have dimensions and slices as part of our matrix `ArrayBase` structures, we can just circumvent the issue by transposing the matrix view via `fn t(mut self)`. The questions are:

+ Performance: does code specific to a transposed matrix with unchanged memory layout have the same performance as generic code given different stride information?
+ To what extent does the gemm kernel for transposed matrices differ from the one for non-transposed matrices?

While addressing this issue, it's probably also worth investigating how `DSYRK` for the specific case of `A^t A` is implemented differently in the BLIS library. I have a hard time understanding how BLIS defines its kernels, specifically how the different cases of `Op(A) Op(B)` are implemented.

I am happy to dig in and write a benchmark comparing the current `ndarray` approach to writing a specific kernel. Can you point me to the right spot to look at?
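To make the stride-swapping idea concrete, here is a minimal sketch using only the standard library. `MatView`, `gemm_naive`, and `ata` are hypothetical names invented for illustration; they are not the types or functions of `ndarray` or of this crate. The triple loop is a naive reference, not a tuned kernel, so it says nothing about the performance question above. It only shows that a stride-generic routine can compute `A^t A` without copying or rearranging memory.

```rust
/// A borrowed, strided view of an f64 matrix.
/// Hypothetical type for illustration; not ndarray's `ArrayBase`.
#[derive(Clone, Copy)]
struct MatView<'a> {
    data: &'a [f64],
    rows: usize,
    cols: usize,
    row_stride: usize,
    col_stride: usize,
}

impl<'a> MatView<'a> {
    /// View a contiguous row-major buffer as a `rows x cols` matrix.
    fn row_major(data: &'a [f64], rows: usize, cols: usize) -> Self {
        assert_eq!(data.len(), rows * cols);
        MatView { data, rows, cols, row_stride: cols, col_stride: 1 }
    }

    /// Transpose without touching memory: swap dimensions and strides.
    fn t(self) -> Self {
        MatView {
            data: self.data,
            rows: self.cols,
            cols: self.rows,
            row_stride: self.col_stride,
            col_stride: self.row_stride,
        }
    }

    /// Element (i, j), located purely through the stride information.
    fn get(&self, i: usize, j: usize) -> f64 {
        self.data[i * self.row_stride + j * self.col_stride]
    }
}

/// Naive stride-generic product C = A * B, returned as a row-major Vec.
fn gemm_naive(a: MatView, b: MatView) -> Vec<f64> {
    assert_eq!(a.cols, b.rows);
    let mut c = vec![0.0; a.rows * b.cols];
    for i in 0..a.rows {
        for j in 0..b.cols {
            let mut acc = 0.0;
            for k in 0..a.cols {
                acc += a.get(i, k) * b.get(k, j);
            }
            c[i * b.cols + j] = acc;
        }
    }
    c
}

/// Compute A^t A for a row-major `rows x cols` buffer.
/// Both operands borrow the same memory; only the strides differ.
fn ata(data: &[f64], rows: usize, cols: usize) -> Vec<f64> {
    let a = MatView::row_major(data, rows, cols);
    gemm_naive(a.t(), a)
}
```

Calling `ata(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3, 2)` treats the buffer as a 3x2 matrix and returns the 2x2 product in row-major order. A dedicated `A^t A` routine in the spirit of `DSYRK` could additionally exploit the symmetry of the result and compute only one triangle, which this sketch does not attempt.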
The issue was opened by SuperFluffy, has received 8 comments, and is still under discussion.