Inlining and target features
Inlining and target features have a few interactions that can significantly affect the speed of your program.
#[target_feature] hinders inlining
Compilers are free to optimize functions by reordering their operations, as long as it doesn’t change the observable results.
To prevent these optimizations from moving feature-specific code across target feature detection, a function tagged with #[target_feature] is not inlined into a caller that lacks those features.
Since non-inlined function calls can be slow, try to use one large function tagged with #[target_feature], rather than many separate functions.
Additionally, multiple uses of #[target_feature] may not be necessary, as explained below.
Functions with #[target_feature] can still inline into other functions that support the same features via #[target_feature] or the -Ctarget-feature flag.
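As a minimal sketch of the "one large function" advice, assuming an x86-64 target and stable Rust: the names scale_and_sum, scale_and_sum_avx, and scale_and_sum_fallback are hypothetical, and the loop body is only a stand-in for real work. Feature detection happens once, and the only call that cannot be inlined is the single call into the AVX function.

```rust
/// Hypothetical kernel: all of the AVX work lives in this one function,
/// so the dispatch below pays for a single non-inlined call.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn scale_and_sum_avx(data: &mut [f32], factor: f32) -> f32 {
    let mut total = 0.0;
    for x in data.iter_mut() {
        *x *= factor;
        total += *x;
    }
    total
}

/// The same logic compiled with only the baseline target features.
fn scale_and_sum_fallback(data: &mut [f32], factor: f32) -> f32 {
    let mut total = 0.0;
    for x in data.iter_mut() {
        *x *= factor;
        total += *x;
    }
    total
}

/// Detects AVX once, then hands the whole job to one function.
pub fn scale_and_sum(data: &mut [f32], factor: f32) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was confirmed at runtime just above.
            return unsafe { scale_and_sum_avx(data, factor) };
        }
    }
    scale_and_sum_fallback(data, factor)
}
```

Splitting the AVX kernel into several separately tagged functions would work too, but each call from untagged code into one of them would stay a real function call.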
Inlined functions can inherit target features
When a function is inlined into another function tagged with #[target_feature], the inlined function is also compiled with those target features.
To help ensure inlining, the #[inline] attribute can be used.
This behavior can be used to write functions without worrying about what the final target features will be. In fact, this is what portable SIMD does!
Functions in std::simd are inlined, allowing the user to specify the target features.
In the following example, all of the code is generated using AVX, because it’s all inlined into the function with #[target_feature]:
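A minimal sketch of such an example, assuming an x86-64 target and stable Rust. Plain scalar code stands in for std::simd here (an assumption made so the sketch builds without nightly features), and the names dot, dot_avx, and dot_dispatch are hypothetical. Whether the helper is actually inlined is still up to the optimizer; #[inline] only encourages it.

```rust
/// Generic helper with no target features of its own.
#[inline]
fn dot(a: &[f32; 8], b: &[f32; 8]) -> f32 {
    let mut acc = 0.0;
    for i in 0..8 {
        acc += a[i] * b[i];
    }
    acc
}

/// Once `dot` is inlined into this function, its body is compiled
/// with AVX enabled, even though `dot` never mentions AVX.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn dot_avx(a: &[f32; 8], b: &[f32; 8]) -> f32 {
    dot(a, b)
}

/// Chooses the AVX version when the CPU supports it.
pub fn dot_dispatch(a: &[f32; 8], b: &[f32; 8]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was confirmed at runtime just above.
            return unsafe { dot_avx(a, b) };
        }
    }
    // Baseline path: the same helper, compiled without AVX.
    dot(a, b)
}
```

The helper is written once and ends up compiled twice: once inside dot_avx with AVX, and once on the baseline path without it.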
Target features affect calling convention
The target features of a function affect its calling convention: the rules for how arguments and return values are passed between caller and callee. Specifically, the target features determine which vector registers are available for passing vectors.
To ensure that functions with different target features can still be used in the same program, Rust passes vectors by reference instead of by value. This can significantly slow down calls to functions with vector arguments, because each vector must be written to memory before the call and read back afterwards.
To avoid this behavior, avoid using vectors as arguments in non-inlined functions. Inlined functions are not affected because the function call is optimized out.
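One way to follow this advice, sketched under the assumption of an x86-64 target (where SSE is part of the baseline) with a scalar fallback elsewhere. The names double and double_all are hypothetical. The vector type appears only in the signature of the inlined helper, while the non-inlined function exchanges data as a plain array reference.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Inlined helper: taking and returning `__m128` by value is fine here,
/// because no call remains once the function has been inlined.
#[cfg(target_arch = "x86_64")]
#[inline(always)]
fn double(v: __m128) -> __m128 {
    // SAFETY: SSE is always available on x86-64.
    unsafe { _mm_add_ps(v, v) }
}

/// Non-inlined boundary: data crosses it as an array reference,
/// so no vector type appears in the signature.
#[cfg(target_arch = "x86_64")]
#[inline(never)]
pub fn double_all(data: &mut [f32; 4]) {
    // SAFETY: SSE is always available on x86-64, the pointers are valid
    // for four f32 values, and the unaligned load/store variants are used.
    unsafe {
        let v = _mm_loadu_ps(data.as_ptr());
        _mm_storeu_ps(data.as_mut_ptr(), double(v));
    }
}

/// Scalar fallback for other architectures.
#[cfg(not(target_arch = "x86_64"))]
#[inline(never)]
pub fn double_all(data: &mut [f32; 4]) {
    for x in data.iter_mut() {
        *x += *x;
    }
}
```

The load and store inside double_all are explicit, so the trip through memory happens once at a point the author chose, not implicitly at every call that takes a vector argument.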