22.5. Replacing conditional jumps by conditional moves (PPro, PII and PIII)
The PPro, PII and PIII processors have conditional move instructions intended specifically for avoiding branches, because branch misprediction is very time-consuming on these processors. There are conditional move instructions for both integer and floating point registers. In code that will run only on these processors you may replace poorly predictable branches with conditional moves wherever possible. If you want your code to run on all processors, you may make two versions of the most critical parts of the code, one for processors that support conditional move instructions and one for those that don't (see chapter 27.10 for how to detect whether conditional moves are supported).
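The two-version approach described above can be sketched in C as follows. This is only an illustration under stated assumptions: the routine (finding the largest element of an array), the names max_branch, max_select and select_max, and the has_cmov parameter are all hypothetical, and has_cmov is assumed to come from a processor check done elsewhere. Whether the second version actually compiles to a conditional move depends on the compiler and its target settings.

```c
#include <stddef.h>

/* Hypothetical critical routine: largest element of a[0..n-1], n >= 1. */
typedef int (*max_fn)(const int *a, size_t n);

/* Version with an ordinary conditional jump; runs on every processor. */
static int max_branch(const int *a, size_t n)
{
    int m = a[0];
    for (size_t i = 1; i < n; i++) {
        if (a[i] > m)
            m = a[i];
    }
    return m;
}

/* Version written as a select. A compiler targeting PPro or later
   may turn the ?: into a conditional move, but this is not guaranteed. */
static int max_select(const int *a, size_t n)
{
    int m = a[0];
    for (size_t i = 1; i < n; i++)
        m = (a[i] > m) ? a[i] : m;
    return m;
}

/* has_cmov is an assumption: the result of a feature check made once
   at program start. The returned pointer is then used for all calls. */
static max_fn select_max(int has_cmov)
{
    return has_cmov ? max_select : max_branch;
}
```

Selecting the version once and calling through the pointer keeps the feature test out of the critical loop.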
The misprediction penalty for a branch may be so high that it is advantageous to replace it with conditional moves even when this costs several extra instructions. But a conditional move instruction has the disadvantage that it makes dependency chains longer. A conditional move waits for three operands to be ready: the condition flag and the two move operands, even though only one of the move operands is actually needed. You have to consider whether any of these three operands is likely to be delayed by dependency chains or cache misses. If the condition flag is available long before the move operands, then you may as well use a branch, because a possible branch misprediction can be resolved while waiting for the move operands. In situations where you have to wait a long time for a move operand that may not be needed after all, the branch will be faster than the conditional move despite a possible misprediction penalty. The opposite situation is when the condition flag is delayed while both move operands are available early. In this situation the conditional move is preferable to the branch if misprediction is likely.
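The trade-off above can be illustrated with a small C sketch. The function names pick_branch and pick_select and their parameters are hypothetical; slow stands for an operand that may be delayed, for example by a cache miss, and fast for one that is ready early. A compiler is free to translate either form differently, so this shows the reasoning rather than a guaranteed instruction sequence.

```c
/* Branch form: *slow is loaded only when cond is true. If cond is
   known early, a misprediction can be resolved while the load is
   still outstanding, and a load that is not needed is never waited for. */
static int pick_branch(int cond, const int *slow, int fast)
{
    if (cond)
        return *slow;
    return fast;
}

/* Select form: both operands are produced before the choice is made,
   which mirrors a conditional move waiting for both move operands.
   The result depends on the load even when fast is chosen.
   Note: slow must point to valid memory even when cond is false. */
static int pick_select(int cond, const int *slow, int fast)
{
    int s = *slow;              /* always loaded */
    return cond ? s : fast;
}
```

Under these assumptions, the select form is the better candidate when cond is hard to predict and both operands are already available, while the branch form is the better candidate when *slow is likely to be late and often not needed.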