26.16 FLDCW (PPro, PII and PIII)
On the PPro, PII and PIII, the FLDCW instruction causes a serious stall if it is followed by any floating point instruction which reads the control word (which almost all floating point instructions do).
Compiled C or C++ code often contains a lot of FLDCW instructions, because conversion of floating point numbers to integers is done with truncation while other floating point instructions use rounding. After translation to assembly, you can improve this code by using rounding instead of truncation where possible, or, where truncation is needed inside a loop, by moving the FLDCW out of the loop.
See chapter 27.5 on how to convert floating point numbers to integers without changing the control word.