28.2 Floating point instructions
Explanations:
Operands:
r = register, m = memory, m32 = 32 bit memory operand, etc.
Clock cycles:
The numbers are minimum values. Cache misses, misalignment, denormal operands, and exceptions may increase the clock counts considerably.
Pairability:
+ = pairable with FXCH, np = not pairable with FXCH.
i-ov:
Overlap with integer instructions. i-ov = 4 means that the last four clock cycles can overlap with subsequent integer instructions.
fp-ov:
Overlap with floating point instructions. fp-ov = 2 means that the last two clock cycles can overlap with subsequent floating point instructions. (WAIT is considered a floating point instruction here.)
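The overlap figures can be turned into a rough cost estimate. The following is a minimal sketch, assuming that every instruction in the sequence is independent of the previous result and that nothing else (cache misses, FXCH pairing, exceptions) interferes. The function names `issue_cost` and `chain_clocks` are hypothetical and only serve the illustration.

```python
def issue_cost(clocks, overlap):
    """Clocks that pass before a following instruction may start,
    given the instruction's clock count and its i-ov or fp-ov value."""
    return clocks - overlap


def chain_clocks(n, clocks, fp_ov):
    """Estimated total clocks for n identical, mutually independent
    floating point instructions issued back to back (simplified model)."""
    if n < 1:
        return 0
    # Every instruction but the last hides its final fp_ov clocks
    # behind the instruction that follows it.
    return (n - 1) * issue_cost(clocks, fp_ov) + clocks


# FADD: 3 clocks, fp-ov = 2 -> the next independent FADD can start after 1 clock.
print(issue_cost(3, 2))        # 1
# Four independent FADDs: 1 + 1 + 1 + 3 clocks.
print(chain_clocks(4, 3, 2))   # 6
# FDIV at 64 bit precision: 39 clocks, i-ov = 38 -> integer code can resume after 1 clock.
print(issue_cost(39, 38))      # 1
```

A dependent instruction cannot use the overlap, because it has to wait for the full clock count of the instruction that produces its operand. The model therefore gives a lower bound for real code.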
Instruction | Operand | Clock cycles | Pairability | i-ov | fp-ov |
FLD | r/m32/m64 | 1 | + | 0 | 0 |
FLD | m80 | 3 | np | 0 | 0 |
FBLD | m80 | 48-58 | np | 0 | 0 |
FST(P) | r | 1 | np | 0 | 0 |
FST(P) | m32/m64 | 2 m) | np | 0 | 0 |
FST(P) | m80 | 3 m) | np | 0 | 0 |
FBSTP | m80 | 148-154 | np | 0 | 0 |
FILD | m | 3 | np | 2 | 2 |
FIST(P) | m | 6 | np | 0 | 0 |
FLDZ FLD1 | | 2 | np | 0 | 0 |
FLDPI FLDL2E etc. | | 5 s) | np | 2 | 2 |
FNSTSW | AX/m16 | 6 q) | np | 0 | 0 |
FLDCW | m16 | 8 | np | 0 | 0 |
FNSTCW | m16 | 2 | np | 0 | 0 |
FADD(P) | r/m | 3 | + | 2 | 2 |
FSUB(R)(P) | r/m | 3 | + | 2 | 2 |
FMUL(P) | r/m | 3 | + | 2 | 2 n) |
FDIV(R)(P) | r/m | 19/33/39 p) | + | 38 o) | 2 |
FCHS FABS | | 1 | + | 0 | 0 |
FCOM(P)(P) FUCOM | r/m | 1 | + | 0 | 0 |
FIADD FISUB(R) | m | 6 | np | 2 | 2 |
FIMUL | m | 6 | np | 2 | 2 |
FIDIV(R) | m | 22/36/42 p) | np | 38 o) | 2 |
FICOM | m | 4 | np | 0 | 0 |
FTST | | 1 | np | 0 | 0 |
FXAM | | 17-21 | np | 4 | 0 |
FPREM | | 16-64 | np | 2 | 2 |
FPREM1 | | 20-70 | np | 2 | 2 |
FRNDINT | | 9-20 | np | 0 | 0 |
FSCALE | | 20-32 | np | 5 | 0 |
FXTRACT | | 12-66 | np | 0 | 0 |
FSQRT | | 70 | np | 69 o) | 2 |
FSIN FCOS | | 65-100 r) | np | 2 | 2 |
FSINCOS | | 89-112 r) | np | 2 | 2 |
F2XM1 | | 53-59 r) | np | 2 | 2 |
FYL2X | | 103 r) | np | 2 | 2 |
FYL2XP1 | | 105 r) | np | 2 | 2 |
FPTAN | | 120-147 r) | np | 36 o) | 0 |
FPATAN | | 112-134 r) | np | 2 | 2 |
FNOP | | 1 | np | 0 | 0 |
FXCH | r | 1 | np | 0 | 0 |
FINCSTP FDECSTP | | 2 | np | 0 | 0 |
FFREE | r | 2 | np | 0 | 0 |
FNCLEX | | 6-9 | np | 0 | 0 |
FNINIT | | 12-22 | np | 0 | 0 |
FNSAVE | m | 124-300 | np | 0 | 0 |
FRSTOR | m | 70-95 | np | 0 | 0 |
WAIT | | 1 | np | 0 | 0 |
Notes:
m) The value to store is needed one clock cycle in advance.
n) 1 if the overlapping instruction is also an FMUL.
o) Cannot overlap integer multiplication instructions.
p) FDIV takes 19, 33, or 39 clock cycles for 24, 53, and 64 bit precision respectively. FIDIV takes 3 clocks more. The precision is defined by bits 8-9 of the floating point control word.
q) The first 4 clock cycles can overlap with preceding integer instructions. See chapter 26.7.
r) Clock counts are typical. Trivial cases may be faster, extreme cases may be slower.
s) May be up to 3 clocks more when the output is needed by FST, FCHS, or FABS.
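
Note p) can be illustrated with a small lookup. This is a sketch under the assumption that the control word uses the standard x87 precision control encoding in bits 8-9 (00 = 24 bits, 10 = 53 bits, 11 = 64 bits, 01 reserved). The function name `fdiv_clocks` and the table names are hypothetical.

```python
# Precision control field (bits 8-9 of the control word) -> significand bits.
# The encoding 0b01 is reserved and deliberately left out.
PC_TO_PRECISION_BITS = {0b00: 24, 0b10: 53, 0b11: 64}

# FDIV clock counts from note p), keyed by precision in bits.
FDIV_CLOCKS = {24: 19, 53: 33, 64: 39}


def fdiv_clocks(control_word, integer_operand=False):
    """Minimum clock count of FDIV (or FIDIV when integer_operand is True)
    for the precision selected by the given control word."""
    pc = (control_word >> 8) & 0b11
    if pc not in PC_TO_PRECISION_BITS:
        raise ValueError("reserved precision control encoding")
    clocks = FDIV_CLOCKS[PC_TO_PRECISION_BITS[pc]]
    # FIDIV takes 3 clocks more than FDIV.
    return clocks + 3 if integer_operand else clocks


print(fdiv_clocks(0x037F))        # 39 (64 bit precision, the state after FNINIT)
print(fdiv_clocks(0x027F))        # 33 (53 bit precision)
print(fdiv_clocks(0x007F, True))  # 22 (FIDIV at 24 bit precision)
```

The lookup shows why lowering the precision with FLDCW can pay off in code that is dominated by divisions: the division time drops from 39 to 19 clocks, while FLDCW itself costs 8.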