29.2 Floating point instructions
Instruction | Operands | Micro-ops per port (p0, p1, p01, p2, p3, p4) | Delay (clocks) | Throughput (instructions/clocks) | Remarks
FLD | r | 1
FLD | m32/64 | 1 | 1
FLD | m80 | 2 | 2
FBLD | m80 | 38 | 2
FST(P) | r | 1
FST(P) | m32/m64 | 1 | 1 | 1
FSTP | m80 | 2 | 2 | 2
FBSTP | m80 | 165 | 2 | 2
FXCH | r | 0 | 3/1 f)
FILD | m | 3 | 1 | 5
FIST(P) | m | 2 | 1 | 1 | 5
FLDZ | 1
FLD1 FLDPI FLDL2E etc. | 2
FCMOVcc | r | 2 | 2
FNSTSW | AX | 3 | 7
FNSTSW | m16 | 1 | 1 | 1
FLDCW | m16 | 1 | 1 | 1 | 10
FNSTCW | m16 | 1 | 1 | 1
FADD(P) FSUB(R)(P) | r | 1 | 3 | 1/1
FADD(P) FSUB(R)(P) | m | 1 | 1 | 3-4 | 1/1
FMUL(P) | r | 1 | 5 | 1/2 g)
FMUL(P) | m | 1 | 1 | 5-6 | 1/2 g)
FDIV(R)(P) | r | 1 | 38 h) | 1/37
FDIV(R)(P) | m | 1 | 1 | 38 h) | 1/37
FABS | 1
FCHS | 3 | 2
FCOM(P) FUCOM | r | 1 | 1
FCOM(P) FUCOM | m | 1 | 1 | 1
FCOMPP FUCOMPP | 1 | 1 | 1
FCOMI(P) FUCOMI(P) | r | 1 | 1
FCOMI(P) FUCOMI(P) | m | 1 | 1 | 1
FIADD FISUB(R) | m | 6 | 1
FIMUL | m | 6 | 1
FIDIV(R) | m | 6 | 1
FICOM(P) | m | 6 | 1
FTST | 1 | 1
FXAM | 1 | 2
FPREM | 23
FPREM1 | 33
FRNDINT | 30
FSCALE | 56
FXTRACT | 15
FSQRT | 1 | 69 | e,i)
FSIN FCOS | 17-97 | 27-103 | e)
FSINCOS | 18-110 | 29-130 | e)
F2XM1 | 17-48 | 66 | e)
FYL2X | 36-54 | 103 | e)
FYL2XP1 | 31-53 | 98-107 | e)
FPTAN | 21-102 | 13-143 | e)
FPATAN | 25-86 | 44-143 | e)
FNOP | 1
FINCSTP FDECSTP | 1
FFREE | r | 1
FFREEP | r | 2
FNCLEX | 3
FNINIT | 13
FNSAVE | 141
FRSTOR | 72
WAIT | 2
e) Not pipelined.
f) FXCH generates 1 micro-op that is resolved by register renaming without going to any port.
g) FMUL uses the same circuitry as integer multiplication. Therefore, the combined throughput of mixed floating point and integer multiplications is 1 FMUL + 1 IMUL per 3 clock cycles.
h) FDIV delay depends on the precision specified in the control word: 64-bit precision gives a delay of 38, 53-bit precision a delay of 32, and 24-bit precision a delay of 18. Division by a power of 2 takes 9 clocks. Throughput is 1/(delay-1).
i) Faster for lower precision, as set in the control word.
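As a rough illustration of how the delay and throughput columns combine, the Python sketch below estimates clock counts for a run of n instructions, using the register-operand figures for FADD and FMUL from the table and the FDIV figures from note h. It assumes a deliberately simplified model: a dependent chain pays the full delay for every instruction, while independent instructions start one every 1/throughput clocks and the last one still needs its full delay. Port conflicts, decoding, retirement, memory operands and division by powers of 2 are ignored, and the helper names are hypothetical.

```python
# Rough clock estimates from the delay and throughput columns.
# ASSUMPTION: a simplified model that ignores port conflicts, decoding,
# retirement and memory operands. All helper names are hypothetical.

# (delay, clocks per instruction) for register operands, from the table.
# A throughput of 1/2 means one instruction can start every 2 clocks.
TABLE = {
    "FADD": (3, 1),
    "FMUL": (5, 2),
}

# Note h: FDIV delay for each precision setting in the control word.
FDIV_DELAY = {64: 38, 53: 32, 24: 18}


def fdiv_timing(precision_bits: int) -> tuple:
    """Return (delay, clocks per FDIV); note h gives throughput 1/(delay-1)."""
    delay = FDIV_DELAY[precision_bits]
    return delay, delay - 1


def dependent_clocks(n: int, delay: int) -> int:
    """n instructions where each one waits for the previous result."""
    return n * delay


def independent_clocks(n: int, delay: int, clocks_per_instr: int) -> int:
    """n independent instructions: a new one starts every clocks_per_instr
    clocks, and the last one still needs the full delay to finish."""
    if n <= 0:
        return 0
    return (n - 1) * clocks_per_instr + delay


if __name__ == "__main__":
    for name, (delay, cpi) in TABLE.items():
        print(name, dependent_clocks(10, delay), independent_clocks(10, delay, cpi))
    for bits in (64, 53, 24):
        delay, cpi = fdiv_timing(bits)
        print(f"FDIV {bits}-bit", dependent_clocks(10, delay),
              independent_clocks(10, delay, cpi))
```

Under this model, ten dependent FADDs take about 30 clocks against about 12 for ten independent ones, whereas ten FDIVs at 64-bit precision take about 380 clocks when dependent and still about 371 when independent, because a new FDIV can only start once every 37 clocks.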