
29.2 Floating point instructions

Date: 2000-04-03 14:00


Instruction	Operands	micro-ops per port						delay (clocks)	throughput (instr./clock)
		p0	p1	p01	p2	p3	p4
FLD	r	1
FLD	m32/64				1			1
FLD	m80	2			2
FBLD	m80	38			2
FST(P)	r	1
FST(P)	m32/m64					1	1	1
FSTP	m80	2				2	2
FBSTP	m80	165				2	2
FXCH	r							0	3/1 f)
FILD	m	3			1			5
FIST(P)	m	2				1	1	5
FLDZ		1
FLD1 FLDPI FLDL2E etc.		2
FCMOVcc	r	2						2
FNSTSW	AX	3						7
FNSTSW	m16	1				1	1
FLDCW	m16	1		1	1			10
FNSTCW	m16	1				1	1
FADD(P) FSUB(R)(P)	r	1						3	1/1
FADD(P) FSUB(R)(P)	m	1			1			3-4	1/1
FMUL(P)	r	1						5	1/2 g)
FMUL(P)	m	1			1			5-6	1/2 g)
FDIV(R)(P)	r	1						38 h)	1/37
FDIV(R)(P)	m	1			1			38 h)	1/37
FABS		1
FCHS		3						2
FCOM(P) FUCOM	r	1						1
FCOM(P) FUCOM	m	1			1			1
FCOMPP FUCOMPP		1		1				1
FCOMI(P) FUCOMI(P)	r	1						1
FCOMI(P) FUCOMI(P)	m	1			1			1
FIADD FISUB(R)	m	6			1
FIMUL	m	6			1
FIDIV(R)	m	6			1
FICOM(P)	m	6			1
FTST		1						1
FXAM		1						2
FPREM		23
FPREM1		33
FRNDINT		30
FSCALE		56
FXTRACT		15
FSQRT		1						69	e,i)
FSIN FCOS		17-97						27-103	e)
FSINCOS		18-110						29-130	e)
F2XM1		17-48						66	e)
FYL2X		36-54						103	e)
FYL2XP1		31-53						98-107	e)
FPTAN		21-102						13-143	e)
FPATAN		25-86						44-143	e)
FNOP		1
FINCSTP FDECSTP		1
FFREE	r	1
FFREEP	r	2
FNCLEX				3
FNINIT		13
FNSAVE		141
FRSTOR		72
WAIT				2
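
As a rough illustration of how the delay and throughput columns can be read, the sketch below estimates clock counts for two simple cases: a chain of dependent instructions, where the delays add up, and a run of independent instructions of one kind, where the throughput is the limit. The function names and the two small dictionaries are hypothetical helpers, filled in with the register-operand values from the table above. The model ignores decoding, retirement and port contention, so the results are best treated as lower bounds, not measurements.

```python
# Values copied from the table above (register operands, 64-bit precision).
DELAY = {"FADD": 3, "FMUL": 5, "FDIV": 38}

# Reciprocal of the throughput column: clocks between the starts of two
# independent instructions of the same kind (1/1 -> 1, 1/2 -> 2, 1/37 -> 37).
CLOCKS_PER_INSTR = {"FADD": 1, "FMUL": 2, "FDIV": 37}


def chain_latency(instructions):
    """Clocks for a chain where each instruction needs the previous result."""
    return sum(DELAY[name] for name in instructions)


def independent_run(name, count):
    """Clocks until the last of `count` independent instructions finishes."""
    if count == 0:
        return 0
    # The last instruction starts (count - 1) intervals after the first one
    # and then needs its full delay to produce a result.
    return (count - 1) * CLOCKS_PER_INSTR[name] + DELAY[name]


print(chain_latency(["FMUL", "FADD", "FADD"]))  # 5 + 3 + 3 = 11
print(independent_run("FMUL", 4))               # 3 * 2 + 5 = 11
```

Under this simplified model, four independent multiplications cost about as much as one multiplication followed by two dependent additions, which is why breaking up long dependency chains tends to pay off.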

Notes:

e) not pipelined

f) FXCH generates 1 micro-op that is resolved by register renaming without going to any port.

g) FMUL uses the same circuitry as integer multiplication. Therefore, the combined throughput of mixed floating point and integer multiplications is 1 FMUL + 1 IMUL per 3 clock cycles.

h) FDIV delay depends on precision specified in control word: precision 64 bits gives delay 38, precision 53 bits gives delay 32, precision 24 bits gives delay 18. Division by a power of 2 takes 9 clocks. Throughput is 1/(delay-1).

i) faster for lower precision.
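
The arithmetic in notes g) and h) can be sketched as follows. This is only an illustration of the figures quoted in the notes, not a timing model of the processor; the names `FDIV_DELAY`, `fdiv_delay`, `fdiv_clocks_between_starts` and `mixed_multiply_clocks` are hypothetical and do not come from the original text.

```python
# Delays from note h), keyed by the precision (in bits) set in the control word.
FDIV_DELAY = {64: 38, 53: 32, 24: 18}


def fdiv_delay(precision_bits):
    """FDIV delay in clocks for a given precision setting (note h)."""
    return FDIV_DELAY[precision_bits]


def fdiv_clocks_between_starts(precision_bits):
    """Note h): throughput is 1/(delay-1), so one FDIV per (delay-1) clocks."""
    return fdiv_delay(precision_bits) - 1


def mixed_multiply_clocks(pairs):
    """Note g): 1 FMUL + 1 IMUL per 3 clocks, so `pairs` such pairs take 3*pairs."""
    return 3 * pairs


for bits in (64, 53, 24):
    print(bits, fdiv_delay(bits), fdiv_clocks_between_starts(bits))

print(mixed_multiply_clocks(10))  # 10 FMUL + 10 IMUL -> 30 clocks
```

With the precision set to 24 bits, the sketch gives a delay of 18 clocks and one division every 17 clocks, compared with 38 and 37 at full 64-bit precision.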
