How to optimize for the Pentium family of microprocessors

28.2 Floating point instructions

Date: 2000-04-03 14:00

Explanations:

Operands:

r = register, m = memory, m32 = 32 bit memory operand, etc.

Clock cycles:

The numbers are minimum values. Cache misses, misalignment, denormal operands, and exceptions may increase the clock counts considerably.

Pairability:

+ = pairable with FXCH, np = not pairable with FXCH.

i-ov:

Overlap with integer instructions. i-ov = 4 means that the last four clock cycles can overlap with subsequent integer instructions.

fp-ov:

Overlap with floating point instructions. fp-ov = 2 means that the last two clock cycles can overlap with subsequent floating point instructions. (WAIT is considered a floating point instruction here)
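The two overlap columns boil down to simple arithmetic, sketched below in C. This is only a simplified model: it assumes the instructions are independent (no instruction waits for the result of the previous one) and it ignores pairing, cache misses, and the other effects listed under "Clock cycles". The function names `blocking_clocks` and `chain_clocks` are made up for this illustration.

```c
#include <assert.h>

/* Clocks during which a following instruction of the relevant kind
   cannot start: the total clock count minus the overlappable tail
   (the i-ov or fp-ov value from the table). */
static int blocking_clocks(int clocks, int overlap)
{
    assert(overlap >= 0 && overlap <= clocks);
    return clocks - overlap;
}

/* Estimated total clocks for n identical, independent floating point
   instructions issued back to back. Each instruction except the last
   only costs its non-overlappable part; the last one costs its full
   clock count. Hypothetical helper, not a cycle-exact simulator. */
static int chain_clocks(int n, int clocks, int fp_ov)
{
    assert(n >= 1);
    return (n - 1) * blocking_clocks(clocks, fp_ov) + clocks;
}
```

Under this model, `blocking_clocks(39, 38)` gives 1 for a 64 bit FDIV followed by integer code, and `chain_clocks(4, 3, 2)` gives 6 for four independent FADDs. With the reduced overlap of note n), `chain_clocks(4, 3, 1)` gives 9 for four independent FMULs.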

Instruction	Operand	Clock cycles	Pairability	i-ov	fp-ov
FLD	r/m32/m64	1	+	0	0
FLD	m80	3	np	0	0
FBLD	m80	48-58	np	0	0
FST(P)	r	1	np	0	0
FST(P)	m32/m64	2 m)	np	0	0
FST(P)	m80	3 m)	np	0	0
FBSTP	m80	148-154	np	0	0
FILD	m	3	np	2	2
FIST(P)	m	6	np	0	0
FLDZ FLD1		2	np	0	0
FLDPI FLDL2E etc.		5 s)	np	2	2
FNSTSW	AX/m16	6 q)	np	0	0
FLDCW	m16	8	np	0	0
FNSTCW	m16	2	np	0	0
FADD(P)	r/m	3	+	2	2
FSUB(R)(P)	r/m	3	+	2	2
FMUL(P)	r/m	3	+	2	2 n)
FDIV(R)(P)	r/m	19/33/39 p)	+	38 o)	2
FCHS FABS		1	+	0	0
FCOM(P)(P) FUCOM	r/m	1	+	0	0
FIADD FISUB(R)	m	6	np	2	2
FIMUL	m	6	np	2	2
FIDIV(R)	m	22/36/42 p)	np	38 o)	2
FICOM	m	4	np	0	0
FTST		1	np	0	0
FXAM		17-21	np	4	0
FPREM		16-64	np	2	2
FPREM1		20-70	np	2	2
FRNDINT		9-20	np	0	0
FSCALE		20-32	np	5	0
FXTRACT		12-66	np	0	0
FSQRT		70	np	69 o)	2
FSIN FCOS		65-100 r)	np	2	2
FSINCOS		89-112 r)	np	2	2
F2XM1		53-59 r)	np	2	2
FYL2X		103 r)	np	2	2
FYL2XP1		105 r)	np	2	2
FPTAN		120-147 r)	np	36 o)	0
FPATAN		112-134 r)	np	2	2
FNOP		1	np	0	0
FXCH	r	1	np	0	0
FINCSTP FDECSTP		2	np	0	0
FFREE	r	2	np	0	0
FNCLEX		6-9	np	0	0
FNINIT		12-22	np	0	0
FNSAVE	m	124-300	np	0	0
FRSTOR	m	70-95	np	0	0
WAIT		1	np	0	0

Notes:

m) The value to store is needed one clock cycle in advance.

n) The overlap is only 1 clock cycle if the overlapping instruction is also an FMUL.

o) Cannot overlap integer multiplication instructions.

p) FDIV takes 19, 33, or 39 clock cycles for 24, 53, and 64 bit precision respectively. FIDIV takes 3 clocks more. The precision is defined by bits 8-9 of the floating point control word.
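As an illustration of note p), the following C sketch decodes the precision control field (bits 8-9) from a control word value and looks up the corresponding clock count from the table. The field encoding used here (00 = 24 bit, 01 = reserved, 10 = 53 bit, 11 = 64 bit) is the standard x87 one. The function names and the -1 return value for the reserved encoding are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the two-bit precision control (PC) field, bits 8-9. */
static int precision_control(uint16_t cw)
{
    return (cw >> 8) & 3;
}

/* Minimum clock count for FDIV (is_fidiv == 0) or FIDIV (is_fidiv != 0)
   under the given control word, per note p).
   Returns -1 for the reserved PC encoding 01. */
static int fdiv_clocks(uint16_t cw, int is_fidiv)
{
    /* Indexed by PC field: 24 bit, reserved, 53 bit, 64 bit. */
    static const int clocks_by_pc[4] = { 19, -1, 33, 39 };
    int pc = precision_control(cw);

    assert(pc >= 0 && pc <= 3);
    if (clocks_by_pc[pc] < 0)
        return -1;
    return clocks_by_pc[pc] + (is_fidiv ? 3 : 0);
}
```

For example, the control word 037Fh that FNINIT loads selects 64 bit precision, so `fdiv_clocks(0x037F, 0)` returns 39, while a control word of 027Fh selects 53 bit precision and `fdiv_clocks(0x027F, 1)` returns 36 for FIDIV.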

q) The first 4 clock cycles can overlap with preceding integer instructions. See chapter 26.7.

r) Clock counts are typical. Trivial cases may be faster, extreme cases may be slower.

s) May take up to 3 clocks more when the output is needed for FST, FCHS, or FABS.


Comments


#1 liushac, posted 2009-07-19 17:40

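Since I am the first person to comment in 9 years, I would be grateful if an expert could give me some pointers:

I am learning about float. After debugging the code below I cannot make sense of fld, fldcw, fistp, and fnstcw. My rough understanding of the IEEE 754 standard for float is: 1 sign bit, 8 exponent bits (1+7), and 23 mantissa bits, 32 bits in total, and I can work out how 123.45 is converted into memory.
Also, my respects to your "vegetable garden" (菜园子)...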

1: #include <stdio.h>
2:
3: int main()
4: {
004106B0 55 push ebp
004106B1 8B EC mov ebp,esp
004106B3 83 EC 4C sub esp,4Ch
004106B6 53 push ebx
004106B7 56 push esi
004106B8 57 push edi
004106B9 8D 7D B4 lea edi,[ebp-4Ch]
004106BC B9 13 00 00 00 mov ecx,13h
004106C1 B8 CC CC CC CC mov eax,0CCCCCCCCh
004106C6 F3 AB rep stos dword ptr [edi]
5: int x=12345;
004106C8 C7 45 FC 39 30 00 00 mov dword ptr [ebp-4],3039h
6:
7: float a=3458764513820540927.0;
004106CF C7 45 F8 00 00 40 5E mov dword ptr [ebp-8],5E400000h
8:
9: int c=123;
004106D6 C7 45 F4 7B 00 00 00 mov dword ptr [ebp-0Ch],7Bh
10: c=a;
004106DD D9 45 F8 fld dword ptr [ebp-8]
004106E0 E8 9F FF FF FF call __ftol (00410684)
004106E5 89 45 F4 mov dword ptr [ebp-0Ch],eax
11: return 0;
004106E8 33 C0 xor eax,eax
12: }
004106EA 5F pop edi
004106EB 5E pop esi
004106EC 5B pop ebx
004106ED 8B E5 mov esp,ebp
004106EF 5D pop ebp
004106F0 C3 ret

__ftol:
00410684 55 push ebp
00410685 8B EC mov ebp,esp
00410687 83 C4 F4 add esp,0F4h
0041068A 9B wait
0041068B D9 7D FE fnstcw word ptr [ebp-2] ;FNSTCW saves the FPU control word to xx, without checking for unmasked floating point exceptions
0041068E 9B wait
0041068F 66 8B 45 FE mov ax,word ptr [ebp-2]
00410693 80 CC 0C or ah,0Ch ;modifies the FPU?
00410696 66 89 45 FC mov word ptr [ebp-4],ax
0041069A D9 6D FC fldcw word ptr [ebp-4]
0041069D DF 7D F4 fistp qword ptr [ebp-0Ch]
004106A0 D9 6D FE fldcw word ptr [ebp-2]
004106A3 8B 45 F4 mov eax,dword ptr [ebp-0Ch]
004106A6 8B 55 F8 mov edx,dword ptr [ebp-8]
004106A9 C9 leave
004106AA C3 ret
004106AB CC int 3
004106AC CC int 3
004106AD CC int 3
004106AE CC int 3
004106AF CC int 3

XiaoHui replied on 2009-07-20 00:29:
It has been too long since I worked with this; without the documentation I cannot follow it either. :D
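For readers with the same question, the listing can be followed step by step with a small C sketch. It rests on these readings of the disassembly, which should be checked against the processor documentation: `fld` pushes the float onto the FPU stack; `fnstcw` saves the current control word; `or ah,0Ch` sets bits 10-11 of the copy (the rounding control field, where 11 means truncate toward zero); the first `fldcw` loads that modified copy; `fistp qword` stores the value as a 64 bit integer and pops it; the second `fldcw` restores the saved control word; and `main` keeps only EAX, the low 32 bits. The helper names below are made up for this example, and the C cast stands in for the `fistp` under truncation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE 754 single precision value. */
struct f32_fields {
    unsigned sign;     /* 1 bit */
    int      exponent; /* unbiased: stored 8-bit value minus 127 */
    uint32_t fraction; /* 23 stored bits, without the implicit leading 1 */
};

static struct f32_fields decode_f32(uint32_t bits)
{
    struct f32_fields f;
    f.sign     = bits >> 31;
    f.exponent = (int)((bits >> 23) & 0xFFu) - 127;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

/* Reinterpret a 32-bit pattern as a float, like `fld dword ptr`. */
static float f32_from_bits(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Effect of `or ah,0Ch` on the control word: set the rounding
   control field (bits 10-11) to 11 = truncate toward zero. */
static uint16_t with_truncation(uint16_t cw)
{
    return (uint16_t)(cw | 0x0C00u);
}

/* Model of `c = a` in the listing: truncate to a 64-bit integer
   (what `fistp qword` stores) and keep the low dword (EAX).
   Assumes the value fits in int64_t; larger values are not modelled. */
static uint32_t ftol_low_dword(float a)
{
    int64_t wide = (int64_t)a; /* the C cast truncates toward zero */
    return (uint32_t)((uint64_t)wide & 0xFFFFFFFFu);
}
```

Decoding the constant 5E400000h from line 7 of the listing gives sign 0, exponent 61, and fraction 400000h, that is 1.5 x 2^61 = 3458764513820540928, the nearest float to the literal in the source. As a 64 bit integer this is 3000000000000000h, whose low dword is 0, so under this model `c` ends up as 0 rather than anything resembling the original number.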
