28.2 Floating point instructions


Date: 2000-04-03 14:00

Explanations:

Operands:

r = register, m = memory, m32 = 32-bit memory operand, m64 = 64-bit memory operand, and so on.

Clock cycles:

The numbers are minimum values. Cache misses, misalignment, denormal operands, and exceptions may increase the clock counts considerably.

Pairability:

+ = pairable with FXCH, np = not pairable with FXCH.

i-ov:

Overlap with integer instructions. i-ov = 4 means that the last four clock cycles can overlap with subsequent integer instructions.

fp-ov:

Overlap with floating point instructions. fp-ov = 2 means that the last two clock cycles can overlap with subsequent floating point instructions. (WAIT is considered a floating point instruction here.)
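
As a rough illustration of how the two overlap columns can be read, the sketch below estimates the combined time of one instruction followed by independent work that is allowed to overlap with it. The function name clocks_with_overlap and the model itself are assumptions made for this example; the model ignores pairing, dependencies, and the exceptions listed in the notes.

```c
#include <assert.h>

/* Simplified model (an assumption, not part of the table):
   an instruction takes `clocks` cycles, and its last `overlap` cycles
   may run concurrently with `following` cycles of independent work.
   Returns the estimated total number of cycles for both. */
int clocks_with_overlap(int clocks, int overlap, int following)
{
    assert(clocks >= 0 && overlap >= 0 && following >= 0);
    assert(overlap <= clocks);

    /* Only as many cycles can be hidden as there is work to hide. */
    int hidden = overlap < following ? overlap : following;
    return clocks + following - hidden;
}
```

Under this model, a 39-clock instruction with i-ov = 38 followed by 20 clocks of integer work still costs 39 clocks in total, while a 17-clock instruction with i-ov = 4 followed by 10 clocks of integer work costs 23.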

Instruction         Operand    Clock cycles  Pairability  i-ov   fp-ov
FLD                 r/m32/m64  1             +            0      0
FLD                 m80        3             np           0      0
FBLD                m80        48-58         np           0      0
FST(P)              r          1             np           0      0
FST(P)              m32/m64    2 m)          np           0      0
FST(P)              m80        3 m)          np           0      0
FBSTP               m80        148-154       np           0      0
FILD                m          3             np           2      2
FIST(P)             m          6             np           0      0
FLDZ FLD1                      2             np           0      0
FLDPI FLDL2E etc.              5 s)          np           2      2
FNSTSW              AX/m16     6 q)          np           0      0
FLDCW               m16        8             np           0      0
FNSTCW              m16        2             np           0      0
FADD(P)             r/m        3             +            2      2
FSUB(R)(P)          r/m        3             +            2      2
FMUL(P)             r/m        3             +            2      2 n)
FDIV(R)(P)          r/m        19/33/39 p)   +            38 o)  2
FCHS FABS                      1             +            0      0
FCOM(P)(P) FUCOM    r/m        1             +            0      0
FIADD FISUB(R)      m          6             np           2      2
FIMUL               m          6             np           2      2
FIDIV(R)            m          22/36/42 p)   np           38 o)  2
FICOM               m          4             np           0      0
FTST                           1             np           0      0
FXAM                           17-21         np           4      0
FPREM                          16-64         np           2      2
FPREM1                         20-70         np           2      2
FRNDINT                        9-20          np           0      0
FSCALE                         20-32         np           5      0
FXTRACT                        12-66         np           0      0
FSQRT                          70            np           69 o)  2
FSIN FCOS                      65-100 r)     np           2      2
FSINCOS                        89-112 r)     np           2      2
F2XM1                          53-59 r)      np           2      2
FYL2X                          103 r)        np           2      2
FYL2XP1                        105 r)        np           2      2
FPTAN                          120-147 r)    np           36 o)  0
FPATAN                         112-134 r)    np           2      2
FNOP                           1             np           0      0
FXCH                r          1             np           0      0
FINCSTP FDECSTP                2             np           0      0
FFREE               r          2             np           0      0
FNCLEX                         6-9           np           0      0
FNINIT                         12-22         np           0      0
FNSAVE              m          124-300       np           0      0
FRSTOR              m          70-95         np           0      0
WAIT                           1             np           0      0

Notes:

m) The value to store is needed one clock cycle in advance.

n) fp-ov is 1 if the overlapping instruction is also an FMUL.

o) Cannot overlap integer multiplication instructions.

p) FDIV takes 19, 33, or 39 clock cycles for 24-, 53-, and 64-bit precision respectively. FIDIV takes 3 clocks more. The precision is defined by bits 8-9 of the floating point control word.
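
To make note p) concrete, the following sketch decodes the precision field from a control word value and looks up the corresponding minimum clock count. The function names are hypothetical. The field encoding used here (00 = 24 bits, 01 = reserved, 10 = 53 bits, 11 = 64 bits) is the standard x87 one; the clock counts are simply those quoted in the note.

```c
#include <stdint.h>

/* Precision control field, bits 8-9 of the x87 control word:
   00 = 24-bit, 01 = reserved, 10 = 53-bit, 11 = 64-bit significand.
   Returns 0 for the reserved encoding. */
int precision_bits(uint16_t control_word)
{
    switch ((control_word >> 8) & 0x3) {
    case 0x0: return 24;
    case 0x2: return 53;
    case 0x3: return 64;
    default:  return 0;
    }
}

/* Minimum FDIV clock count from note p); 0 for the reserved encoding. */
int fdiv_min_clocks(uint16_t control_word)
{
    switch (precision_bits(control_word)) {
    case 24: return 19;
    case 53: return 33;
    case 64: return 39;
    default: return 0;
    }
}

/* FIDIV takes 3 clocks more than FDIV (note p). */
int fidiv_min_clocks(uint16_t control_word)
{
    int fdiv = fdiv_min_clocks(control_word);
    return fdiv ? fdiv + 3 : 0;
}
```

For example, the control word 037Fh that FNINIT sets selects 64-bit precision, so fdiv_min_clocks(0x037F) returns 39, while 027Fh selects 53-bit precision and gives 33.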

q) The first 4 clock cycles can overlap with preceding integer instructions. See chapter 26.7.

r) Clock counts are typical. Trivial cases may be faster, extreme cases may be slower.

s) May take up to 3 clocks more when the output is needed for FST, FCHS, or FABS.


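Comments

#1, posted 2009-07-19 17:40 by liushac
Seeing as I am the first person to comment in nine years, I would be grateful if an expert could give me some pointers:

I am learning about float. After stepping through the code below in the debugger, I cannot understand fld, fldcw, fistp, and fnstcw. My rough understanding of the IEEE 754 float format is: 1 sign bit, 1+7 exponent bits, and 23 mantissa bits, 32 bits in total, and that this is how 123.45 is converted and stored in memory.
Also, my respects to your vegetable garden...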

#include <stdio.h>

int main()
{
004106B0  55                    push ebp
004106B1  8B EC                 mov ebp,esp
004106B3  83 EC 4C              sub esp,4Ch
004106B6  53                    push ebx
004106B7  56                    push esi
004106B8  57                    push edi
004106B9  8D 7D B4              lea edi,[ebp-4Ch]
004106BC  B9 13 00 00 00        mov ecx,13h
004106C1  B8 CC CC CC CC        mov eax,0CCCCCCCCh
004106C6  F3 AB                 rep stos dword ptr [edi]
int x=12345;
004106C8  C7 45 FC 39 30 00 00  mov dword ptr [ebp-4],3039h

float a=3458764513820540927.0;
004106CF  C7 45 F8 00 00 40 5E  mov dword ptr [ebp-8],5E400000h

int c=123;
004106D6  C7 45 F4 7B 00 00 00  mov dword ptr [ebp-0Ch],7Bh
c=a;
004106DD  D9 45 F8              fld dword ptr [ebp-8]
004106E0  E8 9F FF FF FF        call __ftol (00410684)
004106E5  89 45 F4              mov dword ptr [ebp-0Ch],eax
return 0;
004106E8  33 C0                 xor eax,eax
}
004106EA  5F                    pop edi
004106EB  5E                    pop esi
004106EC  5B                    pop ebx
004106ED  8B E5                 mov esp,ebp
004106EF  5D                    pop ebp
004106F0  C3                    ret


__ftol:
00410684  55                    push ebp
00410685  8B EC                 mov ebp,esp
00410687  83 C4 F4              add esp,0F4h
0041068A  9B                    wait
0041068B  D9 7D FE              fnstcw word ptr [ebp-2] ; FNSTCW stores the FPU control word to xx, without checking for unmasked floating point exceptions
0041068E  9B                    wait
0041068F  66 8B 45 FE           mov ax,word ptr [ebp-2]
00410693  80 CC 0C              or ah,0Ch ; modifies the FPU?
00410696  66 89 45 FC           mov word ptr [ebp-4],ax
0041069A  D9 6D FC              fldcw word ptr [ebp-4]
0041069D  DF 7D F4              fistp qword ptr [ebp-0Ch]
004106A0  D9 6D FE              fldcw word ptr [ebp-2]
004106A3  8B 45 F4              mov eax,dword ptr [ebp-0Ch]
004106A6  8B 55 F8              mov edx,dword ptr [ebp-8]
004106A9  C9                    leave
004106AA  C3                    ret
004106AB  CC                    int 3
004106AC  CC                    int 3
004106AD  CC                    int 3
004106AE  CC                    int 3
004106AF  CC                    int 3
Reply, 2009-07-20 00:29:
It has been too long since I worked with this; without the documentation I cannot make sense of it either. :D
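
For readers with the same question, here is a minimal C sketch of what the listing in the comment appears to do. It is an interpretation, not the runtime library's source: the names decode_float, with_truncation, and ftol_low_dword are hypothetical, and the code assumes that float is an IEEE 754 32-bit single and that the converted value fits in a 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE 754 single: 1 sign bit, 8 exponent bits
   (stored with a bias of 127), and 23 fraction bits. */
typedef struct {
    unsigned sign;
    int      exponent;   /* unbiased */
    uint32_t fraction;
} float_fields;

float_fields decode_float(uint32_t bits)
{
    float_fields f;
    f.sign     = bits >> 31;
    f.exponent = (int)((bits >> 23) & 0xFF) - 127;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

/* Mirrors "or ah,0Ch": AH holds bits 8-15 of AX, so the instruction
   sets bits 10-11 (rounding control) of the control word to 11,
   which selects truncation toward zero. */
uint16_t with_truncation(uint16_t control_word)
{
    return (uint16_t)(control_word | 0x0C00u);
}

/* Rough equivalent of the value __ftol leaves in EAX: convert to a
   64-bit integer with truncation, then keep the low 32 bits.
   Assumes the value fits in int64_t. */
uint32_t ftol_low_dword(uint32_t float_bits)
{
    float value;
    assert(sizeof value == sizeof float_bits);
    memcpy(&value, &float_bits, sizeof value);

    int64_t wide = (int64_t)value;   /* C casts truncate toward zero */
    return (uint32_t)wide;
}
```

Read this way, fnstcw saves the current control word, the first fldcw loads a copy with truncation selected (C requires float-to-integer conversion to discard the fraction), fistp qword stores ST(0) as a 64-bit integer and pops it, and the second fldcw restores the saved control word. For the constant in the listing, decode_float(0x5E400000) gives sign 0, exponent 61, and fraction 0x400000, that is 1.5 x 2^61. That value does not fit in 32 bits, and ftol_low_dword(0x5E400000) returns 0, which suggests that c ends up as 0 in the listing.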


 

Programmer XiaoHui, site established 1997
Copyright © XiaoHui.com. All rights reserved.