26.10 FSCALE and exponential function (all processors)


Date: 2000-04-02 15:00

FSCALE is slow on all processors. Integer powers of 2 can be computed much faster by writing the desired power directly into the exponent field of a floating-point number. In each example the added constant is the exponent bias of the format (127, 1023 or 16383), aligned with the exponent field. To calculate 2^N, where N is a signed integer, choose the example below that fits your range of N:

For |N| < 2^7-1 you can use single precision:

        MOV     EAX, [N]
        SHL     EAX, 23
        ADD     EAX, 3F800000H
        MOV     DWORD PTR [TEMP], EAX
        FLD     DWORD PTR [TEMP]
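For readers who prefer to see the bit manipulation outside of assembly, the single precision case can be sketched in C as follows. This is only an illustration under assumptions not stated in the text: the helper name `pow2_float` is hypothetical, `float` is assumed to be the IEEE 754 32-bit format, and the range check simply restates the limit given above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original text): returns 2^n as a float
   by writing n + 127 into the exponent field, for |n| < 2^7 - 1.
   Assumes float is the IEEE 754 32-bit format. */
float pow2_float(int n)
{
    assert(n >= -126 && n <= 126);

    /* Same arithmetic as SHL EAX,23 / ADD EAX,3F800000H.
       Unsigned arithmetic wraps modulo 2^32, like the register does. */
    uint32_t bits = ((uint32_t)n << 23) + 0x3F800000u;

    float result;
    memcpy(&result, &bits, sizeof result);  /* reinterpret the bit pattern */
    return result;
}
```

With this sketch, `pow2_float(3)` yields 8.0 and `pow2_float(-1)` yields 0.5. The `memcpy` stands in for the store to TEMP followed by FLD.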

For |N| < 2^10-1 you can use double precision:

        MOV     EAX, [N]
        SHL     EAX, 20
        ADD     EAX, 3FF00000H
        MOV     DWORD PTR [TEMP], 0
        MOV     DWORD PTR [TEMP+4], EAX
        FLD     QWORD PTR [TEMP]
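The double precision variant follows the same pattern: only the upper 32 bits of the number carry the exponent, and the lower 32 bits are zero. A minimal C sketch, assuming `double` is the IEEE 754 64-bit format and using the hypothetical name `pow2_double`, might look like this:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original text): returns 2^n as a double
   for |n| < 2^10 - 1. Assumes double is the IEEE 754 64-bit format. */
double pow2_double(int n)
{
    assert(n >= -1022 && n <= 1022);

    /* Upper dword: same arithmetic as SHL EAX,20 / ADD EAX,3FF00000H. */
    uint32_t high = ((uint32_t)n << 20) + 0x3FF00000u;

    /* Lower dword is zero, as in MOV DWORD PTR [TEMP], 0. */
    uint64_t bits = (uint64_t)high << 32;

    double result;
    memcpy(&result, &bits, sizeof result);  /* reinterpret the bit pattern */
    return result;
}
```

For example, `pow2_double(10)` produces the upper dword 40900000H, which is the encoding of 1024.0.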

For |N| < 2^14-1 use long double precision:

        MOV     EAX, [N]
        ADD     EAX, 00003FFFH
        MOV     DWORD PTR [TEMP], 0
        MOV     DWORD PTR [TEMP+4], 80000000H
        MOV     DWORD PTR [TEMP+8], EAX
        FLD     TBYTE PTR [TEMP]

FSCALE is often used in the calculation of exponential functions. The following code shows an exponential function without the slow FRNDINT and FSCALE instructions:

; extern "C" long double _cdecl exp (double x);
_exp    PROC    NEAR
PUBLIC  _exp
        FLDL2E
        FLD     QWORD PTR [ESP+4]             ; x
        FMUL                                  ; z = x*log2(e)
        FIST    DWORD PTR [ESP+4]             ; round(z)
        SUB     ESP, 12
        MOV     DWORD PTR [ESP], 0
        MOV     DWORD PTR [ESP+4], 80000000H
        FISUB   DWORD PTR [ESP+16]            ; z - round(z)
        MOV     EAX, [ESP+16]
        ADD     EAX, 3FFFH
        MOV     [ESP+8], EAX
        JLE     SHORT UNDERFLOW
        CMP     EAX, 8000H
        JGE     SHORT OVERFLOW
        F2XM1
        FLD1
        FADD                                  ; 2^(z-round(z))
        FLD     TBYTE PTR [ESP]               ; 2^(round(z))
        ADD     ESP, 12
        FMUL                                  ; 2^z = e^x
        RET
UNDERFLOW:
        FSTP    ST
        FLDZ                                  ; return 0
        ADD     ESP, 12
        RET
OVERFLOW:
        PUSH    07F800000H                    ; +infinity
        FSTP    ST
        FLD     DWORD PTR [ESP]               ; return infinity
        ADD     ESP, 16
        RET
_exp    ENDP
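The structure of the routine above, e^x = 2^z with z = x*log2(e) split into round(z) and a fraction of magnitude at most 0.5, can also be sketched in C. The following is a rough illustration and not a translation of the assembly: the names `pow2_int` and `exp_sketch` are hypothetical, a short Taylor series is swapped in for F2XM1, the result is a `double` rather than a long double (so the overflow and underflow limits are those of the 64-bit format), and rounding is half away from zero rather than the current FPU rounding mode used by FIST.

```c
#include <assert.h>
#include <math.h>     /* only for the INFINITY macro */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 2^n for -1022 <= n <= 1023 via the exponent field.
   Assumes double is the IEEE 754 64-bit format. */
double pow2_int(int n)
{
    assert(n >= -1022 && n <= 1023);
    uint64_t bits = (uint64_t)(n + 1023) << 52;   /* biased exponent */
    double r;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Hypothetical sketch of e^x = 2^round(z) * 2^(z - round(z)). */
double exp_sketch(double x)
{
    const double LOG2E = 1.4426950408889634;   /* log2(e) */
    const double LN2   = 0.6931471805599453;   /* ln(2)   */

    if (x != x) return x;                      /* NaN passes through */

    double z = x * LOG2E;                      /* z = x*log2(e) */
    if (z > 1024.0)  return INFINITY;          /* guard the int conversion */
    if (z < -1024.0) return 0.0;

    int n = (int)(z < 0.0 ? z - 0.5 : z + 0.5);   /* round(z) */
    if (n > 1023)  return INFINITY;            /* overflow */
    if (n < -1022) return 0.0;                 /* underflow, returns 0 */

    double f = z - (double)n;                  /* z - round(z), |f| <= 0.5 */

    /* 2^f = e^(f*ln2) by Taylor series; stands in for F2XM1 plus 1. */
    double t = f * LN2;
    double term = 1.0, sum = 1.0;
    for (int k = 1; k <= 13; k++) {
        term *= t / k;
        sum += term;
    }

    return sum * pow2_int(n);                  /* 2^z = e^x */
}
```

As in the assembly version, no scaling instruction or library power function is needed: the integer part of z goes straight into the exponent field and only the small fractional part needs an actual approximation.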
