
26.10 FSCALE and exponential function (all processors)

Date: 2000-04-02 15:00


FSCALE is slow on all processors. Integer powers of 2 can be computed much faster by inserting the desired power into the exponent field of a floating-point number. To calculate 2^N, where N is a signed integer, choose the example below that fits your range of N:
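The reason this works follows from the standard IEEE 754 encoding of normal numbers. This derivation is background, not part of the original text. A positive normal number with biased exponent field E, bias B and fraction f has the value

$$
x = 2^{E-B}\,(1 + f), \qquad 0 \le f < 1 .
$$

Clearing the fraction (f = 0) and storing E = N + B therefore gives x = 2^N exactly. The bias is B = 127 for single, 1023 for double and 16383 for long double precision, and E must stay between 1 and 2B for the result to be a normal number. This is why each format has its own limit on N.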

For |N| < 2⁷-1 you can use single precision:

    MOV  EAX, [N]
    SHL  EAX, 23                        ; move N to the exponent position (bits 23-30)
    ADD  EAX, 3F800000H                 ; add the bias: 127 SHL 23
    MOV  DWORD PTR [TEMP], EAX
    FLD  DWORD PTR [TEMP]               ; ST(0) = 2^N
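For experimenting outside an assembler, here is a C sketch of the same single-precision construction. The function name pow2_single is hypothetical. memcpy reinterprets the bit pattern without violating aliasing rules. Shifting an unsigned value makes negative N wrap exactly the way SHL and ADD do in EAX.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: builds 2^n by writing n + 127 into the 8-bit
   exponent field (bits 23..30). Assumes |n| < 127, as stated above. */
float pow2_single(int32_t n)
{
    uint32_t bits = ((uint32_t)n << 23) + 0x3F800000u; /* (n + 127) << 23 */
    float f;
    memcpy(&f, &bits, sizeof f);   /* reinterpret the bits as a float */
    return f;
}
```

For example, pow2_single(-3) stores exponent field 124 and yields 0.125f.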

For |N| < 2¹⁰-1 you can use double precision:

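    MOV  EAX, [N]
    SHL  EAX, 20                        ; exponent occupies bits 20-30 of the high dword
    ADD  EAX, 3FF00000H                 ; add the bias: 1023 SHL 20
    MOV  DWORD PTR [TEMP], 0            ; low dword of the mantissa = 0
    MOV  DWORD PTR [TEMP+4], EAX        ; high dword: sign, exponent, top of mantissa
    FLD  QWORD PTR [TEMP]               ; ST(0) = 2^N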
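Here is the double-precision version as a C sketch under the same assumptions. The name pow2_double is hypothetical. The high 32 bits are computed exactly as in EAX, and the low 32 bits are zero, matching the two stores to [TEMP] and [TEMP+4].

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 11-bit exponent field with bias 1023, which sits
   in bits 20..30 of the high dword. Assumes |n| < 1023. */
double pow2_double(int32_t n)
{
    uint32_t hi = ((uint32_t)n << 20) + 0x3FF00000u; /* (n + 1023) << 20 */
    uint64_t bits = (uint64_t)hi << 32;               /* low dword = 0 */
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```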

For |N| < 2¹⁴-1 use long double precision:

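    MOV  EAX, [N]
    ADD  EAX, 00003FFFH                 ; biased exponent = N + 16383
    MOV  DWORD PTR [TEMP], 0            ; mantissa bits 0-31 = 0
    MOV  DWORD PTR [TEMP+4], 80000000H  ; mantissa bits 32-63: explicit integer bit = 1
    MOV  DWORD PTR [TEMP+8], EAX        ; sign and exponent word (TEMP needs 12 bytes)
    FLD  TBYTE PTR [TEMP]               ; ST(0) = 2^N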
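The long double case can also be sketched in C, but only where long double is the x87 80-bit extended format (for example GCC or Clang on x86). The static assertion rejects other platforms instead of silently producing garbage. Unlike single and double precision, this format stores the leading 1 of the mantissa explicitly, which is why 80000000H is written to [TEMP+4]. The name pow2_extended is hypothetical.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Assumes the x87 80-bit layout on little-endian x86: a 64-bit mantissa
   with an explicit integer bit, then a 16-bit sign+exponent word with
   bias 16383. */
_Static_assert(LDBL_MANT_DIG == 64 && sizeof(long double) >= 10,
               "this sketch assumes the x87 80-bit long double");

/* Hypothetical helper: assumes |n| < 16383. */
long double pow2_extended(int32_t n)
{
    unsigned char buf[sizeof(long double)] = {0}; /* padding bytes stay 0 */
    uint32_t mant_lo = 0;                         /* like [TEMP]   */
    uint32_t mant_hi = 0x80000000u;               /* like [TEMP+4] */
    uint16_t sign_exp = (uint16_t)(n + 0x3FFF);   /* like [TEMP+8], low 16 bits */
    memcpy(buf, &mant_lo, 4);
    memcpy(buf + 4, &mant_hi, 4);
    memcpy(buf + 8, &sign_exp, 2);
    long double x;
    memcpy(&x, buf, sizeof x);
    return x;
}
```

The assembly stores a full dword at [TEMP+8], but FLD TBYTE reads only its low 16 bits, so writing two bytes here is equivalent.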

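FSCALE is often used in the calculation of exponential functions. The following code computes e^x as 2^z, where z = x*log2(e). It splits z into round(z) and the remainder z - round(z), which lies within the valid input range of F2XM1. It then evaluates 2 raised to the remainder with F2XM1 and builds 2^round(z) directly in the exponent field of a long double, so the slow FRNDINT and FSCALE instructions are not needed. A result that would underflow returns 0, and one that would overflow returns +infinity: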

    ; extern "C" long double _cdecl exp (double x);
    _exp    PROC    NEAR
    PUBLIC  _exp
            FLDL2E
            FLD     QWORD PTR [ESP+4]             ; x
            FMUL                                  ; z = x*log2(e)
            FIST    DWORD PTR [ESP+4]             ; round(z)
            SUB     ESP, 12
            MOV     DWORD PTR [ESP], 0
            MOV     DWORD PTR [ESP+4], 80000000H
            FISUB   DWORD PTR [ESP+16]            ; z - round(z)
            MOV     EAX, [ESP+16]
            ADD     EAX, 3FFFH
            MOV     [ESP+8], EAX
            JLE     SHORT UNDERFLOW
            CMP     EAX, 8000H
            JGE     SHORT OVERFLOW
            F2XM1
            FLD1
            FADD                                  ; 2^(z-round(z))
            FLD     TBYTE PTR [ESP]               ; 2^(round(z))
            ADD     ESP, 12
            FMUL                                  ; 2^z = e^x
            RET
    UNDERFLOW:
            FSTP    ST
            FLDZ                                  ; return 0
            ADD     ESP, 12
            RET
    OVERFLOW:
            PUSH    07F800000H                    ; +infinity
            FSTP    ST
            FLD     DWORD PTR [ESP]               ; return infinity
            ADD     ESP, 16
            RET
    _exp    ENDP
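The same decomposition can be sketched in portable C for testing without an assembler. This is not the author's code, and it makes several substitutions. It rounds half away from zero rather than using FIST's default round-to-nearest-even. It computes 2 raised to the remainder from a truncated Taylor series instead of F2XM1. It works in double rather than long double, so its overflow and underflow limits are those of double. The names exp_sketch, pow2i and pow2_frac are hypothetical. No libm functions are called, so no -lm is needed.

```c
#include <math.h>    /* INFINITY macro only */
#include <stdint.h>
#include <string.h>

/* 2^k via the exponent-field trick; assumes -1022 <= k <= 1023. */
static double pow2i(int32_t k)
{
    uint64_t bits = (uint64_t)(k + 1023) << 52;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* 2^r for |r| <= 0.5 from the Taylor series of e^(r*ln 2);
   stands in for F2XM1 / FLD1 / FADD in the assembly. */
static double pow2_frac(double r)
{
    const double t = r * 0.6931471805599453;   /* r * ln(2) */
    double sum = 1.0, term = 1.0;
    for (int i = 1; i <= 14; i++) {
        term *= t / i;                          /* t^i / i! */
        sum += term;
    }
    return sum;
}

double exp_sketch(double x)
{
    double z = x * 1.4426950408889634;          /* z = x*log2(e) */
    if (z != z) return z;                       /* NaN in, NaN out */
    if (z < -1022.0) return 0.0;                /* underflow: return 0 */
    if (z >= 1023.5) return INFINITY;           /* overflow: return +infinity */
    int32_t k = (int32_t)(z < 0 ? z - 0.5 : z + 0.5);  /* round(z) */
    double r = z - k;                           /* z - round(z), in [-0.5, 0.5] */
    return pow2_frac(r) * pow2i(k);             /* 2^r * 2^k = 2^z = e^x */
}
```

The range checks run before the integer conversion so that the cast never sees an out-of-range value. The assembly instead relies on testing the biased exponent in EAX.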
