26.15 Bit scan (PPlain and PMMX)


日期: 2000-04-02 15:00 | 联系我
关注我: Telegram, Twitter

26.15 Bit scan (PPlain and PMMX)

BSF and BSR are the poorest optimized instructions on the PPlain and PMMX, taking approximately 11 + 2*n clock cycles, where n is the number of zeros skipped.

The following code emulates BSR ECX,EAX:

TEST EAX,EAX JZ SHORT BS1 MOV DWORD PTR [TEMP],EAX MOV DWORD PTR [TEMP+4],0 FILD QWORD PTR [TEMP] FSTP QWORD PTR [TEMP] WAIT ; WAIT only needed for compatibility with old 80287 processor MOV ECX, DWORD PTR [TEMP+4] SHR ECX,20 ; isolate exponent SUB ECX,3FFH ; adjust TEST EAX,EAX ; clear zero flag BS1:

The following code emulates BSF ECX,EAX:

TEST EAX,EAX JZ SHORT BS2 XOR ECX,ECX MOV DWORD PTR [TEMP+4],ECX SUB ECX,EAX AND EAX,ECX MOV DWORD PTR [TEMP],EAX FILD QWORD PTR [TEMP] FSTP QWORD PTR [TEMP] WAIT ; WAIT only needed for compatibility with old 80287 processor MOV ECX, DWORD PTR [TEMP+4] SHR ECX,20 SUB ECX,3FFH TEST EAX,EAX ; clear zero flag BS2:

These emulation codes should not be used on the PPro, PII and PIII, where the bit scan instructions take only 1 or 2 clocks, and where the emulation codes shown above have two partial memory stalls.

标签: MMX 优化 | Bit

 文章评论
目前没有任何评论.

↓ 快抢占第1楼,发表你的评论和意见 ↓

当前页面是本站的 Google AMP 版本。
欲查看完整版本和发表评论请点击:完整版 »

 

程序员小辉 建站于 1997
Copyright © XiaoHui.com; 保留所有权利。