26.15 Bit scan (PPlain and PMMX)
BSF and BSR are the poorest optimized instructions on the PPlain and PMMX, taking approximately 11 + 2*n clock cycles, where n is the number of zeros skipped.
The following code emulates BSR ECX,EAX:
TEST EAX,EAX JZ SHORT BS1 MOV DWORD PTR [TEMP],EAX MOV DWORD PTR [TEMP+4],0 FILD QWORD PTR [TEMP] FSTP QWORD PTR [TEMP] WAIT ; WAIT only needed for compatibility with old 80287 processor MOV ECX, DWORD PTR [TEMP+4] SHR ECX,20 ; isolate exponent SUB ECX,3FFH ; adjust TEST EAX,EAX ; clear zero flag BS1:
The following code emulates BSF ECX,EAX:
TEST EAX,EAX JZ SHORT BS2 XOR ECX,ECX MOV DWORD PTR [TEMP+4],ECX SUB ECX,EAX AND EAX,ECX MOV DWORD PTR [TEMP],EAX FILD QWORD PTR [TEMP] FSTP QWORD PTR [TEMP] WAIT ; WAIT only needed for compatibility with old 80287 processor MOV ECX, DWORD PTR [TEMP+4] SHR ECX,20 SUB ECX,3FFH TEST EAX,EAX ; clear zero flag BS2:
These emulation codes should not be used on the PPro, PII and PIII, where the bit scan instructions take only 1 or 2 clocks, and where the emulation codes shown above have two partial memory stalls.