» 首页 > 程序资料 > MMX 汇编优化 > MMX 优化: How to optimize for the Pentium family of microprocessors

9. Address generation interlock (PPlain and PMMX)

日期: 2000-04-01 14:00 | 联系我
关注我: Telegram, Twitter

9. Address generation interlock (PPlain and PMMX)

It takes one clock cycle to calculate the address needed by an instruction which accesses memor y. Normally, this calculation is done at a separate stage in the pipeline while the preceding instruction or instruction pair is executing. But if the address depends on the result of an instruction executing in the preceding clock cycle, then you have to wait an extra clock cycle for the address to be calculated. This is called an AGI stall. Example:

ADD EBX,4 / MOV EAX,[EBX] ; AGI stall

The stall in this example can be removed by putting some other instructions in between ADD EBX,4 and MOV EAX,[EBX] or by rewriting the code to: MOV EAX,[EBX+4] / ADD EBX,4

You can also get an AGI stall with instructions which use ESP implicitly for addressing, such as PUSH, POP, CALL, and RET, if ESP has been changed in the preceding clock cycle by instructions such as MOV, ADD, or SUB. The PPlain and PMMX have special circuitry to predict the value of ESP after a stack operation so that you do not get an AGI delay after changing ESP with PUSH, POP, or CALL. You can get an AGI stall after RET only if it has an immediate operand to add to ESP.

Examples:

ADD ESP,4 / POP ESI ; AGI stall POP EAX / POP ESI ; no stall, pair MOV ESP,EBP / RET ; AGI stall CALL L1 / L1: MOV EAX,[ESP+8] ; no stall RET / POP EAX ; no stall RET 8 / POP EAX ; AGI stall

The LEA instruction is also subject to an AGI stall if it uses a base or index register which has been changed in the preceding clock cycle. Example:

INC ESI / LEA EAX,[EBX+4*ESI] ; AGI stall

PPro, PII and PIII have no AGI stalls for memory reads and LEA, but they do have AGI stalls for memory writes. This is not very significant unless the subsequent code has to wait for the write to finish.

前一篇：8. First time versus repeated execution
下一篇：26.16 FLDCW (PPro, PII and PIII)

标签: MMX 优化