9. Address generation interlock (PPlain and PMMX)


日期: 2000-04-01 14:00 | 联系我
关注我: Telegram, Twitter

9. Address generation interlock (PPlain and PMMX)

It takes one clock cycle to calculate the address needed by an instruction which accesses memor y. Normally, this calculation is done at a separate stage in the pipeline while the preceding instruction or instruction pair is executing. But if the address depends on the result of an instruction executing in the preceding clock cycle, then you have to wait an extra clock cycle for the address to be calculated. This is called an AGI stall. Example:

ADD EBX,4 / MOV EAX,[EBX] ; AGI stall

The stall in this example can be removed by putting some other instructions in between ADD EBX,4 and MOV EAX,[EBX] or by rewriting the code to: MOV EAX,[EBX+4] / ADD EBX,4

You can also get an AGI stall with instructions which use ESP implicitly for addressing, such as PUSH, POP, CALL, and RET, if ESP has been changed in the preceding clock cycle by instructions such as MOV, ADD, or SUB. The PPlain and PMMX have special circuitry to predict the value of ESP after a stack operation so that you do not get an AGI delay after changing ESP with PUSH, POP, or CALL. You can get an AGI stall after RET only if it has an immediate operand to add to ESP.

Examples:

ADD ESP,4 / POP ESI ; AGI stall POP EAX / POP ESI ; no stall, pair MOV ESP,EBP / RET ; AGI stall CALL L1 / L1: MOV EAX,[ESP+8] ; no stall RET / POP EAX ; no stall RET 8 / POP EAX ; AGI stall

The LEA instruction is also subject to an AGI stall if it uses a base or index register which has been changed in the preceding clock cycle. Example:

INC ESI / LEA EAX,[EBX+4*ESI] ; AGI stall

PPro, PII and PIII have no AGI stalls for memory reads and LEA, but they do have AGI stalls for memory writes. This is not very significant unless the subsequent code has to wait for the write to finish.

标签: MMX 优化

 文章评论
目前没有任何评论.

↓ 快抢占第1楼,发表你的评论和意见 ↓

当前页面是本站的 Google AMP 版本。
欲查看完整版本和发表评论请点击:完整版 »

 

程序员小辉 建站于 1997
Copyright © XiaoHui.com; 保留所有权利。