10.1 Pairing integer instructions (PPlain and PMMX): Perfect pairing


Date: 2000-04-01 14:00

10. Pairing integer instructions (PPlain and PMMX)

10.1 Perfect pairing

The PPlain and PMMX have two pipelines for executing instructions, called the U-pipe and the V-pipe. Under certain conditions it is possible to execute two instructions simultaneously, one in the U-pipe and one in the V-pipe. This can almost double the speed. It is therefore advantageous to reorder your instructions to make them pair.
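As a minimal sketch of what such reordering looks like, the fragment below interleaves two independent calculations. The registers and memory operands are chosen only for illustration, and the comments count pairing opportunities only, ignoring cache misses and other delays. The rules that decide whether two instructions pair are given in the rest of this section.

```asm
; Original order: each ADD needs the result of the MOV just before it
MOV EAX, [ESI]      ; cannot pair with the next instruction (it reads EAX)
ADD EAX, EBX
MOV ECX, [EDI]      ; cannot pair with the next instruction (it reads ECX)
ADD ECX, EDX

; Reordered: the two independent calculations are interleaved
MOV EAX, [ESI]      ; U-pipe
MOV ECX, [EDI]      ; V-pipe, pairs with the MOV above
ADD EAX, EBX        ; U-pipe
ADD ECX, EDX        ; V-pipe, pairs with the ADD above
```

Both versions compute the same values in EAX and ECX; only the order of independent instructions has changed.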

The following instructions are pairable in either pipe:

  • MOV register, memory, or immediate into register or memory
  • PUSH register or immediate, POP register
  • LEA, NOP
  • INC, DEC, ADD, SUB, CMP, AND, OR, XOR, and some forms of TEST (see chapter 26.14)

The following instructions are pairable in the U-pipe only:

  • ADC, SBB
  • SHR, SAR, SHL, SAL with immediate count
  • ROR, ROL, RCR, RCL with an immediate count of 1

The following instructions can execute in either pipe but are only pairable when in the V-pipe:

  • near call
  • short and near jump
  • short and near conditional jump

All other integer instructions can execute in the U-pipe only, and are not pairable.
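The practical consequence of these lists is that instruction order matters even when there is no dependence between the instructions. A small sketch, with registers chosen only for illustration:

```asm
; Order A: the shift comes second
MOV EBX, ECX        ; goes to the U-pipe
SHL EAX, 2          ; pairable in the U-pipe only, so it cannot join the MOV in the V-pipe

; Order B: the shift comes first
SHL EAX, 2          ; U-pipe
MOV EBX, ECX        ; pairable in either pipe, so it can go in the V-pipe and pair
```

In order A the shift is not lost as a pairing candidate: it simply starts a new pair as the U-pipe instruction, provided the instruction after it is pairable in the V-pipe.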

Two consecutive instructions will pair when the following conditions are met:

1. The first instruction is pairable in the U-pipe and the second instruction is pairable in the V-pipe.

2. The second instruction does not read or write a register which the first instruction writes to.

Examples:

MOV EAX, EBX / MOV ECX, EAX ; read after write, do not pair
MOV EAX, 1 / MOV EAX, 2     ; write after write, do not pair
MOV EBX, EAX / MOV EAX, 2   ; write after read, pair OK
MOV EBX, EAX / MOV ECX, EAX ; read after read, pair OK
MOV EBX, EAX / INC EAX      ; read and write after read, pair OK
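One way to apply rule 2 in practice is to remove a read-after-write dependence by reading from the original source instead of from the freshly written register. The sketch below assumes that EBX is not needed in a different state afterwards; both versions leave the same values in EAX and ECX.

```asm
; Read after write: the second MOV needs the EAX written by the first
MOV EAX, EBX
MOV ECX, EAX        ; does not pair with the MOV above

; Same result, but the second MOV reads EBX instead of EAX
MOV EAX, EBX        ; U-pipe
MOV ECX, EBX        ; V-pipe: read after read, pairs
```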

3. In rule 2 partial registers are treated as full registers. Example:

MOV AL, BL / MOV AH, 0 ; writes to different parts of the same register, do not pair
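If two such byte-register sequences are needed, they can be interleaved so that neighbouring instructions touch different full registers. This is only a sketch; it assumes a second, independent calculation on ECX happens to be available.

```asm
; Original order: neighbours write to parts of the same full register
MOV AL, BL
MOV AH, 0           ; AL and AH are both parts of EAX: no pairing with MOV AL, BL
MOV CL, DL
MOV CH, 0           ; CL and CH are both parts of ECX: no pairing with MOV CL, DL

; Interleaved: neighbours write to different full registers
MOV AL, BL          ; U-pipe, writes part of EAX
MOV CL, DL          ; V-pipe, writes part of ECX: pairs
MOV AH, 0           ; U-pipe, writes part of EAX
MOV CH, 0           ; V-pipe, writes part of ECX: pairs
```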

4. Two instructions which both write to parts of the flags register can pair despite rules 2 and 3. Example:

SHR EAX, 4 / INC EBX ; pair OK

5. An instruction which writes to the flags can pair with a conditional jump despite rule 2. Example:

CMP EAX, 2 / JA LabelBigger ; pair OK
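Rule 5 is what allows a loop counter update and the loop branch to share a clock cycle. The following loop is a hypothetical sketch: the label FillLoop and the register roles are assumptions made for illustration (EDI points to the destination, EAX holds the value to store, ECX holds a count greater than zero).

```asm
FillLoop:
        MOV [EDI], EAX      ; U-pipe: store one doubleword
        ADD EDI, 4          ; V-pipe: writes EDI after the MOV has read it, pairs under rule 2
        DEC ECX             ; U-pipe: writes the flags
        JNZ FillLoop        ; V-pipe: conditional jump pairs with the flag-writing DEC (rule 5)
```

Under the pairing rules alone this loop body forms two pairs. Cache misses, misaligned data, and branch misprediction are not considered here.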

6. The following instruction combinations can pair despite the fact that they both modify the stack pointer:

PUSH + PUSH, PUSH + CALL, POP + POP
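A typical place where rule 6 helps is saving and restoring registers around a procedure call. In this sketch the procedure name MyFunc and the choice of registers are hypothetical.

```asm
        PUSH EBX            ; U-pipe
        PUSH ESI            ; V-pipe: PUSH + PUSH pairs although both modify ESP
        PUSH EAX            ; U-pipe: parameter for the call
        CALL MyFunc         ; V-pipe: PUSH + CALL pairs
        ; ... code that removes the parameter and uses the result ...
        POP ESI             ; U-pipe
        POP EBX             ; V-pipe: POP + POP pairs
```

Note that PUSH + POP is not among the combinations listed in rule 6.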

7. There are restrictions on the pairing of instructions with prefixes. There are several types of prefixes:

  • instructions addressing a non-default segment have a segment prefix.
  • instructions using 16 bit data in 32 bit mode, or 32 bit data in 16 bit mode have an operand size prefix.
  • instructions using 32 bit base or index registers in 16 bit mode have an address size prefix.
  • repeated string instructions have a repeat prefix.
  • locked instructions have a LOCK prefix.
  • many instructions which were not implemented on the 8086 processor have a two byte opcode where the first byte is 0FH. The 0FH byte behaves as a prefix on the PPlain, but not on the other versions. The most common instructions with 0FH prefix are: MOVZX, MOVSX, PUSH FS, POP FS, PUSH GS, POP GS, LFS, LGS, LSS, SETcc, BT, BTC, BTR, BTS, BSF, BSR, SHLD, SHRD, and IMUL with two operands and no immediate operand.

On the PPlain, a prefixed instruction can only execute in the U-pipe, except for conditional near jumps.

On the PMMX, instructions with operand size, address size, or 0FH prefix can execute in either pipe, whereas instructions with segment, repeat, or lock prefix can only execute in the U-pipe.
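For example, in 32 bit mode an instruction that uses a 16 bit register has an operand size prefix. The sketch below shows how its position decides whether it can pair on the PPlain; the registers are chosen only for illustration, and only the pairing rule is considered, not any other cost a prefix may have.

```asm
; 32 bit mode assumed

; Order A: the prefixed instruction comes second
ADD EBX, 4          ; U-pipe
MOV AX, [ESI]       ; operand size prefix: U-pipe only on the PPlain, so no pairing
                    ; (on the PMMX it can execute in either pipe and can pair)

; Order B: the prefixed instruction comes first
MOV AX, [ESI]       ; U-pipe
ADD EBX, 4          ; V-pipe: pairs on both the PPlain and the PMMX
```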

8. An instruction which has both a displacement and immediate data is not pairable on the PPlain and only pairable in the U-pipe on the PMMX:

MOV DWORD PTR DS:[1000], 0 ; not pairable or only in U-pipe
CMP BYTE PTR [EBX+8], 1    ; not pairable or only in U-pipe
CMP BYTE PTR [EBX], 1      ; pairable
CMP BYTE PTR [EBX+8], AL   ; pairable

(Another problem with instructions which have both a displacement and immediate data on the PMMX is that such instructions may be longer than 7 bytes, which means that only one instruction can be decoded per clock cycle, as explained in chapter 12.)
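One possible way around rule 8 is to keep the constant in a register, so that no single instruction has both a displacement and immediate data. This is a sketch under the assumption that a register (here DL) is free to hold the constant.

```asm
; Form with both a displacement (+8) and immediate data (1)
CMP BYTE PTR [EBX+8], 1     ; not pairable on the PPlain, U-pipe only on the PMMX

; Alternative: split the constant off into a register
MOV DL, 1                   ; immediate data but no displacement: pairable
                            ; load it early, for example before a loop, because the
                            ; CMP below cannot pair with an instruction that writes DL
CMP BYTE PTR [EBX+8], DL    ; displacement but no immediate data: pairable
```

This costs one extra instruction and one register, so it is mainly worthwhile when the constant can be loaded once and reused.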

9. Both instructions must be preloaded and decoded. This is explained in chapter 8.

10. There are special pairing rules for MMX instructions on the PMMX:

  • MMX shift, pack or unpack instructions can execute in either pipe but cannot pair with other MMX shift, pack or unpack instructions.
  • MMX multiply instructions can execute in either pipe but cannot pair with other MMX multiply instructions. They take 3 clock cycles and the last 2 clock cycles can overlap with subsequent instructions in the same way as floating point instructions can (see chapter 24).
  • an MMX instruction which accesses memory or integer registers can execute only in the U-pipe and cannot pair with a non-MMX instruction.
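The first of these MMX rules can be worked around by interleaving shift instructions with other MMX instructions. The sketch below assumes that plain MMX arithmetic such as PADDW is pairable in either pipe, which this section does not state explicitly; the registers are chosen so that no instruction depends on its neighbour.

```asm
; Original order: two shifts next to each other
PSLLW MM0, 2        ; shift
PSRLW MM1, 2        ; shift: cannot pair with another MMX shift
PADDW MM2, MM3
PADDW MM4, MM5

; Interleaved: each shift has an add as its neighbour
PSLLW MM0, 2        ; U-pipe
PADDW MM2, MM3      ; V-pipe: shift + add can pair
PSRLW MM1, 2        ; U-pipe
PADDW MM4, MM5      ; V-pipe: shift + add can pair
```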
