28. List of instruction timings for PPlain and PMMX
Explanations:
Operands:
r = register, m = memory, i = immediate data, sr = segment register, accum = accumulator (AL, AX, or EAX)
m32 = 32-bit memory operand, r8 = 8-bit register, etc.
Clock cycles:
The numbers are minimum values. Cache misses, misalignment, and exceptions may increase the clock counts considerably. In the formulas for repeated string instructions, n is the number of iterations.
Pairability:
u = pairable in u-pipe, v = pairable in v-pipe, uv = pairable in either pipe, np = not pairable.
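The pairability column can be read mechanically. The first instruction of a pair executes in the U-pipe and the second in the V-pipe. The first must therefore be marked u or uv, and the second v or uv. The sketch below assumes only this column matters: the helper name `can_pair` is hypothetical, and register dependencies, notes f) and h), and the other pairing rules are not modelled.

```python
# Pairability classes used in the table below.
CLASSES = {"u", "v", "uv", "np"}


def can_pair(first: str, second: str) -> bool:
    """Judge pairing from the pairability column alone (hypothetical helper).

    The first instruction of a pair executes in the U-pipe and the second
    in the V-pipe. Register dependencies, notes f) and h), and the other
    pairing rules are deliberately NOT modelled here.
    """
    if first not in CLASSES or second not in CLASSES:
        raise ValueError(f"unknown pairability class: {first!r}, {second!r}")
    return first in ("u", "uv") and second in ("v", "uv")


print(can_pair("uv", "uv"))  # MOV r,r then ADD r,r (no shared registers): True
print(can_pair("u", "v"))    # SHL r,i then JMP short: True
print(can_pair("v", "uv"))   # JMP short first: False
print(can_pair("np", "uv"))  # DIV first: False
```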
Instruction | Operands | Clock cycles | Pairability |
NOP | | 1 | uv |
MOV | r/m, r/m/i | 1 | uv |
MOV | r/m, sr | 1 | np |
MOV | sr , r/m | >= 2 b) | np |
MOV | m , accum | 1 | uv h) |
XCHG | (E)AX, r | 2 | np |
XCHG | r , r | 3 | np |
XCHG | r , m | >15 | np |
XLAT | | 4 | np |
PUSH | r/i | 1 | uv |
POP | r | 1 | uv |
PUSH | m | 2 | np |
POP | m | 3 | np |
PUSH | sr | 1 b) | np |
POP | sr | >= 3 b) | np |
PUSHF | | 3-5 | np |
POPF | | 4-6 | np |
PUSHA POPA | | 5-9 i) | np |
PUSHAD POPAD | | 5 | np |
LAHF SAHF | | 2 | np |
MOVSX MOVZX | r , r/m | 3 a) | np |
LEA | r , m | 1 | uv |
LDS LES LFS LGS LSS | m | 4 c) | np |
ADD SUB AND OR XOR | r , r/i | 1 | uv |
ADD SUB AND OR XOR | r , m | 2 | uv |
ADD SUB AND OR XOR | m , r/i | 3 | uv |
ADC SBB | r , r/i | 1 | u |
ADC SBB | r , m | 2 | u |
ADC SBB | m , r/i | 3 | u |
CMP | r , r/i | 1 | uv |
CMP | m , r/i | 2 | uv |
TEST | r , r | 1 | uv |
TEST | m , r | 2 | uv |
TEST | r , i | 1 | f) |
TEST | m , i | 2 | np |
INC DEC | r | 1 | uv |
INC DEC | m | 3 | uv |
NEG NOT | r/m | 1/3 | np |
MUL IMUL | r8/r16/m8/m16 | 11 | np |
MUL IMUL | all other versions | 9 d) | np |
DIV | r8/m8 | 17 | np |
DIV | r16/m16 | 25 | np |
DIV | r32/m32 | 41 | np |
IDIV | r8/m8 | 22 | np |
IDIV | r16/m16 | 30 | np |
IDIV | r32/m32 | 46 | np |
CBW CWDE | | 3 | np |
CWD CDQ | | 2 | np |
SHR SHL SAR SAL | r , i | 1 | u |
SHR SHL SAR SAL | m , i | 3 | u |
SHR SHL SAR SAL | r/m, CL | 4/5 | np |
ROR ROL RCR RCL | r/m, 1 | 1/3 | u |
ROR ROL | r/m, i (≠1) | 1/3 | np |
ROR ROL | r/m, CL | 4/5 | np |
RCR RCL | r/m, i (≠1) | 8/10 | np |
RCR RCL | r/m, CL | 7/9 | np |
SHLD SHRD | r, i/CL | 4 a) | np |
SHLD SHRD | m, i/CL | 5 a) | np |
BT | r, r/i | 4 a) | np |
BT | m, i | 4 a) | np |
BT | m, r | 9 a) | np |
BTR BTS BTC | r, r/i | 7 a) | np |
BTR BTS BTC | m, i | 8 a) | np |
BTR BTS BTC | m, r | 14 a) | np |
BSF BSR | r , r/m | 7-73 a) | np |
SETcc | r/m | 1/2 a) | np |
JMP CALL | short/near | 1 e) | v |
JMP CALL | far | >= 3 e) | np |
conditional jump | short/near | 1/4/5/6 e) | v |
CALL JMP | r/m | 2/5 e) | np |
RETN | | 2/5 e) | np |
RETN | i | 3/6 e) | np |
RETF | | 4/7 e) | np |
RETF | i | 5/8 e) | np |
J(E)CXZ | short | 4-11 e) | np |
LOOP | short | 5-10 e) | np |
BOUND | r , m | 8 | np |
CLC STC CMC CLD STD | | 2 | np |
CLI STI | | 6-9 | np |
LODS | | 2 | np |
REP LODS | | 7+3*n g) | np |
STOS | | 3 | np |
REP STOS | | 10+n g) | np |
MOVS | | 4 | np |
REP MOVS | | 12+n g) | np |
SCAS | | 4 | np |
REP(N)E SCAS | | 9+4*n g) | np |
CMPS | | 5 | np |
REP(N)E CMPS | | 8+4*n g) | np |
BSWAP | | 1 a) | np |
CPUID | | 13-16 a) | np |
RDTSC | | 6-13 a) j) | np |
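The formulas for the repeated string instructions, together with note g) below, can be turned into a small estimator. This is a sketch under stated assumptions: n is taken to be the number of iterations actually executed, and the only adjustment applied is the one-clock repeat-prefix decode penalty of note g). Cache misses and misalignment are ignored, so the results are minimum values like the table itself. The names `REP_COST` and `rep_cycles` are hypothetical.

```python
# Minimum clock counts for repeated string instructions, copied from the
# table above; n is assumed to be the number of iterations executed.
REP_COST = {
    "REP LODS": lambda n: 7 + 3 * n,
    "REP STOS": lambda n: 10 + n,
    "REP MOVS": lambda n: 12 + n,
    "REP(N)E SCAS": lambda n: 9 + 4 * n,
    "REP(N)E CMPS": lambda n: 8 + 4 * n,
}


def rep_cycles(instr: str, n: int, after_multicycle: bool = False) -> int:
    """Estimate the minimum clock count of a REP string instruction.

    Adds the one-clock repeat-prefix decode penalty from note g) unless the
    instruction directly follows a multicycle instruction such as CLD.
    Cache misses and misalignment are ignored.
    """
    if instr not in REP_COST:
        raise ValueError(f"no formula for {instr!r}")
    if n < 0:
        raise ValueError("n must be non-negative")
    cycles = REP_COST[instr](n)
    if not after_multicycle:
        cycles += 1  # decoding the REP prefix, note g)
    return cycles


print(rep_cycles("REP MOVS", 100))                         # 113
print(rep_cycles("REP MOVS", 100, after_multicycle=True))  # 112
```

For example, REP MOVS over 100 iterations costs 12 + 100 + 1 = 113 clocks. It costs 112 if it directly follows a multicycle instruction such as CLD, because the prefix decode then overlaps.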
Notes:
a) This instruction has a 0FH prefix, which takes one extra clock cycle to decode on the PPlain unless the instruction is preceded by a multicycle instruction (see chapter 12).
b) Versions with FS and GS have a 0FH prefix; see note a.
c) Versions with SS, FS, and GS have a 0FH prefix; see note a.
d) Versions with two operands and no immediate have a 0FH prefix; see note a.
e) See chapter 22.
f) Only pairable if the register is the accumulator; see chapter 26.14.
g) Add one clock cycle for decoding the repeat prefix unless preceded by a multicycle instruction (such as CLD; see chapter 12).
h) Pairs as if it were writing to the accumulator; see chapter 26.14.
i) 9 if SP is divisible by 4; see chapter 10.2.
j) On PPlain: 6 in privileged or real mode, 11 in nonprivileged mode, error in virtual mode. On PMMX: 8 and 13 clocks, respectively.
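Note a) can likewise be applied mechanically to a table value. The sketch below assumes that the PMMX decodes the 0FH prefix without the extra clock, as note a) implies by naming only the PPlain. The helper `adjusted_cycles` is hypothetical, and the other decoding rules of chapter 12 are not modelled.

```python
def adjusted_cycles(base: int, cpu: str, has_0f_prefix: bool,
                    after_multicycle: bool = False) -> int:
    """Apply the 0FH-prefix decode penalty of note a) to a table value.

    Hypothetical helper. Assumes the extra clock occurs only on the PPlain
    and is hidden when the preceding instruction is multicycle; other
    decoding rules (chapter 12) are not modelled.
    """
    if cpu not in ("PPlain", "PMMX"):
        raise ValueError(f"unknown processor: {cpu!r}")
    penalty = cpu == "PPlain" and has_0f_prefix and not after_multicycle
    return base + (1 if penalty else 0)


# MOVZX r,r/m is listed as "3 a)":
print(adjusted_cycles(3, "PPlain", True))                         # 4
print(adjusted_cycles(3, "PPlain", True, after_multicycle=True))  # 3
print(adjusted_cycles(3, "PMMX", True))                           # 3
```

Under these assumptions, MOVZX costs 4 clocks on the PPlain in isolation. It costs 3 after a multicycle instruction and 3 on the PMMX.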