29. List of instruction timings and micro-op breakdown for PPro, PII and PIII
Explanations:
Operands:
r = register, m = memory, i = immediate data, sr = segment register, m32 = 32 bit memory operand, etc.
Micro-ops:
The number of micro-ops that the instruction generates for each execution port.
p0: port 0: ALU, etc.
p1: port 1: ALU, jumps
p01: instructions that can go to either port 0 or 1, whichever is vacant first.
p2: port 2: load data, etc.
p3: port 3: address generation for store
p4: port 4: store data
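As an illustration of how the per-port micro-op counts can be used, the following Python sketch estimates a lower bound on the clock cycles per loop iteration from port pressure alone. It is a simplified model, not a description of the actual scheduler: it assumes that each port accepts at most one micro-op per clock, that p01 micro-ops are shared ideally between ports 0 and 1, and it ignores decoding, retirement and dependency chains. The name `port_bound` and the dictionary layout are hypothetical. The counts in the usage example are one reading of the table rows for MOV r,m (one on p2), ADD r,r (one on p01) and MOV m,r (one each on p3 and p4).

```python
from fractions import Fraction

PORTS = ("p0", "p1", "p2", "p3", "p4")

def port_bound(instructions):
    """Lower bound on clocks per iteration from port pressure alone.

    `instructions` is a list of dicts mapping a port name
    ("p0", "p1", "p01", "p2", "p3", "p4") to a micro-op count.
    Assumption: every port accepts one micro-op per clock.
    """
    total = {port: 0 for port in PORTS}
    flexible = 0  # micro-ops that may go to either port 0 or port 1
    for uops in instructions:
        for port, count in uops.items():
            if port == "p01":
                flexible += count
            else:
                total[port] += count
    # Best case: the p01 micro-ops fill whichever of port 0/1 is less loaded,
    # so the pair is limited by the heavier fixed load or by the average load.
    p0, p1 = total["p0"], total["p1"]
    shared = max(p0, p1, Fraction(p0 + p1 + flexible, 2))
    return max(shared, total["p2"], total["p3"], total["p4"])

# Hypothetical loop body: MOV r,m / ADD r,r / MOV m,r
loop_body = [{"p2": 1}, {"p01": 1}, {"p3": 1, "p4": 1}]
bound = port_bound(loop_body)
```

In this example the load port and the two store ports each carry one micro-op per iteration, so under the stated assumptions the bound is one clock per iteration. Three register-to-register ADDs alone would give a bound of 3/2, because only ports 0 and 1 can execute them.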
Delay:
This is the delay, in clock cycles, that the instruction generates in a dependency chain. It is not the same as the time spent in the execution unit. Values may be inaccurate where they cannot be measured exactly, especially with memory operands. The numbers are minimum values: cache misses, misalignment, and exceptions may increase the clock counts considerably. Floating point operands are presumed to be normal numbers. Denormal numbers, NaNs and infinity increase the delays by 50-150 clocks, except in XMM move, shuffle and boolean instructions. Floating point overflow, underflow, denormal or NaN results give a similar delay.
Throughput:
The maximum throughput for several instructions of the same kind. For example, a throughput of 1/2 for FMUL means that a new FMUL instruction can start executing every 2 clock cycles.
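The difference between delay and throughput can be made concrete with a small Python sketch. It is a rough model under stated assumptions, not a measurement: it considers only one kind of instruction, assumes nothing else competes for the execution unit, and ignores decoding and retirement limits. The function names `chain_clocks` and `independent_clocks` are hypothetical. The figures in the usage example are taken from the integer table: MUL/IMUL r with delay 4 and throughput 1/1, and DIV/IDIV r32 with delay 39 and throughput 1/37.

```python
from fractions import Fraction

def chain_clocks(n, delay):
    """Clocks for n instructions where each one needs the previous result."""
    return n * delay

def independent_clocks(n, delay, throughput):
    """Clocks for n independent instructions of the same kind.

    A new instruction can start every 1/throughput clocks, and the last
    one still needs `delay` clocks before its result is available.
    """
    if n == 0:
        return 0
    return (n - 1) / Fraction(throughput) + delay

# Ten multiplications (delay 4, throughput 1/1)
imul_dependent = chain_clocks(10, 4)
imul_independent = independent_clocks(10, 4, Fraction(1, 1))

# Ten 32-bit divisions (delay 39, throughput 1/37)
div_dependent = chain_clocks(10, 39)
div_independent = independent_clocks(10, 39, Fraction(1, 37))
```

Under this model, breaking a dependency chain helps a great deal for multiplication, where the throughput is much better than the delay suggests, and very little for division, where the two figures are nearly equal.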
29.1 Integer instructions
Instruction | Operands | micro-ops p0 | micro-ops p1 | micro-ops p01 | micro-ops p2 | micro-ops p3 | micro-ops p4 | delay | throughput
NOP | 1 | ||||||||
MOV | r,r/i | 1 | |||||||
MOV | r,m | 1 | |||||||
MOV | m,r/i | 1 | 1 | ||||||
MOV | r,sr | 1 | |||||||
MOV | m,sr | 1 | 1 | 1 | |||||
MOV | sr,r | 8 | 5 | ||||||
MOV | sr,m | 7 | 1 | 8 | |||||
MOVSX MOVZX | r,r | 1 | |||||||
MOVSX MOVZX | r,m | 1 | |||||||
CMOVcc | r,r | 1 | 1 | ||||||
CMOVcc | r,m | 1 | 1 | 1 | |||||
XCHG | r,r | 3 | |||||||
XCHG | r,m | 4 | 1 | 1 | 1 | high b) | |||
XLAT | 1 | 1 | |||||||
PUSH | r/i | 1 | 1 | 1 | |||||
POP | r | 1 | 1 | ||||||
POP | (E)SP | 2 | 1 | ||||||
PUSH | m | 1 | 1 | 1 | 1 | ||||
POP | m | 5 | 1 | 1 | 1 | ||||
PUSH | sr | 2 | 1 | 1 | |||||
POP | sr | 8 | 1 | ||||||
PUSHF(D) | 3 | 11 | 1 | 1 | |||||
POPF(D) | 10 | 6 | 1 | ||||||
PUSHA(D) | 2 | 8 | 8 | ||||||
POPA(D) | 2 | 8 | |||||||
LAHF SAHF | 1 | ||||||||
LEA | r,m | 1 | 1 c) | ||||||
LDS LES LFS LGS LSS | m | 8 | 3 | ||||||
ADD SUB AND OR XOR | r,r/i | 1 | |||||||
ADD SUB AND OR XOR | r,m | 1 | 1 | ||||||
ADD SUB AND OR XOR | m,r/i | 1 | 1 | 1 | 1 | ||||
ADC SBB | r,r/i | 2 | |||||||
ADC SBB | r,m | 2 | 1 | ||||||
ADC SBB | m,r/i | 3 | 1 | 1 | 1 | ||||
CMP TEST | r,r/i | 1 | |||||||
CMP TEST | m,r/i | 1 | 1 | ||||||
INC DEC NEG NOT | r | 1 | |||||||
INC DEC NEG NOT | m | 1 | 1 | 1 | 1 | ||||
AAS DAA DAS | 1 | ||||||||
AAD | 1 | 2 | 4 | ||||||
AAM | 1 | 1 | 2 | 15 | |||||
MUL IMUL | r,(r),(i) | 1 | 4 | 1/1 | |||||
MUL IMUL | (r),m | 1 | 1 | 4 | 1/1 | ||||
DIV IDIV | r8 | 2 | 1 | 19 | 1/12 | ||||
DIV IDIV | r16 | 3 | 1 | 23 | 1/21 | ||||
DIV IDIV | r32 | 3 | 1 | 39 | 1/37 | ||||
DIV IDIV | m8 | 2 | 1 | 1 | 19 | 1/12 | |||
DIV IDIV | m16 | 2 | 1 | 1 | 23 | 1/21 | |||
DIV IDIV | m32 | 2 | 1 | 1 | 39 | 1/37 | |||
CBW CWDE | 1 | ||||||||
CWD CDQ | 1 | ||||||||
SHR SHL SAR ROR ROL | r,i/CL | 1 | |||||||
SHR SHL SAR ROR ROL | m,i/CL | 1 | 1 | 1 | 1 | ||||
RCR RCL | r,1 | 1 | 1 | ||||||
RCR RCL | r8,i/CL | 4 | 4 | ||||||
RCR RCL | r16/32,i/CL | 3 | 3 | ||||||
RCR RCL | m,1 | 1 | 2 | 1 | 1 | 1 | |||
RCR RCL | m8,i/CL | 4 | 3 | 1 | 1 | 1 | |||
RCR RCL | m16/32,i/CL | 4 | 2 | 1 | 1 | 1 | |||
SHLD SHRD | r,r,i/CL | 2 | |||||||
SHLD SHRD | m,r,i/CL | 2 | 1 | 1 | 1 | 1 | |||
BT | r,r/i | 1 | |||||||
BT | m,r/i | 1 | 6 | 1 | |||||
BTR BTS BTC | r,r/i | 1 | |||||||
BTR BTS BTC | m,r/i | 1 | 6 | 1 | 1 | 1 | |||
BSF BSR | r,r | 1 | 1 | ||||||
BSF BSR | r,m | 1 | 1 | 1 | |||||
SETcc | r | 1 | |||||||
SETcc | m | 1 | 1 | 1 | |||||
JMP | short/near | 1 | 1/2 | ||||||
JMP | far | 21 | 1 | ||||||
JMP | r | 1 | 1/2 | ||||||
JMP | m(near) | 1 | 1 | 1/2 | |||||
JMP | m(far) | 21 | 2 | ||||||
conditional jump | short/near | 1 | 1/2 | ||||||
CALL | near | 1 | 1 | 1 | 1 | 1/2 | |||
CALL | far | 28 | 1 | 2 | 2 | ||||
CALL | r | 1 | 2 | 1 | 1 | 1/2 | |||
CALL | m(near) | 1 | 4 | 1 | 1 | 1 | 1/2 | ||
CALL | m(far) | 28 | 2 | 2 | 2 | ||||
RETN | 1 | 2 | 1 | 1/2 | |||||
RETN | i | 1 | 3 | 1 | 1/2 | ||||
RETF | 23 | 3 | |||||||
RETF | i | 23 | 3 | ||||||
J(E)CXZ | short | 1 | 1 | ||||||
LOOP | short | 2 | 1 | 8 | |||||
LOOP(N)E | short | 2 | 1 | 8 | |||||
ENTER | i,0 | 12 | 1 | 1 | |||||
ENTER | a,b | ca. 18+4b | b-1 | 2b | |||||
LEAVE | 2 | 1 | |||||||
BOUND | r,m | 7 | 6 | 2 | |||||
CLC STC CMC | 1 | ||||||||
CLD STD | 4 | ||||||||
CLI | 9 | ||||||||
STI | 17 | ||||||||
INTO | 5 | ||||||||
LODS | 2 | ||||||||
REP LODS | 10+6n | ||||||||
STOS | 1 | 1 | 1 | ||||||
REP STOS | ca. 5n a) | ||||||||
MOVS | 1 | 3 | 1 | 1 | |||||
REP MOVS | ca. 6n a) | ||||||||
SCAS | 1 | 2 | |||||||
REP(N)E SCAS | 12+7n | ||||||||
CMPS | 4 | 2 | |||||||
REP(N)E CMPS | 12+9n | ||||||||
BSWAP | 1 | 1 | |||||||
CPUID | 23-48 | ||||||||
RDTSC | 31 | ||||||||
IN | 18 | >300 | |||||||
OUT | 18 | >300 | |||||||
PREFETCHNTA d) | m | 1 | |||||||
PREFETCHT0 d) | m | 1 | |||||||
PREFETCHT1 d) | m | 1 | |||||||
PREFETCHT2 d) | m | 1 | |||||||
SFENCE d) | 1 | 1 | 1/6 |
Notes:
a) faster under certain conditions: see chapter 26.3.
b) see chapter 26.1
c) 3 if constant without base or index register
d) PIII only.
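Several string instruction entries in the table above are given as expressions in n rather than as fixed numbers. The short Python sketch below simply evaluates the three exact expressions (REP LODS 10+6n, REP(N)E SCAS 12+7n, REP(N)E CMPS 12+9n) for a given n. It assumes that n is the repeat count and that the result is in whatever unit the table column denotes, read here as micro-ops. The names `REP_FORMULAS` and `rep_entry` are hypothetical, and the approximate "ca." entries for REP STOS and REP MOVS are deliberately left out because note a) says they can be faster under certain conditions.

```python
# (constant term, per-repetition term) as listed in the table
REP_FORMULAS = {
    "REP LODS": (10, 6),
    "REP(N)E SCAS": (12, 7),
    "REP(N)E CMPS": (12, 9),
}

def rep_entry(instruction, n):
    """Evaluate the table expression base + per_rep * n for a repeat count n."""
    base, per_rep = REP_FORMULAS[instruction]
    return base + per_rep * n

# Example: 100 repetitions of each instruction
lods_100 = rep_entry("REP LODS", 100)
scas_100 = rep_entry("REP(N)E SCAS", 100)
cmps_100 = rep_entry("REP(N)E CMPS", 100)
```

For large n the constant term becomes negligible, so the per-repetition term is the figure to compare against an explicit loop.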