11. Splitting complex instructions into simpler ones (PPlain and PMMX)
You may split up read/modify and read/modify/write instructions to improve pairing. Example:
ADD [mem1],EAX / ADD [mem2],EBX ; 5 clock cycles
This code may be split up into a sequence which takes only 3 clock cycles:
MOV ECX,[mem1] / MOV EDX,[mem2] / ADD ECX,EAX / ADD EDX,EBX
MOV [mem1],ECX / MOV [mem2],EDX
Likewise you may split up non-pairable instructions into pairable instructions:
PUSH [mem1]
PUSH [mem2] ; non-pairable
Split up into:
MOV EAX,[mem1]
MOV EBX,[mem2]
PUSH EAX
PUSH EBX ; everything pairs
Other examples of non-pairable instructions which may be split up into simpler pairable instructions:
CDQ split into MOV EDX,EAX / SAR EDX,31
NOT EAX change to XOR EAX,-1
NEG EAX split into XOR EAX,-1 / INC EAX
MOVZX EAX,BYTE PTR [mem] split into XOR EAX,EAX / MOV AL,BYTE PTR [mem]
JECXZ split into TEST ECX,ECX / JZ
LOOP split into DEC ECX / JNZ
XLAT change to MOV AL,[EBX+EAX] (only if the upper 24 bits of EAX are zero, since XLAT uses AL alone as index)
If splitting instructions doesn't improve speed, then you may keep the complex or non-pairable instructions in order to reduce code size.
Splitting instructions is not needed on the PPro, PII and PIII, except when the split instructions generate fewer uops.