A partial memory stall is somewhat analogous to a partial register stall. It occurs when you mix data sizes for the same memory address:
MOV BYTE PTR [ESI], AL MOV EBX, DWORD PTR [ESI] ; partial memory stall
Here you get a stall because the processor has to combine the byte written from AL with the next three bytes, which were in memory before, to get the four bytes needed for reading into EBX. The penalty is approximately 7-8 clocks.
Unlike the partial register stalls, you also get a partial memory stall when you write a bigger operand to memory and then read part of it, if the smaller part doesn't start at the same address:
MOV DWORD PTR [ESI], EAX MOV BL, BYTE PTR [ESI] ; no stall MOV BH, BYTE PTR [ESI+1] ; stall
You can avoid this stall by changing the last line to MOV BH,AH, but such a solution is not possible in a situation like this:
FISTP QWORD PTR [EDI] MOV EAX, DWORD PTR [EDI] MOV EDX, DWORD PTR [EDI+4] ; stall
Interestingly, you can also get a partial memory stall when writing and reading completely different addresses if they happen to have the same set-value in different cache banks:
MOV BYTE PTR [ESI], AL MOV EBX, DWORD PTR [ESI+4092] ; no stall MOV ECX, DWORD PTR [ESI+4096] ; stall