26.6 WAIT instruction (all processors)
You can often increase speed by omitting the WAIT instruction. The WAIT instruction has three functions:
a. The old 8087 processor requires a WAIT before every floating point instruction to make sure the coprocessor is ready to receive it.
b.WAIT is used for coordinating memory access between the floating point unit and the integer unit. Examples:
b.1. FISTP [mem32] WAIT ; wait for FPU to write before.. MOV EAX,[mem32] ; reading the result with the integer unit b.2. FILD [mem32] WAIT ; wait for FPU to read value.. MOV [mem32],EAX ; before overwriting it with integer unit b.3. FLD QWORD PTR [ESP] WAIT ; prevent an accidental interrupt from.. ADD ESP,8 ; overwriting value on stack
c.WAIT is sometimes used to check for exceptions. It will generate an interrupt if an unmasked exception bit in the floating point status word has been set by a preceding floating point instruction.
Regarding a:
The function in point a is never needed on any other processors than the old 8087. Unless you want your code to be compatible with the 8087 you should tell your assembler not to put in these WAIT's by specifying a higher processor. A 8087 floating point emulator also inserts WAIT instructions. You should therefore tell your assembler not to generate emulation code unless you need it.
Regarding b:
WAIT instructions to coordinate memory access are definitely needed on the 8087 and 80287 but not on the Pentiums. It is not quite clear whether it is needed on the 80387 and 80486. I have made several tests on these Intel processors and not been able to provoke any error by omitting the WAIT on any 32 bit Intel processor, although Intel manuals say that the WAIT is needed for this purpose except after FNSTSW and FNSTCW. Omitting WAIT instructions for coordinating memory access is not 100 % safe, even when writing 32 bit code, because the code may be able to run on the very rare combination of a 80386 main processor with a 287 coprocessor, which requires the WAIT. Also, I have no information on non-Intel processors, and I have not tested all possible hardware and software combinations, so there may be other situations where the WAIT is needed.
If you want to be certain that your code will work on any 32 bit processor (including non-Intel processors) then I would recommend that you include the WAIT here in order to be safe.
Regarding c:
The assembler automatically inserts a WAIT for this purpose before the following instructions: FCLEX, FINIT, FSAVE, FSTCW, FSTENV, FSTSW. You can omit the WAIT by writing FNCLEX, etc. My tests show that the WAIT is unneccessary in most cases because these instructions without WAIT will still generate an interrupt on exceptions except for FNCLEX and FNINIT on the 80387. (There is some inconsistency about whether the IRET from the interrupt points to the FN.. instruction or to the next instruction).
Almost all other floating point instructions will also generate an interrupt if a previous floating point instruction has set an unmasked exception bit, so the exception is likely to be detected sooner or later anyway. You may insert a WAIT after the last floating point instruction in your program to be sure to catch all exceptions.
You may still need the WAIT if you want to know exactly where an exception occurred in order to be able to recover from the situation. Consider, for example, the code under b.3 above: If you want to be able to recover from an exception generated by the FLD here, then you need the WAIT because an interrupt after ADD ESP,8 would overwrite the value to load. FNOP may be faster than WAIT and serve the same purpose.