30. Testing speed


Date: 2000-04-03 14:00

The Pentium family of processors has an internal 64-bit clock counter which can be read into EDX:EAX using the instruction RDTSC (read time stamp counter). This is very useful for measuring exactly how many clock cycles a piece of code takes.
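As a small illustration of how the two register halves fit together, the following C sketch combines EDX and EAX into one 64-bit value. It also shows that subtracting only the low 32 bits, which is what a test that keeps only EAX does, still gives the right difference when the low word wraps around, provided the measured interval is shorter than 2^32 clocks. The function names are made up for this example.

```c
#include <stdint.h>

/* Combine the two halves returned by RDTSC: EDX holds the high
   32 bits and EAX the low 32 bits. (Hypothetical helper name.) */
uint64_t combine_tsc(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Elapsed clocks computed from the low halves only. The result is
   reduced modulo 2^32, so a wrap of the low word between the two
   reads does not matter for intervals shorter than 2^32 clocks. */
uint32_t elapsed_low(uint32_t eax_start, uint32_t eax_end)
{
    return (uint32_t)(eax_end - eax_start);
}
```

For example, a start value of 0xFFFFFFF0 and an end value of 0x00000010 give an elapsed count of 0x20.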

The program below measures the number of clock cycles a piece of code takes. It executes the code under test 10 times and stores the 10 clock counts. The program can be used in both 16-bit and 32-bit mode on the PPlain and PMMX:

;************   Test program for PPlain and PMMX:    ********************

ITER        EQU     10              ; number of iterations
OVERHEAD    EQU     15              ; 15 for PPlain, 17 for PMMX

RDTSC       MACRO                   ; define RDTSC instruction
            DB      0FH,31H
ENDM

;************   Data segment:    ********************
.DATA                               ; data segment
ALIGN       4
COUNTER     DD      0               ; loop counter
TICS        DD      0               ; temporary storage of clock
RESULTLIST  DD      ITER DUP (0)    ; list of test results

;************   Code segment:    ********************
.CODE                               ; code segment
BEGIN:      MOV     [COUNTER],0     ; reset loop counter
TESTLOOP:                           ; test loop
;************   Do any initializations here:    ********************
            FINIT
;************   End of initializations    ********************
            RDTSC                   ; read clock counter
            MOV     [TICS],EAX      ; save count
            CLD                     ; non-pairable filler
REPT        8
            NOP                     ; eight NOP's to avoid shadowing effect
ENDM

;************   Put instructions to test here:    ********************
            FLDPI                   ; this is only an example
            FSQRT
            RCR     EBX,10
            FSTP    ST
;*****************   End of instructions to test   ********************

            CLC                     ; non-pairable filler with shadow
            RDTSC                   ; read counter again
            SUB     EAX,[TICS]      ; compute difference
            SUB     EAX,OVERHEAD    ; subtract clocks used by fillers etc.
            MOV     EDX,[COUNTER]   ; loop counter
            MOV     [RESULTLIST][EDX],EAX   ; store result in table
            ADD     EDX,TYPE RESULTLIST     ; increment counter
            MOV     [COUNTER],EDX   ; store counter
            CMP     EDX,ITER * (TYPE RESULTLIST)
            JB      TESTLOOP        ; repeat ITER times

; insert here code to read out the values in RESULTLIST

The 'filler' instructions before and after the piece of code to test are included in order to get consistent results on the PPlain. The CLD is a non-pairable instruction which has been inserted to make sure the pairing is the same the first time as the subsequent times. The eight NOP instructions are inserted to prevent any prefixes in the code to test from being decoded in the shadow of the preceding instructions on the PPlain. Single-byte instructions are used here to obtain the same pairing the first time as the subsequent times. The CLC after the code to test is a non-pairable instruction which has a shadow under which the 0FH prefix of the RDTSC can be decoded, so that the second RDTSC is independent of any shadowing effect from the code to test on the PPlain.
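Leaving the pairing and shadowing details aside, the overall structure of the test program (read the counter, run the code, read the counter again, subtract the overhead, store the result) can be sketched in C. This is only an illustration of the control flow, not a replacement for the assembly listing: a C function call has its own cost, so the OVERHEAD value of 15 applies to the assembly program only. The hook types, `run_test`, `fake_counter` and `empty_body` are hypothetical names introduced for this example. The fake counter advances by 40 on every read so that the loop can be tried without RDTSC; on a real system the hook could wrap a compiler intrinsic such as `__rdtsc`, where one is available.

```c
#include <stdint.h>

#define ITER     10   /* number of iterations, as in the listing     */
#define OVERHEAD 15   /* 15 for PPlain, 17 for PMMX (assembly only)  */

/* Hypothetical hooks: read_counter stands in for RDTSC (low 32 bits),
   code_under_test for the instructions being measured. */
typedef uint32_t (*counter_fn)(void);
typedef void (*test_fn)(void);

/* Run the code ITER times and store each overhead-corrected count.
   Like the SUB instructions in the listing, the subtraction wraps
   if the raw difference is smaller than OVERHEAD. */
void run_test(counter_fn read_counter, test_fn code_under_test,
              uint32_t result_list[ITER])
{
    for (int i = 0; i < ITER; i++) {
        uint32_t tics = read_counter();          /* first RDTSC      */
        code_under_test();
        uint32_t diff = read_counter() - tics;   /* second RDTSC     */
        result_list[i] = diff - OVERHEAD;        /* SUB EAX,OVERHEAD */
    }
}

/* Fake counter for trying the loop: every read advances 40 "clocks". */
static uint32_t fake_now = 0;
uint32_t fake_counter(void) { fake_now += 40; return fake_now; }

/* Placeholder for the code to test. */
void empty_body(void) { }
```

With the fake counter, every raw difference is 40, so each stored entry is 40 - 15 = 25.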

On the PMMX you may want to insert XOR EAX,EAX / CPUID before the instructions to test if you want the FIFO instruction buffer to be empty, or some time-consuming instruction (for example CLI or AAD) if you want the FIFO buffer to be full. (CPUID has no shadow under which prefixes of subsequent instructions can decode.)

On the PPro, PII and PIII you have to insert XOR EAX,EAX / CPUID before and after each RDTSC to prevent it from executing in parallel with anything else, and remove the filler instructions. (CPUID is a serializing instruction, which means that it flushes the pipeline and waits for all pending operations to finish before proceeding. This is useful for testing purposes.)
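Replacing the fillers with CPUID changes the fixed cost of the measurement, so the OVERHEAD values given for the PPlain and PMMX no longer apply, and the text gives no value for these processors. One plausible way to estimate it, which is an assumption of this note and not taken from the text, is to run the test with an empty body and OVERHEAD set to 0, and then take the smallest of the stored counts. The helper below (hypothetical name) shows only that last step, in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the smallest of n raw clock counts (n must be at least 1).
   Used here as an estimate of the fixed measurement overhead. */
uint32_t smallest_count(const uint32_t *counts, size_t n)
{
    uint32_t best = counts[0];
    for (size_t i = 1; i < n; i++) {
        if (counts[i] < best)
            best = counts[i];
    }
    return best;
}
```

The value found this way would then be used as OVERHEAD when the real code is measured under the same conditions.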

The RDTSC instruction cannot execute in virtual mode on the PPlain and PMMX, so if you are running DOS programs you must run in real mode. (Press F8 while booting and select "safe mode command prompt only" or "bypass startup files".)

The complete test program is available from www.agner.org/assem/.

The Pentium processors have special performance monitor counters which can count events such as cache misses, misalignments, various stalls, etc. Details about how to use the performance monitor counters are not covered by this manual but can be found in "Intel Architecture Software Developer's Manual", vol. 3, Appendix A.
