(Optimization) FOR ... NEXT and DO ... LOOP - what's more efficient?

Theo Gottwald · January 02, 2007, 09:34:45 AM

Let's take a look at what the compiler makes out of our code.
In Visual C you can see this in the debugger; in PB you need a disassembler such as DisASM.

First we construct a FOR loop; later we look at the DO ... LOOP.
In each case we compare a LONG version to a SINGLE version.

Local y as SINGLE
FOR y = 1 TO LoopCounter '
INCR z '
NEXT y '
'----------------------------------
' becomes:
'----------------------------------
4023C8	DB8550FFFFFF           FILD LONG PTR [EBP+FFFFFF50]
4023CE	D99D44FFFFFF           FSTP SINGLE PTR [EBP+FFFFFF44]
4023D4	D9E8                   FLD1
4023D6	D99D48FFFFFF           FSTP SINGLE PTR [EBP+FFFFFF48]
4023DC	D9E8                   FLD1
4023DE	D99D70FFFFFF           FSTP SINGLE PTR [EBP+FFFFFF70]
4023E4	E918000000             JMP L402401
4023E9	FF855CFFFFFF           INC DWORD PTR [EBP+FFFFFF5C]
4023EF	D98548FFFFFF           FLD SINGLE PTR [EBP+FFFFFF48]
4023F5	D885                   FADDST, ST(5)
4023F7	70FF                   JO  SHORT L4023F8
4023F9	FF                     ??? ' <--- 
4023FA	FFD99D70FF             CALL D9FF : FFFF709D

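DisASM does not decode this part cleanly and gets out of sync at the marked line (<---). Read as one stream, the bytes D8 85 70 FF FF FF are FADD SINGLE PTR [EBP+FFFFFF70], and the D9 9D 70 FF that follows is the start of an FSTP SINGLE PTR, presumably back to the same variable.
What we can see is that the compiler needs floating-point instructions to handle the SINGLE variable.
And FP instructions are usually slower than integer instructions.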

Let's take a look at the LONG version:

                              MOV EAX, DWORD PTR [EBP+FFFFFF50]
40241C	89853CFFFFFF           MOV DWORD PTR [EBP+FFFFFF3C], EAX
402422	C7C601000000           MOV ESI, DWORD 00000001
402428	E908000000             JMP L402435
40242D	FF855CFFFFFF           INC DWORD PTR [EBP+FFFFFF5C]
402433	FFC6                   INC ESI
402435	8BC6                   MOV EAX, ESI
402437	3B853CFFFFFF           CMP EAX, DWORD PTR [EBP+FFFFFF3C]
40243D	7EEE                   JLE SHORT L40242D

This one is nearly as good as hand-optimized assembler, isn't it?
OK, you might replace the

402435	8BC6                   MOV EAX, ESI
402437	3B853CFFFFFF           CMP EAX, DWORD PTR [EBP+FFFFFF3C]

with a direct compare against ESI:

402437	3BB53CFFFFFF           CMP ESI, DWORD PTR [EBP+FFFFFF3C]

But that's not a big difference in cycles.

Let me just add that in this case a DWORD version is as good as the LONG version.

Now let's take a look at the DO ... LOOP construct.

LOCAL y AS LONG
DO UNTIL z > loopcounter '
INCR y '
LOOP '            

'becomes 
402536	8B8550FFFFFF           MOV EAX, DWORD PTR [EBP+FFFFFF50]
40253C	3B855CFFFFFF           CMP EAX, DWORD PTR [EBP+FFFFFF5C]
402542	0F8C08000000           JL  L402550
402548	FF855CFFFFFF           INC DWORD PTR [EBP+FFFFFF5C]
40254E	EBE6                   JMP SHORT L402536
402550

True, this one looks really fast. No chance for quick further optimization.

But again, if we change the variable from LONG to SINGLE, things look like this:

402536	8B8550FFFFFF           MOV EAX, DWORD PTR [EBP+FFFFFF50]
40253C	3B855CFFFFFF           CMP EAX, DWORD PTR [EBP+FFFFFF5C]
402542	0F8C12000000           JL  L40255A
402548	D9E8                   FLD1
40254A	D98570FFFFFF           FLD SINGLE PTR [EBP+FFFFFF70]
402550	DEC1                   FADDP ST(1), ST
402552	D99D70FFFFFF           FSTP SINGLE PTR [EBP+FFFFFF70]
402558	EBDC                   JMP SHORT L402536

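Even if you don't know much about ASM, you can see that this version is slower: the single INC is replaced by four FPU instructions (FLD1, FLD, FADDP, FSTP) that load, add and store the SINGLE on every pass.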
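
Reading the listing only tells us that more instructions are executed; how much that costs depends on the CPU. A rough way to check is to time both loops yourself. The following is only a sketch under assumptions, not the code used for the listings above: the names n, reps, yLong and ySingle are made up for illustration, MSGBOX assumes PB/Win (in the console compiler you would use PRINT instead), and TIMER is a coarse clock that also wraps at midnight.

```powerbasic
#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG
    LOCAL n       AS LONG      ' hypothetical loop limit
    LOCAL reps    AS LONG      ' hypothetical number of repetitions
    LOCAL rep     AS LONG
    LOCAL yLong   AS LONG
    LOCAL ySingle AS SINGLE
    LOCAL t0      AS DOUBLE
    LOCAL tLong   AS DOUBLE
    LOCAL tSingle AS DOUBLE

    ' Keep n below 2^24 (16777216): above that a SINGLE can no longer
    ' be incremented by 1 and the second loop would never end.
    n    = 10000000
    reps = 20

    ' LONG counter
    t0 = TIMER
    FOR rep = 1 TO reps
        yLong = 0
        DO UNTIL yLong > n
            INCR yLong
        LOOP
    NEXT rep
    tLong = TIMER - t0

    ' SINGLE counter
    t0 = TIMER
    FOR rep = 1 TO reps
        ySingle = 0
        DO UNTIL ySingle > n
            INCR ySingle
        LOOP
    NEXT rep
    tSingle = TIMER - t0

    MSGBOX "LONG:   " + FORMAT$(tLong, "0.000") + " s" + $CRLF + _
           "SINGLE: " + FORMAT$(tSingle, "0.000") + " s"
END FUNCTION
```

The repetitions are there only to make the run long enough for TIMER to measure; raise reps rather than n if the times come out too small.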

Finally, let me note something:

A FOR loop contains more than a DO loop:
the initialization is included.

FOR z=1 TO 10

NEXT z

In a DO ... LOOP you would normally write the initialization separately:

LOCAL y AS LONG

y=1 ' This Line is needed
DO UNTIL z > loopcounter '
INCR y '
LOOP '   
'-------------------------
' becomes
'-------------------------
402536	C7C601000000           MOV ESI, DWORD 00000001
40253C	39B54CFFFFFF           CMP DWORD PTR [EBP+FFFFFF4C], ESI
402542	0F8C04000000           JL  L40254C
402548	FFC7                   INC EDI
40254A	EBF0                   JMP SHORT L40253C    

'--------------------------
' using #REGISTER NONE
'--------------------------
402534	C78550FFFFFF01000000   MOV DWORD PTR [EBP+FFFFFF50], DWORD 00000001
40253E	8B854CFFFFFF           MOV EAX, DWORD PTR [EBP+FFFFFF4C]
402544	3B8550FFFFFF           CMP EAX, DWORD PTR [EBP+FFFFFF50]
40254A	0F8C08000000           JL  L402558
402550	FF856CFFFFFF           INC DWORD PTR [EBP+FFFFFF6C]
402556	EBE6                   JMP SHORT L40253E

which is in both cases perfectly optimized.

Please note that ESI and EDI are the two CPU registers that are used for register variables.

Conclusion:
If you really need every CPU cycle, the DO LOOP is faster than the FOR loop.
The reason is that the FOR loop has more options (STEP, ...).
We also see that using SINGLE makes a much bigger difference than the choice between these two loop constructs.

For those cases where you use a FOR-NEXT loop
but don't really need the features of the FOR loop,
you could use this macro:

' Loop statement as a MACRO
'
MACRO GFOR(P1,P2,P3)
 P1=P2
 DO UNTIL (P1>P3)
END MACRO
'
MACRO GNEX(P1)
 INCR P1
 LOOP
END MACRO  
'
' Example:
' Instead of 
' FOR i=1 TO 10
' ...
' NEXT i
' you would write in your Program:
'
' GFOR(i,1,10)
' ...
' GNEX(i)
'
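This does just the same while saving you a few cycles, which may be interesting in very big loops. Note that, being a macro, GFOR re-evaluates the limit P3 on every pass, whereas the FOR listing above copies the limit to a temporary once before the loop.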
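
To see the two forms side by side, here is a small sketch of a complete program. The macros are repeated from above so that it stands alone; the variable names sumFor and sumGfor are made up for illustration, and MSGBOX again assumes PB/Win.

```powerbasic
#COMPILE EXE
#DIM ALL

MACRO GFOR(P1,P2,P3)
 P1=P2
 DO UNTIL (P1>P3)
END MACRO

MACRO GNEX(P1)
 INCR P1
 LOOP
END MACRO

FUNCTION PBMAIN () AS LONG
    LOCAL i       AS LONG
    LOCAL sumFor  AS LONG      ' hypothetical: sum built with FOR/NEXT
    LOCAL sumGfor AS LONG      ' hypothetical: sum built with GFOR/GNEX

    ' Ordinary FOR loop
    FOR i = 1 TO 10
        sumFor = sumFor + i
    NEXT i

    ' Same loop written with the macros
    GFOR(i, 1, 10)
        sumGfor = sumGfor + i
    GNEX(i)

    ' Both sums should come out equal (1 + 2 + ... + 10)
    MSGBOX "FOR: " + FORMAT$(sumFor) + "   GFOR: " + FORMAT$(sumGfor)
END FUNCTION
```

The macro form only covers the simple case of counting up in steps of 1; for STEP or counting down you would still use FOR.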
