(Optimization) - Use Iteration instead of Recursion

Theo Gottwald · July 28, 2007, 09:36:37 AM

One thing that can speed up your project by between 2 and 10 times is, to prefer Iterations instead of recursion.
For example when reading directories from harddisk.

Normally Recursive Code is used, which is much slower then iterative code - if the memory Allocation for the result-stringfield is done properly in big steps.

The following code is an example for such a thing.

Code Select



#IF NOT %DEF(%A_AU_INC)
%A_AU_INC=1

'#IF NOT %DEF(%F_CL_INC) ' GetFileDateTime()
' #INCLUDE "F_CL.INC"
'#ENDIF

#IF NOT %DEF(%G_AA_INC) ' G_AT() - Enshure "\"
 #INCLUDE "G_AA.inc"
#ENDIF

#IF NOT %DEF(%F_CW_INC) ' Is_File/Dir()
 #INCLUDE "F_CW.inc"
#ENDIF

#IF NOT %DEF(%X_AQ_INC) ' DoEventsAPI()
 #INCLUDE "X_AQ.inc"
#ENDIF

#IF NOT %DEF(%F_CV_INC) ' Is_File()
 #INCLUDE "F_CV.inc"
#ENDIF

#IF NOT %DEF(%A_AU_FData_INC)
%A_AU_FData_INC=1
TYPE A_AU_FData
     files AS QUAD ' Found Files muss auf 0 gesetzt werden
     dirs AS QUAD  ' found dirs, muss auf 0 gesetzt werden
     mode AS DWORD ' 1 - pause, 2 - cancel operation 0 - normal
     bytes AS QUAD ' enthält die Bytes die zurückgegeben werden
     total AS QUAD ' Gesamtanzahl gefundener Items
END TYPE
#ENDIF

$A_AU_SZ="000000000000"
%A_AU_Recurse=1
%A_AU_Hidden=2
%A_AU_Files=4
%A_AU_Dirs=8
%A_AU_SizeDepth=16
%A_AU_CountAll=32
%A_AU_DoEvents=64
%A_AU_System=128
%A_AU_Sort_Down=256

'------------------------------------------------------------------------------
'Non-Rekursive File/Dir Pozedur, a1$=Startpath$, a2$=Dateifilter$,a3() Files-Rückgabefeld,a4() Dirs-Rückgabefeld,a5 als Rückgabe Info-Daten (s.o.)
' a6 (Bits) wie folgt:
' 0 - nur zählen,  1 - recurse, 2 - Hidden, 4 - files, 8 - dirs, 16 - depth/size und alle Kombinationen,
' 32 - (von-bis zählung) gesamtitems zählen (wenn null werden gesuchte Items gezählt),
' 64 - DOEVWNTS (0,1 ...), 128 - SYSTEM-Flag, 256 - Sortierrichtung (1=Sortierrichtung absteigend, 0=Sortierrichtung aufsteigend) )
'
' a7 (von) - (a8 bis) (Abhängig von a6/ Bitwert 32)
'
SUB A_AU(a1$,a2$,BYREF a3() AS STRING,a4() AS STRING,BYREF a5 AS A_AU_FData,BYVAL a6 AS LONG,OPT BYVAL a7 AS QUAD, OPT BYVAL a8 AS QUAD)
 LOCAL t1 AS WIN32_FIND_DATA,s1 AS ASCIIZ * %MAX_PATH,s2,s3,s4(),s5,s6 AS STRING
 LOCAL d1,d2,d3,d6,d7,d8,d9,da,db,dd,de,df,dg,dh,di,dj,dk,dm,dn,dp,dq AS LONG,d4,d5 AS DWORD,q1,q2,q3,q4 AS QUAD
 d4=0:d5=1e5:DIM s4(d5):de=1e5:REDIM a3(de): df=1e4:REDIM a4(df)
 RESET dg,dh
 ' d1 - Handle des ganzen Dirs für Findfirstfile
 ' d2 - True wenn etwas gefunden wurde
 ' d3 - 1 wenn das gefundene ein dir ist
 ' d4 - Zähler für internes dir-feld, letzer Eintrag
 ' d5 - Zähler für internes dir-feld, Aktueller Dimensionierung
 ' d6 - (1/0) ob aktuelles Element "hidden" ist
 ' de - DIM-Zähler für Ausgabefeld (files)
 ' df - DIM-Zähler für Ausgabefeld (Dirs)
 ' dg - Aktueller Ausgabezähler files
 ' dh - Aktueller Ausgabezähler Dirs
 ' di - Wenn 1 dann Kooperativ-Modus mit DOEvents
 ' dj - Zähler für (VON BIS)
 ' dk - (1/0) ob VON-BIS aktiviert ist.
 ' dm - (0/1) ob SYSTEM-Elemente übersprungen werden
 ' dn - (1/0) ob aktuelles Element SYSTEM ist
 ' dp - (1/0) Sortierrichtung create/delete
 ' q1 - interner dir-Zähler
 ' q2 - interner file-Zähler
 ' q3 - interner item-Zähler
 ' q4 - Filesize (QUAD)

 ' Optionale Parameter bearbeiten
  IF a7=0 THEN a7=0
  IF (a8<1) THEN a8=0
  IF  (a7=0) AND (a8=0) THEN dk=0 ELSE dk=1 ' dk - ob VONBIS aktiviert ist

' Bit-Paramter auswerten
  IF (a6 AND 1)=1 THEN d7=%True ELSE d7=%False  ' d7 -> recurse
  IF (a6 AND 2)=2 THEN d8=%True ELSE d8=%False  ' d8 -> Hidden
  IF (a6 AND 4)=4 THEN d9=%True ELSE d9=%False  ' d9 -> include files in result
  IF (a6 AND 8)=8 THEN da=%True ELSE da=%False   ' da-> include dirs in result
  IF (a6 AND 16)=16 THEN db=%True ELSE db=%False ' db-> start with dept/size in result
  IF (a6 AND 32)=32 THEN dd=%True ELSE dd=%False ' dd-> wenn 1 dann VON-BIS im Bezug auf Items->q3) sonst q1/q2
  IF (a6 AND 64)=64 THEN di=%True ELSE di=%False ' di-> wenn >0 dann Kooperativer Modus mit DOEVENTS
  IF (a6 AND 128)=128 THEN dm=%True ELSE dm=%False 'dm->SYSTEM-Files/Folder
  IF (a6 AND 256)=256 THEN dp=%True ELSE dp=%False 'dp=1 (Sortierrichtung absteigend, 0=Sortierrichtung aufsteigend)
 ' X_AU "Recurse:"+str$(d7)+$crlf+"Hidden:"+str$(d8)+$crlf+"Files:"+STR$(d9)+$CRLF+"Dirs:"+STR$(da)+$CRLF
 ' X_AU "Dept/Size:"+STR$(db)+$CRLF+"VonBis/Items:"+STR$(dd)+$CRLF+"DOEVENTS:"+STR$(di)+$CRLF+"System:"+STR$(dm)+$CRLF+"Sortierung down:"+STR$(dp)+$CRLF
  s1=G_AR(a1$)

 ' Erstes File holen
 LabA:
  d1=FindFirstFile(s1+a2$,BYREF t1)
  IF d1<>%INVALID_HANDLE_VALUE THEN d2=%True ELSE d2=%False
  DO WHILE d2
    s3=t1.cFileName:dq=0
'    X_AU "GOT: "+s3
    IF s3="." OR s3=".." THEN GOTO over  ' ignore "current folder"  and  "parent folder"
    s3=G_AR(s1)+s3
    IF (t1.dwFileAttributes AND %FILE_ATTRIBUTE_SYSTEM) THEN dn=1:dq=dq+4 ELSE dn=0
    IF (dn=1) AND (dm=%False) THEN GOTO over ' Wenn Hidden etc.
    IF (t1.dwFileAttributes AND %FILE_ATTRIBUTE_HIDDEN) THEN d6=1:dq=dq+8 ELSE d6=0
    IF (d6=1) AND (d8=%False) THEN GOTO over ' Wenn Hidden etc.
    INCR q3
    IF a5.mode=2 THEN EXIT SUB ' EXIT SUB when Flag Set
    WHILE (a5.mode=1): SLEEP 100:X_AQ():WEND ' Rest if Pause-Flag is set

    IF (t1.dwFileAttributes AND %FILE_ATTRIBUTE_DIRECTORY) THEN
       s3=G_AR(s3):s6=G_AW(dq+2)
       IF di THEN X_AQ() ' Kooperativ bei directories
       d3=1:INCR q1 ' DIRS
       s4(d4)=G_AT(s3$):INCR d4: IF d5<d4 THEN d5=d5*2:REDIM PRESERVE s4(d5)
       IF (da=0) THEN GOTO over ' Dirs nicht ausgeben
       IF (dk=0) THEN GOTO nocnt01
       IF (dd=0) THEN    ' counting (vonbis) abhängig von dd
          IF (q1<a7) THEN GOTO over
          IF (q1>a8) THEN GOTO enz
          ELSE
          IF (q3<a7) THEN GOTO over
          IF (q3>a8) THEN GOTO enz
       END IF
       nocnt01:
       IF (db<>0) THEN s5=s6+" "+FORMAT$(TALLY(s3,"\"),$A_AU_SZ)+" " ELSE s5$=""
       a4(dh)=s5+s3:INCR dh:IF dh>df THEN df=df+df:REDIM PRESERVE a4(df)
    ELSE ' FILES
      d3=0:INCR q2:s6=G_AW(dq+1)
       IF (d9=0) THEN
           IF (da=0) THEN GOTO overcnt ELSE GOTO over ' Files nicht ausgeben
       END IF
       IF (dk=0) THEN GOTO nocnt02
       IF (dd=0) THEN    ' counting (vonbis) abhängig von dd
          IF (q2<a7) THEN GOTO over
          IF (q2>a8) THEN GOTO enz
        ELSE
          IF (q3<a7) THEN GOTO over
          IF (q3>a8) THEN GOTO enz
       END IF
       nocnt02:
       q4=MAK(QUAD,t1.nFileSizeLow,t1.nFileSizeHigh)
       IF (db<>0) THEN s5=s6+" "+FORMAT$(q4,$A_AU_SZ)+" " ELSE s5$=""
       a3(dg)=s5+s3:INCR dg:IF dg>de THEN de=de+de:REDIM PRESERVE a3(de)
       'X_AU "DG is:"+str$(dg)+" * q3="+str$(q3)+" * "+s5
  overcnt:
       a5.bytes=a5.bytes+q4
    END IF

 over:
   d2=FindNextFile(d1,t1)
 LOOP
  FindClose(d1)
 ' Ab hier wird die Dirliste geleert
 IF (d4<1) THEN GOTO enz
   s1=G_AR(s4(0))
   ARRAY DELETE s4(0):DECR d4
   'X_AU "Going Deeper: "+STR$(d4)+"--"+s1,"640er"
   GOTO LabA
 enz:
' goto nosort
 IF dh>1 THEN
 IF (da AND db) THEN
     IF dp THEN ARRAY SORT a4() FOR dh,FROM 4 TO 16,ASCEND ELSE ARRAY SORT a4() FOR dh,FROM 1 TO 15,DESCEND
 END IF
 END IF
 IF dg>1 THEN
 IF (d9 AND db) THEN
     IF dp THEN ARRAY SORT a3() FOR dg,FROM 4 TO 16,ASCEND ELSE ARRAY SORT a3() FOR dg,FROM 1 TO 15,DESCEND
 END IF
 END IF

 nosort:
 IF d9 THEN a5.files=dg ELSE a5.files=0
 IF da THEN a5.dirs=dh ELSE a5.dirs=0
 IF (da=0 AND d9=0) THEN a5.dirs=q1:a5.files=q2:a5.total=q3 ELSE a5.total=dg+dh
END SUB

'---------------------------------------------------------------------------
#ENDIF

Donald Darden · July 29, 2007, 05:18:08 AM

We did some testing on the PureBasic forum awhile back about iteration vs. recursion when it comes to walking directory trees. The first surprise is that for directories, very little stack space is required. It is very rare to find any tree structure that goes more than four or five levels, and even the maximum length that Microsoft allowed for a total path statement helps enforce this limit. So there is no risk of exceeding stack space from recurision when handling directory trees.

The second thing, which is less of a surprise, is that recursion code is very elegant and simple compared it iteration processes. This is typical of what I've seen in recursion methods before.

The third thing is that because reading the drive hierarchy is a disk-bound process, the effort to measure performance differences between the two methods is not as significant, due to delays associated with accessing the hard drive.

If you were considering another process, such as sorting an array, and trying
to compare iteration with recursion, one of the concerns would be the debth of
the stack structure required, and the consequences of exceeding available stack space. Iteration allows you to manage the resources needed for the sort so
that you can control that aspect more effectively.

Another element is how you are managing the stack during the whole recursive process. If you look at the way a compiler does this, you will see that it is hardly optimized for speed or for using a small footprint, which would be beneficial in this case. But if you were to manage the stack yourself, you could strive to make it as efficient for this purpose as might be possible.

Consequently, the argument between those that favor one method over the other is not really conclusive.

Theo Gottwald · July 30, 2007, 08:39:55 AM

That may be true for directories.
As my Harddrive has 500 GB and a very nested structure, my results may not be the same everywhere.

Anyway its a general tip to prefer iteration before recursion.

See:

Replace Recursion with Iteration

If you have interesting discussions in the Purebasic forum, please don't mind to post the links here, then i can follow them and read them also. As you may know, I am also licenced Purebasic User since several years (licencenced, biut don't really often use it :-)).

Charles Pegge · July 30, 2007, 09:53:57 AM

Like chillies in the curry, recursion has its place, but should only be used in moderation. I would never use recursion in a sorting algoritthm but the example that sticks in my mind is the subdivisions of the spherical triangle when designing geodesic domes.

Starting with an octagon or icosahedradron, you take the basic triangle and divide it into 4 or 9 sub-triangles, each of these may further produce 4 or 9 triangles. With the final recusion you can then plot the resulting sphere or collect the cooordinates and quantities of the different triangle types and build one!

Doing this by pure iteration is probably more complex and difficult to understand.

Some languages of course discourage iteration and expect you to use only recursion: - The FUNCTIONAL languages like ML that only allow you to assign a value to a variable once. I cant see these languages becoming mainstream though. They are computer Scientist's delight, as they are easier to prove the correctness of a program.

Charles Pegge · July 30, 2007, 10:55:44 AM

Just for fun:

The Functional function will accummulate 10 different i and e values on the stack,
which may be can be traced thru to check for errors!

Functional Style Programming in PB

Code Select



' none of the `i` variables is assigned a value more than once.

printn(1,10)

'....

sub printn(i,e)
 if i<=e then print i:printn(i+1,e)
end sub

'and the iterative equivalent:

sub printn(i,e)
 for i=1 to e: print i: next
end sib

Theo Gottwald · July 31, 2007, 09:09:19 AM

Your example shows in a very simple way why recursion is slower.
Imagine, that with each call of a SUB the system has to open a new stack frame (and make therefore slow memory allocation operations), make a subroutine call and use the least time for the operation we are doing.

Thats behind this tip. While - as Donald suggests, there are cases, where speed is NOT critical, but development time. Thats the place where recursion can be used.

Charles Pegge · July 31, 2007, 11:13:58 AM

My understanding is that Functional language compilers can convert recursions back into iterations internally, where possible. So the mathematical rigour of the language is not compromised to get good performance.

Donald Darden · July 31, 2007, 08:00:49 PM

You may be surprised by the fact that total hard disk size has nothing to do with depth of your directory tree structure. Most likely you assumed that with more disk space, your directory structure goes deeper, but that is not the case.

Consider a room full of filing cabinets. You go to any cabinet and open a drawer, and inside are a series of folders which contain various papers. That is a five level structure -- room, cabinet, drawer, folder, papers. It isn't that deep. Since you can designate any room, any cabinet, any drawer, any folder, and any paper, you only have to have five references, and that would be for any size building.

Another way to look at this is take a typical mailing address. You have a name, house number, street name, city, state, country, possibly a zip code. That address is good for any location on earth, so I can write a letter here, and it will get there, as long as I pay the necessary postage. Now you can strip the address down, because the name is not really necessary, and the city and state are redundant with the zip code. In fact, I could give a location with just its
latitude and longitude, and mayber a height coordinate, and someone be able to find it.

So it does not really make any difference how large a hard drive space is involved, the method of referencing it remains fairly fixed. Now if you wanted to, you can right a hard drive scanner that would check how many folder and subfolders you have, and the maximum depth of any tree branch, even the
average depth of your directory tree, and even how much variance between PCs with large hard drives and those with small ones. The depths should remain
fairly consistent, though the totals will vary considerably.

There is some uncertainty about the matter of maximum path lengths. The different flavors of Windows file systems impose different lengths. Prior to
NTFS, the limit was about 1023 characters. NTFS extended it to 32k. But
many of the Windows APIs only allow a maximum of 260 characters. You can
walk a deep directory that you might not be able to access via a full path
reference, but this is awkward. As a consequence, it is rare to find many cases
where you have an extended directory structure because of the pitfalls involved.
That is why I can say with some assurance that recursion is not at risk of
exceeding stack space when negotiating a directory tree.

I might also point out that deep directory structures are ineffectual. You spend
too much time walking that partiion tables trying to find out where the information is stored on the hard drive, rather than getting right to the file itself. You also increase the risk of data loss by creating way more folder references.
If you lose a file, you lose just that file. But if you lose a folder, you lose all the files that were in that folder, and if you had other subfolders there, you lose them and their contents as well. Creating unnecessary subfolders, and more
subfolders within those, is like digging a deeper pit in which to bury yourself.

Fortunately, no matter how much data you might be striving to deal with, it
parses down to managable chunks with a relatively few number of references,
and a directory tree parallels our efforts to divide up chunks of data into some managable arrangement. They don't have to be very deep, and consequently
they arn't. But what you might find that that long file and folder names can cause some problems in staying in the confines of the allowed or expectd path
length. The 8.2 shortname reference may serve to help in this case.

Theo Gottwald · August 04, 2007, 10:50:35 PM

I did not really measure it. But maybe because I have a 4 Drive HD-Raid with together 32 MB Cache, i have not that much of "mechanical latencies" and therefore my subjective impression was that this subroutine was much faster.

But as you stated, when dealing with mechanical devices, like non-hybrid-hard-drives, the latencies from the software itself is mostly the smallest part.

News:

(Optimization) - Use Iteration instead of Recursion