ARRAY vs Linked List

MikeTrader · October 30, 2007, 08:10:02 AM

Hello Theo,
I hope it is ok to post in your area?
I would very much like to find out more about dimensioning an ARRAY vs allocating memory for nodes in a linked list.

Typically this becomes noticable for STRING arrays of 10k elements if the individuale strings are around 2k each.

I would like to know if using a Linked List of Strings is significantly different in speed when created like this:

Code Select


'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
FUNCTION llbAdd ALIAS "llbAdd" ( BYVAL pFirst AS LINKED_LIST_TYPE PTR, BYVAL sData AS STRING ) EXPORT AS DWORD 

' Creates a linked list node of BINARY sData AS STRING (CHR$(0) are ok) and adds it to the end of the linked list

  LOCAL pNext, pLast AS LINKED_LIST_TYPE PTR    
     
    IF gLenllType = 0 THEN gLenllType = LEN(@pNext) ' Set once            

    pNext = GetMem(gLenllType) ' Allocate the memory for the linked list node
    IF pNext = 0 THEN  
      gsllLastError = "Node GetMem Failed in llbAdd()"
      FUNCTION = -1 
      EXIT FUNCTION  
    END IF
                
    @pNext.DataLen = LEN(sData)

    @pNext.pData = GetMem(@pNext.DataLen) ' Allocate the memory for the Asciiz Data string
    IF @pNext.pData = 0 THEN 
      gsllLastError = "Data GetMem Failed in llbAdd()"
      FreeMem pNext
      FUNCTION = -2
      EXIT FUNCTION
    END IF
       
    CopyMemory @pNext.pData, BYVAL STRPTR(sData), @pNext.DataLen 
               
    IF pFirst THEN 
      pLast = llFindLast(BYVAL pFirst) ' Find Last Node
      IF pLast THEN @pLast.pNext = pNext 
      FUNCTION = pFirst ' Return the sent first node 
    ELSE 
      FUNCTION = pNext ' Return the new first node 
    END IF

END FUNCTION  
              


'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
FUNCTION llFindLast ALIAS "llFindLast" ( BYVAL pFirst AS LINKED_LIST_TYPE PTR ) EXPORT AS DWORD 
' Returns a pointer to the last node in the list

  LOCAL pLast AS LINKED_LIST_TYPE PTR    

    IF pFirst = 0 THEN '
      FUNCTION = 0 
      EXIT FUNCTION
    END IF

    pLast = pFirst
    DO UNTIL @pLast.pNext = 0 ' Search through the nodes looking for the one with pNext NOT set
       pLast = @pLast.pNext ' Return the one before
    LOOP
    FUNCTION = pLast 


END FUNCTION 

'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
FUNCTION GetMem( BYVAL iBytes AS DWORD ) AS DWORD

  LOCAL pMem AS DWORD  
  
    pMem = GlobalAlloc(%GMEM_FIXED OR %GMEM_ZEROINIT, iBytes) ' Allocates fixed memory. Initializes memory contents to zero
    IF pMem THEN  
      FUNCTION = pMem 
  
    ELSE ' Try Again
      pMem = GlobalAlloc(%GMEM_FIXED OR %GMEM_ZEROINIT, iBytes)
      IF pMem THEN 
        FUNCTION = pMem ' "Success this time"
      ELSE 
        FUNCTION = -1 ' "GetMem FAILURE AGAIN"
      END IF
    END IF 

END FUNCTION

If we leave out the overhead of COPYMEMORY() and llFindLast() for the sake of discussion, is the allocation of memory for the node faster or slower than using DIM MyArray(9999)?

(full Linked List code available from Don Dickinson at www.greatwebdivide.com)

Kent Sarikaya · October 30, 2007, 09:04:27 PM

Mike, I can't answer your question directly, but I ran across this interesting site for c++ coding that finally explains linked lists, stacks, queues, vectors in such a way that it all makes sense.

The basic highlights:

Arrays are the fastest.
Linked Lists are the basis for all the other data structure systems like stacks and queues

Linked List gives you total control of putting things in and out of the list anywhere.
Stack is a linked list with Last In / First Out, there are not linked list routines for putting things anywhere in the middle. Think Stack of plates.
Queues are like stacks but First In / First Out, again just think of a line of customers waiting in a line, the first in line gets served first.

A Vector is nothing more than a single dimensional array with methods to make using it easier.

http://www.aaroncox.net/tutorials/miscellaneous/index.html

MikeTrader · October 31, 2007, 07:13:53 AM

Great link for an explanation. Thank you.
What I am wondering about is the speed of PB memory allocation using DIM vs ALLOC for the linked list memory allocation. I suspect theo might have a few words to say on the subject

Theo Gottwald · October 31, 2007, 01:34:14 PM

Before I can answer, the answer is already given.
While finally its not that easy, as you are talking from "String Arrays" not from Arrays in General.

Using Arrays in general the Memory is been allocated by the BASIC Runtime in one piece.
But not with dynamic String Arrays.
Using dynamic Elements, memory allocation is done for each element separate, they can even be in differen memory areas.

As most of the time will be used with memory management of the idividual strings themselves, I would actually not say that that the rule "Array is significantly faster then linked list" will be true for String Arrays with dynamic strings.

In this case it may depend on what shall be done with the strings. How often they change content or not,
and how the linked list are implemented. Your code looks promissing, anyway the question is, what happens to the data,
how efficient can the CPU Cache handle the memory requests.

Some links:
http://articles.techrepublic.com.com/5100-22-1050183.html

http://cslibrary.stanford.edu/103/LinkedListBasics.pdf

MikeTrader · October 31, 2007, 04:50:37 PM

>Using dynamic Elements, memory allocation is done for each element separate, they can even be in differen memory areas.

Exactly.
OK let me write some pseudo code:

Code Select


  sMyArray() AS STRING
    DIM sMyArray(9999) ' Step 1
    FOR i = 1 TO 9999
      sMyArray(i) = SPACE$(1999)' a 2k string - Step 2
    NEXT   


' Vs
      
  pRoot AS DWORD

    pRoot = 0 ' This indicates that we are creating the first node in a new linked list 
    FOR i = 1 TO 9999
      pRoot = llbAdd( pRoot, SPACE$(1999) ) ' On Creating First Node, pRoot becomes "handle" to the first node in the list
    NEXT

Using a PB Array might be slower depending on how the memory is allocated for the BSTR dynamic strings by PB (hence the decompile question). With the Linked list, the memory is allocated with:
pMem = GlobalAlloc(%GMEM_FIXED OR %GMEM_ZEROINIT, iBytes)
It would seem that this might be faster?

Theo Gottwald · November 02, 2007, 07:31:30 AM

It may depend on Element size and position.
I could imagine, that the string engine may be faster with smaller Elements, while it may be slower with bigger elements.

But finally you can do a test. In this case an Dissassembly would not help much as most time is spent in system-calls.

Paul Squires · November 02, 2007, 04:23:20 PM

If it is a situation where there is a large number of additions and deletions from your list over the life of your program then I suggest that a Linked List is much better than an Array in terms of efficiency and speed. I use Linked Lists for just about everything these days because I find that they are much easier to work with than arrays especially when you need to dynamically create and destroy multiple lists throughout your program.

I posted Linked List code in the PB source code forum. It is similar to Don's but it is a little faster (mine is double linked whereas Don's is single) and it integrates well into the Hash Table code that I also posted.

MikeTrader · November 03, 2007, 03:53:38 AM

QuoteBut finally you can do a test. In this case an Dissassembly would not help much as most time is spent in system-calls.

Oh ok. For some reason I thought you would need to dissasemble the Dimension statements for arrays. I can certainly do a test.

Paul,
I looked at your double linked list briefly. I did not immediately see the benefit, but then I am not sure I understand what the double linking does.

I removed the name parameter from the Dons linked list so no strings are harmed in the making of the list (and I loose the ability to retrieve nodes by name, but I never do that)

In this situation (think SQL server) I do not delete nodes or move them around, its just a convenient way to assign data to memory without knowing in advance how many items may be returned (allthough in this specific case I do, I think) but regardless, I would like to use the linked list more and just not care if the number of elements are known.

I agree that the convenience of freeing the memory and essentially destroying the list is great, but I need to test the speed of dimming an anrray and then redimming it, vs creating the list and freeing it.

MikeTrader · January 14, 2008, 01:22:37 AM

After a few months working with Linked lists, String Builder class and memory allocation including paged, I have realized there is very little in it in terms of assigning memory. PB strings do just a good a job as Malloc etc. The real speed trick is to allocate all the memory you will need in one initial assignment as happens with an array. I am sure this is no surprise to anyone, but I did wonder about the memory assignment.

Kent Sarikaya · January 14, 2008, 10:38:09 PM

Thanks Mike for updating us on your findings after such testing.

Eros Olmi · January 14, 2008, 10:45:48 PM

Mike,

maybe you already saw it but in http://www.thinbasic.com/index.php?option=com_docman&task=cat_view&gid=27&Itemid=66
I've posted some code called "Dynamic arrays" that maybe can give you another point of view on PB dynamic memory allocation.

Ciao
Eros

MikeTrader · January 14, 2008, 11:18:31 PM

Hi Eros,
Yes I looked at this a while ago. Very nice.
I ended up using the String Builder class. Its so simple and elegant.

News:

ARRAY vs Linked List

MikeTrader

Kent Sarikaya

MikeTrader

Theo Gottwald

MikeTrader

Theo Gottwald

Paul Squires

MikeTrader

MikeTrader

Kent Sarikaya

Eros Olmi

MikeTrader