Powerbasic Museum 2020-B

IT-Consultant: Charles Pegge => Software Design and Development Issues => Topic started by: Donald Darden on April 16, 2007, 08:52:27 AM

Title: Let's Talk Assembler
Post by: Donald Darden on April 16, 2007, 08:52:27 AM
Assembly is pretty much the universal language when it comes to PCs, because the vast majority of PCs today use either Intel or AMD processors, and these
all support the x86 instruction set.

There are many tools available for writing Assembly code, and even though you
may have never directly written a line of assembly or studied it, if you have ever programmed in any programming language, you have indirectly done your share.
This is because compilers generally take high level code and render it into suitable assembly code.

But assembly code is not understood by the computer, which only works with groups of bits that we call bytes, words, dwords (for double words), and quads.
When an Assembler converts the assembly code into numeric values, what we call binary format, we have the final conversion necessary for the computer to execute.  There are several types of binary files, which generally go by the extensions of BIN, OBJ, DLL, COM, and EXE.  A BIN file is just a binary file, with no specific purpose identified.  An OBJ file is a linkable file that can be used to
create an EXE file.  OBJ files are often library files, which you obtain or create, and can be produced by virtually any compiler language.  This is a form of static linking, and the contents of the OBJ files then become a permanent part of your final EXE file.  A DLL (Dynamic Link Library) file is an updated concept that replaces the OBJ file.  Links to it are formed in your final program, and the DLL
is then loaded and joined to your program at run time.  This minimizes the size
of your final program and allows common modules, used in various programs, to
be shared between multiple programs.  This can increase the number of programs that your computer can accommodate at one time, minimize the size of your programs on disk, and cause them to make better use of available memory.

COM files are primarily DOS executables.  They assume you only have one segment of memory for everything, and that this is the only program running.  All segment registers are set to point to the same segment.  They also run in the
same segment as any other COM program.  You can usually run COM programs
under Windows, but they cannot take advantage of some of the things that
have changed since the old DOS days, such as NTFS, USB, Long File Names, and
so on.

EXE files are generally multi-segment in nature, meaning they keep a DATA
segment, CODE segment, and STACK segment for starters.  In the original 16-bit
architecture, a segment could have at most 65,536 memory addresses.  With
32-bit and 64-bit CPUs available, the physical barrier to having larger segments
has been removed, but you might still be constrained by the software tools available.

You do not have all of RAM available to your program when you run under Windows.  Windows will supply your program with the memory implied by the size of your program and by your use of DIM and assignment statements.  Then Windows limits your access to what was allocated to your program, and respects the boundaries around other programs as well.  If your program attempts to range outside the allocated memory area, it will most likely result in a GPF (General Protection Fault) error, which will kill your program immediately.  Under
Windows 9x/Me, this would also kill Windows, but from Windows NT to Vista,
it would only kill the immediate program (usually).

Understanding something about file types, and that assembly code is fundamental to all programs written for the PC, should help you a bit in recognizing the role of each.  Some people write a lot of assembler code directly, and many people only consider writing a small bit of it when the efficiency of that type of code is most pronounced.  As a rule of thumb, many programmers are guided by the principle that only ten percent of your code consumes up to 90 percent of your processing time.  If you can identify the specific code that is proving least efficient and is consuming the most processing time, then enhance it by refinement, which might include writing portions of it in Assembly code, you will improve performance significantly.

The problem then is, how do you write assembly code?  How does it work?  Hopefully, this topic will be explored in more depth by contributors to this thread.

Here is a key thing to know:  The x86 architecture, that is the design and function of the "brains" of the computer, is organized around a set of registers.  There are in fact eight registers that you can use in your applications.  Originally these were just 16-bit registers, but the original instruction set and the registers themselves have been extended to include 32-bit registers, and more functional modes.  Rather than cover all eight registers right at this point, let me just explain that whatever you do in your program, most (if not all) of these registers will be involved somehow.  Any instruction that involves a destination and a source, such as ADD [destination], [source], has to have at least one register
designated.  So this could be adding the contents of one memory address to a register, a register's contents to another register, or a register's contents to a memory address.  But what you do not have is a mode where you can add the contents of one memory address to the contents of another memory address in one instruction.  This means that the x86 instruction set is register centric, not stack centric or memory centric as some other architectures support.
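That register-centric rule can be illustrated with a rough sketch in Python (purely illustrative, not assembler — the addresses and values are made up): an add between two memory cells has to be staged through a register such as EAX.

```python
# Toy model of the x86 rule that ADD needs at least one register operand.
registers = {"eax": 0}
memory = {0x1000: 2, 0x2000: 3}

# There is no memory-to-memory ADD; the value passes through EAX:
registers["eax"] = memory[0x1000]    # MOV EAX, [0x1000]
registers["eax"] += memory[0x2000]   # ADD EAX, [0x2000]
memory[0x1000] = registers["eax"]    # MOV [0x1000], EAX

print(memory[0x1000])  # → 5
```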
Title: Re: What's this board good for?
Post by: José Roca on April 16, 2007, 08:55:08 AM
 
This is an interesting site for assembler programmers, home of WinAsm Studio, a free IDE for developing 32-bit Windows and 16-bit DOS programs using the Assembly Language.

WinAsm: http://www.winasm.net/

In the forum you can also find some interesting custom controls written in assembler, but usable with PowerBASIC because they provide a .dll version. See, for example, the XXControls:

http://www.winasm.net/forum/index.php?showtopic=568

(http://www.winasm.net/forum/uploads/post-7-1144550448.jpg)
Title: Re: Let's Talk Assembler
Post by: Theo Gottwald on May 22, 2007, 09:58:41 PM
One of the strengths of Powerbasic - and the original reason why I bought it at the time - was that it has an INLINE ASSEMBLER.

That's why I like the idea of also having discussions on Assembler and low-level optimizations in the forum.

I have read a lot on this topic from you, Donald, at other places (PB-Forum).
Much of it was really interesting.  If you post more of it, Donald, a copy here would be welcome.
Title: Re: Let's Talk Assembler
Post by: Donald Darden on May 23, 2007, 03:55:23 AM
PowerBasic's compiler does support inline assembly code, which can be a great boon to anyone willing or interested in venturing into this area of development.

PowerBasic's method of writing inline assembly statements requires that you use either the word ASM or the exclamation (!) symbol on each line of source code which contains an assembly statement.  The format is like this:

[optional label:]
               ASM         [instruction] [dest] [,source]     [;comment field]

If you elect to use the exclamation mark instead, it would appear like this:

[optional label:]
              !              [instruction] [dest] [,source]     [;comment field]

The "white space" separators between fields are significant, but the actual number
of spaces is immaterial.  For instructions that have both a destination and a source, the comma alone will suffice as a separator.  The semicolon used to mark the comment field can be a single quote mark instead.

The Assembler used by PowerBasic is TASM (Turbo Assembler from Borland), and can be found on the Internet.  You don't need it, unless you want to have a separate assembler, or you want access to more information and details of assembly programming and conventions used with it specifically.

Note that there are many Assemblers available for the x86 processors, some limited to 16 bit programming, and others capable of 32 bit programming.  These include FASM (Flat Assembler), MASM (Microsoft Assembler, the Cadillac of assemblers), NASM (Netwide Assembler), and others.  The syntax used by each is very similar to the rest.  The big differences are in the extensions and support available for each.

So what sets one Assembler apart from the others?  Well, availability would be
one thing.  You used to have to pay some pretty good money for these.  But
with the adoption of Windows, the development of better high level language compilers and specialized languages, and the myriad of details that programmers had to master, the thrill and habit of writing assembly code sort of went away for a lot of people.  There were too many new things to spend your time doing and learning.  But writing assembly code gives you the advantage of having real control over the computer, and writing lean and mean code is the epitome of the programmer's quest.  Assembly code gets you there better than any other method.

I did mention that assemblers tended to be somewhat costly.  Well, since the bloom has gone off that rose, continued Assembler development has become something of a hobbyist project, covered by open source agreements.  You can now even get a MASM equivalent, named MASM32, as a free download off the network.  And there are those committed to trying to keep the capabilities of Assemblers up to the expanding capabilities of the processors being designed and built by Intel and AMD.  MASM32 is still being adapted, but is somewhat behind current hardware capabilities.

Another alternative is HLA (High Level Assembler), which wraps assembler into a high level language form.  Actually, if you look at both HLA and PowerBasic, you realize that PowerBasic packs a lot of the same functionality into its own approach.

WinASM, which José Roca has mentioned and provided a link to, allows you to write assembly code directly under Windows.  In other words, you have many options for learning and using assembly language under both DOS and Windows.
But you have a similar ability when it comes to Linux, Unix, and the Mac OS.
Learning the assembler language gives you an edge in all those environments.

So, if you have your choice of Assemblers, what ones would you choose?  Well,
that might depend on what you want to do.  For basic and limited use of assembly code, you probably don't need one - PowerBasic alone should suffice. 
TASM would be the best choice for supporting your use of assembly code with the PowerBasic compilers.  MASM and TASM include their own debuggers, and a lot of books feature these two assemblers when teaching assembly programming.
Of course PowerBasic's debugger supports its inline assembler mode as well.

Some of the other factors to consider, would be whether you want to write stand alone programs or libraries in assembler, or if you want to create games or an assembler, compiler, or operating system of your own.  How well versed do you want to become in assembly?  Does a given assembler support 32 bit and possibly 64 bit instructions and address modes?  Does it support floating point and graphical extensions?  Which chip set is it better suited to, or does it model best?  You might even choose the Assembler that your best book on the subject recommends.

Assembly coding may not be the high profile programming tool that it used to be, but there are many sites devoted to the topic still, and searching these out can be profitable.

Now here is something else to consider as well:  To program, you do not need to know or use assembly language, but to understand what has already been coded, you often do.  The reason is that, as stated before, assembly language is the universal computer language.  Existing programs, libraries, and modules can be run through a disassembler, which converts the binary format back into assembly language, and if you know assembly language, you can take a stab at trying to understand it.  (Note:  This process is complicated by the fact that the disassembler may need to be instructed on what portions of the program represent data, and what parts represent actual code.)  Efforts to write decompilers have been far less successful, because every compiler is different, and techniques for rendering a high level language into assembler differ greatly.  The best efforts at decompiling programs involve successfully determining which compiler, compiler version, and supporting tools (such as libraries) were involved in the original program design.

If you want to get started with ASM in PowerBasic, then you can start with the Help file and look up the keyword ASM.  It doesn't really tell you anything about writing assembly code, but it does list all keywords accepted after the ASM or exclamation mark, and helps identify the limits of PowerBasic's ability to understand the assembler syntax.  You will also find a link in the Help file to the Inline Assembler, and that will take you to various other links.  But like any Help file, it helps you with specific questions; it is not really a tutorial in its own right.  However, PowerBasic does have a download section devoted to Assembly Programming, but much of what is available is specific to the PB/DOS compiler, and involves 16-bit code only.
Title: Re: Let's Talk Assembler
Post by: José Roca on May 23, 2007, 04:08:51 AM
Quote
The Assembler used by PowerBasic is FASM (Fast Assembler)

Bob uses TASM, not FASM.
Title: Re: Let's Talk Assembler
Post by: Donald Darden on May 23, 2007, 04:34:34 AM
Thanks, José.  I wonder where I got that misconception?  Anyway, I corrected my post accordingly.  I looked up TASM on the internet, and came upon an order form from Borland to buy their assembler.  I wonder how old that web page is, or whether Borland still exists, or if they (or someone else) now sells TASM?  It said version 5.0.  I have MASM 6.1, and found MASM up to 9.x on the internet.
Title: Re: Let's Talk Assembler
Post by: Donald Darden on May 23, 2007, 08:21:13 AM
I said earlier that the x86 CPUs are register centric, and I explained that this means that the typical operation of the CPU involves moving information into, out of, between, and with respect to the registers provided.

Let's refine our concept of a register a bit by referring to the typical calculator.  It has a display that is attached to a register, and whatever value it shows is what is in that register.  It probably has another register as well, and to distinguish between the two, the one displayed is commonly referred to as the "X" register and the hidden one is the "Y" register.

You would normally enter a number into the X register, then when you press an operation key, such as add (+) or subtract (-), the contents of the X register are transferred to the Y register, and the X register is usually cleared.  Then you enter a second value into the X register, and when you press the equals (=) key or another operation key, the pending operation takes place between Y and X, and the results are returned to the X register.
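A rough Python sketch (purely illustrative, not calculator firmware) of the X/Y behaviour just described:

```python
# Minimal model of a two-register (X/Y) calculator.
class Calc:
    def __init__(self):
        self.x = 0        # displayed register
        self.y = 0        # hidden register
        self.op = None    # pending operation

    def enter(self, value):           # type a number into X
        self.x = value

    def press_op(self, op):           # e.g. "+" or "-"
        self.y, self.x = self.x, 0    # X moves to Y, X clears
        self.op = op

    def press_equals(self):           # apply the pending operation Y op X
        if self.op == "+":
            self.x = self.y + self.x
        elif self.op == "-":
            self.x = self.y - self.x
        return self.x

c = Calc()
c.enter(7); c.press_op("-"); c.enter(2)
print(c.press_equals())  # → 5
```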

Now calculator designs vary quite a bit, and in some designs, the Y register might be cleared at this point, the pending operation may be discarded as being complete, or repeated press of the equals key might cause the pending operation to happen again and again.

The automatic shift of the original contents from the X register to the Y register identifies the X and Y registers as a form of a stack.  In more advanced calculators, such as the HP programmable calculators, there may be additional registers associated with the internal stack, such as Z, S, and T.  Certain operations permit the contents of any one register to be copied to or exchanged with another designated register.

Since simple calculators do not address the X and Y registers as such, and the transfer of information between these is handled transparently as the result of other operations, we would think of this type of calculator as stack oriented or centric.  Now many calculators also include some sort of memory register as such, but these would closely resemble what we now consider a register to be.
There are operators that allow us to save to memory, read from memory, add
to memory, subtract from memory, exchange with memory, clear memory, and so on.  Let's just call this the M register to put it on a par with our X and Y registers.  Now we can see that some calculators are not only stack centric, but in at least one aspect, are register centric as well.

Now a programmable calculator has program memory as well, and this is very much like the RAM (random access memory) that we find in computers.  This memory can be divided up into two types:  the portion that actually contains the programmed instructions that we entered via the keyboard or other means, and the concept of additional registers, which we might refer to as the R() registers.  These are frequently numbered registers, such as R1 - R63, or more appropriately, R(1) to R(63).  These form a fixed size array of registers, but the parentheses are often neglected in direct references because each left paren and right paren symbol takes up a program memory step, and program memory is limited.  However, these R registers are a true array if they can be referenced indirectly by number from within another register.  That other register would be one of the existing ones, but able to perform index operations with regards to the R() registers.

As you can see, even calculators can have multiple addressing modes.  And we find similar modes in the x86 architecture.  So why would I describe the x86 as being register centric rather than a mix of all three?  The main reason is that the x86 instruction set is principally geared to working with its registers, so that makes it simple.  The other thing is that the use of registers tends to make things go faster through the processor.  But the final word on the subject is that it would be extremely difficult to write good and effective code if you avoided the use of registers altogether.

You can see from the description of registers in a calculator that naming is an important aspect of registers.  There are also a finite number of registers, and they probably have certain key roles to play in normal operations.  That all holds  up as well when describing registers in the x86 CPU.

There are essentially eight registers that you would normally be concerned with in the x86, broken down into two groups.  The first group is the General Purpose registers, and in the 32-bit design, these are called EAX, EBX, ECX, and EDX.  The "E" means that they are 32-bit in length.  The other part of the name, the AX, BX, CX, and DX, were merely 16-bit registers in the old CPU architecture.
For compatibility with older software, the CPU recognizes both types of instructions and both types of registers, using the lower 16 bits of the extended register when a 16-bit instruction is specified.

The General Purpose registers are supported by many instructions, some of which use certain registers in a special manner.  Thus, each general register also has
specific operations for which it is best suited.  The AX or EAX register is most commonly used for arithmetic and comparative operations.  The BX or EBX register is used as an alternate to the AX or EAX register, and also used as an offset in some addressing operations.  The CX or ECX register is often used as a counter (think FOR loop in this context).  The DX or EDX register is sometimes used in conjunction with the AX or EAX register for integer multiply and divide operations.

Because the x86 evolved from an 8-bit architecture originally, where IBM had set the standard for 8-bit bytes in its architecture, and the adoption of the 8-bit ASCII and EBCDIC codes (the last is an old IBM mainframe standard), one of the design criteria for the x86 was to support 8-bit byte operations.  To do this,
the four ?X registers just discussed have instruction addressing modes that just address the lower 8 bits, or the higher 8 bits.  Thus the lower 8 bits of the AX or EAX register are called AL, the upper 8 bits of the AX or EAX register are called AH, and for the other three registers you have BL, BH, CL, CH, DL, and DH.
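Since AL and AH are just the low and high bytes of AX, the relationship is plain byte arithmetic, as this illustrative Python fragment (not register access, just the math) shows:

```python
# AX is 16 bits; AL is its low byte, AH its high byte.
ax = 0x1234
al = ax & 0xFF           # low 8 bits  -> 0x34
ah = (ax >> 8) & 0xFF    # high 8 bits -> 0x12

# Writing AH alone changes only the upper half of AX:
ah = 0x56
ax = (ah << 8) | al      # -> 0x5634
print(hex(ax))  # → 0x5634
```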

That identifies and partially explains the first four registers.  The x86 architecture provides some redundancy, overlap, and override capabilities, so
there is often choice in what registers to use and how when it comes to programming.

The other group of four registers usually involves special addressing modes that are used in conjunction with the other four registers and memory operations.  Two of these are segment registers, where a segment would be a place in memory that marks the start of a sequence of available memory.  It sounds like a pointer, doesn't it?  Well, this is often where pointer values end up.  In the 16-bit form, these would be called DS and ES, which stand for Data Segment and Extra Segment, respectively.  In 32-bit mode the segment registers keep their DS and ES names, but I will write them as EDS and EES here to match the other extended registers.

The other two registers in this group are the offset, or index registers, and are called SI (Source Index) and DI (Destination Index) for 16-bit addressing, or
ESI and EDI when used for 32-bit addressing.  Note that DS:SI and ES:DI are used as pairs in 16-bit machines to designate memory addresses above 16 bits in length.

Now a quirk/limitation of the 16-bit architecture is that Intel decided, for some reason, that they would never need a full 32-bit addressing range, so the DS:SI and ES:DI pairs can only address about 1 Megabyte of RAM.  The segment register is shifted left 4 places (effectively multiplied by 16) before its contents are added to that of the index register.  This represents 2^20, or 1,048,576 addresses.  If you remember Expanded RAM, or Extended RAM under DOS, this was additional memory that could be added to the PC, which then required special software drivers to access, such as HIMEM.SYS and EMM386.EXE.
The upper memory area carried you from the 640 KBytes that DOS imposed on you up to the 1 Megabyte boundary.  Extended memory lay above the 1 megabyte boundary, and expanded memory was what you could page in through a window using a somewhat kludgy method of access.  This is where DOS suddenly really began to show its age.  Fortunately, Windows allows you to access and use much, much more memory than you could under DOS.
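The segment arithmetic just described is easy to check; here is an illustrative Python sketch (the segment and offset values are arbitrary examples):

```python
# Real-mode address = (segment shifted left 4 bits) + offset.
def linear(segment, offset):
    return (segment << 4) + offset   # segment * 16 + offset

print(hex(linear(0xB800, 0x0010)))   # → 0xb8010

# The largest reachable address is just over the 1 MB line:
print(hex(linear(0xFFFF, 0xFFFF)))   # → 0x10ffef
print(2 ** 20)                       # → 1048576 distinct addresses per the text
```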

Well, that's only 2^20 addresses, and the pointers under PowerBasic are 32 bits wide, which translates into a maximum of 2^32, or 4,294,967,296 addresses.  So do
we still see EDS and ESI (or EES and EDI) combined by shifting to create a larger address space?  Actually, no.  Either EDS or EES, as segment registers, can support the 32-bit pointer values used by PowerBasic.  You can still add ESI and EDI to either segment register, and in fact you can also add the EBX register if that helps.  But since PowerBasic does not do either of these within the scope of the PowerBasic statements that it translates to assembler, that leaves the ESI and EDI registers essentially unused, when just writing BASIC statements.
So as a consequence, PowerBASIC allows you to designate up to two integer values to be assigned to registers (via the #REGISTER and REGISTER instructions), with the first one being placed in ESI and the second in EDI.

Now PowerBasic tells you in its inline assembler section that you should not alter the contents of EDS or EES, as they are used by PowerBasic itself.  This is an important matter, and it can really limit you sometimes, especially if you need to support 16-bit mode.  For now, take this as a statement of fact, but we will discuss how to get around it later in this post.

If you want to use ESI or EDI for something else, or you intend to modify the variables in memory that get assigned to ESI or EDI, and you do not want PowerBasic to accidentally overwrite these when it writes the registers back to memory, then it would be best to use #REGISTER NONE to prevent PowerBasic from hindering your separate efforts in assembly code.

These are not the only registers in the x86 architecture.  You have the status
register, whose bits are conditioned by the results of various operations, and generally tested with branching statements, such as ! JE somewhere, which will jump somewhere if the zero (equal) flag is set.  ! JNE would jump somewhere if that flag is clear (zero, meaning not equal).  You rarely manipulate
the status register directly, but since it can be pushed to and from the stack,
and since other registers can also be pushed to and from the stack, it is effectively possible to get it into another register and directly change it, then put it back.
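Here is a rough Python model (not real machine code) of that push-modify-pop round trip, with the flags treated as a plain integer of bits:

```python
# Toy model: a "flags" register that can only be edited via the stack.
CARRY = 0x0001
flags = 0x0002          # some starting flag bits
stack = []

stack.append(flags)     # PUSHF: copy flags to the stack
eax = stack.pop()       # POP EAX: now the copy is in a general register
eax |= CARRY            # flip a bit in the copy (set carry, for example)
stack.append(eax)       # PUSH EAX
flags = stack.pop()     # POPF: write the modified value back to flags

print(hex(flags))  # → 0x3
```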

You have the IP register, which is the location of the current programmed instruction (Instruction Pointer) being executed.  This is used for the different
jump and CALL statements, and it defines where you are in the execution of your program.  A copy is placed on the stack automatically when you do a CALL, so that the return (RET or RETF) command knows where to go back to later.  Again, you aren't expected to directly change the contents of the IP register
(and can cause serious issues if you try), but anything that can be put on the stack can be accessed via another register, so it can't be said that it can't be done (or hasn't been done).  Again, with 32 bit machines, this would be the EIP register.

You should have some idea of what a stack is now, and it may not surprise you that the stack is supported by a Stack Pointer (SP), which is a pointer that marks the current top of the stack in memory (the stack grows downward, so the top sits at the lowest used address).  The stack is not fixed in memory, in fact you can have multiple stacks involved, one for each program perhaps.  But there is only one stack pointer, and it always points to your stack when you are executing your program.  The stack is where parameters are placed when calling SUBs and FUNCTIONs in PowerBasic, but these are all part of a stack frame that PowerBasic sets up and maintains for you before and after you make that call.  When you look at assembly code, you will often see where
information is being read from memory using the stack pointer (SP) with an offset.  This is the method by which those parameters are accessed once you are in the procedure (a procedure being either a SUB or a FUNCTION).  The offset value used determines the parameter you are currently accessing.  So, in essence, the stack acts as sort of an array with the SP being the pointer and the offset the index into the array.  PowerBasic automatically determines the offset required for accessing any given parameter.  Again, with the move to 32 bit architecture, the SP becomes ESP, to support the larger address space.

And then there is the Base Pointer, or BP register.  Again, this sounds like a register that was planned to be used with pointers.  Perhaps it was.  But a programmer found a better use for it - a register to capture a snapshot of the stack pointer.  If you do a ! MOV BP, SP (or ! MOV EBP, ESP), you get whatever the stack pointer is currently pointing to.  So anything you add to the stack with a PUSH statement is not reflected as a change in the base pointer, and you can
POP values from the stack and still not cause a change in the base pointer.  The base pointer then serves to ensure, when compared to the stack pointer,
that all the PUSH and POP instructions have effectively cancelled themselves out.  And there is something else you can use the BP register for, which is to force the SP register back to its original value at any time.  If you pushed a lot of values or parameters on the stack, call some process, come back from that process, then do a !MOV SP, BP (or !MOV ESP,EBP), you effectively nullify all the previous pushes, cancelling them at once.

However, there is only one Base Pointer as well, and if I use it in my process, and someone else uses it in their process, what happens to the present or previous contents of the base pointer?  Here is where it gets a bit tricky.  Before you do a ! MOV BP, SP, to put the contents of the stack pointer into the base pointer, you do a ! PUSH BP, which saves the contents of the BP on the stack.
Then your ! MOV BP, SP sets the Base Pointer to the current top of the stack.  Now the first thing on the stack that the base pointer points to is its previous contents.  The next thing is usually the return address to get back to whatever process called it.  Above that in the stack (memory offset from the stack pointer) are the
parameters that were passed when this call was made.  Above that is whatever called before, which has not yet been returned from.

Now since the base pointer and the stack pointer initially are set equal by the
! MOV command, we have the option of using either as our stack reference point.  If we use BP plus an offset, we get a comparatively static method of accessing parameters.  If we use SP plus an offset, we get the same effect, unless we perform any further PUSH or POP instructions, which would change the contents of SP, and cause us to require a different offset to reach the same parameter.
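The difference can be sketched with a Python list standing in for the stack (note this toy stack grows upward by append, unlike the downward-growing x86 stack, so the offsets here are illustrative only):

```python
# Toy stack frame: caller pushes an argument, then the CALL pushes a
# return address; the callee saves BP and snapshots SP, then reads the
# argument at a fixed offset from BP no matter how much it pushes later.
stack = []
stack.append(42)            # argument pushed by the caller
stack.append("ret-addr")    # pushed by the CALL instruction
stack.append("old-bp")      # PUSH BP (save the caller's base pointer)
bp = len(stack) - 1         # MOV BP, SP (snapshot the stack position)

stack.append("scratch")     # further pushes move SP...
arg = stack[bp - 2]         # ...but the BP-relative offset still works

print(arg)  # → 42
```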

Thus, if someone is not intending on using any PUSH or POP instructions in his procedure, he may resort to using SP and offsets to reach passed parameters, and may not use the BP pointer at all.  This makes for slightly faster code.  But
it also means that any register changes will possibly affect the calling program on a return.  Sometimes this is what you want, often it is not.

On the other hand, if you want the freedom to use any register and/or the stack, then you would not only place BP on the stack and move the SP to BP, but you would push other registers on the stack as well, including the status register and any registers that would be affected by your process.  Now registers have to be popped off the stack in the exact opposite order as they are pushed on there, and the pops have to match the pushes in count, and this is an area where mistakes are sometimes made.  To deal with this, later CPUs had the PUSHA and POPA instructions added (PUSHAD and POPAD in 32-bit code), which save all the general registers onto the stack, and restore them all from the stack, and make this part of the process simple.  Note that if you are using the Inline Assembler under PowerBasic, you do not normally need to concern yourself with pushing and popping the registers.  PowerBasic thoughtfully takes care of that for you when you call a procedure, or exit from one.
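The push/pop pairing rule is easy to demonstrate with a Python list as the stack (a sketch of the convention, not actual register code) — the pops must mirror the pushes in reverse:

```python
# Saving and restoring registers around a body of code that clobbers them.
regs = {"eax": 1, "ebx": 2, "ecx": 3}
stack = []

for name in ("eax", "ebx", "ecx"):     # PUSH in order
    stack.append(regs[name])

regs.update(eax=99, ebx=99, ecx=99)    # the body trashes the registers

for name in ("ecx", "ebx", "eax"):     # POP in the exact reverse order
    regs[name] = stack.pop()

print(regs)  # → {'eax': 1, 'ebx': 2, 'ecx': 3}
```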

But, remember the previous discussion about how PowerBasic uses the contents of EDS and EES for its own purposes?  And that if you empower the use of the ESI, and even the EDI registers for integer variables, these should be considered off limits as well (unless you mean to change the content of the assigned variables)?  Well, now you have a case where you might want to save the contents of two to four specific registers, to restore them before the end of the current procedure.  You can use specific PUSH and POP instructions to handle this.  But there can be a serious downside:  What happens if you interlace BASIC statements with assembly code statements?  PowerBasic may
need reference to whatever it has in the EDS and EES registers during the BASIC statements, but you planned to put other values in those registers and use them with your assembly code.  The chances are strong that you will point to the wrong place at one point or another, and blooey!  Your code blows up.  To prevent this, either leave EDS and EES intact, or do not interlace your assembly code with PowerBasic statements.  You can of course PUSH a register and POP a register at the beginning and end of each segment of assembly code, and no harm done, but it makes your code bulkier and kills some of the efficiency you were looking for.  Don't worry about changing EDS and EES inside a procedure, as the PUSHA and POPA instructions used by PowerBasic should protect the original contents on exit.

This concludes my discussion of registers for the moment.  There are other registers in the CPU, such as control registers, test registers, debug registers, floating point registers, and so on.  These would be advanced topics, not necessary for an understanding of the basic principles of assembly coding.

Note that PowerBasic uses the term "optimizing compiler" when describing its products, but optimization is sometimes a matter of perspective.  They are optimized to produce extremely fast compiled code, and generally create code that is small and runs fast.  But to write a fast compile process, you have to risk not writing the absolute smallest and most efficient code, because the analysis time required would become prohibitively long.  Instead of having a finished compile in the matter of a moment or two, or within minutes for a really large program, it could take much, much longer, which would make your efforts to develop new code and debug it even more tedious.

So the PowerBasic compilers represent a good compromise in terms of development and performance.  Once you get your program running properly,
and you want to optimize it further, that is when your understanding of assembly code could pay off.  In this regard, be aware that integer arithmetic, particularly the use of LONG numeric types, will do the most to optimize your code under PowerBasic.  But in looking at the finished code with a disassembler, you may find that PowerBasic used a large number of floating point operations,
which it will do in cases where it was not sure how you intended to do something at first glance.  Any assembly instruction that begins with the letter "F" is likely a floating point instruction, and deserves your attention when it comes to further optimization, since floating point instructions are very inefficient timewise when compared to the integer mathematics that the CPU can perform on its own with the available registers as described here.






Title: Re: Let's Talk Assembler
Post by: Charles Pegge on May 23, 2007, 08:48:18 AM
Donald, I did some assembler code last year for parsing.  Its latest version also
incorporates INSTR functionality with case insensitivity.

I thought it might be of interest to you, as you were discussing INSTR in the general section. Hope you can follow my code; I have annotated it quite intensively. It is surprising how important the annotation is, even for an author to follow his/her own assembler code, especially after a year's lapse.



It is in the Windows source code section:

http://www.jose.it-berater.org/smfforum/index.php?topic=684.0


Incidentally, FreeBasic also has an inline assembler, but no exclamation marks are needed with that one.

Title: Re: Let's Talk Assembler
Post by: Donald Darden on May 23, 2007, 09:58:31 PM
Charles, I read your post yesterday, and am interested in examining your code.  Right now though, I am in considerable pain, and need to go see a doctor, so I will catch up with you later.

For those interested in the discussion about performing searches, I should point out that PowerBasic compilers now support two commands for this purpose:
  REGEXPR statement
  REGREPL statement

The INSTR() function in PowerBasic works quite well, and allows you to search from left to right or right to left, for a specific word or phrase, or one of a selection of characters.  However, it can sometimes be difficult to apply any one tool or approach to a particular need, so having a choice of tools can be very helpful.

Don't be too alarmed about my mention of pain; it's been building up for a few months, and has finally reached a point where I can't escape it, even in sleep.
So I will give the doctors a chance to do their worst.
Title: Re: Let's Talk Assembler
Post by: Charles Pegge on May 23, 2007, 11:36:39 PM
Hope you will be feeling better soon Donald. Being in pain is a pain!

On the subject of assembler in general, I find that of all the computer languages I have encountered, assembler is one of the most satisfying.
Hexadecimal better still! Maybe it's my hardware roots, or the knowledge that the code carries no wasted cycles whatsoever, and you know, or should know, exactly what the CPU is doing at each step.
Title: Re: Let's Talk Assembler
Post by: Donald Darden on May 24, 2007, 10:04:01 AM
When I first studied computers in the Navy, they were all based on 6-bit bytes (well suited to octal notation) instead of the current 8 bits.  Some of the computers had 12 or 18 bit words, and the CDC 1604A had 48 bit words which contained two 24-bit instructions.  The art of programming in raw numeric form was called writing in Machine Language.  The use of letter groups to represent operations like ADD,
SUB, MOV, CMP, and JMP were called MNEMONICS, which means "memory aid".
The development of Assemblers took the mnemonics and replaced them with the necessary numerical equivalents.  There was a one-to-one correspondence between the numerical form, which you call hexadecimal (meaning base 16), and
the corresponding mnemonic for that instruction.  But as time went on, new pseudo-ops and macro codes were added to the assembler language to give it more power.  Various data types and strings of data could be appended with such statements, along with the ability to identify structures.  You could label portions of your program and reference them using various instructions, and the assembler was able to correlate these to actual locations in memory.

The machines of that era were far less powerful, and generally had fewer instructions and methods of addressing than found in the x86 architecture.  The complexity of the x86 means that creating a comprehensive assembler is a bit of a challenge.  Some adaptations avoid or overlook some portions of the x86 capabilities.  However, a clever programmer who knows the missing portions and is comfortable writing hexadecimal code can emulate such an instruction by using the hexadecimal value in place of the instruction.  You also find the converse on occasion - a disassembler that is unable to translate some binary data back into a valid instruction, and presents it in hexadecimal form instead.  If you know the hexadecimal values and the mnemonics, or the way the instruction works, you can get through these problem areas with greater ease.

I personally found it beneficial to move on to using an assembler when they became available, but I remember with some fondness my earlier work with machine language routines, and the laborious effort of counting out the correct number of steps for forward and backward relative memory address references.

Incidentally, after a long day in pain, where I was subjected to a sedative, two pain killers, and a muscle relaxant, my pain has largely disappeared.  The tests for other causes were negative, so the doctors concluded it was due to some disc disease, pinched nerves, muscle spasms, and feedback from being in pain for so long.  It got really bad for a while there, but I guess it was just necessary to break the pain cycle.
Title: Re: Let's Talk Assembler
Post by: Charles Pegge on May 24, 2007, 12:07:33 PM
Glad you are feeling better Donald. You will have to avoid long sessions on the computer and move around to keep the back supple.

One warning when using Freebasic under Linux: The compiler and the assembler are so tightly integrated that they both share the same variable name space. I happen to use a lot of short variable names so I fall into this pitfall quite easily. This morning, it was a variable called DI, which is one of the x86 index registers.

So any of the variable names synonymous with x86 registers have to be avoided unless you intend accessing them of course.

These include (from my head!)
al,bl,cl,dl,
si,di, ds,es,sp,
ax,bx,cx,dx,
eax,ebx,ecx,edx,esi,edi

This is a problem that does not occur on the MS windows version of FreeBasic.
So it only becomes apparent when you move from MS to Linux. I am told that this is a problem in the GCC compiler, used by Freebasic (version 0.16) at the back end.

Title: Re: Let's Talk Assembler
Post by: Donald Darden on May 24, 2007, 11:43:03 PM
That's very true, and from the top of my head, you should include all extended register names (those starting with "E", such as EAX).  However, this is only necessary if you intend to employ assembly language.  Let it be known that single character names, these being "A" to "Z", create no problem, and that two character names, with the second character being "0" through "9", are quite safe.  This actually supports early BASIC naming conventions quite well.

Also, double identical letters are safe, these being "AA" to "ZZ", and that already gives you 62 possible variable names without requiring any new rules.  Another 26 can be obtained by using triple identical letters, "AAA" to "ZZZ".  And if you want to get fancy, you can have certain letter-digit mixes such as AA1, or A1A, etc., that will not accidentally duplicate any keyword in Basic or Assembly coding.

With all these patterns possible in your naming convention, you can afford to set aside some groups of names for certain types of variables.  In my case, I use
A to L for integer variables, M to T for floating point variables, and U to Z for temporary throwaway variables, like FOR loop counters.  For strings, I use double-identical letters AA to ZZ, and for temporary strings, I use something else, such as letter-digit.

It's a very simple naming convention, and I don't use it very rigorously, especially when writing brief, sample code to just illustrate a point.  There I might just end up naming variables A to Z, and AA to ZZ, and define them as needed.  The distinction that single letters represent numeric types, and double letters represent strings, is the convention I stick to most strongly.

There are a couple of other addressing issues to consider here.  In assembly language programming, you have your external devices attached indirectly to the CPU through a chipset, and they appear to the CPU as additional address spaces,
which are numbered in exactly the same manner as RAM memory is.  But these are called I/O ports, rather than memory locations.  A single device may have a number of ports associated with it, where some of those ports serve to control the device, and some are used to exchange data with the device.  Some will even return status about the state of the device, but these may also be classified as control ports.

Manufacturers generally define the port address space allowed for their devices, and follow certain conventions and guidelines that have evolved over the decades.  There are overlaps, and inconsistencies, and conflicts.  The more devices you attach to your computer, the greater the risk of conflict between different devices.  A real problem was the limit that Intel set on interrupt lines, which were used to alert the CPU whenever a device requested attention.  With many more devices needing interrupt service, but having to share existing interrupt lines, a lot of problems were encountered.

Windows finally brought an end to this, and the abandonment of some of the old style parallel and serial devices has helped.  New bus standards, such as USB (Universal Serial Bus), were designed to let multiple devices all work with the PC without enduring the same conflicts.  We now classify devices that used the old style buses as legacy devices, and while it might still be possible to use them with your newer PC and updated OS, you run the risk of experiencing some of the old problems associated with them.

But now there is a void.  Most information on programming in assembly language assumes that you are working with a computer that deals with legacy devices.
This is because most books on assembly language were written when the only PCs around were those that came equipped with those old devices.  There are not many new books on assembly language available, and for the latest information, you pretty much are dependent on online sites that benefit from the work of others who still see assembly coding as the way to go.

Another part of the void is that device manufacturers are faced with a requirement to write device drivers for the devices that they design, build, and sell.  A device driver is exactly that.  It standardizes the way that the device is "seen" and works with a specific operating system.  Different drivers are required when working with different operating systems, and each operating system sets what the device interface must look like in order to be integrated with that operating system.  So there is a device interface standard for DOS, another one for advanced Windows, yet a third for Linux and Unix, and so on.  The big problem then is that your device manufacturer may not have produced a device driver for your device that will enable it to work under the operating system of your choice.  He may have decided that your OS did not represent a big enough market to justify the cost and time it would take to make a suitable driver.

Device I/O (Input and Output) is slow compared to program execution speeds, mostly because of mechanical factors, and there may be little real gain in speed by trying to interface with them in assembly language.  And as you can see, trying to work with devices directly, or deal with device drivers, can make your job extremely hard.  The consequence is that many programmers who use assembly code only use it to address data while it is present in RAM.  The effort to read it from a device, such as a hard drive, or write it out to another device, is left to the higher level language, in this case PowerBasic, to deal with transparently.

The flexibility of deciding what part of your programming will be done in Basic or other high level language, and what part will be done in assembly, can make for a very good mix.  You are not forced to do what you don't feel comfortable trying to do, or feel may be beyond your skill and knowledge.  And the consequence is that you only have to learn as much assembly language as you want to try and use.  This is where a language like PowerBasic or FreeBasic really pays off, because they support the easy integration of assembly language into your program when and where you want it.  So we will not be addressing device I/O
at this point.  The information already provided should help you identify the problem, and let you search further elsewhere for that type of information, but much of it will be outmoded and obsolete, because it will be directed at supporting legacy devices.

Getting back to the use of registers in the CPU, you will note that you have some registers that allow you to address the lower 8 bits and the upper 8 bits directly, or the whole word of a 16-bit register, or the whole word of a 32-bit register.  But there is what might be considered a gap:  You cannot address the upper 16-bit word of a 32-bit register, and there is no way to refer to the lower or higher order bytes within that upper 16-bit word either.

Well, the plain fact is that doubling 16-bit registers to 32 bits does nothing for anyone who is interested in byte or character processing.  That is still just 8-bit chunks of memory if you use bytes, ASCII or EBCDIC code, or 16 bits if you resort to Unicode.  The only time you might want more byte space in a register is when you want to compare consecutive bytes in memory, say for a string search.  But the problem is, when you search by 16-bit or 32-bit groups, the basic instructions associated with an efficient search, the automatic increment/decrement of indirect addresses, are going to advance by 16-bit or 32-bit chunks of memory as well.  So if you try to test for 2 bytes at once, you will increment or decrement your address count by 2 bytes at once, effectively checking only every other byte as the start of a possible match.  With 32-bit registers, you only start checking with every fourth byte.  Another problem is that the most effective way to test for string matches is if you are looking for exact matches.  This means the exact same case, and the exact same group separations and punctuation.

If a long word is split with a hyphen, say at the end of a line, so that it is actually written as pre-[cr][lf][tab]columbian, where the codes for carriage return, line feed, and tab are represented within the square brackets, then that would not match with "pre-columbian" or "precolumbian", unless we somehow allow for white space and possible hyphenation in our pattern.  In assembler, trying to allow for such variances would be extremely hard, and not supportable with a simplified search method.  Search techniques of this nature would be hard in a higher level language as well, and coming up with a fast, efficient, and thorough search method is something of an elusive goal.  Many people look for existing code that will do this for them, and a part of the problem for the assembler programmer is that the x86 instruction set is not optimized enough in this regard to make it somewhat easier or more efficient to create such search techniques.

But you are somewhat stubborn, and want to be convinced on your own, so you ask:  But how do I get access to the upper 16 bits of a 32-bit register?  How would I treat this as two separate 8-bit bytes if I want to?  The answer that is most common is to swap the upper 16 bits with the lower 16 bits, and then do what you want with the lower 16 bits before swapping them back.  Sounds simple, and the x86 does have an XCHG (exchange) instruction for switching any two 8-bit, 16-bit, or 32-bit references.  But it won't work, because we have no way to refer to the upper 16 bits of a register as either a source or destination.
But there is a way to do it, and this involves the rotate left or rotate right instructions provided.  If you rotate a register left 16 places, or rotate it right 16 places, you effectively swap the upper and lower 16 bits.  Now do what you want, and rotate the register back 16 bits to restore the upper and lower
portions to their original positions.  Another way would be to store the register into a 32-bit address space, then reference each byte or 16-bit segment separately.  And a third way would be to push a 32-bit register onto the stack and then pop off two 16-bit words into 16-bit registers.

If this is not that common, then what is the point of discussing these techniques?  The real point is to get you thinking about the nature of having
registers to deal with.  How many bits are there?  What does shifting have to do with the contents?  Numerically, how does shifting affect the contents?  What is the purpose of the Carry flag?  What happens when I use an ADD command, and later use an ADC command?  How about when I use SUB, and later use a SBB or
SBC (usually, only one of these is supported) command?  What happens if register pairs are used with a shift or rotate command?  What difference is there between a shift and a rotate anyway?  If you can't answer these questions, then you obviously still have something to learn about the CPU registers.  But
don't spend a lot of time looking in books for the answers.  Most books rush through their description of the registers, because there is so much detail involved, and they have to get through one chapter quickly in order to start the next.  You could write a book just on the registers if you wanted to.  No, the best way to learn registers is to execute some assembly instructions in step mode, to note changes that take place in the CPU registers while in the Debug mode with the PowerBasic IDE, and to return the modified value to a variable and print it in a Basic statement so that you can observe any change to it.

This would be an example of testing a left shift instruction and its effects on an integer variable with assembly code:
       LOCAL a AS LONG
       a = 12345
       ? a
       ! MOV eax, a
       ! SHL  eax, 1
       ! MOV a, eax
       ? a
You can add other commands if you want, or change the ones you see here.
For PB/CC, you can add these if you like:
  FUNCTION PBMAIN
      COLOR 15,1
      CLS
      ... the above
      WAITKEY$
  END FUNCTION

Remember that you want to save this with some file name, then on the toolbar under Run you want to do a Compile and Debug.  The Debugger window will open up, and you want to move it down and to the side a bit so you can see the toolbar and your code above.  Then on the toolbar you want to select the icon to bring up the CPU registers, which starts off with eax and its present contents, followed by the remaining registers.  Then along the lower portion of the toolbar, you will see three small blue rectangles with dotted lines and arrows above them.  The hover legend when you put your mouse cursor on each will read "Skip over call", "Step into code", and "Step out of code".  They perform differently, and either of the first two would allow you to single-step this simple example, but as a general rule, the middle one is best when stepping assembly language.  By clicking on that button repeatedly, you will advance one line in your source code for each click.  With Basic statements and multiple statements per line, all the multiple statements get executed at once (a good argument for not using multiple statements in code that needs to be debugged).  Since you can only have one instruction per line with assembly, you will see the effects of each instruction as it affects the contents of the CPU registers.  You have the option of bringing up the Variable Watch window and watching PowerBasic variables at the same time, to see any changes there as well, which would be a good alternative to relying on the ? (which stands for PRINT in the console compiler, and MSGBOX in the windows compiler).

After you perform this simple program, you might want to play with the ! SHL
statement a bit.  Try changing the eax to ax, al, or ah, and restep the program each time to see what happens.  Try to figure out why the results either changed or did not change in the second print (or MSGBOX) statement.

Then replace SHL with SHR, which is the right shift instruction, and repeat the steps above.  You can also change the value assigned to the variable a, perhaps even use a negative number there.  Play is the best way to learn what your PC can do and how it really works.

When you tire of the SHL and SHR instructions, you can also play with ROL and
ROR, which are another type of shift instruction, and you can also look at how to use two registers in conjunction with each other, by loading each with a value (that's what the !MOV instruction does), then using commands like ADD, SUB, AND, OR, and XOR, with one register serving as the destination and the other as the source.  You will quickly learn with these added instructions, that the source and destination registers have to be the same size, meaning the same number of bits, or you will get an assembler error.
Title: Re: Let's Talk Assembler
Post by: Theo Gottwald on May 25, 2007, 08:06:06 AM
Quote: "...and what part will be done in assembly, can make for a very good mix."

I have a suggestion on this.
You can use the Profile Instruction on the program, then you see:
1. Which subroutines get how often called
2. how much of the time is used in which subprogram

Then we know, it would not make much sense to waste our time on sub's which are only called once.

We take a look at those sub's which use the most time.
And maybe those which use a lot time and are often called.

That's a general rule:
The more often a Sub is called, the more rewarding optimization will be at the end.

Sub's in Inner Loops for example are mostly rewarding.
Sometimes we may even think of using a GOSUB instead of a SUB in such cases, depending on the total runtime.
We have discussed this in another topic.

Another thing is if strings are involved; then it gets a bit more tricky.
Maybe that's something for an extra topic sometime: ASM optimization when strings are used.

Title: Re: Let's Talk Assembler
Post by: Donald Darden on May 26, 2007, 10:23:01 AM
Hey, this post is not exclusively mine.  Anybody who wants to talk about any aspect of Assembly programming is welcome to do so.  Just take off with it.  And don't be afraid to criticize what I write either.  If I make a mistake, it can be fixed.

There is a large number of things that the x86 CPU design supports, and most assemblers follow the Microsoft Assembler mnemonics when it comes to what the various operations, registers, and data types are named.  However, it is possible that a certain assembler may lack support for some specific area in the CPU that
is the target machine for the program being developed.  Further, the family of x86 processors has evolved, and later models have more features than the earlier ones do.  To help you in this regard, to avoid doing something that does not fit with a given CPU's capability, you can usually designate the CPU type that you intend for the program to run on.  Since later processor models generally support features of earlier models, targeting an earlier model may mean that your program can run on more processors than if you target a later one.

By trial and error, I've found that there are some operations that are allowed by the x86 processor that are not supported with the Inline Assembler in PowerBasic.  You get an error when you try to use them.  Now there are three things you can do when this happens:  (1)  You can try to work around the problem by attempting a different instruction or sequence of instructions that should give you the same effect,  (2)  You can figure out what the hex code sequence is for the instruction that is not supported, enter this as a sequence of data bytes,  or (3)  You can tell PowerBasic support about your problem, and see if they will address it and fix it for you in their next release.

The first method is often the most expedient way to deal with this problem.  Now if you try an instruction, and it does not work, how can you be sure that it is not something that you did, or a misunderstanding on your part?  Well, you can consult a good book on assembler and try to puzzle it out yourself, or post a question on a forum devoted to assembly coding.  Or you can attempt the same thing in a standalone assembler and see if works there or not.  If it works, you can find out what the hex code sequence is and use that with the inline assembler, which was your second option above.  And if it does not work under a standalone assembler, you can try it with yet another assembler or conclude that what you are trying to do is simply incorrect or unsupported.

Now previously, I mentioned that the registers in the 32-bit CPUs had the same name as the 16-bit registers, but with an added "E" in front.  This is generally
true, but if you look at the registers in the PowerBasic Debugger, you will see that the CS, DS, ES, FS, GS, and SS registers don't have the leading "E" letter, but otherwise are shown to be 32-bits long.  That's fine, just something that you need to be aware of.  The CS register is the Code Segment, and in 16-bit architecture, used in conjunction with the IP (Instruction Pointer).  The DS and ES registers were previously explained.  The FS and GS registers were introduced with the move to 32-bit architecture, and as far as I know, do not have a predesignated purpose.  The SS register points to the stack segment, and is used in conjunction with the SP (stack pointer) in 16-bit mode.

The question you have to ask yourself, why does this guy keep describing things in the 16-bit mode?  What do these things do in the 32-bit mode?  Well, to tell you the truth, I'm not exactly sure.  I haven't tried enough things yet to have all the answers, and the books I have tend to be a bit vague on details in some areas.  There is still lots for me to learn as well.

This is a small sequence of PUSH and POP instructions for you to study:
   
LOCAL a AS LONG
   ! push ebp               'save the current contents of ebp to the stack
   ! push esp               'save the current stack pointer on the stack as well
   ! pop eax                 'get the stack pointer off the stack into the EAX reg.
   ! mov a, eax             'copy the stack pointer value from EAX into the "a" var.
   ! pop ebp                 'and restore ebp to its rightful register
   ? a

If you compile and debug this code, and step through it, you can have the A
variable tracked in the Variable Watch window, and have the CPU register display open to see any changes there.  As you step the code, make sure the ERR state in the Variable Watch window continues to indicate 0 (zero).  If you get an error,
PowerBasic will clear some of the registers and variables.

Now we are going to try something a little different:

  ! push ebp               'save the current contents of ebp onto the stack
  ! push esp               'save the current contents of esp onto the stack
  ! pop esp                'restore esp contents from the stack to the esp reg.
  ! pop ebp                'restore ebp contents from the stack to the ebp reg.



Again, step the code and it should work fine, and no ERR should occur.

But now we are going to show you something that does not work:
 

  ! push ebp               'save the current contents of ebp onto the stack
  ! push esp               'save the current contents of esp onto the stack
  ! pop ebp                'restore esp contents from the stack to the ebp reg.
  ! pop ebp                'restore ebp contents from the stack to the ebp reg.


This time the PowerBasic debugger will give you an ERR 24 when you get to the third instruction.  What does this mean?  Well, an ERR 24 is associated with TCP and UDP connections, according to the Help file, so it obviously is not a PowerBasic error.  It must be coming from the Assembler.

Apparently, the Assembler is alerting you to the fact that you pushed the contents of one register onto the stack, and attempted to pop it back into another.  The Assembler probably meant it as a warning, just in case you did not mean to do this, but PowerBasic took it as a fatal error.

This then indicates that some techniques used by programmers with a standalone assembler are not going to work as expected under PowerBasic.  But if this is true, then why didn't the Assembler (and PowerBasic) complain when we popped the stack pointer back into the EAX register in the previous sample?  The answer must be that the Assembler is selective about which operations or registers it is concerned with.  EAX is often used for all manner of purposes,
but EBP and ESP are registers of some concern, and there may be validity checks in place for them.

So, the inline assembler may be trying to watch out to make sure that you do not do something stupid, and in the process, may keep you from doing something too clever.  But if it is a valid instruction, you won't really know unless you step the code and watch the ERR state for any change.

The displayed registers in the Debugger also show you a set of FLAGS.  This may be labeled EFLAGS in the 32-bit nomenclature.  These correspond to various states, and control the outcome of some arithmetic operations and impact how some conditional branch instructions work.  I could not find too many writeups on the bits in the FLAG register, so I will present this one as a guide:
     bits 31 - 12 (bits 15 - 12 in the 16-bit design) - unused
     bit  11 - Overflow Flag (abbreviated OV or OF; when clear is NO)
     bit  10 - Direction Flag (set is down, or DN; clear is UP)
     bit   9 - Interrupt Flag (set is Enable Interrupts = EI; clear is Disable Interrupts = DI)
     bit   8 - Trap Flag (set enables single-step mode)
     bit   7 - Sign Flag (set is negative = NG; clear is positive or plus = PL)
     bit   6 - Zero Flag (set is zero = Z or ZR; clear is nonzero = NZ)
     bit   5 - unused
     bit   4 - Auxiliary Flag (used to indicate a carry/borrow in BCD operations)
     bit   3 - unused
     bit   2 - Parity Flag (set if operand bit count is even = PE; clear if not = PO)
     bit   1 - unused
     bit   0 - Carry Flag (set if carry = C or CY; clear if no carry = NC)


The capability to perform BCD (Binary Coded Decimal) arithmetic uses each hex digit (4 bits, or nybble) to count only between 0 and 9, forcing an early carry (or borrow).  This is helpful in performing simple adds and subtracts of decimal values, but the need to do more complicated math really requires the use of numbers in binary or floating point form.  Consequently, BCD is rarely used or taught anymore.

The parity flag was intended for serial communications, to set or verify that each byte or word was sent and received without any bits being dropped.  However, as most serial communications involved dial-up connections and the use of modems, which had their own parity and error checking capabilities, the use of parity instructions in the x86 has also been largely ignored.

The branch instructions all start with "J", such as JE (Jump if Equal) or JNE (Jump if Not Equal).  The one exception is the LOOP instruction, which decrements the contents of the CX (or ECX) register and then branches if the result is not zero.  So you do not need to look at the flag bits directly; just use an appropriate branch instruction, which checks the corresponding bit for you.

The carry bit allows the results of an addition or subtraction operation to be extended to multiple words or registers.  The overflow flag gets set during signed arithmetic when the result is too large or too small to fit, meaning the outcome may be in error.  You need to read up on its use to understand it better.

Because the carry flag can be set or cleared with an instruction, it often is used to signal whether the result of some operation was successful or not.  Then you can use the JC (Jump if Carry Set) or JNC (Jump if No Carry) as a way to test the outcome.

Because math and comparison operations are an overriding concern, the x86 leaves the carry flag unchanged during INC and DEC (increment and decrement) operations, though the other arithmetic flags are updated.  If you need to perform an operation that will change any of the flag states, but you want to retain the current flag state for later use, it is not uncommon to push the flags onto the stack, do your secondary operation, then
pop the saved flags back into the flag register.

I mentioned before that there should be a pop for every push, and each push has to come before the corresponding pop. This is mostly true, at least when performing sequential programming.  But when you resort to branching, then you may end up with two or more branches having to have their own pop instructions to pull off all the previous pushes.  In other words, every exception follows some rule in its own way.

If you can find it, one of the best books available for 80386 architecture and design is called Assembly Language Programming for the Intel 80XXX Family by William B. Giles, copyright 1991 by Macmillan Publishing Co.  It is a hardcover textbook used in college courses, and I found my copy in a used book store.  There may be better or more recent books out there, but assembly language programming has lost favor in the last decade or so.  Any other resource suggestions from readers of this thread would be welcomed.  The 80386 covers much of the essentials found in later CPU architectures, such as the 486, 586, 686, Pentiums, and beyond.
Title: Re: Let's Talk Assembler
Post by: Charles Pegge on May 26, 2007, 05:36:15 PM
PDF based manuals are readily available from AMD and Intel websites - quite chunky documents in several volumes that tell you everything. No detail is spared, but I don't know how accessible the legacy documentation might be. I still have an MS Assembler pocket reference, which is quite handy.

These devices carry the layers of their own evolution. One of my first projects was designing an 8088 based board to fit into a GEC Multibus system
based on the 8080.  But my best assembler experience was with the ARM processor, which was the heart of the Archimedes microcomputers.  Now the ARM is used in many devices including printers, PDAs, games machines and mobile phones, because of its high performance and low power consumption.
The ARM has a Reduced Instruction Set (RISC) and sixteen registers, most of which are general purpose. The instruction set is very regular and permutational. This makes it very easy to learn and also to write efficient code.

Here is an example:


int gcd (int i, int j)
{
   while (i != j)
      if (i > j)
          i -= j;
      else
          j -= i;
   return i;
}

In ARM assembly, the loop is:
loop   CMP    Ri, Rj       ; set condition "NE" if (i != j)
                           ;               "GT" if (i > j),
                           ;           or  "LT" if (i < j)           
       SUBGT  Ri, Ri, Rj   ; if "GT", i = i-j; 
       SUBLT  Rj, Rj, Ri   ; if "LT", j = j-i;
       BNE    loop         ; if "NE", then loop



http://en.wikipedia.org/wiki/ARM_architecture

Note the SUBGT instruction, a conditional subtraction, which saves a conditional jump.

When you make a call with the ARM, the return address is placed in register 14
instead of being pushed onto the stack. Stacking is a separate operation, but this allows very efficient single level calls.

That the x86 architecture has come to dominate PCs seems to be an accident of history. If there were an opportunity to repeat the PC revolution, I know which CPU I would choose.

PS. Theo used to have an Archimedes, so I am sure he is also familiar with the ARM. The assembler was embedded in Archimedes BASIC.

Title: Re: Let's Talk Assembler
Post by: Donald Darden on May 27, 2007, 06:15:12 AM
I agree that the x86 architecture is not all that it could be, but the adoption of the 8088 and 8086 CPUs by IBM, the computer giant of the age, for their first PC really put Intel on the inside track, and Intel's continued success has been that its ramped up family of CPUs can run legacy code by sustaining the old architecture with very few modifications, just adding various extensions that do not affect the original registers and instructions.

The original justifications for the x86 are probably all gone, but what locks us into this antiquated design is the operating system, first DOS, then Windows.  Linux has also focused on the x86 platform because it is the de facto standard.  And of course the hardware and OS together define the environment where your existing applications and new development must live and work.

You could break away and find a new architecture and OS, new applications, new development tools, and start over if you like.  Really, the only thing that is holding you back is what's available, what you are willing to put up with and do for yourself, and the very limited market space that you would be entering at that point.

I'm going to assume that most of you realize that is too great a journey to embark on, so like it or not, you are going to stay with the prevalent hardware and software combinations currently available.  Which justifies the continuance of this discussion.

There are separate conventions for handling two types of data, Numeric and String, as well as other conventions for processing the contents of memory.

With numeric data, we read most significant bit or byte to least significant bit or byte from left to right.  We do this even with decimal numbers.  Thus, 1057 is read as one thousand and fifty-seven, not seven ones, five tens, and one thousand.  That is a matter of convention.  With words, any combination of letters and digits, and text in general, we follow two rules:  First we attempt to read left-to-right, then we attempt to read from top-to-bottom.  With column data, we read left-to-right, top-to-bottom, then left-to-right again.  In moving through pages of text, we turn the pages from right to left.  These are conventions adopted for most western languages, but they do vary in other languages.

According to these conventions, we would look at a 32-bit register as having its most significant bit, representing 2^31, situated at the left side of the register, and the least significant bit, representing 2^0, situated at the right side of the register.  Numbering the corresponding bit positions across, we would see:

3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1                              \ Powers of
1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0    /      2

Bytes are 8-bit representations, and to represent the four possible bytes that
could be loaded into this register of 32 bits, we would see them organized like this:

|3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1    |               |   \ Powers of
|1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|   /      2
      Byte 4         Byte 3         Byte 2           Byte 1           Char. Rep.

If we were to express the first four letters of the alphabet in these four bytes,
we would have to show them this way, in order to be consistent with the numeric or byte representation:

|        D          |        C        |          B          |         A         |

Now this would seem backwards from the ABCD order of the left-to-right rule.
It is, but it is consistent with the major to minor rule, if the first byte is considered the minor byte.  And that goes along with the idea that the least
significant byte occupies a lower address in memory than the next most significant byte.

To put this another way, the original 8088 chip design read memory one byte at a time, and advanced through memory from a lower byte address to the next higher byte address.  For a 16-bit value, it read the low order byte first, so the low order byte always had the lower address.  Reading the low order byte first simplified the process of performing arithmetic operations, and also made it easy to increment and decrement register or memory contents.  So if you had
the whole range of capital letters in memory, it would appear in this order:

  low address  -->  ABCDEFGHIJKLMNOPQRSTUVWXYZ  <--  high address

If you then read the first four bytes into EAX, the second four into EBX, the
third into ECX, and the fourth into EDX, this is the byte arrangement in those
four registers:

EAX:    DCBA
EBX:    HGFE
ECX:    LKJI
EDX:    PONM

It is still in the same sequence, but now looks backward in each register, because of the convention of the low order byte appearing on the right.  If you
stored these registers back into memory, then you would see this:

       low memory  -->  ABCDEFGHIJKLMNOP  <--  high memory

Now let's examine the EAX register briefly.  The EBX, ECX, and EDX registers would be arranged the same way:

|       Upper 16-bit word        |   AH (8 bits)    |  AL ( 8 bits)   |
|      "D"       |      "C"      |       "B"        |       "A"       |

As explained earlier, getting to the bytes that represent D and C requires rotating the register 16 bits to the left or right, then treating them as AH and
AL respectively.

The Carry flag performs another important function with regard to shift operations.  First, when you perform a shift or rotate operation, the last bit moved to the left or right is copied into the Carry bit in Flags.  You then have the option to retain that bit and include it in some other operation, such as testing it with a JC or JNC branch instruction, or combining it in an add or subtract operation using ADC or SBB (Add with Carry or Subtract with Borrow).

Second, using rotate or shift (or even ADC) with other registers or memory, you can take the carry bit and merge it with the contents of that register or memory, effectively creating a long shift function that affects two or more registers or memory addresses.  Thus, it is possible to perform quad operations within either a 32-bit or 16-bit processor.  You can extend this basic capability to handle much larger integer types as well.

To use the carry bit effectively, you have to be aware of which operations change the state of the carry flag.  There are times when you have to preserve the state of the carry flag before carrying out further operations.  A JC or JNC
branch serves the purpose of remembering a prior state by the branch taken,
or you can use the ADC or SBB instructions to preserve the contents into a register or memory location, or you can attempt to save all the flag states before continuing what you are doing.

Handling Flags is simplified by two instructions:  LAHF, which stands for Load AH register from Flags, and SAHF, which stands for Store AH into Flags.  In the original 16-bit design, there were only seven flag bits involved.
You also have the option to save the flags onto the stack with PUSHF, and to return the saved flags from the stack with POPF.

One of the things you might want is an extensive help file on the Assembly instruction set.  You can look for a file named ASM.HLP, which I find quite useful.  I'm not sure where I originally found mine, but I have it associated with the PureBasic product, so it might be on that web site (www.purebasic.com).

My previous post, where I identified the flag bits in the Flags register, is somewhat expanded on by the information in the ASM.HLP file.  There, the following breakdown is available:

      |11|10|F|E|D|C|B|A|9|8|7|6|5|4|3|2|1|0|
        |  | | | | | | | | | | | | | | | | '---  CF Carry Flag
        |  | | | | | | | | | | | | | | | '---  1
        |  | | | | | | | | | | | | | | '---  PF Parity Flag
        |  | | | | | | | | | | | | | '---  0
        |  | | | | | | | | | | | | '---  AF Auxiliary Flag
        |  | | | | | | | | | | | '---  0
        |  | | | | | | | | | | '---  ZF Zero Flag
        |  | | | | | | | | | '---  SF Sign Flag
        |  | | | | | | | | '---  TF Trap Flag  (Single Step)
        |  | | | | | | | '---  IF Interrupt Flag
        |  | | | | | | '---  DF Direction Flag
        |  | | | | | '---  OF Overflow flag
        |  | | | '-----  IOPL I/O Privilege Level  (286+ only)
        |  | | '-----  NT Nested Task Flag  (286+ only)
        |  | '-----  0
        |  '-----  RF Resume Flag (386+ only)
        '------  VM  Virtual Mode Flag (386+ only)
        - see   PUSHF  POPF  STI  CLI  STD  CLD

One of the properties of the 286+ architecture is what is known as Protected mode.  What it really means is that a certain instruction has to be executed before 32-bit addressing and registers can be accessed, thus protecting any existing 16-bit code and data from accidentally being interpreted as 32-bit
instructions.  Setting Protected mode just means switching on the 32-bit
capability.  In the 286 design, they forgot to include an instruction to turn
Protected mode off.  Once turned on, the only way to turn it off was to power
off or reset the computer.  Some people referred to the 286 as having half a
brain, or even being brain dead.  This is an exaggeration, and the 386 was introduced to correct this deficiency and add some improvements, primarily an updated FPU (Floating Point Unit), which was slightly different from the original
FPU.  The 286 has since been largely ignored.

The 486 then came out, integrating the CPU and FPU together.  However, the programming features introduced with the 386 have remained essentially the same in later designs.  The key differences since have been in graphical extensions, high speed instruction caches, and execution pipelines that make the present design more efficient and much faster.  Multiple processing cores are the present vogue, but it is the OS that decides how your program will be processed internally.

While the PowerBasic compilers put a few restrictions on you with regards to
programming in assembly language, they alleviate much of the headache that goes with writing assembly code from scratch.  And since PowerBasic also creates a sandbox (a reasonably safe place) for your assembly code to run in, the restrictions involved are mostly nonintrusive, insignificant, and immaterial.  You just need to adapt your coding style accordingly.  If that does not satisfy you, you can use a tool like MASM32, which is capable of generating DLLs of assembly routines that can be called from PowerBasic (or another programming language of your choice).  PowerBasic also automatically provides you with Protected mode access to 32-bit registers, memory, and extended instructions.
Title: Re: Let's Talk Assembler
Post by: Charles Pegge on May 27, 2007, 08:24:25 AM

Here is a little piece of PB assembler which shows how to use hex op codes mixed in with the assembly code itself. It uses the RDTSC - Read Time Stamp Counter which PB assembler does not recognise.

The Time Stamp Counter is a free running clock cycle counter on the CPU and is very useful for measuring the performance of your system/program very accurately with a resolution of a few nanoseconds and a time span of over 100 years, being a 64 bit counter.

Note how the quad value in the edx:eax registers is passed back into a PB quad variable.

The chunk in the middle is the code being tested. (As you can see, I am rather partial to hexadecimal).


#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG
LOCAL TimeStart AS QUAD, TimeEnd AS QUAD ' for time stamp, measuring cpu clock cycles
LOCAL st AS QUAD PTR , en AS QUAD PTR: st=VARPTR(TimeStart):en=VARPTR(TimeEnd)

'---------------------------'
! push ebx                  '
'                           ' approx because it is not a serialised instruction
'                           ' it may execute before or after other instructions
'                           ' in the pipeline.
! mov ebx,st                ' var address where count is to be stored.
! db  &h0f,&h31             ' RDTSC read time-stamp counter into edx:eax hi lo.
! mov [ebx],eax             ' save low order 4 bytes.
! mov [ebx+4],edx           ' save high order 4 bytes.
'---------------------------'

'------------------------------'
! db &h53,&h55                 ' 2 ' push: ebx ebp
'------------------------------'
! db &hb9                      ' 1 ' mov ecx, ...
! dd 10000                     ' 4 ' number of loops dword
'! db &h90,&h90,&h90            ' x ' NOPs for alignment padding tests
'------------------------------'
repeats:
! db &hb8,&h00,&h00,&h00,&h00  ' mov eax,0
! db &hba,&h00,&h00,&h00,&h00  ' mov edx,0
'! db &hb2,&h00                 ' mov dl,0

! db &hbb,&h00,&h00,&h00,&h00  ' mov ebx,0
! db &hbd,&h00,&h00,&h00,&h00  ' mov ebp,0

! db &hbe,&h00,&h00,&h00,&h00  ' mov esi,0
! db &hbf,&h00,&h00,&h00,&h00  ' mov edi,0
'
! db &h49                      ' dec ecx
! jg repeats           ' 3     ' jg repeats
'------------------------------'
! db &h5d,&h5b                 ' pop: ebp ebx
'------------------------------'

'---------------------------'
! mov ebx,[esp]             ' restore ebx value without popping the stack
'                           ' approx because it is not a serialised instruction
'                           ' it may execute before or after other instructions
'                           ' in the pipeline.
! mov ebx,en                ' var address where count is to be stored.
! db  &h0f,&h31             ' RDTSC read time-stamp counter into edx:eax hi lo.
! mov [ebx],eax             ' save low order 4 bytes.
! mov [ebx+4],edx           ' save high order 4 bytes.
! pop ebx                   '
'---------------------------'

MSGBOX "That took "+STR$(TimeEnd-TimeStart)+" clocks."
END FUNCTION




Title: Re: Let's Talk Assembler
Post by: Donald Darden on May 27, 2007, 11:43:49 PM
Nice piece of code, Charles.  There are generally three ways to try to optimize code:  Perform a byte count and strive to reduce the size of the code; count the instruction cycles used by all the instructions and total them up; or use a timing loop to see how much time is involved during execution.  For the last, you normally repeat the code a large number of times to average out the time needed for one pass, by taking the total time and dividing it by the number of repeats.

I like the fact that you used hex coding, then followed it with the corresponding ASM statement as a comment.  It shows an example of coding in hex, and your
use of a comment to explain what you are doing would certainly help others understand what is going on.  Comments can also be used to explain why you are performing certain operations as well.  Commenting code is even more important in assembly coding than it is in BASIC, because with assembly, you are taking baby steps rather than giant strides, and the context of what you are doing is often harder to grasp.  The focus is more towards interactions involving registers rather than directly with memory, so working with assembly, you have to remember the relationship between the registers and the memory locations that the contents came from, or that the results are eventually stored back into.  Since your memory is often limited to your immediate understanding (you will have forgotten this when you come back to it a year from now), you can again use the comments to signify important relationships, such as 'EBX holds INT PTR -> a' or 'EBX = @a (LONG)'.

Title: Re: Let's Talk Assembler
Post by: Donald Darden on May 28, 2007, 08:00:42 AM
Looking at an assembly statement, you may see places where a set of square brackets are used.  They have a special meaning, which is to represent the act of indirect addressing.  Actually, I kind of think that a better term would be to call it redirected addressing.

If I were to enter an assembly instruction like this:
   
LOCAL a, b AS LONG
    a = 12345
    b = VARPTR(a)
    ! mov eax, a
    ! mov ebx, b


Then it should not surprise you that the register EAX holds the hexadecimal equivalent of the value 12345 in decimal, while the contents of EBX is the
pointer to where a is in memory.

As it happens, my knowledge of how the PowerBasic Compilers write code into the executable file is extremely limited.  Most of my experience was obtained with the earlier PB/DOS compilers.  Some things work pretty much the same,
but other things are quite different.  I found that some things need to be
verified as to how they work and if they still work before I can comment on them to any real extent here.

Here is an example that I just ran up against:  In earlier incarnations, the VARPTR(some dynamic string) pointed to a reference for that variable in memory, and that reference was composed of two parts:  A string pointer, immediately followed by the string length.  It would look like this, more or less:

VARPTR(stringname) -> STRPTR(stringname) -> First byte of string in memory
                           LEN(stringname) ----------------^

Now if a BYREF string was passed as a parameter to a Sub or Function, you would find the VARPTR() value in that parameter's position on the stack, and
when you did something like !MOV EBX, aa, and the variable was named aa, then
the value that was moved into the EBX register was where the string reference was in memory.  At the Assembly level, you never see the variable name, because that only has meaning to the compiler.  The assembler only knows about addresses in memory or offsets from the stack pointer or base pointer.

Okay, you have the VARPTR() value for variable aa in the EBX register, and if you did a !MOV EBX,[EBX], you would move the four bytes beginning at that address into the EBX register, and this was (and is) the STRPTR() for the variable aa, which points to the first byte of the actual string contents.  So far so good.  We know where the string is in memory now.  But we also have to know the length of the string.

Now we know how to use LEN(aa) to get the length of the string, but we are not permitted to do this:  !MOV EAX,LEN(aa).  Here you are trying to mix an assembly operation and a BASIC function together, and the Assembler is not into BASIC at all, and PowerBasic is not able to supply a constant value for the length of aa at compile time, because in use, aa could have any length you want it to have.  This is a dynamic string, remember?  So what do we do?  Well, if this worked the way I thought it did with the new compilers, you could have used a !MOV EAX,[EBX+4] to get the length into EAX before you did the !MOV EBX,[EBX] above.  But this doesn't work for me, and I'm still studying the resulting code, trying to figure out what PowerBasic is now doing instead.

Well, not to despair, there is usually a way.  Here, all you have to do is have another local or static variable in your procedure, and set it equal to the length of the passed string variable before you then pass that value to a register using assembly language.  Here is an example of how all that could work:

SUB Example (aa AS STRING)
  STATIC a AS LONG               'local working variable
  a = LEN(aa)                    'use it to hold the length of aa string   
  ! MOV EAX, a                   'now pass the length to the EAX register
  ! MOV EBX, aa                  'get the VARPTR(aa) into EBX
  ! MOV EBX, [EBX]               'use that to get the STRPTR(aa) indirectly
  ...
END SUB


The new PowerBasic compilers seem to make an effort to mask where they put the Data segment, breaking it up into separate pieces.  They also seem to put a number of operations into appended code elements that are called with CALL statements, for which there are usually RETN (Return Near) exits.

I'm of the mind that the method given here, for taking particulars about strings and passing them to the registers via temporary variables, would give you less grief overall than trying to master PowerBasic's method of finding and returning the length of a string.  Don't forget, PowerBasic can determine the length of any string, whether dynamic or fixed, and whether terminated by a $NULL or not, so it can do more than you need it to do, and possibly take longer getting it done than you think strictly necessary.

As mentioned before, it is not uncommon to use the EBP register as an offset into the passed Stack, because the normal stack pointer (ESP) will continue to
reflect any additional pushes and pops, along with Calls and Returns, and the
EBP can be used to anchor the point from which to reference the stack.  But
depending upon when the !MOV EBP,ESP instruction took place, there may be
things on the stack BELOW the point where the EBP register is set to point to.

Well, that could be awkward, right?  Suppose there were things on the stack that were below where the EBP pointed to, how would you reference them?  The answer involves the fact that if you add a negative number to a positive one, it is exactly the same as subtracting it.  Suppose I wrote an expression of n = n + (-1), then simplified it.  I would get n = n - 1, even though I did specify an add operation.  In the computer, I could have an instruction like this:

                   MOV EAX, DWORD [EBP+0FFFFFF78h]

That long value starts with "0" to specify that this is a number rather than a symbol, and the "FFFFFF78h" specifies a NEGATIVE value to be added to the
EBP register's contents during this operation.  The DWORD tells the assembler
to read a 32-bit value from that location.

Now you can bet that the ESP, whatever its value is, is set below the point being referenced here, because this is the only assurance that the contents of the stack at that point would be stable and valid.

Note that the MOV EAX, DWORD [EBP+0FFFFFF78h] operation does not change
the contents of EBP.  That's just a computation done to determine the source.
The contents of EAX is what changes, to duplicate what is found at the source.
The source is only copied, so it is not destroyed either.  Only when the computed address is to the left of the comma, are you changing the contents of memory, because that is when it becomes a destination.

Title: Re: Let's Talk Assembler
Post by: Charles Pegge on May 28, 2007, 11:44:17 AM
This is Intel's definitive reference on all the op codes and their precise actions. It's a PDF. Keep it on your desktop but don't try to print it out  ;D

Fortunately most of the common instructions are easy to remember and the tables in the appendices provide a good quick reference. But if you ever need chapter and verse then there is a little essay on each instruction.

Beyond the main x86 and x87 codings, things start to deviate. There are 3 producers I know of, and they are diverging from each other: Intel AMD and VIA. So to use advanced features, you will need to consult their own manuals.

Instruction set reference:
'Intel Architecture Software Developer's Manual Vol 2: Instruction Set Reference
'http://developer.intel.com/design/pentiumii/manuals/243191.htm

Title: Re: Let's Talk Assembler
Post by: Theo Gottwald on May 28, 2007, 12:45:50 PM
Nice introductions, Donald. I did some Layout changes as my contribution to your nice ASM-Intros.

It's no more than a [ code]  .... here comes the code [/code ] at the end of the code.
Leave out the spaces inside the tags; I only put them there to show how it's done.

Your writing style is easy to follow; that's why I like your postings.
Title: Re: Let's Talk Assembler
Post by: Donald Darden on May 29, 2007, 04:40:16 AM
Aside from the number of bytes allocated to a numeric type, which can be from
1 to 4 in 32-bit architecture, or 1 to 8 in 64-bit architecture, you can break number types into three forms for use with assembly code.  The first is the signed integer, the second is the unsigned integer, and the third is floating point, which is always signed.

Computationally, the signed and unsigned integers are processed identically.  So how do they differ?  It is the way the results are tested.  With signed integers, the setting of the most significant bit always signals a negative number, and negative numbers are always deemed smaller than any positive number, where the sign bit is clear.  Jump instructions that check the results of signed computations or compares are:  JG, JGE, JE, JL, JLE, and JNE.

Unsigned integers would include ASCII code characters in bytes, or even Unicode
characters in words.  In unsigned integers, setting the sign bit merely shows that the value is in the upper range of that integer type, not in the negative range as with signed integers.  Thus, a different set of jump instructions is available for testing the results of operations involving unsigned integers:  JA, JAE, JE, JB, JBE, and JNE.

If you can remember that Above and Below are checks for unsigned results, and Greater and Less are checks for signed results, you should have no problem in this area.

Floating point values have their own internal format, which includes a sign bit for the mantissa and a biased exponent field that effectively carries its own sign.  Here are the 32-bit and 64-bit floating point formats defined by the IEEE International Standard:

  S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
  | |      | |                     |
  | |      | |                     3
  0 1      8 9                     1

S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
| |         | |                                                  |
| |         1 1                                                  6
0 1         1 2                                                  3

Note that the IEEE format counts bit positions from left to right, but that in PC
coding, we read bits in their major to minor order as respective powers of 2.  Thus, we would tend to regard the above layout in this manner:

     Exp.        Signed Mantissa
S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
| |      | |                     |
3 3      2 2                     |
1 0      3 2                     0

      Exponent                     Signed Mantissa
S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
| |         | |                    |                             |
6 6         5 5  <- 2nd DWord ->   |   <- 1st DWord ->           |
3 2         2 1               32-> | <-31                        0

Every number is presumed to be either an integer, a fraction, or a combination of
an integer part and a fractional part.  We use a decimal point (a period, usually) to mark the separation between the integer on the left and the fraction on the
right.  When we have only an integer part, we usually forego use of a decimal
point.  So $5 and $5.00 mean the same amount.  In scientific notation, we can indicate how many trailing or leading zeros are needed in order to position a numerical value correctly with respect to the decimal point.  A positive exponent means add additional trailing zeros as necessary.  A negative exponent
means add additional leading zeros as necessary.

The same general idea holds with Floating Point numbers, but here we are dealing with powers of two, not powers of ten.  We still mean add leading or trailing zeros, but now each step of the exponent means to double or halve the value, not scale it by a factor of ten as in decimal arithmetic.  So the signed exponent tells us how big or small the number really is, and the signed mantissa (the numerical value bits themselves) tells us whether the number is positive or negative with respect to zero.

Floating Point gives us the ability to represent extremely large or extremely small values with a fair amount of accuracy and precision.  The more digits used in the Floating Point form, the greater the range, accuracy, and precision.  But any computation that involves floating point numbers is very slow by computer terms, and best avoided unless really needed.  You can generally recognize any instruction that involves the Floating Point Unit (FPU) in Assembly because it will begin with "F".  And that is all the discussion about floating point numbers for the present.
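The 64-bit layout above can be pulled apart directly.  Here is a small sketch (Python for illustration, using its standard struct module) that splits a double into the sign, exponent, and fraction fields from the diagram:

```python
import struct

def decompose_double(x):
    """Split a 64-bit IEEE 754 double into (sign, biased exponent, fraction bits)."""
    (bits,) = struct.unpack('>Q', struct.pack('>d', x))  # raw 64-bit pattern
    sign     = bits >> 63               # 1 bit  (bit 63 in PC ordering)
    exponent = (bits >> 52) & 0x7FF     # 11 bits (bits 62..52)
    fraction = bits & ((1 << 52) - 1)   # 52 bits (bits 51..0)
    return sign, exponent, fraction

# -2.0 = -1 * 2^(1024-1023) * 1.0, so sign=1, biased exponent=1024, fraction=0
print(decompose_double(-2.0))   # (1, 1024, 0)
print(decompose_double(1.0))    # (0, 1023, 0)
```

Note the bias of 1023 on the exponent: that is how the format records whether the exponent is positive or negative without a separate sign bit.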

It's been suggested that we look at strings as well.  This is a good time for that.  In general, we recognize four types of strings here:  One is the fixed length string.  We know the size of the string, so all subsequent operations on that string are against a fixed length.  No mystery and no muss.

The second type of string can be of any length, so we call it variable length, but its end is marked by the use of a zero value byte, or what we call a NULL byte.  We use $NULL in PowerBasic to represent this null byte.  This is the most common string type dealt with in C, C++, and some other languages.

A third type of string is a mix of the first two.  That is, it is defined to have a
maximum length, but the actual end can vary and will be marked by the
presence of a $NULL byte.  This type is often used with calls to the Windows
APIs.  In that context, it is also called a buffer.

A fourth type is the dynamic, variable length string, and is the default string type found in many BASICs, including PowerBasic.  A dynamic, variable length string has a separate parameter associated with it called LENGTH, which tracks how many bytes are currently assigned to that string.  This string type has several advantages: it is not limited to a preset length, but adapts; and it can contain zero value bytes, which the $NULL terminated string types cannot hold.

There is a fifth type of string structure, but it is really just a fixed type string that is associated with a UDT, the User Defined Type.  Whether your UDT is
made up of string elements, pointers, integers, bytes, or other fixed length
strings, the whole of the UDT can be handled as though a fixed length string in its own right.  And it can contain zero value bytes with no problem.

Handling strings usually involves two things:  The first is where does the string start?  This is a pointer value, and normally points to the first byte's address.  The second question is, how long is the string?  For this, you need to know what type of string you are dealing with.  Sometimes it is necessary to convert a string from one type, say a dynamic variable length string, to another, such as an ASCIIZ string.  This is easily done in PowerBasic.
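The four string types, and the two questions of start and length, can be mimicked with byte buffers.  A sketch (Python for illustration; the field widths are arbitrary choices):

```python
# Four ways the same text can be carried, mirroring the string types above.
data = b"HELLO"

# 1. Fixed length: the length is known up front.
fixed = data.ljust(8)              # an 8-byte field, space padded

# 2. Null terminated (C-style): scan for the first zero byte to find the length.
ztr = data + b"\x00"
z_len = ztr.index(0)               # 5

# 3. Buffer with a maximum size; contents end at the first NUL (Windows API style).
buf = bytearray(16)
buf[:len(ztr)] = ztr
buf_len = bytes(buf).index(0)      # 5

# 4. Dynamic: the length is tracked separately from the bytes themselves,
#    so the data may legally contain zero bytes.
dynamic = data
print(len(fixed), z_len, buf_len, len(dynamic))   # 8 5 5 5
```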

So now you presumably know the type of string you are going to handle, and you need a plan for doing this.  You have the pointer value, so where should you put it?  Among the typical places would be the EBX register, or the EDI or ESI registers.  EBX is very good, because it works well with enhanced instructions for indirect addressing.  In 16-bit architecture, the segment:register pairs most often associated with handling strings are DS:SI and ES:DI.  The DS:SI pair was most commonly used for reading data from memory, and the ES:DI pair was most commonly used for saving data back to memory.  The SI and DI registers were specially designed to automatically increment by some count if the direction flag was cleared (UP), or decrement by some count if it was set (DOWN), for certain instructions.  The "some count" was determined by the size of the operand involved - by 1 for 8 bits, by 2 for 16 bits, and by 4 for 32 bits.

So, should you choose EBX, ESI, or EDI?  Well, much depends on what you are trying to do.  You can't really go wrong with any of these, but often, you will find that certain advantages may favor the use of one over another.  Automatic incrementing or decrementing can be beneficial when processing strings.  There is a REP (repeat) prefix that is designed to work with string instructions to get the fastest possible execution done on certain types of operations involving strings.  Some of the string instructions include CMPS, CMPSB, CMPSW, CMPSD, LODS, LODSB, LODSW, LODSD, MOVS, MOVSB, MOVSW, MOVSD, SCAS, SCASB, SCASW, SCASD, STOS, STOSB, STOSW, and STOSD.

Note that the REP prefix and LOOP instruction involve the use of the (E)CX register as a counter that counts down to zero.  So if you have a fixed length string, you put the size of the string into the CX or ECX register and use REP to repeat a single string instruction at full speed, or LOOP to close a block of several instructions.  If you are using a zero (NULL) terminated string, you set the contents of CX or ECX to the maximum possible string length, but then your test would be modified to check for a zero or non-zero result, such as with LOOPNZ.
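The countdown-and-scan pattern behind REPNE SCASB can be sketched in a few lines (Python for illustration; real hardware folds all of this into one prefixed instruction):

```python
def scan_for_byte(buf, target, max_count):
    """Emulate a REPNE SCASB style scan: step through buf until target is
    found or the countdown register reaches zero.  Returns the index or -1."""
    i, count = 0, max_count
    while count:
        if buf[i] == target:
            return i
        i += 1          # EDI auto-increment (direction flag UP)
        count -= 1      # ECX countdown
    return -1

s = b"assembler\x00"
print(scan_for_byte(s, 0, len(s)))    # 9: the position of the NUL terminator
print(scan_for_byte(b"abc", 0, 3))    # -1: count exhausted, no NUL found
```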

The use of REP with some string operations means very fast processing, but it is not very adaptive.  A key case would be looking for any letter A to Z, or any digit 0 to 9.  Nor does it help in cases involving looking for either an upper or lower case letter.  If you are looking for an exact match, then REP works very well to find a given byte, word, or dword.  However, since the increment is determined by the operand size, trying to test for four consecutive bytes by setting them in a dword would force an increment by four, and that would mean only checking every aligned group of four, not every byte sequence of four.
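That stride-of-four shortcoming is easy to demonstrate.  A sketch (Python for illustration) contrasting a dword-stride search with the byte-wise sliding search you actually want:

```python
def find_stride4(buf, pattern):
    """Search in dword steps, as a REP-style dword scan would:
    only offsets 0, 4, 8, ... are ever examined."""
    for i in range(0, len(buf) - 3, 4):
        if buf[i:i+4] == pattern:
            return i
    return -1

def find_sliding(buf, pattern):
    """Byte-wise sliding window: checks every starting offset."""
    for i in range(len(buf) - 3):
        if buf[i:i+4] == pattern:
            return i
    return -1

buf = b"xxABCDyy"
print(find_stride4(buf, b"ABCD"))   # -1: the match starts at offset 2, not a multiple of 4
print(find_sliding(buf, b"ABCD"))   # 2
```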

In an idealized architecture, these shortcomings would be addressed.  So instead, programmers struggle to find optimum solutions for their needs within the scope of what is available within the existing architecture.
Title: Re: Let's Talk Assembler
Post by: Charles Pegge on May 29, 2007, 10:05:25 AM
Good Morning Donald,

Your discussion on the floating point processor is very timely for me, as I am researching fpu op codes today. The problem with the FPU is that it is rather loosely integrated with the CPU. In fact they started out as two separate chips sharing the same bus with a sync protocol for passing data between them. To ensure correct operation, every maths operation had to be preceded by a WAIT (9B). Although the two chips became one with the 486, they still behave as separate devices in many respects.

Not only do they have totally separate registers, but the FPU registers are arranged as a stack of 8, and when you load a variable, it goes on to the top of the stack and the other registers are pushed down.  That means that when you have finished computing a floating point expression you have to leave the stack as you found it, and ensure that when you store values, they are popped from the stack if they are no longer required.

Here is a function for adding 2 numbers together

Powerbasic

function adds(byval a as double, byval b as double) as double
!  FLD  a                 ; loads and pushes value onto stack
!  FADD b                 ; add to the value in the top of the stack
!  FSTP function          ; store the result and pop the stack
end function



Freebasic

function adds(byval a as double, byval b as double) as double
asm
FLD qword ptr [a]                 ' loads and pushes value onto stack
FADD qword ptr [b]                ' add to the value in the top of the stack
FSTP qword ptr [function]         ' store the result and pop the stack
end asm
end function


Title: Re: Let's Talk Assembler
Post by: Charles Pegge on May 29, 2007, 01:36:30 PM
On the subject of string loops:  LoopNZ Rep etc

It seems that these clever loopy instructions available on the x86 are not as efficient as the elemental instructions. Probably it's because they are microcoded rather than hard coded instructions, and require interpretation before being streamed into the execution pipeline.

So the good news is we don't have to learn them anymore to write the most efficient code. On contemporary CPUs the fundamentals do it better.

Quote
Paul Dixon:

I haven't read through the whole of your code but if you want it to be a little faster..
The LOOP opcode is slow compared to coding the same thing yourself. You should try to avoid using it. ...

.....

Charles Pegge:

I confirm that LOOP takes a lot longer than DEC ECX: JNZ short ..
In an empty loop with 2gig repeats, the LOOP instruction took
3 seconds instead of 2 seconds (Athlon 3200).


http://www.powerbasic.com/support/forums/Forum8/HTML/003574-4.html



Title: Re: Let's Talk Assembler
Post by: Donald Darden on May 29, 2007, 11:28:08 PM
The REP and LOOP instructions perform several operations involving certain other string operands, and consequently involve quite a bit of overhead: the detection of the direction flag, the automatic increment or decrement of the (E)SI and (E)DI registers, the test of the (E)CX register for a zero value, the automatic decrement of the (E)CX register if it is not zero, and then the branch (jump) to some other location if the condition being tested for is met.

Yes, your "fundamental" instructions are faster, but then you need more instructions to do as much.  So it is not a clear case of one or the other, but just using what works best under the circumstances.

LOOP instructions act like upside-down FOR ... NEXT statements, where the FOR is set up initially (here you would precondition the (E)CX, and possibly the (E)SI and (E)DI registers in assembly), then you perform the LOOP, which acts like the NEXT, in that it performs the test and the necessary increment or decrement.

While you can set the LOOP instruction at the top of a loop range, or somewhere within, the area defined by the loop itself is generally governed by the branch address included with the LOOP instruction and any additional jumps that return
you to some part of the loop range, or take you out of the loop.  Thus, trying to analyze LOOP logic in assembler can be much more complicated than looking at a BASIC statement with its nice, neat FOR ... NEXT structure.

Additionally, there is no STEP size involved with a LOOP instruction; it is always a decrement of one (1), and the decrement comes BEFORE the test for zero.

Speaking of steps, that is another topic that needs to be understood.  One of the original chips of the x86 family, the 8088, addressed memory in terms of bytes, just 8 bits.  In reading memory for 16 bits, it read the lower 8 bits, then it read the upper 8 bits from the next address.  For compatibility, the 8086 and later chips still look at memory as though it were organized by bytes, when actually it is usually fetched by word (16 bits) for most 16-bit CPUs, and by dword (32 bits) for most 32-bit CPUs.  But the convention is still to support access to memory by bytes, so offsets from pointers use increments of 1 for bytes, of 2 for words, and of 4 for dwords.  Naturally, with 64-bit architecture, or support for one of the PowerBasic data types, you also have increments of 8 for a quad.
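Those byte-based increments are exactly the arithmetic you do when indexing an array through a pointer.  A trivial sketch (Python for illustration; the base address is made up):

```python
# Element addresses for the common x86 data sizes, given a base address.
SIZES = {"byte": 1, "word": 2, "dword": 4, "quad": 8}

def element_address(base, kind, index):
    """Address of element `index` in an array of the given type."""
    return base + SIZES[kind] * index

base = 0x1000
print(hex(element_address(base, "byte", 3)))    # 0x1003
print(hex(element_address(base, "word", 3)))    # 0x1006
print(hex(element_address(base, "dword", 3)))   # 0x100c
print(hex(element_address(base, "quad", 3)))    # 0x1018
```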

Now the stack is a form of memory - in fact, it actually is part of your main memory, just set to work from someplace high in memory and work backwards down through memory addresses, rather than starting near the lower range and working up as with other memory addressing modes.  Because the stack is a part of memory, it is not as fast a mode of addressing as the registers are.  It also requires a register pair of its own (SS:(E)SP) to manage it, by pointing at the current bottom of the stack.  Every program generally requires its own stack space, and you can have multiple programs and processes running at once, so the effort to keep the single SS:(E)SP register pair pointing to the right stack when performing any instructions in the corresponding program or process is a challenge that the Operating System handles transparently for you.

But an oddity of the stack is that while it also recognizes memory as being organized in bytes, it really can only push and pop its contents based on word sizing.  That is, you cannot push just AL or AH, or any other 8-bit byte; you have to push or pop at least 16 bits at once.  So the information on the stack will always be found in increments of two, and any push or pop will be in increments of two as well.

So why should you care about this?  Well, it does tell you that if you put the current address found in (E)SP into another register to reference anything on the stack, or even use the stack pointer itself with an offset, that offset will always be 0, or 2, or 4, or any other multiple of two when finding the leading byte of any item placed on the stack.  It will never be odd.  It also tells you that if you put the current flags on the stack with a PUSHF, the flags will occupy two bytes; the upper byte will be forced to zero where there are no corresponding flag bits, and appear on the stack first, above and before the second byte, which holds the flag bits.  In fact, even though the stack works downwards through memory rather than up, it respects the low-byte then high-byte, or low-word then high-word order used with other references to memory, by decrementing the stack pointer, storing the highest byte first, decrementing the stack pointer, storing the next highest byte, and so on until the item is fully copied to the stack.
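A word push on a downward-growing stack can be simulated in a few lines (Python for illustration; the 16-byte memory is an arbitrary toy size):

```python
def push_word(memory, sp, value):
    """Simulate PUSH of a 16-bit value on a downward-growing stack:
    the pointer drops by two, and the word lands in little-endian order
    (low byte at the lower address)."""
    sp -= 2
    memory[sp]     = value & 0xFF          # low byte at the lower address
    memory[sp + 1] = (value >> 8) & 0xFF   # high byte at the higher address
    return sp

mem = bytearray(16)
sp = 16                      # stack starts at the high end of this toy memory
sp = push_word(mem, sp, 0xABCD)
print(sp)                            # 14: dropped by two
print(hex(mem[14]), hex(mem[15]))    # 0xcd 0xab: same byte order as the rest of memory
```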

By the way, if you decide to store individual registers to the stack, as discussed
before, you generally pop them in reverse order from the sequence in which you
pushed them.  If you push them all with a PUSHA or PUSHAD instruction, then
pop them with a POPA or POPAD instruction, the architecture performs this reverse sequencing for you.  I've not yet found a write up that describes the sequence used with POPA or POPAD, but by knowing the contents of each
register beforehand, it would be possible to examine the stack frame and figure
this out.

Should you ever consider pushing all the registers onto the stack?  If you are not going to use the stack, and limit your use of the registers, it may not be necessary to save any of the contents.  If only a couple of registers need to be saved and restored, some people prefer to save these in local variables, and others may decide individual PUSH and POP instructions will suffice.  It is often a matter of programmer's preference.  PUSHA and POPA are easy to do, and cut down on mistakes, and the stack memory is immediately returned for further use.  But with all that pushing and popping going on, there would be a small performance hit each time you do this.

Let's be clear about something else.  The move (MOV) instruction always just copies information FROM a source TO a destination.  The information still remains at the source; it is not destroyed in the process.  But when you POP something off the stack, it is gone from the stack.  It may actually be in memory below the place pointed to by the stack pointer for awhile, but there is nothing to protect it there, and it will be overwritten by subsequent pushes or call statements.

I believe we are getting to a point where most of the general observations about assembly programming have been more or less covered.  If you have been reading along, some of the mystery may have gone out of the topic by now.  The next stage would be to consider specific cases and see how it is done, then take the code and tweak it some more yourself.
the code and tweak it some more yourself.

I am going to propose several small exercises here.  The first is to take a
dynamic, variable length string, and go through it, converting any lower case
characters to upper case.  Then do the same thing for an ASCIIZ string.

The second exercise is to take a dynamic, variable length string, and switch the
first byte with the last, the second with the next to last, and so on.  PowerBasic already has a command for doing this called STRREVERSE$(), but you can write
your own version.  And again, adapt it to work with an ASCIIZ string instead.
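As a behavioral reference for what the two exercises should produce (sketched in Python; the exercises themselves ask for assembly, and the routine names here are just placeholders):

```python
def ucase_dynamic(s: bytes) -> bytes:
    """Exercise 1: upper-case a length-tracked (dynamic) string.
    Lower case letters are 97..122; subtracting 32 gives the upper case."""
    return bytes(b - 32 if 97 <= b <= 122 else b for b in s)

def ucase_asciiz(s: bytes) -> bytes:
    """Same, but stop at the first NUL, as an ASCIIZ routine would."""
    end = s.index(0)
    return ucase_dynamic(s[:end]) + s[end:]

def reverse_dynamic(s: bytes) -> bytes:
    """Exercise 2: swap first with last, second with next-to-last, and so on."""
    return s[::-1]

print(ucase_dynamic(b"Hello"))          # b'HELLO'
print(ucase_asciiz(b"Hi\x00tail"))      # b'HI\x00tail' - bytes past the NUL untouched
print(reverse_dynamic(b"abc"))          # b'cba'
```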

Post your work as replies to this post, and let's see what works best and why. 
Title: Re: Let's Talk Assembler
Post by: Charles Pegge on May 30, 2007, 10:51:35 AM
From the CPU's point of view, being aligned to 32 bit words helps to maximise the performance, and memory is absurdly cheap compared to how it was thirty years ago. The main byte bottle neck is networking bandwidth, for which data compression provides a solution.

I once worked with a Chinese IT administrator; he was trying out a Mandarin version of Windows 3.5, but they were using DOS based systems in which 2 letters could be keyed to get one Chinese character.  So yes, standard keyboards are always going to be less convenient for special characters, but Unicode could be useful in a variety of specialised keyboards and other input devices.  Perhaps APL could be revived with a keyboard of the right sort, something that could share the regular keyboard's USB socket.

Now personally I would like to see a Welsh keyboard; since the Welsh language does not use K Q V X Z, these keys could be freed up to do more useful things,  ;D

Title: Re: Let's Talk Assembler
Post by: Donald Darden on June 02, 2007, 08:01:54 AM
Alignment on 32-bit boundaries is good for indirect addressing purposes, as it means all 32 bits can be read in one read cycle, not in two.  But the instructions of the x86 CPU range from one byte to many, and consequently, you cannot have a program that is guaranteed to always be read in one cycle - some memory reads will pick up two or more instructions, some instructions will require only one read cycle, and some will still require two.

Efforts to speed up the CPU's performance with pipes, prefetch, caches, and multicores have greatly helped, but blur the distinction as to what works best.  Obviously, there are things that can be done to improve performance, but it is rarely a case of do this or don't do that anymore.  If clients want more speed, then they may need faster computers with more memory.  It beats killing yourself trying to max out an old box.

When I learned the five-bit teletype code, it was a limited character set.  Just 26 upper case letters, a shift up and shift down key, carriage return, line feed, and a break key.  That took 31 unique codes.  The number keys were placed on top of the letter keys, along with a number of punctuation symbols.  You shifted up, you had numbers and symbols.  Shift down, and you had letters.  I wrote routines to convert old teletype tapes to ASCII and EBCDIC code at one time.

If I were mapping teletype code onto bytes, I would only need five bits of the eight, and could thereby store eight characters in the space of five normal 8-bit characters.  If I went to a six-bit code, which supports 63 characters aside from a null, I could include my lower case characters as well, and have two keys represent a function shift option to extend the code to do more.

If I were smart about the present ASCII code set, I would limit myself to just using 126 key codes, plus the null.  I would use my uppermost bit to flag that my code was not using the values 128 - 255, but that the code required a second byte.  By setting the upper bit, I tell my system that I need two bytes, not just one.  If the upper bit of the second byte is also set, then I need a third, and so on.  That way I could potentially "grow" my character set by adding any additional code sets that I might need, and at the same time, commit myself to supporting the previous sets as they become defined.

Unicode allows for code combinations from 256 to 65,535.  Big deal.  So who is managing that growth, and what if someone decides that they want 50,000 symbols to represent Chinese?  What does that leave everyone else?  On top of that, we will all have to adapt from our existing one-byte character set to support the two-byte code set, with no gain for us in the process.  My method just says forget the special codes above 127, which no two fonts support exactly alike anyway.

As to mention of using color codes and 32-bit pixels to represent characters, that is for only one pixel, not for a whole character.  Think about the immensity
of supporting any text of any size and color, and trying to interpret characters and words for a text search.  Pictures are said to be worth a thousand words, but no two people would see or describe a picture the same way.  Words, properly used, can convey meanings as precisely and skillfully as the language will allow. 



 
Title: Re: Let's Talk Assembler
Post by: Charles Pegge on June 02, 2007, 10:34:45 AM

Digressing into the delights of Unicode:

http://unicode.org/charts/

Chinese and Indic scripts are very well covered by Unicode.  Ancient scripts like Cuneiform too.  There are even proposals for Egyptian Hieroglyphics, but since there are over 700 of those, that takes up quite a lot of space for an obsolete script.

Different languages need  to be symbolically represented with a uniform system, as this facilitates multilingual translation as well as displaying script efficiently.

But yes, I think you are right.  For the computer-human interface, the Anglocentric 7-bit ASCII is going to remain the standard for a long time to come.  And it is easy for the computer to lex into names, numbers, punctuation etc.

Ideally we would have single symbols to represent single abstract concepts, as is done in pure mathematics, but these are not really part of our linguistic heritage, and we would have an excessively large symbol set if we tried to invent an individual symbol say for each intrinsic function within BASIC or an Assembler instruction set, though this is more a topic for the Computer Languages thread.

Title: Re: Let's Talk Assembler
Post by: Donald Darden on June 02, 2007, 07:23:06 PM
There is a marked difference between phonetic written languages, which essentially try to have symbols for each common sound, which then can be used to represent any word by grouping symbols together into words, and languages that attempt to represent each and every concept with its own unique symbol.

Attempting to use Unicode as the common vehicle to combine both approaches into one shows a lack of regard for the essential simplicity of phonetics.  You create a complex representation that really does nothing for anyone.

It does not make any sense that someone wants to classify a dead language as a suitable target for Unicode, because they can codify that language or any other in its own best-suited manner.  It's not like anybody is ever going to study Unicode and begin using it for everyday conversation, trying to pick up the nuances of some unknown turn of symbols.  The real future is going to be in fast and accurate language-to-language translators and the adoption of common sets of languages for business needs.

I've heard over the years that the Chinese have had a real struggle with adopting the use of a keyboard because of the difficulty of creating pictorial representations of their written language, and that a number of different methods have been tried and gained limited acceptance.  The goal has mostly been to try and reduce the necessary set of symbols to a much smaller, more manageable set, or the use of a special keyboard where different key combinations would serve to mark different portions of one symbol.  Right.  Just what everyone wants.  A new keyboard where you design your own symbol by striking several keys simultaneously.  I'd like to see anyone learn speed typing on that.

Unicode will never gain widespread acceptance, though I do expect some limited gains regarding a few choice languages; most people will not use the extensions because they will not have all those languages in common.  It's like trying to make the traffic laws of every nation on earth uniform, where we all drive on the same side of the road.  Why should the local populace be discomforted and forced to adapt, just because some foreign visitor finds it odd that we stick to doing things our own way rather than conform to their standard or desire for uniformity?

Along similar lines, it has been the thought of some that there should be a universal computer language, a universal human tongue, and a universal written language.  Ever hear of Esperanto or Interlingua?  How about Energy Systems Language?  Artificial languages are always being devised and touted as great advances, even adopted by some, but all they do is create more choice; they do not supplant what we already have.  English is the most common language used in the world of commerce, not because it was the language of choice, but because it followed the influence of leading nations that used it as their primary tongue.  There will be a Unicode, but it will not supplant existing 8-bit codes in existing contexts.  There will be no driving force to change this, because there is no real need to do so.
Title: Re: Let's Talk Assembler
Post by: Charles Pegge on June 02, 2007, 11:04:40 PM
Yes there are Darwinian forces at work in the world of coding standards. Whatever is best supported and most convenient to use will win in the end.


x86 calling conventions

http://en.wikipedia.org/wiki/X86_calling_conventions

I came across this article the other day, and found it very informative.  Of particular interest were the new calling conventions of the 64-bit x86 in Long Mode.  Instead of pushing all your parameters onto the stack, the strategy is to take advantage of some of the extra registers and use them instead.

In many instances, it is possible to avoid using the stack altogether and achieve very efficient functions.

From a coding point of view the x86 begins to look like a RISC processor.


Title: Re: Let's Talk Assembler
Post by: Donald Darden on June 02, 2007, 11:17:21 PM
Huh!  The use of registers for calling Interrupts under DOS was also used with the 8086/8088 architecture.  But as they wanted the Interrupts to do more and more, they moved to the concept of passing input parameters via the stack, or by setting up a buffer or external memory area to be referenced, and the return values either put in AX or a combo of DX & AX.  Since the stack is another way of using main memory, it is slow when compared to relying on registers, but more flexible in that you can put whatever you need on there.  With COM programming, the concept changed yet again in that you pass a pointer to a structure, and the structure can be set up any way that you want.

You watch, because there will be a split between the camp that wants to do everything in a way that works best with 64-bit architecture, and another group that will strive to maintain compatibility with 32-bit architecture as well, for the greater market share, or just to continue with prescribed methodology.
Title: Re: Let's Talk Assembler
Post by: Charles Pegge on June 03, 2007, 12:03:01 AM
Well, Linux was first in with a 64-bit version of their operating system.  With a bit of smart compilation, it can't be that difficult to make the transition.  Only a few of the original opcodes are reassigned to new instructions, and smart optimisation by the compiler will exploit the new registers.

But to rewrite the operating systems from scratch seems impractical in view of the massive bulk of code involved. We will have to use AI to clean up the knotted tangle, and put the code into a more adaptable form for future architectures.
Title: Re: Let's Talk Assembler
Post by: Charles Pegge on June 03, 2007, 12:13:03 PM
A Framework for Assembler Functions

When doing any substantial amount of assembler, it is essential to have a
regular framework that can be consistently applied and meets the needs of the task.

Here is a simple framework I am planning to use, for 3D Opengl, involving real time graphics. Often you have to work with three or four term vectors but the functions in most high level languages only return single values and it is not always convenient to pass pointers.

In this scheme, parameters and local variables are referenced by the ESI index register, and each function has a set workspace of 16 8-byte slots, with the bottom four reserved for parameters shared with the parental function and the top four reserved for child functions.

The 8 byte slots can be used to accommodate LONG QUAD SINGLE or DOUBLE datatypes.

This is how parameters are set up before a call

mov  [esi+&h60], ..
mov  [esi+&h68], ..
mov  [esi+&h70], ..
call fun


The callee function then adds &h60 to ESI and then accesses the parameters as [esi+0], [esi+8], [esi+&h10] and so forth.  The locations [esi+&h20] to [esi+&h58] are available for an additional 8 local variables.  Above this, there are four slots available for passing down to the next level of functions.

Prior to returning, all that has to be done is deduct &h60 from the ESI register.
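The slot arithmetic can be modelled to check that the overlap works out.  This Python sketch is only a toy model of the scheme (the real thing is plain register arithmetic; the Workspace class and slot counts here are illustrative assumptions):

```python
SLOT = 8           # bytes per slot
FRAME = 0x60       # ESI advance per call: 12 slots, leaving a 4-slot overlap

class Workspace:
    """Model of the ESI-indexed frame scheme: a flat slot array where each
    call advances the base by 12 slots, so the caller's top 4 slots become
    the callee's bottom 4 (the shared parameter slots)."""
    def __init__(self, total_slots=64):
        self.slots = [0] * total_slots
        self.esi = 0                      # current frame base, in bytes

    def put(self, offset, value):         # mov [esi+offset], value
        self.slots[(self.esi + offset) // SLOT] = value

    def get(self, offset):                # mov .., [esi+offset]
        return self.slots[(self.esi + offset) // SLOT]

    def call(self):   self.esi += FRAME   # on entry: add &h60 to ESI
    def ret(self):    self.esi -= FRAME   # before return: deduct &h60

ws = Workspace()
ws.put(0x60, 3.0)                   # caller sets up two parameters
ws.put(0x68, 4.0)
ws.call()
ws.put(0, ws.get(0) + ws.get(8))    # callee sums them into the shared slot
ws.ret()
print(ws.get(0x60))                 # 7.0 - the caller reads the result back
```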

For calls outside this scheme, cdecl or stdcall can still be used since we have not committed the EBP register to any specific use, and we have not made any assumptions about the stack frame.

In addition to the ESI and EBP registers, we can make use of the EBX and EDI registers.  I intend to reserve the EBX register to hold the base address for the application's shared or global variables, and the EDI register as the base address of an object if one is being used.

With this scheme there are no absolute addresses so the code can be loaded and executed anywhere.

Because it uses a fixed frame size and minimises the use of the stack, verifying, testing and debugging should be a lot easier.


Title: Re: Let's Talk Assembler
Post by: Donald Darden on June 04, 2007, 04:41:36 AM
I want to get into the x86 instruction set a bit more, but I have to first explore some of the tools available to me.  First, I want to see if I can provide a link to the ASM.HLP file I have zipped and attached to this post below.  It's now in the download section as well if you need to find it easily in the future.

This file really does a lot to help you understand assembler and how to program the x86, but it can be daunting to look at, and I want to discuss one of the commands in some detail to help you understand the way to interpret the Help File.

Okay, that worked as expected.  Unfortunately, I can't control where the ADC bitmap goes, this forum software insists on putting it at the bottom and made so small that you can't read it at first glance.  But if you click on the image, it will expand, and if you click on it again, it will toggle back to small size.

The ADC instruction was selected because it works with several of the most important flags.  It also is influenced by the current setting of the Carry Flag.  Note that the mnemonic format is ADC dest,src.  This means you use ADC, and you follow it with any valid destination, then a comma, and any valid source.

The rest of the bitmap page attempts to explain what those valid references
might be, the difference in the way the instruction worked on the different x86 platforms (that's what the 286, 386, and 486 columns are for), the number of
clock cycles needed for the instruction (a way of determining performance), and
then the portion of the code that determines the destination, and finally the portion of the code that determines the source.

The use of "reg" refers to one of the named registers.  If it says reg8, then it is an 8-bit register.  If it says reg16, then it is a 16-bit register.

The use of "mem" says that this reads from a memory address.  The "imm" refers to a byte, word, or dword portion of the instruction itself.  For instance, if you
wrote the instruction ADC al,3, which means add 3 to al with carry, then the "3" would be a constant, and embedded in the instruction itself, actually taking up one of the 8-bit bytes that form the instruction.  We know it is just 8 bits (a byte), because al itself is just 8 bits, or one byte in size, and in all cases we match the size of the operands.

For brevity, you may also see something like "r/m8" which means that either a register or memory location of 8 bits may be specified.  You should be able to understand r16, m8, or m32 at this point as well.

When you write or read assembly code, you will sometimes see something like MOV al,"a".  This again would be a constant, and treated as an immediate part of the instruction, but the assembler would understand that the double quotes mean the character "a", which has an ASCII code of 97 in decimal.

Note that the vast majority of modern computers use binary (two state) circuitry for the greatest efficiency and reliability, giving us what we refer to as "bits".  Two states can be interpreted as any condition where only one of two outcomes is possible, such as true/false, yes/no, go/no-go, 1/0, on/off, plus/minus, set/clear, up/down, and so on.  By grouping multiple bits together into groups of four (nybbles), 8 (bytes), 16 (words), 32 (double-words or dwords), and 64 (quads), we can represent quantities as well.  The most common grouping for presentation purposes is in nybbles of 4 bits, which we see
as the digits 0 to 9, followed by the letters A to F.  These correspond to a count range from 0 to 15 decimal, and represent a binary number shown in Base 16.

The x86 assembler probably expects most numbers to be entered in decimal form, rather than hexadecimal, octal, binary, or other representations.  That is
because we humans are usually taught to work with base 10 numbers first, and many of the things we do involve base 10 values.  So if you just enter a number like 100, that is one hundred in decimal.  If you want it to represent 100 in hex,
you would trail it with an "h" character, so 100h = 256 decimal.  Now there is an area of possible confusion, where you might have a number like ab, which, since this is hexadecimal (base 16), we would represent as abh.  But how is abh to be understood to be a number, and not the name of some variable?  The answer is that all numbers must start with a digit 0 to 9, which tells the assembler that this is clearly a number.  So to the assembler, abh would be the name of something, and 0abh would be a hexadecimal number.
Title: Re: Let's Talk Assembler
Post by: Donald Darden on June 07, 2007, 08:40:29 AM

push ebp
mov ebp, esp
push ebx
push esi
push edi
sub esp, 64h
push offset sub_4010C7
xor esi, esi
push esi
push esi
push esi
push esi
nop

The code above is a small extract from an ASM file that I produced from a small EXE file.  How did I do this?  With a product called IDA Pro, freeware version 4.3, available from the SimTel network.  I downloaded and installed the product, then ran it and gave a path to the Test.exe file I wanted it to process.  When I had the file, I could examine it in depth, seeing how PowerBasic put it together.  But since I was really only interested in analyzing my small piece of it, I performed a search for the first NOP in the file (this was generated by my use of a !NOP mnemonic in the inline assembler when I used PowerBasic to create Test.exe).

But for this example, I wanted you to look at the sequence of instructions above
the point where the NOP (No OPeration) instruction occurred.  These instructions are generated by the PowerBasic compiler.  If you had looked at the code in the window under the IDE, you would have seen a label named Start
and another line identifying this as the beginning of a Near Proc, or near procedure.  A near procedure is generated when a return address, without the
segment register, is placed on the stack as a return point for when a RETN (RETurn Near) instruction is executed later on.  Start in this case corresponds to your PBMAIN or WINMAIN entry point.  We don't see Start here, because the ASM file lacks the naming conventions used in our source file.

Most of these initial instructions involve the stack in some way.  By pushing ebp, we put ebp's current contents onto the stack, thereby freeing it up for other uses.  In this case, we next put the current contents of esp (the stack pointer) into ebp to serve as a fixed reference to the stack, so that despite other changes we might make involving the stack or esp, we can get back to this same point on the stack, using the contents of ebp to help us.

Next we push ebx, esi, and edi onto the stack.  This saves their contents to be restored later as well.  ESI and EDI can now be used for integer REGISTER variables.  The ebx register is the most powerful memory referencing register, and it appears that PowerBasic uses it as well, and is ensuring that its contents come back intact later, in case you end up using it yourself.

The "sub esp, 64h" instruction brings the esp pointer down an additional 100
decimal addresses.  This must correlate to a reserved area on the stack that
PowerBasic employs.  Next, the "xor esi, esi" instruction effectively sets the contents of the esi register to zero.  The xor instruction is often shorter and faster than an instruction like "mov esi, 0".  You often see xor operations used this way, and you can tell that this is the purpose any time the destination and source are the same, since zero will always be the outcome in that case.  Then you see the four "push esi" instructions, which serve the purpose of putting four 32-bit values of zero onto the stack.  If you look at the normal requirements for defining FUNCTION WINMAIN, and compare them to the simplistic use of FUNCTION PBMAIN, you may find some parity between what you could manually define for WINMAIN and what PowerBasic's shortcut approach under PBMAIN does for you instead, with certain defaults being supplied by automatic coding.

And then there is the first NOP instruction.  I put that there, so what follows would be normally what I would be interested in.  If I don't have a second NOP somewhere, then the little bit of program I put under Tricks and Tips would just keep printing out all the code that follows.  That program uses two toggles, one called Flip and the other called Flop, to control what output appears when I examine the ASM file.



Title: Re: Let's Talk Assembler
Post by: Theo Gottwald on June 07, 2007, 08:46:15 PM
Donald --
I tried the Link from your post:

http://members.cox.net/pc_doer/assembler/asm.hlp

it did not work. Even directly on that site:

http://members.cox.net/pc_doer/Assembler/asm.htm

the download was not possible.

I found this page with the most often used commands:
http://www.jegerlehner.ch/intel/

inside this useful PDF:
http://www.jegerlehner.ch/intel/IntelCodeTable.pdf

and finally this Text:
http://members.save-net.com/jko@save-net.com/asm/r_x86.txt

as alternative.
Title: Re: Let's Talk Assembler
Post by: Donald Darden on June 07, 2007, 11:44:05 PM
Hi Theo, thanks for letting me know of the problem.  I put the asm.hlp file in a zip and linked it to my post above, and added it to the download section, which should make things easier for all concerned.  I looked at your link, and it was a really good alternative, so I am going to include it in the download section as well.

I think we are making progress here.  I'm looking forward to more participation by our members, and more members as we gain visitors, and more visitors as we expand our posts.
Title: Re: Let's Talk Assembler
Post by: Donald Darden on June 13, 2007, 02:26:27 AM
When you begin using any language, you need to know what is possible, what is
provided for you, and how to construct valid statements within that language.  Looking at existing code, and reading available help files and other documentation can help you in each of these areas.

The problem is that general or prior knowledge sometimes gets in the way of going from here to there.  You end up making unexpected detours as you suddenly find that an assumption has proved incorrect, and time has to be taken to research the matter further.

Take the matter of numeric expressions, for instance.  As I said earlier, many assemblers assume that a string of digits together represents a decimal number.  This is despite the fact that computers themselves are essentially binary devices.  But programmers want the added convenience of being able to enter numeric data in other forms as well.

If you read on the subject, you find that it is not uncommon to represent a
hexadecimal (base 16) number with a leading digit and a trailing small "h".  If you wanted to represent the hexadecimal quantity "FFFF", you would write it as 0FFFFh.  The leading zero is necessary, since FFFFh would also be a valid name for something.  Now in fact, most assemblers are case insensitive, so you could write this quantity as 0ffffh, 0FFFFH, or 0ffffH as well.

However, none of these forms are acceptable to the PowerBasic inline assembler.  I could not find an explanation of what the PowerBasic syntax should be, but in looking through the Help file, I found some clear examples.  For the PowerBasic inline assembler, you need to use a leading ampersand (&) symbol,
then a character to indicate what base you are using ("h" for hexadecimal, "o" for octal, or "b" for binary), then the digits of the number itself.  If you do not know
this, then trying to enter numbers in a different base would be pretty hard to do.

If you enter a command like !MOV eax, &h32, then in all likelihood, when you look at the conversion performed by a disassembler, you would see mov eax, 50 in the resulting assembler file.  The reason is that the disassembler is reconstructing an assembler source file based on the finished EXE file, and there is no direct correlation between your original source file and the resulting ASM file.  That means you have to look at the general form of the instructions, the sequence of instructions, the uniqueness of certain instructions, and have some awareness of the fact that an exact recreation will not be possible.  The biggest loss of information will be the names, labels, and comments that were included in the original source file.  But as also shown, PowerBasic's method of constructing
higher level processes in Assembler will represent another great challenge to our understanding.

What we have learned is that PowerBasic maintains a balance: it sets EBP from the current ESP value, then moves ESP further down
by a range of addresses, so that the portion of the stack above EBP
represents the passed parameters and other stack entries, and the area of memory between ESP and EBP represents a temporary work area, suitable for local and static variables.  We can also see that the parameters that were passed appear above where EBP points, requiring a positive offset from EBP, and the local and static area appears below where EBP points, requiring a negative offset from EBP.  References to both regions are via EBP, which for the moment is static and fixed, while ESP continues to be used for things like CALL and RETN statements, and for any PUSH and POP statements you apply yourself.

Technically then, we should be able to do something like this:

#REGISTER NONE

SUB Test(aa AS STRING)
  LOCAL a, b, c, d AS LONG, bb AS STRING
  bb="Test String Two"
  ! mov a, esp
  ! mov b, ebp
  FOR c = a TO b+5*8 STEP 2
    IF c=a OR c=b THEN ?STRING$(70,"*")
    d = PEEK(LONG, c)
    ? USING$("  ###  ",c-a) d;
    COLOR 14
    IF c=VARPTR(a) THEN ?" a lives here";:GOTO pass
    IF c=VARPTR(b) THEN ?" b lives here";:GOTO pass
    IF c=VARPTR(c) THEN ?" c lives here";:GOTO pass
    IF c=VARPTR(d) THEN ?" d lives here";:GOTO pass
    IF c=STRPTR(aa) THEN ?" aa starts here";:GOTO pass
    IF c=STRPTR(bb) THEN ?" bb starts here";:GOTO pass
    IF d=a THEN ?" VAL a?";
    IF d=b THEN ?" VAL b?";
    IF d=c THEN ?" VAL c?";
    IF d=VARPTR(aa) THEN ?" VARPTR aa?";
    IF d=VARPTR(bb) THEN ?" VARPTR bb?";
    IF d=VARPTR(a) THEN ?" VARPTR a?";
    IF d=VARPTR(b) THEN ?" VARPTR b?";
    IF d=VARPTR(c) THEN ?" VARPTR c?";
    IF d=VARPTR(d) THEN ?" VARPTR d?";
    IF d=STRPTR(aa) THEN ?" STRPTR aa?";
    IF d=STRPTR(bb) THEN ?" STRPTR bb?";
    IF d=LEN(aa) THEN ?" LEN aa?";
    IF d=LEN(bb) THEN ?" LEN bb?";
pass:
    COLOR 15
    ?
  NEXT
END SUB

FUNCTION PBMAIN
  COLOR 15,1
  CLS
  ?"Looking at memory from ESP to EBP, plus EBP to EBP + 40"
  test "Test String 1"
  DO:LOOP
END FUNCTION

Now I guess I would say that this method of setting a mark in memory, where
your passed parameters appear above EBP, and your temporary use of memory,
including locals and statics, appears between ESP and EBP, has to be considered somewhat inspired.  To discard the temporary area, all PowerBasic needs to do is execute a mov esp, ebp statement, and the locals and statics are just gone.  At the same time, not only does the called procedure have the area set aside for temporary memory, but this happens each time the procedure is called - even if the procedure calls itself.  This fully supports the concept of recursion, with each call having its own parameters and temporary variables to work with.