(Optimization) Compare DWORDS and others...

Theo Gottwald · January 02, 2007, 09:45:29 AM

Taking a look at my disassembly, I realized that there was still some floating-point code inside where I did not expect it, and did not need it.

This code for example:

REGISTER T08 AS DWORD,T09 AS DWORD  
T09=PEEK(T06)
T08=99
IF (T09<T08) THEN INCR RL02

The disassembly shows what the compiler makes of it:

408729	8B856CFFFFFF           MOV EAX, DWORD PTR [EBP+FFFFFF6C]
40872F	0FB600                 MOVZX EAX,BYTE PTR [EAX]
408732	89C7                   MOV EDI, EAX
408734	C7C663000000           MOV ESI, DWORD 00000063
40873A	8975A4                 MOV DWORD PTR [EBP-5C], ESI
40873D	C745A800000000         MOV DWORD PTR [EBP-58], DWORD 00000000
408744	DF6DA4                 FILD QUAD PTR [EBP-5C]
408747	897DA4                 MOV DWORD PTR [EBP-5C], EDI
40874A	C745A800000000         MOV DWORD PTR [EBP-58], DWORD 00000000
408751	DF6DA4                 FILD QUAD PTR [EBP-5C]
408754	DED9                   FCOMPP
408756	DFE0                   FNSTSW AX
408758	9E                     SAHF
408759	7306                   JNB SHORT L408761
40875B	FF8570FFFFFF           INC DWORD PTR [EBP+FFFFFF70]

Knowing that the result of the PEEK() command is a byte, I had expected something else.
Let me make a small change here to get rid of the "FILD" and the "FCOMPP".

I change the two variables from DWORD to LONG. As a result we have exactly the same source code, except that our register variables are declared AS LONG instead of AS DWORD.

REGISTER T08 AS LONG,T09 AS LONG  
T09=PEEK(T06)
T08=99
IF (T09<T08) THEN INCR RL02

The result of the small change looks promising:

408729	8B856CFFFFFF           MOV EAX, DWORD PTR [EBP+FFFFFF6C]
40872F	0FB600                 MOVZX EAX,BYTE PTR [EAX]
408732	89C7                   MOV EDI, EAX
408734	C7C663000000           MOV ESI, DWORD 00000063
40873A	3BF7                   CMP ESI, EDI
40873C	7E06                   JLE SHORT L408744
40873E	FF8570FFFFFF           INC DWORD PTR [EBP+FFFFFF70]

Just a small change - big change in result.
Maybe I'll declare more variables "AS LONG" where possible (where unsigned/signed is not important).

Donald Darden · April 21, 2007, 05:07:34 AM

The PowerBasic Compilers are currently optimized for handling LONG Integer math. It is frequently suggested that you use LONG variables for the most efficient results. This also agrees with VB, which generally passes integer variables as LONGs (VB does not support DWORD as a type).

However, in ASM (the low-level language that compilers ultimately produce), there are distinct differences in determining the result of comparing a long with another long, a long with a DWORD, a DWORD with a long, or a DWORD with another DWORD. Conditional jump instructions (all starting with "J") act on the flags set by a preceding compare and can test for equality, non-equality, greater than or above, less than or below, or a number of combinations, such as Greater Than or Equal.

The trick is that Greater Than does not mean the same thing as Above, and Less Than does not mean the same as Below. Greater Than and Less Than are signed comparisons: if the highest-order bit of the integer is set, the value is considered negative, and that is taken into account when deciding which operand is larger. Above and Below skip the sign-bit test, so they treat the operands as unsigned (byte, word, or dword) integers during the compare.
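To make the Less Than versus Below distinction concrete, here is a minimal sketch in C rather than PowerBasic or assembler. The function names are made up for illustration; `int32_t` stands in for LONG and `uint32_t` for DWORD. The same 32-bit pattern gives opposite answers depending on which interpretation is used.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Signed comparison: the interpretation a "Less Than" jump relies on. */
bool less_than_signed(int32_t a, int32_t b) {
    return a < b;
}

/* Unsigned comparison: the interpretation a "Below" jump relies on. */
bool below_unsigned(uint32_t a, uint32_t b) {
    return a < b;
}

/* Reinterpret a 32-bit pattern as a signed value without changing any bits.
   memcpy is used so the result does not depend on implementation-defined
   integer conversion rules. */
int32_t as_signed(uint32_t bits) {
    int32_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

With the pattern `0xFFFFFFFF`, `below_unsigned(0xFFFFFFFFu, 99u)` is false (4294967295 is not below 99), while `less_than_signed(as_signed(0xFFFFFFFFu), 99)` is true (the same bits read as -1). For small values such as a byte from PEEK, both interpretations agree.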

The comparison of a DWORD and a LONG, in either order, then becomes an issue, and this takes some effort to resolve. First, your code would have to determine that one is a DWORD and the other is a LONG. Then, if the DWORD has the upper bit set, it is automatically greater than the LONG. If the upper bit of the LONG is set, it is automatically less than the DWORD. This is because about half the range of values that each type supports is beyond the range of the other, and whether you are in that range is determined by the sign (uppermost) bit of each.
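The sign-bit reasoning above can be sketched as follows, again in C for illustration only. The function names are hypothetical, and this is not a claim about what the PowerBasic compiler emits; it only shows the bit tests described in the paragraph.

```c
#include <stdbool.h>
#include <stdint.h>

/* Is the signed value s less than the unsigned value u? */
bool long_less_than_dword(int32_t s, uint32_t u) {
    if (s < 0) {
        return true;              /* sign bit of the LONG set: below every DWORD */
    }
    if (u & 0x80000000u) {
        return true;              /* top bit of the DWORD set: above every LONG */
    }
    return (uint32_t)s < u;       /* both lie in the shared range 0..2^31-1 */
}

/* Is the unsigned value u less than the signed value s? */
bool dword_less_than_long(uint32_t u, int32_t s) {
    if (s < 0) {
        return false;             /* no DWORD is below a negative LONG */
    }
    if (u & 0x80000000u) {
        return false;             /* this DWORD exceeds every LONG */
    }
    return u < (uint32_t)s;       /* shared range, plain compare is safe */
}
```

Only when neither top bit is set do the two values fall in the range both types can represent, and an ordinary compare then gives the right answer under either interpretation.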

However, there is another way to make such comparisons, and that is to translate both into floating point, which encompasses both ranges completely, and then perform a floating point compare to determine which is the greater or lesser. Floating point math is slower, though the built-in FPU in modern PCs is quite fast and does help keep the time penalty from being excessive.
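As a rough illustration of the widening approach, the C sketch below converts both operands to a type that covers both ranges before comparing. The function names are made up. The second variant uses a 64-bit integer instead of floating point; it is shown as an alternative for comparison, not as something the compiler is known to do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Widen both operands to double, then compare. A double has a 53-bit
   significand, so every 32-bit value, signed or unsigned, converts exactly. */
bool long_less_than_dword_fp(int32_t s, uint32_t u) {
    return (double)s < (double)u;
}

/* Same idea with a 64-bit signed integer as the common type. */
bool long_less_than_dword_wide(int32_t s, uint32_t u) {
    return (int64_t)s < (int64_t)u;
}
```

Both versions avoid the explicit sign-bit tests at the cost of a conversion for each operand. This resembles the first disassembly in the thread, where each 32-bit value is stored with a zero upper half and loaded with FILD as a 64-bit quantity.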

Keep in mind that an optimizing compiler is any compiler that attempts to improve on the code that you write. However, the compiler can only optimize to the extent possible within other constraints. The first is time - PowerBasic's compile process is super-fast, meaning you can stop and compile at any time to make sure your code will compile, and test-drive your code with or without the debugger. Spending too much time overanalyzing every statement would seriously slow the compile process.

The other constraint is the size of the finished program. PowerBasic also boasts the ability to make programs with small footprints, but that means finding ways to handle special cases in ways that are not only expeditious, but give reasonably tight code.

As a consequence, you will likely find ways to further optimize the code by doing exactly this: check the finished Assembler code. But this requires quite a bit of familiarity with certain tools and with Assembler instructions, registers, use of memory, and other aspects of the PC architecture. In other words, it takes time and effort to achieve significant results. There is a general rule that is often quoted in the industry: about 90 percent of a program's execution time is taken up by just 10 percent of the code. You find the ten percent of the code that is hogging so many CPU cycles, and you focus on trying to make it faster. Or you advise the customer to buy a faster computer - that's what Microsoft generally does.
