[SOLVED] Forward Reference Prediction Problem

Started by Mike Lobanovsky, March 01, 2011, 03:05:02 PM


Mike Lobanovsky

Hello Jose, Charles, Steve and everybody here,

It has been some time since I registered with your magnificent forum, and I have received an occasional helping hand from some of you in the past, which I appreciate very much. My main sphere of interest happens to be a little different, though: PB and Asm are handy but still subsidiary tools in my primary area of development.

As I've mentioned earlier, I'm a co-developer of a Basic-like interpreting language called Freestyle Basic Script Language (FBSL). For those who might be interested to know, it is hosted at http://www.fbsl.net/.

The interpreting environment of FBSL has its merits, but regretfully it also has the drawbacks common to all virtual machines, one of which is insufficient speed for certain time-critical applications. To bypass this limitation, I introduced a Dynamic Assembler feature a while ago -- a just-in-time assembler that compiles MASM-like user-defined code blocks into executable native opcode while the parent FBSL script is being loaded. The Dynamic Assembler layer is still in beta, though it has already been integrated into the current FBSL v3 distribution package together with its description and a prototype Asm instruction reference file.
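
For readers who have never touched JIT code, here is a minimal Win32 sketch in C of the general "assemble into memory, then call it" technique. This is purely an illustration -- not Fbsl source; make_executable() and JitFunc are names I made up for this example:

    /* Minimal Win32 sketch of the general "assemble to memory, then call it"
       technique. NOT Fbsl source code -- just an illustration with made-up names. */
    #include <windows.h>
    #include <string.h>

    typedef int (__cdecl *JitFunc)(void);

    static JitFunc make_executable(const unsigned char *code, size_t len)
    {
        /* commit writable memory, copy the assembled bytes, then flip the
           page protection to executable before handing out the pointer */
        void *mem = VirtualAlloc(NULL, len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        DWORD old;
        if (!mem) return NULL;
        memcpy(mem, code, len);
        VirtualProtect(mem, len, PAGE_EXECUTE_READ, &old);
        FlushInstructionCache(GetCurrentProcess(), mem, len);
        return (JitFunc)mem;
    }

    int main(void)
    {
        /* mov eax, 42 / ret -- a trivial stand-in for a user Asm block */
        static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
        JitFunc f = make_executable(code, sizeof(code));
        return f ? f() : 1;
    }

The real Dynamic Assembler naturally emits the bytes itself rather than hard-coding them; the point here is only the executable-memory plumbing.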

It has certain limitations, imposed mainly by the fact that the FBSL binaries (an exe version and a dll version) are non-modular, all-in-one products, so the size-to-versatility ratio is of prime importance.

I am facing a little problem, or rather an inconvenience, with the way I'm currently processing forward references found within the user-defined Asm blocks of an FBSL script. Perhaps it is due to my lack of sufficient general "eddication" :) or just a temporary inability to generate fresh ideas, whatever...

I will not waste space here re-writing what I have already written for my fellow FBSL-ers. Here is the link to my post on the FBSL forum. The issue is outlined there in simple terms so that every FBSL-er can grasp the idea, which means it will certainly be clear to anyone as closely involved with Assembler programming as many of you are.

Regretfully, I am not entitled to open my part of the project for everyone to examine the sources, because the FBSL project is not open-sourced and its distribution package, while full-featured, is free for non-commercial purposes only.

Nonetheless, perhaps somebody would find it worthwhile to enlighten me on possible solutions and/or point me to sources where I could find relevant information. A fresh eye cast on a problem always seems somewhat sharper than usual... :)

Thank you all in advance,

Mike Lobanovsky
Mike
(3.6GHz Intel Core i5 w/ 16GB RAM, 2 x GTX 650Ti w/ 2GB VRAM, Windows 7 Ultimate Sp1)

Mike Lobanovsky

#1
Pondering the problem once again, I narrowed it down to the following:

i/ if I still want my assembly to run as fast as it can with the existing recursive descent parser and a lightweight second pass, I have to keep assembling the opcode in the first pass using fixed-length blanks for the jmp/call instructions with forward references, to be actually filled in during the second pass (see the sketch after this list). Wasted clocks and NOP-filled gaps, caused by the difference in byte lengths between e.g. short and near jumps, are the inevitable penalty for not recomputing the instruction sizes and exact relative offsets in the second pass. Still, the overall parsing-plus-assembly time appears to be shorter, while the slight inferiority of the resultant executable opcode -- a dozen extra cycles and bytes per Asm function -- can hardly be detected within an interpreted environment;

ii/ a full-blown, computationally intensive second pass that re-evaluates the instruction sizes and exact relative offsets yields denser, shorter opcode without cycles wasted on padding, but it apparently increases the overall application launch time, which may be noticeable on slower machines.
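
To put strategy i/ into code for those who prefer it that way, here is a rough C sketch of how I read the fixed-length-blank approach -- an illustration with made-up names, not Fbsl source. Pass one reserves the 5-byte near-jump slot and records a fixup; pass two patches the slot, and the three NOPs that appear whenever a short jump would have sufficed are exactly the wasted bytes mentioned in i/:

    /* Strategy i/ sketched in C (illustration only, forward references only).
       Pass one reserves a fixed 5-byte slot for every forward jmp and records
       a fixup; pass two fills the slot once the label offset is known. */
    #include <stdint.h>
    #include <stddef.h>

    #define NEAR_JMP_LEN  5          /* E9 rel32 */
    #define SHORT_JMP_LEN 2          /* EB rel8  */

    typedef struct { size_t slot; int label; } Fixup;

    /* pass one: leave a blank of the maximum (near) size at *at */
    static void reserve_fwd_jmp(size_t *at, Fixup *fx, int label)
    {
        fx->slot  = *at;
        fx->label = label;
        *at += NEAR_JMP_LEN;
    }

    /* pass two: patch the slot; if a short jump would have been enough,
       the remaining 3 bytes become NOPs -- the "wasted bytes" penalty */
    static void patch_fwd_jmp(uint8_t *buf, const Fixup *fx, size_t target)
    {
        int32_t rel8  = (int32_t)(target - (fx->slot + SHORT_JMP_LEN));
        int32_t rel32 = (int32_t)(target - (fx->slot + NEAR_JMP_LEN));
        if (rel8 >= -128 && rel8 <= 127) {
            buf[fx->slot]     = 0xEB;                  /* short jmp rel8 */
            buf[fx->slot + 1] = (uint8_t)rel8;
            buf[fx->slot + 2] = 0x90;                  /* NOP padding    */
            buf[fx->slot + 3] = 0x90;
            buf[fx->slot + 4] = 0x90;
        } else {
            buf[fx->slot]     = 0xE9;                  /* near jmp, rel32 little-endian */
            buf[fx->slot + 1] = (uint8_t)(rel32);
            buf[fx->slot + 2] = (uint8_t)(rel32 >> 8);
            buf[fx->slot + 3] = (uint8_t)(rel32 >> 16);
            buf[fx->slot + 4] = (uint8_t)(rel32 >> 24);
        }
    }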

So it's again all about the same old speed-versus-quality trade-off.

Possible remedy: choose strategy ii/ but improve the parser speed.
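
And here, for completeness, is the usual trick that keeps strategy ii/ affordable, again sketched in C with made-up names rather than taken from Fbsl: the second pass never re-parses the source; it only iterates over per-branch records collected in pass one, starting every forward branch in its short form and widening the ones whose displacement does not fit, until nothing changes -- which for typical functions takes a sweep or two:

    /* Iterative branch "relaxation" over pass-one records (illustration only).
       Each record remembers where the branch sits and where its label is;
       widening a branch from 2 to 5 bytes shifts everything behind it by 3. */
    #include <stddef.h>

    typedef struct {
        size_t offset;    /* offset of the branch instruction         */
        size_t target;    /* offset of its forward label              */
        int    is_short;  /* 1 = 2-byte EB rel8, 0 = 5-byte E9 rel32  */
    } Branch;

    /* one sweep; returns how many branches had to be widened */
    static int relax_once(Branch *b, size_t n)
    {
        int grew = 0;
        for (size_t i = 0; i < n; ++i) {
            if (!b[i].is_short) continue;
            long rel = (long)b[i].target - (long)(b[i].offset + 2);
            if (rel < -128 || rel > 127) {
                b[i].is_short = 0;                    /* 2 -> 5 bytes       */
                for (size_t j = 0; j < n; ++j) {      /* shift what follows */
                    if (b[j].offset > b[i].offset) b[j].offset += 3;
                    if (b[j].target > b[i].offset) b[j].target += 3;
                }
                ++grew;
            }
        }
        return grew;
    }

    static void relax_all(Branch *b, size_t n)
    {
        while (relax_once(b, n) > 0)
            ;   /* converges: a branch can only grow, never shrink */
    }

Once the sizes are stable, the final offsets are known and the opcode can be emitted and patched in a single linear run.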

Am I making sense this time? If yes then is my deduction correct? If yes again then is that the only sensible solution? :)

Perhaps I should have launched this thread in a more Asm-oriented subforum but frankly, I didn't find any that would explicitly encourage topics outside the scope of PB except this one. If I am mistaken then may I kindly ask the Moderator to "mov" this topic to a more appropriate location of his choice? :)

Thank you.



Mike Lobanovsky
Mike
(3.6GHz Intel Core i5 w/ 16GB RAM, 2 x GTX 650Ti w/ 2GB VRAM, Windows 7 Ultimate Sp1)

Theo Gottwald

The trend is simple:
- forget compilation speed and work on "compiler intelligence" instead (hardware keeps getting faster on its own -- a long-term trend)
- prefer fast and small code results, even if compilation takes longer (always prefer quality)
- if you are not sure you are going the right way, always prefer reliability and quality.



Mike Lobanovsky

#3
Hello Theo,

I appreciate your prompt response and thank you for your pieces of advice.

1. May I just stress once again that, in contrast to some other relatively recent modular products in this area which I am sure both of us know, I am working within the constraints of an integral executable -- the Fbsl binary. With all the features it now offers the end user, including the Dynamic Assembler layer, its uncompressed size is still under 500KB.

We've been testing the Dynamic Assembler layer for quite some time now. Its syntax is almost identical to that of MASM32 (or to the minimalistic "microassembler" version of the latter) or PB's inline assembler, which allows us to run some basic benchmarks on identical pieces of asm code. For example, I can use Steve Hutchesson's Basic source code tokenizers as Fbsl Asm functions without any modification except swapping the LOCAL aliases for their "dword ptr [ebp - X]" prototypes and prepending "@" label markers instead of trailing colons, or Charles Pegge's matrix multiplication gallery with similar changes, etc. The results are encouraging.

Of course, dedicated standalone compilers like PB and MASM32 generate somewhat denser, shorter equivalent opcode, for the reasons I stated earlier. But Fbsl also lets you run such "alien" ( :) ) opcode via good old CallAbsolute() calls within its interpreted environment. Comparative tests show no perceptible, or even detectable, deficiency of Dynamic Assembler opcode due to the extra CPU clocks -- even in real-time, computation-intensive applications such as rendering dynamic multi-polygon 3D objects -- as compared with the irreproachable PB or MASM32 samples. Perhaps this is also due to hardware getting faster, as you mentioned in your response.

2. I am absolutely sure I am going the right way. A "load-and-go", or just-in-time, assembler is to an interpreter almost exactly what inline asm is to compilers like PB or C, with the impact being far more tangible in speed-critical bottlenecks, where a JIT-assembled routine may run up to three orders of magnitude faster than its interpreted counterpart.

3. In view of my responsibility to the as-yet "non-assembl(eriz)ed" ( :) ) majority, a fast and small Dynamic Assembler that takes no longer to compile its input than it takes to load and launch an average pure Fbsl script is exactly my idea of quality code results. Given the constraints of the end-product integrity requirement, which is itself a distinctive feature, Fbsl's dynamic assembly support will always be a reasonable compromise between the end product's size and the value of the features added.

What I am seeking is, in fact, some expert enlightenment on assembler methodology. I am not sure my knowledge is sufficient to come up with an alternative design for a single-threaded dynamic assembler with a competitive set of features that would:
- assemble at a speed of at least 3MB of source code per second, preferably higher, on an average 2GHz PC;
- be devoid of the extra clocks/extra bytes deficiency of current implementation;
- compile to well under 40KB together with all its opcode tables and other paraphernalia.
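
Should anyone want to check the first figure on their own machine, timing the assembler over a known amount of source is enough. Below is a hypothetical C harness; assemble_block() is only a dummy stand-in for the real Dynamic Assembler entry point, not an actual Fbsl function:

    /* Hypothetical throughput check. assemble_block() below is a dummy
       stand-in that merely touches every source byte; swap in the real
       assembler call to get a meaningful MB/s figure. */
    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static volatile unsigned sink;

    static void assemble_block(const char *src, size_t len)   /* stand-in */
    {
        unsigned h = 0;
        for (size_t i = 0; i < len; ++i) h = h * 31u + (unsigned char)src[i];
        sink = h;
    }

    int main(void)
    {
        size_t len = 3 * 1024 * 1024;            /* 3 MB of fake "source" */
        char *src = (char *)malloc(len);
        if (!src) return 1;
        memset(src, 'x', len);

        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&t0);
        assemble_block(src, len);
        QueryPerformanceCounter(&t1);

        double secs = (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
        printf("%.2f MB/s\n", (len / (1024.0 * 1024.0)) / secs);
        free(src);
        return 0;
    }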

Anyway, thank you again for your attention.


Mike Lobanovsky
Mike
(3.6GHz Intel Core i5 w/ 16GB RAM, 2 x GTX 650Ti w/ 2GB VRAM, Windows 7 Ultimate Sp1)

Mike Lobanovsky

Hello,

Thanks for your attention. I've been able to solve the problem.

Mike Lobanovsky
Mike
(3.6GHz Intel Core i5 w/ 16GB RAM, 2 x GTX 650Ti w/ 2GB VRAM, Windows 7 Ultimate Sp1)

Theo Gottwald

As said, in terms of compiler speed I can't imagine anybody beating, or even coming near, PowerBasic 10.
Anyway, for me speed is not as important as comfort.

We spend most of our time not on programming but on debugging.
That's why good error messages and intelligent compiler diagnostics are more important than one, two, or even ten seconds less compile time.
At least, that's my opinion.