Comparison between PB's and C/C++ DLL size

Started by Patrice Terrier, March 31, 2013, 09:12:49 PM

Patrice Terrier

I am amazed by the first size comparison I have done between my GDImage compiled with PB and with VS2010.

GDImage 32-bit compiled with PB 9.05 = 352 KB
GDImage 32-bit compiled with PB 10 = 366 KB
and
GDImage 32-bit compiled with C++ = 240 KB
GDImage 64-bit compiled with C++ = 296 KB

Note: I am not using ATL, nor any C++ runtime, nor any OOP class encapsulation. Only the pure flat API (even for GDI+) and native Windows WCHAR/wstring.



Who said anything about bloat?  ;D
Patrice Terrier
GDImage (advanced graphic addon)
http://www.zapsolution.com

Mike Stefanik

That's one of the benefits of link-time code generation: it's not just the compiler that is relied upon to optimize your code; it can defer some of those optimizations to the linker, which has a better "big picture" view of your code and of the ways it can be reduced in size and improved in speed.
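As a concrete sketch (file names hypothetical), on the Microsoft toolchain that deferral is the /GL switch at compile time paired with /LTCG at link time, the same pair that shows up in the command lines quoted later in this thread:

cl /c /O2 /GL example.cpp
link /DLL /LTCG /OPT:REF /OPT:ICF /OUT:example.dll example.obj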
Mike Stefanik
sockettools.com

Patrice Terrier

So far I have been working only on the code translation; I am looking forward to the speed comparison...
Patrice Terrier
GDImage (advanced graphic addon)
http://www.zapsolution.com

Frederick J. Harris

Are you sure you aren't compiling with the /MD option, Patrice?  In my VS 2008 you drill down through Properties >>> C++ >>> Code Generation.  With the /MT option all the required opcodes are compiled into the binary.  With the /MD option you'll need various dependencies packaged with the DLL.  That threw me off at first too when I got VS 2008 Pro.  Somewhere between VC6 and VS 2008 Microsoft changed that, as well as the default character encoding to wide character.  That /MT versus /MD linking option is a completely separate issue from the .NET, C++/CLI, MFC, ATL thing.  When I first got VS 2008 I thought, Wow!  This is beating the crap out of PB size-wise!  Quite a bit later I discovered those 'hidden' switches/settings.

Mike Stefanik

Just as a general observation, presuming that he's building redistributable DLLs that will be used by other applications, he definitely should be using /MT. Mixed dependencies, manifest bindings and redistribution of the runtime (the myriad joys of the SxS native assembly cache and all that good stuff) are just a headache. I doubt that he wants to require the developers who use his DLLs to redistribute the Visual C++ runtime.

That said, if he's doing a straightforward SDK-style conversion (more or less) and isn't making extensive use of templates, runtime classes or MFC -- in other words, if he's mostly writing his code as "plain C with classes" -- then there wouldn't be much that would actually be pulled in by statically linking to the runtime. The vast majority of that code is going to be optimized out. With Visual Studio 2010, it comes to about 25K. If he's using stdio/stdlib functions then it's about 40K of overhead, give or take. Where you start getting "bloat" is when you pull in all of the iostream baggage; then you're looking at up to an additional 125-200K or thereabouts, depending on what you're doing. But at that point, it really ceases to be any kind of apples-to-apples comparison.
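To picture that "plain C with classes" shape, here is a minimal sketch (hypothetical names, not GDImage code) of an SDK-style exported function that touches only the Win32 API, so static linking pulls in very little of the runtime:

#include <windows.h>

// Exported function: plain data in, plain data out; no MFC, ATL or iostream.
extern "C" __declspec(dllexport) int __stdcall AddTwo(int a, int b) {
    return a + b;
}

// Minimal DLL entry point; nothing to initialize in this sketch.
BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved) {
    return TRUE;
}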
Mike Stefanik
sockettools.com

Patrice Terrier

Yes, I am doing a straightforward SDK-style conversion; I do not use any templates, and no runtime classes or MFC.
It is almost plain C, except for the use of <wstring> and <vector>, that's all.

Even for GDIPLUS I do not use the class encapsulation, but direct calls to the flat API, using LoadLibrary.

Here is the command line for C/C++:
/Zi /nologo /W3 /WX- /O2 /Oi /GL /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /D "_USRDLL" /D "GDIMAGE_EXPORTS" /D "_WINDLL" /D "_UNICODE" /D "UNICODE" /Gm- /EHsc /GS /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Fp"x64\Release\GDImage.pch" /Fa"x64\Release\" /Fo"x64\Release\" /Fd"x64\Release\vc100.pdb" /Gz /errorReport:queue

Here is the command line for the linker:
/OUT:"D:\VS2010\GDImage\GDImage\x64\Release\GDImage.dll" /INCREMENTAL:NO /NOLOGO /DLL "Winmm.lib" "Psapi.lib" "Msimg32.lib" "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" "opengl32.lib" /MANIFEST /ManifestFile:"x64\Release\GDImage.dll.intermediate.manifest" /ALLOWISOLATION /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"D:\VS2010\GDImage\GDImage\x64\Release\GDImage.pdb" /SUBSYSTEM:WINDOWS /OPT:REF /OPT:ICF /PGD:"D:\VS2010\GDImage\GDImage\x64\Release\GDImage.pgd" /LTCG /TLBID:1 /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /ERRORREPORT:QUEUE

Here is an example of a direct call to the low-level GDIPLUS flat API:
long GdipCreateBitmapFromGdiDib(IN BITMAPINFO* GdiBitmapInfo, IN VOID* gdiBitmapData, OUT LONG_PTR &hImage) {
    long nRet = -1; // Error
    if (g_GDIPLIB) { // module handle obtained from LoadLibrary
        // Function-pointer type for the flat GDI+ export (stdcall, matching /Gz above).
        typedef long (WINAPI *zProc)(BITMAPINFO*, VOID*, LONG_PTR*);
        zProc hProc = (zProc) GetProcAddress(g_GDIPLIB, "GdipCreateBitmapFromGdiDib");
        if (hProc) { nRet = hProc(GdiBitmapInfo, gdiBitmapData, &hImage); }
    }
    return nRet;
}
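For context, here is a minimal sketch (not the actual GDImage code) of how the g_GDIPLIB module handle used above could be obtained; GdiplusStartup would also have to be called before creating any image, which is omitted here:

#include <windows.h>

static HMODULE g_GDIPLIB = NULL;

// Load the GDI+ flat API once and keep the module handle for GetProcAddress.
BOOL LoadGdiPlusLibrary() {
    if (g_GDIPLIB == NULL) { g_GDIPLIB = LoadLibraryW(L"gdiplus.dll"); }
    return (g_GDIPLIB != NULL);
}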
Patrice Terrier
GDImage (advanced graphic addon)
http://www.zapsolution.com

Mike Stefanik

Based on the command line you have there, I'm guessing that you used the Visual Studio "Win32" project wizard and selected DLL. The default for that project type is /MD, which dynamically links to the Visual C++ runtime library. To statically link to the runtime so that you don't have to redistribute anything but your own DLL, select Project > Properties > Configuration Properties > C/C++ > Code Generation, and set the "Runtime Library" property to "Multi-threaded Debug (/MTd)" for debug builds and "Multi-threaded (/MT)" for release builds.
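For reference, the same choice expressed directly on the compiler command line rather than through the IDE (file name hypothetical):

rem Dynamic CRT: the resulting DLL depends on the Visual C++ runtime DLLs at load time.
cl /c /O2 /MD example.cpp

rem Static CRT: the runtime code is linked into your own DLL; nothing extra to redistribute.
cl /c /O2 /MT example.cpp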
Mike Stefanik
sockettools.com

Patrice Terrier

Mike

Adding the /MT flag adds 89 KB to the resulting DLL.

Thanks for the tip, because I do not want to redistribute anything but my own DLL.

I still have much to learn, especially which compiler flags to use.

...
Patrice Terrier
GDImage (advanced graphic addon)
http://www.zapsolution.com

Mike Stefanik

There are ways you could trim that down further, but honestly it's probably not worth the effort for <100K (and it would mean dumping the use of the STL container classes, etc.).
Mike Stefanik
sockettools.com

Frederick J. Harris

Quote
Based on the command line you have there, I'm guessing that you used the Visual Studio "Win32" project wizard and selected DLL. The default for that project type is /MD, ...

Ahh!  Perhaps that's it.  I was wondering why /MD or /MT wasn't showing up in Patrice's command-line parameters (compiler options).  I had even checked an old post from several months ago where I had discussed this with Patrice, and he had posted what he was using.  I didn't see it there either.  I always start new projects with the 'Empty Project' option, so that VS doesn't include anything at all (or hardly anything).  Must make a difference, I guess.

Frederick J. Harris

Quote
So far I have been working only on the code translation; I am looking forward to the speed comparison...

Say Patrice, are you aware of these threads from over in the PowerBASIC Forums ...

Fast PowerBASIC Compilers statement

http://www.powerbasic.com/support/pbforums/showthread.php?t=42934&highlight=Fast+PowerBASIC+Compilers+Statement


Fast PowerBASIC Compilers Statement Revisited

http://www.powerbasic.com/support/pbforums/showthread.php?t=46299&highlight=Fred+Harris

A lot of pretty interesting reading there, if you have time.

Patrice Terrier

Indeed, I have already been able to make speed comparisons but, of course, only in 32-bit!

And my main motivation to move to C/C++ was the lack of a 64-bit PB version.

I was amazed by the speed of my C# code and of my C/C++ 32-bit code, but to tell the truth they were both using my PB 32-bit GDImage.dll (several examples are available in my C++ programming (SDK style) section, and C# projects are available from my web site).

My deep conviction is that, because my main target is Windows, I can't go wrong selecting the same tools that are used to write the OS itself.
Patrice Terrier
GDImage (advanced graphic addon)
http://www.zapsolution.com

Frederick J. Harris

It's an issue I've given a lot of thought to, Patrice (performance).  What truly amazed me in that thread Jaime started was that the Microsoft compiler was actually looking at the whole algorithm Jaime provided, deciding that it rather sucked, and generating completely different, optimized machine code from what would be expected from the original source.  This was proved when Paul Dixon reverse engineered the Microsoft binary.  Other experiences of my own have led me to believe this is not unusual in C++ compilers.

Take loops, for example.  There are a number of ways loop code can be translated to asm to get the job done, but only a few.  At some point one reaches the point at which further optimization isn't possible.  But now consider source code.  If the compiler can actually identify patterns, as it did with Jaime's code, and come to some 'understanding' of what is being done in a higher-level sense than looking at one line at a time, then the sky is the limit in terms of optimization.  And I further believe that the patterns generated by C++ classes are oftentimes something the compiler can recognize, and in some cases optimize away, so that what actually gets written into the binary is high-performance C-type code which runs fast.

Where I came to feel this way was in my experimentation with string classes.  I had my own string class, and then there is the string class from the C++ Standard Library, which you've decided to use.  With my string class I was never able to come within one one-hundredth of one percent of the speed of the Standard Library's string class with respect to string concatenation, i.e., operator+=.  In other cases I was able to beat the speed of the Standard Library's string class, but not with respect to that one operation.  In fact, the only way I could beat the speed of the Standard Library's concatenation operation was to descend to the level of C, using the string primitives in string.h, i.e., strcat(), strcpy(), etc.  I believe these are rather thin wrappers over the asm string primitives which blast bytes around.  And while I tried everything I could think of to improve the performance of the concatenation operations in my string class, I never really succeeded.  I almost reached the point of attempting to learn the STL so I could understand how the GNU coders did it, but because I hate templates so much I gave it up.

But of one thing I'm quite sure, and that is that the compiler is somehow optimizing away, as it did with Jaime's algorithm, all the calls required in my string class to add strings together.  There is simply too much overhead involved in that.  For example, I had debug output statements in it, and saw how everything was happening.  There are too many calls involved for it to go fast.  So when I looked at the speed at which the Standard Library's string class was doing concatenations, my realization occurred that it wasn't doing it the way my string class did, but that the compiler was somehow recognizing the underlying pattern generated by the class and outputting optimized C-type code, like strcat().  It's my feeling that the folks who wrote that were very, very good - as Paul Dixon noted when he deconstructed Jaime's binary.  In other words, they created the class in such a way that they knew the compiler would optimize away their higher-level abstractions.

And so you have there a really interesting idea.  Let programmers create all the abstraction layers and indirections they need to think through some problem, then let the compiler optimize it all away so as to create fast code.  You know, we all think badly of spaghetti code, realizing it's no way to work.  But a computer doesn't really care about it.  In fact, anthropomorphizing a bit, the computer probably likes it better, and if the code works, it'll be executed correctly the first time, the second time, and every time thereafter!

So I believe that's one difference between PowerBASIC code and C++ code.  I don't believe the PowerBASIC compiler is trying to figure out what you are doing; it just wants to generate, line by line, the best translation into machine code it can.  I think the C++ compiler is many times trying to optimize higher-level patterns it's able to detect in the source.  Just my opinion.
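To make the concatenation comparison concrete, here is a minimal timing sketch (not the actual test code from either string class, and the numbers will vary with the compiler, the /O2 and /MT settings, and the library implementation) pitting the Standard Library's operator+= against the string.h primitives; the C side keeps a pointer to the end of the buffer, since calling strcat() in a loop would rescan the whole string on every append:

#include <chrono>
#include <cstdio>
#include <cstring>
#include <string>

int main() {
    const int N = 200000;
    const char piece[] = "0123456789";
    const size_t pieceLen = sizeof(piece) - 1;

    // std::string concatenation via operator+=
    auto t0 = std::chrono::steady_clock::now();
    std::string s;
    for (int i = 0; i < N; ++i) { s += piece; }
    auto t1 = std::chrono::steady_clock::now();

    // C-style concatenation: strcpy at a tracked end position
    char* buf = new char[N * pieceLen + 1];
    auto t2 = std::chrono::steady_clock::now();
    char* end = buf;
    for (int i = 0; i < N; ++i) { strcpy(end, piece); end += pieceLen; }
    auto t3 = std::chrono::steady_clock::now();

    using us = std::chrono::microseconds;
    printf("std::string += : %lld microseconds\n",
           (long long)std::chrono::duration_cast<us>(t1 - t0).count());
    printf("strcpy/pointer : %lld microseconds\n",
           (long long)std::chrono::duration_cast<us>(t3 - t2).count());
    delete[] buf;
    return 0;
}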

Charles Pegge

Splitting the compilation into different layers makes it possible to carry out different optimisations. Using registers instead of variables wherever possible helps, as does eliminating expensive instructions like division by a constant. Internal OOP calls can also be tuned up by making them direct, and small procedures can be turned into inline macro code.
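A small illustrative sketch of two of those transformations (a divide by a constant power of two reduced to a shift, and a tiny procedure that would typically be inlined at the call site); an optimizing compiler applies these automatically, the code just shows the shape of the rewrite:

#include <cstdio>

// A small procedure a compiler will usually turn into inline code at the call site.
static inline unsigned Scale(unsigned v) { return v * 3u + 1u; }

int main() {
    unsigned total = 0;
    for (unsigned i = 1; i <= 16; ++i) {
        unsigned eighth = i >> 3;   // i / 8 reduced to a shift, valid because 8 is a power of two
        total += Scale(eighth);
    }
    printf("total = %u\n", total);
    return 0;
}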