PB and not being unicode

Edwin Knoppert · August 16, 2009, 09:27:21 PM

The Unicode problem, as in this thread ( http://www.powerbasic.com/support/pbforums/showthread.php?t=41248 ), seems to come up more and more often.
I have sometimes thought about writing a full Unicode version of PwrDev.
At least the SDK windows part should be Unicode.

Though I wonder whether this would be beneficial, since the compiler itself is ANSI with only a few Unicode conversion features.
But I also see C compilers using L"text here" to let you type ANSI text that is converted to wide characters during compilation.
This means the C compiler is not so Unicode after all.
In C# and VB.NET everything is Unicode, yet you still enter text as if it were ANSI. When will one be able to enter Unicode directly?

Frankly, I think that as long as the compiler does not support Unicode natively, a tool like PwrDev will not really benefit.
I am a bit lost on this subject. What should my perspective on it be?

Anyone jumping in on this subject is welcome.

Patrice Terrier · August 16, 2009, 10:13:10 PM

GDIPLUS uses Unicode too, but UCODE$ takes care of that.
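UCODE$ converts an ANSI string to Unicode so it can be handed to a wide-character API such as GDI+. As a rough C analogue (not PowerBASIC's actual implementation), the standard mbstowcs function does the same kind of narrow-to-wide conversion. The helper name to_wide is hypothetical, and it converts using the current C locale rather than the Windows ANSI code page.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, roughly analogous to PowerBASIC's UCODE$:
   converts a narrow (multibyte) string to a newly allocated wide string.
   Uses the current C locale, NOT the Windows ANSI code page.
   Returns NULL on an invalid sequence or allocation failure;
   the caller must free() the result. */
wchar_t *to_wide(const char *narrow)
{
    /* With a NULL destination (allowed by POSIX and MSVC),
       mbstowcs only counts the wide characters needed. */
    size_t len = mbstowcs(NULL, narrow, 0);
    if (len == (size_t)-1)
        return NULL;                       /* invalid multibyte sequence */

    wchar_t *wide = malloc((len + 1) * sizeof *wide);
    if (wide == NULL)
        return NULL;

    mbstowcs(wide, narrow, len + 1);       /* also writes the terminating L'\0' */
    return wide;
}
```

On Windows itself, MultiByteToWideChar with CP_ACP is the closer match to converting from the ANSI code page.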

Edwin Knoppert · August 16, 2009, 10:54:42 PM

I have thought this over a bit.
Visual development tools, Visual Studio in this example, let you enter ANSI text for a control's property.
Does this mean Unicode can only be set in code fragments, or did I miss the point somewhere?
If the development environment always accepts ANSI text, then a rewrite of my PwrDev would be simple... for SDK mode, that is.
Just convert all the text of these properties.

>GDIPLUS, is using UNICODE too, but UCODE$ does it.
Not sure I follow; I am aware that the Unicode parts will work fine if you convert the text.

In C# you may write text as if it were ANSI, but it isn't:
String T = "Hello World";
This will be converted to Unicode.
So a string is Unicode, and a Char is a 16-bit Unicode (UTF-16) character, not a byte.
This is a compiler issue of course.
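To make the "a Char is not a byte" point concrete outside C#, here is a small C11 sketch, offered only as an analogy: a u"..." literal stores UTF-16 code units in char16_t, much as a C# string stores 16-bit chars, while the same text encoded as UTF-8 needs a varying number of bytes per character. The \u escapes keep the example independent of the source file's encoding; the array and function names are illustrative only.

```c
#include <stddef.h>
#include <uchar.h>   /* char16_t (C11) */

/* The two characters U+00E9 (e with acute accent) and U+20AC (euro sign). */
static const char16_t      utf16_text[] = u"\u00E9\u20AC";  /* one 16-bit unit each */
static const unsigned char utf8_text[]  = u8"\u00E9\u20AC"; /* 2 bytes + 3 bytes */

/* Number of UTF-16 code units, excluding the terminator. */
size_t utf16_units(void) { return sizeof utf16_text / sizeof utf16_text[0] - 1; }

/* Number of UTF-8 bytes, excluding the terminator. */
size_t utf8_bytes(void) { return sizeof utf8_text - 1; }
```

Both strings hold the same two characters; only the storage unit differs. Characters outside the Basic Multilingual Plane would take two UTF-16 units (a surrogate pair), which is also true of a C# string.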

A compiler like LCC will let you write a Unicode literal like this:
LPCWSTR T = L"Hello World";
This is an ANSI compiler, like PB's.
Its plain strings store the data in 8-bit format (one byte per character).
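The difference between a plain literal and an L-prefixed one can be checked with sizeof. In this C sketch the plain literal takes one byte per character, while the L literal takes sizeof(wchar_t) bytes per character: 2 on Windows compilers such as LCC or MSVC, but typically 4 with GCC on Linux, so the check compares against sizeof(wchar_t) rather than a fixed number. A wide literal also needs a wide pointer type (LPCWSTR, i.e. const wchar_t *), not LPSTR.

```c
#include <stddef.h>
#include <wchar.h>

/* 11 visible characters plus the terminating NUL = 12 elements each. */
static const char    narrow_hello[] = "Hello World";  /* 1 byte per character */
static const wchar_t wide_hello[]   = L"Hello World"; /* sizeof(wchar_t) bytes per character */

size_t narrow_size(void) { return sizeof narrow_hello; } /* 12 */
size_t wide_size(void)   { return sizeof wide_hello; }   /* 12 * sizeof(wchar_t) */
```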

José Roca · August 16, 2009, 11:36:06 PM

Just wait until PB implements native Unicode support. Anything else is a waste of time.

Frederick J. Harris · August 17, 2009, 06:25:15 PM

I've given the issue a lot of thought too, Edwin, and just within the last couple of weeks I've made what was, for me, a major turnaround on it. I spend about half my time doing data recorder programming for Windows CE, and that operating system mostly uses two-byte wide characters. I really hate it, because I'm so accustomed to using the char data type. Of course, what all the books and authorities have been saying for years is to use the TCHAR data type and let the preprocessor do the macro expansions, depending on whether _UNICODE is defined or not when you #include <tchar.h>.
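The mechanism behind TCHAR and _T is plain preprocessor substitution. Because <tchar.h> is a Microsoft header, this portable C sketch imitates it with hypothetical MY_ names; the real header does the same job with TCHAR, _T, _tcslen, _tcscmp and the _UNICODE switch. The function is_sql_server is also hypothetical, modelled on the driver test in the example code.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for what <tchar.h> does. The real names are
   TCHAR, _T(), _tcslen, _tcscmp, selected by whether _UNICODE is defined. */
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_T(s)   L##s      /* "abc" becomes L"abc" */
#define my_tcslen wcslen
#define my_tcscmp wcscmp
#else
typedef char MY_TCHAR;
#define MY_T(s)   s         /* literal left unchanged */
#define my_tcslen strlen
#define my_tcscmp strcmp
#endif

/* The same source compiles as either ANSI or wide-character code. */
int is_sql_server(const MY_TCHAR *driver)
{
    return my_tcscmp(driver, MY_T("SQL Server")) == 0;
}
```

Compiling with -DMY_UNICODE turns every MY_TCHAR into wchar_t and every MY_T("...") into L"..." without touching is_sql_server, which is exactly the trade the macros buy at the cost of the clutter.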

I just hate the look of the clutter in my code. As far as I'm concerned, it makes C/C++ code about five times uglier than it already is. Here is an example of what I mean, using the _T("Hello, World!") macro in conjunction with all the tchar.h redefinitions of the ANSI & Unicode string functions...



void MakeConnectionString(ConnectionString& cn)
{
 String s1;

 if(!_tcscmp(_T("SQL Server"),cn.szDriver))
 {
    _ftprintf(fp,_T("  Making Connection String For Microsoft SQL Server...\n"));
    if(!_tcscmp(cn.szDBQ,_T("")))
    {
       //want to connect to SQL Server but not to any database
       _ftprintf(fp,_T("  want to connect to SQL Server but not to any database\n"));
       s1 = _T("DRIVER=");
       s1 = s1 + cn.szDriver + _T(";") + _T("SERVER=") + cn.szServer + _T(";");
       _tcscpy(cn.szConnStr,s1.lpStr());
       return;
    }
    else
    {
       //We've got a SERVER, DATABASE name and a path to the database, i.e., DBQ
       _ftprintf(fp,_T("  We've got a SERVER, DATABASE name and a path to the database, i.e., DBQ\n"));
       s1 = _T("DRIVER=");
       s1 = s1 + cn.szDriver + _T(";") + _T("SERVER=") + cn.szServer + _T(";") + \
       _T("DATABASE=") + cn.szDatabase + _T(";") + _T("DBQ=") + cn.szDBQ + _T(";");
       _tcscpy(cn.szConnStr,s1.lpStr());
       return;
    }
    return;
 }
 if(!_tcscmp(_T("Microsoft Access Driver (*.mdb)"),cn.szDriver))
 {
    _tprintf(_T("Making Connection String For Microsoft Access...\n"));
    s1=_T("DRIVER=");
    s1 = s1 + cn.szDriver + _T(";") + _T("DBQ=") + cn.szDBQ + _T(";");
    _tcscpy(cn.szConnStr,s1.lpStr());
    return;
 }
 if(!_tcscmp(_T("Microsoft Excel Driver (*.xls)"),cn.szDriver))
 {
    _tprintf(_T("Making Connection String For Microsoft Excel...\n"));
    s1=_T("DRIVER=");
    s1 = s1 + cn.szDriver + _T(";") + _T("DBQ=") + cn.szDBQ + _T(";");
    _tcscpy(cn.szConnStr,s1.lpStr());
    return;
 }
}

But I've decided to take the Unicode route nonetheless. For many, many years now I've just been prepending the 'L' to my strings and using the wchar_t type instead of going all the way and using the UNICODE macros. It was a compromise I've decided, rather painfully, to abandon. Now I'm just going to go ahead and use the lousy macros.

What caused me to change my mind and stop fighting the UNICODE thing is that I have some major pieces of software I've developed that I'm afraid I'm going to have to port to desktop Windows as well as to other Windows CE devices. I finally had to get Visual Studio 2008, and with it I see that UNICODE is defined by default, so the wide-character versions of the API functions get called by default. You can go into the project settings and change a project's character set to ANSI, but I've just decided to stop fighting the thing (after about 9 years) and go with UNICODE and the macros.
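Defining UNICODE matters because most Win32 functions exist twice, for example SetWindowTextA and SetWindowTextW, and the Windows headers map the generic name onto one of them. A portable C sketch of that mapping follows; ShowTextA, ShowTextW, ShowText and MY_TEXT are hypothetical stand-ins for a real A/W pair and the TEXT() macro.

```c
#include <wchar.h>

/* Hypothetical ANSI and wide variants, standing in for real pairs such as
   SetWindowTextA / SetWindowTextW. They only report which one ran. */
int ShowTextA(const char *text)    { (void)text; return 'A'; }
int ShowTextW(const wchar_t *text) { (void)text; return 'W'; }

/* Mirrors how the Windows headers choose a variant: the generic name is a
   macro whose target depends on whether UNICODE is defined. */
#ifdef UNICODE
#define ShowText   ShowTextW
#define MY_TEXT(s) L##s
#else
#define ShowText   ShowTextA
#define MY_TEXT(s) s
#endif
```

Calling ShowText(MY_TEXT("hello")) resolves to the A variant in an ANSI build and to the W variant once UNICODE is defined; calling the W function by its explicit name works regardless of the setting.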

Last week, to my pleasant surprise, I discovered that the ODBC API has dual definitions (...A and ...W) of all the ODBC functions, and I didn't have any trouble converting my ODBC routines to Unicode (thank goodness).

So, I'm not sure how all this shakes out with PowerBASIC. So far I haven't had any trouble reading and writing files created with my data recorder programs in PowerBASIC programs. There are all kinds of conversion functions, and you can even write your own using MOD 2 and skipping through the bytes like that (I did that before I discovered all the conversion functions, believe it or not!).
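The "MOD 2" approach amounts to walking a UTF-16 little-endian buffer two bytes at a time and keeping the low byte. The C sketch below assumes little-endian data with no byte-order mark and, since a real ANSI conversion (WideCharToMultiByte on Windows, for instance) depends on the code page, it only keeps 7-bit ASCII characters and substitutes '?' for anything else. The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: converts UTF-16LE bytes to a narrow string by reading
   two bytes per character ("skipping through the bytes"). Characters outside
   7-bit ASCII become '?' (a surrogate pair yields two '?').
   'out' must hold at least nbytes / 2 + 1 chars. Returns the number of
   characters written, not counting the terminating NUL. */
size_t utf16le_to_ascii(const unsigned char *in, size_t nbytes, char *out)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < nbytes; i += 2) {         /* step over byte pairs */
        unsigned int unit = in[i] | (in[i + 1] << 8);    /* low byte comes first */
        out[n++] = (unit < 0x80) ? (char)unit : '?';
    }
    out[n] = '\0';
    return n;
}
```

For example, the bytes 'H', 0, 'i', 0, 0xAC, 0x20 ("Hi" followed by a euro sign) come out as "Hi?".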

Edwin Knoppert · August 19, 2009, 05:58:24 AM

There are differences, yes. Like PB, the code you show is for an ANSI C compiler and requires conversion from ANSI to Unicode by using 'L'.
In C# this is all different; it seems one enters ANSI text in the code and properties, but it is translated to Unicode.
Even if PB is mostly ANSI, that is beside the point; the question is what to do with my IDE. Does it need to be prepared to accept Unicode at some point in the properties or code parts?
I don't know how that would behave... I suspect this may be an issue in countries where Unicode is a must?
I am confused; maybe it is nothing more than what we see: code and properties are ANSI but get stored as Unicode.
Would this mean a programmer never sets Unicode in the properties and passes Unicode characters to controls and code only through coding?

If I ever add an option to let users create Unicode forms, they would still enter property values and code as ANSI.
For these forms the text will need to be converted to Unicode, so I suspect they will need something like this:
SetWindowText( hWnd, UCODE$( "hello" ) )
(Due to the fact that PB strings are ANSI.)

There are in fact two parts to discuss. The PB compiler being ANSI is what we have, and indeed, maybe we should wait until strings are Unicode by default.
Imagine I prepare such things and the compiler becomes Unicode-based at some point...?

José Roca · August 19, 2009, 06:47:21 AM

Quote
For these forms the text will need to be converted to Unicode, so I suspect they will need something like this:
SetWindowText( hWnd, UCODE$( "hello" ) )

No. You will have to use SetWindowTextW.

Quote
Imagine I prepare such things and the compiler becomes Unicode-based at some point...?

I'm sure that native Unicode support will be implemented sooner or later. It is unavoidable. Having to use UCODE$ and ACODE$ is a pain and very inefficient.

Edwin Knoppert · August 19, 2009, 08:30:21 AM

SetWindowTextW(), yes, of course.
