Unicode and Win32

The ANSI character set uses 8 bits per character, so it can represent 256 distinct characters.

The char data type stores one character. Unicode or wide characters do not affect the meaning of the char data type in C.

#include <stdio.h>
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
	char z = 'a';
	char *pZ = &z;

	printf("%c\n", z);
	//sizeof yields a size_t, so cast it to match the %u specifier
	printf("sizeof(z) = %u byte\n", (unsigned)sizeof(z));
	//pointer will be either 4 or 8 bytes, depending on settings
	//%p expects a void pointer
	printf("%c \t %p\n", *pZ, (void *)pZ);
	printf("sizeof(pZ) = %u bytes\n", (unsigned)sizeof(pZ));

	return 0;
}

Unicode was originally designed as a uniform two-byte (16-bit) system. The first version of the Unicode standard was published in 1991. Windows uses Unicode internally, in the 16-bit UTF-16 encoding, and so does the Java programming language.

Wide characters in C are based on the wchar_t data type. On Windows, wchar_t is a 16-bit type, the same size as an unsigned short (older Microsoft compilers defined it as a typedef for unsigned short).

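#include <stdio.h>
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
	wchar_t a = L'a';
	//the array needs room for the terminating null character
	wchar_t arr[3] = {L'H', L'i', L'\0'};

	//wide characters and strings are printed with wprintf()
	wprintf(L"%lc \t is %u bytes.\n", a, (unsigned)sizeof(a));
	wprintf(L"%ls \t is %u bytes.\n", arr, (unsigned)sizeof(arr));

	return 0;
}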

When we initialize a pointer to a wide-character string, we should preface the string literal (the text in double quotation marks being assigned) with a capital letter L. The capital L, for long, placed immediately before the first double quotation mark tells the compiler that the string should be stored with wide characters.

The LPWSTR type specifies a pointer to a sequence of Unicode (wide) characters that may or may not be terminated by a null character.

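#include <Windows.h>
#include <stdio.h>
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
	//string literals are read-only, so point to them through const
	const wchar_t *pPtr = L"Nice boat!";
	wchar_t arrStr[] = L"Do you have stairs in your house?";
	//pointer to wide string
	LPWSTR lpStr = arrStr;

	wprintf(L"%ls\n%ls\n", pPtr, lpStr);

	return 0;
}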

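To find the length of a regular character string, we use the strlen() function. However, strlen() expects a pointer to char, so it is the wrong tool for a null-terminated wchar_t array: even if we cast the pointer, the count stops at the first zero byte, which in a wide string of ordinary Latin text turns up within the first character. The wide-character version of the strlen() function is wcslen().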

The wprintf() function is the wide-string equivalent of the printf() function. It replaces the format specifiers in the format string in the same manner that printf() does.

#include <Windows.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
	const char *strOne = "El Psy Congroo.";
	const wchar_t *wStrTwo = L"Han shot first.";
	const wchar_t *wStrThree = L"You are likely to be eaten by a grue.";

	//strlen() and wcslen() return a size_t, so cast it to match %u
	printf("String 1 is %u characters long.\n", (unsigned)strlen(strOne));
	printf("String 2 is %u characters long.\n", (unsigned)wcslen(wStrTwo));
	printf("String 3 is %u characters long.\n", (unsigned)wcslen(wStrThree));

	printf("String 1 is %s\n", strOne);
	//%ls marks a wide string argument
	wprintf(L"String 2 is %ls\n", wStrTwo);
	wprintf(L"String 3 is %ls\n", wStrThree);

	return 0;
}

Note that in wprintf() the format string itself is a wide character string.

Similarly, the wcsncpy() function is the wide character version of the strncpy() function. It copies a specified number of characters from a source to a destination string. Likewise, the wcsncat() function is the wide character equivalent of the strncat() function, which appends a specified number of characters from a source to a destination string.

Microsoft has produced a set of generic string functions; they are generic because they can refer to either the Unicode or non-Unicode versions of the functions. The preprocessor will map the generic function onto the right one for the character set to use.

For example, the _tcscpy() function will copy a generic string; it maps onto the strcpy() or wcscpy() function, depending on whether _UNICODE is defined. We can also use the _T() macro in place of the L prefix used for Unicode strings.

The character type used for string parameters to these functions is TCHAR. Depending on whether the program is built for Unicode, TCHAR is mapped to either char or WCHAR. We can then write our code in terms of TCHAR and its derived pointer types: LPTSTR, which is a pointer to a TCHAR string, and LPCTSTR, which is a pointer to a constant TCHAR string.

#include <Windows.h>
#include <stdio.h>
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
	TCHAR *tchString = _T("On the Internet, no one knows you're a dog.");
	LPTSTR tchPtrString = tchString;
	TCHAR tchBuffer[256];

	_tprintf(_T("The string is: %s\n"), tchString);
	_tprintf(_T("The string is still: %s\n"), tchPtrString);

	//copy the string into the buffer, which is large enough to hold it
	_tcscpy(tchBuffer, tchString);

	return 0;
}

Remember, the tchar.h header file contains the generic alternatives to the normal functions that take string parameters. This header file is provided by Microsoft, and is not part of the ANSI C standard.
