Unicode literals in PowerShell

Hi folks

C# has three ways to declare Unicode literals.

See "Character literals" and "Unicode character escape sequences" in the C# language specification:

\x   hex-digit   hex-digit-opt   hex-digit-opt   hex-digit-opt
\u   hex-digit   hex-digit   hex-digit   hex-digit
\U   hex-digit   hex-digit   hex-digit   hex-digit   hex-digit   hex-digit   hex-digit   hex-digit

For example:

char x1Upper = '\xA';
char x2Lower = '\xab';
char x3Upper = '\xABC';
char x4Mixed = '\xaBcD';
char uUpper = '\uABCD';
string UMixed = "\U000abcD1"; // above U+FFFF, so it does not fit in a char literal

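But none of these Unicode literals are available in Windows PowerShell. (PowerShell 6 and later added the `u{x} escape sequence for double-quoted strings.)

\xnnnn and \unnnn literals can be expressed by simply casting a hexadecimal integer to char: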

$x1Upper = [char] 0xA
$x2Lower = [char] 0xab
$x3Upper = [char] 0xABC
$x4Mixed = [char] 0xaBcD
$uUpper = [char] 0xABCD
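
To double-check that the cast yields the intended character, it can be converted back to an integer. This is only a sketch; $ch and $big are illustrative names that do not appear in the examples above.

```powershell
# Cast a hex integer to a char, then back to int to confirm the code point.
$ch = [char] 0xABCD
$ch.GetType().FullName      # System.Char
'{0:X4}' -f [int] $ch       # ABCD

# The cast only covers a single UTF-16 code unit (0x0000 to 0xFFFF);
# a larger value raises a conversion error instead of producing a char.
$big = 0x10000
try { [char] $big } catch { 'cast failed' }
```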

\Unnnnnnnn literals require a slightly more sophisticated approach:

$UMixed = [char]::ConvertFromUtf32(0x000abcD1)

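The last approach is the most general: it accepts any code point from 0x0 to 0x10FFFF, apart from the surrogate range 0xD800 to 0xDFFF, and it returns a string rather than a char.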
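
The result of ConvertFromUtf32 can be inspected to see why it has to be a string. The following is a small sketch, with $s as an illustrative variable name:

```powershell
# Code points above 0xFFFF need two UTF-16 code units (a surrogate pair),
# so ConvertFromUtf32 returns a [string], not a [char].
$s = [char]::ConvertFromUtf32(0x000abcD1)
$s.GetType().FullName                       # System.String
$s.Length                                   # 2
[char]::IsSurrogatePair($s, 0)              # True

# Round trip back to the original code point.
'{0:X}' -f [char]::ConvertToUtf32($s, 0)    # ABCD1
```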

Declaring a string with Unicode characters inside it requires more complex syntax:

$str = "xyz$([char] 0xA)klm$([char]::ConvertFromUtf32(0x000abcD1))"
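
If the nested $(...) subexpressions become hard to read, the format operator -f is one possible alternative. This is a sketch of the same string built that way, not something the original approach depends on; $viaFormat and $viaSubexpr are illustrative names.

```powershell
# Placeholders {0} and {1} are filled from the values on the right of -f.
# Each value is parenthesized so the casts and the comma cannot be misread.
$viaFormat = 'xyz{0}klm{1}' -f ([char] 0xA), ([char]::ConvertFromUtf32(0x000abcD1))

# Same string built with subexpressions, for comparison.
$viaSubexpr = "xyz$([char] 0xA)klm$([char]::ConvertFromUtf32(0x000abcD1))"
$viaFormat.Equals($viaSubexpr)    # True (ordinal comparison)
```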

If we need to deal with many Unicode strings, we can declare a helper function:

function U
{
    param
    (
        [int] $Code
    )

    [char]::ConvertFromUtf32($Code)
}

And then we can use it like this:

$str = "xyz$(U 0xA)klm$(U 0x000abcD1)"

UPD: I just found that my implementation has an issue with surrogate code points (0xD800 to 0xDFFF):

U 0xd800

fails with

Exception calling "ConvertFromUtf32" with "1" argument(s): "A valid UTF32 value is between 0x000000 and 0x10ffff, inclusive, and should not include surrogate codepoint values (0x00d800 ~ 0x00dfff)."

To fix this, we need to extend the implementation:

function U
{
    param
    (
        [int] $Code
    )

    if ((0 -le $Code) -and ($Code -le 0xFFFF))
    {
        return [char] $Code
    }

    if ((0x10000 -le $Code) -and ($Code -le 0x10FFFF))
    {
        return [char]::ConvertFromUtf32($Code)
    }

    throw "Invalid character code $Code"
}
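
The reason the plain cast works for the surrogate range can be checked directly. This is a minimal sketch that does not call U itself; $high, $low and $pair are illustrative names, and U+1F44D is just an arbitrary code point above 0xFFFF.

```powershell
# A lone surrogate is a valid UTF-16 code unit, so the plain cast accepts it
# even though ConvertFromUtf32 rejects it.
$high = [char] 0xD83D
$low  = [char] 0xDC4D
[char]::IsHighSurrogate($high)    # True
[char]::IsLowSurrogate($low)      # True

# Two surrogate halves joined together give the same string that
# ConvertFromUtf32 returns for the full code point.
$pair = "$high$low"
$pair.Equals([char]::ConvertFromUtf32(0x1F44D))    # True
```

Note that for codes up to 0xFFFF the extended function returns a char, while for larger codes it returns a string; inside a double-quoted string the difference does not matter.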

About mnaoumov

Senior .NET Developer in Readify