How to resolve the algorithm String length step by step in the Visual Basic .NET programming language

Published on 12 May 2024 09:40 PM

Problem Statement

Find the character and byte length of a string. This means encodings like UTF-8 need to be handled properly, as there is not necessarily a one-to-one relationship between bytes and characters. By character, we mean an individual Unicode code point, not a user-visible grapheme containing combining characters. For example, the character length of "møøse" is 5 but the byte length is 7 in UTF-8 and 10 in UTF-16. Non-BMP code points (those between 0x10000 and 0x10FFFF) must also be handled correctly: answers should produce actual character counts in code points, not in code unit counts. Therefore a string like "𝔘𝔫𝔦𝔠𝔬𝔡𝔢" (consisting of the 7 Unicode characters U+1D518 U+1D52B U+1D526 U+1D520 U+1D52C U+1D521 U+1D522) is 7 characters long, not 14 UTF-16 code units; and it is 28 bytes long whether encoded in UTF-8 or in UTF-16.
Please mark your examples with ===Character Length=== or ===Byte Length===. If your language is capable of providing the string length in graphemes, mark those examples with ===Grapheme Length===.
For example, the string "J̲o̲s̲é̲" ("J\x{332}o\x{332}s\x{332}e\x{301}\x{332}") has 4 user-visible graphemes, 9 characters (code points), and 14 bytes when encoded in UTF-8.
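The figures quoted above can be checked with a short sketch. It is an illustration rather than part of the solution that follows: the names LengthDemo and CodePointCount are made up for this example, and non-ASCII characters are built from their code points with Char.ConvertFromUtf32 so the result does not depend on how the source file is saved.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text

Module LengthDemo
    ' Counts Unicode code points by walking the UTF-16 code units and
    ' stepping over both halves of each surrogate pair in one move.
    Function CodePointCount(s As String) As Integer
        Dim count As Integer = 0
        Dim i As Integer = 0
        Do While i < s.Length
            If Char.IsSurrogatePair(s, i) Then
                i += 2
            Else
                i += 1
            End If
            count += 1
        Loop
        Return count
    End Function

    Sub Main()
        ' "møøse", with ø written as U+00F8.
        Dim oSlash As String = Char.ConvertFromUtf32(&HF8)
        Dim moose As String = "m" & oSlash & oSlash & "se"
        Console.WriteLine(CodePointCount(moose))                  ' 5 code points
        Console.WriteLine(Encoding.UTF8.GetByteCount(moose))      ' 7 bytes
        Console.WriteLine(Encoding.Unicode.GetByteCount(moose))   ' 10 bytes

        ' U+1D518 lies outside the BMP, so UTF-16 needs a surrogate pair for it.
        Dim fraktur As String = Char.ConvertFromUtf32(&H1D518)
        Console.WriteLine(fraktur.Length)                         ' 2 code units
        Console.WriteLine(CodePointCount(fraktur))                ' 1 code point
        Console.WriteLine(Encoding.UTF8.GetByteCount(fraktur))    ' 4 bytes
        Console.WriteLine(Encoding.Unicode.GetByteCount(fraktur)) ' 4 bytes

        ' e + combining acute accent + combining low line: one grapheme.
        Dim accented As String = "e" & Char.ConvertFromUtf32(&H301) & Char.ConvertFromUtf32(&H332)
        Console.WriteLine(New StringInfo(accented).LengthInTextElements) ' 1 grapheme
        Console.WriteLine(CodePointCount(accented))                      ' 3 code points
        Console.WriteLine(Encoding.UTF8.GetByteCount(accented))          ' 5 bytes
    End Sub
End Module
```

The three measurements diverge as soon as a string leaves ASCII, which is why the task asks for each of them separately.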

Let's start with the solution:

Step-by-step solution for the String length task in the Visual Basic .NET programming language

Source code in the Visual Basic .NET programming language

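Imports System
Imports System.Collections
Imports System.Linq

Module ByteLength
    ' Returns the number of bytes needed to store s in the given encoding.
    Function GetByteLength(s As String, encoding As Text.Encoding) As Integer
        Return encoding.GetByteCount(s)
    End Function
End Module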

Module CharacterLength
    Function GetUTF16CodeUnitsLength(s As String) As Integer
        Return s.Length
    End Function

    ' Counts surrogate pairs: each pair is two UTF-16 code units that encode a single code point.
    Private Function GetUTF16SurrogatePairCount(s As String) As Integer
        Dim count = 0
        For i = 1 To s.Length - 1
            If Char.IsSurrogatePair(s(i - 1), s(i)) Then count += 1
        Next
        Return count
    End Function

    Function GetCharacterLength_FromUTF16(s As String) As Integer
        Return GetUTF16CodeUnitsLength(s) - GetUTF16SurrogatePairCount(s)
    End Function

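    ' UTF-32 stores every code point in exactly four bytes, so the byte count divided by 4 is the code point count.
    Function GetCharacterLength_FromUTF32(s As String) As Integer
        Return GetByteLength(s, Text.Encoding.UTF32) \ 4
    End Function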
End Module

Module GraphemeLength
    ' Wraps an IEnumerator, allowing it to be used as an IEnumerable.
    Private Iterator Function AsEnumerable(enumerator As IEnumerator) As IEnumerable
        Do While enumerator.MoveNext()
            Yield enumerator.Current
        Loop
    End Function

    Function GraphemeCount(s As String) As Integer
        Dim elements = Globalization.StringInfo.GetTextElementEnumerator(s)
        Return AsEnumerable(elements).OfType(Of String).Count()
    End Function
End Module

#Const PRINT_TESTCASE = True

Module Program
    ReadOnly TestCases As String() =
    {
        "Hello, world!",
        "møøse",
        "𝔘𝔫𝔦𝔠𝔬𝔡𝔢", ' The next entry is built with ChrW "escapes" because normalization of the source file would merge the e and its accent in é̲ into one character
        $"J{ChrW(&H332)}o{ChrW(&H332)}s{ChrW(&H332)}e{ChrW(&H301)}{ChrW(&H332)}"
    }

    Sub Main()
        Const INDENT = "    "
        Console.OutputEncoding = Text.Encoding.Unicode

        Dim writeResult = Sub(s As String, result As Integer) Console.WriteLine("{0}{1,-20}{2}", INDENT, s, result)

        For i = 0 To TestCases.Length - 1
            Dim c = TestCases(i)

            Console.Write("Test case " & i)
#If PRINT_TESTCASE Then
            Console.WriteLine(": " & c)
#Else
            Console.WriteLine()
#End If
            writeResult("graphemes", GraphemeCount(c))
            writeResult("UTF-16 units", GetUTF16CodeUnitsLength(c))
            writeResult("Cd pts from UTF-16", GetCharacterLength_FromUTF16(c))
            writeResult("Cd pts from UTF-32", GetCharacterLength_FromUTF32(c))
            Console.WriteLine()
            writeResult("bytes (UTF-8)", GetByteLength(c, Text.Encoding.UTF8))
            writeResult("bytes (UTF-16)", GetByteLength(c, Text.Encoding.Unicode))
            writeResult("bytes (UTF-32)", GetByteLength(c, Text.Encoding.UTF32))
            Console.WriteLine()
        Next

    End Sub
End Module

  
