How to resolve the algorithm String length step by step in the Zig programming language

Published on 12 May 2024 09:40 PM

Problem Statement

Find the character and byte length of a string. Encodings such as UTF-8 must be handled properly, because bytes and characters do not necessarily correspond one-to-one. Here, a character means an individual Unicode code point, not a user-visible grapheme that may include combining characters. For example, the character length of "møøse" is 5, but its byte length is 7 in UTF-8 and 10 in UTF-16. Non-BMP code points (those between 0x10000 and 0x10FFFF) must also be handled correctly: answers should count code points, not code units. A string like "𝔘𝔫𝔦𝔠𝔬𝔡𝔢" (consisting of the 7 Unicode characters U+1D518 U+1D52B U+1D526 U+1D520 U+1D52C U+1D521 U+1D522) is therefore 7 characters long, not 14 UTF-16 code units, and it is 28 bytes long whether encoded in UTF-8 or in UTF-16.
Please mark your examples with ===Character Length=== or ===Byte Length===. If your language is capable of providing the string length in graphemes, mark those examples with ===Grapheme Length===.
For example, the string "J̲o̲s̲é̲" ("J\x{332}o\x{332}s\x{332}e\x{301}\x{332}") has 4 user-visible graphemes, 9 characters (code points), and 14 bytes when encoded in UTF-8.
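The relationship between bytes, code points, and UTF-16 code units described above can be sketched without any library help. The following is a minimal illustration, not the solution's method: the helper names `countCodePoints` and `countUtf16Units` are hypothetical, and both assume the input is already valid UTF-8 (no validation is done). A code point starts at every byte that is not a continuation byte, and every 4-byte sequence needs a surrogate pair in UTF-16.

```zig
const std = @import("std");

/// Hypothetical helper: counts Unicode code points in a UTF-8 slice by
/// counting every byte that is not a continuation byte (0b10xxxxxx).
/// Assumes `bytes` is valid UTF-8; nothing is validated here.
fn countCodePoints(bytes: []const u8) usize {
    var count: usize = 0;
    for (bytes) |b| {
        if ((b & 0xC0) != 0x80) count += 1;
    }
    return count;
}

/// Hypothetical helper: counts the UTF-16 code units the same text needs.
/// That is one unit per code point, plus one more for each 4-byte sequence
/// (lead byte 0b11110xxx), because code points above U+FFFF take a
/// surrogate pair.
fn countUtf16Units(bytes: []const u8) usize {
    var count: usize = 0;
    for (bytes) |b| {
        if ((b & 0xC0) != 0x80) count += 1;
        if ((b & 0xF8) == 0xF0) count += 1;
    }
    return count;
}

test "lengths quoted in the problem statement" {
    const moose = "møøse";
    try std.testing.expectEqual(@as(usize, 7), moose.len); // UTF-8 bytes
    try std.testing.expectEqual(@as(usize, 5), countCodePoints(moose));
    try std.testing.expectEqual(@as(usize, 10), 2 * countUtf16Units(moose)); // UTF-16 bytes

    // The seven non-BMP characters U+1D518 ... U+1D522, written as escapes.
    const fraktur = "\u{1D518}\u{1D52B}\u{1D526}\u{1D520}\u{1D52C}\u{1D521}\u{1D522}";
    try std.testing.expectEqual(@as(usize, 28), fraktur.len);
    try std.testing.expectEqual(@as(usize, 7), countCodePoints(fraktur));
    try std.testing.expectEqual(@as(usize, 14), countUtf16Units(fraktur));
}
```

Saved to a file, this can be checked with `zig test`. Grapheme counts are not covered here, since they require Unicode segmentation data rather than simple byte inspection.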

Let's start with the solution:

Step-by-step solution for String length in the Zig programming language

Source code in the Zig programming language

const std = @import("std");

fn printResults(alloc: std.mem.Allocator, string: []const u8) !void {
    const cnt_codepts_utf8 = try std.unicode.utf8CountCodepoints(string);
    // The slice holds the UTF-8 encoding itself, so its length is the
    // byte length. For plain ASCII this equals the code point count.
    const cnt_bytes_utf8 = string.len;
    const stdout_wr = std.io.getStdOut().writer();
    try stdout_wr.print("utf8  codepoints = {d}, bytes = {d}\n", .{ cnt_codepts_utf8, cnt_bytes_utf8 });

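    // Re-encode as UTF-16LE to count code points there as well; the arena
    // passed in by main owns and frees this buffer. Assumption: newer Zig
    // releases expose this function as utf8ToUtf16LeAllocZ.
    const utf16str = try std.unicode.utf8ToUtf16LeWithNull(alloc, string);
    const cnt_codepts_utf16 = try std.unicode.utf16CountCodepoints(utf16str);
    // calcUtf16LeLen returns 16-bit code units, so bytes = 2 * units.
    const cnt_2bytes_utf16 = try std.unicode.calcUtf16LeLen(string);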
    try stdout_wr.print("utf16 codepoints = {d}, bytes = {d}\n", .{ cnt_codepts_utf16, 2 * cnt_2bytes_utf16 });
}

pub fn main() !void {
    var arena_instance = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena_instance.deinit();
    const arena = arena_instance.allocator();
    const string1: []const u8 = "Hello, world!";
    try printResults(arena, string1);
    const string2: []const u8 = "møøse";
    try printResults(arena, string2);
    const string3: []const u8 = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢";
    try printResults(arena, string3);
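    // U+0332 (combining low line) underlines the preceding character and
    // U+0301 is a combining acute accent. Both are written as escapes
    // because copying them from a browser may not preserve them.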
    const string4: []const u8 = "J\u{332}o\u{332}s\u{332}e\u{301}\u{332}";
    try printResults(arena, string4);
}
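One way to check the listing against the figures in the problem statement is a small test file built on the same `std.unicode` calls. This is only a sketch: `expectLengths` is a hypothetical helper, and it assumes `utf8CountCodepoints` and `calcUtf16LeLen` are available under these names in your Zig version, as they are in the listing above.

```zig
const std = @import("std");

/// Hypothetical helper: checks code point count, UTF-8 byte length, and
/// UTF-16 byte length of `text` against the expected values.
fn expectLengths(text: []const u8, code_points: usize, utf8_bytes: usize, utf16_bytes: usize) !void {
    const counted = try std.unicode.utf8CountCodepoints(text);
    try std.testing.expectEqual(code_points, counted);
    try std.testing.expectEqual(utf8_bytes, text.len);
    // calcUtf16LeLen yields 16-bit code units; each one occupies 2 bytes.
    const units = try std.unicode.calcUtf16LeLen(text);
    try std.testing.expectEqual(utf16_bytes, 2 * units);
}

test "figures from the problem statement" {
    try expectLengths("møøse", 5, 7, 10);
    // U+1D518 ... U+1D522: 7 code points, 28 bytes in either encoding.
    try expectLengths("\u{1D518}\u{1D52B}\u{1D526}\u{1D520}\u{1D52C}\u{1D521}\u{1D522}", 7, 28, 28);
    // 9 code points and 14 UTF-8 bytes; all are in the BMP, so 18 UTF-16 bytes.
    try expectLengths("J\u{332}o\u{332}s\u{332}e\u{301}\u{332}", 9, 14, 18);
}
```

Running `zig test` on this file exercises the same counts that `printResults` prints, without depending on stdout.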

  
