How to resolve the algorithm String length step by step in the Zig programming language
Problem Statement
Find the character and byte length of a string.
This means encodings like UTF-8 need to be handled properly, as there is not necessarily a one-to-one relationship between bytes and characters.
By character, we mean an individual Unicode code point, not a user-visible grapheme containing combining characters.
For example, the character length of "møøse" is 5 but the byte length is 7 in UTF-8 and 10 in UTF-16.
Non-BMP code points (those between 0x10000 and 0x10FFFF) must also be handled correctly: answers should report actual character counts in code points, not code unit counts.
Therefore a string like "𝔘𝔫𝔦𝔠𝔬𝔡𝔢" (consisting of the 7 Unicode characters U+1D518 U+1D52B U+1D526 U+1D520 U+1D52C U+1D521 U+1D522) is 7 characters long, not 14 UTF-16 code units; and it is 28 bytes long whether encoded in UTF-8 or in UTF-16.
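The arithmetic above can be checked by hand. The following is a minimal sketch, assuming the input is already valid UTF-8; countCodePoints and countUtf16Units are hypothetical helpers written for illustration, not functions from the Zig standard library. It relies on two facts about UTF-8: continuation bytes always have the bit pattern 10xxxxxx, and only code points outside the BMP use a four-byte sequence (lead byte 0xF0 or higher), which is exactly when UTF-16 needs a surrogate pair.

```zig
const std = @import("std");

// Hypothetical helper (not part of std): counts code points in a string that
// is assumed to be valid UTF-8 by skipping continuation bytes (0b10xxxxxx).
fn countCodePoints(s: []const u8) usize {
    var n: usize = 0;
    for (s) |byte| {
        if ((byte & 0xC0) != 0x80) n += 1;
    }
    return n;
}

// Hypothetical helper: number of UTF-16 code units the same text would need.
// A 4-byte UTF-8 sequence (lead byte 0xF0 and up) encodes a non-BMP code
// point, which takes a surrogate pair; every other code point takes one unit.
fn countUtf16Units(s: []const u8) usize {
    var n: usize = 0;
    for (s) |byte| {
        if ((byte & 0xC0) == 0x80) continue; // continuation byte, already counted
        const units: usize = if (byte >= 0xF0) 2 else 1;
        n += units;
    }
    return n;
}

pub fn main() void {
    const samples = [_][]const u8{ "møøse", "𝔘𝔫𝔦𝔠𝔬𝔡𝔢" };
    for (samples) |s| {
        // s.len is the UTF-8 byte length; each UTF-16 code unit is 2 bytes.
        std.debug.print("code points = {d}, UTF-8 bytes = {d}, UTF-16 bytes = {d}\n", .{
            countCodePoints(s), s.len, 2 * countUtf16Units(s),
        });
    }
}
```

For the two sample strings this should report 5, 7, 10 and 7, 28, 28, matching the figures quoted in the problem statement. The sketch does no validation, so malformed input would be miscounted rather than rejected.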
Please mark your examples with ===Character Length=== or ===Byte Length===.
If your language is capable of providing the string length in graphemes, mark those examples with ===Grapheme Length===.
For example, the string "J̲o̲s̲é̲" ("J\x{332}o\x{332}s\x{332}e\x{301}\x{332}") has 4 user-visible graphemes, 9 characters (code points), and 14 bytes when encoded in UTF-8.
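As far as the solution on this page goes, it reports code points and bytes only, not graphemes. The following is a rough sketch of how the grapheme figure for this particular example could be approximated; approxGraphemeCount is a hypothetical helper, and the rule it applies (fold every code point in the Combining Diacritical Marks block, U+0300 to U+036F, into the preceding character) is a deliberate simplification. Real grapheme segmentation follows the rules of Unicode UAX #29 and covers far more than this one block.

```zig
const std = @import("std");

// Hypothetical helper, not a std API. Approximation only: a code point in
// U+0300..U+036F is treated as part of the preceding grapheme and not counted.
fn approxGraphemeCount(s: []const u8) !usize {
    // Utf8View.init validates the input and fails on malformed UTF-8.
    const view = try std.unicode.Utf8View.init(s);
    var it = view.iterator();
    var n: usize = 0;
    while (it.nextCodepoint()) |cp| {
        if (cp >= 0x0300 and cp <= 0x036F) continue; // combining mark
        n += 1;
    }
    return n;
}

pub fn main() !void {
    const s = "J\u{332}o\u{332}s\u{332}e\u{301}\u{332}";
    std.debug.print("approximate graphemes = {d}\n", .{try approxGraphemeCount(s)});
}
```

For the example string this should print 4, because the five combining marks are skipped and only J, o, s and e are counted. Strings containing emoji sequences, Hangul jamo or other combining blocks would need a proper segmentation library.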
Let's start with the solution:
Step-by-step solution: String length in the Zig programming language
Source code in the Zig programming language
const std = @import("std");

fn printResults(alloc: std.mem.Allocator, string: []const u8) !void {
    // Character length: number of Unicode code points in the UTF-8 input.
    const cnt_codepts_utf8 = try std.unicode.utf8CountCodepoints(string);
    // Byte length: there is no sane and portable extended ASCII, so the best
    // we can do is count the bytes of the UTF-8 encoding.
    const cnt_bytes_utf8 = string.len;
    const stdout_wr = std.io.getStdOut().writer();
    try stdout_wr.print("utf8 codepoints = {d}, bytes = {d}\n", .{ cnt_codepts_utf8, cnt_bytes_utf8 });

    // Convert to UTF-16 and repeat the measurement.
    const utf16str = try std.unicode.utf8ToUtf16LeWithNull(alloc, string);
    const cnt_codepts_utf16 = try std.unicode.utf16CountCodepoints(utf16str);
    // calcUtf16LeLen returns 16-bit code units, so the byte length is twice that.
    const cnt_2bytes_utf16 = try std.unicode.calcUtf16LeLen(string);
    try stdout_wr.print("utf16 codepoints = {d}, bytes = {d}\n", .{ cnt_codepts_utf16, 2 * cnt_2bytes_utf16 });
}

pub fn main() !void {
    // The arena frees every UTF-16 buffer at once when main returns.
    var arena_instance = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena_instance.deinit();
    const arena = arena_instance.allocator();

    const string1: []const u8 = "Hello, world!";
    try printResults(arena, string1);
    const string2: []const u8 = "møøse";
    try printResults(arena, string2);
    const string3: []const u8 = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢";
    try printResults(arena, string3);
    // \u{332} is COMBINING LOW LINE, which underlines the previous character.
    // It is written as an escape because the browser may not copy it correctly.
    const string4: []const u8 = "J\u{332}o\u{332}s\u{332}e\u{301}\u{332}";
    try printResults(arena, string4);
}
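The counts that the program is expected to report can also be pinned down as a test. This is a minimal sketch, assuming a Zig release in which std.unicode.utf8CountCodepoints and std.unicode.calcUtf16LeLen are available with the signatures used in the listing above; the expected values are taken from the problem statement, with the UTF-16 figure for the last string derived as 9 code units of 2 bytes each.

```zig
const std = @import("std");

test "string lengths from the problem statement" {
    const moose = "møøse";
    try std.testing.expectEqual(@as(usize, 5), try std.unicode.utf8CountCodepoints(moose));
    try std.testing.expectEqual(@as(usize, 7), moose.len); // UTF-8 bytes
    try std.testing.expectEqual(@as(usize, 10), 2 * try std.unicode.calcUtf16LeLen(moose));

    // Non-BMP text: 7 code points, 4 bytes each in both encodings.
    const fraktur = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢";
    try std.testing.expectEqual(@as(usize, 7), try std.unicode.utf8CountCodepoints(fraktur));
    try std.testing.expectEqual(@as(usize, 28), fraktur.len);
    try std.testing.expectEqual(@as(usize, 28), 2 * try std.unicode.calcUtf16LeLen(fraktur));

    // Combining marks count as separate code points.
    const jose = "J\u{332}o\u{332}s\u{332}e\u{301}\u{332}";
    try std.testing.expectEqual(@as(usize, 9), try std.unicode.utf8CountCodepoints(jose));
    try std.testing.expectEqual(@as(usize, 14), jose.len);
    try std.testing.expectEqual(@as(usize, 18), 2 * try std.unicode.calcUtf16LeLen(jose));
}
```

Saved to a file, the block can be run with zig test. Keeping the figures in a test makes it easy to notice if a standard library upgrade changes the behaviour of either function.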