How to resolve the algorithm UTF-8 encode and decode step by step in the AutoHotkey programming language

Published on 12 May 2024 09:40 PM

Problem Statement

As described on the UTF-8 page and in Wikipedia, UTF-8 is a popular encoding of (multi-byte) Unicode code points into eight-bit octets. The goal of this task is to write an encoder that takes a Unicode code point (an integer representing a Unicode character) and returns a sequence of 1–4 bytes representing that character in the UTF-8 encoding. Then write the corresponding decoder, which takes a sequence of 1–4 UTF-8 encoded bytes and returns the corresponding Unicode character. Demonstrate the functionality of your encoder and decoder on the following five characters:

Character  Name                                 Code point  UTF-8 (hex)
A          LATIN CAPITAL LETTER A               U+0041      41
ö          LATIN SMALL LETTER O WITH DIAERESIS  U+00F6      C3 B6
Ж          CYRILLIC CAPITAL LETTER ZHE          U+0416      D0 96
€          EURO SIGN                            U+20AC      E2 82 AC
𝄞          MUSICAL SYMBOL G CLEF                U+1D11E     F0 9D 84 9E
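The whole encoding hinges on one rule: the number of UTF-8 bytes depends only on which range the code point falls in, and each range has a fixed bit pattern. As a minimal sketch (the function name UTF8_Length is hypothetical, not part of the original solution), the ranges can be written down as AutoHotkey v1 code:

```autohotkey
; Hypothetical helper (not from the original solution): number of UTF-8
; bytes needed for a code point, following the ranges of the UTF-8 definition.
;   U+0000  .. U+007F   -> 0xxxxxxx                            (1 byte,  7 payload bits)
;   U+0080  .. U+07FF   -> 110xxxxx 10xxxxxx                   (2 bytes, 11 payload bits)
;   U+0800  .. U+FFFF   -> 1110xxxx 10xxxxxx 10xxxxxx          (3 bytes, 16 payload bits)
;   U+10000 .. U+10FFFF -> 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (4 bytes, 21 payload bits)
UTF8_Length(cp) {
	if (cp < 0 || cp > 0x10FFFF)
		return 0	; outside the Unicode code space
	return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4
}
; UTF8_Length(0x20AC) returns 3
```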

Let's start with the solution:

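Step-by-step solution: UTF-8 encode and decode in AutoHotkey

Source code in the AutoHotkey programming language (v1 syntax). Encode_UTF works from the low end of the code point: each pass emits one continuation byte (10xxxxxx) from the low six bits and shifts right by six, and the final pass adds the lead-byte prefix (C0, E0 or F0). Decode_UTF does the reverse on a binary string: it takes the encoded value eight bits at a time from the right, strips each byte's marker bits, and concatenates the remaining payload bits.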

Encode_UTF(hex){
	Bytes :=  hex>=0x10000 ? 4 : hex>=0x0800 ? 3 : hex>=0x0080 ? 2 : 1	; U+0000 also needs one byte
	Prefix := [0, 0xC0, 0xE0, 0xF0]
	loop % Bytes {
		if (A_Index < Bytes)
			UTFCode := Format("{:X}", (hex&0x3F) + 0x80) . UTFCode		; 3F=00111111, 80=10000000
		else
			UTFCode := Format("{:X}", hex + Prefix[Bytes]) . UTFCode	; C0=11000000, E0=11100000, F0=11110000
		hex := hex>>6
	}
	return "0x" UTFCode
}
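The task asks for a sequence of bytes, while Encode_UTF returns one concatenated hex string. A hedged alternative, assuming AutoHotkey v1.1 object syntax, returns an Array of byte values and computes each byte directly with shifts and masks. Encode_UTF8_Bytes and BytesToHex are hypothetical names introduced only for illustration, and the sketch assumes 0 <= cp <= 0x10FFFF without rejecting surrogates:

```autohotkey
; Hypothetical sketch: encode a code point into an Array of byte values.
; Assumes 0 <= cp <= 0x10FFFF; surrogate code points are not rejected.
Encode_UTF8_Bytes(cp) {
	if (cp < 0x80)
		return [cp]	; 0xxxxxxx
	if (cp < 0x800)
		return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]	; 110xxxxx 10xxxxxx
	if (cp < 0x10000)	; 1110xxxx 10xxxxxx 10xxxxxx
		return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]
	; 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
	return [0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]
}

; Hypothetical helper: format an Array of bytes as space-separated hex.
BytesToHex(bytes) {
	for i, b in bytes
		out .= (i > 1 ? " " : "") . Format("{:02X}", b)
	return out
}
; BytesToHex(Encode_UTF8_Bytes(0x20AC)) returns "E2 82 AC"
```

Writing each length as its own branch makes the bit patterns visible, at the cost of some repetition compared with the loop in Encode_UTF.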
;----------------------------------------------------------------------------------------
Decode_UTF(hex){
	Bytes :=  hex>0xFFFFFF ? 4 : hex>0xFFFF ? 3 : hex>0xFF ? 2 : 1	; byte count of the encoded value, not of a code point
	bin := ConvertBase(16, 2, hex)
	loop, % Bytes {
		B := SubStr(bin, -7)	; last 8 bits = lowest remaining byte
		if Bytes > 1
			B := LTrim(B, 1), B := StrReplace(B, 0,,, 1)	; strip the marker: leading 1s, then the first 0
		bin := SubStr(bin, 1, StrLen(bin)-8)
		Uni := B . Uni
	}
	return "0x" ConvertBase(2, 16, Uni)
}
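Decode_UTF operates on a binary string. The same decoding can be sketched with bit masks on a byte Array: the lead byte's high bits give the sequence length, and each continuation byte contributes its low six bits. Decode_UTF8_Bytes is a hypothetical name, and validation is deliberately minimal (overlong forms and surrogates are not rejected):

```autohotkey
; Hypothetical sketch: decode an Array of 1-4 UTF-8 bytes to a code point.
; Returns -1 for a bad lead byte, a wrong length, or a bad continuation byte.
Decode_UTF8_Bytes(bytes) {
	lead := bytes[1]
	if (lead < 0x80)
		return bytes.Length() = 1 ? lead : -1	; 0xxxxxxx: ASCII
	else if ((lead & 0xE0) = 0xC0)
		n := 2, cp := lead & 0x1F	; 110xxxxx: 5 payload bits
	else if ((lead & 0xF0) = 0xE0)
		n := 3, cp := lead & 0x0F	; 1110xxxx: 4 payload bits
	else if ((lead & 0xF8) = 0xF0)
		n := 4, cp := lead & 0x07	; 11110xxx: 3 payload bits
	else
		return -1	; not a valid lead byte
	if (bytes.Length() != n)
		return -1
	Loop % n - 1 {
		b := bytes[A_Index + 1]
		if ((b & 0xC0) != 0x80)
			return -1	; continuation bytes must look like 10xxxxxx
		cp := (cp << 6) | (b & 0x3F)	; append 6 payload bits
	}
	return cp
}
; Format("0x{:X}", Decode_UTF8_Bytes([0xF0, 0x9D, 0x84, 0x9E])) returns "0x1D11E"
```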
;----------------------------------------------------------------------------------------
; www.autohotkey.com/boards/viewtopic.php?f=6&t=3607#p18985
ConvertBase(InputBase, OutputBase, number){
    static u := A_IsUnicode ? "_wcstoui64" : "_strtoui64"
    static v := A_IsUnicode ? "_i64tow"    : "_i64toa"
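    VarSetCapacity(s, 65 * (A_IsUnicode ? 2 : 1), 0)	; room for 64 digits plus terminator, in bytes
    value := DllCall("msvcrt.dll\" u, "Str", number, "Ptr", 0, "UInt", InputBase, "CDECL Int64")	; parse number in InputBase
    DllCall("msvcrt.dll\" v, "Int64", value, "Str", s, "UInt", OutputBase, "CDECL")	; write value in OutputBase into s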
    return s
}
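ConvertBase relies on the C runtime functions _wcstoui64/_i64tow (or their ANSI counterparts) from msvcrt.dll. For the base-2 direction used in Decode_UTF, a pure-AutoHotkey sketch is also possible; ToBinary is a hypothetical helper and assumes a non-negative integer input:

```autohotkey
; Hypothetical pure-AHK alternative to ConvertBase(16, 2, n): builds the
; binary string by taking the lowest bit and shifting right, no DllCall needed.
; Assumes n is a non-negative integer.
ToBinary(n) {
	if (n = 0)
		return "0"
	while (n > 0) {
		s := (n & 1) . s	; prepend the lowest bit
		n := n >> 1
	}
	return s
}
; ToBinary(0xC3B6) returns "1100001110110110"
```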


data = 
(comment
0x0041
0x00F6
0x0416
0x20AC
0x1D11E
)

output := "unicode`t`tUTF`t`tunicode`n"
for i, Hex in StrSplit(data, "`n", "`r"){
	UTFCode := Encode_UTF(Hex)
	output .= Hex "`t`t" UTFCode "`t`t" Decode_UTF(UTFCode) "`n"
}
MsgBox % output
return
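To sanity-check the demo output, AutoHotkey's built-in StrPut can convert a character to UTF-8 directly. This standalone sketch assumes a Unicode build of AutoHotkey v1.1.21 or later (where Chr accepts code points above 0xFFFF); BuiltinUTF8 is a hypothetical name, and it prints spaced hex bytes rather than the 0x... form used by the demo:

```autohotkey
; Hypothetical cross-check: let StrPut do the UTF-8 conversion, then read the
; bytes back with NumGet. Assumes a Unicode build of AutoHotkey v1.1.21+.
BuiltinUTF8(cp) {
	str := Chr(cp)
	size := StrPut(str, "UTF-8")	; required size in bytes, including the null terminator
	VarSetCapacity(buf, size, 0)
	StrPut(str, &buf, "UTF-8")
	Loop % size - 1	; skip the terminator
		out .= (A_Index > 1 ? " " : "") . Format("{:02X}", NumGet(buf, A_Index - 1, "UChar"))
	return out
}
; BuiltinUTF8(0x1D11E) returns "F0 9D 84 9E"
```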
