How to resolve the algorithm UTF-8 encode and decode step by step in the Action! programming language

Published on 12 May 2024 09:40 PM

How to resolve the algorithm UTF-8 encode and decode step by step in the Action! programming language

Table of Contents

Problem Statement

As described in UTF-8 and in Wikipedia, UTF-8 is a popular encoding of (multi-byte) Unicode code-points into eight-bit octets. The goal of this task is to write a encoder that takes a unicode code-point (an integer representing a unicode character) and returns a sequence of 1–4 bytes representing that character in the UTF-8 encoding. Then you have to write the corresponding decoder that takes a sequence of 1–4 UTF-8 encoded bytes and return the corresponding unicode character. Demonstrate the functionality of your encoder and decoder on the following five characters: Provided below is a reference implementation in Common Lisp.

Let's start with the solution:

Step by Step solution about How to resolve the algorithm UTF-8 encode and decode step by step in the Action! programming language

Source code in the action! programming language

TYPE Unicode=[BYTE bc1,bc2,bc3]
BYTE ARRAY hex=['0 '1 '2 '3 '4 '5 '6 '7 '8 '9 'A 'B 'C 'D 'E 'F]

BYTE FUNC DecodeHex(CHAR c)
  BYTE i

  FOR i=0 TO 15
  DO
    IF c=hex(i) THEN
      RETURN (i)
    FI
  OD
  Break()
RETURN (255)

BYTE FUNC DecodeHex2(CHAR c1,c2)
  BYTE h1,h2,res

  h1=DecodeHex(c1)
  h2=DecodeHex(c2)
  res=(h1 LSH 4)%h2
RETURN (res)

PROC ValUnicode(CHAR ARRAY s Unicode POINTER u)
  BYTE i,len

  len=s(0)
  IF len<6 AND len>8 THEN Break() FI
  IF s(1)#'U OR s(2)#'+ THEN Break() FI
  
  IF len=6 THEN
    u.bc1=0
  ELSEIF len=7 THEN
    u.bc1=DecodeHex(s(3))
    IF u.bc1>$10 THEN Break() FI
  ELSE
    u.bc1=DecodeHex2(s(3),s(4))
  FI
  u.bc2=DecodeHex2(s(len-3),s(len-2))
  u.bc3=DecodeHex2(s(len-1),s(len))
RETURN

PROC PrintHex2(BYTE x)
  Put(hex(x RSH 4))
  Put(hex(x&$0F))
RETURN

PROC StrUnicode(Unicode POINTER u)
  Print("U+")
  IF u.bc1>$F THEN
    PrintHex2(u.bc1)
  ELSEIF u.bc1>0 THEN
    Put(hex(u.bc1))
  FI
  PrintHex2(u.bc2)
  PrintHex2(u.bc3)
RETURN

PROC PrintArray(BYTE ARRAY a BYTE len)
  BYTE i

  Put('[)
  FOR i=0 TO len-1
  DO
    IF i>0 THEN Put(32 )FI
    PrintHex2(a(i))
  OD
  Put('])
RETURN

PROC Encode(Unicode POINTER u BYTE ARRAY buf BYTE POINTER len)
  IF u.bc1>0 THEN
    len^=4
    buf(0)=$F0 % (u.bc1 RSH 2)
    buf(1)=$80 % ((u.bc1 & $03) LSH 4) % (u.bc2 RSH 4)
    buf(2)=$80 % ((u.bc2 & $0F) LSH 2) % (u.bc3 RSH 6)
    buf(3)=$80 % (u.bc3 & $3F)
  ELSEIF u.bc2>=$08 THEN
    len^=3
    buf(0)=$E0 % (u.bc2 RSH 4)
    buf(1)=$80 % ((u.bc2 & $0F) LSH 2) % (u.bc3 RSH 6)
    buf(2)=$80 % (u.bc3 & $3F)
  ELSEIF u.bc2>0 OR u.bc3>=$80 THEN
    len^=2
    buf(0)=$C0 % (u.bc2 LSH 2) % (u.bc3 RSH 6)
    buf(1)=$80 % (u.bc3 & $3F)
  ELSE
    len^=1
    buf(0)=u.bc3
  FI
RETURN

PROC Decode(BYTE ARRAY buf BYTE len Unicode POINTER u)
  IF len=1 THEN
    u.bc1=0
    u.bc2=0
    u.bc3=buf(0)
  ELSEIF len=2 THEN
    u.bc1=0
    u.bc2=(buf(0) & $1F) RSH 2
    u.bc3=(buf(0) LSH 6) % (buf(1) & $3F)
  ELSEIF len=3 THEN
    u.bc1=0
    u.bc2=(buf(0) LSH 4) % ((buf(1) & $3F) RSH 2)
    u.bc3=(buf(1) LSH 6) % (buf(2) & $3F)
  ELSEIF len=4 THEN
    u.bc1=((buf(0) & $07) LSH 2) % ((buf(1) & $3F) RSH 4)
    u.bc2=(buf(1) LSH 4) % ((buf(2) & $3F) RSH 2)
    u.bc3=((buf(2) & $03) LSH 6) % (buf(3) & $3F)
  ELSE
    Break()
  FI
RETURN

PROC Main()
  DEFINE PTR="CARD"
  DEFINE COUNT="11"
  PTR ARRAY case(COUNT)
  Unicode uni,res
  BYTE ARRAY buf(4)
  BYTE i,len

  case(0)="U+0041"
  case(1)="U+00F6"
  case(2)="U+0416"
  case(3)="U+20AC"
  case(4)="U+1D11E"
  case(5)="U+0024"
  case(6)="U+00A2"
  case(7)="U+0939"
  case(8)="U+20AC"
  case(9)="U+D55C"
  case(10)="U+10348"

  FOR i=0 TO COUNT-1
  DO
    IF i=0 THEN
      PrintE("From RosettaCode:")
    ELSEIF i=5 THEN
      PutE() PrintE("From Wikipedia:")
    FI
    ValUnicode(case(i),uni)
    Encode(uni,buf,@len)
    Decode(buf,len,res)

    StrUnicode(uni) Print(" -> ")
    PrintArray(buf,len) Print(" -> ")
    StrUnicode(res) PutE()
  OD
RETURN

  

You may also check:How to resolve the algorithm Associative array/Iteration step by step in the SNOBOL4 programming language
You may also check:How to resolve the algorithm Anonymous recursion step by step in the EchoLisp programming language
You may also check:How to resolve the algorithm Function composition step by step in the OCaml programming language
You may also check:How to resolve the algorithm Variables step by step in the ChucK programming language
You may also check:How to resolve the algorithm Globally replace text in several files step by step in the TXR programming language