How to resolve the algorithm Strip control codes and extended characters from a string step by step in the Go programming language
Problem Statement
Strip control codes and extended characters from a string.
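The solution should demonstrate how to achieve each of the following results:
- a string with control codes stripped (but extended characters not stripped)
- a string with control codes and extended characters stripped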
In ASCII, the control codes have decimal codes 0 through to 31 and 127. On an ASCII based system, if the control codes are stripped, the resultant string would have all of its characters within the range of 32 to 126 decimal on the ASCII table. On a non-ASCII based system, we consider characters that do not have a corresponding glyph on the ASCII table (within the ASCII range of 32 to 126 decimal) to be an extended character for the purpose of this task.
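The ranges described above can be written as two small predicates. This is a minimal sketch for illustration: the names isControl and isPrintableASCII are hypothetical and are not part of the solution that follows.

```go
package main

import "fmt"

// isControl reports whether b is an ASCII control code (0-31 or 127).
// Hypothetical helper, for illustration only.
func isControl(b byte) bool {
	return b < 32 || b == 127
}

// isPrintableASCII reports whether b lies in the printable ASCII range 32-126.
// Hypothetical helper, for illustration only.
func isPrintableASCII(b byte) bool {
	return b >= 32 && b < 127
}

func main() {
	// Values on both sides of each boundary named in the task.
	for _, b := range []byte{0, 31, 32, 126, 127, 128, 255} {
		fmt.Printf("%3d control=%-5t printable=%t\n", b, isControl(b), isPrintableASCII(b))
	}
}
```

Bytes of 128 and above are neither control codes nor printable ASCII; for this task they count as extended characters.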
Let's start with the solution:
Step-by-step solution: stripping control codes and extended characters from a string in Go
The provided Go code demonstrates several ways to strip control characters and extended characters from a string.
- stripCtlFromBytes and stripCtlAndExtFromBytes operate on the raw bytes of the string and filter it byte by byte.
  - stripCtlFromBytes removes every byte with a value less than 32 or equal to 127 and keeps everything else, including bytes of 128 and above.
  - stripCtlAndExtFromBytes additionally removes bytes of 128 and above (the extended characters), so only bytes in the range 32 to 126 remain.
- stripCtlFromUTF8 and stripCtlAndExtFromUTF8 treat the string as UTF-8 and filter it rune by rune with strings.Map.
  - stripCtlFromUTF8 removes every rune with a code point less than 32 or equal to 127.
  - stripCtlAndExtFromUTF8 additionally removes runes with code points above 127, so only runes in the range 32 to 126 remain.
- stripCtlAndExtFromUnicode uses Unicode normalization before filtering.
  - It decomposes the string with NFKD normalization and then removes every rune for which the filter isOk returns true.
  - Despite its name, isOk returns true for the runes to be removed: those with code points less than 32 or greater than or equal to 127. Because decomposition splits an accented letter into a base letter followed by combining marks, the base letter survives the filter, so "déjà vu" becomes "deja vu".
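Since each pair of functions differs only in the condition it tests, the rune-based filter can be sketched with that condition passed in as a parameter. The helper keepRunes and the two predicates below are assumptions made for this illustration, not part of the solution.

```go
package main

import (
	"fmt"
	"strings"
)

// keepRunes returns s with every rune for which keep returns false removed.
// Hypothetical helper, for illustration only.
func keepRunes(s string, keep func(rune) bool) string {
	return strings.Map(func(r rune) rune {
		if keep(r) {
			return r
		}
		return -1 // a negative result tells strings.Map to drop the rune
	}, s)
}

// notControl keeps everything except ASCII control codes.
func notControl(r rune) bool { return r >= 32 && r != 127 }

// printableASCII keeps only code points 32 through 126.
func printableASCII(r rune) bool { return r >= 32 && r < 127 }

func main() {
	in := "a\tb\x7fc é" // contains a tab, DEL, and one non-ASCII letter
	fmt.Printf("%q\n", keepRunes(in, notControl))     // "abc é"
	fmt.Printf("%q\n", keepRunes(in, printableASCII)) // "abc "
}
```

Passing the predicate in keeps the filtering loop in one place, at the cost of an extra function call per rune.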
The program includes a sample input string (src) that contains precomposed accented letters, control characters and other boundary-case bytes, and Unicode combining characters. The main function prints the result of each stripping function so that the outputs can be compared.
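To see why decomposing before filtering matters, the sketch below compares a precomposed string with a hand-decomposed one. It uses only the standard library: the decomposed literal stands in for what NFKD normalization would produce for this input, and asciiOnly is a hypothetical helper rather than a function from the solution.

```go
package main

import (
	"fmt"
	"strings"
)

// asciiOnly keeps runes in the printable ASCII range 32-126 and drops the rest.
// Hypothetical helper, for illustration only.
func asciiOnly(s string) string {
	return strings.Map(func(r rune) rune {
		if r >= 32 && r < 127 {
			return r
		}
		return -1
	}, s)
}

func main() {
	precomposed := "d\u00e9j\u00e0 vu"  // é and à as single code points
	decomposed := "de\u0301ja\u0300 vu" // base letters followed by combining accents

	fmt.Println(asciiOnly(precomposed)) // dj vu
	fmt.Println(asciiOnly(decomposed))  // deja vu
}
```

Without decomposition the accented letters disappear entirely; with it, only the combining accents are removed.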
Source code in the Go programming language
package main
import (
	"fmt"
	"strings"

	"golang.org/x/text/transform"
	"golang.org/x/text/unicode/norm"
)
// Two byte-oriented functions, identical except for the operator comparing c to 127.
func stripCtlFromBytes(str string) string {
	b := make([]byte, len(str))
	var bl int
	for i := 0; i < len(str); i++ {
		c := str[i]
		if c >= 32 && c != 127 {
			b[bl] = c
			bl++
		}
	}
	return string(b[:bl])
}
func stripCtlAndExtFromBytes(str string) string {
	b := make([]byte, len(str))
	var bl int
	for i := 0; i < len(str); i++ {
		c := str[i]
		if c >= 32 && c < 127 {
			b[bl] = c
			bl++
		}
	}
	return string(b[:bl])
}
// Two UTF-8 functions, identical except for the operator comparing r to 127.
func stripCtlFromUTF8(str string) string {
	return strings.Map(func(r rune) rune {
		if r >= 32 && r != 127 {
			return r
		}
		return -1
	}, str)
}
func stripCtlAndExtFromUTF8(str string) string {
	return strings.Map(func(r rune) rune {
		if r >= 32 && r < 127 {
			return r
		}
		return -1
	}, str)
}
// Advanced Unicode normalization and filtering;
// see http://blog.golang.org/normalization and
// http://godoc.org/golang.org/x/text/unicode/norm for more
// details.
func stripCtlAndExtFromUnicode(str string) string {
	// Despite its name, isOk reports the runes to remove:
	// control codes (below 32) and everything from 127 upward.
	isOk := func(r rune) bool {
		return r < 32 || r >= 127
	}
	// The isOk filter is such that there is no need to chain to norm.NFC.
	// Note: newer releases of golang.org/x/text mark transform.RemoveFunc
	// as deprecated in favor of runes.Remove.
	t := transform.Chain(norm.NFKD, transform.RemoveFunc(isOk))
	// This Transformer could also trivially be applied as an io.Reader
	// or io.Writer filter to automatically do such filtering when reading
	// or writing data anywhere.
	str, _, _ = transform.String(t, str)
	return str
}
const src = "déjà vu" + // precomposed unicode
"\n\000\037 \041\176\177\200\377\n" + // various boundary cases
"as⃝df̅" // unicode combining characters
func main() {
	fmt.Println("source text:")
	fmt.Println(src)
	fmt.Println("\nas bytes, stripped of control codes:")
	fmt.Println(stripCtlFromBytes(src))
	fmt.Println("\nas bytes, stripped of control codes and extended characters:")
	fmt.Println(stripCtlAndExtFromBytes(src))
	fmt.Println("\nas UTF-8, stripped of control codes:")
	fmt.Println(stripCtlFromUTF8(src))
	fmt.Println("\nas UTF-8, stripped of control codes and extended characters:")
	fmt.Println(stripCtlAndExtFromUTF8(src))
	fmt.Println("\nas decomposed and stripped Unicode:")
	fmt.Println(stripCtlAndExtFromUnicode(src))
}
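The boundary cases in src can also be checked in isolation. The following sketch assumes a hypothetical helper, filterBytes, that follows the same approach as the byte-oriented functions above, and it prints the results with %q so that non-printable bytes stay visible.

```go
package main

import "fmt"

// filterBytes returns the bytes of s for which keep reports true.
// Hypothetical helper, for illustration only.
func filterBytes(s string, keep func(byte) bool) string {
	b := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		if c := s[i]; keep(c) {
			b = append(b, c)
		}
	}
	return string(b)
}

func main() {
	// Octal escapes: NUL, 31, space, '!', '~', DEL (127), 128, 255.
	boundary := "\000\037 \041\176\177\200\377"

	noCtl := filterBytes(boundary, func(c byte) bool { return c >= 32 && c != 127 })
	noCtlExt := filterBytes(boundary, func(c byte) bool { return c >= 32 && c < 127 })

	fmt.Printf("%q\n", noCtl)    // " !~\x80\xff"
	fmt.Printf("%q\n", noCtlExt) // " !~"
}
```

The bytes 128 and 255 survive the control-code filter even though they are not valid UTF-8 on their own, which is worth keeping in mind when the byte-oriented output is printed to a terminal.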