How to resolve the algorithm Entropy step by step in the BASIC programming language
Problem Statement
Calculate the Shannon entropy H of a given input string. Given the discrete random variable X that is a string of N "symbols" (total characters) consisting of n different characters (n=2 for binary), the Shannon entropy of X in bits/symbol is:

$$H_2(X) = -\sum_{i=1}^{n} \frac{count_i}{N} \log_2\left(\frac{count_i}{N}\right)$$

where $count_i$ is the count of character $n_i$.

For this task, use X="1223334444" as an example. The result should be 1.84644... bits/symbol. This assumes X was a random variable, which may not be the case, or it may depend on the observer.

This coding problem calculates the "specific" or "intensive" entropy, which parallels the "specific entropy" S0 of physics (entropy per kg or per mole). It is not like physical entropy S and is therefore not the "information" content of a file. It comes from Boltzmann's H-theorem where
$$S = k_B N H$$
where N=number of molecules. Boltzmann's H is the same equation as Shannon's H, and it gives the specific entropy H on a "per molecule" basis.

The "total", "absolute", or "extensive" information entropy is

$$S = N H_2 \text{ bits}$$

This is not the entropy being coded here, but it is the closest to physical entropy and a measure of the information content of a string. However, it does not look for any patterns that might be available for compression, so it is a very restricted, basic, and certain measure of "information". Every binary file with an equal number of 1's and 0's will have S=N bits. All hex files with equal symbol frequencies will have
$$S = N \log_2(16)$$
bits of entropy. The total entropy in bits of the example above is S = 10*1.84644 = 18.4644 bits.

The H function does not look for any patterns in data or check if X was a random variable. For example, X=000000111111 gives the same calculated entropy in all senses as Y=010011100101. For most purposes it is usually more relevant to divide the gzip length by the length of the original data to get an informal measure of how much "order" was in the data.

Two other "entropies" are useful:

Normalized specific entropy:
$$H_n = \frac{H_2 \log(2)}{\log(n)}$$
which varies from 0 to 1 and has units of "entropy/symbol" or just 1/symbol. For this example, $H_n$ = 0.923.

Normalized total (extensive) entropy:
$$S_n = \frac{H_2 N \log(2)}{\log(n)}$$
which varies from 0 to N and does not have units. It is simply the "entropy", but it needs to be called "total normalized extensive entropy" so that it is not confused with Shannon's (specific) entropy or physical entropy. For this example, $S_n$ = 9.23.

Shannon himself is the reason his "entropy/symbol" H function is very confusingly called "entropy". That's like calling a function that returns a speed a "meter". See section 1.7 of his classic A Mathematical Theory of Communication and search on "per symbol" and "units" to see he always stated his entropy H has units of "bits/symbol" or "entropy/symbol" or "information/symbol". So it is legitimate to say entropy NH is "information".

In keeping with Landauer's limit, the physics entropy generated from erasing N bits is
$$S = H_2 N k_B \ln(2)$$
if the bit storage device is perfectly efficient. This can be solved for H2*N to (arguably) get the number of bits of information that a physical entropy represents.
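
As a check on the figures quoted for X="1223334444", the example can be worked by hand. The sketch below assumes the character counts 1, 2, 3, 4 for the symbols "1", "2", "3", "4", and assumes the normalization divides by $\log_2 n$, which reproduces the quoted values of 0.923 and 9.23.

```latex
\begin{align*}
N &= 10, \quad n = 4, \quad count = (1, 2, 3, 4) \\
H_2 &= -\left(0.1\log_2 0.1 + 0.2\log_2 0.2 + 0.3\log_2 0.3 + 0.4\log_2 0.4\right) \\
    &\approx 0.33219 + 0.46439 + 0.52109 + 0.52877 \approx 1.84644 \text{ bits/symbol} \\
S &= N H_2 \approx 10 \times 1.84644 = 18.4644 \text{ bits} \\
H_n &= H_2 / \log_2 n \approx 1.84644 / 2 \approx 0.923 \\
S_n &= N H_n \approx 9.23
\end{align*}
```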
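Source code in the BASIC programming language. The listings cover different dialects: a line-numbered version using DEF FN, a structured version using FUNCTION, and a line-numbered version for a dialect with LEN X$, LN, and string indexing such as X$(I).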
5 REM LINE-NUMBERED VERSION: FN L IS THE BASE-2 LOGARITHM
10 DEF FN L(X)=LOG(X)/LOG(2)
20 S$="1223334444"
30 U$=""
40 FOR I=1 TO LEN(S$)
50 K=0
60 FOR J=1 TO LEN(U$)
70 IF MID$(U$,J,1)=MID$(S$,I,1) THEN K=1
80 NEXT J
90 IF K=0 THEN U$=U$+MID$(S$,I,1)
100 NEXT I
110 DIM R(LEN(U$)-1)
120 FOR I=1 TO LEN(U$)
130 C=0
140 FOR J=1 TO LEN(S$)
150 IF MID$(U$,I,1)=MID$(S$,J,1) THEN C=C+1
160 NEXT J
170 R(I-1)=(C/LEN(S$))*FN L(C/LEN(S$))
180 NEXT I
190 E=0
200 FOR I=0 TO LEN(U$)-1
210 E=E-R(I)
220 NEXT I
230 PRINT E
' Structured version: L is the base-2 logarithm
FUNCTION L (X)
    L = LOG(X) / LOG(2)
END FUNCTION

S$ = "1223334444"

' Collect each distinct character of S$ into U$
U$ = ""
FOR I = 1 TO LEN(S$)
    K = 0
    FOR J = 1 TO LEN(U$)
        IF MID$(U$, J, 1) = MID$(S$, I, 1) THEN K = 1
    NEXT J
    IF K = 0 THEN U$ = U$ + MID$(S$, I, 1)
NEXT I

' R(I - 1) holds p * log2(p) for the I-th distinct character
DIM R(LEN(U$) - 1)
FOR I = 1 TO LEN(U$)
    C = 0
    FOR J = 1 TO LEN(S$)
        IF MID$(U$, I, 1) = MID$(S$, J, 1) THEN C = C + 1
    NEXT J
    R(I - 1) = (C / LEN(S$)) * L(C / LEN(S$))
NEXT I

' The entropy is the negated sum of those terms
E = 0
FOR I = 0 TO LEN(U$) - 1
    E = E - R(I)
NEXT I
PRINT E
END
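
The problem statement notes that H ignores ordering, so X=000000111111 and Y=010011100101 give the same entropy. A minimal sketch of that check follows, assuming a QBasic-style dialect with FUNCTION, block IF, and INSTR. The function name ENTROPY and the variable CH$ are hypothetical names chosen for illustration and do not come from the listings above.

```basic
DECLARE FUNCTION ENTROPY (S$)

' Both strings hold six 0s and six 1s, in a different order
PRINT "X: "; ENTROPY("000000111111")
PRINT "Y: "; ENTROPY("010011100101")
END

' Shannon entropy of S$ in bits/symbol (hypothetical helper)
FUNCTION ENTROPY (S$)
    N = LEN(S$)
    U$ = ""                     ' distinct characters handled so far
    H = 0
    FOR I = 1 TO N
        CH$ = MID$(S$, I, 1)
        IF INSTR(U$, CH$) = 0 THEN
            ' First time this character is seen: count it over the whole string
            U$ = U$ + CH$
            C = 0
            FOR J = 1 TO N
                IF MID$(S$, J, 1) = CH$ THEN C = C + 1
            NEXT J
            H = H - (C / N) * LOG(C / N) / LOG(2)
        END IF
    NEXT I
    ENTROPY = H
END FUNCTION
```

Both lines should show the same value, about 1 bit/symbol, because each string has equal numbers of 0s and 1s. Accumulating H directly avoids the intermediate array R used in the listings, which is a design choice of this sketch and not a correction to them.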
5 REM SAME CALCULATION FOR A DIALECT WITH LEN X$ AND LN
10 LET X$="1223334444"
20 LET U$=""
30 FOR I=1 TO LEN X$
40 LET K=0
50 FOR J=1 TO LEN U$
60 IF U$(J)=X$(I) THEN LET K=K+1
70 NEXT J
80 IF K=0 THEN LET U$=U$+X$(I)
90 NEXT I
100 DIM R(LEN U$)
110 FOR I=1 TO LEN U$
120 LET C=0
130 FOR J=1 TO LEN X$
140 IF U$(I)=X$(J) THEN LET C=C+1
150 NEXT J
160 LET R(I)=C/LEN X$*LN (C/LEN X$)/LN 2
170 NEXT I
180 LET E=0
190 FOR I=1 TO LEN U$
200 LET E=E-R(I)
210 NEXT I
220 PRINT E
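
The listings print only the specific entropy H. The problem statement also describes the total entropy S and the two normalized entropies, so a sketch that derives them from H may be useful. It is written in QBasic-style syntax (not the Sinclair-style syntax of the listing just above) and assumes INSTR and block IF are available. The variable names D, ST, HN, and SN are illustrative only.

```basic
S$ = "1223334444"
N = LEN(S$)
U$ = ""                         ' distinct characters handled so far
H = 0
FOR I = 1 TO N
    CH$ = MID$(S$, I, 1)
    IF INSTR(U$, CH$) = 0 THEN
        U$ = U$ + CH$
        C = 0
        FOR J = 1 TO N
            IF MID$(S$, J, 1) = CH$ THEN C = C + 1
        NEXT J
        P = C / N               ' relative frequency of this character
        H = H - P * LOG(P) / LOG(2)
    END IF
NEXT I

D = LEN(U$)                     ' n, the number of different characters
ST = N * H                      ' total (extensive) entropy in bits
PRINT "H  (bits/symbol): "; H
PRINT "S  (bits):        "; ST

' Normalizing divides by log2(n), which is zero when n = 1, so guard that case
IF D > 1 THEN
    HN = H * LOG(2) / LOG(D)
    SN = N * HN
    PRINT "Hn (1/symbol):    "; HN
    PRINT "Sn:               "; SN
END IF
END
```

For the example string the four values should come out close to 1.846, 18.46, 0.923, and 9.23, matching the figures in the problem statement up to the floating-point precision of the interpreter.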