How to resolve the algorithm Compiler/lexical analyzer step by step in the Raku programming language
Problem Statement
Task definition: Create a lexical analyzer for the simple programming language specified below. The program should read input from a file and/or stdin and write output to a file and/or stdout. If the language being used has a lexer module/library/class, it would be great if two versions of the solution are provided: one without the lexer module, and one with. The simple programming language to be analyzed is more or less a subset of C. It supports a small fixed set of operators, keywords, and symbols, plus value-bearing tokens (identifiers, integer literals, character literals, and string literals), which differ from the fixed tokens in that each occurrence has a value associated with it. For example, two program fragments that differ only in whitespace and comments are equivalent and should produce the same token stream, except for the line and column positions. The program output should be a sequence of lines, each consisting of the following whitespace-separated fields: the line and column where the token starts, the token name, and, for value-bearing tokens, the token value.
This task is intended to be used as part of a pipeline, with the other compiler tasks - for example: lex < hello.t | parse | gen | vm Or possibly: lex hello.t lex.out parse lex.out parse.out gen parse.out gen.out vm gen.out
This implies that the output of this task (the lexical analyzer) should be suitable as input to any of the Syntax Analyzer task programs. The following error conditions should be caught: empty character constants, multi-character constants, end-of-file inside a comment or a string, end-of-line inside a string, and unknown escape sequences. Your solution should pass all the test cases above and the additional tests found on the task page.
The C and Python versions can be considered reference implementations.
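Before looking at the Raku grammar, it helps to see the shape of the expected output. The following is a minimal, illustrative Python sketch (Python being one of the task's reference-implementation languages) covering only a representative subset of the token set; the token names match those used by the Raku code below, but the lexer itself is a simplified stand-in, not the task's reference implementation.

```python
import re

# Representative subset of the task's tokens; first match in this ordered
# list wins, so whitespace is consumed before anything else is tried.
TOKEN_SPEC = [
    (r'\s+',                    None),            # whitespace: skipped
    (r'print\b',                'Keyword_print'),
    (r'[0-9]+',                 'Integer'),       # value-bearing
    (r'[_A-Za-z][_A-Za-z0-9]*', 'Identifier'),    # value-bearing
    (r'\*',                     'Op_multiply'),
    (r'\+',                     'Op_add'),
    (r';',                      'Semicolon'),
]

def lex(src):
    """Return (line, col, name, value) tuples; value is None for fixed tokens."""
    line, col, pos = 1, 1, 0
    out = []
    while pos < len(src):
        for pattern, name in TOKEN_SPEC:
            m = re.match(pattern, src[pos:])
            if not m:
                continue
            text = m.group()
            if name is not None:
                value = text if name in ('Integer', 'Identifier') else None
                out.append((line, col, name, value))
            # advance the line/column counters over the matched text
            newlines = text.count('\n')
            if newlines:
                line += newlines
                col = len(text) - text.rfind('\n')
            else:
                col += len(text)
            pos += len(text)
            break
        else:
            raise SyntaxError(f'{line}:{col}: unrecognized character {src[pos]!r}')
    out.append((line, col, 'End_of_input', None))
    return out

# Print the task's four whitespace-separated fields per token.
for line, col, name, value in lex('abc * 42;\n'):
    print(f'{line:5} {col:7} {name:16} {value if value is not None else ""}')
```

Running this on the one-line input `abc * 42;` yields one output line per token (Identifier, Op_multiply, Integer, Semicolon, End_of_input), each carrying its line and column, which is exactly the stream format the downstream Syntax Analyzer task consumes.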
Let's start with the solution:
Step by Step solution about How to resolve the algorithm Compiler/lexical analyzer step by step in the Raku programming language
Source code in the Raku programming language
grammar tiny_C {
rule TOP { ^ <.whitespace>? <tokens> + % <.whitespace> <.whitespace> <eoi> }
rule whitespace { [ <comment> + % <ws> | <ws> ] }
token comment { '/*' ~ '*/' .*? }
token tokens {
[
| <operator>   { make $/<operator>.ast   }
| <keyword>    { make $/<keyword>.ast    }
| <symbol>     { make $/<symbol>.ast     }
| <identifier> { make $/<identifier>.ast }
| <integer>    { make $/<integer>.ast    }
| <char>       { make $/<char>.ast       }
| <string>     { make $/<string>.ast     }
| <error>
]
}
proto token operator {*}
token operator:sym<*> { '*' { make 'Op_multiply' } }
token operator:sym</> { '/' { make 'Op_divide' } }
token operator:sym<%> { '%' { make 'Op_mod' } }
token operator:sym<+> { '+' { make 'Op_add' } }
token operator:sym<-> { '-' { make 'Op_subtract' } }
token operator:sym('<='){ '<=' { make 'Op_lessequal' } }
token operator:sym('<') { '<' { make 'Op_less' } }
token operator:sym('>='){ '>=' { make 'Op_greaterequal'} }
token operator:sym('>') { '>' { make 'Op_greater' } }
token operator:sym<==> { '==' { make 'Op_equal' } }
token operator:sym<!=> { '!=' { make 'Op_notequal' } }
token operator:sym<!> { '!' { make 'Op_not' } }
token operator:sym<=> { '=' { make 'Op_assign' } }
token operator:sym<&&> { '&&' { make 'Op_and' } }
token operator:sym<||> { '||' { make 'Op_or' } }
proto token keyword {*}
token keyword:sym<if> { 'if' { make 'Keyword_if' } }
token keyword:sym<else> { 'else' { make 'Keyword_else' } }
token keyword:sym<putc> { 'putc' { make 'Keyword_putc' } }
token keyword:sym<while> { 'while' { make 'Keyword_while' } }
token keyword:sym<print> { 'print' { make 'Keyword_print' } }
proto token symbol {*}
token symbol:sym<(> { '(' { make 'LeftParen' } }
token symbol:sym<)> { ')' { make 'RightParen' } }
token symbol:sym<{> { '{' { make 'LeftBrace' } }
token symbol:sym<}> { '}' { make 'RightBrace' } }
token symbol:sym<;> { ';' { make 'Semicolon' } }
token symbol:sym<,> { ',' { make 'Comma' } }
token identifier { <[_A..Za..z]><[_A..Za..z0..9]>* { make 'Identifier ' ~ $/ } }
token integer { <[0..9]>+ { make 'Integer ' ~ $/ } }
token char {
'\'' [<-[']> | '\n' | '\\\\'] '\''
{ make 'Char_Literal ' ~ $/.subst("\\n", "\n").substr(1, *-1).ord }
}
token string {
'"' <-["\n]>* '"' #'
{
make 'String ' ~ $/;
note 'Error: Unknown escape sequence.' and exit if (~$/ ~~ m:r/ <!after <[\\]>>[\\<-[n\\]>]<!before <[\\]>> /);
}
}
token eoi { $ { make 'End_of_input' } }
token error {
| '\'' '\'' { note 'Error: Empty character constant.' and exit }
| '\'' <-[']> ** {2..*} '\'' { note 'Error: Multi-character constant.' and exit }
| '/*' <-[*]>* $ { note 'Error: End-of-file in comment.' and exit }
| '"' <-["]>* $ { note 'Error: End-of-file in string.' and exit }
| '"' <-["]>*? \n { note 'Error: End of line in string.' and exit } #'
}
}
sub parse_it ( $c_code ) {
my $l;
my @pos = gather for $c_code.lines>>.chars.kv -> $line, $v {
take [ $line + 1, $_ ] for 1 .. ($v+1); # v+1 for newline
$l = $line+2;
}
@pos.push: [ $l, 1 ]; # capture eoi
for flat $c_code<tokens>.list, $c_code<eoi> -> $m {
say join "\t", @pos[$m.from].fmt('%3d'), $m.ast;
}
}
my $tokenizer = tiny_C.parse(@*ARGS[0].IO.slurp);
parse_it( $tokenizer );
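One detail worth noting in the operator tokens: 'Op_lessequal' is declared before 'Op_less', and 'Op_greaterequal' before 'Op_greater'. Raku's proto tokens use longest-token matching, so declaration order matters less there, but in an ordered-alternation regex engine the longer lexeme must be tried first or its one-character prefix shadows it. A quick Python illustration of the principle (Python's re alternation is leftmost-first, not longest-match):

```python
import re

# Ordered alternation: the first alternative that matches wins, so a
# prefix listed first shadows the longer operator.
shorter_first = re.compile(r'<|<=')
longer_first  = re.compile(r'<=|<')

print(shorter_first.match('<= 5').group())  # matches only '<'; '<=' is never produced
print(longer_first.match('<= 5').group())   # matches '<=' as intended
```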
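The trickiest part of parse_it is the @pos table: the gather/take loop builds one [line, column] pair per absolute character offset in the source (with an extra column per line for the newline, plus one final entry for end-of-input), so each Match object's .from offset can be looked up directly. The same bookkeeping, sketched in Python for comparison; position_table is a hypothetical helper name, not part of the Raku code:

```python
def position_table(src):
    """Map each absolute character offset of src to a (line, column) pair.

    One entry per character of each line, one extra column covering the
    line's newline, and one final entry for the end-of-input position --
    mirroring the gather/take loop in parse_it.
    """
    lines = src.split('\n')
    if lines and lines[-1] == '':
        lines.pop()                           # like Raku's .lines: drop the empty tail
    table = []
    for line_no, text in enumerate(lines, start=1):
        for col in range(1, len(text) + 2):   # len+1 columns: +1 for the newline
            table.append((line_no, col))
    table.append((len(lines) + 1, 1))         # position reported for <eoi>
    return table

# Offsets 0..4 of 'ab\nc\n' are a, b, newline, c, newline; offset 5 is eoi.
table = position_table('ab\nc\n')
print(table)
```

With this table, a token whose match starts at offset 3 is reported at line 2, column 1, which is how the Raku version turns flat match offsets into the line/column fields of the output.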