Building an interpreter

Apr 10, 2009 at 5:09 AM
Edited Apr 10, 2009 at 5:14 AM
I bookmarked your codeproject article several months back.... saving it for the day i decided to build my first interpreter.

I first started with building a parser that understood grammar .. npeg.codeplex.com
similar to how i am reusing a parser by feeding in a set of rules.
I would like to do something similar for an interpreter.

since you have a completed and functioning interpreter wanted to see if you could provide any insight ...

Example
what references did you find most useful (wish you had when you first started)?
what are the building blocks for an interpreter?
 - example: stack (functions, echo statements),  heap (short lived objects), link register
how do you recommend i partition

I'm sure you've toyed around with creating other interpreters.. have you been able to discover a pattern across all that could be put into a library versus recreating it per project.


thanks for any info

-lm


##################################

Here is what i have so far:
(php style parser / view template parser)

<code>
 public static void Main(String[] args)
        {
            String input = @"
                markup text

                <?flexibleconfig
                    echo 'should be interpreted by interpreter';
                ?>


                <?flexibleconfig incomplete context ignored by interpreter seen as data

                <?flexibleconfig
                    function test()
                    {
                        echo 'hello world';
                    }

                    echo test() + 'should be interpreted by interpreter';
                ?>

                ?> incomplete context ignored by interpreter seen as data
            ";

            Parser parser = new Parser();
            NPEG.AstNode root = parser.Parse(input);



// convert everything to valid statements for the interpreter
            StringBuilder normalize = new StringBuilder();
            root.Children.ForEach(n => {
            
                switch(n.Token.Name)
                {
                    case "Text":
                        normalize.AppendFormat("echo '{0}';\n", n.Token.Value);
                        break;
                    case "Interpret":
                        normalize.AppendFormat("{0}\n", n.Children[0].Token.Value);
                        break;
                    default: break;
                }

            });


            Interpreter interpreter = new Interpreter();
            interpreter.Execute(normalize.ToString(), new Context());

</code>


// uses peg rules with npeg.codeplex.com:   parser.Parse(input);
            AExpression ROOT = Grammar.Load(
                @"
                    S: [\r\n ]*;
                    STARTTAG: '<?flexibleconfig' S;
                    ENDTAG: S '?>';
                    (?<Interpret>): STARTTAG (?<CODE>(!(ENDTAG / STARTTAG).)*) ENDTAG;
                    (?<Text>): (!Interpret .)*;
                    (?<Root>): (&. (Interpret / Text))* ;
                "
               );


//doesn't do anything yet .. waiting for your reply : )
            interpreter.Execute(normalize.ToString(), new Context());





normalized output that will be sent to interpreter
=================================
echo '
                markup text

                ';

echo 'should be interpreted by interpreter';

echo '


                <?flexibleconfig incomplete context ignored by interpreter seen as data

                ';

function test()
                    {
                        echo 'hello world';
                    }

                    echo test() + 'should be interpreted by interpreter';

echo '

                ?> incomplete context ignored by interpreter seen as data
           ';



Feb 23, 2011 at 9:23 PM

A bit late but here goes:

My main sources of information where the following books:

  • Compiler Writing in C by Holub
  • Game Scripting Mastery by Varanse

In terms of building blocks, I used a mix of ideas to suit my needs and make life easy, such as:

  • A variant-type string-based dictionary for memory addressing (mapping variable name strings to values). This was actually linked in a hierarchical fashion with a common globaldictionary at the top, followed by a script-scope dictionary, followed by a local functions scope dictionary. When the system looks up the contents of the variable it starts from the local scope and works upwards towards the global one
  • A stack system to handle function calls and returns (I did not use stacks for variables)
  • A high-level assembly-like language with support for variant types. For example, the opcode [ADD dst, src] intelligently combines src with dst regardless of type (int, bool, float or string) nd even when mixed types are used
  • Simple peep-hole optimization to eliminate or combine redundant instruction sequencies. This is applied iteratively until no further improvements can be made - it can cut some assembly code to 1/3 it's original size.

The system is not very efficient in practice, but I think it has pedagogic value by helping programmers understand how to build an interpreter from start to finish. 

Other this, an earlier and simpler implementation (Eezeescript) and a few even simpler experiments, I have not done much with regards to compiler/interpreter writing so I don't think I'm the best person to answer your other questions. I hope however that you find the above details useful (if you're still at it 2 years later :))