My
Thoughts on Writing Compilers and Parsers
The DIY Compiler Website
First Posted October 17, 2012
Last Update July 6, 2013
I am probably not qualified to actually comment on this subject, but,
this is the internet. We can all say what we want, even if it
is
incohernent dribble.
The most difficult thing
to write, perhaps, for a compiler is the parser. At the very
least, it is the most difficult to understand. Many of the
rules
concerning grammars are a complete mystery to me still. When
a
parser generator tells me that a grammar is ambiguous, most
of
the time, I am at a complete loss as to what to do. Although,
I
am still learning. And, as it turns out, I am not the only
one
with this problem. It seems that a lot of people out there
writing parsers are just as, or maybe even more clueless than myself.
The easiest way to write a parser is to
use a parser generator program. Here is a list of my
favorites.
1. Anagram
This program was written by Jerome Holland. He passed away
near
the beginning of the 21st century. But as of this writing,
his
website is still up and you can download a fully working copy of
Anagram. Anagram is an LALR(1) parser generator.
However,
it has a lot of extra features that make it quite powerful
and
flexible. It is also fairly easy to use. When I
first got
it, it did not take long for me to be generating grammars and parsing
input files. One feautre it has is that parsers can be made
fairly easily that do not require scanners. Although, I have
found that if it is any thing remotely complicated, having a lexer make
life a lot easier. If you need a lexer, you will have to use
either a lexer generator, or another Anagram grammar that will function
as a lexer, or just hand code one.
2. LRstar
This program is written by Paul Mann. It will generate either
an
LALR(1) parser or an LALR(k) parser. The fact that it will
generate a parser for more than one lookahead is handy because some
languages are imposible without this feature. However, so
far, I
have not encountered anything I could not do with just plain old
ordinary LALR(1). One of the things I like best about this
parser
generator is the diagnostic information it gives you in the conflicts
report. Using the data in this report made it very easy to
correct the conflicts in the grammar I was working on.
Another
handy thing is that this program has built in Abstract Syntax Tree
generation (AST). LRSTAR can be acquired on sourceforge.
Update on LRstar LR(*) as of July 5, 2013. Paul Mann just
released version 5.0. In earlier versions, it was an open
source
project. However, it now appears to be closed again.
Versions 4.2 and 4.3 did not come with source code, and now
the
only version you can download for free will only generate classic LR(1)
tables. (CLR). This is, unfortunate.
Takes a bit of
fun away from using this program. And last I checked, version
5
was not posted to sourceforge. And Paul had the tendency to
remove all traces of earlier versions from the net before he posted the
next version.
3.
PRECCX
This is an interesting program. It generates LL(k) parsers.
Precc stands for prettier compiler compiler. It
is
available only in source code format on the net. It wasn't
until I started using GCC that I was able to compile this
program
and have it run. I am going to make the WIN32 version I
compiled
here. If you are going to try using PRECCX, I recomend that
you
get a copy of CODEBLOCKS and MinGW. It is an old program.
First time I encountered it was when we were still using DOS.
But it is totally open source, so have fun.
Download
PRECCX for Win32
These are the three parser
generators I have actually used to create programs. There are
tons of parser generators out there. One that I hope to try
soon
is called PCCTS (ANTLR). This is a LL(k) parser generator.
LL parsers are a bit trickier to use, but have many redeaming
features which make them very popular. The gcc compiler was
writen using a hand coded LL parser. It was a hand coded
recursive descent parser, but this just goes to show you how many ways
there are to do this job.
And of course, there are recursive descent
parsers. If you want true DIY, this is the way to go.
Their
biggest drawback is they are very easy to break. If you need
to
change the language, it could get real messy.
RESOURCES:
This is a website that lists free tools:
Free Compiler Construction Tools: Lexers, Parser Generators, Optimizers
Have Fun parsing.