Extended SQL Parser Design
This documentation explains the technical decision made in building a SQLparser in Go. It is used to parsed the extended SELECT syntax of SQL thatintegrates TensorFlow Estimators.
Related Work
Lexer and Parser Generator
In 2001, when I was in graduate school, I defined an extended SQL syntax forquerying distributed relational databases, as part of the course project ofDistributed RDBMS by Prof. Li-zhu Zhou. I wrote the parser using bison
(amodern version of yacc
) and flex
(a modern version oflex
). yacc
and lex
generate C code;bison
and flex
generate C++ code. However, this time, I’d use Go.
I surveyed goyacc
, astandard Go tool. The usage is very similar to that of yacc
and bison
.However, the Go toolchain doesn’t provide a tool like lex
/flex
.
Google revealed golex
, which is out ofmaintenance.
The Mediumpostrecommends Ragel, which is a C++program and could generate Go lexer; however, it lacks documents.
Handwritten Lexer and Parser
Some documents, including thisonerecommends handwriting lexers. However, it doesn’t explain how to write theparser.
GoAcademy always provides high-quality tech blog posts. Thisone is from theauthor of InfluxDB. However, Istopped at where it explains wrapping a SQL statement as a string by anio.Reader
, because it is obvious that we should keep the string as a string sothat that token strings could refer to the same memory storage of the SQLstatement.
Following a link in the above GoAcademy post, I found Rob Pike’s excellent talkon how to write a lexer in Go in 2011. Many works after that change Rob’simplementation somehow but always lead to longer and less comprehensiblecodebases.
The Choice
Therefore, I wrote the lexer and parser both following Rob Pike’s idea. Afterfew days work, I realized that:
- I should borrow the idea from Rob to represent SQL statements as strings, butnot
io.Reader
as other work do, - but no need to use channels and goroutines at all, and
- it is technically intractable to write a SQL lexer/parser manually.So, I switched to write a lexer manually, and to generate the parser usinggoyacc. During my work, I referred to thisexample andthe official
yacc
manual for detailsabout operator association and precedence.