Preface
Some times ago I was scanning Stackoverflow to find a puzzle to solve, and I found one guy was trying to write a piece of software which had needed to answer on one simple question. Is given expression a well formed array? He was searching for a ready to use regular expression but he failed, because this puzzle can’t be solved using regex engine. Why, I will explain later but now I can say that this puzzle can be easily solved using Context free grammar.
Well formed array
You can ask what does the well formed array mean ? I will try to answer by providing some positive and negative examples of such arrays.
1 2 3 4 5 6 |
|
Above are well formed arrays. In opposite below are expressions which are not syntactically consistent with the definition of well formed array.
1 2 3 |
|
Studying those examples we can try to answer on this question. So well formed array is an expression which fulfills following requirements
- first, no blank, character is an open brace
'['
- the last no blank character needs to be a closed brace
- inside array, integers and other well formed arrays can appear
- integers are separated with at least one blank character
Why not regex ?
Let’s simplified our example. Let’s say that we want to write a regular expression which will generate following words
1 2 3 4 |
|
You can notice that the mention above strings are a subset of the set of strings which we want to parse. And here I don’t have also good news. We can’t use regex engine to parse such strings.
Why is it not possible ? Simply speaking regex engine modeled by a Finite Automata (FA) can’t count how many '['
we have already used and check that the same number of ']'
must appear just after the last '['
. FA doesn’t have stack to remember such things. If You are curious about formal proof you can try to google Pumping Lemma phrase. Pumping Lemma provides You a useful tool to proof if a given language (set of words which fulfill given conditions) is not a regular.
Context free grammar
I have already mentioned that to solve our problem (if a given array is well formed) we need to write a parser of some context free grammar. The model of Context free grammar is a Finite Automata with a stack. Thanks to this an Automat is able to remember some facts that have happened (e.g count braces). To write a parser we need first to write down a grammar for expression of well formed array. To do this I will use ebnf (Extended Backus–Naur Form) form.
1 2 3 4 |
|
Now we are ready to write a parser. To be precise I will use top-down parsing strategy which let me directly transform written above grammar into set of recursively called procedures.
Parser
It is a good manner to split parser into 2 parts
- lexer
- parser
Lexer is responsible for grouping letters into tokens. In our grammar we have 4 kinds of tokens
'['
(LB)']'
(RB)- number (NUMBER)
- end - token informing that there is no letter left on input (END)
Tokens are expressed by a class Token
written below
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
|
Lexer
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 |
|
Parser
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|