In the realm of formal languages and automata theory, Context-Free Languages (CFLs) hold a special place due to their significance in parsing and the syntactic structure of programming languages. This guide delves into the fundamentals of CFLs, exploring their grammar, derivation, and various examples to illustrate key concepts.
Structure of Sentences
A sentence in a CFG can be broken down into various components:
Sentence: nounPhrase, verbPhrase, prepPhrase, nounPhrase
Noun Phrase: adj + nounPhrase | noun
Verb Phrase: verbPhrase + adv | verb
Example:
The big boy runs quickly towards a catboy.
Noun Phrase: The big boy
Verb Phrase: runs quickly
Prepositional Phrase: towards a catboy
Deriving Sentences
The process of derivation involves replacing variables with production rules until terminals are achieved. The grammar yields the sentences.
Conditional Statements
Conditional statement: If predicates Then commands
Predicates: a boolean expression (e.g., expression < expression, exp = exp, exp <= exp)
Exp: variable, constant, expression + expression, expression * expression, exp^exp, exp-exp
Example:
Derive 3x + 2 < 5 - Z
Predicate: exp < exp
exp + exp < exp
exp * exp + exp < exp
3 * x + 2 < 5 - Z
Examples of Context-Free Languages
Exp1: (A union B)*
Grammar: S -> aS | bS | epsilon
Derivation for “abbaba”:
S -> aS -> abS -> abbS -> abbaS -> abbabS -> abbabaS -> abbaba
Exp2: A^nB^n (n >= 0)
Grammar: S -> aSb | epsilon
Derivation for “aaabbb”:
S -> aSb -> aaSbb -> aaaSbbb -> aaabbb
Exp3: A^nB^n (n > 0)
Grammar: S -> aSb | ab
Derivation for “aaabbb”:
S -> aSb -> aaSbb -> aaabbb
Exp4: A^2nB^n (n >= 0)
Grammar: S -> aaSb | epsilon
Derivation for “aaaaabbb”:
S -> aaSb -> aaaaSb -> aaaaabbb
Exp5: 01* union 1
Grammar: S -> 0T | 1; T -> 1T | epsilon
Derivation for “01111”:
S -> 0T -> 01T -> 011T -> 0111T -> 01111
Exp6: Words of odd length with middle symbol 0, alphabet {0,1}
Grammar 1: S -> 0 | 1E | 0E | F1 | F0; E -> S0 | S1; F -> 0S | 1S
Grammar 2: S -> 0S0 | 1S1 | 1S0 | 0S1 | 0
Grammar 3: S -> TST | 0; T -> 0 | 1
Derivation for “01101011111”:
S -> 0E -> 0S1 -> 0F11 -> 01S11 -> 01101S11111 (change S to 0)
Exp7: A^nB^m (n != m)
Grammar: S -> aSb | C | D; C -> aC | a; B -> bB | b
Derivation: Put more a or more b
More Examples of Context-Free Languages
Exp8: Palindromes, include empty words, odd or even length
Grammar: S -> 0S0 | 1S1 | 0 | 1 | epsilon
Derivation for “110011”:
S -> 1S1 -> 11S11 -> 110S011 -> 1100S10011 -> 110011
Exp9: Empty Language
Grammar: S -> S
Exp10: {Epsilon}
Grammar: S -> epsilon
Exp11: Parentheses, Sigma = {(,)}
Grammar: S -> SS | (S) | ()
Derivation for “(()())”:
S -> (S) -> (SS) -> (()S) -> (()(S)) -> (()())
Exp12: {w|w ends in aba} = Sigma*aba
Grammar: S -> aS | bS | aX; X -> bY; Y -> a
Alternative Grammar: S -> aS | bS | aba
Exp13: A^kB^mC^k (k > 0, m >= 0)
Grammar: S -> aSc | aBc; B -> bB | epsilon
Derivation for “aaabbccc”:
S -> aSc -> aaScc -> aaaSccc -> aaabbccc
Exp14: A^kB^mC^mD^l (k >= 0, m > 0, l >= 3)
Grammar: S -> ATD; A -> aA | epsilon; D -> dD | ddd; T -> bTc | bc
Derivation for “aabbbcccddd”:
S -> ATD -> aATD -> aaATD -> aaATddd -> aaTddd -> aabTcddd -> aabbTccddd -> aabbbcccddd
Exp15: A^nB^mC^mD^n (m >= 0, n >= 0)
Grammar: S -> aSd | T; T -> bTc | epsilon
Derivation for “aaabbbcccddd”:
S -> aSd -> aaSdd -> aaaSddd -> aaabTcddd -> aaabbTccddd -> aaabbbcccddd
Exp16: A^kB^lC^(k+l) (k > 0, l >= 0)
Grammar: S -> aTc | aSc; T -> bTc | epsilon
Derivation for “aabbccccc”:
S -> aTc -> abTcc -> abbTccc -> aabbcccc
Exp17: {w|w has an equal number of 1 and 0}
Grammar: S -> epsilon | S0S1S | S1S0S | S -> SS | 0S1 | 1S0 | epsilon
Exp18: {w|w has twice as many 1 and 0}
Grammar: S -> epsilon | 0S1S1 | 1S0S1S | 1S1S0S
Properties of Context-Free Languages
Regular languages are closed under: union, concatenation, *, intersection, complement.
Theorem: CFLs are closed under union:
L1 is generated by G1 = (V1, Sigma1, R1, S1)
L2 is generated by G2 = (V2, Sigma2, R2, S2)
L1 union L2 is generated by G2, R1 union R2 union {S -> S1 | S2}
Theorem: CFLs are closed under concatenation:
L1 is generated by G1 = (V1, Sigma1, R1, S1)
L2 is generated by G2 = (V2, Sigma2, R2, S2)
L1 concatenate L2 is generated by G2, R1 union R2 union {S -> S1S2}
Theorem: CFLs are closed under * (Kleene star):
L1 is generated by G1 = (V1, Sigma1, R1, S1)
L1* is generated by G1 union {S -> epsilon | SS1}
Theorem: CFLs are not closed under intersection:
L1 is generated by G1 = (V1, Sigma1, R1, S1)
L2 is generated by G2 = (V2, Sigma2, R2, S2)
L1 intersect L2 = L3 is not context-free. Example: a^nb^nc^m intersect a^nb^mc^m = a^nb^nc^n is not context-free
Theorem: CFLs are not closed under complement:
L1 is generated by G1 = (V1, Sigma1, R1, S1)
Proof by contradiction: L1 intersect L2 = complement(L1 complement union L2 complement)
Chomsky Normal Form (CNF)
There is a special type of context-free grammar called Chomsky Normal Form (CNF) grammar. All the rules are of the form A -> BC or D -> t.
Theorem: Every CFG is equivalent to a Chomsky Normal Form grammar.
Add a new start symbol variable.
S0 -> S
Epsilon rule: Remove rule A -> epsilon.
A -> epsilon. If B -> xAyACz, then add B -> xyCz | xAyCz | xyACz
Unit rule: If A -> B and B -> C and C -> dDe, then add A -> dDe and B -> dDe
Rule format: Convert rules of the form A -> u1u2u3…un where ui is a terminal or variable to either A -> BC or A -> t.
Add A1A2…A(n-2)
A -> u1A1, A1 -> u2A2, …, A(n-2) -> U(n-1)Un
If ui is a variable, then A -> CtB, Ct -> t
Example: L = b^m c^m d^n e^n, n >= 0, m >= 0
Original Grammar:
S -> AB
A -> bAc | epsilon
B -> dBe | epsilon
Convert to CNF:
S -> AB
A -> bAc | bc
B -> dBe | de
S0 -> epsilon
S0 -> bAc | bc
S0 -> dBe | de
Final CNF:
S -> AB
A -> CbA1 | CbCc
B -> CdB1 | CdCe
B1 -> BCe
S0 -> epsilon