Theory Chomsky_Schuetzenberger
theory Chomsky_Schuetzenberger
imports
Context_Free_Grammar.Parse_Tree
Context_Free_Grammar.Chomsky_Normal_Form
Finite_Automata_Not_HF
Dyck_Language_Syms
begin
text ‹This theory proves the Chomsky-Schützenberger representation theorem \<^cite>‹ChomskyS›.
We closely follow Kozen \<^cite>‹Kozen› for the proof.
The theorem states that every context-free language ‹L›
can be written as \<^term>‹h(R ∩ Dyck_lang(Γ))›, for a suitable alphabet ‹Γ›,
a regular language ‹R› and a word-homomorphism ‹h›.
The Dyck language over a set ‹Γ› (also called it's bracket language) is defined as follows:
The symbols of ‹Γ› are paired with ‹[› and ‹]›, as in ‹[⇩g› and ‹]⇩g› for ‹g ∈ Γ›.
The Dyck language over ‹Γ› is the language of correctly bracketed words.
The construction of the Dyck language is found in theory \<^theory>‹Chomsky_Schuetzenberger.Dyck_Language_Syms›.
›
section‹Overview of the Proof›
text‹
A rough proof of Chomsky-Schützenberger is as follows:
Take some context-free grammar for ‹L› with productions ‹P›.
Wlog assume it is in Chomsky Normal Form.
Now define a new language ‹L'› with productions ‹P'› in the following way from ‹P›:
If ‹π = A → BC› let ‹π' = A → [⇧1⇩π B ]⇧1⇩p [⇧2⇩π C ]⇧2⇩p›,
if ‹π = A → a› let ‹π' = A → [⇧1⇩π ]⇧1⇩p [⇧2⇩π ]⇧2⇩p›,
where the brackets are viewed as terminals and the old variables
‹A›, ‹B›, ‹C› are again viewed as nonterminals.
This transformation is implemented by the function ‹transform_prod› below.
Note brackets are now adorned with superscripts
1 and 2 to distinguish the first and second occurrences easily. That is, we work with symbols
that are triples of type ‹{[,]} × old_prod_type × {1,2}›.
This bracketing encodes the parse tree of any old word.
The old word is easily recovered by the homomorphism which sends
‹[⇧1⇩π› to ‹a› if ‹π = A → a›, and sends every other bracket to ‹ε›.
Thus we have ‹h(L') = L› by essentially exchanging ‹π› for ‹π'› and the other way round in the derivation.
The direction ‹⊇› is done in ‹transfer_parse_tree›,
the direction ‹⊆› is done directly in the proof of the main theorem.
Then all that remains to show is, that ‹L'› is of the form ‹R ∩ Dyck_lang Γ›
(for ‹Γ:= P × {1, 2}›) and the regularity of ‹R›.
For this, ‹R := R⇩S› is defined via an intersection of 5 following regular languages.
Each of these is defined via a property on words ‹x›:
➧ ‹P1 x›: after a ‹]⇧1⇩p› there always immediately follows a ‹[⇧2⇩p› in ‹x›.
This especially means, that ‹]⇧1⇩p› cannot be the end of the string.
➧ ‹successively P2 x›: a ‹]⇧2⇩π› is never directly followed by some ‹[› in ‹x›.
➧ ‹successively P3 x›: each ‹[⇧1⇘A→BC⇙› is directly followed by ‹[⇧1⇘B→_⇙› in ‹x›
(last letter isn't checked).
➧ ‹successively P4 x›: each ‹[⇧1⇘A→a⇙› is directly followed by ‹]⇧1⇘A→a⇙› in ‹x›
and each ‹[⇧2⇘A→a⇙› is directly followed by ‹]⇧2⇘A→a⇙› in ‹x› (last letter isn't checked).
➧ ‹P5 A x›: there exists some ‹y› such that the word begins with ‹[⇧1⇘A→y⇙›.
One then shows the key theorem ‹P' ⊢ A →⇧* w ⟷ w ∈ R⇩A ∩ Dyck_lang Γ›:
The ‹→›-direction (see lemma ‹P'_imp_Reg›) is easily checked, by checking that every condition holds
during all derivation steps already. For this one needs a version of R (and all the conditions)
which ignores any Terminals that might still exist in such a derivation step. Since this version
operates on symbols (a different type) it needs a fully new definition. Since these new versions
allow more flexibility on the words, it turns out that the original 5 conditions aren't enough anymore
to fully constrain to the target language. Thus we add two additional
constraints ‹successively P7› and ‹successively P8› on the symbol-version of ‹R⇩A› that vanish when
we ultimately restricts back to words consisting only of terminal symbols. With these the induction goes through:
➧ ‹(successively P7_sym) x›: each ‹Nt Y› is directly preceded by some ‹Tm [⇧1⇘A→YC⇙› or some ‹Tm [⇧2⇘A→BY⇙› in ‹x›;
➧ ‹(successively P8_sym) x›: each ‹Nt Y› is directly followed by some ‹]⇧1⇘A→YC⇙› or some ‹]⇧2⇘A→BY⇙› in ‹x›.
The ‹←›-direction (see lemma ‹Reg_and_dyck_imp_P'›) is more work.
This time we stick with fully terminal words, so we work with the standard version of ‹R⇩A›:
Proceed by induction on the length of ‹w› generalized over ‹A›.
For this, let ‹x ∈ R⇩A ∩ Dyck_lang Γ›, thus we have the properties
‹P1 x›, ‹successively Pi x› for ‹i ∈ {2,3,4,7,8}› and ‹P5 A x› available.
From ‹P5 A x› we have that there exists ‹π ∈ P› s.t. ‹fst π = A› and ‹x› begins
with ‹[⇧1⇩π›. Since ‹x ∈ Dyck_lang Γ› it is balanced, so it must be of the form
‹x = [⇧1⇩π y ]⇧1⇩π r1›
for some balanced ‹y›. From ‹P1 x› it must then be of the form
‹x = [⇧1⇩π y ]⇧1⇩π [⇧2⇩π r1'.›
Since ‹x› is balanced it must then be of the form
‹x = [⇧1⇩π y ]⇧1⇩π [⇧2⇩π z ]⇧2⇩π r2›
for some balanced ‹z›. Then ‹r2› must also be balanced. If ‹r2› was not empty
it would begin with an opening bracket, but ‹P2 x› makes this impossible - so
‹r2 = []› and as such
‹x = [⇧1⇩π y ]⇧1⇩π [⇧2⇩π z ]⇧2⇩π.›
Since our grammar is in CNF, we can consider the following case distinction on ‹π›:
➧ Case 1: ‹π = A → BC›.
Since ‹y,z› are balanced substrings of ‹x› one easily checks
‹Pi y› and ‹Pi z› for ‹i ∈ {1,2,3,4}›. From ‹P3 x› (and ‹π = A → BC›) we further obtain ‹P5 B y› and ‹P5 C z›.
So ‹y ∈ R⇩B ∩ Dyck_lang Γ› and ‹z ∈ R⇩C ∩ Dyck_lang Γ›.
From the induction hypothesis we thus obtain
‹P' ⊢ B →⇧* y› and ‹P' ⊢ C →⇧* z.›
Since ‹π = A → BC› we then have
‹A →⇧1⇘π'⇙ [⇧1⇩π B ]⇧1⇩π [⇧2⇩π C ]⇧2⇩π →⇧* [⇧1⇩π y ]⇧1⇩π [⇧2⇩π z ]⇧2⇩π = x›
as required.
➧ Case 2: ‹π = A → a›.
Suppose we didn't have ‹y = []›. Then from ‹P4 x› (and ‹π = A → a›) we would have ‹y = ]⇧1⇩π›.
But since ‹y› is balanced it needs to begin with an opening bracket, contradiction.
So it must be that ‹y = []›.
By the same argument we also have that ‹z = []›.
So really ‹x = [⇧1⇩π ]⇧1⇩π [⇧2⇩π ]⇧2⇩π› and of course from ‹π = A → a› it holds
‹A →⇧1⇘π'⇙ [⇧1⇩π ]⇧1⇩π [⇧2⇩π ]⇧2⇩π = x› as required.
From the key theorem we obtain (by setting ‹A := S›) that
‹L' = R⇩S ∩ Dyck_lang Γ› as wanted.
Only regularity remains to be shown.
For this we use that
‹R⇩S ∩ Dyck_lang Γ = (R⇩S ∩ brackets Γ) ∩ Dyck_lang Γ›,
where ‹brackets Γ (⊇ Dyck_lang Γ)› is the set of words which only
consist of brackets over ‹Γ›.
Actually, what we defined as ‹R⇩S›, isn't regular, only ‹(R⇩S ∩ brackets Γ)› is.
The intersection restricts to a finite amount of possible brackets,
that are used in states for finite automatons for the 5 languages that ‹R⇩S› is
the intersection of.
Throughout most of the proof below, we implicitly or explicitly assume that the grammar is in CNF.
This is lifted only at the very end.
›
section‹Production Transformation and Homomorphisms›
text ‹A fixed finite set of productions ‹P›, used later on:›
locale locale_P =
fixes P :: "('n,'t) Prods"
assumes finiteP: ‹finite P›
subsection‹Brackets›
text‹A type with 2 elements, for creating 2 copies as needed in the proof:›
datatype version = One | Two
type_synonym ('n,'t) bracket3 = "(('n, 't) prod × version) bracket"
abbreviation open_bracket1 :: "('n, 't) prod ⇒ ('n,'t) bracket3" ("[⇧1⇩_ " [1000]) where
"[⇧1⇩p ≡ (Open (p, One))"
abbreviation close_bracket1 :: "('n,'t) prod ⇒ ('n,'t) bracket3" ("]⇧1⇩_" [1000]) where
"]⇧1⇩p ≡ (Close (p, One))"
abbreviation open_bracket2 :: "('n,'t) prod ⇒ ('n,'t) bracket3" ("[⇧2⇩_" [1000]) where
"[⇧2⇩p ≡ (Open (p, Two))"
abbreviation close_bracket2 :: "('n,'t) prod ⇒ ('n,'t) bracket3" ("]⇧2⇩_" [1000]) where
"]⇧2⇩p ≡ (Close (p, Two))"
text‹Version for p = (A, w) (multiple letters) with bsub and esub:›
abbreviation open_bracket1' :: "('n,'t) prod ⇒ ('n,'t) bracket3" ("[⇧1⇘_⇙ ") where
"[⇧1⇘p⇙ ≡ (Open (p, One))"
abbreviation close_bracket1' :: "('n,'t) prod ⇒ ('n,'t) bracket3" ("]⇧1⇘_⇙") where
"]⇧1⇘p⇙ ≡ (Close (p, One))"
abbreviation open_bracket2' :: "('n,'t) prod ⇒ ('n,'t) bracket3" ("[⇧2⇘_⇙") where
"[⇧2⇘p⇙ ≡ (Open (p, Two))"
abbreviation close_bracket2' :: "('n,'t) prod ⇒ ('n,'t) bracket3" ("]⇧2⇘_⇙ ") where
"]⇧2⇘p⇙ ≡ (Close (p, Two))"
text ‹Nice LaTeX rendering:›
notation (latex output) open_bracket1 ("\<^latex>‹$[^1_{›_\<^latex>‹}$›")
notation (latex output) open_bracket1' ("\<^latex>‹$[^1_{›_\<^latex>‹}$›")
notation (latex output) open_bracket2 ("\<^latex>‹$[^2_{›_\<^latex>‹}$›")
notation (latex output) open_bracket2' ("\<^latex>‹$[^2_{›_\<^latex>‹}$›")
notation (latex output) close_bracket1 ("\<^latex>‹$]^1_{›_\<^latex>‹}$›")
notation (latex output) close_bracket1' ("\<^latex>‹$]^1_{›_\<^latex>‹}$›")
notation (latex output) close_bracket2 ("\<^latex>‹$]^2_{›_\<^latex>‹}$›")
notation (latex output) close_bracket2' ("\<^latex>‹$]^2_{›_\<^latex>‹}$›")
subsection ‹Transformation›
abbreviation wrap1 :: ‹'n ⇒ 't ⇒ ('n, ('n,'t) bracket3) syms› where
‹wrap1 A a ≡
[ Tm [⇧1⇘(A, [Tm a])⇙,
Tm ]⇧1⇘(A, [Tm a])⇙,
Tm [⇧2⇘(A, [Tm a])⇙,
Tm ]⇧2⇘(A, [Tm a])⇙ ]›
abbreviation wrap2 :: ‹'n ⇒ 'n ⇒ 'n ⇒ ('n, ('n,'t) bracket3) syms› where
‹wrap2 A B C ≡
[ Tm [⇧1⇘(A, [Nt B, Nt C])⇙,
Nt B,
Tm ]⇧1⇘(A, [Nt B, Nt C])⇙,
Tm [⇧2⇘(A, [Nt B, Nt C])⇙,
Nt C,
Tm ]⇧2⇘(A, [Nt B, Nt C])⇙ ]›
text‹The transformation of old productions to new productions used in the proof:›
fun transform_rhs :: "'n ⇒ ('n, 't) syms ⇒ ('n, ('n,'t) bracket3) syms" where
‹transform_rhs A [Tm a] = wrap1 A a› |
‹transform_rhs A [Nt B, Nt C] = wrap2 A B C›
text ‹The last equation is only added to permit us to state lemmas about ›
fun transform_prod :: "('n, 't) prod ⇒ ('n, ('n,'t) bracket3) prod" where
‹transform_prod (A, α) = (A, transform_rhs A α)›
subsection‹Homomorphisms›
text‹Definition of a monoid-homomorphism where multiplication is ‹(@)›:›
definition hom_list :: ‹('a list ⇒ 'b list) ⇒ bool› where
‹hom_list h = (∀a b. h (a @ b) = h a @ h b)›
lemma hom_list_Nil: "hom_list h ⟹ h [] = []"
unfolding hom_list_def by (metis self_append_conv)
text‹The homomorphism on single brackets:›
fun the_hom1 :: ‹('n,'t) bracket3 ⇒ 't list› where
‹the_hom1 [⇧1⇘(A, [Tm a])⇙ = [a]› |
‹the_hom1 _ = []›
text‹The homomorphism on single bracket symbols:›
fun the_hom_sym :: ‹('n, ('n,'t) bracket3) sym ⇒ ('n,'t) sym list› where
‹the_hom_sym (Tm [⇧1⇘(A, [Tm a])⇙) = [Tm a]› |
‹the_hom_sym (Nt A) = [Nt A]› |
‹the_hom_sym _ = []›
text‹The homomorphism on bracket words:›
fun the_hom :: ‹('n,'t) bracket3 list ⇒ 't list › ("𝗁") where
‹the_hom l = concat (map the_hom1 l)›
text‹The homomorphism extended to symbols:›
fun the_hom_syms :: ‹('n, ('n,'t) bracket3) syms ⇒ ('n,'t) syms› where
‹the_hom_syms l = concat (map the_hom_sym l)›
notation the_hom ("𝗁")
notation the_hom_syms ("𝗁𝗌")
lemma the_hom_syms_hom: ‹𝗁𝗌 (l1 @ l2) = 𝗁𝗌 l1 @ 𝗁𝗌 l2›
by simp
lemma the_hom_syms_keep_var: ‹𝗁𝗌 [(Nt A)] = [Nt A]›
by simp
lemma the_hom_syms_tms_inj: ‹𝗁𝗌 w = map Tm m ⟹ ∃w'. w = map Tm w'›
proof(induction w arbitrary: m)
case Nil
then show ?case by simp
next
case (Cons a w)
then obtain w' where ‹w = map Tm w'›
by (metis (no_types, opaque_lifting) append_Cons append_Nil map_eq_append_conv the_hom_syms_hom)
then obtain a' where ‹a = Tm a'›
proof -
assume a1: "⋀a'. a = Tm a' ⟹ thesis"
have f2: "∀ss s. [s::('a, ('a,'b) bracket3) sym] @ ss = s # ss"
by auto
have "∀ss s. (s::('a, 'b) sym) # ss = [s] @ ss"
by simp
then show ?thesis using f2 a1 by (metis sym.exhaust sym.simps(4) Cons.prems map_eq_Cons_D the_hom_syms_hom the_hom_syms_keep_var)
qed
then show ‹∃w'. a # w = map Tm w'›
by (metis List.list.simps(9) ‹w = map Tm w'›)
qed
text‹Helper for showing the upcoming lemma:›
lemma helper: ‹the_hom_sym (Tm x) = map Tm (the_hom1 x)›
by(induction x rule: the_hom1.induct)(auto split: list.splits sym.splits)
text‹Show that the extension really is an extension in some sense:›
lemma h_eq_h_ext: ‹𝗁𝗌 (map Tm x) = map Tm (𝗁 x)›
proof(induction x)
case Nil
then show ?case by simp
next
case (Cons a x)
then show ?case using helper[of a] by simp
qed
lemma the_hom1_strip: ‹(the_hom_sym x') = map Tm w ⟹ the_hom1 (destTm x') = w›
by(induction x' rule: the_hom_sym.induct; auto)
lemma the_hom1_strip2: ‹concat (map the_hom_sym w') = map Tm w ⟹ concat (map (the_hom1 ∘ destTm) w') = w›
proof(induction w' arbitrary: w)
case Nil
then show ?case by simp
next
case (Cons a w')
then show ?case
by(auto simp: the_hom1_strip map_eq_append_conv append_eq_map_conv)
qed
lemma h_eq_h_ext2:
assumes ‹𝗁𝗌 w' = (map Tm w)›
shows ‹𝗁 (map destTm w') = w›
using assms by (simp add: the_hom1_strip2)
section‹The Regular Language›
text‹The regular Language @{term ‹Reg›} will be an intersection of 5 Languages.
The languages ‹2, 3 ,4› are defined each via a relation ‹P2, P3, P4› on neighbouring letters and
lifted to a language via @{term ‹successively›}.
Language ‹1› is an intersection of another such lifted relation ‹P1'› and a
condition on the last letter (if existent).
Language ‹5› is a condition on the first letter (and requires it to exist).
It takes a term of type ‹'n› (the original variable type) as parameter.
Additionally a version of each language (taking symbols as input) is defined
which allows arbitrary interspersion of nonterminals.
As this interspersion weakens the description, the symbol version of the regular language (@{term ‹Reg_sym›})
is defined using two additional languages lifted from ‹P7› and ‹P8›. These vanish when restricted to
words only containing terminals.
As stated in the introductory text, these languages will only be regular, when constrained to a finite bracket set.
The theorems about this, are in the later section ‹Showing Regularity›.
›
subsection‹‹P1››
text‹‹P1› will define a predicate on string elements.
It will be true iff each ‹]⇧1⇩p› is directly followed by ‹[⇧2⇩p›.
That also means ‹]⇧1⇩p› cannot be the end of the string.›
text‹But first we define a helper function, that only captures the neighbouring condition for two strings:›
fun P1' :: ‹('n,'t) bracket3 ⇒ ('n,'t) bracket3 ⇒ bool› where
‹P1' ]⇧1⇩p [⇧2⇩p' = (p = p')› |
‹P1' ]⇧1⇩p y = False› |
‹P1' x y = True›
text‹A version of @{term ‹P1'›} for symbols, i.e. strings that may still contain Nt's:›
fun P1'_sym :: ‹('n, ('n,'t) bracket3) sym ⇒ ('n, ('n,'t) bracket3) sym ⇒ bool› where
‹P1'_sym (Tm ]⇧1⇩p) (Tm [⇧2⇩p') = (p = p')› |
‹P1'_sym (Tm ]⇧1⇩p) y = False› |
‹P1'_sym x y = True›
lemma P1'D[simp]:
‹P1' ]⇧1⇩p r ⟷ r = [⇧2⇩p›
by(induction ‹]⇧1⇩p› ‹r› rule: P1'.induct) auto
text‹Asserts that ‹P1'› holds for every pair in xs, and that xs doesnt end in ‹]⇧1⇩p›:›
fun P1 :: "('n, 't) bracket3 list ⇒ bool" where
‹P1 xs = ((successively P1' xs) ∧ (if xs ≠ [] then (∄p. last xs = ]⇧1⇩p) else True))›
text‹Asserts that ‹P1'› holds for every pair in xs, and that xs doesnt end in ‹Tm ]⇧1⇩p›:›
fun P1_sym where
‹P1_sym xs = ((successively P1'_sym xs) ∧ (if xs ≠ [] then (∄p. last xs = Tm ]⇧1⇩p) else True))›
lemma P1_for_tm_if_P1_sym[dest!]: ‹P1_sym (map Tm x) ⟹ P1 x›
proof(induction x rule: induct_list012)
case (3 x y zs)
then show ?case
by(cases ‹(Tm x :: ('a, ('a,'b)bracket3) sym, Tm y :: ('a, ('a,'b)bracket3) sym)› rule: P1'_sym.cases) auto
qed simp_all
lemma P1I[intro]:
assumes ‹successively P1' xs›
and ‹∄p. last xs = ]⇧1⇩p›
shows ‹P1 xs›
proof(cases xs)
case Nil
then show ?thesis using assms by force
next
case (Cons a list)
then show ?thesis using assms by (auto split: version.splits sym.splits prod.splits)
qed
lemma P1_symI[intro]:
assumes ‹successively P1'_sym xs›
and ‹∄p. last xs = Tm ]⇧1⇩p›
shows ‹P1_sym xs›
proof(cases xs rule: rev_cases)
case Nil
then show ?thesis by auto
next
case (snoc ys y)
then show ?thesis
using assms by (cases y) auto
qed
lemma P1_symD[dest]: ‹P1_sym xs ⟹ successively P1'_sym xs› by simp
lemma P1D_not_empty[intro]:
assumes ‹xs ≠ []›
and ‹P1 xs›
shows ‹last xs ≠ ]⇧1⇩p›
proof-
from assms have ‹successively P1' xs ∧ (∄p. last xs = ]⇧1⇩p)›
by simp
then show ?thesis by blast
qed
lemma P1_symD_not_empty'[intro]:
assumes ‹xs ≠ []›
and ‹P1_sym xs›
shows ‹last xs ≠ Tm ]⇧1⇩p›
proof-
from assms have ‹successively P1'_sym xs ∧ (∄p. last xs = Tm ]⇧1⇩p)›
by simp
then show ?thesis by blast
qed
lemma P1_symD_not_empty:
assumes ‹xs ≠ []›
and ‹P1_sym xs›
shows ‹∄p. last xs = Tm ]⇧1⇩p›
using P1_symD_not_empty'[OF assms] by simp
subsection‹‹P2››
text‹A ‹]⇧2⇩π› is never directly followed by some ‹[›:›
fun P2 :: ‹('n,'t) bracket3 ⇒ ('n,'t) bracket3 ⇒ bool› where
‹P2 (Close (p, Two)) (Open (p', v)) = False› |
‹P2 (Close (p, Two)) y = True› |
‹P2 x y = True›
fun P2_sym :: ‹('n, ('n,'t) bracket3) sym ⇒ ('n, ('n,'t) bracket3) sym ⇒ bool› where
‹P2_sym (Tm (Close (p, Two))) (Tm (Open (p', v))) = False› |
‹P2_sym (Tm (Close (p, Two))) y = True› |
‹P2_sym x y = True›
lemma P2_for_tm_if_P2_sym[dest]: ‹successively P2_sym (map Tm x) ⟹ successively P2 x›
apply(induction x rule: induct_list012)
apply simp
apply simp
using P2.elims(3) by fastforce
subsection‹‹P3››
text‹Each ‹[⇧1⇘A→BC⇙› is directly followed by ‹[⇧1⇘B→_⇙›,
and each ‹[⇧2⇘A→BC⇙› is directly followed by ‹[⇧1⇘C→_⇙›:›
fun P3 :: ‹('n,'t) bracket3 ⇒ ('n,'t) bracket3 ⇒ bool› where
‹P3 [⇧1⇘(A, [Nt B, Nt C])⇙ (p, ((X,y), t)) = (p = True ∧ t = One ∧ X = B)› |
‹P3 [⇧2⇘(A, [Nt B, Nt C])⇙ (p, ((X,y), t)) = (p = True ∧ t = One ∧ X = C)› |
‹P3 x y = True›
text‹Each ‹[⇧1⇘A→BC⇙› is directly followed ‹[⇧1⇘B→_⇙› or ‹Nt B›,
and each ‹[⇧2⇘A→BC⇙› is directly followed by ‹[⇧1⇘C→_⇙› or ‹Nt C›:›
fun P3_sym :: ‹('n, ('n,'t) bracket3) sym ⇒ ('n, ('n,'t) bracket3) sym ⇒ bool› where
‹P3_sym (Tm [⇧1⇘(A, [Nt B, Nt C])⇙) (Tm (p, ((X,y), t))) = (p = True ∧ t = One ∧ X = B)› |
‹P3_sym (Tm [⇧1⇘(A, [Nt B, Nt C])⇙) (Nt X) = (X = B)› |
‹P3_sym (Tm [⇧2⇘(A, [Nt B, Nt C])⇙) (Tm (p, ((X,y), t))) = (p = True ∧ t = One ∧ X = C)› |
‹P3_sym (Tm [⇧2⇘(A, [Nt B, Nt C])⇙) (Nt X) = (X = C)› |
‹P3_sym x y = True›
lemma P3D1[dest]:
fixes r::‹('n,'t) bracket3›
assumes ‹P3 [⇧1⇘(A, [Nt B, Nt C])⇙ r›
shows ‹∃l. r = [⇧1⇘(B, l)⇙›
using assms by(induction ‹[⇧1⇘(A, [Nt B, Nt C])⇙:: ('n,'t) bracket3› r rule: P3.induct) auto
lemma P3D2[dest]:
fixes r::‹('n,'t) bracket3›
assumes ‹P3 [⇧2⇘(A, [Nt B, Nt C])⇙ r›
shows ‹∃l. r = [⇧1⇘(C, l)⇙›
using assms by(induction ‹[⇧1⇘(A, [Nt B, Nt C])⇙:: ('n,'t) bracket3› r rule: P3.induct) auto
lemma P3_for_tm_if_P3_sym[dest]: ‹successively P3_sym (map Tm x) ⟹ successively P3 x›
proof(induction x rule: induct_list012)
case (3 x y zs)
then show ?case
by(cases ‹(Tm x :: ('a, ('a,'b) bracket3) sym, Tm y :: ('a, ('a,'b) bracket3) sym)› rule: P3_sym.cases) auto
qed simp_all
subsection‹‹P4››
text‹Each ‹[⇧1⇘A→a⇙› is directly followed by ‹]⇧1⇘A→a⇙›
and each ‹[⇧2⇘A→a⇙› is directly followed by ‹]⇧2⇘A→a⇙›:›
fun P4 :: ‹('n,'t) bracket3 ⇒ ('n,'t) bracket3 ⇒ bool› where
‹P4 (Open ((A, [Tm a]), s)) (p, ((X, y), t)) = (p = False ∧ X = A ∧ y = [Tm a] ∧ s = t)› |
‹P4 x y = True›
text‹Each ‹[⇧1⇘A→a⇙› is directly followed by ‹]⇧1⇘A→a⇙›
and each ‹[⇧2⇘A→a⇙› is directly followed by ‹]⇧2⇘A→a⇙›:›
fun P4_sym :: ‹('n, ('n,'t) bracket3) sym ⇒ ('n, ('n,'t) bracket3) sym ⇒ bool› where
‹P4_sym (Tm (Open ((A, [Tm a]), s))) (Tm (p, ((X, y), t))) = (p = False ∧ X = A ∧ y = [Tm a] ∧ s = t)› |
‹P4_sym (Tm (Open ((A, [Tm a]), s))) (Nt X) = False› |
‹P4_sym x y = True›
lemma P4D[dest]:
fixes r::‹('n,'t) bracket3›
assumes ‹P4 (Open ((A, [Tm a]), v)) r›
shows ‹r = Close ((A, [Tm a]), v)›
using assms by(induction ‹(Open ((A, [Tm a]), v))::('n,'t) bracket3› r rule: P4.induct) auto
lemma P4_for_tm_if_P4_sym[dest]: ‹successively P4_sym (map Tm x) ⟹ successively P4 x›
proof(induction x rule: induct_list012)
case (3 x y zs)
then show ?case
by(cases ‹(Tm x :: ('a, ('a,'b) bracket3) sym, Tm y :: ('a, ('a,'b) bracket3) sym)› rule: P4_sym.cases) auto
qed simp_all
subsection‹‹P5››
text‹‹P5 A x› holds, iff there exists some ‹y› such that ‹x› begins with ‹[⇧1⇘A→y⇙›:›
fun P5 :: ‹'n ⇒ ('n,'t) bracket3 list ⇒ bool› where
‹P5 A [] = False› |
‹P5 A ([⇧1⇘(X,x)⇙ # xs) = (X = A)› |
‹P5 A (x # xs) = False›
text‹‹P5_sym A x› holds, iff either there exists some ‹y› such that ‹x› begins with ‹[⇧1⇘A→y⇙›, or if it begins with ‹Nt A›:›
fun P5_sym :: ‹'n ⇒ ('n, ('n,'t) bracket3) syms ⇒ bool› where
‹P5_sym A [] = False› |
‹P5_sym A (Tm [⇧1⇘(X,x)⇙ # xs) = (X = A)› |
‹P5_sym A ((Nt X) # xs) = (X = A)› |
‹P5_sym A (x # xs) = False›
lemma P5D[dest]:
assumes ‹P5 A x›
shows ‹∃y. hd x = [⇧1⇘(A,y)⇙›
using assms by(induction A x rule: P5.induct) auto
lemma P5_symD[dest]:
assumes ‹P5_sym A x›
shows ‹(∃y. hd x = Tm [⇧1⇘(A,y)⇙) ∨ hd x = Nt A›
using assms by(induction A x rule: P5_sym.induct) auto
lemma P5_for_tm_if_P5_sym[dest]: ‹P5_sym A (map Tm x) ⟹ P5 A x›
by(induction x) auto
subsection‹‹P7› and ‹P8››
text‹‹(successively P7_sym) w› iff ‹Nt Y› is directly preceded by some ‹Tm [⇧1⇘A→YC⇙› or ‹Tm [⇧2⇘A→BY⇙› in w:›
fun P7_sym :: ‹('n, ('n,'t) bracket3) sym ⇒ ('n, ('n,'t) bracket3) sym ⇒ bool› where
‹P7_sym (Tm (b,(A, [Nt B, Nt C]), v )) (Nt Y) = (b = True ∧ ((Y = B ∧ v = One) ∨ (Y=C ∧ v = Two)) )› |
‹P7_sym x (Nt Y) = False› |
‹P7_sym x y = True›
lemma P7_symD[dest]:
fixes x:: ‹('n, ('n,'t) bracket3) sym›
assumes ‹P7_sym x (Nt Y)›
shows ‹(∃A C. x = Tm [⇧1⇘(A,[Nt Y, Nt C])⇙) ∨ (∃A B. x = Tm [⇧2⇘(A,[Nt B, Nt Y])⇙)›
using assms by(induction x ‹Nt Y::('n, ('n,'t) bracket3) sym› rule: P7_sym.induct) auto
text‹‹(successively P8_sym) w› iff ‹Nt Y› is directly followed by some ‹]⇧1⇘A→YC⇙› or ‹]⇧2⇘A→BY⇙› in w:›
fun P8_sym :: ‹('n, ('n,'t) bracket3) sym ⇒ ('n, ('n,'t) bracket3) sym ⇒ bool› where
‹P8_sym (Nt Y) (Tm (b,(A, [Nt B, Nt C]), v )) = (b = False ∧ ( (Y = B ∧ v = One) ∨ (Y=C ∧ v = Two)) )› |
‹P8_sym (Nt Y) x = False› |
‹P8_sym x y = True›
lemma P8_symD[dest]:
fixes x:: ‹('n, ('n,'t) bracket3) sym›
assumes ‹P8_sym (Nt Y) x›
shows ‹(∃A C. x = Tm ]⇧1⇘(A,[Nt Y, Nt C])⇙) ∨ (∃A B. x = Tm ]⇧2⇘(A,[Nt B, Nt Y])⇙)›
using assms by(induction ‹Nt Y::('n, ('n,'t) bracket3) sym› x rule: P8_sym.induct) auto
subsection‹‹Reg› and ‹Reg_sym››
text‹This is the regular language, where one takes the Start symbol as a parameter, and then has the searched for ‹R := R⇩A›:›
definition Reg :: ‹'n ⇒ ('n,'t) bracket3 list set› where
‹Reg A = {x. (P1 x) ∧
(successively P2 x) ∧
(successively P3 x) ∧
(successively P4 x) ∧
(P5 A x)}›
lemma RegI[intro]:
assumes ‹(P1 x)›
and ‹(successively P2 x)›
and ‹(successively P3 x)›
and ‹(successively P4 x)›
and ‹(P5 A x)›
shows ‹x ∈ Reg A›
using assms unfolding Reg_def by blast
lemma RegD[dest]:
assumes ‹x ∈ Reg A›
shows ‹(P1 x)›
and ‹(successively P2 x)›
and ‹(successively P3 x)›
and ‹(successively P4 x)›
and ‹(P5 A x)›
using assms unfolding Reg_def by blast+
text‹A version of ‹Reg› for symbols, i.e. strings that may still contain Nt's.
It has 2 more Properties ‹P7› and ‹P8› that vanish for pure terminal strings:›
definition Reg_sym :: ‹'n ⇒ ('n, ('n,'t) bracket3) syms set› where
‹Reg_sym A = {x. (P1_sym x) ∧
(successively P2_sym x) ∧
(successively P3_sym x) ∧
(successively P4_sym x) ∧
(P5_sym A x) ∧
(successively P7_sym x) ∧
(successively P8_sym x)}›
lemma Reg_symI[intro]:
assumes ‹P1_sym x›
and ‹successively P2_sym x›
and ‹successively P3_sym x›
and ‹successively P4_sym x›
and ‹P5_sym A x›
and ‹(successively P7_sym x)›
and ‹(successively P8_sym x)›
shows ‹x ∈ Reg_sym A›
using assms unfolding Reg_sym_def by blast
lemma Reg_symD[dest]:
assumes ‹x ∈ Reg_sym A›
shows ‹P1_sym x›
and ‹successively P2_sym x›
and ‹successively P3_sym x›
and ‹successively P4_sym x›
and ‹P5_sym A x›
and ‹(successively P7_sym x)›
and ‹(successively P8_sym x)›
using assms unfolding Reg_sym_def by blast+
lemma Reg_for_tm_if_Reg_sym[dest]: ‹(map Tm x) ∈ Reg_sym A ⟹ x ∈ Reg A›
by(rule RegI) auto
section‹Showing Regularity›
context locale_P
begin
abbreviation brackets::‹('n,'t) bracket3 list set› where
‹brackets ≡ {bs. ∀(_,p,_) ∈ set bs. p ∈ P}›
text‹This is needed for the construction that shows P2,P3,P4 regular.›