FSA-to-RegExp
Языки
Java
FSA-to-RegExp
Was created as homework for Theoretical Computer Science course at Innopolis University.
Description
Given an FSA description in the
, your program should output the
containing an error description or a regular expression that corresponds to the given FSA. The regular expression should be built according to a slightly modified version of the Kleene’s algorithm.
Input file format
states=[s1,s2,...] // s1 , s2, ... ∈ latin letters, words and numbers
alpha=[a1,a2, ...] // a1 , a2, ... ∈ latin letters, words, numbers and character '_’(underscore)
init.st=[s] // s ∈ states
fin.st=[s1,s2,...] // s1, s2 ∈ states
trans=[s1>a>s2,... ] // s1,s2,...∈ states; a ∈ alpha
Validation result
Errors:
- E0: Input file is malformed
- E1: A state 's' is not in the set of states
- E2: Some states are disjoint
- E3: A transition 'a' is not represented in the alphabet
- E4: Initial state is not defined
- E5: FSA is nondeterministic
Kleene's Algorithm
It transforms a given deterministic finite state automaton (FSA) into a regular expression.
Given an FSA M = (Q, A, δ, q0, F), with Q = {q0, . . . , qn} its set of states, the algorithm computes:
- the sets Rijk of all strings that take M from state qi to qj without going through any state numbered higher than k
- each set Rijk is represented by a regular expression
- the algorithm computes them step by step for k = −1, 0, ... , n
- since there is no state numbered higher than n, the regular expression R0jn represents the set of all strings that take M from its start state q0 to qj
- If F = {q1, ... , qf} is the set of accept states, the regular expression R01n| ... |R0fn represents the language accepted by M
The initial regular expression, for k = -1, are computed as:
- Rij-1 = a1 | ... | am if i ≠ j, where δ(qi, a1) = ... = δ(qi, am) = qj
- Rij-1 = a1 | ... | am | Ɛ if i = j, where δ(qi, a1) = ... = δ(qi, am) = qj
After that, in each step the expressions Rijk are computed from the previous ones by:
- Rijk = Rikk-1(Rkkk-1)*Rkjk-1|Rijk-1
The Kleene’s Algorithm should be used as presented above, but with following modifications:
- Denote ∅ as {}
- Denote Ɛ as eps
- Define update rule with the additional parentheses: Rijk = (Rikk-1)(Rkkk-1)*(Rkjk-1)|(Rijk-1)