diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..32cd9c9
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,3 @@
+target/*
+
+.idea/*
diff --git a/.idea/vcs.xml b/.idea/vcs.xml
new file mode 100644
index 0000000..94a25f7
--- /dev/null
+++ b/.idea/vcs.xml
@@ -0,0 +1,6 @@
+
+
+
+
+
+
\ No newline at end of file
diff --git a/Lex.iml b/Lex.iml
new file mode 100644
index 0000000..72e51be
--- /dev/null
+++ b/Lex.iml
@@ -0,0 +1,20 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/README.assets/dfa.jpg b/README.assets/dfa.jpg
new file mode 100644
index 0000000..9270668
Binary files /dev/null and b/README.assets/dfa.jpg differ
diff --git a/README.assets/dtran.jpg b/README.assets/dtran.jpg
new file mode 100644
index 0000000..3c21ce2
Binary files /dev/null and b/README.assets/dtran.jpg differ
diff --git a/README.assets/fa-edge.jpg b/README.assets/fa-edge.jpg
new file mode 100644
index 0000000..be98310
Binary files /dev/null and b/README.assets/fa-edge.jpg differ
diff --git a/README.assets/fa-node.jpg b/README.assets/fa-node.jpg
new file mode 100644
index 0000000..611419a
Binary files /dev/null and b/README.assets/fa-node.jpg differ
diff --git a/README.assets/fa-state.jpg b/README.assets/fa-state.jpg
new file mode 100644
index 0000000..953c699
Binary files /dev/null and b/README.assets/fa-state.jpg differ
diff --git a/README.assets/fa.jpg b/README.assets/fa.jpg
new file mode 100644
index 0000000..74d4b19
Binary files /dev/null and b/README.assets/fa.jpg differ
diff --git a/README.assets/nfa.jpg b/README.assets/nfa.jpg
new file mode 100644
index 0000000..ddb2f09
Binary files /dev/null and b/README.assets/nfa.jpg differ
diff --git a/README.assets/token.jpg b/README.assets/token.jpg
new file mode 100644
index 0000000..c0c668a
Binary files /dev/null and b/README.assets/token.jpg differ
diff --git a/README.md b/README.md
index f0058af..94198db 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,110 @@
-# Rela
+# Lex Lexical Analyzer
+
+## 
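Motivation / Aim
+Gain a deeper understanding of lexical analysis in compiler construction by implementing the RE => NFA => DFA => minimized DFA pipeline: generate the DFA transition table that Lex derives from a .l file, run lexical analysis on the input source program/text, and output the resulting token sequence. Building Lex automatically is a way to understand the core algorithm behind each conversion step.
+
+## Content description
+The program first reads the user's input from the console, then reads the .l file of regular definitions in the resources directory and builds one minimized DFA per definition, using the order of the definitions in the .l file as their priority. Each input lexeme is then matched against the DFAs in that order.
+
+## Ideas / Methods
+Define a custom .l Lex file and generate from it a DFA that decides whether a lexeme is valid; the design follows Lex.
+
+## Assumptions
+1. Lexemes in the source program/text being analyzed __contain no spaces__ (spaces are not allowed inside quotes either).
+2. The supported regular-expression operators are exactly ___· | * + ? () {} [] , -___. To match any of these symbols literally, escape it in the regular definition.
+
+The basic symbol __·__ (concatenation) may be written explicitly or omitted. __+__ means one or more occurrences and __?__ means zero or one.
+
+__{}__ supports four forms, {n}, {m, n}, {m,} and {,n}, meaning exactly n occurrences, between m and n occurrences, at least m occurrences, and at most n occurrences respectively.
+
+__[]__ supports the [abc] shorthand for alternation as well as hyphenated ranges such as [a-zA-Z0-9].
+
+Escaping uses __\\__. For example, to match a left brace { , write \\{ in the RE.
+
+3. After the input, enter __###__ following whitespace to terminate the input and start the analysis.
+4. The resulting token sequence is printed to the console and also written to the project's parent directory (alongside the Lex project).
+
+
+## Related FA description
+The DFAs are derived from the regular definitions in regular_expression.l in the resources directory, so there is no single fixed FA description.
+
+
+## Description of important Data Structures
+(The Java definitions are under /finiteAutomata/entity and /lex/entity.)
+
+1. FA: finite automaton. The abstract parent class of NFA and DFA, holding the alphabet, the start state, the accepting states, all states, and an abstract method that decides whether an input lexeme is valid.
+
+ ![](README.assets/fa.jpg)
+
+2. NFA: nondeterministic finite automaton.
+ ![](README.assets/nfa.jpg)
+
+3. DFA: deterministic finite automaton.
+ ![](README.assets/dfa.jpg)
+
+4. FA_State: a node in an FA. Holds its ID and the set of edges to its successor states.
+ ![](README.assets/fa-state.jpg)
+
+5. FA_Edge: an edge in an FA. Holds the edge label and the successor state the edge points to.
+ ![](README.assets/fa-edge.jpg)
+
+6. FA_Node: used during DFA minimization to represent one set of equivalent DFA states in the current partition.
+ ![](README.assets/fa-node.jpg)
+
+7. DTran: one transition recorded during subset construction for the NFA being processed. Holds the source state set, the target state set, and the label.
+ ![](README.assets/dtran.jpg)
+
+8. Token: a token produced by lexical analysis. Holds the token name and its attribute value.
+ ![](README.assets/token.jpg)
+
+## Description of core Algorithms
+
+(The Java definitions are under /finiteAutomata.)
+
+### 1. RE => standardized RE postfix
+First, the extended operators +, ?, {} and [] are rewritten using only the basic operators · | * (for example, a+ becomes aa* and a? becomes (ε|a)), and the omitted concatenation operators are filled in. Then the infix expression, which now contains only concatenation, alternation, closure and parentheses, is converted to postfix. In this step an escaped character is treated as an ordinary character; the implementation tracks this with a boolean curCharIsTransferred that records, while scanning the input, whether the previous character was the escape symbol __\\__ (that is, whether the current character is an escaped operator).
+
+### 2. 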
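RE => NFA
+Take the standardized postfix regular definition and process it one character at a time. A character that is not an operator (an operand or an escaped operator) yields a new two-state NFA whose single edge is labeled with that character. The NFAs produced along the way are kept on a stack: on a concatenation operator (·) the top two NFAs are popped and concatenated, on an alternation operator (|) the top two are popped and combined by alternation, and on a closure operator (*) the top NFA is popped and wrapped in a closure. Each result is pushed back onto the stack.
+
+### 3. NFA => DFA
+Build an equivalent DFA from the NFA of the previous step using the *subset construction*.
+
+### 4. DFA => minimized DFA
+Equivalent states in a DFA behave identically and can be merged, so an algorithm can build the DFA with the fewest states. The idea: first partition the states into the accepting set and the non-accepting set, then check, for every state and every symbol of the alphabet, whether the successor states are equivalent (fall into the same set). The implementation uses a boolean isWeakEqual to record whether a pass split any set; the loop only exits after a pass with no split, which provides the required "look back": a later split may force a set that was examined earlier to be split again.
+
+## Use cases on running
+As long as the Assumptions hold, the regular_expression.l Lex file in the resources directory can be modified freely; enter matching source program/text on the console to have it matched against the Lex file.
+
+For example, with the current regular_expression.l you can enter:
+```
+, ; >= != =
+2 5 14 47
+2345676543
+a
+t E surprise
+"Wordmakesman" curTemperature10
+{aaaaaaabbbbbbbbcc} [0-9]aaa
+###
+```
+
+In addition, the /test directory contains many more test cases.
+
+## Problems occurred and related solutions
+
+1.  Initially I built an NFA for each regular definition, combined them into one large NFA, and converted that NFA into a minimal-state DFA. Combining the NFAs merged only their start states, not their accepting states; but because the NFAs were combined before the DFA was built, taking ε-closures during the NFA => DFA subset construction effectively merged the accepting states as well. A single accepting state of the resulting DFA could therefore correspond to several regular definitions, and deciding which pattern it actually stood for needed extra work. My first approach was to regenerate a minimized DFA for each candidate definition; even with a map to reduce the number of regenerations, that still cost at least one extra construction. Once I understood why one accepting state could map to several definitions/patterns, I stopped combining all the NFAs into one. Instead, each definition gets its own DFA, and the DFAs are tried serially in the priority order of the definitions; the first one that accepts the lexeme determines the result.
+
+2.  The DFA minimization algorithm is easy to understand, but I struggled to implement it properly. Splitting it into smaller pieces to reduce complexity let me build it up step by step.
+
+
+## Your feelings and comments
+1. Understanding an algorithm well enough to work it by hand does not make implementing it in a programming language easy.
+2. Choosing data structures flexibly helps avoid some programming errors later on.
+
+## Highlights
+1. Complete, appropriately detailed comments that make the code easy to revisit and modify, with extra emphasis on the points of the algorithms that need particular care.
+2. Exception handling for the input being analyzed. For example, input of the form (*a) throws UnexpectedRegularExprRuleException, and input that matches none of the regular definitions in the .l file throws NotMatchingException.
+3. Test-driven development. Test cases are written first and made deliberately harsh; public interfaces are tested directly and private methods through reflection, to cover as many cases as possible.
+4. Escape sequences are supported in regular definitions, so operator characters can be matched literally.
+5. 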
Maven project; logging goes through __log4j__ rather than plain System.out.print().
-Gain a deeper understanding of lexical analysis in compiler construction by implementing the RE => NFA => DFA => minimized DFA pipeline: generate the DFA transition table that Lex derives from a .l file, run lexical analysis on the input source program/text, and output the resulting token sequence. Building Lex automatically is a way to understand the core algorithm behind each conversion step.
\ No newline at end of file
diff --git a/pom.xml b/pom.xml
new file mode 100644
index 0000000..71d0642
--- /dev/null
+++ b/pom.xml
@@ -0,0 +1,33 @@
+
+
+ 4.0.0
+
+ cn.edu.nju.charlesfeng
+ charlesfeng
+ 1.0-SNAPSHOT
+
+
+
+ junit
+ junit
+ 4.12
+
+
+
+ log4j
+ log4j
+ 1.2.17
+
+
+
+
+ org.apache.commons
+ commons-lang3
+ 3.6
+
+
+
+
+
\ No newline at end of file
diff --git a/src/main/java/exceptions/NotMatchingException.java b/src/main/java/exceptions/NotMatchingException.java
new file mode 100644
index 0000000..379bfb3
--- /dev/null
+++ b/src/main/java/exceptions/NotMatchingException.java
@@ -0,0 +1,23 @@
+package exceptions;
+
+/**
+ * Created by cuihua on 2017/11/2.
+ *

+ * The string the user wants analyzed does not match the regular definitions in the .l file
+ */
+public class NotMatchingException extends Exception {
+
+    /**
+     * The invalid lexeme
+     */
+    private String lexeme;
+
+    public NotMatchingException(String lexeme) {
+        this.lexeme = lexeme;
+    }
+
+    @Override
+    public String getMessage() {
+        // Message: "No matching state in the generated DFA for lexeme <lexeme>"
+        return "转换生成的 DFA 中对词素 " + lexeme + " 无匹配状态";
+    }
+}
diff --git a/src/main/java/exceptions/UnexpectedRegularExprRuleException.java b/src/main/java/exceptions/UnexpectedRegularExprRuleException.java
new file mode 100644
index 0000000..cb239d7
--- /dev/null
+++ b/src/main/java/exceptions/UnexpectedRegularExprRuleException.java
@@ -0,0 +1,24 @@
+package exceptions;
+
+/**
+ * Created by cuihua on 2017/10/25.
+ *

+ * Input formats that are not expected while processing an RE,
+ * e.g. (*, (|, |), |*, ||, ·), ·*, ·|, (·, |·, ··
+ */
+public class UnexpectedRegularExprRuleException extends Exception {
+
+    /**
+     * The malformed regular definition
+     */
+    private String re;
+
+    public UnexpectedRegularExprRuleException(String re) {
+        this.re = re;
+    }
+
+    @Override
+    public String getMessage() {
+        // Message: "<re> in the input does not conform to the expected format"
+        return "输入中 " + re + " 不符合规格";
+    }
+}
diff --git a/src/main/java/finiteAutomata/DFA_Handler.java b/src/main/java/finiteAutomata/DFA_Handler.java
new file mode 100644
index 0000000..60f1272
--- /dev/null
+++ b/src/main/java/finiteAutomata/DFA_Handler.java
@@ -0,0 +1,474 @@
+package finiteAutomata;
+
+import finiteAutomata.entity.*;
+import org.apache.log4j.Logger;
+import utilties.*;
+
+import java.util.*;
+
+/**
+ * Created by cuihua on 2017/10/27.
+ *

+ * Processes an NFA:
+ * NFA => DFA
+ * optimize DFA
+ */
+public class DFA_Handler {
+
+    private static Logger logger = Logger.getLogger(DFA_Handler.class);
+
+    private static FA_StateComparator comparator = new FA_StateComparator();
+
+    public DFA_Handler() {
+    }
+
+    /**
+     * @param nfa the NFA to convert
+     * @return a DFA equivalent to the input NFA
+     */
+    public DFA getFromNFA(final NFA nfa) {
+        List<DTran> dTrans = new LinkedList<>();
+
+        // dStates maps <closure, marked>; LinkedHashMap keeps insertion order instead of hash order
+        Map<List<FA_State>, Boolean> dStates = new LinkedHashMap<>();
+        dStates.put(closure(nfa.getStart()), false);
+
+        // Clear the recursion bookkeeping left over from computing this closure
+        ClosureRecursionHandler.reset();
+
+        while (true) {
+            // Look for an unmarked state set in dStates; mark it and process it
+            boolean hasStopped = true;
+            List<FA_State> unhandled = null;
+            for (Map.Entry<List<FA_State>, Boolean> entry : dStates.entrySet()) {
+                if (!entry.getValue()) {
+                    hasStopped = false;
+                    entry.setValue(true);
+                    unhandled = entry.getKey();
+                    break;
+                }
+            }
+
+            // Loop termination: every state set has been marked
+            if (hasStopped) break;
+
+            // Process the state set that was just marked
+            for (char c : nfa.getAlphabet()) {
+                List<FA_State> curFollowing = move(unhandled, c);
+                int curFollowingSize = curFollowing.size();
+
+                if (curFollowingSize != 0) {
+                    // Otherwise this state set has no successor on this character and nothing is recorded
+
+                    // Keep the kernel whose closure is about to be computed
+                    List<FA_State> curFollowingClosure = new FA_StatesList();
+                    curFollowingClosure.addAll(curFollowing);
+
+
+                    // Walk the kernel of the successor to build the kernel's closure
+                    for (FA_State tempState : curFollowing) {
+                        List<FA_State> tempClosure = closure(tempState);
+
+                        // Clear the recursion bookkeeping left over from computing this closure
+                        ClosureRecursionHandler.reset();
+
+                        // Add to curFollowingClosure every element of tempClosure it does not contain yet
+                        curFollowingClosure.removeAll(tempClosure);
+                        curFollowingClosure.addAll(tempClosure);
+                    }
+
+                    // Sort, then check whether this set is already in dStates
+                    curFollowingClosure.sort(comparator);
+                    if (!isInDSates(dStates, curFollowingClosure)) {
+                        dStates.put(curFollowingClosure, false);
+                    }
+
+                    // Record the transition in the dTrans table
+                    dTrans.add(new DTran(unhandled, curFollowingClosure, c));
+                }
+            }
+        }
+
+        // Print the DFA state transition table
+        logger.info("NFA => DFA 子集构造法结束"); // "NFA => DFA subset construction finished"
+        for (DTran dTran : dTrans) {
+            dTran.show();
+        }
+
+        return getEquivalentDFA(nfa, dStates, dTrans);
+    }
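As an aside to the `getFromNFA` hunk above: the subset construction it implements (ε-closure, `move`, and a list of not-yet-processed DFA states) can be sketched in isolation. The following is a minimal, hypothetical illustration and not the code from this commit. `SubsetConstructionSketch` and all of its methods are names made up here, states are plain integers rather than `FA_State`, and the closure is computed iteratively with a work list instead of through `ClosureRecursionHandler`. The example NFA is assumed to be the usual Thompson-style automaton for `(a|b)*abb`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, self-contained sketch: states are plain ints, not the project's FA_State.
class SubsetConstructionSketch {
    static final char EPSILON = '\u03B5'; // 'ε', the label used for empty moves

    // transitions.get(from).get(label) -> set of target state IDs
    private final Map<Integer, Map<Character, Set<Integer>>> transitions = new HashMap<>();

    void addEdge(int from, char label, int to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>())
                .computeIfAbsent(label, k -> new TreeSet<>())
                .add(to);
    }

    private Set<Integer> targets(int state, char label) {
        Map<Character, Set<Integer>> byLabel = transitions.get(state);
        if (byLabel == null || !byLabel.containsKey(label)) {
            return Collections.<Integer>emptySet();
        }
        return byLabel.get(label);
    }

    // ε-closure of a set of states, computed iteratively with a work list.
    Set<Integer> closure(Set<Integer> core) {
        Set<Integer> result = new TreeSet<>(core);
        Deque<Integer> work = new ArrayDeque<>(core);
        while (!work.isEmpty()) {
            int state = work.pop();
            for (int next : targets(state, EPSILON)) {
                if (result.add(next)) { // only follow states not seen before
                    work.push(next);
                }
            }
        }
        return result;
    }

    // All states reachable from `states` over one edge labeled `label`.
    Set<Integer> move(Set<Integer> states, char label) {
        Set<Integer> result = new TreeSet<>();
        for (int state : states) {
            result.addAll(targets(state, label));
        }
        return result;
    }

    // Returns the DFA states (each a set of NFA states) in discovery order.
    List<Set<Integer>> subsetConstruction(int start, List<Character> alphabet) {
        List<Set<Integer>> dStates = new ArrayList<>();
        dStates.add(closure(Collections.singleton(start)));
        for (int i = 0; i < dStates.size(); i++) { // entries before index i are "marked"
            for (char c : alphabet) {
                Set<Integer> next = closure(move(dStates.get(i), c));
                if (!next.isEmpty() && !dStates.contains(next)) {
                    dStates.add(next);
                }
            }
        }
        return dStates;
    }

    // Thompson-style NFA for (a|b)*abb, start state 0, accepting state 10.
    static SubsetConstructionSketch example() {
        SubsetConstructionSketch nfa = new SubsetConstructionSketch();
        int[][] epsilonEdges = {{0, 1}, {0, 7}, {1, 2}, {1, 4}, {3, 6}, {5, 6}, {6, 1}, {6, 7}};
        for (int[] e : epsilonEdges) {
            nfa.addEdge(e[0], EPSILON, e[1]);
        }
        nfa.addEdge(2, 'a', 3);
        nfa.addEdge(4, 'b', 5);
        nfa.addEdge(7, 'a', 8);
        nfa.addEdge(8, 'b', 9);
        nfa.addEdge(9, 'b', 10);
        return nfa;
    }

    public static void main(String[] args) {
        List<Set<Integer>> dStates = example().subsetConstruction(0, Arrays.asList('a', 'b'));
        for (int i = 0; i < dStates.size(); i++) {
            System.out.println(i + " = " + dStates.get(i));
        }
    }
}
```

For this example the construction discovers five DFA states, starting from the closure {0, 1, 2, 4, 7} of the NFA start state. Keeping each state set in a sorted `TreeSet` plays the role that sorting with `comparator` before the `isInDSates` lookup plays in the diff: two sets with the same members compare equal regardless of discovery order.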
+
+    /**
+     * Computes the ε-closure of the given state
+     */
+    private List<FA_State> closure(FA_State nowState) {
+        List<FA_State> result = new FA_StatesList();
+        result.add(nowState);
+        ClosureRecursionHandler.addState(nowState);
+
+        // Walk every outgoing edge of the current state
+        for (FA_Edge tempEdge : nowState.getFollows()) {
+            if (tempEdge.getLabel() == 'ε') {
+                // If the recursive closure has not visited this state yet, add its closure to the result
+                FA_State nextState = tempEdge.getPointTo();
+                if (!ClosureRecursionHandler.contain(nextState)) {
+                    List<FA_State> temp = closure(nextState);
+                    result.addAll(temp);
+                }
+            }
+        }
+
+        result.sort(comparator);
+        return result;
+    }
+
+    /**
+     * Moves the given state set along the edges labeled with label
+     */
+    private List<FA_State> move(List<FA_State> cur, char label) {
+        List<FA_State> result = new FA_StatesList();
+
+        for (FA_State tempState : cur) {
+            for (FA_Edge tempEdge : tempState.getFollows()) {
+                if (tempEdge.getLabel() == label) {
+                    result.add(tempEdge.getPointTo());
+                }
+            }
+        }
+
+        result.sort(comparator);
+        return result;
+    }
+
+    /**
+     * Checks whether states is already present in DStates
+     */
+    private boolean isInDSates(Map<List<FA_State>, Boolean> DStates, List<FA_State> states) {
+        for (Map.Entry<List<FA_State>, Boolean> entry : DStates.entrySet()) {
+            List<FA_State> keyStates = entry.getKey();
+            if (keyStates.size() == states.size()) {
+                boolean allEqual = true;
+                for (int i = 0; i < states.size(); i++) {
+                    if (states.get(i).getStateID() != keyStates.get(i).getStateID()) allEqual = false;
+                }
+
+                // Found a state set that already exists
+                if (allEqual) return true;
+            }
+        }
+        return false;
+    }
+
+    /**
+     * Checks whether toTest intersects pre.
+     * If it does, the current state set is an accepting state of the DFA.
+     */
+    private boolean isTerminatedState(final List<FA_State> pre, final List<FA_State> toTest) {
+        // Intersection, not union
+        // Copy toTest so that retainAll does not modify it
+        List<FA_State> newList = new FA_StatesList();
+        newList.addAll(toTest);
+        newList.retainAll(pre);
+
+        return newList.size() != 0;
+    }
+
+    /**
+     * @param nfa the original NFA
+     * @return the equivalent plain DFA built by the subset construction
+     */
+    private DFA getEquivalentDFA(NFA nfa, Map<List<FA_State>, Boolean> dStates, List<DTran> dTrans) {
+        // Subset construction is done; build the DFA from dStates and dTrans (dStates preserves the order in which the state sets were created)
+        // "pre" refers to the original NFA, "cur" to the corresponding DFA
+        List<FA_State> preTerminatedStates = nfa.getTerminatedStates();
+
+        List<FA_State> curStates = 
new FA_StatesList(); + List curTerminatedStates = new FA_StatesList(); + + // 标记子集构造法中形成的等价节点和现在简化的节点之间的映射 + Map, FA_State> faStatesConvertTable = new LinkedHashMap<>(); + + // dStates 顺序压入,重新更换为简单 FA_State 也是顺序 + int curIndex = 0; + for (List nowConvertedNFAStates : dStates.keySet()) { + FA_State equivalentState = new FA_State(curIndex); + curStates.add(equivalentState); + curIndex++; + + faStatesConvertTable.put(nowConvertedNFAStates, equivalentState); + + // 含有原 NFA 终止态的即为现终止态 + if (isTerminatedState(preTerminatedStates, nowConvertedNFAStates)) { + curTerminatedStates.add(equivalentState); + } + } + + // 把 dTrans 上的连接加入现在 DFA,并存入 DFA 成员变量 move + Map> move = new LinkedHashMap<>(); + for (DTran dTran : dTrans) { + FA_State curStart = faStatesConvertTable.get(dTran.getFrom()); + FA_State curTo = faStatesConvertTable.get(dTran.getTo()); + char label = dTran.getLabel(); + + FA_Edge curEdge = new FA_Edge(label, curTo); + curStart.getFollows().add(curEdge); + + Map curMove = move.get(curStart); + if (curMove != null) { + curMove.put(label, curTo); + } else { + curMove = new HashMap<>(); + curMove.put(label, curTo); + move.put(curStart, curMove); + } + } + + // 打印真正 DFA 的状态对应表 + logger.info("NFA 经过子集构造法完成后真正的状态转换表"); + showDFATrans(move); + + curStates.sort(comparator); + curTerminatedStates.sort(comparator); + + DFA dfa = new DFA(); + dfa.setStart(curStates.get(0)); + dfa.setAlphabet(nfa.getAlphabet()); + dfa.setStates(curStates); + dfa.setTerminatedStates(curTerminatedStates); + dfa.setMove(move); + + // 将原 NFA 对应的模式 pattern 加入现在的 DFA 映射 + DFA_StatePatternMappingController.add(dfa, NFA_StatePatternMappingController.getMap().get(nfa)); + return dfa; + } + + + /** + * @param dfa 需要被优化的DFA + * @return 具有最少状态的DFA + */ + public DFA optimize(DFA dfa) { + List nonTerminatedStates = new FA_StatesList(); + List terminatedStates = dfa.getTerminatedStates(); + List alphabet = dfa.getAlphabet(); + + // 构造初始两个集合 终结状态/非终结状态 + nonTerminatedStates.addAll(dfa.getStates()); + 
nonTerminatedStates.removeAll(terminatedStates); + + nonTerminatedStates.sort(comparator); + terminatedStates.sort(comparator); + + // 第一次分的这两个集合手动排序,让程序先处理非终结状态 + List nodes = new FA_NodesList(); + + if (nonTerminatedStates.size() > 0) { + // 只有终结态 + FA_Node node1 = new FA_Node(nonTerminatedStates); + nodes.add(node1); + } + FA_Node node2 = new FA_Node(terminatedStates); + nodes.add(node2); + + // while 循环保证算法的 traceBacking 回头看 + while (true) { + // 所有叶节点内部的 FA_State 都是等价的 + boolean isWeakEqual = true; + for (int i = 0; i < nodes.size(); i++) { + + // 节点中只有一个状态,已是最少,无需再进行分化此节点 + if (nodes.get(i).getStates().size() == 1) { + continue; + } + + // 子集分化 + for (int j = 0; j < alphabet.size(); ) { + + char c = alphabet.get(j); + List tempResult = optimizeOneNodeOneChar(dfa, nodes, nodes.get(i), c); + + nodes.remove(i); + nodes.addAll(i, tempResult); + + if (tempResult.size() > 1) { + // 发生了子集替换,重新遍历每个 label + isWeakEqual = false; + j = 0; + } else { + j++; + } + } + + } + + // 全都弱等价,结束算法 + if (isWeakEqual) break; + } + + // 重构 DFA + return reconstruction(dfa, nodes); + } + + /** + * 在特定字母下子集分化一个 FA_Node 节点 + * + * @param dfa 当前 DFA + * @param curDivision 目前的分化 + * @param node 要优化的叶节点 + * @param c 分化基于的条件 + */ + private List optimizeOneNodeOneChar(final DFA dfa, List curDivision, FA_Node node, char c) { + List result = new FA_NodesList(); + + if (node.getStates().size() == 1) { + // 节点中只有一个状态,已是最少,无需再进行分化此节点 + result.add(node); + return result; + } + + // 在此 label 下分别无后继分化、有后继分化 + List parentToNull = new FA_StatesList(); + Map parentToSon = new HashMap<>(); + + for (FA_State parentState : node.getStates()) { + // 该节点在该映射条件下的后继 + Map curEdges = dfa.getMove().get(parentState); + if (curEdges != null) { + FA_State sonState = curEdges.get(c); + if (sonState != null) { + // 有后继边且后继边中有 label 为 c 的边 + parentToSon.put(parentState, sonState); + } else { + parentToNull.add(parentState); + } + } else { + parentToNull.add(parentState); + } + } + + if (parentToNull.size() != 0) { + 
parentToNull.sort(comparator); + result.add(new FA_Node(parentToNull)); + } + + if (parentToSon.size() != 0) { + // 判断 following 是不是在同一叶节点中(FA_Node 为此次判断中原来的Node,List 为此 Node 下的父节点) + Map> judge = new HashMap<>(); + for (Map.Entry entry : parentToSon.entrySet()) { + FA_State sonState = entry.getValue(); + FA_Node belongingNode = getBelongingNode(curDivision, sonState); + if (judge.get(belongingNode) == null) { + List temp = new FA_StatesList(); + temp.add(entry.getKey()); + judge.put(belongingNode, temp); + } else { + judge.get(belongingNode).add(entry.getKey()); + } + } + + if (judge.size() > 1) { + // 形成了不同的分化 + for (List states : judge.values()) { + states.sort(comparator); + result.add(new FA_Node(states)); + } + } else { + // parentToSon 不形成新分化 + List states = new FA_StatesList(parentToSon.keySet()); + states.sort(comparator); + result.add(new FA_Node(states)); + } + } + + return result; + } + + /** + * 找到当前状态所在的节点 + */ + private FA_Node getBelongingNode(List curDivision, FA_State state) { + for (FA_Node node : curDivision) { + if (node.getStates().contains(state)) return node; + } + return null; + } + + /** + * 根据子集分化的算法结果,对 DFA 重的等价状态进行合并 + */ + private DFA reconstruction(DFA dfa, List nodes) { + // <被删除的状态节点, 用于替换的状态节点> + Map deleteTran = new HashMap<>(); + for (FA_Node node : nodes) { + // 只需要第一个状态作为代表,从后面向前记录要删除的节点 + List division = node.getStates(); + for (int i = 1; i < division.size(); i++) { + deleteTran.put(division.get(i), division.get(0)); + } + } + + // 移除这些状态 + List needDeleteStates = new FA_StatesList(deleteTran.keySet()); + needDeleteStates.sort(comparator); + + // 转移链接关系 + // 需移除节点 指向 其他节点 + for (FA_State state : needDeleteStates) { + dfa.getMove().remove(state); + } + + // 其他节点 指向 需移除节点 + Map> newMove = new LinkedHashMap<>(); + for (Map.Entry> curMove : dfa.getMove().entrySet()) { + FA_State curStart = curMove.getKey(); + + if (newMove.get(curStart) == null) { + Map edges = new LinkedHashMap<>(); + newMove.put(curStart, edges); + } + + // 转换表 
+ for (Map.Entry curEdge : curMove.getValue().entrySet()) { + char label = curEdge.getKey(); + FA_State deleteState = curEdge.getValue(); + if (needDeleteStates.contains(curEdge.getValue())) { + newMove.get(curStart).put(label, deleteTran.get(deleteState)); + } else { + newMove.get(curStart).put(label, deleteState); + } + } + + // 状态链接 + for (FA_Edge curEdge : curStart.getFollows()) { + if (needDeleteStates.contains(curEdge.getPointTo())) { + curEdge.setPointTo(deleteTran.get(curEdge.getPointTo())); + } + } + } + + dfa.getStates().removeAll(needDeleteStates); + dfa.getTerminatedStates().removeAll(needDeleteStates); + dfa.setMove(newMove); + + // 如果删除了初始节点 + if (needDeleteStates.contains(dfa.getStart())) { + dfa.setStart(deleteTran.get(dfa.getStart())); + } + + logger.info("DFA 已经优化为最少数目"); + showDFATrans(dfa.getMove()); + return dfa; + } + + /** + * 输出 NFA 的转换信息到控制台 + */ + private void showDFATrans(Map> move) { + for (Map.Entry> entryState : move.entrySet()) { + FA_State start = entryState.getKey(); + for (Map.Entry entryEdge : entryState.getValue().entrySet()) { + logger.info(start.getStateID() + " through " + entryEdge.getKey() + " to " + entryEdge.getValue().getStateID()); + } + } + } +} diff --git a/src/main/java/finiteAutomata/FA_Controller.java b/src/main/java/finiteAutomata/FA_Controller.java new file mode 100644 index 0000000..bf4cad3 --- /dev/null +++ b/src/main/java/finiteAutomata/FA_Controller.java @@ -0,0 +1,50 @@ +package finiteAutomata; + +import exceptions.UnexpectedRegularExprRuleException; +import finiteAutomata.entity.DFA; +import finiteAutomata.entity.NFA; +import org.apache.log4j.Logger; + +import java.util.LinkedList; +import java.util.List; + +/** + * Created by cuihua on 2017/10/27. + *

+ * Drives the conversion of every input RE into a DFA with the minimum number of states
+ */
+public class FA_Controller {
+
+    private static final Logger logger = Logger.getLogger(FA_Controller.class);
+
+    public List<DFA> lexicalAnalysis(List<String> res, List<String> patternType) {
+        RegularExpressionHandler rgHandler = new RegularExpressionHandler();
+        NFA_Handler nfaHandler = new NFA_Handler();
+        DFA_Handler dfaHandler = new DFA_Handler();
+
+        // Build a minimized DFA for each regular definition in turn
+        List<DFA> result = new LinkedList<>();
+        for (int i = 0; i < res.size(); i++) {
+            // Process the current RE
+            String re = res.get(i);
+            try {
+                re = rgHandler.convertInfixToPostfix(rgHandler.standardizeRE(re));
+                logger.debug("正在处理正则定义 " + re); // "Processing regular definition <re>"
+            } catch (UnexpectedRegularExprRuleException e) {
+                e.printStackTrace();
+            }
+
+            // RE => NFA
+            NFA nfa = nfaHandler.getFromRE(re, patternType.get(i));
+            logger.debug("将正则定义 " + re + " 成功转化为 NFA"); // "Converted regular definition <re> to an NFA"
+
+            // Convert to a minimized DFA
+            DFA dfa = dfaHandler.optimize(dfaHandler.getFromNFA(nfa));
+            logger.debug("正则定义 " + re + " 的状态数量: " + dfa.getStates().size()); // "Number of states for <re>: <n>"
+
+            result.add(dfa);
+        }
+
+        return result;
+    }
+}
diff --git a/src/main/java/finiteAutomata/NFA_Handler.java b/src/main/java/finiteAutomata/NFA_Handler.java
new file mode 100644
index 0000000..fafbf6e
--- /dev/null
+++ b/src/main/java/finiteAutomata/NFA_Handler.java
@@ -0,0 +1,305 @@
+package finiteAutomata;
+
+import finiteAutomata.entity.FA_Edge;
+import finiteAutomata.entity.FA_State;
+import finiteAutomata.entity.NFA;
+import utilties.FA_StateComparator;
+import utilties.FA_StateIDController;
+import utilties.FA_StatesList;
+import utilties.NFA_StatePatternMappingController;
+
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Stack;
+
+/**
+ * Created by cuihua on 2017/10/27.
+ *

+ * 对 NFA 进行处理 + * RE => NFA + * combine 多个 NFA + */ +public class NFA_Handler { + + private static FA_StateComparator comparator = new FA_StateComparator(); + + /** + * @param re 标准化的正则定义后缀表达式 + * @param patternType 此正则定义对应的模式 patternType + * @return 此正则定义对应的NFA + */ + public NFA getFromRE(String re, String patternType) { + // 栈中暂时保存处理过的NFA + Stack handling = new Stack<>(); + + for (int i = 0; i < re.length(); i++) { + char c = re.charAt(i); + + if (c == '\\') { + // 转义字符不作为连接符,直接连接 + handling = add(handling, re.charAt(i+1)); + i++; + } else { + switch (c) { + case '·': + handling = join(handling); + break; + case '|': + handling = or(handling); + break; + case '*': + handling = closure(handling); + break; + default: + handling = add(handling, c); + break; + } + } + } + + // 映射 NFA 与其对应的模式 pattern + NFA result = handling.get(0); + NFA_StatePatternMappingController.add(result, patternType); + + // 最终栈中剩下的唯一NFA即为所求 + return result; + } + + /** + * 将字符c转换为一个NFA + */ + private Stack add(Stack handling, char c) { + int nowID = FA_StateIDController.getID(); + + FA_State start = new FA_State(nowID); + FA_State end = new FA_State(++nowID); + FA_StateIDController.setID(++nowID); + + FA_Edge edge = new FA_Edge(c, end); + List follows = new LinkedList<>(); + follows.add(edge); + start.setFollows(follows); + + + // 构造NFA,边的标记为 ε 时,不计入字母表 + List alphabet = new LinkedList<>(); + if (c != 'ε') alphabet.add(c); + + List terminatedStates = new FA_StatesList(); + terminatedStates.add(end); + + List states = new FA_StatesList(); + states.add(start); + states.add(end); + + NFA newNFA = new NFA(); + newNFA.setStart(start); + newNFA.setAlphabet(alphabet); + newNFA.setTerminatedStates(terminatedStates); + newNFA.setStates(states); + + // 得到结果压栈 + handling.push(newNFA); + return handling; + } + + /** + * 根据连接符取栈顶两个NFA连接 + */ + private Stack join(Stack handling) { + NFA after = handling.pop(); + NFA before = handling.pop(); + + // 将after加到before后面 + FA_State joinStart = 
before.getTerminatedStates().get(0); + FA_State joinEnd = after.getStart(); + + FA_Edge joinEdge = new FA_Edge('ε', joinEnd); + List follows = new LinkedList<>(); + follows.add(joinEdge); + joinStart.setFollows(follows); + + // 连接之后字母表取无重复并集,所有状态相加,终止态变为after,起始态不变 + List beforeAlphabet = before.getAlphabet(); + List afterAlphabet = after.getAlphabet(); + beforeAlphabet.removeAll(afterAlphabet); + beforeAlphabet.addAll(afterAlphabet); + + List beforeStates = before.getStates(); + beforeStates.addAll(after.getStates()); + before.setTerminatedStates(after.getTerminatedStates()); + + handling.push(before); + return handling; + } + + /** + * 根据或符取栈顶两个NFA做或操作 + */ + private Stack or(Stack handling) { + NFA nfa1 = handling.pop(); + NFA nfa2 = handling.pop(); + + // 新增两个连接态 + int nowID = FA_StateIDController.getID(); + FA_State newStart = new FA_State(nowID); + FA_State newEnd = new FA_State(++nowID); + FA_StateIDController.setID(++nowID); + + // 将 nfa1 和 nfa2 并联 + FA_State preStart1 = nfa1.getStart(); + FA_State preStart2 = nfa2.getStart(); + FA_State preEnd1 = nfa1.getTerminatedStates().get(0); + FA_State preEnd2 = nfa2.getTerminatedStates().get(0); + + FA_Edge orEdge1 = new FA_Edge('ε', preStart1); + FA_Edge orEdge2 = new FA_Edge('ε', preStart2); + FA_Edge orEdge3 = new FA_Edge('ε', newEnd); + FA_Edge orEdge4 = new FA_Edge('ε', newEnd); + + // 完善开始态 + List startFollows = new LinkedList<>(); + startFollows.add(orEdge1); + startFollows.add(orEdge2); + newStart.setFollows(startFollows); + + // 修改原终止态 + List preEndFollows1 = new LinkedList<>(); + preEndFollows1.add(orEdge3); + preEnd1.setFollows(preEndFollows1); + List preEndFollows2 = new LinkedList<>(); + preEndFollows2.add(orEdge4); + preEnd2.setFollows(preEndFollows2); + + // 重新构造NFA,字母集为无重复并集,所有状态相加,开始态和终止态为新态 + List alphabet = new LinkedList<>(); + alphabet.addAll(nfa1.getAlphabet()); + alphabet.removeAll(nfa2.getAlphabet()); + alphabet.addAll(nfa2.getAlphabet()); + + List terminatedStates = new FA_StatesList(); + 
terminatedStates.add(newEnd); + + List states = new FA_StatesList(); + states.addAll(nfa1.getStates()); + states.addAll(nfa2.getStates()); + states.add(newStart); + states.add(newEnd); + + NFA newNFA = new NFA(); + newNFA.setStart(newStart); + newNFA.setAlphabet(alphabet); + newNFA.setTerminatedStates(terminatedStates); + newNFA.setStates(states); + + handling.push(newNFA); + return handling; + } + + /** + * 取栈顶NFA做闭包操作 + */ + private Stack closure(Stack handling) { + NFA nfa = handling.pop(); + + // 新增两个连接态 + int nowID = FA_StateIDController.getID(); + FA_State newStart = new FA_State(nowID); + FA_State newEnd = new FA_State(++nowID); + FA_StateIDController.setID(++nowID); + + // 新增连接边 + FA_State preStart = nfa.getStart(); + FA_State preEnd = nfa.getTerminatedStates().get(0); + + FA_Edge newEdge1 = new FA_Edge('ε', preStart); + FA_Edge newEdge2 = new FA_Edge('ε', newEnd); + FA_Edge newEdge3 = new FA_Edge('ε', newEnd); + FA_Edge newEdge4 = new FA_Edge('ε', preStart); + + List startFollows = new LinkedList<>(); + startFollows.add(newEdge1); + startFollows.add(newEdge3); + newStart.setFollows(startFollows); + + // 修改原终止态 + List preEndFollows = new LinkedList<>(); + preEndFollows.add(newEdge2); + preEndFollows.add(newEdge4); + preEnd.setFollows(preEndFollows); + + // 对闭包NFA,字母表不变,修改开始态、终止态、所有状态 + nfa.setStart(newStart); + + List terminatedStates = new FA_StatesList(); + terminatedStates.add(newEnd); + nfa.setTerminatedStates(terminatedStates); + + List states = nfa.getStates(); + states.add(newStart); + states.add(newEnd); + + handling.push(nfa); + return handling; + } + + + /** + * @param nfaStack 需要被连接的所有NFA + * @return 连接为一个NFA + */ + public NFA combine(Stack nfaStack) { + if (nfaStack.size() > 1) { + while (nfaStack.size() > 1) { + NFA nfa1 = nfaStack.pop(); + NFA nfa2 = nfaStack.pop(); + + NFA newNFA = combineTwoNFA(nfa1, nfa2); + nfaStack.push(newNFA); + } + } + return nfaStack.pop(); + } + + private NFA combineTwoNFA(NFA nfa1, NFA nfa2) { + // 增加一个新的起始节点作为初始态 + 
int nowID = FA_StateIDController.getID(); + FA_State newStart = new FA_State(nowID); + FA_StateIDController.setID(++nowID); + + // 连接原来的两个NFA + FA_Edge newEdge1 = new FA_Edge('ε', nfa1.getStart()); + FA_Edge newEdge2 = new FA_Edge('ε', nfa2.getStart()); + + List startFollows = new LinkedList<>(); + startFollows.add(newEdge1); + startFollows.add(newEdge2); + newStart.setFollows(startFollows); + + // 重新构造NFA,字母集为无重复并集,所有状态和终止态相加,开始态为新态 + List alphabet = new LinkedList<>(); + alphabet.addAll(nfa1.getAlphabet()); + alphabet.removeAll(nfa2.getAlphabet()); + alphabet.addAll(nfa2.getAlphabet()); + + List terminatedStates = new FA_StatesList(); + terminatedStates.addAll(nfa1.getTerminatedStates()); + terminatedStates.addAll(nfa2.getTerminatedStates()); + terminatedStates.sort(comparator); + + List states = new FA_StatesList(); + states.addAll(nfa1.getStates()); + states.addAll(nfa2.getStates()); + states.add(newStart); + states.sort(comparator); + + NFA newNFA = new NFA(); + newNFA.setStart(newStart); + newNFA.setAlphabet(alphabet); + newNFA.setTerminatedStates(terminatedStates); + newNFA.setStates(states); + + return newNFA; + } + +} diff --git a/src/main/java/finiteAutomata/RegularExpressionHandler.java b/src/main/java/finiteAutomata/RegularExpressionHandler.java new file mode 100644 index 0000000..939c50f --- /dev/null +++ b/src/main/java/finiteAutomata/RegularExpressionHandler.java @@ -0,0 +1,524 @@ +package finiteAutomata; + +import exceptions.UnexpectedRegularExprRuleException; +import org.apache.commons.lang3.ArrayUtils; +import org.apache.log4j.Logger; +import utilties.ExtendedMark; +import utilties.SquareBracketMarkInnerType; + +import java.util.LinkedList; +import java.util.List; +import java.util.Stack; + +/** + * Created by cuihua on 2017/10/25. + *

+ * 输入的正则表达式 + */ +public class RegularExpressionHandler { + + private static final Logger logger = Logger.getLogger(RegularExpressionHandler.class); + + /** + * 不可能存在的正则定义 + */ + private static List unexpectedRERules; + + /** + * 标准化的正则表达式中优先级序列 + * 优先级越高,越靠后 + */ + private static List priority; + + /** + * 匹配如 [a-z] + */ + private static char[] lowCaseCharSequence = { + 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', + 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z' + }; + + /** + * 匹配如 [A-Z] + */ + private static char[] upCaseCharSequence = { + 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', + 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z' + }; + + /** + * 匹配如 [0-9] + */ + private static char[] intSequence = { + '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' + }; + + public RegularExpressionHandler() { + unexpectedRERules = new LinkedList<>(); + unexpectedRERules.add("(*"); + unexpectedRERules.add("(|"); + unexpectedRERules.add("|)"); + unexpectedRERules.add("|*"); + unexpectedRERules.add("||"); + unexpectedRERules.add("·)"); + unexpectedRERules.add("·*"); + unexpectedRERules.add("·|"); + unexpectedRERules.add("(·"); + unexpectedRERules.add("|·"); + unexpectedRERules.add("··"); + + priority = new LinkedList<>(); + priority.add(0, '('); + priority.add(1, '·'); + priority.add(2, '|'); + priority.add(3, '*'); + } + + /** + * 默认re不含有连接符 + * 将扩展符 +、?、{}、[] 用基本符号代替 + * 添加省略的连接符'·'(对所有操作符画出所有的可能情况) + * + * @param re 输入的正则表达式 + * @return 标准的没有扩展语法如[], +, ? 
+ */ + public String standardizeRE(String re) throws UnexpectedRegularExprRuleException { + // 替换所有空格,便于控制 + re = re.replace(" ", ""); + + // 替换扩展符号,result存储替换后的字符串,differ表示替换前后的对当前处理字符的Index差 + StringBuffer result = new StringBuffer().append(re); + int differ = 0; + for (int i = 0; i < re.length(); i++) { + char c = re.charAt(i); + + if (c == '\\') { + // 转义字符跳过处理 + i++; + } else { + int preLength = result.length(); + if (c == '?') { + result = standardizeExtendedMark(result, i + differ, ExtendedMark.QUESTION_MARK); + } + if (c == '+') { + result = standardizeExtendedMark(result, i + differ, ExtendedMark.PLUS_MARK); + } + if (c == '{') { + result = standardizeExtendedMark(result, i + differ, ExtendedMark.BRACE_MARK); + } + if (c == '[') { + result = standardizeSquareBracketMark(result, i + differ); + } + differ += result.length() - preLength; + } + } + + String tempResult = result.toString(); + checkREValidation(tempResult); + + + // 补充连接符,joinCount表示连接前后的对当前处理字符的Index差,curCharIsTransferred表示当前字符是否是转义字符 + int joinCount = 0; + boolean curCharIsTransferred = false; + for (int i = 0; i < tempResult.length() - 1; i++) { + char before = tempResult.charAt(i); + char after = tempResult.charAt(i + 1); + + if (before == '\\') { + // 转义字符之间不添加连接符,跳过检查下一个操作符 + curCharIsTransferred = true; + continue; + } + + // 合法情况下含有连接符号的都不需要处理 + if (before == '·' || after == '·') { + curCharIsTransferred = false; + continue; + } + + if (after == '(' || isValidChar(false, after)) { + if (before == ')' || before == '*' || isValidChar(curCharIsTransferred, before)) { + result = standardizeJoinMark(result, i + joinCount); + joinCount++; + } + } + + curCharIsTransferred = false; + } + return result.toString(); + } + + /** + * 处理扩展符号(+ / ? / {}) + */ + private StringBuffer standardizeExtendedMark(final StringBuffer re, int markIndex, ExtendedMark mark) throws UnexpectedRegularExprRuleException { + StringBuffer result = new StringBuffer(); + + // ? 
is preceded by a parenthesis, so the kernel (the operand it applies to) must be located
+        String content;
+        int contentStartIndex;
+
+        if (re.charAt(markIndex - 1) == ')') {
+            // The kernel is a parenthesized group rather than a single character
+            contentStartIndex = getContentStartIndexOfExtendedMark(re, markIndex);
+            content = re.substring(contentStartIndex, markIndex);
+        } else {
+            // The kernel is the single character right before the operator
+            if (markIndex >= 2 && re.charAt(markIndex - 2) == '\\') {
+                // The kernel is an escaped character
+                contentStartIndex = markIndex - 2;
+                content = re.substring(contentStartIndex, markIndex);
+            } else {
+                // The kernel is an ordinary single character
+                contentStartIndex = markIndex - 1;
+                content = String.valueOf(re.charAt(markIndex - 1));
+            }
+        }
+
+        result.append(re.substring(0, contentStartIndex));
+
+        if (mark == ExtendedMark.QUESTION_MARK) result.append("(ε|").append(content).append(')');
+        else if (mark == ExtendedMark.PLUS_MARK) result.append(content).append(content).append('*');
+        else if (mark == ExtendedMark.BRACE_MARK) {
+            // Content inside the braces
+            String sub = re.substring(markIndex + 1);
+            int braceEndIndex = sub.indexOf("}");
+            int commaIndex = sub.indexOf(",");
+
+            if (braceEndIndex == -1) throw new UnexpectedRegularExprRuleException(re.toString());
+
+            if (commaIndex == -1) {
+                // {n} form: no comma, only a number; repeat the kernel n times
+                int times = Integer.parseInt(sub.substring(0, braceEndIndex));
+                for (int i = 0; i < times; i++) {
+                    result.append(content);
+                }
+            } else {
+                if (commaIndex == 0) {
+                    // {,n} form: repeat 0 to n times
+                    int times = Integer.parseInt(sub.substring(1, braceEndIndex));
+                    for (int i = 0; i < times; i++) {
+                        result.append("(ε|").append(content).append(')');
+                    }
+                } else if (commaIndex == braceEndIndex - 1) {
+                    // {n,} form: repeat at least n times
+                    int times = Integer.parseInt(sub.substring(0, commaIndex));
+                    for (int i = 0; i < times; i++) {
+                        result.append(content);
+                    }
+                    result.append(content).append("*");
+                } else {
+                    // {m,n} form: at least m times, at most n times
+                    int mTimes = Integer.parseInt(sub.substring(0, commaIndex));
+                    int nTimes = Integer.parseInt(sub.substring(commaIndex + 1, braceEndIndex));
+                    for (int i = 0; i < mTimes; i++) {
+                        result.append(content);
+                    }
+                    for (int i = mTimes; i < nTimes; i++) {
+
result.append("(ε|").append(content).append(')');
+                    }
+                }
+            }
+
+            // If {} is not the last thing in the RE, append the characters that follow it
+            if (braceEndIndex != re.length() - 1) result.append(sub.substring(braceEndIndex + 1));
+        }
+
+        if (mark == ExtendedMark.QUESTION_MARK | mark == ExtendedMark.PLUS_MARK) {
+            // If + or ? is not the last character, append the characters that follow it
+            if (markIndex != re.length() - 1) result.append(re.substring(markIndex + 1));
+        }
+        logger.debug(result);
+        return result;
+    }
+
+    /**
+     * Finds the opening parenthesis of the kernel that an extended operator (+ / ? / {}) applies to
+     */
+    private int getContentStartIndexOfExtendedMark(final StringBuffer re, int markIndex) {
+        int pairCount = 0;
+
+        int contentStartIndex;
+        for (contentStartIndex = markIndex - 1; contentStartIndex >= 0; contentStartIndex--) {
+            char c = re.charAt(contentStartIndex);
+            if (c == ')') pairCount++;
+            else if (c == '(') {
+                if (pairCount == 1) break;
+                else pairCount--;
+            }
+        }
+        return contentStartIndex;
+    }
+
+    /**
+     * Inserts an omitted concatenation operator (·)
+     *
+     * @param joinIndex index of the first of the two characters between which the operator is inserted
+     */
+    private StringBuffer standardizeJoinMark(final StringBuffer re, int joinIndex) {
+        StringBuffer sb = new StringBuffer();
+        sb.append(re.substring(0, joinIndex + 1)).append('·').append(re.substring(joinIndex + 1));
+        return sb;
+    }
+
+    /**
+     * Replaces the content of a square-bracket expression with an ordinary expression
+     */
+    private StringBuffer standardizeSquareBracketMark(final StringBuffer re, int markIndex)
+            throws UnexpectedRegularExprRuleException {
+        StringBuffer result = new StringBuffer();
+        result.append(re.substring(0, markIndex));
+
+        // Content inside the square brackets
+        String sub = re.substring(markIndex + 1);
+        int bracketEndIndex = sub.indexOf("]");
+
+        if (bracketEndIndex == -1) throw new UnexpectedRegularExprRuleException(re.toString());
+
+        String bracketContent = sub.substring(0, bracketEndIndex);
+
+        List<StringBuffer> bracketCompleted = new LinkedList<>();
+        for (int i = 0; i < bracketContent.length(); ) {
+            if (i < bracketContent.length() - 1 && bracketContent.charAt(i + 1) == '-') {
+                // Hyphenated range: expand it into an alternation over the range
+                char start = bracketContent.charAt(i);
+                char end = bracketContent.charAt(i + 2);
+
+                int
startIndex, endIndex;
+                if (ArrayUtils.contains(lowCaseCharSequence, start)) {
+                    // Lowercase letters
+                    startIndex = ArrayUtils.indexOf(lowCaseCharSequence, start);
+                    endIndex = ArrayUtils.indexOf(lowCaseCharSequence, end);
+                    bracketCompleted.add(standardizeSquareBracketMarkSeparatorToCompleted(startIndex, endIndex,
+                            SquareBracketMarkInnerType.LOW_CHAR));
+                } else if (ArrayUtils.contains(upCaseCharSequence, start)) {
+                    // Uppercase letters
+                    startIndex = ArrayUtils.indexOf(upCaseCharSequence, start);
+                    endIndex = ArrayUtils.indexOf(upCaseCharSequence, end);
+                    bracketCompleted.add(standardizeSquareBracketMarkSeparatorToCompleted(startIndex, endIndex,
+                            SquareBracketMarkInnerType.UP_CHAR));
+                } else if (ArrayUtils.contains(intSequence, start)) {
+                    // Digits
+                    startIndex = ArrayUtils.indexOf(intSequence, start);
+                    endIndex = ArrayUtils.indexOf(intSequence, end);
+                    bracketCompleted.add(standardizeSquareBracketMarkSeparatorToCompleted(startIndex, endIndex,
+                            SquareBracketMarkInnerType.INT));
+                }
+
+                i += 3;
+            } else {
+                // A single character (either the last one left over or not part of a range): add it as an alternative
+                StringBuffer sb = new StringBuffer();
+                sb.append(bracketContent.charAt(i));
+                bracketCompleted.add(sb);
+
+                i++;
+            }
+        }
+
+
+        // Join everything collected in bracketCompleted with the alternation operator
+        if (bracketCompleted.size() > 1) {
+            result.append("(").append(bracketCompleted.get(0));
+            for (int i = 1; i < bracketCompleted.size(); i++) {
+                result.append("|").append(bracketCompleted.get(i));
+
+            }
+            result.append(")");
+        } else {
+            result.append(bracketCompleted.get(0));
+        }
+
+        if (bracketEndIndex != sub.length() - 1) result.append(sub.substring(bracketEndIndex + 1));
+        logger.debug(result);
+        return result;
+    }
+
+    /**
+     * Expands a range of the form [m-n] into a full alternation
+     *
+     * @param startIndex index of the first character of the range (inclusive)
+     * @param endIndex   index of the last character of the range (inclusive)
+     */
+    private StringBuffer standardizeSquareBracketMarkSeparatorToCompleted(int startIndex, int endIndex,
+                                                                          SquareBracketMarkInnerType innerType) {
+        StringBuffer sb = new StringBuffer();
+        sb.append("(");
+        switch (innerType) {
+            case LOW_CHAR:
+                for (int i =
startIndex; i < endIndex; i++) {
+                    sb.append(lowCaseCharSequence[i]).append("|");
+                }
+                sb.append(lowCaseCharSequence[endIndex]);
+                break;
+
+            case UP_CHAR:
+                for (int i = startIndex; i < endIndex; i++) {
+                    sb.append(upCaseCharSequence[i]).append("|");
+                }
+                sb.append(upCaseCharSequence[endIndex]);
+                break;
+
+            case INT:
+                for (int i = startIndex; i < endIndex; i++) {
+                    sb.append(intSequence[i]).append("|");
+                }
+                sb.append(intSequence[endIndex]);
+                break;
+        }
+        sb.append(")");
+        return sb;
+    }
+
+    /**
+     * Checks that the standardized regular definition is well formed
+     */
+    private void checkREValidation(final String re) throws UnexpectedRegularExprRuleException {
+        for (int i = 0; i < re.length() - 1; i++) {
+            char before = re.charAt(i);
+            char after = re.charAt(i + 1);
+
+            // The input RE is invalid
+            String temp = before + "" + after;
+            if (unexpectedRERules.contains(temp)) {
+                throw new UnexpectedRegularExprRuleException(temp);
+            }
+        }
+
+        // Check for invalid escaping
+        // The {m, n} form has already been handled during standardization
+        String toCheckComma = re;
+        int commaIndex;
+        while ((commaIndex = toCheckComma.indexOf(",")) != -1) {
+            if (commaIndex == 0) throw new UnexpectedRegularExprRuleException(re);
+            if (toCheckComma.charAt(commaIndex - 1) != '\\') throw new UnexpectedRegularExprRuleException(re);
+            else toCheckComma = toCheckComma.substring(commaIndex + 1);
+        }
+
+        // The [m-n] form has already been handled during standardization
+        String toCheckSeparator = re;
+        int separatorIndex;
+        while ((separatorIndex = toCheckSeparator.indexOf("-")) != -1) {
+            if (separatorIndex == 0) throw new UnexpectedRegularExprRuleException(re);
+            if (toCheckSeparator.charAt(separatorIndex - 1) != '\\') throw new UnexpectedRegularExprRuleException(re);
+            else toCheckSeparator = toCheckSeparator.substring(separatorIndex + 1);
+        }
+
+    }
+
+
+    /**
+     * Tells whether the input character toTest is a valid operand character, given whether it is escaped (isTransferred)
+     */
+    private boolean isValidChar(boolean isTransferred, char toTest) {
+        if (isTransferred) {
+            // Escaped character
+            return isOperand(toTest);
+        } else {
+            // Ordinary character
+            return !isOperand(toTest);
+
+        }
+    }
+
+    /**
+     * Tells whether the character c is an operator
+     */
+    private boolean isOperand(char
c) {
+        return (c == '·' || c == '|' || c == '*' || c == '(' || c == ')' || c == '+' || c == '?' || c == '{' || c == '}'
+                || c == '[' || c == ']' || c == '-') || c == ',';
+    }
+
+
+    /**
+     * Converts the standardized regular definition from infix to postfix notation.
+     * The input contains only concatenation, alternation, closure and parentheses.
+     */
+    public String convertInfixToPostfix(String re) {
+        // Postfix string that holds the result
+        StringBuilder sb = new StringBuilder(re.length());
+
+        // Operator stack
+        Stack<Character> operandStack = new Stack<>();
+
+        // Whether the current character is escaped
+        boolean curCharIsTransferred = false;
+        for (int i = 0; i < re.length(); i++) {
+            char c = re.charAt(i);
+
+            // Escape character preceding an operator
+            if (c == '\\') {
+                sb.append(c);
+                curCharIsTransferred = true;
+                continue;
+            }
+
+            // Not an operator
+            if (isValidChar(curCharIsTransferred, c)) {
+                sb.append(c);
+                curCharIsTransferred = false;
+                continue;
+            }
+
+            // Operator
+            if (c == '(') operandStack.push('(');
+            else if (c == ')') {
+                // Pop until the matching '('
+                char top;
+                while ((top = operandStack.pop()) != '(') {
+                    sb.append(top);
+                }
+            } else {
+                if (!operandStack.empty()) {
+                    char top = operandStack.peek();
+
+                    while (true) {
+                        // Pop operators of higher or equal precedence, then push the current operator
+                        // Stop when no such operator is left on top of the stack
+                        if (comparePriority(c, top)) {
+                            operandStack.pop();
+                            sb.append(top);
+                        } else break;
+
+                        // Keep comparing while the stack is not empty, otherwise stop
+                        if (!operandStack.empty()) {
+                            top = operandStack.peek();
+                        } else break;
+                    }
+
+                    operandStack.push(c);
+                } else {
+                    // The operator stack is empty, so push this operator
+                    operandStack.push(c);
+                }
+
+            }
+
+            curCharIsTransferred = false;
+        }
+
+        // Operators remaining on the stack
+        while (!operandStack.empty()) {
+            char top = operandStack.pop();
+            sb.append(top);
+        }
+
+        return sb.toString();
+    }
+
+
+    /**
+     * @param curChar the operator currently being read
+     * @param top     the operator on top of the operator stack
+     * @return true if the precedence of curChar is lower than or equal to that of top, in which case top must be popped; false otherwise
+     */
+    private boolean comparePriority(char curChar, char top) {
+        int curCharIndex = priority.indexOf(curChar);
+        int topCharIndex = priority.indexOf(top);
+
+        boolean result = (curCharIndex - topCharIndex) <= 0;
+        logger.debug("Precedence: current operator " + curChar + " <= stack-top operator " + top + ": " + result);
+        return result;
+    }
+
+}
+diff --git
a/src/main/java/finiteAutomata/entity/DFA.java b/src/main/java/finiteAutomata/entity/DFA.java new file mode 100644 index 0000000..42a01a5 --- /dev/null +++ b/src/main/java/finiteAutomata/entity/DFA.java @@ -0,0 +1,45 @@ +package finiteAutomata.entity; + +import java.util.Map; + +/** + * Created by cuihua on 2017/10/24. + *

+ * Deterministic FA (deterministic finite automaton)
+ */
+public class DFA extends FA {
+
+    /**
+     * Transition relation between the states of the DFA:
+     * the first state (FA_State) reaches the second state (FA_State) via a label (Character)
+     */
+    private Map<FA_State, Map<Character, FA_State>> move;
+
+
+    public Map<FA_State, Map<Character, FA_State>> getMove() {
+        return move;
+    }
+
+    public void setMove(Map<FA_State, Map<Character, FA_State>> move) {
+        this.move = move;
+    }
+
+
+    @Override
+    public boolean isValid(String lexeme) {
+        FA_State curState = getStart();
+
+        for (char c : lexeme.toCharArray()) {
+            boolean canFind = false;
+            for (FA_Edge curEdge : curState.getFollows()) {
+                if (curEdge.getLabel() == c) {
+                    curState = curEdge.getPointTo();
+                    canFind = true;
+                    break;
+                }
+            }
+            if (!canFind) return false;
+        }
+        return getTerminatedStates().contains(curState);
+    }
+}
diff --git a/src/main/java/finiteAutomata/entity/DTran.java b/src/main/java/finiteAutomata/entity/DTran.java
new file mode 100644
index 0000000..e58cdd5
--- /dev/null
+++ b/src/main/java/finiteAutomata/entity/DTran.java
@@ -0,0 +1,74 @@
+package finiteAutomata.entity;
+
+import org.apache.log4j.Logger;
+
+import java.util.List;
+
+/**
+ * Created by cuihua on 2017/10/24.
+ *

+ * Records a mapping produced during subset construction
+ */
+public class DTran {
+
+    private static Logger logger = Logger.getLogger(DTran.class);
+
+    /**
+     * Source state set of the equivalent transition produced during construction
+     */
+    private List<FA_State> from;
+
+    /**
+     * Target state set of the equivalent transition produced during construction
+     */
+    private List<FA_State> to;
+
+    /**
+     * Label (input symbol) of the transition
+     */
+    private char label;
+
+    public DTran(List<FA_State> from, List<FA_State> to, char label) {
+        this.from = from;
+        this.to = to;
+        this.label = label;
+    }
+
+    public List<FA_State> getFrom() {
+        return from;
+    }
+
+    public void setFrom(List<FA_State> from) {
+        this.from = from;
+    }
+
+    public List<FA_State> getTo() {
+        return to;
+    }
+
+    public void setTo(List<FA_State> to) {
+        this.to = to;
+    }
+
+    public char getLabel() {
+        return label;
+    }
+
+    public void setLabel(char label) {
+        this.label = label;
+    }
+
+    // Writes this DTran to the debug log
+    public void show() {
+        StringBuilder sb = new StringBuilder();
+        sb.append("\n");
+        for (FA_State state : from) {
+            sb.append(state.getStateID() + " ");
+        }
+        sb.append("\n" + label + "\n");
+        for (FA_State state : to) {
+            sb.append(state.getStateID() + " ");
+        }
+        logger.debug(sb.toString());
+    }
+}
diff --git a/src/main/java/finiteAutomata/entity/FA.java b/src/main/java/finiteAutomata/entity/FA.java
new file mode 100644
index 0000000..10efd87
--- /dev/null
+++ b/src/main/java/finiteAutomata/entity/FA.java
@@ -0,0 +1,70 @@
+package finiteAutomata.entity;
+
+import java.util.List;
+
+/**
+ * Created by cuihua on 2017/10/24.
+ *

+ * Represents a finite automaton
+ */
+public abstract class FA {
+
+    /**
+     * Start state
+     */
+    private FA_State start;
+
+    /**
+     * All states
+     */
+    private List<FA_State> states;
+
+    /**
+     * Terminating / accepting states
+     */
+    private List<FA_State> terminatedStates;
+
+    /**
+     * Alphabet
+     */
+    private List<Character> alphabet;
+
+    public FA_State getStart() {
+        return start;
+    }
+
+    public void setStart(FA_State start) {
+        this.start = start;
+    }
+
+    public List<FA_State> getStates() {
+        return states;
+    }
+
+    public void setStates(List<FA_State> states) {
+        this.states = states;
+    }
+
+    public List<FA_State> getTerminatedStates() {
+        return terminatedStates;
+    }
+
+    public void setTerminatedStates(List<FA_State> terminatedStates) {
+        this.terminatedStates = terminatedStates;
+    }
+
+    public List<Character> getAlphabet() {
+        return alphabet;
+    }
+
+    public void setAlphabet(List<Character> alphabet) {
+        this.alphabet = alphabet;
+    }
+
+    /**
+     * @param lexeme the lexeme to check
+     * @return whether the lexeme is accepted
+     */
+    public abstract boolean isValid(String lexeme);
+
+}
diff --git a/src/main/java/finiteAutomata/entity/FA_Edge.java b/src/main/java/finiteAutomata/entity/FA_Edge.java
new file mode 100644
index 0000000..c555bf8
--- /dev/null
+++ b/src/main/java/finiteAutomata/entity/FA_Edge.java
@@ -0,0 +1,41 @@
+package finiteAutomata.entity;
+
+/**
+ * Created by cuihua on 2017/10/24.
+ *

+ * An edge (link) in a finite automaton
+ */
+public class FA_Edge {
+
+    /**
+     * Label on this edge; the empty transition is represented by ε
+     */
+    private char label;
+
+    /**
+     * Successor state this edge points to
+     */
+    private FA_State pointTo;
+
+
+    public FA_Edge(char label, FA_State pointTo) {
+        this.label = label;
+        this.pointTo = pointTo;
+    }
+
+    public char getLabel() {
+        return label;
+    }
+
+    public void setLabel(char label) {
+        this.label = label;
+    }
+
+    public FA_State getPointTo() {
+        return pointTo;
+    }
+
+    public void setPointTo(FA_State pointTo) {
+        this.pointTo = pointTo;
+    }
+}
diff --git a/src/main/java/finiteAutomata/entity/FA_Node.java b/src/main/java/finiteAutomata/entity/FA_Node.java
new file mode 100644
index 0000000..bab46ee
--- /dev/null
+++ b/src/main/java/finiteAutomata/entity/FA_Node.java
@@ -0,0 +1,28 @@
+package finiteAutomata.entity;
+
+import java.util.List;
+
+/**
+ * Created by cuihua on 2017/10/26.
+ *

+ * A set of equivalent FA_State objects formed while minimizing a DFA
+ */
+public class FA_Node {
+
+    /**
+     * The equivalent FA_State objects
+     */
+    private List<FA_State> states;
+
+    public FA_Node(List<FA_State> states) {
+        this.states = states;
+    }
+
+    public List<FA_State> getStates() {
+        return states;
+    }
+
+    public void setStates(List<FA_State> states) {
+        this.states = states;
+    }
+}
diff --git a/src/main/java/finiteAutomata/entity/FA_State.java b/src/main/java/finiteAutomata/entity/FA_State.java
new file mode 100644
index 0000000..16602be
--- /dev/null
+++ b/src/main/java/finiteAutomata/entity/FA_State.java
@@ -0,0 +1,44 @@
+package finiteAutomata.entity;
+
+import java.util.LinkedList;
+import java.util.List;
+
+/**
+ * Created by cuihua on 2017/10/24.
+ *

+ * A node (state) in a finite automaton
+ */
+public class FA_State {
+
+    /**
+     * ID number of this node
+     */
+    private int stateID;
+
+    /**
+     * Outgoing edges from this node to its successor states
+     */
+    private List<FA_Edge> follows;
+
+
+    public FA_State(int stateID) {
+        this.stateID = stateID;
+        this.follows = new LinkedList<>();
+    }
+
+    public int getStateID() {
+        return stateID;
+    }
+
+    public void setStateID(int stateID) {
+        this.stateID = stateID;
+    }
+
+    public List<FA_Edge> getFollows() {
+        return follows;
+    }
+
+    public void setFollows(List<FA_Edge> follows) {
+        this.follows = follows;
+    }
+}
diff --git a/src/main/java/finiteAutomata/entity/NFA.java b/src/main/java/finiteAutomata/entity/NFA.java
new file mode 100644
index 0000000..92c233a
--- /dev/null
+++ b/src/main/java/finiteAutomata/entity/NFA.java
@@ -0,0 +1,17 @@
+package finiteAutomata.entity;
+
+/**
+ * Created by cuihua on 2017/10/24.
+ *

+ * Nondeterministic FA (nondeterministic finite automaton)
+ */
+public class NFA extends FA {
+
+    /**
+     * TODO NFAs are not used to validate lexemes yet, so this will be implemented when it is needed
+     */
+    @Override
+    public boolean isValid(String lexeme) {
+        return false;
+    }
+}
diff --git a/src/main/java/lex/LexicalAnalyzer.java b/src/main/java/lex/LexicalAnalyzer.java
new file mode 100644
index 0000000..6389ea3
--- /dev/null
+++ b/src/main/java/lex/LexicalAnalyzer.java
@@ -0,0 +1,41 @@
+package lex;
+
+import exceptions.NotMatchingException;
+import finiteAutomata.entity.DFA;
+import lex.entity.Token;
+import utilties.DFA_StatePatternMappingController;
+
+import java.util.List;
+
+/**
+ * Created by cuihua on 2017/11/2.
+ *

+ * Lexical analyzer
+ */
+public class LexicalAnalyzer {
+
+    /**
+     * Minimized DFAs generated from the current .l file
+     */
+    private List<DFA> allDFAs;
+
+    public LexicalAnalyzer(List<DFA> allDFAs) {
+        this.allDFAs = allDFAs;
+    }
+
+    /**
+     * Analyzes a single lexeme
+     *
+     * @param lexeme the lexeme to analyze
+     * @return the token resulting from the analysis
+     */
+    public Token analyze(String lexeme) throws NotMatchingException {
+        for (DFA curDFA : allDFAs) {
+            // Try the DFAs in priority order and return on the first match
+            if (curDFA.isValid(lexeme))
+                return new Token(DFA_StatePatternMappingController.getMap().get(curDFA), lexeme);
+        }
+        throw new NotMatchingException(lexeme);
+
+    }
+}
diff --git a/src/main/java/lex/Main.java b/src/main/java/lex/Main.java
new file mode 100644
index 0000000..b7b41fb
--- /dev/null
+++ b/src/main/java/lex/Main.java
@@ -0,0 +1,43 @@
+package lex;
+
+import exceptions.NotMatchingException;
+import finiteAutomata.entity.DFA;
+import lex.entity.Token;
+import lex.generator.LexInputHandler;
+import lex.generator.LexInputReader;
+
+import java.util.LinkedList;
+import java.util.List;
+
+/**
+ * Created by cuihua on 2017/11/1.
+ *

+ * Main program
+ * Input: the program text entered by the user
+ * Output: the token sequence produced according to the existing .l file
+ */
+public class Main {
+
+    public static void main(String[] args) throws NotMatchingException {
+        UserInteractionController userInteractionController = new UserInteractionController();
+        List<String> lexemes = userInteractionController.readUserContent();
+
+
+        // Build the DFAs described by the .l file
+        LexInputReader lexInputReader = new LexInputReader();
+        List<String> lexContent = lexInputReader.readREs();
+
+        LexInputHandler lexInputHandler = new LexInputHandler(lexContent);
+        List<DFA> allDFAs = lexInputHandler.convert();
+
+        // Create the lexical analyzer
+        LexicalAnalyzer lexicalAnalyzer = new LexicalAnalyzer(allDFAs);
+        List<Token> resultTokens = new LinkedList<>();
+        for (String lexeme : lexemes) {
+            resultTokens.add(lexicalAnalyzer.analyze(lexeme));
+        }
+
+        userInteractionController.showAllTokens(resultTokens);
+
+    }
+}
diff --git a/src/main/java/lex/UserInteractionController.java b/src/main/java/lex/UserInteractionController.java
new file mode 100644
index 0000000..35c92a7
--- /dev/null
+++ b/src/main/java/lex/UserInteractionController.java
@@ -0,0 +1,90 @@
+package lex;
+
+import lex.entity.Token;
+
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.time.LocalDateTime;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Scanner;
+
+/**
+ * Created by cuihua on 2017/11/1.
+ *

+ * Handles interaction with the user of the lexical analyzer
+ */
+public class UserInteractionController {
+
+    /**
+     * Reads the user input, applies some simple processing and returns all lexemes
+     */
+    public List<String> readUserContent() {
+        Scanner sc = new Scanner(System.in);
+
+        List<String> lexemes = new LinkedList<>();
+        String line;
+        while (!(line = sc.nextLine()).equals("###")) {
+            String[] parts = line.split(" ");
+            for (String lexeme : parts) {
+                if (!lexeme.equals(""))
+                    lexemes.add(lexeme);
+            }
+        }
+
+        return lexemes;
+    }
+
+    /**
+     * Shows all resulting tokens to the user
+     */
+    public void showAllTokens(List<Token> tokens) {
+        String s = getTokenOutput(tokens);
+        showInConsole(s);
+        try {
+            showInFile(s);
+        } catch (IOException e) {
+            System.out.println("Failed to write the token sequence to a file!");
+        }
+    }
+
+    /**
+     * Builds the text to output from the token sequence
+     */
+    private String getTokenOutput(List<Token> tokens) {
+        StringBuilder sb = new StringBuilder();
+        sb.append("-------------------\n");
+        for (Token token : tokens) {
+            sb.append("< ").append(token.getPatternType());
+            if (token.getAttribute() != null) {
+                sb.append(", ");
+                sb.append(token.getAttribute());
+            }
+            sb.append(" >").append("\n");
+        }
+        sb.append("-------------------");
+        return sb.toString();
+    }
+
+    /**
+     * Console output
+     */
+    private void showInConsole(String s) {
+        System.out.println(s);
+    }
+
+    /**
+     * File output
+     */
+    private void showInFile(String s) throws IOException {
+        File file = new File(System.getProperty("user.dir") + " "+ LocalDateTime.now() + ".txt");
+        if (file.createNewFile()) {
+            FileWriter writer = new FileWriter(file);
+            writer.write(s);
+            writer.flush();
+            writer.close();
+        }
+
+    }
+}
diff --git a/src/main/java/lex/entity/Token.java b/src/main/java/lex/entity/Token.java
new file mode 100644
index 0000000..6d566ce
--- /dev/null
+++ b/src/main/java/lex/entity/Token.java
@@ -0,0 +1,40 @@
+package lex.entity;
+
+/**
+ * Created by cuihua on 2017/10/31.
+ *
+ * Token (lexical unit)
+ */
+public class Token {
+
+    /**
+     * Pattern
+     */
+    private String patternType;
+
+    /**
+     * Attribute value
+     */
+    private String attribute;
+
+    public Token(String patternType, String attribute) {
+        this.patternType = patternType;
+        this.attribute = attribute;
+    }
+
+    public String getPatternType() {
+        return patternType;
+    }
+
+    public void setPatternType(String patternType) {
+        this.patternType = patternType;
+    }
+
+    public String getAttribute() {
+        return attribute;
+    }
+
+    public void setAttribute(String attribute) {
+        this.attribute = attribute;
+    }
+}
diff --git a/src/main/java/lex/generator/LexInputHandler.java b/src/main/java/lex/generator/LexInputHandler.java
new file mode 100644
index 0000000..1e0c9f8
--- /dev/null
+++ b/src/main/java/lex/generator/LexInputHandler.java
@@ -0,0 +1,54 @@
+package lex.generator;
+
+import finiteAutomata.FA_Controller;
+import finiteAutomata.entity.DFA;
+
+import java.util.*;
+
+/**
+ * Created by cuihua on 2017/11/1.
+ *

+ * Processes the data in a Lex .l file (regular definitions only)
+ */
+public class LexInputHandler {
+
+    /**
+     * Content of the .l file (pattern + regular definition)
+     */
+    private List<String> content;
+
+    /**
+     * One-to-one mapping from pattern to regular definition
+     */
+    private Map<String, String> patternREMap;
+
+    public LexInputHandler(List<String> content) {
+        this.content = content;
+        initMap();
+    }
+
+    /**
+     * Initializes the map from the .l file.
+     * LinkedHashMap keeps the iteration order identical to the read order.
+     */
+    private void initMap() {
+        patternREMap = new LinkedHashMap<>();
+        for (String line : content) {
+            String[] parts = line.split(" ");
+            patternREMap.put(parts[0], parts[1]);
+        }
+    }
+
+    /**
+     * Builds the DFAs corresponding to the content of the .l file
+     */
+    public List<DFA> convert() {
+        // Process the regular definitions
+        List<String> res = new LinkedList<>(patternREMap.values());
+        List<String> patternTypes = new LinkedList<>(patternREMap.keySet());
+
+        FA_Controller controller = new FA_Controller();
+        return controller.lexicalAnalysis(res, patternTypes);
+    }
+
+}
diff --git a/src/main/java/lex/generator/LexInputReader.java b/src/main/java/lex/generator/LexInputReader.java
new file mode 100644
index 0000000..b4e3910
--- /dev/null
+++ b/src/main/java/lex/generator/LexInputReader.java
@@ -0,0 +1,36 @@
+package lex.generator;
+
+import java.io.InputStream;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Scanner;
+
+/**
+ * Created by cuihua on 2017/10/24.
+ *

+ * Reads the Lex specification (.l) file
+ */
+public class LexInputReader {
+
+    /**
+     * Path of the .l file
+     */
+    private static final String path = "regular_expression.l";
+
+    public LexInputReader() {
+    }
+
+    /**
+     * Reads the data from the .l file
+     */
+    public List<String> readREs() {
+        InputStream is = getClass().getClassLoader().getResourceAsStream(path);
+        Scanner sc = new Scanner(is);
+
+        List<String> reContent = new LinkedList<>();
+        while (sc.hasNext()) {
+            reContent.add(sc.nextLine());
+        }
+        return reContent;
+    }
+}
diff --git a/src/main/java/utilties/ClosureRecursionHandler.java b/src/main/java/utilties/ClosureRecursionHandler.java
new file mode 100644
index 0000000..c959710
--- /dev/null
+++ b/src/main/java/utilties/ClosureRecursionHandler.java
@@ -0,0 +1,55 @@
+package utilties;
+
+import finiteAutomata.entity.FA_State;
+import org.apache.log4j.Logger;
+
+import java.util.List;
+
+/**
+ * Created by cuihua on 2017/10/27.
+ *

+ * Prevents endless cycles when computing a closure recursively
+ */
+public class ClosureRecursionHandler {
+
+    private static Logger logger = Logger.getLogger(ClosureRecursionHandler.class);
+
+    private static List<FA_State> states = new FA_StatesList();
+    private static FA_StateComparator comparator = new FA_StateComparator();
+
+    private ClosureRecursionHandler() {
+    }
+
+    /**
+     * Clears the state of the computation currently in progress
+     */
+    public static void reset() {
+        states = new FA_StatesList();
+        logger.debug("Already reset the ClosureRecursionHandler");
+    }
+
+    /**
+     * Adds one state; the whole list must stay sorted so that the overridden binary search can find it
+     */
+    public static void addState(FA_State state) {
+        states.add(state);
+        states.sort(comparator);
+    }
+
+    /**
+     * Adds a list of states; the whole list must stay sorted so that the overridden binary search can find them
+     */
+    public static void addAllState(List<FA_State> newStates) {
+        states.addAll(newStates);
+        states.sort(comparator);
+    }
+
+    /**
+     * Checks whether states contains the given state
+     */
+    public static boolean contain(FA_State state) {
+        boolean result = states.contains(state);
+        logger.debug("State " + state.getStateID() + " is contained: " + result);
+        return result;
+    }
+}
diff --git a/src/main/java/utilties/DFA_StatePatternMappingController.java b/src/main/java/utilties/DFA_StatePatternMappingController.java
new file mode 100644
index 0000000..28032cb
--- /dev/null
+++ b/src/main/java/utilties/DFA_StatePatternMappingController.java
@@ -0,0 +1,31 @@
+package utilties;
+
+import finiteAutomata.entity.DFA;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Created by cuihua on 2017/11/2.
+ *

+ * Central registry of the mapping between each DFA and its pattern
+ */
+public class DFA_StatePatternMappingController {
+
+    private static Map<DFA, String> map = new HashMap<>();
+
+    private DFA_StatePatternMappingController() {
+    }
+
+    public static Map<DFA, String> getMap() {
+        return map;
+    }
+
+    /**
+     * Registers the pattern that corresponds to the given DFA
+     */
+    public static boolean add(DFA dfa, String pattern) {
+        map.put(dfa, pattern);
+        return true;
+    }
+}
diff --git a/src/main/java/utilties/ExtendedMark.java b/src/main/java/utilties/ExtendedMark.java
new file mode 100644
index 0000000..673cd92
--- /dev/null
+++ b/src/main/java/utilties/ExtendedMark.java
@@ -0,0 +1,10 @@
+package utilties;
+
+/**
+ * Created by cuihua on 2017/10/25.
+ *
+ * Extended operators in regular expressions (+ / ? / {})
+ */
+public enum ExtendedMark {
+    QUESTION_MARK, PLUS_MARK, BRACE_MARK
+}
diff --git a/src/main/java/utilties/FA_NodeComparator.java b/src/main/java/utilties/FA_NodeComparator.java
new file mode 100644
index 0000000..51d3e3a
--- /dev/null
+++ b/src/main/java/utilties/FA_NodeComparator.java
@@ -0,0 +1,42 @@
+package utilties;
+
+import finiteAutomata.entity.FA_Node;
+import finiteAutomata.entity.FA_State;
+
+import java.util.Comparator;
+import java.util.List;
+
+/**
+ * Created by cuihua on 2017/10/28.
+ *

+ * Sorts the current result set of the DFA minimization
+ */
+public class FA_NodeComparator implements Comparator<FA_Node> {
+
+
+    @Override
+    public int compare(FA_Node o1, FA_Node o2) {
+        List<FA_State> states1 = o1.getStates();
+        List<FA_State> states2 = o2.getStates();
+
+        int size1 = states1.size();
+        int size2 = states2.size();
+
+        if (size1 < size2) return -1;
+        else if (size1 > size2) return 1;
+        else {
+            // Same number of states: compare them one by one
+            for (int i = 0; i < size1; i++) {
+                int state1 = states1.get(i).getStateID();
+                int state2 = states2.get(i).getStateID();
+
+                if (state1 < state2) return -1;
+                else if (state1 > state2) return 1;
+            }
+
+            // Every state is the same
+            return 0;
+
+        }
+    }
+}
diff --git a/src/main/java/utilties/FA_NodesList.java b/src/main/java/utilties/FA_NodesList.java
new file mode 100644
index 0000000..5f63442
--- /dev/null
+++ b/src/main/java/utilties/FA_NodesList.java
@@ -0,0 +1,33 @@
+package utilties;
+
+import finiteAutomata.entity.FA_Node;
+
+import java.util.LinkedList;
+
+/**
+ * Created by cuihua on 2017/10/28.
+ *

+ * Data structure used while optimizing a DFA.
+ * Overrides lookup with a binary search (keyed on the number of FA_State objects in each node)
+ */
+public class FA_NodesList extends LinkedList<FA_Node> {
+
+    @Override
+    public int indexOf(Object o) {
+        int i = ((FA_Node) o).getStates().size();
+
+        int start = 0;
+        int end = this.size() - 1;
+        while (start <= end) {
+            int middle = (start + end) / 2;
+            if (i < get(middle).getStates().size()) {
+                end = middle - 1;
+            } else if (i > get(middle).getStates().size()) {
+                start = middle + 1;
+            } else {
+                return middle;
+            }
+        }
+        return -1;
+    }
+}
diff --git a/src/main/java/utilties/FA_StateComparator.java b/src/main/java/utilties/FA_StateComparator.java
new file mode 100644
index 0000000..28ac149
--- /dev/null
+++ b/src/main/java/utilties/FA_StateComparator.java
@@ -0,0 +1,20 @@
+package utilties;
+
+import finiteAutomata.entity.FA_State;
+
+import java.util.Comparator;
+
+/**
+ * Created by cuihua on 2017/10/24.
+ *

+ * Sorts the state set of an ε-closure
+ */
+public class FA_StateComparator implements Comparator<FA_State> {
+
+
+    public int compare(FA_State o1, FA_State o2) {
+        if (o1.getStateID() < o2.getStateID()) return -1;
+        else if (o1.getStateID() == o2.getStateID()) return 0;
+        else return 1;
+    }
+}
diff --git a/src/main/java/utilties/FA_StateIDController.java b/src/main/java/utilties/FA_StateIDController.java
new file mode 100644
index 0000000..05f9c7c
--- /dev/null
+++ b/src/main/java/utilties/FA_StateIDController.java
@@ -0,0 +1,26 @@
+package utilties;
+
+/**
+ * Created by cuihua on 2017/10/26.
+ *
+ * Central control of FA_State ID numbers
+ */
+public class FA_StateIDController {
+
+    /**
+     * The ID currently available from this controller
+     */
+    private static int nowID;
+
+    private FA_StateIDController() {
+        FA_StateIDController.nowID = 0;
+    }
+
+    public static int getID() {
+        return FA_StateIDController.nowID;
+    }
+
+    public static void setID(int nowID) {
+        FA_StateIDController.nowID = nowID;
+    }
+}
diff --git a/src/main/java/utilties/FA_StatesList.java b/src/main/java/utilties/FA_StatesList.java
new file mode 100644
index 0000000..ff8ecb6
--- /dev/null
+++ b/src/main/java/utilties/FA_StatesList.java
@@ -0,0 +1,40 @@
+package utilties;
+
+import finiteAutomata.entity.FA_State;
+
+import java.util.Collection;
+import java.util.LinkedList;
+
+/**
+ * Created by cuihua on 2017/10/24.
+ *

+ * Replaces lookup with a binary search for speed
+ */
+public class FA_StatesList extends LinkedList<FA_State> {
+
+    public FA_StatesList() {
+    }
+
+    public FA_StatesList(Collection<? extends FA_State> c) {
+        super(c);
+    }
+
+    @Override
+    public int indexOf(Object o) {
+        int i = ((FA_State) o).getStateID();
+
+        int start = 0;
+        int end = this.size() - 1;
+        while (start <= end) {
+            int middle = (start + end) / 2;
+            if (i < get(middle).getStateID()) {
+                end = middle - 1;
+            } else if (i > get(middle).getStateID()) {
+                start = middle + 1;
+            } else {
+                return middle;
+            }
+        }
+        return -1;
+    }
+}
diff --git a/src/main/java/utilties/NFA_StatePatternMappingController.java b/src/main/java/utilties/NFA_StatePatternMappingController.java
new file mode 100644
index 0000000..e0e2eac
--- /dev/null
+++ b/src/main/java/utilties/NFA_StatePatternMappingController.java
@@ -0,0 +1,31 @@
+package utilties;
+
+import finiteAutomata.entity.NFA;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Created by cuihua on 2017/11/2.
+ *

+ * Central registry of the mapping between each NFA and its pattern
+ */
+public class NFA_StatePatternMappingController {
+
+    private static Map<NFA, String> map = new HashMap<>();
+
+    private NFA_StatePatternMappingController() {
+    }
+
+    public static Map<NFA, String> getMap() {
+        return map;
+    }
+
+    /**
+     * Registers the pattern that corresponds to the given NFA
+     */
+    public static boolean add(NFA nfa, String pattern) {
+        map.put(nfa, pattern);
+        return true;
+    }
+}
diff --git a/src/main/java/utilties/SquareBracketMarkInnerType.java b/src/main/java/utilties/SquareBracketMarkInnerType.java
new file mode 100644
index 0000000..550dcb8
--- /dev/null
+++ b/src/main/java/utilties/SquareBracketMarkInnerType.java
@@ -0,0 +1,12 @@
+package utilties;
+
+/**
+ * Created by cuihua on 2017/11/3.
+ *
+ * Type of m and n in a [m-n] range
+ */
+public enum SquareBracketMarkInnerType {
+
+    LOW_CHAR, UP_CHAR, INT
+
+}
diff --git a/src/main/resources/log4j.properties b/src/main/resources/log4j.properties
new file mode 100644
index 0000000..8c22b1d
--- /dev/null
+++ b/src/main/resources/log4j.properties
@@ -0,0 +1,8 @@
+# Adjust the rootLogger level to control how much is logged
+log4j.rootLogger=info, stdout
+
+### direct log messages to stdout ###
+log4j.appender.stdout=org.apache.log4j.ConsoleAppender
+log4j.appender.stdout.Target=System.out
+log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
+log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}:%L - %m%n
\ No newline at end of file
diff --git a/src/main/resources/re_backup b/src/main/resources/re_backup
new file mode 100644
index 0000000..ac5836e
--- /dev/null
+++ b/src/main/resources/re_backup
@@ -0,0 +1,13 @@
+separator \,|;
+comparator <|>|(<=)|(>=)|(==)|(!=)
+assignment =
+digit 0|1|2|3|4|5|6|7|8|9
+number (1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*
+letter a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
+id letter(letter|number)*
+
+
+ID aa+
+A (a|b)*
+B (a|b)*abb(a|b)*
+C aba?a+abb+cc
diff --git a/src/main/resources/regular_expression.l b/src/main/resources/regular_expression.l
new file mode 100644
index 0000000..751e05b
--- /dev/null
+++ b/src/main/resources/regular_expression.l @@ -0,0 +1,11 @@ +separator \,|; +comparator <|>|(<=)|(>=)|(==)|(!=) +assignment = +digit [0-9] +number [1-9][0-9]* +letter [a-zA-Z] +word [a-zA-Z]* +quotation "[a-zA-Z]*" +id [a-zA-Z][_a-zA-Z0-9]* +testTransferred1 \{a*b*c{1,3}\} +testTransferred2 \[0\-9\]a* \ No newline at end of file diff --git a/src/test/java/finiteAutomata/DFA_HandlerTest.java b/src/test/java/finiteAutomata/DFA_HandlerTest.java new file mode 100644 index 0000000..29e6cba --- /dev/null +++ b/src/test/java/finiteAutomata/DFA_HandlerTest.java @@ -0,0 +1,196 @@ +package finiteAutomata; + +import finiteAutomata.entity.DFA; +import finiteAutomata.entity.FA_Edge; +import finiteAutomata.entity.FA_State; +import finiteAutomata.entity.NFA; +import org.apache.log4j.Logger; +import org.junit.After; +import org.junit.Before; +import org.junit.Test; + +import java.lang.reflect.Method; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +/** + * DFA_Handler Tester. + * + * @author + * @version 1.0 + * @since

October 24, 2017
+ */ +public class DFA_HandlerTest { + + private static final Logger logger = Logger.getLogger(DFA_HandlerTest.class); + + private NFA defaultNFA; + + @Before + public void before() throws Exception { + FA_State state1 = new FA_State(1); + FA_State state2 = new FA_State(2); + FA_State state3 = new FA_State(3); + FA_State state4 = new FA_State(4); + FA_State state5 = new FA_State(5); + FA_State state6 = new FA_State(6); + FA_State state7 = new FA_State(7); + FA_State state8 = new FA_State(8); + FA_State state9 = new FA_State(9); + FA_State state10 = new FA_State(10); + + FA_Edge edge1 = new FA_Edge('ε', state2); + FA_Edge edge2 = new FA_Edge('ε', state3); + FA_Edge edge3 = new FA_Edge('a', state4); + FA_Edge edge4 = new FA_Edge('ε', state6); + FA_Edge edge5 = new FA_Edge('ε', state7); + FA_Edge edge6 = new FA_Edge('ε', state8); + FA_Edge edge7 = new FA_Edge('ε', state5); + FA_Edge edge8 = new FA_Edge('ε', state9); + FA_Edge edge9 = new FA_Edge('a', state10); + + List follow1 = new LinkedList<>(); + follow1.add(edge1); + follow1.add(edge2); + state1.setFollows(follow1); + List follow2 = new LinkedList<>(); + follow2.add(edge3); + follow2.add(edge4); + state2.setFollows(follow2); + List follow3 = new LinkedList<>(); + follow3.add(edge5); + follow3.add(edge6); + state3.setFollows(follow3); + List follow4 = new LinkedList<>(); + follow4.add(edge7); + state4.setFollows(follow4); + List follow5 = new LinkedList<>(); + follow5.add(edge8); + state5.setFollows(follow5); + List follow7 = new LinkedList<>(); + follow7.add(edge9); + state7.setFollows(follow7); + + List alphabet = new LinkedList<>(); + alphabet.add('a'); + + List terminatedStates = new LinkedList<>(); + terminatedStates.add(state6); + terminatedStates.add(state8); + terminatedStates.add(state9); + terminatedStates.add(state10); + + List allStates = new LinkedList<>(); + allStates.add(state1); + allStates.add(state2); + allStates.add(state3); + allStates.add(state4); + allStates.add(state5); + 
allStates.add(state6); + allStates.add(state7); + allStates.add(state8); + allStates.add(state9); + allStates.add(state10); + + defaultNFA = new NFA(); + defaultNFA.setAlphabet(alphabet); + defaultNFA.setStart(state1); + defaultNFA.setTerminatedStates(terminatedStates); + defaultNFA.setStates(allStates); + } + + @After + public void after() throws Exception { + } + + /** + * Method: getFromNFA(NFA defaultNFA) + */ + @Test + public void testGetFromNFA1() throws Exception { + DFA_Handler dfaHandler = new DFA_Handler(); + dfaHandler.getFromNFA(defaultNFA); + + } + + /** + * Method: getFromNFA(NFA defaultNFA) + */ + @Test + public void testGetFromNFA2() throws Exception { + RegularExpressionHandler rgHandler = new RegularExpressionHandler(); + NFA_Handler nfaHandler = new NFA_Handler(); + DFA_Handler dfaHandler = new DFA_Handler(); + +// String re = "((ε|a)b*)*"; + String re = "(a|b)*abb(a|b)*"; + re = rgHandler.convertInfixToPostfix(rgHandler.standardizeRE(re)); + logger.debug("---------------- " + re + " ----------------"); + + NFA finalNFA = nfaHandler.getFromRE(re, null); + logger.debug("*************** finish convert " + re + " to NFA ***************"); + + // Convert to a DFA + DFA dfa = dfaHandler.getFromNFA(finalNFA); + logger.debug("all states size: " + dfa.getStates().size()); + } + + /** + * Method: optimize(DFA defaultNFA) + */ + @Test + public void testOptimize1() throws Exception { + DFA_Handler dfaHandler = new DFA_Handler(); + dfaHandler.optimize(dfaHandler.getFromNFA(defaultNFA)); + } + + /** + * Method: optimize(DFA defaultNFA) + */ + @Test + public void testOptimize2() throws Exception { + RegularExpressionHandler rgHandler = new RegularExpressionHandler(); + NFA_Handler nfaHandler = new NFA_Handler(); + DFA_Handler dfaHandler = new DFA_Handler(); + +// String re = "(a|b)*abb(a|b)*"; + String re = "\\{a*b*c{1,3}\\}"; + re = rgHandler.convertInfixToPostfix(rgHandler.standardizeRE(re)); + NFA finalNFA = nfaHandler.getFromRE(re, null); + DFA dfa = 
dfaHandler.optimize(dfaHandler.getFromNFA(finalNFA)); + + logger.info("Transition table of the optimized DFA"); + for (Map.Entry> entryState : dfa.getMove().entrySet()) { + FA_State start = entryState.getKey(); + for (Map.Entry entryEdge : entryState.getValue().entrySet()) { + logger.info(start.getStateID() + " through " + entryEdge.getKey() + " to " + entryEdge.getValue().getStateID()); + } + } + } + + + /** + * Method: closure(FA_State nowState) + */ + @Test + public void testClosure() throws Exception { + DFA_Handler dfaHandler = new DFA_Handler(); + + Method method = DFA_Handler.class.getDeclaredMethod("closure", FA_State.class); + method.setAccessible(true); + method.invoke(dfaHandler, defaultNFA.getStart()); + + // Expected results to compare against + // state: closure + // 1: 1, 2, 3, 6, 7, 8 + // 2: 2, 6 + // 3: 3, 7, 8 + // 4: 4, 5, 9 + // 7: 7 +// for (FA_State temp : result) { +// logger.debug(temp.getStateID()); +// } + } + +} diff --git a/src/test/java/finiteAutomata/FA_ControllerTest.java b/src/test/java/finiteAutomata/FA_ControllerTest.java new file mode 100644 index 0000000..a89b057 --- /dev/null +++ b/src/test/java/finiteAutomata/FA_ControllerTest.java @@ -0,0 +1,64 @@ +package finiteAutomata; + +import org.junit.After; +import org.junit.Before; +import org.junit.Test; +import utilties.DFA_StatePatternMappingController; +import utilties.NFA_StatePatternMappingController; + +import java.util.LinkedList; +import java.util.List; + +/** + * FA_Controller Tester. + * + * @author + * @version 1.0 + * @since
October 27, 2017
+ */ +public class FA_ControllerTest { + + @Before + public void before() throws Exception { + } + + @After + public void after() throws Exception { + } + + /** + * Method: lexicalAnalysis(List res) + */ + @Test + public void testLexicalAnalysis() throws Exception { + FA_Controller controller = new FA_Controller(); + + List res = new LinkedList<>(); +// res.add("a+"); +// res.add("a*|b*"); +// res.add("(a|b)*"); +// res.add("(a*|b*)*"); +// res.add("((ε|a)b*)*"); +// res.add("(a|b)*abb(a|b)*"); +// res.add("aba?a+abb+cc"); +// res.add("\\{a*b*c{1,3}\\}"); + res.add("[a-z0-9]*"); + + + List patterns = new LinkedList<>(); + patterns.add("ID"); + patterns.add("A"); + patterns.add("B"); + patterns.add("C"); + patterns.add("D"); + patterns.add("E"); + patterns.add("F"); + + controller.lexicalAnalysis(res, patterns); +// NFA_StatePatternMappingController.getMap(); + DFA_StatePatternMappingController.getMap(); + + } + + +} diff --git a/src/test/java/finiteAutomata/NFA_HandlerTest.java b/src/test/java/finiteAutomata/NFA_HandlerTest.java new file mode 100644 index 0000000..2dcd278 --- /dev/null +++ b/src/test/java/finiteAutomata/NFA_HandlerTest.java @@ -0,0 +1,59 @@ +package finiteAutomata; + +import org.junit.After; +import org.junit.Before; +import org.junit.Test; + +/** + * NFA Tester. + * + * @author + * @version 1.0 + * @since
October 26, 2017
+ */ +public class NFA_HandlerTest { + + @Before + public void before() throws Exception { + } + + @After + public void after() throws Exception { + } + + /** + * Method: getFromRE(String re) + */ + @Test + public void testGetFromRE1() throws Exception { + NFA_Handler handler = new NFA_Handler(); + handler.getFromRE("ab·a|*", "ID"); + } + + /** + * Method: getFromRE(String re) + */ + @Test + public void testGetFromRE2() throws Exception { + NFA_Handler handler = new NFA_Handler(); + handler.getFromRE("ab·c·\\{·ε\\{|·ε\\{|·", "ID"); + } + + /** + * Method: getFromRE(String re) + */ + @Test + public void testGetFromRE3() throws Exception { + NFA_Handler handler = new NFA_Handler(); + handler.getFromRE("\\{a*·b*·c·εc|·εc|·\\}·", "ID"); + } + + /** + * Method: combine(List nfaList) + */ + @Test + public void testCombine() throws Exception { + } + + +} diff --git a/src/test/java/finiteAutomata/RegularExpressionHandlerTest.java b/src/test/java/finiteAutomata/RegularExpressionHandlerTest.java new file mode 100644 index 0000000..0adb3f7 --- /dev/null +++ b/src/test/java/finiteAutomata/RegularExpressionHandlerTest.java @@ -0,0 +1,219 @@ +package finiteAutomata; + +import exceptions.UnexpectedRegularExprRuleException; +import org.apache.log4j.Logger; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import utilties.ExtendedMark; + +import java.lang.reflect.Method; + +/** + * RegularExpressionHandler Tester. + * + * @author + * @version 1.0 + * @since
October 25, 2017
+ */ +public class RegularExpressionHandlerTest { + + private static final Logger logger = Logger.getLogger(RegularExpressionHandlerTest.class); + + @Before + public void before() throws Exception { + } + + @After + public void after() throws Exception { + } + + /** + * Method: standardizeRE(String re) + */ + @Test + public void testStandardizeRE() throws Exception { + RegularExpressionHandler re = new RegularExpressionHandler(); + Assert.assertEquals("a·a*", re.standardizeRE("a+")); + Assert.assertEquals("a·b·(ε|a)·a·a*·a·b·b·b*·c·c", re.standardizeRE("aba?a+abb+cc")); + Assert.assertEquals("(a|b)·(a|b)*", re.standardizeRE("(a|b)+")); + Assert.assertEquals("(a*|b*)*", re.standardizeRE("(a*|b*)*")); + Assert.assertEquals("((ε|a)·b*)*", re.standardizeRE("((ε|a)b*)*")); + Assert.assertEquals("(a|b)*·a·b·b·(a|b)*", re.standardizeRE("(a|b)*abb(a|b)*")); + + Assert.assertEquals("c·c·(a·b)·(a·b)·(ε|(a·b))·a·a·a", re.standardizeRE("cc(ab){2, 3}aaa")); + Assert.assertEquals("c·c·c·c*·a·a·a", re.standardizeRE("cc{2, }aaa")); + Assert.assertEquals("c·c·(ε|(a·b))·(ε|(a·b))·(ε|(a·b))·a·a·a", re.standardizeRE("cc(ab){, 3}aaa")); + Assert.assertEquals("c·c·c·a·a·a", re.standardizeRE("cc{2}aaa")); + Assert.assertEquals("c·(ε|c)·(ε|c)·(ε|c)·a·a·a", re.standardizeRE("cc{, 3}aaa")); + + Assert.assertEquals("c·(ε|c)·(ε|c)·(ε|(ε|c))·a·a·a", re.standardizeRE("c·(ε|c)·(ε|c){1, 2}aa·a")); + Assert.assertEquals("c·(ε|a)·(ε|a)·(ε|a)·(ε|c)·(ε|(ε|c))·a·a·a", re.standardizeRE("ca{0,3}·(ε|c){1, 2}aa·a")); + Assert.assertEquals("c·c·(a·b)·(ε|((0|1|2|3|4|5|6|7|8|9)|(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)))·a·a·a", + re.standardizeRE("cc(ab)[0-9a-z]?aaa")); + Assert.assertEquals("c·c·(a·b)·((0|1|2|3|4|5|6|7|8|9)|(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z))·((0|1|2|3|4|5|6|7|8|9)|(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z))*·a·a·a", + re.standardizeRE("cc(ab)[0-9a-z]+aaa")); + Assert.assertEquals("c·c·(a·b|(c·d)*)·(a·b|(c·d)*)·(ε|(a·b|(c·d)*))·a·a·a", 
re.standardizeRE("cc(ab|(cd)*){2,3}aaa")); + Assert.assertEquals("c·c·(a·b)·(a|b|c)·a·a·a", re.standardizeRE("cc(ab)[abc]aaa")); + Assert.assertEquals("c·c·(a·b)·(a|b|(c|d|e|f)|x|y)·a·a·a", re.standardizeRE("cc(ab)[abc-fxy]aaa")); + } + + /** + * Method: standardizeRE(String re) + */ + @Test(expected = UnexpectedRegularExprRuleException.class) + public void testStandardizeRE2() throws Exception { + RegularExpressionHandler re = new RegularExpressionHandler(); +// logger.debug(re.standardizeRE("aa(*)") + "\n"); + re.standardizeRE("aa["); + } + + /** + * Method: standardizeRE(String re) + */ + @Test + public void testStandardizeRE3() throws Exception { + RegularExpressionHandler re = new RegularExpressionHandler(); + + Assert.assertEquals("a·a·\\|", re.standardizeRE("aa\\|")); + Assert.assertEquals("a·a·\\{", re.standardizeRE("aa\\{")); + Assert.assertEquals("a·a·\\{·\\}", re.standardizeRE("aa\\{\\}")); + Assert.assertEquals("a·a·\\+", re.standardizeRE("aa\\+")); + Assert.assertEquals("a·a·\\?", re.standardizeRE("aa\\?")); + Assert.assertEquals("a·a·\\[", re.standardizeRE("aa\\[")); + Assert.assertEquals("a·a·\\[·\\]", re.standardizeRE("aa\\[\\]")); + Assert.assertEquals("a·a·\\-", re.standardizeRE("aa\\-")); + Assert.assertEquals("a·b·c·\\{·(ε|\\{)·(ε|\\{)", re.standardizeRE("abc\\{{1,3}")); + Assert.assertEquals("a·b·\\[·0·\\-·9·\\]", re.standardizeRE("ab\\[0\\-9\\]")); + Assert.assertEquals("\\{·a*·b*·c·(ε|c)·(ε|c)·\\}", re.standardizeRE("\\{a*b*c{1,3}\\}")); + } + + /** + * Method: standardizeRE(String re) + */ + @Test(expected = UnexpectedRegularExprRuleException.class) + public void testStandardizeRE4() throws Exception { + RegularExpressionHandler re = new RegularExpressionHandler(); +// re.standardizeRE("ab\\[0-9\\]"); + re.standardizeRE("ab\\[0\\-9\\]\\[0-\\9]"); + } + + /** + * Method: convertInfixToPostfix(String re) + */ + @Test + public void testConvertInfixToPostfix() throws Exception { + RegularExpressionHandler re = new RegularExpressionHandler(); + + 
Assert.assertEquals("ab|*", re.convertInfixToPostfix("(a|b)*")); + Assert.assertEquals("a*b*|*", re.convertInfixToPostfix("(a*|b*)*")); + Assert.assertEquals("εa|b*·*", re.convertInfixToPostfix("((ε|a)·b*)*")); + Assert.assertEquals("ab|*a·b·b·ab|*·", re.convertInfixToPostfix("(a|b)*·a·b·b·(a|b)*")); + Assert.assertEquals("ab·εa|·a·a*·a·b·b·b*·c·c·", re.convertInfixToPostfix("a·b·(ε|a)·a·a*·a·b·b·b*·c·c")); + + Assert.assertEquals("cc·ab··ab··εab·|·a·a·a·", re.convertInfixToPostfix("c·c·(a·b)·(a·b)·(ε|(a·b))·a·a·a")); + Assert.assertEquals("cc·c·c*·a·a·a·", re.convertInfixToPostfix("c·c·c·c*·a·a·a")); + Assert.assertEquals("cc·εab·|·εab·|·εab·|·a·a·a·", re.convertInfixToPostfix("c·c·(ε|(a·b))·(ε|(a·b))·(ε|(a·b))·a·a·a")); + Assert.assertEquals("cc·c·a·a·a·", re.convertInfixToPostfix("c·c·c·a·a·a")); + Assert.assertEquals("cεc|·εc|·εc|·a·a·a·", re.convertInfixToPostfix("c·(ε|c)·(ε|c)·(ε|c)·a·a·a")); + + } + + /** + * Method: convertInfixToPostfix(String re) + */ + @Test + public void testConvertInfixToPostfix2() throws Exception { + RegularExpressionHandler re = new RegularExpressionHandler(); + Assert.assertEquals("\\,;|", re.convertInfixToPostfix("\\,|;")); + Assert.assertEquals("ab·c·\\{·ε\\{|·ε\\{|·", re.convertInfixToPostfix("a·b·c·\\{·(ε|\\{)·(ε|\\{)")); + Assert.assertEquals("\\{a*·b*·c·εc|·εc|·\\}·", re.convertInfixToPostfix("\\{·a*·b*·c·(ε|c)·(ε|c)·\\}")); + } + + @Test + public void testStandardizeExtendedMark() throws Exception { + RegularExpressionHandler regularExpressionHandler = new RegularExpressionHandler(); + + Method method = RegularExpressionHandler.class.getDeclaredMethod("standardizeExtendedMark", StringBuffer.class, int.class, ExtendedMark.class); + method.setAccessible(true); + + StringBuffer stringBuffer = new StringBuffer(); + stringBuffer.append("cc(ab)+aaa"); + + method.invoke(regularExpressionHandler, stringBuffer, 6, ExtendedMark.PLUS_MARK); + + /* + Cases to check: + "a?", 1, ExtendedMark.QUESTION_MARK: (ε|a) + "aba?a", 3, 
ExtendedMark.QUESTION_MARK: ab(ε|a)a + "cc(ab)?a", 6, ExtendedMark.QUESTION_MARK: cc(ε|(ab))a + "aba+a", 3, ExtendedMark.PLUS_MARK: abaa*a + "cc(ab)+aaa", 6, ExtendedMark.PLUS_MARK: cc(ab)(ab)*aaa + */ + } + + @Test + public void testStandardizeExtendedMark2() throws Exception { + RegularExpressionHandler regularExpressionHandler = new RegularExpressionHandler(); + + Method method = RegularExpressionHandler.class.getDeclaredMethod("standardizeExtendedMark", StringBuffer.class, int.class, ExtendedMark.class); + method.setAccessible(true); + + StringBuffer stringBuffer = new StringBuffer(); + stringBuffer.append("cc(ab){,3}aaa"); + + method.invoke(regularExpressionHandler, stringBuffer, 6, ExtendedMark.BRACE_MARK); + + /* + Cases to check: + "cc(ab){2,3}aaa", 6, ExtendedMark.BRACE_MARK: cc(ab)(ab)(ε|(ab))aaa + "cc{2, 3}aaa", 2, ExtendedMark.BRACE_MARK: ccc(ε|c)aaa + + "cc(ab){2}aaa", 6, ExtendedMark.BRACE_MARK: cc(ab)(ab)aaa + "cc{2}aaa", 2, ExtendedMark.BRACE_MARK: cccaaa + + "cc(ab){2,}aaa", 6, ExtendedMark.BRACE_MARK: cc(ab)(ab)(ab)*aaa + "cc{2,}aaa", 2, ExtendedMark.BRACE_MARK: cccc*aaa + + "cc(ab){,3}aaa", 6, ExtendedMark.BRACE_MARK: cc(ε|(ab))(ε|(ab))(ε|(ab))aaa + "cc{,3}aaa", 2, ExtendedMark.BRACE_MARK: c(ε|c)(ε|c)(ε|c)aaa + + */ + } + + @Test + public void testStandardizeSquareBracketMark() throws Exception { + RegularExpressionHandler regularExpressionHandler = new RegularExpressionHandler(); + + Method method = RegularExpressionHandler.class.getDeclaredMethod("standardizeSquareBracketMark", StringBuffer.class, int.class); + method.setAccessible(true); + + StringBuffer stringBuffer = new StringBuffer(); + stringBuffer.append("cc(ab)[0-9a-z]aaa"); + + method.invoke(regularExpressionHandler, stringBuffer, 6); + } + + /** + * Method: comparePriority(char curChar, char top) + */ + @Test + public void testComparePriority() throws Exception { + + try { + Method method = RegularExpressionHandler.class.getDeclaredMethod("comparePriority", char.class, char.class); + 
method.setAccessible(true); + method.invoke(new RegularExpressionHandler(), '*', '*'); + + /* + Test cases: + '|', '*': true + '·', '*': true + '·', '(': false + '*', '*': true + */ + } catch (NoSuchMethodException e) { + } + } +} \ No newline at end of file diff --git a/src/test/java/finiteAutomata/entity/DFATest.java b/src/test/java/finiteAutomata/entity/DFATest.java new file mode 100644 index 0000000..a84ba44 --- /dev/null +++ b/src/test/java/finiteAutomata/entity/DFATest.java @@ -0,0 +1,110 @@ +package finiteAutomata.entity; + +import finiteAutomata.FA_Controller; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import java.util.LinkedList; +import java.util.List; + +/** + * DFA Tester. + * + * @author + * @version 1.0 + * @since
November 2, 2017
+ */ +public class DFATest { + + @Before + public void before() throws Exception { + } + + @After + public void after() throws Exception { + } + + /** + * Method: isValid(String s) + */ + @Test + public void testIsValid1() throws Exception { + List res = new LinkedList<>(); + res.add("aa+"); + + List patterns = new LinkedList<>(); + patterns.add("ID"); + + FA_Controller controller = new FA_Controller(); + DFA dfa = controller.lexicalAnalysis(res, patterns).get(0); + + Assert.assertEquals(true, dfa.isValid("aa")); + Assert.assertEquals(true, dfa.isValid("aaa")); + } + + /** + * Method: isValid(String s) + */ + @Test + public void testIsValid2() throws Exception { + List res = new LinkedList<>(); + res.add("aa+"); + res.add("b*"); + + + List patterns = new LinkedList<>(); + patterns.add("ID"); + patterns.add("A"); + + FA_Controller controller = new FA_Controller(); + List dfas = controller.lexicalAnalysis(res, patterns); + DFA dfa1 = dfas.get(0); + DFA dfa2 = dfas.get(1); + + Assert.assertEquals(true, dfa1.isValid("aa")); + Assert.assertEquals(true, dfa1.isValid("aaa")); + Assert.assertEquals(false, dfa1.isValid("b")); + Assert.assertEquals(true, dfa2.isValid("b")); + Assert.assertEquals(true, dfa2.isValid("bb")); + Assert.assertEquals(true, dfa2.isValid("bbb")); + } + + /** + * Method: isValid(String s) + */ + @Test + public void testIsValid3() throws Exception { + List res = new LinkedList<>(); + res.add("aa+"); + + + List patterns = new LinkedList<>(); + patterns.add("ID"); + + FA_Controller controller = new FA_Controller(); + DFA dfa = controller.lexicalAnalysis(res, patterns).get(0); + + Assert.assertEquals(false, dfa.isValid("a")); + } + + /** + * Method: isValid(String s) + */ + @Test + public void testIsValid4() throws Exception { + List res = new LinkedList<>(); + res.add("aa+"); + + + List patterns = new LinkedList<>(); + patterns.add("ID"); + + FA_Controller controller = new FA_Controller(); + DFA dfa = controller.lexicalAnalysis(res, patterns).get(0); + + 
Assert.assertEquals(false, dfa.isValid("b")); + } + +}