Upload files to ''

This commit is contained in:
sheet 2022-11-05 02:59:11 +00:00
parent ca95b2192f
commit cd34ff07d2
50 changed files with 3174 additions and 2 deletions

3
.gitignore vendored Normal file

@ -0,0 +1,3 @@
target/*
.idea/*

6
.idea/vcs.xml Normal file

@ -0,0 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="VcsDirectoryMappings">
<mapping directory="$PROJECT_DIR$" vcs="Git" />
</component>
</project>

20
Lex.iml Normal file

@ -0,0 +1,20 @@
<?xml version="1.0" encoding="UTF-8"?>
<module org.jetbrains.idea.maven.project.MavenProjectsManager.isMavenModule="true" type="JAVA_MODULE" version="4">
<component name="NewModuleRootManager" LANGUAGE_LEVEL="JDK_1_8">
<output url="file://$MODULE_DIR$/target/classes" />
<output-test url="file://$MODULE_DIR$/target/test-classes" />
<content url="file://$MODULE_DIR$">
<sourceFolder url="file://$MODULE_DIR$/src/main/java" isTestSource="false" />
<sourceFolder url="file://$MODULE_DIR$/src/main/resources" type="java-resource" />
<sourceFolder url="file://$MODULE_DIR$/src/test/java" isTestSource="true" />
<excludeFolder url="file://$MODULE_DIR$/target" />
</content>
<orderEntry type="inheritedJdk" />
<orderEntry type="sourceFolder" forTests="false" />
<orderEntry type="library" name="Maven: org.apache.commons:commons-lang3:3.6" level="project" />
<orderEntry type="library" name="Maven: junit:junit:4.12" level="project" />
<orderEntry type="library" name="Maven: org.hamcrest:hamcrest-core:1.3" level="project" />
<orderEntry type="library" name="Maven: log4j:log4j:1.2.17" level="project" />
<orderEntry type="library" name="Maven: org.apache.commons:commons-lang3:3.6" level="project" />
</component>
</module>

BIN
README.assets/dfa.jpg Normal file

Binary file not shown. Size: 78 KiB

BIN
README.assets/dtran.jpg Normal file

Binary file not shown. Size: 99 KiB

BIN
README.assets/fa-edge.jpg Normal file

Binary file not shown. Size: 54 KiB

BIN
README.assets/fa-node.jpg Normal file

Binary file not shown. Size: 37 KiB

BIN
README.assets/fa-state.jpg Normal file

Binary file not shown. Size: 55 KiB

BIN
README.assets/fa.jpg Normal file

Binary file not shown. Size: 97 KiB

BIN
README.assets/nfa.jpg Normal file

Binary file not shown. Size: 27 KiB

BIN
README.assets/token.jpg Normal file

Binary file not shown. Size: 56 KiB

111
README.md

@ -1,3 +1,110 @@
# Rela
# Lex Lexical Analyzer
## Motivation / Aim
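The aim is to understand the lexical-analysis phase of a compiler in depth. By implementing the RE => NFA => DFA => minimized DFA pipeline, the project generates the DFA transition tables that Lex derives from a .l file, runs lexical analysis on the input source program or text, and outputs the corresponding token sequence. Building Lex automatically in this way gives a deeper understanding of the core algorithm behind each conversion step.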
## Content description
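The program first reads the user's input from the console, then reads the .l file of regular definitions from the resources directory. It generates one minimized DFA per definition, using the order of the definitions in the .l file as their priority, and then matches each lexeme of the input against those DFAs in turn.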
## Ideas / Methods
By defining a custom .l Lex file, the program generates DFAs that decide whether a lexeme is valid, following the Lex programming model.
## Assumptions
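1. The source text to be analyzed must __not contain spaces__ other than as separators between lexemes; spaces may not appear inside quotation marks either.
2. The only regular-expression operators supported are ___· | * + ? () {} [] , -___. To match any of these symbols literally, escape it in the regular definition.
The basic symbol __·__ (concatenation) may be written explicitly or omitted. __+__ means one or more occurrences; __?__ means zero or one occurrence.
__{}__ supports four forms, {n}, {m, n}, {m,} and {,n}, meaning exactly n times, between m and n times, at least m times, and at most n times respectively.
__[]__ supports [abc] as shorthand for alternation, as well as hyphenated ranges in natural order such as [a-zA-Z0-9].
Escaping uses __\\__. For example, to match a left brace { , write \\{ in the RE.
3. At the end of the input, type __###__ after a whitespace character to terminate the input so that the program starts running.
4. The resulting token sequence is printed to the console and is also written to the parent directory of the code project (alongside the Lex project directory).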
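As an illustration of the {} forms listed above, the following minimal Java sketch shows one way a bounded repetition can be rewritten with basic operators only. The names `BraceExpansionSketch` and `repeat` are hypothetical and not taken from the project, and using a negative `max` for "no upper bound" is an assumed convention.

```java
/** Hypothetical helper, not part of the project: rewrites core{min,max} with basic operators. */
final class BraceExpansionSketch {
    /** The empty-string symbol (Greek epsilon), written as an escape to avoid encoding issues. */
    static final String EPSILON = "\u03B5";

    /**
     * @param core the operand the braces apply to, e.g. "a" or "(ab)"
     * @param min  minimum number of repetitions
     * @param max  maximum number of repetitions; a negative value means unbounded (assumption)
     */
    static String repeat(String core, int min, int max) {
        StringBuilder out = new StringBuilder();
        // The mandatory part: min plain copies of the core.
        for (int i = 0; i < min; i++) {
            out.append(core);
        }
        if (max < 0) {
            // {m,}: any number of further copies.
            out.append(core).append('*');
        } else {
            // {m,n}: each further copy is optional, written as (epsilon|core).
            for (int i = min; i < max; i++) {
                out.append('(').append(EPSILON).append('|').append(core).append(')');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(repeat("a", 3, 3));  // {3}
        System.out.println(repeat("a", 2, 4));  // {2,4}
        System.out.println(repeat("a", 2, -1)); // {2,}
        System.out.println(repeat("a", 0, 2));  // {,2}
    }
}
```

With this convention, {n} corresponds to `repeat(core, n, n)` and {,n} to `repeat(core, 0, n)`.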
## Related FA description
The DFAs are derived from whatever regular definitions are placed in regular_expression.l in the resources directory, so there is no single fixed FA description.
## Description of important Data Structures
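(The Java definitions are located under /finiteAutomata/entity and /lex/entity.)
1. FA: finite automaton. The abstract parent class of NFA and DFA, holding the alphabet, the start state, the accepting states, all states, and an abstract method that decides whether an input lexeme is valid.
![](README.assets/fa.jpg)
2. NFA: nondeterministic finite automaton.
![](README.assets/nfa.jpg)
3. DFA: deterministic finite automaton.
![](README.assets/dfa.jpg)
4. FA_State: a node in an FA. Holds an ID and the set of links to successor states.
![](README.assets/fa-state.jpg)
5. FA_Edge: an edge (link) in an FA. Holds the label on the edge and the successor state the link points to.
![](README.assets/fa-edge.jpg)
6. FA_Node: used during DFA minimization to represent one group of equivalent DFA states.
![](README.assets/fa-node.jpg)
7. D_Tran: a transition record produced when the subset construction processes the NFA. Holds the source state set, the target state set, and the label of the transition.
![](README.assets/dtran.jpg)
8. Token: a token produced by lexical analysis. Holds the token name and its attribute value.
![](README.assets/token.jpg)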
## Description of core Algorithms
(The Java definitions are located under /finiteAutomata.)
### 1. RE => standardized RE postfix
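First, the extended operators +, ?, {} and [] are rewritten using only the basic operators · | *; for example, a+ becomes aa* and a? becomes (ε|a). The omitted concatenation operators are then inserted into the regular definition. Next, the infix expression, which now contains only concatenation, alternation, closure and parentheses, is converted to postfix form. During this step an escaped character is treated as an ordinary character: the implementation uses a boolean variable curCharIsTransferred while scanning the input to record whether the previous character was the escape symbol __\\__ (that is, whether the current character is an operator that has been escaped).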
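The infix-to-postfix step can be sketched with a standard operator stack. This is a minimal illustration, not the project's `RegularExpressionHandler`: the class name `PostfixSketch` is hypothetical, the input is assumed to be well-formed with explicit concatenation operators, and the precedence used here (closure over concatenation over alternation) is the conventional one, which may differ from the project's own priority table.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: converts a standardized infix RE to postfix form. */
final class PostfixSketch {
    /** The explicit concatenation operator (middle dot). */
    static final char CONCAT = '\u00B7';

    /** Conventional binding strength (an assumption): closure, then concatenation, then alternation. */
    private static int precedence(char op) {
        switch (op) {
            case '*':
                return 3;
            case CONCAT:
                return 2;
            case '|':
                return 1;
            default:
                return 0; // '(' never forces a pop
        }
    }

    static String toPostfix(String infix) {
        StringBuilder out = new StringBuilder();
        Deque<Character> ops = new ArrayDeque<>();
        for (int i = 0; i < infix.length(); i++) {
            char c = infix.charAt(i);
            if (c == '\\') {
                // An escaped character is an operand: copy the backslash and the character as-is.
                out.append(c).append(infix.charAt(++i));
            } else if (c == '(') {
                ops.push(c);
            } else if (c == ')') {
                // Flush operators back to the matching left parenthesis, then drop it.
                while (ops.peek() != '(') {
                    out.append(ops.pop());
                }
                ops.pop();
            } else if (c == '*' || c == CONCAT || c == '|') {
                // Operators of equal or higher precedence already on the stack go out first.
                while (!ops.isEmpty() && precedence(ops.peek()) >= precedence(c)) {
                    out.append(ops.pop());
                }
                ops.push(c);
            } else {
                out.append(c); // ordinary operand
            }
        }
        while (!ops.isEmpty()) {
            out.append(ops.pop());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPostfix("a" + CONCAT + "(b|c)*"));
    }
}
```

Keeping the backslash in the output lets the next stage still distinguish an escaped operator from a real one.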
### 2. RE => NFA
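The standardized postfix expression is scanned one character at a time. A character that is not an operator (an operand or an escaped operator) creates a new two-state NFA whose single edge is labelled with that character. A stack holds the NFAs produced during construction: on the concatenation operator (·) the top two NFAs are popped and concatenated, on the alternation operator (|) the top two NFAs are popped and combined by alternation, and on the closure operator (*) the top NFA is popped and wrapped in a closure. Each result is pushed back onto the stack.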
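A compact sketch of this stack-based construction follows. It uses plain integer state numbers and a flat edge list instead of the project's `FA_State` and `FA_Edge` classes; `ThompsonSketch`, `Edge` and `Fragment` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of the stack-based RE => NFA step; not the project's NFA_Handler. */
final class ThompsonSketch {
    static final char CONCAT = '\u00B7';  // explicit concatenation operator
    static final char EPSILON = '\u03B5'; // label of an empty transition

    /** One labelled edge between two numbered states. */
    static final class Edge {
        final int from;
        final char label;
        final int to;

        Edge(int from, char label, int to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    /** An NFA fragment with exactly one start state and one accepting state. */
    static final class Fragment {
        final int start;
        final int accept;

        Fragment(int start, int accept) {
            this.start = start;
            this.accept = accept;
        }
    }

    final List<Edge> edges = new ArrayList<>();
    int stateCount = 0;

    Fragment build(String postfix) {
        Deque<Fragment> stack = new ArrayDeque<>();
        for (int i = 0; i < postfix.length(); i++) {
            char c = postfix.charAt(i);
            if (c == '\\') {
                stack.push(symbol(postfix.charAt(++i))); // escaped operator used as an operand
            } else if (c == CONCAT) {
                Fragment after = stack.pop();
                Fragment before = stack.pop();
                edges.add(new Edge(before.accept, EPSILON, after.start));
                stack.push(new Fragment(before.start, after.accept));
            } else if (c == '|') {
                Fragment second = stack.pop();
                Fragment first = stack.pop();
                int start = stateCount++;
                int accept = stateCount++;
                edges.add(new Edge(start, EPSILON, first.start));
                edges.add(new Edge(start, EPSILON, second.start));
                edges.add(new Edge(first.accept, EPSILON, accept));
                edges.add(new Edge(second.accept, EPSILON, accept));
                stack.push(new Fragment(start, accept));
            } else if (c == '*') {
                Fragment inner = stack.pop();
                int start = stateCount++;
                int accept = stateCount++;
                edges.add(new Edge(start, EPSILON, inner.start));        // enter the loop
                edges.add(new Edge(start, EPSILON, accept));             // skip it entirely
                edges.add(new Edge(inner.accept, EPSILON, inner.start)); // repeat
                edges.add(new Edge(inner.accept, EPSILON, accept));      // leave the loop
                stack.push(new Fragment(start, accept));
            } else {
                stack.push(symbol(c));
            }
        }
        return stack.pop(); // a well-formed expression leaves exactly one fragment
    }

    /** Creates the two-state fragment start --c--> accept. */
    private Fragment symbol(char c) {
        int start = stateCount++;
        int accept = stateCount++;
        edges.add(new Edge(start, c, accept));
        return new Fragment(start, accept);
    }

    public static void main(String[] args) {
        ThompsonSketch sketch = new ThompsonSketch();
        Fragment nfa = sketch.build("ab|*"); // postfix form of (a|b)*
        System.out.println(sketch.stateCount + " states, " + sketch.edges.size() + " edges, start " + nfa.start);
    }
}
```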
### 3. NFA => DFA
The NFA built from the processed postfix expression is converted into an equivalent DFA by the *subset construction*.
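The subset construction can be sketched as follows. This is an illustrative version over plain integer states and nested maps, not the project's `DFA_Handler`; the class `SubsetConstructionSketch` and its helper `edge` are hypothetical, and the ε-closure here is iterative rather than recursive.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of NFA => DFA by subset construction. */
final class SubsetConstructionSketch {
    static final char EPSILON = '\u03B5';

    /** Adds the edge from --label--> to; nfa.get(s).get(c) is the set of targets of s on c. */
    static void edge(Map<Integer, Map<Character, Set<Integer>>> nfa, int from, char label, int to) {
        nfa.computeIfAbsent(from, k -> new HashMap<>())
           .computeIfAbsent(label, k -> new TreeSet<>())
           .add(to);
    }

    private static Set<Integer> targets(Map<Integer, Map<Character, Set<Integer>>> nfa, int state, char label) {
        Map<Character, Set<Integer>> row = nfa.get(state);
        if (row == null) {
            return Collections.emptySet();
        }
        Set<Integer> found = row.get(label);
        return found == null ? Collections.<Integer>emptySet() : found;
    }

    /** All states reachable from the seed through epsilon edges only, the seed included. */
    static Set<Integer> closure(Map<Integer, Map<Character, Set<Integer>>> nfa, Set<Integer> seed) {
        Set<Integer> result = new TreeSet<>(seed);
        Deque<Integer> work = new ArrayDeque<>(seed);
        while (!work.isEmpty()) {
            int state = work.pop();
            for (int next : targets(nfa, state, EPSILON)) {
                if (result.add(next)) {
                    work.push(next); // visited for the first time
                }
            }
        }
        return result;
    }

    /** All states reachable from any state of the set on one labelled edge. */
    static Set<Integer> move(Map<Integer, Map<Character, Set<Integer>>> nfa, Set<Integer> from, char label) {
        Set<Integer> result = new TreeSet<>();
        for (int state : from) {
            result.addAll(targets(nfa, state, label));
        }
        return result;
    }

    /** Returns the DFA transition table; each DFA state is a set of NFA states. */
    static Map<Set<Integer>, Map<Character, Set<Integer>>> toDfa(
            Map<Integer, Map<Character, Set<Integer>>> nfa, int start, List<Character> alphabet) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dTran = new LinkedHashMap<>();
        Deque<Set<Integer>> unmarked = new ArrayDeque<>();
        Set<Integer> startSet = closure(nfa, Collections.singleton(start));
        dTran.put(startSet, new LinkedHashMap<>());
        unmarked.add(startSet);
        while (!unmarked.isEmpty()) {
            Set<Integer> current = unmarked.poll();
            for (char c : alphabet) {
                Set<Integer> next = closure(nfa, move(nfa, current, c));
                if (next.isEmpty()) {
                    continue; // no successor on this label
                }
                if (!dTran.containsKey(next)) {
                    dTran.put(next, new LinkedHashMap<>());
                    unmarked.add(next);
                }
                dTran.get(current).put(c, next);
            }
        }
        return dTran;
    }
}
```

Using sorted sets as map keys makes "is this state set already known" a plain `containsKey` lookup, so no manual element-by-element comparison is needed in this sketch.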
### 4. DFA => minimized DFA
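Equivalent states in a DFA behave identically and can be merged, so an algorithm can build the DFA with the fewest states. The idea is to first partition the states into the set of accepting states and the set of non-accepting states, then check, for every state and every symbol of the alphabet, whether the successor states are equivalent. The implementation uses a boolean variable isWeakEqual to record whether a pass produced any new split, which ensures that the algorithm "looks back" before terminating: a later split may cause a group that was already examined to split again.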
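The refinement loop can be sketched as below. Note that this is a signature-based variant (Moore-style refinement) chosen for brevity, not the project's per-symbol splitting in `DFA_Handler.optimize`; the class `MinimizeSketch` and its methods are hypothetical. The `changed` flag plays the same role as isWeakEqual with the opposite polarity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of DFA minimization by partition refinement. */
final class MinimizeSketch {

    /** Records the DFA transition from --label--> to. */
    static void transition(Map<Integer, Map<Character, Integer>> move, int from, char label, int to) {
        move.computeIfAbsent(from, k -> new HashMap<>()).put(label, to);
    }

    /** Returns the final partition: each set is one group of mutually equivalent states. */
    static List<Set<Integer>> partition(Set<Integer> states, Set<Integer> accepting,
                                        List<Character> alphabet,
                                        Map<Integer, Map<Character, Integer>> move) {
        // Initial partition: non-accepting states first, then accepting states.
        List<Set<Integer>> groups = new ArrayList<>();
        Set<Integer> nonAccepting = new TreeSet<>(states);
        nonAccepting.removeAll(accepting);
        if (!nonAccepting.isEmpty()) {
            groups.add(nonAccepting);
        }
        if (!accepting.isEmpty()) {
            groups.add(new TreeSet<>(accepting));
        }

        boolean changed = true;
        while (changed) { // repeat until a whole pass splits nothing (the "look back")
            changed = false;
            List<Set<Integer>> refined = new ArrayList<>();
            for (Set<Integer> group : groups) {
                // States stay together only if, for every symbol, their successors
                // lie in the same group of the current partition.
                Map<List<Integer>, Set<Integer>> bySignature = new LinkedHashMap<>();
                for (int state : group) {
                    List<Integer> signature = new ArrayList<>();
                    Map<Character, Integer> row = move.get(state);
                    for (char c : alphabet) {
                        Integer target = row == null ? null : row.get(c);
                        signature.add(target == null ? -1 : indexOfGroup(groups, target));
                    }
                    bySignature.computeIfAbsent(signature, k -> new TreeSet<>()).add(state);
                }
                if (bySignature.size() > 1) {
                    changed = true;
                }
                refined.addAll(bySignature.values());
            }
            groups = refined;
        }
        return groups;
    }

    private static int indexOfGroup(List<Set<Integer>> groups, int state) {
        for (int i = 0; i < groups.size(); i++) {
            if (groups.get(i).contains(state)) {
                return i;
            }
        }
        return -1;
    }
}
```

After the partition is stable, one state per group can be kept as the representative and every transition into the group redirected to it.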
## Use cases on running
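Provided the Assumptions hold, the Lex file regular_expression.l in the resources directory can be modified freely, and source text typed at the console is then matched against it.
For example, with the regular_expression.l currently in the project, the following can be entered: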
```
, ; >= != =
2 5 14 47
2345676543
a
t E surprise
"Wordmakesman" curTemperature10
{aaaaaaabbbbbbbbcc} [0-9]aaa
###
```
In addition, the /test directory contains many test cases that can be run.
## Problems occurred and related solutions
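1. Initially I built an NFA for each regular definition, joined all of them into one large NFA, and converted that NFA into a minimum-state DFA. Although joining the NFAs merged only their start states and not their accepting states, the ε-closure step of the subset construction in NFA => DFA effectively merged the accepting states as well. As a result, a single accepting state of the final DFA could correspond to several regular definitions, and determining which pattern it actually stood for required extra work. My first approach was to generate a minimized DFA again for each candidate definition; even with a map to reduce the number of regenerations, at least one extra computation was always needed. Once I understood why one accepting state could correspond to several definitions/patterns, I stopped joining all the NFAs into one. Instead, a DFA is generated for each definition separately, and the DFAs are tried one after another in the order of the definitions, which serves as their priority; as soon as one DFA accepts the lexeme, its pattern is returned.
2. The DFA minimization algorithm is easy to understand, but for a long time I could not implement it properly. I eventually got it working step by step by splitting it into smaller parts to reduce the complexity.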
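The priority-ordered matching described in item 1 can be sketched as follows. Each `Predicate<String>` stands in for one minimized DFA's "is this lexeme valid" check; the class `PriorityMatchSketch` is hypothetical, and `IllegalArgumentException` is used here only as a stand-in for the project's NotMatchingException.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch: try one matcher per pattern, in definition order. */
final class PriorityMatchSketch {

    /**
     * @param matchersInPriorityOrder pattern name mapped to its matcher; the iteration order is the priority
     * @param lexeme                  the lexeme to classify
     * @return the name of the first pattern whose matcher accepts the lexeme
     */
    static String classify(Map<String, Predicate<String>> matchersInPriorityOrder, String lexeme) {
        for (Map.Entry<String, Predicate<String>> entry : matchersInPriorityOrder.entrySet()) {
            if (entry.getValue().test(lexeme)) {
                return entry.getKey(); // earlier definitions win
            }
        }
        throw new IllegalArgumentException("no pattern matches lexeme " + lexeme);
    }

    public static void main(String[] args) {
        // A LinkedHashMap keeps insertion order, which models the order of the definitions.
        Map<String, Predicate<String>> matchers = new LinkedHashMap<>();
        matchers.put("KEYWORD", s -> s.equals("if"));
        matchers.put("ID", s -> s.matches("[a-z]+"));
        System.out.println(classify(matchers, "if"));
        System.out.println(classify(matchers, "iffy"));
    }
}
```

Because "if" is accepted by both matchers, the order of the map entries alone decides that it is reported as a keyword.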
## Your feelings and comments
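1. Even when an algorithm is understood well enough to work through by hand, implementing it in a programming language can still be difficult.
2. Flexible use of the available data structures helps avoid later programming errors to some extent.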
## Highlights
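1. Complete, appropriate comments with a suitable level of detail, making the code easy to revisit and modify; points in the algorithms that need particular attention are commented with emphasis.
2. Exception handling for the source program/text being analyzed. For example, an input of the form (*a) throws UnexpectedRegularExprRuleException, and input that cannot be matched by any regular definition in the .l file throws NotMatchingException.
3. Test-driven development. Test cases are written first and are meant to be demanding. Public interfaces are tested directly and private methods are tested via reflection, covering as many cases as possible.
4. Escape symbols are supported in regular definitions, so that literal operator characters can be matched.
5. Developed as a Maven project, using __log4j__ for logging rather than plain System.out.print().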
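Testing a private method through reflection, as mentioned in item 3, can look roughly like the sketch below. The class under test (`Handler`) and its method `isOperator` are invented for the example and do not come from the project; only the `java.lang.reflect` calls are standard.

```java
import java.lang.reflect.Method;

/** Hypothetical sketch: invoking a private method reflectively from a test. */
final class ReflectionTestSketch {

    /** Invented class under test with a private helper. */
    static final class Handler {
        private boolean isOperator(char c) {
            return c == '|' || c == '*';
        }
    }

    /** Looks up a declared (possibly private) method, makes it accessible and invokes it. */
    static Object invokePrivate(Object target, String name, Class<?>[] parameterTypes, Object... args) {
        try {
            Method method = target.getClass().getDeclaredMethod(name, parameterTypes);
            method.setAccessible(true); // bypass the private modifier for the test
            return method.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not invoke " + name, e);
        }
    }

    public static void main(String[] args) {
        Object result = invokePrivate(new Handler(), "isOperator", new Class<?>[]{char.class}, '*');
        System.out.println(result);
    }
}
```

In a JUnit test the returned value would then be compared with the expected result using the usual assertions.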

33
pom.xml Normal file

@ -0,0 +1,33 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>cn.edu.nju.charlesfeng</groupId>
<artifactId>charlesfeng</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-lang3 -->
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.6</version>
</dependency>
</dependencies>
</project>


@ -0,0 +1,23 @@
package exceptions;
/**
* Created by cuihua on 2017/11/2.
* <p>
* Thrown when the string the user wants to analyze does not match the regular definitions in the .l file.
*/
public class NotMatchingException extends Exception {
/**
* The invalid lexeme
*/
private String lexeme;
public NotMatchingException(String lexeme) {
this.lexeme = lexeme;
}
@Override
public String getMessage() {
return "No state in the generated DFAs matches lexeme " + lexeme;
}
}


@ -0,0 +1,24 @@
package exceptions;
/**
* Created by cuihua on 2017/10/25.
* <p>
* Thrown when an RE being processed contains an unexpected operator sequence:
* (*, (|, |), |*, ||, ·), ·*, ·|, (·, |·, ··
*/
public class UnexpectedRegularExprRuleException extends Exception {
/**
* The malformed regular definition
*/
private String re;
public UnexpectedRegularExprRuleException(String re) {
this.re = re;
}
@Override
public String getMessage() {
return "Input " + re + " does not conform to the expected format";
}
}


@ -0,0 +1,474 @@
package finiteAutomata;
import finiteAutomata.entity.*;
import org.apache.log4j.Logger;
import utilties.*;
import java.util.*;
/**
* Created by cuihua on 2017/10/27.
* <p>
* Processes NFAs:
* NFA => DFA
* optimize DFA
*/
public class DFA_Handler {
private static Logger logger = Logger.getLogger(DFA_Handler.class);
private static FA_StateComparator comparator = new FA_StateComparator();
public DFA_Handler() {
}
/**
* @param nfa the NFA to convert
* @return a DFA equivalent to the given NFA
*/
public DFA getFromNFA(final NFA nfa) {
List<DTran> dTrans = new LinkedList<>();
// dStates maps <closure, marked>; a LinkedHashMap keeps insertion order rather than hash order
Map<List<FA_State>, Boolean> dStates = new LinkedHashMap<>();
dStates.put(closure(nfa.getStart()), false);
// Reset the recursion bookkeeping left over from computing this state's closure
ClosureRecursionHandler.reset();
while (true) {
// Look for an unmarked state in dStates; if one exists, mark it and process it
boolean hasStopped = true;
List<FA_State> unhandled = null;
for (Map.Entry<List<FA_State>, Boolean> entry : dStates.entrySet()) {
if (!entry.getValue()) {
hasStopped = false;
entry.setValue(true);
unhandled = entry.getKey();
break;
}
}
// Loop termination: every state has been marked
if (hasStopped) break;
// Process every label of the alphabet for the current state
for (char c : nfa.getAlphabet()) {
List<FA_State> curFollowing = move(unhandled, c);
int curFollowingSize = curFollowing.size();
if (curFollowingSize != 0) {
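// (Otherwise this state set has no successor on this character and the entry stays empty.)
// Start from the kernel whose closure is to be computed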
List<FA_State> curFollowingClosure = new FA_StatesList();
curFollowingClosure.addAll(curFollowing);
// Walk the kernel states to obtain the closure of the kernel
for (FA_State tempState : curFollowing) {
List<FA_State> tempClosure = closure(tempState);
// Reset the recursion bookkeeping left over from computing this state's closure
ClosureRecursionHandler.reset();
// Add to curFollowingClosure every element of tempClosure it does not yet contain
curFollowingClosure.removeAll(tempClosure);
curFollowingClosure.addAll(tempClosure);
}
// Sort, then check whether this set is already in dStates
curFollowingClosure.sort(comparator);
if (!isInDSates(dStates, curFollowingClosure)) {
dStates.put(curFollowingClosure, false);
}
// Record the transition in the dTrans table
dTrans.add(new DTran(unhandled, curFollowingClosure, c));
}
}
}
// Log the DFA state mapping table
logger.info("NFA => DFA: subset construction finished");
for (DTran dTran : dTrans) {
dTran.show();
}
return getEquivalentDFA(nfa, dStates, dTrans);
}
/**
* Computes the ε-closure of the given state.
*/
private List<FA_State> closure(FA_State nowState) {
List<FA_State> result = new FA_StatesList();
result.add(nowState);
ClosureRecursionHandler.addState(nowState);
// Visit every successor of the current state
for (FA_Edge tempEdge : nowState.getFollows()) {
if (tempEdge.getLabel() == 'ε') {
// If the recursive closure has not visited this state yet, add its closure to the result
FA_State nextState = tempEdge.getPointTo();
if (!ClosureRecursionHandler.contain(nextState)) {
List<FA_State> temp = closure(nextState);
result.addAll(temp);
}
}
}
result.sort(comparator);
return result;
}
/**
* Returns the states reachable from the given state set on the given label.
*/
private List<FA_State> move(List<FA_State> cur, char label) {
List<FA_State> result = new FA_StatesList();
for (FA_State tempState : cur) {
for (FA_Edge tempEdge : tempState.getFollows()) {
if (tempEdge.getLabel() == label) {
result.add(tempEdge.getPointTo());
}
}
}
result.sort(comparator);
return result;
}
/**
* Checks whether states is already present in DStates.
*/
private boolean isInDSates(Map<List<FA_State>, Boolean> DStates, List<FA_State> states) {
for (Map.Entry<List<FA_State>, Boolean> entry : DStates.entrySet()) {
List<FA_State> keyStates = entry.getKey();
if (keyStates.size() == states.size()) {
boolean allEqual = true;
for (int i = 0; i < states.size(); i++) {
if (states.get(i).getStateID() != keyStates.get(i).getStateID()) allEqual = false;
}
// Found an identical state set that already exists
if (allEqual) return true;
}
}
return false;
}
/**
* Checks whether toTest intersects pre.
* If it does, the state set is an accepting state of the new DFA.
*/
private boolean isTerminatedState(final List<FA_State> pre, final List<FA_State> toTest) {
// Intersection only, no union
// Copy toTest first (the list is copied, not the states) so that retainAll does not modify toTest
List<FA_State> newList = new FA_StatesList();
newList.addAll(toTest);
newList.retainAll(pre);
return newList.size() != 0;
}
/**
* @param nfa the source NFA
* @return the equivalent plain DFA built by the subset construction
*/
private DFA getEquivalentDFA(NFA nfa, Map<List<FA_State>, Boolean> dStates, List<DTran> dTrans) {
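// The subset construction is done: build the DFA from dStates and dTrans; dStates preserves the order in which the state sets were produced
// 'pre' refers to the original NFA, 'cur' to the resulting DFA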
List<FA_State> preTerminatedStates = nfa.getTerminatedStates();
List<FA_State> curStates = new FA_StatesList();
List<FA_State> curTerminatedStates = new FA_StatesList();
// Maps each state set produced by the subset construction to its simplified single state
Map<List<FA_State>, FA_State> faStatesConvertTable = new LinkedHashMap<>();
// dStates was filled in order, so the replacement FA_State objects are numbered in the same order
int curIndex = 0;
for (List<FA_State> nowConvertedNFAStates : dStates.keySet()) {
FA_State equivalentState = new FA_State(curIndex);
curStates.add(equivalentState);
curIndex++;
faStatesConvertTable.put(nowConvertedNFAStates, equivalentState);
// A state set containing an accepting state of the original NFA is accepting in the DFA
if (isTerminatedState(preTerminatedStates, nowConvertedNFAStates)) {
curTerminatedStates.add(equivalentState);
}
}
// Add the dTrans transitions to the DFA and store them in the DFA's move table
Map<FA_State, Map<Character, FA_State>> move = new LinkedHashMap<>();
for (DTran dTran : dTrans) {
FA_State curStart = faStatesConvertTable.get(dTran.getFrom());
FA_State curTo = faStatesConvertTable.get(dTran.getTo());
char label = dTran.getLabel();
FA_Edge curEdge = new FA_Edge(label, curTo);
curStart.getFollows().add(curEdge);
Map<Character, FA_State> curMove = move.get(curStart);
if (curMove != null) {
curMove.put(label, curTo);
} else {
curMove = new HashMap<>();
curMove.put(label, curTo);
move.put(curStart, curMove);
}
}
// Log the actual DFA transition table
logger.info("Actual state transition table after the subset construction");
showDFATrans(move);
curStates.sort(comparator);
curTerminatedStates.sort(comparator);
DFA dfa = new DFA();
dfa.setStart(curStates.get(0));
dfa.setAlphabet(nfa.getAlphabet());
dfa.setStates(curStates);
dfa.setTerminatedStates(curTerminatedStates);
dfa.setMove(move);
// Carry the pattern mapped to the original NFA over to the new DFA
DFA_StatePatternMappingController.add(dfa, NFA_StatePatternMappingController.getMap().get(nfa));
return dfa;
}
/**
* @param dfa the DFA to minimize
* @return the DFA with the fewest states
*/
public DFA optimize(DFA dfa) {
List<FA_State> nonTerminatedStates = new FA_StatesList();
List<FA_State> terminatedStates = dfa.getTerminatedStates();
List<Character> alphabet = dfa.getAlphabet();
// Build the two initial groups: accepting states and non-accepting states
nonTerminatedStates.addAll(dfa.getStates());
nonTerminatedStates.removeAll(terminatedStates);
nonTerminatedStates.sort(comparator);
terminatedStates.sort(comparator);
// Order the two initial groups by hand so that the non-accepting states are processed first
List<FA_Node> nodes = new FA_NodesList();
if (nonTerminatedStates.size() > 0) {
// This node is skipped when the DFA has only accepting states
FA_Node node1 = new FA_Node(nonTerminatedStates);
nodes.add(node1);
}
FA_Node node2 = new FA_Node(terminatedStates);
nodes.add(node2);
// The while loop lets the algorithm look back at groups examined earlier
while (true) {
// Assume the states inside every group are equivalent until a split proves otherwise
boolean isWeakEqual = true;
for (int i = 0; i < nodes.size(); i++) {
// A group with a single state is already minimal and needs no further splitting
if (nodes.get(i).getStates().size() == 1) {
continue;
}
// Split the group
for (int j = 0; j < alphabet.size(); ) {
char c = alphabet.get(j);
List<FA_Node> tempResult = optimizeOneNodeOneChar(dfa, nodes, nodes.get(i), c);
nodes.remove(i);
nodes.addAll(i, tempResult);
if (tempResult.size() > 1) {
// The group was split; restart from the first label
isWeakEqual = false;
j = 0;
} else {
j++;
}
}
}
// Every group is weakly equivalent; the algorithm terminates
if (isWeakEqual) break;
}
// Rebuild the DFA
return reconstruction(dfa, nodes);
}
/**
* Splits one FA_Node group on a single alphabet symbol.
*
* @param dfa the current DFA
* @param curDivision the current partition
* @param node the group to split
* @param c the symbol to split on
*/
private List<FA_Node> optimizeOneNodeOneChar(final DFA dfa, List<FA_Node> curDivision, FA_Node node, char c) {
List<FA_Node> result = new FA_NodesList();
if (node.getStates().size() == 1) {
// A group with a single state is already minimal and needs no further splitting
result.add(node);
return result;
}
// Under this label, separate the states without a successor from the states with one
List<FA_State> parentToNull = new FA_StatesList();
Map<FA_State, FA_State> parentToSon = new HashMap<>();
for (FA_State parentState : node.getStates()) {
// The successor of this state under the given label
Map<Character, FA_State> curEdges = dfa.getMove().get(parentState);
if (curEdges != null) {
FA_State sonState = curEdges.get(c);
if (sonState != null) {
// The state has outgoing edges, one of which is labelled c
parentToSon.put(parentState, sonState);
} else {
parentToNull.add(parentState);
}
} else {
parentToNull.add(parentState);
}
}
if (parentToNull.size() != 0) {
parentToNull.sort(comparator);
result.add(new FA_Node(parentToNull));
}
if (parentToSon.size() != 0) {
// Check whether the successors fall into the same group. Key: the group a successor currently belongs to; value: the predecessor states whose successor lies in that group
Map<FA_Node, List<FA_State>> judge = new HashMap<>();
for (Map.Entry<FA_State, FA_State> entry : parentToSon.entrySet()) {
FA_State sonState = entry.getValue();
FA_Node belongingNode = getBelongingNode(curDivision, sonState);
if (judge.get(belongingNode) == null) {
List<FA_State> temp = new FA_StatesList();
temp.add(entry.getKey());
judge.put(belongingNode, temp);
} else {
judge.get(belongingNode).add(entry.getKey());
}
}
if (judge.size() > 1) {
// The successors fall into different groups, so the group splits
for (List<FA_State> states : judge.values()) {
states.sort(comparator);
result.add(new FA_Node(states));
}
} else {
// parentToSon does not produce a new split
List<FA_State> states = new FA_StatesList(parentToSon.keySet());
states.sort(comparator);
result.add(new FA_Node(states));
}
}
return result;
}
/**
* Finds the group that contains the given state.
*/
private FA_Node getBelongingNode(List<FA_Node> curDivision, FA_State state) {
for (FA_Node node : curDivision) {
if (node.getStates().contains(state)) return node;
}
return null;
}
/**
* Merges the equivalent states of the DFA according to the result of the partition refinement.
*/
private DFA reconstruction(DFA dfa, List<FA_Node> nodes) {
// <state to delete, state that replaces it>
Map<FA_State, FA_State> deleteTran = new HashMap<>();
for (FA_Node node : nodes) {
// Keep only the first state as the representative; record every other state of the group for deletion
List<FA_State> division = node.getStates();
for (int i = 1; i < division.size(); i++) {
deleteTran.put(division.get(i), division.get(0));
}
}
// The states to remove
List<FA_State> needDeleteStates = new FA_StatesList(deleteTran.keySet());
needDeleteStates.sort(comparator);
// Rewire the transitions
// Transitions from removed states to other states
for (FA_State state : needDeleteStates) {
dfa.getMove().remove(state);
}
// Transitions from other states to removed states
Map<FA_State, Map<Character, FA_State>> newMove = new LinkedHashMap<>();
for (Map.Entry<FA_State, Map<Character, FA_State>> curMove : dfa.getMove().entrySet()) {
FA_State curStart = curMove.getKey();
if (newMove.get(curStart) == null) {
Map<Character, FA_State> edges = new LinkedHashMap<>();
newMove.put(curStart, edges);
}
// Transition table
for (Map.Entry<Character, FA_State> curEdge : curMove.getValue().entrySet()) {
char label = curEdge.getKey();
FA_State deleteState = curEdge.getValue();
if (needDeleteStates.contains(curEdge.getValue())) {
newMove.get(curStart).put(label, deleteTran.get(deleteState));
} else {
newMove.get(curStart).put(label, deleteState);
}
}
// State links
for (FA_Edge curEdge : curStart.getFollows()) {
if (needDeleteStates.contains(curEdge.getPointTo())) {
curEdge.setPointTo(deleteTran.get(curEdge.getPointTo()));
}
}
}
dfa.getStates().removeAll(needDeleteStates);
dfa.getTerminatedStates().removeAll(needDeleteStates);
dfa.setMove(newMove);
// If the start state was removed
if (needDeleteStates.contains(dfa.getStart())) {
dfa.setStart(deleteTran.get(dfa.getStart()));
}
logger.info("DFA minimized to the fewest states");
showDFATrans(dfa.getMove());
return dfa;
}
/**
* Logs the transition information of the DFA.
*/
private void showDFATrans(Map<FA_State, Map<Character, FA_State>> move) {
for (Map.Entry<FA_State, Map<Character, FA_State>> entryState : move.entrySet()) {
FA_State start = entryState.getKey();
for (Map.Entry<Character, FA_State> entryEdge : entryState.getValue().entrySet()) {
logger.info(start.getStateID() + " through " + entryEdge.getKey() + " to " + entryEdge.getValue().getStateID());
}
}
}
}


@ -0,0 +1,50 @@
package finiteAutomata;
import exceptions.UnexpectedRegularExprRuleException;
import finiteAutomata.entity.DFA;
import finiteAutomata.entity.NFA;
import org.apache.log4j.Logger;
import java.util.LinkedList;
import java.util.List;
/**
* Created by cuihua on 2017/10/27.
* <p>
* Controls the conversion of all input REs into DFAs with the fewest states.
*/
public class FA_Controller {
private static final Logger logger = Logger.getLogger(FA_Controller.class);
public List<DFA> lexicalAnalysis(List<String> res, List<String> patternType) {
RegularExpressionHandler rgHandler = new RegularExpressionHandler();
NFA_Handler nfaHandler = new NFA_Handler();
DFA_Handler dfaHandler = new DFA_Handler();
// Generate a minimized DFA for each regular definition in turn
List<DFA> result = new LinkedList<>();
for (int i = 0; i < res.size(); i++) {
// Process the current RE
String re = res.get(i);
try {
re = rgHandler.convertInfixToPostfix(rgHandler.standardizeRE(re));
logger.debug("Processing regular definition " + re);
} catch (UnexpectedRegularExprRuleException e) {
e.printStackTrace();
}
// RE => NFA
NFA nfa = nfaHandler.getFromRE(re, patternType.get(i));
logger.debug("Regular definition " + re + " converted to an NFA");
// Convert to a minimized DFA
DFA dfa = dfaHandler.optimize(dfaHandler.getFromNFA(nfa));
logger.debug("Number of states for regular definition " + re + ": " + dfa.getStates().size());
result.add(dfa);
}
return result;
}
}


@ -0,0 +1,305 @@
package finiteAutomata;
import finiteAutomata.entity.FA_Edge;
import finiteAutomata.entity.FA_State;
import finiteAutomata.entity.NFA;
import utilties.FA_StateComparator;
import utilties.FA_StateIDController;
import utilties.FA_StatesList;
import utilties.NFA_StatePatternMappingController;
import java.util.LinkedList;
import java.util.List;
import java.util.Stack;
/**
* Created by cuihua on 2017/10/27.
* <p>
* Builds and processes NFAs:
* RE => NFA
* combine multiple NFAs
*/
public class NFA_Handler {
private static FA_StateComparator comparator = new FA_StateComparator();
/**
* @param re the standardized regular definition in postfix form
* @param patternType the pattern type this regular definition stands for
* @return the NFA for this regular definition
*/
public NFA getFromRE(String re, String patternType) {
// The stack temporarily holds the NFAs built so far
Stack<NFA> handling = new Stack<>();
for (int i = 0; i < re.length(); i++) {
char c = re.charAt(i);
if (c == '\\') {
// An escaped character is an operand, not an operator: add it directly
handling = add(handling, re.charAt(i+1));
i++;
} else {
switch (c) {
case '·':
handling = join(handling);
break;
case '|':
handling = or(handling);
break;
case '*':
handling = closure(handling);
break;
default:
handling = add(handling, c);
break;
}
}
}
// Map the NFA to its pattern
NFA result = handling.get(0);
NFA_StatePatternMappingController.add(result, patternType);
// The single NFA left on the stack is the result
return result;
}
/**
* Turns the character c into an NFA.
*/
private Stack<NFA> add(Stack<NFA> handling, char c) {
int nowID = FA_StateIDController.getID();
FA_State start = new FA_State(nowID);
FA_State end = new FA_State(++nowID);
FA_StateIDController.setID(++nowID);
FA_Edge edge = new FA_Edge(c, end);
List<FA_Edge> follows = new LinkedList<>();
follows.add(edge);
start.setFollows(follows);
// Build the NFA; an edge labelled ε is not added to the alphabet
List<Character> alphabet = new LinkedList<>();
if (c != 'ε') alphabet.add(c);
List<FA_State> terminatedStates = new FA_StatesList();
terminatedStates.add(end);
List<FA_State> states = new FA_StatesList();
states.add(start);
states.add(end);
NFA newNFA = new NFA();
newNFA.setStart(start);
newNFA.setAlphabet(alphabet);
newNFA.setTerminatedStates(terminatedStates);
newNFA.setStates(states);
// Push the result onto the stack
handling.push(newNFA);
return handling;
}
/**
* On a concatenation operator, joins the top two NFAs on the stack.
*/
private Stack<NFA> join(Stack<NFA> handling) {
NFA after = handling.pop();
NFA before = handling.pop();
// Append 'after' behind 'before'
FA_State joinStart = before.getTerminatedStates().get(0);
FA_State joinEnd = after.getStart();
FA_Edge joinEdge = new FA_Edge('ε', joinEnd);
List<FA_Edge> follows = new LinkedList<>();
follows.add(joinEdge);
joinStart.setFollows(follows);
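// After joining: the alphabet is the duplicate-free union, the states are combined, the accepting states become those of 'after', and the start state is unchanged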
List<Character> beforeAlphabet = before.getAlphabet();
List<Character> afterAlphabet = after.getAlphabet();
beforeAlphabet.removeAll(afterAlphabet);
beforeAlphabet.addAll(afterAlphabet);
List<FA_State> beforeStates = before.getStates();
beforeStates.addAll(after.getStates());
before.setTerminatedStates(after.getTerminatedStates());
handling.push(before);
return handling;
}
/**
* On an alternation operator, combines the top two NFAs on the stack.
*/
private Stack<NFA> or(Stack<NFA> handling) {
NFA nfa1 = handling.pop();
NFA nfa2 = handling.pop();
// Add two new connecting states
int nowID = FA_StateIDController.getID();
FA_State newStart = new FA_State(nowID);
FA_State newEnd = new FA_State(++nowID);
FA_StateIDController.setID(++nowID);
// Connect nfa1 and nfa2 in parallel
FA_State preStart1 = nfa1.getStart();
FA_State preStart2 = nfa2.getStart();
FA_State preEnd1 = nfa1.getTerminatedStates().get(0);
FA_State preEnd2 = nfa2.getTerminatedStates().get(0);
FA_Edge orEdge1 = new FA_Edge('ε', preStart1);
FA_Edge orEdge2 = new FA_Edge('ε', preStart2);
FA_Edge orEdge3 = new FA_Edge('ε', newEnd);
FA_Edge orEdge4 = new FA_Edge('ε', newEnd);
// Wire up the new start state
List<FA_Edge> startFollows = new LinkedList<>();
startFollows.add(orEdge1);
startFollows.add(orEdge2);
newStart.setFollows(startFollows);
// Update the former accepting states
List<FA_Edge> preEndFollows1 = new LinkedList<>();
preEndFollows1.add(orEdge3);
preEnd1.setFollows(preEndFollows1);
List<FA_Edge> preEndFollows2 = new LinkedList<>();
preEndFollows2.add(orEdge4);
preEnd2.setFollows(preEndFollows2);
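// Rebuild the NFA: the alphabet is the duplicate-free union, the states are combined, and the start and accepting states are the new ones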
List<Character> alphabet = new LinkedList<>();
alphabet.addAll(nfa1.getAlphabet());
alphabet.removeAll(nfa2.getAlphabet());
alphabet.addAll(nfa2.getAlphabet());
List<FA_State> terminatedStates = new FA_StatesList();
terminatedStates.add(newEnd);
List<FA_State> states = new FA_StatesList();
states.addAll(nfa1.getStates());
states.addAll(nfa2.getStates());
states.add(newStart);
states.add(newEnd);
NFA newNFA = new NFA();
newNFA.setStart(newStart);
newNFA.setAlphabet(alphabet);
newNFA.setTerminatedStates(terminatedStates);
newNFA.setStates(states);
handling.push(newNFA);
return handling;
}
/**
* Applies the closure operation to the NFA on top of the stack.
*/
private Stack<NFA> closure(Stack<NFA> handling) {
NFA nfa = handling.pop();
// Add two new connecting states
int nowID = FA_StateIDController.getID();
FA_State newStart = new FA_State(nowID);
FA_State newEnd = new FA_State(++nowID);
FA_StateIDController.setID(++nowID);
// Add the connecting edges
FA_State preStart = nfa.getStart();
FA_State preEnd = nfa.getTerminatedStates().get(0);
FA_Edge newEdge1 = new FA_Edge('ε', preStart);
FA_Edge newEdge2 = new FA_Edge('ε', newEnd);
FA_Edge newEdge3 = new FA_Edge('ε', newEnd);
FA_Edge newEdge4 = new FA_Edge('ε', preStart);
List<FA_Edge> startFollows = new LinkedList<>();
startFollows.add(newEdge1);
startFollows.add(newEdge3);
newStart.setFollows(startFollows);
// Update the former accepting state
List<FA_Edge> preEndFollows = new LinkedList<>();
preEndFollows.add(newEdge2);
preEndFollows.add(newEdge4);
preEnd.setFollows(preEndFollows);
// For the closure NFA the alphabet is unchanged; update the start state, the accepting state and the state list
nfa.setStart(newStart);
List<FA_State> terminatedStates = new FA_StatesList();
terminatedStates.add(newEnd);
nfa.setTerminatedStates(terminatedStates);
List<FA_State> states = nfa.getStates();
states.add(newStart);
states.add(newEnd);
handling.push(nfa);
return handling;
}
/**
* @param nfaStack all the NFAs to be combined
* @return the single combined NFA
*/
public NFA combine(Stack<NFA> nfaStack) {
if (nfaStack.size() > 1) {
while (nfaStack.size() > 1) {
NFA nfa1 = nfaStack.pop();
NFA nfa2 = nfaStack.pop();
NFA newNFA = combineTwoNFA(nfa1, nfa2);
nfaStack.push(newNFA);
}
}
return nfaStack.pop();
}
private NFA combineTwoNFA(NFA nfa1, NFA nfa2) {
// Add a new state to serve as the start state
int nowID = FA_StateIDController.getID();
FA_State newStart = new FA_State(nowID);
FA_StateIDController.setID(++nowID);
// Connect the two original NFAs
FA_Edge newEdge1 = new FA_Edge('ε', nfa1.getStart());
FA_Edge newEdge2 = new FA_Edge('ε', nfa2.getStart());
List<FA_Edge> startFollows = new LinkedList<>();
startFollows.add(newEdge1);
startFollows.add(newEdge2);
newStart.setFollows(startFollows);
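// Rebuild the NFA: the alphabet is the duplicate-free union, the states and accepting states are combined, and the start state is the new one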
List<Character> alphabet = new LinkedList<>();
alphabet.addAll(nfa1.getAlphabet());
alphabet.removeAll(nfa2.getAlphabet());
alphabet.addAll(nfa2.getAlphabet());
List<FA_State> terminatedStates = new FA_StatesList();
terminatedStates.addAll(nfa1.getTerminatedStates());
terminatedStates.addAll(nfa2.getTerminatedStates());
terminatedStates.sort(comparator);
List<FA_State> states = new FA_StatesList();
states.addAll(nfa1.getStates());
states.addAll(nfa2.getStates());
states.add(newStart);
states.sort(comparator);
NFA newNFA = new NFA();
newNFA.setStart(newStart);
newNFA.setAlphabet(alphabet);
newNFA.setTerminatedStates(terminatedStates);
newNFA.setStates(states);
return newNFA;
}
}


@ -0,0 +1,524 @@
package finiteAutomata;
import exceptions.UnexpectedRegularExprRuleException;
import org.apache.commons.lang3.ArrayUtils;
import org.apache.log4j.Logger;
import utilties.ExtendedMark;
import utilties.SquareBracketMarkInnerType;
import java.util.LinkedList;
import java.util.List;
import java.util.Stack;
/**
* Created by cuihua on 2017/10/25.
* <p>
* Handles the input regular expressions.
*/
public class RegularExpressionHandler {
private static final Logger logger = Logger.getLogger(RegularExpressionHandler.class);
/**
* Operator sequences that can never appear in a valid regular definition
*/
private static List<String> unexpectedRERules;
/**
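* Operator precedence order in a standardized regular expression.
* The higher the precedence, the later the position in the list.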
*/
private static List<Character> priority;
/**
* Matches ranges such as [a-z]
*/
private static char[] lowCaseCharSequence = {
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'
};
/**
* Matches ranges such as [A-Z]
*/
private static char[] upCaseCharSequence = {
'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'
};
/**
* Matches ranges such as [0-9]
*/
private static char[] intSequence = {
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
};
public RegularExpressionHandler() {
unexpectedRERules = new LinkedList<>();
unexpectedRERules.add("(*");
unexpectedRERules.add("(|");
unexpectedRERules.add("|)");
unexpectedRERules.add("|*");
unexpectedRERules.add("||");
unexpectedRERules.add("·)");
unexpectedRERules.add("·*");
unexpectedRERules.add("·|");
unexpectedRERules.add("(·");
unexpectedRERules.add("|·");
unexpectedRERules.add("··");
priority = new LinkedList<>();
priority.add(0, '(');
priority.add(1, '·');
priority.add(2, '|');
priority.add(3, '*');
}
/**
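* Assumes by default that re contains no concatenation operators.
* Replaces the extended operators + ? {} [] with basic operators,
* then inserts the omitted concatenation operator '·', covering every possible operator combination.
*
* @param re the input regular expression
* @return the standardized expression, free of extended syntax such as [], + and ?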
*/
public String standardizeRE(String re) throws UnexpectedRegularExprRuleException {
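// Remove all spaces to simplify processing
re = re.replace(" ", "");
// Replace the extended operators. result holds the rewritten string; differ is the index offset of the current character caused by earlier replacements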
StringBuffer result = new StringBuffer().append(re);
int differ = 0;
for (int i = 0; i < re.length(); i++) {
char c = re.charAt(i);
if (c == '\\') {
// Skip the escaped character
i++;
} else {
int preLength = result.length();
if (c == '?') {
result = standardizeExtendedMark(result, i + differ, ExtendedMark.QUESTION_MARK);
}
if (c == '+') {
result = standardizeExtendedMark(result, i + differ, ExtendedMark.PLUS_MARK);
}
if (c == '{') {
result = standardizeExtendedMark(result, i + differ, ExtendedMark.BRACE_MARK);
}
if (c == '[') {
result = standardizeSquareBracketMark(result, i + differ);
}
differ += result.length() - preLength;
}
}
String tempResult = result.toString();
checkREValidation(tempResult);
// Insert concatenation operators. joinCount is the index offset caused by earlier insertions; curCharIsTransferred records whether the current character is escaped
int joinCount = 0;
boolean curCharIsTransferred = false;
for (int i = 0; i < tempResult.length() - 1; i++) {
char before = tempResult.charAt(i);
char after = tempResult.charAt(i + 1);
if (before == '\\') {
// No concatenation operator inside an escape sequence; skip ahead and check the next character
curCharIsTransferred = true;
continue;
}
// In a valid expression, positions next to a concatenation operator need no handling
if (before == '·' || after == '·') {
curCharIsTransferred = false;
continue;
}
if (after == '(' || isValidChar(false, after)) {
if (before == ')' || before == '*' || isValidChar(curCharIsTransferred, before)) {
result = standardizeJoinMark(result, i + joinCount);
joinCount++;
}
}
curCharIsTransferred = false;
}
return result.toString();
}
/**
* Handles the extended operators + / ? / {}.
*/
private StringBuffer standardizeExtendedMark(final StringBuffer re, int markIndex, ExtendedMark mark) throws UnexpectedRegularExprRuleException {
StringBuffer result = new StringBuffer();
// If the operator follows a right parenthesis, the core (operand) it applies to must be located
String content;
int contentStartIndex;
if (re.charAt(markIndex - 1) == ')') {
// The core is more than a single character
contentStartIndex = getContentStartIndexOfExtendedMark(re, markIndex);
content = re.substring(contentStartIndex, markIndex);
} else {
// The core is the single preceding character
if (markIndex >= 2 && re.charAt(markIndex - 2) == '\\') {
// The core is an escaped character
contentStartIndex = markIndex - 2;
content = re.substring(contentStartIndex, markIndex);
} else {
// The core is an ordinary single character
contentStartIndex = markIndex - 1;
content = String.valueOf(re.charAt(markIndex - 1));
}
}
result.append(re.substring(0, contentStartIndex));
if (mark == ExtendedMark.QUESTION_MARK) result.append("(ε|").append(content).append(')');
else if (mark == ExtendedMark.PLUS_MARK) result.append(content).append(content).append('*');
else if (mark == ExtendedMark.BRACE_MARK) {
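// Contents of the braces.
String sub = re.substring(markIndex + 1);
int braceEndIndex = sub.indexOf("}");
int commaIndex = sub.indexOf(",");
if (braceEndIndex == -1) throw new UnexpectedRegularExprRuleException(re.toString());
if (commaIndex == -1 || commaIndex > braceEndIndex) {
// {n}: no comma inside the braces (a later comma belongs to the rest of the expression), so repeat the operand exactly n times.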
int times = Integer.parseInt(sub.substring(0, braceEndIndex));
for (int i = 0; i < times; i++) {
result.append(content);
}
} else {
if (commaIndex == 0) {
// {,n}: repeat 0 to n times.
int times = Integer.parseInt(sub.substring(1, braceEndIndex));
for (int i = 0; i < times; i++) {
result.append("(ε|").append(content).append(')');
}
} else if (commaIndex == braceEndIndex - 1) {
// {n,}: repeat at least n times.
int times = Integer.parseInt(sub.substring(0, commaIndex));
for (int i = 0; i < times; i++) {
result.append(content);
}
result.append(content).append("*");
} else {
// {m,n}: repeat at least m and at most n times.
int mTimes = Integer.parseInt(sub.substring(0, commaIndex));
int nTimes = Integer.parseInt(sub.substring(commaIndex + 1, braceEndIndex));
for (int i = 0; i < mTimes; i++) {
result.append(content);
}
for (int i = mTimes; i < nTimes; i++) {
result.append("(ε|").append(content).append(')');
}
}
}
// If the closing brace is not the last character, append the remainder.
if (braceEndIndex != sub.length() - 1) result.append(sub.substring(braceEndIndex + 1));
}
if (mark == ExtendedMark.QUESTION_MARK || mark == ExtendedMark.PLUS_MARK) {
// If + or ? is not the last character, append the remainder.
if (markIndex != re.length() - 1) result.append(re.substring(markIndex + 1));
}
logger.debug(result);
return result;
}
/**
* Finds the opening parenthesis of the operand that an extended operator (+, ? or {}) applies to.
*/
private int getContentStartIndexOfExtendedMark(final StringBuffer re, int markIndex) {
int pairCount = 0;
int contentStartIndex;
for (contentStartIndex = markIndex - 1; contentStartIndex >= 0; contentStartIndex--) {
char c = re.charAt(contentStartIndex);
if (c == ')') pairCount++;
else if (c == '(') {
if (pairCount == 1) break;
else pairCount--;
}
}
return contentStartIndex;
}
/**
* Inserts an omitted concatenation mark (·).
*
* @param joinIndex index of the first of the two characters between which the mark is inserted
*/
private StringBuffer standardizeJoinMark(final StringBuffer re, int joinIndex) {
StringBuffer sb = new StringBuffer();
sb.append(re.substring(0, joinIndex + 1)).append('·').append(re.substring(joinIndex + 1));
return sb;
}
/**
* Replaces the contents of a square-bracket expression with an equivalent plain expression.
*/
private StringBuffer standardizeSquareBracketMark(final StringBuffer re, int markIndex)
throws UnexpectedRegularExprRuleException {
StringBuffer result = new StringBuffer();
result.append(re.substring(0, markIndex));
// Contents of the square brackets.
String sub = re.substring(markIndex + 1);
int bracketEndIndex = sub.indexOf("]");
if (bracketEndIndex == -1) throw new UnexpectedRegularExprRuleException(re.toString());
String bracketContent = sub.substring(0, bracketEndIndex);
List<StringBuffer> bracketCompleted = new LinkedList<>();
for (int i = 0; i < bracketContent.length(); ) {
if (i < bracketContent.length() - 2 && bracketContent.charAt(i + 1) == '-') {
// Range form (start-end): expand it into an alternation over the whole range.
char start = bracketContent.charAt(i);
char end = bracketContent.charAt(i + 2);
int startIndex, endIndex;
if (ArrayUtils.contains(lowCaseCharSequence, start)) {
// Lowercase letters.
startIndex = ArrayUtils.indexOf(lowCaseCharSequence, start);
endIndex = ArrayUtils.indexOf(lowCaseCharSequence, end);
bracketCompleted.add(standardizeSquareBracketMarkSeparatorToCompleted(startIndex, endIndex,
SquareBracketMarkInnerType.LOW_CHAR));
} else if (ArrayUtils.contains(upCaseCharSequence, start)) {
// Uppercase letters.
startIndex = ArrayUtils.indexOf(upCaseCharSequence, start);
endIndex = ArrayUtils.indexOf(upCaseCharSequence, end);
bracketCompleted.add(standardizeSquareBracketMarkSeparatorToCompleted(startIndex, endIndex,
SquareBracketMarkInnerType.UP_CHAR));
} else if (ArrayUtils.contains(intSequence, start)) {
// Digits.
startIndex = ArrayUtils.indexOf(intSequence, start);
endIndex = ArrayUtils.indexOf(intSequence, end);
bracketCompleted.add(standardizeSquareBracketMarkSeparatorToCompleted(startIndex, endIndex,
SquareBracketMarkInnerType.INT));
}
i += 3;
} else {
// Not part of a range (including a trailing single character): add it as one alternative.
StringBuffer sb = new StringBuffer();
sb.append(bracketContent.charAt(i));
bracketCompleted.add(sb);
i++;
}
}
// Join the alternatives collected in bracketCompleted with |.
if (bracketCompleted.size() > 1) {
result.append("(").append(bracketCompleted.get(0));
for (int i = 1; i < bracketCompleted.size(); i++) {
result.append("|").append(bracketCompleted.get(i));
}
result.append(")");
} else {
result.append(bracketCompleted.get(0));
}
if (bracketEndIndex != sub.length() - 1) result.append(sub.substring(bracketEndIndex + 1));
logger.debug(result);
return result;
}
/**
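* Expands a range of the form [m-n] into a full alternation.
*
* @param startIndex index of the first character of the range in the matching character sequence
* @param endIndex index of the last character of the range in the matching character sequence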
*/
private StringBuffer standardizeSquareBracketMarkSeparatorToCompleted(int startIndex, int endIndex,
SquareBracketMarkInnerType innerType) {
StringBuffer sb = new StringBuffer();
sb.append("(");
switch (innerType) {
case LOW_CHAR:
for (int i = startIndex; i < endIndex; i++) {
sb.append(lowCaseCharSequence[i]).append("|");
}
sb.append(lowCaseCharSequence[endIndex]);
break;
case UP_CHAR:
for (int i = startIndex; i < endIndex; i++) {
sb.append(upCaseCharSequence[i]).append("|");
}
sb.append(upCaseCharSequence[endIndex]);
break;
case INT:
for (int i = startIndex; i < endIndex; i++) {
sb.append(intSequence[i]).append("|");
}
sb.append(intSequence[endIndex]);
break;
}
sb.append(")");
return sb;
}
/**
* Checks that the standardized regular definition is well formed.
*/
private void checkREValidation(final String re) throws UnexpectedRegularExprRuleException {
for (int i = 0; i < re.length() - 1; i++) {
char before = re.charAt(i);
char after = re.charAt(i + 1);
// The input RE contains a disallowed pair of adjacent characters.
String temp = before + "" + after;
if (unexpectedRERules.contains(temp)) {
throw new UnexpectedRegularExprRuleException(temp);
}
}
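// Reject special characters that are not properly escaped.
// {m,n} forms were already expanded during standardization, so any remaining comma must be escaped.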
String toCheckComma = re;
int commaIndex;
while ((commaIndex = toCheckComma.indexOf(",")) != -1) {
if (commaIndex == 0) throw new UnexpectedRegularExprRuleException(re);
if (toCheckComma.charAt(commaIndex - 1) != '\\') throw new UnexpectedRegularExprRuleException(re);
else toCheckComma = toCheckComma.substring(commaIndex + 1);
}
// [m-n] forms were already expanded during standardization, so any remaining hyphen must be escaped.
String toCheckSeparator = re;
int separatorIndex;
while ((separatorIndex = toCheckSeparator.indexOf("-")) != -1) {
if (separatorIndex == 0) throw new UnexpectedRegularExprRuleException(re);
if (toCheckSeparator.charAt(separatorIndex - 1) != '\\') throw new UnexpectedRegularExprRuleException(re);
else toCheckSeparator = toCheckSeparator.substring(separatorIndex + 1);
}
}
/**
* Returns whether the input character toTest from the RE is valid, given whether it is escaped (isTransferred).
*/
private boolean isValidChar(boolean isTransferred, char toTest) {
if (isTransferred) {
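// Escaped: valid only if it is an operator character.
return isOperand(toTest);
} else {
// Unescaped: valid only if it is not an operator character.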
return !isOperand(toTest);
}
}
/**
* Returns whether c is an operator character (despite the method name, these are operators, not operands).
*/
private boolean isOperand(char c) {
return (c == '·' || c == '|' || c == '*' || c == '(' || c == ')' || c == '+' || c == '?' || c == '{' || c == '}'
|| c == '[' || c == ']' || c == '-') || c == ',';
}
/**
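* Converts the standardized regular definition from infix to postfix notation.
* The input may contain only the concatenation mark, union, closure, and parentheses as operators.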
*/
public String convertInfixToPostfix(String re) {
// Postfix output.
StringBuilder sb = new StringBuilder(re.length());
// Operator stack.
Stack<Character> operandStack = new Stack<>();
// Whether the current character is escaped.
boolean curCharIsTransferred = false;
for (int i = 0; i < re.length(); i++) {
char c = re.charAt(i);
// Backslash: the next character is an escaped operator.
if (c == '\\') {
sb.append(c);
curCharIsTransferred = true;
continue;
}
// Not an operator: copy it straight to the output.
if (isValidChar(curCharIsTransferred, c)) {
sb.append(c);
curCharIsTransferred = false;
continue;
}
// Operator.
if (c == '(') operandStack.push('(');
else if (c == ')') {
// Pop back to the matching '('.
char top;
while ((top = operandStack.pop()) != '(') {
sb.append(top);
}
} else {
if (!operandStack.empty()) {
char top = operandStack.peek();
while (true) {
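// Pop operators whose priority is at least that of the current one, then push the current operator.
// Stop as soon as the top of the stack has lower priority.
if (comparePriority(c, top)) {
operandStack.pop();
sb.append(top);
} else break;
// Keep comparing while the stack is non-empty; otherwise stop.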
if (!operandStack.empty()) {
top = operandStack.peek();
} else break;
}
operandStack.push(c);
} else {
// The operator stack is empty, so push this operator.
operandStack.push(c);
}
}
curCharIsTransferred = false;
}
// Flush the operators remaining on the stack.
while (!operandStack.empty()) {
char top = operandStack.pop();
sb.append(top);
}
return sb.toString();
}
/**
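* @param curChar the operator just read
* @param top the operator on top of the operator stack
* @return true if the priority of curChar is less than or equal to that of top, in which case top must be popped; false otherwise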
*/
private boolean comparePriority(char curChar, char top) {
int curCharIndex = priority.indexOf(curChar);
int topCharIndex = priority.indexOf(top);
boolean result = (curCharIndex - topCharIndex) <= 0;
logger.debug("Priority: current operator " + curChar + " <= stack top " + top + ": " + result);
return result;
}
}
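The infix-to-postfix conversion above follows the classic operator-stack (shunting-yard) scheme. Below is a minimal, self-contained sketch of that scheme. It is not the project's implementation: the class name `PostfixSketch` is hypothetical, the precedence order `|` < `·` < `*` is an assumption (the real `priority` string is defined outside this excerpt), and escape sequences are deliberately ignored to keep the sketch short.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not the project's RegularExpressionHandler.
final class PostfixSketch {
    // The concatenation mark '·' (U+00B7), written as an escape to avoid encoding issues.
    static final char JOIN = '\u00B7';
    // Assumed precedence, lowest to highest: union, concatenation, closure.
    private static final String PRECEDENCE = "|" + JOIN + "*";

    private static int rank(char op) {
        return PRECEDENCE.indexOf(op);
    }

    static String toPostfix(String re) {
        StringBuilder out = new StringBuilder(re.length());
        Deque<Character> ops = new ArrayDeque<>();
        for (int i = 0; i < re.length(); i++) {
            char c = re.charAt(i);
            if (c == '(') {
                ops.push(c);
            } else if (c == ')') {
                // Pop back to the matching '(' and discard it.
                while (!ops.isEmpty() && ops.peek() != '(') {
                    out.append(ops.pop());
                }
                if (ops.isEmpty()) {
                    throw new IllegalArgumentException("Unbalanced ')' in " + re);
                }
                ops.pop();
            } else if (rank(c) >= 0) {
                // Pop operators of equal or higher precedence, then push the current one.
                while (!ops.isEmpty() && ops.peek() != '(' && rank(ops.peek()) >= rank(c)) {
                    out.append(ops.pop());
                }
                ops.push(c);
            } else {
                // Anything else is an operand and goes straight to the output.
                out.append(c);
            }
        }
        while (!ops.isEmpty()) {
            char op = ops.pop();
            if (op == '(') {
                throw new IllegalArgumentException("Unbalanced '(' in " + re);
            }
            out.append(op);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // (a·b|a)* becomes ab·a|*
        System.out.println(toPostfix("(a" + JOIN + "b|a)*"));
    }
}
```

Unlike the method above, the sketch rejects unbalanced parentheses with an exception instead of failing on an empty stack; that is a design choice of the sketch, not something the source does.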

View File

@ -0,0 +1,45 @@
package finiteAutomata.entity;
import java.util.Map;
/**
* Created by cuihua on 2017/10/24.
* <p>
* Deterministic finite automaton (DFA).
*/
public class DFA extends FA {
/**
* Transitions between the states of the DFA:
* the first state (FA_State) reaches the second state (FA_State) via a label (Character).
*/
private Map<FA_State, Map<Character, FA_State>> move;
public Map<FA_State, Map<Character, FA_State>> getMove() {
return move;
}
public void setMove(Map<FA_State, Map<Character, FA_State>> move) {
this.move = move;
}
@Override
public boolean isValid(String lexeme) {
FA_State curState = getStart();
for (char c : lexeme.toCharArray()) {
boolean canFind = false;
for (FA_Edge curEdge : curState.getFollows()) {
if (curEdge.getLabel() == c) {
curState = curEdge.getPointTo();
canFind = true;
break;
}
}
if (!canFind) return false;
}
return getTerminatedStates().contains(curState);
}
}
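`isValid` above walks the outgoing edges of each state, while the `move` field holds the same information as a transition table. The following sketch shows what table-driven matching could look like. It is an illustration only: `TableDrivenDfaSketch` and its integer state IDs are hypothetical stand-ins for `DFA` and `FA_State`, and the sample automaton (strings over {a, b} that contain `abb`, like pattern `B` in the `.l` file) is hand-built rather than generated by the project.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: states are plain ints instead of FA_State objects.
final class TableDrivenDfaSketch {
    private final int start;
    private final Set<Integer> accepting;
    // state -> (label -> next state), the same shape as the move map in DFA.
    private final Map<Integer, Map<Character, Integer>> move = new HashMap<>();

    TableDrivenDfaSketch(int start, Set<Integer> accepting) {
        this.start = start;
        this.accepting = accepting;
    }

    void addTransition(int from, char label, int to) {
        move.computeIfAbsent(from, k -> new HashMap<>()).put(label, to);
    }

    boolean accepts(String lexeme) {
        int cur = start;
        for (char c : lexeme.toCharArray()) {
            Map<Character, Integer> row = move.get(cur);
            Integer next = (row == null) ? null : row.get(c);
            if (next == null) {
                return false; // no transition for this character
            }
            cur = next;
        }
        return accepting.contains(cur);
    }

    // Hand-built DFA for (a|b)*abb(a|b)*.
    static TableDrivenDfaSketch containsAbb() {
        TableDrivenDfaSketch dfa = new TableDrivenDfaSketch(0, new HashSet<>(Arrays.asList(3)));
        dfa.addTransition(0, 'a', 1);
        dfa.addTransition(0, 'b', 0);
        dfa.addTransition(1, 'a', 1);
        dfa.addTransition(1, 'b', 2);
        dfa.addTransition(2, 'a', 1);
        dfa.addTransition(2, 'b', 3);
        dfa.addTransition(3, 'a', 3);
        dfa.addTransition(3, 'b', 3);
        return dfa;
    }

    public static void main(String[] args) {
        TableDrivenDfaSketch dfa = containsAbb();
        System.out.println(dfa.accepts("abba")); // true
        System.out.println(dfa.accepts("abab")); // false
    }
}
```

A table lookup costs one map access per input character, whereas scanning the edge list costs time proportional to the number of outgoing edges; for the small automata in this project either approach is adequate.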

View File

@ -0,0 +1,74 @@
package finiteAutomata.entity;
import org.apache.log4j.Logger;
import java.util.List;
/**
* Created by cuihua on 2017/10/24.
* <p>
* Records one mapping produced by the subset construction.
*/
public class DTran {
private static Logger logger = Logger.getLogger(DTran.class);
/**
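* Source states of the equivalent transition produced by the construction.
*/
private List<FA_State> from;
/**
* Target states of the equivalent transition produced by the construction.
*/
private List<FA_State> to;
/**
* Label of the transition.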
*/
private char label;
public DTran(List<FA_State> from, List<FA_State> to, char label) {
this.from = from;
this.to = to;
this.label = label;
}
public List<FA_State> getFrom() {
return from;
}
public void setFrom(List<FA_State> from) {
this.from = from;
}
public List<FA_State> getTo() {
return to;
}
public void setTo(List<FA_State> to) {
this.to = to;
}
public char getLabel() {
return label;
}
public void setLabel(char label) {
this.label = label;
}
// Logs this DTran at debug level (shown on the console by the default appender).
public void show() {
StringBuilder sb = new StringBuilder();
sb.append("\n");
for (FA_State state : from) {
sb.append(state.getStateID() + " ");
}
sb.append("\n" + label + "\n");
for (FA_State state : to) {
sb.append(state.getStateID() + " ");
}
logger.debug(sb.toString());
}
}

View File

@ -0,0 +1,70 @@
package finiteAutomata.entity;
import java.util.List;
/**
* Created by cuihua on 2017/10/24.
* <p>
* Represents a finite automaton.
*/
public abstract class FA {
/**
* Start state.
*/
private FA_State start;
/**
* All states.
*/
private List<FA_State> states;
/**
* Accepting (terminal) states.
*/
private List<FA_State> terminatedStates;
/**
* Alphabet.
*/
private List<Character> alphabet;
public FA_State getStart() {
return start;
}
public void setStart(FA_State start) {
this.start = start;
}
public List<FA_State> getStates() {
return states;
}
public void setStates(List<FA_State> states) {
this.states = states;
}
public List<FA_State> getTerminatedStates() {
return terminatedStates;
}
public void setTerminatedStates(List<FA_State> terminatedStates) {
this.terminatedStates = terminatedStates;
}
public List<Character> getAlphabet() {
return alphabet;
}
public void setAlphabet(List<Character> alphabet) {
this.alphabet = alphabet;
}
/**
* @param lexeme the lexeme to check
* @return whether the lexeme is accepted
*/
public abstract boolean isValid(String lexeme);
}

View File

@ -0,0 +1,41 @@
package finiteAutomata.entity;
/**
* Created by cuihua on 2017/10/24.
* <p>
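* An edge in a finite automaton.
*/
public class FA_Edge {
/**
* Label on this edge; an empty transition is represented by ε.
*/
private char label;
/**
* Successor state that this edge points to.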
*/
private FA_State pointTo;
public FA_Edge(char label, FA_State pointTo) {
this.label = label;
this.pointTo = pointTo;
}
public char getLabel() {
return label;
}
public void setLabel(char label) {
this.label = label;
}
public FA_State getPointTo() {
return pointTo;
}
public void setPointTo(FA_State pointTo) {
this.pointTo = pointTo;
}
}

View File

@ -0,0 +1,28 @@
package finiteAutomata.entity;
import java.util.List;
/**
* Created by cuihua on 2017/10/26.
* <p>
* A set of equivalent FA_State objects formed while minimizing a DFA.
*/
public class FA_Node {
/**
* The equivalent FA_State objects.
*/
private List<FA_State> states;
public FA_Node(List<FA_State> states) {
this.states = states;
}
public List<FA_State> getStates() {
return states;
}
public void setStates(List<FA_State> states) {
this.states = states;
}
}

View File

@ -0,0 +1,44 @@
package finiteAutomata.entity;
import java.util.LinkedList;
import java.util.List;
/**
* Created by cuihua on 2017/10/24.
* <p>
* A node (state) in a finite automaton.
*/
public class FA_State {
/**
* ID number of this node.
*/
private int stateID;
/**
* Outgoing edges from this node to its successor states.
*/
private List<FA_Edge> follows;
public FA_State(int stateID) {
this.stateID = stateID;
this.follows = new LinkedList<>();
}
public int getStateID() {
return stateID;
}
public void setStateID(int stateID) {
this.stateID = stateID;
}
public List<FA_Edge> getFollows() {
return follows;
}
public void setFollows(List<FA_Edge> follows) {
this.follows = follows;
}
}

View File

@ -0,0 +1,17 @@
package finiteAutomata.entity;
/**
* Created by cuihua on 2017/10/24.
* <p>
* Nondeterministic finite automaton (NFA).
*/
public class NFA extends FA {
/**
* TODO: lexemes are not validated against an NFA yet, so this is left unimplemented until it is needed.
*/
@Override
public boolean isValid(String lexeme) {
return false;
}
}

View File

@ -0,0 +1,41 @@
package lex;
import exceptions.NotMatchingException;
import finiteAutomata.entity.DFA;
import lex.entity.Token;
import utilties.DFA_StatePatternMappingController;
import java.util.List;
/**
* Created by cuihua on 2017/11/2.
* <p>
* Lexical analyzer.
*/
public class LexicalAnalyzer {
/**
* Minimized DFAs generated from the current .l file.
*/
private List<DFA> allDFAs;
public LexicalAnalyzer(List<DFA> allDFAs) {
this.allDFAs = allDFAs;
}
/**
* Analyzes a single lexeme.
*
* @param lexeme the lexeme to analyze
* @return the token produced by the analysis
*/
public Token analyze(String lexeme) throws NotMatchingException {
for (DFA curDFA : allDFAs) {
// Try the DFAs in priority order and return on the first match.
if (curDFA.isValid(lexeme))
return new Token(DFA_StatePatternMappingController.getMap().get(curDFA), lexeme);
}
throw new NotMatchingException(lexeme);
}
}

View File

@ -0,0 +1,43 @@
package lex;
import exceptions.NotMatchingException;
import finiteAutomata.entity.DFA;
import lex.entity.Token;
import lex.generator.LexInputHandler;
import lex.generator.LexInputReader;
import java.util.LinkedList;
import java.util.List;
/**
* Created by cuihua on 2017/11/1.
* <p>
* Main program.
* Input: the program text entered by the user.
* Output: the token sequence produced according to the existing .l file.
*/
public class Main {
public static void main(String[] args) throws NotMatchingException {
UserInteractionController userInteractionController = new UserInteractionController();
List<String> lexemes = userInteractionController.readUserContent();
// Build the DFAs described by the .l file.
LexInputReader lexInputReader = new LexInputReader();
List<String> lexContent = lexInputReader.readREs();
LexInputHandler lexInputHandler = new LexInputHandler(lexContent);
List<DFA> allDFAs = lexInputHandler.convert();
// Create the lexical analyzer.
LexicalAnalyzer lexicalAnalyzer = new LexicalAnalyzer(allDFAs);
List<Token> resultTokens = new LinkedList<>();
for (String lexeme : lexemes) {
resultTokens.add(lexicalAnalyzer.analyze(lexeme));
}
userInteractionController.showAllTokens(resultTokens);
}
}

View File

@ -0,0 +1,90 @@
package lex;
import lex.entity.Token;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.time.LocalDateTime;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
/**
* Created by cuihua on 2017/11/1.
* <p>
* Interacts with the user of the lexical analyzer.
*/
public class UserInteractionController {
/**
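* Reads the user's input, splits it on spaces, and returns all lexemes.
* Reading stops at a line consisting of ### or at end of input.
*/
public List<String> readUserContent() {
Scanner sc = new Scanner(System.in);
List<String> lexemes = new LinkedList<>();
String line;
while (sc.hasNextLine() && !(line = sc.nextLine()).equals("###")) {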
String[] parts = line.split(" ");
for (String lexeme : parts) {
if (!lexeme.equals(""))
lexemes.add(lexeme);
}
}
return lexemes;
}
/**
* Shows all resulting tokens to the user.
*/
public void showAllTokens(List<Token> tokens) {
String s = getTokenOutput(tokens);
showInConsole(s);
try {
showInFile(s);
} catch (IOException e) {
System.out.println("Failed to write the token sequence to a file!");
}
}
/**
* Builds the text to output from the token sequence.
*/
private String getTokenOutput(List<Token> tokens) {
StringBuilder sb = new StringBuilder();
sb.append("-------------------\n");
for (Token token : tokens) {
sb.append("< ").append(token.getPatternType());
if (token.getAttribute() != null) {
sb.append(", ");
sb.append(token.getAttribute());
}
sb.append(" >").append("\n");
}
sb.append("-------------------");
return sb.toString();
}
/**
* Console output.
*/
private void showInConsole(String s) {
System.out.println(s);
}
/**
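* File output: writes to a timestamped .txt file inside the working directory.
*/
private void showInFile(String s) throws IOException {
// Colons in the timestamp are replaced because they are not allowed in Windows file names.
File file = new File(System.getProperty("user.dir"), LocalDateTime.now().toString().replace(':', '-') + ".txt");
if (file.createNewFile()) {
// try-with-resources closes the writer even if the write fails.
try (FileWriter writer = new FileWriter(file)) {
writer.write(s);
}
}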
}
}
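The output file name above is derived from the current time. One way to keep such names portable and easy to test is to build them from an explicit directory and timestamp with a fixed pattern. The sketch below assumes a hypothetical helper class `TokenOutputFileSketch`, a `tokens_` prefix, and a `yyyy-MM-dd_HH-mm-ss` pattern; none of these come from the project.

```java
import java.io.File;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the project.
final class TokenOutputFileSketch {
    // Assumed pattern: no colons or spaces, so the name is valid on common file systems.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss");

    // Passing the timestamp in (rather than calling now() inside) keeps the method deterministic.
    static File outputFile(File directory, LocalDateTime when) {
        return new File(directory, "tokens_" + STAMP.format(when) + ".txt");
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("user.dir"));
        System.out.println(outputFile(dir, LocalDateTime.now()).getPath());
    }
}
```

Using the two-argument `File(File parent, String child)` constructor lets the platform insert the correct path separator between the directory and the file name.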

View File

@ -0,0 +1,40 @@
package lex.entity;
/**
* Created by cuihua on 2017/10/31.
*
* A token (lexical unit).
*/
public class Token {
/**
* Pattern.
*/
private String patternType;
/**
* Attribute value.
*/
private String attribute;
public Token(String patternType, String attribute) {
this.patternType = patternType;
this.attribute = attribute;
}
public String getPatternType() {
return patternType;
}
public void setPatternType(String patternType) {
this.patternType = patternType;
}
public String getAttribute() {
return attribute;
}
public void setAttribute(String attribute) {
this.attribute = attribute;
}
}

View File

@ -0,0 +1,54 @@
package lex.generator;
import finiteAutomata.FA_Controller;
import finiteAutomata.entity.DFA;
import java.util.*;
/**
* Created by cuihua on 2017/11/1.
* <p>
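* Processes the data in a Lex .l file (regular definitions only).
*/
public class LexInputHandler {
/**
* Content of the .l file: a pattern followed by its regular definition (re) on each line.
*/
private List<String> content;
/**
* One-to-one mapping from pattern to regular definition.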
*/
private Map<String, String> patternREMap;
public LexInputHandler(List<String> content) {
this.content = content;
initMap();
}
/**
* Initializes the mapping from the .l file.
* A LinkedHashMap keeps the entries in the order in which they were read.
*/
private void initMap() {
patternREMap = new LinkedHashMap<>();
for (String line : content) {
String[] parts = line.split(" ");
patternREMap.put(parts[0], parts[1]);
}
}
/**
* Builds the DFAs that correspond to the content of the .l file.
*/
public List<DFA> convert() {
// Process the regular definitions.
List<String> res = new LinkedList<>(patternREMap.values());
List<String> patternTypes = new LinkedList<>(patternREMap.keySet());
FA_Controller controller = new FA_Controller();
return controller.lexicalAnalysis(res, patternTypes);
}
}

View File

@ -0,0 +1,36 @@
package lex.generator;
import java.io.InputStream;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
/**
* Created by cuihua on 2017/10/24.
* <p>
* Reads the Lex specification (.l) file.
*/
public class LexInputReader {
/**
* Path of the .l file.
*/
private static final String path = "regular_expression.l";
public LexInputReader() {
}
/**
* Reads the data from the .l file.
*/
public List<String> readREs() {
InputStream is = getClass().getClassLoader().getResourceAsStream(path);
Scanner sc = new Scanner(is);
List<String> reContent = new LinkedList<>();
while (sc.hasNext()) {
reContent.add(sc.nextLine());
}
return reContent;
}
}

View File

@ -0,0 +1,55 @@
package utilties;
import finiteAutomata.entity.FA_State;
import org.apache.log4j.Logger;
import java.util.List;
/**
* Created by cuihua on 2017/10/27.
* <p>
* Prevents states from being processed repeatedly (and looping) when closures are computed recursively.
*/
public class ClosureRecursionHandler {
private static Logger logger = Logger.getLogger(ClosureRecursionHandler.class);
private static List<FA_State> states = new FA_StatesList();
private static FA_StateComparator comparator = new FA_StateComparator();
private ClosureRecursionHandler() {
}
/**
* Clears the state left over from the current computation.
*/
public static void reset() {
states = new FA_StatesList();
logger.debug("Already reset the ClosureRecursionHandler");
}
/**
* Adds one state. The list must stay sorted so that the overridden binary-search lookup can find it.
*/
public static void addState(FA_State state) {
states.add(state);
states.sort(comparator);
}
/**
* Adds a list of states. The list must stay sorted so that the overridden binary-search lookup can find them.
*/
public static void addAllState(List<FA_State> newStates) {
states.addAll(newStates);
states.sort(comparator);
}
/**
* Checks whether states contains the given state.
*/
public static boolean contain(FA_State state) {
boolean result = states.contains(state);
logger.debug("State " + state.getStateID() + " is contained: " + result);
return result;
}
}

View File

@ -0,0 +1,31 @@
package utilties;
import finiteAutomata.entity.DFA;
import java.util.HashMap;
import java.util.Map;
/**
* Created by cuihua on 2017/11/2.
* <p>
* Central registry of the mapping from each DFA to its pattern.
*/
public class DFA_StatePatternMappingController {
private static Map<DFA, String> map = new HashMap<>();
private DFA_StatePatternMappingController() {
}
public static Map<DFA, String> getMap() {
return map;
}
/**
* Registers the pattern that corresponds to the given DFA.
*/
public static boolean add(DFA dfa, String pattern) {
map.put(dfa, pattern);
return true;
}
}

View File

@ -0,0 +1,10 @@
package utilties;
/**
* Created by cuihua on 2017/10/25.
*
* Extended operators in regular expressions: +, ? and {}.
*/
public enum ExtendedMark {
QUESTION_MARK, PLUS_MARK, BRACE_MARK
}

View File

@ -0,0 +1,42 @@
package utilties;
import finiteAutomata.entity.FA_Node;
import finiteAutomata.entity.FA_State;
import java.util.Comparator;
import java.util.List;
/**
* Created by cuihua on 2017/10/28.
* <p>
* Orders the result sets produced while optimizing (minimizing) a DFA.
*/
public class FA_NodeComparator implements Comparator<FA_Node> {
@Override
public int compare(FA_Node o1, FA_Node o2) {
List<FA_State> states1 = o1.getStates();
List<FA_State> states2 = o2.getStates();
int size1 = states1.size();
int size2 = states2.size();
if (size1 < size2) return -1;
else if (size1 > size2) return 1;
else {
// Same number of states: compare them one by one.
for (int i = 0; i < size1; i++) {
int state1 = states1.get(i).getStateID();
int state2 = states2.get(i).getStateID();
if (state1 < state2) return -1;
else if (state1 > state2) return 1;
}
// Every state is the same.
return 0;
}
}
}

View File

@ -0,0 +1,33 @@
package utilties;
import finiteAutomata.entity.FA_Node;
import java.util.LinkedList;
/**
* Created by cuihua on 2017/10/28.
* <p>
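* Data structure used while optimizing a DFA.
* Overrides indexOf with a binary search keyed on the number of FA_State objects in each node, so the list must be sorted by that key.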
*/
public class FA_NodesList extends LinkedList<FA_Node> {
@Override
public int indexOf(Object o) {
int i = ((FA_Node) o).getStates().size();
int start = 0;
int end = this.size() - 1;
while (start <= end) {
int middle = (start + end) / 2;
if (i < get(middle).getStates().size()) {
end = middle - 1;
} else if (i > get(middle).getStates().size()) {
start = middle + 1;
} else {
return middle;
}
}
return -1;
}
}

View File

@ -0,0 +1,20 @@
package utilties;
import finiteAutomata.entity.FA_State;
import java.util.Comparator;
/**
* Created by cuihua on 2017/10/24.
* <p>
* Orders the states of an ε-closure by state ID.
*/
public class FA_StateComparator implements Comparator<FA_State> {
public int compare(FA_State o1, FA_State o2) {
return Integer.compare(o1.getStateID(), o2.getStateID());
}
}

View File

@ -0,0 +1,26 @@
package utilties;
/**
* Created by cuihua on 2017/10/26.
*
* Central controller for FA_State ID numbers.
*/
public class FA_StateIDController {
/**
* The ID currently available from this controller.
*/
private static int nowID;
private FA_StateIDController() {
FA_StateIDController.nowID = 0;
}
public static int getID() {
return FA_StateIDController.nowID;
}
public static void setID(int nowID) {
FA_StateIDController.nowID = nowID;
}
}

View File

@ -0,0 +1,40 @@
package utilties;
import finiteAutomata.entity.FA_State;
import java.util.Collection;
import java.util.LinkedList;
/**
* Created by cuihua on 2017/10/24.
* <p>
* Replaces the linear lookup with a binary search on state ID; the list must be kept sorted by ID.
*/
public class FA_StatesList extends LinkedList<FA_State> {
public FA_StatesList() {
}
public FA_StatesList(Collection<? extends FA_State> c) {
super(c);
}
@Override
public int indexOf(Object o) {
int i = ((FA_State) o).getStateID();
int start = 0;
int end = this.size() - 1;
while (start <= end) {
int middle = (start + end) / 2;
if (i < get(middle).getStateID()) {
end = middle - 1;
} else if (i > get(middle).getStateID()) {
start = middle + 1;
} else {
return middle;
}
}
return -1;
}
}
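The overridden `indexOf` above performs a binary search, but `LinkedList.get(int)` itself walks the list, so each probe is linear. A minimal sketch of the same idea on top of an array-backed list is shown below, using `Collections.binarySearch` from the standard library. `SortedIdListSketch` is a hypothetical name, and plain integer IDs stand in for `FA_State` objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a sorted list of state IDs with O(log n) lookup.
final class SortedIdListSketch {
    // ArrayList gives constant-time get(i), which binary search relies on.
    private final List<Integer> ids = new ArrayList<>();

    /** Inserts the id at its sorted position; returns false if it is already present. */
    boolean add(int id) {
        int pos = Collections.binarySearch(ids, id);
        if (pos >= 0) {
            return false;
        }
        // For a missing key, binarySearch returns (-(insertion point) - 1).
        ids.add(-pos - 1, id);
        return true;
    }

    boolean contains(int id) {
        return Collections.binarySearch(ids, id) >= 0;
    }

    List<Integer> snapshot() {
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        SortedIdListSketch list = new SortedIdListSketch();
        list.add(5);
        list.add(1);
        list.add(3);
        System.out.println(list.snapshot()); // [1, 3, 5]
        System.out.println(list.contains(4)); // false
    }
}
```

Inserting at the position reported by the search keeps the list sorted without a separate `sort` call after every addition.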

View File

@ -0,0 +1,31 @@
package utilties;
import finiteAutomata.entity.NFA;
import java.util.HashMap;
import java.util.Map;
/**
* Created by cuihua on 2017/11/2.
* <p>
* Central registry of the mapping from each NFA to its pattern.
*/
public class NFA_StatePatternMappingController {
private static Map<NFA, String> map = new HashMap<>();
private NFA_StatePatternMappingController() {
}
public static Map<NFA, String> getMap() {
return map;
}
/**
* Registers the pattern that corresponds to the given NFA.
*/
public static boolean add(NFA nfa, String pattern) {
map.put(nfa, pattern);
return true;
}
}

View File

@ -0,0 +1,12 @@
package utilties;
/**
* Created by cuihua on 2017/11/3.
*
* The type of m and n in a [m-n] range.
*/
public enum SquareBracketMarkInnerType {
LOW_CHAR, UP_CHAR, INT
}

View File

@ -0,0 +1,8 @@
# Adjust the rootLogger level to control what is logged.
log4j.rootLogger=info, stdout
### direct log messages to stdout ###
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}:%L - %m%n

View File

@ -0,0 +1,13 @@
separator \,|;
comparator <|>|(<=)|(>=)|(==)|(!=)
assignment =
digit 0|1|2|3|4|5|6|7|8|9
number (1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*
letter a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
id letter(letter|number)*
ID aa+
A (a|b)*
B (a|b)*abb(a|b)*
C aba?a+abb+cc

View File

@ -0,0 +1,11 @@
separator \,|;
comparator <|>|(<=)|(>=)|(==)|(!=)
assignment =
digit [0-9]
number [1-9][0-9]*
letter [a-zA-Z]
word [a-zA-Z]*
quotation "[a-zA-Z]*"
id [a-zA-Z][_a-zA-Z0-9]*
testTransferred1 \{a*b*c{1,3}\}
testTransferred2 \[0\-9\]a*
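The two `.l` files above describe the same kind of patterns in two styles: the first spells out alternations such as `0|1|2|...|9`, the second uses bracket classes such as `[0-9]`. The sketch below shows how a bracket body could be expanded into a plain alternation. It is a simplified illustration with a hypothetical class name: it uses character arithmetic instead of the project's lookup arrays, and it produces one flat group rather than the nested groups the project generates.

```java
// Hypothetical sketch of bracket-class expansion, e.g. "0-3" -> "(0|1|2|3)".
final class BracketExpansionSketch {
    /** Expands the text between '[' and ']' into a parenthesized alternation. */
    static String expand(String body) {
        if (body.isEmpty()) {
            throw new IllegalArgumentException("Empty bracket expression");
        }
        StringBuilder sb = new StringBuilder("(");
        int i = 0;
        while (i < body.length()) {
            if (sb.length() > 1) {
                sb.append('|'); // separate this alternative from the previous one
            }
            char start = body.charAt(i);
            if (i + 2 < body.length() && body.charAt(i + 1) == '-') {
                // Range form: emit every character from start to end inclusive.
                char end = body.charAt(i + 2);
                if (end < start) {
                    throw new IllegalArgumentException("Reversed range in [" + body + "]");
                }
                for (int c = start; c <= end; c++) {
                    if (c > start) {
                        sb.append('|');
                    }
                    sb.append((char) c);
                }
                i += 3;
            } else {
                // Single character.
                sb.append(start);
                i++;
            }
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("0-9"));  // (0|1|2|3|4|5|6|7|8|9)
        System.out.println(expand("_a-c")); // (_|a|b|c)
    }
}
```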

View File

@ -0,0 +1,196 @@
package finiteAutomata;
import finiteAutomata.entity.DFA;
import finiteAutomata.entity.FA_Edge;
import finiteAutomata.entity.FA_State;
import finiteAutomata.entity.NFA;
import org.apache.log4j.Logger;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.lang.reflect.Method;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
/**
* DFA_Handler Tester.
*
* @author <Authors name>
* @version 1.0
* @since <pre>October 24, 2017</pre>
*/
public class DFA_HandlerTest {
private static final Logger logger = Logger.getLogger(DFA_HandlerTest.class);
private NFA defaultNFA;
@Before
public void before() throws Exception {
FA_State state1 = new FA_State(1);
FA_State state2 = new FA_State(2);
FA_State state3 = new FA_State(3);
FA_State state4 = new FA_State(4);
FA_State state5 = new FA_State(5);
FA_State state6 = new FA_State(6);
FA_State state7 = new FA_State(7);
FA_State state8 = new FA_State(8);
FA_State state9 = new FA_State(9);
FA_State state10 = new FA_State(10);
FA_Edge edge1 = new FA_Edge('ε', state2);
FA_Edge edge2 = new FA_Edge('ε', state3);
FA_Edge edge3 = new FA_Edge('a', state4);
FA_Edge edge4 = new FA_Edge('ε', state6);
FA_Edge edge5 = new FA_Edge('ε', state7);
FA_Edge edge6 = new FA_Edge('ε', state8);
FA_Edge edge7 = new FA_Edge('ε', state5);
FA_Edge edge8 = new FA_Edge('ε', state9);
FA_Edge edge9 = new FA_Edge('a', state10);
List<FA_Edge> follow1 = new LinkedList<>();
follow1.add(edge1);
follow1.add(edge2);
state1.setFollows(follow1);
List<FA_Edge> follow2 = new LinkedList<>();
follow2.add(edge3);
follow2.add(edge4);
state2.setFollows(follow2);
List<FA_Edge> follow3 = new LinkedList<>();
follow3.add(edge5);
follow3.add(edge6);
state3.setFollows(follow3);
List<FA_Edge> follow4 = new LinkedList<>();
follow4.add(edge7);
state4.setFollows(follow4);
List<FA_Edge> follow5 = new LinkedList<>();
follow5.add(edge8);
state5.setFollows(follow5);
List<FA_Edge> follow7 = new LinkedList<>();
follow7.add(edge9);
state7.setFollows(follow7);
List<Character> alphabet = new LinkedList<>();
alphabet.add('a');
List<FA_State> terminatedStates = new LinkedList<>();
terminatedStates.add(state6);
terminatedStates.add(state8);
terminatedStates.add(state9);
terminatedStates.add(state10);
List<FA_State> allStates = new LinkedList<>();
allStates.add(state1);
allStates.add(state2);
allStates.add(state3);
allStates.add(state4);
allStates.add(state5);
allStates.add(state6);
allStates.add(state7);
allStates.add(state8);
allStates.add(state9);
allStates.add(state10);
defaultNFA = new NFA();
defaultNFA.setAlphabet(alphabet);
defaultNFA.setStart(state1);
defaultNFA.setTerminatedStates(terminatedStates);
defaultNFA.setStates(allStates);
}
@After
public void after() throws Exception {
}
/**
* Method: getFromNFA(NFA defaultNFA)
*/
@Test
public void testGetFromNFA1() throws Exception {
DFA_Handler dfaHandler = new DFA_Handler();
dfaHandler.getFromNFA(defaultNFA);
}
/**
* Method: getFromNFA(NFA defaultNFA)
*/
@Test
public void testGetFromNFA2() throws Exception {
RegularExpressionHandler rgHandler = new RegularExpressionHandler();
NFA_Handler nfaHandler = new NFA_Handler();
DFA_Handler dfaHandler = new DFA_Handler();
// String re = "((ε|a)b*)*";
String re = "(a|b)*abb(a|b)*";
re = rgHandler.convertInfixToPostfix(rgHandler.standardizeRE(re));
logger.debug("---------------- " + re + " ----------------");
NFA finalNFA = nfaHandler.getFromRE(re, null);
logger.debug("*************** finish convert " + re + " to NFA ***************");
// Convert to a DFA.
DFA dfa = dfaHandler.getFromNFA(finalNFA);
logger.debug("all states size: " + dfa.getStates().size());
}
/**
* Method: optimize(DFA defaultNFA)
*/
@Test
public void testOptimize1() throws Exception {
DFA_Handler dfaHandler = new DFA_Handler();
dfaHandler.optimize(dfaHandler.getFromNFA(defaultNFA));
}
/**
* Method: optimize(DFA defaultNFA)
*/
@Test
public void testOptimize2() throws Exception {
RegularExpressionHandler rgHandler = new RegularExpressionHandler();
NFA_Handler nfaHandler = new NFA_Handler();
DFA_Handler dfaHandler = new DFA_Handler();
// String re = "(a|b)*abb(a|b)*";
String re = "\\{a*b*c{1,3}\\}";
re = rgHandler.convertInfixToPostfix(rgHandler.standardizeRE(re));
NFA finalNFA = nfaHandler.getFromRE(re, null);
DFA dfa = dfaHandler.optimize(dfaHandler.getFromNFA(finalNFA));
logger.info("Transition table of the optimized DFA");
for (Map.Entry<FA_State, Map<Character, FA_State>> entryState : dfa.getMove().entrySet()) {
FA_State start = entryState.getKey();
for (Map.Entry<Character, FA_State> entryEdge : entryState.getValue().entrySet()) {
logger.info(start.getStateID() + " through " + entryEdge.getKey() + " to " + entryEdge.getValue().getStateID());
}
}
}
/**
* Method: closure(FA_State nowState)
*/
@Test
public void testClosure() throws Exception {
DFA_Handler dfaHandler = new DFA_Handler();
Method method = DFA_Handler.class.getDeclaredMethod("closure", FA_State.class);
method.setAccessible(true);
method.invoke(dfaHandler, defaultNFA.getStart());
// Expected answers to compare against
// state: closure
// 1: 1, 2, 3, 6, 7, 8
// 2: 2, 6
// 3: 3, 7, 8
// 4: 4, 5, 9
// 7: 7
// for (FA_State temp : result) {
// logger.debug(temp.getStateID());
// }
}
}
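`testClosure` above lists the expected ε-closures for the fixture NFA but does not assert them. The sketch below recomputes those closures with an iterative worklist and a visited set, which also avoids the repeated-visit problem that `ClosureRecursionHandler` exists to guard against. It is a stand-alone illustration: `EpsilonClosureSketch` is a hypothetical class, states are plain integers, and only the ε edges of the fixture are modeled (the two `a` edges are irrelevant to closures).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of ε-closure computation over integer state IDs.
final class EpsilonClosureSketch {
    // state id -> ids reachable over a single ε edge
    private final Map<Integer, List<Integer>> epsilonEdges = new HashMap<>();

    void addEpsilonEdges(int from, Integer... targets) {
        epsilonEdges.put(from, Arrays.asList(targets));
    }

    /** Returns all states reachable from start using only ε edges, in ascending ID order. */
    Set<Integer> closure(int start) {
        Set<Integer> seen = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        seen.add(start);
        work.push(start);
        while (!work.isEmpty()) {
            int state = work.pop();
            for (int next : epsilonEdges.getOrDefault(state, Collections.<Integer>emptyList())) {
                // add() returns false for states already visited, so cycles terminate.
                if (seen.add(next)) {
                    work.push(next);
                }
            }
        }
        return seen;
    }

    // The ε edges of the NFA built in DFA_HandlerTest.before().
    static EpsilonClosureSketch fixtureFromTest() {
        EpsilonClosureSketch nfa = new EpsilonClosureSketch();
        nfa.addEpsilonEdges(1, 2, 3);
        nfa.addEpsilonEdges(2, 6);
        nfa.addEpsilonEdges(3, 7, 8);
        nfa.addEpsilonEdges(4, 5);
        nfa.addEpsilonEdges(5, 9);
        return nfa;
    }

    public static void main(String[] args) {
        System.out.println(fixtureFromTest().closure(1)); // [1, 2, 3, 6, 7, 8]
    }
}
```

The results agree with the table in the test comment, for example `closure(4)` yields `[4, 5, 9]`.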

View File

@ -0,0 +1,64 @@
package finiteAutomata;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import utilties.DFA_StatePatternMappingController;
import utilties.NFA_StatePatternMappingController;
import java.util.LinkedList;
import java.util.List;
/**
* FA_Controller Tester.
*
* @author <Authors name>
* @version 1.0
* @since <pre>October 27, 2017</pre>
*/
public class FA_ControllerTest {
@Before
public void before() throws Exception {
}
@After
public void after() throws Exception {
}
/**
* Method: lexicalAnalysis(List<String> res)
*/
@Test
public void testLexicalAnalysis() throws Exception {
FA_Controller controller = new FA_Controller();
List<String> res = new LinkedList<>();
// res.add("a+");
// res.add("a*|b*");
// res.add("(a|b)*");
// res.add("(a*|b*)*");
// res.add("((ε|a)b*)*");
// res.add("(a|b)*abb(a|b)*");
// res.add("aba?a+abb+cc");
// res.add("\\{a*b*c{1,3}\\}");
res.add("[a-z0-9]*");
List<String> patterns = new LinkedList<>();
patterns.add("ID");
patterns.add("A");
patterns.add("B");
patterns.add("C");
patterns.add("D");
patterns.add("E");
patterns.add("F");
controller.lexicalAnalysis(res, patterns);
// NFA_StatePatternMappingController.getMap();
DFA_StatePatternMappingController.getMap();
}
}


@ -0,0 +1,59 @@
package finiteAutomata;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
/**
* NFA Tester.
*
* @author <Authors name>
* @version 1.0
* @since <pre>October 26, 2017</pre>
*/
public class NFA_HandlerTest {
@Before
public void before() throws Exception {
}
@After
public void after() throws Exception {
}
/**
* Method: getFromRE(String re, String pattern)
*/
@Test
public void testGetFromRE1() throws Exception {
NFA_Handler handler = new NFA_Handler();
handler.getFromRE("ab·a|*", "ID");
}
/**
* Method: getFromRE(String re, String pattern)
*/
@Test
public void testGetFromRE2() throws Exception {
NFA_Handler handler = new NFA_Handler();
handler.getFromRE("ab·c·\\{·ε\\{|·ε\\{|·", "ID");
}
/**
* Method: getFromRE(String re, String pattern)
*/
@Test
public void testGetFromRE3() throws Exception {
NFA_Handler handler = new NFA_Handler();
handler.getFromRE("\\{a*·b*·c·εc|·εc|·\\", "ID");
}
/**
* Method: combine(List<NFA> nfaList)
*/
@Test
public void testCombine() throws Exception {
}
}


@ -0,0 +1,219 @@
package finiteAutomata;
import exceptions.UnexpectedRegularExprRuleException;
import org.apache.log4j.Logger;
import org.junit.After;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
import utilties.ExtendedMark;
import java.lang.reflect.Method;
/**
* RegularExpressionHandler Tester.
*
* @author <Authors name>
* @version 1.0
* @since <pre>October 25, 2017</pre>
*/
public class RegularExpressionHandlerTest {
private static final Logger logger = Logger.getLogger(RegularExpressionHandlerTest.class);
@Before
public void before() throws Exception {
}
@After
public void after() throws Exception {
}
/**
* Method: standardizeRE(String re)
*/
@Test
public void testStandardizeRE() throws Exception {
RegularExpressionHandler re = new RegularExpressionHandler();
Assert.assertEquals("a·a*", re.standardizeRE("a+"));
Assert.assertEquals("a·b·(ε|a)·a·a*·a·b·b·b*·c·c", re.standardizeRE("aba?a+abb+cc"));
Assert.assertEquals("(a|b)·(a|b)*", re.standardizeRE("(a|b)+"));
Assert.assertEquals("(a*|b*)*", re.standardizeRE("(a*|b*)*"));
Assert.assertEquals("((ε|a)·b*)*", re.standardizeRE("((ε|a)b*)*"));
Assert.assertEquals("(a|b)*·a·b·b·(a|b)*", re.standardizeRE("(a|b)*abb(a|b)*"));
Assert.assertEquals("c·c·(a·b)·(a·b)·(ε|(a·b))·a·a·a", re.standardizeRE("cc(ab){2, 3}aaa"));
Assert.assertEquals("c·c·c·c*·a·a·a", re.standardizeRE("cc{2, }aaa"));
Assert.assertEquals("c·c·(ε|(a·b))·(ε|(a·b))·(ε|(a·b))·a·a·a", re.standardizeRE("cc(ab){, 3}aaa"));
Assert.assertEquals("c·c·c·a·a·a", re.standardizeRE("cc{2}aaa"));
Assert.assertEquals("c·(ε|c)·(ε|c)·(ε|c)·a·a·a", re.standardizeRE("cc{, 3}aaa"));
Assert.assertEquals("c·(ε|c)·(ε|c)·(ε|(ε|c))·a·a·a", re.standardizeRE("c·(ε|c)·(ε|c){1, 2}aa·a"));
Assert.assertEquals("c·(ε|a)·(ε|a)·(ε|a)·(ε|c)·(ε|(ε|c))·a·a·a", re.standardizeRE("ca{0,3}·(ε|c){1, 2}aa·a"));
Assert.assertEquals("c·c·(a·b)·(ε|((0|1|2|3|4|5|6|7|8|9)|(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)))·a·a·a",
re.standardizeRE("cc(ab)[0-9a-z]?aaa"));
Assert.assertEquals("c·c·(a·b)·((0|1|2|3|4|5|6|7|8|9)|(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z))·((0|1|2|3|4|5|6|7|8|9)|(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z))*·a·a·a",
re.standardizeRE("cc(ab)[0-9a-z]+aaa"));
Assert.assertEquals("c·c·(a·b|(c·d)*)·(a·b|(c·d)*)·(ε|(a·b|(c·d)*))·a·a·a", re.standardizeRE("cc(ab|(cd)*){2,3}aaa"));
Assert.assertEquals("c·c·(a·b)·(a|b|c)·a·a·a", re.standardizeRE("cc(ab)[abc]aaa"));
Assert.assertEquals("c·c·(a·b)·(a|b|(c|d|e|f)|x|y)·a·a·a", re.standardizeRE("cc(ab)[abc-fxy]aaa"));
}
/**
* Method: standardizeRE(String re)
*/
@Test(expected = UnexpectedRegularExprRuleException.class)
public void testStandardizeRE2() throws Exception {
RegularExpressionHandler re = new RegularExpressionHandler();
// logger.debug(re.standardizeRE("aa(*)") + "\n");
re.standardizeRE("aa[");
}
/**
* Method: standardizeRE(String re)
*/
@Test
public void testStandardizeRE3() throws Exception {
RegularExpressionHandler re = new RegularExpressionHandler();
Assert.assertEquals("a·a·\\|", re.standardizeRE("aa\\|"));
Assert.assertEquals("a·a·\\{", re.standardizeRE("aa\\{"));
Assert.assertEquals("a·a·\\\\}", re.standardizeRE("aa\\{\\}"));
Assert.assertEquals("a·a·\\+", re.standardizeRE("aa\\+"));
Assert.assertEquals("a·a·\\?", re.standardizeRE("aa\\?"));
Assert.assertEquals("a·a·\\[", re.standardizeRE("aa\\["));
Assert.assertEquals("a·a·\\\\]", re.standardizeRE("aa\\[\\]"));
Assert.assertEquals("a·a·\\-", re.standardizeRE("aa\\-"));
Assert.assertEquals("a·b·c·\\{·(ε|\\{)·(ε|\\{)", re.standardizeRE("abc\\{{1,3}"));
Assert.assertEquals("a·b·\\[·0·\\-·9·\\]", re.standardizeRE("ab\\[0\\-9\\]"));
Assert.assertEquals("\\{·a*·b*·c·(ε|c)·(ε|c)·\\}", re.standardizeRE("\\{a*b*c{1,3}\\}"));
}
/**
* Method: standardizeRE(String re)
*/
@Test(expected = UnexpectedRegularExprRuleException.class)
public void testStandardizeRE4() throws Exception {
RegularExpressionHandler re = new RegularExpressionHandler();
// re.standardizeRE("ab\\[0-9\\]");
re.standardizeRE("ab\\[0\\-9\\]\\[0-\\9]");
}
/**
* Method: convertInfixToPostfix(String re)
*/
@Test
public void testConvertInfixToPostfix() throws Exception {
RegularExpressionHandler re = new RegularExpressionHandler();
Assert.assertEquals("ab|*", re.convertInfixToPostfix("(a|b)*"));
Assert.assertEquals("a*b*|*", re.convertInfixToPostfix("(a*|b*)*"));
Assert.assertEquals("εa|b*·*", re.convertInfixToPostfix("((ε|a)·b*)*"));
Assert.assertEquals("ab|*a·b·b·ab|*·", re.convertInfixToPostfix("(a|b)*·a·b·b·(a|b)*"));
Assert.assertEquals("ab·εa|·a·a*·a·b·b·b*·c·c·", re.convertInfixToPostfix("a·b·(ε|a)·a·a*·a·b·b·b*·c·c"));
Assert.assertEquals("cc·ab··ab··εab·|·a·a·a·", re.convertInfixToPostfix("c·c·(a·b)·(a·b)·(ε|(a·b))·a·a·a"));
Assert.assertEquals("cc·c·c*·a·a·a·", re.convertInfixToPostfix("c·c·c·c*·a·a·a"));
Assert.assertEquals("cc·εab·|·εab·|·εab·|·a·a·a·", re.convertInfixToPostfix("c·c·(ε|(a·b))·(ε|(a·b))·(ε|(a·b))·a·a·a"));
Assert.assertEquals("cc·c·a·a·a·", re.convertInfixToPostfix("c·c·c·a·a·a"));
Assert.assertEquals("cεc|·εc|·εc|·a·a·a·", re.convertInfixToPostfix("c·(ε|c)·(ε|c)·(ε|c)·a·a·a"));
}
/**
* Method: convertInfixToPostfix(String re)
*/
@Test
public void testConvertInfixToPostfix2() throws Exception {
RegularExpressionHandler re = new RegularExpressionHandler();
Assert.assertEquals("\\,;|", re.convertInfixToPostfix("\\,|;"));
Assert.assertEquals("ab·c·\\{·ε\\{|·ε\\{|·", re.convertInfixToPostfix("a·b·c·\\{·(ε|\\{)·(ε|\\{)"));
Assert.assertEquals("\\{a*·b*·c·εc|·εc|·\\", re.convertInfixToPostfix("\\{·a*·b*·c·(ε|c)·(ε|c)·\\}"));
}
@Test
public void testStandardizeExtendedMark() throws Exception {
RegularExpressionHandler regularExpressionHandler = new RegularExpressionHandler();
Method method = RegularExpressionHandler.class.getDeclaredMethod("standardizeExtendedMark", StringBuffer.class, int.class, ExtendedMark.class);
method.setAccessible(true);
StringBuffer stringBuffer = new StringBuffer();
stringBuffer.append("cc(ab)+aaa");
method.invoke(regularExpressionHandler, stringBuffer, 6, ExtendedMark.PLUS_MARK);
/*
Cases to check (input, index, mark: expected result)
"a?", 1, ExtendedMark.QUESTION_MARK: (ε|a)
"aba?a", 3, ExtendedMark.QUESTION_MARK: ab(ε|a)a
"cc(ab)?a", 6, ExtendedMark.QUESTION_MARK: cc(ε|(ab))a
"aba+a", 3, ExtendedMark.PLUS_MARK: abaa*a
"cc(ab)+aaa", 6, ExtendedMark.PLUS_MARK: cc(ab)(ab)*aaa
*/
}
@Test
public void testStandardizeExtendedMark2() throws Exception {
RegularExpressionHandler regularExpressionHandler = new RegularExpressionHandler();
Method method = RegularExpressionHandler.class.getDeclaredMethod("standardizeExtendedMark", StringBuffer.class, int.class, ExtendedMark.class);
method.setAccessible(true);
StringBuffer stringBuffer = new StringBuffer();
stringBuffer.append("cc(ab){,3}aaa");
method.invoke(regularExpressionHandler, stringBuffer, 6, ExtendedMark.BRACE_MARK);
/*
Cases to check (input, index, mark: expected result)
"cc(ab){2,3}aaa", 6, ExtendedMark.BRACE_MARK: cc(ab)(ab)(ε|(ab))aaa
"cc{2, 3}aaa", 2, ExtendedMark.BRACE_MARK: ccc(ε|c)aaa
"cc(ab){2}aaa", 6, ExtendedMark.BRACE_MARK: cc(ab)(ab)aaa
"cc{2}aaa", 2, ExtendedMark.BRACE_MARK: cccaaa
"cc(ab){2,}aaa", 6, ExtendedMark.BRACE_MARK: cc(ab)(ab)(ab)*aaa
"cc{2,}aaa", 2, ExtendedMark.BRACE_MARK: cccc*aaa
"cc(ab){,3}aaa", 6, ExtendedMark.BRACE_MARK: cc(ε|(ab))(ε|(ab))(ε|(ab))aaa
"cc{,3}aaa", 2, ExtendedMark.BRACE_MARK: c(ε|c)(ε|c)(ε|c)aaa
*/
}
@Test
public void testStandardizeSquareBracketMark() throws Exception {
RegularExpressionHandler regularExpressionHandler = new RegularExpressionHandler();
Method method = RegularExpressionHandler.class.getDeclaredMethod("standardizeSquareBracketMark", StringBuffer.class, int.class);
method.setAccessible(true);
StringBuffer stringBuffer = new StringBuffer();
stringBuffer.append("cc(ab)[0-9a-z]aaa");
method.invoke(regularExpressionHandler, stringBuffer, 6);
}
/**
* Method: comparePriority(char curChar, char top)
*/
@Test
public void testComparePriority() throws Exception {
try {
Method method = RegularExpressionHandler.class.getDeclaredMethod("comparePriority", char.class, char.class);
method.setAccessible(true);
method.invoke(new RegularExpressionHandler(), '*', '*');
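/*
Cases to test (curChar, top: expected result)
'|', '*':true
'·', '*':true
'·', '(':false
'*', '*':true
*/
} catch (NoSuchMethodException e) {
// Fail loudly instead of silently passing when the private method is renamed or removed.
Assert.fail("comparePriority(char, char) not found: " + e.getMessage());
}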
}
}
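
The expected strings in testConvertInfixToPostfix are consistent with a standard operator-stack (shunting-yard) conversion in which * binds tightest, the explicit concatenation mark (U+00B7, the middle dot) comes next, and | binds loosest. The following is a minimal sketch under that assumption. PostfixSketch and toPostfix are hypothetical names, this is not the project's RegularExpressionHandler, and backslash escapes such as \{ are deliberately not handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of infix-to-postfix conversion for regular expressions
// that already carry an explicit concatenation operator. Escapes are not handled.
final class PostfixSketch {
    private static final char CONCAT = '\u00B7'; // middle dot used as the concatenation mark
    private static final char UNION = '|';

    // Assumed precedence: concatenation binds tighter than union.
    private static int precedence(char op) {
        return op == CONCAT ? 2 : 1;
    }

    static String toPostfix(String infix) {
        StringBuilder out = new StringBuilder();
        Deque<Character> ops = new ArrayDeque<>();
        for (char c : infix.toCharArray()) {
            switch (c) {
                case '(':
                    ops.push(c);
                    break;
                case ')':
                    // Flush operators back to the matching '('.
                    while (!ops.isEmpty() && ops.peek() != '(') {
                        out.append(ops.pop());
                    }
                    if (ops.isEmpty()) {
                        throw new IllegalArgumentException("unbalanced ')'");
                    }
                    ops.pop(); // discard the '('
                    break;
                case '*':
                    // Postfix unary operator: it applies to the operand just emitted.
                    out.append(c);
                    break;
                case CONCAT:
                case UNION:
                    // Left-associative binary operators: pop anything of equal or higher precedence.
                    while (!ops.isEmpty() && ops.peek() != '('
                            && precedence(ops.peek()) >= precedence(c)) {
                        out.append(ops.pop());
                    }
                    ops.push(c);
                    break;
                default:
                    out.append(c); // ordinary symbol
            }
        }
        while (!ops.isEmpty()) {
            char op = ops.pop();
            if (op == '(') {
                throw new IllegalArgumentException("unbalanced '('");
            }
            out.append(op);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPostfix("(a|b)*")); // prints ab|*
    }
}
```

Emitting * immediately works because it is a postfix operator in the infix form already, so it never has to wait on the stack.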


@ -0,0 +1,110 @@
package finiteAutomata.entity;
import finiteAutomata.FA_Controller;
import org.junit.After;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
import java.util.LinkedList;
import java.util.List;
/**
* DFA Tester.
*
* @author <Authors name>
* @version 1.0
* @since <pre>November 2, 2017</pre>
*/
public class DFATest {
@Before
public void before() throws Exception {
}
@After
public void after() throws Exception {
}
/**
* Method: isValid(String s)
*/
@Test
public void testIsValid1() throws Exception {
List<String> res = new LinkedList<>();
res.add("aa+");
List<String> patterns = new LinkedList<>();
patterns.add("ID");
FA_Controller controller = new FA_Controller();
DFA dfa = controller.lexicalAnalysis(res, patterns).get(0);
Assert.assertTrue(dfa.isValid("aa"));
Assert.assertTrue(dfa.isValid("aaa"));
}
/**
* Method: isValid(String s)
*/
@Test
public void testIsValid2() throws Exception {
List<String> res = new LinkedList<>();
res.add("aa+");
res.add("b*");
List<String> patterns = new LinkedList<>();
patterns.add("ID");
patterns.add("A");
FA_Controller controller = new FA_Controller();
List<DFA> dfas = controller.lexicalAnalysis(res, patterns);
DFA dfa1 = dfas.get(0);
DFA dfa2 = dfas.get(1);
Assert.assertTrue(dfa1.isValid("aa"));
Assert.assertTrue(dfa1.isValid("aaa"));
Assert.assertFalse(dfa1.isValid("b"));
Assert.assertTrue(dfa2.isValid("b"));
Assert.assertTrue(dfa2.isValid("bb"));
Assert.assertTrue(dfa2.isValid("bbb"));
}
/**
* Method: isValid(String s)
*/
@Test
public void testIsValid3() throws Exception {
List<String> res = new LinkedList<>();
res.add("aa+");
List<String> patterns = new LinkedList<>();
patterns.add("ID");
FA_Controller controller = new FA_Controller();
DFA dfa = controller.lexicalAnalysis(res, patterns).get(0);
Assert.assertFalse(dfa.isValid("a"));
}
/**
* Method: isValid(String s)
*/
@Test
public void testIsValid4() throws Exception {
List<String> res = new LinkedList<>();
res.add("aa+");
List<String> patterns = new LinkedList<>();
patterns.add("ID");
FA_Controller controller = new FA_Controller();
DFA dfa = controller.lexicalAnalysis(res, patterns).get(0);
Assert.assertFalse(dfa.isValid("b"));
}
}
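
What the isValid tests check can be illustrated with a table-driven DFA walk. The sketch below hard-codes a three-state automaton for aa+ and uses a nested map shaped like the one returned by dfa.getMove() in the handler test, keyed by plain integers here. DfaSketch, edge, accept and forAaPlus are hypothetical names introduced for illustration. The project's DFA class and the automaton produced by FA_Controller may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical table-driven DFA; states are plain integers rather than FA_State objects.
final class DfaSketch {
    private final Map<Integer, Map<Character, Integer>> move = new HashMap<>();
    private final Set<Integer> accepting = new HashSet<>();
    private final int start;

    DfaSketch(int start) {
        this.start = start;
    }

    // Adds the transition from --symbol--> to and returns this for chaining.
    DfaSketch edge(int from, char symbol, int to) {
        move.computeIfAbsent(from, k -> new HashMap<>()).put(symbol, to);
        return this;
    }

    // Marks a state as accepting and returns this for chaining.
    DfaSketch accept(int state) {
        accepting.add(state);
        return this;
    }

    // Follows one transition per input character; a missing transition rejects immediately.
    boolean isValid(String input) {
        int state = start;
        for (char c : input.toCharArray()) {
            Map<Character, Integer> row = move.get(state);
            Integer next = (row == null) ? null : row.get(c);
            if (next == null) {
                return false;
            }
            state = next;
        }
        return accepting.contains(state);
    }

    // Hand-built automaton for aa+: two or more 'a' characters.
    static DfaSketch forAaPlus() {
        return new DfaSketch(0)
                .edge(0, 'a', 1)
                .edge(1, 'a', 2)
                .edge(2, 'a', 2)
                .accept(2);
    }

    public static void main(String[] args) {
        DfaSketch dfa = forAaPlus();
        System.out.println(dfa.isValid("aa")); // prints true
        System.out.println(dfa.isValid("a"));  // prints false
    }
}
```

Rejecting on the first missing transition avoids the need for an explicit dead state in the table.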