移进规约冲突

今天写了一个移进规约冲突的文法。文法规则如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
%{
#include <stdio.h>
#include <string.h>
#include "zend_compile.h"
#include "zend_opcode.h"
#include "zend_vm.h"

#define YYDEBUG 1

#define zendparse yyparse

extern int yylex(void);
extern int yyparse(void);
extern FILE *yyin;
extern int yylineno;

int yywrap()
{
return 1;
}

void yyerror(const char *s)
{
printf("[error] %s, in line %d\n", s, yylineno);
}

int main(int argc, char const *argv[])
{
return 0;
}
%}

%left '+' '-'
%left '*' '/'
%left '(' ')'

%token <ident> T_ECHO "'echo'"
%token <ast> T_LNUMBER "integer"
%token <ast> T_VARIABLE "variable"

%union {
zend_ast *ast;
}

%type <ast> top_statement statement
%type <ast> expr
%type <ast> echo_expr
%type <ast> scalar
%type <ast> top_statement_list
%type <ast> variable

%%

start:
top_statement_list { CG(ast) = $1; }
;

top_statement_list:
top_statement_list top_statement { $$ = zend_ast_list_add($1, $2); }
| %empty { $$ = zend_ast_create_list(0, ZEND_AST_STMT_LIST); }
;

top_statement:
statement { $$ = $1; }
;

statement:
T_ECHO echo_expr ';' { $$ = $2; }
| expr ';' { $$ = $1; }
;

echo_expr:
expr {
std::cout << "create echo zend_ast" << std::endl;
$$ = zend_ast_create_1(ZEND_AST_ECHO, 0, $1);
}
;

expr:
variable '=' expr {
std::cout << "create assign zend_ast" << std::endl;
$$ = zend_ast_create_2(ZEND_AST_ASSIGN, 0, $1, $3);
}
|
expr '+' expr {
std::cout << "create + zend_ast" << std::endl;
$$ = zend_ast_create_binary_op(ZEND_ADD, $1, $3);
}
| expr '-' expr {
std::cout << "create - zend_ast" << std::endl;
$$ = zend_ast_create_binary_op(ZEND_SUB, $1, $3);
}
| expr '*' expr {
std::cout << "create * zend_ast" << std::endl;
$$ = zend_ast_create_binary_op(ZEND_MUL, $1, $3);
}
| expr '/' expr {
std::cout << "create / zend_ast" << std::endl;
$$ = zend_ast_create_binary_op(ZEND_DIV, $1, $3);
}
| '(' expr ')' { $$ = $2; }
| scalar { $$ = $1; }
;

scalar:
T_LNUMBER { $$ = $1; }
;

variable:
T_VARIABLE { $$ = $1; }

%%

执行生成代码的命令:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
bison -d -Wcounterexamples zend_language_parser.y

zend_language_parser.y: warning: 4 shift/reduce conflicts [-Wconflicts-sr]
zend_language_parser.y: warning: shift/reduce conflict on token '+' [-Wcounterexamples]
Example: variable '=' expr • '+' expr
Shift derivation
expr
↳ variable '=' expr
↳ expr • '+' expr
Reduce derivation
expr
↳ expr '+' expr
↳ variable '=' expr •

# 省略其他的警告

可以看到,警告说是有4个地方有移进规约的冲突。

那么,什么是移进规约冲突呢?意思就是说,当我们预读了词素的时候,既可以对分析栈里面已有的词素进行规约也可以对预读的词素进行移进,这就是已经规约冲突。

OK,我们来看看上面的报错。可以看到,当有字符串:

1
$a = 1 + 1

输入时,会发生移进规约的冲突。

我们来看看如果是优先移进的话,状态是如何变化的:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
+----------------------------------+   +-----+   +----------------------------------+
| | |init | | $a = 1 + 1 |
+----------------------------------+ +-----+ +----------------------------------+




+----------------------------------+ +-----+ +----------------------------------+
| $a | |shift| | = 1 + 1 |
+----------------------------------+ +-----+ +----------------------------------+



+----------------------------------+ +-----+ +----------------------------------+
| $a = | |shift| | 1 + 1 |
+----------------------------------+ +-----+ +----------------------------------+


+----------------------------------+ +-----+ +----------------------------------+
| $a = 1 | |shift| | + 1 |
+----------------------------------+ +-----+ +----------------------------------+



+----------------------------------+ +-----+ +----------------------------------+
| $a = 1 + | |shift| | 1 |
+----------------------------------+ +-----+ +----------------------------------+


+----------------------------------+ +-----+ +----------------------------------+
| $a = 1 + 1 | |shift| | |
+----------------------------------+ +-----+ +----------------------------------+


+----------------------------------+ +------+ +----------------------------------+
| $a = expr | |reduce| | |
+----------------------------------+ +------+ +----------------------------------+



+----------------------------------+ +------+ +----------------------------------+
| expr | |reduce| | |
+----------------------------------+ +------+ +----------------------------------+

对应的AST如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
               +------------+                             
|ZEND_ASSIGN |
| |
+------------+
|
|
+--------------+--------------+
| |
| |
v v
+------------+ +------------+
| $a | | ZEND_ADD |
| | | |
+------------+ +------------+
|
|
|
+---------------+-------------+
| |
| |
v v
+------------+ +------------+
| 1 | | 1 |
| | | |
+------------+ +------------+

计算这个AST,我们会得到$a的最终值为2。这是符合主流语言的预期的。

我们来看看如果是优先规约的话,状态是如何变化的:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
+----------------------------------+   +-----+   +----------------------------------+
| | |init | | $a = 1 + 1 |
+----------------------------------+ +-----+ +----------------------------------+




+----------------------------------+ +-----+ +----------------------------------+
| $a | |shift| | = 1 + 1 |
+----------------------------------+ +-----+ +----------------------------------+



+----------------------------------+ +-----+ +----------------------------------+
| $a = | |shift| | 1 + 1 |
+----------------------------------+ +-----+ +----------------------------------+


+----------------------------------+ +-----+ +----------------------------------+
| $a = 1 | |shift| | + 1 |
+----------------------------------+ +-----+ +----------------------------------+



+----------------------------------+ +------+ +----------------------------------+
| expr | |reduce| | + 1 |
+----------------------------------+ +------+ +----------------------------------+


+----------------------------------+ +-----+ +----------------------------------+
| expr + | |shift| | 1 |
+----------------------------------+ +-----+ +----------------------------------+


+----------------------------------+ +------+ +----------------------------------+
| expr + 1 | |shift | | |
+----------------------------------+ +------+ +----------------------------------+



+----------------------------------+ +------+ +----------------------------------+
| expr | |reduce| | |
+----------------------------------+ +------+ +----------------------------------+

对应的AST如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
                                  +-----------------+                 
| |
| ZEND_ADD |
| |
+-----------------+
|
|
+---------------+----------------+
| |
| |
v v
+-----------------+ +-----------------+
| | | |
| ZEND_ASSIGN | | 1 |
| | | |
+-----------------+ +-----------------+
|
|
|
+-----------------+----------------+
| |
| |
v v
+-----------------+ +-----------------+
| | | |
| $a | | 1 |
| | | |
+-----------------+ +-----------------+

计算这个AST,我们会得到$a的最终值为1。这和我们想的不太一样。

所以,移进规约的冲突,会导致一些执行的顺序不一致。如果我们学习过bison官方文档经典的if ... else的移进规约冲突问题的话,我们知道,解决它的办法是修改文法,进而避免冲突(因为这个例子有一点绕,所以我没有用那个例子)。那我们这种情况呢,就可以通过设置词素的优先级来解决掉,我们设置=的优先级低于+即可:

1
2
3
4
%left '='
%left '+' '-'
%left '*' '/'
%left '(' ')'

这样的话,当我们的分析栈为如下情况的时候:

1
2
3
+----------------------------------+   +-----+   +----------------------------------+
| $a = 1 | |shift| | + 1 |
+----------------------------------+ +-----+ +----------------------------------+

我们预读一个+,因为+号的优先级更高一点,所以,此时不会选择规约,而是把+移进。这样,我们可以保证在后续规约的时候,先规约1 + 1,进而也保证了运算符的优先级。