-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathdesignP2.txt
More file actions
207 lines (175 loc) · 13.8 KB
/
designP2.txt
File metadata and controls
207 lines (175 loc) · 13.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
Interpreter design:
Language:
I chose to use C++11 for this interpreter. Unfortunately, c++11 isn't supported by stdlinux, so
c++0x it is.
Main function:
The main function reads the input from std::cin up to a $, and returns it as a string. the string is then
tokenized and returned as a vector of vectors of tokens, and then each individual
vector of tokens corresponding to a separate lisp expression is converted to internal representation
as a single S-expression, and printed. After printing, the SExpression is evaluated as a lisp expression
using the eval function.
Tokenization and conversion, and evaluation can generate errors, so the main loop has error handling logic
which allows it to print the error and continue onto the next s-expression.
Classes:
The interpreter has two structs, token and SExp.
Token:
Token is an aggregate struct that contains two strings, tokenType, and tokenText.
tokenType is the type of token, which can be one of the following:
"whitespace", "lParen", "rParen", "int", "symbolId", "dot"
Notice, that "$", and "$$", are valid symbols. These are used by the tokenizer to
separate s-expressions, and are not needed for evaluation so I got rid of them.
tokenText is the text that generated that token. for instance, "1" will have a token
type of "int", and a tokenText of "1". This is so the token can know and write it's
generating string. Only symbolid and int actually use this.
SExp:
This class is very similar to the version presented in class. It has an int type,
int val, string name, and SExp* left and right. Type {1,2,3} denotes the type of the
expression, 1 for int, 2 for symbolId, and 3 for SExp. Then the appropriate variables
are used to hold the contents of that s-expression, be it an integer, string, or
two SExp*'s representing the car and cdr of the s-expression.
SExp also has an unordered_map which holds SExp*'s uniquely identified by symbolId names.
This allows us to hold known name and their SExp*s for use in the conversion function.
Reading input:
The interpreter first waits for input using std::cin. It reads input line by line separated by
newline characters until it sees a "$", at which point, it stops reading input returns the input string.
Tokenizer:
Once input is read and returned as a string, Each expression is then searched for identifiers of
tokens. The tokenizer uses a vector to record the string of tokens. The function iterates over the
string lisp expression and finds tokens as below:
'(', ')', '.':
If any of these are found, the appropriate token is added to the back of the tokenizedStr
vector with generic tokenText.
ints, strings, whitespace:
If a character is found to be a digit, a token of type int is created. The tokenizer
continues to iterate along the string until it sees a non-digit. It sets the tokenText
to the value of the entire int, from the first char of the int to the iterator's end
position, then moves on to the next token.
SymbolIds work similarly. If a character is found to be a letter, a procedure much like int
initiates. The only difference is that (as mentioned in class) symbolIds can contain alpha-
numeric characters, but must start with a letter. The iterator iterates along the string
until it finds a character that's not alphanumeric, then sets the tokenText to the entire
token's value.
Whitespace uses the same procedure, and uses the cctype function isspace to capture tabs,
linefeeds, spaces, and more in one function call.
The tokenizer is run on each string from the read input routine's vector, and pushes each token
vector onto a vector of tokenVectors called tokenizedExprs, which is fed to the routine
which converts the token vectors to the internal representation
Convert to internal representation:
When the conversion function is called, a new SExp is created. The function checks to see
whether the token is an int, symbolId, or SExpression. For an int or symbolId, a simple
SExp of tha type is created and returned. for symbolIds, the symbolname is checked against
all existing symbolNames. If the symbol name exists, the extant version is returned instead of
creating a new one. If it doesn't exist, the function creates a new one and returns that, adding
it also to the internal hash table in SExp which holds the symbolId's corresponding SExps.
If the token vector is an s-expression, an s-expression of type 3 is created. the conversion function
sends the token vector to findCarAndCdr, which returns the car and cdr of the s-expression, and each
of those are added to the new SExp's corresponding variables. If the cdr of the s-expression starts
with a whitespace character (which would have been trimmed off of the front/back of the s-exp before
evaluation in other circumstances), then we know that the s-expression is a list. under normal
circumstances, the SExp*'s right and left are just set by a recursive call to the conversion function,
but if it's a list, then the car is set by a recursive call, and the cdr is set by a different
function specifically for lists.
convert to List:
Convert to list takes a vector of tokens and converts it to list notation. it first checks to see
if the vector of strings starts with "atom, whitespace" as its first two characters. If so, the atom
is added to the right side of the s-expression returned by this function, and the right side is
created by a recursive call with the recent atom removed from the vector.
If the vector starts with an l-paren, then it is an s-expression. The function pulls the entire
s-expression into it's own vector, and evaluates it using the normal conversion function. If it's
a list, the conversion function will send it back to the list conversion function lower down the call
stack. If not, it will be evaluated normally. This allows for a mix of list and normal notation to be
used throughout the lisp expression.
Errors:
If anything ourside of the ordiary occurs in any of these functions, including tokenization,
an error can be thrown. Errors are handled by the main function. If an error occurs durring
tokenization, the string being tokenized is ignored, and the next string is tokenized.
If an error occurs durring conversion, the vector of tokens is ignored, and the next vector
is converted. Either way, an error is printed.
Print:
The print function is run after an SExp is created and returned by the conversion function. An
SExp* is read in, and it's type is checked. If it is of type 1 or 2, it's name is printed. If it's
of type 3, an lParen is prined, then e->left is sent recursively to the print function. After that
a . is printed, then e->right is sent recursively to the print function, and finally an rParen is
printed.
Evaluation:
Eval takes two S expressions as arguments, e, and alist. e is the expression being evaluated, the
a-list is the list of arguments to function calls above the call to evaluate e. this list is used
throughout eval to associate function parameter names with their current arguments.
a-list: The a-list is basically a stack. When an s-expression is evaluated, the current a-list is
passed along with it. If that evaluation requires that arguments be put on the a-list, then they
are put at the top. When the evaluation ends, these arguments leave the a-list, by consequence of the
recursive nature of lisp evaluation. This means that an argument can be bound on the a-list multiple times.
When arg values are checked against the a-list, the first one is always used. This means that if you
recursively call a function, it's arguments always evaluate to the args passed in the most recent call.
Evaluation operations on e: If the s-expression is an integer atom, it is simply returned
by the evaluation function. If the s-expression is a symbolic identifier, it needs to be processed.
First, the interpreter checks to see if it is T or NIL. If so, it is returned. If not, the atom is
checked to see if it is bound to the a-list. If it's bound to something on the a-list, then the
thing it is bound to is returned. For instance, if the symbolId being evaluated is "A", and it's
bound to (3.4) on the a-list, (3.4) is returned on evaluation of A.
If e is an s-expression, it must be checked for evaluation. First, e is checked to see if it is a list.
If it is not a list, it can not be evaluated and is an error. If it is a list, then I first check it to
see if it is a call to a primitive function. If it is not a primitive function, then I check to see if it
is a special form. If it not a special form, then I check to see if it is a user defined function call.
If it is none of the above, then it is an error.
Primitive functions:
These are the primitive functions implemented:
CAR, CDR, CONS, ATOM, EQ, NULL, INT, PLUS, MINUS, TIMES, QUOTIENT, REMAINDER, LESS and GREATER
They are all called in evaluated the same way. First, the number of parameters to the function call
is checked. If the number is incorrect (which varies function by function) then an error is thrown.
If the call is correct, then the function params are bound using the "bind" utility function to the
params of that function on a-list, and the corresponding function call is initiated. The code for
these calls is in the lisp.h header file. The functions find their bound parameters on the a-list.
Once the parameters are found, they are evaluated, which may error. If they are evaluated correctly,
they are then either checked to assure they are the right form, and then used, or simply used in the
procedure specified. For instance, PLUS will find the values bound to it's arguments and evaluate them.
It will then make sure that they are integers, and if they are, it will add them together. The function
will then return an s-expression representing it's result.
Special forms:
These are the special forms implemented:
QUOTE, COND, DEFUN
Special forms function internally much like the primitive functions above, but with some differences.
QUOTE:
QUOTE simply returns the value of it's single parameter without evaluating it.
COND:
COND Takes any number of parameters, but each must be of a very specific form. COND iterates down
the list of arguments, adding each one to the a-list to a generic parameter "COND" + type + the param number.
This part is unique from the previous in that it checks that each of the arguments is of the correct
basic format, a two argument list. If all of the above is true, then the COND evaluation function is
called. This function takes more than just the a-list and e as params, it also takes the number of
conditions it was passed. It then systematically evaluates each condition starting at CONDB1, meaning the first
boolean expression. The first one to evaluate to true has it's expression (CONDE#) evaluated and returned.
If none of the boolean expressions evaluate to T, then an error is thrown.
DEFUN:
DEFUN places a function definition onto the D-list. DEFUN first checks to see if it was passed a 3 parameter
list. If not, an error is returned. If it was passed the correct format, it then creates an entry for a function
on the d-list. First, it creates the entry, which is of the form: (F.(PL.FB)) where F is the function name,
PL is the parameter list, and FB is the function body. The function is bound to the d-list in this fashion,
and then the name of the function is returned to be printed.
User-defined Function calls:
If none of the previous functions were applicable to the evaluating s-expression, the interpreter checks to see if
the s-expression is a call to a function on the d-list. If the s-expression's left side is a symbolId,
the interpreter checks to see if that symbolId is on the d-list. If it is not, an error occurs. If it is,
then that function's param list and body are returned from the d-list. Eval checks to make sure that the number
of params passed is the same as the number expected. If they're the same, then each argument is evaluated.
Then they are bound in order to their corresponding function parameters on the a-list using a bind function.
It should be noted that all parameters are evaluated before any are added to the a-list. This allows recursive
functions to have all of their parameters be evaluated with the same a-list, before those parameters values
themselves are added to the a-list. Once this is complete, the function body and new a-list are passed to eval,
and the result is returned.
Utility functions:
Trim:
A function that accepts a vector of tokens and removes leading and trailing whitespace
isAtom:
A function that takes a token and returns whether or not it is atomic
print, for tokens:
Prints a vector of tokens, useful for debugging.
bind functions:
A few functions used to add (symbolId . S-Expression) pairs to the a-list or d-list
Find:
Finds an S-expression bound to a symbolId on the a-list. takes the symbolId, returns the s-expression.
argCt:
counts the number of arguments a given function call has by iterating down the list. returns as a number.
isList:
Checks to see if an s-expression is in the list form or not.
There are also implementations of the primitive functions in the lisp.h header file.