CppLispInterpreter/designP2.txt at master · boubinjg/CppLispInterpreter · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
Interpreter design:
	Language:
		I chose to use C++11 for this interpreter. Unfortunately, c++11 isn't supported by stdlinux, so
		c++0x it is.

	Main function:
		The main function reads the input from std::cin up to a $, and returns it as a string. the string is then
		tokenized and returned as a vector of vectors of tokens, and then each individual
		vector of tokens corresponding to a separate lisp expression is converted to internal representation
		as a single S-expression, and printed. After printing, the SExpression is evaluated as a lisp expression
		using the eval function.

		Tokenization and conversion, and evaluation can generate errors, so the main loop has error handling logic
		which allows it to print the error and continue onto the next s-expression.

	Classes:
    The interpreter has two structs, token and SExp.
		Token:
			Token is an aggregate struct that contains two strings, tokenType, and tokenText.

			tokenType is the type of token, which can be one of the following:
				"whitespace", "lParen", "rParen", "int", "symbolId", "dot"
			Notice, that "$", and "$$", are valid symbols. These are used by the tokenizer to
			separate s-expressions, and are not needed for evaluation so I got rid of them.

			tokenText is the text that generated that token. for instance, "1" will have a token
			type of "int", and a tokenText of "1". This is so the token can know and write it's
			generating string. Only symbolid and int actually use this.

		SExp:
			This class is very similar to the version presented in class. It has an int type,
			int val, string name, and SExp* left and right. Type {1,2,3} denotes the type of the
			expression, 1 for int, 2 for symbolId, and 3 for SExp. Then the appropriate variables
			are used to hold the contents of that s-expression, be it an integer, string, or
			two SExp*'s representing the car and cdr of the s-expression.
			SExp also has an unordered_map which holds SExp*'s uniquely identified by symbolId names.
			This allows us to hold known name and their SExp*s for use in the conversion function.

	Reading input:
		The interpreter first waits for input using std::cin. It reads input line by line separated by
		newline characters until it sees a "$", at which point, it stops reading input returns the input string.

	Tokenizer:
		Once input is read and returned as a string, Each expression is then searched for identifiers of
		tokens. The tokenizer uses a vector to record the string of tokens. The function iterates over the
		string lisp expression and finds tokens as below:

		'(', ')', '.':
			If any of these are found, the appropriate token is added to the back of the tokenizedStr
			vector with generic tokenText.

		ints, strings, whitespace:
			If a character is found to be a digit, a token of type int is created. The tokenizer
			continues to iterate along the string until it sees a non-digit. It sets the tokenText
			to the value of the entire int, from the first char of the int to the iterator's end
			position, then moves on to the next token.
			SymbolIds work similarly. If a character is found to be a letter, a procedure much like int
			initiates. The only difference is that (as mentioned in class) symbolIds can contain alpha-
			numeric characters, but must start with a letter. The iterator iterates along the string
			until it finds a character that's not alphanumeric, then sets the tokenText to the entire
			token's value.
			Whitespace uses the same procedure, and uses the cctype function isspace to capture tabs,
			linefeeds, spaces, and more in one function call.

		The tokenizer is run on each string from the read input routine's vector, and pushes each token
		vector onto a vector of tokenVectors called tokenizedExprs, which is fed to the routine
		which converts the token vectors to the internal representation

	Convert to internal representation:
		When the conversion function is called, a new SExp is created. The function checks to see
		whether the token is an int, symbolId, or SExpression. For an int or symbolId, a simple
		SExp of tha type is created and returned. for symbolIds, the symbolname is checked against
		all existing symbolNames. If the symbol name exists, the extant version is returned instead of
		creating a new one. If it doesn't exist, the function creates a new one and returns that, adding
		it also to the internal hash table in SExp which holds the symbolId's corresponding SExps.

		If the token vector is an s-expression, an s-expression of type 3 is created. the conversion function
		sends the token vector to findCarAndCdr, which returns the car and cdr of the s-expression, and each
		of those are added to the new SExp's corresponding variables. If the cdr of the s-expression starts
		with a whitespace character (which would have been trimmed off of the front/back of the s-exp before
		evaluation in other circumstances), then we know that the s-expression is a list. under normal
		circumstances, the SExp*'s right and left are just set by a recursive call to the conversion function,
		but if it's a list, then the car is set by a recursive call, and the cdr is set by a different
		function specifically for lists.

	convert to List:
		Convert to list takes a vector of tokens and converts it to list notation. it first checks to see
		if the vector of strings starts with "atom, whitespace" as its first two characters. If so, the atom
		is added to the right side of the s-expression returned by this function, and the right side is
		created by a recursive call with the recent atom removed from the vector.
		If the vector starts with an l-paren, then it is an s-expression. The function pulls the entire
		s-expression into it's own vector, and evaluates it using the normal conversion function. If it's
		a list, the conversion function will send it back to the list conversion function lower down the call
		stack. If not, it will be evaluated normally. This allows for a mix of list and normal notation to be
		used throughout the lisp expression.

	Errors:
		If anything ourside of the ordiary occurs in any of these functions, including tokenization,
		an error can be thrown. Errors are handled by the main function. If an error occurs durring
		tokenization, the string being tokenized is ignored, and the next string is tokenized.
		If an error occurs durring conversion, the vector of tokens is ignored, and the next vector
		is converted. Either way, an error is printed.

	Print:
		The print function is run after an SExp is created and returned by the conversion function. An
		SExp* is read in, and it's type is checked. If it is of type 1 or 2, it's name is printed. If it's
		of type 3, an lParen is prined, then e->left is sent recursively to the print function. After that
		a . is printed, then e->right is sent recursively to the print function, and finally an rParen is
		printed.

    Evaluation:
        Eval takes two S expressions as arguments, e, and alist. e is the expression being evaluated, the
        a-list is the list of arguments to function calls above the call to evaluate e. this list is used
        throughout eval to associate function parameter names with their current arguments.

        a-list: The a-list is basically a stack. When an s-expression is evaluated, the current a-list is
        passed along with it. If that evaluation requires that arguments be put on the a-list, then they
        are put at the top. When the evaluation ends, these arguments leave the a-list, by consequence of the
        recursive nature of lisp evaluation. This means that an argument can be bound on the a-list multiple times.
        When arg values are checked against the a-list, the first one is always used. This means that if you
        recursively call a function, it's arguments always evaluate to the args passed in the most recent call.

        Evaluation operations on e: If the s-expression is an integer atom, it is simply returned
        by the evaluation function. If the s-expression is a symbolic identifier, it needs to be processed.
        First, the interpreter checks to see if it is T or NIL. If so, it is returned. If not, the atom is
        checked to see if it is bound to the a-list. If it's bound to something on the a-list, then the
        thing it is bound to is returned. For instance, if the symbolId being evaluated is "A", and it's
        bound to (3.4) on the a-list, (3.4) is returned on evaluation of A.
        If e is an s-expression, it must be checked for evaluation. First, e is checked to see if it is a list.
        If it is not a list, it can not be evaluated and is an error. If it is a list, then I first check it to
        see if it is a call to a primitive function. If it is not a primitive function, then I check to see if it
        is a special form. If it not a special form, then I check to see if it is a user defined function call.
        If it is none of the above, then it is an error.

        Primitive functions:
            These are the primitive functions implemented:
                CAR, CDR, CONS, ATOM, EQ, NULL, INT, PLUS, MINUS, TIMES, QUOTIENT, REMAINDER, LESS and GREATER
            They are all called in evaluated the same way. First, the number of parameters to the function call
            is checked. If the number is incorrect (which varies function by function) then an error is thrown.
            If the call is correct, then the function params are bound using the "bind" utility function to the
            params of that function on a-list, and the corresponding function call is initiated. The code for
            these calls is in the lisp.h header file. The functions find their bound parameters on the a-list.
            Once the parameters are found, they are evaluated, which may error. If they are evaluated correctly,
            they are then either checked to assure they are the right form, and then used, or simply used in the
            procedure specified. For instance, PLUS will find the values bound to it's arguments and evaluate them.
            It will then make sure that they are integers, and if they are, it will add them together. The function
            will then return an s-expression representing it's result.

        Special forms:
            These are the special forms implemented:
                QUOTE, COND, DEFUN
            Special forms function internally much like the primitive functions above, but with some differences.

            QUOTE:
                QUOTE simply returns the value of it's single parameter without evaluating it.

            COND:
                COND Takes any number of parameters, but each must be of a very specific form. COND iterates down
                the list of arguments, adding each one to the a-list to a generic parameter "COND" + type + the param number.
                This part is unique from the previous in that it checks that each of the arguments is of the correct
                basic format, a two argument list. If all of the above is true, then the COND evaluation function is
                called. This function takes more than just the a-list and e as params, it also takes the number of
                conditions it was passed. It then systematically evaluates each condition starting at CONDB1, meaning the first
                boolean expression. The first one to evaluate to true has it's expression (CONDE#) evaluated and returned.
                If none of the boolean expressions evaluate to T, then an error is thrown.
            DEFUN:
                DEFUN places a function definition onto the D-list. DEFUN first checks to see if it was passed a 3 parameter
                list. If not, an error is returned. If it was passed the correct format, it then creates an entry for a function
                on the d-list. First, it creates the entry, which is of the form: (F.(PL.FB)) where F is the function name,
                PL is the parameter list, and FB is the function body. The function is bound to the d-list in this fashion,
                and then the name of the function is returned to be printed.

        User-defined Function calls:
            If none of the previous functions were applicable to the evaluating s-expression, the interpreter checks to see if
            the s-expression is a call to a function on the d-list. If the s-expression's left side is a symbolId,
            the interpreter checks to see if that symbolId is on the d-list. If it is not, an error occurs. If it is,
            then that function's param list and body are returned from the d-list. Eval checks to make sure that the number
            of params passed is the same as the number expected. If they're the same, then each argument is evaluated.
            Then they are bound in order to their corresponding function parameters on the a-list using a bind function.
            It should be noted that all parameters are evaluated before any are added to the a-list. This allows recursive
            functions to have all of their parameters be evaluated with the same a-list, before those parameters values
            themselves are added to the a-list. Once this is complete, the function body and new a-list are passed to eval,
            and the result is returned.

	Utility functions:
		Trim:
			A function that accepts a vector of tokens and removes leading and trailing whitespace

		isAtom:
			A function that takes a token and returns whether or not it is atomic

		print, for tokens:
			Prints a vector of tokens, useful for debugging.

		bind functions:
		    A few functions used to add (symbolId . S-Expression) pairs to the a-list or d-list

		Find:
            Finds an S-expression bound to a symbolId on the a-list. takes the symbolId, returns the s-expression.

        argCt:
            counts the number of arguments a given function call has by iterating down the list. returns as a number.

        isList:
            Checks to see if an s-expression is in the list form or not.

        There are also implementations of the primitive functions in the lisp.h header file.