Course: COMS 4115 Programming Languages and Translators (Fall 2020)
Website: https://www.rayb.info/fall2020
University: Columbia University.
Instructor: Prof. Baishakhi Ray
- Announcement Date: Thursday, October 1, 2020
- Due Date: Wednesday, October 14, 2020 by 11:59 PM. No extensions!
- Total Points: 100
- Visualizing and Understanding ASTs: 10
- Task 1 (Recursive Functions): 40
- Task 2 (Code Reformatting Tool): 50
From this assignment:
- You will learn about the AST structure of the code.
- You will get familiar with different AST APIs provided by LLVM/Clang.
- You will get familiar with Clang's ASTVisitor.
- You will learn how to traverse an AST.
- You will learn how to write a simple code reformatter.
You have already implemented a (albeit very simple) lexical analyzer/tokenizer that converts character sequences from a code stream to token sequences. As you learned in lecture, this token sequence is processed by the parser, and an abstract syntax tree (AST) is generated. In this assignment, you will use the clang compiler to generate an AST. Further, you will investigate two different case studies to analyze an AST.
In Programming Assignment 2.1, you saw how to build LLVM. The Clang compiler front-end leverages the LLVM back-end infrastructure. For any code written in C, C++, or Objective-C, you can generate (and visualize) the AST using Clang.
First, let the shell/terminal know where you built LLVM. We recommend adding the following line to your $HOME/.bashrc file:
export LLVM_HOME=<base directory where you built your llvm>;Next, try the following command:
$LLVM_HOME/build/bin/clang -cc1 -ast-dump examples/gcd.c -I /usr/include/ \
-I $LLVM_HOME/build/lib/clang/12.0.0/include/;You should spend some time looking at the output. Please write a few sentences in the outputs/AST.txt file describing the following:
- What information is being presented in the output
- What you find interesting about the output
You can also generate a visual representation of the AST. Try running the following command:
export LLVM_HOME=<base directory where you built your llvm>;
$LLVM_HOME/build/bin/clang -cc1 -ast-view gcd.c -I /usr/include/ \
-I $LLVM_HOME/build/lib/clang/12.0.0/include/;This will generate a dot file in your /tmp/ directory. Locate the created dot file and identify the filename. Run the following:
dot -Tpdf /tmp/<dot file filename> -o output/AstView.pdf
Take a look at the PDF file that is generated, and try to match that with the output generated in the previous step.
The AST is not just for viewing purposes; there are many powerful things you can do with an AST. For example, you can warn developers when a variable name does not follow a convention (e.g., camel casing) by traversing an AST; if you find or "visit" a variable name, you can verify whether that name matches the intended convention. Otherwise, you can generate a warning.
In this particular assignment, you will investigate how to analyze an AST to extract important information for two different tasks.
Recursive functions are often compact, easy to read, and easy to understand. However, they often incur huge overhead due to call stack maintenance, which is a major concern if you are developing for a resource-constrained system. Your task here is to determine whether a given function is direct recursive or not. We will describe the tools that you will use for this task in a later section.
1. int recursive_factorial(int n1){
2. if (n1 <= 1){
3. return 1;
4. }
5. return n1 * recursive_factorial (n1 - 1);
6. }In the above C code snippet, the function recursive_factorial is a direct recursive function, since there is a call to itself inside the body (line 5). In contrast, the following function is not direct recursive; even though there is a function call at line 4, the callee is not the function itself:
1. int iterative_factorial(int n1){
2. int res = 1;
3. for(int i = 0; i <= n3; i++){
4. res = mult(res, i); // multiplication of res and i.
5. }
6. return res;
7. }We can perform a lot of interesting operations using an AST. In Task 1, you learned how to identify recursive functions. In this task, you will gain some hands-on experience on automatic code formatting by analyzing an AST.
Suppose you have a function call in your code such as the following:
foo( 1,
2 ,
3 , 5)Then, you will need to format it as:
foo (1, 2, 3, 5)More specifically, you will need to format the call expression as <callee><space>(<arg1>,<space><arg2>,<space>...,<space><argn>) for all n arguments. The <space> represents a single space character ' '. Note that the callee and/or arguments of a function may also be function calls themselves, and you will need to figure out how to reformat those nested function calls whenever they arise.
The conceptual AST structure of a CallExpr (Function Call Expression) node in an AST looks like this:
The Callee and any of the arguments (i.e., arg1, arg2, etc.) can also be CallExpr nodes. Take a look at the next examples, which illustrate (and explain) this construction.
Note that the following code is very difficult to read:
1. int bar(int k){
2. foo(
3. bar ( k
4. ),
5. 1 );
6. return 0;
7. }However, when you run your reformatter tool, the function call to foo at line 2 should be formatted as:
foo (bar (k), 1)Now, it is definitely much easier to understand the function call!
Explanation: in the function call at line 2, the callee is a function named foo, which takes in two arguments. One of the arguments (the first one) is also a function call that invokes bar with the argument k. The function call to bar is formatted as bar (k), which contributes to the original foo function call's formatting. Hence, the whole function call is reformatted as foo (bar (k), 1).
Here is another example:
1. typedef int (*FuncPtr)(int, int);
2.
3. int addNum(int a, int b) {
4. return a + b;
5. }
6.
7. int mulNum(int a, int b) {
8. return a * b;
9. }
10.
11. FuncPtr getFunc(int op) {
12. return op == 1 ? &addNum :
13. op == 2 ? &mulNum :
14. (FuncPtr)0;
15. }
16.
17. int main() {
18. int ret = getFunc(
19. 1+0 )( 5 , 6 );
20. return 0;
21. }The reformatted code that you should generate is:
getFunc (1+0) (5, 6)Explanation: in line 18 of the above code, there is a function call. It is slightly more complicated than the one in the previous example.
Here, the Callee is not a function name; rather, it is another function call (a CallExpr node) to getFunc, which takes in one argument. Thus, we reformat the Callee to getFunc (1+0), and we finally get the formatted output getFunc (1+0) (5, 6).
To implement the above two tasks, you will build a Clang tool that uses LLVM/Clang's RecursiveASTVisitor API. We have provided all the setup code to get started. However, we strongly recommend that you go over the API documentation of Clang tooling and AST visitors to understand the basic workflow.
- Create a folder named
clang-hw2under$LLVM_HOME/clang/tools. - Copy the ClangHw2.cpp, CMakeLists.txt, hw2_util.h, and hw2_util.cpp files into
$LLVM_HOME/clang/tools/clang-hw2. - Edit the
$LLVM_HOME/clang/tools/CmakeLists.txtfile, and add this line:add_clang_subdirectory(clang-hw2). - Now, go to
$LLVM_HOME/build, and runmake. When the build has successfully finished, it will generate a binary file namedclang-hw2in$LLVM_HOME/build/bin. - Finally, run the generated binary using the following command:
$LLVM_HOME/build/bin/clang-hw2 examples/gcd.c --
The FunctionVisitor class is a recursive AST visitor, which implements three visitors for two different types of AST nodes. The VisitForStmt is called when Clang's ASTVisitor encounters a ForStmt type of AST node. You DO NOT have to do anything with this function; we are providing it to give you a head start with ASTVisitor. The VisitFunctionDecl function is called when a FunctionDecl (function declaration) node is encountered.
Here are some other notes about the tasks:
We implemented VisitFunctionDecl, which calls the helper function isRecursiveFunction and decides whether that function is direct recursive or not. All you have to do is implement this isRecursiveFunction function.
You may consider the following constraint for Task 1:
- We will only test C code inputs. You DO NOT need to handle function calls in C++ or C++-specific functionality (including operator overloading or user-defined literals, etc.).
When you have fully implemented the first task and have run the tool with gcd.c, you will see the following output:
gcd_recursive - recursive
From the VisitFunctionDecl function, we call analyzeCallExpressionReformat to perform a depth-first search (DFS) on the AST. While performing DFS, if we encounter any CallExpr node, we call the formatFunctionCall function for formatting the code of that call expression. Note that, you DON'T have to identify call expressions in a given code snippet. We have already implemented that for you in this function. All you have to do is implement the formatFunctionCall function and return the formatted code string.
You may consider the following constraints for Task 2:
- You have to reformat only the
CallExprnode. If you encounter any other node (for instance,1+0in line 19 of the second example is aBinaryOperator node), you should copy the code as is from the input source. We have provided a helper functiongetSourceto copy the input code corresponding to a node. - The callee or arguments of a function call will be either a pure function call or a pure non-function call, i.e., there will not be a mixture of functions and non-functions involved in binary expressions, conditional expressions, etc. As an example, we will NOT test the following case:
foo(bar(3) + 1, 9 + bar(6))- We will only test C code inputs. You DO NOT need to handle function calls in C++ or C++-specific functionality (including operator overloading or user-defined literals, etc.).
- This problem may look like a simple character "parsing and formatting" problem, but you MUST use the template code we provided. Please DO NOT change function prototypes of any of the functions we have written.
We have also provided some other helper functions.
Please follow these steps for submission:
- Copy the completed
ClangHw2.cppfile from your$LLVM_HOME/clang/tools/clang-hw2/directory to the project'ssrcfolder. - Complete the write-up in outputs/AST.txt.
- Commit your code.
- Push the code to the master branch.
If you have any questions about this programming assignment, please post them in the Piazza forum for the course, and an instructor will reply to them as soon as possible. Any updates to the assignment itself will be available in Piazza.
This assignment belongs to Columbia University. It may be freely used for educational purposes.

