SCANL's srcml_identifier_getter_tool reads a srcML archive and writes every identifier in that archive to standard output.
Please clone recursively, since the project currently uses submodules:
git clone --recursive git@github.com:SCANL/srcml_identifier_getter_tool.git
- To build and run the tool, you need libxml2-dev and cmake installed. Install them using apt-get install libxml2-dev and apt-get install cmake.
- mkdir build in the root directory
- cd build
- cmake ..
- make
- ./bin/grabidentifiers (run with no arguments, this will give you a list of arguments)
You may need to run git submodule init and git submodule update --remote --recursive in the srcSAXEventDispatcher, srcSAX, and popl folders.
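The build steps above can be collected into one script. This is a minimal sketch, not part of the repository: it assumes a Debian-style system with apt-get, that it is run from the repository root, and that you have the privileges apt-get needs. The run helper, the build_grabidentifiers function, and the DRY_RUN switch are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Same commands as the build steps above, in the same order.
# mkdir -p is used so that an existing build directory is not an error.
build_grabidentifiers() {
  run apt-get install libxml2-dev
  run apt-get install cmake
  run mkdir -p build
  run cd build
  run cmake ..
  run make
}
```

Calling DRY_RUN=1 build_grabidentifiers previews the commands without running them; calling build_grabidentifiers on its own runs them.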
The examples below all assume that the input file (e.g., telegram.java.xml) is a srcML archive. Refer to the srcML instructions after the examples if you need srcML.
Get all identifiers from a srcML archive:
./build/bin/grabidentifiers telegram.java.xml
Use a sample size of 5 and a random seed of 207085357:
./build/bin/grabidentifiers -s5 -r207085357 telegram.java.xml_position
Use a sample size of 5 and let the seed be generated randomly:
./build/bin/grabidentifiers -s5 telegram.java.xml_position
Use a sample size of 14, let the seed be generated randomly, and specify which contexts you want to draw the sample from.
This draws a balanced sample from the contexts provided via -c. If the sample cannot be distributed evenly among the provided
contexts, the tool adds (sample size % number of contexts) to the sample size to even it out:
./build/bin/grabidentifiers -s14 -cPARAMETER,FUNCTION,DECLARATION telegram.java.xml_position
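The invocations above differ only in which flags are present, so a small wrapper can assemble them. This is a minimal sketch that uses only the -s, -r, and -c flags shown above; the grab_sample function and the GRAB variable are hypothetical names, and the default path assumes the build directory used in this README.

```shell
#!/bin/sh
# Path to the built tool; override GRAB if yours lives elsewhere (assumption).
GRAB="${GRAB:-./build/bin/grabidentifiers}"

# usage: grab_sample ARCHIVE SIZE [SEED] [CONTEXTS]
# An empty SEED leaves the seed to the tool; an empty CONTEXTS omits -c.
grab_sample() {
  archive="$1"
  size="$2"
  seed="${3:-}"
  contexts="${4:-}"

  # Rebuild the argument list from the pieces that were actually given.
  set -- "-s${size}"
  if [ -n "$seed" ]; then set -- "$@" "-r${seed}"; fi
  if [ -n "$contexts" ]; then set -- "$@" "-c${contexts}"; fi

  "$GRAB" "$@" "$archive"
}
```

With this in place, grab_sample telegram.java.xml_position 5 207085357 corresponds to the seeded example, and grab_sample telegram.java.xml_position 14 "" PARAMETER,FUNCTION,DECLARATION corresponds to the context example.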
You can get srcML from https://www.srcml.org/
Once you have downloaded and installed it (or you can just use the executable), run srcml --position <name_of_file_or_directory_containing_code> and it will create a srcML archive from that directory or code file. You can redirect its output into a file using >, or use srcml --position -o <FILE> to write to a specific file. If you need help, use srcml --help.
You don't have to use --position, but the tool does collect line numbers; the line number will be 0 if position is not set.
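The sketch below creates an archive with position information and offers a quick check for it before the tool is run. It is illustrative only: make_archive, has_position, and the SRCML variable are hypothetical names, and the check assumes that srcML adds an xmlns:pos namespace declaration to archives created with --position, which you should verify against your srcML version.

```shell
#!/bin/sh
# Name of the srcML executable; override SRCML if it is not on PATH (assumption).
SRCML="${SRCML:-srcml}"

# usage: make_archive SOURCE OUTPUT
# Writes a srcML archive of SOURCE (a file or directory) to OUTPUT.
make_archive() {
  "$SRCML" --position "$1" -o "$2"
}

# usage: has_position ARCHIVE
# Succeeds if the archive appears to carry position information.
# Assumption: --position adds an xmlns:pos declaration to the archive.
has_position() {
  grep -q 'xmlns:pos=' "$1"
}
```

For example, make_archive <name_of_file_or_directory_containing_code> telegram.java.xml followed by has_position telegram.java.xml tells you whether the line numbers reported by grabidentifiers are likely to be meaningful rather than 0.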
Not tested on Windows yet :c -- works on Ubuntu, probably most Linux distros, and probably macOS.