README.md
43 additions & 14 deletions
@@ -1,11 +1,19 @@
1
1
# SCALAR Part-of-speech tagger
2
2
This is the official release of the SCALAR Part-of-speech tagger.
3
3
4
-
**NOTE**
5
-
There is a fork of SCALAR which was designed to handle parallel http requests and cache SCALAR's output to increase its speed. You can find this version here: https://github.com/brandonscholten/scanl_tagger. These will be combined into a single application in the *very* near future.
4
+
## Getting Started with Docker
5
+
6
+
To run the SCANL tagger in a Docker container, clone the repository and pull the latest Docker image from `srcml/scanl_tagger:latest`.
You will need `python3` installed. We will explicitly use the `python3` command below but, of course, if your environment is configured to use python3 by default, you do not need to. We have also only tested this on **Ubuntu 22** and **Ubuntu via WSL**. It most likely works in similar environments, but no guarantees.
16
+
You will need `python3.10` installed.
9
17
10
18
You'll need to install `pip` -- https://pip.pypa.io/en/stable/installation/
11
19
@@ -19,17 +27,29 @@ Finally, we require the `token` and `target` vectors from [code2vec](https://git
19
27
20
28
## Usage
21
29
22
-
```bash
23
-
python main.py -v # Display the application version.
24
-
python main.py -r # Start the server for tagging requests.
25
-
python main.py -t # Run the training set to retrain the model.
"cache selection" will save results to a separate cache if it is set to "student"
51
+
52
+
"code context" is one of:
33
53
- FUNCTION
34
54
- ATTRIBUTE
35
55
- CLASS
@@ -38,16 +58,19 @@ Where "code context" is one of:
38
58
39
59
For example:
40
60
41
-
Tag a declaration: ``http://127.0.0.1:5000/numberArray/DECLARATION``
61
+
Tag a declaration: ``http://127.0.0.1:5000/cache/numberArray/DECLARATION``
62
+
63
+
Tag a function: ``http://127.0.0.1:5000/cache/GetNumberArray/FUNCTION``
42
64
43
-
Tag a function: ``http://127.0.0.1:5000/GetNumberArray/FUNCTION``
65
+
Tag a class: ``http://127.0.0.1:5000/cache/PersonRecord/CLASS``
44
66
45
-
Tag an class: ``http://127.0.0.1:5000/PersonRecord/CLASS``
67
+
#### Note
68
+
Kebab case is not currently supported due to limitations of Spiral. An identifier sent to the tagger in kebab case is not split into words and is tagged as a single noun.
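One possible workaround is to rewrite kebab-case identifiers before sending them. The sketch below is not part of the tagger; `kebab_to_camel` is a hypothetical helper, and it assumes camelCase is an acceptable target because the example identifiers above (`numberArray`, `GetNumberArray`) use it.

```python
def kebab_to_camel(identifier):
    """Rewrite 'get-number-array' as 'getNumberArray'.

    Hypothetical pre-processing step, not part of the tagger itself.
    Identifiers without hyphens are returned unchanged.
    """
    # Drop empty pieces produced by leading, trailing, or doubled hyphens.
    parts = [part for part in identifier.split("-") if part]
    if not parts:
        return identifier
    head, tail = parts[0], parts[1:]
    # Capitalize only the first letter of each following word.
    return head + "".join(part[0].upper() + part[1:] for part in tail)
```

Note that this changes the identifier the tagger sees, so results should be mapped back to the original name by the caller.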
46
69
47
70
You will need to have a way to parse code and filter out identifier names if you want to do some on-the-fly analysis of source code. We recommend [srcML](https://www.srcml.org/). Since the actual tagger is a web server, you don't have to use srcML. You could always use other AST-based code representations, or any other method of obtaining identifier information.
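Because the tagger is a web server, any HTTP client can query it. The following is a minimal sketch using only the Python standard library. It assumes the server is running at the address used in the examples above and that the URL layout is `/<cache selection>/<identifier>/<code context>`. The helper names `build_tag_url` and `tag_identifier` are hypothetical, and the response is returned as raw text because its format is not described here.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Address taken from the examples above; change it if the server runs elsewhere.
BASE_URL = "http://127.0.0.1:5000"


def build_tag_url(identifier, context, cache="cache", base_url=BASE_URL):
    """Build the request URL (hypothetical helper).

    Assumed layout: /<cache selection>/<identifier>/<code context>.
    """
    segments = [cache, identifier, context]
    # Percent-encode each path segment so unusual characters cannot break the path.
    return base_url + "/" + "/".join(quote(segment, safe="") for segment in segments)


def tag_identifier(identifier, context, cache="cache", timeout=10):
    """Send one tagging request and return the raw response body as text."""
    url = build_tag_url(identifier, context, cache)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server started, `tag_identifier("GetNumberArray", "FUNCTION")` requests the same URL as the function example above, and passing `cache="student"` selects the separate student cache.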
48
71
49
72
## Training the tagger
50
-
You can train this tagger using the `-t` option (which will re-run the training routine). For the moment, most of this is hard-coded in, so if you want to use a different data set/different seeds, you'll need to modify the code. This is will potentially change in the future.
73
+
You can train this tagger using the `-t` option (which will re-run the training routine). For the moment, most of this is hard-coded in, so if you want to use a different data set/different seeds, you'll need to modify the code. This will potentially change in the future.
51
74
52
75
## Errors?
53
76
Please make an issue if you run into errors
@@ -63,3 +86,9 @@ The data used to train this tagger can be found in the most recent database upda
63
86
64
87
# Interested in our other work?
65
88
Find our other research [at our webpage](https://www.scanl.org/) and check out the [Identifier Name Structure Catalogue](https://github.com/SCANL/identifier_name_structure_catalogue)
89
+
90
+
# WordNet
91
+
This project uses WordNet to perform a dictionary lookup on the individual words in each identifier:
92
+
93
+
Princeton University. "About WordNet." [WordNet](https://wordnet.princeton.edu/). Princeton University. 2010.