Research work completed during a Summer Undergraduate Research Fellowship (SURF) at Cornell University, under the supervision of Professor Owolabi Legunsen from the Software Engineering Lab.
- Background
- Repository Structure
- rv-warmup — Learning the RV Framework
- inline-coevolution — Main Research Project
- Key Achievements
- Collaborators
- References
Runtime Verification (RV) is a lightweight formal verification technique that monitors software systems during execution to check whether their behavior conforms to formally specified properties. Unlike static analysis, which reasons about code without executing it, or traditional testing, which checks specific inputs, runtime verification observes the actual execution trace and validates it against formal specifications in real time.
In the Java ecosystem, a prominent RV framework is JavaMOP (Monitoring-Oriented Programming), which allows developers to define formal specifications as regular expressions, context-free grammars, or finite-state machines. These specifications capture API usage contracts — for example, "an Iterator should always call hasNext() before next()" or "a Collection should be sorted before calling binarySearch()". When a running program violates one of these properties, JavaMOP raises a violation alert, enabling developers to detect bugs that might otherwise only manifest as subtle, hard-to-reproduce failures.
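Real JavaMOP specifications are written in JavaMOP's own DSL over AspectJ pointcuts; as a language-neutral illustration, the following is a minimal Python sketch of the finite-state idea behind the HasNext property (the class and method names here are hypothetical, not JavaMOP API):

```python
class HasNextMonitor:
    """Per-iterator monitor: next() is safe only after hasNext() returned True."""

    def __init__(self):
        self.safe = False       # True after observing hasNext() -> True
        self.violations = 0     # count of unguarded next() calls

    def on_has_next(self, result: bool):
        self.safe = result

    def on_next(self):
        if not self.safe:
            self.violations += 1   # JavaMOP would raise a violation alert here
        self.safe = False          # each next() consumes the guarantee


monitor = HasNextMonitor()
monitor.on_has_next(True)
monitor.on_next()   # guarded: no violation
monitor.on_next()   # unguarded: violation recorded
print(monitor.violations)  # → 1
```

In JavaMOP, events like `on_has_next` and `on_next` are bound to method calls via instrumentation, so no manual calls are needed in the monitored program.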
Professor Owolabi Legunsen's research group at Cornell has contributed significantly to advancing software testing methodologies, with a particular focus on inline tests (I-Tests). Inline tests are a novel testing paradigm in which test assertions are placed directly alongside production code — at the statement level — rather than in separate test files. This approach offers several advantages:
- Fine-grained coverage: Inline tests target individual statements, enabling precise validation of specific computations.
- Co-location with code: Because the tests live next to the code they verify, they are easier to understand, maintain, and keep synchronized with evolving source code.
- Bug detection during development: When the target statement changes, an inline test can immediately flag whether the change introduces a regression.
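The idea can be sketched with a tiny stand-in helper (`itest` below is illustrative, not the actual I-Test API, which provides a richer Given/Check interface and runs checks only in a dedicated test mode):

```python
def itest(actual, expected):
    """Hypothetical stand-in for an inline-test check."""
    assert actual == expected, f"inline test failed: {actual!r} != {expected!r}"

def normalize_tag(tag: str) -> str:
    core = tag.strip().lstrip("v")                  # target statement
    itest(" v1.2.3 ".strip().lstrip("v"), "1.2.3")  # inline test, co-located
    return core
```

The inline test sits on the line right after the statement it validates, so any edit to the target statement is immediately checked against a concrete input/output pair.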
A key tool from this line of research is ExLi (Extracting Inline Tests), which automatically generates inline tests for Java programs by extracting assertions from existing unit tests. ExLi was presented at the ACM International Conference on the Foundations of Software Engineering (FSE 2024) by Yu Liu, Aditya Thimmaiah, Owolabi Legunsen, and Milos Gligoric.
```
Runtime-Verification/
├── README.md
├── rv-warmup/              # Introductory project to learn Runtime Verification
│   ├── toy-app/            # A toy Maven project used to practice RV with JavaMOP
│   ├── inspection.txt      # Manual inspection of RV violations across open-source projects
│   ├── violation-counts-*  # Violation count summaries for jsoup, JSqlParser, jetty-util, commons-compress
│   └── rv-warmup.pdf       # Reference materials and tasks
├── inline-coevolution/     # Main research project on inline test co-evolution
│   ├── coEvoTask_Sebastian/  # Sebastian's pipeline: automated inline test co-evolution
│   ├── coEvoTask_Nan/        # Nan's pipeline: target statement change detection
│   ├── I-Test/               # I-Test framework integration files
│   ├── paper/                # LaTeX source for the research paper
│   ├── docs/                 # Supplementary documentation (ExLi appendix)
│   ├── changed-result.csv    # Dataset of target statements with version changes
│   ├── changed-result.json   # JSON format of the dataset
│   └── ideas.txt / TODO.txt  # Research notes and task tracking
└── papers/                 # Related research papers
```
This sub-project served as an introduction to the research methodology and tooling. The goal was to become familiar with JavaMOP and the runtime verification workflow by applying it to real open-source Java projects.
A small Maven-based Java application (rv-warmup/toy-app/) was used to practice:
- Compiling the project with Maven.
- Running tests instrumented with JavaMOP monitoring agents.
- Analyzing violation reports generated by the RV framework.
Build logs and violation summaries are stored in the toy-compile.txt, toy-test.txt, toy-rv.txt, and violation-counts files.
After learning the tooling, violations were inspected across four real open-source Java projects:
| Project | Violation Counts File |
|---|---|
| jsoup | violation-counts-jsoup |
| JSqlParser | violation-counts-JSqlParser |
| Jetty Util | violation-counts-jetty-util |
| Commons Compress | violation-counts-commons-compress |
Each violation was manually classified into one of three categories:
- TrueBug — A genuine violation indicating a real defect or code smell.
- FalseAlarm — A violation that does not correspond to an actual bug (e.g., safe usage patterns the spec cannot capture).
- HardToInspect — A violation where the correctness is ambiguous without deeper analysis.
The detailed inspection results and reasoning are documented in rv-warmup/inspection.txt.
This sub-project is the primary research contribution: it investigates how inline tests co-evolve with source code and evaluates their efficacy in catching bugs during software development.
Traditional unit tests are maintained in separate test files, which can become desynchronized from the production code they validate. Inline tests, placed directly next to target statements in production code, offer a tighter coupling. But a key question remains:
> Can inline tests effectively detect breakages introduced by code changes as a project evolves over time?
This project systematically answers this question by simulating the evolution of inline tests across the Git history of open-source Java projects.
The research followed a four-stage pipeline applied to 30+ open-source Java projects:
- Roll back in time: Navigate to an earlier commit in the project's Git history where a known target statement exists.
- Generate inline tests: At that historical commit, use the ExLi tool to automatically generate inline tests from existing unit tests, targeting specific statements.
- Roll forward through history: Advance through subsequent commits using Git patches and cherry-picking, resolving merge conflicts to keep the inline tests present in the codebase.
- Evaluate at each commit: At every commit along the forward path, run the inline tests and record whether they pass, fail, or encounter compilation/parsing errors.
This process simulates what would have happened if inline tests had been introduced at an earlier point in the project's development — effectively measuring their ability to catch regressions as the code evolved.
```
[Historical Commit]           [Intermediate Commits]            [Latest Commit]
     │                                   │                             │
┌────┴────┐                        ┌─────┴─────┐                 ┌─────┴─────┐
│ Generate│      Roll Forward      │ Run Tests │  Roll Forward   │ Run Tests │
│ I-Tests │     ─────────────►     │ & Record  │ ──────────────► │ & Record  │
│ (ExLi)  │     (patch/merge)      │  Results  │  (patch/merge)  │  Results  │
└─────────┘                        └───────────┘                 └───────────┘
```
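The roll-forward stages (3 and 4) can be sketched as a loop over commit SHAs. This is a simplified sketch, not the actual pipeline: the git commands are real, but the patch path and the caller-supplied `run_inline_tests` hook are assumptions.

```python
import subprocess

def git(repo, *args):
    """Run a git command in `repo`, capturing output instead of raising."""
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True)

def roll_forward(repo, shas, itest_patch, run_inline_tests):
    """Walk forward through `shas`, re-applying the inline-test patch and
    running the caller-supplied test hook at each commit."""
    results = {}
    for sha in shas:
        git(repo, "checkout", sha)                 # stage 3: advance in history
        if git(repo, "apply", itest_patch).returncode != 0:
            results[sha] = "merge_conflict"        # patch needs manual resolution
            continue
        results[sha] = run_inline_tests(repo)      # stage 4: pass/fail/...
        git(repo, "checkout", "--", ".")           # drop the patch before moving on
    return results
```

In the real pipeline, conflicting patches were resolved by hand rather than skipped, and cherry-picking was used alongside plain patch application.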
Contains the main automation pipeline (run-itest.py) which:
- Clones target projects and checks out specific commits.
- Retrieves ExLi-generated serialization data (XML files).
- Iterates through the commit history, adding inline tests at the correct line numbers.
- Runs the tests using the I-Test framework (parsing, compiling, executing).
- Records results for each commit into structured CSV/JSON datasets.
- Classifies each checkpoint as `pass`, `fail`, `compile_error`, `parse_error`, or `deleted`.
Results are stored in the results/ and results-20/ directories, with summary data in data.csv.
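The recording step could look like the following sketch (the column names are illustrative, not necessarily run-itest.py's exact schema):

```python
import csv

# The five checkpoint outcomes used by the pipeline.
OUTCOMES = {"pass", "fail", "compile_error", "parse_error", "deleted"}

def record_results(path, rows):
    """Write per-commit outcomes as CSV; `rows` yields (project, sha, outcome)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["project", "sha", "outcome"])
        for project, sha, outcome in rows:
            if outcome not in OUTCOMES:
                raise ValueError(f"unknown outcome: {outcome}")
            writer.writerow([project, sha, outcome])
```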
Contains scripts for analyzing which target statements change across the Git history:
- findVersion.sh — Shell script that clones repositories, walks through Git history, and identifies commits where target statements changed.
- ProcessDiff.java — Java utility for processing Git diffs to detect statement-level changes.
- find_change_in_all.py — Filters results to identify target statements that changed at least once.
- dump2csv.py / final_merge.py — Data processing utilities to merge and format results.
- process_1st_stage_data/ — Intermediate data processing scripts for combining and analyzing first-stage outputs.
- projects-20-logs/ — Execution logs for the first 20 projects analyzed.
LaTeX source files for the academic paper documenting the findings, including:
- main.tex — Main document.
- abstract.tex / intro.tex — Paper sections.
- bib.bib — Bibliography.
- macros.tex / defs/ — Custom macros for reporting dataset statistics.
Contains the I-Test Java framework files and screenshots of successful integration runs.
The curated dataset covers:
- 30+ open-source Java projects analyzed across their Git histories.
- 200+ statement-level breaking changes identified and classified.
- Results stored in inline-coevolution/changed-result.csv with the following schema:
| Column | Description |
|---|---|
| id | Unique identifier for the target statement |
| project_name | GitHub repository (e.g., Asana/java-asana) |
| class_path | Relative path to the Java source file |
| given_sha | The initial commit where the inline test was generated |
| line_number_in_given_sha | Line number of the target statement |
| total_versions | Total number of commits analyzed |
| total_changed_versions | Number of commits where the target statement changed |
| changed_shas | Commit hashes where changes occurred |
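A consumer of the dataset could filter it with the standard library alone; the following sketch assumes the schema above (the filter predicate is just an example):

```python
import csv

def load_changed_statements(path):
    """Yield changed-result.csv rows whose target statement changed at least once."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if int(row["total_changed_versions"]) > 0:
                yield row
```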
- Studied test co-evolution across 30+ open-source Java projects, systematically analyzing how inline tests behave as codebases evolve.
- Curated a dataset of 200+ statement-level breaking changes, classified by type and impact.
- Developed an automated pipeline that simulates project Git history, generates inline tests from unit tests using ExLi, and evaluates their results at each commit.
- Implemented shell scripts and Python automation for reproducible Maven builds, merge conflict handling, and structured results logging.
- Gained deep understanding of software testing at scale and empirical software engineering research methodology.
- Sebastian Urrea — Undergraduate Research Intern (SURF)
- Nan Huang — Research Collaborator
- Professor Owolabi Legunsen — Faculty Advisor, Software Engineering Lab, Cornell University
- Yu Liu (Yuki) — PhD Student at UT Austin, collaborator on ExLi and I-Test tooling
- Milos Gligoric — Associate Professor at UT Austin, collaborator on ExLi and I-Test research
- Yu Liu, Aditya Thimmaiah, Owolabi Legunsen, and Milos Gligoric. "ExLi: An Inline-Test Generation Tool for Java." In Proceedings of the ACM International Conference on the Foundations of Software Engineering (FSE), Tool Demonstrations, 2024.
- Owolabi Legunsen, Wajih Ul Hassan, Xinyue Xu, Grigore Roşu, and Darko Marinov. "How Good Are the Specs? A Study of the Bug-Finding Effectiveness of Existing Java API Specifications." In Proceedings of the International Conference on Automated Software Engineering (ASE), 2016.
- Cornell Software Engineering Lab — https://www.cs.cornell.edu/~legunsen/