Research work completed during a Summer Undergraduate Research Fellowship (SURF) at Cornell University, under the supervision of Professor Owolabi Legunsen from the Software Engineering Lab.
- Background
- Repository Structure
- rv-warmup — Learning the RV Framework
- inline-coevolution — Main Research Project
- Key Achievements
- Collaborators
- References
Runtime Verification (RV) is a lightweight formal verification technique that monitors software systems during execution to check whether their behavior conforms to formally specified properties. Unlike static analysis, which reasons about code without executing it, or traditional testing, which checks specific inputs, runtime verification observes the actual execution trace and validates it against formal specifications in real time.
In the Java ecosystem, a prominent RV framework is JavaMOP (Monitoring-Oriented Programming), which allows developers to define formal specifications as regular expressions, context-free grammars, or finite-state machines. These specifications capture API usage contracts — for example, "an Iterator should always call hasNext() before next()" or "a Collection should be sorted before calling binarySearch()". When a running program violates one of these properties, JavaMOP raises a violation alert, enabling developers to detect bugs that might otherwise only manifest as subtle, hard-to-reproduce failures.
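Real JavaMOP specifications are written in JavaMOP's own DSL over AspectJ pointcuts; as a language-neutral illustration, the following is a minimal Python sketch of the finite-state idea behind the HasNext property (the class and method names here are hypothetical, not JavaMOP API):

```python
class HasNextMonitor:
    """Per-iterator monitor: next() is safe only after hasNext() returned True."""

    def __init__(self):
        self.safe = False       # True after observing hasNext() -> True
        self.violations = 0     # count of unguarded next() calls

    def on_has_next(self, result: bool):
        self.safe = result

    def on_next(self):
        if not self.safe:
            self.violations += 1   # JavaMOP would raise a violation alert here
        self.safe = False          # each next() consumes the guarantee


monitor = HasNextMonitor()
monitor.on_has_next(True)
monitor.on_next()   # guarded: no violation
monitor.on_next()   # unguarded: violation recorded
print(monitor.violations)  # → 1
```

In JavaMOP, events like `on_has_next` and `on_next` are bound to method calls via instrumentation, so no manual calls are needed in the monitored program.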
Professor Owolabi Legunsen's research group at Cornell has contributed significantly to advancing software testing methodologies, with a particular focus on inline tests (I-Tests). Inline tests are a novel testing paradigm in which test assertions are placed directly alongside production code — at the statement level — rather than in separate test files. This approach offers several advantages:
- Fine-grained coverage: Inline tests target individual statements, enabling precise validation of specific computations.
- Co-location with code: Because the tests live next to the code they verify, they are easier to understand, maintain, and keep synchronized with evolving source code.
- Bug detection during development: When the target statement changes, an inline test can immediately flag whether the change introduces a regression.
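The idea can be sketched with a tiny stand-in helper (`itest` below is illustrative, not the actual I-Test API, which provides a richer Given/Check interface and runs checks only in a dedicated test mode):

```python
def itest(actual, expected):
    """Hypothetical stand-in for an inline-test check."""
    assert actual == expected, f"inline test failed: {actual!r} != {expected!r}"

def normalize_tag(tag: str) -> str:
    core = tag.strip().lstrip("v")                  # target statement
    itest(" v1.2.3 ".strip().lstrip("v"), "1.2.3")  # inline test, co-located
    return core
```

The inline test sits on the line right after the statement it validates, so any edit to the target statement is immediately checked against a concrete input/output pair.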
A key tool from this line of research is ExLi (Extracting Inline Tests), which automatically generates inline tests for Java programs by extracting assertions from existing unit tests. ExLi was presented at the ACM International Conference on the Foundations of Software Engineering (FSE 2024) by Yu Liu, Aditya Thimmaiah, Owolabi Legunsen, and Milos Gligoric.
```
Runtime-Verification/
├── README.md
├── rv-warmup/              # Introductory project to learn Runtime Verification
│   ├── toy-app/            # A toy Maven project used to practice RV with JavaMOP
│   ├── inspection.txt      # Manual inspection of RV violations across open-source projects
│   ├── violation-counts-*  # Violation count summaries for jsoup, JSqlParser, jetty-util, commons-compress
│   └── rv-warmup.pdf       # Reference materials and tasks
├── inline-coevolution/     # Main research project on inline test co-evolution
│   ├── coEvoTask_Sebastian/  # Sebastian's pipeline: automated inline test co-evolution
│   ├── coEvoTask_Nan/        # Nan's pipeline: target statement change detection
│   ├── I-Test/               # I-Test framework integration files
│   ├── paper/                # LaTeX source for the research paper
│   ├── docs/                 # Supplementary documentation (ExLi appendix)
│   ├── changed-result.csv    # Dataset of target statements with version changes
│   ├── changed-result.json   # JSON format of the dataset
│   └── ideas.txt / TODO.txt  # Research notes and task tracking
└── papers/                 # Related research papers
```
This sub-project served as an introduction to the research methodology and tooling. The goal was to become familiar with JavaMOP and the runtime verification workflow by applying it to real open-source Java projects.
A small Maven-based Java application (rv-warmup/toy-app/) was used to practice:
- Compiling the project with Maven.
- Running tests instrumented with JavaMOP monitoring agents.
- Analyzing violation reports generated by the RV framework.
Build logs and violation summaries are stored in the toy-compile.txt, toy-test.txt, toy-rv.txt, and violation-counts files.
After learning the tooling, violations were inspected across four real open-source Java projects:
| Project | Violation Counts File |
|---|---|
| jsoup | violation-counts-jsoup |
| JSqlParser | violation-counts-JSqlParser |
| Jetty Util | violation-counts-jetty-util |
| Commons Compress | violation-counts-commons-compress |
Each violation was manually classified into one of three categories:
- TrueBug — A genuine violation indicating a real defect or code smell.
- FalseAlarm — A violation that does not correspond to an actual bug (e.g., safe usage patterns the spec cannot capture).
- HardToInspect — A violation where the correctness is ambiguous without deeper analysis.
The detailed inspection results and reasoning are documented in rv-warmup/inspection.txt.
This sub-project is the primary research contribution: it investigates how inline tests co-evolve with source code and evaluates their efficacy in catching bugs during software development.
Traditional unit tests are maintained in separate test files, which can become desynchronized from the production code they validate. Inline tests, placed directly next to target statements in production code, offer a tighter coupling. But a key question remains:
> Can inline tests effectively detect breakages introduced by code changes as a project evolves over time?
This project systematically answers this question by simulating the evolution of inline tests across the Git history of open-source Java projects.
The research followed a four-stage pipeline applied to 30+ open-source Java projects:
- Roll back in time: Navigate to an earlier commit in the project's Git history where a known target statement exists.
- Generate inline tests: At that historical commit, use the ExLi tool to automatically generate inline tests from existing unit tests, targeting specific statements.
- Roll forward through history: Advance through subsequent commits using Git patches and cherry-picking, resolving merge conflicts to keep the inline tests present in the codebase.
- Evaluate at each commit: At every commit along the forward path, run the inline tests and record whether they pass, fail, or encounter compilation/parsing errors.
This process simulates what would have happened if inline tests had been introduced at an earlier point in the project's development — effectively measuring their ability to catch regressions as the code evolved.
```
[Historical Commit]           [Intermediate Commits]            [Latest Commit]
     │                                   │                             │
┌────┴────┐                        ┌─────┴─────┐                 ┌─────┴─────┐
│ Generate│      Roll Forward      │ Run Tests │  Roll Forward   │ Run Tests │
│ I-Tests │     ─────────────►     │ & Record  │ ──────────────► │ & Record  │
│ (ExLi)  │     (patch/merge)      │  Results  │  (patch/merge)  │  Results  │
└─────────┘                        └───────────┘                 └───────────┘
```
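The roll-forward stages (3 and 4) can be sketched as a loop over commit SHAs. This is a simplified sketch, not the actual pipeline: the git commands are real, but the patch path and the caller-supplied `run_inline_tests` hook are assumptions.

```python
import subprocess

def git(repo, *args):
    """Run a git command in `repo`, capturing output instead of raising."""
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True)

def roll_forward(repo, shas, itest_patch, run_inline_tests):
    """Walk forward through `shas`, re-applying the inline-test patch and
    running the caller-supplied test hook at each commit."""
    results = {}
    for sha in shas:
        git(repo, "checkout", sha)                 # stage 3: advance in history
        if git(repo, "apply", itest_patch).returncode != 0:
            results[sha] = "merge_conflict"        # patch needs manual resolution
            continue
        results[sha] = run_inline_tests(repo)      # stage 4: pass/fail/...
        git(repo, "checkout", "--", ".")           # drop the patch before moving on
    return results
```

In the real pipeline, conflicting patches were resolved by hand rather than skipped, and cherry-picking was used alongside plain patch application.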
Contains the main automation pipeline (run-itest.py) which:
- Clones target projects and checks out specific commits.
- Retrieves ExLi-generated serialization data (XML files).
- Iterates through the commit history, adding inline tests at the correct line numbers.
- Runs the tests using the I-Test framework (parsing, compiling, executing).
- Records results for each commit into structured CSV/JSON datasets.
- Classifies each checkpoint as `pass`, `fail`, `compile_error`, `parse_error`, or `deleted`.
Results are stored in the results/ and results-20/ directories, with summary data in data.csv.
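The recording step could look like the following sketch (the column names are illustrative, not necessarily run-itest.py's exact schema):

```python
import csv

# The five checkpoint outcomes used by the pipeline.
OUTCOMES = {"pass", "fail", "compile_error", "parse_error", "deleted"}

def record_results(path, rows):
    """Write per-commit outcomes as CSV; `rows` yields (project, sha, outcome)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["project", "sha", "outcome"])
        for project, sha, outcome in rows:
            if outcome not in OUTCOMES:
                raise ValueError(f"unknown outcome: {outcome}")
            writer.writerow([project, sha, outcome])
```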
Contains scripts for analyzing which target statements change across the Git history:
- findVersion.sh — Shell script that clones repositories, walks through Git history, and identifies commits where target statements changed.
- ProcessDiff.java — Java utility for processing Git diffs to detect statement-level changes.
- find_change_in_all.py — Filters results to identify target statements that changed at least once.
- dump2csv.py / final_merge.py — Data processing utilities to merge and format results.
- process_1st_stage_data/ — Intermediate data processing scripts for combining and analyzing first-stage outputs.
- projects-20-logs/ — Execution logs for the first 20 projects analyzed.
LaTeX source files for the academic paper documenting the findings, including:
- main.tex — Main document.
- abstract.tex / intro.tex — Paper sections.
- bib.bib — Bibliography.
- macros.tex / defs/ — Custom macros for reporting dataset statistics.
Contains the I-Test Java framework files and screenshots of successful integration runs.
The curated dataset covers:
- 30+ open-source Java projects analyzed across their Git histories.
- 200+ statement-level breaking changes identified and classified.
- Results stored in inline-coevolution/changed-result.csv with the following schema:
| Column | Description |
|---|---|
| id | Unique identifier for the target statement |
| project_name | GitHub repository (e.g., Asana/java-asana) |
| class_path | Relative path to the Java source file |
| given_sha | The initial commit where the inline test was generated |
| line_number_in_given_sha | Line number of the target statement |
| total_versions | Total number of commits analyzed |
| total_changed_versions | Number of commits where the target statement changed |
| changed_shas | Commit hashes where changes occurred |
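A consumer of the dataset could filter it with the standard library alone; the following sketch assumes the schema above (the filter predicate is just an example):

```python
import csv

def load_changed_statements(path):
    """Yield changed-result.csv rows whose target statement changed at least once."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if int(row["total_changed_versions"]) > 0:
                yield row
```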
- Studied test co-evolution across 30+ open-source Java projects, systematically analyzing how inline tests behave as codebases evolve.
- Curated a dataset of 200+ statement-level breaking changes, classified by type and impact.
- Developed an automated pipeline that simulates project Git history, generates inline tests from unit tests using ExLi, and evaluates their results at each commit.
- Implemented shell scripts and Python automation for reproducible Maven builds, merge conflict handling, and structured results logging.
- Gained deep understanding of software testing at scale and empirical software engineering research methodology.
- Sebastian Urrea — Undergraduate Research Intern (SURF)
- Nan Huang — Research Collaborator
- Professor Owolabi Legunsen — Faculty Advisor, Software Engineering Lab, Cornell University
- Yu Liu (Yuki) — PhD Student at UT Austin, collaborator on ExLi and I-Test tooling
- Milos Gligoric — Associate Professor at UT Austin, collaborator on ExLi and I-Test research
- Yu Liu, Aditya Thimmaiah, Owolabi Legunsen, and Milos Gligoric. "ExLi: An Inline-Test Generation Tool for Java." In Proceedings of the ACM International Conference on the Foundations of Software Engineering (FSE), Tool Demonstrations, 2024.
- Owolabi Legunsen, Wajih Ul Hassan, Xinyue Xu, Grigore Roşu, and Darko Marinov. "How Good Are the Specs? A Study of the Bug-Finding Effectiveness of Existing Java API Specifications." In Proceedings of the International Conference on Automated Software Engineering (ASE), 2016.
- Cornell Software Engineering Lab — https://www.cs.cornell.edu/~legunsen/