Origin/60 airflow benchmarks on gravitas by Jan-Willem · Pull Request #61 · casangi/RADPS

Jan-Willem · 2025-07-09T17:56:49Z

The Airflow benchmark now runs on the k3s cluster but is still limited to 1000 tasks. The benchmarking plots have also been updated.

Steps followed to deploy airflow on k3s:

kubectl create namespace jsteeb-airflow

helm upgrade --install airflow apache-airflow/airflow --namespace jsteeb-airflow \
    --set dags.persistence.enabled=false \
    --set dags.gitSync.enabled=true \
    --set dags.gitSync.repo=https://github.com/casangi/RADPS.git \
    --set dags.gitSync.branch=60-airflow-benchmarks-on-gravitas \
    --set dags.gitSync.subPath=benchmarking

kubectl expose deployment airflow-api-server --port 8289 --target-port 8080 --name=airflow-load-balancer --namespace=jsteeb-airflow --type=LoadBalancer

Use a different namespace if you want to try it out. The GitHub branch can be changed with --set dags.gitSync.branch.
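
For anyone trying this in their own namespace, the helm step above can be parameterized. The following is only a sketch: the helper name `helm_install_command` and the example values `my-airflow` and `main` are placeholders that do not appear in this PR, while the chart name and `--set` keys are copied from the command above.

```python
import shlex


def helm_install_command(namespace, branch,
                         repo="https://github.com/casangi/RADPS.git",
                         sub_path="benchmarking"):
    """Return the helm command from the PR description as an argument list.

    Only the namespace and the git-sync branch are meant to vary; the
    remaining values mirror the command quoted in the description.
    """
    return [
        "helm", "upgrade", "--install", "airflow", "apache-airflow/airflow",
        "--namespace", namespace,
        "--set", "dags.persistence.enabled=false",
        "--set", "dags.gitSync.enabled=true",
        "--set", f"dags.gitSync.repo={repo}",
        "--set", f"dags.gitSync.branch={branch}",
        "--set", f"dags.gitSync.subPath={sub_path}",
    ]


# Placeholder namespace and branch, for illustration only.
cmd = helm_install_command("my-airflow", "main")
print(shlex.join(cmd))  # printable form; pass `cmd` to subprocess.run to execute
```

Building the command as a list avoids shell quoting issues if it is later handed to `subprocess.run`.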

Copilot

Pull Request Overview

This PR extends the benchmarking suite to include Airflow runs on a k3s cluster and updates plotting logic.

- Adds an Airflow benchmark entry to the results CSV.
- Introduces `plot_overhead_as_function_of_n_tasks_v2` for improved task-count plots.
- Adjusts login handling and cleans up Prefect/Airflow benchmark scripts.

Reviewed Changes

Copilot reviewed 5 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| benchmarking/results/benchmark_return_results.csv | Added a new Airflow benchmark row |
| benchmarking/plot_results_benchmark.py | Added `plot_overhead_as_function_of_n_tasks_v2` and commented out old calls |
| benchmarking/benchmark_utils.py | Hardcoded developer name instead of using `os.getlogin()` |
| benchmarking/benchmark_prefect_return.py | Left commented-out ThreadPoolTaskRunner code |
| benchmarking/benchmark_airflow_return.py | Fixed XCom limits comment, changed DB type, run ID logic, and output path |

Comments suppressed due to low confidence (3)

benchmarking/benchmark_prefect_return.py:120

[nitpick] The large commented-out block for ThreadPoolTaskRunner should be removed or moved to version control history to clean up the function.

benchmarking/plot_results_benchmark.py:127

[nitpick] The x-axis label mentions seconds (s), but this plot shows the number of tasks. Consider removing (s) to avoid confusion.

        ax.set_xlabel("Number of Tasks (s)")

benchmarking/benchmark_airflow_return.py:114

[nitpick] The prefix 'bairflow_benchmark' seems to include an extra 'b'. Rename to 'airflow_benchmark' for clarity and consistency.

        runid = "bairflow_benchmark" + '_' + modtime #os.getlogin() fails on k3s

Copilot · 2025-07-09T17:58:34Z

+    if skiprows:
+        exclude_headerrow = ''
+        if 0 in skiprows:
+            skiprows = skiprows.remove(0)


.remove() mutates the list and returns None, so skiprows becomes None. Instead, call skiprows.remove(0) on its own or use a list comprehension to filter out 0.

Suggested change

-            skiprows = skiprows.remove(0)
+            skiprows = [row for row in skiprows if row != 0]
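
A small standalone illustration of the point above, not taken from the PR: the list values and the helper name `drop_header_row` are made up for the example.

```python
# list.remove() changes the list in place and returns None,
# so assigning its result back loses the list.
skiprows = [0, 3, 5]
returned_value = skiprows.remove(0)
print(returned_value)  # None
print(skiprows)        # the original list, now without 0


def drop_header_row(skiprows):
    """Return a new list without row 0, leaving the caller's list untouched."""
    return [row for row in skiprows if row != 0]


original = [0, 3, 5]
filtered = drop_header_row(original)
```

The list comprehension also behaves sensibly when 0 is absent, whereas `remove(0)` would raise `ValueError` without the preceding membership check.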

Copilot · 2025-07-09T17:58:34Z

+        # if workflow_orchestration_framework == "prefect":
+        #     df = df[df['return_size_mb'] == 10.0]
+        if workflow_orchestration_framework == "prefect":


np.unique returns an array; comparing it directly to a string will not work as intended. Use framework = np.unique(...)[0] or check membership ("prefect" in ...).

Suggested change

-        # if workflow_orchestration_framework == "prefect":
-        #     df = df[df['return_size_mb'] == 10.0]
-        if workflow_orchestration_framework == "prefect":
+        # if "prefect" in workflow_orchestration_framework:
+        #     df = df[df['return_size_mb'] == 10.0]
+        if "prefect" in workflow_orchestration_framework:
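
To make the concern concrete, here is a sketch that assumes the framework column is passed through `np.unique` as the comment describes. The column values are invented; the hunk above does not show how `workflow_orchestration_framework` is actually built.

```python
import numpy as np

# Made-up column values standing in for the benchmark CSV column.
frameworks = np.unique(["prefect", "airflow", "prefect"])

# Comparing the array to a string gives an elementwise boolean array,
# not a single True/False.
elementwise = frameworks == "prefect"

try:
    if elementwise:  # ambiguous when the array has more than one element
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

# Membership test: is "prefect" among the frameworks present?
has_prefect = "prefect" in frameworks

# Stricter check: is "prefect" the only framework present?
only_prefect = frameworks.size == 1 and frameworks[0] == "prefect"
```

Which of the two checks is appropriate depends on whether the results file is expected to hold a single framework per call, which the PR does not state.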

Copilot · 2025-07-09T17:58:34Z

            "workflow_orchestration_framework": workflow_orchestration_framework,
            "date_and_time": datetime.now().isoformat(),
-            "developer": os.getlogin(),
+            "developer": "jsteeb", #os.getlogin(),


[nitpick] Hardcoding the developer name reduces portability. Consider using getpass.getuser() or falling back to os.getlogin() for different environments.

Suggested change

-            "developer": "jsteeb", #os.getlogin(),
+            "developer": getpass.getuser(),

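One possible shape for a portable lookup, offered as a sketch rather than the project's code. The environment variable `BENCHMARK_DEVELOPER` and the helper name `current_developer` are hypothetical; `getpass.getuser()` is the standard-library call suggested above.

```python
import getpass
import os


def current_developer(default="unknown"):
    """Best-effort developer name that also works inside a container.

    Order: explicit override (hypothetical BENCHMARK_DEVELOPER variable),
    then getpass.getuser(), then a fixed default.
    """
    override = os.environ.get("BENCHMARK_DEVELOPER")
    if override:
        return override
    try:
        # getuser() reads LOGNAME, USER, LNAME, USERNAME, then the password database.
        return getpass.getuser()
    except (KeyError, OSError, ImportError):
        # No matching environment variable or passwd entry, e.g. an arbitrary UID in a pod.
        return default
```

An explicit override keeps the results CSV attributable when the job runs under a service account on the cluster.
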
Copilot · 2025-07-09T17:58:35Z

    #db_url = settings.SQL_ALCHEMY_CONN
    #db_type = make_url(db_url).drivername
-    db_type = 'sqlite'  
+    db_type = 'postgress'  


Typo in database type: 'postgress' should be 'postgres'. This typo may prevent correct backend detection.

Suggested change

-    db_type = 'postgress'
+    db_type = 'postgres'
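
An alternative to hardcoding the string is to derive it from the connection URL, which is what the commented-out lines in the hunk did with SQLAlchemy's `make_url`. The sketch below uses only the standard library; the helper name `backend_from_url` and the example URLs are assumptions. Note that SQLAlchemy connection URLs normally use the scheme `postgresql`, so the exact string the benchmark code expects should be checked against its own comparisons, which are not shown here.

```python
from urllib.parse import urlsplit


def backend_from_url(db_url):
    """Return the backend part of a SQLAlchemy-style URL scheme.

    "postgresql+psycopg2://..." -> "postgresql"
    "sqlite:///..."             -> "sqlite"
    """
    scheme = urlsplit(db_url).scheme  # includes the driver suffix, if any
    return scheme.split("+", 1)[0]    # drop the "+driver" suffix


# Example URLs are placeholders, not the cluster's real connection strings.
pg_backend = backend_from_url("postgresql+psycopg2://user:pw@host:5432/airflow")
sqlite_backend = backend_from_url("sqlite:////tmp/airflow.db")
```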

Copilot · 2025-07-09T17:58:35Z

        )

-        save_timing_results(result_dict, filename = 'airflow_benchmark_results.csv' )   
+        save_timing_results(result_dict, filename = '~/airflow_benchmark_results.csv' )   


A leading ~ in filename won't be expanded by to_csv. Call os.path.expanduser('~/airflow_benchmark_results.csv') so that the file is written to the user's home directory.

Suggested change

-        save_timing_results(result_dict, filename = '~/airflow_benchmark_results.csv' )
+        save_timing_results(result_dict, filename=os.path.expanduser('~/airflow_benchmark_results.csv'))
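
A minimal sketch of resolving the path before it reaches `save_timing_results`. The helper name `resolve_output_path` is hypothetical, and how `save_timing_results` writes the file is not shown in this hunk.

```python
import os


def resolve_output_path(filename):
    """Expand a leading ~ and return an absolute path for the results file."""
    return os.path.abspath(os.path.expanduser(filename))


results_file = resolve_output_path("~/airflow_benchmark_results.csv")
print(results_file)
```

Resolving to an absolute path also makes it easier to find the file afterwards when the task runs in a worker pod with an unexpected working directory.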

Jan-Willem and others added 11 commits July 8, 2025 13:24

Test workflow for k3s.

c8a1943

Add log

77dd577

1

ed38b4d

remove get run id

872e51e

Add run_id back

f93add1

Remove os.getlogin

8b36851

Try saving results to home dir.

b81db9b

Increase tasks to 1000.

fe4f3d4

Increase tasks to 2000.

7ea2fc2

Update plost with latest benchmark runs.

2350040

Remove v2 benchmark_airflow_return and update benchmark_airflow_return.

96c35ad

Jan-Willem requested a review from Copilot July 9, 2025 17:56

Copilot AI reviewed Jul 9, 2025

Jan-Willem requested review from amcnicho and taktsutsumi July 9, 2025 19:21

Jan-Willem wants to merge 11 commits into main from origin/60-airflow-benchmarks-on-gravitas
