Conversation
There was a problem hiding this comment.
Pull request overview
Adds an Argo-based mechanism to periodically refresh PostgreSQL materialized views in the Kubernetes “welearn-datastack” deployment, backed by a reusable refresh script.
Changes:
- Added a Bash script to refresh a specified PostgreSQL materialized view.
- Added an Argo
WorkflowTemplateto refresh three specific materialized views. - Added an Argo
CronWorkflowto run the refresh workflow every 3 hours.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
script/update-materialized-view |
New Bash helper to run REFRESH MATERIALIZED VIEW CONCURRENTLY using env-provided DB credentials. |
k8s/welearn-datastack/templates/updateMaterializedViews/workflow-template.yaml |
New Argo WorkflowTemplate + CronWorkflow to execute the refresh script for multiple views on a schedule. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| export DB_PORT="$PG_PORT" | ||
| export DB_PASSWORD="$PG_PASSWORD" | ||
| chmod +x /script/update-materialized-view | ||
| /script/update-materialized-view "$view_name" |
There was a problem hiding this comment.
The script passes "$view_name", but view_name is not defined as a shell variable in this source: block. Argo input parameters need to be referenced via templating (e.g. {{inputs.parameters.view_name}}) or explicitly assigned to a shell variable before use; otherwise this will call the script with an empty argument and fail.
| /script/update-materialized-view "$view_name" | |
| /script/update-materialized-view "{{inputs.parameters.view_name}}" |
| metadata: | ||
| name: update-materialized-views-cron | ||
| spec: | ||
| schedule: "0 */3 * * *" # Every 3 hours |
There was a problem hiding this comment.
In this chart, other CronWorkflows use spec.schedules: [...] (array) rather than spec.schedule:. Using schedule here is inconsistent and may not be supported by the Argo version/configuration assumed elsewhere in the chart; align this field with the existing CronWorkflow templates.
| schedule: "0 */3 * * *" # Every 3 hours | |
| schedules: | |
| - "0 */3 * * *" # Every 3 hours |
| # Function to refresh a materialized view | ||
| refresh_view() { | ||
| local view_name=$1 | ||
| echo "Refreshing materialized view: $view_name" | ||
| PGPASSWORD="$DB_PASSWORD" psql -U "$DB_USER" -h "$DB_HOST" -p "$DB_PORT" -d "$DB_NAME" -c "REFRESH MATERIALIZED VIEW CONCURRENTLY $view_name;" |
There was a problem hiding this comment.
The view name is interpolated directly into SQL, which allows SQL injection if the argument is ever influenced by user input (and also breaks on identifiers needing quoting). Validate view_name against a strict identifier/schema regex and/or quote identifiers safely before building the REFRESH MATERIALIZED VIEW statement.
| # Function to refresh a materialized view | |
| refresh_view() { | |
| local view_name=$1 | |
| echo "Refreshing materialized view: $view_name" | |
| PGPASSWORD="$DB_PASSWORD" psql -U "$DB_USER" -h "$DB_HOST" -p "$DB_PORT" -d "$DB_NAME" -c "REFRESH MATERIALIZED VIEW CONCURRENTLY $view_name;" | |
| # Validate the materialized view name (allow optional schema prefix). | |
| if [[ ! "$VIEW_NAME" =~ ^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$ ]]; then | |
| echo "Invalid materialized view name: $VIEW_NAME" >&2 | |
| exit 1 | |
| fi | |
| # Function to refresh a materialized view | |
| refresh_view() { | |
| local view_name=$1 | |
| # Construct a safely quoted identifier for use in SQL. | |
| local view_name_sql | |
| if [[ "$view_name" == *.* ]]; then | |
| local schema=${view_name%%.*} | |
| local name=${view_name#*.} | |
| view_name_sql="\"${schema}\".\"${name}\"" | |
| else | |
| view_name_sql="\"${view_name}\"" | |
| fi | |
| echo "Refreshing materialized view: $view_name" | |
| PGPASSWORD="$DB_PASSWORD" psql -U "$DB_USER" -h "$DB_HOST" -p "$DB_PORT" -d "$DB_NAME" -c "REFRESH MATERIALIZED VIEW CONCURRENTLY $view_name_sql;" |
| DB_HOST=${DB_HOST:-localhost} # Default to localhost if not set | ||
| DB_PORT=${DB_PORT:-5432} # Default to 5432 if not set | ||
| DB_PASSWORD=${DB_PASSWORD} # PostgreSQL password | ||
|
|
There was a problem hiding this comment.
This script does not verify required connection env vars (DB_NAME/DB_USER/DB_PASSWORD) are set and non-empty. If any are missing, it will fail later with less actionable psql errors; add explicit checks with clear error messages before running the refresh.
| # Validate required database connection environment variables | |
| if [ -z "$DB_NAME" ]; then | |
| echo "Error: DB_NAME environment variable is not set or empty." >&2 | |
| exit 1 | |
| fi | |
| if [ -z "$DB_USER" ]; then | |
| echo "Error: DB_USER environment variable is not set or empty." >&2 | |
| exit 1 | |
| fi | |
| if [ -z "$DB_PASSWORD" ]; then | |
| echo "Error: DB_PASSWORD environment variable is not set or empty." >&2 | |
| exit 1 | |
| fi |
| - name: secrets | ||
| secret: | ||
| secretName: {{ .name }} |
There was a problem hiding this comment.
volumes: is followed by a list item that is not indented under it, which makes this manifest invalid YAML. Indent the - name: secrets block so it is a child of volumes: (see other workflow templates in this chart for the expected indentation).
| - name: secrets | |
| secret: | |
| secretName: {{ .name }} | |
| - name: secrets | |
| secret: | |
| secretName: {{ .name }} |
| envFrom: | ||
| - configMapRef: | ||
| name: {{ .name }} | ||
| volumeMounts: |
There was a problem hiding this comment.
This template references {{ .name }} for the ConfigMap/Secret name, but this file is not wrapped in a with .Values.<component> block like other templates, so .name will render as <no value> and break envFrom / secretName. Use a value that exists in the current scope (e.g. add a .Values.updateMaterializedViews.name and wrap the file in {{- with ... }}) and ensure the corresponding ConfigMap/Secret resources are created.
k8s/welearn-datastack/templates/updateMaterializedViews/workflow-template.yaml
Outdated
Show resolved
Hide resolved
…ow-template.yaml Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This pull request introduces a new workflow for updating PostgreSQL materialized views in a Kubernetes environment using Argo Workflows. It adds a reusable workflow template and a supporting shell script to automate the refresh process for multiple materialized views, and schedules this operation to run every three hours.
Argo Workflow and CronWorkflow for Materialized Views:
updateMaterializedViews/workflow-template.yamldefining an ArgoWorkflowTemplateand aCronWorkflowto automate the refresh of three materialized views (qty_document_in_qdrant,qty_document_in_qdrant_per_corpus, andqty_document_per_corpus) every three hours. The workflow uses a parameterized template to call a script for each view and includes security context and secret handling.Script for Refreshing Materialized Views:
script/update-materialized-view, a Bash script that receives a materialized view name as an argument and refreshes it in PostgreSQL using credentials provided via environment variables. The script includes error handling and outputs status messages for observability.