
[DCP - Terraform] Initializes the APIs, Service Account, Cloud Run Service and Spanner Instance and DB. #15

Open
gmechali wants to merge 3 commits into datacommonsorg:main from gmechali:terraform

@gmechali commented Feb 25, 2026

This set of Terraform scripts sets up a new Data Commons Platform deployment in a GCP project that has nothing set up yet:

  • Enables the Spanner, Cloud Run and IAM APIs
  • Creates a Service Account with the Spanner databaseUser role
  • Creates a Cloud Run service that deploys the :latest datacommons-platform image, and grants allUsers the Invoker role on it
  • Creates a Spanner Instance and DB
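The API enablement step can be expressed with the Google provider's `google_project_service` resource. This is a minimal sketch, not the PR's actual main.tf: it assumes `var.project_id` is declared in variables.tf, and the resource name `apis` is hypothetical.

```hcl
# Sketch: enable the three APIs DCP needs in the target project.
# The resource name "apis" is hypothetical; var.project_id is assumed
# to be declared in variables.tf.
resource "google_project_service" "apis" {
  for_each = toset([
    "run.googleapis.com",
    "spanner.googleapis.com",
    "iam.googleapis.com",
  ])

  project = var.project_id
  service = each.value

  # Leave the APIs enabled on `terraform destroy`, since other workloads
  # in the same project may depend on them.
  disable_on_destroy = false
}
```

Resources that need these APIs (the Cloud Run service, the Spanner instance) can list `google_project_service.apis` in `depends_on` so that Terraform does not try to create them before the APIs are enabled.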

After running this in datcom-website-dev, each of the following resources was created successfully:

  • DCP Service Account - 🔗
  • DCP Cloud Run Service - 🔗
  • Spanner Instance 🔗 and DB 🔗

After running the commands from the DCP setup guide (https://github.com/datacommonsorg/datacommons?tab=readme-ov-file#2-define-your-schema) with the URL changed to point at the new Cloud Run service, you can find the requests in the service logs and see the schema and nodes saved in the DB.

Note: I have also added optional variables to control the Spanner DB and Cloud Run service setup, but have not tested them yet. Thoughts on including them?

Lastly, I believe the container failed to start because the Dockerfile was missing the "start" command. I'm not sure how it worked locally, but I included the change and it fixed the Cloud Run service.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @gmechali, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive set of Terraform scripts designed to automate the initial setup and deployment of the Data Commons Platform (DCP) within a new Google Cloud Project. The changes streamline the provisioning of core GCP services, including API enablement, service account creation with appropriate permissions, Cloud Run service deployment, and Spanner instance and database setup, ensuring a consistent and repeatable infrastructure foundation for DCP.

Highlights

  • Terraform Infrastructure Setup: Introduced a new set of Terraform scripts to provision and deploy the Data Commons Platform (DCP) on Google Cloud, enabling a streamlined and repeatable infrastructure setup.
  • GCP API Enablement: Automated the enablement of essential Google Cloud APIs, specifically Spanner, Cloud Run, and IAM, as part of the initial deployment process.
  • Service Account and IAM Configuration: Created a dedicated Service Account for the DCP Cloud Run service and granted it the Spanner databaseUser role; the Cloud Run service itself is made publicly invocable by granting allUsers the Invoker role.
  • Cloud Run Service Deployment: Configured and deployed the datacommons-platform Docker image to a new Cloud Run service, including settings for scaling, resources, and environment variables.
  • Spanner Instance and Database Provisioning: Added Terraform resources to conditionally create a Google Cloud Spanner instance and database, with configurable options for processing units and instance/database IDs.
  • Documentation and Configuration: Provided a detailed README.md for the new Terraform setup, outlining prerequisites, authentication, environment configuration, deployment steps, and architecture overview. Also added comprehensive input variables for customization.
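Conditional creation of the Spanner instance is typically done with `count`. The sketch below is an illustration under stated assumptions, not the PR's spanner.tf: `var.project_id`, `var.region`, `var.spanner_instance_id` and `var.spanner_database_id` are assumed to exist in variables.tf, the instance resource name `main` is taken from the review suggestion further down, and the `create_spanner_instance` and `spanner_processing_units` variables plus the database resource name `main` are hypothetical.

```hcl
# Hypothetical toggle: set to false to reuse an existing Spanner instance.
variable "create_spanner_instance" {
  type    = bool
  default = true
}

# Hypothetical sizing knob; 100 processing units is the smallest instance size.
variable "spanner_processing_units" {
  type    = number
  default = 100
}

resource "google_spanner_instance" "main" {
  count            = var.create_spanner_instance ? 1 : 0
  project          = var.project_id
  name             = var.spanner_instance_id
  config           = "regional-${var.region}"
  display_name     = var.spanner_instance_id
  processing_units = var.spanner_processing_units
}

resource "google_spanner_database" "main" {
  project = var.project_id
  # Reference the instance by ID rather than google_spanner_instance.main[0],
  # so this works whether or not Terraform created the instance.
  instance = var.spanner_instance_id
  name     = var.spanner_database_id

  depends_on = [google_spanner_instance.main]
}
```

Referring to the instance by its ID variable avoids indexing into a resource that may have `count = 0`.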
Changelog
  • .gitignore
    • Added entries to ignore Terraform state files, .env configuration, and Python virtual environment artifacts.
  • build/Dockerfile
    • Modified the CMD instruction to explicitly include the start command for the datacommons api entrypoint.
  • infra/dcp/README.md
    • Added a new README providing detailed instructions for setting up, deploying, and tearing down the Data Commons Platform infrastructure using Terraform.
  • infra/dcp/cloudrun.tf
    • Added Terraform resource definition for the google_cloud_run_v2_service, configuring the DCP Cloud Run service with image, scaling, resources, and environment variables.
  • infra/dcp/iam.tf
    • Added Terraform resources for creating a google_service_account for the Cloud Run service and granting it roles/spanner.databaseUser, plus a binding that grants roles/run.invoker to allUsers on the service.
  • infra/dcp/main.tf
    • Added core Terraform configuration, including required Google provider version and resources to enable run.googleapis.com, spanner.googleapis.com, and iam.googleapis.com APIs.
  • infra/dcp/outputs.tf
    • Added Terraform output variables to expose the Cloud Run service URL, service account email, and Spanner instance/database IDs.
  • infra/dcp/spanner.tf
    • Added Terraform resources for conditionally creating a google_spanner_instance and google_spanner_database based on input variables.
  • infra/dcp/variables.tf
    • Added comprehensive Terraform variable definitions for project ID, region, image URL, service names, Spanner configuration, and Cloud Run settings (CPU, memory, scaling, concurrency, timeout).
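The outputs described for outputs.tf could look roughly like this. It is a sketch: the resource names `google_cloud_run_v2_service.dcp_service` and `google_service_account.dcp_runner` appear in the review comments below, while the output names themselves are hypothetical.

```hcl
# Hypothetical output names; resource names match those used in the review.
output "service_url" {
  description = "URL of the DCP Cloud Run service"
  value       = google_cloud_run_v2_service.dcp_service.uri
}

output "service_account_email" {
  description = "Email of the service account the Cloud Run service runs as"
  value       = google_service_account.dcp_runner.email
}

output "spanner_instance_id" {
  value = var.spanner_instance_id
}

output "spanner_database_id" {
  value = var.spanner_database_id
}
```

After `terraform apply`, `terraform output -raw service_url` prints the URL to substitute into the DCP setup commands.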
Activity
  • No human activity (comments, reviews, etc.) has been recorded yet for this pull request.

@gemini-code-assist bot left a comment


Code Review

This pull request introduces Terraform scripts for setting up the Data Commons Platform on GCP, including enabling APIs, creating a service account, and provisioning Cloud Run and Spanner resources. The primary security concern identified is the explicit granting of public access to the Cloud Run service via the allUsers IAM binding, which bypasses standard IAM-based access controls and requires careful evaluation. Other issues include the need to scope down IAM permissions, make public access and database deletion protection configurable (especially for production environments), correct the .gitignore for reproducible builds, and avoid the use of the :latest Docker image tag.


*.pyc
.venv/
uv.lock

critical

The uv.lock file is being ignored. However, this file is essential for ensuring reproducible builds, as it's used in the Dockerfile with uv sync --frozen. Without this file in version control, builds may fail or use different package versions, leading to unpredictable behavior. Please remove uv.lock from the .gitignore file and commit it to the repository.

  name = var.spanner_database_id

  # Prevent deletion of data
  deletion_protection = false # Set to true for production

high

The Spanner database has deletion_protection hardcoded to false. While the comment indicates it should be true for production, it's easy to forget to change this manually, creating a significant risk of accidental data loss in production environments. This setting should be controlled by a variable (e.g., spanner_deletion_protection) to be set appropriately for each environment.

You will also need to add a new variable spanner_deletion_protection in variables.tf, likely with a default of false.

  deletion_protection = var.spanner_deletion_protection
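A minimal sketch of the variable the comment asks for in variables.tf, using the suggested name and default; the description text is illustrative.

```hcl
# Controls deletion protection on the Spanner database.
# Default false matches the current dev behavior; override for production.
variable "spanner_deletion_protection" {
  description = "Whether Terraform is prevented from destroying the Spanner database. Set to true for production."
  type        = bool
  default     = false
}
```

A production deployment would then set `spanner_deletion_protection = true` in its own tfvars file (for example a hypothetical prod.tfvars), so protection is an explicit per-environment choice rather than a code edit.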

Comment on lines +15 to +21
resource "google_cloud_run_service_iam_binding" "public_invoker" {
  location = google_cloud_run_v2_service.dcp_service.location
  service  = google_cloud_run_v2_service.dcp_service.name
  role     = "roles/run.invoker"
  members = [
    "allUsers"
  ]

medium (security)

The Terraform configuration grants the roles/run.invoker role to allUsers for the Cloud Run service, making it publicly accessible without GCP IAM authentication. This significantly increases the attack surface and could lead to unauthorized access if the application lacks robust internal authentication. It is recommended to control this public access with a variable, such as allow_unauthenticated_invocations, defaulting to false for safety. This makes the decision to expose the service explicit and requires adding this variable to variables.tf.

resource "google_cloud_run_service_iam_binding" "public_invoker" {
  count    = var.allow_unauthenticated_invocations ? 1 : 0
  location = google_cloud_run_v2_service.dcp_service.location
  service  = google_cloud_run_v2_service.dcp_service.name
  role     = "roles/run.invoker"
  members = [
    "allUsers"
  ]
}
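The suggestion also needs the variable itself. Below is a sketch using the name proposed above, followed by an additive alternative to the binding. Note that `google_cloud_run_service_iam_binding` is authoritative for the role on that service: it replaces any other members of roles/run.invoker, whereas `google_cloud_run_service_iam_member` only adds one member. Use one of the two resources, not both.

```hcl
# Off by default: exposing the service publicly is an explicit choice.
variable "allow_unauthenticated_invocations" {
  description = "Grant roles/run.invoker to allUsers, making the Cloud Run service publicly invocable"
  type        = bool
  default     = false
}

# Alternative to the binding: non-authoritative, so invoker grants made
# elsewhere (for example to a CI service account) are left untouched.
resource "google_cloud_run_service_iam_member" "public_invoker" {
  count    = var.allow_unauthenticated_invocations ? 1 : 0
  location = google_cloud_run_v2_service.dcp_service.location
  service  = google_cloud_run_v2_service.dcp_service.name
  role     = "roles/run.invoker"
  member   = "allUsers"
}
```

For the dev setup described in the PR, `allow_unauthenticated_invocations = true` in that environment's tfvars would reproduce the current public access.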

Comment on lines +8 to +12
resource "google_project_iam_member" "spanner_user" {
  project = var.project_id
  role    = "roles/spanner.databaseUser"
  member  = "serviceAccount:${google_service_account.dcp_runner.email}"
}

medium

The service account is granted roles/spanner.databaseUser at the project level, which is overly permissive. It's a security best practice to scope permissions to the specific resource they are needed for (principle of least privilege). Please grant the role at the Spanner instance level instead using google_spanner_instance_iam_member.

# Grant Spanner Database User role to the Service Account
resource "google_spanner_instance_iam_member" "spanner_user" {
  instance = var.spanner_instance_id
  role     = "roles/spanner.databaseUser"
  member   = "serviceAccount:${google_service_account.dcp_runner.email}"

  depends_on = [
    google_spanner_instance.main
  ]
}
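Following the same least-privilege reasoning one step further, the role could be granted on the single database rather than the whole instance. This is an alternative to the instance-level suggestion, not part of it; it assumes the database resource in spanner.tf is named `main` (hypothetical) and that `var.spanner_database_id` is declared in variables.tf.

```hcl
# Grant Spanner Database User on one database only (tighter than instance level).
resource "google_spanner_database_iam_member" "spanner_user" {
  project  = var.project_id
  instance = var.spanner_instance_id
  database = var.spanner_database_id
  role     = "roles/spanner.databaseUser"
  member   = "serviceAccount:${google_service_account.dcp_runner.email}"

  # "main" is a hypothetical resource name for the database.
  depends_on = [google_spanner_database.main]
}
```

If DCP later needs more than one database on the instance, the instance-level grant is simpler to maintain.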

variable "image_url" {
  description = "Docker image URL to deploy"
  type        = string
  default     = "gcr.io/datcom-ci/datacommons-platform:latest"

medium

The default image_url uses the :latest tag. Using mutable tags like :latest is not recommended for deployments as it can lead to unexpected code being deployed if the image is updated. This makes deployments less predictable and rollbacks harder. It is a best practice to use immutable image tags, such as a git commit SHA or a semantic version number (e.g., gcr.io/datcom-ci/datacommons-platform:v1.2.3 or gcr.io/datcom-ci/datacommons-platform:sha-a1b2c3d).
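One way to enforce this is a variable validation block. The sketch below drops the :latest default and rejects image URLs that end in :latest or carry no tag or digest at all (an untagged image also resolves to :latest). It assumes Terraform 1.3 or newer, which added the `endswith` function.

```hcl
variable "image_url" {
  description = "Docker image URL to deploy, pinned to an immutable tag or digest"
  type        = string
  # No default: each environment must choose a specific image.

  validation {
    condition = (
      can(regex("(:[^:/]+|@sha256:[0-9a-f]{64})$", var.image_url))
      && !endswith(var.image_url, ":latest")
    )
    error_message = "image_url must be pinned to a version tag, commit SHA tag, or digest, not :latest."
  }
}
```

With this in place, `image_url = "gcr.io/datcom-ci/datacommons-platform:v1.2.3"` passes, while the current default fails at plan time. The trade-off is that `terraform apply` now requires an explicit image per environment.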
