🐛 [firestore-bigquery-export] Backfill script is unusable with a non-default Firestore database #2522

@akaasula21

Description of configuration

  • Extension name: firestore-bigquery-export
  • Extension version: 0.2.5

Configuration values (redact info where appropriate):

  • Cloud Functions location: us-east4
  • Firestore Instance ID: qarik-spearinai-demo-project-chat (Note: This is a non-default database)
  • Collection path: qarik-spearinai-demo-project-chat

Description of problem

The official backfill script (fs-bq-import-collection) is unusable for projects that have a non-default Firestore database. The version of the script installed in a standard Google Cloud Shell environment is too old and has no option to specify a target database ID. This makes it impossible to backfill historical data in this common architecture.

Steps to reproduce:

  • Create a Firebase project with a named, non-default Firestore database (e.g., my-project-db). Do not use the (default) database.

  • Successfully install the firestore-bigquery-export extension, configuring it to sync a collection from the non-default database.

  • Open Google Cloud Shell and attempt to run the backfill script for the collection.

  • Observe the failures described below.

Expected result

The backfill script should provide a way to specify the non-default Firestore database ID, either through an interactive prompt or a command-line flag (e.g., --firestoreInstanceId), and successfully import the data.
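
To illustrate the expected behaviour, here is a minimal sketch of how a database ID option could fall back to (default) when it is omitted. The names `BackfillTarget`, `collectionResourceName`, and the sample project and collection values are hypothetical and are not taken from the import script; only the `projects/.../databases/.../documents/...` resource name layout and the `(default)` database ID come from Firestore itself.

```typescript
// Hypothetical helper: none of these names come from fs-bq-import-collection.
const DEFAULT_DATABASE_ID = "(default)";

interface BackfillTarget {
  projectId: string;
  // Assumed optional setting; omitted or blank means the (default) database.
  databaseId?: string;
  collectionPath: string;
}

// Builds the fully qualified Firestore resource name of the source collection.
function collectionResourceName(target: BackfillTarget): string {
  const databaseId = target.databaseId?.trim() || DEFAULT_DATABASE_ID;
  return `projects/${target.projectId}/databases/${databaseId}/documents/${target.collectionPath}`;
}

// Without a database ID, the path points at the (default) database.
console.log(collectionResourceName({ projectId: "my-project", collectionPath: "chats" }));

// With a database ID, the same collection path resolves inside the named database.
console.log(
  collectionResourceName({
    projectId: "my-project",
    databaseId: "my-project-db",
    collectionPath: "chats",
  })
);
```

A collection that exists only in the named database is not reachable through the first path, which would be consistent with the NOT_FOUND failure reported below.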

Actual result

The script fails in two different ways depending on the method used:

  1. Interactive Mode (npx @firebaseextensions/fs-bq-import-collection):
    The script does not ask for a database ID. It presumably searches the (default) database, fails to find the collection, and exits with a 5 NOT_FOUND error.

Error importing Collection to BigQuery: Error: Failed to access collection: 5 NOT_FOUND:

  2. Non-Interactive Mode (with modern flags): Attempts to use modern flags like --firestoreInstanceId fail with an unknown option error. This is because npx and npm install -g in the Cloud Shell environment consistently install a very old version of the script (0.1.26), even when @latest is requested.

The --help output from the installed version proves that the flag to specify a database does not exist:

`akaasula@cloudshell:~ (spearinai)$ npx @firebaseextensions/fs-bq-import-collection --help
Usage: fs-bq-import-collection [options]

Import a Firestore Collection into a BigQuery Changelog Table

Options:
-V, --version output the version number
--non-interactive Parse all input from command line flags instead of prompting the caller. (default: false)
-P, --project Firebase Project ID for project containing the Cloud Firestore database.
-B, --big-query-project Google Cloud Project ID for BigQuery.
-q, --query-collection-group [true|false] Use 'true' for a collection group query, otherwise a collection query is performed.
-s, --source-collection-path The path of the the Cloud Firestore Collection to import from. (This may or may not be the same Collection for which you plan to mirror changes.)
-d, --dataset The ID of the BigQuery dataset to import to. (A dataset will be created if it doesn't already exist.)
-t, --table-name-prefix The identifying prefix of the BigQuery table to import to. (A table will be created if one doesn't already exist.)
-b, --batch-size [batch-size] Number of documents to stream into BigQuery at once. (default: 300)
-l, --dataset-location Location of the BigQuery dataset.
-m, --multi-threaded [true|false] Whether to run standard or multi-thread import version
-u, --use-new-snapshot-query-syntax [true|false] Whether to use updated latest snapshot query
-f, --transform-function-url URL of function to transform data before export (e.g., https://us-west1-project.cloudfunctions.net/transform)
-e, --use-emulator [true|false] Whether to use the firestore emulator
-f, --failed-batch-output Path to the JSON file where failed batches will be recorded.
-h, --help display help for command`
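
The same check can be done mechanically rather than by reading the help text. The following is a rough sketch, assuming the help output has been captured as a string; `helpListsOption` is a hypothetical helper, and the sample lines are copied from the output above.

```typescript
// Hypothetical helper: reports whether a help text lists a long option
// as a whole token, so "--source" does not match "--source-collection-path".
function helpListsOption(helpText: string, longOption: string): boolean {
  const name = longOption.replace(/^--/, "");
  // Escape regex metacharacters in the option name before building the pattern.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`(^|[\\s,])--${escaped}(?![\\w-])`, "m");
  return pattern.test(helpText);
}

// A few lines taken from the --help output of the installed version.
const sampleHelp = [
  "-P, --project Firebase Project ID for project containing the Cloud Firestore database.",
  "-l, --dataset-location Location of the BigQuery dataset.",
  "-h, --help display help for command",
].join("\n");

console.log(helpListsOption(sampleHelp, "--project"));
console.log(helpListsOption(sampleHelp, "--firestoreInstanceId"));
```

Running this against the full output is one way to confirm that no database-related option is present in the installed version.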

This tooling issue makes it impossible to backfill historical data, which is a critical blocker.
