@shunping shunping commented Nov 22, 2025

This PR lays the groundwork for the GCS client library migration in the Java SDK. It refactors the existing GCS client implementation to prepare for a new implementation while maintaining backward compatibility.

Key changes include:

  • The current GCS client implementation has been moved to a new legacy module.
  • The existing module's public APIs have been mostly preserved and now delegate calls to the new legacy module.
  • Code and tests have been cleaned up:
    • References to the gcsio module, which will be deprecated, have been removed from GcsUtil.java.
    • Some references in GcsUtilLegacy.java and GcsOptions.java will be removed in a future PR.
  • The open API in the legacy module has been rerouted to include previously implemented but unused IO request count metrics, as discussed in [BEAM-11980] Java GCS - Implement IO Request Count metrics #15394 (comment).
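A minimal sketch of the delegation pattern described above, assuming a facade class that keeps its public method signatures and forwards each call to a legacy implementation. `LegacyStore`, `StoreFacade`, and `fileSize` are hypothetical stand-ins for illustration only; they are not the actual `GcsUtil` or `GcsUtilLegacy` APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the legacy module; not the real GcsUtilLegacy API.
class LegacyStore {
  private final Map<String, Long> sizes = new HashMap<>();

  // Registers an object size so the sketch has something to look up.
  void put(String path, long size) {
    sizes.put(path, size);
  }

  long fileSize(String path) {
    Long size = sizes.get(path);
    if (size == null) {
      throw new IllegalArgumentException("No such object: " + path);
    }
    return size;
  }
}

// Hypothetical stand-in for the public class: the method signature callers
// see is unchanged, but the work is forwarded to the legacy delegate.
class StoreFacade {
  private final LegacyStore delegate;

  StoreFacade(LegacyStore delegate) {
    this.delegate = delegate;
  }

  long fileSize(String path) {
    return delegate.fileSize(path);
  }
}
```

With this shape, callers keep using the facade, and a different delegate can later be substituted behind it without changing call sites.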

import com.google.api.services.storage.model.StorageObject;
import com.google.auth.Credentials;
import com.google.auto.value.AutoValue;
import com.google.cloud.hadoop.gcsio.CreateObjectOptions;

@shunping shunping Nov 22, 2025

References to com.google.cloud.hadoop.gcsio.* and com.google.cloud.hadoop.util.* have been removed from GcsUtil.java. They remain in GcsUtilLegacy.java for the duration of the migration.

}

@VisibleForTesting
void setCloudStorageImpl(GoogleCloudStorage g) {

@shunping shunping Nov 22, 2025

While the two setCloudStorageImpl functions are no longer surfaced for testing in GcsUtil.java, we still keep them in GcsUtilLegacy.java.

}
}

GoogleCloudStorage createGoogleCloudStorage(

This function is also removed from GcsUtil.java, but will remain in GcsUtilLegacy.java.

* @return a SeekableByteChannel that can read the object data
*/
public SeekableByteChannel open(GcsPath path) throws IOException {
return open(path, this.googleCloudStorageOptions.getReadChannelOptions());

@shunping shunping Nov 22, 2025

Compare this with the previous implementation:

public SeekableByteChannel open(GcsPath path) throws IOException {
  String bucket = path.getBucket();
  SeekableByteChannel channel =
      googleCloudStorage.open(
          new StorageResourceId(path.getBucket(), path.getObject()),
          this.googleCloudStorageOptions.getReadChannelOptions());
  return wrapInCounting(channel, bucket);
}

The new version instead delegates to the two-argument open() overload below, so that reads record IO request counts in the same way as the create() functions below.
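To illustrate the idea of counting IO requests around an open call, here is a minimal sketch that records one success or failure per request, keyed by bucket. `IoCall` and `RequestCounter` are hypothetical names invented for this example; Beam's actual metrics classes and the real open() overload are not reproduced here.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical functional interface for an IO operation that may fail.
interface IoCall<T> {
  T run() throws IOException;
}

// Hypothetical per-bucket request counter; an assumption for illustration,
// not Beam's metrics API.
class RequestCounter {
  private final Map<String, AtomicLong> counts = new ConcurrentHashMap<>();

  // Runs the call and records exactly one "ok" or "error" count for the bucket.
  <T> T record(String bucket, IoCall<T> call) throws IOException {
    try {
      T result = call.run();
      increment(bucket, "ok");
      return result;
    } catch (IOException e) {
      increment(bucket, "error");
      throw e;
    }
  }

  // Returns the number of requests recorded for the bucket with the given status.
  long get(String bucket, String status) {
    AtomicLong count = counts.get(bucket + ":" + status);
    return count == null ? 0L : count.get();
  }

  private void increment(String bucket, String status) {
    counts.computeIfAbsent(bucket + ":" + status, k -> new AtomicLong()).incrementAndGet();
  }
}
```

Routing every open through a single counted code path is what keeps read metrics consistent with write metrics; a wrapper that only counts bytes on the returned channel would miss failed requests.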

shunping commented Nov 22, 2025

cc @clairemcginty

Context: gcsio is being sunset (https://s.apache.org/beam-gcsutil-modernization), and we are migrating the Beam Java SDK to the GCS client library.

import org.apache.beam.runners.core.metrics.MonitoringInfoMetricName;
import org.apache.beam.sdk.extensions.gcp.auth.TestCredential;
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions;
import org.apache.beam.sdk.extensions.gcp.util.GcsUtil.BatchInterface;

@shunping shunping Nov 22, 2025

BatchInterface and RewriteOp are only used in tests, so they can be removed from GcsUtil and kept in GcsUtilLegacy.

@shunping shunping marked this pull request as ready for review November 23, 2025 00:14
@github-actions

Assigning reviewers:

R: @Abacn for label java.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).