
Features/hpcdatamgm 2088 - DON'T MERGE #142

Open
eranrosenberg wants to merge 80 commits into releases/3.24.0 from features/HPCDATAMGM-2088

Conversation

@eranrosenberg
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings February 20, 2026 14:40
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces initial support for “auto-tiering” migrations, adding new API/scheduler entry points, schema + DB changes, and a DAO intended to query external archives for inactive files.

Changes:

  • Add auto-tiering REST endpoint and scheduled tasks to create/process bulk auto-tiering migration tasks.
  • Extend domain schemas/DB migrations and DAO logic for new auto-tiering-related fields/status/type.
  • Update S3 path-style access provider list to include VAST_NCIVS.

Reviewed changes

Copilot reviewed 23 out of 28 changed files in this pull request and generated 10 comments.

File Description
utils/hpc-client/lib/hpc-common-3.23.0.jar Adds a compiled client JAR artifact.
src/hpc-server/hpc-ws-rs-impl/src/main/resources/WEB-INF/spring/hpc-server.properties Adds VAST_NCIVS to path-style enabled S3 providers.
src/hpc-server/hpc-ws-rs-impl/src/main/java/gov/nih/nci/hpc/ws/rs/impl/HpcDataMigrationRestServiceImpl.java Implements the new processAutoTiering REST method.
src/hpc-server/hpc-ws-rs-api/src/main/java/gov/nih/nci/hpc/ws/rs/HpcDataMigrationRestService.java Adds /autoTiering/{configurationId} endpoint definition.
src/hpc-server/hpc-scheduler/src/main/resources/logback.xml Changes scheduler log path property from catalina.base to catalina.home.
src/hpc-server/hpc-scheduler/src/main/resources/WEB-INF/spring/hpc-scheduler.properties Adds VAST_NCIVS to path-style enabled S3 providers.
src/hpc-server/hpc-scheduler-migration/src/main/resources/logback.xml Changes migration-scheduler log path property from catalina.base to catalina.home.
src/hpc-server/hpc-scheduler-migration/src/main/resources/WEB-INF/spring/hpc-scheduler-migration.properties Sets migration serverIds/serverId, adds cron entries, updates S3 providers list.
src/hpc-server/hpc-scheduler-migration/src/main/java/gov/nih/nci/hpc/scheduler/migration/impl/HpcScheduledTasksImpl.java Adds scheduled executions for auto-tiering and bulk auto-tiering processing.
src/hpc-server/hpc-domain-types/src/main/resources/schema/HpcDataMigrationTypes.xsd Adds new migration status/type enum values.
src/hpc-server/hpc-domain-model/src/main/resources/schema/HpcDataMigration.xsd Removes stray whitespace.
src/hpc-server/hpc-domain-model/src/main/resources/schema/HpcDataManagement.xsd Adds new auto-tiering-related configuration fields to the schema.
src/hpc-server/hpc-dao-impl/src/main/scripts/migration/release-3.23.0/hpc_auto_tiering.sql Adds DB columns for auto-tiering configuration + migration task source-location fields.
src/hpc-server/hpc-dao-impl/src/main/resources/META-INF/spring/hpc-dao-impl-beans-configuration.xml Registers hpcExternalArchiveDAO bean.
src/hpc-server/hpc-dao-impl/src/main/java/gov/nih/nci/hpc/dao/trino/impl/HpcExternalArchiveDAOImpl.java Introduces a Trino-backed DAO stub for external archive queries (currently dummy).
src/hpc-server/hpc-dao-impl/src/main/java/gov/nih/nci/hpc/dao/oracle/impl/HpcDataMigrationDAOImpl.java Extends migration task/task-result SQL and task mapping for new columns/statuses.
src/hpc-server/hpc-dao-impl/src/main/java/gov/nih/nci/hpc/dao/oracle/impl/HpcDataManagementConfigurationDAOImpl.java Maps new S3 auto-tiering DB columns onto transfer configuration objects.
src/hpc-server/hpc-dao-api/src/main/java/gov/nih/nci/hpc/dao/HpcExternalArchiveDAO.java Adds external archive DAO interface.
src/hpc-server/hpc-bus-service-impl/src/main/java/gov/nih/nci/hpc/bus/impl/HpcDataMigrationBusServiceImpl.java Adds bus-service logic to create and process bulk auto-tiering tasks.
src/hpc-server/hpc-bus-service-api/src/main/java/gov/nih/nci/hpc/bus/HpcDataMigrationBusService.java Adds new bus-service API methods for auto-tiering.
src/hpc-server/hpc-app-service-impl/src/main/java/gov/nih/nci/hpc/service/impl/HpcDataMigrationServiceImpl.java Adds auto-tiering task creation + “files not accessed” lookup via external archive DAO.
src/hpc-server/hpc-app-service-impl/src/main/java/gov/nih/nci/hpc/service/impl/HpcDataManagementServiceImpl.java Adds helper to list auto-tiering-enabled configurations.
src/hpc-server/hpc-app-service-api/src/main/java/gov/nih/nci/hpc/service/HpcDataMigrationService.java Extends service API for auto-tiering and new task-creation signature.
src/hpc-server/hpc-app-service-api/src/main/java/gov/nih/nci/hpc/service/HpcDataManagementService.java Adds API to list auto-tiering-enabled data management configurations.


Comment on lines +59 to +65
// TODO: Implement actual query logic using Trino to scan external archive
// For now, return dummy paths for testing
List<String> filePaths = new ArrayList<>();
filePaths.add("/Eran/eran-file-1.txt");
filePaths.add("/Eran/eran-file-2.txt");
filePaths.add("/Eran/eran-file-3.txt");

Copilot AI Feb 20, 2026

This DAO currently returns hard-coded dummy file paths and contains a TODO for the real Trino query. Shipping placeholder data here will trigger auto-tiering migrations for non-existent paths and makes the feature behavior incorrect in production. Please implement the real query logic (or gate this behind a feature flag) and remove the hard-coded test paths before merging.

Suggested change
// TODO: Implement actual query logic using Trino to scan external archive
// For now, return dummy paths for testing
List<String> filePaths = new ArrayList<>();
filePaths.add("/Eran/eran-file-1.txt");
filePaths.add("/Eran/eran-file-2.txt");
filePaths.add("/Eran/eran-file-3.txt");
// TODO: Implement actual query logic using Trino to scan the external archive and
// populate filePaths with files not accessed within the given time period.
List<String> filePaths = new ArrayList<>();
// Optional: allow dummy data for testing when explicitly enabled via a feature flag.
// This flag must NOT be enabled in production.
boolean useDummyData = Boolean.getBoolean("hpc.externalArchive.dummyData.enabled");
if (useDummyData) {
    logger.warn("Using dummy external archive file paths; feature flag 'hpc.externalArchive.dummyData.enabled' is enabled. "
            + "Ensure this is disabled in production.");
    filePaths.add("/Eran/eran-file-1.txt");
    filePaths.add("/Eran/eran-file-2.txt");
    filePaths.add("/Eran/eran-file-3.txt");
} else {
    logger.warn("Trino-backed external archive query is not yet implemented; returning empty file list.");
}

Comment on lines 451 to +455
dataMigrationTask.getSize(), dataMigrationTask.getRetryTaskId(), dataMigrationTask.getRetryUserId(),
dataMigrationTask.getMetadataFromArchiveFileContainerId(),
dataMigrationTask.getMetadataToArchiveFileContainerId(),
dataMigrationTask.getMetadataArchiveFileIdPattern(), dataMigrationTask.getId(),
dataMigrationTask.getMetadataArchiveFileIdPattern(),
dataMigrationTask.getId(),
Copilot AI Feb 20, 2026

UPSERT_DATA_MIGRATION_TASK_RESULT_SQL now includes AUTO_TIERING_* placeholders, but this jdbcTemplate.update(...) call does not supply corresponding parameters for those new columns (in either the UPDATE or INSERT parameter sets). This will cause a runtime SQL parameter count mismatch; add the missing AUTO_TIERING_* arguments (or remove the placeholders from the SQL if not needed).

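To see why the mismatch is a runtime failure rather than a compile-time one: Spring's jdbcTemplate.update(sql, args...) binds arguments positionally, so the number of '?' placeholders must equal the number of supplied values. The stand-alone sketch below (not HPC code; the clause and argument names are illustrative) shows a naive count that would flag the problem described in this comment:

```java
public class PlaceholderCountCheck {

    // Count '?' placeholders (naive: ignores '?' inside quoted literals).
    static int countPlaceholders(String sql) {
        int count = 0;
        for (char c : sql.toCharArray()) {
            if (c == '?') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The update clause now has 4 placeholders after adding the AUTO_TIERING_* columns...
        String updateClause = "METADATA_TO_ARCHIVE_FILE_CONTAINER_ID = ?, "
                + "METADATA_ARCHIVE_FILE_ID_PATTERN = ?, "
                + "AUTO_TIERING_FILE_CONTAINER_ID = ?, AUTO_TIERING_FILE_ID = ?";
        // ...but the update(...) call still supplies only the original 2 values.
        Object[] suppliedArgs = {"metadataContainerId", "fileIdPattern"};

        System.out.println("placeholders=" + countPlaceholders(updateClause)
                + ", args=" + suppliedArgs.length); // placeholders=4, args=2
    }
}
```

A check like this can also live in a unit test over the DAO's SQL constants, catching placeholder/argument drift before it reaches a database.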
Comment on lines +173 to +174
migrationTask.setStatus(autoTieringRequest ? HpcDataMigrationStatus.AUTO_TIERING_RECEIVED :
HpcDataMigrationStatus.RECEIVED);
Copilot AI Feb 20, 2026

Auto-tiering tasks are being created with status AUTO_TIERING_RECEIVED, but the migration processing flows shown in this PR only poll/process status RECEIVED (no code paths query for AUTO_TIERING_RECEIVED). As a result, these tasks can be assigned to a server but never picked up for processing. Either keep the status as RECEIVED for data-object migration tasks, or add corresponding processing/queries for AUTO_TIERING_RECEIVED.

Suggested change
migrationTask.setStatus(autoTieringRequest ? HpcDataMigrationStatus.AUTO_TIERING_RECEIVED :
HpcDataMigrationStatus.RECEIVED);
migrationTask.setStatus(HpcDataMigrationStatus.RECEIVED);

type="hpc-domain-datatransfer:HpcDataTransferType" />
<xsd:element name="s3UploadConfigurationId" type="xsd:string" />
<xsd:element name="s3DefaultDownloadConfigurationId" type="xsd:string" />
<xsd:element name="s3AutoTieringConfigurationId" type="xsd:string" />
Copilot AI Feb 20, 2026

s3AutoTieringConfigurationId is introduced as a required element (no minOccurs=0), but the code enables auto-tiering only when this ID is present/non-empty. Making it required breaks existing configurations that don't use auto-tiering. Mark it optional (minOccurs=0) or version the schema/DTOs to preserve backward compatibility.

Suggested change
<xsd:element name="s3AutoTieringConfigurationId" type="xsd:string" />
<xsd:element name="s3AutoTieringConfigurationId" type="xsd:string" minOccurs="0" />

<configuration>

<property name="SCHEDULER_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler.log"/>
<property name="SCHEDULER_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler.log"/>
Copilot AI Feb 20, 2026

The active log file path uses ${catalina.home} while the rolling policy still uses ${catalina.base} in fileNamePattern, which will split current vs rolled logs across different directories. Use the same base variable for both (typically ${catalina.base}) to keep logs/rotation consistent per Tomcat instance.

Suggested change
<property name="SCHEDULER_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler.log"/>
<property name="SCHEDULER_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler.log"/>

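For context, a consistent logback setup keeps the active file and the rolling pattern on the same base variable. The appender name, pattern, and retention below are illustrative assumptions, not the project's actual logback.xml:

```xml
<property name="SCHEDULER_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler.log"/>

<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>${SCHEDULER_SERVER_LOG}</file>
  <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
    <!-- Same ${catalina.base} root as the active file, so current and rolled
         logs stay in one directory per Tomcat instance -->
    <fileNamePattern>${catalina.base}/logs/hpc-server/hpc-scheduler.%d{yyyy-MM-dd}.log</fileNamePattern>
    <maxHistory>30</maxHistory>
  </rollingPolicy>
  <encoder>
    <pattern>%d{ISO8601} [%thread] %-5level %logger{36} - %msg%n</pattern>
  </encoder>
</appender>
```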
<configuration>

<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler-migration.log"/>
<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler-migration.log"/>
Copilot AI Feb 20, 2026

The active log file path uses ${catalina.home} while the rolling policy still uses ${catalina.base} in fileNamePattern, which will split current vs rolled logs across different directories. Use the same base variable for both (typically ${catalina.base}) to keep logs/rotation consistent per Tomcat instance.

Suggested change
<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler-migration.log"/>
<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler-migration.log"/>

Comment on lines 69 to 72
+ "TO_S3_ARCHIVE_LOCATION_FILE_CONTAINER_ID = ?, TO_S3_ARCHIVE_LOCATION_FILE_ID = ?, SERVER_ID = ?, ALIGN_ARCHIVE_PATH = ?, DATA_SIZE = ?, "
+ "RETRY_TASK_ID = ?, RETRY_USER_ID = ?, METADATA_FROM_ARCHIVE_FILE_CONTAINER_ID = ?, "
+ "METADATA_TO_ARCHIVE_FILE_CONTAINER_ID = ?, METADATA_ARCHIVE_FILE_ID_PATTERN = ? "
+ "METADATA_TO_ARCHIVE_FILE_CONTAINER_ID = ?, METADATA_ARCHIVE_FILE_ID_PATTERN = ?, AUTO_TIERING_FILE_CONTAINER_ID = ?, AUTO_TIERING_FILE_ID = ? "
+ "when not matched then insert (ID, PARENT_ID, USER_ID, PATH, CONFIGURATION_ID, FROM_S3_ARCHIVE_CONFIGURATION_ID, "
Copilot AI Feb 20, 2026

UPSERT_DATA_MIGRATION_TASK_RESULT_SQL references AUTO_TIERING_FILE_CONTAINER_ID and AUTO_TIERING_FILE_ID columns. No migration in this PR adds these columns to HPC_DATA_MIGRATION_TASK_RESULT (and release-3.23.0/hpc_auto_tiering.sql only alters HPC_DATA_MIGRATION_TASK). Align the SQL with actual schema changes, or add the missing columns/migration.

s3Configuration.setExternalStorage(rs.getBoolean("EXTERNAL_STORAGE"));
s3Configuration.setPosixPath(rs.getString("POSIX_PATH"));
s3Configuration.setAutoTieringSearchPath(rs.getString("AUTO_TIERING_SEARCH_PATH"));
s3Configuration.setAutoTieringInactivityMonths(rs.getInt("AUTO_TIERING_INACTIVITY_MONTHS"));
Copilot AI Feb 20, 2026

AUTO_TIERING_INACTIVITY_MONTHS is nullable (added without NOT NULL), but this mapper uses rs.getInt(...), which returns 0 when the DB value is NULL. That makes it impossible to distinguish “unset” from an actual 0-month value and can cause auto-tiering to proceed with an unintended default. Use rs.getObject(..., Integer.class) or rs.getInt + rs.wasNull() to preserve nulls.

Suggested change
s3Configuration.setAutoTieringInactivityMonths(rs.getInt("AUTO_TIERING_INACTIVITY_MONTHS"));
s3Configuration.setAutoTieringInactivityMonths(
rs.getObject("AUTO_TIERING_INACTIVITY_MONTHS", Integer.class));

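The null-collapsing behavior is easy to demonstrate without a database. The sketch below fakes a result row with a plain map (illustrative only; java.sql.ResultSet behaves the same way for a NULL integer column):

```java
import java.util.HashMap;
import java.util.Map;

public class NullableIntDemo {

    // Fake row: column name -> value, where a null value stands for SQL NULL.
    static final Map<String, Integer> row = new HashMap<>();
    static {
        row.put("AUTO_TIERING_INACTIVITY_MONTHS", null);
    }

    // Mimics ResultSet.getInt(col): SQL NULL collapses to 0.
    static int getIntStyle(String col) {
        Integer v = row.get(col);
        return v == null ? 0 : v;
    }

    // Mimics ResultSet.getObject(col, Integer.class): SQL NULL is preserved.
    static Integer getObjectStyle(String col) {
        return row.get(col);
    }

    public static void main(String[] args) {
        System.out.println(getIntStyle("AUTO_TIERING_INACTIVITY_MONTHS"));    // 0 -- "unset" looks like 0 months
        System.out.println(getObjectStyle("AUTO_TIERING_INACTIVITY_MONTHS")); // null -- "unset" stays distinguishable
    }
}
```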
<xsd:element name="externalStorage" type="xsd:boolean" />
<xsd:element name="posixPath" type="xsd:string" minOccurs="0"/>
<xsd:element name="posixPath" type="xsd:string" />
<xsd:element name="autoTieringSearchPath" type="xsd:string" />
Copilot AI Feb 20, 2026

autoTieringSearchPath is added as a required element (no minOccurs=0), but the DB migration adds AUTO_TIERING_SEARCH_PATH as nullable and the service treats auto-tiering as an optional feature. Keeping this required is a breaking schema change for existing configurations/clients; make it optional (minOccurs=0) or version the schema.

Suggested change
<xsd:element name="autoTieringSearchPath" type="xsd:string" />
<xsd:element name="autoTieringSearchPath" type="xsd:string" minOccurs="0" />

Copilot uses AI. Check for mistakes.
Comment on lines +189 to 190
hpc.service.dataMigration.serverIds=server-id
hpc.service.serverId=server-id
Copilot AI Feb 20, 2026

This properties file now hard-codes hpc.service.dataMigration.serverIds=server-id, while the other shipped properties (e.g., hpc-server.properties / hpc-scheduler.properties) leave serverIds empty for environment-specific configuration. Hard-coding a placeholder here can cause migrations to be assigned incorrectly in non-dev deployments; consider keeping it empty and documenting/overriding via ${catalina.base}/conf instead.

Suggested change
hpc.service.dataMigration.serverIds=server-id
hpc.service.serverId=server-id
# Server IDs are intentionally left empty for environment-specific configuration.
# Override these values via ${catalina.base}/conf in each deployment.
hpc.service.dataMigration.serverIds=
hpc.service.serverId=

Eran Rosenberg added 5 commits March 8, 2026 09:48
# Conflicts:
#	utils/hpc-client/hpc-cli-3.24.0.jar
#	utils/hpc-client/lib/hpc-common-3.24.0.jar
#	utils/hpc-client/lib/hpc-domain-types-3.24.0.jar
#	utils/hpc-client/lib/hpc-dto-3.24.0.jar
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 69 out of 74 changed files in this pull request and generated 9 comments.



<xsd:restriction base="xsd:string">
<xsd:enumeration value="RECEIVED" />
<xsd:enumeration value="IN_PROGRESS" />
<xsd:enumeration value="AUTO_TIERING_RECEIVED" />
Copilot AI Mar 8, 2026

The AUTO_TIERING_RECEIVED status is added as a new migration status in HpcDataMigrationTypes.xsd and is used to set the status for auto-tiering data object tasks in createDataObjectMigrationTask. However, the getDataMigrationTasks method in HpcDataMigrationServiceImpl uses a SQL query that fetches tasks by STATUS = ? with a single status value. The processBulkAutoTieringMigrationReceived scheduler method queries with HpcDataMigrationStatus.RECEIVED and type BULK_AUTO_TIERING, which would not include AUTO_TIERING_RECEIVED tasks. It is unclear whether AUTO_TIERING_RECEIVED individual data-object tasks are ever picked up and processed — there is no handler for AUTO_TIERING_RECEIVED data-object tasks in any of the processXxx methods.

Comment on lines +101 to +109
private void dbConnect() throws HpcException {
try {
jdbcTemplate.getDataSource().getConnection();

} catch (Exception e) {
throw new HpcException("Failed to connect to Trino DB. Check connection & credentials config",
HpcErrorType.DATABASE_ERROR, HpcIntegratedSystem.VAST, e);
}
}
Copilot AI Mar 8, 2026

The dbConnect() method calls jdbcTemplate.getDataSource().getConnection() without closing the returned connection. This causes a connection leak every time the bean is initialized. The connection should be closed in a finally block or use a try-with-resources statement. Note that the existing Oracle DAO HpcDataManagementConfigurationDAOImpl.dbConnect() has the same pattern, but since this is new code it should be written correctly.

Comment on lines +281 to +284
hpc.dao.trino.host=trino.fairway.nci.nih.gov
hpc.dao.trino.port=443
hpc.dao.trino.username=ncif-hpcdm-svc
hpc.dao.trino.password=< Configure Me >
Copilot AI Mar 8, 2026

The property file contains a literal placeholder password < Configure Me > committed to the repository. Credentials should never be stored in property files in version control, not even as placeholders. This file should reference an environment variable, a vault secret, or an external configuration store. Additionally, this is a template-like properties file checked in for a specific environment, which is a broader concern.

Comment on lines +57 to +59
<xsd:element name="posixPath" type="xsd:string" />
<xsd:element name="autoTieringSearchPath" type="xsd:string" />
<xsd:element name="autoTieringInactivityMonths" type="xsd:int" minOccurs="0" />
Copilot AI Mar 8, 2026

The posixPath field in the HpcS3ArchiveConfiguration complex type was changed from minOccurs="0" (optional) to required (no minOccurs attribute, which defaults to 1). This is a breaking schema change: any existing XML documents that omit posixPath will now fail validation, and any existing configurations in the database that do not have a posixPath set will cause deserialization failures at runtime. If this field is truly now required for all S3 archive configurations, existing database records and any client-side DTOs that build these objects must be updated accordingly.

Comment on lines +564 to +579
public void processAutoTiering() throws HpcException {
// obtain a list of HPC data management configurations which are to be included in auto-tiering scan and
// create a task for each.
dataManagementService.getAutoTieringDataManagementConfigurations().forEach(dataManagementConfiguration -> {
try {
logger.info("Create bulk auto-tiering task for data management configuration: {}",
dataManagementConfiguration.getId());
dataMigrationService.createBulkAutoTieringTask(dataManagementConfiguration.getId(),
dataManagementConfiguration.getS3UploadConfigurationId(),
dataManagementConfiguration.getS3AutoTieringConfigurationId(), null);

} catch (HpcException e) {
logger.error("Failed to process auto-tiering for data management configuration: {}",
dataManagementConfiguration.getId(), e);
}
});
Copilot AI Mar 8, 2026

The processAutoTiering() (no-arg, called by the scheduler) uses dataManagementConfiguration.getS3UploadConfigurationId() as the fromS3ArchiveConfigurationId when calling createBulkAutoTieringTask. However, for auto-tiering the source is the external VAST archive, not the S3 upload configuration. The upload configuration ID represents where files are uploaded to, not the external source being scanned. The fromS3ArchiveConfigurationId argument in createBulkAutoTieringTask should probably be the same configuration that has autoTieringSearchPath and autoTieringInactivityMonths defined, which is the auto-tiering source configuration, not the upload configuration.

// Get data objects eligible for auto-tiering and create individual migration tasks
processBulkAutoTieringMigration(bulkAutoTieringTask);

// Mark the bulk metadata update task - in-progress
Copilot AI Mar 8, 2026

The comment on line 635 says "Mark the bulk metadata update task - in-progress" but the code is actually processing a bulk auto-tiering task, not a bulk metadata update task. This is a copy-paste error from the processBulkMetadataUpdateMigrationReceived method.

Suggested change
// Mark the bulk metadata update task - in-progress
// Mark the bulk auto-tiering task - in-progress

bulkAutoTieringTask.getFromS3ArchiveConfigurationId(),
bulkAutoTieringTask.getToS3ArchiveConfigurationId(),
bulkAutoTieringTask.getId(), bulkAutoTieringTask.getAlignArchivePath(),
null, null, bulkAutoTieringTask.getRetryUserId(),
Copilot AI Mar 8, 2026

In processBulkAutoTieringMigration, bulkAutoTieringTask.getRetryUserId() is passed as the retryUserId parameter to createDataObjectMigrationTask, but for a new auto-tiering task (not a retry), this value would typically be null. However, retryTaskId is passed as null while retryUserId is taken from the bulk task's retryUserId. This is inconsistent — if retryTaskId is null, retryUserId should also be null. This looks like a copy-paste mistake.

Suggested change
null, null, bulkAutoTieringTask.getRetryUserId(),
null, null, null,

Comment on lines +79 to +81
return jdbcTemplate.queryForList(
GET_FILES_NOT_ACCESSED_SQL.replace("{months}", monthsNotAccessed.toString()),
String.class, searchPath, searchPath);
Copilot AI Mar 8, 2026

The SQL query substitutes {months} using plain string replacement (GET_FILES_NOT_ACCESSED_SQL.replace("{months}", monthsNotAccessed.toString())). Even though monthsNotAccessed is an Integer (so there is no direct SQL injection risk from that field), the substitution bypasses JDBC parameterized query handling. Trino's JDBC driver may support parameterized INTERVAL expressions; if so, this should use a proper bind parameter instead of string concatenation. At minimum, a comment should explain why string substitution is used here rather than a bind parameter.

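If the driver supports binding inside date_add, a fully parameterized form might look like the sketch below. The table and column names here are hypothetical, since the actual query text is not shown in this PR:

```sql
-- Hypothetical parameterized Trino query: both the search path and the
-- inactivity window are bound, avoiding string substitution entirely.
SELECT file_path
FROM external_archive_files
WHERE file_path LIKE ? || '%'
  AND last_access_time < date_add('month', -1 * ?, now())
```

If binding the interval value turns out not to work with the Trino JDBC driver, a comment in the DAO explaining that limitation would justify the string substitution.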
Comment on lines +1 to +42
--
-- hpc_auto_tiering.sql
--
-- Copyright SVG, Inc.
-- Copyright Leidos Biomedical Research, Inc
--
-- Distributed under the OSI-approved BSD 3-Clause License.
-- See http://ncip.github.com/HPC/LICENSE.txt for details.
--
--
-- @author <a href="mailto:rosenbergea@nih.gov">Eran Rosenberg</a>
--

-- HPC_S3_ARCHIVE_CONFIGURATION
ALTER TABLE HPC_S3_ARCHIVE_CONFIGURATION add (
AUTO_TIERING_SEARCH_PATH VARCHAR2(2700)
);
COMMENT ON COLUMN HPC_S3_ARCHIVE_CONFIGURATION.AUTO_TIERING_SEARCH_PATH IS 'The search path for auto-tiering external archive files';

ALTER TABLE HPC_S3_ARCHIVE_CONFIGURATION add (
AUTO_TIERING_INACTIVITY_MONTHS NUMBER(10)
);
COMMENT ON COLUMN HPC_S3_ARCHIVE_CONFIGURATION.AUTO_TIERING_INACTIVITY_MONTHS IS 'The inactivity period in months before files are auto-tiered to Glacier Deep Archive';

-- HPC_DATA_MANAGEMENT_CONFIGURATION
ALTER TABLE HPC_DATA_MANAGEMENT_CONFIGURATION add (
S3_AUTO_TIERING_ARCHIVE_CONFIGURATION_ID VARCHAR2(50)
);
COMMENT ON COLUMN HPC_DATA_MANAGEMENT_CONFIGURATION.S3_AUTO_TIERING_ARCHIVE_CONFIGURATION_ID IS 'The S3 archive configuration ID for auto-tiering external archive into';

-- HPC_DATA_MIGRATION_TASK
ALTER TABLE HPC_DATA_MIGRATION_TASK add (
FROM_S3_ARCHIVE_LOCATION_FILE_CONTAINER_ID VARCHAR2(50)
);
COMMENT ON COLUMN HPC_DATA_MIGRATION_TASK.FROM_S3_ARCHIVE_LOCATION_FILE_CONTAINER_ID IS 'The archive file container ID (bucket) migrating from';

ALTER TABLE HPC_DATA_MIGRATION_TASK add (
FROM_S3_ARCHIVE_LOCATION_FILE_ID VARCHAR2(2700)
);
COMMENT ON COLUMN HPC_DATA_MIGRATION_TASK.FROM_S3_ARCHIVE_LOCATION_FILE_ID IS 'The archive file ID (object ID) migrating from';


Copilot AI Mar 8, 2026

The identical SQL migration script hpc_auto_tiering.sql has been added to both release-3.23.0/ and release-3.24.0/. Only release-3.24.0/ should contain this script, since the feature is part of version 3.24.0. Including it under release-3.23.0/ is likely a mistake and would cause the DDL to be applied twice if both release directories are executed during an upgrade from 3.22.x to 3.24.0, leading to "column already exists" errors.

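If the script must remain in both release directories for operational reasons, an idempotent guard is one way to make re-execution safe. This PL/SQL block is a sketch, not part of the PR:

```sql
-- Add the column only when it does not already exist, so running the script
-- twice (e.g., from both release directories) does not raise ORA-01430.
DECLARE
  v_count NUMBER;
BEGIN
  SELECT COUNT(*) INTO v_count
    FROM user_tab_cols
   WHERE table_name = 'HPC_S3_ARCHIVE_CONFIGURATION'
     AND column_name = 'AUTO_TIERING_SEARCH_PATH';
  IF v_count = 0 THEN
    EXECUTE IMMEDIATE
      'ALTER TABLE HPC_S3_ARCHIVE_CONFIGURATION ADD (AUTO_TIERING_SEARCH_PATH VARCHAR2(2700))';
  END IF;
END;
/
```

The simpler fix, as the comment says, is to keep the script only under release-3.24.0/.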
@eranrosenberg eranrosenberg changed the base branch from releases/3.23.0 to releases/3.24.0 March 28, 2026 17:23
@eranrosenberg eranrosenberg requested a review from Copilot March 28, 2026 17:24
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 47 out of 48 changed files in this pull request and generated 8 comments.



Comment on lines +43 to +50
<!-- HPC Trino DB data source -->
<bean id="hpcTrinoDataSource" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
<property name="driverClassName" value="io.trino.jdbc.TrinoDriver" />
<property name="url" value="${hpc.dao.trino.url}" />
<property name="username" value="${hpc.dao.trino.username}" />
<property name="password" value="${hpc.dao.trino.password}" />
<property name="maxActive" value="${hpc.dao.trino.maxActive}" />
</bean>
Copilot AI Mar 28, 2026

This config adds a Trino DataSource/JdbcTemplate that references hpc.dao.trino.* placeholders. Since this beans file is imported broadly, any runtime/tests that don't define these properties will fail at context startup. Consider defining safe defaults in the module properties files, or conditionally creating these beans only when Trino is enabled/needed (e.g., profile/conditional property).

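One way to gate the beans is a Spring bean-definition profile, sketched below. The profile name is made up, and a nested <beans> element must appear last in the file; the project may prefer a property-driven conditional instead:

```xml
<!-- Nested <beans> with a profile: the Trino DataSource is created only when
     the 'trino' profile is active, so contexts without hpc.dao.trino.*
     properties still start cleanly. -->
<beans profile="trino">
  <bean id="hpcTrinoDataSource" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
    <property name="driverClassName" value="io.trino.jdbc.TrinoDriver" />
    <property name="url" value="${hpc.dao.trino.url}" />
    <property name="username" value="${hpc.dao.trino.username}" />
    <property name="password" value="${hpc.dao.trino.password}" />
    <property name="maxActive" value="${hpc.dao.trino.maxActive}" />
  </bean>
</beans>
```

The profile would then be enabled only in deployments that need Trino, e.g. via spring.profiles.active=trino.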
Comment on lines +101 to +104
private void dbConnect() throws HpcException {
try {
jdbcTemplate.getDataSource().getConnection();

Copilot AI Mar 28, 2026

dbConnect() opens a JDBC connection but never closes it, which can leak connections on startup/reload. Use try-with-resources (or explicitly close) around getConnection() to ensure it is returned to the pool.

Comment on lines 55 to 60
<xsd:element name="dataManagementConfigurationId" type="xsd:string" />
<xsd:element name="externalStorage" type="xsd:boolean" />
<xsd:element name="posixPath" type="xsd:string" minOccurs="0"/>
<xsd:element name="posixPath" type="xsd:string" />
<xsd:element name="autoTieringSearchPath" type="xsd:string" />
<xsd:element name="autoTieringInactivityMonths" type="xsd:int" minOccurs="0" />
</xsd:sequence>
Copilot AI Mar 28, 2026

These schema changes make previously-optional fields required (posixPath, autoTieringSearchPath, and s3AutoTieringConfigurationId). This is a backward-incompatible XSD change and conflicts with the code which treats auto-tiering fields as optional (only validated when auto-tiering is enabled). Consider restoring minOccurs="0" for existing optional fields and marking the new auto-tiering fields optional as well.

Comment on lines 261 to 264
hpc.integration.sps.domain=NIH

hpc.integration.irods.basePath=/tempZone/home

Copilot AI Mar 28, 2026

This file now defines hpc.integration.irods.basePath twice (earlier in the file and again here). Duplicate keys in .properties are order-dependent and make the effective configuration unclear; please keep a single definition.

Comment on lines +121 to +128
conn = connect(
host=args.host,
port=args.port,
user=args.user,
auth=BasicAuthentication(args.user, password),
http_scheme="https",
verify=False,
catalog=CATALOG,
Copilot AI Mar 28, 2026

The Trino client disables TLS certificate verification (verify=False). That’s unsafe by default and can allow MITM credential capture. Prefer verify=True by default and add an explicit flag (e.g., --insecure/--no-verify) for environments that need it.

Comment on lines +49 to +66
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.dao.DataAccessException;
import org.springframework.dao.IncorrectResultSizeDataAccessException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;
import org.springframework.jdbc.core.support.SqlLobValue;
import org.springframework.jdbc.support.lob.DefaultLobHandler;
import org.springframework.jdbc.support.lob.LobHandler;

import java.sql.Types;
import java.util.*;

import static gov.nih.nci.hpc.util.HpcUtil.fromPathsString;
import static gov.nih.nci.hpc.util.HpcUtil.toPathsString;
Copilot AI Mar 28, 2026

The import section contains many duplicate imports/static imports (e.g., StringUtils/Logger/JdbcTemplate and the static HpcUtil imports appear twice). This will trigger compilation errors and also makes the file hard to maintain. Please remove the duplicates and re-run import optimization.

s3Configuration.setExternalStorage(rs.getBoolean("EXTERNAL_STORAGE"));
s3Configuration.setPosixPath(rs.getString("POSIX_PATH"));
s3Configuration.setAutoTieringSearchPath(rs.getString("AUTO_TIERING_SEARCH_PATH"));
s3Configuration.setAutoTieringInactivityMonths(rs.getInt("AUTO_TIERING_INACTIVITY_MONTHS"));
Copilot AI Mar 28, 2026

AUTO_TIERING_INACTIVITY_MONTHS is nullable, but rs.getInt(...) returns 0 when the column is NULL. That means configurations without a value will be treated as 0 months (and later validation == null will never trip), potentially auto-tiering everything. Use rs.getObject(..., Integer.class) or rs.getInt(...) + rs.wasNull() to preserve nulls.

Suggested change
s3Configuration.setAutoTieringInactivityMonths(rs.getInt("AUTO_TIERING_INACTIVITY_MONTHS"));
Integer autoTieringInactivityMonths = (Integer) rs.getObject("AUTO_TIERING_INACTIVITY_MONTHS");
s3Configuration.setAutoTieringInactivityMonths(autoTieringInactivityMonths);

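The underlying pitfall — a nullable integer column whose NULL must stay distinguishable from 0 — can be shown with a throwaway sqlite3 table (the column name is borrowed from the comment; this is an analogy, not the HPC schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s3_cfg (auto_tiering_inactivity_months INTEGER)")
conn.execute("INSERT INTO s3_cfg VALUES (NULL)")

# Python's DB-API preserves SQL NULL as None; JDBC's rs.getInt() would
# silently coerce it to 0 unless rs.wasNull() (or getObject) is checked.
(months,) = conn.execute(
    "SELECT auto_tiering_inactivity_months FROM s3_cfg").fetchone()
print(months)  # None — "no value configured", not 0 months
```

In the Java row mapper, preserving that distinction is exactly what `rs.getObject("AUTO_TIERING_INACTIVITY_MONTHS", Integer.class)` or the `rs.getInt(...)` + `rs.wasNull()` pair buys you.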
Comment on lines +625 to +630
logger.info("Create bulk auto-tiering task for data management configuration: {}",
dataManagementConfiguration.getId());
dataMigrationService.createBulkAutoTieringTask(dataManagementConfiguration.getId(),
dataManagementConfiguration.getS3UploadConfigurationId(),
dataManagementConfiguration.getS3AutoTieringConfigurationId(), null);

Copilot AI Mar 28, 2026

processAutoTiering() creates bulk auto-tiering tasks with a null userId. Those tasks later spawn data-object tasks that call registerDataObject(..., userId, ...), so this will likely fail at runtime (or create tasks with missing audit/ownership data). Use the system account/invoker userId here (since the method is @HpcExecuteAsSystemAccount) or otherwise ensure a non-null userId is persisted on the tasks.

dependabot Bot and others added 30 commits April 16, 2026 21:47
Bumps [org.bouncycastle:bcpkix-jdk18on](https://github.com/bcgit/bc-java) from 1.79 to 1.84.
- [Changelog](https://github.com/bcgit/bc-java/blob/main/docs/releasenotes.html)
- [Commits](https://github.com/bcgit/bc-java/commits)

---
updated-dependencies:
- dependency-name: org.bouncycastle:bcpkix-jdk18on
  dependency-version: '1.84'
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Bumps [org.bouncycastle:bcprov-jdk18on](https://github.com/bcgit/bc-java) from 1.79 to 1.84.
- [Changelog](https://github.com/bcgit/bc-java/blob/main/docs/releasenotes.html)
- [Commits](https://github.com/bcgit/bc-java/commits)

---
updated-dependencies:
- dependency-name: org.bouncycastle:bcprov-jdk18on
  dependency-version: '1.84'
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [org.bouncycastle:bcprov-jdk18on](https://github.com/bcgit/bc-java) from 1.79 to 1.84.
- [Changelog](https://github.com/bcgit/bc-java/blob/main/docs/releasenotes.html)
- [Commits](https://github.com/bcgit/bc-java/commits)

---
updated-dependencies:
- dependency-name: org.bouncycastle:bcprov-jdk18on
  dependency-version: '1.84'
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Features/hpcdatamgm 2178 merged to releases/3.25.0
…m.sun.mail-jakarta.mail-2.0.2

Bump com.sun.mail:jakarta.mail from 2.0.1 to 2.0.2 in /src/hpc-server
…g.bouncycastle-bcpkix-jdk18on-1.84

Bump org.bouncycastle:bcpkix-jdk18on from 1.79 to 1.84 in /src/hpc-server
…ouncycastle-bcpkix-jdk18on-1.84

Bump org.bouncycastle:bcpkix-jdk18on from 1.79 to 1.84 in /src/hpc-cli
…ouncycastle-bcpkix-jdk18on-1.84

Bump org.bouncycastle:bcpkix-jdk18on from 1.79 to 1.84 in /src/hpc-web
…ouncycastle-bcprov-jdk18on-1.84

Bump org.bouncycastle:bcprov-jdk18on from 1.79 to 1.84 in /src/hpc-web
…g.bouncycastle-bcprov-jdk18on-1.84

Bump org.bouncycastle:bcprov-jdk18on from 1.79 to 1.84 in /src/hpc-server