
Features/hpcdatamgm 2088 - DON'T MERGE #142

Open
eranrosenberg wants to merge 80 commits into releases/3.24.0 from features/HPCDATAMGM-2088

Conversation

@eranrosenberg
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings February 20, 2026 14:40
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces initial support for “auto-tiering” migrations, adding new API/scheduler entry points, schema + DB changes, and a DAO intended to query external archives for inactive files.

Changes:

  • Add auto-tiering REST endpoint and scheduled tasks to create/process bulk auto-tiering migration tasks.
  • Extend domain schemas/DB migrations and DAO logic for new auto-tiering-related fields/status/type.
  • Update S3 path-style access provider list to include VAST_NCIVS.

Reviewed changes

Copilot reviewed 23 out of 28 changed files in this pull request and generated 10 comments.

File Description
utils/hpc-client/lib/hpc-common-3.23.0.jar Adds a compiled client JAR artifact.
src/hpc-server/hpc-ws-rs-impl/src/main/resources/WEB-INF/spring/hpc-server.properties Adds VAST_NCIVS to path-style enabled S3 providers.
src/hpc-server/hpc-ws-rs-impl/src/main/java/gov/nih/nci/hpc/ws/rs/impl/HpcDataMigrationRestServiceImpl.java Implements the new processAutoTiering REST method.
src/hpc-server/hpc-ws-rs-api/src/main/java/gov/nih/nci/hpc/ws/rs/HpcDataMigrationRestService.java Adds /autoTiering/{configurationId} endpoint definition.
src/hpc-server/hpc-scheduler/src/main/resources/logback.xml Changes scheduler log path property from catalina.base to catalina.home.
src/hpc-server/hpc-scheduler/src/main/resources/WEB-INF/spring/hpc-scheduler.properties Adds VAST_NCIVS to path-style enabled S3 providers.
src/hpc-server/hpc-scheduler-migration/src/main/resources/logback.xml Changes migration-scheduler log path property from catalina.base to catalina.home.
src/hpc-server/hpc-scheduler-migration/src/main/resources/WEB-INF/spring/hpc-scheduler-migration.properties Sets migration serverIds/serverId, adds cron entries, updates S3 providers list.
src/hpc-server/hpc-scheduler-migration/src/main/java/gov/nih/nci/hpc/scheduler/migration/impl/HpcScheduledTasksImpl.java Adds scheduled executions for auto-tiering and bulk auto-tiering processing.
src/hpc-server/hpc-domain-types/src/main/resources/schema/HpcDataMigrationTypes.xsd Adds new migration status/type enum values.
src/hpc-server/hpc-domain-model/src/main/resources/schema/HpcDataMigration.xsd Removes stray whitespace.
src/hpc-server/hpc-domain-model/src/main/resources/schema/HpcDataManagement.xsd Adds new auto-tiering-related configuration fields to the schema.
src/hpc-server/hpc-dao-impl/src/main/scripts/migration/release-3.23.0/hpc_auto_tiering.sql Adds DB columns for auto-tiering configuration + migration task source-location fields.
src/hpc-server/hpc-dao-impl/src/main/resources/META-INF/spring/hpc-dao-impl-beans-configuration.xml Registers hpcExternalArchiveDAO bean.
src/hpc-server/hpc-dao-impl/src/main/java/gov/nih/nci/hpc/dao/trino/impl/HpcExternalArchiveDAOImpl.java Introduces a Trino-backed DAO stub for external archive queries (currently dummy).
src/hpc-server/hpc-dao-impl/src/main/java/gov/nih/nci/hpc/dao/oracle/impl/HpcDataMigrationDAOImpl.java Extends migration task/task-result SQL and task mapping for new columns/statuses.
src/hpc-server/hpc-dao-impl/src/main/java/gov/nih/nci/hpc/dao/oracle/impl/HpcDataManagementConfigurationDAOImpl.java Maps new S3 auto-tiering DB columns onto transfer configuration objects.
src/hpc-server/hpc-dao-api/src/main/java/gov/nih/nci/hpc/dao/HpcExternalArchiveDAO.java Adds external archive DAO interface.
src/hpc-server/hpc-bus-service-impl/src/main/java/gov/nih/nci/hpc/bus/impl/HpcDataMigrationBusServiceImpl.java Adds bus-service logic to create and process bulk auto-tiering tasks.
src/hpc-server/hpc-bus-service-api/src/main/java/gov/nih/nci/hpc/bus/HpcDataMigrationBusService.java Adds new bus-service API methods for auto-tiering.
src/hpc-server/hpc-app-service-impl/src/main/java/gov/nih/nci/hpc/service/impl/HpcDataMigrationServiceImpl.java Adds auto-tiering task creation + “files not accessed” lookup via external archive DAO.
src/hpc-server/hpc-app-service-impl/src/main/java/gov/nih/nci/hpc/service/impl/HpcDataManagementServiceImpl.java Adds helper to list auto-tiering-enabled configurations.
src/hpc-server/hpc-app-service-api/src/main/java/gov/nih/nci/hpc/service/HpcDataMigrationService.java Extends service API for auto-tiering and new task-creation signature.
src/hpc-server/hpc-app-service-api/src/main/java/gov/nih/nci/hpc/service/HpcDataManagementService.java Adds API to list auto-tiering-enabled data management configurations.


Comment on lines +59 to +65
// TODO: Implement actual query logic using Trino to scan external archive
// For now, return dummy paths for testing
List<String> filePaths = new ArrayList<>();
filePaths.add("/Eran/eran-file-1.txt");
filePaths.add("/Eran/eran-file-2.txt");
filePaths.add("/Eran/eran-file-3.txt");

Copilot AI Feb 20, 2026

This DAO currently returns hard-coded dummy file paths and contains a TODO for the real Trino query. Shipping placeholder data here will trigger auto-tiering migrations for non-existent paths and makes the feature behavior incorrect in production. Please implement the real query logic (or gate this behind a feature flag) and remove the hard-coded test paths before merging.

Suggested change
// TODO: Implement actual query logic using Trino to scan external archive
// For now, return dummy paths for testing
List<String> filePaths = new ArrayList<>();
filePaths.add("/Eran/eran-file-1.txt");
filePaths.add("/Eran/eran-file-2.txt");
filePaths.add("/Eran/eran-file-3.txt");
// TODO: Implement actual query logic using Trino to scan the external archive and
// populate filePaths with files not accessed within the given time period.
List<String> filePaths = new ArrayList<>();
// Optional: allow dummy data for testing when explicitly enabled via a feature flag.
// This flag must NOT be enabled in production.
boolean useDummyData = Boolean.getBoolean("hpc.externalArchive.dummyData.enabled");
if (useDummyData) {
    logger.warn("Using dummy external archive file paths; feature flag 'hpc.externalArchive.dummyData.enabled' is enabled. "
            + "Ensure this is disabled in production.");
    filePaths.add("/Eran/eran-file-1.txt");
    filePaths.add("/Eran/eran-file-2.txt");
    filePaths.add("/Eran/eran-file-3.txt");
} else {
    logger.warn("Trino-backed external archive query is not yet implemented; returning empty file list.");
}

Comment on lines 451 to +455
dataMigrationTask.getSize(), dataMigrationTask.getRetryTaskId(), dataMigrationTask.getRetryUserId(),
dataMigrationTask.getMetadataFromArchiveFileContainerId(),
dataMigrationTask.getMetadataToArchiveFileContainerId(),
dataMigrationTask.getMetadataArchiveFileIdPattern(), dataMigrationTask.getId(),
dataMigrationTask.getMetadataArchiveFileIdPattern(),
dataMigrationTask.getId(),
Copilot AI Feb 20, 2026

UPSERT_DATA_MIGRATION_TASK_RESULT_SQL now includes AUTO_TIERING_* placeholders, but this jdbcTemplate.update(...) call does not supply corresponding parameters for those new columns (in either the UPDATE or INSERT parameter sets). This will cause a runtime SQL parameter count mismatch; add the missing AUTO_TIERING_* arguments (or remove the placeholders from the SQL if not needed).

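To see why the mismatch is a runtime failure rather than a compile-time one: Spring's jdbcTemplate.update(sql, args...) binds arguments positionally, so the number of '?' placeholders must equal the number of supplied values. The stand-alone sketch below (not HPC code; the clause and argument names are illustrative) shows a naive count that would flag the problem described in this comment:

```java
public class PlaceholderCountCheck {

    // Count '?' placeholders (naive: ignores '?' inside quoted literals).
    static int countPlaceholders(String sql) {
        int count = 0;
        for (char c : sql.toCharArray()) {
            if (c == '?') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The update clause now has 4 placeholders after adding the AUTO_TIERING_* columns...
        String updateClause = "METADATA_TO_ARCHIVE_FILE_CONTAINER_ID = ?, "
                + "METADATA_ARCHIVE_FILE_ID_PATTERN = ?, "
                + "AUTO_TIERING_FILE_CONTAINER_ID = ?, AUTO_TIERING_FILE_ID = ?";
        // ...but the update(...) call still supplies only the original 2 values.
        Object[] suppliedArgs = {"metadataContainerId", "fileIdPattern"};

        System.out.println("placeholders=" + countPlaceholders(updateClause)
                + ", args=" + suppliedArgs.length); // placeholders=4, args=2
    }
}
```

A check like this can also live in a unit test over the DAO's SQL constants, catching placeholder/argument drift before it reaches a database.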
Comment on lines +173 to +174
migrationTask.setStatus(autoTieringRequest ? HpcDataMigrationStatus.AUTO_TIERING_RECEIVED :
HpcDataMigrationStatus.RECEIVED);
Copilot AI Feb 20, 2026

Auto-tiering tasks are being created with status AUTO_TIERING_RECEIVED, but the migration processing flows shown in this PR only poll/process status RECEIVED (no code paths query for AUTO_TIERING_RECEIVED). As a result, these tasks can be assigned to a server but never picked up for processing. Either keep the status as RECEIVED for data-object migration tasks, or add corresponding processing/queries for AUTO_TIERING_RECEIVED.

Suggested change
migrationTask.setStatus(autoTieringRequest ? HpcDataMigrationStatus.AUTO_TIERING_RECEIVED :
HpcDataMigrationStatus.RECEIVED);
migrationTask.setStatus(HpcDataMigrationStatus.RECEIVED);

type="hpc-domain-datatransfer:HpcDataTransferType" />
<xsd:element name="s3UploadConfigurationId" type="xsd:string" />
<xsd:element name="s3DefaultDownloadConfigurationId" type="xsd:string" />
<xsd:element name="s3AutoTieringConfigurationId" type="xsd:string" />
Copilot AI Feb 20, 2026

s3AutoTieringConfigurationId is introduced as a required element (no minOccurs=0), but the code enables auto-tiering only when this ID is present/non-empty. Making it required breaks existing configurations that don't use auto-tiering. Mark it optional (minOccurs=0) or version the schema/DTOs to preserve backward compatibility.

Suggested change
<xsd:element name="s3AutoTieringConfigurationId" type="xsd:string" />
<xsd:element name="s3AutoTieringConfigurationId" type="xsd:string" minOccurs="0" />

<configuration>

<property name="SCHEDULER_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler.log"/>
<property name="SCHEDULER_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler.log"/>
Copilot AI Feb 20, 2026

The active log file path uses ${catalina.home} while the rolling policy still uses ${catalina.base} in fileNamePattern, which will split current vs rolled logs across different directories. Use the same base variable for both (typically ${catalina.base}) to keep logs/rotation consistent per Tomcat instance.

Suggested change
<property name="SCHEDULER_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler.log"/>
<property name="SCHEDULER_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler.log"/>

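For context, a consistent logback setup keeps the active file and the rolling pattern on the same base variable. The appender name, pattern, and retention below are illustrative assumptions, not the project's actual logback.xml:

```xml
<property name="SCHEDULER_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler.log"/>

<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>${SCHEDULER_SERVER_LOG}</file>
  <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
    <!-- Same ${catalina.base} root as the active file, so current and rolled
         logs stay in one directory per Tomcat instance -->
    <fileNamePattern>${catalina.base}/logs/hpc-server/hpc-scheduler.%d{yyyy-MM-dd}.log</fileNamePattern>
    <maxHistory>30</maxHistory>
  </rollingPolicy>
  <encoder>
    <pattern>%d{ISO8601} [%thread] %-5level %logger{36} - %msg%n</pattern>
  </encoder>
</appender>
```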
<configuration>

<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler-migration.log"/>
<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler-migration.log"/>
Copilot AI Feb 20, 2026

The active log file path uses ${catalina.home} while the rolling policy still uses ${catalina.base} in fileNamePattern, which will split current vs rolled logs across different directories. Use the same base variable for both (typically ${catalina.base}) to keep logs/rotation consistent per Tomcat instance.

Suggested change
<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler-migration.log"/>
<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler-migration.log"/>

Comment on lines 69 to 72
+ "TO_S3_ARCHIVE_LOCATION_FILE_CONTAINER_ID = ?, TO_S3_ARCHIVE_LOCATION_FILE_ID = ?, SERVER_ID = ?, ALIGN_ARCHIVE_PATH = ?, DATA_SIZE = ?, "
+ "RETRY_TASK_ID = ?, RETRY_USER_ID = ?, METADATA_FROM_ARCHIVE_FILE_CONTAINER_ID = ?, "
+ "METADATA_TO_ARCHIVE_FILE_CONTAINER_ID = ?, METADATA_ARCHIVE_FILE_ID_PATTERN = ? "
+ "METADATA_TO_ARCHIVE_FILE_CONTAINER_ID = ?, METADATA_ARCHIVE_FILE_ID_PATTERN = ?, AUTO_TIERING_FILE_CONTAINER_ID = ?, AUTO_TIERING_FILE_ID = ? "
+ "when not matched then insert (ID, PARENT_ID, USER_ID, PATH, CONFIGURATION_ID, FROM_S3_ARCHIVE_CONFIGURATION_ID, "
Copilot AI Feb 20, 2026

UPSERT_DATA_MIGRATION_TASK_RESULT_SQL references AUTO_TIERING_FILE_CONTAINER_ID and AUTO_TIERING_FILE_ID columns. No migration in this PR adds these columns to HPC_DATA_MIGRATION_TASK_RESULT (and release-3.23.0/hpc_auto_tiering.sql only alters HPC_DATA_MIGRATION_TASK). Align the SQL with actual schema changes, or add the missing columns/migration.

s3Configuration.setExternalStorage(rs.getBoolean("EXTERNAL_STORAGE"));
s3Configuration.setPosixPath(rs.getString("POSIX_PATH"));
s3Configuration.setAutoTieringSearchPath(rs.getString("AUTO_TIERING_SEARCH_PATH"));
s3Configuration.setAutoTieringInactivityMonths(rs.getInt("AUTO_TIERING_INACTIVITY_MONTHS"));
Copilot AI Feb 20, 2026

AUTO_TIERING_INACTIVITY_MONTHS is nullable (added without NOT NULL), but this mapper uses rs.getInt(...), which returns 0 when the DB value is NULL. That makes it impossible to distinguish “unset” from an actual 0-month value and can cause auto-tiering to proceed with an unintended default. Use rs.getObject(..., Integer.class) or rs.getInt + rs.wasNull() to preserve nulls.

Suggested change
s3Configuration.setAutoTieringInactivityMonths(rs.getInt("AUTO_TIERING_INACTIVITY_MONTHS"));
s3Configuration.setAutoTieringInactivityMonths(
rs.getObject("AUTO_TIERING_INACTIVITY_MONTHS", Integer.class));

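The null-collapsing behavior is easy to demonstrate without a database. The sketch below fakes a result row with a plain map (illustrative only; java.sql.ResultSet behaves the same way for a NULL integer column):

```java
import java.util.HashMap;
import java.util.Map;

public class NullableIntDemo {

    // Fake row: column name -> value, where a null value stands for SQL NULL.
    static final Map<String, Integer> row = new HashMap<>();
    static {
        row.put("AUTO_TIERING_INACTIVITY_MONTHS", null);
    }

    // Mimics ResultSet.getInt(col): SQL NULL collapses to 0.
    static int getIntStyle(String col) {
        Integer v = row.get(col);
        return v == null ? 0 : v;
    }

    // Mimics ResultSet.getObject(col, Integer.class): SQL NULL is preserved.
    static Integer getObjectStyle(String col) {
        return row.get(col);
    }

    public static void main(String[] args) {
        System.out.println(getIntStyle("AUTO_TIERING_INACTIVITY_MONTHS"));    // 0 -- "unset" looks like 0 months
        System.out.println(getObjectStyle("AUTO_TIERING_INACTIVITY_MONTHS")); // null -- "unset" stays distinguishable
    }
}
```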
<xsd:element name="externalStorage" type="xsd:boolean" />
<xsd:element name="posixPath" type="xsd:string" minOccurs="0"/>
<xsd:element name="posixPath" type="xsd:string" />
<xsd:element name="autoTieringSearchPath" type="xsd:string" />
Copilot AI Feb 20, 2026

autoTieringSearchPath is added as a required element (no minOccurs=0), but the DB migration adds AUTO_TIERING_SEARCH_PATH as nullable and the service treats auto-tiering as an optional feature. Keeping this required is a breaking schema change for existing configurations/clients; make it optional (minOccurs=0) or version the schema.

Suggested change
<xsd:element name="autoTieringSearchPath" type="xsd:string" />
<xsd:element name="autoTieringSearchPath" type="xsd:string" minOccurs="0" />

Copilot uses AI. Check for mistakes.
Comment on lines +189 to 190
hpc.service.dataMigration.serverIds=server-id
hpc.service.serverId=server-id
Copilot AI Feb 20, 2026

This properties file now hard-codes hpc.service.dataMigration.serverIds=server-id, while the other shipped properties (e.g., hpc-server.properties / hpc-scheduler.properties) leave serverIds empty for environment-specific configuration. Hard-coding a placeholder here can cause migrations to be assigned incorrectly in non-dev deployments; consider keeping it empty and documenting/overriding via ${catalina.base}/conf instead.

Suggested change
hpc.service.dataMigration.serverIds=server-id
hpc.service.serverId=server-id
# Server IDs are intentionally left empty for environment-specific configuration.
# Override these values via ${catalina.base}/conf in each deployment.
hpc.service.dataMigration.serverIds=
hpc.service.serverId=

Eran Rosenberg added 5 commits March 8, 2026 09:48
# Conflicts:
#	utils/hpc-client/hpc-cli-3.24.0.jar
#	utils/hpc-client/lib/hpc-common-3.24.0.jar
#	utils/hpc-client/lib/hpc-domain-types-3.24.0.jar
#	utils/hpc-client/lib/hpc-dto-3.24.0.jar
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 69 out of 74 changed files in this pull request and generated 9 comments.



<xsd:restriction base="xsd:string">
<xsd:enumeration value="RECEIVED" />
<xsd:enumeration value="IN_PROGRESS" />
<xsd:enumeration value="AUTO_TIERING_RECEIVED" />
Copilot AI Mar 8, 2026

The AUTO_TIERING_RECEIVED status is added as a new migration status in HpcDataMigrationTypes.xsd and is used to set the status for auto-tiering data object tasks in createDataObjectMigrationTask. However, the getDataMigrationTasks method in HpcDataMigrationServiceImpl uses a SQL query that fetches tasks by STATUS = ? with a single status value. The processBulkAutoTieringMigrationReceived scheduler method queries with HpcDataMigrationStatus.RECEIVED and type BULK_AUTO_TIERING, which would not include AUTO_TIERING_RECEIVED tasks. It is unclear whether AUTO_TIERING_RECEIVED individual data-object tasks are ever picked up and processed — there is no handler for AUTO_TIERING_RECEIVED data-object tasks in any of the processXxx methods.

Comment on lines +101 to +109
private void dbConnect() throws HpcException {
try {
jdbcTemplate.getDataSource().getConnection();

} catch (Exception e) {
throw new HpcException("Failed to connect to Trino DB. Check connection & credentials config",
HpcErrorType.DATABASE_ERROR, HpcIntegratedSystem.VAST, e);
}
}
Copilot AI Mar 8, 2026

The dbConnect() method calls jdbcTemplate.getDataSource().getConnection() without closing the returned connection. This causes a connection leak every time the bean is initialized. The connection should be closed in a finally block or use a try-with-resources statement. Note that the existing Oracle DAO HpcDataManagementConfigurationDAOImpl.dbConnect() has the same pattern, but since this is new code it should be written correctly.

Comment on lines +281 to +284
hpc.dao.trino.host=trino.fairway.nci.nih.gov
hpc.dao.trino.port=443
hpc.dao.trino.username=ncif-hpcdm-svc
hpc.dao.trino.password=< Configure Me >
Copilot AI Mar 8, 2026

The property file contains a literal placeholder password < Configure Me > committed to the repository. Credentials should never be stored in property files in version control, not even as placeholders. This file should reference an environment variable, a vault secret, or an external configuration store. Additionally, this is a template-like properties file checked in for a specific environment, which is a broader concern.

Comment on lines +57 to +59
<xsd:element name="posixPath" type="xsd:string" />
<xsd:element name="autoTieringSearchPath" type="xsd:string" />
<xsd:element name="autoTieringInactivityMonths" type="xsd:int" minOccurs="0" />
Copilot AI Mar 8, 2026

The posixPath field in the HpcS3ArchiveConfiguration complex type was changed from minOccurs="0" (optional) to required (no minOccurs attribute, which defaults to 1). This is a breaking schema change: any existing XML documents that omit posixPath will now fail validation, and any existing configurations in the database that do not have a posixPath set will cause deserialization failures at runtime. If this field is truly now required for all S3 archive configurations, existing database records and any client-side DTOs that build these objects must be updated accordingly.

Comment on lines +564 to +579
public void processAutoTiering() throws HpcException {
// obtain a list of HPC data management configurations which are to be included in auto-tiering scan and
// create a task for each.
dataManagementService.getAutoTieringDataManagementConfigurations().forEach(dataManagementConfiguration -> {
try {
logger.info("Create bulk auto-tiering task for data management configuration: {}",
dataManagementConfiguration.getId());
dataMigrationService.createBulkAutoTieringTask(dataManagementConfiguration.getId(),
dataManagementConfiguration.getS3UploadConfigurationId(),
dataManagementConfiguration.getS3AutoTieringConfigurationId(), null);

} catch (HpcException e) {
logger.error("Failed to process auto-tiering for data management configuration: {}",
dataManagementConfiguration.getId(), e);
}
});
Copilot AI Mar 8, 2026

The processAutoTiering() (no-arg, called by the scheduler) uses dataManagementConfiguration.getS3UploadConfigurationId() as the fromS3ArchiveConfigurationId when calling createBulkAutoTieringTask. However, for auto-tiering the source is the external VAST archive, not the S3 upload configuration. The upload configuration ID represents where files are uploaded to, not the external source being scanned. The fromS3ArchiveConfigurationId argument in createBulkAutoTieringTask should probably be the same configuration that has autoTieringSearchPath and autoTieringInactivityMonths defined, which is the auto-tiering source configuration, not the upload configuration.

// Get data objects eligible for auto-tiering and create individual migration tasks
processBulkAutoTieringMigration(bulkAutoTieringTask);

// Mark the bulk metadata update task - in-progress
Copilot AI Mar 8, 2026

The comment on line 635 says "Mark the bulk metadata update task - in-progress" but the code is actually processing a bulk auto-tiering task, not a bulk metadata update task. This is a copy-paste error from the processBulkMetadataUpdateMigrationReceived method.

Suggested change
// Mark the bulk metadata update task - in-progress
// Mark the bulk auto-tiering task - in-progress

bulkAutoTieringTask.getFromS3ArchiveConfigurationId(),
bulkAutoTieringTask.getToS3ArchiveConfigurationId(),
bulkAutoTieringTask.getId(), bulkAutoTieringTask.getAlignArchivePath(),
null, null, bulkAutoTieringTask.getRetryUserId(),
Copilot AI Mar 8, 2026

In processBulkAutoTieringMigration, bulkAutoTieringTask.getRetryUserId() is passed as the retryUserId parameter to createDataObjectMigrationTask, but for a new auto-tiering task (not a retry), this value would typically be null. However, retryTaskId is passed as null while retryUserId is taken from the bulk task's retryUserId. This is inconsistent — if retryTaskId is null, retryUserId should also be null. This looks like a copy-paste mistake.

Suggested change
null, null, bulkAutoTieringTask.getRetryUserId(),
null, null, null,

Comment on lines +79 to +81
return jdbcTemplate.queryForList(
GET_FILES_NOT_ACCESSED_SQL.replace("{months}", monthsNotAccessed.toString()),
String.class, searchPath, searchPath);
Copilot AI Mar 8, 2026

The SQL query substitutes {months} using plain string replacement (GET_FILES_NOT_ACCESSED_SQL.replace("{months}", monthsNotAccessed.toString())). Even though monthsNotAccessed is an Integer (so there is no direct SQL injection risk from that field), the substitution bypasses JDBC parameterized query handling. Trino's JDBC driver may support parameterized INTERVAL expressions; if so, this should use a proper bind parameter instead of string concatenation. At minimum, a comment should explain why string substitution is used here rather than a bind parameter.

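If the driver supports binding inside date_add, a fully parameterized form might look like the sketch below. The table and column names here are hypothetical, since the actual query text is not shown in this PR:

```sql
-- Hypothetical parameterized Trino query: both the search path and the
-- inactivity window are bound, avoiding string substitution entirely.
SELECT file_path
FROM external_archive_files
WHERE file_path LIKE ? || '%'
  AND last_access_time < date_add('month', -1 * ?, now())
```

If binding the interval value turns out not to work with the Trino JDBC driver, a comment in the DAO explaining that limitation would justify the string substitution.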
Comment on lines +1 to +42
--
-- hpc_auto_tiering.sql
--
-- Copyright SVG, Inc.
-- Copyright Leidos Biomedical Research, Inc
--
-- Distributed under the OSI-approved BSD 3-Clause License.
-- See http://ncip.github.com/HPC/LICENSE.txt for details.
--
--
-- @author <a href="mailto:rosenbergea@nih.gov">Eran Rosenberg</a>
--

-- HPC_S3_ARCHIVE_CONFIGURATION
ALTER TABLE HPC_S3_ARCHIVE_CONFIGURATION add (
AUTO_TIERING_SEARCH_PATH VARCHAR2(2700)
);
COMMENT ON COLUMN HPC_S3_ARCHIVE_CONFIGURATION.AUTO_TIERING_SEARCH_PATH IS 'The search path for auto-tiering external archive files';

ALTER TABLE HPC_S3_ARCHIVE_CONFIGURATION add (
AUTO_TIERING_INACTIVITY_MONTHS NUMBER(10)
);
COMMENT ON COLUMN HPC_S3_ARCHIVE_CONFIGURATION.AUTO_TIERING_INACTIVITY_MONTHS IS 'The inactivity period in months before files are auto-tiered to Glacier Deep Archive';

-- HPC_DATA_MANAGEMENT_CONFIGURATION
ALTER TABLE HPC_DATA_MANAGEMENT_CONFIGURATION add (
S3_AUTO_TIERING_ARCHIVE_CONFIGURATION_ID VARCHAR2(50)
);
COMMENT ON COLUMN HPC_DATA_MANAGEMENT_CONFIGURATION.S3_AUTO_TIERING_ARCHIVE_CONFIGURATION_ID IS 'The S3 archive configuration ID for auto-tiering external archive into';

-- HPC_DATA_MIGRATION_TASK
ALTER TABLE HPC_DATA_MIGRATION_TASK add (
FROM_S3_ARCHIVE_LOCATION_FILE_CONTAINER_ID VARCHAR2(50)
);
COMMENT ON COLUMN HPC_DATA_MIGRATION_TASK.FROM_S3_ARCHIVE_LOCATION_FILE_CONTAINER_ID IS 'The archive file container ID (bucket) migrating from';

ALTER TABLE HPC_DATA_MIGRATION_TASK add (
FROM_S3_ARCHIVE_LOCATION_FILE_ID VARCHAR2(2700)
);
COMMENT ON COLUMN HPC_DATA_MIGRATION_TASK.FROM_S3_ARCHIVE_LOCATION_FILE_ID IS 'The archive file ID (object ID) migrating from';


Copilot AI Mar 8, 2026

The identical SQL migration script hpc_auto_tiering.sql has been added to both release-3.23.0/ and release-3.24.0/. Only release-3.24.0/ should contain this script, since the feature is part of version 3.24.0. Including it under release-3.23.0/ is likely a mistake and would cause the DDL to be applied twice if both release directories are executed during an upgrade from 3.22.x to 3.24.0, leading to "column already exists" errors.

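If the script must remain in both release directories for operational reasons, an idempotent guard is one way to make re-execution safe. This PL/SQL block is a sketch, not part of the PR:

```sql
-- Add the column only when it does not already exist, so running the script
-- twice (e.g., from both release directories) does not raise ORA-01430.
DECLARE
  v_count NUMBER;
BEGIN
  SELECT COUNT(*) INTO v_count
    FROM user_tab_cols
   WHERE table_name = 'HPC_S3_ARCHIVE_CONFIGURATION'
     AND column_name = 'AUTO_TIERING_SEARCH_PATH';
  IF v_count = 0 THEN
    EXECUTE IMMEDIATE
      'ALTER TABLE HPC_S3_ARCHIVE_CONFIGURATION ADD (AUTO_TIERING_SEARCH_PATH VARCHAR2(2700))';
  END IF;
END;
/
```

The simpler fix, as the comment says, is to keep the script only under release-3.24.0/.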
@eranrosenberg eranrosenberg changed the base branch from releases/3.23.0 to releases/3.24.0 March 28, 2026 17:23
@eranrosenberg eranrosenberg requested a review from Copilot March 28, 2026 17:24
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 47 out of 48 changed files in this pull request and generated 8 comments.



Comment on lines +43 to +50
<!-- HPC Trino DB data source -->
<bean id="hpcTrinoDataSource" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
<property name="driverClassName" value="io.trino.jdbc.TrinoDriver" />
<property name="url" value="${hpc.dao.trino.url}" />
<property name="username" value="${hpc.dao.trino.username}" />
<property name="password" value="${hpc.dao.trino.password}" />
<property name="maxActive" value="${hpc.dao.trino.maxActive}" />
</bean>
Copilot AI Mar 28, 2026

This config adds a Trino DataSource/JdbcTemplate that references hpc.dao.trino.* placeholders. Since this beans file is imported broadly, any runtime/tests that don't define these properties will fail at context startup. Consider defining safe defaults in the module properties files, or conditionally creating these beans only when Trino is enabled/needed (e.g., profile/conditional property).

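One way to gate the beans is a Spring bean-definition profile, sketched below. The profile name is made up, and a nested <beans> element must appear last in the file; the project may prefer a property-driven conditional instead:

```xml
<!-- Nested <beans> with a profile: the Trino DataSource is created only when
     the 'trino' profile is active, so contexts without hpc.dao.trino.*
     properties still start cleanly. -->
<beans profile="trino">
  <bean id="hpcTrinoDataSource" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
    <property name="driverClassName" value="io.trino.jdbc.TrinoDriver" />
    <property name="url" value="${hpc.dao.trino.url}" />
    <property name="username" value="${hpc.dao.trino.username}" />
    <property name="password" value="${hpc.dao.trino.password}" />
    <property name="maxActive" value="${hpc.dao.trino.maxActive}" />
  </bean>
</beans>
```

The profile would then be enabled only in deployments that need Trino, e.g. via spring.profiles.active=trino.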
Comment on lines +101 to +104
private void dbConnect() throws HpcException {
try {
jdbcTemplate.getDataSource().getConnection();

Copilot AI Mar 28, 2026

dbConnect() opens a JDBC connection but never closes it, which can leak connections on startup/reload. Use try-with-resources (or explicitly close) around getConnection() to ensure it is returned to the pool.

Comment on lines 55 to 60
<xsd:element name="dataManagementConfigurationId" type="xsd:string" />
<xsd:element name="externalStorage" type="xsd:boolean" />
<xsd:element name="posixPath" type="xsd:string" minOccurs="0"/>
<xsd:element name="posixPath" type="xsd:string" />
<xsd:element name="autoTieringSearchPath" type="xsd:string" />
<xsd:element name="autoTieringInactivityMonths" type="xsd:int" minOccurs="0" />
</xsd:sequence>
Copilot AI Mar 28, 2026

These schema changes make previously-optional fields required (posixPath, autoTieringSearchPath, and s3AutoTieringConfigurationId). This is a backward-incompatible XSD change and conflicts with the code which treats auto-tiering fields as optional (only validated when auto-tiering is enabled). Consider restoring minOccurs="0" for existing optional fields and marking the new auto-tiering fields optional as well.

Comment on lines 261 to 264
hpc.integration.sps.domain=NIH

hpc.integration.irods.basePath=/tempZone/home

Copilot AI Mar 28, 2026

This file now defines hpc.integration.irods.basePath twice (earlier in the file and again here). Duplicate keys in .properties are order-dependent and make the effective configuration unclear; please keep a single definition.

Comment on lines +121 to +128
conn = connect(
host=args.host,
port=args.port,
user=args.user,
auth=BasicAuthentication(args.user, password),
http_scheme="https",
verify=False,
catalog=CATALOG,
Copilot AI Mar 28, 2026

The Trino client disables TLS certificate verification (verify=False). That’s unsafe by default and can allow MITM credential capture. Prefer verify=True by default and add an explicit flag (e.g., --insecure/--no-verify) for environments that need it.

Comment on lines +49 to +66
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.dao.DataAccessException;
import org.springframework.dao.IncorrectResultSizeDataAccessException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;
import org.springframework.jdbc.core.support.SqlLobValue;
import org.springframework.jdbc.support.lob.DefaultLobHandler;
import org.springframework.jdbc.support.lob.LobHandler;

import java.sql.Types;
import java.util.*;

import static gov.nih.nci.hpc.util.HpcUtil.fromPathsString;
import static gov.nih.nci.hpc.util.HpcUtil.toPathsString;
Copilot AI Mar 28, 2026

The import section contains many duplicate imports/static imports (e.g., StringUtils/Logger/JdbcTemplate and the static HpcUtil imports appear twice). This will trigger compilation errors and also makes the file hard to maintain. Please remove the duplicates and re-run import optimization.

s3Configuration.setExternalStorage(rs.getBoolean("EXTERNAL_STORAGE"));
s3Configuration.setPosixPath(rs.getString("POSIX_PATH"));
s3Configuration.setAutoTieringSearchPath(rs.getString("AUTO_TIERING_SEARCH_PATH"));
s3Configuration.setAutoTieringInactivityMonths(rs.getInt("AUTO_TIERING_INACTIVITY_MONTHS"));
Copilot AI Mar 28, 2026

AUTO_TIERING_INACTIVITY_MONTHS is nullable, but rs.getInt(...) returns 0 when the column is NULL. That means configurations without a value will be treated as 0 months (and later validation == null will never trip), potentially auto-tiering everything. Use rs.getObject(..., Integer.class) or rs.getInt(...) + rs.wasNull() to preserve nulls.

Suggested change
s3Configuration.setAutoTieringInactivityMonths(rs.getInt("AUTO_TIERING_INACTIVITY_MONTHS"));
Integer autoTieringInactivityMonths = (Integer) rs.getObject("AUTO_TIERING_INACTIVITY_MONTHS");
s3Configuration.setAutoTieringInactivityMonths(autoTieringInactivityMonths);

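The underlying pitfall — a nullable integer column whose NULL must stay distinguishable from 0 — can be shown with a throwaway sqlite3 table (the column name is borrowed from the comment; this is an analogy, not the HPC schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s3_cfg (auto_tiering_inactivity_months INTEGER)")
conn.execute("INSERT INTO s3_cfg VALUES (NULL)")

# Python's DB-API preserves SQL NULL as None; JDBC's rs.getInt() would
# silently coerce it to 0 unless rs.wasNull() (or getObject) is checked.
(months,) = conn.execute(
    "SELECT auto_tiering_inactivity_months FROM s3_cfg").fetchone()
print(months)  # None — "no value configured", not 0 months
```

In the Java row mapper, preserving that distinction is exactly what `rs.getObject("AUTO_TIERING_INACTIVITY_MONTHS", Integer.class)` or the `rs.getInt(...)` + `rs.wasNull()` pair buys you.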
Comment on lines +625 to +630
logger.info("Create bulk auto-tiering task for data management configuration: {}",
dataManagementConfiguration.getId());
dataMigrationService.createBulkAutoTieringTask(dataManagementConfiguration.getId(),
dataManagementConfiguration.getS3UploadConfigurationId(),
dataManagementConfiguration.getS3AutoTieringConfigurationId(), null);

Copilot AI Mar 28, 2026

processAutoTiering() creates bulk auto-tiering tasks with a null userId. Those tasks later spawn data-object tasks that call registerDataObject(..., userId, ...), so this will likely fail at runtime (or create tasks with missing audit/ownership data). Use the system account/invoker userId here (since the method is @HpcExecuteAsSystemAccount) or otherwise ensure a non-null userId is persisted on the tasks.

dependabot Bot and others added 30 commits April 16, 2026 21:47
Bumps [org.bouncycastle:bcpkix-jdk18on](https://github.com/bcgit/bc-java) from 1.79 to 1.84.
- [Changelog](https://github.com/bcgit/bc-java/blob/main/docs/releasenotes.html)
- [Commits](https://github.com/bcgit/bc-java/commits)

---
updated-dependencies:
- dependency-name: org.bouncycastle:bcpkix-jdk18on
  dependency-version: '1.84'
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Bumps [org.bouncycastle:bcprov-jdk18on](https://github.com/bcgit/bc-java) from 1.79 to 1.84.
- [Changelog](https://github.com/bcgit/bc-java/blob/main/docs/releasenotes.html)
- [Commits](https://github.com/bcgit/bc-java/commits)

---
updated-dependencies:
- dependency-name: org.bouncycastle:bcprov-jdk18on
  dependency-version: '1.84'
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [org.bouncycastle:bcprov-jdk18on](https://github.com/bcgit/bc-java) from 1.79 to 1.84.
- [Changelog](https://github.com/bcgit/bc-java/blob/main/docs/releasenotes.html)
- [Commits](https://github.com/bcgit/bc-java/commits)

---
updated-dependencies:
- dependency-name: org.bouncycastle:bcprov-jdk18on
  dependency-version: '1.84'
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Features/hpcdatamgm 2178 merged to releases/3.25.0
…m.sun.mail-jakarta.mail-2.0.2

Bump com.sun.mail:jakarta.mail from 2.0.1 to 2.0.2 in /src/hpc-server
…g.bouncycastle-bcpkix-jdk18on-1.84

Bump org.bouncycastle:bcpkix-jdk18on from 1.79 to 1.84 in /src/hpc-server
…ouncycastle-bcpkix-jdk18on-1.84

Bump org.bouncycastle:bcpkix-jdk18on from 1.79 to 1.84 in /src/hpc-cli
…ouncycastle-bcpkix-jdk18on-1.84

Bump org.bouncycastle:bcpkix-jdk18on from 1.79 to 1.84 in /src/hpc-web
…ouncycastle-bcprov-jdk18on-1.84

Bump org.bouncycastle:bcprov-jdk18on from 1.79 to 1.84 in /src/hpc-web
…g.bouncycastle-bcprov-jdk18on-1.84

Bump org.bouncycastle:bcprov-jdk18on from 1.79 to 1.84 in /src/hpc-server