
@ldaugusto
Contributor

Details

This PR adds the complete backend infrastructure for the Optimization Studio feature, enabling creation, retrieval, and management of Studio optimization runs through REST API endpoints.

Backend API Changes

  • Added OptimizationStudioConfig model with comprehensive nested validation for:
    • Prompt configuration (system/user messages)
    • LLM model configuration (provider, name, parameters)
    • Evaluation metrics (equals, g_eval, contains, levenshtein_ratio)
    • Optimizer configuration (type and parameters)
  • Added OptimizationStudioLog model for presigned S3 log URL responses
  • Extended /v1/private/optimizations endpoints with:
    • GET /studio/{id}/logs - Retrieve presigned S3 URLs for optimization logs
    • include_studio_config query parameter on GET /{id} - Optionally include full studio config
    • studio_only query parameter on GET / - Filter for Studio optimizations only

Service Layer Implementation

  • Implemented Redis RQ job enqueueing to the OPTIMIZER_CLOUD queue
  • Added OptimizationStudioJobMessage for Python backend communication
  • Added workspaceName validation with early failure for Studio optimizations
  • Implemented S3 presigned URL generation for log file downloads
  • Added automatic optimization cancellation on job enqueueing failures
  • Updated job message to use workspaceName instead of workspaceId for Python SDK compatibility

Data Layer Updates

  • Added studio_config JSON column via Liquibase migration
  • Implemented JSON serialization/deserialization for studio configurations
  • Extended search criteria with studio_only filter
  • Updated SQL queries to support studio config filtering

Validation & Error Handling

  • Added @Valid annotation on studioConfig field for nested validation
  • Implemented workspaceName presence check before optimization creation
  • Added null checks with automatic cancellation in job enqueueing
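The enqueue-or-cancel flow described in the service-layer and error-handling notes can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the PR's code: StudioJobEnqueuer, the JobQueue interface (a stand-in for the Redis RQ client) and the cancel callback are hypothetical names, and the literal queue name string is an assumption.

```java
import java.util.UUID;
import java.util.function.Consumer;

// Hypothetical sketch: StudioJobEnqueuer, JobQueue and the cancel callback are
// illustrative names, not the PR's actual classes.
final class StudioJobEnqueuer {

    /** Stand-in for the Redis RQ client; assumed to throw on failure. */
    interface JobQueue {
        void enqueue(String queueName, UUID optimizationId, String workspaceName);
    }

    // Assumed literal; the real queue name or constant may differ.
    static final String QUEUE_NAME = "OPTIMIZER_CLOUD";

    private final JobQueue queue;
    private final Consumer<UUID> cancelOptimization;

    StudioJobEnqueuer(JobQueue queue, Consumer<UUID> cancelOptimization) {
        this.queue = queue;
        this.cancelOptimization = cancelOptimization;
    }

    /** Returns true if the job was enqueued; otherwise cancels the optimization and returns false. */
    boolean enqueueOrCancel(UUID optimizationId, String workspaceName) {
        // Fail early: the job message carries the workspace name, not the id.
        if (workspaceName == null || workspaceName.isBlank()) {
            cancelOptimization.accept(optimizationId);
            return false;
        }
        try {
            queue.enqueue(QUEUE_NAME, optimizationId, workspaceName);
            return true;
        } catch (RuntimeException e) {
            // No worker will pick this run up, so don't leave it in a non-terminal state.
            cancelOptimization.accept(optimizationId);
            return false;
        }
    }
}
```

Passing the cancel step in as a callback keeps the sketch testable without Redis; in the service it would presumably be the existing cancelOptimization(id, workspaceId) call.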

Change checklist

  • User facing
  • Documentation update

Issues

  • OPIK-2768

Testing

  • Added comprehensive integration tests for Studio optimization creation and retrieval
  • Added test for Redis job enqueueing verification with actual RQ inspection
  • Added test for include_studio_config query parameter behavior
  • Added test for studio_only filtering functionality
  • Fixed test isolation issue in findOptimizations__withStudioOnlyFlag by using unique workspace
  • All tests verify proper JSON serialization/deserialization of studio configurations
  • Validation tests ensure proper error responses for missing required fields

Documentation

N/A: backend API implementation only; frontend documentation will follow.

- Add studio_config column to optimizations table for Studio configuration
- Integrate Studio endpoints into existing OptimizationsResource (GET with flags, POST, logs endpoint)
- Add OptimizationStudioConfig with nested records for type-safe configuration
- Implement Redis RQ job enqueueing for Studio optimizations
- Add error handling: cancel optimization if Redis enqueue fails
- Add presigned S3 URL support for optimization logs
- Add comprehensive integration tests for Studio optimization flow
- Add OptimizationStudioJobMessage for type-safe Redis job payloads
@ldaugusto requested a review from a team as a code owner on November 12, 2025 14:35
Copilot AI review requested due to automatic review settings November 12, 2025 14:35
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements the complete backend infrastructure for the Optimization Studio feature, enabling creation, retrieval, and management of Studio optimization runs through REST API endpoints. The implementation includes comprehensive model validation, Redis RQ job enqueueing for Python backend communication, S3 presigned URL generation for log downloads, and automatic optimization cancellation on job enqueueing failures.

Key changes:

  • Added OptimizationStudioConfig model with nested validation for prompt configuration, LLM models, evaluation metrics, and optimizer settings
  • Extended /v1/private/optimizations endpoints with Studio-specific functionality including logs retrieval and optional config inclusion
  • Implemented Redis RQ job enqueueing to the OPTIMIZER_CLOUD queue with automatic cancellation on failure

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Summary per file:

  • OptimizationsResourceTest.java: Added comprehensive integration tests for Studio optimization creation, Redis job enqueueing verification, and filtering
  • OptimizationResourceClient.java: Extended test client with methods for Studio config inclusion and logs retrieval endpoints
  • 000046_add_studio_config_to_optimizations.sql: Added database migration for studio_config JSON column in optimizations table
  • PreSignerService.java: Added getter method for presigned URL expiration timeout configuration
  • OptimizationStudioJobMessage.java: Created message model for Redis RQ job communication with Python backend
  • OptimizationService.java: Implemented Studio optimization job enqueueing, config scrubbing logic, and logs URL generation
  • OptimizationSearchCriteria.java: Added studioOnly filter parameter for optimization searches
  • OptimizationDAO.java: Implemented JSON serialization/deserialization for studio configurations with database operations
  • OptimizationsResource.java: Added REST endpoints for Studio logs and optional config inclusion parameter
  • OptimizationStudioLog.java: Created response model for presigned S3 log URLs
  • OptimizationStudioConfig.java: Defined comprehensive validation model for Studio optimization configurations
  • Optimization.java: Added studioConfig field with @Valid annotation to optimization model


// Should only return Studio optimizations
assertThat(page.content()).isNotEmpty();
assertThat(page.content()).allMatch(opt -> opt.studioConfig() != null);

Copilot AI Nov 12, 2025

This assertion is incorrect. The test verifies studioOnly=true filtering, but earlier in the code (line 748) we see that studioConfig is scrubbed by default. According to lines 116-120 in OptimizationService.java, studioConfig is only preserved when studioOnly=true. However, the find operation returns a page without explicitly requesting studio configs, so this assertion will fail because studioConfig will be null even for Studio optimizations unless explicitly requested. The test should either verify a different property or request inclusion of studio configs.

Comment on lines +239 to +245
if (workspaceName == null) {
log.error(
"Cannot enqueue Studio optimization job for id: '{}' - workspaceName is null, marking as CANCELLED",
optimization.id());
cancelOptimization(optimization.id(), workspaceId);
return;
}

Copilot AI Nov 12, 2025

The null check for workspaceName is checking the wrong thing. Looking at line 147, workspaceName is extracted with ctx.get(RequestContext.WORKSPACE_NAME) which will throw a NoSuchElementException if the key doesn't exist, rather than returning null. The check should use ctx.getOrDefault(RequestContext.WORKSPACE_NAME, null) at line 147 to make this null check meaningful.

Member

Please double-check this, as it might be right: unless you address the comment, this code will never handle that situation.

Feel free to also use StringUtils.isBlank on the workspace name for a stricter check. That way, you could return the empty string when it's absent and avoid NPEs in any later operations on the workspace name variable.
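A minimal sketch of that suggestion, assuming Reactor's ContextView semantics: get(key) throws NoSuchElementException for a missing key, while getOrDefault(key, default) returns the default. To stay dependency-free, a plain Map stands in for the context, the key string is an assumption, and isBlank mirrors Apache Commons Lang's StringUtils.isBlank rather than calling it.

```java
import java.util.Map;

// Hypothetical sketch: a Map stands in for Reactor's ContextView, and the
// key string is assumed. ContextView.get(key) would throw
// NoSuchElementException for a missing key; getOrDefault(key, "") does not.
final class WorkspaceNameCheck {

    static final String WORKSPACE_NAME = "workspaceName"; // assumed key

    /** Reads the workspace name, defaulting to "" when the key is absent. */
    static String workspaceNameOrEmpty(Map<String, Object> ctx) {
        Object value = ctx.getOrDefault(WORKSPACE_NAME, "");
        return value instanceof String ? (String) value : "";
    }

    /** Mirrors StringUtils.isBlank: true for null, empty or whitespace-only. */
    static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }
}
```

In the service, a blank result would then trigger the early failure and cancellation, instead of a null check that can never fire because get(...) throws first.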

Comment on lines +570 to +577
String studioConfigJson = "";
if (optimization.studioConfig() != null) {
try {
studioConfigJson = JsonUtils.writeValueAsString(optimization.studioConfig());
} catch (Exception e) {
log.error("Failed to serialize studio_config for optimization: '{}'", optimization.id(), e);
}
}

Copilot AI Nov 12, 2025

The serialization error is logged but the empty string is still used, which could cause data integrity issues. If serialization fails, the method should either throw an exception to prevent creating an invalid optimization record, or handle the failure more gracefully. Silently continuing with an empty string may lead to inconsistent state where an optimization is created but its studio config is lost.

Member

I believe I agree with this comment.

Since you're already filtering out null, that should be enough here. The request object has passed all validations, so if converting studioConfig to JSON still fails, it should simply trigger a 500 error.

If I were going to catch anything, I'd catch only UncheckedIOException, but I'd rather remove the try-catch altogether here.
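A sketch of the suggested shape, assuming (as the comment implies) that JsonUtils.writeValueAsString throws an unchecked exception such as UncheckedIOException on failure. A Function parameter stands in for that call so the example is self-contained; StudioConfigColumn and toColumnValue are hypothetical names. Returning the empty string for a missing config matches the non-nullable String column recommended for the migration.

```java
import java.util.function.Function;

// Hypothetical sketch: `writer` stands in for JsonUtils.writeValueAsString,
// assumed to throw an unchecked exception when serialization fails.
final class StudioConfigColumn {

    /** Value stored in the non-nullable String column when there is no config. */
    static final String EMPTY = "";

    static <T> String toColumnValue(T studioConfig, Function<T, String> writer) {
        if (studioConfig == null) {
            return EMPTY;
        }
        // No try-catch: a serialization failure after request validation is a
        // server bug, so let it propagate and surface as a 500.
        return writer.apply(studioConfig);
    }
}
```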

Comment on lines +612 to +621
OptimizationStudioConfig studioConfig = null;
String studioConfigJson = row.get("studio_config", String.class);
if (studioConfigJson != null && !studioConfigJson.isBlank()) {
try {
studioConfig = JsonUtils.readValue(studioConfigJson, OptimizationStudioConfig.class);
} catch (Exception e) {
log.error("Failed to deserialize studio_config for optimization: '{}'",
row.get("id", UUID.class), e);
}
}

Copilot AI Nov 12, 2025

Similar to the serialization issue, deserialization errors are logged but the null value is silently used. This could mask database corruption or schema mismatches. Consider whether this should throw an exception or at minimum, be more explicit about returning a partial/invalid optimization object.

Member

Same here. Create a helper method similar to getJsonNodeOrDefault, but using the appropriate JsonUtils method. You can perform all the null and blank checks as needed, but there's no need to catch: a failure should be a 500.
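A minimal sketch of such a helper, assuming the reader (in the DAO, something like JsonUtils.readValue(json, OptimizationStudioConfig.class)) throws an unchecked exception on malformed input. The Function parameter keeps the example self-contained; StudioConfigRows and readOrNull are hypothetical names, not the existing getJsonNodeOrDefault.

```java
import java.util.function.Function;

// Hypothetical helper in the spirit of getJsonNodeOrDefault; `reader` stands in
// for a JsonUtils read call and is assumed to throw unchecked on bad JSON.
final class StudioConfigRows {

    static <T> T readOrNull(String json, Function<String, T> reader) {
        // Null and blank both mean "no studio config" (the column defaults to '').
        if (json == null || json.isBlank()) {
            return null;
        }
        // No catch: corrupt or incompatible JSON fails loudly (500) instead of
        // silently returning an optimization without its config.
        return reader.apply(json);
    }
}
```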

@ldaugusto changed the title [OPIK-2768] [BE] Add Optimization Studio API endpoints and validation to [OPIK-2768] [WIP] [BE] Add Optimization Studio API endpoints and validation on Nov 12, 2025
@github-actions
Contributor

Backend Tests Results

5,443 tests: 5,436 passed ✅, 7 skipped 💤, 0 failed ❌
287 suites, 287 files, run time 50m 19s ⏱️

Results for commit 6d1c039.

@andrescrz marked this pull request as draft on November 13, 2025 09:32
@andrescrz marked this pull request as ready for review on November 17, 2025 09:53
Member

@andrescrz left a comment

While I left many comments, this implementation looks very clear and well done!

Let's focus on the important stuff. The comments that must be addressed now are the ones related to the new ClickHouse migration, as migrations can't be changed after they're pushed:

  1. The Nullable string field.
  2. Adding the rollback for this migration.
  3. Updating the migration creator name.

Other important comments, that should probably be addressed in a follow up PR:

  1. Handling unknown fields in the request objects.
  2. Limit size of lists in the request objects.
  3. The path for the logs endpoint.
  4. Allowing 500 to bubble up.
  5. Not scrubbing fields.
  6. Potential missing workspace bug.
  7. Using the bounded elastic scheduler for the API call (pre-sign).

The rest is minor and optional.

* This represents the full payload sent from the frontend to create a Studio optimization.
*/
@Builder(toBuilder = true)
@JsonNaming(PropertyNamingStrategies.SnakeCaseStrategy.class)
Member

Please add @JsonIgnoreProperties(ignoreUnknown = true) to all these request objects.

It will protect the endpoint from callers sending unknown data and also make it more resilient to future changes.

@Builder(toBuilder = true)
@JsonNaming(PropertyNamingStrategies.SnakeCaseStrategy.class)
public record StudioPrompt(
@NotEmpty @Valid List<StudioMessage> messages) {
Member

You probably want to limit the number of messages ingested here. I don't have all the context, but we typically allow up to a maximum of 1K.

For this particular case, a smaller limit might be more than enough.

@Builder(toBuilder = true)
@JsonNaming(PropertyNamingStrategies.SnakeCaseStrategy.class)
public record StudioEvaluation(
@NotEmpty @Valid List<StudioMetric> metrics) {
Member

Same about limiting the list.

In addition, I recommend validating the list elements themselves: List<@NotNull @Valid StudioMetric> metrics.

Same for the other lists.
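With Bean Validation, the suggestion would read roughly as @NotEmpty @Size(max = N) List<@NotNull @Valid StudioMetric> metrics, with N still to be chosen. The following dependency-free sketch enforces the same three constraints by hand in a record's compact constructor, purely to illustrate them; the StudioMetric shape and MAX_METRICS = 100 are hypothetical. Unlike Bean Validation, which collects violations and reports them at the resource boundary, this fails fast on the first problem.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: hand-written equivalent of
// @NotEmpty @Size(max = MAX_METRICS) List<@NotNull @Valid StudioMetric>.
record StudioMetric(String type) {
    StudioMetric {
        Objects.requireNonNull(type, "type");
    }
}

record StudioEvaluation(List<StudioMetric> metrics) {

    static final int MAX_METRICS = 100; // hypothetical cap; pick per use case

    StudioEvaluation {
        if (metrics == null || metrics.isEmpty()) {
            throw new IllegalArgumentException("metrics must not be empty");
        }
        if (metrics.size() > MAX_METRICS) {
            throw new IllegalArgumentException("at most " + MAX_METRICS + " metrics allowed");
        }
        // Element-level check, the equivalent of List<@NotNull ...>.
        if (metrics.stream().anyMatch(Objects::isNull)) {
            throw new IllegalArgumentException("metrics must not contain null elements");
        }
        metrics = List.copyOf(metrics); // defensive, immutable copy
    }
}
```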


@Builder(toBuilder = true)
@JsonNaming(PropertyNamingStrategies.SnakeCaseStrategy.class)
@JsonInclude(JsonInclude.Include.NON_NULL)
Member

Nit: we configured our JSON object mappers to apply this setting globally, so there's no need to repeat it here. It's redundant; you can clean it up.

@JsonInclude(JsonInclude.Include.NON_NULL)
public record OptimizationStudioLog(
String url,
Instant lastModified,
Member

Minor: our general convention for this type of fields is lastUpdatedAt.

Comment on lines +22 to +24
@NonNull UUID optimizationId,
@NonNull String workspaceName,
@NonNull OptimizationStudioConfig config,
Member

Nit: for JSON objects, it's better to use the @NotNull annotation.

Comment on lines +86 to +89
@Override
public long getPresignedUrlExpirationSeconds() {
return s3Config.getPreSignUrlTimeoutSec();
}
Member

Minor: this method just returns a field from an injected configuration object, which is an indirect way of exposing it. There are better options:

  1. Just inject S3Config into OptimizationService. It's just a config object, not leaking anything from a different scope.
  2. Move the majority of the logic of generateStudioLogsResponse to this service and just delegate the call from OptimizationService.
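Option 1 could look roughly like this. S3Config is modeled here as a record whose preSignUrlTimeoutSec() accessor stands in for the real getPreSignUrlTimeoutSec(), and StudioLogsLinks is a hypothetical name for wherever the logs response is built; in the application the config would be injected by the DI container rather than constructed by hand.

```java
import java.time.Duration;

// Hypothetical sketch of option 1: the service depends on the S3 config
// directly instead of reading it through a PreSignerService getter.
record S3Config(long preSignUrlTimeoutSec) {}

final class StudioLogsLinks {

    private final S3Config s3Config;

    StudioLogsLinks(S3Config s3Config) { // injected in the real service
        this.s3Config = s3Config;
    }

    /** How long a presigned log URL stays valid. */
    Duration presignedUrlTtl() {
        return Duration.ofSeconds(s3Config.preSignUrlTimeoutSec());
    }
}
```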

@@ -0,0 +1,7 @@
--liquibase formatted sql
--changeset admin:000046_add_studio_config_to_optimizations
Member

Minor: go with your name instead of admin for migrations, as the general convention.

Member

Please add a rollback that drops the column if it exists.

--comment: Add studio_config column to optimizations table for Optimization Studio feature

ALTER TABLE ${ANALYTICS_DB_DATABASE_NAME}.optimizations ON CLUSTER '{cluster}'
ADD COLUMN IF NOT EXISTS studio_config Nullable(String);
Member

This is important: ClickHouse performs worse with nullable fields. Better define it with String and explicitly default to the empty string ''. (it would implicitly default anyway). That's our general convention.
