-
Notifications
You must be signed in to change notification settings - Fork 1.2k
[OPIK-2768] [WIP] [BE] Add Optimization Studio API endpoints and validation #4050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add studio_config column to optimizations table for Studio configuration - Integrate Studio endpoints into existing OptimizationsResource (GET with flags, POST, logs endpoint) - Add OptimizationStudioConfig with nested records for type-safe configuration - Implement Redis RQ job enqueueing for Studio optimizations - Add error handling: cancel optimization if Redis enqueue fails - Add presigned S3 URL support for optimization logs - Add comprehensive integration tests for Studio optimization flow - Add OptimizationStudioJobMessage for type-safe Redis job payloads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements the complete backend infrastructure for the Optimization Studio feature, enabling creation, retrieval, and management of Studio optimization runs through REST API endpoints. The implementation includes comprehensive model validation, Redis RQ job enqueueing for Python backend communication, S3 presigned URL generation for log downloads, and automatic optimization cancellation on job enqueueing failures.
Key changes:
- Added
OptimizationStudioConfigmodel with nested validation for prompt configuration, LLM models, evaluation metrics, and optimizer settings - Extended
/v1/private/optimizationsendpoints with Studio-specific functionality including logs retrieval and optional config inclusion - Implemented Redis RQ job enqueueing to
OPTIMIZER_CLOUDqueue with automatic cancellation on failure
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| OptimizationsResourceTest.java | Added comprehensive integration tests for Studio optimization creation, Redis job enqueueing verification, and filtering |
| OptimizationResourceClient.java | Extended test client with methods for Studio config inclusion and logs retrieval endpoints |
| 000046_add_studio_config_to_optimizations.sql | Added database migration for studio_config JSON column in optimizations table |
| PreSignerService.java | Added getter method for presigned URL expiration timeout configuration |
| OptimizationStudioJobMessage.java | Created message model for Redis RQ job communication with Python backend |
| OptimizationService.java | Implemented Studio optimization job enqueueing, config scrubbing logic, and logs URL generation |
| OptimizationSearchCriteria.java | Added studioOnly filter parameter for optimization searches |
| OptimizationDAO.java | Implemented JSON serialization/deserialization for studio configurations with database operations |
| OptimizationsResource.java | Added REST endpoints for Studio logs and optional config inclusion parameter |
| OptimizationStudioLog.java | Created response model for presigned S3 log URLs |
| OptimizationStudioConfig.java | Defined comprehensive validation model for Studio optimization configurations |
| Optimization.java | Added studioConfig field with @Valid annotation to optimization model |
|
|
||
| // Should only return Studio optimizations | ||
| assertThat(page.content()).isNotEmpty(); | ||
| assertThat(page.content()).allMatch(opt -> opt.studioConfig() != null); |
Copilot
AI
Nov 12, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion is incorrect. The test verifies studioOnly=true filtering, but earlier in the code (line 748) we see that studioConfig is scrubbed by default. According to line 116-120 in OptimizationService.java, studioConfig is only preserved when studioOnly=true. However, the find operation returns a page without explicitly requesting to include studio configs, so this assertion will fail because studioConfig will be null even for Studio optimizations unless explicitly requested. The test should either verify a different property or request inclusion of studio configs.
| if (workspaceName == null) { | ||
| log.error( | ||
| "Cannot enqueue Studio optimization job for id: '{}' - workspaceName is null, marking as CANCELLED", | ||
| optimization.id()); | ||
| cancelOptimization(optimization.id(), workspaceId); | ||
| return; | ||
| } |
Copilot
AI
Nov 12, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The null check for workspaceName is checking the wrong thing. Looking at line 147, workspaceName is extracted with ctx.get(RequestContext.WORKSPACE_NAME) which will throw a NoSuchElementException if the key doesn't exist, rather than returning null. The check should use ctx.getOrDefault(RequestContext.WORKSPACE_NAME, null) at line 147 to make this null check meaningful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please double check this, it might be right, you this code will never handle the situation unless you address the comment.
Feel free to also use StringUtils.isBlank on ws name, for a more strict check. That would you could return the empty string if absent, so you avoid NPE on potential operations on this ws name variable.
| String studioConfigJson = ""; | ||
| if (optimization.studioConfig() != null) { | ||
| try { | ||
| studioConfigJson = JsonUtils.writeValueAsString(optimization.studioConfig()); | ||
| } catch (Exception e) { | ||
| log.error("Failed to serialize studio_config for optimization: '{}'", optimization.id(), e); | ||
| } | ||
| } |
Copilot
AI
Nov 12, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The serialization error is logged but the empty string is still used, which could cause data integrity issues. If serialization fails, the method should either throw an exception to prevent creating an invalid optimization record, or handle the failure more gracefully. Silently continuing with an empty string may lead to inconsistent state where an optimization is created but its studio config is lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I agree with this comment.
As you're filtering non-null, that should be enough here. If converting the studioConfig to JSON fails, knowing the request object has all validations, this should simply trigger a 500 exception.
If I was going to catch, I'd go with UncheckedIOException only, but better I'd remove the try-catch all together here.
| OptimizationStudioConfig studioConfig = null; | ||
| String studioConfigJson = row.get("studio_config", String.class); | ||
| if (studioConfigJson != null && !studioConfigJson.isBlank()) { | ||
| try { | ||
| studioConfig = JsonUtils.readValue(studioConfigJson, OptimizationStudioConfig.class); | ||
| } catch (Exception e) { | ||
| log.error("Failed to deserialize studio_config for optimization: '{}'", | ||
| row.get("id", UUID.class), e); | ||
| } | ||
| } |
Copilot
AI
Nov 12, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the serialization issue, deserialization errors are logged but the null value is silently used. This could mask database corruption or schema mismatches. Consider whether this should throw an exception or at minimum, be more explicit about returning a partial/invalid optimization object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Create an aux method such as getJsonNodeOrDefault, but with the appropriate JsonUtils method. You can perform all null and blank changes as needed, but no need to catch. It should be a 500.
Backend Tests Results5 443 tests 5 436 ✅ 50m 19s ⏱️ Results for commit 6d1c039. |
andrescrz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I left many comments, this implementation looks very clear and well done!
Let's focus on the important stuff. The comments that need to be addressed are the ones related to the new migration in ClickHouse, as they can't be changed after pushing:
- The Nullable string field.
- Adding the rollback for this migration.
- Updating the migration creator name.
Other important comments, that should probably be addressed in a follow up PR:
- Handling unknown fields in the request objects.
- Limit size of lists in the request objects.
- The path for the logs endpoint.
- Allowing 500 to bubble up.
- Not scrubbing fields.
- Potential missing workspace bug.
- Adding bounded elastic for the API call (pre-sign).
The rest if minor and optional.
| * This represents the full payload sent from the frontend to create a Studio optimization. | ||
| */ | ||
| @Builder(toBuilder = true) | ||
| @JsonNaming(PropertyNamingStrategies.SnakeCaseStrategy.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add @JsonIgnoreProperties(ignoreUnknown = true) to all these request objects.
It will protect the endpoint from callers sending unknown data and also make it more endurable to future changes.
| @Builder(toBuilder = true) | ||
| @JsonNaming(PropertyNamingStrategies.SnakeCaseStrategy.class) | ||
| public record StudioPrompt( | ||
| @NotEmpty @Valid List<StudioMessage> messages) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want to limit the amount of messages to ingest here. I don't have all the context here, but typically we go up to a max of 1K.
For this particular case, a smaller limit might be more than enough.
| @Builder(toBuilder = true) | ||
| @JsonNaming(PropertyNamingStrategies.SnakeCaseStrategy.class) | ||
| public record StudioEvaluation( | ||
| @NotEmpty @Valid List<StudioMetric> metrics) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same about limiting the list.
In addition, I recommend adding inner validation to the objects List<@NotNull @Valid StudioMetric> metrics.
Same for the other.
|
|
||
| @Builder(toBuilder = true) | ||
| @JsonNaming(PropertyNamingStrategies.SnakeCaseStrategy.class) | ||
| @JsonInclude(JsonInclude.Include.NON_NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we configured our JSON loggers to add this setting globally, so no need to have it here again. It's redundant, you can clean it up.
| @JsonInclude(JsonInclude.Include.NON_NULL) | ||
| public record OptimizationStudioLog( | ||
| String url, | ||
| Instant lastModified, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: our general convention for this type of fields is lastUpdatedAt.
| @NonNull UUID optimizationId, | ||
| @NonNull String workspaceName, | ||
| @NonNull OptimizationStudioConfig config, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: for JSON objects, better use NotNull annotation.
| @Override | ||
| public long getPresignedUrlExpirationSeconds() { | ||
| return s3Config.getPreSignUrlTimeoutSec(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: this method just returns a field from an injected configuration service. It's an indirect path of returning it. There are better options:
- Just inject
S3ConfigintoOptimizationService. It's just a config object, not leaking anything from a different scope. - Move the majority of the logic of
generateStudioLogsResponseto this service and just delegate the call fromOptimizationService.
| @@ -0,0 +1,7 @@ | |||
| --liquibase formatted sql | |||
| --changeset admin:000046_add_studio_config_to_optimizations | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: go with your name instead of admin for migrations, as the general convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add rollback to drop the column if exists.
| --comment: Add studio_config column to optimizations table for Optimization Studio feature | ||
|
|
||
| ALTER TABLE ${ANALYTICS_DB_DATABASE_NAME}.optimizations ON CLUSTER '{cluster}' | ||
| ADD COLUMN IF NOT EXISTS studio_config Nullable(String); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is important: ClickHouse performs worse with nullable fields. Better define it with String and explicitly default to the empty string ''. (it would implicitly default anyway). That's our general convention.
Details
This PR adds the complete backend infrastructure for the Optimization Studio feature, enabling creation, retrieval, and management of Studio optimization runs through REST API endpoints.
Backend API Changes
OptimizationStudioConfigmodel with comprehensive nested validation for:OptimizationStudioLogmodel for presigned S3 log URL responses/v1/private/optimizationsendpoints with:GET /studio/{id}/logs- Retrieve presigned S3 URLs for optimization logsinclude_studio_configquery parameter on GET/{id}- Optionally include full studio configstudio_onlyquery parameter on GET/- Filter for Studio optimizations onlyService Layer Implementation
OPTIMIZER_CLOUDqueueOptimizationStudioJobMessagefor Python backend communicationworkspaceNameinstead ofworkspaceIdfor Python SDK compatibilityData Layer Updates
studio_configJSON column via Liquibase migrationstudio_onlyfilterValidation & Error Handling
@Validannotation onstudioConfigfield for nested validationChange checklist
Issues
Testing
include_studio_configquery parameter behaviorstudio_onlyfiltering functionalityfindOptimizations__withStudioOnlyFlagby using unique workspaceDocumentation
N/A - Backend API implementation, frontend documentation will follow