Support External Postgres DB #276

Merged (82 commits, merged Jul 23, 2025)

Commits:
d9498ef ai generated tests for the evaluators functions (jkwatson, Jul 15, 2025)
9677012 don't try to look up node ids in empty vector stores (jkwatson, Jul 16, 2025)
49bbdb1 move suggested questions under the sessions route (jkwatson, Jul 16, 2025)
19f0f43 fix things up for postgres db access (jkwatson, Jul 16, 2025)
b0c6f3c formatting (jkwatson, Jul 16, 2025)
50c1f3f fixes for not being able to create new dbs (jkwatson, Jul 16, 2025)
b343124 only set the DB_URL if it isn't already set (jkwatson, Jul 17, 2025)
77e14e1 fix the install directory (jkwatson, Jul 17, 2025)
0d83634 change location of .nvm and source bash from install_node (ewilliams-cloudera, Jul 17, 2025)
cd202de Update release version to dev-testing (actions-user, Jul 17, 2025)
3b91002 removed unused import (ewilliams-cloudera, Jul 17, 2025)
159b37d add logging for initializing the JDBI instance (jkwatson, Jul 17, 2025)
1150fbc Update release version to dev-testing (actions-user, Jul 17, 2025)
51b6337 wip on ui for metadata (ewilliams-cloudera, Jul 17, 2025)
af8d53f wip (jkwatson, Jul 17, 2025)
3ebf3c5 update FE types to match python land (jkwatson, Jul 17, 2025)
d10843b fix margin bottom consistency (ewilliams-cloudera, Jul 17, 2025)
c3f2e2e Update release version to dev-testing (actions-user, Jul 17, 2025)
497c80b set the username/password for the database if set from env (jkwatson, Jul 17, 2025)
0c9b0a2 drop databases (mliu-cloudera, Jul 17, 2025)
c3de72a Update release version to dev-testing (actions-user, Jul 17, 2025)
3f7ac6e limit number of retries (ewilliams-cloudera, Jul 17, 2025)
7d4ebfd Update release version to dev-testing (actions-user, Jul 17, 2025)
168e79d bumped bedrock converse and fixed a bug in tool calling check (baasitsharief, Jul 17, 2025)
b7f61af remove unused (ewilliams-cloudera, Jul 17, 2025)
1239b59 Update release version to dev-testing (actions-user, Jul 17, 2025)
40b80c7 minor error handling improvement (ewilliams-cloudera, Jul 17, 2025)
0420092 fixed bug with Empty Response with no documents in data source and to… (baasitsharief, Jul 17, 2025)
4b27d11 fix mypy issues (baasitsharief, Jul 17, 2025)
9fe4cd2 add a main method to test if a db connection string is valid (jkwatson, Jul 18, 2025)
b1a5b48 Update release version to dev-testing (actions-user, Jul 18, 2025)
80c55b6 add python endpoint to test a jdbc connection string (jkwatson, Jul 18, 2025)
29f635a export the install dir so it can be used by the fastapi process (jkwatson, Jul 18, 2025)
7f96771 make sure to use the right java (jkwatson, Jul 18, 2025)
336fc40 pass in the db type so we can do a bare server connection (jkwatson, Jul 18, 2025)
eb50cea Update release version to dev-testing (actions-user, Jul 18, 2025)
41da0cf better error handling for api proxy (ewilliams-cloudera, Jul 18, 2025)
019e04f pass through error on non-502s, use 502 error instaed of 503, dont re… (ewilliams-cloudera, Jul 18, 2025)
809020b Update release version to dev-testing (actions-user, Jul 18, 2025)
4d0b691 more config details (ewilliams-cloudera, Jul 18, 2025)
411e398 wip settings page for external metadata db (baasitsharief, Jul 18, 2025)
2069e16 fix (baasitsharief, Jul 18, 2025)
8b7bf34 drop databases (mliu-cloudera, Jul 18, 2025)
ffc56c7 wip on formatting warnings (jkwatson, Jul 18, 2025)
3edbb91 wip (ewilliams-cloudera, Jul 18, 2025)
fc809f7 wip test connection (baasitsharief, Jul 18, 2025)
4aefce8 update form items (ewilliams-cloudera, Jul 18, 2025)
a3e832c fix connection test (ewilliams-cloudera, Jul 20, 2025)
2d2b39b use formValues (ewilliams-cloudera, Jul 20, 2025)
c0c1ab4 refactor: make username and password required for JDBC connection (baasitsharief, Jul 21, 2025)
56199c7 improve handling for testing connection (ewilliams-cloudera, Jul 22, 2025)
8875d10 conditionally render test button (ewilliams-cloudera, Jul 22, 2025)
d8ec481 fix mypy issues (ewilliams-cloudera, Jul 22, 2025)
8487d7b disable test button if no password or username (ewilliams-cloudera, Jul 22, 2025)
d0ea0cf Update ui/src/pages/Settings/MetadataDBFields.tsx (ewilliams-cloudera, Jul 22, 2025)
5be6d10 Update ui/src/pages/Settings/MetadataDBFields.tsx (ewilliams-cloudera, Jul 22, 2025)
24880bd Update release version to dev-testing (actions-user, Jul 22, 2025)
b95de5d handle clearing values for external db when switching to h2 (ewilliams-cloudera, Jul 22, 2025)
fdc3df1 clear field values in ui when using h2 (ewilliams-cloudera, Jul 22, 2025)
d718e97 Update release version to dev-testing (actions-user, Jul 22, 2025)
43c4b8d refactor environment variable handling for H2 database configuration (ewilliams-cloudera, Jul 22, 2025)
39ad030 Update release version to dev-testing (actions-user, Jul 22, 2025)
5953591 refactor: update H2 database URL to use absolute path (baasitsharief, Jul 22, 2025)
5dc9c57 refactor: change metadata_db_provider comparison to string literal fo… (baasitsharief, Jul 22, 2025)
9e008a0 refactor: fix comparison operator for metadata_db_provider in H2 check (baasitsharief, Jul 22, 2025)
7bb9391 Update release version to dev-testing (actions-user, Jul 22, 2025)
e91cef1 refactor: remove DB_URL, DB_USERNAME, and DB_PASSWORD from environmen… (baasitsharief, Jul 22, 2025)
016f5df Update release version to dev-testing (actions-user, Jul 22, 2025)
2e71996 refactor: update config_to_env to use Optional for environment variab… (baasitsharief, Jul 22, 2025)
cb8c84d refactor: change config_to_env to return non-optional environment var… (baasitsharief, Jul 22, 2025)
b7277e4 Update release version to dev-testing (actions-user, Jul 22, 2025)
c9088a1 refactor: update DB_URL retrieval to use a fallback value for H2 conf… (baasitsharief, Jul 22, 2025)
e422d60 refactor: streamline JDBC configuration for H2 by using a default DB_… (baasitsharief, Jul 22, 2025)
6fc709b refactor: improve validation message handling and remove messageQueue… (ewilliams-cloudera, Jul 22, 2025)
a65a696 Update release version to dev-testing (actions-user, Jul 22, 2025)
5c1a97b refactor: enhance input validation for JDBC URL, username, and passwo… (baasitsharief, Jul 22, 2025)
3d32576 Vite dev changes to maybe address import error in dev, switch to usin… (ewilliams-cloudera, Jul 23, 2025)
6f52f0c Update release version to dev-testing (actions-user, Jul 23, 2025)
4057375 test config change (ewilliams-cloudera, Jul 23, 2025)
cb83f97 Update release version to dev-testing (actions-user, Jul 23, 2025)
8877267 title change (ewilliams-cloudera, Jul 23, 2025)
0ccfb93 remove restriction on username (ewilliams-cloudera, Jul 23, 2025)
@@ -66,6 +66,7 @@ private static Jdbi createJdbi() {
if (jdbi == null) {
synchronized (LOCK) {
if (jdbi == null) {
+log.info("Initializing new Jdbi instance");
jdbi = Jdbi.create(createDataSource());
}
}
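The hunk above only adds a log line, but the line sits inside a double-checked locking block: `jdbi` is tested once without the lock and again inside `synchronized (LOCK)`, so concurrent first callers end up sharing a single instance. The following is a minimal Python sketch of that pattern for illustration, not code from this project; `LazySingleton` and `make_resource` are hypothetical names. In Java the idiom is only safe when the field is declared `volatile`, and the field declaration is not visible in this hunk.

```python
import logging
import threading
from typing import Callable, Generic, Optional, TypeVar, cast

log = logging.getLogger(__name__)
T = TypeVar("T")


class LazySingleton(Generic[T]):
    """Hypothetical helper: builds a value on first use, at most once."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._instance: Optional[T] = None

    def get(self) -> T:
        # First check without the lock: the cheap path once initialized.
        if self._instance is None:
            with self._lock:
                # Second check under the lock: another thread may have won the race.
                if self._instance is None:
                    log.info("Initializing new instance")
                    self._instance = self._factory()
        return cast(T, self._instance)


if __name__ == "__main__":
    calls = []

    def make_resource() -> object:
        calls.append(1)
        return object()

    singleton = LazySingleton(make_resource)
    workers = [threading.Thread(target=singleton.get) for _ in range(8)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(len(calls))  # 1: the factory ran exactly once
```

The outer check keeps the lock off the hot path after initialization; the inner check is what guarantees the factory runs only once.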
@@ -92,10 +93,19 @@ private static Migrator migrator(DataSource dataSource, RdbConfig dbConfig) {
private static DatabaseConfig createDatabaseConfig() {
String dbUrl = System.getenv().getOrDefault("DB_URL", "jdbc:h2:mem:rag");
String rdbType = System.getenv().getOrDefault("DB_TYPE", RdbConfig.H2_DB_TYPE);
+String password = System.getenv().get("DB_PASSWORD");
+String username = System.getenv().get("DB_USERNAME");
RdbConfig rdbConfiguration =
-RdbConfig.builder().rdbUrl(dbUrl).rdbType(rdbType).rdbDatabaseName("rag").build();
+RdbConfig.builder()
+.rdbUrl(dbUrl)
+.rdbType(rdbType)
+.rdbDatabaseName("rag")
+.rdbUsername(username)
+.rdbPassword(password)
+.build();
if (rdbConfiguration.isPostgres()) {
-rdbConfiguration = rdbConfiguration.toBuilder().rdbUsername("postgres").build();
+rdbConfiguration =
+rdbConfiguration.toBuilder().rdbUsername("postgres").rdbDatabaseName(null).build();
}
return DatabaseConfig.builder().RdbConfiguration(rdbConfiguration).build();
}
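`createDatabaseConfig` now reads credentials from the environment alongside `DB_URL` and `DB_TYPE`. Below is a Python sketch of the same resolution order, for illustration only. `RdbSettings` and `settings_from_env` are hypothetical, and two details are assumptions: that `RdbConfig.H2_DB_TYPE` is the string `"H2"` (as in `MetadataDbProviderType`) and that `isPostgres()` matches `"PostgreSQL"`. As written in this hunk, the PostgreSQL branch replaces any `DB_USERNAME` value with `postgres` and clears the database name, presumably so that the name is later derived from `DB_URL`.

```python
from dataclasses import dataclass, replace
from typing import Mapping, Optional

# Assumptions: RdbConfig.H2_DB_TYPE is "H2", and isPostgres() matches "PostgreSQL".
H2_DB_TYPE = "H2"
POSTGRES_DB_TYPE = "PostgreSQL"


@dataclass(frozen=True)
class RdbSettings:
    """Hypothetical stand-in for the RdbConfig that the Java builder produces."""

    url: str
    db_type: str
    database_name: Optional[str]
    username: Optional[str]
    password: Optional[str]


def settings_from_env(env: Mapping[str, str]) -> RdbSettings:
    settings = RdbSettings(
        url=env.get("DB_URL", "jdbc:h2:mem:rag"),
        db_type=env.get("DB_TYPE", H2_DB_TYPE),
        database_name="rag",
        # None when unset, like System.getenv().get(...) returning null.
        username=env.get("DB_USERNAME"),
        password=env.get("DB_PASSWORD"),
    )
    if settings.db_type == POSTGRES_DB_TYPE:
        # Mirrors the hunk: the username is forced to "postgres" and the
        # database name is cleared.
        settings = replace(settings, username="postgres", database_name=None)
    return settings


if __name__ == "__main__":
    import os

    print(settings_from_env(os.environ))
```

Taking the environment as a plain mapping keeps the sketch testable without touching the real process environment.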

backend/src/main/java/com/cloudera/cai/util/db/JdbiUtils.java: 44 changes (43 additions, 1 deletion)
@@ -1,4 +1,4 @@
-/*******************************************************************************
+/*
* CLOUDERA APPLIED MACHINE LEARNING PROTOTYPE (AMP)
* (C) Cloudera, Inc. 2024
* All rights reserved.
@@ -97,4 +97,46 @@ public static void createDBIfNotExists(RdbConfig rdbConfig) throws SQLException
}
}
}

/**
* Tests database connectivity using JDBC.
*
* <p>Run this with: java -cp prebuilt_artifacts/rag-api.jar
* -Dloader.main=com.cloudera.cai.util.db.JdbiUtils
* org.springframework.boot.loader.launch.PropertiesLauncher <jdbc_url> <username> <password>
* <db_type>
*
* <p>An exit code of 0 indicates success, 1 indicates failure, and 2 indicates incorrect usage.
*/
public static void main(String[] args) {
if (args.length != 4) {
System.err.println("Usage: JdbiUtils <db_url> <username> <password> <db_type>");
System.exit(2); // Incorrect usage
}
String dbUrl = args[0];
String username = args[1];
String password = args[2];
String dbType = args[3];
RdbConfig rdbConfiguration =
RdbConfig.builder()
.rdbUrl(dbUrl)
.rdbType(dbType)
.rdbDatabaseName("rag")
.rdbUsername(username)
.rdbPassword(password)
.build();
var connectionString = RdbConfig.buildDatabaseServerConnectionString(rdbConfiguration);
try (Connection connection =
DriverManager.getConnection(connectionString, username, password)) {
if (connection != null && !connection.isClosed()) {
System.out.println("Connection successful.");
System.exit(0); // Success
} else {
System.err.println("Connection failed: Connection is null or closed.");
System.exit(1); // Failure
}
} catch (Exception e) {
System.err.println("Connection failed: " + e.getMessage());
System.exit(1); // Failure
}
}
}
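`main` reports its result only through the exit code and stdout/stderr, so a caller such as the Python `validate_jdbc` helper has to run it as a subprocess and map the code back to a result. The source of `validate_jdbc` is not part of this view, so the following is a hypothetical sketch. `ValidationOutcome`, `build_command`, `interpret`, and `check_jdbc_connection` are illustrative names; the jar path and launcher class come from the javadoc; and having `java` on `PATH` is an assumption. `main` expects four positional arguments: URL, username, password, and db type.

```python
import subprocess
from dataclasses import dataclass
from typing import List

# Taken from the javadoc; the jar location may differ per install.
JAR_PATH = "prebuilt_artifacts/rag-api.jar"
MAIN_CLASS = "com.cloudera.cai.util.db.JdbiUtils"
LAUNCHER = "org.springframework.boot.loader.launch.PropertiesLauncher"


@dataclass(frozen=True)
class ValidationOutcome:
    """Hypothetical result type; the PR's own ValidationResult is not shown here."""

    valid: bool
    message: str


def build_command(java: str, db_url: str, username: str, password: str, db_type: str) -> List[str]:
    # JVM options come before the launcher class; main() then gets exactly four args.
    return [
        java, "-cp", JAR_PATH, f"-Dloader.main={MAIN_CLASS}", LAUNCHER,
        db_url, username, password, db_type,
    ]


def interpret(returncode: int, stdout: str, stderr: str) -> ValidationOutcome:
    # Exit codes per the javadoc: 0 success, 1 failure, 2 incorrect usage.
    if returncode == 0:
        return ValidationOutcome(True, stdout.strip() or "Connection successful.")
    if returncode == 2:
        return ValidationOutcome(False, "Incorrect usage: " + stderr.strip())
    return ValidationOutcome(False, stderr.strip() or f"Connection failed (exit code {returncode}).")


def check_jdbc_connection(
    db_url: str, username: str, password: str, db_type: str,
    java: str = "java", timeout_s: float = 30.0,
) -> ValidationOutcome:
    try:
        proc = subprocess.run(
            build_command(java, db_url, username, password, db_type),
            capture_output=True, text=True, timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # OSError covers a missing java binary; a timeout covers a hung connection.
        return ValidationOutcome(False, f"Could not run the validator: {exc}")
    return interpret(proc.returncode, proc.stdout, proc.stderr)
```

Passing the password as a command-line argument makes it visible to other local users through the process list. That trade-off comes from `main`'s argument signature rather than from the wrapper.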

backend/src/main/java/com/cloudera/cai/util/db/RdbConfig.java: 35 changes (24 additions, 11 deletions)
@@ -123,7 +123,7 @@ public static String buildDatabaseConnectionString(RdbConfig rdb) {
return adjustMsSqlRdbUrl(rdb.rdbUrl) + ";databaseName=" + rdb.getRdbDatabaseName();
}
if (rdb.isPostgres()) {
-return rdb.rdbUrl + "/" + rdb.getRdbDatabaseName();
+return rdb.rdbUrl;
}

final var url =
@@ -153,7 +153,15 @@ public static String buildDatabaseServerConnectionString(RdbConfig rdb) {
}

if (rdb.isPostgres()) {

Review comment (Copilot AI, Jul 22, 2025):
The regex pattern for PostgreSQL URL parsing is complex and could benefit from a comment explaining its structure and what each group captures, for future maintainability.

Suggested change:
if (rdb.isPostgres()) {
// Regex pattern to parse PostgreSQL URLs:
// ^jdbc:postgresql:(//[^/]+/)?(\\w+)(.*)
// - (//[^/]+/)?: Matches the optional host and port part of the URL (e.g., //localhost:5432/).
// - (\\w+): Matches the database name (e.g., mydatabase).
// - (.*): Matches any additional parameters or options in the URL (e.g., ?ssl=true).

-return rdb.rdbUrl + "/" + rdb.getRdbDatabaseName();
+var pattern =
+Pattern.compile("^jdbc:postgresql:(//[^/]+/)?(\\w+)(.*)", Pattern.CASE_INSENSITIVE);
+var matcher = pattern.matcher(rdb.rdbUrl);
+if (!matcher.matches()) {
+throw new IllegalStateException("URL doesn't match the expected regex");
+}
+var firstPart = matcher.group(1);
+var lastPart = matcher.group(3);
+return "jdbc:postgresql:" + firstPart + rdb.getRdbDatabaseName() + lastPart;
}
final var url =
rdb.rdbUrl
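The new PostgreSQL branch rewrites only the database segment of the URL and keeps host, port, and query string intact. Below is a Python port of the same regex, for illustration only; `server_connection_string` is a hypothetical name. It reproduces the expectation in `RdbConfigTest`. A URL with no database path, such as the old docker-compose value `jdbc:postgresql://db:5432`, does not match the pattern at all. One difference is deliberate: when the optional host group does not participate, `Matcher.group(1)` returns `null`, and Java string concatenation would then insert the text `null`; the sketch substitutes an empty string instead.

```python
import re

# Same pattern as the Java code. re.ASCII keeps \w ASCII-only, like Java's default;
# fullmatch() mirrors Matcher.matches(), which must consume the whole URL.
_PG_URL = re.compile(r"^jdbc:postgresql:(//[^/]+/)?(\w+)(.*)", re.IGNORECASE | re.ASCII)


def server_connection_string(rdb_url: str, database_name: str) -> str:
    """Swap the database segment of a PostgreSQL JDBC URL, keeping host and query."""
    match = _PG_URL.fullmatch(rdb_url)
    if match is None:
        raise ValueError("URL doesn't match the expected regex")
    # group(1) is None when the URL has no //host:port/ part.
    host_part = match.group(1) or ""
    return "jdbc:postgresql:" + host_part + database_name + match.group(3)


if __name__ == "__main__":
    url = (
        "jdbc:postgresql://rag-dev-testing.cluster.us-west-2.rds.amazonaws.com:5432/rag"
        "?username=foo&password=bar"
    )
    print(server_connection_string(url, "postgres"))
```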
@@ -181,7 +189,11 @@ private static String adjustMsSqlRdbUrl(String rdbUrl) {
public static String buildDatabaseName(RdbConfig rdb) {
String dbName = rdb.getRdbDatabaseName();
if (rdb.getDbConnectionUrl() != null) {
-dbName = getDBNameFromDBConnectionURL(rdb);
+dbName = getDBNameFromDBConnectionURL(rdb, rdb.dbConnectionUrl);
}
+
+if (dbName == null) {
+dbName = getDBNameFromDBConnectionURL(rdb, rdb.rdbUrl);
+}

if (dbName.contains("-")) {
@@ -195,25 +207,26 @@ public static String buildDatabaseName(RdbConfig rdb) {
return dbName;
}

-private static String getDBNameFromDBConnectionURL(RdbConfig rdb) {
+private static String getDBNameFromDBConnectionURL(RdbConfig rdb, String url) {
String regex;
if (rdb.isMssql()) {
// Regex reference: https://regex101.com/r/yaU0DY/1
regex = ";databaseName=([^;]*)";
-} else {
+} else if (rdb.isMysql()) {
regex = "^jdbc:mysql:(?://[^/]+/)?(\\w+)";
+} else if (rdb.isPostgres()) {
+regex = "^jdbc:postgresql:(?://[^/]+/)?(\\w+)";
+} else {
+throw new IllegalStateException(
+"database url parsing not supported for db type: " + rdb.rdbType);
}
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
var dbName =
-pattern
-.matcher(rdb.dbConnectionUrl)
-.results()
-.map(mr -> mr.group(1))
-.collect(Collectors.joining());
+pattern.matcher(url).results().map(mr -> mr.group(1)).collect(Collectors.joining());

if (dbName.isEmpty()) {
throw new InvalidDbConfigException(
-rdb.dbConnectionUrl, "Database name not found in the database connection URL");
+url, "Database name not found in the database connection URL");
}
return dbName;
}
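After this change, `buildDatabaseName` parses `dbConnectionUrl` when one is set and falls back to parsing `rdbUrl` only when no name is known yet, and `getDBNameFromDBConnectionURL` now rejects unsupported database types instead of assuming MySQL. Below is a Python sketch of that lookup, for illustration only. The function names are hypothetical, and the db-type keys `"MSSQL"`, `"MySQL"`, and `"PostgreSQL"` are assumptions standing in for `isMssql()`, `isMysql()`, and `isPostgres()`. The hyphen handling that follows in `buildDatabaseName` lies outside the visible hunk and is not modeled.

```python
import re
from typing import Optional

# Patterns copied from getDBNameFromDBConnectionURL. The dictionary keys are
# hypothetical stand-ins for isMssql(), isMysql(), and isPostgres().
_DB_NAME_PATTERNS = {
    "MSSQL": r";databaseName=([^;]*)",
    "MySQL": r"^jdbc:mysql:(?://[^/]+/)?(\w+)",
    "PostgreSQL": r"^jdbc:postgresql:(?://[^/]+/)?(\w+)",
}


def db_name_from_url(db_type: str, url: str) -> str:
    pattern = _DB_NAME_PATTERNS.get(db_type)
    if pattern is None:
        raise ValueError(f"database url parsing not supported for db type: {db_type}")
    # Matcher.results() yields every find() hit; the Java code joins group(1) of each.
    flags = re.IGNORECASE | re.ASCII
    name = "".join(m.group(1) for m in re.finditer(pattern, url, flags))
    if not name:
        raise ValueError(f"Database name not found in the database connection URL: {url}")
    return name


def resolve_db_name(
    db_type: str,
    configured_name: Optional[str],
    connection_url: Optional[str],
    rdb_url: str,
) -> str:
    name = configured_name
    if connection_url is not None:
        name = db_name_from_url(db_type, connection_url)
    if name is None:
        # New fallback: derive the name from the main rdbUrl.
        name = db_name_from_url(db_type, rdb_url)
    return name
```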

@@ -38,6 +38,6 @@

BEGIN;

Review comment (Copilot AI, Jul 22, 2025):
The SQL syntax change from INTEGER DEFAULT 10 to SET DEFAULT 10 is correct for PostgreSQL ALTER COLUMN operations, but the migration should include validation to ensure the column type is already INTEGER before setting the default.

Suggested change:
-- Validate that the column type is INTEGER
DO $$
BEGIN
IF NOT EXISTS (
SELECT 1
FROM information_schema.columns
WHERE table_name = 'rag_data_source'
AND column_name = 'chunk_overlap_percent'
AND data_type = 'integer'
) THEN
RAISE EXCEPTION 'Column "chunk_overlap_percent" must be of type INTEGER to set a default value.';
END IF;
END $$;
-- Set the default value for the column

-ALTER TABLE rag_data_source ALTER COLUMN chunk_overlap_percent INTEGER DEFAULT 10;
+ALTER TABLE rag_data_source ALTER COLUMN chunk_overlap_percent SET DEFAULT 10;

COMMIT;

backend/src/test/java/com/cloudera/cai/util/db/RdbConfigTest.java: 60 changes (60 additions, 0 deletions)
@@ -0,0 +1,60 @@
/*
* CLOUDERA APPLIED MACHINE LEARNING PROTOTYPE (AMP)
* (C) Cloudera, Inc. 2025
* All rights reserved.
*
* Applicable Open Source License: Apache 2.0
*
* NOTE: Cloudera open source products are modular software products
* made up of hundreds of individual components, each of which was
* individually copyrighted. Each Cloudera open source product is a
* collective work under U.S. Copyright Law. Your license to use the
* collective work is as provided in your written agreement with
* Cloudera. Used apart from the collective work, this file is
* licensed for your use pursuant to the open source license
* identified above.
*
* This code is provided to you pursuant a written agreement with
* (i) Cloudera, Inc. or (ii) a third-party authorized to distribute
* this code. If you do not have a written agreement with Cloudera nor
* with an authorized and properly licensed third party, you do not
* have any rights to access nor to use this code.
*
* Absent a written agreement with Cloudera, Inc. ("Cloudera") to the
* contrary, A) CLOUDERA PROVIDES THIS CODE TO YOU WITHOUT WARRANTIES OF ANY
* KIND; (B) CLOUDERA DISCLAIMS ANY AND ALL EXPRESS AND IMPLIED
* WARRANTIES WITH RESPECT TO THIS CODE, INCLUDING BUT NOT LIMITED TO
* IMPLIED WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY AND
* FITNESS FOR A PARTICULAR PURPOSE; (C) CLOUDERA IS NOT LIABLE TO YOU,
* AND WILL NOT DEFEND, INDEMNIFY, NOR HOLD YOU HARMLESS FOR ANY CLAIMS
* ARISING FROM OR RELATED TO THE CODE; AND (D)WITH RESPECT TO YOUR EXERCISE
* OF ANY RIGHTS GRANTED TO YOU FOR THE CODE, CLOUDERA IS NOT LIABLE FOR ANY
* DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, PUNITIVE OR
* CONSEQUENTIAL DAMAGES INCLUDING, BUT NOT LIMITED TO, DAMAGES
* RELATED TO LOST REVENUE, LOST PROFITS, LOSS OF INCOME, LOSS OF
* BUSINESS ADVANTAGE OR UNAVAILABILITY, OR LOSS OR CORRUPTION OF
* DATA.
*/

package com.cloudera.cai.util.db;

import static org.assertj.core.api.Assertions.assertThat;
import static org.junit.jupiter.api.Assertions.*;

import org.junit.jupiter.api.Test;
class RdbConfigTest {

@Test
void buildDatabaseConnectionString() {

var url =
"jdbc:postgresql://rag-dev-testing.cluster.us-west-2.rds.amazonaws.com:5432/rag?username=foo&password=bar";

var rdb =
RdbConfig.builder().rdbUrl(url).rdbDatabaseName("postgres").rdbType("PostgreSQL").build();
var result = RdbConfig.buildDatabaseServerConnectionString(rdb);
assertThat(result)
.isEqualTo(
"jdbc:postgresql://rag-dev-testing.cluster.us-west-2.rds.amazonaws.com:5432/postgres?username=foo&password=bar");
}
}

docker-compose.yaml: 2 changes (1 addition, 1 deletion)
@@ -10,7 +10,7 @@ services:
- "9464:9464"
environment:
- API_HOST=0.0.0.0
-- DB_URL=jdbc:postgresql://db:5432
+- DB_URL=jdbc:postgresql://db:5432/rag
- DB_TYPE=PostgreSQL
- OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4318
- OTEL_METRICS_EXPORTER=none # we configure this by hand

llm-service/app/config.py: 1 change (1 addition, 0 deletions)
@@ -52,6 +52,7 @@
SummaryStorageProviderType = Literal["Local", "S3"]
ChatStoreProviderType = Literal["Local", "S3"]
VectorDbProviderType = Literal["QDRANT", "OPENSEARCH"]
MetadataDbProviderType = Literal["H2", "PostgreSQL"]


class _Settings:

llm-service/app/routers/index/__init__.py: 3 changes (1 addition, 2 deletions)
@@ -46,16 +46,15 @@
from . import amp_metadata
from . import models
from . import metrics
-from . import chat

logger = logging.getLogger(__name__)


router = APIRouter()
-router.include_router(chat.router)
router.include_router(summaries.router)
router.include_router(data_source.router)
router.include_router(sessions.router)
router.include_router(sessions.no_id_router)
router.include_router(amp_metadata.router)
# include this for legacy UI calls
router.include_router(amp_metadata.router, prefix="/index", deprecated=True)

llm-service/app/routers/index/amp_metadata/__init__.py: 21 changes (21 additions, 0 deletions)
@@ -46,6 +46,7 @@
from fastapi.params import Header

from .... import exceptions
from ....config import MetadataDbProviderType
from ....services.amp_metadata import (
ProjectConfig,
ProjectConfigPlus,
@@ -54,6 +55,8 @@
update_project_environment,
get_project_environment,
get_application_config,
validate_jdbc,
ValidationResult,
)
from ....services.amp_update import does_amp_need_updating
from ....services.models.providers import CAIIModelProvider
@@ -198,6 +201,24 @@ def save_auth_token(auth_token: Annotated[str, Body(embed=True)]) -> str:
return "Auth token saved successfully"


@router.post(
"/validate-jdbc-connection",
summary="Validates a JDBC connection string, username, and password.",
)
@exceptions.propagates
def validate_jdbc_connection(
db_url: Annotated[str, Body(embed=True)],
username: Annotated[str, Body(embed=True)],
password: Annotated[str, Body(embed=True)],
db_type: Annotated[MetadataDbProviderType, Body(embed=True)],
) -> ValidationResult:
"""
Calls the JdbiUtils main method to validate JDBC connection parameters.
Returns a dict with 'valid': True/False and 'message'.
"""
return validate_jdbc(db_type, db_url, password, username)


def save_cdp_token(auth_token: str) -> None:
token_data = {"access_token": auth_token}
with open("cdp_token", "w") as file:
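`validate_jdbc_connection` declares each field as `Body(embed=True)`, so FastAPI expects one JSON object keyed by parameter name, and it rejects a `db_type` outside `MetadataDbProviderType` with a 422 response before the handler runs. Below is a small client-side sketch that builds and pre-checks that body. `validate_jdbc_request_body` is a hypothetical name, and the full route path depends on the prefix under which `amp_metadata.router` is mounted, which this diff does not show.

```python
import json
from typing import Literal, get_args

# Copied from llm-service/app/config.py in this PR.
MetadataDbProviderType = Literal["H2", "PostgreSQL"]


def validate_jdbc_request_body(db_url: str, username: str, password: str, db_type: str) -> str:
    """Hypothetical helper: JSON body for POST .../validate-jdbc-connection."""
    allowed = get_args(MetadataDbProviderType)
    if db_type not in allowed:
        # The server would answer 422 for this body, so fail early on the client.
        raise ValueError(f"db_type must be one of {allowed}, got {db_type!r}")
    # Each Body(embed=True) parameter becomes a top-level key named after it.
    return json.dumps(
        {"db_url": db_url, "username": username, "password": password, "db_type": db_type}
    )
```

The resulting string is sent as the request body with `Content-Type: application/json`.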

llm-service/app/routers/index/chat/__init__.py: 69 changes (0 additions, 69 deletions)
This file was deleted.

llm-service/app/routers/index/models/__init__.py: 4 changes (3 additions, 1 deletion)
@@ -64,7 +64,9 @@ def get_reranking_models() -> List[ModelResponse]:
return models.Reranking.list_available()


-@router.get("/model_source", summary="Model source enabled - Bedrock, CAII, or Azure")
+@router.get(
+"/model_source", summary="Model source enabled - Bedrock, CAII, OpenAI or Azure"
+)
@exceptions.propagates
def get_model() -> models.ModelSource:
return models.get_model_source()