Clickhouse only db #11704

alisman · 2025-09-11T16:05:30Z

No description provided.

…1673)

Replace foreach loops generating multiple prepared statement parameters with a single array parameter using ArrayTypeHandler. This significantly improves performance with ClickHouse JDBC connections by reducing parameter overhead.

- Use CONCAT(study_id, ':', patient_id) with ArrayTypeHandler
- Add SqlUtils.combineStudyAndPatientIds() utility method
- Apply optimization to both patient and sample lookup queries
- Maintain security through proper parameter binding

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <[email protected]>
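To make the CONCAT-based matching concrete, here is a minimal sketch of what a helper like SqlUtils.combineStudyAndPatientIds() could look like. It assumes the helper receives two parallel, equal-length lists and joins each pair with the ':' separator used in the SQL expression. The class name CombineIdsSketch is hypothetical, and the real method's signature and body are not shown in this PR.

```java
import java.util.List;

// Hypothetical stand-in for SqlUtils.combineStudyAndPatientIds(); the PR describes the
// helper but does not show its signature or body.
final class CombineIdsSketch {
    private CombineIdsSketch() {}

    // Pairs studyIds[i] with patientIds[i] into "studyId:patientId" keys, the same shape
    // the SQL side produces with CONCAT(study_id, ':', patient_id).
    static String[] combineStudyAndPatientIds(List<String> studyIds, List<String> patientIds) {
        if (studyIds.size() != patientIds.size()) {
            throw new IllegalArgumentException("studyIds and patientIds must have the same length");
        }
        String[] keys = new String[studyIds.size()];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = studyIds.get(i) + ":" + patientIds.get(i);
        }
        return keys;
    }
}
```

Called with studies [study_a, study_b] and patients [P-01, P-02], the sketch returns the keys study_a:P-01 and study_b:P-02, which can then be bound as one array parameter instead of one parameter per pair. One caveat of this design: a plain ':' separator only yields unique keys if the IDs themselves never contain ':'.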

Co-authored-by: Bryan Lai <[email protected]>

…C performance (#11703)

Apply ArrayTypeHandler optimization strategy to the whereSample include, following the same approach used in PatientMapper (commit 2e2ec22). This significantly improves performance with ClickHouse JDBC connections by reducing parameter overhead.

Changes:
- Replace foreach loops in whereSample with ArrayTypeHandler for both single-study and multi-study queries
- Use SqlUtils.combineStudyAndPatientIds() for multi-study scenarios with CONCAT-based unique key matching
- Optimize getClinicalAttributeCountsBySampleIds query performance through the updated whereSample include

This reduces prepared statement parameters from potentially thousands to a single array parameter, while maintaining security through proper parameter binding.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <[email protected]>

…rmance

Apply ArrayTypeHandler optimization strategy across all high-priority MyBatis mappers to dramatically improve ClickHouse JDBC performance by reducing prepared statement parameter overhead.

SqlUtils Enhancements:
- Add listToArray() utility method to convert List<String> to String[] for ArrayTypeHandler
- Extend combineStudyAndPatientIds() usage for multi-study query optimization

Optimized MyBatis Mappers (11 files):
- ClinicalAttributeMapper.xml - clinical attribute count queries with sample IDs
- ClinicalDataMapper.xml - sample and patient clinical data queries
- ClinicalEventMapper.xml - clinical events by sample and patient IDs
- CopyNumberSegmentMapper.xml - copy number segment queries
- DiscreteCopyNumberMapper.xml - discrete copy number queries
- MutationMapper.xml - mutation queries with sample/profile pairs
- NamespaceMapper.xml - sample ID namespace queries
- SampleMapper.xml - sample queries with study/sample and study/patient pairs
- StructuralVariantMapper.xml - structural variant queries
- TreatmentMapper.xml - treatment sample ID queries

Optimization Strategy Applied:
- Single-study queries: Use <bind> + ArrayTypeHandler for direct List→Array conversion
- Multi-study queries: Use SqlUtils.combineStudyAndPatientIds() with CONCAT matching
- Replace foreach loops generating multiple prepared statement parameters
- Use proper <bind> elements to avoid MyBatis parameter binding errors

Performance Impact:
- Reduces prepared statement parameters from potentially thousands to single arrays
- Follows same proven optimization pattern from PatientMapper (commit 2e2ec22)
- Maintains security through proper parameter binding
- Significant improvement for ClickHouse JDBC connections

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
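The parameter-count difference described above can be illustrated with a small sketch. listToArray mirrors the helper named in the commit message (its real body is not shown), and foreachPlaceholders is a hypothetical helper that only shows what a `<foreach>` over n items expands to. The sketch assumes a MyBatis version whose ArrayTypeHandler accepts a plain Java array and binds it via the JDBC connection's createArrayOf, i.e. as one parameter.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the SqlUtils.listToArray() helper named in the commit message;
// the real implementation is not shown.
final class ListToArraySketch {
    private ListToArraySketch() {}

    // ArrayTypeHandler binds a Java array as a single JDBC array parameter,
    // so the List<String> is copied into a String[] first.
    // Assumption: a null list is treated as empty.
    static String[] listToArray(List<String> values) {
        return values == null ? new String[0] : values.toArray(new String[0]);
    }

    // For comparison: the placeholder list that <foreach open="(" separator="," close=")">
    // over n items expands to, i.e. one JDBC parameter per element.
    static String foreachPlaceholders(int n) {
        return "(" + String.join(",", Collections.nCopies(n, "?")) + ")";
    }
}
```

For three sample IDs, the foreach form produces `(?,?,?)` (three bound parameters), while the array form still binds exactly one parameter; with thousands of IDs the difference grows linearly.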

…11720)

Replace individual fetchGenePanelDataByMolecularProfileId calls with a single fetchGenePanelDataByMolecularProfileIds batch call to reduce database round trips and improve performance. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

alisman · 2025-10-29T21:19:58Z

src/main/java/org/cbioportal/legacy/persistence/mybatis/StructuralVariantMyBatisRepository.java

-                        structuralVariantQueries)
-                    .stream())
-        .collect(Collectors.toList());
+    return structuralVariantMapper.fetchStructuralVariants(


This change was made because fetching all profiles at once is much more performant. The sampleId collection and the molecular profile collection are always of equal length.

Simplified the method to call mutationMapper.getMutationsInMultipleMolecularProfiles directly instead of iterating through grouped cases and making multiple calls. This reduces database round trips and improves performance. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

Simplified the method to call structuralVariantMapper.fetchStructuralVariants directly with all molecular profile IDs instead of iterating through grouped cases and making multiple calls. This reduces database round trips and improves performance. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
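The before/after shape of these two simplifications can be sketched as follows. The VariantMapper interface and the class name are hypothetical and much simpler than the real MutationMapper and StructuralVariantMapper methods; the sketch only shows why passing the parallel lists straight through replaces one query per profile group with a single query, relying on the equal-length invariant stated in the review comment above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the refactoring; not the actual repository code.
final class BatchFetchSketch {
    private BatchFetchSketch() {}

    // Hypothetical mapper shape; the real MyBatis mapper methods take more arguments.
    interface VariantMapper {
        List<String> fetch(List<String> molecularProfileIds, List<String> sampleIds);
    }

    // Before: group the (profile, sample) pairs by profile and issue one query per group.
    static List<String> fetchPerGroup(VariantMapper mapper, List<String> profileIds, List<String> sampleIds) {
        Map<String, List<String>> samplesByProfile = new LinkedHashMap<>();
        for (int i = 0; i < profileIds.size(); i++) {
            samplesByProfile.computeIfAbsent(profileIds.get(i), k -> new ArrayList<>()).add(sampleIds.get(i));
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : samplesByProfile.entrySet()) {
            List<String> samples = e.getValue();
            result.addAll(mapper.fetch(Collections.nCopies(samples.size(), e.getKey()), samples));
        }
        return result;
    }

    // After: the two parallel lists are passed through in a single call.
    // This relies on both lists having equal length, as stated in the review.
    static List<String> fetchOnce(VariantMapper mapper, List<String> profileIds, List<String> sampleIds) {
        if (profileIds.size() != sampleIds.size()) {
            throw new IllegalArgumentException("molecularProfileIds and sampleIds must have equal length");
        }
        return mapper.fetch(profileIds, sampleIds);
    }
}
```

Both variants return the same rows (possibly in a different order), but the grouped version issues one mapper call per distinct profile while the simplified version always issues exactly one.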

Co-authored-by: Bryan Lai <[email protected]>

…of profiling status for genes which are covered by the same panels

This commit optimizes the discrete copy number queries in DiscreteCopyNumberMapper by reordering the FROM clause to start with sample_cna_event instead of cna_event, and by using subqueries to filter by genetic_profile_id instead of joining the full genetic_profile table.

Changes:
- Reordered FROM clause to start with sample_cna_event table
- Replaced genetic_profile.stable_id joins with subquery lookups
- Filters on genetic_profile_id directly in sample_cna_event table
- Improves query performance by reducing join complexity

This optimization should improve query performance for discrete copy number alteration lookups, especially for large datasets.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>

This commit introduces a more efficient studyExists() method for validating study existence without fetching full study objects, and fixes a ClickHouse SQL compatibility issue in the StudyMapper.

Changes:
1. Added StudyService.studyExists() method
   - Fetches only study IDs instead of full objects
   - Throws StudyNotFoundException if study doesn't exist
   - More efficient than getStudy() when only validation is needed
2. Replaced getStudy() with studyExists() across codebase
   - Updated 29 call sites where return value wasn't used
   - Affected services: Clinical, Sample, Patient, MolecularProfile, etc.
3. Fixed ClickHouse SQL error in StudyMapper.xml
   - Added cancer_study_identifier to GROUP BY clause
   - Resolves: "Column is not under aggregate function and not in GROUP BY keys"
   - Required for ORDER BY cancer_study_identifier to work with ClickHouse

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
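The validation-only pattern in item 1 can be sketched as below. All types here are hypothetical stand-ins: the commit names StudyService.studyExists() and StudyNotFoundException, but does not show their signatures or the repository query behind them, so StudyIdLookup and its method are assumptions.

```java
import java.util.List;

// Local stand-in for cBioPortal's StudyNotFoundException; the real class is not shown here.
class StudyNotFoundException extends Exception {
    StudyNotFoundException(String studyId) {
        super("Study not found: " + studyId);
    }
}

// Hypothetical lookup: assumed to run a lightweight query returning only existing study IDs.
interface StudyIdLookup {
    List<String> findExistingStudyIds(List<String> studyIds);
}

final class StudyExistsSketch {
    private final StudyIdLookup lookup;

    StudyExistsSketch(StudyIdLookup lookup) {
        this.lookup = lookup;
    }

    // Validates existence without loading a full study object; a caller that previously
    // invoked getStudy() only for its exception can call this instead.
    void studyExists(String studyId) throws StudyNotFoundException {
        if (!lookup.findExistingStudyIds(List.of(studyId)).contains(studyId)) {
            throw new StudyNotFoundException(studyId);
        }
    }
}
```

At one of the 29 call sites, the change then amounts to replacing a discarded `getStudy(studyId)` result with `studyExists(studyId)`; the exception contract stays the same, but only the study ID is fetched.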

* Fix zero value handling in molecular data truncation

Add special case handling for zero values in the molecular data value truncation logic to prevent them from being converted incorrectly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Update src/main/resources/org/cbioportal/legacy/persistence/mybatis/MolecularDataMapper.xml

Co-authored-by: Onur Sumer <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: Onur Sumer <[email protected]>

* ♻️ Refactor Cancer Study PermissionEvaluator
* 🔒 Add Support for CancerStudyMetadata obj
* Update field typeOfCancer to cancerType

* Fix sample list query for clickhouse
* Change cancer study details query in sample list query
* Clean up SampleListMapper.xml by removing unused prefix properties

…nly (#11784)

Fix SQL issues in Mutation Mapper queries migrated to the mutation_derived table.
Fix issues with clickhouse.sql so it runs in the test context.
Fix a ClickHouse problem with the getMutationCountByPosition query.
Update pom.xml with the new derived table version.

…nly-db # Conflicts: # pom.xml # src/main/resources/org/cbioportal/legacy/persistence/mybatis/CosmicCountMapper.xml # src/test/java/org/cbioportal/legacy/persistence/mybatis/CosmicCountMyBatisRepositoryTest.java

…pository test (#11791)

onursumer · 2025-11-03T16:43:26Z

src/main/java/org/cbioportal/domain/mutation/AlleleSpecificCopyNumber.java

Do we need this empty file?

onursumer · 2025-11-03T16:52:44Z

src/main/java/org/cbioportal/legacy/web/ReferenceGenomeGeneController.java

-    List<ReferenceGenomeGene> genes = geneMemoizerService.fetchGenes(genomeName);
-    if (genes == null) {
-      genes = referenceGenomeGeneService.fetchAllReferenceGenomeGenes(genomeName);
-      geneMemoizerService.cacheGenes(genes, genomeName);


This is the only place where we use the gene memoizer service. We should probably remove the interface and the implementation class as well.

onursumer · 2025-11-03T16:58:35Z

src/main/resources/org/cbioportal/legacy/persistence/mybatis/AlterationCountsMapper.xml

            COUNT(*) AS totalCount,
-            COUNT(DISTINCT(CASE_ID)) AS numberOfAlteredCases
+            COUNT(DISTINCT(case_id)) AS numberOfAlteredCases,
+            2 AS QueryNumber


Does this serve any purpose?

onursumer · 2025-11-03T17:13:08Z

src/main/resources/org/cbioportal/legacy/persistence/mybatis/AlterationCountsMapper.xml

+            <!-- TODO: check this-->
+            ANY_VALUE(gene1.hugo_gene_symbol) AS gene1HugoGeneSymbol,
+            gene2.entrez_gene_id AS gene2EntrezGeneId,
+            ANY_VALUE(gene2.hugo_gene_symbol) AS gene2HugoGeneSymbol,


we don't need to fix the ANY_VALUE issue here anymore, right? I guess we can remove the TODO comment

onursumer · 2025-11-03T17:13:52Z

src/main/resources/org/cbioportal/legacy/persistence/mybatis/AlterationCountsMapper.xml

+            <!-- TODO: check this-->
+            ANY_VALUE(gene1.hugo_gene_symbol) AS gene1HugoGeneSymbol,
+            gene2.entrez_gene_id AS gene2EntrezGeneId,
+            ANY_VALUE(gene2.hugo_gene_symbol) AS gene2HugoGeneSymbol,


same as above

onursumer · 2025-11-03T17:28:33Z

src/main/resources/org/cbioportal/legacy/persistence/mybatis/ClinicalEventMapper.xml

+            <if test="!clinicalEvents.isEmpty()">
+                AND
+                <foreach item="element" collection="clinicalEvents" open="(" separator="OR " close=")">
+                    <if test="element.attributes == null || element.attributes.isEmpty()">
+                        clinical_event.event_type = #{element.eventType}
+                    </if>
+                    <if test="element.attributes != null and !element.attributes.isEmpty()">
+                        (CONCAT(clinical_event.event_type, '_', clinical_event_data.key)) IN
+                        <foreach item="attribute" collection="element.attributes" open="(" separator="," close=")">
+                            CONCAT(#{element.eventType}, '_', #{attribute.key})
+                        </foreach>
+                    </if>
+                </foreach>
+            </if>


This seems like a new addition, probably addressing a legacy bug. It broke one of the legacy tests because of a missing clinicalEvents param.
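The semantics of this new filter are easier to see outside of XML. Below is a minimal in-memory sketch, assuming filter objects that carry only the properties the XML reads (eventType, attributes, key); the record, class, and method names are hypothetical. As in the SQL, only the attribute key is compared, never an attribute value, and the filters are combined with OR.

```java
import java.util.List;

// Simplified, hypothetical in-memory model of the <if>/<foreach> filter in ClinicalEventMapper.xml.
final class ClinicalEventFilterSketch {
    private ClinicalEventFilterSketch() {}

    record Attribute(String key) {}

    record EventFilter(String eventType, List<Attribute> attributes) {}

    // A row (event_type, clinical_event_data.key) passes if ANY filter matches, mirroring separator="OR ".
    // Assumption: null is handled like an empty list (no filtering); the XML as shown calls
    // clinicalEvents.isEmpty() without a null check.
    static boolean matches(String eventType, String dataKey, List<EventFilter> filters) {
        if (filters == null || filters.isEmpty()) {
            return true;
        }
        // Mirrors CONCAT(clinical_event.event_type, '_', clinical_event_data.key)
        String rowKey = eventType + "_" + dataKey;
        for (EventFilter f : filters) {
            if (f.attributes() == null || f.attributes().isEmpty()) {
                // Filter without attributes: match on event type alone.
                if (f.eventType().equals(eventType)) {
                    return true;
                }
            } else {
                // Filter with attributes: match on any CONCAT(eventType, '_', attribute.key).
                for (Attribute a : f.attributes()) {
                    if ((f.eventType() + "_" + a.key()).equals(rowKey)) {
                        return true;
                    }
                }
            }
        }
        return false;
    }
}
```

Seen this way, a caller that omits clinicalEvents entirely has no defined behavior in the XML, which is consistent with the legacy test failure described above.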

onursumer · 2025-11-03T18:45:37Z

src/main/resources/org/cbioportal/legacy/persistence/mybatis/DiscreteCopyNumberMapper.xml

+            <!-- TODO: check this-->
+            ANY_VALUE(gene.hugo_gene_symbol) AS "hugoGeneSymbol",


remove this TODO as well?

onursumer · 2025-11-03T18:49:43Z

src/main/resources/org/cbioportal/legacy/persistence/mybatis/MutationMapper.xml

+        <!-- TODO: check this-->
+        ANY_VALUE(GENE.hugoGeneSymbol) AS "hugoGeneSymbol",


remove this TODO?

onursumer · 2025-11-03T18:55:23Z

src/main/resources/org/cbioportal/legacy/persistence/mybatis/StudyMapper.xml

+            <!--TODO : FIX-->
+            <!--COUNT(CASE WHEN sample_list.stable_id = CONCAT(cancer_study.cancer_study_identifier,'_all') THEN 1 ELSE NULL END) AS allSampleCount-->
+            1 AS allSampleCount


I think this is fixed in the new clickhouse implementation, so we don't use this getStudies mapping anymore, right?

onursumer · 2025-11-03T18:55:58Z

src/main/resources/org/cbioportal/legacy/persistence/mybatis/StudyMapper.xml

+            <!--TODO: FIX-->
+            <!--COUNT(CASE WHEN sample_list.stable_id = CONCAT(cancer_study.cancer_study_identifier,'_all') THEN 1 ELSE NULL END) AS allSampleCount-->
+            1 AS allSampleCount


same as above

onursumer · 2025-11-03T19:35:54Z

#11790 addresses some of my review comments above

The method name combineStudyAndPatientIds was confusing because it was often used with sampleIds and molecularProfileIds, not just patientIds. Renamed to combineStudyAndEntityIds to better reflect its generic purpose of combining study IDs with any type of entity IDs.

Updated all 15 usages across MyBatis mapper XML files:
- TreatmentMapper.xml
- CopyNumberSegmentMapper.xml
- DiscreteCopyNumberMapper.xml
- MutationMapper.xml
- ClinicalEventMapper.xml (2 usages)
- NamespaceMapper.xml
- StructuralVariantMapper.xml
- ClinicalDataMapper.xml (2 usages)
- SampleMapper.xml (2 usages)
- PatientMapper.xml (2 usages)
- ClinicalAttributeMapper.xml

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>

haynescd and others added 9 commits September 11, 2025 16:21

Update initial Mappers to fix Patient View Page

85b47c2

Fix queries to support cbioportal with clickhouse only

fd83576

Fix mybatis xml

32f527c

Remove GeneMemoizer call

611aa91

improve molecular data query (#11669)

000498c

add back clinical data mapper queries for clinical table (#11700)

c284181

Co-authored-by: Bryan Lai <[email protected]>

alisman force-pushed the demo-clickhouse-only-db branch from 2e9c7ae to 688a552 September 11, 2025 20:21

alisman and others added 5 commits September 17, 2025 17:14

Fix capitalization of field names in sql

94aaee9

make sure treatment mapper string comparisons are case-insensitive (#…

6189f4f

…11720)

fix broken resource data endpoints (#11719)

e5fdccf

improve generic_assay_data sql performance (#11706)

c90509b

alisman force-pushed the demo-clickhouse-only-db branch from 6f65b0a to ba502c6 October 28, 2025 18:42

alisman commented Oct 29, 2025


alisman and others added 13 commits October 31, 2025 15:08

Fix spotless issues

2503494

fix broken getPatientClinicalDataFromStudyViewFilter mapping (#11710)

aa133a8

fix high security vulnerabilities (#11717)

4fb4d04

Co-authored-by: Bryan Lai <[email protected]>

Refactor Clickhouse enrichments endpoint to avoid redundant counting …

2e5d568

…of profiling status for genes which are covered by the same panels

truncate genetic alteration values for molecular data queries (#11734)

d89e0f0

Fix column store study endpoint (#11747)

c2e4117

* ♻️ Refactor Cancer Study PermissionEvaluator * 🔒 Add Support for CancerStudyMetadata obj * Update field typeOfCancer to cancerType

Fix sample list error (clickhouse-only) (#11753)

9387415

* Fix sample list query for clickhouse * Change cancer study details query in sample list query * Clean up SampleListMapper.xml by removing unused prefix properties

Fix legacy core tests after changes to legacy mapper for clickhouse-o…

2ba9202

…nly (#11784)

alisman added 3 commits October 31, 2025 15:10

Merge remote-tracking branch 'upstream/master' into demo-clickhouse-o…

84a22ab

…nly-db # Conflicts: # pom.xml # src/main/resources/org/cbioportal/legacy/persistence/mybatis/CosmicCountMapper.xml # src/test/java/org/cbioportal/legacy/persistence/mybatis/CosmicCountMyBatisRepositoryTest.java

Fix upper case column issues in merged code

2164b45

alisman force-pushed the demo-clickhouse-only-db branch from 7ff1ddf to 2164b45 October 31, 2025 19:56

fix a potential null pointer exception with the VariantCountMyBatisRe…

d2e6aa4

…pository test (#11791)

onursumer self-assigned this Nov 3, 2025

onursumer reviewed Nov 3, 2025



alisman commented Sep 11, 2025


onursumer commented Nov 3, 2025


6 participants