From 443c8c167cb11e94b0d613b098cab265cb3050ba Mon Sep 17 00:00:00 2001
From: tgambin
Date: Tue, 13 Apr 2021 21:29:17 +0200
Subject: [PATCH 1/3] docs/pluggable_intervaljoin init

---
 .../interval_set_representation.rst | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 docs/source/development/interval_set_representation/interval_set_representation.rst

diff --git a/docs/source/development/interval_set_representation/interval_set_representation.rst b/docs/source/development/interval_set_representation/interval_set_representation.rst
new file mode 100644
index 00000000..8da0827f
--- /dev/null
+++ b/docs/source/development/interval_set_representation/interval_set_representation.rst
@@ -0,0 +1,9 @@
+Interval Set representation
+========
+
+.. contents::
+
+
+Replacing Interval Trees with other data structeres:
+############################
+

From ae8aae300e7f4ad13487dc2d57ee41c7c55fe9cc Mon Sep 17 00:00:00 2001
From: tgambin
Date: Fri, 16 Apr 2021 18:47:14 +0200
Subject: [PATCH 2/3] Update interval_set_representation.rst

---
 .../interval_set_representation.rst | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/source/development/interval_set_representation/interval_set_representation.rst b/docs/source/development/interval_set_representation/interval_set_representation.rst
index 8da0827f..1460a0c2 100644
--- a/docs/source/development/interval_set_representation/interval_set_representation.rst
+++ b/docs/source/development/interval_set_representation/interval_set_representation.rst
@@ -4,6 +4,14 @@ Interval Set representation
 .. contents::
 
 
-Replacing Interval Trees with other data structeres:
+Replacing Interval Trees with other data structures:
 ############################
 
+In the first version of SeQuiLa we used Interval red black tree data structure implemented in Java.
+On the other, we are aware that state-of-the-art non-distributed tools such as (bed-tk, cgranges, genomicRanges, etc) utilizes data structures that are more efficient in finding overlaps between sets of intervals, i.e. Augmented Interval Lists, Nested Contained Lists, Implicit Interval Trees, etc.
+
+In the current version of SeQuiLa we provide the interface that allows to test new data structures and find overlaps algorithms. To replace the orginal implementation of RedBlack interval tree, the following steps need to be done:
+* Add a new class of IntervalTreeHolder that implements interface defined in BaseIntervalHolder.scala.
+* Add a new class of Node that implements interface defined in BaseNode.scala
+* Set InternalParams.intervalHolderClass parameter in spark.sqlContext configuration.
+* Sample implementation of DummyIntervalHolder and its use can be found in CustomIntervalHolderTestSuits.

From 33bd4ed661d3aa7d29b0076b11634b8e8ed2ff22 Mon Sep 17 00:00:00 2001
From: tgambin
Date: Fri, 16 Apr 2021 18:50:41 +0200
Subject: [PATCH 3/3] Update interval_set_representation.rst

---
 .../interval_set_representation.rst | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/docs/source/development/interval_set_representation/interval_set_representation.rst b/docs/source/development/interval_set_representation/interval_set_representation.rst
index 1460a0c2..5a529f41 100644
--- a/docs/source/development/interval_set_representation/interval_set_representation.rst
+++ b/docs/source/development/interval_set_representation/interval_set_representation.rst
@@ -1,17 +1,16 @@
 Interval Set representation
 ========
 
-.. contents::
-
-
-Replacing Interval Trees with other data structures:
-############################
 
 In the first version of SeQuiLa we used Interval red black tree data structure implemented in Java.
 On the other, we are aware that state-of-the-art non-distributed tools such as (bed-tk, cgranges, genomicRanges, etc) utilizes data structures that are more efficient in finding overlaps between sets of intervals, i.e. Augmented Interval Lists, Nested Contained Lists, Implicit Interval Trees, etc.
 
 In the current version of SeQuiLa we provide the interface that allows to test new data structures and find overlaps algorithms. To replace the orginal implementation of RedBlack interval tree, the following steps need to be done:
-* Add a new class of IntervalTreeHolder that implements interface defined in BaseIntervalHolder.scala.
-* Add a new class of Node that implements interface defined in BaseNode.scala
-* Set InternalParams.intervalHolderClass parameter in spark.sqlContext configuration.
-* Sample implementation of DummyIntervalHolder and its use can be found in CustomIntervalHolderTestSuits.
+
+- Add a new class of IntervalTreeHolder that implements interface defined in BaseIntervalHolder.scala.
+
+- Add a new class of Node that implements interface defined in BaseNode.scala
+
+- Set InternalParams.intervalHolderClass parameter in spark.sqlContext configuration.
+
+- Sample implementation of DummyIntervalHolder and its use can be found in CustomIntervalHolderTestSuits.
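
The first two steps listed in the final revision of the document (a holder class and a node class implementing the project's interfaces) could look roughly like the Scala sketch below. This patch does not show the contents of BaseIntervalHolder.scala or BaseNode.scala, so `NodeSketch`, `IntervalHolderSketch`, `SimpleNode`, `ListIntervalHolder` and `demo` are hypothetical names invented for illustration, and the real signatures may differ. The linear scan is only a placeholder, not one of the data structures the document names.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for the interface in BaseNode.scala.
trait NodeSketch[V] {
  def start: Int
  def end: Int
  def values: List[V]
}

// Hypothetical, simplified stand-in for the interface in BaseIntervalHolder.scala.
trait IntervalHolderSketch[V] {
  def put(start: Int, end: Int, value: V): Unit
  def overlappers(start: Int, end: Int): Iterator[NodeSketch[V]]
}

// Step 2 of the list: a node class implementing the node interface.
final case class SimpleNode[V](start: Int, end: Int, values: List[V])
    extends NodeSketch[V]

// Step 1 of the list: a holder class implementing the holder interface.
// It stores nodes in a buffer and scans them linearly on every query.
final class ListIntervalHolder[V] extends IntervalHolderSketch[V] {
  private val nodes = ArrayBuffer.empty[SimpleNode[V]]

  // Values inserted with identical coordinates share one node.
  override def put(start: Int, end: Int, value: V): Unit = {
    val idx = nodes.indexWhere(n => n.start == start && n.end == end)
    if (idx >= 0) nodes(idx) = nodes(idx).copy(values = value :: nodes(idx).values)
    else nodes += SimpleNode(start, end, List(value))
  }

  // Closed intervals overlap when each one starts no later than the other ends.
  override def overlappers(start: Int, end: Int): Iterator[NodeSketch[V]] =
    nodes.iterator.filter(n => n.start <= end && start <= n.end)
}

// Small usage example: query [18, 35] against three stored intervals.
def demo(): List[String] = {
  val holder = new ListIntervalHolder[String]
  holder.put(10, 20, "a")
  holder.put(15, 30, "b")
  holder.put(40, 50, "c")
  holder.overlappers(18, 35).flatMap(_.values).toList // List("a", "b")
}
```

The third step would then presumably consist of pointing the InternalParams.intervalHolderClass parameter in the spark.sqlContext configuration at the new holder class. The patch does not state the expected value format, so that part is left out of the sketch.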