
Commit 8bb97fa

DAT-331: Add option to automatically preserve TTL and timestamp (#384)
1 parent b65a9e6 commit 8bb97fa

File tree: 20 files changed (+2610 −1886 lines)

changelog/README.md

Lines changed: 2 additions & 1 deletion

@@ -1,6 +1,6 @@
 ## Changelog
 
-## 1.7.1 (in progress)
+## 1.8.0 (in progress)
 
 - [improvement] Upgrade driver to 4.10.0.
 - [bug] Fix incorrect error message when read concurrency is < 1.
@@ -9,6 +9,7 @@
 - [improvement] Accept Well-known Binary (WKB) input formats for Geometry types.
 - [improvement] Make Json connector sensitive to the configured binary format.
 - [improvement] Make Geometry formats configurable.
+- [new feature] DAT-331: Add option to automatically preserve TTL and timestamp.
 
 ## 1.7.0
 

manual/application.template.conf

Lines changed: 46 additions & 0 deletions

@@ -715,6 +715,52 @@ dsbulk {
 # Default value: true
 #schema.nullToUnset = true
 
+# Whether to preserve cell timestamps when loading and unloading. Ignored when `schema.query` is
+# provided, or when the target table is a counter table. If true, the following rules will be
+# applied to generated queries:
+#
+# - When loading, instead of a single INSERT statement, the generated query will be a BATCH
+#   query; this is required in order to preserve individual column timestamps for each row.
+# - When unloading, the generated SELECT statement will export each column along with its
+#   individual timestamp.
+#
+# For both loading and unloading, DSBulk will import and export timestamps using field names
+# such as `"writetime(<column>)"`, where `<column>` is the column's internal CQL name; for
+# example, if the table has a column named `"MyCol"`, its corresponding timestamp would be
+# exported as `"writetime(MyCol)"` in the generated query and in the resulting connector record.
+# If you intend to use this feature to export and import tables letting DSBulk generate the
+# appropriate queries, these names are fine and need not be changed. If, however, you would like
+# to export or import data to or from external sources that use different field names, you can
+# do so by using the function `writetime` in a schema.mapping entry; for example, the following
+# mapping would map `col1` along with its timestamp to two distinct fields, `field1` and
+# `field1_writetime`: `field1 = col1, field1_writetime = writetime(col1)`.
+# Type: boolean
+# Default value: false
+#schema.preserveTimestamp = false
+
+# Whether to preserve cell TTLs when loading and unloading. Ignored when `schema.query` is
+# provided, or when the target table is a counter table. If true, the following rules will be
+# applied to generated queries:
+#
+# - When loading, instead of a single INSERT statement, the generated query will be a BATCH
+#   query; this is required in order to preserve individual column TTLs for each row.
+# - When unloading, the generated SELECT statement will export each column along with its
+#   individual TTL.
+#
+# For both loading and unloading, DSBulk will import and export TTLs using field names such as
+# `"ttl(<column>)"`, where `<column>` is the column's internal CQL name; for example, if the
+# table has a column named `"MyCol"`, its corresponding TTL would be exported as `"ttl(MyCol)"`
+# in the generated query and in the resulting connector record. If you intend to use this
+# feature to export and import tables letting DSBulk generate the appropriate queries, these
+# names are fine and need not be changed. If, however, you would like to export or import data
+# to or from external sources that use different field names, you can do so by using the
+# function `ttl` in a schema.mapping entry; for example, the following mapping would map `col1`
+# along with its TTL to two distinct fields, `field1` and `field1_ttl`: `field1 = col1,
+# field1_ttl = ttl(col1)`.
+# Type: boolean
+# Default value: false
+#schema.preserveTtl = false
+
 # The query to use. If not specified, then *schema.keyspace* and *schema.table* must be
 # specified, and dsbulk will infer the appropriate statement based on the table's metadata,
 # using all available columns. If `schema.keyspace` is provided, the query need not include the
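Taken together, the two new settings can be enabled side by side. A minimal application.conf sketch, where the keyspace `ks1` and table `table1` are hypothetical names:

```hocon
dsbulk {
  schema {
    keyspace = ks1    # hypothetical keyspace
    table = table1    # hypothetical table
    # Generated queries will carry per-column writetimes and TTLs:
    preserveTimestamp = true
    preserveTtl = true
  }
}
```

Both settings default to false, so omitting them preserves the previous behavior.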

manual/settings.md

Lines changed: 22 additions & 0 deletions

@@ -790,6 +790,28 @@ This setting is ignored when counting. When set to true but the protocol version
 
 Default: **true**.
 
+#### -timestamp,<br />--schema.preserveTimestamp<br />--dsbulk.schema.preserveTimestamp _&lt;boolean&gt;_
+
+Whether to preserve cell timestamps when loading and unloading. Ignored when `schema.query` is provided, or when the target table is a counter table. If true, the following rules will be applied to generated queries:
+
+- When loading, instead of a single INSERT statement, the generated query will be a BATCH query; this is required in order to preserve individual column timestamps for each row.
+- When unloading, the generated SELECT statement will export each column along with its individual timestamp.
+
+For both loading and unloading, DSBulk will import and export timestamps using field names such as `"writetime(<column>)"`, where `<column>` is the column's internal CQL name; for example, if the table has a column named `"MyCol"`, its corresponding timestamp would be exported as `"writetime(MyCol)"` in the generated query and in the resulting connector record. If you intend to use this feature to export and import tables letting DSBulk generate the appropriate queries, these names are fine and need not be changed. If, however, you would like to export or import data to or from external sources that use different field names, you can do so by using the function `writetime` in a schema.mapping entry; for example, the following mapping would map `col1` along with its timestamp to two distinct fields, `field1` and `field1_writetime`: `field1 = col1, field1_writetime = writetime(col1)`.
+
+Default: **false**.
+
+#### -ttl,<br />--schema.preserveTtl<br />--dsbulk.schema.preserveTtl _&lt;boolean&gt;_
+
+Whether to preserve cell TTLs when loading and unloading. Ignored when `schema.query` is provided, or when the target table is a counter table. If true, the following rules will be applied to generated queries:
+
+- When loading, instead of a single INSERT statement, the generated query will be a BATCH query; this is required in order to preserve individual column TTLs for each row.
+- When unloading, the generated SELECT statement will export each column along with its individual TTL.
+
+For both loading and unloading, DSBulk will import and export TTLs using field names such as `"ttl(<column>)"`, where `<column>` is the column's internal CQL name; for example, if the table has a column named `"MyCol"`, its corresponding TTL would be exported as `"ttl(MyCol)"` in the generated query and in the resulting connector record. If you intend to use this feature to export and import tables letting DSBulk generate the appropriate queries, these names are fine and need not be changed. If, however, you would like to export or import data to or from external sources that use different field names, you can do so by using the function `ttl` in a schema.mapping entry; for example, the following mapping would map `col1` along with its TTL to two distinct fields, `field1` and `field1_ttl`: `field1 = col1, field1_ttl = ttl(col1)`.
+
+Default: **false**.
+
 #### -query,<br />--schema.query<br />--dsbulk.schema.query _&lt;string&gt;_
 
 The query to use. If not specified, then *schema.keyspace* and *schema.table* must be specified, and dsbulk will infer the appropriate statement based on the table's metadata, using all available columns. If `schema.keyspace` is provided, the query need not include the keyspace to qualify the table reference.
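With the shortcut flags above, a full export/import round trip can be sketched as follows. This is an illustrative sketch only; the keyspace `ks1`, table `table1`, and export directory are hypothetical:

```shell
# Unload: each record gains writetime(...) and ttl(...) fields per column.
dsbulk unload -k ks1 -t table1 -timestamp true -ttl true -url ./export

# Load the export back, replaying per-column timestamps and TTLs
# through the generated BATCH statements.
dsbulk load -k ks1 -t table1 -timestamp true -ttl true -url ./export
```

Because the generated field names (`writetime(<column>)`, `ttl(<column>)`) match on both sides, no explicit schema.mapping is needed for this round trip.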

mapping/src/main/antlr4/com/datastax/oss/dsbulk/generated/mapping/Mapping.g4

Lines changed: 50 additions & 21 deletions

@@ -40,9 +40,9 @@ regularMappedEntry
     ;
 
 inferredMappedEntry
-    : '*' ( ':' | '=' ) '*'
-    | '*' ( ':' | '=' ) '-' variable
-    | '*' ( ':' | '=' ) '[' '-' variable ( ',' '-' variable )* ']'
+    : STAR ( ':' | '=' ) STAR
+    | STAR ( ':' | '=' ) '-' variable
+    | STAR ( ':' | '=' ) '[' '-' variable ( ',' '-' variable )* ']'
     ;
 
 indexedEntry
@@ -63,46 +63,69 @@ fieldOrFunction
     | function
     ;
 
-field
-    : UNQUOTED_IDENTIFIER
-    | QUOTED_IDENTIFIER
-    ;
-
 variableOrFunction
     : variable
     | function
     ;
 
+field
+    : identifier
+    ;
+
 variable
+    : identifier
+    ;
+
+keyspaceName
+    : identifier
+    ;
+
+functionName
+    : identifier
+    ;
+
+columnName
+    : identifier
+    ;
+
+identifier
     : UNQUOTED_IDENTIFIER
     | QUOTED_IDENTIFIER
+    // also valid as identifiers:
+    | WRITETIME
+    | TTL
     ;
 
 function
-    : WRITETIME '(' functionArg ')'
-    | qualifiedFunctionName '(' ')'
-    | qualifiedFunctionName '(' functionArgs ')'
+    : writetime
+    | ttl
+    | qualifiedFunctionName '(' functionArgs? ')'
     ;
 
-qualifiedFunctionName
-    : ( keyspaceName '.' )? functionName
+writetime
+    : WRITETIME '(' STAR ')'
+    | WRITETIME '(' columnName ( ',' columnName )* ')'
     ;
 
-keyspaceName
-    : UNQUOTED_IDENTIFIER
-    | QUOTED_IDENTIFIER
+ttl
+    : TTL '(' STAR ')'
+    | TTL '(' columnName ( ',' columnName )* ')'
     ;
 
-functionName
-    : UNQUOTED_IDENTIFIER
-    | QUOTED_IDENTIFIER
+qualifiedFunctionName
+    : ( keyspaceName '.' )? functionName
     ;
 
 functionArgs
     : functionArg ( ',' functionArg )*
    ;
 
 functionArg
+    : columnName
+    | literal
+    ;
+
+literal
     : INTEGER
     | FLOAT
     | BOOLEAN
@@ -111,8 +134,6 @@ functionArg
     | HEXNUMBER
     | STRING_LITERAL
     | ( '-' )? ( K_NAN | K_INFINITY )
-    | QUOTED_IDENTIFIER
-    | UNQUOTED_IDENTIFIER
     ;
 
 // Case-insensitive alpha characters
@@ -181,6 +202,14 @@ WRITETIME
     : W R I T E T I M E
     ;
 
+TTL
+    : T T L
+    ;
+
+STAR
+    : '*'
+    ;
+
 BOOLEAN
     : T R U E | F A L S E
    ;
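With the new `writetime`, `ttl`, and `STAR` rules, `schema.mapping` entries like the following all parse. The field and column names here are hypothetical; only the first example appears in the documentation above:

```
# A column and its timestamp mapped to two distinct fields:
field1 = col1, field1_writetime = writetime(col1)

# The new alternatives accept '*' as the argument:
f1 = writetime(*), f2 = ttl(*)

# ...and multiple column arguments:
f3 = writetime(col1, col2), f4 = ttl(col1, col2)
```

Note that `WRITETIME` and `TTL` became keywords, which is why the `identifier` rule explicitly lists them as still valid plain identifiers.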
