Description
I have a strange issue with titan-lucene. Under a certain condition, titan-lucene returns edges that do not satisfy the LESS_THAN predicate.
The purpose of my little application is to sync an external source (LDAP) with a Titan graph. A full synchronization:
- processes all entries in the LDAP
- "touches" (updates the timestamp of) every vertex and edge it encounters during the synchronization process
- deletes all graph elements whose update timestamp predates the start of the synchronization process (i.e., elements that were not touched)
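The touch-then-sweep cycle above can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the application's code: the class `TouchAndSweep`, its methods, and the `Map` standing in for the `UPDATED_AT` property are hypothetical, and no Titan API is involved. The cut-off is the synchronization start time, and the sweep uses the same strict LESS_THAN comparison as the index query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// HYPOTHETICAL in-memory stand-in for the graph: element id -> UPDATED_AT (ms).
// It only illustrates the touch-then-sweep logic; no Titan API is used.
final class TouchAndSweep {
    private final Map<String, Long> updatedAt = new HashMap<>();

    // "Touch": refresh the element's timestamp (creating it if it is new).
    void touch(String id, long timestamp) {
        updatedAt.put(id, timestamp);
    }

    // One full synchronization. startMillis is captured before the first touch and
    // serves as the cut-off. Touches here reuse startMillis; a real run would stamp
    // elements with the current time, which is never earlier than startMillis.
    // Returns how many elements were swept.
    int fullSync(Set<String> sourceIds, long startMillis) {
        for (String id : sourceIds) {
            touch(id, startMillis);
        }
        int before = updatedAt.size();
        // Sweep with the same predicate as the index query: UPDATED_AT < cut-off.
        updatedAt.values().removeIf(ts -> ts < startMillis);
        return before - updatedAt.size();
    }

    int size() {
        return updatedAt.size();
    }
}
```

With a correctly evaluated LESS_THAN, a repeated run over an unchanged source deletes nothing; only elements that disappeared from the source keep an old timestamp and get swept.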
The following is reproducible with titan-lucene 0.5.0 and 0.5.1. I noticed it during development; the LDAP contents do not change at any point during this procedure.
1. Delete the Lucene index folder.
2. Start a full synchronization. At the end of the process no vertices or edges are deleted, as expected.
3. Kill the Java process with SIGKILL (Eclipse's standard behavior).
4. Start a full synchronization. At the end of the process all edges are deleted, which is not expected.
5. Kill the Java process with SIGKILL.
6. Start a full synchronization. At the end of the process no vertices or edges are deleted, as expected.
Repeating steps 5 and 6 will always yield the expected result. Only the second run after the index has been created from scratch exhibits the faulty behavior.
The index is created as follows:
```java
// tm is the TitanManagement instance used for schema changes (e.g. from graph.getManagementSystem())
PropertyKey propUpdatedAt = tm.makePropertyKey(PropertyKeys.UPDATED_AT.name()).cardinality(Cardinality.SINGLE).dataType(Long.class).make();
tm.buildIndex(PropertyKeys.UPDATEDATIDXMIXEDEDGE.name(), Edge.class).addKey(propUpdatedAt).buildMixedIndex("locallucene");
```
The index is queried as follows:
```java
// cutoffTimestamp is a long: the time at which the synchronization started
query.has(PropertyKeys.UPDATED_AT.name(), Compare.LESS_THAN, cutoffTimestamp).edges();
```
In step 4, the query above returns every edge in the graph.
Switching the predicate to GREATER_THAN does not invert the behavior; it always deletes all edges (that have been touched during synchronization).
I have debugged the code, and in step 4 the returned edges definitely have a modification timestamp greater than the cut-off timestamp.
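Since the returned edges demonstrably violate the predicate, one defensive option is to re-check every index hit against LESS_THAN before deleting it, so a misbehaving index cannot trigger mass deletion. A minimal sketch, assuming a hypothetical helper `LessThanGuard` that is not part of Titan: in the real code the input would be the iterable returned by the query above and the extractor would read each edge's `UPDATED_AT` property; here plain values stand in for edges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// HYPOTHETICAL helper, not a Titan API: re-validates index hits client-side.
final class LessThanGuard {
    // Returns only the hits whose timestamp is strictly less than the cut-off.
    // Hits that fail the check are appended to 'rejected' instead of being returned.
    static <T> List<T> confirmedBefore(Iterable<T> indexHits,
                                       ToLongFunction<T> timestampOf,
                                       long cutoff,
                                       List<T> rejected) {
        List<T> confirmed = new ArrayList<>();
        for (T hit : indexHits) {
            if (timestampOf.applyAsLong(hit) < cutoff) {
                confirmed.add(hit);
            } else {
                // The index returned an element that violates LESS_THAN.
                rejected.add(hit);
            }
        }
        return confirmed;
    }
}
```

If `rejected` is non-empty, the synchronization could skip the sweep and log the mismatch rather than delete anything, which also makes the index inconsistency visible.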
My unit tests, which insert vertices and edges with predefined modification timestamps and then test the deletion of outdated elements, all pass. At the end of each test the index and DB directories are wiped, so every test run is equivalent to step 2.
The DB configuration is:
```properties
storage.backend=berkeleyje
storage.directory=/tmp/berkeleydb
storage.transactions=true
schema.default=none
index.locallucene.backend=lucene
index.locallucene.directory=/tmp/searchindex
```