Skip to content

Clear the functests search index when clearing the DB #9638

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 25, 2025

Conversation

seanh
Copy link
Member

@seanh seanh commented Jun 19, 2025

For speed h does not normally clear the DB between functests unless the
test has @pytest.mark.parametrize("with_clean_db"), see:
#6845

At the time when this change was made the search index was still being
cleared before each functest which makes no sense: the search API will
not return an annotation if it's not in the search index, even if the
annotation is in the DB. If you aren't going to clear the DB before each
functest, you shouldn't clear the search index either.

A later PR accidentally broke the
always_delete_all_elasticsearch_documents fixture so that the search
index wasn't actually being cleared between functests anyway, see:
#9636

This commit:

  1. Removes the broken always_delete_all_elasticsearch_documents
    fixture: it doesn't do anything, and since we don't clear the DB
    between tests it doesn't make sense to clear the search index either.
  2. Replaces the with_clean_db fixture with
    with_clean_db_and_search_index: it doesn't make sense to clear the
    DB without also clearing the search index.

Longer-term I think we probably want to find a fast way to clear both
the DB and search index before each functional test.

@seanh seanh requested a review from robertknight June 19, 2025 10:39
@seanh seanh force-pushed the clean-search-index-when-cleaning-db branch 2 times, most recently from 558895f to 52667ae Compare June 19, 2025 15:21
@seanh seanh changed the base branch from main to remove-common-fixtures-dir June 19, 2025 15:23
Comment on lines -61 to -73
# Push all changes to segments to make sure all annotations that were added get removed.
elasticsearch_dsl.Index(client.index, using=client.conn).refresh()

client.conn.delete_by_query(
index=client.index,
body={"query": {"match_all": {}}},
# This query occasionally fails with a version conflict.
# Forcing the deletion resolves the issue, but the exact
# cause of the version conflict has not been found yet.
conflicts="proceed",
# Add refresh to make deletion changes show up in search results.
refresh=True,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The global es_client fixture (common to both unittests and functests) no longer clears the Elasticsearch index after each test. This has been moved into a unittests-only fixture.

@@ -51,7 +52,18 @@ def reset_app(app):


@pytest.fixture
def with_clean_db(db_engine):
def with_clean_db_and_search_index(db_engine, clear_search_index):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The with_clean_db() functests fixture has been replaced with a with_clean_db_and_search_index() fixture: it doesn't make sense for a test to request a clean DB but not a clean search index. You can only have neither or both, not one without the other.

Comment on lines -89 to -93
# Always unconditionally wipe the Elasticsearch index after every functional
# test.
@pytest.fixture(autouse=True)
def always_delete_all_elasticsearch_documents():
pass
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in another PR (#9636, which I'm not planning to merge) this fixture is broken and doesn't do anything. The functionality that this fixture was supposed to provide has been moved into the with_clean_db_and_search_index fixture and is executed only for tests that use that fixure, not for every test.

Comment on lines +70 to +73
@pytest.fixture
def es_client(es_client, clear_search_index):
yield es_client
clear_search_index()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This preserves the previous behaviour of the unit tests: they always clear the search index after every test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clears the search index after the test runs. My understanding of yield fixtures is that the teardown code is still run if the test fails, but there is potential that it doesn't get run if the test process were aborted for some reason. I suspect this is rare enough not to be a problem, but I'll note it as a minor hazard.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. This is preserving existing behaviour, but I agree it seems more robust to clear the search index before a test rather than after. For some reason cleaning up after a test seems to be the common and recommended approach. This seems to be a harmless change to make so I've sent a follow-up PR for it: #9653

@seanh seanh force-pushed the clean-search-index-when-cleaning-db branch from 52667ae to 886a2bc Compare June 19, 2025 15:34
@seanh seanh marked this pull request as ready for review June 19, 2025 15:40
@robertknight
Copy link
Member

At the time when this change was made the search index was still being
cleared before each functest which makes no sense: the search API will
not return an annotation if it's not in the search index, even if the
annotation is in the DB.

If we're merely not cleaning the DB as an optimization, then some tests might be OK with having "orphaned" pre-existing entries which will never be found by search. Still, this also creates potentially surprising results. I agree it would be best to find a fast way of clearing both.

Comment on lines +70 to +73
@pytest.fixture
def es_client(es_client, clear_search_index):
yield es_client
clear_search_index()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clears the search index after the test runs. My understanding of yield fixtures is that the teardown code is still run if the test fails, but there is potential that it doesn't get run if the test process were aborted for some reason. I suspect this is rare enough not to be a problem, but I'll note it as a minor hazard.

seanh added a commit that referenced this pull request Jun 24, 2025
Clear the search index *before* each unittest rather than after, as this
seems more robust. See:

#9638 (comment)
seanh added a commit that referenced this pull request Jun 24, 2025
Clear the search index *before* each unittest rather than after, as this
seems more robust. See:

#9638 (comment)
@seanh
Copy link
Member Author

seanh commented Jun 24, 2025

I agree it would be best to find a fast way of clearing both.

IIRC the problem is that the unittests use a DB session that's specially wrapped in a transaction that's rolled back after each test:

h/tests/conftest.py

Lines 18 to 43 in 1615569

@pytest.fixture
def db_session(db_engine, db_sessionfactory):
"""
Return the SQLAlchemy database session.
This returns a session that is wrapped in an external transaction that is
rolled back after each test, so tests can't make database changes that
affect later tests. Even if the test (or the code under test) calls
session.commit() this won't touch the external transaction.
Recipe adapted from:
https://docs.sqlalchemy.org/en/20/orm/session_transaction.html#joining-a-session-into-an-external-transaction-such-as-for-test-suites
"""
connection = db_engine.connect()
transaction = connection.begin()
session = db_sessionfactory(
bind=connection, join_transaction_mode="create_savepoint"
)
try:
yield session
finally:
session.close()
transaction.rollback()
connection.close()
But in the functests there are two separate DB sessions: the one used by the tests themselves (to create test data, or do DB queries for test assertions) and then the one that the app creates for itself. This means that the tests need to actually commit any test data to the DB and this needs to be actually committed (not held in some transaction) because the data needs to be seen by the app which is using an entirely separate session. Similarly the app needs to actually commit any data changes it makes so that test assertions will see it. And wiping actually committed stuff from the DB after each test is much slower than rolling back a transaction that was never really committed. Since functests never call the app directly (only via WebTest) there's no way for functests to pass a shared and specially constructed DB session into the app like the unittests do. The test code and application code are actually running in a single shared process. But WebTest didn't appear to provide any way for functests to pass an arbitrary Python object like a DB session through to the app.

@seanh seanh force-pushed the remove-common-fixtures-dir branch from b05436b to 3c534f7 Compare June 24, 2025 17:03
@seanh seanh force-pushed the clean-search-index-when-cleaning-db branch from 886a2bc to 3a3bd05 Compare June 24, 2025 17:07
@seanh
Copy link
Member Author

seanh commented Jun 24, 2025

Rebased

Base automatically changed from remove-common-fixtures-dir to main June 25, 2025 11:03
For speed h does not normally clear the DB between functests unless the
test has `@pytest.mark.parametrize("with_clean_db")`, see:
#6845

At the time when this change was made the _search index_ was still being
cleared before each functest which makes no sense: the search API will
not return an annotation if it's not in the search index, even if the
annotation is in the DB. If you aren't going to clear the DB before each
functest, you shouldn't clear the search index either.

A later PR accidentally broke the
`always_delete_all_elasticsearch_documents` fixture so that the search
index wasn't actually being cleared between functests anyway, see:
#9636

This commit:

1. Removes the broken `always_delete_all_elasticsearch_documents`
   fixture: it doesn't do anything, and since we don't clear the DB
   between tests it doesn't make sense to clear the search index either.
2. Replaces the `with_clean_db` fixture with
   `with_clean_db_and_search_index`: it doesn't make sense to clear the
   DB without also clearing the search index.

Longer-term I think we probably want to find a fast way to clear both
the DB and search index before each functional test.
@seanh seanh force-pushed the clean-search-index-when-cleaning-db branch from 3a3bd05 to bc4bc83 Compare June 25, 2025 11:05
@seanh seanh merged commit d10c60d into main Jun 25, 2025
11 checks passed
@seanh seanh deleted the clean-search-index-when-cleaning-db branch June 25, 2025 11:09
seanh added a commit that referenced this pull request Jun 25, 2025
Clear the search index *before* each unittest rather than after, as this
seems more robust. See:

#9638 (comment)
seanh added a commit that referenced this pull request Jun 25, 2025
Clear the search index *before* each unittest rather than after, as this
seems more robust. See:

#9638 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants