Clear the search index between functional tests #9636

seanh · 2025-06-18T16:48:16Z

This always_delete_all_elasticsearch_documents fixture that's executed
before every functional test doesn't do anything:

h/tests/functional/conftest.py

Lines 95 to 99 in 2aab326

    
           # Always unconditionally wipe the Elasticsearch index after every functional 
        
           # test. 
        
           @pytest.fixture(autouse=True) 
        
           def always_delete_all_elasticsearch_documents(): 
        
               pass

When this fixture was first added (in
#5002, back in 2018) the fixture
depended on another fixture named delete_all_elasticsearch_documents.
By having both autouse=True and depending on the
delete_all_elasticsearch_documents fixture, the
always_delete_all_elasticsearch_documents fixture ensured that the
delete_all_elasticsearch_documents fixture was run before every
functest.

At some point the delete_all_elasticsearch_documents fixture
dependency got removed, meaning the
always_delete_all_elasticsearch_documents fixture no longer does
anything, and the Elasticsearch index is no longer cleared between
functional tests.

What seems to have happened is:

A few months later (still in 2018) the
delete_all_elasticsearch_documents fixture was removed.
always_delete_all_elasticsearch_documents was changed to depend on
es6_client instead: e4c1460

The es6_client fixture had a builtin behaviour that it tears down
the search index after each test, so
delete_all_elasticsearch_documents was no longer needed. This meant
that it was now unclear why the
always_delete_all_elasticsearch_documents fixture depended on
es6_client. There was no comment explaining the apparently unused
dependency. The previous dependency name
delete_all_elasticsearch_documents had made the reason for the
dependency clear.
The es6_client fixture was later renamed to es_client.
A few years later, in 2021, a large PR fixing lots of pylint
unused-argument warnings removed the es_client fixture dependency:
Fix pylint unused-argument on the test code #6953

So AFAICT the Elasticsearch index has not been cleared between functests
since August 2021.

As long as the database is being cleared, if an annotation from a
previous test still remains in Elasticsearch the search API won't return
it if it's not in the DB, so I think mosts tests should be unaffected.

This is causing a new functest that I have locally to intermittently
fail if I run the entire test class, but reliably succeed if I run just
the one test. I'm not sure exactly why, the test in question is calling
the search API and the annotation being missing from the DB should mean
that the test is unaffected by stray annotations in the search index.
Could be to do with predictable randomness producing annotations with
the same IDs in subsequent test runs, not sure. Adding back the
es_client fixture dependency seems to fix it.

It's often said that you should be careful adding code comments and
instead let the code speak for itself (use good names and clear code) as
the code can change but the comment doesn't get updated and becomes
misleading. In this case the naming of the original test fixtures made
the intent clear and no comment was included. The test fixture name was
then changed in a way that obscured the intent of the code. And then
years later a mistake was made. A code comment might have prevented the
mistake (though possibly not, with the giant "fix all unused arguments"
PR).

This `always_delete_all_elasticsearch_documents` fixture that's executed before every functional test doesn't do anything: https://github.com/hypothesis/h/blob/2aab32695aaba9069dea487a0728491b061e854c/tests/functional/conftest.py#L95-L99 When this fixture was first added (in #5002, back in 2018) the fixture depended on another fixture named `delete_all_elasticsearch_documents`. By having both `autouse=True` and depending on the `delete_all_elasticsearch_documents` fixture, the `always_delete_all_elasticsearch_documents` fixture ensured that the `delete_all_elasticsearch_documents` fixture was run before every functest. At some point the `delete_all_elasticsearch_documents` fixture dependency got removed, meaning the `always_delete_all_elasticsearch_documents` fixture no longer does anything, and the Elasticsearch index is no longer cleared between functional tests. What seems to have happened is: 1. A few months later (still in 2018) the `delete_all_elasticsearch_documents` fixture was removed. `always_delete_all_elasticsearch_documents` was changed to depend on `es6_client` instead: e4c1460 The `es6_client` fixture had a builtin behaviour that it teardowns the search index after each test, so `delete_all_elasticsearch_documents` was no longer needed. This meant that it was not unclear why the `always_delete_all_elasticsearch_documents` fixture depended on `es6_client`. There was no comment explaining the apparently unused dependency. The previous dependency name `delete_all_elasticsearch_documents` had made the reason for the dependency clear. 2. The `es6_client` fixture was later renamed to `es_client`. 3. A few years later, in 2021, a large PR fixing lots of pylint unused-argument warnings removed the `es_client` fixture dependency: #6953 So AFAICT the Elasticsearch index has not been cleared between functests since August 2021. As long as the database is being cleared, if an annotation from a previous test still remains in Elasticsearch the search API won't return it if it's not in the DB, so I think mosts tests should be unaffected. This is causing a new functest that I have locally to intermittently fail if I run the entire test class, but reliably succeed if I run just the one test. I'm not sure exactly why, the test in question is calling the search API and the annotation being missing from the DB should mean that the test is unaffected by stray annotations in the search index. Could be to do with predictable randomness producing annotations with the same IDs in subsequent test runs, not sure. Adding back the `es_client` fixture dependency seems to fix it. It's often said that you should be careful adding code comments and instead let the code speak for itself (use good names and clear code) as the code can change but the comment doesn't get updated and becomes misleading. In this case the naming of the original test fixtures made the intent clear and no comment was included. The test fixture name was then changed in a way that obscured the intent of the code. And then years later a mistake was made. A code comment might have prevented the mistake (though possibly not, with the giant "fix all unused arguments" PR).

seanh · 2025-06-18T16:51:42Z

This is where the es_client fixture clears the search index after each test:

h/tests/common/fixtures/elasticsearch.py

Lines 23 to 32 in 2aab326

    
           client.conn.delete_by_query( 
        
               index=client.index, 
        
               body={"query": {"match_all": {}}}, 
        
               # This query occasionally fails with a version conflict. 
        
               # Forcing the deletion resolves the issue, but the exact 
        
               # cause of the version conflict has not been found yet. 
        
               conflicts="proceed", 
        
               # Add refresh to make deletion changes show up in search results. 
        
               refresh=True, 
        
           )

seanh · 2025-06-18T16:59:15Z

Part of what seems to have been going on in the discussion between Jon and Marcos in #6953 is the idea that if a fixture can be removed from a test and the test still passes then the fixture must have been unnecessary. But as can be seen from this example, that's not true: you'd have to also verify that the test still fails if the code is wrong. A fixture dependency could be what makes a test fail when it should fail, and that's not a rare case, it's common.

robertknight · 2025-06-19T08:38:12Z

But as can be seen from this example, that's not true: you'd have to also verify that the test still fails if the code is wrong. A fixture dependency could be what makes a test fail when it should fail, and that's not a rare case, it's common.

Right, but this does mean that it is difficult to distinguish necessary fixtures from fixtures that were accidentally included or needed in the past but no longer. Mechanisms to make it easier to reliably identify unused fixtures would be a plus.

For speed h does not normally clear the DB between functests unless the test has `@pytest.mark.parametrize("with_clean_db")`, see: #6845 At the time when this change was made the _search index_ was still being cleared before each functest which makes no sense: the search API will not return an annotation if it's not in the search index, even if the annotation is in the DB. If you aren't going to clear the DB before each functest, you shouldn't clear the search index either. A later PR accidentally broke the `always_delete_all_elasticsearch_documents` fixture so that the search index wasn't actually being cleared between functests anyway, see: #9636 This commit: 1. Removes the broken `always_delete_all_elasticsearch_documents` fixture: it doesn't do anything, and since we don't clear the DB between tests it doesn't make sense to clear the search index either. 2. Replaces the `with_clean_db` fixture with `with_clean_db_and_search_index`: it doesn't make sense to clear the DB without also clearing the search index. Longer-term I think we probably want to find a fast way to clear both the DB and search index before each functional test.

seanh · 2025-06-19T15:41:14Z

Closing in favour of #9638

For speed h does not normally clear the DB between functests unless the test has `@pytest.mark.parametrize("with_clean_db")`, see: #6845 At the time when this change was made the _search index_ was still being cleared before each functest which makes no sense: the search API will not return an annotation if it's not in the search index, even if the annotation is in the DB. If you aren't going to clear the DB before each functest, you shouldn't clear the search index either. A later PR accidentally broke the `always_delete_all_elasticsearch_documents` fixture so that the search index wasn't actually being cleared between functests anyway, see: #9636 This commit: 1. Removes the broken `always_delete_all_elasticsearch_documents` fixture: it doesn't do anything, and since we don't clear the DB between tests it doesn't make sense to clear the search index either. 2. Replaces the `with_clean_db` fixture with `with_clean_db_and_search_index`: it doesn't make sense to clear the DB without also clearing the search index. Longer-term I think we probably want to find a fast way to clear both the DB and search index before each functional test.

seanh requested a review from robertknight June 18, 2025 16:48

robertknight approved these changes Jun 19, 2025

View reviewed changes

seanh mentioned this pull request Jun 19, 2025

Clear the functests search index when clearing the DB #9638

Merged

seanh closed this Jun 19, 2025

seanh deleted the clear-search-index-between-tests branch June 19, 2025 15:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Clear the search index between functional tests #9636

Clear the search index between functional tests #9636

Uh oh!

seanh commented Jun 18, 2025 •

edited

Loading

Uh oh!

seanh commented Jun 18, 2025

Uh oh!

seanh commented Jun 18, 2025 •

edited

Loading

Uh oh!

robertknight commented Jun 19, 2025

Uh oh!

seanh commented Jun 19, 2025

Uh oh!

Uh oh!

	# Always unconditionally wipe the Elasticsearch index after every functional
	# test.
	@pytest.fixture(autouse=True)
	def always_delete_all_elasticsearch_documents():
	pass

Clear the search index between functional tests #9636

Clear the search index between functional tests #9636

Uh oh!

Conversation

seanh commented Jun 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

seanh commented Jun 18, 2025

Uh oh!

seanh commented Jun 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

robertknight commented Jun 19, 2025

Uh oh!

seanh commented Jun 19, 2025

Uh oh!

Uh oh!

seanh commented Jun 18, 2025 •

edited

Loading

seanh commented Jun 18, 2025 •

edited

Loading