Skip to content

Clear the search index between functional tests #9636

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

seanh
Copy link
Member

@seanh seanh commented Jun 18, 2025

This always_delete_all_elasticsearch_documents fixture that's executed
before every functional test doesn't do anything:

# Always unconditionally wipe the Elasticsearch index after every functional
# test.
@pytest.fixture(autouse=True)
def always_delete_all_elasticsearch_documents():
pass

When this fixture was first added (in
#5002, back in 2018) the fixture
depended on another fixture named delete_all_elasticsearch_documents.
By having both autouse=True and depending on the
delete_all_elasticsearch_documents fixture, the
always_delete_all_elasticsearch_documents fixture ensured that the
delete_all_elasticsearch_documents fixture was run before every
functest.

At some point the delete_all_elasticsearch_documents fixture
dependency got removed, meaning the
always_delete_all_elasticsearch_documents fixture no longer does
anything, and the Elasticsearch index is no longer cleared between
functional tests.

What seems to have happened is:

  1. A few months later (still in 2018) the
    delete_all_elasticsearch_documents fixture was removed.
    always_delete_all_elasticsearch_documents was changed to depend on
    es6_client instead: e4c1460

    The es6_client fixture had a builtin behaviour that it tears down
    the search index after each test, so
    delete_all_elasticsearch_documents was no longer needed. This meant
    that it was now unclear why the
    always_delete_all_elasticsearch_documents fixture depended on
    es6_client. There was no comment explaining the apparently unused
    dependency. The previous dependency name
    delete_all_elasticsearch_documents had made the reason for the
    dependency clear.

  2. The es6_client fixture was later renamed to es_client.

  3. A few years later, in 2021, a large PR fixing lots of pylint
    unused-argument warnings removed the es_client fixture dependency:
    Fix pylint unused-argument on the test code #6953

So AFAICT the Elasticsearch index has not been cleared between functests
since August 2021.

As long as the database is being cleared, if an annotation from a
previous test still remains in Elasticsearch the search API won't return
it if it's not in the DB, so I think mosts tests should be unaffected.

This is causing a new functest that I have locally to intermittently
fail if I run the entire test class, but reliably succeed if I run just
the one test. I'm not sure exactly why, the test in question is calling
the search API and the annotation being missing from the DB should mean
that the test is unaffected by stray annotations in the search index.
Could be to do with predictable randomness producing annotations with
the same IDs in subsequent test runs, not sure. Adding back the
es_client fixture dependency seems to fix it.

It's often said that you should be careful adding code comments and
instead let the code speak for itself (use good names and clear code) as
the code can change but the comment doesn't get updated and becomes
misleading. In this case the naming of the original test fixtures made
the intent clear and no comment was included. The test fixture name was
then changed in a way that obscured the intent of the code. And then
years later a mistake was made. A code comment might have prevented the
mistake (though possibly not, with the giant "fix all unused arguments"
PR).

This `always_delete_all_elasticsearch_documents` fixture that's executed
before every functional test doesn't do anything:

https://github.com/hypothesis/h/blob/2aab32695aaba9069dea487a0728491b061e854c/tests/functional/conftest.py#L95-L99

When this fixture was first added (in
#5002, back in 2018) the fixture
depended on another fixture named `delete_all_elasticsearch_documents`.
By having both `autouse=True` and depending on the
`delete_all_elasticsearch_documents` fixture, the
`always_delete_all_elasticsearch_documents` fixture ensured that the
`delete_all_elasticsearch_documents` fixture was run before every
functest.

At some point the `delete_all_elasticsearch_documents` fixture
dependency got removed, meaning the
`always_delete_all_elasticsearch_documents` fixture no longer does
anything, and the Elasticsearch index is no longer cleared between
functional tests.

What seems to have happened is:

1. A few months later (still in 2018) the
   `delete_all_elasticsearch_documents` fixture was removed.
   `always_delete_all_elasticsearch_documents` was changed to depend on
   `es6_client` instead: e4c1460

   The `es6_client` fixture had a builtin behaviour that it teardowns
   the search index after each test, so
   `delete_all_elasticsearch_documents` was no longer needed. This meant
   that it was not unclear why the
   `always_delete_all_elasticsearch_documents` fixture depended on
   `es6_client`. There was no comment explaining the apparently unused
   dependency. The previous dependency name
   `delete_all_elasticsearch_documents` had made the reason for the
   dependency clear.

2. The `es6_client` fixture was later renamed to `es_client`.

3. A few years later, in 2021, a large PR fixing lots of pylint
   unused-argument warnings removed the `es_client` fixture dependency:
   #6953

So AFAICT the Elasticsearch index has not been cleared between functests
since August 2021.

As long as the database is being cleared, if an annotation from a
previous test still remains in Elasticsearch the search API won't return
it if it's not in the DB, so I think mosts tests should be unaffected.

This is causing a new functest that I have locally to intermittently
fail if I run the entire test class, but reliably succeed if I run just
the one test. I'm not sure exactly why, the test in question is calling
the search API and the annotation being missing from the DB should mean
that the test is unaffected by stray annotations in the search index.
Could be to do with predictable randomness producing annotations with
the same IDs in subsequent test runs, not sure. Adding back the
`es_client` fixture dependency seems to fix it.

It's often said that you should be careful adding code comments and
instead let the code speak for itself (use good names and clear code) as
the code can change but the comment doesn't get updated and becomes
misleading. In this case the naming of the original test fixtures made
the intent clear and no comment was included. The test fixture name was
then changed in a way that obscured the intent of the code. And then
years later a mistake was made. A code comment might have prevented the
mistake (though possibly not, with the giant "fix all unused arguments"
PR).
@seanh seanh requested a review from robertknight June 18, 2025 16:48
@seanh
Copy link
Member Author

seanh commented Jun 18, 2025

This is where the es_client fixture clears the search index after each test:

client.conn.delete_by_query(
index=client.index,
body={"query": {"match_all": {}}},
# This query occasionally fails with a version conflict.
# Forcing the deletion resolves the issue, but the exact
# cause of the version conflict has not been found yet.
conflicts="proceed",
# Add refresh to make deletion changes show up in search results.
refresh=True,
)

@seanh
Copy link
Member Author

seanh commented Jun 18, 2025

Part of what seems to have been going on in the discussion between Jon and Marcos in #6953 is the idea that if a fixture can be removed from a test and the test still passes then the fixture must have been unnecessary. But as can be seen from this example, that's not true: you'd have to also verify that the test still fails if the code is wrong. A fixture dependency could be what makes a test fail when it should fail, and that's not a rare case, it's common.

@robertknight
Copy link
Member

But as can be seen from this example, that's not true: you'd have to also verify that the test still fails if the code is wrong. A fixture dependency could be what makes a test fail when it should fail, and that's not a rare case, it's common.

Right, but this does mean that it is difficult to distinguish necessary fixtures from fixtures that were accidentally included or needed in the past but no longer. Mechanisms to make it easier to reliably identify unused fixtures would be a plus.

seanh added a commit that referenced this pull request Jun 19, 2025
For speed h does not normally clear the DB between functests unless the
test has `@pytest.mark.parametrize("with_clean_db")`, see:
#6845

At the time when this change was made the _search index_ was still being
cleared before each functest which makes no sense: the search API will
not return an annotation if it's not in the search index, even if the
annotation is in the DB. If you aren't going to clear the DB before each
functest, you shouldn't clear the search index either.

A later PR accidentally broke the
`always_delete_all_elasticsearch_documents` fixture so that the search
index wasn't actually being cleared between functests anyway, see:
#9636

This commit:

1. Removes the broken `always_delete_all_elasticsearch_documents`
   fixture: it doesn't do anything, and since we don't clear the DB
   between tests it doesn't make sense to clear the search index either.
2. Replaces the `with_clean_db` fixture with
   `with_clean_db_and_search_index`: it doesn't make sense to clear the
   DB without also clearing the search index.

Longer-term I think we probably want to find a fast way to clear both
the DB and search index before each functional test.
seanh added a commit that referenced this pull request Jun 19, 2025
For speed h does not normally clear the DB between functests unless the
test has `@pytest.mark.parametrize("with_clean_db")`, see:
#6845

At the time when this change was made the _search index_ was still being
cleared before each functest which makes no sense: the search API will
not return an annotation if it's not in the search index, even if the
annotation is in the DB. If you aren't going to clear the DB before each
functest, you shouldn't clear the search index either.

A later PR accidentally broke the
`always_delete_all_elasticsearch_documents` fixture so that the search
index wasn't actually being cleared between functests anyway, see:
#9636

This commit:

1. Removes the broken `always_delete_all_elasticsearch_documents`
   fixture: it doesn't do anything, and since we don't clear the DB
   between tests it doesn't make sense to clear the search index either.
2. Replaces the `with_clean_db` fixture with
   `with_clean_db_and_search_index`: it doesn't make sense to clear the
   DB without also clearing the search index.

Longer-term I think we probably want to find a fast way to clear both
the DB and search index before each functional test.
seanh added a commit that referenced this pull request Jun 19, 2025
For speed h does not normally clear the DB between functests unless the
test has `@pytest.mark.parametrize("with_clean_db")`, see:
#6845

At the time when this change was made the _search index_ was still being
cleared before each functest which makes no sense: the search API will
not return an annotation if it's not in the search index, even if the
annotation is in the DB. If you aren't going to clear the DB before each
functest, you shouldn't clear the search index either.

A later PR accidentally broke the
`always_delete_all_elasticsearch_documents` fixture so that the search
index wasn't actually being cleared between functests anyway, see:
#9636

This commit:

1. Removes the broken `always_delete_all_elasticsearch_documents`
   fixture: it doesn't do anything, and since we don't clear the DB
   between tests it doesn't make sense to clear the search index either.
2. Replaces the `with_clean_db` fixture with
   `with_clean_db_and_search_index`: it doesn't make sense to clear the
   DB without also clearing the search index.

Longer-term I think we probably want to find a fast way to clear both
the DB and search index before each functional test.
seanh added a commit that referenced this pull request Jun 19, 2025
For speed h does not normally clear the DB between functests unless the
test has `@pytest.mark.parametrize("with_clean_db")`, see:
#6845

At the time when this change was made the _search index_ was still being
cleared before each functest which makes no sense: the search API will
not return an annotation if it's not in the search index, even if the
annotation is in the DB. If you aren't going to clear the DB before each
functest, you shouldn't clear the search index either.

A later PR accidentally broke the
`always_delete_all_elasticsearch_documents` fixture so that the search
index wasn't actually being cleared between functests anyway, see:
#9636

This commit:

1. Removes the broken `always_delete_all_elasticsearch_documents`
   fixture: it doesn't do anything, and since we don't clear the DB
   between tests it doesn't make sense to clear the search index either.
2. Replaces the `with_clean_db` fixture with
   `with_clean_db_and_search_index`: it doesn't make sense to clear the
   DB without also clearing the search index.

Longer-term I think we probably want to find a fast way to clear both
the DB and search index before each functional test.
@seanh
Copy link
Member Author

seanh commented Jun 19, 2025

Closing in favour of #9638

@seanh seanh closed this Jun 19, 2025
@seanh seanh deleted the clear-search-index-between-tests branch June 19, 2025 15:41
seanh added a commit that referenced this pull request Jun 19, 2025
For speed h does not normally clear the DB between functests unless the
test has `@pytest.mark.parametrize("with_clean_db")`, see:
#6845

At the time when this change was made the _search index_ was still being
cleared before each functest which makes no sense: the search API will
not return an annotation if it's not in the search index, even if the
annotation is in the DB. If you aren't going to clear the DB before each
functest, you shouldn't clear the search index either.

A later PR accidentally broke the
`always_delete_all_elasticsearch_documents` fixture so that the search
index wasn't actually being cleared between functests anyway, see:
#9636

This commit:

1. Removes the broken `always_delete_all_elasticsearch_documents`
   fixture: it doesn't do anything, and since we don't clear the DB
   between tests it doesn't make sense to clear the search index either.
2. Replaces the `with_clean_db` fixture with
   `with_clean_db_and_search_index`: it doesn't make sense to clear the
   DB without also clearing the search index.

Longer-term I think we probably want to find a fast way to clear both
the DB and search index before each functional test.
seanh added a commit that referenced this pull request Jun 24, 2025
For speed h does not normally clear the DB between functests unless the
test has `@pytest.mark.parametrize("with_clean_db")`, see:
#6845

At the time when this change was made the _search index_ was still being
cleared before each functest which makes no sense: the search API will
not return an annotation if it's not in the search index, even if the
annotation is in the DB. If you aren't going to clear the DB before each
functest, you shouldn't clear the search index either.

A later PR accidentally broke the
`always_delete_all_elasticsearch_documents` fixture so that the search
index wasn't actually being cleared between functests anyway, see:
#9636

This commit:

1. Removes the broken `always_delete_all_elasticsearch_documents`
   fixture: it doesn't do anything, and since we don't clear the DB
   between tests it doesn't make sense to clear the search index either.
2. Replaces the `with_clean_db` fixture with
   `with_clean_db_and_search_index`: it doesn't make sense to clear the
   DB without also clearing the search index.

Longer-term I think we probably want to find a fast way to clear both
the DB and search index before each functional test.
seanh added a commit that referenced this pull request Jun 24, 2025
For speed h does not normally clear the DB between functests unless the
test has `@pytest.mark.parametrize("with_clean_db")`, see:
#6845

At the time when this change was made the _search index_ was still being
cleared before each functest which makes no sense: the search API will
not return an annotation if it's not in the search index, even if the
annotation is in the DB. If you aren't going to clear the DB before each
functest, you shouldn't clear the search index either.

A later PR accidentally broke the
`always_delete_all_elasticsearch_documents` fixture so that the search
index wasn't actually being cleared between functests anyway, see:
#9636

This commit:

1. Removes the broken `always_delete_all_elasticsearch_documents`
   fixture: it doesn't do anything, and since we don't clear the DB
   between tests it doesn't make sense to clear the search index either.
2. Replaces the `with_clean_db` fixture with
   `with_clean_db_and_search_index`: it doesn't make sense to clear the
   DB without also clearing the search index.

Longer-term I think we probably want to find a fast way to clear both
the DB and search index before each functional test.
seanh added a commit that referenced this pull request Jun 25, 2025
For speed h does not normally clear the DB between functests unless the
test has `@pytest.mark.parametrize("with_clean_db")`, see:
#6845

At the time when this change was made the _search index_ was still being
cleared before each functest which makes no sense: the search API will
not return an annotation if it's not in the search index, even if the
annotation is in the DB. If you aren't going to clear the DB before each
functest, you shouldn't clear the search index either.

A later PR accidentally broke the
`always_delete_all_elasticsearch_documents` fixture so that the search
index wasn't actually being cleared between functests anyway, see:
#9636

This commit:

1. Removes the broken `always_delete_all_elasticsearch_documents`
   fixture: it doesn't do anything, and since we don't clear the DB
   between tests it doesn't make sense to clear the search index either.
2. Replaces the `with_clean_db` fixture with
   `with_clean_db_and_search_index`: it doesn't make sense to clear the
   DB without also clearing the search index.

Longer-term I think we probably want to find a fast way to clear both
the DB and search index before each functional test.
seanh added a commit that referenced this pull request Jun 25, 2025
For speed h does not normally clear the DB between functests unless the
test has `@pytest.mark.parametrize("with_clean_db")`, see:
#6845

At the time when this change was made the _search index_ was still being
cleared before each functest which makes no sense: the search API will
not return an annotation if it's not in the search index, even if the
annotation is in the DB. If you aren't going to clear the DB before each
functest, you shouldn't clear the search index either.

A later PR accidentally broke the
`always_delete_all_elasticsearch_documents` fixture so that the search
index wasn't actually being cleared between functests anyway, see:
#9636

This commit:

1. Removes the broken `always_delete_all_elasticsearch_documents`
   fixture: it doesn't do anything, and since we don't clear the DB
   between tests it doesn't make sense to clear the search index either.
2. Replaces the `with_clean_db` fixture with
   `with_clean_db_and_search_index`: it doesn't make sense to clear the
   DB without also clearing the search index.

Longer-term I think we probably want to find a fast way to clear both
the DB and search index before each functional test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants