Possible exit-time cache/config teardown bug reported by automated tooling #682

@Carmel0

Summary

An automated testing workflow reported a possible bug in Aspell related to
exit-time cache/config teardown.

This is not yet a confirmed root-cause analysis. The current status is:

  • automated tooling first reported suspicious behavior
  • the issue can also be reproduced on the plain aspell program under Helgrind
  • the exact root cause is still a hypothesis and should be confirmed by maintainers

At the moment, this should be treated as an unconfirmed suspected bug report
with reproducible symptoms.

How it was first detected

The issue was first noticed by automated tooling during a fuzzing / sanitizer-based
workflow.

Observed automated signals included:

  • a DMSAN-based run reporting teardown-time failure
  • Helgrind later reporting invalid mutex usage
  • the suspicious behavior appearing during program exit / teardown rather than in
    the main parsing path

Because sanitizer instrumentation can change how a bug manifests, the report
was then rechecked on a plain build with no sanitizer or AFL instrumentation.

Plain-program reproduction

The plain Aspell binary itself can reproduce suspicious teardown behavior under
Helgrind.

A simple path such as this did not reproduce the issue:

printf "" | valgrind --tool=helgrind --error-exitcode=99 ./aspell list

But this path did reproduce it in my local testing:

printf "# hello\n\nhelllo\n" | \
  valgrind --tool=helgrind --error-exitcode=99 \
  ./aspell --mode=markdown list

What the tool reports

On the plain program, Helgrind reports invalid mutex operations during exit-time
teardown, for example:

Thread #1's call to pthread_mutex_lock failed
with error code 22 (EINVAL: Invalid argument)
...
by acommon::GlobalCacheBase::release(acommon::Cacheable*) (cache.cpp:39)
...
by ~ModeNotifierImpl (new_fmode.cpp:128)
by acommon::Config::del() (config.cpp:184)
by acommon::Config::~Config() (config.cpp:100)
by __run_exit_handlers

It also reports:

Thread #1's call to pthread_mutex_unlock failed
with error code 22 (EINVAL: Invalid argument)

In another automated reproduction path, Helgrind also reported invalid mutex use
from:

acommon::GlobalCacheBase::~GlobalCacheBase()

So the currently known symptom is:

  • invalid mutex lock/unlock during exit-time teardown
  • involving cache/config-related teardown paths
  • reproducible at least on one plain-program path under Helgrind

Current interpretation

This may indicate a real bug in cache/config teardown ordering or object
lifetime management during process exit.

A likely direction is:

  • some cache/config-related object is still performing lock/unlock operations
    during teardown
  • but the underlying mutex or related state is already no longer valid

However, this is only a working hypothesis, not yet a confirmed root cause.
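
To make the hypothesized ordering problem concrete, here is a minimal sketch that does not use any Aspell code. `FakeLock`, `FakeCacheUser`, and both functions are hypothetical names invented for illustration. It relies only on the C++ rule that objects are destroyed in reverse order of construction, and it uses block-scoped objects and a plain flag so that the ordering can be observed without actually touching a destroyed mutex. Whether Aspell's globals are constructed in an order like this is exactly the part that still needs confirmation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Aspell types.
static std::vector<std::string> events;
static bool lock_alive = false;  // plain flag, still readable after FakeLock is gone

struct FakeLock {
  FakeLock()  { lock_alive = true; }
  ~FakeLock() { lock_alive = false; events.push_back("lock destroyed"); }
};

struct FakeCacheUser {
  // Stands in for a destructor that reaches a release() path needing the lock.
  ~FakeCacheUser() {
    events.push_back(lock_alive ? "release with valid lock"
                                : "release after lock destroyed");
  }
};

// The lock is constructed last, so it is destroyed first and the user's
// destructor runs when the lock is already gone (the suspected bad case).
std::vector<std::string> teardown_with_lock_destroyed_first() {
  events.clear();
  {
    FakeCacheUser user;  // constructed first, destroyed last
    FakeLock lock;       // constructed second, destroyed first
  }
  return events;
}

// The lock is constructed first, so it outlives the user (the safe case).
std::vector<std::string> teardown_with_lock_destroyed_last() {
  events.clear();
  {
    FakeLock lock;       // constructed first, destroyed last
    FakeCacheUser user;  // constructed second, destroyed first
  }
  return events;
}
```

In the first function the recorded events end with "release after lock destroyed", which is the analogue of the `pthread_mutex_lock` call failing with EINVAL in the Helgrind trace. In the second function the release happens while the lock is still valid.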

Why this is being reported

This report is intentionally framed as an unconfirmed possible bug so that
maintainers or other investigators can verify it independently.

The useful facts at this point are:

  • automated tooling reported suspicious exit-time behavior
  • the plain program can reproduce related invalid mutex usage under Helgrind
  • the problem appears to be teardown-related rather than a straightforward
    parser crash
  • further confirmation is needed before making strong claims about the exact cause

Suggested next confirmation steps

  1. Rebuild plain Aspell from current master.

  2. Run:

    printf "# hello\n\nhelllo\n" | \
      valgrind --tool=helgrind --error-exitcode=99 \
      ./aspell --mode=markdown list

  3. Check whether invalid pthread_mutex_lock / pthread_mutex_unlock with
    EINVAL is reported during exit.

  4. If desired, compare with automated-tool reproductions that involve
    sanitizer-assisted verification.

How the local patch was applied

For local confirmation, I tested a very small patch in the Aspell source tree.

The idea was to avoid destroying global_cache_lock during process teardown.

1. Change the declaration

In common/cache-t.hpp:

-  static Mutex global_cache_lock;
+  static Mutex & global_cache_lock;

2. Change the definition

In common/cache.cpp:

static GlobalCacheBase * first_cache = 0;

static Mutex & persistent_global_cache_lock()
{
  static Mutex * lock = new Mutex();
  return *lock;
}

Mutex & GlobalCacheBase::global_cache_lock = persistent_global_cache_lock();

instead of:

static GlobalCacheBase * first_cache = 0;
Mutex GlobalCacheBase::global_cache_lock;

3. Why this was done

The local test patch makes global_cache_lock effectively process-lifetime
persistent, so it is not destroyed before other global/static cache-related
objects finish their own teardown.

This was only used as a confirmation patch for investigation. It shows that the
reported invalid mutex behavior is likely tied to teardown lifetime / destruction
ordering.
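
The same idea can be tried in isolation, outside the Aspell tree. The sketch below uses `std::mutex` instead of Aspell's `Mutex`, and `persistent_lock`, `CacheEntry`, and `live_entries` are hypothetical names invented for this example. It assumes that intentionally leaking the mutex is acceptable for a process-lifetime lock.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the patched accessor; not Aspell code.
std::mutex & persistent_lock()
{
  // Allocated once on first use and intentionally never deleted, so no
  // destructor runs for it during exit-time teardown.
  static std::mutex * lock = new std::mutex();
  return *lock;
}

static int live_entries = 0;

// Hypothetical cache-like object that takes the lock in both its
// constructor and its destructor.
struct CacheEntry {
  CacheEntry()
  {
    std::lock_guard<std::mutex> guard(persistent_lock());
    ++live_entries;
  }
  ~CacheEntry()
  {
    // Still safe if this runs from a static object's destructor,
    // because the mutex itself is never destroyed.
    std::lock_guard<std::mutex> guard(persistent_lock());
    --live_entries;
  }
};
```

The trade-off is that the mutex is never freed, so leak-checking tools may list it as memory still reachable at exit. That is probably fine for a diagnostic patch, but maintainers may prefer a fix that addresses the destruction order directly.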
