
Make ConfigurationLoader tolerate unknown top-level keys for downstream framework configs #1652

@thirteeneight

Description

Is your feature request related to a problem? Please describe.

ConfigurationLoader (pyrit/setup/configuration_loader.py) is a
strict @dataclass whose from_dict constructor calls
cls(**filtered_data). Any top-level key not among the dataclass's
fields raises a TypeError.

Minimal repro:

```python
from pyrit.setup.configuration_loader import ConfigurationLoader

ConfigurationLoader.from_dict({
    "memory_db_type": "in_memory",
    "targets": [{"name": "x"}],
})
# TypeError: ConfigurationLoader.__init__() got an unexpected keyword
#            argument 'targets'
```

This makes it awkward for teams building red-teaming frameworks on
top of PyRIT to colocate their own config alongside PyRIT's. Common
downstream concepts — target definitions with custom auth, scan modes
with threshold rules, scenario-to-dataset maps — naturally live next
to PyRIT's memory_db_type / initializers / env_files. Today the
options are:

  1. Keep a separate config file with a custom parser. Works, but
    fragments the "here is the config entrypoint" story for users who
    touch both.
  2. Subclass ConfigurationLoader and add fields. Works, but every
    downstream framework ends up with its own loader class that other
    tooling doesn't recognize.
  3. Fork PyRIT. Not sustainable.

If ConfigurationLoader tolerated unknown top-level keys, downstream
frameworks could put their config in the same YAML file under their
own namespace and users would have a single entrypoint.
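Concretely, a combined file might look like this (the my_framework namespace and the keys under it are illustrative, not a proposed schema):

```yaml
memory_db_type: in_memory
env_files:
  - .env
my_framework:          # downstream namespace, opaque to PyRIT itself
  targets:
    - name: x
      auth: custom_token
  scan_mode: aggressive
```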

Describe the solution you'd like

Two lightweight shapes I'd be happy to implement; I don't have a
preference — both solve the problem:

Option A — passthrough field for unknown keys. Add an
extensions: dict[str, Any] = field(default_factory=dict) field.
from_dict routes known keys to their existing fields and any
remaining keys into extensions. Downstream framework reads
loader.extensions["targets"] and validates its own sub-schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConfigurationLoader(YamlLoadable):
    # existing fields...
    extensions: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data):
        known = {k: v for k, v in data.items() if k in cls.__dataclass_fields__}
        extras = {k: v for k, v in data.items() if k not in cls.__dataclass_fields__}
        return cls(**known, extensions=extras)
```
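With that in place, the repro above would succeed. A self-contained sketch of the behavior, using a minimal stand-in dataclass rather than the real ConfigurationLoader (whose other fields are omitted here):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class _Loader:
    # stand-in for ConfigurationLoader; the real class has more fields
    memory_db_type: str = "in_memory"
    extensions: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data):
        # route known keys to fields, everything else into extensions
        known = {k: v for k, v in data.items() if k in cls.__dataclass_fields__}
        extras = {k: v for k, v in data.items() if k not in cls.__dataclass_fields__}
        return cls(**known, extensions=extras)


loader = _Loader.from_dict({
    "memory_db_type": "in_memory",
    "targets": [{"name": "x"}],
})
print(loader.extensions["targets"])  # [{'name': 'x'}]
```

One edge case a real PR would need to decide on: a literal "extensions" key in the input data would collide with the extensions=extras keyword, so it should probably be merged or rejected explicitly.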

Option B — opt-in permissive mode. Extend the existing classmethod
signature to from_dict(data, strict: bool = True). When strict=True
(default) behavior is unchanged. When strict=False, unknown keys are
attached to a generic attribute (e.g. _raw_extras) instead of raising.
This keeps the default strict and confines the change to opt-in callers.

Whatever shape, PyRIT's own fields should continue to be strictly
validated — the concern is only about additional top-level keys, not
about deep merging or relaxing validation of known fields.

Describe alternatives you've considered, if relevant

  • Parallel config file with its own parser. Functional but costs
    users a second config surface.
  • Subclassing ConfigurationLoader. Adds fields but means each
    downstream framework ships an incompatible loader class; tooling
    that type-checks against ConfigurationLoader doesn't see the new
    fields.
  • Plugin protocol (ConfigurationLoader.register_extension(...)
    with a namespace + typed sub-schema). More structured than A/B but
    a heavier change; probably only worth it if several downstream
    frameworks ask for this.
  • Environment variables / CLI flags. Works for flat scalars but
    most framework config is structured (lists of targets, nested
    thresholds).

Additional context

Prior art for this pattern:

  • pyproject.toml [tool.*] tables — every tool claims a namespace.
  • Kubernetes CRDs and annotations.
  • OpenAPI x-* extension fields.

All solve the same shape via namespaced passthrough while keeping
the core schema strict.

Happy to draft the PR once there's agreement on Option A vs. B (or a
different shape the maintainers prefer).
