
Make ConfigurationLoader tolerate unknown top-level keys for downstream framework configs #1652

@thirteeneight

Description

Is your feature request related to a problem? Please describe.

ConfigurationLoader (pyrit/setup/configuration_loader.py) is a
strict @dataclass whose from_dict constructor calls
cls(**filtered_data). Any top-level key not among the dataclass's
fields raises a TypeError.

Minimal repro:

```python
from pyrit.setup.configuration_loader import ConfigurationLoader

ConfigurationLoader.from_dict({
    "memory_db_type": "in_memory",
    "targets": [{"name": "x"}],
})
# TypeError: ConfigurationLoader.__init__() got an unexpected keyword
#            argument 'targets'
```

This makes it awkward for teams building red-teaming frameworks on
top of PyRIT to colocate their own config alongside PyRIT's. Common
downstream concepts — target definitions with custom auth, scan modes
with threshold rules, scenario-to-dataset maps — naturally live next
to PyRIT's memory_db_type / initializers / env_files. Today the
options are:

  1. Keep a separate config file with a custom parser. Works, but
    fragments the "here is the config entrypoint" story for users who
    touch both.
  2. Subclass ConfigurationLoader and add fields. Works, but every
    downstream framework ends up with its own loader class that other
    tooling doesn't recognize.
  3. Fork PyRIT. Not sustainable.

If ConfigurationLoader tolerated unknown top-level keys, downstream
frameworks could put their config in the same YAML file under their
own namespace and users would have a single entrypoint.
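Concretely, a combined file might look like this (the my_framework namespace and the keys under it are illustrative, not a proposed schema):

```yaml
memory_db_type: in_memory
env_files:
  - .env
my_framework:          # downstream namespace, opaque to PyRIT itself
  targets:
    - name: x
      auth: custom_token
  scan_mode: aggressive
```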

Describe the solution you'd like

Two lightweight shapes I'd be happy to implement; I don't have a
preference — both solve the problem:

Option A — passthrough field for unknown keys. Add an
extensions: dict[str, Any] = field(default_factory=dict) field.
from_dict routes known keys to their existing fields and any
remaining keys into extensions. Downstream framework reads
loader.extensions["targets"] and validates its own sub-schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConfigurationLoader(YamlLoadable):
    # existing fields...
    extensions: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data):
        known = {k: v for k, v in data.items() if k in cls.__dataclass_fields__}
        extras = {k: v for k, v in data.items() if k not in cls.__dataclass_fields__}
        return cls(**known, extensions=extras)
```
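With that in place, the repro above would succeed. A self-contained sketch of the behavior, using a minimal stand-in dataclass rather than the real ConfigurationLoader (whose other fields are omitted here):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class _Loader:
    # stand-in for ConfigurationLoader; the real class has more fields
    memory_db_type: str = "in_memory"
    extensions: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data):
        # route known keys to fields, everything else into extensions
        known = {k: v for k, v in data.items() if k in cls.__dataclass_fields__}
        extras = {k: v for k, v in data.items() if k not in cls.__dataclass_fields__}
        return cls(**known, extensions=extras)


loader = _Loader.from_dict({
    "memory_db_type": "in_memory",
    "targets": [{"name": "x"}],
})
print(loader.extensions["targets"])  # [{'name': 'x'}]
```

One edge case a real PR would need to decide on: a literal "extensions" key in the input data would collide with the extensions=extras keyword, so it should probably be merged or rejected explicitly.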

Option B — opt-in permissive mode. Extend the existing classmethod
signature to from_dict(data, strict: bool = True). When strict=True
(default) behavior is unchanged. When strict=False, unknown keys are
attached to a generic attribute (e.g. _raw_extras) instead of raising.
This keeps the default strict and confines the change to opt-in callers.

Whatever shape, PyRIT's own fields should continue to be strictly
validated — the concern is only about additional top-level keys, not
about deep merging or relaxing validation of known fields.

Describe alternatives you've considered, if relevant

  • Parallel config file with its own parser. Functional but costs
    users a second config surface.
  • Subclassing ConfigurationLoader. Adds fields but means each
    downstream framework ships an incompatible loader class; tooling
    that type-checks against ConfigurationLoader doesn't see the new
    fields.
  • Plugin protocol (ConfigurationLoader.register_extension(...)
    with a namespace + typed sub-schema). More structured than A/B but
    a heavier change; probably only worth it if several downstream
    frameworks ask for this.
  • Environment variables / CLI flags. Works for flat scalars but
    most framework config is structured (lists of targets, nested
    thresholds).

Additional context

Prior art for this pattern:

  • pyproject.toml [tool.*] tables — every tool claims a namespace.
  • Kubernetes CRDs and annotations.
  • OpenAPI x-* extension fields.

All solve the same shape via namespaced passthrough while keeping
the core schema strict.

Happy to draft the PR once there's agreement on Option A vs. B (or a
different shape the maintainers prefer).
