Bug: sparse (g,b) modeling requires hard-coded workaround #167

@pesap

Summary

When modeling nodal allocation with sparse (generator,bus) connectivity, Arco currently pushes users toward hard-coded constraints and hard-coded per-generator bus sets.

This makes model code brittle and blocks reusable formulations.

Current behavior

Given sparse data (only some (g,b) rows exist in dist.csv):

  • Using if {distance[g,b]} in a generated constraint fails with:
    • unsupported parameter reference 'distance'
  • Replacing it with distance_km[g,b] leads to:
    • missing required data point distance_km for key ...
  • Selector/filter variants are not sufficient to express the pair intersection in this case.
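
The second failure can be reproduced outside Arco. The following Python sketch is only an analogy, not Arco's implementation: it shows why a lookup inside a cartesian loop hits a missing key, while an existence check does not. The pair list matches the workaround sets further down; the distance values are made up for illustration.

```python
# Hypothetical sparse distance table: only rows present in dist.csv exist.
# The distances are illustrative values, not real data.
distance_km = {("g1", "b1"): 12.0, ("g1", "b2"): 30.5, ("g2", "b2"): 8.0}
generators = ["g1", "g2"]
buses = ["b1", "b2"]


def dense_pairs():
    """Cartesian iteration with a direct lookup: touches (g2, b1), which has no row."""
    return [(g, b, distance_km[g, b]) for g in generators for b in buses]


def sparse_pairs():
    """Cartesian iteration with an existence check: membership tests never raise."""
    return [(g, b) for g in generators for b in buses if (g, b) in distance_km]


try:
    dense_pairs()
    dense_failed = False
    missing_key = None
except KeyError as exc:
    dense_failed = True
    missing_key = exc.args[0]  # the (g, b) pair that has no row

existing = sparse_pairs()
```

The point of the sketch is that `if distance_km[g,b]` needs "does this row exist?" semantics rather than "fetch the value, then test it".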

Expected behavior

Users should be able to model sparse pair intersections without hard-coding per-generator sets or duplicating constraints per (area, tech) literal.


Minimal example (desired style)

data gen_sites from="gen_sites.csv" {
  set generators alias=g
  set area alias=a
  set tech alias=i
  param cost_spur_usd_per_km_mw {index generators}
}

data bus_sites from="bus_sites.csv" {
  set buses alias=b
}

data distance from="dist.csv" {
  index generators buses
  param distance_km {index generators; index buses}
}

data target from="target.csv" {
  param mw_target {index area; index tech}
}

model M {
  control x lower=0 {index generators; index buses}

  // This should work over sparse connectivity
  constraint capacity_target {
    index a { in area }
    index i { in tech }
    expression {
      sum(x[g,b] for g in generators[area=a tech=i] for b in buses if distance_km[g,b]) >= mw_target[a,i]
    }
  }
}
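
For reference, here is a rough Python sketch of the semantics the constraint above is meant to have. It is not Arco code, and all rows are hypothetical stand-ins for gen_sites.csv, dist.csv and target.csv (only the 500 target for a1/solar mirrors the workaround below). For each (area, tech) target it collects the x[g,b] terms whose (g,b) row actually exists.

```python
# Hypothetical stand-ins for the CSV inputs; values are illustrative only.
gen_sites = {"g1": ("a1", "solar"), "g2": ("a1", "solar"), "g3": ("a2", "wind")}
distance_km = {
    ("g1", "b1"): 12.0,
    ("g1", "b2"): 30.5,
    ("g2", "b2"): 8.0,
    ("g3", "b1"): 5.0,
}
mw_target = {("a1", "solar"): 500.0, ("a2", "wind"): 200.0}


def capacity_target_terms():
    """Map each (area, tech) to the existing (g, b) pairs and its MW target."""
    constraints = {}
    for (a, i), target in mw_target.items():
        # Iterate only rows that exist, then filter by the generator's area/tech.
        pairs = [(g, b) for (g, b) in distance_km if gen_sites[g] == (a, i)]
        constraints[a, i] = (pairs, target)
    return constraints


constraints = capacity_target_terms()
```

Each entry reads as "sum of x[g,b] over these pairs >= target", with no per-generator bus sets and no (area, tech) literals in the model text.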

Simple workaround currently required (hard-coded)

// generated from dist.csv outside the model
set buses_for_g1 { "b1"; "b2" }
set buses_for_g2 { "b2" }

constraint capacity_target_a1_solar {
  expression {
    sum(x["g1",b] for b in buses_for_g1)
    + sum(x["g2",b] for b in buses_for_g2)
    >= 500
  }
}

minimize TotalCost {
  sum((distance_km["g1",b] * cost_spur_usd_per_km_mw["g1"]) * x["g1",b] for b in buses_for_g1)
  +
  sum((distance_km["g2",b] * cost_spur_usd_per_km_mw["g2"]) * x["g2",b] for b in buses_for_g2)
}

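This works, but it scales poorly and is hard to maintain: the bus sets must be regenerated whenever dist.csv changes, and the constraint is duplicated for each (area, tech) literal.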
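
To make the maintenance cost concrete, this is roughly what the "generated from dist.csv outside the model" step looks like. It is a minimal sketch: the CSV column names and distance values are assumptions, and the output format simply copies the set declarations shown above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical dist.csv content; column names and distances are assumptions.
DIST_CSV = """generators,buses,distance_km
g1,b1,12.0
g1,b2,30.5
g2,b2,8.0
"""


def render_bus_sets(csv_text):
    """Emit one `set buses_for_<g> { ... }` line per generator found in the rows."""
    buses_by_gen = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        buses_by_gen[row["generators"]].append(row["buses"])
    lines = []
    for g, buses in buses_by_gen.items():
        members = "; ".join(f'"{b}"' for b in buses)
        lines.append(f"set buses_for_{g} {{ {members} }}")
    return lines


generated = render_bus_sets(DIST_CSV)
```

A script like this has to be rerun, and its output pasted back into the model, every time connectivity changes.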


Requirements to remove hard-coding

  1. First-class sparse pair iteration domain (iterate existing (g,b) rows directly).
  2. Safe existence semantics in filters/conditions for sparse lookups.
  3. Dynamic intersection filtering with bound loop vars (a, i, g, b).
  4. Early index-signature validation (param declared indices vs use-site indices).
  5. No panics on invalid selector syntax; always structured diagnostics.
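
Items 4 and 5 could look something like the following Python sketch. It is hypothetical throughout: the `Diagnostic` type, the diagnostic codes, and `check_use` are invented for illustration and do not describe Arco internals. The declared signatures mirror the data blocks in the minimal example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    """Hypothetical structured diagnostic; the codes are illustrative."""
    code: str
    message: str


# Declared index signatures, mirroring the data blocks in the minimal example.
DECLARED = {
    "distance_km": ("generators", "buses"),
    "mw_target": ("area", "tech"),
    "cost_spur_usd_per_km_mw": ("generators",),
}


def check_use(param, use_indices):
    """Compare a use-site index tuple with the declaration; never raises."""
    if param not in DECLARED:
        return [Diagnostic("unknown-param", f"unknown parameter '{param}'")]
    declared = DECLARED[param]
    used = tuple(use_indices)
    if used != declared:
        return [Diagnostic(
            "index-mismatch",
            f"'{param}' is declared over {declared} but used with {used}",
        )]
    return []


ok = check_use("distance_km", ("generators", "buses"))
wrong_name = check_use("distance", ("generators", "buses"))
wrong_arity = check_use("distance_km", ("buses",))
```

Running this check before constraint generation would surface both the `distance` vs `distance_km` mix-up and index mismatches as diagnostics rather than late failures.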

Nice-to-have:

  • Better diagnostics that suggest sparse-domain iteration when cartesian iteration hits missing keys.
  • Canonical sparse network example in docs.

Labels: bug
