I tried running the TPC-H benchmarks in https://github.com/coiled/benchmarks and noticed that dask-expr was much slower than before (with both "pandas" and "cudf" as the backend). A performance report made it clear that the slowdown was not in compute. I confirmed that the regression comes from `optimize()`, which went from ~1s to >10s for query 1.
I did a cursory bisection and found that the regression was introduced in #395.
I haven't yet had time to figure out why #395 slows things down.
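
For anyone who wants to check the split between optimization time and compute time on their own query, something like the following should work. This is only a sketch, not the script behind the numbers above: `time_call` and `split_timings` are hypothetical helper names, and it assumes the query object exposes `optimize()` and `compute()` the way a dask-expr collection does.

```python
import time


def time_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def split_timings(query):
    """Time optimize() and compute() separately.

    Assumption: `query` has .optimize() returning an object with .compute(),
    as a dask-expr collection does. Calling compute() on the optimized object
    may still run a (cheap) optimization pass, so treat the compute number
    as approximate.
    """
    optimized, t_optimize = time_call(query.optimize)
    _, t_compute = time_call(optimized.compute)
    return {"optimize": t_optimize, "compute": t_compute}
```

Calling `split_timings(q)` on the query 1 collection before and after #395 should show whether the extra time lands in the `"optimize"` entry.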