-
Notifications
You must be signed in to change notification settings - Fork 298
Description
I'm capturing a conversation that @roystgnr and I had a while ago on slack for the public eye and also to make it easier for me to find it again in the future
Alexander Lindsay
For doing more fancy preconditioning of condensed systems, I think I want to create a condensed DofMap. As I have it now, the StaticCondensation is a preconditioner that gets init() at the beginning of the first solve(). At this time I would like to construct a LinearImplicitSystem inside of the StaticCondensation which would have the DofMap corresponding to the condensed system. Are there any issues with setting dofs on DofObjects this late in the game?
Roy Stogner
Sorry I missed this. If by setting dof indices on DofObjects you mean overwriting the existing indices, I'd worry about that trashing existing code. It might be a good idea to add an additional "system" (really just a system number) and put the condensed dof indexing on that, though.
Alexander Lindsay
Yes I'm talking about adding a new system (representing just the condensed dofs) to EquationSystems within the StaticCondensation::init . It sounds like this should be fine then.
I will leverage the entire System/DofMap structure because it's needed for the PETSc DM
Roy Stogner
It's fine from the DofObject perspective. We may need some kind of stub of a System to make sure the EquationSystems doesn't trash things or die when a reinit() occurs.
Alexander Lindsay
(Trying to do field splits within the condensed system)
Roy Stogner
Ah, that makes sense.
Alexander Lindsay
I'll be doing the full blown _equation_systems.add_system<LinearImplicitSystem>
Alexander Lindsay
the trickiest part is getting the dofs I don't want excluded from DofMap::reinit for the condensed system, e.g. interior dofs for > 1 order H1, but I should be able to do that fine if you'll let me do some if (_sc) branching 🙂
Roy Stogner
DofMap reinit is already pretty heavyweight, so a few branches shouldn't make a big difference, but I'd imagined you wouldn't want to hit it at all, that for your condensed system you'd want your own dofobject writing.
Alexander Lindsay
I could do that ... in that case would I just never call DofMap::distribute_dofs? DofMap::dof_indices should still work fine...?
Roy Stogner
All distribute_dofs does is write to the DofObject data. The trouble is dof_indices - it relies pretty heavily on FEInterface::*(fe_type, etc).
Alexander Lindsay
Hmm you're right, I definitely want to avoid things that rely on FEInterface. Luckily it looks like PetscDMWrapper goes through DofObject for most things, so I wouldn't have to modify much there.
I've got something wild for you ... I think it would be beneficial to have a DofMap for which calls to things like n_local_dofs() returns the correct number ... but I don't want the stuff in DofMap relying on FEInterface ... what I made a derived DofMap class and made a couple virtual methods?
My StaticCondensation class itself could fulfill a lot of the DofMap APIs.
Conceptually it would almost make the most sense to construct this condensed ImplicitSystem with my StaticCondensation DofMap.
But system's own dof maps, not the other way around.
Roy Stogner
I think it would have to depend on which methods were virtual? If it was just things like n_local_dofs() which are never going to be called from an inner kernel then that's a great idea. But it's going to have to be dof_indices() too, surely, which worries me a bit.
The combination of "this is a ton of work" with "this may be so invasive that we can't merge it" terrifies me.
Alexander Lindsay
I'm ready for an element call to dof_indices
You're worried about the virtual dispatch of dof_indices?
Roy Stogner
Just worried about things snowballing in general. Adding more interdependencies was the first code smell that scared me, even though I can't see a way around it. It always seems like a good idea that makes things easier when you do it, and it never seems to be possible to predict how many times you'll run into "oh, but now I can't do X because it would break Y" afterwards.
Alexander Lindsay
you wait and see, I don't think it will be too bad 🙂
Roy Stogner
Fingers crossed. I think the DofMap refactor scared me more because it reminded me what a tightrope we walk with pitfalls in both directions - OOP is a great way to break interdependencies and get cleaner code, but go too far that way and we end up with runtime overhead we can never get rid of. I love the Elem class hierarchy in theory, but in practice I often wonder how much of a speedup we could get if we chunked our elements by type when iterating and turned a whole bunch of virtual function calls into a handful of integers sitting in L1 cache. Although your initial benchmarks after futzing with dof_indices are promising.
I just wish I'd come up with a test case at the time to mimic the kind of workload that Relap-7 was giving us many years ago. I got huge speedups there from the dof_indices refactoring, and I'm still not certain whether we're avoiding regressing or whether we just don't have anything that hits that code path that hard. Maybe I could come up with something now ... IIRC it was a ton of lowest-order-FE variables on meshes that were networks of a ton of Edge2 elements?
Alexander Lindsay
It would definitely be nice if we had more systematic performance regression testing in the MOOSE herd
Roy Stogner
Could be as simple as adding a MOOSE CI equivalent of the LIBMESH_BENCHMARK setting. Hard to make retroactive but at least having a defined subset of enlarged tests gives a bit of protection for the future.
Alexander Lindsay
How often do you run those in CI?
Roy Stogner
In CI? Never. Manually, every several months. Getting them into even infrequent CI would be a big plus.
Alexander Lindsay
Alright, here's a different idea which I think I like the best. This would also eliminate the need for adding a LinearSolver build option with a System ... I think I'll add a templated PetscDMWrapper init* method that's like:
template <typename SysLikeType, DofMapLikeType>
init(SysLikeType & sys_like, DofMapLikeType & dof_map_like);
and I will pass StaticCondensation in for both of these arguments and have the methods implemented that the PetscDMWrapper needs. What do you think of that? No derived DofMap class. The snowball stays compacted in StaticCondensation (pretty much?)
Roy Stogner
If you can get that to work that sounds great.
And I'm not seeing any obvious reason why that wouldn't work.
My inclination would be to use an actual abstract base class underneath System and DofMap (as well as under your new shim and new alternative) rather than the template arguments, but only because I suspect it would be a little less hassle down the road; if you find the template option easier than go for it.
Alexander Lindsay
I really like your last message. That would reintroduce virtual dof_indices() though. Which honestly feels like the right technical choice, but now you have me scared of performance hits, so tonight I've had all sorts of thoughts race through my head including CRTP ... which would really be a snowball so definitely not the right technical choice.
I did realize that whatever solution I pick to work with PetscDMWrapper I'll have to do the same for MOOSE's DM.
So an abstract base class definitely would be the most elegant solution as opposed to do doing templating in who knows how many places
Roy Stogner
IMHO the right technical choice is "don't try to make the right technical choice by guessing". We might actually want CRTP in the end, but it's a giant headache (the similar trickery in libMesh::FE cost me months of confusion when I was adding my first code to libMesh long long ago...), and if we need it then it's not ridiculous to move to it from a simple virtual inheritance tree, so maybe the thing to do is start with virtual dof_indices() and only get uglier if benchmarking shows we have to.
And there are probably less ugly ways to get uglier. One thing that was a surprisingly significant optimization for us a while back was the trick of creating a "virtual" (nested switch statements in FEInterface, but conceptually the same expensive dispatch) function that returns a function pointer, so we could call that outside inner loops and use the function pointer inside the loops. And the more I think about that, the more I think I'm being silly to worry about making dof_indices virtual - it's never in an inner loop by definition, it's got more-inner loops inside it, and if it wasn't until Relap-7 that those inner-inner loops showed up in profiling then the not-quite-inner loop is probably safe.
Also I bet that those nested switch statements are more expensive than a C++ vtable. At least a little bit more expensive, anyway; they both hit the same "you can no longer inline this" problem that can really kill virtual function calls in inner loops.