Skip to content

Alternative interface to produce many events with varying initial states #188

@HDembinski

Description

@HDembinski

This is an idea that arose out of a lunch discussion with @mohller today, who is using Chromo in CRPROPA, and from the discussion around #174, and it is also related to #65

The current Chromo API is designed around the use-case where you generate lots of a events with a fixed inital state (same particles at same ecm). However, there are also use cases where the initial state varies a lot, for example, when you simulate interactions in an air shower:

for beam_energy, beam_particle, target_particle in some_state_stack:
     model = Sibyll23d(FixedTarget(beam_energy, beam_particle, target_particle))  # A
     event = next(model(1)): # B
     # process event, C

Running Chromo in this use-case is still efficient if the time spend in the model the generate and process one event (B, C) is large compared to the setup time of the model (A). This likely won't be the case, however, if beam_energy is small.

We could technically speed up Chromo for this use-case with an alternative API that pushes this for loop into C or Fortran. It is not a simple change, but it is technically possible.

The call would then look like this:

events = Sibyll23d.process(some_state_stack)

where some_state_stack is a numpy array representating a table with the input state beam_energy, pid1, pid2 (we could add a flag to also support inputs in the center-of-mass frame), and events is a awkward array that contains the event data for each input state. We need the awkward array data structure here because we have a variable number of outgoing particles per input particle and we want to keep the association between input and output particles. Awkward arrays are able to represent such a mapping, while normal numpy arrays cannot.

The potential speed-up here is two-fold.

  1. We are pushing the for loop in the first code example to fast Fortran code. This means we don't pay a Python call overhead each time that the model setup is changed.
  2. We provide the output in form of an efficient data structure that allows us to pass all events at once to next processing step. Awkward arrays are arbitrarily flexible, but specialise to a very simple data structure in this case in which the events are essentially packed into a contiguous table, while a second internal array tracks the indices when one event ends and the next event starts. This data structure is still almost a numpy array and can be processed very efficiently, for example in numba. The simpler alternative of returning a Python list of numpy arrays would be inefficient, because the memory is then not contiguous.

CC @jpivarski

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions