Currently, cub::detail::segmented_scan::agent_segmented_scan processes one segment per block.
This approach does not perform well when segments are short.
This issue proposes processing a statically specified number of segments per block by using the Schwarz reformulation of segmented scan as an ordinary scan over an augmented type with an augmented scan operator.
Such support would allow DeviceSegmentedScan to process short segments more efficiently.
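
To make the reformulation concrete, here is a minimal host-side sketch, not the CUB implementation. It assumes the common head-flag variant: each element is paired with a flag marking the start of a segment, and the augmented operator discards the left operand whenever the right operand starts a new segment. The names `flagged_value`, `segmented_op`, and `segmented_inclusive_scan` are hypothetical and chosen for illustration only; `std::inclusive_scan` stands in for whatever block-wide scan the agent would use.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical augmented type: a value plus a flag that is true when the
// element is the first one of its segment.
template <typename T>
struct flagged_value
{
  T value;
  bool head;
};

// Hypothetical augmented operator wrapping a user-supplied associative `Op`.
// It stays associative, so an ordinary scan can be applied to it.
template <typename Op>
struct segmented_op
{
  Op op;

  template <typename T>
  flagged_value<T> operator()(const flagged_value<T>& lhs, const flagged_value<T>& rhs) const
  {
    // If rhs starts a segment, everything accumulated to its left is dropped.
    return {rhs.head ? rhs.value : op(lhs.value, rhs.value), lhs.head || rhs.head};
  }
};

// Segmented inclusive scan expressed as one ordinary scan over flagged values.
// `begin_offsets` holds the index of the first element of each segment.
template <typename T, typename Op>
std::vector<T> segmented_inclusive_scan(const std::vector<T>& values,
                                        const std::vector<std::size_t>& begin_offsets,
                                        Op op)
{
  std::vector<flagged_value<T>> flagged;
  flagged.reserve(values.size());
  for (const T& v : values)
  {
    flagged.push_back({v, false});
  }
  for (std::size_t begin : begin_offsets)
  {
    assert(begin <= values.size());
    if (begin < values.size())
    {
      flagged[begin].head = true; // mark segment heads
    }
  }

  // One ordinary scan covers every segment at once.
  std::inclusive_scan(flagged.begin(), flagged.end(), flagged.begin(), segmented_op<Op>{op});

  std::vector<T> result;
  result.reserve(flagged.size());
  for (const auto& f : flagged)
  {
    result.push_back(f.value);
  }
  return result;
}
```

For example, scanning the values 1, 2, 3, 4, 5, 6 with segment begin offsets 0, 2, 3 and `std::plus<int>` yields 1, 3, 3, 4, 9, 15. Because the scan no longer depends on where segment boundaries fall, several short segments can share a single block-wide scan, which is the property this issue relies on.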