Description
Summary
stat_bin currently lacks the after_stat(prop) functionality that stat_count provides, making it difficult to create proportion-based visualizations for continuous data. This feature request proposes adding a bin_prop computed variable to stat_bin to achieve feature parity.
Problem Description
Currently, users can create proportion-based bar charts with discrete data using stat_count:
# This works with discrete data
ggplot(data, aes(x = discrete_var, y = after_stat(prop), fill = group)) +
geom_bar(position = "dodge")
However, there's no equivalent for continuous data with stat_bin:
# This doesn't work - no prop variable available
ggplot(data, aes(x = continuous_var, y = after_stat(prop), fill = group)) +
geom_histogram(position = "dodge", bins = 10)
Use Case Example
Consider analyzing weight distribution by sex. Users want to see the proportion of each sex within weight bins:
# Desired functionality (currently not possible)
ggplot(people_data, aes(x = weight, y = after_stat(bin_prop), fill = sex)) +
stat_bin(geom = "col", bins = 8, position = "dodge") +
scale_y_continuous(labels = scales::percent) +
labs(y = "Proportion within bin")
This would show insights such as:
- Lower weight bins: ~100% female
- Middle weight bins: Mixed proportions
- Higher weight bins: ~100% male
Proposed Solution
Add a bin_prop computed variable to stat_bin that calculates the proportion of each group within each bin:
bin_prop = count_in_group / total_count_in_bin
- Handles multiple groups and respects weights
- For single groups: bin_prop = 1 (backwards compatible)
- For empty bins: bin_prop = 0
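To make the proposed calculation concrete, here is a minimal base R sketch of the arithmetic. It is not ggplot2 code: compute_bin_prop is a hypothetical helper, and the argument names, the explicit breaks, and the example data are all assumptions made for illustration. How this would be wired into stat_bin's internals is left open.

```r
# Hypothetical helper: a sketch of the proposed math, not part of ggplot2.
compute_bin_prop <- function(x, group, breaks, weight = rep(1, length(x))) {
  # Assign each observation to a bin; factor levels keep empty bins in the table.
  bin <- cut(x, breaks = breaks, include.lowest = TRUE)
  # Weighted count per bin (rows) and group (columns).
  counts <- xtabs(weight ~ bin + group)
  totals <- rowSums(counts)
  # Divide each row by its bin total; empty bins yield 0 instead of NaN.
  props <- counts / ifelse(totals > 0, totals, 1)
  as.data.frame(as.table(props), responseName = "bin_prop")
}

# Made-up example data: the last bin, (100,120], is empty.
x      <- c(50, 55, 60, 72, 75, 95, 98)
group  <- c("F", "F", "F", "F", "M", "M", "M")
breaks <- c(40, 60, 80, 100, 120)

res <- compute_bin_prop(x, group, breaks)
print(res)

# Weighted variant: the single "F" observation in (60,80] counts three times.
res_w <- compute_bin_prop(x, group, breaks, weight = c(1, 1, 1, 3, 1, 1, 1))
```

In this sketch the proportions within every non-empty bin sum to 1 across groups, and a single group would get bin_prop = 1 wherever it has data, which matches the edge cases listed above.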
Benefits
- Feature parity with stat_count
- Enables proportion-based histograms for continuous data
- Useful for demographic analysis and group comparisons
- Backwards compatible - doesn't break existing code
Alternatives Considered
- Manual calculation: Users could manually calculate proportions, but this is cumbersome and error-prone
- Using stat_count with discretized data: Loses the benefits of proper binning algorithms
- Custom stat function: Would require users to write their own implementation
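For reference, the manual-calculation alternative might look roughly like the sketch below. The data values and the break points are made up for illustration, and the plotting step runs only if ggplot2 is installed.

```r
# Made-up example data; all values here are illustrative only.
people_data <- data.frame(
  weight = c(52, 58, 61, 66, 70, 74, 79, 85, 91, 97),
  sex    = c("F", "F", "F", "F", "M", "F", "M", "M", "M", "M")
)

# Step 1: bin by hand. The breaks are chosen manually rather than by stat_bin.
people_data$bin <- cut(people_data$weight, breaks = seq(50, 100, by = 10))

# Step 2: row-wise proportions, i.e. each sex's share within each bin.
props <- prop.table(table(bin = people_data$bin, sex = people_data$sex), margin = 1)
plot_data <- as.data.frame(props, responseName = "bin_prop")

# Step 3: plot the precomputed values (skipped if ggplot2 is not installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_data, aes(x = bin, y = bin_prop, fill = sex)) +
    geom_col(position = "dodge") +
    labs(x = "Weight bin", y = "Proportion within bin")
}
```

Two drawbacks show up even in this small case: the x axis becomes a discrete factor of bin labels rather than a continuous scale, and an empty bin would produce NaN (0/0) from prop.table unless it is handled separately.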
Expected API
# Documentation would include:
#' @eval rd_computed_vars(
#' count = "number of points in bin.",
#' density = "density of points in bin, scaled to integrate to 1.",
#' ncount = "count, scaled to a maximum of 1.",
#' ndensity = "density, scaled to a maximum of 1.",
#' width = "widths of bins.",
#' bin_prop = "proportion of points in bin that belong to each group."
#' )
This would enable the intuitive usage:
aes(y = after_stat(bin_prop))
Additional Context
This feature would be particularly valuable for:
- Demographic analysis (age/income by group)
- Scientific data (measurements by treatment group)
- Market research (customer segments by behavior)
- Any scenario where you want to show group composition within continuous ranges
The implementation should handle edge cases like empty bins, single groups, and weighted data appropriately.