Tidyup 7: Recoding and replacing values in the tidyverse #29

DavisVaughan · 2025-07-30T13:59:07Z

Easy to read link:
https://github.com/tidyverse/tidyups/blob/feature/007/007-tidyverse-recoding-and-replacing.md

We’d love to get your thoughts on this proposal to add new column recoding and replacing tools to dplyr. The goal is to fill some important gaps left by case_when() and case_match() by creating a slightly larger family of interconnected functions. Specifically, we wish to improve on:

- Recoding columns, both interactively and programmatically (i.e. with a precomputed lookup table, like plyr::mapvalues())
  - Existing case_when()
  - New recode_values()
- Replacing a few values within an existing column, in particular by providing obviously named, easy-to-use, and type-stable tools for doing so, which function as enhanced forms of [<- and base::replace()
  - New replace_when()
  - New replace_values()
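
For readers unfamiliar with the lookup-table style of recoding mentioned in the list above, here is a minimal base R sketch of the idea. It does not use the proposed functions; the `lookup` data frame and its `from`/`to` column names are made up for illustration.

```r
# Hypothetical lookup table; the column names `from` and `to` are illustrative.
lookup <- data.frame(
  from = c("a", "b", "c"),
  to   = c("apple", "banana", "cherry"),
  stringsAsFactors = FALSE
)

x <- c("b", "a", "c", "b", "z")

# match() returns, for each element of x, its first position in lookup$from
# (NA when there is no match); that position then indexes lookup$to.
recoded <- lookup$to[match(x, lookup$from)]
recoded
```

Values with no entry in the lookup table (here "z") come back as NA in this sketch. How the proposed recode_values() treats unmatched values is described in the tidyup itself.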

Please feel free to contribute however you feel comfortable. You're welcome to comment here on individual lines of the tidyup, or open bigger discussion topics in a new issue. If there are things you’d prefer to discuss in private, please feel free to email me. I’ll plan to close the discussion on Aug 18, and we will then advance to the implementation stage.

higgi13425 · 2025-08-04T21:17:53Z

recode_values() is the boss for Likert scale responses. So great for a 100-question questionnaire with 5-point Likert items for every question.
Rensis would approve. https://en.wikipedia.org/wiki/Rensis_Likert
Can you purrr across 100 questions in a questionnaire to do this efficiently?
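
One way to apply the same recoding to many columns at once is dplyr::across(), with no purrr needed. The sketch below is only an illustration: the `survey` data, the `q1`..`q3` column names, and the `likert` labels are made up, and base match() stands in for the recoding step. Presumably recode_values() could take its place inside across(), but its exact signature is not confirmed here.

```r
library(dplyr)

# Hypothetical survey data: column names q1..q3 and the labels are made up.
survey <- tibble(
  id = 1:3,
  q1 = c("Agree", "Neutral", "Strongly disagree"),
  q2 = c("Strongly agree", "Disagree", "Agree"),
  q3 = c("Neutral", "Neutral", "Strongly agree")
)

# Ordered response labels; the position in this vector is the numeric score.
likert <- c("Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree")

# Apply the same lookup to every column whose name starts with "q".
scored <- survey |>
  mutate(across(starts_with("q"), \(x) match(x, likert)))

scored
```

With 100 question columns the call stays the same, as long as the columns can be selected by a shared prefix or another tidyselect helper.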

JoFrhwld · 2025-08-05T01:08:42Z

I'm not sure if this is intended, but it's currently not possible to change the data type with replace_when()

penguins |>
  mutate(
    size = body_mass |>
      replace_when(
        body_mass > 4750 ~ "large",
        body_mass > 3550 ~ "medium",
        body_mass > 0 ~ "small"
      )
  )

#> Error in `mutate()`:
#> ℹ In argument: `size = replace_when(...)`.
#> Caused by error in `replace_when()`:
#> ! Can't convert `..1 (right)` <character> to <integer>.
#> Run `rlang::last_trace()` to see where the error occurred.

It also looks like if we wanted to use replace_when() as a sequence of if-else logic, we need to go back to how case_when() used to work.

penguins |>
  mutate(
    size = body_mass |>
      replace_when(
        body_mass > 4750 ~ 3,
        body_mass > 3550 ~ 2,
        TRUE ~ 1
      )
  )

The proposal doesn't say that replace_when() is meant to supersede case_when(), so would these be use cases where it would be recommended to use case_when() instead?

DavisVaughan · 2025-08-05T01:54:56Z

@JoFrhwld to be extremely clear, case_when() is not going anywhere and is not being superseded.

These 3 functions join case_when() to round out the family; they do not replace it. The intro paragraph above shows how case_when() and recode_values() are on the "recode" side of things, and replace_when() and replace_values() are on the "replace" side of things.

> it's currently not possible to change the data type with replace_when()

And that's exactly the point! replace_when() is type safe. If you want to update a few values in a column using a condition from another column, but you want to guarantee that the type of that column doesn't change out from under you, you use replace_when(). case_when() does not have this safety, and can't: it's meant for creating new columns, not updating existing ones.

See https://github.com/tidyverse/tidyups/blob/feature/007/007-tidyverse-recoding-and-replacing.md#type-stability
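
To make the type-stability concern concrete, here is a small base R sketch of what "changing out from under you" looks like with [<- and base::replace(). The vector `x` and its values are made up for illustration; this shows existing base R behavior, not the proposed functions.

```r
# Hypothetical integer vector, loosely modelled on body mass in grams.
x <- c(3200L, 4100L, 5000L)

# Assigning a character value with `[<-` silently coerces the whole vector.
y <- x
y[y > 4750L] <- "large"

# base::replace() behaves the same way.
z <- replace(x, x > 4750L, "large")

class(x)  # "integer"
class(y)  # "character"
class(z)  # "character"
y         # "3200" "4100" "large"
```

The error JoFrhwld hit is the type-safe counterpart of this silent coercion: the character replacement is rejected instead of converting the column.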

EmilHvitfeldt · 2025-08-05T02:26:01Z

I'm sure you have thought about it, but I didn't see it explicitly stated. I'm going to assume that if there are duplicate values in from, the first one takes precedence. Is that assumption correct?

DavisVaughan · 2025-08-05T10:51:50Z

@EmilHvitfeldt yep, same idea as case_when() where "first wins". Will be in the official docs for sure.

dplyr::replace_values(1, from = c(1, 1), to = c(2, 3))
#> [1] 2

dplyr::replace_values(1, 1 ~ 2, 1 ~ 3)
#> [1] 2

Created on 2025-08-05 with reprex v2.1.1
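
As a base R analogy for the "first wins" rule (not a description of how dplyr implements it), match() also returns the first matching position, so a lookup built on it resolves duplicates the same way. The `from` and `to` vectors below mirror the reprex above.

```r
from <- c(1, 1)
to   <- c(2, 3)

# match() returns the position of the FIRST match, so the duplicate is ignored.
first_hit <- to[match(1, from)]
first_hit  # 2

# anyDuplicated() returns the index of the first duplicated entry (0 if none),
# which can flag an ambiguous lookup table before it is used.
dup_index <- anyDuplicated(from)
dup_index  # 2
```

Checking a precomputed lookup table for duplicate keys up front may be worthwhile whenever the later entries are not meant to be ignored.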

RichardPatterson · 2025-08-11T15:37:20Z

What is the expected use with factors? If the lookup tbl contains a factor in the to column, will the levels be passed on to the recoded variable?

What is the relationship with fct_recode() from forcats?

This all looks really great btw.

DavisVaughan force-pushed the feature/007 branch from 1d08e4f to 2ee1675 · July 30, 2025 14:09

Draft tidyup 007

5347976

DavisVaughan force-pushed the feature/007 branch from 2ee1675 to 5347976 · July 30, 2025 18:01
