[R] Recipe for random sampling #83

@gshotwell

It would be great if there were a way to sample from an Arrow dataset. I put together this somewhat hacky example, but I bet there's something a bit more elegant.

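library(arrow)
library(dplyr)
library(nycflights13)

flights <- nycflights13::flights

# Add a sequential row id to sample on
flights$id <- seq_len(nrow(flights))

# Write one Parquet file per month into a dataset directory
dir.create("flight_ds", showWarnings = FALSE)
for (i in unique(flights$month)) {
  out <- filter(flights, month == i)
  arrow::write_parquet(out, file.path("flight_ds", paste0(i, ".parquet")))
}

ds <- arrow::open_dataset("flight_ds")

# Draw 100 row ids at random (named so it does not shadow base::sample)
sampled_ids <- sample(flights$id, 100)

# Pull only the sampled rows from the dataset
ds %>%
  filter(id %in% sampled_ids) %>%
  collect()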
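
One possible variation on the example above, as a rough sketch rather than a recommended recipe: the ids can be drawn from the dataset's row count instead of from the in-memory table, so the full data never has to be loaded in R. This assumes the dataset already carries a sequential `id` column running from 1 to `nrow(ds)`. The helper name `sample_rows_by_id` and the small `mtcars` demo are hypothetical and only there for illustration.

```r
library(arrow)
library(dplyr)

# Hypothetical helper (not part of arrow): sample n rows from a dataset
# that has a sequential integer `id` column running from 1 to nrow(ds).
sample_rows_by_id <- function(ds, n) {
  # Draw ids from the row count alone; no row data is read here
  sampled_ids <- sample(nrow(ds), n)
  ds %>%
    filter(id %in% sampled_ids) %>%
    collect()
}

# Small self-contained demo on a built-in data set
demo_dir <- file.path(tempdir(), "mtcars_ds")
dir.create(demo_dir, showWarnings = FALSE)
demo <- mtcars
demo$id <- seq_len(nrow(demo))
write_parquet(demo, file.path(demo_dir, "part-0.parquet"))

demo_ds <- open_dataset(demo_dir)
set.seed(1)
picked <- sample_rows_by_id(demo_ds, 5)
```

This still relies on the `id` column having been written into the files beforehand, so it shares the main limitation of the original approach.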
