jrycw/turtle-island

🐢 Turtle Island

Turtle Island is a lightweight utility library that provides helper functions to reduce boilerplate when writing Polars expressions. It aims to simplify common expression patterns and improve developer productivity when working with the Polars API.

⚠️ Disclaimer: This project is in early development. The API is still evolving and may change without notice. Use with caution in production environments.

🚀 Installation

Turtle Island is not yet published on PyPI. The recommended way to install it is using uv add:

uv add git+https://github.com/jrycw/turtle-island.git

📦 Recommended Import

To keep your code clean and idiomatic, it's recommended to import Turtle Island as a top-level module:

import turtle_island as ti

βš™οΈ Core Spirit

The core spirit of Turtle Island is to embrace expressions over columns.

When wrangling data, it's common to create temporary helper columns as part of the transformation process. However, many of these columns are just intermediate artifacts: they exist solely to support later steps and are not part of the final output we actually want.

Polars offers a powerful distinction between contexts and expressions, allowing us to focus on expression-based transformations without needing to materialize every intermediate result as a column. Turtle Island builds on this principle, encouraging users to rely on expressions (flexible, composable, and context-aware) rather than on temporary columns.

✨ Selected Functions

case_when()

A more ergonomic way to write chained when-then-otherwise logic in Polars:

import polars as pl
import turtle_island as ti

df = pl.DataFrame({"x": [1, 2, 3, 4]})

expr_ti = ti.case_when(
    case_list=[(pl.col("x") < 2, pl.lit("small")),
               (pl.col("x") < 4, pl.lit("medium"))],
    otherwise=pl.lit("large"),
).alias("size_ti")

expr_pl = (
    pl.when(pl.col("x") < 2)
    .then(pl.lit("small"))
    .when(pl.col("x") < 4)
    .then(pl.lit("medium"))
    .otherwise(pl.lit("large"))
    .alias("size_pl")
)

df.with_columns(expr_ti, expr_pl)
shape: (4, 3)
┌─────┬─────────┬─────────┐
│ x   ┆ size_ti ┆ size_pl │
│ --- ┆ ---     ┆ ---     │
│ i64 ┆ str     ┆ str     │
╞═════╪═════════╪═════════╡
│ 1   ┆ small   ┆ small   │
│ 2   ┆ medium  ┆ medium  │
│ 3   ┆ medium  ┆ medium  │
│ 4   ┆ large   ┆ large   │
└─────┴─────────┴─────────┘

make_index()

Adds a sequential index column to the DataFrame:

df = pl.DataFrame({"a": [1, 3, 5], "b": [2, 4, 6]})
df.select(ti.make_index(), pl.all())
shape: (3, 3)
┌───────┬─────┬─────┐
│ index ┆ a   ┆ b   │
│ ---   ┆ --- ┆ --- │
│ u32   ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ 0     ┆ 1   ┆ 2   │
│ 1     ┆ 3   ┆ 4   │
│ 2     ┆ 5   ┆ 6   │
└───────┴─────┴─────┘

bucketize()

Assign values to rows in a round-robin pattern using Polars expressions:

df = pl.DataFrame({"x": [1, 2, 3, 4, 5]})
df.with_columns(
    ti.bucketize(pl.col("x"), pl.col("x").add(100)).alias("bucketized")
)
shape: (5, 2)
┌─────┬────────────┐
│ x   ┆ bucketized │
│ --- ┆ ---        │
│ i64 ┆ i64        │
╞═════╪════════════╡
│ 1   ┆ 1          │
│ 2   ┆ 102        │
│ 3   ┆ 3          │
│ 4   ┆ 104        │
│ 5   ┆ 5          │
└─────┴────────────┘

bucketize_lit()

Assign values to rows in a round-robin pattern using literal values:

df = pl.DataFrame({"x": [1, 2, 3, 4, 5]})
df.with_columns(ti.bucketize_lit(True, False).alias("bucketized"))
shape: (5, 2)
┌─────┬────────────┐
│ x   ┆ bucketized │
│ --- ┆ ---        │
│ i64 ┆ bool       │
╞═════╪════════════╡
│ 1   ┆ true       │
│ 2   ┆ false      │
│ 3   ┆ true       │
│ 4   ┆ false      │
│ 5   ┆ true       │
└─────┴────────────┘
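As a mental model only, the literal variant behaves like cycling through the given values for as many rows as the DataFrame has. The plain-Python sketch below uses a hypothetical `round_robin_values` function to show that rule outside of Polars.

```python
from itertools import cycle, islice


def round_robin_values(values, n_rows):
    """Hypothetical model: repeat `values` in order until n_rows items exist."""
    return list(islice(cycle(values), n_rows))


flags = round_robin_values([True, False], 5)
labels = round_robin_values(["a", "b", "c"], 5)
```

With three literals and five rows, the cycle restarts after the third row, so the last two rows receive the first two values again.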

is_every_nth_row()

Mark every second row:

df = pl.DataFrame({"x": [1, 2, 3, 4, 5]})
df.with_columns(ti.is_every_nth_row(2))
shape: (5, 2)
┌─────┬──────────────┐
│ x   ┆ bool_nth_row │
│ --- ┆ ---          │
│ i64 ┆ bool         │
╞═════╪══════════════╡
│ 1   ┆ true         │
│ 2   ┆ false        │
│ 3   ┆ true         │
│ 4   ┆ false        │
│ 5   ┆ true         │
└─────┴──────────────┘

To invert the result:

df.with_columns(~ti.is_every_nth_row(2))
shape: (5, 2)
┌─────┬──────────────┐
│ x   ┆ bool_nth_row │
│ --- ┆ ---          │
│ i64 ┆ bool         │
╞═════╪══════════════╡
│ 1   ┆ false        │
│ 2   ┆ true         │
│ 3   ┆ false        │
│ 4   ┆ true         │
│ 5   ┆ false        │
└─────┴──────────────┘

move_cols_to_start()

Reorder columns so that selected columns appear first:

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"], "c": [4.4, 5.5, 6.6]})
df.select(ti.move_cols_to_start(["b", "c"]))

Or by data type:

df.select(ti.move_cols_to_start([pl.Float64, pl.String]))
shape: (3, 3)
┌─────┬─────┬─────┐
│ b   ┆ c   ┆ a   │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ i64 │
╞═════╪═════╪═════╡
│ x   ┆ 4.4 ┆ 1   │
│ y   ┆ 5.5 ┆ 2   │
│ z   ┆ 6.6 ┆ 3   │
└─────┴─────┴─────┘

move_cols_to_end()

Reorder columns so that selected columns appear last:

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"], "c": [4.4, 5.5, 6.6]})
df.select(ti.move_cols_to_end(["a", "b"]))

Or by data type:

df.select(ti.move_cols_to_end([pl.String, pl.Int64]))
shape: (3, 3)
┌─────┬─────┬─────┐
│ c   ┆ a   ┆ b   │
│ --- ┆ --- ┆ --- │
│ f64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 4.4 ┆ 1   ┆ x   │
│ 5.5 ┆ 2   ┆ y   │
│ 6.6 ┆ 3   ┆ z   │
└─────┴─────┴─────┘

make_hyperlink()

Create an HTML anchor tag (<a>) from two columns, one holding the link text and the other the URL:

df = pl.DataFrame({"name": ["GitHub"], "url": ["https://github.com/"]})
df.select(ti.make_hyperlink("name", "url").alias("link"))
shape: (1, 1)
┌──────────────────────────────────────────────────────────┐
│ link                                                     │
│ ---                                                      │
│ str                                                      │
╞══════════════════════════════════════════════════════════╡
│ <a href="https://github.com/" target="_blank">GitHub</a> │
└──────────────────────────────────────────────────────────┘