Turtle Island is a lightweight utility library that provides helper functions to reduce boilerplate when writing Polars expressions. It aims to simplify common expression patterns and improve developer productivity when working with the Polars API.
⚠️ Disclaimer: This project is in early development. The API is still evolving and may change without notice. Use with caution in production environments.
Turtle Island is not yet published on PyPI. The recommended way to install it is using uv add:
uv add git+https://github.com/jrycw/turtle-island.git
To keep your code clean and idiomatic, it's recommended to import Turtle Island as a top-level module:
import turtle_island as ti
The core spirit of Turtle Island is to embrace expressions over columns.
When wrangling data, it's common to create temporary helper columns as part of the transformation process. However, many of these columns are just intermediate artifacts, not part of the final output we actually want. They exist solely to assist with intermediate steps.
Polars offers a powerful distinction between contexts and expressions, allowing us to focus on expression-based transformations without needing to materialize every intermediate result as a column. Turtle Island builds on this principle, encouraging users to rely more on expressions (flexible, composable, and context-aware) rather than temporary columns.
ti.case_when() offers a more ergonomic way to write chained when-then-otherwise logic in Polars:
import polars as pl

df = pl.DataFrame({"x": [1, 2, 3, 4]})
expr_ti = ti.case_when(
    case_list=[
        (pl.col("x") < 2, pl.lit("small")),
        (pl.col("x") < 4, pl.lit("medium")),
    ],
    otherwise=pl.lit("large"),
).alias("size_ti")
expr_pl = (
    pl.when(pl.col("x") < 2)
    .then(pl.lit("small"))
    .when(pl.col("x") < 4)
    .then(pl.lit("medium"))
    .otherwise(pl.lit("large"))
    .alias("size_pl")
)
df.with_columns(expr_ti, expr_pl)
shape: (4, 3)
┌─────┬─────────┬─────────┐
│ x   ┆ size_ti ┆ size_pl │
│ --- ┆ ---     ┆ ---     │
│ i64 ┆ str     ┆ str     │
╞═════╪═════════╪═════════╡
│ 1   ┆ small   ┆ small   │
│ 2   ┆ medium  ┆ medium  │
│ 3   ┆ medium  ┆ medium  │
│ 4   ┆ large   ┆ large   │
└─────┴─────────┴─────────┘
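The output above reflects first-match-wins ordering: the value 1 satisfies both conditions, but only the first one applies. The following plain-Python model illustrates that rule for a single value. It is a sketch of the semantics only, not Turtle Island's implementation, and the name case_when_model is hypothetical.

```python
from typing import Any, Callable, Sequence, Tuple

Case = Tuple[Callable[[Any], bool], Any]


def case_when_model(value: Any, cases: Sequence[Case], otherwise: Any) -> Any:
    """Return the result paired with the first predicate that matches."""
    for predicate, result in cases:
        if predicate(value):
            return result
    # No predicate matched, so fall back to the default.
    return otherwise


cases = [(lambda x: x < 2, "small"), (lambda x: x < 4, "medium")]
sizes = [case_when_model(x, cases, "large") for x in [1, 2, 3, 4]]
print(sizes)  # ['small', 'medium', 'medium', 'large']
```

Swapping the two cases would label 1 as "medium", which is why the order of the case list matters.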
ti.make_index() adds a sequential index column to the DataFrame:
df = pl.DataFrame({"a": [1, 3, 5], "b": [2, 4, 6]})
df.select(ti.make_index(), pl.all())
shape: (3, 3)
┌───────┬─────┬─────┐
│ index ┆ a   ┆ b   │
│ ---   ┆ --- ┆ --- │
│ u32   ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ 0     ┆ 1   ┆ 2   │
│ 1     ┆ 3   ┆ 4   │
│ 2     ┆ 5   ┆ 6   │
└───────┴─────┴─────┘
ti.bucketize() assigns values to rows in a round-robin pattern using Polars expressions:
df = pl.DataFrame({"x": [1, 2, 3, 4, 5]})
df.with_columns(
    ti.bucketize(pl.col("x"), pl.col("x").add(100)).alias("bucketized")
)
shape: (5, 2)
┌─────┬────────────┐
│ x   ┆ bucketized │
│ --- ┆ ---        │
│ i64 ┆ i64        │
╞═════╪════════════╡
│ 1   ┆ 1          │
│ 2   ┆ 102        │
│ 3   ┆ 3          │
│ 4   ┆ 104        │
│ 5   ┆ 5          │
└─────┴────────────┘
ti.bucketize_lit() assigns values to rows in a round-robin pattern using literal values:
df = pl.DataFrame({"x": [1, 2, 3, 4, 5]})
df.with_columns(ti.bucketize_lit(True, False).alias("bucketized"))
shape: (5, 2)
┌─────┬────────────┐
│ x   ┆ bucketized │
│ --- ┆ ---        │
│ i64 ┆ bool       │
╞═════╪════════════╡
│ 1   ┆ true       │
│ 2   ┆ false      │
│ 3   ┆ true       │
│ 4   ┆ false      │
│ 5   ┆ true       │
└─────┴────────────┘
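Both round-robin examples above are consistent with one rule: row i takes its value from candidate number i modulo the number of candidates. The plain-Python model below reproduces the two outputs under that assumption. It sketches the observed behavior rather than the library's internals, and round_robin_model is a hypothetical name.

```python
from typing import Any, List


def round_robin_model(candidates: List[List[Any]]) -> List[Any]:
    """For each row i, pick the value from candidates[i % len(candidates)]."""
    n_rows = len(candidates[0])
    return [candidates[i % len(candidates)][i] for i in range(n_rows)]


x = [1, 2, 3, 4, 5]

# Expression-style candidates: x itself and x + 100.
from_exprs = round_robin_model([x, [v + 100 for v in x]])
print(from_exprs)  # [1, 102, 3, 104, 5]

# Literal-style candidates: each literal is broadcast to every row.
from_lits = round_robin_model([[True] * len(x), [False] * len(x)])
print(from_lits)  # [True, False, True, False, True]
```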
ti.is_every_nth_row() marks every nth row, starting from the first. To mark every second row:
df = pl.DataFrame({"x": [1, 2, 3, 4, 5]})
df.with_columns(ti.is_every_nth_row(2))
shape: (5, 2)
┌─────┬──────────────┐
│ x   ┆ bool_nth_row │
│ --- ┆ ---          │
│ i64 ┆ bool         │
╞═════╪══════════════╡
│ 1   ┆ true         │
│ 2   ┆ false        │
│ 3   ┆ true         │
│ 4   ┆ false        │
│ 5   ┆ true         │
└─────┴──────────────┘
To invert the result:
df.with_columns(~ti.is_every_nth_row(2))
shape: (5, 2)
┌─────┬──────────────┐
│ x   ┆ bool_nth_row │
│ --- ┆ ---          │
│ i64 ┆ bool         │
╞═════╪══════════════╡
│ 1   ┆ false        │
│ 2   ┆ true         │
│ 3   ┆ false        │
│ 4   ┆ true         │
│ 5   ┆ false        │
└─────┴──────────────┘
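The two outputs above match a simple rule: a row is marked when its zero-based position is a multiple of n, and the ~ operator flips every flag. The plain-Python model below encodes that reading. It is an assumption drawn from the outputs shown here, not a description of how Turtle Island computes the mask, and is_every_nth_row_model is a hypothetical name.

```python
from typing import List


def is_every_nth_row_model(n_rows: int, n: int) -> List[bool]:
    """True where the zero-based row position is a multiple of n."""
    return [i % n == 0 for i in range(n_rows)]


marks = is_every_nth_row_model(5, 2)
inverted = [not m for m in marks]
print(marks)     # [True, False, True, False, True]
print(inverted)  # [False, True, False, True, False]
```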
ti.move_cols_to_start() reorders columns so that the selected columns appear first:
df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"], "c": [4.4, 5.5, 6.6]})
df.select(ti.move_cols_to_start(["b", "c"]))
Or by data type:
df.select(ti.move_cols_to_start([pl.Float64, pl.String]))
shape: (3, 3)
┌─────┬─────┬─────┐
│ b   ┆ c   ┆ a   │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ i64 │
╞═════╪═════╪═════╡
│ x   ┆ 4.4 ┆ 1   │
│ y   ┆ 5.5 ┆ 2   │
│ z   ┆ 6.6 ┆ 3   │
└─────┴─────┴─────┘
ti.move_cols_to_end() reorders columns so that the selected columns appear last:
df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"], "c": [4.4, 5.5, 6.6]})
df.select(ti.move_cols_to_end(["a", "b"]))
Or by data type:
df.select(ti.move_cols_to_end([pl.String, pl.Int64]))
shape: (3, 3)
┌─────┬─────┬─────┐
│ c   ┆ a   ┆ b   │
│ --- ┆ --- ┆ --- │
│ f64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 4.4 ┆ 1   ┆ x   │
│ 5.5 ┆ 2   ┆ y   │
│ 6.6 ┆ 3   ┆ z   │
└─────┴─────┴─────┘
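In the data-type examples above, the selected columns keep their original schema order: passing Float64 before String still yields b before c. The plain-Python model below captures that observation for dtype-based selection. The function names and the use of dtype names as strings are assumptions made for this sketch, and whether name-based selection orders columns the same way is not something the examples here settle.

```python
from typing import Dict, List, Set


def move_to_start_model(schema: Dict[str, str], dtypes: Set[str]) -> List[str]:
    """Selected columns first (in schema order), then the remaining columns."""
    selected = [name for name, dtype in schema.items() if dtype in dtypes]
    rest = [name for name in schema if name not in selected]
    return selected + rest


def move_to_end_model(schema: Dict[str, str], dtypes: Set[str]) -> List[str]:
    """Remaining columns first, then the selected columns (in schema order)."""
    selected = [name for name, dtype in schema.items() if dtype in dtypes]
    rest = [name for name in schema if name not in selected]
    return rest + selected


# Schema of the example frame, with dtype names written as plain strings.
schema = {"a": "Int64", "b": "String", "c": "Float64"}
to_start = move_to_start_model(schema, {"Float64", "String"})
to_end = move_to_end_model(schema, {"String", "Int64"})
print(to_start)  # ['b', 'c', 'a']
print(to_end)    # ['c', 'a', 'b']
```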
ti.make_hyperlink() creates an HTML anchor tag (<a>) from two columns, the link text and the URL:
df = pl.DataFrame({"name": ["GitHub"], "url": ["https://github.com/"]})
df.select(ti.make_hyperlink("name", "url").alias("link"))
shape: (1, 1)
┌──────────────────────────────────────────────────────────┐
│ link                                                     │
│ ---                                                      │
│ str                                                      │
╞══════════════════════════════════════════════════════════╡
│ <a href="https://github.com/" target="_blank">GitHub</a> │
└──────────────────────────────────────────────────────────┘
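The string in the output above follows a fixed template: the URL goes into href, the link opens in a new tab via target="_blank", and the text sits between the tags. A one-row plain-Python model of that template is sketched below. The name make_hyperlink_model is hypothetical, and the sketch does no escaping, since the example gives no indication of how special characters are treated.

```python
def make_hyperlink_model(text: str, url: str) -> str:
    """Format one row's link text and URL as an HTML anchor tag."""
    return f'<a href="{url}" target="_blank">{text}</a>'


link = make_hyperlink_model("GitHub", "https://github.com/")
print(link)  # <a href="https://github.com/" target="_blank">GitHub</a>
```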