74 changes: 68 additions & 6 deletions guides/intro.livemd
@@ -20,7 +20,11 @@ For our example, let's imagine we're building a fitness application that tracks

## Setting up the Repo

To get started, you'll need to set up secrets for connecting to a database. This will also require creating a Postgres database. Try the following commands in your shell:
First and foremost, please ensure you have TimescaleDB installed on your machine using the [installation guide](https://docs.timescale.com/install/latest/self-hosted/).

Some of these examples also require the [Timescale Toolkit](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/install-toolkit/) to be installed alongside PostgreSQL. If your machine supports it, the `timescale/timescaledb-ha` [Docker image](https://docs.timescale.com/install/latest/installation-docker/) is the easiest way to get up and running. However, if you're running on arm64 (e.g. a MacBook M1/M2), you'll need to compile the toolkit on your machine by following [these instructions](https://github.com/timescale/timescaledb-toolkit#-tools-setup).

Once Postgres/Timescale is up and running, you'll also need to set up secrets for connecting to a database, which requires creating a Postgres database first. Try the following commands in your shell:

```shell
$ psql -c 'create database timescale_fitness'
@@ -123,20 +127,21 @@ Repo.migrate({0, Fitness.Repo.Migrations.CreateHeartbeat}, :up)

## Generating mock data

To facilitate our example, let's create two users who are tracking their heartbeats.
To facilitate our example, let's create three users who are tracking their heartbeats.

```elixir
users = [
%{fullname: "Dave Lucia"},
%{fullname: "Alex Koutmos"}
%{fullname: "Alex Koutmos"},
%{fullname: "Peter Ullrich"}
]

Repo.insert_all("users", users, on_conflict: :nothing)

query =
from(u in "users", order_by: [desc: u.fullname], select: %{id: u.id, fullname: u.fullname})

[%{id: dave_id} = dave, %{id: alex_id} = alex] = Repo.all(query)
[%{id: peter_id} = peter, %{id: dave_id} = dave, %{id: alex_id} = alex] = Repo.all(query)
```
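
The destructuring above depends on `order_by: [desc: u.fullname]` returning Peter, then Dave, then Alex. As a minimal sketch of that ordering without a database, the same pattern match can be tried on plain maps. The `id` values here are hypothetical, and a real Postgres collation could order other names differently:

```elixir
# Hypothetical ids; in the guide these are assigned by the database.
users = [
  %{id: 1, fullname: "Dave Lucia"},
  %{id: 2, fullname: "Alex Koutmos"},
  %{id: 3, fullname: "Peter Ullrich"}
]

# Sort by full name, descending, mirroring `order_by: [desc: u.fullname]`.
sorted = Enum.sort_by(users, & &1.fullname, :desc)

# The match only succeeds if exactly three rows come back, in this order.
[%{id: peter_id}, %{id: dave_id}, %{id: alex_id}] = sorted
```

If a fourth user were inserted, the match would raise a `MatchError`, which is a useful reminder that the pattern has to be kept in sync with the `users` list.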

Next, we've built a little module to help us simulate heartbeats for an entire day.
@@ -182,7 +187,7 @@ end

## Generate small dataset

Next, we generate heartbeats for each user, and batch insert them into the database.
Next, we generate heartbeats for one user, and batch insert them into the database.

```elixir
batch_insert.(Fitness.Generator.generate_heartbeats(alex, ~D[2022-09-22]))
@@ -245,7 +250,7 @@ So far, we've only taken a look at a relatively small dataset.
Repo.one(from(h in "heartbeats", select: count(h)))
```

Let's make this way more interesting by generating a month's worth of heartbeats for our two users! As shown above, that should put us around 5 million rows of data. Go grab a soda, as this is gonna take a little while to execute.
Let's make this way more interesting by generating a month's worth of heartbeats for our second user. As shown above, that should put us around 5 million rows of data. Go grab a soda, as this is gonna take a little while to execute.

```elixir
for date <- Date.range(~D[2022-10-01], ~D[2022-10-31]) do
@@ -359,3 +364,60 @@ result8 =
[]
)
```

## Filling gaps in time-series

**Warning: this section requires the [TimescaleDB Toolkit](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/install-toolkit/) to be installed, which may require building from source.**

```elixir
defmodule Fitness.Repo.Migrations.AddTimescaleToolkit do
use Ecto.Migration

import Timescale.Migration

def up do
create_timescaledb_toolkit_extension()
end

def down do
drop_timescaledb_toolkit_extension()
end
end

Repo.migrate({3, Fitness.Repo.Migrations.AddTimescaleToolkit}, :up)
```

Any system can go down. The cause could be a datacenter outage, an attack on global DNS, or simply an application configuration mishap. Whatever the cause, downtime can leave gaps in the data we collect, which poses problems for calculations and aggregations.

Luckily, TimescaleDB provides hyperfunctions for gapfilling and interpolation to smooth over these gaps.

Let's add some heartbeats in for another user every second, skipping over a few.

```elixir
Repo.insert_all("heartbeats", [
%{timestamp: ~N[2022-11-01 09:00:00], user_id: peter_id},
%{timestamp: ~N[2022-11-01 09:00:01], user_id: peter_id},
# No heartbeat at 9:00:02
%{timestamp: ~N[2022-11-01 09:00:03], user_id: peter_id},
# No heartbeat at 9:00:04
%{timestamp: ~N[2022-11-01 09:00:05], user_id: peter_id}
])
```

Now we've got 4 heartbeats, with two missing at seconds 2 and 4. We can use the `time_bucket_gapfill/2` function to fill in heartbeats.

```elixir
Repo.all(
from(h in "heartbeats",
where: h.user_id == ^peter_id,
select: %{
second: selected_as(time_bucket_gapfill(h.timestamp, "1 second"), :second),
user_id: h.user_id
},
where:
h.timestamp >= ^~N[2022-11-01 09:00:00] and
h.timestamp <= ^~N[2022-11-01 09:00:05],
group_by: [selected_as(:second), h.user_id]
)
)
```
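
To make the idea concrete, here is a minimal pure-Elixir sketch of what gapfilling does conceptually. It is not how TimescaleDB implements `time_bucket_gapfill`, and the `GapfillSketch` module and its `fill/3` function are hypothetical names used only for illustration. The sketch walks every one-second bucket between a start and finish timestamp (assuming `finish` is not before `start`) and reports `nil` for buckets that contain no heartbeats:

```elixir
defmodule GapfillSketch do
  @doc """
  Returns one `{bucket, count}` tuple per second from `start` to `finish`
  (inclusive). Buckets with no timestamps get `nil` as their count.
  """
  def fill(timestamps, start, finish) do
    # Count heartbeats per bucket, keyed by whole seconds elapsed since `start`.
    counts = Enum.frequencies_by(timestamps, &NaiveDateTime.diff(&1, start, :second))
    seconds = NaiveDateTime.diff(finish, start, :second)

    # Emit every bucket in the window, whether or not data exists for it.
    for offset <- 0..seconds do
      {NaiveDateTime.add(start, offset, :second), Map.get(counts, offset)}
    end
  end
end

# The same four heartbeats inserted above, with seconds 2 and 4 missing.
observed = [
  ~N[2022-11-01 09:00:00],
  ~N[2022-11-01 09:00:01],
  ~N[2022-11-01 09:00:03],
  ~N[2022-11-01 09:00:05]
]

filled = GapfillSketch.fill(observed, ~N[2022-11-01 09:00:00], ~N[2022-11-01 09:00:05])
```

The key point is that the list of buckets comes from the time window rather than from the rows, which is why the query above needs explicit lower and upper bounds on `h.timestamp`.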
14 changes: 14 additions & 0 deletions lib/timescale/hyperfunctions.ex
@@ -99,4 +99,18 @@ defmodule Timescale.Hyperfunctions do
:origin
])
end

@doc """
Works similarly to `time_bucket/2` but also activates gap-filling for the interval inferred
by the where clause of the query.

[Documentation](https://docs.timescale.com/api/latest/hyperfunctions/gapfilling-interpolation/time_bucket_gapfill/)
"""
defmacro time_bucket_gapfill(field, time_bucket) do


Would love to see time_bucket_gapfill/2 get a time_bucket_gapfill/3 sibling that takes options in parity with time_bucket/3, as I have a use case of passing timezone to my fragments 🙏

@vanderhoop vanderhoop May 16, 2023

Ah, turns out time_bucket/3 doesn't actually support the timescale implementation's timezone option. my use case calls for timezone to be passed as a positional argument to time_bucket_gapfill. gonna play around with a fragment macro and see if I can get it to blend locally

@vanderhoop vanderhoop May 16, 2023

Okay, even with a working fragment, postgres barfs because my timestamp is a timestamp without timezone, which I'd forgotten about, and which has sent me down a rabbit hole of postgres/ecto timestamp storage/casting articles.

Disregard, though I may report back!

Owner Author

@vanderhoop now that I know you're looking for this, I'll try to get this cleaned up and merged in

# quote do
# fragment("time_bucket_gapfill(?, ?)", unquote(time_bucket), unquote(field))
# end

dynamic_function_fragment(:time_bucket_gapfill, [time_bucket, field], [], [])
end
end
18 changes: 18 additions & 0 deletions lib/timescale/migration.ex
@@ -23,6 +23,24 @@ defmodule Timescale.Migration do
end
end

@doc """
Adds the [TimescaleDB toolkit](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/install-toolkit/#install-and-update-timescaledb-toolkit) as a Postgres Extension
"""
defmacro create_timescaledb_toolkit_extension do
quote do
Ecto.Migration.execute("CREATE EXTENSION IF NOT EXISTS timescaledb_toolkit CASCADE")
end
end

@doc """
Drops the [TimescaleDB toolkit](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/install-toolkit/#install-and-update-timescaledb-toolkit) as a Postgres Extension
"""
defmacro drop_timescaledb_toolkit_extension do
quote do
Ecto.Migration.execute("DROP EXTENSION IF EXISTS timescaledb_toolkit CASCADE")
end
end

@doc """
Creates a new [hypertable](https://docs.timescale.com/api/latest/hypertable/create_hypertable/#create-hypertable) in an Ecto Migration.

1 change: 1 addition & 0 deletions priv/repo/migrations/20220826223234_setup_timescale.exs
@@ -5,5 +5,6 @@ defmodule TimescaleApp.Repo.Migrations.SetupTimescale do

def change do
create_timescaledb_extension()
create_timescaledb_toolkit_extension()
end
end
7 changes: 7 additions & 0 deletions test/timescale/hyperfunction_test.exs
@@ -64,4 +64,11 @@ defmodule Timescale.HyperfunctionTest do
~s[SELECT time_bucket('5 minutes', t0."timestamp", offset => '2.5 minutes', origin => '1900-01-01') FROM "test_hypertable" AS t0]
)
end

test "time_bucket_gapfill/2 generates a valid query" do
assert_sql(
from(t in Table, select: time_bucket_gapfill(t.timestamp, "5 minutes")),
~s[SELECT time_bucket_gapfill('5 minutes', t0."timestamp") FROM "test_hypertable" AS t0]
)
end
end
32 changes: 32 additions & 0 deletions test/timescale/integration_test.exs
@@ -102,6 +102,38 @@ defmodule Timescale.IntegrationTest do
end
end

describe "time_bucket_gapfill/2" do
test "fills in gaps based on the where clause" do
naive_fixture(0.0, ~N[1989-09-22 12:00:00.000000])
naive_fixture(2.0, ~N[1989-09-22 12:02:00.000000])
naive_fixture(5.0, ~N[1989-09-22 12:05:00.000000])

start = ~N[1989-09-22 12:00:00.000000]
finish = ~N[1989-09-22 12:05:00.000000]

# query =
# from(t in Table,
# select: %{
# minute: selected_as(time_bucket_gapfill(t.timestamp, "2 minutes"), :minute)
# }
# # where: t.timestamp >= ^start and t.timestamp <= ^finish,
# # group_by: selected_as(:minute)
# )

query = from(t in Table)

assert Repo.all(query) ==
[
{0.0, ~N[1989-09-22 12:00:00.000000]},
{1.0, ~N[1989-09-22 12:00:00.000000]},
{2.0, ~N[1989-09-22 12:02:00.000000]},
{3.0, ~N[1989-09-22 12:02:00.000000]},
{4.0, ~N[1989-09-22 12:04:00.000000]},
{5.0, ~N[1989-09-22 12:04:00.000000]}
]
end
end

def naive_fixture(value, timestamp \\ NaiveDateTime.utc_now()) do
Repo.insert!(%Table{field: value, timestamp: timestamp})
end