diff --git a/guides/intro.livemd b/guides/intro.livemd
index 27c4168..50f256d 100644
--- a/guides/intro.livemd
+++ b/guides/intro.livemd
@@ -20,7 +20,11 @@ For our example, let's imagine we're building a fitness application that tracks

 ## Setting up the Repo

-To get started, you'll need to set up secrets for connecting to a database. This will also require creating a Postgres database. Try the following commands in your shell:
+First, make sure TimescaleDB is installed on your machine by following the [installation guide](https://docs.timescale.com/install/latest/self-hosted/).
+
+Some of these examples also require the [TimescaleDB Toolkit](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/install-toolkit/) to be installed alongside PostgreSQL. If your machine supports it, the `timescale/timescaledb-ha` [Docker image](https://docs.timescale.com/install/latest/installation-docker/) is the easiest way to get up and running. However, if you're running on arm64 (e.g. a MacBook with an M1/M2 chip), you'll need to compile the toolkit on your machine by following [these instructions](https://github.com/timescale/timescaledb-toolkit#-tools-setup).
+
+Once Postgres/Timescale is up and running, you'll need to create a Postgres database and set up secrets for connecting to it. Try the following commands in your shell:

 ```shell
 $ psql -c 'create database timescale_fitness'
@@ -123,12 +127,13 @@ Repo.migrate({0, Fitness.Repo.Migrations.CreateHeartbeat}, :up)

 ## Generating mock data

-To facilitate our example, let's create two users who are tracking their heartbeats.
+To facilitate our example, let's create three users who are tracking their heartbeats.

 ```elixir
 users = [
   %{fullname: "Dave Lucia"},
-  %{fullname: "Alex Koutmos"}
+  %{fullname: "Alex Koutmos"},
+  %{fullname: "Peter Ullrich"}
 ]

 Repo.insert_all("users", users, on_conflict: :nothing)
@@ -136,7 +141,7 @@ Repo.insert_all("users", users, on_conflict: :nothing)
 query =
   from(u in "users", order_by: [desc: u.fullname], select: %{id: u.id, fullname: u.fullname})

-[%{id: dave_id} = dave, %{id: alex_id} = alex] = Repo.all(query)
+[%{id: peter_id} = peter, %{id: dave_id} = dave, %{id: alex_id} = alex] = Repo.all(query)
 ```

 Next, we've built a little module to help us simulate heartbeats for an entire day.
@@ -182,7 +187,7 @@ end

 ## Generate small dataset

-Next, we generate heartbeats for each user, and batch insert them into the database.
+Next, we generate heartbeats for one user, and batch insert them into the database.

 ```elixir
 batch_insert.(Fitness.Generator.generate_heartbeats(alex, ~D[2022-09-22]))
@@ -245,7 +250,7 @@ So far, we've only taken a look at a relatively small dataset.
 Repo.one(from(h in "heartbeats", select: count(h)))
 ```

-Let's make this way more interesting by generating a months worth of heartbeats for our two users! As shown above, that should put us around 5 million rows of data. Go grab a soda, as this is gonna take a little while to execute.
+Let's make this way more interesting by generating a month's worth of heartbeats for our second user. As shown above, that should put us around 5 million rows of data. Go grab a soda, as this is gonna take a little while to execute.

 ```elixir
 for date <- Date.range(~D[2022-10-01], ~D[2022-10-31]) do
@@ -359,3 +364,60 @@ result8 =
     []
   )
 ```
+
+## Filling gaps in time-series
+
+**Warning: this section requires the [TimescaleDB Toolkit](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/install-toolkit/) to be installed, which may require building from source.**
+
+```elixir
+defmodule Fitness.Repo.Migrations.AddTimescaleToolkit do
+  use Ecto.Migration
+
+  import Timescale.Migration
+
+  def up do
+    create_timescaledb_toolkit_extension()
+  end
+
+  def down do
+    drop_timescaledb_toolkit_extension()
+  end
+end
+
+Repo.migrate({3, Fitness.Repo.Migrations.AddTimescaleToolkit}, :up)
+```
+
+Any system can go down. The cause might be a datacenter outage, an attack on global DNS, or simply an application configuration mishap. Whatever the cause, downtime can leave gaps in the data we collect, which poses problems for calculations and aggregations.
+
+Luckily, TimescaleDB provides hyperfunctions for gapfilling and interpolation to smooth over these gaps.
+
+Let's add some heartbeats for another user, one per second, skipping over a few.
+
+```elixir
+Repo.insert_all("heartbeats", [
+  %{timestamp: ~N[2022-11-01 09:00:00], user_id: peter_id},
+  %{timestamp: ~N[2022-11-01 09:00:01], user_id: peter_id},
+  # No heartbeat at 9:00:02
+  %{timestamp: ~N[2022-11-01 09:00:03], user_id: peter_id},
+  # No heartbeat at 9:00:04
+  %{timestamp: ~N[2022-11-01 09:00:05], user_id: peter_id}
+])
+```
+
+Now we've got 4 heartbeats, with two missing at seconds 2 and 4. We can use the `time_bucket_gapfill/2` function to fill in the gaps.
+
+```elixir
+Repo.all(
+  from(h in "heartbeats",
+    where: h.user_id == ^peter_id,
+    select: %{
+      second: selected_as(time_bucket_gapfill(h.timestamp, "1 second"), :second),
+      user_id: h.user_id
+    },
+    where:
+      h.timestamp >= ^~N[2022-11-01 09:00:00] and
+        h.timestamp <= ^~N[2022-11-01 09:00:05],
+    group_by: [selected_as(:second), h.user_id]
+  )
+)
+```
diff --git a/lib/timescale/hyperfunctions.ex b/lib/timescale/hyperfunctions.ex
index 7e4b699..e4c5415 100644
--- a/lib/timescale/hyperfunctions.ex
+++ b/lib/timescale/hyperfunctions.ex
@@ -99,4 +99,18 @@ defmodule Timescale.Hyperfunctions do
       :origin
     ])
   end
+
+  @doc """
+  Works similarly to `time_bucket/2` but also activates gap-filling for the interval inferred
+  by the where clause of the query.
+
+  [Documentation](https://docs.timescale.com/api/latest/hyperfunctions/gapfilling-interpolation/time_bucket_gapfill/)
+  """
+  defmacro time_bucket_gapfill(field, time_bucket) do
+    # quote do
+    #   fragment("time_bucket_gapfill(?, ?)", unquote(time_bucket), unquote(field))
+    # end
+
+    dynamic_function_fragment(:time_bucket_gapfill, [time_bucket, field], [], [])
+  end
 end
diff --git a/lib/timescale/migration.ex b/lib/timescale/migration.ex
index 4276974..54438af 100644
--- a/lib/timescale/migration.ex
+++ b/lib/timescale/migration.ex
@@ -23,6 +23,24 @@ defmodule Timescale.Migration do
     end
   end

+  @doc """
+  Adds the [TimescaleDB toolkit](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/install-toolkit/#install-and-update-timescaledb-toolkit) as a Postgres Extension
+  """
+  defmacro create_timescaledb_toolkit_extension do
+    quote do
+      Ecto.Migration.execute("CREATE EXTENSION IF NOT EXISTS timescaledb_toolkit CASCADE")
+    end
+  end
+
+  @doc """
+  Drops the [TimescaleDB toolkit](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/install-toolkit/#install-and-update-timescaledb-toolkit) as a Postgres Extension
+  """
+  defmacro drop_timescaledb_toolkit_extension do
+    quote do
+      Ecto.Migration.execute("DROP EXTENSION IF EXISTS timescaledb_toolkit CASCADE")
+    end
+  end
+
   @doc """
   Creates a new [hypertable](https://docs.timescale.com/api/latest/hypertable/create_hypertable/#create-hypertable) in an Ecto Migration.

diff --git a/priv/repo/migrations/20220826223234_setup_timescale.exs b/priv/repo/migrations/20220826223234_setup_timescale.exs
index c26c3e3..4134901 100644
--- a/priv/repo/migrations/20220826223234_setup_timescale.exs
+++ b/priv/repo/migrations/20220826223234_setup_timescale.exs
@@ -5,5 +5,6 @@ defmodule TimescaleApp.Repo.Migrations.SetupTimescale do

   def change do
     create_timescaledb_extension()
+    create_timescaledb_toolkit_extension()
   end
 end
diff --git a/test/timescale/hyperfunction_test.exs b/test/timescale/hyperfunction_test.exs
index a5579af..f1182d6 100644
--- a/test/timescale/hyperfunction_test.exs
+++ b/test/timescale/hyperfunction_test.exs
@@ -64,4 +64,11 @@ defmodule Timescale.HyperfunctionTest do
       ~s[SELECT time_bucket('5 minutes', t0."timestamp", offset => '2.5 minutes', origin => '1900-01-01') FROM "test_hypertable" AS t0]
     )
   end
+
+  test "time_bucket_gapfill/2 generates a valid query" do
+    assert_sql(
+      from(t in Table, select: time_bucket_gapfill(t.timestamp, "5 minutes")),
+      ~s[SELECT time_bucket_gapfill('5 minutes', t0."timestamp") FROM "test_hypertable" AS t0]
+    )
+  end
 end
diff --git a/test/timescale/integration_test.exs b/test/timescale/integration_test.exs
index 5df4d83..4b58b28 100644
--- a/test/timescale/integration_test.exs
+++ b/test/timescale/integration_test.exs
@@ -102,6 +102,38 @@ defmodule Timescale.IntegrationTest do
     end
   end

+  describe "time_bucket_gapfill/2" do
+    test "fills in gaps based on the where clause" do
+      naive_fixture(0.0, ~N[1989-09-22 12:00:00.000000])
+      naive_fixture(2.0, ~N[1989-09-22 12:02:00.000000])
+      naive_fixture(5.0, ~N[1989-09-22 12:05:00.000000])
+
+      start = ~N[1989-09-22 12:00:00.000000]
+      finish = ~N[1989-09-22 12:05:00.000000]
+
+      # query =
+      #   from(t in Table,
+      #     select: %{
+      #       minute: selected_as(time_bucket_gapfill(t.timestamp, "2 minutes"), :minute)
+      #     }
+      #     # where: t.timestamp >= ^start and t.timestamp <= ^finish,
+      #     # group_by: selected_as(:minute)
+      #   )
+
+      query = from(t in Table)
+
+      assert Repo.all(query) ==
+               [
+                 {0.0, ~N[1989-09-22 12:00:00.000000]},
+                 {1.0, ~N[1989-09-22 12:00:00.000000]},
+                 {2.0, ~N[1989-09-22 12:02:00.000000]},
+                 {3.0, ~N[1989-09-22 12:02:00.000000]},
+                 {4.0, ~N[1989-09-22 12:04:00.000000]},
+                 {5.0, ~N[1989-09-22 12:04:00.000000]}
+               ]
+    end
+  end
+
   def naive_fixture(value, timestamp \\ NaiveDateTime.utc_now()) do
     Repo.insert!(%Table{field: value, timestamp: timestamp})
   end
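
For readers of this change, the idea behind the guide's gap-filling section can be sketched in plain Elixir, without a database. This is only an in-memory illustration of what "filling gaps" means for one-second buckets; it is not how TimescaleDB implements `time_bucket_gapfill`, and the `GapfillSketch` module and its `fill/3` function are hypothetical names that do not exist in this library.

```elixir
defmodule GapfillSketch do
  @moduledoc """
  Hypothetical helper (not part of the Timescale library): mimics
  per-second gap filling in memory to show the shape of the result.
  """

  # Returns one {bucket, count} tuple for every second from `start`
  # to `finish` (both inclusive). Seconds without any timestamp get
  # a count of nil instead of being dropped from the result.
  def fill(timestamps, start, finish) do
    # Count timestamps per whole-second offset from `start`.
    counts = Enum.frequencies_by(timestamps, &NaiveDateTime.diff(&1, start, :second))

    seconds = NaiveDateTime.diff(finish, start, :second)

    for offset <- 0..seconds//1 do
      {NaiveDateTime.add(start, offset, :second), Map.get(counts, offset)}
    end
  end
end

# The same four heartbeats the guide inserts for the third user.
heartbeats = [
  ~N[2022-11-01 09:00:00],
  ~N[2022-11-01 09:00:01],
  ~N[2022-11-01 09:00:03],
  ~N[2022-11-01 09:00:05]
]

filled = GapfillSketch.fill(heartbeats, ~N[2022-11-01 09:00:00], ~N[2022-11-01 09:00:05])
IO.inspect(filled)
```

The result contains six buckets, and the two for seconds 2 and 4 carry a `nil` count rather than being absent. A plain `group_by` over the raw rows would return only the four seconds that have data.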