knowledge/technology/applications/development/TimescaleDB.md
2024-09-30 13:37:38 +02:00

obj repo website rev
application https://github.com/timescale/timescaledb https://www.timescale.com 2024-09-30

TimescaleDB

TimescaleDB is an open-source time-series database built on PostgreSQL, designed to handle large volumes of time-series data efficiently. It provides powerful data management features, making it suitable for applications in various domains such as IoT, finance, and analytics.

Features:

  • Hypertables: Hypertables, the backbone of TimescaleDB, automatically partition data by time, which simplifies the management of very large datasets.
  • Continuous Aggregates: This feature enables the pre-computation and storage of aggregate data, significantly speeding up query times for common analytical operations.
  • Data Compression: TimescaleDB employs sophisticated compression techniques to reduce storage footprint without compromising query performance.
  • Optimized Indexing: With its advanced indexing strategies, including multi-dimensional and time-based indexing, TimescaleDB ensures rapid query responses, making it highly efficient for time-series data.

Installation

Create the extension in your database:

CREATE EXTENSION IF NOT EXISTS timescaledb;

Hypertables

Hypertables are PostgreSQL tables that automatically partition your data by time. You interact with hypertables in the same way as regular PostgreSQL tables, but with extra features that make managing your time-series data much easier.

In Timescale, hypertables exist alongside regular PostgreSQL tables. Use hypertables to store time-series data. This gives you improved insert and query performance, and access to useful time-series features. Use regular PostgreSQL tables for other relational data.

With hypertables, Timescale makes it easy to improve insert and query performance by partitioning time-series data on its time parameter. Behind the scenes, the database performs the work of setting up and maintaining the hypertable's partitions. Meanwhile, you insert and query your data as if it all lives in a single, regular PostgreSQL table.

Create a hypertable:

  • Create a standard PostgreSQL table:
CREATE TABLE conditions (
   time        TIMESTAMPTZ       NOT NULL,
   location    TEXT              NOT NULL,
   device      TEXT              NOT NULL,
   temperature DOUBLE PRECISION  NULL,
   humidity    DOUBLE PRECISION  NULL
);
  • Convert the table to a hypertable. Specify the name of the table you want to convert, and the column that holds its time values.
SELECT create_hypertable('conditions', by_range('time'));
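
As a minimal sketch of the point above that a hypertable is used like a regular table, the statements below insert and query rows in the conditions hypertable with plain SQL. The sample values ('office', 'sensor-1', the timestamps and readings) are hypothetical and chosen only for illustration.

```sql
-- Insert rows exactly as with a regular PostgreSQL table
-- (sample values are hypothetical).
INSERT INTO conditions (time, location, device, temperature, humidity)
VALUES
    ('2024-09-30 12:00:00+00', 'office', 'sensor-1', 21.5, 40.2),
    ('2024-09-30 12:05:00+00', 'office', 'sensor-1', 21.7, 40.0);

-- Query with ordinary SQL; the time predicate lets the planner
-- skip partitions that fall outside the requested range.
SELECT time, device, temperature
FROM conditions
WHERE time >= '2024-09-30 00:00:00+00'
  AND location = 'office'
ORDER BY time DESC
LIMIT 10;
```

Filtering on the time column is what allows TimescaleDB to restrict the query to the relevant partitions, so it is worth including whenever the range is known.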

Hyperfunctions

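Hyperfunctions are SQL functions for querying and aggregating time-series data. Some, such as time_bucket, first, and last, ship with TimescaleDB itself; others are provided by the separate timescaledb_toolkit extension.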

delta

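The delta accessor computes the change in a value over a time range. It helps in understanding how a metric (e.g., temperature, stock price, etc.) changes between readings. In the TimescaleDB Toolkit it is applied to a summary built by counter_agg (for monotonically increasing counters) or gauge_agg (for values that rise and fall, such as temperature), not to a raw column.

Example: Calculate Temperature Change Over a Day

-- Requires the timescaledb_toolkit extension; in older Toolkit releases
-- gauge_agg may only be available in the toolkit_experimental schema.
SELECT
    delta(gauge_agg(time, temperature)) AS temp_change
FROM temperature_readings
WHERE time BETWEEN '2023-09-01' AND '2023-09-02';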

derivative

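TimescaleDB does not document a hyperfunction named derivative. A rate of change (derivative) of a series can instead be approximated by differencing consecutive time buckets with the standard PostgreSQL lag() window function. For counter data, the Toolkit's rate() accessor on a counter_agg summary is an alternative.

Example: Calculate the Rate of Temperature Change Per Hour

-- With one-hour buckets and no gaps, the difference between buckets is the change per hour.
SELECT
    time_bucket('1 hour', time) AS hour,
    avg(temperature) - lag(avg(temperature)) OVER (ORDER BY time_bucket('1 hour', time)) AS temp_rate_change
FROM temperature_readings
GROUP BY hour
ORDER BY hour;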

first & last

The first and last hyperfunctions return the first and last recorded values within a specified period.

SELECT
    time_bucket('1 day', time) AS day,
    first(stock_price, time) AS opening_price,
    last(stock_price, time) AS closing_price
FROM stock_prices
GROUP BY day
ORDER BY day;

locf

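The locf (Last Observation Carried Forward) function fills missing data by carrying the last known observation forward to the missing timestamps. It only works together with time_bucket_gapfill, which generates the empty buckets to fill and needs a bounded time range in the WHERE clause.

SELECT
    time_bucket_gapfill('1 hour', time) AS hour,
    locf(last(temperature, time)) AS filled_temperature
FROM temperature_readings
WHERE time >= '2023-09-01' AND time < '2023-09-02'
GROUP BY hour
ORDER BY hour;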

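interpolated_average

The interpolated_average hyperfunction, part of the Toolkit's time_weight family, computes the time-weighted average over each bucket and interpolates the values at the bucket boundaries from the neighboring buckets. It takes a summary produced by time_weight, the bucket start, the bucket width, and optionally the summaries of the previous and next buckets. It requires the timescaledb_toolkit extension.

SELECT hour,
    interpolated_average(tws, hour, '1 hour',
        lag(tws) OVER (ORDER BY hour),
        lead(tws) OVER (ORDER BY hour)) AS interpolated_power
FROM (
    SELECT time_bucket('1 hour', time) AS hour,
           time_weight('Linear', time, power_usage) AS tws
    FROM power_data
    WHERE time BETWEEN '2023-09-01' AND '2023-09-07'
    GROUP BY hour
) AS buckets
ORDER BY hour;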

time_bucket

The time_bucket hyperfunction is essential when you want to analyze or summarize data over time-based intervals, such as calculating daily averages, hourly sums, or other time-bound statistics.

SELECT
    time_bucket('1 hour', time) AS bucketed_time,
    avg(cpu_usage) AS avg_cpu_usage
FROM server_metrics
WHERE time BETWEEN '2023-09-01' AND '2023-09-30'
GROUP BY bucketed_time
ORDER BY bucketed_time;
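
time_bucket is also the building block for the continuous aggregates mentioned under Features. The following is a minimal sketch, assuming the conditions hypertable from the Hypertables section; the view name conditions_hourly and the policy intervals are hypothetical choices for illustration, not recommended settings.

```sql
-- Pre-compute hourly averages per device as a continuous aggregate.
-- The view name conditions_hourly is a hypothetical example.
CREATE MATERIALIZED VIEW conditions_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    device,
    avg(temperature) AS avg_temperature
FROM conditions
GROUP BY bucket, device;

-- Refresh the aggregate every hour, covering data between one day
-- and one hour old (intervals are illustrative assumptions).
SELECT add_continuous_aggregate_policy('conditions_hourly',
    start_offset      => INTERVAL '1 day',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');

-- Query the aggregate like any other table or view.
SELECT bucket, device, avg_temperature
FROM conditions_hourly
WHERE bucket >= now() - INTERVAL '1 day'
ORDER BY bucket;
```

Note that creating a continuous aggregate cannot run inside a transaction block, so the CREATE MATERIALIZED VIEW statement should be executed on its own.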