merge postgres

JMARyA 2024-09-30 13:40:22 +02:00
commit 8bc618e9ea
Signed by: jmarya
GPG key ID: 901B2ADDF27C2263
6 changed files with 672 additions and 0 deletions


@@ -53,10 +53,12 @@ rev: 2024-07-14
- [HTTPie](./development/HTTPie.md)
- [MongoDB Compass](./development/MongoDB%20Compass.md)
- [MongoDB](./development/MongoDB.md)
- [Postgres](./development/Postgres.md)
- [Podman Desktop](./development/Podman%20Desktop.md)
- [Visual Studio Code](./development/Visual%20Studio%20Code.md)
- [continue](./development/continue.md)
- [psequel](development/psequel.md)
- [PostgreSQL](development/Postgres.md)
## Documents
- [Tachiyomi](./documents/Tachiyomi.md)


@@ -0,0 +1,103 @@
---
obj: application
wiki: https://en.wikipedia.org/wiki/PostGIS
repo: https://git.osgeo.org/gitea/postgis/postgis
website: https://postgis.net
rev: 2024-09-30
---
# PostGIS
PostGIS is a spatial database extender for PostgreSQL. It adds support for geographic objects, so PostgreSQL can be used as a spatial database for geographic information systems (GIS): it can store spatial data, index it, and run geographic operations on it directly in SQL.
PostGIS offers the following key features:
- **Geometry and Geography Types**: PostGIS supports two primary types of spatial objects: `Geometry` (for Cartesian coordinates) and `Geography` (for geodetic coordinates).
- **Spatial Indexing**: Support for R-tree-based spatial indexing using GiST (Generalized Search Tree) indexes.
- **Spatial Relationships and Measurements**: Functions to perform spatial analysis, including distance calculations, intersections, unions, and more.
- **3D and 4D Coordinates**: Support for 3D geometries (with Z values) and 4D (with M values for measures).
- **Raster and Vector Data**: PostGIS allows for the handling of both raster (pixel-based) and vector (coordinate-based) spatial data.
- **WKT, WKB, GeoJSON Support**: PostGIS supports common geographic data formats like Well-Known Text (WKT), Well-Known Binary (WKB), and GeoJSON.
## Enable PostGIS in a PostgreSQL Database
After installation, enable PostGIS in a specific database with `CREATE EXTENSION`. Only `postgis` is required; `postgis_topology` is optional and adds topology support:
```sql
CREATE EXTENSION postgis;
CREATE EXTENSION postgis_topology;
```
## Spatial Data Types
PostGIS introduces several spatial data types. The two most commonly used types are:
### 1. `Geometry`
Represents geometric shapes in a Cartesian (planar) coordinate system.
```sql
CREATE TABLE my_table (
id SERIAL PRIMARY KEY,
geom GEOMETRY(Point, 4326)
);
```
### 2. `Geography`
Represents geographic shapes in geodetic coordinates (longitude and latitude) on a spheroid. Measurements on `geography` values are returned in meters.
```sql
CREATE TABLE my_geo_table (
id SERIAL PRIMARY KEY,
geom GEOGRAPHY(POINT, 4326)
);
```
Both types accept a subtype that restricts the kind of shape stored in the column, such as:
- `POINT`
- `LINESTRING`
- `POLYGON`
- `MULTIPOINT`
- `MULTILINESTRING`
- `MULTIPOLYGON`
Each of these types can be used in both `GEOMETRY` and `GEOGRAPHY` contexts.
## Spatial Functions
PostGIS provides a vast library of spatial functions for querying and manipulating spatial data. Some important functions include:
### Distance
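Calculates the minimum distance between two geometries. For `geometry` inputs the result is in the units of the spatial reference system, so for SRID 4326 it is in degrees; use `geography` values for a result in meters.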
```sql
SELECT ST_Distance(
ST_GeomFromText('POINT(0 0)', 4326),
ST_GeomFromText('POINT(1 1)', 4326)
);
```
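To make the unit difference concrete, here is a minimal Python sketch of the two kinds of distance for the points above. It is not PostGIS code: `planar_distance` and `haversine_m` are hypothetical helper names, and the haversine formula assumes a spherical Earth, so it only approximates a spheroid-based `geography` result.

```python
import math

def planar_distance(p, q):
    """Cartesian distance between two (x, y) points, in coordinate units."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def haversine_m(p, q, radius_m=6_371_008.8):
    """Great-circle distance in metres between two (lon, lat) points in degrees.

    Assumption: a sphere with a mean Earth radius, not a spheroid.
    """
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))

origin, target = (0.0, 0.0), (1.0, 1.0)
degrees = planar_distance(origin, target)  # planar result, in degrees for SRID 4326
metres = haversine_m(origin, target)       # spherical approximation, in metres
print(f"{degrees:.4f} degrees, about {metres / 1000:.0f} km")
```

The planar value is the square root of 2 in degrees, while the same pair of points is roughly 157 km apart on the ground.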
### Intersection
Returns the intersection of two geometries.
```sql
SELECT ST_Intersection(
ST_GeomFromText('LINESTRING(0 0, 2 2)', 4326),
ST_GeomFromText('LINESTRING(0 2, 2 0)', 4326)
);
```
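For the two line strings above, the result is the single point where they cross. A rough Python sketch of that computation follows; `segment_intersection` is a hypothetical helper that only covers the simple crossing case, whereas `ST_Intersection` handles arbitrary geometries.

```python
def segment_intersection(a1, a2, b1, b2):
    """Return the crossing point of segments a1-a2 and b1-b2, or None.

    Parallel and collinear segments are reported as None in this sketch.
    """
    (x1, y1), (x2, y2) = a1, a2
    (x3, y3), (x4, y4) = b1, b2
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        return None  # parallel or collinear
    # t and u are the positions of the crossing along each segment (0..1).
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    u = ((x1 - x3) * (y1 - y2) - (y1 - y3) * (x1 - x2)) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

crossing = segment_intersection((0, 0), (2, 2), (0, 2), (2, 0))
print(crossing)
```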
### Contains
Checks if one geometry contains another.
```sql
SELECT ST_Contains(
ST_GeomFromText('POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))', 4326),
ST_GeomFromText('POINT(1 1)', 4326)
);
```
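The containment test can be pictured with the classic ray-casting approach. The Python sketch below is an illustration, not how PostGIS implements `ST_Contains`; `point_in_ring` is a hypothetical name, the ring must be closed (first point repeated at the end), and points exactly on the boundary are not handled specially.

```python
def point_in_ring(point, ring):
    """Ray casting: count how often a ray going right from the point crosses the ring."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (0, 2), (2, 2), (2, 0), (0, 0)]
print(point_in_ring((1, 1), square))
```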
### Area
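Calculates the area of a polygon. For `geometry` inputs the result is in squared units of the spatial reference system (square degrees for SRID 4326).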
```sql
SELECT ST_Area(
ST_GeomFromText('POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))', 4326)
);
```
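For a simple planar polygon, the same number can be reproduced with the shoelace formula. A short Python sketch, where `ring_area` is a hypothetical helper and the ring is assumed to be closed and without holes:

```python
def ring_area(ring):
    """Shoelace formula for a closed ring of (x, y) points."""
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2

square = [(0, 0), (0, 2), (2, 2), (2, 0), (0, 0)]
print(ring_area(square))
```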


@@ -0,0 +1,286 @@
---
obj: application
website: https://www.postgresql.org
repo: https://git.postgresql.org/gitweb/?p=postgresql.git
---
# Postgres
PostgreSQL is an advanced, open-source, object-relational database management system. It is renowned for its scalability, reliability, and compliance with the SQL standard. PostgreSQL supports both SQL (relational) and JSON (non-relational) querying, making it highly versatile.
## Extensions
PostgreSQL can be extended via extensions:
- [TimescaleDB](./TimescaleDB.md) - Time-series data
- [pgVector](./pgvector.md) - Vector database functions
- [PostGIS](./PostGIS.md) - Spatial data
## psql
**psql** is a terminal-based front end to PostgreSQL. It allows users to interact with PostgreSQL databases by executing SQL queries, managing database objects, and performing administrative tasks.
To start psql, open your terminal or command prompt and type:
```bash
psql
```
### Connecting to a Database
You can specify the database name, user, host, and port when launching psql:
```bash
psql -d database_name -U username -h hostname -p port
```
Alternatively, you can use environment variables:
```bash
export PGDATABASE=mydb
export PGUSER=myuser
export PGPASSWORD=mypassword
export PGHOST=localhost
export PGPORT=5432
psql
```
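psql also accepts a connection URI in place of the individual flags. The Python sketch below builds such a URI from the same `PG*` variables. `build_psql_uri` is a hypothetical helper, the fallback values are assumptions made for this sketch (they are not libpq's real defaults), and the password is deliberately left out of the URI.

```python
import os
from urllib.parse import quote

def build_psql_uri(env=None):
    """Build a postgresql:// URI from PG* variables, without the password."""
    env = os.environ if env is None else env
    # Fallbacks below are assumptions for illustration only.
    user = env.get("PGUSER", "postgres")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    dbname = env.get("PGDATABASE", user)
    # Percent-encode user and database names so special characters stay valid.
    return f"postgresql://{quote(user, safe='')}@{host}:{port}/{quote(dbname, safe='')}"

example = build_psql_uri({
    "PGUSER": "myuser", "PGHOST": "localhost",
    "PGPORT": "5432", "PGDATABASE": "mydb",
})
print(example)
```

The printed URI can be passed directly as an argument, for example `psql postgresql://myuser@localhost:5432/mydb`.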
### Listing Databases and Tables
- **List Databases:**
```sql
\l
```
- **List Tables in the Current Database:**
```sql
\dt
```
- **List All Schemas:**
```sql
\dn
```
### Creating and Dropping Databases/Tables
- **Create a Database:**
```sql
CREATE DATABASE mydb;
```
- **Drop a Database:**
```sql
DROP DATABASE mydb;
```
- **Create a Table:**
```sql
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) NOT NULL,
email VARCHAR(100) NOT NULL
);
```
- **Drop a Table:**
```sql
DROP TABLE users;
```
### Running SQL Queries
Execute standard SQL commands to interact with your data.
**Example: Inserting Data**
```sql
INSERT INTO users (username, email) VALUES ('john_doe', 'john@example.com');
```
**Example: Querying Data**
```sql
SELECT * FROM users;
```
**Example: Updating Data**
```sql
UPDATE users SET email = 'john.doe@example.com' WHERE username = 'john_doe';
```
**Example: Deleting Data**
```sql
DELETE FROM users WHERE username = 'john_doe';
```
## Meta-Commands
psql provides a set of meta-commands (prefixed with `\`) that facilitate various tasks.
### Common Meta-Commands
- **Help on Meta-Commands:**
```sql
\?
```
- **Help on SQL Commands:**
```sql
\h
```
- **Describe a Table:**
```sql
\d table_name
```
- **List All Tables, Views, and Sequences:**
```sql
\d
```
- **List All Indexes:**
```sql
\di
```
- **Exit psql:**
```sql
\q
```
## Data Types
### 1. **Numeric Types**
- **Small Integer Types**
  - `SMALLINT` (2 bytes): Range from -32,768 to +32,767
- **Integer Types**
  - `INTEGER` or `INT` (4 bytes): Range from -2,147,483,648 to +2,147,483,647
- **Big Integer Types**
  - `BIGINT` (8 bytes): Range from -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807
- **Decimal/Exact Types**
  - `DECIMAL` or `NUMERIC` (variable size): User-defined precision and scale
- **Floating-Point Types**
  - `REAL` (4 bytes): Single precision floating-point number
  - `DOUBLE PRECISION` (8 bytes): Double precision floating-point number
- **Serial Types (Auto-Incrementing)**
  - `SMALLSERIAL` (2 bytes): Auto-incrementing integer, range 1 to 32,767
  - `SERIAL` (4 bytes): Auto-incrementing integer, range 1 to 2,147,483,647
  - `BIGSERIAL` (8 bytes): Auto-incrementing integer, range 1 to 9,223,372,036,854,775,807
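The integer ranges follow directly from the storage size, since the values are signed two's-complement numbers. A small Python sketch that derives them (`signed_range` is a hypothetical helper):

```python
def signed_range(n_bytes):
    """Smallest and largest value of a signed integer stored in n_bytes."""
    bits = 8 * n_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for name, size in (("SMALLINT", 2), ("INTEGER", 4), ("BIGINT", 8)):
    low, high = signed_range(size)
    print(f"{name}: {low:,} to {high:,}")
```

The serial types use the positive half of the matching integer range.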
### 2. **Monetary Type**
- `MONEY`: Stores currency amounts with a fixed fractional precision
### 3. **Character Types**
- **Fixed-Length Strings**
  - `CHAR(n)` or `CHARACTER(n)`: Fixed length (padded with spaces)
- **Variable-Length Strings**
  - `VARCHAR(n)` or `CHARACTER VARYING(n)`: Variable length with a limit
- **Text**
  - `TEXT`: Variable length with no specific limit
### 4. **Binary Data Types**
- **Binary Large Object**
  - `BYTEA`: Stores binary strings (byte arrays)
### 5. **Date/Time Types**
- **Date and Time**
  - `DATE`: Calendar date (year, month, day)
  - `TIME` (no time zone): Time of day (without time zone)
  - `TIMETZ` (with time zone): Time of day (with time zone)
  - `TIMESTAMP` (no time zone): Date and time without time zone
  - `TIMESTAMPTZ` (with time zone): Date and time with time zone
- **Intervals**
  - `INTERVAL`: Time span (e.g., days, months, hours)
### 6. **Boolean Type**
- `BOOLEAN`: Stores `TRUE`, `FALSE`, or `NULL`
### 7. **UUID Type**
- `UUID`: Stores Universally Unique Identifiers (128-bit values)
### 8. **Enumerated Types**
- `ENUM`: User-defined enumerated type (a static set of values)
### 9. **Geometric Types**
- `POINT`: Stores a geometric point (x, y)
- `LINE`: Infinite line
- `LSEG`: Line segment
- `BOX`: Rectangular box
- `PATH`: Geometric path (multiple points)
- `POLYGON`: Closed geometric figure
- `CIRCLE`: Circle
### 10. **Network Address Types**
- `CIDR`: IPv4 or IPv6 network block
- `INET`: IPv4 or IPv6 address
- `MACADDR`: MAC address
- `MACADDR8`: MAC address (EUI-64 format)
### 11. **Bit String Types**
- **Fixed-Length Bit Strings**
  - `BIT(n)`: Fixed-length bit string
- **Variable-Length Bit Strings**
  - `BIT VARYING(n)`: Variable-length bit string
### 12. **Text Search Types**
- `TSVECTOR`: Text search document
- `TSQUERY`: Text search query
### 13. **JSON Types**
- `JSON`: Textual JSON data
- `JSONB`: Binary JSON data (more efficient for indexing)
### 14. **Array Types**
- `ARRAY`: Allows any data type to be stored as an array (e.g., `INTEGER[]`, `TEXT[]`)
### 15. **Range Types**
- `INT4RANGE`: Range of `INTEGER`
- `INT8RANGE`: Range of `BIGINT`
- `NUMRANGE`: Range of `NUMERIC`
- `TSRANGE`: Range of `TIMESTAMP WITHOUT TIME ZONE`
- `TSTZRANGE`: Range of `TIMESTAMP WITH TIME ZONE`
- `DATERANGE`: Range of `DATE`
### 16. **Composite Types**
- User-defined types that consist of multiple fields of various types
### 17. **Object Identifier Types (OID)**
- `OID`: Object identifier (used internally by PostgreSQL)
- `REGCLASS`, `REGPROC`, `REGTYPE`: Special types for referencing classes, procedures, and types by OID or name
### 18. **Pseudo-Types**
- `ANY`: Accepts any data type
- `ANYARRAY`: Accepts any array data type
- `ANYELEMENT`: Represents any type of element
- `ANYENUM`: Accepts any `ENUM` type
- `ANYNONARRAY`: Any non-array type
- `VOID`: No data (used with functions that return no value)
- `TRIGGER`: Used in triggers
- `LANGUAGE_HANDLER`: Used internally for language support
## Docker-Compose
```yml
services:
  postgres:
    image: postgres:17
    container_name: postgres
    environment:
      POSTGRES_USER: myuser
      POSTGRES_PASSWORD: mypassword
      POSTGRES_DB: mydb
    ports:
      - "5432:5432"
    volumes:
      - ./postgres:/var/lib/postgresql/data
    restart: always
```


@@ -0,0 +1,121 @@
---
obj: application
repo: https://github.com/timescale/timescaledb
website: https://www.timescale.com
rev: 2024-09-30
---
# TimescaleDB
TimescaleDB is an open-source time-series database built on [PostgreSQL](./Postgres.md), designed to handle large volumes of time-series data efficiently. It provides powerful data management features, making it suitable for applications in various domains such as IoT, finance, and analytics.
Features:
- Hypertables: The backbone of TimescaleDB. Hypertables automatically partition data by time, which simplifies the management of very large datasets.
- Continuous Aggregates: This feature enables the pre-computation and storage of aggregate data, significantly speeding up query times for common analytical operations.
- Data Compression: TimescaleDB employs sophisticated compression techniques to reduce storage footprint without compromising query performance.
- Optimized Indexing: With its advanced indexing strategies, including multi-dimensional and time-based indexing, TimescaleDB ensures rapid query responses, making it highly efficient for time-series data.
## Installation
**Create the extension in your database**:
```sql
CREATE EXTENSION IF NOT EXISTS timescaledb;
```
## Hypertables
Hypertables are PostgreSQL tables that automatically partition your data by time. You interact with hypertables in the same way as regular PostgreSQL tables, but with extra features that make managing your time-series data much easier.
In Timescale, hypertables exist alongside regular PostgreSQL tables. Use hypertables to store time-series data. This gives you improved insert and query performance, and access to useful time-series features. Use regular PostgreSQL tables for other relational data.
With hypertables, Timescale makes it easy to improve insert and query performance by partitioning time-series data on its time parameter. Behind the scenes, the database performs the work of setting up and maintaining the hypertable's partitions. Meanwhile, you insert and query your data as if it all lives in a single, regular PostgreSQL table.
**Create a hypertable:**
- Create a standard PostgreSQL table:
```sql
CREATE TABLE conditions (
time TIMESTAMPTZ NOT NULL,
location TEXT NOT NULL,
device TEXT NOT NULL,
temperature DOUBLE PRECISION NULL,
humidity DOUBLE PRECISION NULL
);
```
- Convert the table to a hypertable. Specify the name of the table you want to convert, and the column that holds its time values.
```sql
SELECT create_hypertable('conditions', by_range('time'));
```
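Behind the scenes, each row lands in a chunk that covers a fixed time range. The Python sketch below illustrates the idea with TimescaleDB's default chunk interval of 7 days. `chunk_range` is a hypothetical helper, and aligning chunks to the Unix epoch is an assumption of this sketch, not a statement about TimescaleDB's internal alignment.

```python
from datetime import datetime, timedelta, timezone

CHUNK_INTERVAL = timedelta(days=7)  # default chunk interval
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed alignment origin

def chunk_range(ts, interval=CHUNK_INTERVAL):
    """Return the [start, end) time range of the chunk that would hold ts."""
    index = (ts - EPOCH) // interval  # whole intervals since the origin
    start = EPOCH + index * interval
    return start, start + interval

reading = datetime(2024, 9, 30, 13, 40, tzinfo=timezone.utc)
start, end = chunk_range(reading)
print(f"{reading.isoformat()} belongs to chunk {start.date()} .. {end.date()}")
```

Queries restricted to a time range only need to touch the chunks whose ranges overlap it.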
## Hyperfunctions
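Hyperfunctions allow you to query and aggregate your time-series data. Some ship with TimescaleDB itself (such as `time_bucket`, `first`, `last`, and `locf`); others require the separate TimescaleDB Toolkit extension.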
### delta
The `delta` function computes the change in a value over time. It helps in understanding how a metric (e.g., temperature, stock price, etc.) changes between readings.
Example: Calculate Temperature Change Over a Day
```sql
SELECT
delta(temperature) AS temp_change
FROM temperature_readings
WHERE time BETWEEN '2023-09-01' AND '2023-09-02';
```
### derivative
The `derivative` function calculates the rate of change (derivative) of a series over time.
Example: Calculate the Rate of Temperature Change Per Hour
```sql
SELECT
derivative(avg(temperature), '1 hour') AS temp_rate_change
FROM temperature_readings
GROUP BY time_bucket('1 hour', time);
```
### first & last
The `first` and `last` hyperfunctions return the first and last recorded values within a specified period.
```sql
SELECT
time_bucket('1 day', time) AS day,
first(stock_price, time) AS opening_price,
last(stock_price, time) AS closing_price
FROM stock_prices
GROUP BY day
ORDER BY day;
```
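What the query computes can be sketched in a few lines of Python: group the readings by day, then pick the value with the earliest and the latest timestamp in each group. `first_last_per_day` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def first_last_per_day(rows):
    """rows: iterable of (timestamp, value). Returns {day: (first, last)}."""
    by_day = defaultdict(list)
    for ts, value in rows:
        by_day[ts.date()].append((ts, value))
    result = {}
    for day, readings in sorted(by_day.items()):
        opening = min(readings, key=lambda r: r[0])[1]  # value at earliest time
        closing = max(readings, key=lambda r: r[0])[1]  # value at latest time
        result[day] = (opening, closing)
    return result

prices = [
    (datetime(2023, 9, 1, 16, 0), 103.0),
    (datetime(2023, 9, 1, 9, 30), 100.0),
    (datetime(2023, 9, 2, 9, 30), 104.5),
]
summary = first_last_per_day(prices)
print(summary)
```

Note that the result depends on the timestamps, not on the order in which rows were inserted.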
### locf
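The `locf` (Last Observation Carried Forward) function fills missing data by carrying the last known observation forward to the missing timestamps. It only works together with `time_bucket_gapfill`, which generates the empty buckets and needs a bounded time range in the `WHERE` clause.
```sql
SELECT
time_bucket_gapfill('1 hour', time) AS hour,
locf(last(temperature, time)) AS filled_temperature
FROM temperature_readings
WHERE time >= '2023-09-01' AND time < '2023-09-02'
GROUP BY hour
ORDER BY hour;
```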
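The filling rule itself is simple. A minimal Python sketch, where `None` stands for an empty bucket and `locf` is just a local helper named after the SQL function:

```python
def locf(values):
    """Replace each None with the most recent non-None value seen so far."""
    filled, last_seen = [], None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)  # stays None until a first observation exists
    return filled

hourly = [None, 20.5, None, None, 21.0, None]
print(locf(hourly))
```

Buckets before the first observation stay empty, because there is nothing to carry forward yet.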
### interpolated_avg
The `interpolated_avg` hyperfunction computes the average of a series with values interpolated at regular time intervals.
```sql
SELECT
time_bucket('1 hour', time) AS hour,
interpolated_avg('linear', time, power_usage) AS interpolated_power
FROM power_data
WHERE time BETWEEN '2023-09-01' AND '2023-09-07'
GROUP BY hour;
```
### time_bucket
The `time_bucket` hyperfunction is essential when you want to analyze or summarize data over time-based intervals, such as calculating daily averages, hourly sums, or other time-bound statistics.
```sql
SELECT
time_bucket('1 hour', time) AS bucketed_time,
avg(cpu_usage) AS avg_cpu_usage
FROM server_metrics
WHERE time BETWEEN '2023-09-01' AND '2023-09-30'
GROUP BY bucketed_time
ORDER BY bucketed_time;
```
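Conceptually, `time_bucket` rounds each timestamp down to the start of its interval. The Python sketch below mimics that for fixed-width intervals and then averages per bucket. The helper names and sample rows are hypothetical, and the origin of 2000-01-03 is an assumption based on the documented default for buckets without month or year units.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

ORIGIN = datetime(2000, 1, 3, tzinfo=timezone.utc)  # assumed default origin

def time_bucket(width, ts, origin=ORIGIN):
    """Round ts down to the start of its width-sized bucket."""
    return origin + ((ts - origin) // width) * width

def bucketed_average(rows, width):
    """rows: iterable of (timestamp, value). Returns {bucket_start: average}."""
    groups = defaultdict(list)
    for ts, value in rows:
        groups[time_bucket(width, ts)].append(value)
    return {bucket: sum(vals) / len(vals) for bucket, vals in sorted(groups.items())}

utc = timezone.utc
metrics = [
    (datetime(2023, 9, 1, 10, 5, tzinfo=utc), 10.0),
    (datetime(2023, 9, 1, 10, 55, tzinfo=utc), 30.0),
    (datetime(2023, 9, 1, 11, 10, tzinfo=utc), 50.0),
]
averages = bucketed_average(metrics, timedelta(hours=1))
print(averages)
```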


@@ -0,0 +1,99 @@
---
obj: application
repo: https://github.com/pgvector/pgvector
rev: 2024-09-30
---
# pgVector
**pgvector** is a [PostgreSQL](./Postgres.md) extension designed to support vector similarity search. With the rise of machine learning models like those in natural language processing (NLP), computer vision, and recommendation systems, the need to efficiently store and query high-dimensional vectors (embeddings) has grown significantly. pgvector provides a solution by enabling PostgreSQL to handle these vector operations, making it possible to search for similar items using vector distance metrics directly in SQL.
## Installation
1. Install pgvector using `git` and `make`:
```bash
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make && make install
```
2. Add the extension to your PostgreSQL database:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
```
## Data Types
pgvector introduces a new data type called `vector`. It is used to store fixed-length vectors, and the size must be specified during table creation.
```sql
CREATE TABLE items (
id serial PRIMARY KEY,
embedding vector(3) -- a 3-dimensional vector
);
```
## Functions and Operators
pgvector provides several functions and operators for vector similarity and distance calculation.
### Distance Metrics
- **Euclidean Distance** (`<->`): Measures the straight-line distance between two vectors.
```sql
SELECT * FROM items ORDER BY embedding <-> '[1, 0, 0]' LIMIT 5;
```
- **Cosine Distance** (`<=>`): Returns one minus the cosine of the angle between two vectors, so smaller values mean more similar vectors.
```sql
SELECT * FROM items ORDER BY embedding <=> '[1, 0, 0]' LIMIT 5;
```
- **Inner Product** (`<#>`): Returns the negative inner (dot) product of two vectors, so ordering ascending puts the largest inner product first.
```sql
SELECT * FROM items ORDER BY embedding <#> '[1, 0, 0]' LIMIT 5;
```
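The three operators can be mirrored in plain Python to check results by hand. This is a sketch of the math, not pgvector's implementation: in pgvector, `<=>` returns the cosine distance (one minus the cosine similarity) and `<#>` returns the negative inner product. The function names are hypothetical.

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    """Counterpart of <->: straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """Counterpart of <=>: 1 - cos(angle); undefined for zero vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1 - inner_product(a, b) / (norm_a * norm_b)

def negative_inner_product(a, b):
    """Counterpart of <#>: negated so that smaller means more similar."""
    return -inner_product(a, b)

query = [1, 0, 0]
for candidate in ([1, 0, 0], [0, 1, 0], [2, 0, 0]):
    print(candidate, l2_distance(query, candidate),
          cosine_distance(query, candidate),
          negative_inner_product(query, candidate))
```

All three are written so that a smaller value means a closer match, which is why the queries above sort ascending.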
### Basic Operations
- **Set a Vector Value**:
```sql
INSERT INTO items (embedding) VALUES ('[1, 0, 0]');
```
- **Retrieve All Vectors**:
```sql
SELECT * FROM items;
```
## Indexing
To speed up similarity search, pgvector supports approximate indexes such as IVFFlat. The operator class to use depends on the distance metric you plan to query with:
- **Euclidean Distance** (L2):
```sql
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
```
- **Cosine Distance**:
```sql
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
```
- **Inner Product**:
```sql
CREATE INDEX ON items USING ivfflat (embedding vector_ip_ops) WITH (lists = 100);
```
### Index Parameters
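- **Lists**: Defines the number of clusters (centroids) the IVFFlat index partitions the vectors into. With more lists, each query scans fewer vectors and runs faster, but recall can drop unless the number of lists probed per query (`ivfflat.probes`) is raised as well.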
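As a starting point, the pgvector README suggests `rows / 1000` lists for up to 1M rows and `sqrt(rows)` above that, with `sqrt(lists)` probes. The Python sketch below just encodes those rules of thumb. The function names are hypothetical and the results are starting values to tune, not guarantees.

```python
import math

def suggested_lists(row_count):
    """Starting value for `lists`: rows / 1000 up to 1M rows, sqrt(rows) above."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return max(1, round(math.sqrt(row_count)))

def suggested_probes(lists):
    """Starting value for `ivfflat.probes`: sqrt(lists)."""
    return max(1, round(math.sqrt(lists)))

for rows in (100_000, 4_000_000):
    lists = suggested_lists(rows)
    print(f"{rows:,} rows: lists = {lists}, probes = {suggested_probes(lists)}")
```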
## Use Cases
1. **Recommendation Systems**: Store user and item embeddings and use similarity search to recommend items based on user preferences.
2. **Search Engines**: Search for semantically similar documents or images using vector embeddings.
3. **NLP Applications**: Store word, sentence, or document embeddings to perform similarity search or clustering of textual data.
4. **Image Recognition**: Query for similar images based on embeddings generated by deep learning models.


@@ -0,0 +1,61 @@
---
obj: filesystem
---
# Ceph
#wip
Ceph is a distributed storage system providing Object, Block and Filesystem Storage.
## Concepts
- Monitors: A Ceph Monitor (`ceph-mon`) maintains maps of the cluster state, including the monitor map, manager map, the OSD map, the MDS map, and the CRUSH map. These maps are critical cluster state required for Ceph daemons to coordinate with each other. Monitors are also responsible for managing authentication between daemons and clients. At least three monitors are normally required for redundancy and high availability.
- Managers: A Ceph Manager daemon (`ceph-mgr`) is responsible for keeping track of runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. The Ceph Manager daemons also host python-based modules to manage and expose Ceph cluster information, including a web-based Ceph Dashboard and REST API. At least two managers are normally required for high availability.
- Ceph OSDs: An Object Storage Daemon (Ceph OSD, `ceph-osd`) stores data, handles data replication, recovery, rebalancing, and provides some monitoring information to Ceph Monitors and Managers by checking other Ceph OSD Daemons for a heartbeat. At least three Ceph OSDs are normally required for redundancy and high availability.
- MDSs: A Ceph Metadata Server (MDS, `ceph-mds`) stores metadata for the Ceph File System. Ceph Metadata Servers allow CephFS users to run basic commands (like ls, find, etc.) without placing a burden on the Ceph Storage Cluster.
Ceph stores data as objects within logical storage pools. Using the CRUSH algorithm, Ceph calculates which placement group (PG) should contain the object, and which OSD should store the placement group. The CRUSH algorithm enables the Ceph Storage Cluster to scale, rebalance, and recover dynamically.
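The two-step placement (object to placement group, placement group to OSDs) can be illustrated with a toy Python sketch. This is not CRUSH: it swaps in SHA-256 and rendezvous hashing as a stand-in, and it ignores the CRUSH map, weights, and failure domains. All names here are hypothetical.

```python
import hashlib

def _hash(*parts):
    """Deterministic 64-bit hash of the given parts (stand-in for Ceph's hashing)."""
    digest = hashlib.sha256("/".join(map(str, parts)).encode()).digest()
    return int.from_bytes(digest[:8], "big")

def object_to_pg(object_name, pg_num):
    """Step 1: map an object name to a placement group id."""
    return _hash(object_name) % pg_num

def pg_to_osds(pg_id, osd_ids, replicas=3):
    """Step 2: pick OSDs for a PG via rendezvous hashing (highest scores win)."""
    ranked = sorted(osd_ids, key=lambda osd: _hash(pg_id, osd), reverse=True)
    return ranked[:replicas]

pg = object_to_pg("my-object", pg_num=128)
osds = pg_to_osds(pg, osd_ids=range(6))
print(f"my-object -> PG {pg} -> OSDs {osds}")
```

Because the placement is computed rather than looked up, any client can locate an object without asking a central directory, which is the property that lets the cluster scale and rebalance.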
## Setup
Cephadm creates a new Ceph cluster by bootstrapping a single host, expanding the cluster to encompass any additional hosts, and then deploying the needed services.
Run the ceph bootstrap command with the IP of the first cluster host:
```
cephadm bootstrap --mon-ip <mon-ip>
```
This command will:
- Create a Monitor and a Manager daemon for the new cluster on the local host.
- Generate a new SSH key for the Ceph cluster and add it to the root user's `/root/.ssh/authorized_keys` file.
- Write a copy of the public key to `/etc/ceph/ceph.pub`.
- Write a minimal configuration file to `/etc/ceph/ceph.conf`. This file is needed to communicate with Ceph daemons.
- Write a copy of the `client.admin` administrative (privileged!) secret key to `/etc/ceph/ceph.client.admin.keyring`.
- Add the `_admin` label to the bootstrap host. By default, any host with this label will (also) get a copy of `/etc/ceph/ceph.conf` and `/etc/ceph/ceph.client.admin.keyring`.
### Ceph CLI
The `cephadm shell` command launches a bash shell in a container with all of the Ceph packages installed. By default, if configuration and keyring files are found in `/etc/ceph` on the host, they are passed into the container environment so that the shell is fully functional. Note that when executed on a MON host, cephadm shell will infer the config from the MON container instead of using the default configuration. If `--mount <path>` is given, then the host `<path>` (file or directory) will appear under `/mnt` inside the container:
```shell
cephadm shell
```
To execute ceph commands, you can also run commands like this:
```shell
cephadm shell -- ceph -s
```
You can install the ceph-common package, which contains all of the ceph commands, including ceph, rbd, mount.ceph (for mounting CephFS file systems), etc.:
```shell
cephadm add-repo --release reef
cephadm install ceph-common
```
Confirm that the ceph command is accessible with:
```shell
ceph -v
ceph status
```
## Host Management
#todo -> https://docs.ceph.com/en/latest/cephadm/host-management/