merge postgres
commit
8bc618e9ea
6 changed files with 672 additions and 0 deletions
|
@ -53,10 +53,12 @@ rev: 2024-07-14
|
|||
- [HTTPie](./development/HTTPie.md)
|
||||
- [MongoDB Compass](./development/MongoDB%20Compass.md)
|
||||
- [MongoDB](./development/MongoDB.md)
|
||||
- [Postgres](./development/Postgres.md)
|
||||
- [Podman Desktop](./development/Podman%20Desktop.md)
|
||||
- [Visual Studio Code](./development/Visual%20Studio%20Code.md)
|
||||
- [continue](./development/continue.md)
|
||||
- [psequel](development/psequel.md)
|
||||
- [PostgreSQL](development/Postgres.md)
|
||||
|
||||
## Documents
|
||||
- [Tachiyomi](./documents/Tachiyomi.md)
|
||||
|
|
103
technology/applications/development/PostGIS.md
Normal file
|
@ -0,0 +1,103 @@
|
|||
---
|
||||
obj: application
|
||||
wiki: https://en.wikipedia.org/wiki/PostGIS
|
||||
repo: https://git.osgeo.org/gitea/postgis/postgis
|
||||
website: https://postgis.net
|
||||
rev: 2024-09-30
|
||||
---
|
||||
|
||||
# PostGIS
|
||||
PostGIS is a spatial database extender for PostgreSQL. It adds support for geographic objects allowing it to be used as a spatial database for geographic information systems (GIS). With PostGIS, PostgreSQL becomes a powerful database for managing spatial data and performing complex geographic operations.
|
||||
|
||||
PostGIS offers the following key features:
|
||||
|
||||
- **Geometry and Geography Types**: PostGIS supports two primary types of spatial objects: `Geometry` (for Cartesian coordinates) and `Geography` (for geodetic coordinates).
|
||||
- **Spatial Indexing**: Support for R-tree-based spatial indexing using GiST (Generalized Search Tree) indexes.
|
||||
- **Spatial Relationships and Measurements**: Functions to perform spatial analysis, including distance calculations, intersections, unions, and more.
|
||||
- **3D and 4D Coordinates**: Support for 3D geometries (with Z values) and 4D (with M values for measures).
|
||||
- **Raster and Vector Data**: PostGIS allows for the handling of both raster (pixel-based) and vector (coordinate-based) spatial data.
|
||||
- **WKT, WKB, GeoJSON Support**: PostGIS supports common geographic data formats like Well-Known Text (WKT), Well-Known Binary (WKB), and GeoJSON.
|
||||
|
||||
## Enable PostGIS in a PostgreSQL Database
|
||||
After installation, enable PostGIS on a specific database with the following SQL commands. The second statement is optional and only needed for topology support:
|
||||
|
||||
```sql
|
||||
CREATE EXTENSION postgis;
|
||||
CREATE EXTENSION postgis_topology;
|
||||
```
|
||||
|
||||
## Spatial Data Types
|
||||
PostGIS introduces several spatial data types. The two most commonly used types are:
|
||||
|
||||
### 1. `Geometry`
|
||||
Represents geometric shapes in a Cartesian (planar) coordinate system.
|
||||
|
||||
```sql
|
||||
CREATE TABLE my_table (
|
||||
id SERIAL PRIMARY KEY,
|
||||
geom GEOMETRY(Point, 4326)
|
||||
);
|
||||
```
|
||||
|
||||
### 2. `Geography`
|
||||
Represents geographic shapes in a spherical coordinate system (uses latitudes and longitudes).
|
||||
|
||||
```sql
|
||||
CREATE TABLE my_geo_table (
|
||||
id SERIAL PRIMARY KEY,
|
||||
geom GEOGRAPHY(POINT, 4326)
|
||||
);
|
||||
```
|
||||
|
||||
PostGIS also supports other geometry types, such as:
|
||||
|
||||
- `POINT`
|
||||
- `LINESTRING`
|
||||
- `POLYGON`
|
||||
- `MULTIPOINT`
|
||||
- `MULTILINESTRING`
|
||||
- `MULTIPOLYGON`
|
||||
|
||||
Each of these types can be used in both `GEOMETRY` and `GEOGRAPHY` contexts.
|
||||
|
||||
## Spatial Functions
|
||||
PostGIS provides a vast library of spatial functions for querying and manipulating spatial data. Some important functions include:
|
||||
|
||||
### Distance
|
||||
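Calculates the distance between two geometries. For `GEOMETRY` values the result is in the units of the spatial reference system, so with SRID 4326 it is expressed in degrees; cast to `GEOGRAPHY` to get meters.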
|
||||
|
||||
```sql
|
||||
SELECT ST_Distance(
|
||||
ST_GeomFromText('POINT(0 0)', 4326),
|
||||
ST_GeomFromText('POINT(1 1)', 4326)
|
||||
);
|
||||
```
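To make the unit difference concrete, here is a minimal Python sketch (not PostGIS code). `planar_distance` mirrors what a Cartesian distance on lon/lat coordinates yields, while `haversine_m` approximates a geodetic distance on a sphere. Both function names and the mean Earth radius are assumptions for illustration; PostGIS `GEOGRAPHY` calculations use a spheroid by default, so its results differ slightly.

```python
import math

def planar_distance(a, b):
    """Cartesian distance in the coordinates' own units (degrees for lon/lat)."""
    return math.hypot(b[0] - a[0], b[1] - a[1])

def haversine_m(a, b, radius_m=6371008.8):
    """Great-circle distance in meters between (lon, lat) points on a sphere."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(h))

print(planar_distance((0, 0), (1, 1)))  # about 1.414, in degrees
print(haversine_m((0, 0), (1, 1)))      # roughly 157 km, in meters
```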
|
||||
|
||||
### Intersection
|
||||
Returns the intersection of two geometries.
|
||||
|
||||
```sql
|
||||
SELECT ST_Intersection(
|
||||
ST_GeomFromText('LINESTRING(0 0, 2 2)', 4326),
|
||||
ST_GeomFromText('LINESTRING(0 2, 2 0)', 4326)
|
||||
);
|
||||
```
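For the two line strings in this query, the result is the single point where they cross. A minimal Python sketch of the underlying segment intersection, assuming plain 2D segments and ignoring the parallel and collinear cases that PostGIS handles (`segment_intersection` is a hypothetical helper):

```python
def segment_intersection(p1, p2, p3, p4):
    """Return the intersection point of segments p1-p2 and p3-p4, or None."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:
        return None  # parallel or collinear; not handled in this sketch
    # Parameters along each segment (0..1 means "within the segment")
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

print(segment_intersection((0, 0), (2, 2), (0, 2), (2, 0)))  # (1.0, 1.0)
```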
|
||||
|
||||
### Contains
|
||||
Checks if one geometry contains another.
|
||||
|
||||
```sql
|
||||
SELECT ST_Contains(
|
||||
ST_GeomFromText('POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))', 4326),
|
||||
ST_GeomFromText('POINT(1 1)', 4326)
|
||||
);
|
||||
```
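Conceptually, a containment test for a point and a simple polygon can be done with ray casting. The Python sketch below is an illustration only; `point_in_polygon` is a hypothetical helper, it expects a closed ring (first point repeated at the end), and it does not treat boundary points reliably, whereas `ST_Contains` has precise boundary semantics.

```python
def point_in_polygon(point, ring):
    """Ray casting: count how often a ray to the right crosses the ring."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (0, 2), (2, 2), (2, 0), (0, 0)]
print(point_in_polygon((1, 1), square))  # True
print(point_in_polygon((3, 3), square))  # False
```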
|
||||
|
||||
### Area
|
||||
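Calculates the area of a polygon. For `GEOMETRY` values the result is in squared units of the spatial reference system (square degrees for SRID 4326); cast to `GEOGRAPHY` to get square meters.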
|
||||
|
||||
```sql
|
||||
SELECT ST_Area(
|
||||
ST_GeomFromText('POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))', 4326)
|
||||
);
|
||||
```
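For a planar polygon, the area can be reproduced by hand with the shoelace formula. A small Python sketch, assuming a closed ring without holes (`ring_area` is a hypothetical helper, not a PostGIS function):

```python
def ring_area(ring):
    """Shoelace formula for a closed ring; result is in squared coordinate units."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

print(ring_area([(0, 0), (0, 2), (2, 2), (2, 0), (0, 0)]))  # 4.0
```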
|
286
technology/applications/development/Postgres.md
Normal file
|
@ -0,0 +1,286 @@
|
|||
---
|
||||
obj: application
|
||||
website: https://www.postgresql.org
|
||||
repo: https://git.postgresql.org/gitweb/?p=postgresql.git
|
||||
---
|
||||
|
||||
# Postgres
|
||||
PostgreSQL is an advanced, open-source, object-relational database management system. It is renowned for its scalability, reliability, and compliance with the SQL standard. PostgreSQL supports both SQL (relational) and JSON (non-relational) querying, making it highly versatile.
|
||||
|
||||
## Extensions
|
||||
PostgreSQL can be extended via extensions:
|
||||
- [TimescaleDB](./TimescaleDB.md) - Time-series data
|
||||
- [pgVector](./pgvector.md) - Vector database functions
|
||||
- [PostGIS](./PostGIS.md) - Spatial data
|
||||
|
||||
## psql
|
||||
**psql** is a terminal-based front end to PostgreSQL. It allows users to interact with PostgreSQL databases by executing SQL queries, managing database objects, and performing administrative tasks.
|
||||
|
||||
To start psql, open your terminal or command prompt and type:
|
||||
|
||||
```bash
|
||||
psql
|
||||
```
|
||||
|
||||
### Connecting to a Database
|
||||
You can specify the database name, user, host, and port when launching psql:
|
||||
|
||||
```bash
|
||||
psql -d database_name -U username -h hostname -p port
|
||||
```
|
||||
|
||||
Alternatively, you can use environment variables:
|
||||
|
||||
```bash
|
||||
export PGDATABASE=mydb
|
||||
export PGUSER=myuser
|
||||
export PGPASSWORD=mypassword
|
||||
export PGHOST=localhost
|
||||
export PGPORT=5432
|
||||
|
||||
psql
|
||||
```
|
||||
|
||||
### Listing Databases and Tables
|
||||
|
||||
- **List Databases:**
|
||||
|
||||
```sql
|
||||
\l
|
||||
```
|
||||
|
||||
- **List Tables in the Current Database:**
|
||||
|
||||
```sql
|
||||
\dt
|
||||
```
|
||||
|
||||
- **List All Schemas:**
|
||||
|
||||
```sql
|
||||
\dn
|
||||
```
|
||||
|
||||
### Creating and Dropping Databases/Tables
|
||||
|
||||
- **Create a Database:**
|
||||
|
||||
```sql
|
||||
CREATE DATABASE mydb;
|
||||
```
|
||||
|
||||
- **Drop a Database:**
|
||||
|
||||
```sql
|
||||
DROP DATABASE mydb;
|
||||
```
|
||||
|
||||
- **Create a Table:**
|
||||
|
||||
```sql
|
||||
CREATE TABLE users (
|
||||
id SERIAL PRIMARY KEY,
|
||||
username VARCHAR(50) NOT NULL,
|
||||
email VARCHAR(100) NOT NULL
|
||||
);
|
||||
```
|
||||
|
||||
- **Drop a Table:**
|
||||
|
||||
```sql
|
||||
DROP TABLE users;
|
||||
```
|
||||
|
||||
### Running SQL Queries
|
||||
Execute standard SQL commands to interact with your data.
|
||||
|
||||
**Example: Inserting Data**
|
||||
|
||||
```sql
|
||||
INSERT INTO users (username, email) VALUES ('john_doe', 'john@example.com');
|
||||
```
|
||||
|
||||
**Example: Querying Data**
|
||||
|
||||
```sql
|
||||
SELECT * FROM users;
|
||||
```
|
||||
|
||||
**Example: Updating Data**
|
||||
|
||||
```sql
|
||||
UPDATE users SET email = 'john.doe@example.com' WHERE username = 'john_doe';
|
||||
```
|
||||
|
||||
**Example: Deleting Data**
|
||||
|
||||
```sql
|
||||
DELETE FROM users WHERE username = 'john_doe';
|
||||
```
|
||||
|
||||
## Meta-Commands
|
||||
psql provides a set of meta-commands (prefixed with `\`) that facilitate various tasks.
|
||||
|
||||
### Common Meta-Commands
|
||||
|
||||
- **Help on Meta-Commands:**
|
||||
|
||||
```sql
|
||||
\?
|
||||
```
|
||||
|
||||
- **Help on SQL Commands:**
|
||||
|
||||
```sql
|
||||
\h
|
||||
```
|
||||
|
||||
- **Describe a Table:**
|
||||
|
||||
```sql
|
||||
\d table_name
|
||||
```
|
||||
|
||||
- **List All Tables, Views, and Sequences:**
|
||||
|
||||
```sql
|
||||
\d
|
||||
```
|
||||
|
||||
- **List All Indexes:**
|
||||
|
||||
```sql
|
||||
\di
|
||||
```
|
||||
|
||||
- **Exit psql:**
|
||||
|
||||
```sql
|
||||
\q
|
||||
```
|
||||
|
||||
## Data Types
|
||||
### 1. **Numeric Types**
|
||||
- **Small Integer Types**
|
||||
- `SMALLINT` (2 bytes): Range from -32,768 to +32,767
|
||||
- **Integer Types**
|
||||
- `INTEGER` or `INT` (4 bytes): Range from -2,147,483,648 to +2,147,483,647
|
||||
- **Big Integer Types**
|
||||
- `BIGINT` (8 bytes): Range from -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807
|
||||
- **Decimal/Exact Types**
|
||||
- `DECIMAL` or `NUMERIC` (variable size): User-defined precision and scale
|
||||
- **Floating-Point Types**
|
||||
- `REAL` (4 bytes): Single precision floating-point number
|
||||
- `DOUBLE PRECISION` (8 bytes): Double precision floating-point number
|
||||
- **Serial Types (Auto-Incrementing)**
|
||||
- `SERIAL` (4 bytes): Auto-incrementing integer, range 1 to 2,147,483,647
|
||||
- `BIGSERIAL` (8 bytes): Auto-incrementing integer, range 1 to 9,223,372,036,854,775,807
|
||||
- `SMALLSERIAL` (2 bytes): Auto-incrementing integer, range 1 to 32,767
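The integer ranges above follow directly from the storage size, since these are two's-complement signed integers. A short Python sketch that derives them (`signed_range` is a hypothetical helper):

```python
def signed_range(num_bytes):
    """Return (min, max) for a two's-complement signed integer of this size."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for name, size in [("SMALLINT", 2), ("INTEGER", 4), ("BIGINT", 8)]:
    low, high = signed_range(size)
    print(f"{name}: {low:,} to {high:,}")
```

The serial types use the same storage sizes but only hand out positive values, starting at 1.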
|
||||
|
||||
### 2. **Monetary Type**
|
||||
- `MONEY`: Stores currency amounts with a fixed fractional precision
|
||||
|
||||
### 3. **Character Types**
|
||||
- **Fixed-Length Strings**
|
||||
- `CHAR(n)` or `CHARACTER(n)`: Fixed length (padded with spaces)
|
||||
- **Variable-Length Strings**
|
||||
- `VARCHAR(n)` or `CHARACTER VARYING(n)`: Variable length with a limit
|
||||
- **Text**
|
||||
- `TEXT`: Variable length with no specific limit
|
||||
|
||||
### 4. **Binary Data Types**
|
||||
- **Binary Large Object**
|
||||
- `BYTEA`: Stores binary strings (byte arrays)
|
||||
|
||||
### 5. **Date/Time Types**
|
||||
- **Date and Time**
|
||||
- `DATE`: Calendar date (year, month, day)
|
||||
- `TIME` (no time zone): Time of day (without time zone)
|
||||
- `TIMETZ` (with time zone): Time of day (with time zone)
|
||||
- `TIMESTAMP` (no time zone): Date and time without time zone
|
||||
- `TIMESTAMPTZ` (with time zone): Date and time with time zone
|
||||
- **Intervals**
|
||||
- `INTERVAL`: Time span (e.g., days, months, hours)
|
||||
|
||||
### 6. **Boolean Type**
|
||||
- `BOOLEAN`: Stores `TRUE`, `FALSE`, or `NULL`
|
||||
|
||||
### 7. **UUID Type**
|
||||
- `UUID`: Stores Universally Unique Identifiers (128-bit values)
|
||||
|
||||
### 8. **Enumerated Types**
|
||||
- `ENUM`: User-defined enumerated type (a static set of values)
|
||||
|
||||
### 9. **Geometric Types**
|
||||
- `POINT`: Stores a geometric point (x, y)
|
||||
- `LINE`: Infinite line
|
||||
- `LSEG`: Line segment
|
||||
- `BOX`: Rectangular box
|
||||
- `PATH`: Geometric path (multiple points)
|
||||
- `POLYGON`: Closed geometric figure
|
||||
- `CIRCLE`: Circle
|
||||
|
||||
### 10. **Network Address Types**
|
||||
- `CIDR`: IPv4 or IPv6 network block
|
||||
- `INET`: IPv4 or IPv6 address
|
||||
- `MACADDR`: MAC address
|
||||
- `MACADDR8`: MAC address (EUI-64 format)
|
||||
|
||||
### 11. **Bit String Types**
|
||||
- **Fixed-Length Bit Strings**
|
||||
- `BIT(n)`: Fixed-length bit string
|
||||
- **Variable-Length Bit Strings**
|
||||
- `BIT VARYING(n)`: Variable-length bit string
|
||||
|
||||
### 12. **Text Search Types**
|
||||
- `TSVECTOR`: Text search document
|
||||
- `TSQUERY`: Text search query
|
||||
|
||||
### 13. **JSON Types**
|
||||
- `JSON`: Textual JSON data
|
||||
- `JSONB`: Binary JSON data (more efficient for indexing)
|
||||
|
||||
### 14. **Array Types**
|
||||
- `ARRAY`: Allows any data type to be stored as an array (e.g., `INTEGER[]`, `TEXT[]`)
|
||||
|
||||
### 15. **Range Types**
|
||||
- `INT4RANGE`: Range of `INTEGER`
|
||||
- `INT8RANGE`: Range of `BIGINT`
|
||||
- `NUMRANGE`: Range of `NUMERIC`
|
||||
- `TSRANGE`: Range of `TIMESTAMP WITHOUT TIME ZONE`
|
||||
- `TSTZRANGE`: Range of `TIMESTAMP WITH TIME ZONE`
|
||||
- `DATERANGE`: Range of `DATE`
|
||||
|
||||
### 16. **Composite Types**
|
||||
- User-defined types that consist of multiple fields of various types
|
||||
|
||||
### 17. **Object Identifier Types (OID)**
|
||||
- `OID`: Object identifier (used internally by PostgreSQL)
|
||||
- `REGCLASS`, `REGPROC`, `REGTYPE`: Special types for referencing classes, procedures, and types by OID or name
|
||||
|
||||
### 18. **Pseudo-Types**
|
||||
- `ANY`: Accepts any data type
|
||||
- `ANYARRAY`: Accepts any array data type
|
||||
- `ANYELEMENT`: Represents any type of element
|
||||
- `ANYENUM`: Accepts any `ENUM` type
|
||||
- `ANYNONARRAY`: Any non-array type
|
||||
- `VOID`: No data (used with functions that return no value)
|
||||
- `TRIGGER`: Used in triggers
|
||||
- `LANGUAGE_HANDLER`: Used internally for language support
|
||||
|
||||
## Docker-Compose
|
||||
```yml
|
||||
services:
|
||||
postgres:
|
||||
image: postgres:17
|
||||
container_name: postgres
|
||||
environment:
|
||||
POSTGRES_USER: myuser
|
||||
POSTGRES_PASSWORD: mypassword
|
||||
POSTGRES_DB: mydb
|
||||
ports:
|
||||
- "5432:5432"
|
||||
volumes:
|
||||
- ./postgres:/var/lib/postgresql/data
|
||||
restart: always
|
||||
```
|
121
technology/applications/development/TimescaleDB.md
Normal file
|
@ -0,0 +1,121 @@
|
|||
---
|
||||
obj: application
|
||||
repo: https://github.com/timescale/timescaledb
|
||||
website: https://www.timescale.com
|
||||
rev: 2024-09-30
|
||||
---
|
||||
|
||||
# TimescaleDB
|
||||
TimescaleDB is an open-source time-series database built on [PostgreSQL](./Postgres.md), designed to handle large volumes of time-series data efficiently. It provides powerful data management features, making it suitable for applications in various domains such as IoT, finance, and analytics.
|
||||
|
||||
Features:
|
||||
- Hypertables: The backbone of TimescaleDB. Hypertables automatically partition data by time, which simplifies the management of very large datasets.
|
||||
- Continuous Aggregates: This feature enables the pre-computation and storage of aggregate data, significantly speeding up query times for common analytical operations.
|
||||
- Data Compression: TimescaleDB employs sophisticated compression techniques to reduce storage footprint without compromising query performance.
|
||||
- Optimized Indexing: With its advanced indexing strategies, including multi-dimensional and time-based indexing, TimescaleDB ensures rapid query responses, making it highly efficient for time-series data.
|
||||
|
||||
## Installation
|
||||
**Create the extension in your database**:
|
||||
```sql
|
||||
CREATE EXTENSION IF NOT EXISTS timescaledb;
|
||||
```
|
||||
|
||||
## Hypertables
|
||||
Hypertables are PostgreSQL tables that automatically partition your data by time. You interact with hypertables in the same way as regular PostgreSQL tables, but with extra features that make managing your time-series data much easier.
|
||||
|
||||
In Timescale, hypertables exist alongside regular PostgreSQL tables. Use hypertables to store time-series data. This gives you improved insert and query performance, and access to useful time-series features. Use regular PostgreSQL tables for other relational data.
|
||||
|
||||
With hypertables, Timescale makes it easy to improve insert and query performance by partitioning time-series data on its time parameter. Behind the scenes, the database performs the work of setting up and maintaining the hypertable's partitions. Meanwhile, you insert and query your data as if it all lives in a single, regular PostgreSQL table.
|
||||
|
||||
**Create a hypertable:**
|
||||
- Create a standard PostgreSQL table:
|
||||
```sql
|
||||
CREATE TABLE conditions (
|
||||
time TIMESTAMPTZ NOT NULL,
|
||||
location TEXT NOT NULL,
|
||||
device TEXT NOT NULL,
|
||||
temperature DOUBLE PRECISION NULL,
|
||||
humidity DOUBLE PRECISION NULL
|
||||
);
|
||||
```
|
||||
|
||||
- Convert the table to a hypertable. Specify the name of the table you want to convert, and the column that holds its time values.
|
||||
```sql
|
||||
SELECT create_hypertable('conditions', by_range('time'));
|
||||
```
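To illustrate what partitioning by time means, the following Python sketch groups rows into fixed-width time ranges, similar in spirit to how a hypertable splits data into chunks. It is a simplification under stated assumptions: the 7-day width is the documented default chunk interval, but aligning chunk boundaries to the Unix epoch and the helper names `chunk_start` and `partition` are choices made for this example only.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed alignment point

def chunk_start(ts, interval=timedelta(days=7)):
    """Start of the fixed-width time range that ts falls into."""
    return ts - (ts - EPOCH) % interval

def partition(rows, interval=timedelta(days=7)):
    """Group rows (dicts with a 'time' key) by the chunk they belong to."""
    chunks = defaultdict(list)
    for row in rows:
        chunks[chunk_start(row["time"], interval)].append(row)
    return dict(chunks)

rows = [
    {"time": datetime(2023, 9, 1, tzinfo=timezone.utc), "temperature": 21.0},
    {"time": datetime(2023, 9, 2, tzinfo=timezone.utc), "temperature": 22.5},
    {"time": datetime(2023, 9, 8, tzinfo=timezone.utc), "temperature": 19.0},
]
for start, members in partition(rows).items():
    print(start.date(), len(members))
```

Queries that filter on time only need to look at the chunks whose range overlaps the filter, which is where the performance benefit comes from.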
|
||||
|
||||
## Hyperfunctions
|
||||
Hyperfunctions allow you to query and aggregate your time-series data.
|
||||
|
||||
### delta
|
||||
The `delta` function computes the change in a value over time. It helps in understanding how a metric (e.g., temperature, stock price, etc.) changes between readings.
|
||||
|
||||
Example: Calculate Temperature Change Over a Day
|
||||
```sql
|
||||
SELECT
|
||||
delta(temperature) AS temp_change
|
||||
FROM temperature_readings
|
||||
WHERE time BETWEEN '2023-09-01' AND '2023-09-02';
|
||||
```
|
||||
|
||||
### derivative
|
||||
The `derivative` function calculates the rate of change (derivative) of a series over time.
|
||||
|
||||
Example: Calculate the Rate of Temperature Change Per Hour
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
derivative(avg(temperature), '1 hour') AS temp_rate_change
|
||||
FROM temperature_readings
|
||||
GROUP BY time_bucket('1 hour', time);
|
||||
```
|
||||
|
||||
### first & last
|
||||
The `first` and `last` hyperfunctions return the first and last recorded values within a specified period.
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
time_bucket('1 day', time) AS day,
|
||||
first(stock_price, time) AS opening_price,
|
||||
last(stock_price, time) AS closing_price
|
||||
FROM stock_prices
|
||||
GROUP BY day
|
||||
ORDER BY day;
|
||||
```
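The semantics of `first` and `last` per bucket can be sketched in plain Python: order the rows by time, then keep the earliest and latest value for each day. `first_last_per_day` and the sample prices are hypothetical and only meant to illustrate the idea.

```python
from datetime import datetime

def first_last_per_day(rows):
    """rows: iterable of (time, price). Returns {date: (first_price, last_price)}."""
    result = {}
    for ts, price in sorted(rows):  # process in time order
        day = ts.date()
        opening = result[day][0] if day in result else price
        result[day] = (opening, price)
    return result

rows = [
    (datetime(2023, 9, 1, 16, 0), 104.5),
    (datetime(2023, 9, 1, 9, 30), 100.0),
    (datetime(2023, 9, 2, 9, 30), 103.0),
]
for day, (opening, closing) in first_last_per_day(rows).items():
    print(day, opening, closing)
```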
|
||||
|
||||
### locf
|
||||
The `locf` (Last Observation Carried Forward) function fills missing data by carrying the last known observation forward to the missing timestamps.
|
||||
|
||||
```sql
|
||||
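SELECT
-- locf only works together with time_bucket_gapfill, which creates the missing buckets
time_bucket_gapfill('1 hour', time) AS hour,
locf(last(temperature, time)) AS filled_temperature
FROM temperature_readings
-- gapfilling needs a bounded time range (example range)
WHERE time >= '2023-09-01' AND time < '2023-09-02'
GROUP BY hour
ORDER BY hour;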
|
||||
```
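The carry-forward logic itself is simple. A minimal Python sketch, assuming the buckets are already in time order and missing buckets are represented as `None` (the helper name `locf` is reused here purely for illustration):

```python
def locf(values):
    """Replace None with the most recent non-None value (leading None stays None)."""
    filled, last_seen = [], None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled

print(locf([None, 20.5, None, None, 21.0, None]))
# [None, 20.5, 20.5, 20.5, 21.0, 21.0]
```

Note that a gap at the very start has no earlier observation to carry forward, so it stays empty.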
|
||||
|
||||
### interpolated_avg
|
||||
The `interpolated_avg` hyperfunction computes the average of a series with values interpolated at regular time intervals.
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
time_bucket('1 hour', time) AS hour,
|
||||
interpolated_avg('linear', time, power_usage) AS interpolated_power
|
||||
FROM power_data
|
||||
WHERE time BETWEEN '2023-09-01' AND '2023-09-07'
|
||||
GROUP BY hour;
|
||||
```
|
||||
|
||||
### time_bucket
|
||||
The `time_bucket` hyperfunction is essential when you want to analyze or summarize data over time-based intervals, such as calculating daily averages, hourly sums, or other time-bound statistics.
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
time_bucket('1 hour', time) AS bucketed_time,
|
||||
avg(cpu_usage) AS avg_cpu_usage
|
||||
FROM server_metrics
|
||||
WHERE time BETWEEN '2023-09-01' AND '2023-09-30'
|
||||
GROUP BY bucketed_time
|
||||
ORDER BY bucketed_time;
|
||||
```
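Bucketing boils down to flooring each timestamp to the start of its interval. The Python sketch below shows that arithmetic for fixed-width intervals. It assumes buckets are aligned to 2000-01-03 (a Monday), which is what the TimescaleDB documentation describes as the default origin for intervals that do not involve months or years; the helper is an illustration, not the extension's implementation.

```python
from datetime import datetime, timedelta

ORIGIN = datetime(2000, 1, 3)  # assumed default origin (a Monday)

def time_bucket(width, ts, origin=ORIGIN):
    """Floor ts to the start of its width-sized bucket, counted from origin."""
    return ts - (ts - origin) % width

print(time_bucket(timedelta(hours=1), datetime(2023, 9, 1, 14, 37, 12)))
# 2023-09-01 14:00:00
print(time_bucket(timedelta(days=7), datetime(2023, 9, 1)))
# 2023-08-28 00:00:00 (the Monday of that week)
```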
|
99
technology/applications/development/pgvector.md
Normal file
|
@ -0,0 +1,99 @@
|
|||
---
|
||||
obj: application
|
||||
repo: https://github.com/pgvector/pgvector
|
||||
rev: 2024-09-30
|
||||
---
|
||||
|
||||
# pgvector
|
||||
**pgvector** is a [PostgreSQL](./Postgres.md) extension designed to support vector similarity search. With the rise of machine learning models like those in natural language processing (NLP), computer vision, and recommendation systems, the need to efficiently store and query high-dimensional vectors (embeddings) has grown significantly. pgvector provides a solution by enabling PostgreSQL to handle these vector operations, making it possible to search for similar items using vector distance metrics directly in SQL.
|
||||
|
||||
## Installation
|
||||
1. Install pgvector using `git` and `make`:
|
||||
```bash
|
||||
git clone https://github.com/pgvector/pgvector.git
|
||||
cd pgvector
|
||||
make && make install
|
||||
```
|
||||
|
||||
2. Add the extension to your PostgreSQL database:
|
||||
```sql
|
||||
CREATE EXTENSION IF NOT EXISTS vector;
|
||||
```
|
||||
|
||||
## Data Types
|
||||
pgvector introduces a new data type called `vector`. It is used to store fixed-length vectors, and the size must be specified during table creation.
|
||||
|
||||
```sql
|
||||
CREATE TABLE items (
|
||||
id serial PRIMARY KEY,
|
||||
embedding vector(3) -- a 3-dimensional vector
|
||||
);
|
||||
```
|
||||
|
||||
## Functions and Operators
|
||||
pgvector provides several functions and operators for vector similarity and distance calculation.
|
||||
|
||||
### Distance Metrics
|
||||
|
||||
- **Euclidean Distance** (`<->`): Measures the straight-line distance between two vectors.
|
||||
|
||||
```sql
|
||||
SELECT * FROM items ORDER BY embedding <-> '[1, 0, 0]' LIMIT 5;
|
||||
```
|
||||
|
||||
- **Cosine Distance** (`<=>`): Returns one minus the cosine similarity of two vectors, so smaller values mean the vectors point in more similar directions.
|
||||
|
||||
```sql
|
||||
SELECT * FROM items ORDER BY embedding <=> '[1, 0, 0]' LIMIT 5;
|
||||
```
|
||||
|
||||
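- **Inner Product** (`<#>`): Returns the negative dot product of two vectors. The sign is flipped so that ascending order puts the largest inner product first; multiply by -1 to get the actual value.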
|
||||
|
||||
```sql
|
||||
SELECT * FROM items ORDER BY embedding <#> '[1, 0, 0]' LIMIT 5;
|
||||
```
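To make the operator results easier to interpret, here is a plain-Python sketch of the three calculations. It follows the pgvector README's description of the operators: `<->` is L2 distance, `<=>` is cosine distance (one minus cosine similarity), and `<#>` is the negative inner product. The function names are hypothetical.

```python
import math

def l2_distance(a, b):             # corresponds to <->
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):         # corresponds to <=>
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

def negative_inner_product(a, b):  # corresponds to <#>
    return -sum(x * y for x, y in zip(a, b))

query = [1, 0, 0]
print(l2_distance([0, 1, 0], query))             # about 1.414
print(cosine_distance([0, 1, 0], query))         # 1.0 (orthogonal vectors)
print(negative_inner_product([1, 2, 3], query))  # -1
```

In all three cases a smaller value means a closer match, which is why the queries above use `ORDER BY ... LIMIT` in ascending order.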
|
||||
|
||||
### Basic Operations
|
||||
|
||||
- **Set a Vector Value**:
|
||||
|
||||
```sql
|
||||
INSERT INTO items (embedding) VALUES ('[1, 0, 0]');
|
||||
```
|
||||
|
||||
- **Retrieve All Vectors**:
|
||||
|
||||
```sql
|
||||
SELECT * FROM items;
|
||||
```
|
||||
|
||||
## Indexing
|
||||
To speed up similarity search, pgvector supports approximate nearest-neighbor indexes (IVFFlat, shown here, and HNSW). The operator class of the index must match the distance operator used in your queries:
|
||||
|
||||
- **Euclidean Distance** (L2):
|
||||
|
||||
```sql
|
||||
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
|
||||
```
|
||||
|
||||
- **Cosine Distance**:
|
||||
|
||||
```sql
|
||||
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
|
||||
```
|
||||
|
||||
- **Inner Product**:
|
||||
|
||||
```sql
|
||||
CREATE INDEX ON items USING ivfflat (embedding vector_ip_ops) WITH (lists = 100);
|
||||
```
|
||||
|
||||
### Index Parameters
|
||||
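- **Lists**: Defines the number of lists (clusters) that IVFFlat partitions the vectors into. The pgvector README suggests starting with `rows / 1000` for up to 1M rows and `sqrt(rows)` beyond that. At query time, the `ivfflat.probes` setting controls how many lists are searched: more probes improve recall but slow down queries.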
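The trade-off behind an inverted-file index can be shown with a toy Python sketch. It assumes the centroids are already given (pgvector derives them by clustering the data) and uses hypothetical helper names; it only demonstrates why searching a single list can miss the true nearest neighbor.

```python
import math

def nearest(query, vectors):
    return min(vectors, key=lambda v: math.dist(query, v))

def build_ivf(vectors, centroids):
    """Assign every vector to the list of its nearest centroid."""
    lists = {c: [] for c in centroids}
    for v in vectors:
        lists[nearest(v, centroids)].append(v)
    return lists

def ivf_search(query, lists, probes=1):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    closest = sorted(lists, key=lambda c: math.dist(query, c))[:probes]
    candidates = [v for c in closest for v in lists[c]]
    return nearest(query, candidates) if candidates else None

lists = build_ivf([(0.9, 0.0), (1.6, 0.0)], [(0.0, 0.0), (2.0, 0.0)])
print(ivf_search((1.1, 0.0), lists, probes=1))  # (1.6, 0.0): approximate answer
print(ivf_search((1.1, 0.0), lists, probes=2))  # (0.9, 0.0): exact answer
```

With one probe the search is faster but returns a neighbor from the wrong side of the cluster boundary; probing every list gives the exact result at the cost of a full scan.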
|
||||
|
||||
## Use Cases
|
||||
|
||||
1. **Recommendation Systems**: Store user and item embeddings and use similarity search to recommend items based on user preferences.
|
||||
2. **Search Engines**: Search for semantically similar documents or images using vector embeddings.
|
||||
3. **NLP Applications**: Store word, sentence, or document embeddings to perform similarity search or clustering of textual data.
|
||||
4. **Image Recognition**: Query for similar images based on embeddings generated by deep learning models.
|
61
technology/linux/filesystems/Ceph.md
Normal file
|
@ -0,0 +1,61 @@
|
|||
---
|
||||
obj: filesystem
|
||||
---
|
||||
|
||||
# Ceph
|
||||
#wip
|
||||
|
||||
Ceph is a distributed storage system providing Object, Block and Filesystem Storage.
|
||||
|
||||
## Concepts
|
||||
- Monitors: A Ceph Monitor (`ceph-mon`) maintains maps of the cluster state, including the monitor map, manager map, the OSD map, the MDS map, and the CRUSH map. These maps are critical cluster state required for Ceph daemons to coordinate with each other. Monitors are also responsible for managing authentication between daemons and clients. At least three monitors are normally required for redundancy and high availability.
|
||||
- Managers: A Ceph Manager daemon (`ceph-mgr`) is responsible for keeping track of runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. The Ceph Manager daemons also host python-based modules to manage and expose Ceph cluster information, including a web-based Ceph Dashboard and REST API. At least two managers are normally required for high availability.
|
||||
- Ceph OSDs: An Object Storage Daemon (Ceph OSD, `ceph-osd`) stores data, handles data replication, recovery, rebalancing, and provides some monitoring information to Ceph Monitors and Managers by checking other Ceph OSD Daemons for a heartbeat. At least three Ceph OSDs are normally required for redundancy and high availability.
|
||||
- MDSs: A Ceph Metadata Server (MDS, `ceph-mds`) stores metadata for the Ceph File System. Ceph Metadata Servers allow CephFS users to run basic commands (like ls, find, etc.) without placing a burden on the Ceph Storage Cluster.
|
||||
|
||||
Ceph stores data as objects within logical storage pools. Using the CRUSH algorithm, Ceph calculates which placement group (PG) should contain the object, and which OSD should store the placement group. The CRUSH algorithm enables the Ceph Storage Cluster to scale, rebalance, and recover dynamically.
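The two-step placement (object to placement group, placement group to OSDs) can be sketched in Python. This is a heavily simplified stand-in, not the CRUSH algorithm: it uses `zlib.crc32` instead of Ceph's own hash, ignores the cluster hierarchy and device weights, and the helper names are hypothetical. The point it illustrates is that placement is computed deterministically by every client rather than looked up in a central table.

```python
import zlib

def placement_group(object_name, pg_num):
    """Hash an object name to a placement group id (simplified)."""
    return zlib.crc32(object_name.encode()) % pg_num

def osds_for_pg(pg_id, osds, replicas=3):
    """Pick `replicas` distinct OSDs deterministically (stand-in for CRUSH)."""
    ranked = sorted(osds, key=lambda osd: zlib.crc32(f"{pg_id}:{osd}".encode()))
    return ranked[:replicas]

osds = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]
pg = placement_group("my-object", pg_num=128)
print(pg, osds_for_pg(pg, osds))
```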
|
||||
|
||||
## Setup
|
||||
Cephadm creates a new Ceph cluster by bootstrapping a single host, expanding the cluster to encompass any additional hosts, and then deploying the needed services.
|
||||
|
||||
Run the `cephadm bootstrap` command with the IP of the first cluster host:
|
||||
```shell
|
||||
cephadm bootstrap --mon-ip <mon-ip>
|
||||
```
|
||||
|
||||
This command will:
|
||||
- Create a Monitor and a Manager daemon for the new cluster on the local host.
|
||||
- Generate a new SSH key for the Ceph cluster and add it to the root user’s `/root/.ssh/authorized_keys` file.
|
||||
- Write a copy of the public key to `/etc/ceph/ceph.pub`.
|
||||
- Write a minimal configuration file to `/etc/ceph/ceph.conf`. This file is needed to communicate with Ceph daemons.
|
||||
- Write a copy of the `client.admin` administrative (privileged!) secret key to `/etc/ceph/ceph.client.admin.keyring`.
|
||||
- Add the `_admin` label to the bootstrap host. By default, any host with this label will (also) get a copy of `/etc/ceph/ceph.conf` and `/etc/ceph/ceph.client.admin.keyring`.
|
||||
|
||||
### Ceph CLI
|
||||
The `cephadm shell` command launches a bash shell in a container with all of the Ceph packages installed. By default, if configuration and keyring files are found in `/etc/ceph` on the host, they are passed into the container environment so that the shell is fully functional. Note that when executed on a MON host, cephadm shell will infer the config from the MON container instead of using the default configuration. If `--mount <path>` is given, then the host `<path>` (file or directory) will appear under `/mnt` inside the container:
|
||||
|
||||
```shell
|
||||
cephadm shell
|
||||
```
|
||||
|
||||
To execute ceph commands, you can also run commands like this:
|
||||
```shell
|
||||
cephadm shell -- ceph -s
|
||||
```
|
||||
|
||||
You can install the ceph-common package, which contains all of the ceph commands, including ceph, rbd, mount.ceph (for mounting CephFS file systems), etc.:
|
||||
```shell
|
||||
cephadm add-repo --release reef
|
||||
cephadm install ceph-common
|
||||
```
|
||||
|
||||
Confirm that the ceph command is accessible with:
|
||||
```shell
|
||||
ceph -v
|
||||
ceph status
|
||||
```
|
||||
|
||||
## Host Management
|
||||
#todo -> https://docs.ceph.com/en/latest/cephadm/host-management/
|
||||
|
||||
|