|
- [Postgres is eating the database world](https://medium.com/@fengruohang/postgres-is-eating-the-database-world-157c204dcfc4)
- Packages
    - [{]{style="color: #990000"}[RPostgres](https://rpostgres.r-dbi.org/){style="color: #990000"}[}]{style="color: #990000"} - DBI-compliant interface to the postgres database
    - [{]{style="color: goldenrod"}[psycopg](https://www.psycopg.org/psycopg3/docs/index.html){style="color: goldenrod"}[}]{style="color: goldenrod"} - PostgreSQL database adapter
- Resources
    - [Docs](https://docs.jade.fyi/postgres/postgres.html) - All on one page so you can just [ctrl + f]{.arg-text}
    - [Exploring Enterprise Databases with R: A Tidyverse Approach](https://smithjd.github.io/sql-pet/)
|
|
- [Apache AGE]{.underline}
    - [Website](https://age.apache.org/), [Docs](https://age.apache.org/age-manual/master/index.html)
    - The goal of the project is to create a single store that can handle both **relational and graph model data**, so that users can use standard ANSI SQL along with openCypher, the graph query language.
    - Users can read and write graph data in nodes and edges. They can also use various algorithms such as variable-length and edge traversal when analyzing data.
- [pgai]{.underline}
    - [Repo](https://github.com/timescale/pgai/?ref=timescale.com), [Intro](https://www.timescale.com/blog/how-we-made-postgresql-as-fast-as-pinecone-for-vector-data/)
    - Simplifies the process of **building search and Retrieval-Augmented Generation (RAG) AI applications** with PostgreSQL.
    - Features
        - Create embeddings for your data.
        - Retrieve LLM chat completions from models like OpenAI GPT-4o.
        - Reason over your data and facilitate use cases like classification, summarization, and data enrichment on your existing relational data in PostgreSQL.
- [pg_analytics]{.underline}
    - [Intro](https://blog.paradedb.com/pages/introducing_analytics), [Repo](https://github.com/paradedb/paradedb/tree/dev/pg_analytics)
    - **Arrow and DataFusion** integrated with Postgres
    - **Delta Lake tables** behave like regular Postgres tables but use a column-oriented layout via Apache Arrow and utilize Apache DataFusion, a query engine optimized for column-oriented data
    - Data is persisted to disk with Parquet
    - The delta-rs library is a Rust-based implementation of Delta Lake. This library adds ACID transactions, updates and deletes, and file compaction to Parquet storage. It also supports querying over data lakes like S3, which introduces the future possibility of connecting Postgres tables to cloud data lakes.
- [pg_bm25]{.underline}
    - [Intro](https://blog.paradedb.com/pages/introducing_bm25), [Repo](https://github.com/paradedb/paradedb/tree/dev/pg_bm25#overview)
    - Rust-based extension that significantly improves Postgres’ **full-text search** capabilities
    - Built to act like an Elasticsearch inside of a Postgres db
    - Performant on large tables; adds support for operations like fuzzy search, relevance tuning, BM25 relevance scoring (the same algorithm as Elasticsearch), and real-time search (new data is immediately searchable without manual reindexing)
    - Query times over 1M rows are 20x faster compared to the built-in `tsquery` search and `ts_rank` sort functions
    - Can be combined with PGVector for semantic fuzzy search
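    BM25 scores a document against a query by combining inverse document frequency, term frequency, and document-length normalization. A minimal plain-Python sketch of the standard Okapi BM25 formula (the toy corpus and default `k1`/`b` parameters are illustrative, not pg_bm25 internals):

    ```python
    import math

    def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
        """Score one document against a query with Okapi BM25."""
        n = len(corpus)
        avg_len = sum(len(d) for d in corpus) / n
        score = 0.0
        for term in query_terms:
            df = sum(1 for d in corpus if term in d)        # document frequency
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))  # rare terms weigh more
            tf = doc_terms.count(term)                       # term frequency
            norm = k1 * (1 - b + b * len(doc_terms) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + norm)
        return score

    corpus = [["fast", "postgres", "search"],
              ["postgres", "replication"],
              ["rust", "extension"]]
    print(bm25_score(["postgres", "search"], corpus[0], corpus))
    ```

    Documents containing more (and rarer) query terms score higher; a document sharing no terms with the query scores zero.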
- [Citus]{.underline}
    - [Website](https://www.citusdata.com/)
    - Distributed Postgres
    - Transforms a standalone cluster into a horizontally partitioned **distributed database cluster**.
    - Scales Postgres by distributing data & queries. You can start with a single Citus node, then add nodes & rebalance shards when you need to grow.
    - Can combine with PostGIS for a distributed geospatial database, PGVector for a distributed vector database, pg_bm25 for a distributed full-text search database, etc.
    - [yugabytedb](https://www.yugabyte.com/) is also an option for distributed postgres
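    Horizontal partitioning of this kind can be pictured as routing each row to a shard by hashing its distribution column. A toy sketch (the key names and shard count are hypothetical, and this is not Citus's actual hash function):

    ```python
    import hashlib

    def shard_for(key: str, n_shards: int = 4) -> int:
        """Route a row to a shard by hashing its distribution column."""
        digest = hashlib.sha256(key.encode()).hexdigest()
        return int(digest, 16) % n_shards

    # Rows with the same distribution key always land on the same shard,
    # so queries filtered on that key touch only one node.
    keys = ["user_42", "user_7", "user_42", "user_99"]
    placement = {k: shard_for(k) for k in keys}
    print(placement)
    ```

    Rebalancing when nodes are added amounts to moving whole shards between nodes, which is why the shard count is chosen larger than the node count.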
- [pg_duckdb]{.underline}
    - [Repo](https://github.com/duckdb/pg_duckdb), [Intro](https://motherduck.com/blog/pg_duckdb-postgresql-extension-for-duckdb-motherduck/)
    - **Official Postgres extension for DuckDB**
    - Developed by the DuckDB team in collaboration with Hydra and MotherDuck
    - Embeds DuckDB's columnar-vectorized analytics engine and features into Postgres
    - `SELECT` queries executed by the DuckDB engine can directly read Postgres tables
|
- Sync and Transformation
    - Leverages PostgreSQL's logical replication system to capture and stream data changes. It uses NATS as a message broker to decouple reading from the WAL through the replicator and worker processes, providing flexibility and scalability. Transformations and filtering are applied before the data reaches the destination.
    - Use Cases
        - Continuously **sync production data to staging**, leveraging powerful transformation rules to maintain data privacy and security practices.
        - **Sync and transform data to separate databases** for archiving, auditing, and analytics purposes.
- [PGLite]{.underline}
    - [Website](https://pglite.dev/)
    - **Embeddable** Postgres (e.g. for things like apps)
    - Run a full Postgres database locally in WASM with reactivity and live sync.
- [pg_mooncake]{.underline}
    - [Repo](https://github.com/Mooncake-Labs/pg_mooncake), [Site](https://pgmooncake.com/)
    - Adds native columnstore tables with DuckDB execution for 1000x faster analytics.
    - Columnstore tables are stored as Iceberg or Delta Lake tables (parquet files + metadata) in object storage. **Differs from pg_duckdb because these tables support transactional and batch inserts, updates, and deletes, as well as joins with regular PostgreSQL tables.**
    - Available on [Neon Postgres](https://neon.tech/home).
- [pg_parquet]{.underline}
    - [Repo](https://github.com/CrunchyData/pg_parquet/), [Intro](https://www.crunchydata.com/blog/pg_parquet-an-extension-to-connect-postgres-and-parquet)
    - Sources: Local or S3
    - Dependencies: Apache Arrow and the pgrx extension
    - Features
        - **Export** Postgres tables/queries to **Parquet** files
        - **Ingest** data from Parquet files to Postgres tables
        - **Inspect** the schema and metadata of Parquet files
- [plprql]{.underline}
    - [Repo](https://github.com/kaspermarstal/plprql)
    - Enables you to run **PRQL** queries. PRQL has a syntax that is similar to [{dplyr}]{style="color: #990000"}
    - Built in Rust, so you have to have [pgrx]{.underline} installed. The repo has directions.
- [pgroll]{.underline}
    - [Repo](https://github.com/xataio/pgroll)
    - An open-source **schema migration tool** for Postgres, built to enable zero-downtime, reversible schema migrations using the expand/contract pattern
    - Creates virtual schemas based on PostgreSQL views on top of the physical tables. This allows you to make changes to your database without impacting the application.
- [pgrx]{.underline}
    - [Repo](https://github.com/pgcentralfoundation/pgrx)
    - Framework for developing PostgreSQL extensions in Rust
    - To **install extensions built in Rust**, you need to have this framework installed
- [pg_sparse]{.underline}
    - [Intro](https://blog.paradedb.com/pages/introducing_sparse), [Repo](https://github.com/paradedb/paradedb/tree/dev/pg_sparse#overview)
    - Enables efficient **storage and retrieval of *sparse* vectors** using HNSW
    - SPLADE outputs sparse vectors with over 30,000 entries. Sparse vectors can detect the presence of exact keywords while also capturing semantic similarity between terms.
    - Fork of pgvector with modifications
    - Compatible alongside both pg_bm25 and pgvector
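    Sparse vectors with 30,000+ dimensions are mostly zeros, so they are typically stored as index-to-weight maps, and similarity only has to touch the non-zero entries both vectors share. A toy sketch (the dict representation and example weights are illustrative, not pg_sparse's storage format):

    ```python
    def sparse_dot(a: dict, b: dict) -> float:
        """Dot product of two sparse vectors stored as {index: weight} maps.
        Only indices present in both vectors contribute."""
        if len(b) < len(a):
            a, b = b, a  # iterate over the smaller vector
        return sum(w * b[i] for i, w in a.items() if i in b)

    # SPLADE-style: tens of thousands of dimensions, only a handful non-zero
    doc = {101: 0.8, 2048: 0.3, 29999: 0.5}
    query = {101: 1.0, 7: 0.2}
    print(sparse_dot(doc, query))  # only index 101 overlaps -> 0.8
    ```

    This is why a vector with 30,000 entries is cheap to store and compare when only a few dozen entries are non-zero.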
- [pgstream]{.underline}
    - [Intro](https://xata.io/blog/postgres-webhooks-with-pgstream), [Site](https://xata.io/pgstream), [Repo](https://github.com/xataio/pgstream)
    - CDC (Change-Data-Capture) CLI tool that **calls webhooks whenever there is a data (or schema) change**
    - Whenever a row is inserted, updated, or deleted, or a table is created, altered, truncated, or deleted, a webhook is notified of the relevant event detail
- [pg_timeseries]{.underline}
    - [Intro](https://tembo.io/blog/pg-timeseries), [Repo](https://github.com/tembo-io/pg_timeseries)
    - An **alternative to [TimescaleDB](https://github.com/timescale/timescaledb)**. TimescaleDB's license restricts use of features such as compression, incremental materialized views, and bottomless storage, but that might be because the company ([tembo](https://tembo.io/)) that open-sourced this extension has its own stack, cloud, etc.
    - Features such as native partitioning, a variety of indexes, materialized views, and window / analytics functions
    - You can compress tables if the table data is older than a certain time period (e.g. 90 days)
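    An age-based compression policy like this amounts to selecting the partitions whose data falls entirely before a cutoff date. A toy sketch (the partition names, bounds, and fixed "today" are hypothetical; this is not pg_timeseries's API):

    ```python
    from datetime import date, timedelta

    def partitions_to_compress(partitions: dict, older_than_days: int = 90,
                               today: date = date(2025, 6, 1)) -> list:
        """Return partitions whose upper time bound is past the cutoff."""
        cutoff = today - timedelta(days=older_than_days)
        return [name for name, upper_bound in partitions.items()
                if upper_bound <= cutoff]

    parts = {"metrics_2025_01": date(2025, 2, 1),   # entirely older than 90 days
             "metrics_2025_04": date(2025, 5, 1),
             "metrics_2025_05": date(2025, 6, 1)}
    print(partitions_to_compress(parts))
    ```

    Operating on whole partitions is what makes the policy cheap: recent partitions stay row-oriented and writable while old ones are compressed in place.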
- [pg_tracing]{.underline}
    - [Repo](https://github.com/DataDog/pg_tracing)
    - Generates server-side spans for **distributed tracing**
- [pgvector]{.underline}
    - [Repo](https://github.com/pgvector/pgvector)
    - Also see [Databases, Vector Databases](db-vector.qmd#sec-db-vect){style="color: green"} for alternatives and comparisons
    - Enables efficient **storage and retrieval of *dense* vectors** using HNSW
    - OpenAI’s text-embedding-ada-002 model outputs dense vectors with 1536 entries
    - Exact and Approximate Nearest Neighbor search
    - L2 distance, Inner Product, and Cosine Distance
    - Supported inside AWS RDS
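    The three distance metrics pgvector supports can be written out directly. A plain-Python sketch of each (pgvector computes these in C with index support; the 2-D example vectors are illustrative):

    ```python
    import math

    def l2(a, b):
        """Euclidean (L2) distance."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def inner_product(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cosine_distance(a, b):
        """1 - cosine similarity: 0 for parallel vectors, 1 for orthogonal."""
        norm = math.sqrt(inner_product(a, a) * inner_product(b, b))
        return 1 - inner_product(a, b) / norm

    a, b = [1.0, 0.0], [0.0, 1.0]  # orthogonal unit vectors
    print(l2(a, b), inner_product(a, b), cosine_distance(a, b))
    ```

    Cosine distance ignores vector magnitude, which is why it is the usual choice for comparing normalized text embeddings.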
- [pg_vectorize]{.underline}
    - [Repo](https://github.com/tembo-io/pg_vectorize)
    - Workflows for both **vector search and RAG**
    - Integrations with OpenAI's [embeddings](https://platform.openai.com/docs/guides/embeddings) and [chat-completion](https://platform.openai.com/docs/guides/text-generation) endpoints and a self-hosted container for running [Hugging Face Sentence-Transformers](https://huggingface.co/sentence-transformers)
    - Automated creation of Postgres triggers to keep your embeddings up to date
    - High-level API - one function to initialize embeddings transformations, and another function to search
- [pgvectorscale]{.underline}
    - [Repo](https://github.com/timescale/pgvectorscale/), [Intro](https://www.timescale.com/blog/how-we-made-postgresql-as-fast-as-pinecone-for-vector-data/)
    - A complement to pgvector for high-performance, cost-efficient **vector search on large workloads**.
    - Features
        - A new index type called StreamingDiskANN, inspired by the [DiskANN](https://github.com/microsoft/DiskANN) algorithm, based on research from Microsoft.
        - Statistical Binary Quantization: developed by Timescale researchers, this compression method improves on standard Binary Quantization.
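    Standard binary quantization, the baseline that Timescale's statistical variant improves on, keeps one bit per dimension (usually the sign) and compares compressed vectors with Hamming distance. A toy sketch of that baseline (example vectors are illustrative, and this is not the statistical variant):

    ```python
    def binary_quantize(vec):
        """Compress a float vector to one bit per dimension (sign bit)."""
        return [1 if x > 0 else 0 for x in vec]

    def hamming(a_bits, b_bits):
        """Count of differing bits: a cheap distance proxy."""
        return sum(x != y for x, y in zip(a_bits, b_bits))

    a = [0.12, -0.53, 0.91, -0.04]
    b = [0.40, -0.10, 0.88, 0.27]
    qa, qb = binary_quantize(a), binary_quantize(b)
    print(qa, qb, hamming(qa, qb))  # 32x smaller than float32 per dimension
    ```

    The compression is lossy, so quantized distances are typically used to shortlist candidates that are then re-ranked with full-precision vectors.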
|