
Commit e99cb3b

db-duck >> mcp ext; r-snip >> replace_values; r-tidy >> recode_values, replace_values; surv-anal >> calibration
1 parent e31b1ae commit e99cb3b

5 files changed

Lines changed: 303 additions & 63 deletions

File tree

qmd/db-duckdb.qmd

Lines changed: 11 additions & 7 deletions
@@ -1146,10 +1146,10 @@

- [Dash]{.underline}
  - [Repo](https://github.com/gropaul/dash), [Docs](https://www.dash.builders/docs/dashboards#dashboards)
  - A local-first data **exploration and visualization tool** built on top of DuckDB. Use it in your browser or as a DuckDB extension to analyze and visualize your data with ease. (See also the web app in Misc \>\> Tools)
- [dplyr]{.underline}
  - [Repo](https://github.com/mrchypark/libdplyr), [Docs](https://duckdb.org/community_extensions/extensions/dplyr)
  - Enables R users to **write database queries using familiar dplyr syntax** and converts them to efficient SQL for execution.
  - Supports multiple SQL dialects (PostgreSQL, MySQL, SQLite, DuckDB) for use across various database environments.
- [JSON]{.underline}
  - [Webpage](https://duckdb.org/docs/extensions/json.html)
@@ -1201,7 +1201,7 @@

- Might need to use `FORCE INSTALL postgres`
- **Allows DuckDB to connect to those systems and operate on them** in the same way that it operates on its own native storage engine.
- Use Cases
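A minimal sketch of the attach-and-query flow, assuming a reachable Postgres instance; the connection-string values and `some_table` below are placeholders:

``` sql
-- Install and load the extension (FORCE INSTALL postgres if a cached build misbehaves)
INSTALL postgres;
LOAD postgres;

-- Attach a running Postgres database under the alias pg
ATTACH 'dbname=mydb user=me host=127.0.0.1' AS pg (TYPE postgres);

-- Postgres tables can now be queried like native DuckDB tables
SELECT count(*) FROM pg.public.some_table;
```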
@@ -1272,22 +1272,26 @@

- [GSheets]{.underline}
  - [Repo](https://github.com/evidence-dev/duckdb_gsheets)
  - Extension for **reading and writing Google Sheets** with SQL
- [Infera]{.underline}
  - [Repo](https://github.com/CogitatorTech/infera)
  - Allows you to use machine learning (ML) models directly in SQL queries to perform inference on data stored in DuckDB tables.
  - These are **pretrained models, and this extension allows you to perform prediction within DuckDB.**
  - Developed in Rust and uses Tract as the backend inference engine.
  - Supports loading and running models in Open Neural Network Exchange (ONNX) format. See [repo](https://github.com/onnx/models), [huggingface](https://huggingface.co/onnxmodelzoo)
  - Currently seems to be mostly Computer Vision and Natural Language Processing (NLP)
    - There's also a forecasting model and a couple of recommender models
- [duckdb_mcp]{.underline}
  - [Webpage](https://duckdb.org/community_extensions/extensions/duckdb_mcp), [Intro](https://dailydrop.hrbrmstr.dev/2026/02/04/drop-767-2026-02-04-if-it-walks-like-a/)
  - Enables seamless **integration between SQL databases and MCP servers**
  - Provides both client capabilities for accessing remote MCP resources via SQL and server capabilities for exposing database content as MCP resources
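As a community extension, installation should follow the standard community-repository pattern (a sketch; the extension's own functions for attaching MCP servers are documented on its webpage):

``` sql
-- Pull the extension from DuckDB's community repository, then load it
INSTALL duckdb_mcp FROM community;
LOAD duckdb_mcp;
```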
- [mlpack]{.underline}
  - [Repo](https://github.com/eddelbuettel/duckdb-mlpack), [Extension webpage](https://duckdb.org/community_extensions/extensions/mlpack)
  - Allows you to **fit (or train) and predict (or classify) from the models** implemented
  - Currently just supports AdaBoost and (regularized) linear regression
- [quackstore]{.underline}
  - [Repo](https://github.com/coginiti-dev/QuackStore)
  - For **caching frequently queried files locally**
  - When you query remote files (like CSV files from the web), DuckDB normally downloads them every time. With QuackStore, the first query downloads and caches the file locally. Subsequent queries use the cached version, making them much faster.
  - Key Benefits
    - Block-based caching: Only caches the parts of files you actually access (blocks)

qmd/db-postgres.qmd

Lines changed: 25 additions & 25 deletions
@@ -8,7 +8,7 @@

- [Postgres is eating the database world](https://medium.com/@fengruohang/postgres-is-eating-the-database-world-157c204dcfc4)
- Packages
  - [{]{style="color: #990000"}[RPostgres](https://rpostgres.r-dbi.org/){style="color: #990000"}[}]{style="color: #990000"} - DBI-compliant interface to the postgres database
  - [{]{style="color: goldenrod"}[psycopg](https://www.psycopg.org/psycopg3/docs/index.html){style="color: goldenrod"}[}]{style="color: goldenrod"} - PostgreSQL database adapter
- Resources
  - [Docs](https://docs.jade.fyi/postgres/postgres.html) - All on one page so you can just [ctrl + f]{.arg-text}
  - [Exploring Enterprise Databases with R: A Tidyverse Approach](https://smithjd.github.io/sql-pet/)
@@ -39,38 +39,38 @@

- [Apache AGE]{.underline}
  - [Website](https://age.apache.org/), [Docs](https://age.apache.org/age-manual/master/index.html)
  - The goal of the project is to create a single storage layer that can handle both **relational and graph model data** so that users can use standard ANSI SQL along with openCypher, the graph query language.
  - Users can read and write graph data in nodes and edges. They can also use various algorithms such as variable length and edge traversal when analyzing data.
- [pgai]{.underline}
  - [Repo](https://github.com/timescale/pgai/?ref=timescale.com), [Intro](https://www.timescale.com/blog/how-we-made-postgresql-as-fast-as-pinecone-for-vector-data/)
  - Simplifies the process of **building search and Retrieval-Augmented Generation (RAG) AI applications** with PostgreSQL.
  - Features
    - Create embeddings for your data.
    - Retrieve LLM chat completions from models like OpenAI GPT4o.
    - Reason over your data and facilitate use cases like classification, summarization, and data enrichment on your existing relational data in PostgreSQL.
- [pg_analytics]{.underline}
  - [Intro](https://blog.paradedb.com/pages/introducing_analytics), [Repo](https://github.com/paradedb/paradedb/tree/dev/pg_analytics)
  - **Arrow and DataFusion** integrated with Postgres
  - **Delta Lake tables** behave like regular Postgres tables but use a column-oriented layout via Apache Arrow and utilize Apache DataFusion, a query engine optimized for column-oriented data
    - Data is persisted to disk with Parquet
    - The delta-rs library is a Rust-based implementation of Delta Lake. This library adds ACID transactions, updates and deletes, and file compaction to Parquet storage. It also supports querying over data lakes like S3, which introduces the future possibility of connecting Postgres tables to cloud data lakes.
- [pg_bm25]{.underline}
  - [Intro](https://blog.paradedb.com/pages/introducing_bm25), [Repo](https://github.com/paradedb/paradedb/tree/dev/pg_bm25#overview)
  - Rust-based extension that significantly improves Postgres' **full text search** capabilities
  - Built to be an Elasticsearch inside of a postgres db
  - Performant on large tables; adds support for operations like fuzzy search, relevance tuning, BM25 relevance scoring (same algorithm as Elasticsearch), and real-time search (new data is immediately searchable without manual reindexing)
  - Query times over 1M rows are 20x faster compared to tsquery and ts_rank (built-in search and sort)
  - Can be combined with PGVector for semantic fuzzy search
- [Citus]{.underline}
  - [Website](https://www.citusdata.com/)
  - Distributed Postgres
  - Transforms a standalone cluster into a horizontally partitioned **distributed database cluster**.
  - Scales Postgres by distributing data & queries. You can start with a single Citus node, then add nodes & rebalance shards when you need to grow.
  - Can combine with PostGIS for a distributed geospatial database, PGVector for a distributed vector database, pg_bm25 for a distributed full-text search database, etc.
  - [yugabytedb](https://www.yugabyte.com/) is also an option for distributed postgres
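The basic Citus sharding step is a single documented function call; a sketch using a hypothetical `events` table distributed by `user_id`:

``` sql
CREATE EXTENSION citus;

CREATE TABLE events (
    user_id    bigint,
    payload    jsonb,
    created_at timestamptz
);

-- Shard the table across worker nodes on the distribution column
SELECT create_distributed_table('events', 'user_id');
```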
- [pg_duckdb]{.underline}
  - [Repo](https://github.com/duckdb/pg_duckdb), [Intro](https://motherduck.com/blog/pg_duckdb-postgresql-extension-for-duckdb-motherduck/)
  - **Official Postgres extension for DuckDB**
  - Developed in collaboration with Hydra and MotherDuck
  - Embeds DuckDB's columnar-vectorized analytics engine and features into Postgres
  - `SELECT` queries executed by the DuckDB engine can directly read Postgres tables
@@ -87,72 +87,72 @@

- Sync and Transformation
  - Leverages PostgreSQL's logical replication system to capture and stream data changes. It uses NATS as a message broker to decouple reading from the WAL through the replicator and worker processes, providing flexibility and scalability. Transformations and filtrations are applied before the data reaches the destination.
  - Use Cases
    - Continuously **sync production data to staging**, leveraging powerful transformation rules to maintain data privacy and security practices.
    - **Sync and transform data to separate databases** for archiving, auditing and analytics purposes.
- [PGLite]{.underline}
  - [Website](https://pglite.dev/)
  - **Embeddable** Postgres (e.g. for things like apps)
  - Run a full Postgres database locally in WASM with reactivity and live sync.
- [pg_mooncake]{.underline}
  - [Repo](https://github.com/Mooncake-Labs/pg_mooncake), [Site](https://pgmooncake.com/)
  - Adds native columnstore tables with DuckDB execution for 1000x faster analytics.
  - Columnstore tables are stored as Iceberg or Delta Lake tables (parquet files + metadata) in object storage. **Differs from pg_duckdb because these tables support transactional and batch inserts, updates, and deletes, as well as joins with regular PostgreSQL tables.**
  - Available on [Neon Postgres](https://neon.tech/home).
- [pg_parquet]{.underline}
  - [Repo](https://github.com/CrunchyData/pg_parquet/), [Intro](https://www.crunchydata.com/blog/pg_parquet-an-extension-to-connect-postgres-and-parquet)
  - Sources: Locally or S3
  - Dependencies: Apache Arrow and pgrx extension
  - Features
    - **Export** Postgres tables/queries to **Parquet** files
    - **Ingest** data from Parquet files to Postgres tables
    - **Inspect** the schema and metadata of Parquet files
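These features map onto `COPY` plus helper functions; a sketch assuming a local file path and a hypothetical `my_table` (the `parquet.schema()` helper is described in the Crunchy Data intro):

``` sql
-- Export a table (or query) to a Parquet file
COPY my_table TO '/tmp/my_table.parquet' (FORMAT 'parquet');

-- Ingest a Parquet file back into a table
COPY my_table FROM '/tmp/my_table.parquet' (FORMAT 'parquet');

-- Inspect the schema of a Parquet file
SELECT * FROM parquet.schema('/tmp/my_table.parquet');
```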
- [plprql]{.underline}
  - [Repo](https://github.com/kaspermarstal/plprql)
  - Enables you to run **PRQL** queries. PRQL has a syntax that is similar to [{dplyr}]{style="color: #990000"}
  - Built in Rust, so you have to have [pgrx]{.underline} installed. Repo has directions.
- [pgroll]{.underline}
  - [Repo](https://github.com/xataio/pgroll)
  - An open-source **schema migration tool** for Postgres, built to enable zero-downtime, reversible schema migrations using the expand/contract pattern
  - Creates virtual schemas based on PostgreSQL views on top of the physical tables. This allows you to make changes to your database without impacting the application.
- [pgrx]{.underline}
  - [Repo](https://github.com/pgcentralfoundation/pgrx)
  - Framework for developing PostgreSQL extensions in Rust
  - To **install extensions built in Rust**, you need to have this extension installed
- [pg_sparse]{.underline}
  - [Intro](https://blog.paradedb.com/pages/introducing_sparse), [Repo](https://github.com/paradedb/paradedb/tree/dev/pg_sparse#overview)
  - Enables efficient **storage and retrieval of *sparse* vectors** using HNSW
  - SPLADE outputs sparse vectors with over 30,000 entries. Sparse vectors can detect the presence of exact keywords while also capturing semantic similarity between terms.
  - Fork of pgvector with modifications
  - Compatible alongside both pg_bm25 and pgvector
- [pgstream]{.underline}
  - [Intro](https://xata.io/blog/postgres-webhooks-with-pgstream), [Site](https://xata.io/pgstream), [Repo](https://github.com/xataio/pgstream)
  - CDC (Change-Data-Capture) CLI tool that **calls webhooks whenever there is a data (or schema) change**
  - Whenever a row is inserted, updated, or deleted, or a table is created, altered, truncated or deleted, a webhook is notified of the relevant event detail
- [pg_timeseries]{.underline}
  - [Intro](https://tembo.io/blog/pg-timeseries), [Repo](https://github.com/tembo-io/pg_timeseries)
  - An **alternative to [TimescaleDB](https://github.com/timescale/timescaledb)**. That license restricts use of features such as compression, incremental materialized views, and bottomless storage, but that might be because the company ([tembo](https://tembo.io/)) that open sourced this extension has its own stack, cloud, etc.
  - Features such as [native partitioning](#0), a variety of [indexes](#0), [materialized views](#0), and [window / analytics functions](#0)
  - You can compress tables if the table data is older than a certain time period (e.g. 90 days)
- [pg_tracing]{.underline}
  - [Repo](https://github.com/DataDog/pg_tracing)
  - Generates server-side spans for **distributed tracing**
- [pgvector]{.underline}
  - [Repo](https://github.com/pgvector/pgvector)
  - Also see [Databases, Vector Databases](db-vector.qmd#sec-db-vect){style="color: green"} for alternatives and comparisons
  - Enables efficient **storage and retrieval of *dense* vectors** using HNSW
  - OpenAI's text-embedding-ada-002 model outputs dense vectors with 1536 entries
  - Exact and Approximate Nearest Neighbor search
  - L2 distance, Inner Product, and Cosine Distance
  - Supported inside AWS RDS
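A minimal nearest-neighbor sketch from the pgvector README pattern, using tiny 3-dimensional vectors (real embeddings would have e.g. 1536 dimensions):

``` sql
CREATE EXTENSION vector;

CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

-- HNSW index for approximate search; <-> is the L2 distance operator
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

SELECT id FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
```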
- [pg_vectorize]{.underline}
  - [Repo](https://github.com/tembo-io/pg_vectorize)
  - Workflows for both **vector search and RAG**
  - Integrations with OpenAI's [embeddings](https://platform.openai.com/docs/guides/embeddings) and [chat-completion](https://platform.openai.com/docs/guides/text-generation) endpoints and a self-hosted container for running [Hugging Face Sentence-Transformers](https://huggingface.co/sentence-transformers)
  - Automated creation of Postgres triggers to keep your embeddings up to date
  - High-level API: one function to initialize embeddings transformations, and another function to search
- [pgvectorscale]{.underline}
  - [Repo](https://github.com/timescale/pgvectorscale/), [Intro](https://www.timescale.com/blog/how-we-made-postgresql-as-fast-as-pinecone-for-vector-data/)
  - A complement to pgvector for high-performance, cost-efficient **vector search on large workloads**.
  - Features
    - A new index type called StreamingDiskANN, inspired by the [DiskANN](https://github.com/microsoft/DiskANN) algorithm, based on research from Microsoft.
    - Statistical Binary Quantization: developed by Timescale researchers, this compression method improves on standard Binary Quantization.

qmd/r-snippets.qmd

Lines changed: 24 additions & 0 deletions
@@ -152,6 +152,30 @@

- tidyselect functions are used to select particular sets of variables

- Using `dplyr::replace_values` ([source](https://tidyverse.org/blog/2026/02/dplyr-1-2-0/#replace_values))

  ``` r
  state <- c("NC", "NY", "CA", NA, "NY", "Unknown", NA)

  # Replace missing values with a constant
  replace_values(state, NA ~ "Unknown")
  #> [1] "NC" "NY" "CA" "Unknown" "NY" "Unknown" "Unknown"

  # Replace missing values with the corresponding value from another column
  region <- c("South", "North", "West", "East", "North", "Unknown", "West")
  replace_values(state, NA ~ region)
  #> [1] "NC" "NY" "CA" "East" "NY" "Unknown" "West"

  # Replace problematic values with a missing value
  replace_values(state, "Unknown" ~ NA)
  #> [1] "NC" "NY" "CA" NA "NY" NA NA

  # Standardize multiple issues at once
  replace_values(state, c(NA, "Unknown") ~ "<missing>")
  #> [1] "NC" "NY" "CA" "<missing>" "NY" "<missing>"
  #> [7] "<missing>"
  ```
- Find duplicate rows
  - [{]{style="color: #990000"}[janitor::get_dupes](https://sfirke.github.io/janitor/reference/get_dupes.html){style="color: #990000"}[}]{style="color: #990000"}
