---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# read.sql
<!-- <img src='man/figures/' align="right" height="131.5" /></a> -->
<!-- badges: start -->
<!-- badges: end -->
I believe SQL should live in a separate file that can easily be modified later, rather than in the string variables we are all used to. It also lets you bring a proper DB admin onto the team who can modify queries without ever learning R. This package consists of a few very simple functions that I have been using in my projects for a long time and kept rewriting again and again. The package has no dependencies beyond what is useful for talking to a DB. I recommend using it whenever you have SQL files in your project.
## Installation
It's not on CRAN yet. The development version can be installed from [GitHub](https://github.com/) with:
```r
# install.packages("devtools")
devtools::install_github("vikram-rawat/read_sql_files")
```
## Example
This is a basic example which shows you how to solve a common problem:
```{r example}
library(read.sql)
## basic example code
```
Always start by creating a config file that keeps all your basic environment variables, especially DB configurations, in one place. It looks something like this:

```yaml
default:
  datawarehouse:
    server: localhost
    uid: postgres
    pwd: postgres
    port: 5432
    database: chatbot
```

Then you can read it with the config package. Make sure you don't add a drv name in the YAML file, and avoid putting the password directly into the YAML file; use an environment variable instead. read.sql has one function to create either a connection or a pool directly from this list.
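For example, the config package evaluates `!expr` tags in the YAML, so the password can be pulled from an environment variable at read time. A minimal sketch (the variable name `DW_PASSWORD` is a hypothetical placeholder):

```yaml
default:
  datawarehouse:
    server: localhost
    uid: postgres
    pwd: !expr Sys.getenv("DW_PASSWORD")
    port: 5432
    database: chatbot
```

With this layout, `config::get("datawarehouse")` returns the list with the password already resolved, and nothing secret lives in version control.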
```{r eval=FALSE}
dw <- config::get("datawarehouse")
conn <- read.sql::rs_create_conn(
driver = RPostgres::Postgres(),
param_list = dw
)
pool <- read.sql::rs_create_conn(
driver = RPostgres::Postgres(),
param_list = dw,
pool = TRUE
)
```
These functions come in handy when you want to re-establish a connection, which is why I prefer keeping these calls in a file. After that, only three functions remain.
```{r connectdb}
conn <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(conn, "iris", iris)
```
Always write your SQL code in a separate file, where you can edit it later or hand it to somebody who can optimize it. An external file helps you write and maintain large SQL code. Now imagine you have a query like this.
### get_sql_query
```{sql connection=conn}
select
*
from
iris
limit
5
```
In this case you can use the function like this:

```r
query <- read.sql::rs_read_query(
  filepath = "path/to/sql/file",
  method = "DBI::dbGetQuery"
)
```
Now suppose you have a query like this.
```{sql connection=conn}
select
*
from
iris
where
`Sepal.Length` > 5
and
`Petal.Length` < 1.7
```
What if you want to make this query reusable with multiple parameters? You could use SQL interpolation like this.
```{sql eval=FALSE}
select
*
from
iris
where
`Sepal.Length` > ?minsepal
and
`Petal.Length` < ?minpetal
```
Then you can use it in the function like this.
```{r eval=FALSE}
sql_query_object <- read.sql::rs_interpolate(
sql_query = sql_query_object, # object created from rs_read_query function
sql_conn = conn,
query_params = list(
minsepal = 5,
minpetal = 5
)
)
```
`query_params` is an optional argument; you only need it when you want to interpolate a value into the SQL file.
### update metadata in a SQL query
Sometimes you want to dynamically update a table name or column name in a query. This functionality is not available in the `DBI::sqlInterpolate` function, so it has to be handled separately. `rs_interpolate` therefore has an optional parameter called `meta_query_params`, which simply replaces anything inside `{}` curly brackets by matching it against a named list, like this:
```{sql eval=FALSE}
select
*
from
{main_table}
where
{column1} >= ?mincol1
and
{column2} >= ?mincol2
and
{column3} in ({species_value})
and
`Petal.Width` in ({width_value})
```
To run a query like this, we need to pass both `meta_query_params` and `query_params`:
```{r eval=FALSE}
query_obj <- read.sql::rs_interpolate(
sql_query = sql_query_object, # object created from rs_read_query function
sql_conn = conn,
meta_query_params = list(
main_table = "iris",
column1 = "`Sepal.Length`",
column2 = "`Petal.Length`",
column3 = "`Species`",
species_value = c("Setosa", "versicolor"),
width_value = seq(1.0, 1.4, 0.1)
),
query_params = list(
mincol1 = 4,
mincol2 = 4
)
)
```
### get_sql_query (Inspection and Debugging)
While `rs_execute` runs the query, `rs_get_sql_query` is perhaps the most important function for development and quality assurance.
After a query file has gone through the transformation pipeline (interpolation, WHERE-clause building, and metadata replacement), it is essential to confirm that the final SQL statement is correct before execution.
`rs_get_sql_query` provides a simple way to extract the final, fully prepared SQL string without actually connecting to or hitting the database.
```r
# After reading and interpolating the query object:
final_query_object <- read.sql::rs_interpolate(
sql_query = base_query,
sql_conn = conn,
# ... all parameters ...
)
# Use rs_get_sql_query to view the final SQL statement for debugging/logging
cat("\n--- Final SQL Statement ---\n")
cat(read.sql::rs_get_sql_query(final_query_object))
# Output will be the exact string sent to the DB (e.g., SELECT * FROM users WHERE user_id = '123')
```
### execute_sql_file
The `rs_execute` function is the core execution engine. It takes the final `sql_query` object (which contains the prepared SQL and the execution method) and runs it against the database connection.
This function is highly flexible and automatically handles argument passing for the most common database packages, simplifying your execution calls.
#### Execution Method (the method argument of rs_read_query)
The method used for execution is specified when you call `rs_read_query`. `rs_execute` uses the following rules to determine which arguments to pass:
##### 1. DBI Methods (Reading or Writing):
If the method is `DBI::dbGetQuery` or `DBI::dbExecute`, `rs_execute` automatically passes the arguments as `(conn, statement, ...)`.
##### 2. ADBC Methods (Reading or Writing):
If the method is `read_adbc` or `execute_adbc`, `rs_execute` automatically passes the arguments as `(db_or_con, query, ...)`.
##### 3. Custom/Other Methods:
If the method is any other function (e.g., `dbSendStatement`, a custom S7 method, etc.), `rs_execute` requires you to pass the connection object manually via the optional `...` arguments. It will only automatically pass the SQL statement, using the argument name specified by `stmt_arg_name` (defaulting to `"statement"`).
```{r eval=FALSE}
# 1. DBI (Default Read) - Automatic Argument Handling
query_obj_read <- read.sql::rs_read_query(
filepath = "path/to/read.sql",
method = "DBI::dbGetQuery"
)
query_obj_read |>
read.sql::rs_execute(sql_conn = conn)
# Executes: DBI::dbGetQuery(conn = conn, statement = SQL)
# 2. ADBC (Write/Execute) - Automatic Argument Handling
query_obj_post <- read.sql::rs_read_query(
filepath = "path/to/update.sql",
method = "adbc::execute_adbc"
)
query_obj_post |>
read.sql::rs_execute(sql_conn = conn)
# Executes: adbc::execute_adbc(db_or_con = conn, query = SQL)
# 3. Custom Method (Requires Manual Connection Passing via ...)
query_obj_custom <- read.sql::rs_read_query(
filepath = "path/to/custom.sql",
method = "my_package::custom_exec_fn"
)
query_obj_custom |>
read.sql::rs_execute(
sql_conn = conn,
conn_arg = conn # User must supply the connection here
)
# Executes: my_package::custom_exec_fn(conn_arg = conn, statement = SQL)
```
### Overriding the Execution Method
You can switch execution methods at runtime using the `replace_exec_method` argument. This is useful, for example, when you want to reuse a query file designed for reading (`DBI::dbGetQuery`) to perform a destructive action (`DBI::dbExecute`) without modifying the original `sql_query` object.
### Example: Switching from Read to Write
Suppose `query_obj_read` was created with `method = "DBI::dbGetQuery"`:
```r
# Execute the query as a standard SELECT (default behavior)
query_obj_read |>
read.sql::rs_execute(sql_conn = conn)
# Executes: DBI::dbGetQuery(...)
# Execute the exact same query as an UPDATE/DELETE command
# The argument is overridden from "DBI::dbGetQuery" to "DBI::dbExecute"
query_obj_read |>
read.sql::rs_execute(
sql_conn = conn,
replace_exec_method = "DBI::dbExecute"
)
# Executes: DBI::dbExecute(...)
```
### migration
The `rs_migrate` function safely applies a series of SQL scripts saved in the `sql/migrate/up` or `sql/migrate/down` folders.
Crucially, this function wraps the entire sequence of script executions in a single database transaction.
If all scripts run successfully, the transaction is committed.
If any script fails, the transaction is automatically rolled back, ensuring the database is left in its original, consistent state.
```{r eval=FALSE}
# Execute all scripts in the 'up' folder within a single transaction
conn |>
rs_migrate(
up = TRUE,
default_method = "DBI::dbExecute"
)
```
The function relies on the `DBI::dbWithTransaction` generic, so the connection object you supply must support DBI transactions (e.g., standard RPostgres or RSQLite connections). This makes it easy to set a DB up again whenever you need it.
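Based on the folder names above, a project using `rs_migrate` would contain a layout along these lines (the individual script names here are hypothetical examples, not files the package ships):

```
sql/
└── migrate/
    ├── up/
    │   ├── 001_create_tables.sql
    │   └── 002_add_indexes.sql
    └── down/
        ├── 001_drop_indexes.sql
        └── 002_drop_tables.sql
```

Pairing each `up` script with a corresponding `down` script keeps every migration reversible.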
### warning
The package doesn't assume anything and does no checking at all. It is meant to be used within an existing architecture where you write all the logic on top of it, so if anything is wrong it will simply crash. This crashing behavior is intentional: failing loudly is more reliable than guessing.
Please read the documentation of each function to understand the different arguments used in them.