Commit b7352a5

Replace inline styles with Bootstrap utility classes (#165)
1 parent f4ee574 commit b7352a5

10 files changed

Lines changed: 37 additions & 54 deletions
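
The commit does not say how the replacements were produced. As a minimal sketch only, the two most frequent substitutions in this diff could be expressed as a lookup from inline style to Bootstrap utility classes. The names `STYLE_TO_CLASSES` and `replace_inline_style` are hypothetical, and the mapping is reconstructed from the hunks in this commit, not taken from any tool in the repository.

```python
import re

# Hypothetical mapping, reconstructed from replacements visible in this diff.
STYLE_TO_CLASSES = {
    "text-align: center;": "text-center",
    "border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;":
        "border-start border-primary border-4 ps-3 py-2 bg-light",
}

# Matches a style attribute together with the whitespace that precedes it.
STYLE_ATTR = re.compile(r'\sstyle="([^"]*)"')


def replace_inline_style(tag: str) -> str:
    """Swap a known inline style on a single-line opening tag for utility classes."""
    match = STYLE_ATTR.search(tag)
    if match is None or match.group(1) not in STYLE_TO_CLASSES:
        return tag  # no style attribute, or a style we have no mapping for
    classes = STYLE_TO_CLASSES[match.group(1)]
    without_style = STYLE_ATTR.sub("", tag, count=1)
    if 'class="' in without_style:
        # Prepend to the existing class attribute instead of adding a second one.
        return without_style.replace('class="', f'class="{classes} ', 1)
    # Otherwise add a class attribute just before the closing '>' or '/>'.
    return re.sub(r'\s*(/?>)\s*$', f' class="{classes}"\\1', without_style, count=1)


if __name__ == "__main__":
    print(replace_inline_style('<figure style="text-align: center;">'))
```

Styles without an entry in the mapping, such as the flex layouts in the repartitioning post, are returned unchanged; in this commit those were converted by hand to grid classes like `row` and `col-md-7`.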

content/blog/2024-01-19-datafusion-34.0.0.md

Lines changed: 4 additions & 4 deletions
@@ -112,17 +112,17 @@ more than 2x faster on [ClickBench] compared to version `25.0.0`, as shown below
 
 [ClickBench]: https://benchmark.clickhouse.com/
 
-<figure style="text-align: center;">
-<img src="/blog/images/datafusion-34.0.0/compare-new.png" width="100%" class="img-fluid" alt="Fig 1: Adaptive Arrow schema architecture overview.">
+<figure class="text-center">
+<img src="/blog/images/datafusion-34.0.0/compare-new.png" class="img-fluid" alt="Fig 1: Adaptive Arrow schema architecture overview.">
 <figcaption>
 <b>Figure 1</b>: Performance improvement between <code>25.0.0</code> and <code>34.0.0</code> on ClickBench.
 Note that DataFusion <code>25.0.0</code>, could not run several queries due to
 unsupported SQL (Q9, Q11, Q12, Q14) or memory requirements (Q33).
 </figcaption>
 </figure>
 
-<figure style="text-align: center;">
-<img src="/blog/images/datafusion-34.0.0/compare.png" width="100%" class="img-fluid" alt="Fig 1: Adaptive Arrow schema architecture overview.">
+<figure class="text-center">
+<img src="/blog/images/datafusion-34.0.0/compare.png" class="img-fluid" alt="Fig 1: Adaptive Arrow schema architecture overview.">
 <figcaption>
 <b>Figure 2</b>: Total query runtime for DataFusion <code>34.0.0</code> and DataFusion <code>25.0.0</code>.
 </figcaption>

content/blog/2024-03-06-comet-donation.md

Lines changed: 1 addition & 2 deletions
@@ -35,10 +35,9 @@ accelerate Spark workloads. It is designed as a drop-in
 replacement for Spark's JVM based SQL execution engine and offers significant
 performance improvements for some workloads as shown below.
 
-<figure style="text-align: center;">
+<figure class="text-center">
 <img
 src="/blog/images/datafusion-comet/comet-architecture.png"
-width="100%"
 class="img-fluid"
 alt="Fig 1: Adaptive Arrow schema architecture overview."
 >

content/blog/2024-08-20-python-datafusion-40.0.0.md

Lines changed: 2 additions & 4 deletions
@@ -68,10 +68,9 @@ Modern IDEs use language servers such as
 hints, and identify usage errors. These are major tools in the python user community. With this
 release, users can fully use these tools in their workflow.
 
-<figure style="text-align: center;">
+<figure class="text-center">
 <img
 src="/blog/images/python-datafusion-40.0.0/vscode_hover_tooltip.png"
-width="100%"
 class="img-fluid"
 alt="Fig 1: Enhanced tooltips in an IDE."
 >
@@ -84,10 +83,9 @@ release, users can fully use these tools in their workflow.
 By having the type annotations, these IDEs can also identify quickly when a user has incorrectly
 used a function's arguments as shown in Figure 2.
 
-<figure style="text-align: center;">
+<figure class="text-center">
 <img
 src="/blog/images/python-datafusion-40.0.0/pylance_error_checking.png"
-width="100%"
 class="img-fluid"
 alt="Fig 2: Error checking in static analysis"
 >

content/blog/2025-03-11-ordering-analysis.md

Lines changed: 6 additions & 6 deletions
@@ -31,7 +31,7 @@ limitations under the License.
 
 ## Introduction
 In this blog post, we explain when an ordering requirement of an operator is satisfied by its input data. This analysis is essential for order-based optimizations and is often more complex than one might initially think.
-<blockquote style="border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;">
+<blockquote class="border-start border-primary border-4 ps-3 py-2 bg-light">
 <strong>Ordering Requirement</strong> for an operator describes how the input data to that operator must be sorted for the operator to compute the correct result. It is the job of the planner to make sure that these requirements are satisfied during execution (See DataFusion <a href="https://docs.rs/datafusion/latest/datafusion/physical_optimizer/enforce_sorting/struct.EnforceSorting.html" target="_blank">EnforceSorting</a> for an implementation of such a rule).
 </blockquote>
 
@@ -134,7 +134,7 @@ Let's start by creating an example table that we will refer throughout the post.
 
 <br>
 
-<blockquote style="border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;">
+<blockquote class="border-start border-primary border-4 ps-3 py-2 bg-light">
 <strong>How can a table have multiple orderings?</strong> At first glance it may seem counterintuitive for a table to have more than one valid ordering. However, during query execution such scenarios can arise.
 
 For example consider the following query:
@@ -197,7 +197,7 @@ To solve the shortcomings above DataFusion needs to track of following propertie
 - Equivalent Expression Groups (will be explained shortly)
 - Succinct Valid Orderings (will be explained shortly)
 
-<blockquote style="border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;">
+<blockquote class="border-start border-primary border-4 ps-3 py-2 bg-light">
 <strong>Note:</strong> These properties are implemented in the <code>EquivalenceProperties</code> structure in <code>DataFusion</code>, please see the <a href="https://github.com/apache/datafusion/blob/f47ea73b87eec4af044f9b9923baf042682615b2/datafusion/physical-expr/src/equivalence/properties/mod.rs#L134" target="_blank">source</a> for more details<br>
 </blockquote>
 
@@ -210,7 +210,7 @@ For instance in the example table:
 
 - Columns `hostname` and `currency` are constant because every row in the table has the same value (`'app.example.com'` for `hostname`, and `'USD'` for `currency`) for these columns.
 
-<blockquote style="border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;">
+<blockquote class="border-start border-primary border-4 ps-3 py-2 bg-light">
 <strong>Note:</strong> Constant expressions can arise during query execution. For example, in following query:<br>
 <code>SELECT hostname FROM logs</code><br><code>WHERE hostname='app.example.com'</code> <br>
 after filtering is done, for subsequent operators the <code>hostname</code> column will be constant.
@@ -221,7 +221,7 @@ Equivalent expression groups are expressions that always hold the same value acr
 
 In the example table, the expressions `price` and `price_cloned` form one equivalence group, and `time` and `time_cloned` form another equivalence group.
 
-<blockquote style="border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;">
+<blockquote class="border-start border-primary border-4 ps-3 py-2 bg-light">
 <strong>Note:</strong> Equivalent expression groups can arise during the query execution. For example, in the following query:<br>
 <code>SELECT time, time as time_cloned FROM logs</code> <br>
 after the projection is done, for subsequent operators <code>time</code> and <code>time_cloned</code> will form an equivalence group. As another example, in the following query:<br>
@@ -293,7 +293,7 @@ Following third and fourth constraints for the simplified table, the succinct va
 `[time_bin ASC]`,
 `[time ASC]`
 
-<blockquote style="border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;">
+<blockquote class="border-start border-primary border-4 ps-3 py-2 bg-light">
 <p><strong>How can DataFusion find orderings?</strong></p>
 DataFusion's <code>CREATE EXTERNAL TABLE</code> has a <code>WITH ORDER</code> clause (see <a href="https://datafusion.apache.org/user-guide/sql/ddl.html#create-external-table">docs</a>) to specify the known orderings of the table during table creation. For example the following query:<br>
 <pre><code>

content/blog/2025-03-30-datafusion-python-46.0.0.md

Lines changed: 1 addition & 2 deletions
@@ -177,10 +177,9 @@ for the user to click on it to expand the data.
 In the below view you can see an example of some of these features such as the
 expandable text and scroll bars.
 
-<figure style="text-align: center;">
+<figure class="text-center">
 <img
 src="/blog/images/python-datafusion-46.0.0/html_rendering.png"
-width="100%"
 class="img-fluid"
 alt="Fig 1: Example html rendering in a jupyter notebook."
 >

content/blog/2025-04-10-fastest-tpch-generator.md

Lines changed: 5 additions & 15 deletions
@@ -25,16 +25,6 @@ limitations under the License.
 
 [TOC]
 
-<style>
-/* Table borders */
-table, th, td {
-border: 1px solid black;
-border-collapse: collapse;
-}
-th, td {
-padding: 3px;
-}
-</style>
 **TLDR: TPC-H SF=100 in 1min using tpchgen-rs vs 30min+ with dbgen**.
 
 3 members of the [Apache DataFusion] community used Rust and open source
@@ -135,7 +125,7 @@ bound on the Scale Factor.
 
 **Figure 2**: Example TBL formatted output of `dbgen` for the `LINEITEM` table
 
-<table>
+<table class="table table-bordered">
 <tr>
 <td><strong>Scale Factor</strong>
 </td>
@@ -308,7 +298,7 @@ compatible port, and knew about the performance shortcomings and how to approach
 them.
 
 
-<table>
+<table class="table table-bordered">
 <tr>
 <td><strong>Scale Factor</strong>
 </td>
@@ -356,7 +346,7 @@ list of optimizations:
 At the time of writing, single threaded performance is now 2.5x-2.7x faster than the initial version, as shown in Table 3.
 
 
-<table>
+<table class="table table-bordered">
 <tr>
 <td><strong>Scale Factor</strong>
 </td>
@@ -412,7 +402,7 @@ When writing to `/dev/null` tpchgen generates the entire dataset in 25 seconds
 (4 GB/s).
 
 
-<table>
+<table class="table table-bordered">
 <tr>
 <td><strong>Scale Factor</strong>
 </td>
@@ -516,7 +506,7 @@ or CSV, tpchgen-cli creates the full SF=100 parquet format dataset in less than
 [a small 300 line PR]: https://github.com/clflushopt/tpchgen-rs/pull/61
 [Rust Parquet writer]: https://crates.io/crates/parquet
 
-<table>
+<table class="table table-bordered">
 <tr>
 <td><strong>Scale Factor</strong>
 </td>

content/blog/2025-06-30-cancellation.md

Lines changed: 1 addition & 1 deletion
@@ -359,7 +359,7 @@ To illustrate what this process looks like, let's have a look at the execution o
 If we assume a task budget of 1 unit, each time Tokio schedules the task would result in the following sequence of function calls.
 
 <figure>
-<img src="/blog/images/task-cancellation/tokio_budget.png" style="width: 100%; max-width: 100%" class="img-fluid" alt="Sequence diagram showing how the tokio task budget is used and reset."
+<img src="/blog/images/task-cancellation/tokio_budget.png" class="img-fluid" alt="Sequence diagram showing how the tokio task budget is used and reset."
 />
 <figcaption>Tokio task budget system, assuming the task budget is set to 1, for the plan above.</figcaption>
 </figure>

content/blog/2025-07-11-datafusion-47.0.0.md

Lines changed: 1 addition & 1 deletion
@@ -193,7 +193,7 @@ logging, or metrics) across thread boundaries without depending on any specific
 You can use the [JoinSetTracer] API to instrument DataFusion plans with your own tracing or logging libraries, or
 use pre-integrated community crates such as the [datafusion-tracing] crate.
 
-<div style="text-align: center;">
+<div class="text-center">
 <a href="https://github.com/datafusion-contrib/datafusion-tracing">
 <img
 src="/blog/images/datafusion-47.0.0/datafusion-telemetry.png"

content/blog/2025-07-14-user-defined-parquet-indexes.md

Lines changed: 7 additions & 7 deletions
@@ -173,24 +173,24 @@ A **distinct value index** stores the unique values of a specific column. This t
 
 For example, if the files contain a column named `Category` like this:
 
-<table style="border-collapse:collapse;">
+<table class="table table-bordered">
 <tr>
-<td style="border:1px solid #888;padding:2px 6px;"><b><code>Category</code></b></td>
+<td><b><code>Category</code></b></td>
 </tr>
 <tr>
-<td style="border:1px solid #888;padding:2px 6px;"><code>foo</code></td>
+<td><code>foo</code></td>
 </tr>
 <tr>
-<td style="border:1px solid #888;padding:2px 6px;"><code>bar</code></td>
+<td><code>bar</code></td>
 </tr>
 <tr>
-<td style="border:1px solid #888;padding:2px 6px;"><code>...</code></td>
+<td><code>...</code></td>
 </tr>
 <tr>
-<td style="border:1px solid #888;padding:2px 6px;"><code>baz</code></td>
+<td><code>baz</code></td>
 </tr>
 <tr>
-<td style="border:1px solid #888;padding:2px 6px;"><code>foo</code></td>
+<td><code>foo</code></td>
 </tr>
 </table>
 

content/blog/2025-12-15-avoid-consecutive-repartitions.md

Lines changed: 9 additions & 12 deletions
@@ -25,18 +25,17 @@ limitations under the License.
 {% endcomment %}
 -->
 
-<div style="display: flex; align-items: center; gap: 20px; margin-bottom: 20px;">
-<div style="flex: 1;">
+<div class="row align-items-center mb-3">
+<div class="col-md-7">
 
 Databases are some of the most complex yet interesting pieces of software. They are amazing pieces of abstraction: query engines optimize and execute complex plans, storage engines provide sophisticated infrastructure as the backbone of the system, while intricate file formats lay the groundwork for particular workloads. All of this is exposed by a user-friendly interface and query languages (typically a dialect of SQL).
 <br><br>
 Starting a journey learning about database internals can be daunting. With so many topics that are whole PhD degrees themselves, finding a place to start is difficult. In this blog post, I will share my early journey in the database world and a quick lesson on one of the first topics I dove into. If you are new to the space, this post will help you get your first foot into the database world, and if you are already a veteran, you may still learn something new.
 
 </div>
-<div style="flex: 0 0 40%; text-align: center;">
+<div class="col-md-5 text-center">
 <img
 src="/blog/images/avoid-consecutive-repartitions/database_system_diagram.png"
-width="100%"
 class="img-fluid"
 alt="Database System Components"
 />
@@ -122,18 +121,17 @@ Partitioning is a "divide-and-conquer" approach to executing a query. Each parti
 
 #### **Round-Robin Repartitioning**
 
-<div style="display: flex; align-items: top; gap: 20px; margin-bottom: 20px;">
-<div style="flex: 1;">
+<div class="row align-items-start mb-3">
+<div class="col-md-9">
 
 Round-robin repartitioning is the simplest partitioning strategy. Incoming data is processed in batches (chunks of rows), and these batches are distributed across partitions cyclically or sequentially, with each new batch assigned to the next available partition.
 <br><br>
 Round-robin repartitioning is useful when the data grouping isn't known or when aiming for an even distribution across partitions. Because it simply assigns batches in order without inspecting their contents, it is a low-overhead way to increase parallelism for downstream operations.
 
 </div>
-<div style="flex: 0 0 25%; text-align: center;">
+<div class="col-md-3 text-center">
 <img
 src="/blog/images/avoid-consecutive-repartitions/round_robin_repartitioning.png"
-width="100%"
 class="img-fluid"
 alt="Round-Robin Repartitioning"
 />
@@ -142,18 +140,17 @@ Round-robin repartitioning is useful when the data grouping isn't known or when
 
 #### **Hash Repartitioning**
 
-<div style="display: flex; align-items: top; gap: 20px; margin-bottom: 20px;">
-<div style="flex: 1;">
+<div class="row align-items-start mb-3">
+<div class="col-md-9">
 
 Hash repartitioning distributes data based on a hash function applied to one or more columns, called the partitioning key. Rows with the same hash value are placed in the same partition.
 <br><br>
 Hash repartitioning is useful when working with grouped data. Imagine you have a database containing information on company sales, and you are looking to find the total revenue each store produced. Hash repartitioning would make this query much more efficient. Rather than iterating over the data on a single thread and keeping a running sum for each store, it would be better to hash repartition on the store column and have multiple threads calculate individual store sales.
 
 </div>
-<div style="flex: 0 0 25%; text-align: center;">
+<div class="col-md-3 text-center">
 <img
 src="/blog/images/avoid-consecutive-repartitions/hash_repartitioning.png"
-width="100%"
 class="img-fluid"
 alt="Hash Repartitioning"
 />
