3 changes: 2 additions & 1 deletion config.yml
@@ -1,6 +1,7 @@
baseURL: "https://dvla.github.io/"
title: DVLA Engineering
paginate: 5
pagination:
  pagerSize: 5
theme: PaperMod

enableRobotsTXT: true
4 changes: 4 additions & 0 deletions content/open-source/index.md
@@ -54,6 +54,10 @@ Open Source library to apply Royal Mail Rules & Exceptions to PAF (Postcode Addr

This gem has pre-configured browser drivers that you can use out-of-the-box for the development of your automated test suite or application.

### [dvla-kaping](https://github.com/dvla/kaping)

Ka-Ping! An idiomatic Ruby way to construct Elasticsearch queries.

# Dynamics 365

### [dataverse-helper](https://github.com/dvla/dataverse-helper)
200 changes: 200 additions & 0 deletions content/posts/2025-04-24-kaping-gem-project/index.md
@@ -0,0 +1,200 @@
---
author: "Kevin Upstill"
title: "KA-PING! A gem to integrate with Elasticsearch and OpenSearch"
description: "An idiomatic way to build complex OpenSearch queries in Ruby."
draft: false
date: 2025-04-24
tags: ["Ruby", "ElasticSearch", "OpenSearch"]
categories: ["Ruby", "Elastic"]
ShowToc: true
TocOpen: true
---

***You can only stretch elastic so far before it goes KA-PING!***

The motivation for creating a new DVLA gem to integrate with AWS OpenSearch and Elasticsearch was the sheer amount of documentation
surrounding the technology. There is a lot of it for a reason: it's a very powerful and useful tool. The flip side is
the learning curve involved in understanding all of its features and its query structure.

Our use case was to give squads a simple way to retrieve specific driver test data for their
tests, without them having to fully understand Elasticsearch and all its capabilities.

Many of our tests have very specific data requirements, which can result in complex search queries with multiple parameters and
search terms to consider. The code for these complex queries can balloon into deeply nested JSON.

Our test platform architecture is based on Ruby, so we wanted a Ruby way to write large queries easily. This led to the
development of a query builder that uses dot notation to chain the query terms together.

We didn't need every OpenSearch capability to begin with, so we started with a subset of the search terms
we use most often. Further development work is planned to cover more of the OpenSearch functions down the line.


## The Ka-Ping Ruby Gem

The Ka-Ping Ruby gem enables users to build complex Elasticsearch Query DSL queries for searching and filtering large data sets without
having to worry about formatting the JSON payloads.

Using intuitive search terms and operations, it's easier to construct human-readable search definitions without needing a deep
understanding of the Query DSL syntax.

### What a complex query looks like in JSON notation
Probably the best way to demonstrate Kaping is to look at a traditional query first, then the Kaping way.


#### Traditional JSON query
```ruby
require 'date'

def multi
  date = Date.today.strftime('%Y-%m-%d')

  {
    query: {
      bool: {
        must: [
          { match: { 'fruit.destination': 'Market' } },
          { match_phrase: { 'fruit.code': 'LA-MOP' } },
          { match_phrase: { fruitStatus: 'Ripe' } },
          { match_phrase: { 'current.address.first_line': 'Plantation Row' } },
          { match: { 'fruit.category': 'Tropical' } },
          { range: { 'fruit.pickedDate': { gte: '2025-08-21', lte: date } } },
          { range: { 'fruit.inspection.taken': { gte: '2025-08-21', lte: date } } },
          { range: { 'fruit.importDate': { gte: '2025-08-21', lte: date } } }
        ],
        must_not: [
          { exists: { field: 'fruitEndDate' } },
          { exists: { field: 'fruit.import.USATariffs' } },
          { exists: { field: 'fruit.status.goneBad' } },
          # match (not match_phrase) is the query type that accepts an operator
          { match: { 'fruit.category': { query: 'B A NN NA AN', operator: 'or' } } },
          { wildcard: { 'fruit.address.import.code': 'NIT*' } },
          { wildcard: { 'fruit.address.import.code': 'NAT*' } },
          { wildcard: { 'fruit.address.import.code': 'NOO*' } }
        ],
        filter: { term: { 'licence.fruit': 'Valid' } }
      }
    }
  }
end
```
In the example above the nested JSON quickly becomes a handful. Kaping solves this by making the code more human-readable.


#### What the query looks like with Kaping
```ruby
require 'date'

def multi
  date = Date.today.strftime('%Y-%m-%d')

  q = DVLA::Kaping::Query.new('bool')
  q.must.match('fruit.destination', 'Market').
    match_phrase('fruit.code', 'LA-MOP').
    match_phrase('fruitStatus', 'Ripe').
    match_phrase('current.address.first_line', 'Plantation Row').
    between('fruit.pickedDate', '2025-08-21', date).
    between('fruit.inspection.taken', '2025-08-21', date).
    between('fruit.importDate', '2025-08-21', date).
    match('fruit.category', 'Tropical')
  q.must_not.
    exists('field', 'fruitEndDate').
    exists('field', 'fruit.import.USATariffs').
    exists('field', 'fruit.status.goneBad').
    match('fruit.category', query: 'B A NN NA AN', operator: 'or').
    wildcard('fruit.address.import.code', 'NIT*').
    wildcard('fruit.address.import.code', 'NAT*').
    wildcard('fruit.address.import.code', 'NOO*').
    filter('licence.fruit', 'Valid')
  q.to_json
end
```

The key point here is that you don't need to worry about the nested JSON structure: the method names are intuitive and closely resemble
the OpenSearch syntax.

Let's break that down further.

```ruby
my_query = DVLA::Kaping::Query.new('bool')
```
This is the starting point for creating a query definition: create an instance of the `DVLA::Kaping::Query`
class and assign it to a variable, e.g. `my_query`.

The argument passed to `new` sets the type of query we want. The common ones are `bool` and `match`,
depending on your search context.

If you don't require a complex query, you can write a very basic `match_phrase` query:

```ruby
my_query = DVLA::Kaping::Query.new('match_phrase', foo: 'Bar')
my_query.to_json
```
This is equivalent to writing the following query in JSON:

```json
{
  "query": {
    "match_phrase": { "foo": "Bar" }
  }
}
```

With Kaping, building and formatting the JSON is taken care of by the familiar Ruby `to_json` call.
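For comparison, here is the plain-Ruby equivalent using only the standard `json` library, without Kaping. It is not Kaping's internals, just a sketch of what `to_json` does to a hash of the right shape:

```ruby
require 'json'

# A plain Ruby hash mirroring the match_phrase query above
query = { query: { match_phrase: { foo: 'Bar' } } }

# to_json (added by requiring 'json') produces compact JSON with string keys
puts query.to_json
# prints {"query":{"match_phrase":{"foo":"Bar"}}}

# JSON.pretty_generate gives an indented version, handy for logging or debugging
puts JSON.pretty_generate(query)
```

Writing the hash by hand works for a query this small; the nesting is what becomes hard to manage as the number of clauses grows.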

In the large example above, each line is a new search term definition. There are various terms you can use depending on the
functionality you require, and you can group them into positive (`must`) or negative (`must_not`) boolean clauses.

#### Currently supported terms

- **match_phrase** - match documents that contain an exact phrase
- **match** - full-text search on a specific document field
- **exists** - search for documents that contain a specific field
- **wildcard** - match a wildcard pattern, such as `He*o`
- **term** - search for an exact term in a field
- **prefix** - search for terms that begin with a specific prefix
- **regex** - search for terms that match a regular expression, e.g. `[a-zA-Z]amlet`
- **between** - search for a range of values in a field

These are a mix of full-text and term-level queries, chosen because they are the most commonly used for our kind of searches;
others can easily be added as requirements dictate.
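To show roughly what each of these terms corresponds to in the underlying Query DSL, here is one example clause per term as a plain Ruby hash. The field names and values are illustrative, and mapping `regex` to the DSL's `regexp` query and `between` to a `range` query with `gte`/`lte` bounds are assumptions about how Kaping translates them, not taken from its source:

```ruby
require 'json'

# One illustrative Query DSL clause per term (field names and values are made up)
EXAMPLE_CLAUSES = {
  match_phrase: { match_phrase: { 'current.address.first_line' => 'Plantation Row' } },
  match: { match: { 'fruit.category' => 'Tropical' } },
  exists: { exists: { field: 'fruitEndDate' } },
  wildcard: { wildcard: { 'fruit.address.import.code' => 'NIT*' } },
  term: { term: { 'licence.fruit' => 'Valid' } },
  prefix: { prefix: { 'fruit.code' => 'LA-' } },
  # Assumption: the DSL query type for regular expressions is named "regexp"
  regex: { regexp: { 'fruit.name' => '[a-zA-Z]amlet' } },
  # Assumption: "between" becomes an inclusive range query
  between: { range: { 'fruit.pickedDate' => { gte: '2025-01-01', lte: '2025-04-24' } } }
}.freeze

# Print each term alongside the JSON clause it would produce
EXAMPLE_CLAUSES.each do |term, clause|
  puts "#{term}: #{clause.to_json}"
end
```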

The next part is the data field you want to search on:

> DVLA::Kaping::Query.new('match_phrase', **foo**: 'Bar')

The last part is the data you are looking for. Depending on the term, this can be, for example, an exact-match string, a range, a regex or a filter value:

> DVLA::Kaping::Query.new('match_phrase', foo: **'Bar'**)

## Configuration
The gem can be configured through a `kaping.yml` file. Settings such as the logging level, result size and AWS configuration
can be set in the YAML. The config file can also pick up environment settings, which is useful when running
in a CI pipeline.
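Kaping's actual config keys aren't listed here, so the following is only a generic sketch of the pattern: a YAML file rendered through ERB so that values can be overridden by environment variables, as you might in CI. The keys `log_level`, `result_size` and `aws_region`, the environment variable names and the `load_config` helper are all hypothetical, not Kaping's schema:

```ruby
require 'erb'
require 'yaml'

# Hypothetical kaping.yml-style content; the ERB tags fall back to defaults
# when the (hypothetical) environment variables are not set
CONFIG_TEMPLATE = <<~CONFIG
  log_level: <%= ENV.fetch('LOG_LEVEL', 'info') %>
  result_size: <%= ENV.fetch('RESULT_SIZE', '10') %>
  aws_region: <%= ENV.fetch('AWS_REGION', 'eu-west-2') %>
CONFIG

# Hypothetical loader: render the ERB first, then parse the YAML into a Hash
def load_config(template)
  YAML.safe_load(ERB.new(template).result)
end

config = load_config(CONFIG_TEMPLATE)
puts config['log_level']
```

Rendering ERB before parsing keeps the YAML itself static and readable while letting a pipeline inject values without editing the file.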


## The Client
Currently there is an AWS OpenSearch client, which takes care of AWS Signature Version 4 (SigV4) request signing. The AWS credentials can be supplied
either as environment variables or through a profile.
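The client handles this signing for you. Purely to illustrate what SigV4 involves, here is a minimal sketch of its key-derivation step, the chain of HMAC-SHA256 operations described in the AWS documentation, using Ruby's `openssl`. It is not Kaping's implementation and the helper names are hypothetical; a complete signer also has to build the canonical request and string to sign, which the `aws-sigv4` gem does. The service name `es` is the one used for Amazon OpenSearch Service domains:

```ruby
require 'openssl'

# Hypothetical helper: derive the SigV4 signing key from the secret access key.
# Each step HMACs the next scope component with the previous step's output.
def sigv4_signing_key(secret_access_key, date_stamp, region, service)
  k_date    = OpenSSL::HMAC.digest('SHA256', "AWS4#{secret_access_key}", date_stamp)
  k_region  = OpenSSL::HMAC.digest('SHA256', k_date, region)
  k_service = OpenSSL::HMAC.digest('SHA256', k_region, service)
  OpenSSL::HMAC.digest('SHA256', k_service, 'aws4_request')
end

# Hypothetical helper: the final signature is a hex HMAC of the string to sign
def sigv4_signature(signing_key, string_to_sign)
  OpenSSL::HMAC.hexdigest('SHA256', signing_key, string_to_sign)
end

# The secret would normally come from ENV or a profile; a placeholder is used here
secret = ENV.fetch('AWS_SECRET_ACCESS_KEY', 'example-secret')
key = sigv4_signing_key(secret, '20250424', 'eu-west-2', 'es')
puts sigv4_signature(key, 'example string to sign')
```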

## Search
As long as your client is configured, you can also use the optional built-in search facility:

```ruby
my_query = DVLA::Kaping::Query.new('match_phrase', foo: 'Bar')
my_query.to_json
response = DVLA::Kaping.search(my_query)
response.dig('hits', 'hits')
```

## Open Source

[External link to Kaping in rubygems](https://rubygems.org/gems/dvla-kaping)

We wanted to open source this Ruby gem so that a wider audience can also benefit from simplifying their OpenSearch queries. We also
welcome feedback to further enhance the gem and increase its scope.


## Further Development

As mentioned before, not all OpenSearch functionality has been implemented yet, but any requests to expand it will be taken into consideration.
The Kaping code base is small: the query builder is 40 lines of code, and separating the terms into their own module makes it easy to
add further query terms as required.
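To illustrate that design, a chainable builder whose term methods live in a separate module, here is a minimal hypothetical sketch. It is not Kaping's source: the `Terms` module, the `BoolQuery` class and their method signatures are made up, and only two terms are shown. Each term method appends a clause and returns `self`, which is what makes the dot-notation chaining work; adding a new term means adding one method to the module:

```ruby
require 'json'

# Hypothetical terms module: each method appends one clause and returns self
module Terms
  def match_phrase(field, value)
    add_clause({ match_phrase: { field => value } })
  end

  def exists(field)
    add_clause({ exists: { field: field } })
  end
end

# Hypothetical bool-query builder; must/must_not choose where new clauses go
class BoolQuery
  include Terms

  def initialize
    @clauses = Hash.new { |hash, key| hash[key] = [] }
    @current = :must
  end

  def must
    @current = :must
    self
  end

  def must_not
    @current = :must_not
    self
  end

  def to_h
    { query: { bool: @clauses.dup } }
  end

  def to_json(*args)
    to_h.to_json(*args)
  end

  private

  # Append a clause to the current boolean section and allow further chaining
  def add_clause(clause)
    @clauses[@current] << clause
    self
  end
end

q = BoolQuery.new
q.must.match_phrase('foo', 'Bar')
q.must_not.exists('fruitEndDate')
puts q.to_json
```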



2 changes: 1 addition & 1 deletion themes/PaperMod