Commit a7f49d2
Format UK school postcodes (#397)
**Context**
UK postcode data is used in our reporting tools in order to measure how
effective we are at reaching deprived areas (using the IMD / Indices of
Multiple Deprivation:
https://www.gov.uk/government/collections/english-indices-of-deprivation).
This is a key part of our programme impact monitoring work.
The IMD lookup relies on a correctly formatted postcode as defined by
the NSPL (National Statistics Postcode Lookup)
[data](https://geoportal.statistics.gov.uk/datasets/9ac0331178b0435e839f62f41cc61c16):
2-4 char outward code, a space, then a 3-char inward code.
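As an illustration of that layout only, here is a minimal Python sketch. The function name `has_lookup_layout` is hypothetical and not part of this commit, and the check deliberately says nothing about whether the postcode actually exists in the PAF or NSPL:

```python
import re

# Layout described above: 2-4 char outward code, one space, 3-char inward code.
# This checks shape only; it is not postcode validation.
_LOOKUP_LAYOUT = re.compile(r"[A-Za-z0-9]{2,4} [A-Za-z0-9]{3}")


def has_lookup_layout(postcode: str) -> bool:
    """Return True if the string has the outward/space/inward layout."""
    return _LOOKUP_LAYOUT.fullmatch(postcode) is not None
```

For example, `has_lookup_layout("CB1 1NT")` is true, while `CB11NT` and `CB 11NT` both fail the check.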
**Problem**
Strict postcode validation (as in "is this an actual UK postcode that is
definitely in the Royal Mail Postcode Address File / PAF") is
impractical with a regex: it is a pain to maintain and error prone, and
it puts the onus on the user to 'fix' a postcode that may already be
valid. We're also currently discounting using an API to look up /
validate postcodes, though this may be a valid approach going forwards.
**Proposed approach**
Whilst we don't want to take a heavy-handed approach to postcode
validation, as outlined above, we do want to make sure that any postcode
that meets the criteria of being "very likely to be in the PAF and NSPL
data sets" is formatted in a way which makes lookups possible (in order
to prevent cases such as `CB11NT`, `CB 11NT`):
* Take any >=5 char UK postcode and ensure it has one space in it before
the final 3 chars / inward code.
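The rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code from this commit: the name `format_uk_postcode` is hypothetical, the ">=5 char" threshold is assumed to be counted after removing whitespace, and case is left untouched because the commit message does not mention it.

```python
def format_uk_postcode(raw: str) -> str:
    """Return the postcode with a single space before the 3-char inward code.

    Assumption: length is measured after removing all whitespace.
    Values shorter than 5 characters are returned unchanged.
    """
    # Drop every whitespace character so misplaced spaces are corrected too,
    # e.g. "CB 11NT" becomes "CB11NT" before the space is re-inserted.
    compact = "".join(raw.split())
    if len(compact) < 5:
        return raw
    # Everything except the final 3 chars is the outward code.
    return f"{compact[:-3]} {compact[-3:]}"
```

With this sketch, both `CB11NT` and `CB 11NT` become `CB1 1NT`, and an already well-formatted `CB1 1NT` is returned as-is.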
**Additional steps**
Existing school data will need to have the same transform / corrections
applied, something like:
```
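-- Strip existing spaces first so misplaced ones (e.g. 'CB 11NT') are fixed too,
-- then re-insert a single space before the 3-char inward code.
UPDATE schools
SET postal_code = CONCAT(
  SUBSTRING(REPLACE(postal_code, ' ', '') FROM 1 FOR LENGTH(REPLACE(postal_code, ' ', '')) - 3),
  ' ',
  SUBSTRING(REPLACE(postal_code, ' ', '') FROM LENGTH(REPLACE(postal_code, ' ', '')) - 2)
)
WHERE country_code = 'GB'
  AND LENGTH(REPLACE(postal_code, ' ', '')) >= 5;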
```

2 files changed (+62, -0)