
Commit ed5fd4a

Merge pull request #416 from digital-land/msj/amd-CA-dups
De-duplication of CA data
2 parents eea3fe7 + 3974282 commit ed5fd4a

1 file changed

Lines changed: 39 additions & 0 deletions

docs/data-operations-manual/Tutorials/Monitoring-Data-Quality.md

@@ -214,3 +214,42 @@ Sometimes it may take a long time for data to be transitioned to a newly created
5. The entity-organisation range should be assigned to the new organisation for any of the entity numbers which are now being used for the new organisation's records.

6. [Retire any endpoints](../../How-To-Guides/Retiring/Retire-endpoints.md) for the old organisation's provisions so they are no longer collected.

## De-duplication of conservation-area data

The purpose of this process is to ensure that duplicate data is not stored unnecessarily when conservation-area data provided by an organisation may also have been provided by Historic England (HE).

The steps required for this process are:

1. Run the add-data tasks for the conservation-area dataset (making a note of how many entities were added in the lookup file).

2. Raise the pull request (PR) and ensure that it has been merged into the main branch so that the duplicate entities are picked up by the expectation report on the following day.

3. `DO NOT` inform the organisation at this stage.

4. On Power BI navigate to the "Digital Planning" workspace then to the "Planning Data Monitoring" report from where you select the "Duplicate Conservation Area" page.(Link_[0])
232+
233+
5. Click on the reports TITLE in order for the options panel to appear to right hand side
234+
235+
6. Click on the three dots for the more options dropdown menu, from which you select "Export data" to download the output.
236+
237+
7. Open up the exported file to show the HE duplicate entites.
238+
239+
8. Filter on the message column for "complete_match" criteria
240+
241+
9. Filter on the entity_a_organisation.name column for the organisation Historic England and filter on the entity_b_organisation.name column for the organisation for which the data was added on the previous day (re:step 1)
10. Copy the entities in the `entity_a` and `entity_b` columns.

11. Prepare the data to be appended to the `old-entity.csv` file ([link][1]) in the following format, where `entity_a` is the old-entity and `entity_b` is the entity, e.g. `44012512,301,44013703,,redirect Historic England duplicate to LPA entity,2025-08-28,`

12. `Also DO NOT forget` to update the entity-organisation file ([link][2]).

13. When this change is merged, check the Power BI report to confirm the duplicate entities have been fixed.

[0]: <https://app.powerbi.com/groups/80b5c556-2a94-402f-bd6a-225e9a9b6561/list?experience=power-bi>
[1]: <config/pipeline/conservation-area/old-entity.csv at main · digital-land/config>
[2]: <config/pipeline/conservation-area/entity-organisation.csv at main · digital-land/config>
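
The filtering and formatting in steps 8 to 11 can also be scripted rather than done by hand. The following is a minimal sketch, not an established tool: it assumes the Power BI export has been saved as CSV and contains the column names used in the steps above. The function name `build_old_entity_lines`, the sample organisation names, and the non-matching `message` value are hypothetical. The output field order is copied positionally from the example row in step 11 and should be checked against the header of `old-entity.csv` before appending.

```python
import csv
import io

HE_NAME = "Historic England"
NOTE = "redirect Historic England duplicate to LPA entity"


def build_old_entity_lines(export_file, lpa_name, entry_date):
    """Turn exported duplicate rows into lines ready to append to old-entity.csv."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.DictReader(export_file):
        is_match = (
            row["message"] == "complete_match"
            and row["entity_a_organisation.name"] == HE_NAME
            and row["entity_b_organisation.name"] == lpa_name
        )
        if is_match:
            # Field order mirrors the example row in step 11 (an assumption):
            # entity_a (old-entity), 301, entity_b (entity), empty, note, date, empty
            writer.writerow(
                [row["entity_a"], "301", row["entity_b"], "", NOTE, entry_date, ""]
            )
    return out.getvalue().splitlines()


# Hypothetical export contents; a real export may contain more columns,
# and "other" is only a placeholder for any non-matching message value.
SAMPLE_EXPORT = """\
entity_a,entity_b,message,entity_a_organisation.name,entity_b_organisation.name
44012512,44013703,complete_match,Historic England,Example Borough Council
44012513,44013704,other,Historic England,Example Borough Council
44012514,44099999,complete_match,Historic England,Another Council
"""

lines = build_old_entity_lines(
    io.StringIO(SAMPLE_EXPORT), "Example Borough Council", "2025-08-28"
)
for line in lines:
    print(line)
```

With the sample data above, only the first row passes all three filters, so the script prints a single line identical to the example in step 11. For a real export, pass an open file handle (for example `open(path, newline="")`) instead of the `io.StringIO` sample.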
