Issue 256 - move camsl to geo_point table#263
Conversation
lukas-phaf
left a comment
There was a problem hiding this comment.
I did a high level review, focusing mostly on the protobuf and postgres migrations. I have added some comments around these, and will try to get to the golang code later.
A general question is how we test this, as there is no ingestion and/or API support to see if inserting and querying actually works (unless I am missing something).
Ideally, we would have the ingestion and API in the same PR, so that we can add some camsl's to the test dataset, and confirm that everything works by testing that the data shows up in the API, preferable using the camsl range query.
| string processing_level = 9 [json_name = "processing_level"]; | ||
| int64 quality_code = 10 [json_name = "quality_code"]; | ||
| int64 camsl = 11; // centimeters above mean sea level | ||
| optional int32 camsl = 3; // centimeters above mean sea level |
There was a problem hiding this comment.
Are we okay with re-ordering all the indices (also the fields below)?
It makes the messages incompatible (the docs say you should never do this):
https://protobuf.dev/programming-guides/proto3/#consequences
We do tend to change all the components at once, but in the EWC setup they are on separate machines, so there might still be some issues.
I think it might actually lead to storing garbage.
Why just not leave caml at 11, and the rest untouched? You can leave the order in the file as it is now, I don't think it is used for anything.
| // '10', '1*', '*0', '*'. | ||
| // | ||
| map<string, Strings> filter = 5; | ||
| map<string, Strings> filter = 6; |
There was a problem hiding this comment.
Avoid change in field number.
| repeated string included_response_fields = 7; | ||
|
|
||
| // repeated string excluded_response_fields = 7; // TODO | ||
| // repeated string excluded_response_fields = 8; // TODO |
There was a problem hiding this comment.
Avoid change in field number.
| string camsl_range = 4; | ||
| map<string, Strings> filter = 5; |
There was a problem hiding this comment.
Avoid change in field number.
| @@ -1 +1,9 @@ | |||
| ALTER TABLE observation ADD COLUMN camsl BIGINT; | |||
| ALTER TABLE geo_point ADD COLUMN camsl INTEGER; | |||
There was a problem hiding this comment.
If I start on master branch, load data, switch to your branch, and run the migrations (just up), nothing happens. This is because you reused an existing migration file, which according to the DB table schema_migrations was already applied.
Now we can just assume that production systems never ran the original version of the migration... but it is much safer to make a new one where you drop the column in observations, and add a new on to geo_point.
The only potential issue is then that any camsl data that was already there is ignore... but I don't thing the ingest was changed to put anything in, so this is fine.
| ALTER TABLE geo_point ADD COLUMN camsl INTEGER; | ||
|
|
||
| -- drop UNIQUE constraint of 'point' column | ||
| -- WARNING: we assume that the constraint name is the correct one (it was never explicitly set) |
There was a problem hiding this comment.
The original table definition was this:
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
point GEOGRAPHY(Point, 4326) NOT NULL UNIQUE
);
CREATE INDEX geo_point_idx ON geo_point USING GIST(point);
This gave the following constraint and indieces:


After forcing the migration to your branch, I have the following constraint and indices:


This corresponds to the indices that you would get with the following table definition (without any migrations):
CREATE TABLE geo_point(
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
point GEOGRAPHY(Point, 4326),
camsl INTEGER,
CONSTRAINT geo_point_point_camsl_key UNIQUE NULLS NOT DISTINCT (point, camsl)
);
CREATE INDEX geo_point_idx ON geo_point USING GIST(point);
So I guess the question is, do we want to have a GIST index that includes camsl? This depends on the query, which I haven't looked at.
There was a problem hiding this comment.
The upserting 2 points example should also change, correct?
| lat float64 | ||
| lon float64 | ||
| lat float64 | ||
| camsl *int32 |
There was a problem hiding this comment.
Why a pointer? So we can use nil for not set/NULL?
lukas-phaf
left a comment
There was a problem hiding this comment.
Note to also fix just load. Gives concurrency errors...
| SELECT c.id, point, camsl FROM input_rows | ||
| JOIN geo_point c USING (point, camsl) |
There was a problem hiding this comment.
| SELECT c.id, point, camsl FROM input_rows | |
| JOIN geo_point c USING (point, camsl) | |
| SELECT c.id, c.point, c.camsl FROM input_rows i | |
| JOIN geo_point c ON i.camsl IS NOT DISTINCT FROM c.camsl AND i.point=c.point; -- Workaround for JOIN on NULL's |
There was a problem hiding this comment.
Looks like this fixes the "concurrency" issue on the point table during data load, and the integration test.
Fixes Issue 256.