-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathGADQL_Note.txt
More file actions
299 lines (219 loc) · 10.7 KB
/
GADQL_Note.txt
File metadata and controls
299 lines (219 loc) · 10.7 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
A proposed revision to the IVOA ADQL 2.0 standard regarding
specification of geometric constraints within queries
Walter Landry, Caltech/IPAC, December 2017
Rationale
=========
ADQL 2.0 makes significant requirements on web service back ends if
they want to support geometry. In particular, there is an assumption
that the data is stored in a relational database that has extensive
geometric support. At present, full support only exists on two
platforms: Postgres with pgSphere, and maybe SQL Server with spatial
extensions? (FIXME: Check with MAST if they support full syntax)
This can be problematic for some data providers who wish to use other
data stores, even if it is a relatonal engine such as Oracle, MySQL,
Firebase, or SQLite. This is true even if they are using any of the
panopoly of GIS implementations. ADQL requires more functions than
are part of the GIS standard.
It gets even more problematic if the data providers want to use a
specialized format that is optimized for spatial searches.
The purpose of this note is to recommend a subset of geometry
capabilities that can be more widely supported. The basic idea is to
segregate the geometric and non-geometric parts of the queries. In
addition, the geometric parts are also simplified. This does not
result in any significant reduction of capability. For example, all
of the ObsCore use cases are still covered.
Modified Grammar
================
We modified the PEG grammar of ADQL (adql.peg) to create gadql.peg.
The changes are
1) Remove the STRING_GEOMETRY_FUNCTION rule from the
STRING_VALUE_FUNCTION rule. This also involves removing the
EXTRACT_COORDSYS rule.
2) Remove the GEOMETRY_VALUE_EXPRESSION rule from VALUE_EXPRESSION.
3) Remove the NUMERIC_GEOMETRY_FUNCTION rule from NUMERIC_VALUE_FUNCTION.
4) Require all COORDINATES to be either numeric literals or column names.
5) Remove the CENTROID, REGION, COORD1, and COORD2 rules. These are
not supported by GIS and not required for ObsCore support.
6) Remove the AREA and DISTANCE rules. These can be mostly replaced
with judicious use of the INTERSECTS rule.
7) Remove the CONTAINS rule. The simplest case of a point inside a
circle or polygon, can be replaced with INTERSECTS. The more
complicated cases (polygon inside polygon) are not supported by GIS
backends.
8) For the first argument to INTERSECTS, it must be either a column name
or a shape literal. In turn, the shape literals must be composed of
either numeric literals or column names
(e.g. POINT('J2000',210.8021726,54.3487903) or
CIRCLE('J2000',ra,dec,r)).
The second argument is always a column name in a table. So an
example of a table upload would be
INTERSECTS(CIRCLE('J2000',TAP_UPLOAD.ra,TAP_UPLOAD.dec,TAP_UPLOAD.r),
dbtable.s_region)
For a catalog, dbtable.s_region might be a POINT(), while for an
image it might be a POLYGON(). This means that users will not be
able to use an expression like
POINT('J2000',ra,dec)
instead of the correct column name (e.g. s_region). This should
reduce errors and confusion. Tables can support more than one
type of geometric search (e.g. an image table also allows searching
on the center of the image) by adding an appropriate column.
FIXME: Make sure this is completely implementable in GIS or spatial
backend.
9) Make two variants of the WHERE rule: WHERE and WHERE_NO_GEOMETRY.
For the rule with geometry, the geometric parts are constrained to
be cleanly segregated from the rest of the 'where' clause and
consist entirely of an INTERSECTS rule. The non-geometric parts
must be entirely enclosed within parentheses '()'.
FIXME: I think parentheses are not really required if there is no
geometry.
10) Make two variants of the QUERY rule: one with geometry and one
without. SUBQUERY's can only use the non-geometric version.
Taken together, these restrictions make it possible to fully implement
a backend using an off-the-shelf GIS backend.
Removing many of the geometric functions and separating the query into
two parts mean that the geometrical part can be handled by an
optimized spatial data store. The result can then be fed to an SQL
data store where the non-geometric clause is applied.
ObsCore Use Cases
=================
Appendix A of ObsCore lists a number of use cases. Broadly, there are
three types of queries: single position, multiple position via an
uploaded table, and all-sky. Where explicit queries are given in
ObsCore, we provide the equivalent GADQL.
Simple Examples
***************
Simple Query by Position
------------------------
This is a spatial query for a point. The ObsCore query is
SELECT * FROM ivoa.Obscore WHERE
CONTAINS(POINT(‘ICRS’,16.0,40.0),s_region)=1
In GADQL, the only change is replacing 'CONTAINS' with INTERSECTS.
SELECT * FROM ivoa.Obscore WHERE
INTERSECTS(POINT(‘ICRS’,16.0,40.0),s_region)=1
Query Images by both Spatial and Spectral Attributes
----------------------------------------------------
This is a spatial query for a coordinate range. The ObsCore query is
SELECT * FROM ivoa.Obscore
WHERE dataproduct_type=’image'
AND s_resolution < 0.3 AND s_ra > 240 AND s_ra < 255
AND s_dec > 10 AND s_dec < 11
AND (em_min > 2.1e-06 AND em_max < 2.4e-06)
OR(em_min >= 1.6e-06 AND em_max <= 1.8e-06)
OR(em_min >= 1.2e-06 AND em_max <= 1.4e-06)
Coordinate ranges can be difficult to implement efficiently for
specialized spatial backends. A common use for coordinate ranges is
to tile the sky. This can also be done with polygons. So a similar
polygon query in GADQL would be
SELECT * FROM ivoa.Obscore
WHERE INTERSECTS(POLYGON('ICRS',240,10,240,11,255,11,255,10),s_region)=1
AND (dataproduct_type='image'
AND s_resolution < 0.3
AND (em_min > 2.1e-06 AND em_max < 2.4e-06)
OR(em_min >= 1.6e-06 AND em_max <= 1.8e-06)
OR(em_min >= 1.2e-06 AND em_max <= 1.4e-06))
Note that the non-geometric parts have all been segregated into a
separate clause enclosed by parentheses '()'.
Datasets selection based on self criteria
*****************************************
A.1.1 Use case 1.1
------------------
This is a simple spatial query with additional constraints. The
ObsCore query is
SELECT * FROM ivoa.Obscore
WHERE em_min < 2.48E-10 AND em_max > 2.48 E-10
AND CONTAINS(POINT('ICRS',16.0,10.0),s_region) = 1
AND t_exptime > 10000
Transforming it into a GADQL requires only a minor rearrangement
SELECT * FROM ivoa.Obscore
WHERE INTERSECTS(POINT('ICRS',16.0,10.0),s_region) = 1
AND (em_min < 2.48E-10 AND em_max > 2.48 E-10
AND t_exptime > 10000)
Again, note that the non-geometric parts have all been segregated into
a separate clause enclosed by parentheses '()'.
A.1.2 Use case 1.2
------------------
This performs a cross match with an uploaded table. Then there is
some filtering to only return rows when there are results in both
X-ray and the 500-900 nm band. The ObsCore document says that this
may require multiple steps, and there is no explicit query given. A
first step of the query might be
SELECT * from ivoa.Obccore
WHERE CONTAINS(POINT('ICRS',TAP_UPLOAD.user_table.ra,TAP_UPLOAD.user_table.dec),s_region) = 1
AND (dataproduct_type='image' or dataproduct_type='cube' or dataproduct_type='spectra')
Transforming it to GADQL results in
SELECT * from ivoa.Obccore
WHERE INTERSECTS(POINT('ICRS',TAP_UPLOAD.user_table.ra,TAP_UPLOAD.user_table.dec),s_region) = 1
AND (dataproduct_type='image' or dataproduct_type='cube' or dataproduct_type='spectra')
A.1.3 Use case 1.3
------------------
This is very similar use case 1.1 with different non-geometric criteria.
A.1.4 Use case 1.4
------------------
This is an all-sky search. No query is given, but the non-geometric
parts should carry over directly. The only wrinkle is that the
non-geometric parts of the WHERE clause must be enclosed by parentheses '()'.
A.1.5 Use case 1.5
------------------
This is also a cross match with an uploaded table. We use the same
strategy as in use case 1.2 to implement it.
A.1.6 Use case 1.6
------------------
This has both a geometric and time constraint. The ObsCore query is
SELECT * FROM ivoa.Obscore
WHERE dataproduct_type='image'
AND CONTAINS(POINT('ICRS',user_ra,user_dec), s_region) = 1
AND ABS((t_max + t_min)/2 – user_date ) < 1
The time constraint can be handled like any other non-geometric
constraint, so the GADQL query becomes
SELECT * FROM ivoa.Obscore
WHERE INTERSECTS(POINT('ICRS',user_ra,user_dec), s_region) = 1
AND (dataproduct_type='image'
AND ABS((t_max + t_min)/2 – user_date ) < 1)
A.2 Discovering spectra data
****************************
All of the use cases in this section (2.1, 2.2, 2.3) have no geometric
restrictions. The only minor modification required is to enclose the
whole WHERE clause with parentheses '()'.
A.3 Discover multi-dimensional datasets
***************************************
The use cases 3.1, 3.2, 3.3, and 3.5 are all simple spatial searches
with additional constraints. So we use a similar strategy as in use
case 1.1.
The use case 3.4 is an all sky search, so it maps almost directly to GADQL.
The use case 3.6 is a cross match, so we a similar strategy as in use
case 1.2.
FIXME: Clarify the descriptions in section A.3, since it is not clear
exactly what query is being run.
A.4 Discovering time series
***************************
All of the use cases in this section (4.1, 4.2, 4.3) are simple
spatial searches with additional constraints. So we use a similar
strategy as in use case 1.1.
A.5 Discovering event lists
***************************
A.5.1 Use case 5.1
------------------
This is a simple spatial search with additional constraints. So we
use a similar strategy as in use case 1.1.
A.5.2 Use case 5.2
------------------
This use case requires checking if a polygon is completely enclosed by
a region in the table. This is not properly implemented by GIS
backends, giving incorrect answers near poles and the dateline.
(FIXME: postgis flattens. Need to check other GIS systems)
A.6. Discovering general data from collections counterparts
***********************************************************
The use cases in this section are mainly simple spatial searches with
additional criteria. Use case 6.1 has the additional requirement for
cutouts, but that handled by SODA, not the TAP service.
A.7. Complex Use Cases
**********************
These all require interactions between multiple tables in way that is
not entirely clear even with the existing spec.
Conclusion
==========
We present a proposed modification to the ADQL grammar that restricts
and simplifies the geometry rules. IRSA has been using this
restricted form of the ADQL grammar for several years, and users have
never asked for any functionality that would have been covered by the
full geometric rules.