-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathuniversal-data-api.bs
More file actions
629 lines (462 loc) · 20.1 KB
/
universal-data-api.bs
File metadata and controls
629 lines (462 loc) · 20.1 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
<pre class='metadata'>
Title: Universal Data API
Shortname: uda
Level: 1
Status: LS
URL: https://open.mimiro.io/specifications/universal-data-api-specification
Editor: Graham Moore, Mimiro http://mimiro.io, graham.moore @ mimiro.no
Repository: https://github.com/mimiro-io/universal-data-api-specification, Specification Repository
Abstract: Specification for semantic graph data model and API for publishing, updating and querying datasets.
Boilerplate: copyright no
</pre>
# Document Status
Documents can either be a draft or a release. Numbering uses semantic versioning. Everything except major releases is backwards compatible.
This document is: 0.7.0 draft.
# Introduction
This specification defines a semantic graph data model, its serialisation and a simple and consistent API to expose, synchronise, update and query datasets. The specification defines how a server manages and exposes a collection of datasets. A dataset consists of entities. An entity has properties and references, and can be serialised as a JSON object. Clients can request all data in a dataset or just the entities that have changed since a given point in time. As well as consuming datasests, clients can also post updates to writeable datasets.
# Motivation
Many different APIs with different semantics and different data representations are constantly being defined and published. This requires developers to learn about and create different clients for the different endpoints when many of the core semantics remain the same. This specification looks to generalise the solution to publishing, consuming, synchronising and updating datasets.
# Background
The authors of this specification have been contributors to open standards for over twenty years. Some of the direct influences on this work are SDShare, RDF Net API, ISO Topic Maps Data Model, RDF, and OData.
# Entity graph Data Model
The Entity graph Data Model is the meta-model that is used to describe all data. It has a formal definition and a JSON serialisation. This data model is introduced to provide a common standard for representing data. A graph data model was chosen for its flexibility and wide utility in representing data of many shapes.
## The Entity Graph Data Model
The data model consists of entities. An entity is a single data object that represents some subject, some thing. An entity has identity, properties and references. Property values are primarily literals, or a list of values. References are typed references to other entities. The identity of every entity is described using a URI. All property and reference types are also described using URIs. An entity, or list of entities, can also be the value of a property.
The model is defined formally as:
```ebnf
entity := { id, deleted, recorded, properties, references }
id := xsd:uri
recorded := xsd:uint64
properties := [ key-val-pair* ]
deleted := xsd:boolean
references := [ key-ref-pair* ]
key-val-pair := xsd:uri , value
key-ref-pair := xsd:uri , xsd:uri | [ xsd:uri ]
value := child-entity | xsd:string |
xsd:int | xsd:datetime | xsd:uri | xsd:double | xsd:float | [ value* ]
child-entity := { id, properties, references } | { properties, references }
```
## JSON Serialisation
The JSON serialisation of an entity is simple and consistent. Each entity is serialised as a JSON Object, The id, deleted and recorded properties are top level keys in the JSON object, while the Properties and References are serialised as values of the `props` and `refs` keys respectively.
The following example shows an entity with no properties or references.
```json
{
"id" : "http://data.mimiro.io/people/bob"
}
```
Properties are serialised as a JSON object with multiple keys. The values of these keys are either a single value or a list of values:
```json
{
"id" : "http://data.mimiro.io/people/bob",
"props" : {
"http://data.mimiro.io/people/name" : "bob",
"http://data.mimiro.io/people/nicknames" : [ "bobby", "bobs" ]
}
}
```
By default the built in JSON data types are used for literals. Sometimes, however, it is necessary to be very specific about the data type for a literal value. In these cases the literal value can be prefixed with `xsd:xxx:`, where `xxx` is the specific xsd data type.
The following example is semantically equivalent to the previous example.
```json
{
"id" : "http://data.mimiro.io/people/bob",
"props" : {
"http://data.mimiro.io/people/name" : "bob",
"http://data.mimiro.io/people/nicknames" :
[ "xsd:string:bobby",
"xsd:string:bobs" ]
}
}
```
References are serialised as a JSON object with multiple keys. Values of these keys are either a single value or a list of values:
```json
{
"id" : "http://data.mimiro.io/people/bob",
"refs" : {
"http://data.mimiro.io/people/lives-in" : "http://data.mimiro.io/places/oslo",
"http://data.mimiro.io/people/friends" :
[ "http://data.mimiro.io/people/colin",
"http://data.mimiro.io/people/james" ]
}
}
```
A deleted entity is represented as:
```json
{
"id" : "http://data.mimiro.io/people/bob",
"deleted" : True
}
```
With deleted objects, it is also permissable to include all the properties and references at the time the entity was deleted.
## Context
The use of URIs for the identity of entities and property types makes the JSON verbose. To mitigate this, a context can be used.
A context can be provided as as the first object in an array of entities. The context defines the mappings between short prefixes and full URIs. This follows the ideas in (https://www.w3.org/TR/curie/).
A context is defined as a JSON object. The object has one key called `id`, whose value MUST be `@context`.
Another key `namespaces` MUST have a JSON object as the value. Each key pair in the namespaces object is an expansion definition.
The key '_' is the default expansion and used when no prefix is provided for an id value, a property key, or a reference key.
The followig context defines the default expansion and one expansion for the prefix `people`.
```json
{
"id" : "@context",
"namespaces" : {
"_" : "http://data.mimiro.io/properties/"
"people" : "http://data.mimiro.io/people/"
}
}
```
A context is used as part of an array of entities:
```json
[
{
"id" : "@context",
"namespaces" : {
"_" : "http://data.mimiro.io/properties/".
"people" : "http://data.mimiro.io/people/"
}
},
{
"id" : "people:bob",
"props" : {
"name" : "bob"
}
}
]
```
The expanded version of the above entity with given namespaces context is:
```
{
"id" : "http://data.mimiro.io/people/bob",
"props" : {
"http://data.mimiro.io/properties/name" : "bob"
}
}
```
# API
The API is defined in terms of a RESTful API. A server implementing the api MUST expose the following endpoints and implement the described semantics. The API uses a fixed URI structure (there are dynamic values within) to expose resources.
## Dataset List
The dataset list is accessed as follows:
```http
GET /datasets
```
The get dataset list request returns a list of dataset objects. Each dataset is represented as a single JSON object inside a JSON array. The JSON object MUST have the following property:
: name
:: the locally unique identifier of the dataset. There MUST be only one JSON object with the same name within the array of dataset objects.
A server MAY include additional properties in each JSON object.
The following example provides a normative example of a request / response interaction.
```JSON
GET /datasets
returns ==>
200 OK
[
{
"name" : "people"
},
{
"name" : "products",
}
]
```
## Dataset Information
The dataset is accessed as follows:
```http
GET /datasets/{dataset_name}
```
Each dataset can publish information about itself. Each dataset resource returns information about the dataset as a single JSON object.
The JSON object has the following properties:
: name (required)
:: dataset name
: since (optional)
:: Indicates if this dataset supports the since query parameter or always returns the complete dataset. If omitted client MUST assume false.
: lastModified (optional)
:: A date time of when the contents of the dataset last changed in UTC ISO date format.
The following example provides a normative example of a request / response interaction.
```json
GET /datasets/people
returns
{
"name" : "People",
"since" : true,
"lastModified" : "12-03-2020T00:00:00Z"
}
```
A server MAY include additional properties in each JSON object.
## Dataset Changes
The dataset changes are accessed as follows:
```http
GET /datasets/{dataset_name}/changes?since={since_token}
```
The changes endpoint provides a stream of entities that have changed in the dataset. If a since token is provided then the set of entities consists of only those entities that have changed after after the time indicated by this token. A change includes any entity that has been created, updated or deleted in the underlying datasource. The absense of a since token indicates the server MUST return all entities.
: dataset_name
:: path parameter defines the identity of the dataset.
: since
:: optional query parameter is a base64 encoded value that the server understands.
The response MAY contain the 'universal-data-api-fullsync' HTTP header. If this is set to true then the client SHOULD delete all local data and make a new request to the server with no since token. The data provided by the subsequent request should be consider the new local data.
The response body is an array of JSON Objects. The first object in the array MUST be a context, the following objects are entities serialised to JSON as described in the data model, the last entity MAY be a continuation object.
The following example is a normative request / response interaction.
```json
GET /datasets/people/changes?since=[since_token]
returns ==>
200 OK
[
{
"id" : "@context",
"namespaces" : {
"_" : "http://data.mimiro.io/people/".
}
},
{
"id" : "person-42",
"props" : {
"name" : "bob",
"title" : "mr",
"phone" : "+150050444"
}
},
{
"id" : "person-43",
"deleted" : true
}
]
```
The server is responsible for returning continuation tokens. These tokens can be stored and used by clients as values of subsequence since query parameter. A server returns a continuation token as a special json object in the response body.
A server MAY inject as many continuation tokens as it wants to in a response. Typically a server will return one token at the end of the response.
A continuation token is a JSON object that MUST have an 'id' key whose value MUST be '@continuation'. In addition, the JSON object MUST contain the key 'token' and the value is a base64 encoded string that the server can use as a since query parameter.
The following example is a normative request / response interaction.
```json
GET /datasets/people/changes
returns ==>
200 OK
[
{
"id" : "@context",
"namespaces" : {
"_" : "http://data.mimiro.io/people/".
}
},
{
"id" : "person-42",
"props" : {
"name" : "bob",
"title" : "mr",
"phone" : "+150050444"
}
},
{
"id" : "@continuation",
"token" : "sfbbfsbfjskjfsjk="
}
]
A subsequent client request would use the token as follows:
GET /datasets/people/changes?since=sfbbfsbfjskjfsjk=
```
## Dataset entities
The entities endpoint supports both read and update of the entities in a dataset. Servers are free to restrict access to datasets with security. The entities endpoint returns the latest set of entities in a dataset (this can of course be different from changes).
### GET
The dataset entities are accessed as follows:
```http
GET /datasets/{dataset_name}/entities?from={since_token}
```
The GET request has a path parameter of the dataset name and optionally a query parameter, from. The from parameter is used in cases where the server returns a continuation token as part of a response - typically to support paging.
The response from a GET operation on the entities of a dataset returns an array of entity JSON objects. The first object MUST be a context, followed by any number of entities serialised according to the rules above, and optionally finishing with a continuation object.
The following example is a normative request / response interaction.
```json
GET /datasets/people/entities
returns ==>
200 OK
[
{
"id" : "@context",
"namespaces" : {
"_" : "http://data.mimiro.io/people/".
}
},
{
"id" : "person-42",
"props" : {
"name" : "bob",
"title" : "mr",
"phone" : "+150050444"
}
}
... more entities
]
```
If the server wishes to serve pages of entities then it can use a continuation token:
```json
GET /datasets/people/entities
returns ==>
200 OK
[
{
"id" : "@context",
"namespaces" : {
"_" : "http://data.mimiro.io/people/".
}
},
{
"id" : "person-42",
"props" : {
"name" : "bob",
"title" : "mr",
"phone" : "+150050444"
}
}
... more entities,
{
"id" : "@continuation",
"token" : "xxxx-yyyyy"
}
]
```
Clients can use the token in subsequent requests to page through results.
```
GET /datasets/people/entities?from=xxxx-yyyyy
```
### Lookup
It is possible to lookup a single entity by its id using the entities endpoint. The request URL is in the form:
```
/datasets/{dataset_name}/entities?id={entity_id}
```
The entity_id is the full URI of the entity to lookup. It MUST be URL encoded.
The response is a single entity JSON object. If the entity is not found the server MUST return a 404 Not Found response.
### POST
It is also possible to POST changes to the entities endpoint. The changes can either be incremental or a full reload.
The request URL is in the form:
```
/datasets/{dataset_name}/entities
```
The following POST request is used to send incremental changes to the server. The body of the request is a JSON array. The first object in the array MUST be a context object. The following JSON objects are the entities to be updated, added or deleted.
```json
Example body:
[
{
"id" : "@context",
"namespaces" : {
"_" : "http://data.mimiro.io/properties/",
"people" : "http://data.mimiro.io/people/"
}
},
{
"id" : "people:42",
"props" : {
"name" : "bob",
"title" : "mr",
"phone" : "+150050444"
}
},
{
"id" : "people:43",
"deleted" : true
}
]
```
To send a full sync requires the use of the following headers used in a specific way. The headers are:
```
universal-data-api-full-sync-start : bool
universal-data-api-full-sync-end : bool
universal-data-api-full-sync-id : string
```
For a client to force a remote dataset to consider multiple requests as the full dataset to be reloaded it should send the above payload as one or more requests:
1. Send the body along with the HTTP header `universal-data-api-full-sync-id`. The value of this header is some generated id that is used for the duration of the full sync. The HTTP header `universal-data-api-full-sync-start` should also be included and have a value of `true`.
2. Send multiple requests containing entities and include the HTTP header `universal-data-api-full-sync-id`, whose value is the same as that sent in the first request.
3. When all entities have been sent or the client knows that this is the last request; send the http headers `universal-data-api-full-sync-end` value `true`, and `universal-data-api-full-sync-id` with the same value as used in all previous requests.
# Security
All communication MUST be over TLS via https endpoints.
It is recommended that JSON Web Tokens(JWT) are used for control of access to different datasets and permissions.
How these tokens are acquired is implementation and service specific.
# Common Patterns
## Conveying Type
By default all entities are untyped. To convey the type of an entity there is a convention that defines a reference between the entity and its type. The reference type used is `http://www.w3.org/1999/02/22-rdf-syntax-ns#type`.
```json
[
{
"id" : "@context",
"namespaces" : {
"rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"types" : "http://data.example.org/types/",
"people" : "http://data.mimiro.io/people/"
}
},
{
"id" : "people:bob",
"refs" : {
"rdf:type" : "types:Person"
}
}
]
```
# JSON-LD Binding
The JSON-LD binding assumes a generalised mapping between an entity and the bounded context of an RDF resource. A single Entity corresponds to a RDF resource, the entity properties are RDF statements where the object is a literal or a blank node, and RDF statements where the object is a resource map to entity references. There are no specific mappings for ordered lists.
To align the JSON-LD binding with the core streaming semantic of the protocol we introduce JSON-LD Streaming representation.
The JSON-LD Streaming representation is am array of JSON objects. The first object in the array is a JSON-LD context object. Then follows N objects are JSON-LD entity representations. Optionally, the final json object is JSON-LD representation of a continuation token.
Given the following Entity Model stream as JSON object:
```json
[
{
"id" : "@context",
"namespaces" : {
"_" : "http://data.mimiro.io/people/" ,
"schema" : "http://data.mimiro.io/schema/",
"places" : "http://data.mimiro.io/places/",
"rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
}
},
{
"id" : "bob",
"recorded" : 1672299810499868928,
"deleted" : false,
"props" : {
"schema:name" : "bob",
},
"refs" : {
"schema:lives-in" : "places:oslo",
"schema:friends" : [ "colin", "james" ],
"rdf:type" : "schema:Person"
}
},
{
"id" : "@continuation",
"token" : "sfbbfsbfjskjfsjk="
}
]
```
The JSON-LD stream is as follows:
```json
[
{
"@context": {
"core": "http://data.mimiro.io/core/uda/",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"people": "http://data.mimiro.io/people/",
"places": "http://data.mimiro.io/places/",
"schema": "http://data.mimiro.io/schema/"
}
},
{
"@id": "ns3:homer",
"core:recorded": 1672299810499868928,
"core:deleted": false,
"rdf:type": { "@id" : "schema:Person" },
"schema:lives-in": [ { "@id": "places:oslo"} ],
"schema:friends": [ { "@id": "people:colin"} , { "@id": "people:james"} ],
"schema:name": "bob"
},
{
"rdf:type": { "@id" : "core:continuation" },
"core:token": "AAgAAAACAAAAAAAAAAo="
}
]
```
Note that the context is a normal JSON-LD context object. The entities are each represented as a single JSON-LD object. The entity id becomes the `@id` property.
The built in properties, recorded, deleted, are mapped into the core namespace; `http://data.mimiro.io/core/uda/` as `core:recorded` and `core:deleted`.
The entity references are encoded in the form of
```json
{
"@id" : "reference URI"
}
```
The continuation token is represented as a JSON-LD object with the rdf type `core:continuation` and the token value as the `core:token` property.
## JSON-LD Content Type
The JSON-LD content type is `application/ld+json`. It should be used when the client is able to accept JSON-LD responses. It should be sent as an `Accept` header in the request.
The endpoints that support JSON-LD are:
* `/datasets/{dataset}/changes`
* `/datasets/{dataset}/entities`