Clarifying flat vs. structured data responses

Hi @mmcclenn & @jpjenk, I just want to clarify the discussion we had about flat data structures in the API response.

Right now, regardless of data format (`json`, `xml`, `csv`), we are returning data as a flat table.

I understand the motivation for doing this for `csv` formats, but the JSON and XML formats are designed to return structured data, so I'm not clear why we wouldn't use this in that case.

For example, the [bibJSON schema](http://okfnlabs.org/bibjson/) for publications is designed to support variable-length author lists, or sets of publications with differing reference structures.

Given the extent of repetition and the potentially large size of some of our responses, it might make sense to consider structured data formats for some of the responses, particularly since we're making our users define the response type they're expecting.

For example, a `publication` response in `JSON` would use the bibJSON standard, while in CSV it would be a wide table that could be saved as `csv`.
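As a rough sketch of what that split could look like (not how the API currently works): the same publication records are serialized as structured JSON or flattened into a wide CSV table depending on the requested format. The record layout only loosely follows bibJSON, and `render`, `toWideCsv`, `csvField`, the `author1..authorN` column names, and the placeholder values are all made up for illustration.

``` javascript
// Placeholder publication records, loosely following the bibJSON layout
// (made-up values, not real references).
const publications = [
  { id: "pub1", title: "First example title", year: 2001,
    author: [{ name: "Author A" }, { name: "Author B" }] },
  { id: "pub2", title: "Second example title", year: 2010,
    author: [{ name: "Author C" }] }
];

// Quote a CSV field when it contains a comma, double quote, or newline.
function csvField(value) {
  const s = String(value ?? "");
  return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Flatten to a wide table: one row per publication, one column per author slot.
// Shorter author lists are padded with empty cells.
function toWideCsv(pubs) {
  const maxAuthors = Math.max(0, ...pubs.map(p => p.author.length));
  const header = ["id", "title", "year"];
  for (let i = 1; i <= maxAuthors; i++) header.push("author" + i);
  const rows = pubs.map(p => {
    const names = p.author.map(a => a.name);
    while (names.length < maxAuthors) names.push("");
    return [p.id, p.title, p.year, ...names].map(csvField).join(",");
  });
  return [header.join(","), ...rows].join("\n");
}

// Hypothetical dispatcher: structured output for JSON, flat output for CSV.
function render(pubs, format) {
  return format === "csv" ? toWideCsv(pubs) : JSON.stringify({ records: pubs });
}

console.log(render(publications, "csv"));
// id,title,year,author1,author2
// pub1,First example title,2001,Author A,Author B
// pub2,Second example title,2010,Author C,
```

The JSON branch keeps the author list as a variable-length array, while the CSV branch has to pad every row out to the longest author list, which is the kind of repetition and padding the structured formats avoid.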

My thinking is two-fold:
1. I want to avoid repetition in the response as much as possible.  Even structuring the API response for occurrences:

``` javascript
{
"elapsed_time":14.8,
"warnings":[
"Neotoma: Request failed",
"Neotoma:  WKT not properly formatted: Polygon((-180 -90,10 -90,10 180,-180 180,-180 -90))"
],
"records": [
{"Database":"PaleoBioDB","OccurrenceID":"pbdb:occ:94749","RecordType":"Occurrence","TaxonName":"Busycon","TaxonID":"pbdb:txn:10874","AgeOlder":2.588,"AgeYounger":0.0117,"AgeUnit":"Ma","SiteID":"pbdb:col:7108"},
. . . 
{"Database":"PaleoBioDB","OccurrenceID":"pbdb:occ:94752","RecordType":"Occurrence","TaxonName":"Busycotypus canaliculatus","TaxonID":"pbdb:txn:94432","AgeOlder":2.588,"AgeYounger":0.0117,"AgeUnit":"Ma","SiteID":"pbdb:col:7108"}]}
```

versus:

``` javascript
{
"elapsed_time":14.8,
"records": [
{"Database":"PaleoBioDB","occurrences":[{"OccurrenceID":"pbdb:occ:94749","RecordType":"Occurrence","TaxonName":"Busycon","TaxonID":"pbdb:txn:10874","AgeOlder":2.588,"AgeYounger":0.0117,"AgeUnit":"Ma","SiteID":"pbdb:col:7108"},
. . . 
{"OccurrenceID":"pbdb:occ:94752","RecordType":"Occurrence","TaxonName":"Busycotypus canaliculatus","TaxonID":"pbdb:txn:94432","AgeOlder":2.588,"AgeYounger":0.0117,"AgeUnit":"Ma","SiteID":"pbdb:col:7108"}]}]}

```

saves us an astounding 24 bytes per row :)  Which isn't that much, I suppose, but then we could add a bit more structure, returning a taxon table for multi-taxon responses that would link the taxon IDs to the names, so we wouldn't need to repeat those either.  I think we'd see performance improvements in the downstream applications that use the API, particularly web-based services that use JSON natively.
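Here is a minimal sketch of that restructuring, assuming we group rows by `Database` and pull taxon names out into a lookup table keyed by `TaxonID`. The `structureRecords` helper and the `taxa` key are hypothetical names, not part of the current API; the two rows are the ones from the examples above.

``` javascript
// Hypothetical helper: group flat occurrence rows by Database and move
// taxon names into a shared lookup table keyed by TaxonID.
function structureRecords(flatRecords) {
  const taxa = {};
  const byDatabase = new Map();
  for (const rec of flatRecords) {
    // Drop the repeated fields from each row; keep everything else as-is.
    const { Database, TaxonName, ...rest } = rec;
    taxa[rec.TaxonID] = TaxonName;
    if (!byDatabase.has(Database)) byDatabase.set(Database, []);
    byDatabase.get(Database).push(rest);
  }
  const records = Array.from(byDatabase, ([Database, occurrences]) => ({ Database, occurrences }));
  return { taxa, records };
}

// The two rows shown in the examples above.
const flat = [
  { Database: "PaleoBioDB", OccurrenceID: "pbdb:occ:94749", RecordType: "Occurrence",
    TaxonName: "Busycon", TaxonID: "pbdb:txn:10874",
    AgeOlder: 2.588, AgeYounger: 0.0117, AgeUnit: "Ma", SiteID: "pbdb:col:7108" },
  { Database: "PaleoBioDB", OccurrenceID: "pbdb:occ:94752", RecordType: "Occurrence",
    TaxonName: "Busycotypus canaliculatus", TaxonID: "pbdb:txn:94432",
    AgeOlder: 2.588, AgeYounger: 0.0117, AgeUnit: "Ma", SiteID: "pbdb:col:7108" }
];

const structured = structureRecords(flat);

// The key/value pair that no longer has to be repeated in every row.
const perRowSaving = '"Database":"PaleoBioDB",'.length;
console.log(perRowSaving); // 24

// Compare serialized sizes of the two layouts.
console.log(JSON.stringify(flat).length, JSON.stringify(structured).length);
```

With only two rows and two distinct taxa, the extra wrapper and lookup table can easily cancel out the savings. The benefit should only show up when many rows share the same database and the same taxa.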

Tagging @spatialit as well.

