The file Dockerfile.pg-hll builds a postgresql 13 container with HyperLogLog extension.
docker build -f Dockerfile.pg-hll -t pg-hll .
Then run the container like this :
docker run -e POSTGRES_PASSWORD=password -p 5432:5432 pg-hll
Migrations are managed with yoyo
pip install yoyo-migration psycopg2
yoyo apply --database postgresql://user:password@host:port/database migrations
Columns definition :
id: uniquely identify an entry in this tablename: the usual acronym of the nodecontact: an email contact
Describe all the tokens :
id: uniquely identify a tokennode_id: refernce of thenode(id)columnvalue: the value of the tokenvalid_from: timestamp when the token starts to be validvalid_until: when the token expirescreated_at: timestamp when the token has been created
This table keeps track of all the payloads received. In order to prevent the server to ingest twice the same statistics
node_id: reference of thenode(id)columnhash: mmh3 hash of the received payloadversion: the version of the aggregation schemagenerated_at: the date when the aggregation has been computedfirst_stat_at: the first date for the statistic (not the month, but the real event)last_stat_at: the last date for the statistic (not the month, but the real event)created_at: timestamp when the payload has been received
Uniqueness is defined on node_id + hash
Columns definition :
node_id: reference of thenode(id)columndate: the date of the statistics. It's the first day of a month, as the stats are aggregated monthly.network: extended seismic network code (eg Z32015)station,location,channel: usual seismological stream identificationcountry: iso code for the countrybytes: quantity of data deliverednb_reqs: number of total requestsnb_successful_reqs: number of successful requests (http rc 200)nb_failedrequests: number of failed requests TODO: clarify this definitionclients: unique IPs as a HyperLogLog hashcreated_at: timestamp when the statistic has been submittedupdated_at: timestamp when the statistic has been updated
Uniqueness is defined on node_id + date + network + station + location + channel + country
This view is used as a helper to consult statistics coverage for each node.
CREATE OR REPLACE VIEW coverage AS
SELECT a.name AS node, unnest(b.stats) AS stat_day FROM
(SELECT nodes.id,nodes.name FROM nodes) a
LEFT JOIN
(SELECT node_id, array_agg(splitup) as stats FROM
(select node_id, unnest(coverage) splitup FROM payloads) splitup
group by 1) b
ON a.id=b.node_id;For instance, in order to get the monthly coverage for a given node :
select node, a.month, round(count(*)/(SELECT extract(days FROM a.month + interval '1 month - 1 day'))*100) as percent from
(select node, date_trunc('month',stat_day) as month, stat_day from coverage where node='RESIF' and
date_part('year', stat_day)=2021) as a group by 1,2 order by 1;