aw_server/sync_api.py (new file): 105 additions, 0 deletions
@@ -0,0 +1,105 @@
"""
ActivityWatch Sync API
Allows exporting and importing bucket data between devices.
"""
import json
from datetime import datetime, timezone
Comment on lines +5 to +6

P2 import json is unused — jsonify handles all JSON serialization in this module.

Suggested change
from datetime import datetime, timezone

from flask import Blueprint, jsonify, request, current_app

from .api import ServerAPI

sync_blueprint = Blueprint("sync", __name__, url_prefix="/api")
Comment on lines +1 to +11

P0 Blueprint never registered

sync_blueprint is defined but never imported or registered in server.py. The AWFlask.__init__ method only calls self.register_blueprint(rest.blueprint); sync_blueprint is absent. All three sync endpoints are therefore unreachable at runtime, and the entire feature is a no-op until the blueprint is wired up.

Comment on lines +1 to +11

P1 security Missing host_header_check — DNS rebinding vulnerability

Every route in rest.py is protected against DNS rebinding via the host_header_check decorator (applied at the Api level). The new sync_blueprint bypasses this entirely, exposing all three sync endpoints — including the full-data export — to DNS rebinding attacks from any malicious web page. The fix is to apply host_header_check as a before_request hook on sync_blueprint or decorate each view function.
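For reference, a minimal sketch of the check such a before_request hook would run. It is written as a plain function so it runs without Flask; the allowlist, the response bodies, and the name check_host_header are illustrative assumptions, and in aw-server the existing host_header_check from rest.py should be reused rather than reimplemented.

```python
from typing import Optional, Tuple

# Hypothetical allowlist for illustration; aw-server's own host_header_check
# may accept additional hosts (for example from configuration).
ALLOWED_HOSTS = {"localhost", "127.0.0.1"}


def check_host_header(host: Optional[str]) -> Optional[Tuple[dict, int]]:
    """Return a (body, status) error tuple if the Host header is not allowed.

    Returning None from a Flask before_request hook lets the request continue;
    returning a value short-circuits the request with that response.
    """
    if host is None:
        return {"message": "host header is missing"}, 400
    # Drop an optional ":port" suffix (IPv6 literals are not handled here).
    hostname = host.split(":")[0]
    if hostname not in ALLOWED_HOSTS:
        return {"message": "host header is invalid"}, 400
    return None


# Wiring, shown as comments so this sketch runs without Flask installed:
#
#     @sync_blueprint.before_request
#     def reject_foreign_hosts():
#         return check_host_header(request.headers.get("Host"))
```

With the hook registered on sync_blueprint, a request carrying Host: evil.example.com gets a 400 before any sync view runs, while Host: localhost:5600 passes through.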


@sync_blueprint.route("/0/sync/export", methods=["GET"])
def sync_export():
    """Export all buckets and their events as a portable format."""
    api: ServerAPI = current_app.api
    buckets = api.get_buckets()
    export_data = {}

    for bucket_id in buckets:
        events = api.get_events(bucket_id, limit=None)
        export_data[bucket_id] = {
            "metadata": buckets[bucket_id],
            "events": [{
                "id": e.get("id"),
                "timestamp": e.get("timestamp").isoformat() if hasattr(e.get("timestamp"), "isoformat") else e.get("timestamp"),
                "duration": e.get("duration"),
                "data": e.get("data", {}),
            } for e in events],
        }

    return jsonify({
        "version": 1,
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "device_id": api.get_info().get("device_id", "unknown"),
        "buckets": export_data,
    })

@sync_blueprint.route("/0/sync/import", methods=["POST"])
def sync_import():
    """Import bucket data from another device.

    Accepts JSON in the same format as export.
    Uses last-write-wins conflict resolution based on event timestamps.
    """
    api: ServerAPI = current_app.api
    data = request.get_json()

    if not data or "buckets" not in data:
        return {"error": "Invalid sync data format"}, 400

    source_device = data.get("device_id", "unknown")
    imported_count = 0
    skipped_count = 0

    for bucket_id, bucket_data in data["buckets"].items():
Comment on lines +53 to +56

P1 get_bucket_metadata raises NotFound, never returns None

ServerAPI.get_bucket_metadata is decorated with @check_bucket_exists, which raises a NotFound exception (HTTP 404) when the bucket is absent. It never returns a falsy value. The if not existing: branch is dead code — imports targeting a new bucket will always surface as a 404 error before the create_bucket call is reached.
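A sketch of an existence check that does not depend on get_bucket_metadata returning a falsy value: test membership in get_buckets(), which this module already treats as a dict keyed by bucket id. FakeAPI is a hypothetical in-memory stand-in for ServerAPI, and the keyword names event_type, client and hostname are assumed to match ServerAPI.create_bucket. Catching the NotFound that get_bucket_metadata raises would be an alternative.

```python
from typing import Any, Dict


def ensure_bucket(api: Any, bucket_id: str, meta: Dict[str, Any], source_device: str) -> bool:
    """Create bucket_id from exported metadata unless it already exists.

    Returns True if the bucket was created, False if it already existed.
    """
    # get_buckets() returns a dict keyed by bucket id, so a membership test
    # checks existence without triggering the NotFound raised by
    # get_bucket_metadata for missing buckets.
    if bucket_id in api.get_buckets():
        return False
    # Keyword arguments make the event_type/client order explicit
    # (parameter names assumed to match ServerAPI.create_bucket).
    api.create_bucket(
        bucket_id,
        event_type=meta.get("type", "unknown"),
        client=meta.get("client", "sync"),
        hostname=meta.get("hostname", source_device),
    )
    return True


class FakeAPI:
    """Hypothetical in-memory stand-in for ServerAPI, for demonstration only."""

    def __init__(self) -> None:
        self.buckets: Dict[str, Dict[str, Any]] = {}

    def get_buckets(self) -> Dict[str, Dict[str, Any]]:
        return self.buckets

    def create_bucket(self, bucket_id: str, event_type: str, client: str, hostname: str) -> bool:
        self.buckets[bucket_id] = {
            "id": bucket_id,
            "type": event_type,
            "client": client,
            "hostname": hostname,
        }
        return True


api = FakeAPI()
meta = {"type": "currentwindow", "client": "aw-watcher-window", "hostname": "laptop"}
created_first = ensure_bucket(api, "aw-watcher-window_laptop", meta, "laptop")
created_again = ensure_bucket(api, "aw-watcher-window_laptop", meta, "laptop")
```

The second call returns False because the bucket now exists, so a repeated import does not try to recreate it.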

        # Create bucket if it doesn't exist
        existing = api.get_bucket_metadata(bucket_id)
        if not existing:
            meta = bucket_data.get("metadata", {})
            api.create_bucket(
                bucket_id,
                meta.get("client", "sync"),
                meta.get("type", "unknown"),
                meta.get("hostname", source_device),
            )
Comment on lines +61 to +66

P1 create_bucket's second positional parameter is event_type, not client. As written, the client value is passed as event_type and the type value as client, so every imported bucket stores them in the wrong DB columns.

Suggested change
            api.create_bucket(
                bucket_id,
                meta.get("type", "unknown"),
                meta.get("client", "sync"),
                meta.get("hostname", source_device),
            )


        # Import events
        for event_data in bucket_data.get("events", []):
            ts = event_data.get("timestamp")
            duration = event_data.get("duration", 0)
            event_payload = event_data.get("data", {})

            # Last-write-wins: check if event with same id exists

P2 The comment says "Last-write-wins" but the code does the opposite — it skips an event when a matching ID already exists, making it first-write-wins. This mismatch will mislead anyone trying to understand or extend the conflict-resolution logic.

Suggested change
            # First-write-wins: skip event if it already exists locally
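To make the difference concrete, a sketch contrasting the two policies on events keyed by id. merge_first_write_wins mirrors what the loop does; merge_last_write_wins reads "based on event timestamps" in the docstring as "keep the copy with the later timestamp", which is an interpretation rather than anything the PR defines. Both function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def _ts(event: dict) -> datetime:
    """Parse the ISO 8601 timestamp of an exported event."""
    return datetime.fromisoformat(event["timestamp"])


def merge_first_write_wins(local: Dict[int, dict], incoming: List[dict]) -> Dict[int, dict]:
    """Keep the local copy whenever an incoming event shares its id."""
    merged = dict(local)  # do not mutate the caller's mapping
    for event in incoming:
        merged.setdefault(event["id"], event)
    return merged


def merge_last_write_wins(local: Dict[int, dict], incoming: List[dict]) -> Dict[int, dict]:
    """Keep whichever copy has the later timestamp when ids collide."""
    merged = dict(local)
    for event in incoming:
        current = merged.get(event["id"])
        if current is None or _ts(event) > _ts(current):
            merged[event["id"]] = event
    return merged
```

The two only diverge when an id collides; events with new ids are added by both.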

            event_id = event_data.get("id")
            if event_id:
Comment on lines +71 to +76

P1 Deduplication fetches only 1,000 events per event, silently re-importing the rest

get_events(bucket_id, limit=1000) is called inside the per-event loop, so for a bucket with more than 1,000 events the deduplication check only covers the first 1,000. Events beyond that window will be silently re-imported on every sync, causing duplicates. This also makes the import O(N×M) in the number of events — for large buckets this becomes very slow.
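A sketch of hoisting the lookup out of the loop: fetch the bucket's existing ids once into a set, then test each incoming event against it. It keeps the PR's id-based matching; the name split_new_events and the FakeAPI stand-in are hypothetical, and limit=-1 meaning "no limit" is an assumption to verify against ServerAPI.get_events.

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_new_events(api: Any, bucket_id: str, incoming: Iterable[dict]) -> Tuple[List[dict], int]:
    """Partition incoming events into (events_to_import, skipped_count).

    Existing ids are fetched once per bucket into a set, so each lookup is
    O(1) instead of one database query per incoming event.
    """
    # Assumption: limit=-1 means "no limit"; check what your ServerAPI.get_events expects.
    existing_ids = {e.get("id") for e in api.get_events(bucket_id, limit=-1)}
    to_import: List[dict] = []
    skipped = 0
    for event in incoming:
        event_id = event.get("id")
        if event_id is not None and event_id in existing_ids:
            skipped += 1
            continue
        to_import.append(event)
        if event_id is not None:
            # Also skip ids repeated within the same payload.
            existing_ids.add(event_id)
    return to_import, skipped


class FakeAPI:
    """Hypothetical in-memory stand-in for ServerAPI, for demonstration only."""

    def __init__(self, events: Dict[str, List[dict]]) -> None:
        self.events = events

    def get_events(self, bucket_id: str, limit: int = -1) -> List[dict]:
        events = self.events.get(bucket_id, [])
        return events if limit < 0 else events[:limit]
```

Adding accepted ids to the set is a design choice; drop that line if repeated ids inside one payload should all be imported.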

                existing_events = api.get_events(bucket_id, limit=1000)
                exists = any(e.get("id") == event_id for e in existing_events)
                if exists:
                    skipped_count += 1
Comment on lines +79 to +80

P0 heartbeat() called with wrong signature — runtime crash on every import

ServerAPI.heartbeat has the signature heartbeat(bucket_id, heartbeat: Event, pulsetime: float). This call passes a plain dict (event_payload) where an Event object is required, passes duration as pulsetime (semantically unrelated — pulsetime is a merge window, not an event duration), and passes timestamp=ts which is not a parameter of heartbeat at all. Every call will raise a TypeError. The correct approach is to construct an Event object and call api.create_events(bucket_id, [event]) for bulk import, or use the existing api.import_bucket helper.
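A sketch of the create_events route: convert each exported dict into an event object and insert the whole bucket in one call. The Event dataclass here is a hypothetical stand-in for aw_core.models.Event, FakeAPI stands in for ServerAPI, and treating naive timestamps as UTC is an assumption. The exported id is left unset; whether to carry it over is a separate design decision.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


@dataclass
class Event:
    """Minimal stand-in for aw_core.models.Event (hypothetical, for illustration)."""

    timestamp: datetime
    duration: timedelta
    data: Dict[str, Any] = field(default_factory=dict)
    id: Optional[int] = None


def parse_timestamp(value: str) -> datetime:
    """Parse an exported ISO 8601 timestamp into a timezone-aware datetime."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption: naive timestamps in an export are UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def to_event(event_data: Dict[str, Any]) -> Event:
    """Convert one exported event dict into an Event (id left unset)."""
    return Event(
        timestamp=parse_timestamp(event_data["timestamp"]),
        duration=timedelta(seconds=float(event_data.get("duration") or 0)),
        data=event_data.get("data") or {},
    )


def import_events(api: Any, bucket_id: str, events_data: List[Dict[str, Any]]) -> int:
    """Insert all events for one bucket in a single create_events call."""
    events = [to_event(e) for e in events_data]
    if events:
        api.create_events(bucket_id, events)
    return len(events)


class FakeAPI:
    """Hypothetical in-memory stand-in for ServerAPI, for demonstration only."""

    def __init__(self) -> None:
        self.stored: Dict[str, List[Event]] = {}

    def create_events(self, bucket_id: str, events: List[Event]) -> None:
        self.stored.setdefault(bucket_id, []).extend(events)
```

Unlike heartbeat, this never merges adjacent events, so each exported event lands as its own row with its original duration.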

                    continue

            api.heartbeat(bucket_id, event_payload, duration, timestamp=ts)
            imported_count += 1

    return jsonify({
        "imported": imported_count,
        "skipped": skipped_count,
        "source_device": source_device,
    })

@sync_blueprint.route("/0/sync/status", methods=["GET"])
def sync_status():
    """Get sync status info."""
    api: ServerAPI = current_app.api
    buckets = api.get_buckets()
    info = api.get_info()

    return jsonify({
        "device_id": info.get("device_id", "unknown"),
        "hostname": info.get("hostname"),
        "version": info.get("version"),
        "bucket_count": len(buckets),
        "bucket_ids": list(buckets.keys()),
    })