Raccoon needs to add ingestion time to every event #11

@chakravarthyvp

Problem

We (GoJek) currently use Raccoon to source clickstream events from the Gojek app. The concrete product proto contains an event_timestamp field that downstream systems such as the DWH can use to partition the data. However, we see some data land in partitions for future dates, while other data with the same event_timestamp date arrives on different days. Two scenarios cause this issue:

  1. The user resets the clock in the mobile app to a future date
  2. The app was inactive, and the mobile SDK sent those events at a later point in time

Is there any workaround?
The DWH can partition on a field that records the ingestion time into the warehouse. However, this requires backfilling and repartitioning existing data, and upstream applications may need to change the way they query.

What is the impact?
Queries from upstream applications and services return erroneous results.

Which version was this found?
NA

Solution
Raccoon needs to provide an ingestion time for each event, defined as the time the event was ingested into Raccoon. This lets the DWH partition data on ingestion time as an alternative to event_timestamp.


Labels: enhancement
