Event Based Time Series #229

@nsteins

Proposing a new class for Traces, EventSeries, for handling data that is a series of timestamps denoting the occurrence of discrete events. For example, this collection of 311 requests in Chicago, where each record is a request with a timestamp for when it was opened and another for when it was closed. This is a good fit for Traces because it is another example of an unevenly-spaced time series, and it can use traces.TimeSeries for certain calculations.

An example of how the API might look:

import pandas as pd

# EventSeries is the proposed class; it does not exist in traces yet
df = pd.read_csv('311_Service_Requests.csv', nrows=10000)
creation = EventSeries(df['CREATED_DATE'].dropna())
completion = EventSeries(df['CLOSED_DATE'].dropna())

An EventSeries could tell you the number of events that occurred between two arbitrary timestamps:

>>> creation.events_between(pd.Timestamp('2018-01-01'), pd.Timestamp('2019-02-01'))
6681
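One way the counting could work is a binary search over the sorted timestamps. This is only a minimal sketch using the standard library: the free function `events_between` and the inclusive bounds on both ends are assumptions for illustration, not the proposed implementation.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


def events_between(timestamps, start, end):
    """Count events t with start <= t <= end (inclusive bounds are an assumption)."""
    ordered = sorted(timestamps)
    # bisect_left finds the first index with t >= start,
    # bisect_right finds the first index with t > end.
    return bisect_right(ordered, end) - bisect_left(ordered, start)


# Made-up timestamps, not taken from the 311 dataset
events = [
    datetime(2017, 12, 30),
    datetime(2018, 1, 1),
    datetime(2018, 6, 15),
    datetime(2019, 2, 1),
    datetime(2019, 3, 1),
]
count = events_between(events, datetime(2018, 1, 1), datetime(2019, 2, 1))
print(count)
```

If the class kept its timestamps sorted internally, the sort could be skipped and each query would cost O(log n).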

EventSeries would also have a cumulative sum function, which returns a TimeSeries of the cumulative number of events that have occurred since the first record:

>>> ts = creation.cumsum()
>>> ts.plot()

image
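The cumulative count is a step function that jumps at each distinct event time. A rough sketch of the computation follows, assuming the result is a sorted list of (timestamp, running total) pairs; the helper name `cumulative_counts` is hypothetical, and in the proposal these pairs would presumably be loaded into a traces.TimeSeries instead.

```python
from collections import Counter
from datetime import datetime


def cumulative_counts(timestamps):
    """Return sorted (timestamp, running_total) pairs, one per distinct timestamp."""
    per_time = Counter(timestamps)  # events sharing a timestamp are grouped
    total = 0
    steps = []
    for t in sorted(per_time):
        total += per_time[t]
        steps.append((t, total))
    return steps


# Made-up timestamps; two events share 2018-01-03
events = [
    datetime(2018, 1, 3),
    datetime(2018, 1, 1),
    datetime(2018, 1, 3),
    datetime(2018, 1, 7),
]
steps = cumulative_counts(events)
for t, total in steps:
    print(t.date(), total)
```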

For events that have an "open" and a "close" timestamp, EventSeries can calculate the number of active open cases:

>>> diff = EventSeries.count_active(creation, completion)
>>> diff.plot()

image
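The active count can be thought of as a running sum in which every open contributes +1 and every close contributes -1. The sketch below illustrates that idea with the standard library only. The free function `count_active` and its list-of-pairs return value are assumptions, and cases with a missing close date simply stay open.

```python
from collections import defaultdict
from datetime import datetime


def count_active(opened, closed):
    """Return sorted (timestamp, active_count) pairs from open and close times."""
    deltas = defaultdict(int)
    for t in opened:
        deltas[t] += 1  # a case becomes active
    for t in closed:
        deltas[t] -= 1  # a case stops being active
    active = 0
    steps = []
    for t in sorted(deltas):
        active += deltas[t]
        steps.append((t, active))
    return steps


# Made-up timestamps; the third case is never closed
opened = [datetime(2018, 1, 1), datetime(2018, 1, 2), datetime(2018, 1, 5)]
closed = [datetime(2018, 1, 3), datetime(2018, 1, 5)]
active = count_active(opened, closed)
for t, n in active:
    print(t.date(), n)
```

An open and a close at the same instant cancel out, so the count does not change at that timestamp.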

Finally, EventSeries can calculate the inter-event arrival times and create visualizations for analysis:

>>> after = creation.time_lag(how='after')
>>> creation.plot_time_lag(how='after')

image
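Inter-event times come from differencing consecutive sorted timestamps. The issue does not define the `how` argument, so this sketch assumes that 'after' pairs each event with the gap until the next event and 'before' pairs it with the gap since the previous one. That reading, and the list-of-pairs return value, are guesses for discussion.

```python
from datetime import datetime, timedelta


def time_lag(timestamps, how="after"):
    """Return (timestamp, gap) pairs; the meaning of `how` is an assumption."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if how == "after":
        # gap from each event to the next one; the last event has no gap
        return list(zip(ordered[:-1], gaps))
    if how == "before":
        # gap from the previous event to each one; the first event has no gap
        return list(zip(ordered[1:], gaps))
    raise ValueError("how must be 'after' or 'before'")


# Made-up timestamps
events = [
    datetime(2018, 1, 1, 0, 0),
    datetime(2018, 1, 2, 0, 0),
    datetime(2018, 1, 1, 6, 0),
]
after = time_lag(events, how="after")
before = time_lag(events, how="before")
for t, gap in after:
    print(t, gap)
```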

I am already working on implementing this, but I would appreciate feedback and suggestions on the API or features. I am particularly interested in whether this can be extended to support the use case outlined in issue #227.
