Skip to content

Conversation

@lode-mgp
Copy link
Contributor

@lode-mgp lode-mgp commented Jun 5, 2025

This PR represents an effort to implement the functionality described in Issue 476.

  • I refactored the MinutesPlayedAggregator to make it more flexible and extensible.
  • I added an option to calculate minutes played per possession state (in possession, out of possession and dead ball).

Some notes:

  • The effective play time per game falls in the range of 55-65 which seems to be realistic.
  • This PR involves a breaking change, as the include_position flag is no longer supported and player inside MinutesPlayed is replaced by a more generic key. For example, this breaks the example in the documentation, since you can no longer access item.player but should use item.key.player. How should we handle this?

@UnravelSports
Copy link
Contributor

UnravelSports commented Jun 6, 2025

I haven't fully digested this, but shouldn't the following also be in RESULTS_CAUSING_DEAD_BALL or RESULTS_CAUSING_DEAD_BALL:

  • PassResult.OUT
  • ShotResult.OFF_TARGET
  • InterceptionResult.OUT
  • EventType.BALL_OUT (not sure why this exists next to PassResult.OUT and InterceptionResult.OUT, but that's beside the point.
  • SetPieceType in case providers mistag or miss an event prior?

I say the above with the notion that some data providers are really bad at tagging "OUT" events.

Additionally (and I think the answer is no), but is there a way to account for the fact that the Result has the time stamp of the action and not the "end timestamp" / exact moment it happened? Perhaps this works for EventType.BALL_OUT but not for PassResult.OUT since it's an outcome associated with a Pass that happened prior.

Also, it MinutesPlayedAggregator inherits from EventDatasetAggregator, but we do some of these operations on TrackingDataset right?

@lodevt
Copy link
Contributor

lodevt commented Jun 6, 2025

Thanks for your feedback @UnravelSports!

I haven't fully digested this, but shouldn't the following also be in RESULTS_CAUSING_DEAD_BALL or RESULTS_CAUSING_DEAD_BALL:

  • PassResult.OUT
  • ShotResult.OFF_TARGET
  • InterceptionResult.OUT
  • EventType.BALL_OUT (not sure why this exists next to PassResult.OUT and InterceptionResult.OUT, but that's beside the point.
  • SetPieceType in case providers mistag or miss an event prior?

The results/event types you mentioned indeed also cause a dead ball, but strictly speaking, we don't need to include them for the results to be correct because of the following:

  • the ball state ofEventType.BALL_OUT already is BallState.DEAD (so adding it it in the EVENTS_CAUSING_DEAD_BALLwouldn't make a difference)
  • events with PassResult.OUT, ShotResult.OFF_TARGET or InterceptionResult.OUT are always followed by a BALL_OUT event

So if the provider correctly tags "OUT" events, then it is not necessary to include these results in the RESULTS_CAUSING_DEAD_BALL.

I say the above with the notion that some data providers are really bad at tagging "OUT" events.

But of course, if the BALL_OUT events are often missing, then this makes it indeed necessary to include more items in the EVENTS_CAUSING_DEAD_BALL / RESULTS_CAUSING_DEAD_BALL. Do you have an example of a data provider that is very inconsistent with BALL_OUT events?

Additionally (and I think the answer is no), but is there a way to account for the fact that the Result has the time stamp of the action and not the "end timestamp" / exact moment it happened? Perhaps this works for EventType.BALL_OUT but not for PassResult.OUT since it's an outcome associated with a Pass that happened prior.

By excluding PassResult.OUT from RESULTS_CAUSING_DEAD_BALL (in the assumption this is always followed by BALL_OUT event), this problem would be solved (in the assumption each data provider correctly tags out events). However, it would still exist in other cases (for example PassResult.OFFSIDE). I think we can account for this by using the receive_timestamp(if this information is provided by the data provider). But if the receive_timestamp is missing, I also don't really see a way to account for this inaccuracy.

Also, it MinutesPlayedAggregator inherits from EventDatasetAggregator, but we do some of these operations on TrackingDataset right?

All operations can currently be done on an EventDataset (the information about whether a player was on the pitch or not is extracted from his positions which include a start and end time).

@UnravelSports
Copy link
Contributor

From my experience Opta is not great at tagging out events.

I think it makes sense if we simply include everything that should create, or happen within a Dead Ball state simply to cover all bases. (I don't really see a downside to this).

Using receival_timestamp should be fine if it exists indeed!

@lode-mgp
Copy link
Contributor Author

lode-mgp commented Jun 10, 2025

I have updated the logic to include everything that should create a "dead ball".

I also handled set pieces if the ball_state was not already "dead". Sometimes a set piece would occur without being preceded by a BallOutEvent or any other event that would indicate a dead ball. This can happen, for example, for clearances that went out of play (ClearanceEvent has no result). It can also happen for an incomplete CarryEvent, or just when a provider misses a BallOutEvent. In this case, I made the assumption that the previous event (excluding generic events) would have caused the dead ball.


I ran into some other issues while working on this:

  • The type of end_timestamp (CarryEvent and PressureEvent) and receive_timestamp (PassEvent) was correctly marked as Time, but it was actually a timedelta. I updated this in the corresponding event definitions.

  • I think there is an issue with the files/statsbomb_lineup.json test file:

    • The last (non-generic) event in the events file is a ShotEvent (with ID f942c5b5-df4b-4ee4-9e90-ed5f5), but this is incorrect since there are events with a later timestamp in that test file.
    • Additionally, the minute and second fields of the raw event seem to be inconsistent, as the event happens in the second period but is tagged with "minute": 44 (should be 89).

@lode-mgp
Copy link
Contributor Author

@UnravelSports @koenvo Any thoughts on this?

@UnravelSports
Copy link
Contributor

A couple of this:

  1. Should we add a test to assert the times are the same for both teams? Just as a sanity check.
teams = list(team_minutes_played_map.keys())
assert team_minutes_played_map[teams[0]] == team_minutes_played_map[teams[1]]
  1. I think it might be inline with the rest of kloppy's functionality to allow the breakdown key to be the string representation of the BreakdownKey too. So, "possession_state" and "position".
minutes_played_per_possession_state = dataset.aggregate(
    "minutes_played", breakdown_key="possession_state"
)

I haven't looked at the actual inner workings.

@UnravelSports UnravelSports added this to the 3.19.0 milestone Oct 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants