@UnravelSports
Contributor

It seems SecondSpectrum assigns totally wrong ball coordinates (i.e., [0.0, 0.0, -10]) to a lot of dead ball frames. They are easily spotted because ball_z is set to -10.

I've added a parameter to the SecondSpectrum deserializer that allows us to optionally skip all these frames, reducing overhead. This seems redundant when only_alive=True, but it's not when only_alive=False, because not all dead ball frames have this value. My guess is that it happens when the ball is actually out of the camera's view.
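A minimal sketch of the skipping idea described above, not the code in this PR. It assumes each raw frame is one JSON line with a `ball.xyz` list; that layout, `BALL_Z_SENTINEL`, `has_sentinel_ball`, `iter_frames`, and the `skip_sentinel` flag are all illustrative names rather than kloppy's API.

```python
import json
from typing import Iterable, Iterator

# Assumption: the sentinel is matched exactly on the z-coordinate.
BALL_Z_SENTINEL = -10.0


def has_sentinel_ball(raw_frame: dict) -> bool:
    """Return True when the ball z-coordinate equals the sentinel value."""
    xyz = raw_frame.get("ball", {}).get("xyz")
    return xyz is not None and len(xyz) == 3 and xyz[2] == BALL_Z_SENTINEL


def iter_frames(lines: Iterable[str], skip_sentinel: bool = False) -> Iterator[dict]:
    """Yield decoded frames, dropping sentinel frames before any further parsing."""
    for line in lines:
        raw_frame = json.loads(line)
        if skip_sentinel and has_sentinel_ball(raw_frame):
            continue  # skipped early, so no player or ball objects are built for it
        yield raw_frame


# Hypothetical raw input (field names are assumptions, values are made up).
raw_lines = [
    '{"live": true, "ball": {"xyz": [1.5, -3.0, 0.2]}}',
    '{"live": false, "ball": {"xyz": [0.0, 0.0, -10]}}',
    '{"live": false, "ball": {"xyz": [52.1, 10.0, 0.0]}}',
]
kept = list(iter_frames(raw_lines, skip_sentinel=True))
```

The check runs right after decoding the line, which is where the overhead saving would come from: the rest of the frame is never turned into objects.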

Note: this builds on PR#522

@probberechts
Contributor

probberechts commented Dec 22, 2025

I haven’t looked at the data myself yet, but I don’t fully understand why you would want to remove these frames. Are these simply frames where the ball’s location wasn’t tracked correctly? Hence, I guess this could also happen if the ball is in play but briefly occluded by a player, for example. In that case, I would rather set the ball_coordinates to None than discard the entire frame. That would also be more consistent with how it's handled by other data providers.

@UnravelSports
Contributor Author

Apologies for not being clearer in my message. From the data I have, I can see that ball_z = -10 only occurs while ball_state="dead". It's also worth noting that about 45% of all frames have this value, so it's significant overhead to parse them all and then filter them out right after, because they are useless frames in most cases. Setting only_alive=True is not good enough for this.

| missing | ball_state | len |
|---|---|---|
| true | "dead" | ~68000 |
| false | "alive" | ~83000 |
| false | "dead" | ~3000 |
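A breakdown like the one above can be produced with a simple tally. The sketch below uses a handful of made-up frames rather than the real match data, and the `ball_state` / `ball_z` keys are assumed names for illustration.

```python
from collections import Counter

BALL_Z_SENTINEL = -10.0

# Toy frames for illustration only; these are not the counts from the match.
frames = [
    {"ball_state": "alive", "ball_z": 0.3},
    {"ball_state": "dead", "ball_z": -10.0},
    {"ball_state": "dead", "ball_z": -10.0},
    {"ball_state": "dead", "ball_z": 0.0},
]

# Key each frame by (is the ball missing?, ball state) and count occurrences.
counts = Counter(
    (frame["ball_z"] == BALL_Z_SENTINEL, frame["ball_state"]) for frame in frames
)

for (missing, ball_state), n in sorted(counts.items()):
    print(missing, ball_state, n)
```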

Could we do both? Set the ball coordinates to None if ball_z = -10 and allow the user to remove those frames if they want?

Note: I currently have only 1 game.

@probberechts
Contributor

So I understand that there are two separate issues here.

  1. SecondSpectrum uses z = -10 as a sentinel value to indicate missing ball coordinates.
    Instead of parsing this as a Point3D(x, y, -10), we should interpret it in kloppy as a None value.

  2. You want to introduce a parameter that automatically filters out frames where the ball coordinates are missing.
    I have three concerns about this:

    • If we add such a parameter, it should apply to all data providers, not just SecondSpectrum.
    • I would definitely not make this the default behavior.
    • I’m still not convinced about the usefulness of this parameter or what the concrete use case for it would be. In practice, I’ve never wanted to filter out frames with missing ball coordinates as a first step. Instead, I would typically try to interpolate the missing positions or (depending on the analysis) discard entire sequences that contain missing data. I don't really see why you would want a dataset with unpredictable gaps in it.
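Point 1 could be sketched roughly as follows. This is an illustration under assumptions, not kloppy's implementation: `BallPosition` is a hypothetical stand-in for a 3D point type, and `parse_ball` and `BALL_Z_SENTINEL` are invented names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumption: z == -10 is the only marker for an untracked ball.
BALL_Z_SENTINEL = -10.0


@dataclass(frozen=True)
class BallPosition:
    """Hypothetical stand-in for a 3D point type."""

    x: float
    y: float
    z: float


def parse_ball(xyz: Sequence[float]) -> Optional[BallPosition]:
    """Map the sentinel to None instead of a point with z = -10."""
    x, y, z = xyz
    if z == BALL_Z_SENTINEL:
        return None
    return BallPosition(x, y, z)


raw_ball_values = [[1.5, -3.0, 0.2], [0.0, 0.0, -10], [52.1, 10.0, 0.0]]
ball_positions = [parse_ball(xyz) for xyz in raw_ball_values]
```

Every frame is still present after this step, so the sequence has no gaps; only the ball value changes, which leaves interpolation or sequence-level filtering to the user.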

@UnravelSports
Contributor Author

UnravelSports commented Dec 23, 2025

  1. Yes

2a. I don't think it has to apply to all providers. SkillCorner has include_empty_frames as a parameter, and I see this as analogous to that, but the wording can't be the same here because the behavior is slightly different.
2b. I agree, the default is set to True (so we include them).
2c. It's a provider-specific problem where dead ball frames specifically have bad data for the ball. only_alive doesn't cut it because it removes too much, and simply loading everything could be okay too, but it seems like a waste. Finally, there is no way to actually interpolate these missing frames because they make up 45% of the data set. And since it only happens in dead ball situations, I don't see the point of interpolation.

What we could do (but this might make it more complicated for the user) is have include_missing_ball_frames=False only apply to dead ball frames. We might have to rename the parameter then.
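The dead-ball-only variant could look something like the sketch below. The `Frame` tuple, `keep_frame`, and the `drop_missing_dead_ball` flag are hypothetical names chosen for illustration, and the frames are made up.

```python
from typing import NamedTuple, Optional, Tuple


class Frame(NamedTuple):
    """Hypothetical, heavily simplified frame record."""

    frame_id: int
    ball_state: str  # "alive" or "dead"
    ball_coordinates: Optional[Tuple[float, float, float]]


def keep_frame(frame: Frame, drop_missing_dead_ball: bool = False) -> bool:
    """Drop a frame only when the ball is dead AND its coordinates are missing."""
    if not drop_missing_dead_ball:
        return True  # default: keep everything
    return not (frame.ball_state == "dead" and frame.ball_coordinates is None)


frames = [
    Frame(1, "alive", (1.5, -3.0, 0.2)),
    Frame(2, "alive", None),  # ball in play but untracked, e.g. occluded
    Frame(3, "dead", None),  # dead ball with missing coordinates
    Frame(4, "dead", (52.1, 10.0, 0.0)),  # dead ball that is still tracked
]
kept_ids = [f.frame_id for f in frames if keep_frame(f, drop_missing_dead_ball=True)]
```

With the flag on, only frame 3 is dropped: the in-play frame with a missing ball and the tracked dead ball frame both survive, which is the difference from only_alive=True.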

@probberechts
Contributor

probberechts commented Dec 23, 2025

2a. Missing ball coordinates appear with all data providers if the ball goes too far out of bounds. I do not see why this would only apply to SecondSpectrum.
2b. Indeed, but I meant that the parameter name should be something like exclude_...=False. If you use include_empty_frames, it seems like including them is the non-default.
2c. But why do you want to know the position of the ball when it is not in play? Almost always, I would use only_alive=True anyway. And if I used only_alive=False, that would be because I'm interested in the positions of the players (e.g., to figure out how they set up for a set piece), and I really do not care where the ball is. Hence, I still do not know of any use case where I would want to use this parameter.

@UnravelSports
Contributor Author

UnravelSports commented Dec 24, 2025

2a. True, but I'm not sure every data provider is as clear about the fact that the ball is not being captured. At least SecondSpectrum sets it to -10 when they can't track it. I'm fine if you want this to be available for every provider, but I've never seen it made this clear / explicit by other data providers. Also, I don't think it makes sense for broadcast providers, only for in-stadium ones, because their tracking quality is higher and they have far fewer frames with a missing ball while the ball is actually in play. My idea is only to throw out the frames when the ball is dead.
2b. That makes sense.
2c. The out-of-bounds situations where the ball is close to being back on the pitch, or leading up to being within bounds (e.g. around throw-ins, corners or free kicks), can actually be useful for analysis. Additionally, databallpy synchronization [and other syncing algorithms] expects dead ball frames too by default, and they apply a weighting factor (or not) to dead ball frames to discount them, because sometimes it makes more sense to sync to a dead ball frame (e.g. when the data provider annotates alive too late).

@UnravelSports
Contributor Author

@probberechts I have renamed include_missing_ball_frames to exclude_missing_ball_frames and flipped the functionality. I've also created issue #528 so we can track whether other providers have / need similar functionality too.
