Before:
00:20:11,320 --> 00:20:15,399
One: If he's secretly got
a thing for big women,
After:
00:20:11,320 --> 00:20:15,399
If he's secretly got
a thing for big women,
I have a similar script for stripping SDH, and came up with the idea of counting each potential SDH feature and stripping a feature only when its count meets a certain threshold.
sdh_features = {
  brackets: texts.count("["),
  parentheses: texts.count("("),
  speaker_labels: texts.scan(/^-?\s*\[?\p{Lu}[\p{L}\s\.]+\]?:\s*/).size,
  speaker_labels_upper_case: texts.scan(/^-?\s*\[?\p{Lu}[\p{Lu}\s\.]+\]?:\s*/).size,
  all_caps: texts.scan(/^[A-Z ]+$/).size
}
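A minimal sketch of how the threshold check might look, limited to two of the features above. The names `SDH_THRESHOLDS`, `SDH_STRIPPERS`, `count_sdh_features`, and `strip_sdh` are hypothetical, and so are the threshold values and the stripping patterns, which are simplified compared with the counting patterns above. It is an illustration of the idea, not the actual script.

```ruby
# Hypothetical minimum counts before a feature is treated as genuine SDH
# markup; the numbers are placeholders, not tuned values.
SDH_THRESHOLDS = { brackets: 3, speaker_labels_upper_case: 3 }.freeze

# Hypothetical [pattern, replacement] pair used to strip each feature.
SDH_STRIPPERS = {
  brackets: [/\[[^\]]*\]/, ""],
  # Keep a leading dialogue dash, drop only the "NAME: " label.
  speaker_labels_upper_case: [/^(-?[ \t]*)\p{Lu}[\p{Lu} .]+:[ \t]*/, '\1']
}.freeze

# Count how often each candidate feature occurs in the subtitle text.
def count_sdh_features(texts)
  label_pattern = SDH_STRIPPERS[:speaker_labels_upper_case].first
  {
    brackets: texts.count("["),
    speaker_labels_upper_case: texts.scan(label_pattern).size
  }
end

# Strip only the features whose count reaches its threshold.
def strip_sdh(texts, thresholds = SDH_THRESHOLDS)
  counts = count_sdh_features(texts)
  thresholds.reduce(texts) do |result, (feature, minimum)|
    next result if counts[feature] < minimum

    pattern, replacement = SDH_STRIPPERS.fetch(feature)
    result.gsub(pattern, replacement)
  end
end
```

With these placeholder thresholds, a file containing three bracketed cues but only two upper-case speaker labels would lose the brackets and keep the labels. A mixed-case line such as "One: If he's secretly got" is not matched by the upper-case label pattern at all, so it is left alone either way.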
Thoughts on this?