Hi, I'm trying to use your code as part of an algorithm on a large dataset using Scala and Apache Spark. I'm getting great results in terms of accuracy, but I ran it on several samples of GPS tracklog data and the task durations are very skewed:
| Metric | Min | 25th percentile | Median | 75th percentile | Max |
|---|---|---|---|---|---|
| Duration | 10 s | 1.2 min | 6.2 min | 12 min | 51 min |
| GC Time | 0.2 s | 2 s | 4 s | 4 s | 10 s |
| Input | 25.5 MB | 128.1 MB | 128.1 MB | 128.1 MB | 128.1 MB |
| Output | 18.4 MB | 93.3 MB | 93.6 MB | 93.7 MB | 94.0 MB |
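To compare runs on different samples with a single number instead of eyeballing the percentiles, a small plain-Scala helper could summarise per-task durations in the same five columns and report two rough spread measures. This is a minimal sketch, assuming the durations are available as seconds; `DurationSkew`, `summarise`, `maxOverMedian` and `relativeIqr` are hypothetical names, and the nearest-rank percentile used here may differ slightly from how the Spark UI computes its quantiles. The sample values are just the five Duration summary values above converted to seconds, used as stand-in task durations.

```scala
// Hypothetical helper (not part of Spark or of the library discussed here):
// summarise task durations in seconds as min / 25th / median / 75th / max,
// plus simple spread indicators for comparing runs on different samples.
object DurationSkew {

  // Nearest-rank percentile on a sorted, non-empty vector; p must be in [0, 1].
  // May not match Spark UI's own quantile method exactly.
  def percentile(sorted: Vector[Double], p: Double): Double = {
    require(sorted.nonEmpty, "need at least one duration")
    require(p >= 0.0 && p <= 1.0, "p must be in [0, 1]")
    val idx = math.ceil(p * sorted.size).toInt - 1
    sorted(math.max(0, math.min(idx, sorted.size - 1)))
  }

  final case class Summary(min: Double, p25: Double, median: Double, p75: Double, max: Double) {
    // Slowest task relative to the median task: a rough straggler indicator.
    def maxOverMedian: Double = max / median
    // Interquartile range relative to the median: a rough spread indicator.
    def relativeIqr: Double = (p75 - p25) / median
  }

  def summarise(durations: Seq[Double]): Summary = {
    val s = durations.toVector.sorted
    Summary(s.head, percentile(s, 0.25), percentile(s, 0.5), percentile(s, 0.75), s.last)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in values: the Duration summary row above, converted to seconds.
    val durations = Seq(10.0, 72.0, 372.0, 720.0, 3060.0)
    val sm = summarise(durations)
    println(f"max/median = ${sm.maxOverMedian}%.2f, IQR/median = ${sm.relativeIqr}%.2f")
  }
}
```

With real per-task durations from two runs, a lower `maxOverMedian` or `relativeIqr` would indicate the "less variance" behaviour described below, independently of the average duration.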
I would like to know if this is expected behaviour for this algorithm, or if you have any tips and tricks for getting more stable run times on any dataset (less variance, maybe at the cost of a higher average duration).