While the EBMs are an incredible model, a main pain-point is that they are relatively slow to fit (compared to random forests). Bagging can easily be parallelized, while boosting is sequential. However, in the special case of EBMs I get the impression, that it should be possible to parallelize the boosting algorithm over the features.
In the explanations for EBMs, it is mentioned that a tiny tree is fit to each feature in a round-robin fashion while employing a very small learning rate, such that the order of features doesn't matter. If the order doesn't matter, shouldn't it be possible to fit all features in parallel? Of course, the results will be slightly different (in the same way, the results would be slightly different with the round-robin algorithm if we shuffle the features).
For now, this is mostly a theoretical questions, as I haven't investigated the C++ core yet. A main question is, of course, what kind of speed-ups are possible if the boosting can indeed be parallelized. After all, communication is necessary after each boosting round of fitting a tree for each feature. This probably depends on how expensive it is to fit the trees compared to the cost of calculating the residues.
While the EBMs are an incredible model, a main pain-point is that they are relatively slow to fit (compared to random forests). Bagging can easily be parallelized, while boosting is sequential. However, in the special case of EBMs I get the impression, that it should be possible to parallelize the boosting algorithm over the features.
In the explanations for EBMs, it is mentioned that a tiny tree is fit to each feature in a round-robin fashion while employing a very small learning rate, such that the order of features doesn't matter. If the order doesn't matter, shouldn't it be possible to fit all features in parallel? Of course, the results will be slightly different (in the same way, the results would be slightly different with the round-robin algorithm if we shuffle the features).
For now, this is mostly a theoretical questions, as I haven't investigated the C++ core yet. A main question is, of course, what kind of speed-ups are possible if the boosting can indeed be parallelized. After all, communication is necessary after each boosting round of fitting a tree for each feature. This probably depends on how expensive it is to fit the trees compared to the cost of calculating the residues.