Parallelism in Storm
Parallelism in Storm is bound to streams: the more computational power we want, the more spout and bolt instances our topology needs. Given this model, it makes sense to make our tuples as independent of each other as possible, so that the maximum amount of parallelism can be achieved. To set up a topology efficiently, we need to understand how the daemons interact with each other, how the worker processes carry out computation, and how to map the topology's spouts and bolts to physical machines so that latency and bandwidth requirements are met as well as our hardware allows.
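The following Java sketch illustrates why independent tuples parallelize well. It is a simplified, hypothetical model, not Storm's API: `KeyPartitionedCount`, `ownerOf` and `countInParallel` are illustrative names, threads stand in for bolt instances, and routing tuples by a hash of their key only mimics the idea behind Storm's fields grouping. Because every key is owned by exactly one instance, the instances keep private state and never have to synchronize with each other.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical model of one bolt running as several parallel instances.
 * Tuples are routed by key hash, so each key belongs to exactly one instance.
 */
public class KeyPartitionedCount {

    // Index of the instance that owns a key; floorMod keeps it non-negative.
    static int ownerOf(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    static Map<String, Integer> countInParallel(List<String> tuples, int parallelism) {
        // Route every tuple to the partition of the instance that owns its key.
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String t : tuples) {
            partitions.get(ownerOf(t, parallelism)).add(t);
        }

        // Each "instance" runs on its own thread with private state: no locks needed.
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Map<String, Integer>>> results = new ArrayList<>();
            for (List<String> part : partitions) {
                results.add(pool.submit(() -> {
                    Map<String, Integer> local = new HashMap<>();
                    for (String t : part) {
                        local.merge(t, 1, Integer::sum);
                    }
                    return local;
                }));
            }
            // Keys are disjoint across instances, so merging is a plain putAll.
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> f : results) {
                total.putAll(f.get());
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> words = List.of("storm", "spout", "bolt", "storm", "tuple", "bolt", "storm");
        System.out.println(countInParallel(words, 3));
    }
}
```

Raising `parallelism` adds instances without changing the counting logic; this only works because no tuple depends on state held by another instance.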
Storm runs two kinds of daemons: the Nimbus process and the Supervisor processes. Nimbus knows the topology and issues the basic orders, which it coordinates through the ZooKeeper servers. ZooKeeper is not used for message passing; it acts as a bridge between Nimbus and the Supervisors and, among other things, stores the Nimbus and Supervisor state, which allows both daemons to be fail-fast and stateless.
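A minimal in-memory sketch of this division of work, assuming a plain `ConcurrentHashMap` as a stand-in for ZooKeeper. `CoordinationSketch`, `CoordinationStore`, `Nimbus`, `Supervisor` and the round-robin `assign` method are hypothetical names for illustration only; real Storm scheduling and ZooKeeper usage are more involved. What it shows is that a Supervisor holds no state of its own, so after a fail-fast crash a fresh instance recovers its work simply by re-reading the shared store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of the Nimbus / ZooKeeper / Supervisor split. */
public class CoordinationSketch {

    // Stand-in for ZooKeeper: it only stores state, it never carries tuples.
    static class CoordinationStore {
        private final Map<String, List<String>> assignments = new ConcurrentHashMap<>();

        void writeAssignment(String supervisorId, List<String> tasks) {
            assignments.put(supervisorId, List.copyOf(tasks));
        }

        List<String> readAssignment(String supervisorId) {
            return assignments.getOrDefault(supervisorId, List.of());
        }
    }

    // Nimbus knows the topology and publishes which supervisor runs which task.
    static class Nimbus {
        private final CoordinationStore store;

        Nimbus(CoordinationStore store) {
            this.store = store;
        }

        void assign(List<String> tasks, List<String> supervisorIds) {
            // Illustrative round-robin split of tasks over supervisors.
            Map<String, List<String>> plan = new HashMap<>();
            for (String id : supervisorIds) {
                plan.put(id, new ArrayList<>());
            }
            for (int i = 0; i < tasks.size(); i++) {
                String owner = supervisorIds.get(i % supervisorIds.size());
                plan.get(owner).add(tasks.get(i));
            }
            plan.forEach(store::writeAssignment);
        }
    }

    // A supervisor keeps no durable state; it always asks the store what to run.
    static class Supervisor {
        private final String id;
        private final CoordinationStore store;

        Supervisor(String id, CoordinationStore store) {
            this.id = id;
            this.store = store;
        }

        List<String> tasksToRun() {
            return store.readAssignment(id);
        }
    }

    public static void main(String[] args) {
        CoordinationStore zk = new CoordinationStore();
        new Nimbus(zk).assign(List.of("spout-0", "bolt-0", "bolt-1", "bolt-2"), List.of("sup-a", "sup-b"));

        Supervisor a = new Supervisor("sup-a", zk);
        System.out.println(a.tasksToRun()); // prints [spout-0, bolt-1]

        // Simulate a fail-fast crash: discard the object and start a fresh one.
        Supervisor restarted = new Supervisor("sup-a", zk);
        System.out.println(restarted.tasksToRun()); // prints [spout-0, bolt-1]
    }
}
```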