Determining what we call a "generator-id" (often also called a machine-id or worker-id) is one of the harder parts of using IdGen correctly.
Ideally, you have a service that coordinates handing out these generator-ids as you spin up IdGenerators. However, that service is itself a single point of failure. You could make the coordinators redundant, add fallbacks and so on, but things get really complicated, really fast.
For predictability, a 'static' assignment is a much better choice. Every process using an IdGenerator could take its machine ID from an environment variable, a config file, the Kubernetes node ID or... a bunch of other options. Of course, care has to be taken that these values are unique (enough). Given that we're ready to leave that responsibility to the developer or system administrator, I can see a use case for several ways to determine the generator-id:
IP address: what differentiates machines in a cluster is often, among other things, their IP address
MAC address: see IP address; MAC addresses are usually (no, not always) unique
Datacenter / machine ID: if the host is somehow aware of its datacenter and rackspace "id", this could be used
Kubernetes pod name / node ID: should be unique
Other options could be:
CPU-ID
Motherboard serial or BIOS UUID etc.
Hostname
AWS / Azure / GCP / ... instance ID or other metadata
For the first group of options, I think these are (or should be) "safe" to use without collisions, provided care is taken in how they are used. For instance, with a generator-id of, say, 16 bits, we can derive the generator-id from the last two octets of a host's IPv4 address, as long as we coordinate with the sysadmin that all hosts using IdGen have an IP in, say, the 10.0.x.x range. So, with some careful planning and coordination, we should be able to guarantee unique generator-ids based on a host's IP address.
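To make the two-octet idea concrete, here is a minimal Python sketch. It is not the IdGen implementation; the function name `generator_id_from_ip` and the `bits` parameter are made up for this illustration. It simply keeps the lowest bits of the address, which for 16 bits are the last two octets of an IPv4 address.

```python
import ipaddress


def generator_id_from_ip(address: str, bits: int = 16) -> int:
    """Keep the lowest `bits` bits of an IPv4/IPv6 address as a generator-id.

    Hypothetical helper for illustration only, not part of IdGen.
    """
    ip = ipaddress.ip_address(address)
    if not 0 < bits <= ip.max_prefixlen:
        raise ValueError(f"bits must be between 1 and {ip.max_prefixlen}")
    # Mask off everything except the lowest `bits` bits.
    return int(ip) & ((1 << bits) - 1)


# Last two octets are 3 and 17, so the id is 3 * 256 + 17.
print(generator_id_from_ip("10.0.3.17"))
# A host outside the agreed 10.0.x.x range can map to the same id.
print(generator_id_from_ip("10.1.3.17"))
```

Both calls yield the same value, which is exactly why the agreement with the sysadmin about the address range matters: the bits that are thrown away must be identical on every host.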
MAC addresses are a little tougher: we usually don't control them, but I can see this working in a Citrix or VMware cluster or similar, where VMs are given 'hardcoded' MAC addresses. Again, this could work only if some requirements are met.
Datacenter / machine ID: as long as a host has -some- way of retrieving its datacenter and, say, rack number, I could see generator-ids being based on this. The same goes for K8S pod name / node ID: the host should, somehow, know how to get these values and 'convert' them into a generator-id, as long as some requirements are met. Care should (always) be taken not to run into any collisions (or just YOLO it...).
The second set of options carries, for me, too high a risk of collision and/or we have too little control over these things. Yes, we could take some hash of a hostname and use X bits of it as the basis for the generator-id, but the chance of collisions is too high. I could see a -very- carefully coordinated use of hostnames or cloud metadata, but I'm afraid we have too little control over this.
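As one possible way of 'converting' a pod name, here is a hedged Python sketch. It assumes the pods run in a Kubernetes StatefulSet, where pods are named `<statefulset-name>-<ordinal>`; pods created by a Deployment get random suffixes and are not suitable for this. The function name and the `bits` parameter are hypothetical.

```python
import re


def generator_id_from_pod_name(pod_name: str, bits: int = 10) -> int:
    """Use the trailing ordinal of a StatefulSet pod name as a generator-id.

    Hypothetical helper for illustration only; assumes names such as
    "idgen-worker-3" as produced by a StatefulSet.
    """
    match = re.search(r"-(\d+)$", pod_name)
    if match is None:
        raise ValueError(f"no trailing ordinal in pod name {pod_name!r}")
    ordinal = int(match.group(1))
    # Refuse ordinals that do not fit in the generator-id bits
    # instead of silently wrapping around into a collision.
    if ordinal >= (1 << bits):
        raise ValueError(f"ordinal {ordinal} does not fit in {bits} bits")
    return ordinal


print(generator_id_from_pod_name("idgen-worker-3"))
```

Failing loudly when the ordinal is too large is a deliberate choice here: a wrapped-around id would collide with another pod without anyone noticing.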
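To put a rough number on that risk, the sketch below computes the standard birthday-problem probability that at least two hosts end up with the same generator-id. It assumes the hash spreads hostnames uniformly over the available ids, which is an idealisation; the function name is made up for this illustration.

```python
def collision_probability(hosts: int, bits: int) -> float:
    """Probability that at least two of `hosts` uniformly random ids collide.

    Assumes every host draws its id independently and uniformly from
    2**bits possible values (the classic birthday problem).
    """
    slots = 1 << bits
    if hosts > slots:
        return 1.0  # pigeonhole: more hosts than ids
    p_unique = 1.0
    for i in range(hosts):
        # The (i+1)-th host must avoid the i ids already taken.
        p_unique *= (slots - i) / slots
    return 1.0 - p_unique


print(collision_probability(38, 10))   # 10-bit ids, 38 hosts
print(collision_probability(100, 16))  # 16-bit ids, 100 hosts
```

With 10 bits the chance of a collision is already close to 50% at around 38 hosts, and even with 16 bits it is about 7% for 100 hosts, so hashing without coordination is hard to justify.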
I have implemented 3 "id-generator creators" (better naming options are very welcome 😅) that could help in this process:
The IPAddressGeneratorIdCreator takes a given IPv4 or IPv6 address, or uses the machine's first IP address (IPv4, IPv6 or any, as desired), takes the desired number of bits of that address and uses them to create a generator-id.
The MacAddressGeneratorIdCreator does basically the same: it takes a given (or the host's first) MAC address and uses it to create a generator-id.
The last, the DatacenterMachineIdGeneratorIdCreator, assumes you have, somehow, determined your datacenter-id and worker-id and combines these into a generator-id.
All three implementations are quite basic.
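For readers who want a feel for what the MAC-based and datacenter/worker-based variants boil down to, here is a minimal Python sketch. It is not the actual code of the creators named above: the function names, the choice of the low-order MAC bits, and the 5/5 bit split are all assumptions made for this illustration.

```python
def generator_id_from_mac(mac: str, bits: int = 10) -> int:
    """Keep the lowest `bits` bits of a MAC address as a generator-id.

    Hypothetical helper. The low-order bits are used because the first
    three octets are the vendor prefix (OUI) and are often shared by
    many machines.
    """
    value = int(mac.replace(":", "").replace("-", ""), 16)
    return value & ((1 << bits) - 1)


def generator_id_from_datacenter_worker(
    datacenter_id: int,
    worker_id: int,
    datacenter_bits: int = 5,
    worker_bits: int = 5,
) -> int:
    """Pack a datacenter-id and a worker-id into a single generator-id.

    Hypothetical helper; the 5/5 split is only an example.
    """
    if not 0 <= datacenter_id < (1 << datacenter_bits):
        raise ValueError("datacenter_id does not fit in datacenter_bits")
    if not 0 <= worker_id < (1 << worker_bits):
        raise ValueError("worker_id does not fit in worker_bits")
    # Datacenter goes in the high bits, worker in the low bits.
    return (datacenter_id << worker_bits) | worker_id


print(generator_id_from_mac("00:1A:2B:3C:4D:5E"))
print(generator_id_from_datacenter_worker(3, 7))
```

The datacenter/worker variant is the only one of the three that cannot collide by construction, as long as each (datacenter-id, worker-id) pair is handed out once and fits in the chosen bit widths.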
Now, here is my discussion:
I'm thinking of providing these "id-generator creators" in the 'core' IdGen package. I think their usage would be common enough to warrant that. However, for other such "id-generator creators" (e.g. a K8SGeneratorIdCreator or CPUIDGeneratorIdCreator) I would prefer separate packages (one package each, so that when a creator needs some packages to, say, call a K8S API for node information, those dependencies aren't forced on others who don't use K8S). So, maybe a separate package for the three proposed "id-generator creators" would also make sense? What are your thoughts?
As mentioned earlier, besides the 3 already implemented "id-generator creators" I have ideas for other "id-generator creators". Do you have other ideas not mentioned already?
Are there other thoughts or ideas you had while reading this? Please, discuss!
I'd like to hear your opinions!