C
- previously we made a decision to bake envoy + consul into each docker image; hopefully this doesn't backfire on us with nomad integration
- any issues we're facing at the network layer are purely a knowledge gap
- I'm sure the architecture is sound unless evidence proves otherwise
- nomad has first-class consul (and vault) integration
- however, we are using nomad to start the consul service: let's see how this chicken-and-egg dependency plays out
- best case scenario:
- we can leave the consul + envoy baked into each image, supporting interoperability between envs
- we don't need to set up consul for nomad tasks
- we just need to point upstreams to the consul allocation
- this can be achieved via a template on each task that queries nomad service discovery for X to retrieve the service IP (see the retry_join template sketch under issue 1 below)
- register nomad clients with the consul agent for the tasks they're running
- this is overkill: we just need to know where the services are deployed, then consul + envoy will take over
- or perhaps set group.service.provider = "nomad" (see the job spec sketch at the end of this section)
- worked perfectly
- best case scenario: worked out perfectly; leaving this here for when I forget in the future
- workaround scenario 1:
- we create a user-defined network and have all clients join it
- then upstreams can discover core-consul via nomad SRV records
- all services use consul intentions anyway to manage authn/authz, so this shouldn't be too much of a security concern
- workaround scenario 2:
- we do a soft integration with consul + nomad, just for service discovery between allocations
- one thing to watch for is redundant envoy + consul processes running
- each container has a bootstrap file for managing the consul agent + envoy sidecar that's baked into the image
- if we then run another consul + envoy process for nomad, that redundancy seems wasteful
- worst case scenario: we have to remove consul + envoy from the image
- this will require us to add additional docker services (1 for consul, 1 for envoy) for each application service in the compose file for development
- definitely not something we want to do, hence why we baked them into the image
- we will also have to duplicate that logic in nomad for each env
- less bad, but still a worst case scenario
- use nomad for development:
- then having consul + envoy baked into the image will be the problem, instead of this ticket
- we can configure consul + envoy as a system job and it will automatically be provisioned on each client
- this is idiomatic nomad
- not something we want to do: nothing beats plain docker for development
- hence why we baked consul and envoy into the image in the first place
- we have the validation env explicitly for running a prod-like environment without imposing restrictions/non-dev concerns on developers
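A minimal job spec sketch of the best case; the job, group, and image names are illustrative rather than our actual spec, and the only load-bearing parts are provider = "nomad" on the service and a reachable serf port:

```hcl
# illustrative sketch of the best case; job, group, and image names are placeholders
job "core" {
  datacenters = ["dc1"]

  group "core-consul" {
    network {
      mode = "bridge"
      port "serf_lan" {
        static = 8301   # keep serf reachable on a known port across allocations
      }
    }

    # register with Nomad's built-in service catalog rather than Consul,
    # side-stepping the chicken-and-egg dependency on the Consul we start here
    service {
      name     = "core-consul"
      provider = "nomad"
      port     = "serf_lan"
    }

    task "consul" {
      driver = "docker"
      config {
        image = "core-consul:local"   # the image with consul + envoy baked in
        ports = ["serf_lan"]
      }
    }
  }
}
```

Other groups can then look the address up with the nomadService template function (see the retry_join sketch under issue 1).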
T
- docker tasks use the docker bridge and not the nomad bridge, so we need to configure it (see the job fragment sketch below)
- group.service: attrs to review
- x
- group.network: attrs to review, and should be used instead of task when attrs clash
- x
- task.config.X:
- attrs to review
- extra_hosts
- ports
- do a manual review of this: the docker driver sets NOMAD_PORT_poop in each container
- network_aliases: unlike docker, we can use the nomad runtime vars to get distinct container aliases, but this requires a user-defined network
- attrs to avoid
- hostname
- privileged
- ipc_mode
- ipv4_address
- ipv6_address
- must be configured at group.network
- dns_search_domains
- dns_options
- dns_servers
- network_mode
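A hedged fragment pulling the attrs above together (group-level network, task-level ports / extra_hosts / network_aliases); the group name, image, and hard-coded host entry are placeholders, not our real config:

```hcl
# illustrative fragment only; names, image, and addresses are placeholders
job "example" {
  datacenters = ["dc1"]

  group "core-proxy" {
    # networking is configured at the group level and wins when attrs clash with the task
    network {
      mode = "bridge"
      port "https" {
        to = 443        # container port; the host port is allocated dynamically
      }
    }

    task "haproxy" {
      driver = "docker"
      config {
        image = "core-proxy:local"
        ports = ["https"]   # the driver exposes NOMAD_PORT_https etc. inside the container

        # extra host entries, e.g. for the hard-coded sanity check under issue 1
        extra_hosts = ["core-consul:172.26.65.117"]

        # distinct per-allocation aliases are possible via runtime vars,
        # but only on a user-defined docker network
        # network_aliases = ["core-proxy-${NOMAD_ALLOC_INDEX}"]
      }
    }
  }
}
```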
- attrs to review
- docker plugin conf
- check the infra_image attr; from the docs it appears nomad hardcodes it to 3.1 (client config sketch below)
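infra_image lives in the client's docker plugin block rather than the job spec; a sketch, assuming we simply pin the documented default explicitly:

```hcl
# nomad client config (not job spec): pin the pause image the docker driver uses for
# bridge networking instead of relying on the documented default (pause-amd64:3.1)
plugin "docker" {
  config {
    infra_image = "gcr.io/google_containers/pause-amd64:3.1"
  }
}
```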
A
- see DNS Support in Nomad Service Discovery hashicorp/nomad#12588
- details exactly what we need to do with nomad service discovery
- see https://developer.hashicorp.com/nomad/docs/job-specification/template#nomad-integration
- see https://developer.hashicorp.com/nomad/tutorials/load-balancing/load-balancing-haproxy
- see https://discuss.hashicorp.com/t/nomad-1-4-and-haproxy-server-template-without-consul-and-its-dns-feature/44499/2
- issue 1: chatter across allocations
- this was expected, as the config is pretty much copy-pasted from the docker convert env file
- the core-consul hostname (see the logs below) doesn't exist in validation
- ^ it needs to point to the core-consul allocation ip
- ^ or somehow discover on which client core-consul is allocated
- sanity check:
- set static port allocations for all core-consul (especially serf) ports
- hard-code the core-consul addr in the core-proxy retry_join attr
- makes sense that it works with hardcoded values, since everything's running on my machine
- still a useful sanity check
- real fix: discovery....
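A sketch of what that discovery could look like, assuming core-consul registers a nomad-provider service named core-consul (as in the job sketch in section C) and that exactly one instance is running; this template would sit in each task that runs the baked-in consul agent, with the rendered file dropped into the agent's config dir so the hard-coded address can go away:

```hcl
# hypothetical template for tasks running the baked-in consul agent;
# assumes a single core-consul allocation registered with provider = "nomad"
template {
  destination = "local/config/retry-join.json"
  change_mode = "restart"
  data        = <<EOF
{{ range nomadService "core-consul" -}}
{ "retry_join": ["{{ .Address }}:{{ .Port }}"] }
{{- end }}
EOF
}
```

The logs below show the resolution failures this is meant to fix.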
--
2023-01-27T02:16:17.643Z [WARN] agent.cache: handling error in Cache.Notify: cache-type=intention-match error="No known Consul servers" index=0
2023-01-27T02:16:17.643Z [ERROR] agent.proxycfg: Failed to handle update from watch: kind=connect-proxy proxy=core-proxy-1-sidecar-proxy service_id=core-proxy-1-sidecar-proxy id=intentions error="error filling agent cache: No known Consul servers"
--
2023-01-27T02:15:12.560Z [INFO] agent.client.serf.lan: serf: Attempting re-join to previously known node: core-vault-247bb920bc1a: 172.21.0.2:8301
2023-01-27T02:15:12.918Z [INFO] agent: (LAN) joining: lan_addresses=["core-consul"]
2023-01-27T02:15:12.941Z [WARN] agent.router.manager: No servers available
2023-01-27T02:15:12.978Z [WARN] agent.client.memberlist.lan: memberlist: Failed to resolve core-consul: lookup core-consul on 192.168.0.1:53: no such host
2023-01-27T02:15:12.978Z [WARN] agent: (LAN) couldn't join: number_of_nodes=0
error=
| 1 error occurred:
| * Failed to resolve core-consul: lookup core-consul on 192.168.0.1:53: no such host
|
2023-01-27T02:15:12.978Z [WARN] agent: Join cluster failed, will retry: cluster=LAN retry_interval=10s
error=
| 1 error occurred:
| * Failed to resolve core-consul: lookup core-consul on 192.168.0.1:53: no such host
|
- issue 2: token/acl
--
2023-01-27T04:28:12.087Z [INFO] agent.client.serf.lan: serf: Attempting re-join to previously known node: core-proxy-da6a390b2832: 172.22.0.3:8301
127.0.0.1:53492 [27/Jan/2023:04:28:12.107] edge forward_https/serverhttps 1/-1/+0 +0 -- 1/1/0/0/1 0/0
2023-01-27T04:28:13.388Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=172.26.65.117:8300 error="rpc error making call: Permission denied: token with AccessorID 'bdad85af-9fc8-e41d-593f-c73cebef40fc' lacks permission 'node:write' on \"core-proxy-4652f5c62fdf\""
2023-01-27T04:28:13.388Z [WARN] agent: Coordinate update blocked by ACLs: accessorID=bdad85af-9fc8-e41d-593f-c73cebef40fc
--
127.0.0.1:39828 [27/Jan/2023:04:28:22.108] edge forward_https/serverhttps 1/-1/+0 +0 -- 1/1/0/0/1 0/0
2023-01-27T04:28:22.580Z [ERROR] agent.client: RPC failed to server: method=Catalog.Register server=172.26.65.117:8300 error="rpc error making call: Permission denied: token with AccessorID 'bdad85af-9fc8-e41d-593f-c73cebef40fc' lacks permission 'node:write' on \"core-proxy-4652f5c62fdf\""
2023-01-27T04:28:22.580Z [WARN] agent: Node info update blocked by ACLs: node=3bb036b6-c034-7abb-42df-01c8f7a5b1ea accessorID=bdad85af-9fc8-e41d-593f-c73cebef40fc
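The error above says the agent token lacks node:write on the core-proxy node name; a possible fix, sketched as a minimal consul ACL policy and assuming node names keep the core-proxy- prefix:

```hcl
# hypothetical consul ACL policy for the token used by the core-proxy agent;
# grants the node:write permission the logs above report as missing
node_prefix "core-proxy-" {
  policy = "write"
}
```

Consul's node identities could scope this to a single node name instead of a prefix.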
- issue 3: vault backend
- this makes sense because vault has been commented out
[NOTICE] (15) : haproxy version is 2.7.1-3e4af0e
[NOTICE] (15) : path to executable is /usr/local/sbin/haproxy
[WARNING] (15) : config : [/var/lib/haproxy/configs/002-001-vault.cfg:19] : 'server lb-vault/core-vault-c-dns1' : could not resolve address 'core-vault.service.search', disabling server.
[WARNING] (15) : config : [/var/lib/haproxy/configs/002-001-vault.cfg:20] : 'server lb-vault/core-vault-d-dns1' : could not resolve address 'core-vault', disabling server.
[NOTICE] (15) : New worker (71) forked
[NOTICE] (15) : Loading success.