After the deployment restart, linkerd_distribute::stack is updated with the correct HTTPRoute info:
[ 1.712172s] DEBUG ThreadId(01) outbound:accept{client.addr=10.80.65.40:40112 server.addr=192.168.146.209:5000}:proxy{addr=192.168.146.209:5000}:http: linkerd_distribute::stack: New distribution backends=[Backend { route_ref: RouteRef(Resource { group: "gateway.networking.k8s.io", kind: "HTTPRoute", name: "canary-example", namespace: "test-canary", section: None, port: None }), concrete: Concrete { target: Balance(NameAddr { name: Name("canary-example-primary.test-canary.svc.cluster.local"), port: 5000 }, EwmaConfig { default_rtt: 30ms, decay: 10s }), authority: Some(canary-example-primary.test-canary.svc.cluster.local:5000), parent: Http(HttpSidecar { orig_dst: OrigDstAddr(192.168.146.209:5000), version: HTTP/1, routes: Receiver { shared: Shared { value: RwLock(PhantomData<std::sync::rwlock::RwLock<linkerd_app_outbound::http::logical::Routes>>, RwLock { data: Policy(Http(Params { addr: Socket(192.168.146.209:5000), meta: ParentRef(Resource { group: "core", kind: "Service", name: "canary-example", namespace: "test-canary", section: None, port: Some(5000) }), routes: [Route { hosts: [], rules: [Rule { matches: [MatchRequest { path: Some(Prefix("/")), headers: [], query_params: [], method: None }], policy: RoutePolicy { meta: Resource { group: "gateway.networking.k8s.io", kind: "HTTPRoute", name: "canary-example", namespace: "test-canary", section: None, port: None }, filters: [], distribution: RandomAvailable([(RouteBackend { filters: [RequestHeaders(ModifyHeader { add: [("l5d-dst-canonical", "canary-example-primary.test-canary.svc.cluster.local:5000")], set: [], remove: [] })], backend: Backend { meta: Resource { group: "core", kind: "Service", name: "canary-example-primary", namespace: "test-canary", section: None, port: Some(5000) }, queue: Queue { capacity: 100, failfast_timeout: 3s }, dispatcher: BalanceP2c(PeakEwma(PeakEwma { decay: 10s, default_rtt: 30ms }), DestinationGet { path: "canary-example-primary.test-canary.svc.cluster.local:5000" }) }, 
request_timeout: None }, 50), (RouteBackend { filters: [RequestHeaders(ModifyHeader { add: [("l5d-dst-canonical", "canary-example-canary.test-canary.svc.cluster.local:5000")], set: [], remove: [] })], backend: Backend { meta: Resource { group: "core", kind: "Service", name: "canary-example-canary", namespace: "test-canary", section: None, port: Some(5000) }, queue: Queue { capacity: 100, failfast_timeout: 3s }, dispatcher: BalanceP2c(PeakEwma(PeakEwma { decay: 10s, default_rtt: 30ms }), DestinationGet { path: "canary-example-canary.test-canary.svc.cluster.local:5000" }) }, request_timeout: None }, 50)]), request_timeout: None, failure_policy: StatusRanges([500..=599]) } }] }], backends: [Backend { meta: Resource { group: "core", kind: "Service", name: "canary-example-primary", namespace: "test-canary", section: None, port: Some(5000) }, queue: Queue { capacity: 100, failfast_timeout: 3s }, dispatcher: BalanceP2c(PeakEwma(PeakEwma { decay: 10s, default_rtt: 30ms }), DestinationGet { path: "canary-example-primary.test-canary.svc.cluster.local:5000" }) }, Backend { meta: Default { name: "service" }, queue: Queue { capacity: 100, failfast_timeout: 3s }, dispatcher: BalanceP2c(PeakEwma(PeakEwma { decay: 10s, default_rtt: 30ms }), DestinationGet { path: "canary-example.test-canary.svc.cluster.local:5000" }) }, Backend { meta: Resource { group: "core", kind: "Service", name: "canary-example-canary", namespace: "test-canary", section: None, port: Some(5000) }, queue: Queue { capacity: 100, failfast_timeout: 3s }, dispatcher: BalanceP2c(PeakEwma(PeakEwma { decay: 10s, default_rtt: 30ms }), DestinationGet { path: "canary-example-canary.test-canary.svc.cluster.local:5000" }) }], failure_accrual: None })) }), version: Version(0), is_closed: false, ref_count_rx: 41 }, version: Version(0) } }), parent_ref: ParentRef(Resource { group: "core", kind: "Service", name: "canary-example", namespace: "test-canary", section: None, port: Some(5000) }), backend_ref: BackendRef(Resource { 
group: "core", kind: "Service", name: "canary-example-primary", namespace: "test-canary", section: None, port: Some(5000) }), failure_accrual: None }, filters: [RequestHeaders(ModifyHeader { add: [("l5d-dst-canonical", "canary-example-primary.test-canary.svc.cluster.local:5000")], set: [], remove: [] })], request_timeout: None }, Backend { route_ref: RouteRef(Resource { group: "gateway.networking.k8s.io", kind: "HTTPRoute", name: "canary-example", namespace: "test-canary", section: None, port: None }), concrete: Concrete { target: Balance(NameAddr { name: Name("canary-example-canary.test-canary.svc.cluster.local"), port: 5000 }, EwmaConfig { default_rtt: 30ms, decay: 10s }), authority: Some(canary-example-canary.test-canary.svc.cluster.local:5000), parent: Http(HttpSidecar { orig_dst: OrigDstAddr(192.168.146.209:5000), version: HTTP/1, routes: Receiver { shared: Shared { value: RwLock(PhantomData<std::sync::rwlock::RwLock<linkerd_app_outbound::http::logical::Routes>>, RwLock { data: Policy(Http(Params { addr: Socket(192.168.146.209:5000), meta: ParentRef(Resource { group: "core", kind: "Service", name: "canary-example", namespace: "test-canary", section: None, port: Some(5000) }), routes: [Route { hosts: [], rules: [Rule { matches: [MatchRequest { path: Some(Prefix("/")), headers: [], query_params: [], method: None }], policy: RoutePolicy { meta: Resource { group: "gateway.networking.k8s.io", kind: "HTTPRoute", name: "canary-example", namespace: "test-canary", section: None, port: None }, filters: [], distribution: RandomAvailable([(RouteBackend { filters: [RequestHeaders(ModifyHeader { add: [("l5d-dst-canonical", "canary-example-primary.test-canary.svc.cluster.local:5000")], set: [], remove: [] })], backend: Backend { meta: Resource { group: "core", kind: "Service", name: "canary-example-primary", namespace: "test-canary", section: None, port: Some(5000) }, queue: Queue { capacity: 100, failfast_timeout: 3s }, dispatcher: BalanceP2c(PeakEwma(PeakEwma { decay: 10s, 
default_rtt: 30ms }), DestinationGet { path: "canary-example-primary.test-canary.svc.cluster.local:5000" }) }, request_timeout: None }, 50), (RouteBackend { filters: [RequestHeaders(ModifyHeader { add: [("l5d-dst-canonical", "canary-example-canary.test-canary.svc.cluster.local:5000")], set: [], remove: [] })], backend: Backend { meta: Resource { group: "core", kind: "Service", name: "canary-example-canary", namespace: "test-canary", section: None, port: Some(5000) }, queue: Queue { capacity: 100, failfast_timeout: 3s }, dispatcher: BalanceP2c(PeakEwma(PeakEwma { decay: 10s, default_rtt: 30ms }), DestinationGet { path: "canary-example-canary.test-canary.svc.cluster.local:5000" }) }, request_timeout: None }, 50)]), request_timeout: None, failure_policy: StatusRanges([500..=599]) } }] }], backends: [Backend { meta: Resource { group: "core", kind: "Service", name: "canary-example-primary", namespace: "test-canary", section: None, port: Some(5000) }, queue: Queue { capacity: 100, failfast_timeout: 3s }, dispatcher: BalanceP2c(PeakEwma(PeakEwma { decay: 10s, default_rtt: 30ms }), DestinationGet { path: "canary-example-primary.test-canary.svc.cluster.local:5000" }) }, Backend { meta: Default { name: "service" }, queue: Queue { capacity: 100, failfast_timeout: 3s }, dispatcher: BalanceP2c(PeakEwma(PeakEwma { decay: 10s, default_rtt: 30ms }), DestinationGet { path: "canary-example.test-canary.svc.cluster.local:5000" }) }, Backend { meta: Resource { group: "core", kind: "Service", name: "canary-example-canary", namespace: "test-canary", section: None, port: Some(5000) }, queue: Queue { capacity: 100, failfast_timeout: 3s }, dispatcher: BalanceP2c(PeakEwma(PeakEwma { decay: 10s, default_rtt: 30ms }), DestinationGet { path: "canary-example-canary.test-canary.svc.cluster.local:5000" }) }], failure_accrual: None })) }), version: Version(0), is_closed: false, ref_count_rx: 41 }, version: Version(0) } }), parent_ref: ParentRef(Resource { group: "core", kind: "Service", name: 
"canary-example", namespace: "test-canary", section: None, port: Some(5000) }), backend_ref: BackendRef(Resource { group: "core", kind: "Service", name: "canary-example-canary", namespace: "test-canary", section: None, port: Some(5000) }), failure_accrual: None }, filters: [RequestHeaders(ModifyHeader { add: [("l5d-dst-canonical", "canary-example-canary.test-canary.svc.cluster.local:5000")], set: [], remove: [] })], request_timeout: None }]
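To compare a stale proxy with a restarted one, it can help to pull only the backend Services out of a "New distribution" entry like the one above. The following is a minimal sketch, assuming the debug format shown in this report; the helper name distribution_backends and the regular expression are illustrative and not part of Linkerd.

```python
import re

# Matches the Service name inside each `backend_ref: BackendRef(Resource { ... })`
# block. `name:\s*"` tolerates a line break between the key and the value, and
# `name:` cannot match `namespace:` because of the colon.
BACKEND_REF = re.compile(
    r'backend_ref: BackendRef\(Resource \{[^}]*?name:\s*"([^"]+)"'
)


def distribution_backends(log_text):
    """Return backend Service names from a debug entry, in order, without repeats."""
    seen = []
    for name in BACKEND_REF.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python backends.py < proxy-debug.log
    for backend in distribution_backends(sys.stdin.read()):
        print(backend)
```

A proxy that has picked up the HTTPRoute should list both canary-example-primary and canary-example-canary here.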
What is the issue?
We are migrating from ServiceProfile to HTTPRoute. After deleting the ServiceProfile and applying the new HTTPRoute, dynamic routing still does not work for some pods.
How can it be reproduced?
Create a deployment canary-example with this Python Flask server:
Create a deployment canary-example-primary with this Python Flask server:
Both pod templates carry the linkerd.io/inject: enabled annotation.
Create three Services (mostly generated by Flagger):
Create a ServiceProfile
Create an HTTPRoute:
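The reporter's manifest is not reproduced in this text. As a rough sketch only, the route below is reconstructed from the proxy debug log in this report: parent Service canary-example on port 5000, a "/" prefix match, and the backends canary-example-primary and canary-example-canary weighted 50/50. The apiVersion gateway.networking.k8s.io/v1beta1 and the helper name build_httproute are assumptions, not taken from the issue.

```python
import json


def build_httproute(namespace="test-canary", port=5000):
    """Return an HTTPRoute manifest as a dict.

    Reconstructed from the debug log, not from the reporter's file.
    """

    def backend(name, weight):
        return {"name": name, "port": port, "weight": weight}

    return {
        "apiVersion": "gateway.networking.k8s.io/v1beta1",  # assumed version
        "kind": "HTTPRoute",
        "metadata": {"name": "canary-example", "namespace": namespace},
        "spec": {
            # The log shows the parent as a core Service on port 5000.
            "parentRefs": [
                {
                    "group": "core",
                    "kind": "Service",
                    "name": "canary-example",
                    "namespace": namespace,
                    "port": port,
                }
            ],
            "rules": [
                {
                    "matches": [{"path": {"type": "PathPrefix", "value": "/"}}],
                    "backendRefs": [
                        backend("canary-example-primary", 50),
                        backend("canary-example-canary", 50),
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, e.g. python httproute.py | kubectl apply -f -
    print(json.dumps(build_httproute(), indent=2))
```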
Create a deployment running a load test program in the same namespace, with its URL environment variable set to http://canary-example:5000:
After the load test has been running for 5 minutes, delete the ServiceProfile. All traffic still goes to canary-example-primary.
Then exec into another meshed pod and request http://canary-example.test-canary:5000. From there, traffic is split between the canary-example-primary and canary-example Services.
After restarting the loadtest deployment, the HTTPRoute takes effect for it as well.
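The observation in the steps above (all traffic on the primary versus a split) can be checked with a small tally script run from a meshed pod. This is a minimal sketch, not the reporter's load test program. It assumes each Flask server returns a body containing its own pod name, so that pods of the canary-example-primary deployment can be told apart from pods of the canary-example deployment. The function names are hypothetical.

```python
import collections
import urllib.request


def backend_of(body):
    """Label a response body by the deployment that served it.

    Assumes the body contains the serving pod's name. The primary check must
    come first because "canary-example" is a prefix of "canary-example-primary".
    """
    if "canary-example-primary" in body:
        return "primary"
    if "canary-example" in body:
        return "canary"
    return "other"


def tally(bodies):
    """Count responses per backend label."""
    return collections.Counter(backend_of(b) for b in bodies)


def sample(url, n=100, timeout=2.0):
    """Send n GET requests to url and return the per-backend counts."""
    bodies = []
    for _ in range(n):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            bodies.append(resp.read().decode("utf-8", "replace"))
    return tally(bodies)


if __name__ == "__main__":
    print(sample("http://canary-example.test-canary:5000"))
```

With the 50/50 route in effect, the counts should be roughly even. A pod whose proxy still holds the stale discovery result would show only the primary.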
Logs, error output, etc
We captured debug logs from the linkerd proxy. After the ServiceProfile is deleted, we can see the sidecar switch to ClientPolicy, but linkerd_distribute::stack does not appear to pick up the correct HTTPRoute info.
After the deployment restart, linkerd_distribute::stack logs a "New distribution" entry with the correct HTTPRoute backends.
output of linkerd check -o short
Environment
Possible solution
We found that if the pod stops sending requests to the target for a while, the discovery cache expires and the HTTPRoute takes effect.
Additional context
For various reasons, we cannot restart every deployment each time we migrate a ServiceProfile to an HTTPRoute, so we hope there is a better solution.
Would you like to work on fixing this bug?
None