refactor service caching #855
Conversation
Force-pushed from 33935b7 to f8d2794
Force-pushed from f8d2794 to ea85208
Previously, entries were cached by entry key only. This caused issues when multiple services referenced the same entry. This update changes the caching logic to:
- Cache each entry by service ID while allowing multiple services to target the same entry.
- When multiple services reference the same entry, override the previous value and keep the latest.
- On service entry deletion, check whether other services still reference the same entry; if so, recreate the entry.
This ensures consistent behavior when entries are shared across services.
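As a rough illustration of the deletion behavior described above, here is a minimal sketch. The cnatEntriesByService map and the cnatEntry type are taken from the diff discussed below; the Server receiver and the cnatAdd/cnatDelete helpers are assumptions standing in for the real VPP calls.

type cnatEntry struct {
    key   string
    vppID uint32
}

func (e *cnatEntry) Key() string { return e.key }

type Server struct {
    // entry key (IP;port;proto) -> service ID -> cached entry
    cnatEntriesByService map[string]map[string]*cnatEntry
}

// Stand-ins for the real VPP translate add/delete calls.
func (s *Server) cnatAdd(e *cnatEntry) error    { return nil }
func (s *Server) cnatDelete(e *cnatEntry) error { return nil }

// deleteServiceEntry drops serviceID's reference to entry; if other services
// still reference the same key, the translation is recreated for one of them.
func (s *Server) deleteServiceEntry(serviceID string, entry *cnatEntry) error {
    byService := s.cnatEntriesByService[entry.Key()]
    delete(byService, serviceID)
    // Remove the translation from VPP; VPP keys it on (IP;port;proto) only.
    if err := s.cnatDelete(entry); err != nil {
        return err
    }
    if len(byService) > 0 {
        // Another service still references this key: re-add its entry so the
        // translation survives (map iteration picks one non-deterministically).
        for _, remaining := range byService {
            return s.cnatAdd(remaining)
        }
    }
    delete(s.cnatEntriesByService, entry.Key())
    return nil
}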
Force-pushed from ea85208 to 35634e5
sknat left a comment:
Many thanks for taking a stab at this. This looks good, but I think we might still be vulnerable to overrides in VPP that would cause a double free. Details inline.
// the entry is still referenced by another service, recreate another service entry randomly
s.log.Warnf("svc(del) entry %s was referenced by multiple services", entry.Key())
var randomService string
for randomService = range s.cnatEntriesByService[entry.Key()] {
It might be good to have a way to deterministically pick the service we want to install.
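One possible way to make the pick deterministic, just a sketch: sort the remaining service IDs and always take the smallest one instead of relying on map iteration order. The cnatEntry type is the one from the diff; the pickService helper name and the sort-based policy are assumptions.

import "sort"

// pickService deterministically chooses which remaining service should have
// its entry re-installed: sort the service IDs and take the smallest, rather
// than depending on Go's randomized map iteration order.
func pickService(byService map[string]*cnatEntry) string {
    ids := make([]string, 0, len(byService))
    for id := range byService {
        ids = append(ids, id)
    }
    if len(ids) == 0 {
        return ""
    }
    sort.Strings(ids)
    return ids[0]
}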
entry: entry,
vppID: entryID,
}
if len(s.cnatEntriesByService[entry.Key()]) > 1 {
An issue I see is that if that happens, we might try to install two services with the same Key in VPP, which will cause issues during cleanup, as VPP's hash table only contains a single slot for (IP;port;proto).
This step is going to override the existing entry because it's the same key, and keep the latest created entry. It's like an update of the existing entry.
So from VPP's perspective we always keep one entry per (IP;port;proto).
Maybe ideally in VPP we should extend the hash to something more than just (IP;port;proto), but for now, can you think of a better way to deal with multiple services creating the same entry?
serviceStateMap map[string]ServiceState
// map entry key to a map of service id to entry in vpp
cnatEntriesByService map[string]map[string]*cnatEntry
Should we maybe do two maps instead?

// cnatEntriesByServiceName is a map of the k8s service ID to the entry,
// that is the object representing the translation in VPP
cnatEntriesByServiceName map[string]*cnatEntry
// serviceIdByKey is a map of (proto;port;ip) to the active k8s serviceID
serviceIdByKey map[string]string

This would enable us on Add:
- if serviceIdByKey is empty we just call cnatAdd()
- if serviceIdByKey has a value, we call cnatDelete() for the previous entry and cnatAdd() for the new one

On Delete:
- We only delete from VPP if serviceIdByKey contains our service ID

This makes the choice deterministic (last takes precedence), and prevents double adds.
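A minimal sketch of this Add/Delete flow under the two-map proposal, reusing the cnatEntry type and the cnatAdd/cnatDelete stubs from the sketch further up; the struct shape and the addServiceEntry/delServiceEntry method names are assumptions for illustration.

// Variant of the Server sketch above, using the two proposed maps.
type Server struct {
    // k8s service ID -> the entry installed for that service
    cnatEntriesByServiceName map[string]*cnatEntry
    // entry key (proto;port;ip) -> the k8s service ID currently active in VPP
    serviceIdByKey map[string]string
}

// addServiceEntry installs entry for serviceID, first removing whatever other
// service currently owns the same (proto;port;ip) key, so VPP never sees two
// installs for the same key.
func (s *Server) addServiceEntry(serviceID string, entry *cnatEntry) error {
    if prevID, ok := s.serviceIdByKey[entry.Key()]; ok && prevID != serviceID {
        if prev := s.cnatEntriesByServiceName[prevID]; prev != nil {
            if err := s.cnatDelete(prev); err != nil {
                return err
            }
        }
    }
    if err := s.cnatAdd(entry); err != nil {
        return err
    }
    s.cnatEntriesByServiceName[serviceID] = entry
    s.serviceIdByKey[entry.Key()] = serviceID // last service takes precedence
    return nil
}

// delServiceEntry only touches VPP if serviceID is the one currently installed
// for the key, which prevents double frees.
func (s *Server) delServiceEntry(serviceID string, entry *cnatEntry) error {
    delete(s.cnatEntriesByServiceName, serviceID)
    if s.serviceIdByKey[entry.Key()] != serviceID {
        return nil
    }
    delete(s.serviceIdByKey, entry.Key())
    return s.cnatDelete(entry)
}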
The current code does this I think. We just re-add the entry and it's like an update.
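// CnatTranslateAdd programs a cnat translation in VPP; since it goes through
// CnatTranslationUpdate, re-adding an entry with the same (VIP;port;proto)
// overrides the previous translation rather than creating a duplicate.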
func (v *VppLink) CnatTranslateAdd(tr *types.CnatTranslateEntry) (uint32, error) {
    if len(tr.Backends) == 0 {
        return InvalidID, nil
    }
    client := cnat.NewServiceClient(v.GetConnection())
    paths := make([]cnat.CnatEndpointTuple, 0, len(tr.Backends))
    for _, backend := range tr.Backends {
        paths = append(paths, cnat.CnatEndpointTuple{
            SrcEp: types.ToCnatEndpoint(backend.SrcEndpoint),
            DstEp: types.ToCnatEndpoint(backend.DstEndpoint),
            Flags: backend.Flags,
        })
    }
    response, err := client.CnatTranslationUpdate(v.GetContext(), &cnat.CnatTranslationUpdate{
        Translation: cnat.CnatTranslation{
            Vip:            types.ToCnatEndpoint(tr.Endpoint),
            IPProto:        types.ToVppIPProto(tr.Proto),
            Paths:          paths,
            IsRealIP:       BoolToU8(tr.IsRealIP),
            Flags:          uint8(cnat.CNAT_TRANSLATION_ALLOC_PORT),
            LbType:         cnat.CnatLbType(tr.LbType),
            FlowHashConfig: ip.IPFlowHashConfigV2(tr.HashConfig),
        },
    })
    if err != nil {
        return InvalidID, fmt.Errorf("add/upd CnatTranslate failed: %w", err)
    }
    return response.ID, nil
}
This actually is a CnatTranslationUpdate call.