Commit 4ca1368

Fix for delete lb and stale lb dsr vfp rules.
1 parent 114e2bb commit 4ca1368

4 files changed

Lines changed: 570 additions & 0 deletions

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
# Stale LB DSR Rules Cleanup

## Overview

This mitigation script automatically detects and removes stale Load Balancer Direct Server Return (LB DSR) rules from VFP (Virtual Filtering Platform) that reference non-existent backend endpoints. It runs continuously to maintain network health by cleaning up orphaned rules that can cause connectivity issues.

## Problem Statement

When backend endpoints are removed or become unavailable, the corresponding LB DSR rules in VFP may not be cleaned up properly. These stale rules can:
- Cause packet routing failures
- Lead to connection timeouts
- Create unnecessary overhead in the networking stack
- Result in traffic being sent to non-existent endpoints

## Solution

The `cleanup-stale-lb-rules.ps1` script:
1. Checks and sets the required registry configuration for LB DSR feature management
2. Continuously monitors VFP LB DSR rules (both IPv4 and IPv6)
3. Compares rule destination IPs (DIPs) against active HNS endpoints
4. Automatically removes rules that reference non-existent endpoints

## Prerequisites

- Windows Server with HNS (Host Network Service) enabled
- VFP control utilities (`vfpctrl.exe`) available
- PowerShell with administrator privileges
- HNS PowerShell module

## Usage

### Running the Script on a Single Node

```powershell
.\cleanup-stale-lb-rules.ps1
```

The script will:
1. Check the value `140377743` under the registry key `HKLM:\SYSTEM\CurrentControlSet\Policies\Microsoft\FeatureManagement\Overrides`
2. If the value is 1, set it to 0 and restart the node. This disables PR 13179278, which causes delete LB RPC calls from KubeProxy to fail with an Invalid IP error (ICM: 719903780)
3. Start a continuous monitoring loop. Each iteration waits 10 seconds, then scans for stale rules twice, 60 seconds apart
4. Remove the LB DSR rules that were stale in both scans

**Note:** This approach fixes issues on a single node. If the issue is widespread across the cluster, deploy the solution as a DaemonSet:

```powershell
kubectl create -f cleanup-stale-lb-rules.yaml
```

This runs the mitigation script as HostProcess container (HPC) pods on all affected nodes.

### Configuration

You can modify these parameters at the top of the script:

- **`$groups`**: VFP groups to monitor (default: `LB_DSR_IPv4_OUT`, `LB_DSR_IPv6_OUT`)
- **`$refreshIntervalSeconds`**: Wait time at the start of each iteration (default: 10 seconds). The 60-second gap between the two scans within an iteration is hard-coded in the main loop

## How It Works

### 1. Registry Check
The script first reads the feature flag registry value (140377743). If the value is 1, it sets the value to 0 and restarts the node; otherwise it proceeds directly to monitoring.
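
Because a value of 1 triggers a node restart, it can be useful to inspect the override before running the script. The following is a minimal read-only sketch that assumes the same registry path and value name the script uses; the helper name `Get-OverrideAction` is hypothetical and not part of the script.

```powershell
# Hypothetical helper: maps the override value to what the script would do next.
function Get-OverrideAction {
    param([object]$Value)
    if ($Value -eq 1) { return 'ResetAndRestart' }  # script sets the value to 0 and reboots
    return 'Continue'                               # any other value, or a missing value
}

$path = 'HKLM:\SYSTEM\CurrentControlSet\Policies\Microsoft\FeatureManagement\Overrides'
$name = '140377743'

# Read-only: nothing is modified here. The registry provider only exists on Windows.
if ($env:OS -eq 'Windows_NT') {
    $value = (Get-ItemProperty -Path $path -Name $name -ErrorAction SilentlyContinue).$name
    Write-Host "Override value: '$value' -> $(Get-OverrideAction -Value $value)"
}
```

If the reported action is `ResetAndRestart`, plan for the node to reboot when the mitigation script runs.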

### 2. Endpoint Collection
- Retrieves all HNS policy lists (`Get-HnsPolicyList`)
- Extracts the `/endpoints/<id>` references from them
- Looks up each endpoint and builds a dictionary of valid endpoint IP addresses (IPv4 and IPv6)
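
The reference-filtering step can be tried in isolation, without HNS. This sketch mirrors the pipeline in `Get-EndpointIpDictionary`; the function name `Get-EndpointIdsFromReferences` and the sample reference strings are made up for illustration, and real values would come from `(Get-HnsPolicyList).References`.

```powershell
# Hypothetical helper mirroring the script's pipeline: keep only "/endpoints/<id>"
# references and return the unique trailing IDs.
function Get-EndpointIdsFromReferences {
    param([string[]]$References)
    $References |
        Where-Object { $_ -like '/endpoints/*' } |
        ForEach-Object { ($_ -split '/')[-1] } |
        Sort-Object -Unique
}

# Made-up sample data: two distinct endpoints, one duplicate, one unrelated reference.
$sample = @(
    '/endpoints/11111111-1111-1111-1111-111111111111',
    '/endpoints/22222222-2222-2222-2222-222222222222',
    '/endpoints/11111111-1111-1111-1111-111111111111',
    '/something-else/33333333-3333-3333-3333-333333333333'
)
Get-EndpointIdsFromReferences -References $sample
```

Each resulting ID is then passed to `Get-HnsEndpoint -Id` to resolve its IPv4 and IPv6 addresses.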

### 3. Rule Validation
For each VFP port and LB DSR group:
- Lists all rules in the `LB_DSR` layer
- Extracts DIP (Destination IP) ranges from each rule
- Compares DIPs against the valid endpoint dictionary
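
The extraction and comparison can be sketched on sample data. This example reuses the regular expression from `Get-DipRangesFromRuleText` on the two `{ ip : port }` line shapes shown in the script's comments. The helper name `Get-DipFromLine` and the endpoint table are hypothetical, and the sketch skips the `DIP Range` / `FlagsEx` block detection that the script performs on full `vfpctrl` output.

```powershell
# Hypothetical helper: pull the IP part out of one "{ ip : port }" line.
function Get-DipFromLine {
    param([string]$Line)
    $clean = $Line.Trim().Trim('{', '}').Trim()
    # Greedy (.+) backtracks to the last ':' so the colons of an IPv6 address stay in the IP.
    if ($clean -match '(.+)\s*:\s*\d+$') { return $Matches[1].Trim() }
}

# Made-up endpoint table and rule lines for illustration.
$validIps  = @{ '10.244.0.25' = $true }
$ruleLines = @('{ 10.244.0.25 : 53 }', '{ fdf5:5d67:b9ce:b28f::13f : 4445 }')

$dips    = $ruleLines | ForEach-Object { Get-DipFromLine -Line $_ }
$missing = @($dips | Where-Object { -not $validIps.ContainsKey($_) })
$missing
```

A rule is treated as stale when at least one of its DIPs is missing from the table, as the IPv6 address is here.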

### 4. Cleanup
- Rules with DIPs not found in active endpoints are flagged as stale
- Only rules flagged in two consecutive scans, taken 60 seconds apart, are deleted using `vfpctrl /remove-rule`
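
The script's main loop deletes a rule only if it was flagged in two detection passes taken 60 seconds apart. The following sketch shows that intersection on made-up data; the helper name `Get-ConfirmedStaleCommands` is hypothetical, and in the script each entry is a full `vfpctrl /remove-rule ...` command string.

```powershell
# Hypothetical helper: keep only the commands reported by both detection passes.
function Get-ConfirmedStaleCommands {
    param([string[]]$FirstPass, [string[]]$SecondPass)
    @($FirstPass | Where-Object { $SecondPass -contains $_ })
}

# Made-up pass results: rule B is stale in both passes, A and C in only one.
$pass1 = @('remove rule A', 'remove rule B')
$pass2 = @('remove rule B', 'remove rule C')
$confirmed = @(Get-ConfirmedStaleCommands -FirstPass $pass1 -SecondPass $pass2)
$confirmed
```

Requiring agreement between two passes reduces the chance of deleting a rule whose endpoint was only briefly absent from HNS during the first pass.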

## Output Examples

### Healthy State
```
##========== Starting new iteration to check for stale LB DSR rules...
##========== No stale rules found.
```

### Stale Rules Detected
```
##========== Found 1 stale rule(s) to remove.
##========== Executing Delete Command: vfpctrl /remove-rule /port Port1 /layer LB_DSR /group LB_DSR_IPv4_OUT /rule ABC123
```

## Monitoring

The script provides color-coded output:
- **Green**: Healthy state (registry value needs no change, no stale rules found)
- **Yellow**: Registry changes, endpoint lookup failures, stale rules found, and rule deletions
- **Red**: Per-rule missing DIP details; these lines are commented out in the script by default
- **Cyan**: Iteration markers (waiting, starting a new scan)

## Important Notes

- The script runs indefinitely until manually stopped (Ctrl+C)
- The node restarts on the first run if the registry value is 1
- Ensure no legitimate endpoint updates are in progress during cleanup to avoid false positives
- The script requires elevated privileges to modify VFP rules and registry settings
108+
109+
## Troubleshooting
110+
111+
### Script doesn't detect stale rules
112+
- Verify VFP and HNS are functioning correctly
113+
- Check that `vfpctrl.exe` is accessible in the system PATH
114+
- Ensure HNS endpoints are properly registered
115+
116+
### Node restarts unexpectedly
117+
- This is expected behavior if the registry key is not set to 0
118+
- After restart, the script will continue normal operation
119+
120+
### Permission errors
121+
- Run PowerShell as Administrator
122+
- Verify account has rights to modify VFP rules and registry

## Related Documentation

- [VFP Documentation](../../helper/VFP.psm1)
- [HNS Module](../HNS/)
- [Network Health Monitoring](../../networkhealth/)

## Support

For issues or questions, please refer to the main repository documentation or open an issue.
Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
$groups = @("LB_DSR_IPv4_OUT", "LB_DSR_IPv6_OUT")
$refreshIntervalSeconds = 10

# Builds a lookup table of the IP addresses (IPv4 and IPv6) of every endpoint
# referenced by an HNS policy list.
function Get-EndpointIpDictionary {
    $dict = @{}

    $policies = Get-HnsPolicyList

    $endpointIds = $policies.References |
        Where-Object { $_ -like "/endpoints/*" } |
        ForEach-Object { ($_ -split "/")[-1] } |
        Sort-Object -Unique

    # A foreach statement is used instead of ForEach-Object so that 'continue'
    # skips to the next endpoint rather than unwinding out of the pipeline.
    foreach ($endpointId in $endpointIds) {
        try {
            $endpoint = Get-HnsEndpoint -Id $endpointId
        } catch {
            # Inside catch, $_ is the error record, so the ID is taken from $endpointId.
            Write-Host "Failed to get HNS endpoint ${endpointId}: $($_.Exception.Message)" -ForegroundColor Yellow
            continue
        }

        if ($null -eq $endpoint) {
            Write-Host "HNS endpoint $endpointId not found, skipping." -ForegroundColor Yellow
            continue
        }

        if ($null -ne $endpoint.IPAddress) {
            $dict[$endpoint.IPAddress] = $true
        }
        if ($null -ne $endpoint.IPv6Address) {
            $dict[$endpoint.IPv6Address] = $true
        }
    }

    return $dict
}

# Returns one 'vfpctrl /remove-rule' command per LB DSR rule that has at least
# one DIP missing from the HNS endpoint dictionary.
function Get-StaleRuleCommands {
    param(
        [string[]]$Groups
    )

    $dictDstIPs = Get-EndpointIpDictionary
    $staleRuleCommands = [System.Collections.Generic.List[string]]::new()

    $ports = (vfpctrl.exe /list-vmswitch-port /format 1 | ConvertFrom-Json).Ports.Name
    foreach ($port in $ports) {
        foreach ($group in $Groups) {
            $rules = (vfpctrl /port $port /layer LB_DSR /group $group /list-rule /format 1 | ConvertFrom-Json).Rules
            foreach ($rule in $rules) {
                $ruleId = $rule.Id
                $ruleText = vfpctrl /get-rule-info /port $port /layer LB_DSR /group $group /rule $ruleId 2>&1
                if (-not $ruleText) {
                    Write-Host "No output from vfpctrl"
                    continue
                }

                $dips = Get-DipRangesFromRuleText -RuleText $ruleText
                # Check which DIPs are missing in the dictionary; @() keeps .Count reliable for 0 or 1 results
                $missingDIPs = @($dips | Where-Object { -not $dictDstIPs.ContainsKey($_) })

                if ($missingDIPs.Count -gt 0) {
                    # Uncomment for per-rule diagnostics:
                    # Write-Host "Missing DIP ranges:" -ForegroundColor Red
                    # $missingDIPs | ForEach-Object { Write-Host "  - $_" }
                    $staleRuleCommands.Add("vfpctrl /remove-rule /port $port /layer LB_DSR /group $group /rule $ruleId")
                }
            }
        }
    }

    return $staleRuleCommands
}

# Extracts the DIP addresses from the text output of 'vfpctrl /get-rule-info'.
function Get-DipRangesFromRuleText {
    param([string[]]$RuleText)

    $collect = $false
    $dips = @()

    foreach ($line in $RuleText) {

        # Detect beginning of DIP Range block
        if ($line -match "DIP Range") {
            $collect = $true
            continue
        }

        # Stop at the FlagsEx line that follows the DIP Range block
        if ($collect -and $line -match "FlagsEx") {
            break
        }

        # Process lines like:
        #   { 10.244.0.25 : 53 }
        #   { fdf5:5d67:b9ce:b28f::13f : 4445 }
        if ($collect -and $line.Trim().StartsWith("{")) {

            # Remove surrounding { } then trim
            $clean = $line.Trim().Trim('{','}').Trim()
            # Greedy match: the IP is everything before the last " : <port>"
            if ($clean -match '(.+)\s*:\s*\d+$') {
                $ip = $matches[1].Trim()
                $dips += $ip
            }
        }
    }

    return $dips
}

$regPath = "HKLM:\SYSTEM\CurrentControlSet\Policies\Microsoft\FeatureManagement\Overrides"
$regName = "140377743"

# A missing key or value yields $null here and takes the same path as 0.
$regKeyVal = (Get-ItemProperty -Path $regPath -Name $regName -ErrorAction SilentlyContinue).$regName
if ($regKeyVal -eq 1) {
    Write-Host "##========== Registry keys are not zero. Setting reg key to 0 and restarting the node." -ForegroundColor Yellow
    Set-ItemProperty -Path $regPath -Name $regName -Value 0 -Type DWORD
    Restart-Computer -Force
    # Hold here so the monitoring loop does not start while the reboot is pending.
    Start-Sleep -Seconds 30
} else {
    Write-Host "##========== Registry keys are zero. Continuing the script." -ForegroundColor Green
}

while ($true) {
    Write-Host "##========== Waiting for $refreshIntervalSeconds seconds for the next iteration..." -ForegroundColor Cyan
    Start-Sleep -Seconds $refreshIntervalSeconds
    Write-Host "##========== Starting new iteration to check for stale LB DSR rules..." -ForegroundColor Cyan
    $staleRuleCommands_1 = Get-StaleRuleCommands -Groups $groups
    # Wait 60 seconds, then scan again so that only rules stale in both passes are deleted
    Start-Sleep -Seconds 60
    $staleRuleCommands_2 = Get-StaleRuleCommands -Groups $groups

    # Rules present in both passes (consistently stale); @() keeps .Count reliable for 0 or 1 results
    $inBothPasses = @($staleRuleCommands_1 | Where-Object { $staleRuleCommands_2 -contains $_ })

    if ($inBothPasses.Count -gt 0) {
        Write-Host "##========== Found $($inBothPasses.Count) stale rule(s) to remove." -ForegroundColor Yellow
    } else {
        Write-Host "##========== No stale rules found." -ForegroundColor Green
    }

    # Execute only rules that appeared in both passes (consistently stale)
    foreach ($cmd in $inBothPasses) {
        Write-Host "##========== Executing Delete Command: $cmd" -ForegroundColor Yellow
        Invoke-Expression $cmd
    }
}
