fix: prevent InternalServerError loop when --update-endpoint fails on create#304
fix: prevent InternalServerError loop when --update-endpoint fails on create#304sellakumaran merged 5 commits intomainfrom
Conversation
… create When --update-endpoint fails during the CREATE step (e.g. Azure InternalServerError), Azure may partially provision the new endpoint. Because BotMessagingEndpoint only updates on success, all subsequent runs delete the stale old endpoint and attempt to create the same partially-provisioned endpoint again, hitting InternalServerError on every retry. Add Step 1.5 in UpdateEndpointAsync: pre-emptively delete the target endpoint name (derived from newEndpointUrl) before creating it, ensuring a clean slate regardless of any previous failed run. Only applies when NeedDeployment=false (non-Azure-hosted agents). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
There was a problem hiding this comment.
Pull request overview
This PR prevents repeated InternalServerError failures when --update-endpoint previously failed during the create step and left a partially provisioned Azure Bot Service endpoint behind, by adding a pre-create cleanup delete for the target endpoint name (non-Azure hosted agents only).
Changes:
- Add a “Step 1.5” pre-create cleanup that deletes the target endpoint name derived from the new endpoint URL before attempting registration (for
NeedDeployment=false). - Update
UpdateEndpointAsyncunit test expectations to account for the new pre-create cleanup delete call. - Add a regression test ensuring different Dev Tunnel URLs produce different endpoint names (so deletion targets the correct Azure resource).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/BlueprintSubcommand.cs | Adds pre-create cleanup deletion of the target endpoint name before registration for non-Azure hosted agents. |
| src/Tests/Microsoft.Agents.A365.DevTools.Cli.Tests/Commands/BlueprintSubcommandTests.cs | Updates test to assert the new pre-create cleanup delete call occurs when there is no existing old endpoint. |
| src/Tests/Microsoft.Agents.A365.DevTools.Cli.Tests/Services/Helpers/EndpointHelperTests.cs | Adds regression coverage ensuring Dev Tunnel URL changes result in different derived endpoint names. |
Comments suppressed due to low confidence (2)
src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/BlueprintSubcommand.cs:1949
- Step 1.5 ignores the return value of DeleteEndpointWithAgentBlueprintAsync. If this pre-create cleanup fails (returns false), the subsequent create can still hit the same InternalServerError loop and the user won’t get any indication that the cleanup step failed. Consider capturing the bool result and at minimum logging a warning/error (and optionally failing fast with a SetupValidationException) when the delete was not successful.
var targetEndpointName = Services.Helpers.EndpointHelper.GetEndpointNameFromUrl(newEndpointUrl, setupConfig.AgentBlueprintId);
var targetLocation = setupConfig.Location.Replace(" ", "").ToLowerInvariant();
logger.LogInformation("Removing target endpoint '{EndpointName}' to ensure a clean state before registration.", targetEndpointName);
await botConfigurator.DeleteEndpointWithAgentBlueprintAsync(targetEndpointName, targetLocation, setupConfig.AgentBlueprintId);
}
src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/BlueprintSubcommand.cs:1949
- For needsDeployment=false, Step 1 (delete old) and Step 1.5 (delete target) can end up deleting the same endpoint name (e.g., when re-registering the same URL/host). That results in two full delete operations (each runs
az account show, acquires a token, and makes an HTTP call). Consider de-duplicating by comparing the resolved existing endpointName and targetEndpointName and skipping the second delete when they match.
// Step 1.5: Pre-create cleanup of the target endpoint name.
// If a previous --update-endpoint failed during the create step, Azure may have
// partially provisioned the new endpoint and left it in a bad state that blocks
// subsequent creates with InternalServerError. Delete it now to ensure a clean slate.
if (!setupConfig.NeedDeployment && !string.IsNullOrWhiteSpace(setupConfig.Location))
{
var targetEndpointName = Services.Helpers.EndpointHelper.GetEndpointNameFromUrl(newEndpointUrl, setupConfig.AgentBlueprintId);
var targetLocation = setupConfig.Location.Replace(" ", "").ToLowerInvariant();
logger.LogInformation("Removing target endpoint '{EndpointName}' to ensure a clean state before registration.", targetEndpointName);
await botConfigurator.DeleteEndpointWithAgentBlueprintAsync(targetEndpointName, targetLocation, setupConfig.AgentBlueprintId);
}
Refactor UpdateEndpointAsync to normalize location once and reuse for all endpoint operations, reducing duplication. Add warning logging if pre-create cleanup does not confirm deletion. Update unit tests to reflect normalization and endpoint name logic. Add test to verify both old and target endpoints are deleted when partially-provisioned endpoints exist.
Enhanced logging for endpoint deletion by: - Including both endpoint name and source URL in removal logs. - Promoting endpoint name and agent blueprint ID logs to Information level. - Logging full delete request payload details (instance name, AppId, TenantId, location, environment). These changes improve traceability and aid in debugging endpoint deletion issues.
Updated the delete request payload logging to use config.Environment instead of _config.Environment, ensuring the correct environment value is logged from the local configuration.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (3)
src/Microsoft.Agents.A365.DevTools.Cli/Services/BotConfigurator.cs:300
- The
HttpResponseMessagereturned byhttpClient.SendAsync(request)is not disposed. Wrap it in ausing(or otherwise dispose it) after reading the content to avoid socket/resource leaks, especially if this is called repeatedly (e.g., retries/cleanup loops).
_logger.LogInformation(" Environment: {Environment}", EndpointHelper.GetDeploymentEnvironment(config.Environment));
using var request = new HttpRequestMessage(HttpMethod.Delete, deleteEndpointUrl);
request.Content = new StringContent(deleteEndpointBody.ToJsonString(), System.Text.Encoding.UTF8, "application/json");
var response = await httpClient.SendAsync(request);
src/Microsoft.Agents.A365.DevTools.Cli/Services/BotConfigurator.cs:296
- These logs were changed from Debug to Information and now include detailed request payload fields (including tenant/app IDs). This will show up in normal CLI output and may be overly noisy and unnecessarily expose identifiers; consider keeping this at Debug/Verbose level and/or logging only a correlation ID + endpoint name at Information.
_logger.LogInformation("Delete request payload:");
_logger.LogInformation(" AzureBotServiceInstanceName: {Name}", endpointName);
_logger.LogInformation(" AppId: {AppId}", agentBlueprintId);
_logger.LogInformation(" TenantId: {TenantId}", tenantId);
_logger.LogInformation(" Location: {Location}", normalizedLocation);
_logger.LogInformation(" Environment: {Environment}", EndpointHelper.GetDeploymentEnvironment(config.Environment));
src/Tests/Microsoft.Agents.A365.DevTools.Cli.Tests/Commands/BlueprintSubcommandTests.cs:1477
- This test sets
NeedDeployment = falsebut leavesMessagingEndpointunset. In real configsmessagingEndpointis required whenneedDeploymentis false, so Step 1 would normally run and attempt an idempotent delete before Step 1.5. Consider settingMessagingEndpointto a realistic value and updating the assertions accordingly so the test matches real-world behavior.
var config = new Agent365Config
{
TenantId = "00000000-0000-0000-0000-000000000000",
AgentBlueprintId = "blueprint-123",
// WebAppName not set, so BotName will be empty
Location = "eastus",
DeploymentProjectPath = Path.GetTempPath(),
NeedDeployment = false // Non-Azure hosting
};
Each cleanup and blueprint command now generates and logs its own correlation ID at execution time for better traceability. The correlation ID is propagated through async operations and method calls, improving observability and debugging by ensuring all related logs are tagged with a unique identifier. Method signatures and invocations were updated to support this change.
Problem: When --update-endpoint fails during the CREATE step (e.g. Azure InternalServerError), Azure may partially provision the new endpoint. Because BotMessagingEndpoint only updates on success, every subsequent run deletes the stale old endpoint and tries to create the same partially-provisioned endpoint again — hitting InternalServerError indefinitely.
Fix: Add Step 1.5 in UpdateEndpointAsync: pre-emptively delete the target endpoint name (derived from newEndpointUrl) before creating it. Only applies when NeedDeployment=false (non-Azure-hosted agents).
Customer workaround (while waiting for this build): Edit a365.generated.config.json, change botMessagingEndpoint to the new tunnel URL, then run a365 cleanup blueprint --endpoint-only.