Description
I copied this from the end of #903 because I felt it should get better visibility.
Versions and setup:
Minimum scale-out: 3
Maximum scale-out: 10
Instance size: EP3
Functions V4, .NET Core 8, in-process
Durable task extension 3.4.1 Azure Storage backend
AzureFunctionsJobHost__Extensions__DurableTask__StorageProvider__PartitionCount: 16
AzureFunctionsJobHost__Extensions__DurableTask__MaxConcurrentActivityFunctions: 100
AzureFunctionsJobHost__Extensions__DurableTask__MaxConcurrentOrchestratorFunctions: 70
Hi, we've got this issue in our banking app handling billions of pounds of traffic per day.
We have two control queues, 08 and 13, that have been stuck for hours (messages are being added, but none are being dequeued).
Both control queues in this state have the same current owner.
In this state, I believed the queue would stay backed up forever. However, if I stop adding files to the input (each file kicks off a new orchestrator), the backlog does drain: roughly every 2 minutes, about 1000 messages seem to get processed, or at least completed.
Any thoughts on how to troubleshoot this problem? What should we be looking for? Or is this just a plain bug in the Azure Storage backend?
Before you ask: no, we cannot migrate to DTS, because our data is sometimes much larger than 1MB.
I've been monitoring our app with PowerShell, and recently added the queue length of the durable task queues. This has paid off.
2025-10-24T12:57:32.8085760+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 1
2025-10-24T13:00:02.8953936+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 1
2025-10-24T13:02:31.5272807+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 0
2025-10-24T13:11:19.9940839+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 0
2025-10-24T13:19:19.4048982+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 15
2025-10-24T13:22:10.1375039+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 118
2025-10-24T13:24:40.6653776+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 416
2025-10-24T13:27:23.2663002+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 562
2025-10-24T13:29:53.3611451+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 744
2025-10-24T13:32:23.0374825+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 934
2025-10-24T13:34:55.1558335+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 1654
2025-10-24T13:37:25.0837628+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 2506
2025-10-24T13:39:55.9073730+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 3409
Note how after about 13:19:19, the queue length kept increasing, and didn't decrease until we stopped feeding it data. See here:
2025-10-24T14:31:28.9081651+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 23973
2025-10-24T14:34:02.3983302+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 24841
2025-10-24T14:36:36.2052815+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 25344
2025-10-24T14:39:08.0376097+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 25643
2025-10-24T14:41:44.0906229+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 25802
2025-10-24T14:44:15.8541931+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 25509
2025-10-24T14:46:48.5499487+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 24816
2025-10-24T14:49:25.5136979+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 25356
2025-10-24T14:51:57.6595944+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 25426
2025-10-24T14:54:30.1004432+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 25360
2025-10-24T14:57:02.1779496+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 24506
2025-10-24T14:59:35.0506516+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 23871
2025-10-24T15:02:07.5116697+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 22853
2025-10-24T15:04:38.6246701+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 21648
2025-10-24T15:07:10.7206149+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 21394
2025-10-24T15:09:42.3077077+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 20404
2025-10-24T15:12:17.3671656+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 19910
2025-10-24T15:14:48.9950207+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 19220
2025-10-24T15:17:21.3608939+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 18891
2025-10-24T15:19:53.2938963+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 17823
2025-10-24T15:22:25.9971303+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 16466
2025-10-24T15:25:00.5869464+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 15846
2025-10-24T15:27:37.3661966+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 14678
2025-10-24T15:30:12.3097051+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 13925
2025-10-24T15:32:48.3155140+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 14009
2025-10-24T15:35:22.3370456+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 12689
2025-10-24T15:37:56.5844638+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 12314
2025-10-24T15:40:33.4368450+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 12195
2025-10-24T15:43:05.5088857+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 11419
2025-10-24T15:45:39.3806519+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 10930
2025-10-24T15:48:12.6941979+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 9847
2025-10-24T15:50:44.6108600+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 10276
2025-10-24T15:53:18.7534214+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 8585
2025-10-24T15:55:49.9630915+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 123
2025-10-24T15:58:20.3766872+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 3
2025-10-24T16:00:50.0202858+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 0
2025-10-24T16:03:19.0296815+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 6
2025-10-24T16:05:48.7418232+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 0
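For anyone who wants to put numbers on the growth and drain phases in the log above, here is a minimal sketch that parses lines in this exact format and computes the net change in messages per minute between consecutive samples. It is not part of our monitoring script; the function names `parse_line` and `rates_per_minute` are hypothetical, and the regex assumes the line layout shown above.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2025-10-24T13:37:25.0837628+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 2506
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(?P<frac>\d+)"
    r"(?P<tz>[+-]\d{2}:\d{2})"
    r" \*\* Queue '(?P<queue>[^']+)' approximate message count: (?P<count>\d+)$"
)


def parse_line(line):
    """Return (timestamp, queue_name, count), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # The log has 7 fractional digits; trim to 6 so that
    # datetime.fromisoformat accepts it on older Python versions too.
    frac = m.group("frac")[:6].ljust(6, "0")
    ts = datetime.fromisoformat(f"{m.group('date')}.{frac}{m.group('tz')}")
    return ts, m.group("queue"), int(m.group("count"))


def rates_per_minute(samples):
    """Given [(timestamp, count), ...] in time order, return a list of
    (timestamp, net messages per minute) for each consecutive pair.
    Positive means the queue is growing, negative means it is draining."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        minutes = (t1 - t0).total_seconds() / 60.0
        if minutes > 0:
            rates.append((t1, (c1 - c0) / minutes))
    return rates


if __name__ == "__main__":
    log = """\
2025-10-24T13:37:25.0837628+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 2506
2025-10-24T13:39:55.9073730+01:00 ** Queue 'dftaskhub20251017-control-08' approximate message count: 3409
"""
    parsed = [p for p in map(parse_line, log.splitlines()) if p is not None]
    samples = [(ts, count) for ts, _queue, count in parsed]
    for ts, rate in rates_per_minute(samples):
        print(f"{ts.isoformat()}  {rate:+.1f} messages/minute")
```

Note that the count is the queue's approximate message count, so the computed rate is only a net figure (enqueues minus dequeues) and cannot separate the two.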
Looking into the partition table in table storage:
So both control queues are trying to use the same host (we have a maximum scale-out of 10 and 16 partitions).
Could this be a concern? I'm happy to drop to 10 partitions so we'll have a 1-1 relationship at max scale-out if that's [currently] the right answer.
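To illustrate the arithmetic behind that question, here is a small sketch of how partitions would be spread if they were divided as evenly as possible across hosts. This is purely an assumption for illustration: `even_spread` is a hypothetical helper, and I don't know that the extension's lease balancing actually produces an even split at any given moment.

```python
from typing import List


def even_spread(partitions: int, workers: int) -> List[int]:
    """Partitions per worker under an idealised, perfectly even split.
    (Assumption: real lease balancing may differ from this.)"""
    if workers <= 0:
        raise ValueError("workers must be positive")
    base, extra = divmod(partitions, workers)
    # The first `extra` workers each take one partition more than the rest.
    return [base + 1 if i < extra else base for i in range(workers)]


if __name__ == "__main__":
    print(even_spread(16, 10))  # our current settings at max scale-out
    print(even_spread(10, 10))  # the proposed 1-1 setup
    print(even_spread(16, 3))   # our settings at minimum scale-out
```

With 16 partitions and 10 hosts, some hosts must own more than one control queue even in the best case, so two queues sharing an owner is expected by itself. Whether that sharing explains the stall is exactly what I'm asking.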