Sunday, 8 September 2013

Simpler solution than TPL Dataflow for parallel async blob deletion

I'm implementing a worker role on Azure which needs to delete blobs from
Azure storage. Let's assume my list of blobs has about 10K items.
The simplest synchronous approach would probably be:
Parallel.ForEach(list, x => ((CloudBlob) x).Delete());
Requirements:
1. I want to implement the same thing asynchronously (on a single thread).
2. I want to limit the number of concurrent connections to 50, so that at
   most 50 asynchronous deletions are in flight at any time while I work
   through the 10K blobs. As soon as one deletion completes, a new one
   can be started.
Solution?
So far, after reading this question and this one, it seems that TPL
Dataflow is the way to go.
This is such a simple problem that Dataflow seems like overkill. Is
there any simpler alternative?
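One possible simpler alternative, sketched here under assumptions, is to throttle plain tasks with a SemaphoreSlim. The names Throttled, ForEachAsync and RunAsync are hypothetical helpers introduced for illustration, and the per-item delete is passed in as a delegate because the exact asynchronous delete call depends on the storage SDK version in use.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class Throttled
{
    // Runs `action` for every item with at most `maxConcurrency`
    // operations in flight at any moment.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> action)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = new List<Task>();
            foreach (var item in items)
            {
                // Waits (without blocking a thread) until a slot is free.
                await gate.WaitAsync();
                tasks.Add(RunAsync(item, action, gate));
            }
            // All operations have finished (or failed) once this returns,
            // so disposing the semaphore afterwards is safe.
            await Task.WhenAll(tasks);
        }
    }

    private static async Task RunAsync<T>(
        T item, Func<T, Task> action, SemaphoreSlim gate)
    {
        try { await action(item); }
        finally { gate.Release(); } // free the slot even if the action throws
    }
}
```

Usage would look roughly like `await Throttled.ForEachAsync(list, 50, blob => DeleteBlobAsync(blob));`, where DeleteBlobAsync is an assumed wrapper around whatever asynchronous delete the SDK offers (for example a Task built over a Begin/End pair with Task.Factory.FromAsync). Failures from individual deletions surface when the returned task is awaited.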
If not, how would this be implemented using Dataflow? As I understand it,
I need a single action block which performs the async delete (do I need
await?). When creating my block I should set MaxDegreeOfParallelism to 50.
Then I need to post my 10K blobs from the list to the block and then
wait for it to finish with block.Completion.Wait(). Is this correct?
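The Dataflow approach described above could look roughly like the sketch below. It assumes the System.Threading.Tasks.Dataflow NuGet package is referenced; DataflowDelete, RunAsync and the deleteAsync delegate are hypothetical names standing in for the real blob delete call. Two details differ from the description: the block has to be told that no more items are coming (Complete()) before its Completion task can finish, and awaiting Completion avoids blocking a thread the way Wait() does.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // from the TPL Dataflow NuGet package

// Hypothetical helper, not part of any library.
public static class DataflowDelete
{
    public static async Task RunAsync<T>(
        IEnumerable<T> items, Func<T, Task> deleteAsync, int maxParallel = 50)
    {
        // The delegate returns a Task, so the block treats an item as
        // finished only when its asynchronous delete has completed.
        var block = new ActionBlock<T>(
            deleteAsync,
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = maxParallel
            });

        // With the default (unbounded) capacity, Post accepts every item.
        foreach (var item in items)
            block.Post(item);

        // Signal that no more items will arrive, then wait asynchronously.
        block.Complete();
        await block.Completion;
    }
}
```

On the "do I need await?" point: the delegate only has to return the Task of the delete operation, either via an async lambda that awaits it or by returning the task directly. Note that if the delegate throws, the block faults and the remaining queued items are not processed, so per-item error handling would belong inside the delegate.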
