Copying files between Azure Storage accounts, the effective way
If you are a dev or architect working on Azure and you want to copy files between Storage Accounts, or even between different third-party Software as a Service (SaaS) offerings, and you want to do it in a repeatable, sustainable way, I’ll show you an approach that lives and breathes the low-code paradigm. It leaves you with no code to maintain and no containers to deal with (even though containers can be straightforward on Azure as well).
The solution centers on Azure Data Factory, to give that away right up front. In Azure Data Factory you build so-called pipelines in a graphical interface; they are essentially programmatic process flows. It takes a bit of getting used to if your usual way of integrating software is by writing code, but Data Factory comes with a range of connectors and lets you parametrize those pipelines so that you can re-use them in any given context: straight from your own code, from Logic Apps, or from whatever else. And Data Factory (DF) is built for ETL, meaning it can deal with large files and demanding integration scenarios, including conditions, exceptions and what not.
So what does it take in DF to copy a file from a Blob container in one Storage Account into a Blob container in another Storage Account?
Not much. Let’s start with the datasets you need to build a pipeline. We’ll create one with AzureBlobStorage as the “linked service” type and point it at Storage Account #1.
And then create another, pointing at Storage Account #2.
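In case you are curious what those two datasets boil down to, here is a rough equivalent in code, using the Python management SDK (azure-mgmt-datafactory). This is just a sketch: the subscription, resource group, factory, container and file names as well as the connection strings are placeholders I made up, not anything the walk-through requires.

```python
# Sketch: creating a Blob Storage linked service plus a dataset via the
# azure-mgmt-datafactory SDK. All names and the connection string are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobDataset, AzureBlobStorageLinkedService,
    DatasetResource, LinkedServiceReference, LinkedServiceResource,
)

SUBSCRIPTION_ID = "<subscription-id>"
RG = "my-resource-group"
DF = "my-data-factory"

adf = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Linked service pointing at Storage Account #1
adf.linked_services.create_or_update(
    RG, DF, "StorageAccount1",
    LinkedServiceResource(properties=AzureBlobStorageLinkedService(
        connection_string="<connection-string-of-storage-account-1>")))

# Dataset on top of that linked service: a container plus a file path
adf.datasets.create_or_update(
    RG, DF, "SourceBlob",
    DatasetResource(properties=AzureBlobDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="StorageAccount1"),
        folder_path="source-container",
        file_name="file-to-copy.txt")))

# Repeat both calls with the connection string of Storage Account #2 to get a
# "StorageAccount2" linked service and a "SinkBlob" dataset.
```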
Now you create a “Copy data” task (which will be the only task in this rather mundane pipeline).
The parameter section here is interesting because it gives you the option to have variables that are valid throughout the pipeline. Think of them as global variables in any given programming context. These parameters will allow you to re-use this pipeline as you wish in the future.
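To make the idea concrete, this is how such parameters could look in the same Python sketch. The parameter names are invented for illustration; once declared, anything inside the pipeline can reference them with a Data Factory expression.

```python
# Sketch: declaring pipeline-level parameters. The names are made up;
# they become available to every activity in the pipeline.
from azure.mgmt.datafactory.models import ParameterSpecification

pipeline_parameters = {
    "SourceFileName": ParameterSpecification(type="String"),
    "SinkFolder": ParameterSpecification(type="String", default_value="incoming"),
}

# Inside the pipeline they are referenced with expressions such as
#   @pipeline().parameters.SourceFileName
```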
For the Copy data task you need to define a “source” and a “sink”. Source is the origin of your file to be copied and sink is the destination.
Both source and sink make use of the datasets defined earlier, and you can open these datasets via short-links straight from the source and sink tabs.
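Put together in the same sketch, the “Copy data” task and the pipeline around it could look roughly like this, re-using the placeholder names from above. Again, this is an illustration of the moving parts, not a prescription.

```python
# Sketch: the "Copy data" task plus the pipeline that hosts it.
# Dataset, activity and pipeline names are the placeholders used above.
from azure.mgmt.datafactory.models import (
    BlobSink, BlobSource, CopyActivity, DatasetReference, PipelineResource,
)

copy_activity = CopyActivity(
    name="CopyBlobBetweenAccounts",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceBlob")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SinkBlob")],
    source=BlobSource(),  # read from the source dataset
    sink=BlobSink(),      # write to the sink dataset
)

adf.pipelines.create_or_update(
    RG, DF, "CopyFilePipeline",
    PipelineResource(
        activities=[copy_activity],
        parameters=pipeline_parameters,  # the parameters declared earlier
    ))
```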
And that is about it. All you need now is something to create a pipeline instance, called a “pipeline run” in Data Factory lingo.
Since I promised a low-code solution, I will show you how to create a pipeline run using Logic Apps. (Via the SDKs for any programming language of your choice, or directly against the Azure APIs, you could certainly do it all in code as well, no problem.)
Create a Logic Apps flow triggered by anything you like; I personally often use recurring flows or ones triggered by events. In this flow, create a step with the “Create pipeline run” action. Enter the details of your Data Factory instance and select the right pipeline. To set the parameters of your pipeline (remember the global variables defined earlier), all you need to do is pass in a piece of JSON with all the parameters you want to override. Of course, Logic Apps gives you the option to use not only static values here but also dynamic ones derived from whatever else you do in your Logic Apps flow. And that’s it! The pipeline is kicked off when your flow runs, and in the background all the copy action takes place without you having to worry about how.
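And since I mentioned the SDK route above: the code equivalent of that “Create pipeline run” step is a single call, with the very same JSON-style parameter overrides. The values below are, once more, made-up examples for the placeholder pipeline from the sketches above.

```python
# Sketch: kicking off a pipeline run in code, the equivalent of the
# Logic Apps step above. Parameter values are made-up examples.
run = adf.pipelines.create_run(
    RG, DF, "CopyFilePipeline",
    parameters={
        "SourceFileName": "file-to-copy.txt",
        "SinkFolder": "archive",
    },
)
print(run.run_id)  # useful for finding the run in Data Factory's monitoring view
```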
If you are interested in learning more about using Logic Apps and Data Factory together, check out my article on using both to have fun with video encoding in Azure Media Services. Or, a little bit simpler and therefore faster to grasp — how to use Data Factory to overcome the file size limitations of Logic Apps.
NB: If you want to copy files only rarely, or even just once, your best bet is the grand Azure CLI. In that case I would not bother setting up a whole parametrized integration flow as shown in this little walk-through.
Any other ideas, comments? Let me know. If this article or any other is interesting for your publication, drop me a message!