Get Started with Azure Data Factory

Part 2: Create your first ADF pipeline

Author: Eric Rouach, DBA at Madeira Data Solutions

Date: 2022-02

Description: The following will guide you into building your first Azure Data Factory pipeline.

Pre-requisite: completing Part 1 of this guide

From the Azure portal, select (click) your ADF and open you Azure Data Factory Studio:

From the left panel, select the pencil icon:

Click on the ellipsis button and then, click New pipeline:

In the Properties pane, give a meaningful name to your pipeline:

Then close the Properties pane:

The pipeline should take the following actions:

-1) Truncate the staging table in the StagingDB database (Stored Procedure activity)

-2) Get the .csv file from the blob storage (Get Metadata activity)

-3) Copy the data from the .csv file to the staging table in StagingDB database (Copy activity)

-4) Get the distinct Id's from the staging table (Lookup activity)

-5) Create a Pipeline Variable called "InstituteId"

-6) Put the Id's from the previous activity into an "InstituteId" variable (Set Variable activity)

-7) For Each Id, do the following (ForEach activity):

*Get the database name (Lookup activity executing the dbo.GetDatabaseName stored procedure.

*Merge the "Institute" table using the staging external table

(Stored Procedure activity executing the dbo.MergeInstituteTable)

-8) Execute the pipeline (Debug)

-10) Create an execution Trigger.

Let’s begin!

The pipeline is now ready to get its first activity

First, we will add a Stored Procedure activity to truncate the staging table in StagingDB database:

(If hidden, expand the activities pane)

Choose “General”, then “Stored Procedure”. Drag it to the canvas and give it a meaningful name:

In the settings tab, we will have to provide a Linked Service (which will point to the relevant Azure SQL database), and an existing stored procedure name:

Fill the relevant fields as following:

Test the connection. When successful, click “Create”.

Your new Linked Service is now appearing in the activity’s settings tab:

Now, browse for the stored procedure name dbo.TruncateStagingTable:

Your first pipeline activity is now created!

The next step is to tell ADF to get some data from the .csv file you have created before. So first, upload it to the relevant blob storage container:

You can now call a “Get Metadata” activity to have the pipeline getting data from the file:

Drag the new activity from the pane, link it to the previous one using the green arrow, and rename it:

In the settings tab, you must create a new dataset (which will point to the relevant blob storage container):

Click “New” and select the Azure Blob Storage Icon:

Click “Continue”.

Select the .csv format:

Click “Continue”.

Next, you must choose a meaningful name for the dataset and create a new linked service pointing to the storage account:

Fill the relevant fields as following, test the connection and create the linked service:

Now, set the dataset file path:

Click OK. The "Get Metadata" activity now has a defined dataset:

In the Field list, select Child items:

Now, using a Copy Data Activity, you need to copy the data from the .csv file into the staging table in StagingDB.

From the activities pane, drag the activity to the canvas, and rename it accordingly:

The Source and the Sink need to be configured:

First, click the Source tab and select the previously created source dataset:

Set the File path type as Wildcard file path:

Then, in the sink tab, create a new sink dataset for the staging table in StagingDB:

Pick the Azure SQL Database icon:

Give the dataset a meaningful name, then select the previously created linked service, select the table, and click Ok:

Now, from the mapping tab, click Import Schemas:

You may now see the mapping between the source file and the target table:

After this last activity runs, the staging table should be loaded with the file data. To merge the right data to the right Institute database, we need to have the pipeline perform an activity for each institute Id found in the staging table. Therefore, our next step will be to pick those Id’s, and put them into a variable:

First, let’s add a Lookup activity to the pipeline:

From the settings tab, select the previously created source dataset and select the Query option: