Azure Data Factory self-hosted integration runtime: what is it and when should you use it?

What are integration runtimes in Azure Data Factory?

In Azure Data Factory (ADF), an integration runtime is the compute infrastructure that your pipeline activities run on. When you run an application on your computer, it uses the computer's resources, such as CPU and memory, to do its work. Activities in an ADF pipeline also need resources to do their job, like copying data or writing a file, and the integration runtime provides them.

When you create an instance of ADF, you get a default Azure integration runtime, named AutoResolveIntegrationRuntime, whose region ADF resolves automatically. If you need to, you can add your own integration runtimes, either on Azure or by downloading and installing a self-hosted integration runtime (SHIR) on your own server.

When would I want to use self-hosted integration runtime?

Use case 1 – Data sources behind a firewall

Azure Data Factory (ADF) is a cloud-based data integration service, which is great when you want to connect to other cloud-based services. You simply create a linked service, provide a way to authenticate to that service, and the data starts flowing. However, things get more complicated when some of your data sources or destinations are on on-premises servers. To access those data sources, you need a way through the company firewall, and in most cases, scary information security officers will refuse to open one (and rightfully so).

The Azure Data Factory self-hosted integration runtime (SHIR) solves this problem.

SHIR is installed on one of your on-prem servers and acts as a bridge between the ADF cloud service and the on-prem data sources. It only opens outbound connections to the cloud, so you don't need to open inbound ports in the firewall, which makes it much safer to use behind one.
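For illustration, here is a minimal Python sketch that builds the JSON definitions involved, assuming ADF's JSON resource format: a self-hosted IR resource, and a SQL Server linked service whose `connectVia` property routes it through that IR instead of the default Azure IR. The names `OnPremSHIR` and `OnPremSqlServer` and the connection string are hypothetical placeholders; in practice you would create these resources in the ADF UI or deploy the JSON with your usual tooling.

```python
import json


def self_hosted_ir(name: str) -> dict:
    """Build a self-hosted integration runtime resource definition."""
    # The "SelfHosted" type marks this IR as running on your own machine.
    return {
        "name": name,
        "properties": {"type": "SelfHosted", "description": "Runs on an on-prem server"},
    }


def sql_server_linked_service(name: str, ir_name: str, connection_string: str) -> dict:
    """Build a SQL Server linked service that connects through the named IR."""
    return {
        "name": name,
        "properties": {
            "type": "SqlServer",
            "typeProperties": {"connectionString": connection_string},
            # connectVia routes traffic through the SHIR instead of the default Azure IR.
            "connectVia": {"referenceName": ir_name, "type": "IntegrationRuntimeReference"},
        },
    }


if __name__ == "__main__":
    ir = self_hosted_ir("OnPremSHIR")  # hypothetical IR name
    linked_service = sql_server_linked_service(
        "OnPremSqlServer",  # hypothetical linked service name
        ir["name"],
        "Data Source=<server>;Initial Catalog=<database>;<authentication settings>",
    )
    print(json.dumps(ir, indent=2))
    print(json.dumps(linked_service, indent=2))
```

Keeping the IR as a separate resource means many linked services can share one SHIR, and switching a linked service between runtimes only touches its `connectVia` reference.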

Use case 2 – You need a special driver

ADF includes built-in connectors to many data sources and services, but sometimes you need to access a data source that requires installing specific software, such as a driver. That's something you can't do on the Azure integration runtime.

With SHIR, you can install the driver on the same machine as SHIR and then use a generic connector, such as ODBC, to query that data source from ADF.
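A minimal sketch of what such an ODBC linked service definition could look like, again built as a Python dictionary and assuming ADF's JSON format for the ODBC connector. The driver name `Vendor X ODBC Driver`, the IR name and the credentials are hypothetical placeholders; the driver named in the connection string is the one you installed on the SHIR machine.

```python
import json


def odbc_linked_service(
    name: str, ir_name: str, connection_string: str, user_name: str, password: str
) -> dict:
    """Build an ODBC linked service definition that runs through a self-hosted IR."""
    return {
        "name": name,
        "properties": {
            "type": "Odbc",
            "typeProperties": {
                # The driver named in this string must be installed on the SHIR machine.
                "connectionString": connection_string,
                "authenticationType": "Basic",
                "userName": user_name,
                # In practice, prefer a Key Vault secret reference over an inline value.
                "password": {"type": "SecureString", "value": password},
            },
            # The ODBC connector needs a self-hosted IR, referenced here by name.
            "connectVia": {"referenceName": ir_name, "type": "IntegrationRuntimeReference"},
        },
    }


if __name__ == "__main__":
    definition = odbc_linked_service(
        name="SpecialSourceOdbc",  # hypothetical
        ir_name="OnPremSHIR",  # hypothetical
        connection_string="Driver={Vendor X ODBC Driver};Server=<host>;Database=<database>;",
        user_name="<user>",
        password="<password>",
    )
    print(json.dumps(definition, indent=2))
```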

Use case 3 – Static IP

Some data sources only accept connections from static IP addresses that were added to the service's allow list. The Azure IR uses many different IP addresses, and you cannot know in advance which one will be used.

If you install SHIR on a virtual machine with a static IP address, ADF will always access the data source from that IP address.

Note that there are other solutions to this challenge, such as placing ADF in a virtual network (VNet).

Use case 4 – All your data is on-prem

Maybe none of your data sources and destinations are in the cloud, and they all sit on an on-prem network, but you still want to work with ADF. Why not? It's a great tool! You can use ADF to run pipelines where all data movement happens on-prem. Besides using a modern, feature-rich data integration tool, you also get good performance, since all your connections are between on-prem servers, which are usually connected by high-bandwidth links.

Use case 5 – Save money

Since you bring your own compute resources with SHIR, running activities and copying data is cheaper than on the Azure IR. On the other hand, you need to provide and maintain a server, so calculate the costs of each scenario before you decide.
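A minimal sketch of that calculation, assuming a simplified model in which Azure IR data movement is billed per DIU-hour (Data Integration Unit hour) and SHIR data movement per hour, plus the fixed monthly cost of your own server. All prices below are hypothetical placeholders, not real Azure rates; check the current ADF pricing page for actual numbers. The model also ignores orchestration (activity run) charges and other fees.

```python
from typing import Optional


def azure_ir_monthly_cost(copy_hours: float, dius: int, price_per_diu_hour: float) -> float:
    """Azure IR data movement, modeled as DIU-hours times a per-DIU-hour price."""
    return copy_hours * dius * price_per_diu_hour


def shir_monthly_cost(copy_hours: float, price_per_hour: float, server_monthly_cost: float) -> float:
    """SHIR data movement: an hourly price plus the fixed cost of running your own server."""
    return copy_hours * price_per_hour + server_monthly_cost


def break_even_hours(
    dius: int, azure_price_per_diu_hour: float, shir_price_per_hour: float, server_monthly_cost: float
) -> Optional[float]:
    """Monthly copy hours above which SHIR becomes cheaper, or None if it never does."""
    saving_per_hour = dius * azure_price_per_diu_hour - shir_price_per_hour
    if saving_per_hour <= 0:
        return None  # SHIR is never cheaper per hour, so the server cost is never recovered
    return server_monthly_cost / saving_per_hour


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    hours, dius = 100.0, 4
    azure_rate, shir_rate, server = 0.25, 0.125, 150.0
    print("Azure IR:", azure_ir_monthly_cost(hours, dius, azure_rate))
    print("SHIR:", shir_monthly_cost(hours, shir_rate, server))
    print("Break-even hours:", break_even_hours(dius, azure_rate, shir_rate, server))
```

With these made-up numbers, SHIR only pays off above roughly 171 copy hours a month. If you already run the server for other reasons, its cost drops out of the comparison and SHIR wins much sooner.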

What do I need, and how do I install SHIR?

If one of these use cases fits your situation, use the links below to learn more about the requirements and installation steps.

Self-hosted IR requirements:

Good luck with SHIR and ADF! And if you find another use case for SHIR, please tell me in the comments.
