In this post I will discuss the solution to a problem that plagued me on a client job for about 7 days! The problem is a very specific configuration of an Azure Data Factory Self Hosted Integration Runtime, which I will explain in the next section.
This post is aimed at people who are familiar with Azure Data Factory, and ideally have a basic understanding of what a Self Hosted Integration Runtime is. I will be using the following abbreviations going forwards because otherwise it becomes a mouthful:
ADF: Azure Data Factory
SHIR: Self Hosted Integration Runtime
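The scenario has 3 important features:
The SHIR is installed on an on-prem server.
The on-prem server sits behind a proxy.
The ADF resource is secured with private endpoints.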
Next, we'll go through each of these features and explain how to configure the SHIR application to work with them individually. Following that, we'll discuss why combining the three together poses difficulties, and how to configure the SHIR application to work with the combination of features.
If you aren't interested in reading about the specific features and just want the solution, skip to the configuration steps near the end of this post.
When you install a SHIR on an on-prem machine, Azure requires a way of enabling your on-prem network to talk to the Azure network where your ADF resource sits. This is where Azure Relay comes into play.
At a high level, Azure Relay enables the secure exposure of services running in your corporate network to the public cloud in a less intrusive way than traditional network-level integration technologies, such as VPN. It has two features which facilitate this communication; the one that the ADF SHIR uses is called Hybrid Connections.
The Hybrid Connections relay connects two services by providing a "rendezvous point" in the Azure cloud that each service can discover and connect to from their own network's perspective. This "rendezvous point" is the Hybrid Connection. It allows for relaying Web Socket connections and HTTP(S) requests and responses.
What implications does this have for our SHIR? Well, it means that we need to allow our SHIR server to talk to this rendezvous point. The domain of the rendezvous point is *.servicebus.windows.net, because Azure Relay is actually one of the key capabilities of the Azure Service Bus platform. When using an on-prem SHIR, ADF sends any activity jobs to the Azure Relay service, which then queues the request. The SHIR then polls the queue and begins the jobs.
Therefore, the SHIR server requires outbound communication to *.servicebus.windows.net on port 443.
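As a rough way to check this requirement from the SHIR server, here is a minimal Python sketch that tries to open a TCP connection on port 443. The function name can_connect and the host yournamespace.servicebus.windows.net are placeholders I have made up for illustration; substitute one of the exact URLs listed for your own integration runtime. Note that this only tests a direct connection, so it will not tell you anything about traffic that has to go via a proxy.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a direct TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


if __name__ == "__main__":
    # Hypothetical host name, not a real namespace: replace it with your own.
    relay_host = "yournamespace.servicebus.windows.net"
    print(relay_host, "reachable on 443:", can_connect(relay_host))
```

If this prints False on a server that is supposed to have direct outbound access, the firewall rules are the first place I would look.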
If, for security reasons, you are not allowed to use wildcard URLs in your whitelist, the exact list of URLs required for connectivity can be found in your ADF resource. Navigate to your integration runtime and select the "Nodes" pane.
If your SHIR server (on-prem or not) is configured to use a proxy server for communication to the internet, the SHIR application needs to be configured to be able to use the proxy as well.
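There are two options for configuring the proxy settings:
Use custom proxy: specify the proxy address directly in the SHIR application.
Use system proxy: the SHIR application picks up whatever proxy settings are specified in its configuration files.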
When securing your ADF resource with private endpoints, your SHIR can no longer communicate with your ADF via the public endpoint. Depending on where your SHIR sits, there could be a lot of network routing required in order to facilitate this private communication.
For example, you may need to add some NSG rules if your SHIR server sits in Azure but in a different virtual network to your private endpoint. If your SHIR is on-prem, you will need to add some firewall rules to allow communication to pass through the VPN, ExpressRoute, or whatever solution you have in place for connecting your on-prem network to Azure.
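One sanity check that can help here, sketched in Python under some assumptions: once name resolution for the private endpoint is in place, I would expect the ADF host name to resolve to a private IP address when looked up from the SHIR server. The function names below are my own, and the host name is the same placeholder pattern used in the config later in this post, so swap in your own FQDN.

```python
import ipaddress
import socket


def is_private_ip(address: str) -> bool:
    """Return True if the given IP address is in a private range (e.g. 10.0.0.0/8)."""
    return ipaddress.ip_address(address).is_private


def resolves_privately(hostname: str) -> bool:
    """Resolve hostname to an IPv4 address and report whether that address is private."""
    return is_private_ip(socket.gethostbyname(hostname))


if __name__ == "__main__":
    # Placeholder FQDN: replace with the FQDN of your own private endpoint.
    adf_host = "adfresourcename.location.datafactory.azure.net"
    print(adf_host, "resolves to a private address:", resolves_privately(adf_host))
```

A public address coming back suggests the lookup is not using the private DNS configuration, which is worth ruling out before touching any proxy settings.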
I won't go into detail on these various scenarios. However, we will go on to discuss why using ADF with private endpoints, on top of the first two features (an on-prem server behind a proxy), poses difficulties.
The scenario we have is shown in the diagram below.
We have so far established that for an on-prem SHIR to work, it needs outbound communication to *.servicebus.windows.net on port 443. If the on-prem SHIR server sits behind a proxy, then the traffic intended for Azure Relay will need to be directed to the proxy, since it's going outwards to the internet. This is depicted by the yellow line on the diagram.
As well as this, since we are using a private endpoint for the SHIR, traffic intended for our ADF resource must stay local and travel directly to our private endpoint. This is depicted by the green line on the diagram. Specifically, this means that traffic intended for our private endpoint should not be directed to the proxy, and herein lies the difficulty.
As mentioned previously, we could select the "use custom proxy" option on the SHIR app and specify our proxy address directly. The problem with this is that it will send all traffic to the proxy, including traffic intended for our private endpoint (the green line). That traffic will then try to reach the ADF resource via the public endpoint, which we have blocked (if it isn't blocked by the proxy first).
The other proxy option on the SHIR app is "use system proxy", which picks up whatever settings are specified in the configuration files. This is how we obtain our desired result.
The following steps will give us the setup shown in the above diagram.
In each of the following two configuration files:
<version number>\Shared\diahost.exe.config
<version number>\Shared\diawp.exe.config
Navigate to the <system.net> tag and replace the whole tag with the following code:
<system.net>
  <defaultProxy>
    <bypasslist>
      <add address="adfresourcename\.location\.datafactory\.azure\.net" />
    </bypasslist>
    <proxy
      usesystemdefault="True"
      proxyaddress="http://yourproxyaddress:80"
      bypassonlocal="True"
    />
  </defaultProxy>
</system.net>
This code ensures that traffic intended for your ADF resource bypasses the proxy, while all other traffic is still sent through it.
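Since a typo in these files is easy to miss, here is a small Python sketch for reading the proxy settings back out of a config and eyeballing them. It is only an illustration: read_proxy_settings and SAMPLE_CONFIG are names I have made up, and the sample wraps the snippet above in a <configuration> root element purely so that it parses on its own. For a real check you would pass in the text of each of the two config files.

```python
import xml.etree.ElementTree as ET

# Sample input only: the <system.net> block from above inside an assumed root element.
SAMPLE_CONFIG = r"""<configuration>
  <system.net>
    <defaultProxy>
      <bypasslist>
        <add address="adfresourcename\.location\.datafactory\.azure\.net" />
      </bypasslist>
      <proxy usesystemdefault="True" proxyaddress="http://yourproxyaddress:80" bypassonlocal="True" />
    </defaultProxy>
  </system.net>
</configuration>"""


def read_proxy_settings(config_xml: str) -> dict:
    """Return the proxy address and bypass patterns found in a config document."""
    root = ET.fromstring(config_xml)  # raises ParseError if the XML is malformed
    default_proxy = next(root.iter("defaultProxy"), None)
    if default_proxy is None:
        return {"proxyaddress": None, "bypass": []}
    proxy = default_proxy.find("proxy")
    bypass = [add.get("address") for add in default_proxy.findall("./bypasslist/add")]
    address = proxy.get("proxyaddress") if proxy is not None else None
    return {"proxyaddress": address, "bypass": bypass}


if __name__ == "__main__":
    print(read_proxy_settings(SAMPLE_CONFIG))
```

Running the same function over both files is a cheap way to confirm they ended up with identical proxy and bypass settings.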
Note (1): The fully qualified domain name (FQDN) of your private endpoint can be found on the DNS Configuration pane of your private endpoint resource in Azure. It is important to list the domain name and not the IP address, because the domain otherwise resolves to the public IP address and is not bypassed.
Note (2): Addresses added to the proxy bypass list must be written as regular expressions, which is why the dots in the example above are escaped with backslashes.
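To show why the escaping matters, here is a minimal Python sketch of how a bypass pattern gets matched against a host name. This is an approximation rather than the actual SHIR behaviour: I am assuming a case-insensitive "pattern found anywhere in the address" match, and is_bypassed is a name I have invented. The host names are placeholders.

```python
import re
from typing import Iterable


def is_bypassed(host: str, bypass_patterns: Iterable[str]) -> bool:
    """Return True if any bypass pattern is found in the host name (assumed semantics)."""
    return any(re.search(pattern, host, re.IGNORECASE) for pattern in bypass_patterns)


# re.escape turns a literal host name into a pattern with the dots escaped.
ADF_HOST = "adfresourcename.location.datafactory.azure.net"
ESCAPED = re.escape(ADF_HOST)

if __name__ == "__main__":
    print(ESCAPED)
    # The ADF host matches its own escaped pattern, so it would skip the proxy.
    print(is_bypassed(ADF_HOST, [ESCAPED]))
    # Relay traffic does not match, so it would still go to the proxy.
    print(is_bypassed("yournamespace.servicebus.windows.net", [ESCAPED]))
    # An unescaped dot matches any character, so lookalike hosts slip through.
    print(is_bypassed("adfresourcenameXlocation.datafactory.azure.net", [ADF_HOST]))
```

The last call is the one to watch: with unescaped dots the pattern is looser than intended, which is not what you want in a bypass list.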
The Microsoft documentation has quite a detailed section on configuring proxy server settings (but note that there is nothing regarding bypass lists there). Search for "Configure proxy server settings".
See here for detailed documentation on the <defaultProxy> element. It is also easy to navigate to the <proxy> and <bypasslist> element documentation from here.