Showing content from https://learn.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime below:

Create a self-hosted integration runtime - Azure Data Factory & Azure Synapse

APPLIES TO: Azure Data Factory Azure Synapse Analytics

The integration runtime (IR) is the compute infrastructure that Azure Data Factory and Synapse pipelines use to provide data-integration capabilities across different network environments.

A self-hosted integration runtime provides these capabilities between a cloud data store and a data store in a private network, for example an on-premises network or an Azure virtual network. This article describes how you can create and configure a self-hosted IR on a machine inside your private network, so you can connect your data source to your data integration resources.

Considerations for using a self-hosted IR

Note

Currently, a self-hosted integration runtime can be shared only among multiple data factories. It can't be shared across Synapse workspaces or between a data factory and a Synapse workspace.

Command flow and data flow

When you move data between on-premises and the cloud, the activity uses a self-hosted integration runtime to transfer the data between an on-premises data source and the cloud.

Here's a high-level summary of the data-flow steps for copying with a self-hosted IR:

  1. A data developer first creates a self-hosted integration runtime within an Azure data factory or Synapse workspace by using the Azure portal or the PowerShell cmdlet. Then the data developer creates a linked service for an on-premises data store, specifying the self-hosted integration runtime instance that the service should use to connect to data stores.

  2. The self-hosted integration runtime node encrypts the credentials by using Windows Data Protection Application Programming Interface (DPAPI) and saves the credentials locally. If multiple nodes are set for high availability, the credentials are further synchronized across other nodes. Each node encrypts the credentials by using DPAPI and stores them locally. Credential synchronization is transparent to the data developer and is handled by the self-hosted IR.

  3. Azure Data Factory and Synapse pipelines communicate with the self-hosted integration runtime to schedule and manage jobs. Communication is via a control channel that uses a shared Azure Relay connection. When an activity job needs to be run, the service queues the request along with any credential information. It does so in case credentials aren't already stored on the self-hosted integration runtime. The self-hosted integration runtime starts the job after it polls the queue.

  4. The self-hosted integration runtime copies data between an on-premises store and cloud storage. The direction of the copy depends on how the copy activity is configured in the data pipeline. For this step, the self-hosted integration runtime directly communicates with cloud-based storage services like Azure Blob storage over a secure HTTPS channel.

Prerequisites

Installation of the self-hosted integration runtime on a domain controller isn't supported.

Setting up a self-hosted integration runtime

To create and set up a self-hosted integration runtime, use the following procedures.

Create a self-hosted IR via Azure PowerShell
  1. You can use Azure PowerShell for this task. Here's an example:

    Set-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $selfHostedIntegrationRuntimeName -Type SelfHosted -Description "selfhosted IR description"
    
  2. Download and install the self-hosted integration runtime on a local machine.

  3. Retrieve the authentication key and register the self-hosted integration runtime with the key. Here's a PowerShell example:

    
    Get-AzDataFactoryV2IntegrationRuntimeKey -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $selfHostedIntegrationRuntimeName  
    
    
Create a self-hosted IR via UI

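Use the following steps to create a self-hosted IR using the Azure Data Factory or Azure Synapse UI. The first set of steps applies to the Azure Data Factory UI, and the second set applies to the Azure Synapse UI.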

  1. On the home page of the Azure Data Factory UI, select the Manage tab from the leftmost pane.

  2. Select Integration runtimes on the left pane, and then select +New.

  3. On the Integration runtime setup page, select Azure, Self-Hosted, and then select Continue.

  4. On the following page, select Self-Hosted to create a Self-Hosted IR, and then select Continue.

  1. On the home page of the Azure Synapse UI, select the Manage tab from the leftmost pane.

  2. Select Integration runtimes on the left pane, and then select +New.

  3. On the following page, select Self-Hosted to create a Self-Hosted IR, and then select Continue.

Configure a self-hosted IR via UI
  1. Enter a name for your IR, and select Create.

  2. On the Integration runtime setup page, select the link under Option 1 to open the express setup on your computer. Or follow the steps under Option 2 to set up manually. The following instructions are based on manual setup:

    1. Copy and paste the authentication key. Select Download and install integration runtime.

    2. Download the self-hosted integration runtime on a local Windows machine. Run the installer.

    3. On the Register Integration Runtime (Self-hosted) page, paste the key you saved earlier, and select Register.

    4. On the New Integration Runtime (Self-hosted) Node page, select Finish.

  3. After the self-hosted integration runtime is registered successfully, a confirmation window appears.

Set up a self-hosted IR on an Azure VM via an Azure Resource Manager template

You can automate self-hosted IR setup on an Azure virtual machine by using the Create self host IR template. The template provides an easy way to have a fully functional self-hosted IR inside an Azure virtual network. The IR has high-availability and scalability features, as long as you set the node count to 2 or higher.

Set up an existing self-hosted IR via local PowerShell

You can use a command line to set up or manage an existing self-hosted IR. This usage can especially help to automate the installation and registration of self-hosted IR nodes.

Dmgcmd.exe is included in the self-hosted installer. It's typically located in the C:\Program Files\Microsoft Integration Runtime\5.0\Shared\ folder. This application supports various parameters and can be invoked via a command line using batch scripts for automation.

Use the application as follows:

dmgcmd ACTION args...
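
As a rough illustration of driving this tool from an automation script, the following Python sketch builds a dmgcmd command line as an argument list and runs it with subprocess. The helper names (build_dmgcmd_command, run_dmgcmd) are hypothetical, the path is the typical install location mentioned above and might differ on your machine, and -RegisterNewNode is used only as an example action.

```python
import subprocess

# Typical install location (assumption); adjust it if your installation differs.
DMGCMD_PATH = r"C:\Program Files\Microsoft Integration Runtime\5.0\Shared\dmgcmd.exe"


def build_dmgcmd_command(action, *args, dmgcmd_path=DMGCMD_PATH):
    """Return the argument list for one dmgcmd invocation.

    Passing a list (not one long string) to subprocess avoids quoting
    problems with keys, passwords, or paths that contain spaces.
    """
    if not action.startswith("-"):
        raise ValueError("dmgcmd actions start with '-', for example -RegisterNewNode")
    return [dmgcmd_path, action, *[str(a) for a in args]]


def run_dmgcmd(action, *args):
    """Run dmgcmd; subprocess raises CalledProcessError on a nonzero exit code."""
    return subprocess.run(build_dmgcmd_command(action, *args), check=True)


# Example (placeholders, not real values):
# run_dmgcmd("-RegisterNewNode", "<AuthenticationKey>", "<NodeName>")
```

Run the script from an elevated session on the machine where the self-hosted IR is installed. The authentication key is a secret, so avoid writing it to logs.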

Here are details of the application's actions and arguments:

| ACTION | args | Description |
|---|---|---|
| -rn, -RegisterNewNode | "<AuthenticationKey>" ["<NodeName>"] | Register a self-hosted integration runtime node with the specified authentication key and node name. |
| -era, -EnableRemoteAccess | "<port>" ["<thumbprint>"] | Enable remote access on the current node to set up a high-availability cluster. Or enable setting credentials directly against the self-hosted IR without going through an Azure Data Factory or Azure Synapse workspace. You do the latter by using the New-AzDataFactoryV2LinkedServiceEncryptedCredential cmdlet from a remote machine in the same network. |
| -erac, -EnableRemoteAccessInContainer | "<port>" ["<thumbprint>"] | Enable remote access to the current node when the node runs in a container. |
| -dra, -DisableRemoteAccess | | Disable remote access to the current node. Remote access is needed for multinode setup. The New-AzDataFactoryV2LinkedServiceEncryptedCredential PowerShell cmdlet still works even when remote access is disabled. This behavior is true as long as the cmdlet is executed on the same machine as the self-hosted IR node. |
| -k, -Key | "<AuthenticationKey>" | Overwrite or update the previous authentication key. Be careful with this action. Your previous self-hosted IR node can go offline if the key is of a new integration runtime. |
| -gbf, -GenerateBackupFile | "<filePath>" "<password>" | Generate a backup file for the current node. The backup file includes the node key and data-store credentials. |
| -ibf, -ImportBackupFile | "<filePath>" "<password>" | Restore the node from a backup file. |
| -r, -Restart | | Restart the self-hosted integration runtime host service. |
| -s, -Start | | Start the self-hosted integration runtime host service. |
| -t, -Stop | | Stop the self-hosted integration runtime host service. |
| -sus, -StartUpgradeService | | Start the self-hosted integration runtime upgrade service. |
| -tus, -StopUpgradeService | | Stop the self-hosted integration runtime upgrade service. |
| -tonau, -TurnOnAutoUpdate | | Turn on the self-hosted integration runtime auto-update. This command is for Azure Data Factory V1 only. |
| -toffau, -TurnOffAutoUpdate | | Turn off the self-hosted integration runtime auto-update. This command is for Azure Data Factory V1 only. |
| -ssa, -SwitchServiceAccount | "<domain\user>" ["<password>"] | Set DIAHostService to run as a new account. Use the empty password "" for system accounts and virtual accounts. |
| -elma, -EnableLocalMachineAccess | | Enable local machine access (localhost, private IP) on the current self-hosted IR node. In self-hosted IR High Availability scenario, the action needs to be invoked on every self-hosted IR node. |
| -dlma, -DisableLocalMachineAccess | | Disable local machine access (localhost, private IP) on the current self-hosted IR node. In self-hosted IR High Availability scenario, the action needs to be invoked on every self-hosted IR node. |
| -DisableLocalFolderPathValidation | | Disable security validation to enable access to file system of the local machine. |
| -EnableLocalFolderPathValidation | | Enable security validation to disable access to file system of the local machine. |
| -eesp, -EnableExecuteSsisPackage | | Enable SSIS package execution on self-hosted IR node. |
| -desp, -DisableExecuteSsisPackage | | Disable SSIS package execution on self-hosted IR node. |
| -gesp, -GetExecuteSsisPackage | | Get the value if ExecuteSsisPackage option is enabled on self-hosted IR node. If the returned value is true, then ExecuteSSISPackage is enabled; If the returned value is false or null, then ExecuteSSISPackage is disabled. |

Install and register a self-hosted IR from Microsoft Download Center
  1. Go to the Microsoft integration runtime download page.

  2. Select Download, select the 64-bit version, and select Next. The 32-bit version isn't supported.

  3. Run the MSI file directly, or save it to your hard drive and run it.

  4. On the Welcome window, select a language and select Next.

  5. Accept the Microsoft Software License Terms and select Next.

  6. Select a folder in which to install the self-hosted integration runtime, and select Next.

  7. On the Ready to install page, select Install.

  8. Select Finish to complete installation.

  9. Get the authentication key by using PowerShell. Here's a PowerShell example for retrieving the authentication key:

    Get-AzDataFactoryV2IntegrationRuntimeKey -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $selfHostedIntegrationRuntimeName
    
  10. On the Register Integration Runtime (Self-hosted) window of Microsoft Integration Runtime Configuration Manager running on your machine, take the following steps:

    1. Paste the authentication key in the text area.

    2. Optionally, select Show authentication key to see the key text.

    3. Select Register.

Service account for self-hosted integration runtime

The default logon service account of the self-hosted integration runtime is NT SERVICE\DIAHostService. You can see it in Services -> Integration Runtime Service -> Properties -> Log on.

Make sure the account has the Log on as a service permission. Otherwise, the self-hosted integration runtime can't start successfully. You can check the permission in Local Security Policy -> Security Settings -> Local Policies -> User Rights Assignment -> Log on as a service.

Notification area icons and notifications

If you move your cursor over the icon or message in the notification area, you can see details about the state of the self-hosted integration runtime.

High availability and scalability

You can associate a self-hosted integration runtime with multiple on-premises machines or virtual machines in Azure. These machines are called nodes. You can have up to four nodes associated with a self-hosted integration runtime. Having multiple nodes gives the self-hosted integration runtime higher availability, because it's no longer a single point of failure, and better performance and throughput during data movement.

You can associate multiple nodes by installing the self-hosted integration runtime software from Download Center. Then, register it by using either of the authentication keys that were obtained from the New-AzDataFactoryV2IntegrationRuntimeKey cmdlet, as described in the tutorial.

Note

You don't need to create a new self-hosted integration runtime to associate each node. You can install the self-hosted integration runtime on another machine and register it by using the same authentication key.

Note

Before you add another node for high availability and scalability, ensure that the Remote access to intranet option is enabled on the first node. To do so, select Microsoft Integration Runtime Configuration Manager > Settings > Remote access to intranet.

Scale considerations

Scale out

When processor usage is high and available memory is low on the self-hosted IR, add a new node to help scale out the load across machines. If activities fail because they time out or the self-hosted IR node is offline, it helps if you add a node to the gateway. To add a node, complete the following steps:

  1. Download the SHIR setup from the Azure Data Factory portal.
  2. Run the Installer on the node you want to add to the cluster.
  3. During the installation, select the option to join an existing integration runtime, and provide the authentication key from the existing SHIR to link the new node to the existing SHIR cluster.
Scale up

When the processor and available RAM aren't well utilized, but the execution of concurrent jobs reaches a node's limits, scale up by increasing the number of concurrent jobs that a node can run. You might also want to scale up when activities time out because the self-hosted IR is overloaded. To scale up, increase the maximum capacity for the node.
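
The scale-out and scale-up guidance above amounts to a small decision rule. The following Python sketch expresses that rule under stated assumptions: the function name and the 80 percent and 20 percent thresholds are hypothetical illustrations, not values that the service defines, and the metrics are supplied by whatever monitoring you already have.

```python
def scaling_recommendation(cpu_percent, free_memory_percent,
                           running_jobs, concurrent_jobs_limit,
                           cpu_high=80.0, memory_low=20.0):
    """Suggest 'scale out', 'scale up', or 'no change' for one node.

    cpu_high and memory_low are illustrative thresholds (assumptions).
    """
    # High processor usage together with low available memory: add a node.
    if cpu_percent >= cpu_high and free_memory_percent <= memory_low:
        return "scale out"
    # Resources have headroom, but the node has hit its concurrent-job limit:
    # raise the number of concurrent jobs the node can run.
    if running_jobs >= concurrent_jobs_limit:
        return "scale up"
    return "no change"
```

For example, a node at 30 percent CPU with 70 percent free memory that is running 8 of 8 allowed concurrent jobs would get a "scale up" suggestion from this sketch.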

TLS/SSL certificate requirements

If you want to enable remote access from intranet with TLS/SSL certificate (Advanced) to secure communication between integration runtime nodes, you can follow steps in Enable remote access from intranet with TLS/SSL certificate.

Note

This certificate is used:

We suggest you use this certificate if your private network environment isn't secure or if you want to secure the communication between nodes within your private network.

Data movement in transit from a self-hosted IR to other data stores always happens within an encrypted channel, regardless of whether or not this certificate is set.

Credential sync

If you don't store credentials or secret values in an Azure Key Vault, the credentials or secret values will be stored in the machines where your self-hosted integration runtime is located. Each node will have a copy of the credentials with a certain version. In order to make all nodes work together, the version number should be the same for all nodes.

Proxy server considerations

If your corporate network environment uses a proxy server to access the internet, configure the self-hosted integration runtime to use appropriate proxy settings. You can set the proxy during the initial registration phase.

When configured, the self-hosted integration runtime uses the proxy server to connect to the cloud service's source and destination (which use the HTTP or HTTPS protocol). This is why you select Change link during initial setup.

There are three configuration options:

The integration runtime host service restarts automatically after you save the updated proxy settings.

After you register the self-hosted integration runtime, if you want to view or update proxy settings, use Microsoft Integration Runtime Configuration Manager.

  1. Open Microsoft Integration Runtime Configuration Manager.
  2. Select the Settings tab.
  3. Under HTTP Proxy, select the Change link to open the Set HTTP Proxy dialog box.
  4. Select Next. You then see a warning that asks for your permission to save the proxy setting and restart the integration runtime host service.

You can use the configuration manager tool to view and update the HTTP proxy.

Note

If you set up a proxy server with NTLM authentication, the integration runtime host service runs under the domain account. If you later change the password for the domain account, remember to update the configuration settings for the service and restart the service. Because of this requirement, we suggest that you access the proxy server by using a dedicated domain account that doesn't require you to update the password frequently.

Configure proxy server settings

If you select the Use system proxy option for the HTTP proxy, the self-hosted integration runtime uses the proxy settings in diahost.exe.config and diawp.exe.config. When these files specify no proxy, the self-hosted integration runtime connects to the cloud service directly without going through a proxy. The following procedure provides instructions for updating the diahost.exe.config file:

  1. In File Explorer, make a safe copy of C:\Program Files\Microsoft Integration Runtime\5.0\Shared\diahost.exe.config as a backup of the original file.

  2. Open Notepad running as administrator.

  3. In Notepad, open the text file C:\Program Files\Microsoft Integration Runtime\5.0\Shared\diahost.exe.config.

  4. Find the default system.net tag as shown in the following code:

    <system.net>
        <defaultProxy useDefaultCredentials="true" />
    </system.net>
    

    You can then add proxy server details as shown in the following example:

    <system.net>
        <defaultProxy enabled="true">
              <proxy bypassonlocal="true" proxyaddress="http://proxy.domain.org:8888/" />
        </defaultProxy>
    </system.net>
    

    The proxy tag allows additional properties to specify required settings like scriptLocation. See <proxy> Element (Network Settings) for syntax.

    <proxy autoDetect="true|false|unspecified" bypassonlocal="true|false|unspecified" proxyaddress="uriString" scriptLocation="uriString" usesystemdefault="true|false|unspecified" />
    
  5. Save the configuration file in its original location. Then restart the self-hosted integration runtime host service, which picks up the changes.

    To restart the service, use the services applet from Control Panel. Or from Integration Runtime Configuration Manager, select the Stop Service button, and then select Start Service.

    If the service doesn't start, you likely added incorrect XML tag syntax in the application configuration file that you edited.
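
The manual edit in the preceding steps can also be scripted. The following Python sketch is one way to do it, under stated assumptions: set_default_proxy is a hypothetical helper, it works on the configuration text rather than on the file itself, and it replaces any existing defaultProxy element with the enabled form shown in the example above. The standard-library ElementTree serializer doesn't preserve the original formatting or an XML declaration, so keep the backup copy from step 1.

```python
import xml.etree.ElementTree as ET


def set_default_proxy(config_xml, proxy_address, bypass_on_local=True):
    """Return config_xml with system.net/defaultProxy pointing at proxy_address."""
    # insert_comments=True keeps XML comments from the original file (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(config_xml, parser=parser)

    system_net = root.find("system.net")
    if system_net is None:
        system_net = ET.SubElement(root, "system.net")

    # Remove the default element, for example <defaultProxy useDefaultCredentials="true" />.
    for old in system_net.findall("defaultProxy"):
        system_net.remove(old)

    default_proxy = ET.SubElement(system_net, "defaultProxy", {"enabled": "true"})
    ET.SubElement(default_proxy, "proxy", {
        "bypassonlocal": "true" if bypass_on_local else "false",
        "proxyaddress": proxy_address,
    })
    return ET.tostring(root, encoding="unicode")
```

In practice you would read the text of diahost.exe.config and of diawp.exe.config, pass each through the function, write the results back from an elevated process, and then restart the host service.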

Important

Don't forget to update both diahost.exe.config and diawp.exe.config.

You also need to make sure that Microsoft Azure is in your company's allowlist. You can download the list of valid Azure IP addresses. IP Ranges for each cloud, broken down by region and by the tagged services in that cloud are now available on MS Download:

Configure proxy server settings when using a private endpoint

If your company's network architecture uses private endpoints and, for security reasons, your company's policy doesn't allow a direct internet connection from the VM that hosts the self-hosted integration runtime to the Azure Data Factory service URL, you need to add the ADF service URL to the proxy bypass list for full connectivity. The following procedure provides instructions for updating the diahost.exe.config file. You should also repeat these steps for the diawp.exe.config file.

  1. In File Explorer, make a safe copy of C:\Program Files\Microsoft Integration Runtime\5.0\Shared\diahost.exe.config as a backup of the original file.

  2. Open Notepad running as administrator.

  3. In Notepad, open C:\Program Files\Microsoft Integration Runtime\5.0\Shared\diahost.exe.config.

  4. Find the default system.net tag as shown here:

    <system.net>
        <defaultProxy useDefaultCredentials="true" />
    </system.net>
    

    You can then add bypasslist details as shown in the following example:

    <system.net>
      <defaultProxy>
          <bypasslist>
              <add address = "[adfresourcename].[adfresourcelocation].datafactory.azure.net" />
          </bypasslist>
          <proxy 
          usesystemdefault="True"
          proxyaddress="http://proxy.domain.org:8888/"
          bypassonlocal="True"
          />
      </defaultProxy>
    </system.net>
    
Possible symptoms for issues related to the firewall and proxy server

If you see error messages like the following ones, the likely reason is improper configuration of the firewall or proxy server. Such configuration prevents the self-hosted integration runtime from connecting to Data Factory or Synapse pipelines to authenticate itself. To ensure that your firewall and proxy server are properly configured, refer to the previous section.

Enable remote access from an intranet

If you use PowerShell to encrypt credentials from a networked machine other than where you installed the self-hosted integration runtime, you can enable the Remote Access from Intranet option. If you run PowerShell to encrypt credentials on the machine where you installed the self-hosted integration runtime, you don't need to enable Remote Access from Intranet.

Enable Remote Access from Intranet before you add another node for high availability and scalability.

When you run the self-hosted integration runtime setup version 3.3 or later, by default the self-hosted integration runtime installer disables Remote Access from Intranet on the self-hosted integration runtime machine.

When you use a firewall from a partner or others, you can manually open port 8060 or the user-configured port. If you have a firewall problem while setting up the self-hosted integration runtime, use the following command to install the self-hosted integration runtime without configuring the firewall:

msiexec /q /i IntegrationRuntime.msi NOFIREWALL=1

If you choose not to open port 8060 on the self-hosted integration runtime machine, use mechanisms other than the Setting Credentials application to configure data-store credentials. For example, you can use the New-AzDataFactoryV2LinkedServiceEncryptedCredential PowerShell cmdlet.

Ports and firewalls

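There are two firewalls to consider: the corporate firewall and the Windows Firewall on the machine where the self-hosted integration runtime is installed.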

At the corporate firewall level, you need to configure the following domains and outbound ports:

| Domain names | Outbound ports | Description |
|---|---|---|
| Public Cloud: *.servicebus.windows.net; Azure Government: *.servicebus.usgovcloudapi.net; China: *.servicebus.chinacloudapi.cn | 443 | Required by the self-hosted integration runtime for interactive authoring. Not required if self-contained interactive authoring is enabled. |
| Public Cloud: {datafactory}.{region}.datafactory.azure.net or *.frontend.clouddatahub.net; Azure Government: {datafactory}.{region}.datafactory.azure.us; China: {datafactory}.{region}.datafactory.azure.cn | 443 | Required by the self-hosted integration runtime to connect to the Data Factory service. For a newly created data factory in the public cloud, find the fully qualified domain name (FQDN) in your self-hosted integration runtime key, which is in the format {data factory}.{region}.datafactory.azure.net. For an older data factory and any version of Azure Synapse Analytics, if you don't see the FQDN in your self-hosted integration runtime key, use *.frontend.clouddatahub.net instead. |
| download.microsoft.com | 443 | Required by the self-hosted integration runtime for downloading the updates. If you disabled auto-update, you can skip configuring this domain. |
| Key Vault URL | 443 | Required by Azure Key Vault if you store the credential in Key Vault. |

At the Windows Firewall level or machine level, these outbound ports are normally enabled. If they aren't, you can configure the domains and ports on a self-hosted integration runtime machine.
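
To spot-check these outbound rules from the self-hosted integration runtime machine, a short script can attempt a TCP connection to each concrete host name. The following Python sketch is a minimal example under stated assumptions: required_endpoints and can_connect are hypothetical helpers, only the public-cloud entries that have a concrete host name are included (wildcard entries such as *.servicebus.windows.net can't be probed directly), and the factory name and region are placeholders you supply.

```python
import socket


def required_endpoints(data_factory, region):
    """Public-cloud endpoints listed above that resolve to a concrete host name."""
    return [
        (f"{data_factory}.{region}.datafactory.azure.net", 443),
        ("download.microsoft.com", 443),
    ]


def can_connect(host, port, timeout=5.0):
    """Return True if a direct TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example (placeholder names):
# for host, port in required_endpoints("myfactory", "westeurope"):
#     print(host, port, can_connect(host, port))
```

A direct TCP check says nothing about traffic that is routed through an HTTP proxy, so a failure here on a proxied network doesn't necessarily mean the firewall rule is missing.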

Note

Because Azure Relay doesn't currently support service tags, you have to use the AzureCloud or Internet service tag in NSG rules for the communication to Azure Relay. For the communication to Azure Data Factory and Synapse workspaces, you can use the DataFactoryManagement service tag in the NSG rule setup.

Based on your sources and sinks, you might need to allow additional domains and outbound ports in your corporate firewall or Windows Firewall.

| Domain names | Outbound ports | Description |
|---|---|---|
| *.core.windows.net | 443 | Used by the self-hosted integration runtime to connect to the Azure storage account when you use the staged copy feature. |
| *.database.windows.net | 1433 | Required only when you copy from or to Azure SQL Database or Azure Synapse Analytics and optional otherwise. Use the staged-copy feature to copy data to SQL Database or Azure Synapse Analytics without opening port 1433. |
| *.azuredatalakestore.net; login.microsoftonline.com/<tenant>/oauth2/token | 443 | Required only when you copy from or to Azure Data Lake Store and optional otherwise. |

For some cloud databases, such as Azure SQL Database and Azure Data Lake, you might need to allow IP addresses of self-hosted integration runtime machines on their firewall configuration.

Note

Don't install both the integration runtime and the Power BI gateway on the same machine, because the integration runtime mainly uses port 443, which is also one of the main ports that the Power BI gateway uses.

To perform interactive authoring actions such as data preview and connection testing, the self-hosted integration runtime requires a connection to Azure Relay. If the connection can't be established, there are two ways to restore this functionality. The first option is to add the Azure Relay endpoints to your firewall's allowlist (see Get URL of Azure Relay). Alternatively, you can enable self-contained interactive authoring.

Note

If the self-hosted integration runtime fails to establish a connection to Azure Relay, its status will be marked as "limited".

Note

While self-contained interactive authoring is enabled, all interactive authoring traffic will be routed exclusively through this functionality, bypassing Azure Relay. The traffic will only be redirected back to Azure Relay once you choose to disable this feature.

Note

Both "Get IP" and "Send log" aren't supported when self-contained interactive authoring is enabled.

Get URL of Azure Relay

One required domain and port that you need to put in the allowlist of your firewall is for the communication to Azure Relay. The self-hosted integration runtime uses it for interactive authoring such as test connection, browse folder list and table list, get schema, and preview data. If you don't want to allow *.servicebus.windows.net and would like to have more specific URLs, you can see all the FQDNs that are required by your self-hosted integration runtime from the service portal.

Get URL of Azure Relay via UI:

Follow these steps:

  1. Go to the service portal and select your self-hosted integration runtime.

  2. In Edit page, select Nodes.

  3. Select View Service URLs to get all FQDNs.

  4. You can add these FQDNs in the allowlist of firewall rules.
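
The difference between allowing the wildcard domain and allowing only the specific FQDNs can be illustrated with a small matching function. The following Python sketch is an illustration only: is_allowed is a hypothetical helper, the host names are made-up examples, and real firewalls might interpret wildcards differently from Python's fnmatch (here, * also matches dots).

```python
from fnmatch import fnmatch


def is_allowed(fqdn, allowlist):
    """Return True if fqdn matches any pattern in allowlist (case-insensitive)."""
    fqdn = fqdn.lower().rstrip(".")
    return any(fnmatch(fqdn, pattern.lower()) for pattern in allowlist)


# A wildcard rule admits every Azure Relay namespace, not only yours.
wildcard_rule = ["*.servicebus.windows.net"]
# A specific rule admits only the FQDNs you copied from View Service URLs.
specific_rule = ["contoso-relay.servicebus.windows.net"]  # hypothetical FQDN
```

With these example rules, an unrelated namespace such as other-tenant.servicebus.windows.net passes the wildcard rule but not the specific one, which is the reason to prefer the FQDN list when your security policy requires it.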

Get URL of Azure Relay via script:

# The documentation of Synapse self hosted integration runtime (SHIR) mentions that the SHIR requires access to the Azure Service Bus IP addresses
# https://learn.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime
# It is a requirement to use a wildcard (*.servicebus.windows.net) in your firewalls.
# While this is the easiest way to clear the firewall, it also opens the firewall to all Azure Service Bus IP addresses, including malicious_actor.servicebus.windows.net.
# This might be restricted by your security policies.
# This script resolves the Azure Service Bus IP addresses used by an integration runtime and adds them to the network security group (NSG) rule for the Synapse self-hosted integration runtime (SHIR).
# As the mapping of IP addresses to Domain Names might change, we recommend to run at least once a day to keep the NSG up to date.
# An alternative to running this script is to use the "Self-contained interactive authoring" feature of the self hosted integration runtime.

# Prerequisites:
# - PowerShell installed
# - Azure CLI (az) installed and logged in (https://learn.microsoft.com/en-us/cli/azure/)
# - signed in user needs rights to modify NSG (e.g. Network contributor) and to read status of the SHIR (e.g. reader), plus reader on the subscription

param (
    [string]$synapseResourceGroupName = "synapse_test",
    [string]$nsgResourceGroupName = "adf_shir_rg",
    [string]$synapseWorkspaceName = "synapse-test-jugi2",
    [string]$integrationRuntimeName = "IntegrationRuntime2",
    [string]$networkSecurityGroupName = "jugis-shir-nsg",
    [string]$securityRuleName = "AllowSynapseServiceBusIPs",
    [int]$priority = 100
)

# Check if the user is already logged in
$azAccount = az account show 2>$null

if (-not $azAccount) {
    # Run az login with managed identity if not logged in
    az login --identity
}

# Retrieve the URLs of the connections from the Synapse self-hosted integration runtime
$urls = az synapse integration-runtime get-status `
    --resource-group $synapseResourceGroupName `
    --workspace-name $synapseWorkspaceName `
    --name $integrationRuntimeName `
    --query "properties.serviceUrls" -o tsv

# Initialize an empty array to hold the IP addresses
$ipAddresses = @()

# Iterate over the URLs to resolve and collect the IP addresses
# The proper DNS resolution might only work within Azure, not locally
foreach ($url in $urls) {
    Write-Output "Processing URL: $url"
    $ip = [System.Net.Dns]::GetHostAddresses($url) | Where-Object { $_.AddressFamily -eq 'InterNetwork' } | Select-Object -ExpandProperty IPAddressToString
    if ($ip) {
        $ipAddresses += $ip
    }
}

# Remove duplicate IP addresses from the array
$ipAddresses = $ipAddresses | Sort-Object -Unique

# Create or update the network security group rule to allow outbound traffic for the collected IP addresses
# az is called directly instead of through Invoke-Expression: PowerShell passes each element of
# $ipAddresses as a separate argument, and no parameter value is re-parsed as PowerShell code
az network nsg rule create `
    --resource-group $nsgResourceGroupName `
    --nsg-name $networkSecurityGroupName `
    --name $securityRuleName `
    --priority $priority `
    --destination-address-prefixes $ipAddresses `
    --destination-port-ranges 443 `
    --direction Outbound `
    --access Allow `
    --protocol '*' `
    --description 'Allow outbound access to Synapse servicebus IPs'

Copy data from a source to a sink

Ensure that you properly enable firewall rules on the corporate firewall, the Windows Firewall of the self-hosted integration runtime machine, and the data store itself. Enabling these rules lets the self-hosted integration runtime successfully connect to both source and sink. Enable rules for each data store that is involved in the copy operation.

For example, to copy from an on-premises data store to a SQL Database sink or an Azure Synapse Analytics sink, take the following steps:

  1. Allow outbound TCP communication on port 1433 for both the Windows Firewall and the corporate firewall.
  2. Configure the firewall settings of the SQL Database to add the IP address of the self-hosted integration runtime machine to the list of allowed IP addresses.

Note

If your firewall doesn't allow outbound port 1433, the self-hosted integration runtime can't access the SQL database directly. In this case, you can use a staged copy to SQL Database and Azure Synapse Analytics. In this scenario, you require only HTTPS (port 443) for the data movement.

If your data source, your sink, and the self-hosted integration runtime are all in an on-premises environment, the copied data doesn't go to the cloud. It remains strictly on-premises.

Store credentials

There are two ways to store the credentials when using self-hosted integration runtime:

  1. Use Azure Key Vault (recommended) - The self-hosted integration runtime gets the credentials directly from Azure Key Vault, which helps avoid potential security issues and credential synchronization problems between self-hosted integration runtime nodes.

  2. Store credentials locally - The credentials are pushed to the machine where your self-hosted integration runtime is installed and are encrypted there.

    Note

    If you prefer to store the credential locally, you need to put the domain for interactive authoring in the allowlist of your firewall and open the port. This channel also allows the self-hosted integration runtime to get the credentials. For the domain and ports required for interactive authoring, refer to Ports and firewalls.

If you recover your self-hosted integration runtime from a crash, you can either restore the credentials from a backup that you saved, or edit the linked service so that the credential is pushed to the self-hosted integration runtime again. Otherwise, a pipeline that uses the self-hosted integration runtime won't work because the credential is missing.

Installation best practices

You can install the self-hosted integration runtime by downloading an MSI setup package from Microsoft Download Center. See the article Move data between on-premises and cloud for step-by-step instructions.

Important considerations

When installing a self-hosted integration runtime consider following

For step-by-step instructions, see Tutorial: Copy on-premises data to cloud.

