---
title: "Path constraints and link exclusions"
date: "2021-12-16T11:58:59+00:00"
summary: "Learn how to set up path constraints and link exclusions for efficient website scanning and crawling."
image:
type: "page"
url: "/web-governance/path-constraints-and-link-exclusions"
id: "d337ce00-1524-4fb0-b5cb-d891a3c53ff7"
---

Introduction
------------

This article provides instructions and examples for how to set up path constraints and link exclusions for the scan.

Note

If you plan to scan a path-based domain, you must set up path constraints. Otherwise, the crawler crawls everything it can find on the domain. For more information on how to set up path constraints, contact your customer service manager.

Path constraints
----------------

This section provides instructions and examples on how to set up path constraints. Path constraints are regular expressions.

A regular expression (regex) is a sequence of characters that defines a string pattern. In Java, regexes are used to search, manipulate, and edit strings. Email and password validation are two examples where a regex can define the constraints.
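
As a minimal sketch of what this looks like in Java, the following uses `java.util.regex.Pattern`. The class name, the method names, and the deliberately simplified email pattern are illustrative assumptions and not part of the product.

```java
import java.util.regex.Pattern;

public class RegexBasics {
    // Simplified email pattern for illustration only; not a complete validator.
    static final Pattern EMAIL =
            Pattern.compile("^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$");

    // matches() requires the whole input to fit the pattern.
    static boolean looksLikeEmail(String input) {
        return EMAIL.matcher(input).matches();
    }

    // find() succeeds if the pattern occurs anywhere in the input.
    static boolean containsDigit(String input) {
        return Pattern.compile("[0-9]").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("jane@foo.com")); // true
        System.out.println(looksLikeEmail("jane@foo"));     // false
        System.out.println(containsDigit("passw0rd"));      // true
    }
}
```

The difference between `matches()` (whole string) and `find()` (anywhere in the string) matters when a pattern is not anchored with `^` or `$`.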

Note

Regular Expressions in source code exclusions are not 100% compatible with the Policies module. The languages are different (Java and Ruby).

### Introduction

Use _Path Constraints_ to instruct the scan to only process parts of a domain. The scan compares URLs in the site with the pattern given. URLs that match the pattern are scanned and URLs that do not match the pattern are considered external links.

A Path Constraint can consist of a word or a regular expression. In most cases, users set up _Path Constraints_ to:

*   Restrict the scanner to only recognize parts of a site with a pattern. For example:
    
        ^/en
    
    This instructs the scan to handle any URL that does not begin with `/en` (such as `http://foo(dot)com/fr/bar`) as an external link. That is, the crawler tests the link but does not follow any links on `http://foo(dot)com/fr/bar`.
    
*   Instruct the scan to ignore parts of the site with a pattern. For example:
    
        !^/fr
    
    This instructs the scan to handle any URL that begins with `/fr` (for example, `http://foo(dot)com/fr/bar`) as an external link. This means that the scan tests the link but does not follow any links on `http://foo(dot)com/fr/bar`.
    

The difference between the two is that in the first case, all pages under `/en` are scanned and nothing else. In the second case, all pages except those under `/fr` are scanned.
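
The two constraints above can be sketched in Java as follows. This is an illustration under stated assumptions, not the crawler's actual implementation: it assumes that a leading `!` negates the constraint, that the rest is a Java regex, and that the regex is searched for in the URL path. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class PathConstraintSketch {
    // Hypothetical helper: returns true if the path would be scanned
    // (treated as internal) under the given constraint.
    // Assumption: a leading "!" negates the constraint.
    static boolean isScanned(String constraint, String path) {
        boolean negated = constraint.startsWith("!");
        String regex = negated ? constraint.substring(1) : constraint;
        boolean found = Pattern.compile(regex).matcher(path).find();
        return negated ? !found : found;
    }

    public static void main(String[] args) {
        System.out.println(isScanned("^/en", "/en/about"));    // true
        System.out.println(isScanned("^/en", "/fr/bar"));      // false
        System.out.println(isScanned("!^/fr", "/fr/bar"));     // false
        System.out.println(isScanned("!^/fr", "/de/kontakt")); // true
    }
}
```

As a regex, `^/en` also matches paths such as `/enterprise`. If only the `/en/` directory is intended, a pattern such as `^/en/` is narrower.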

Important

Make sure that the URL for the domain is set to a page that matches the constraint. Otherwise, only one page is included in the scan, because the crawler cannot proceed to any page other than the one it starts on.

For example, with a constraint of:

    ^/en/booking

the crawler cannot start on _http://foo(dot)com_. The crawler will request _http://foo(dot)com_, receive the page, and find that no links match _http://foo(dot)com/en/booking_, and therefore only the first page is scanned.
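
A quick way to check a start URL against a constraint before saving it might look like the sketch below. It assumes the constraint is a Java regex searched for in the URL path. The class and method names are hypothetical, and `foo.com` stands in for the placeholder domain used in this article.

```java
import java.net.URI;
import java.util.regex.Pattern;

public class StartUrlCheck {
    // Hypothetical pre-flight check: does the path of the start URL
    // satisfy the path constraint?
    static boolean startUrlMatches(String constraint, String startUrl) {
        String path = URI.create(startUrl).getPath(); // "" when the URL has no path
        return Pattern.compile(constraint).matcher(path).find();
    }

    public static void main(String[] args) {
        // The bare domain has an empty path, so the constraint cannot match.
        System.out.println(startUrlMatches("^/en/booking", "http://foo.com"));            // false
        System.out.println(startUrlMatches("^/en/booking", "http://foo.com/en/booking")); // true
    }
}
```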

### Instructions

1.  Click **Admin Settings** (gear icon) at the top of the _Domain Overview_ page. The _Admin Settings_ page opens.
    
    ![The location of the Admin Settings button on the main menu bar.](https://acquia.widen.net/content/44dfb321-2b2f-48da-8d3d-d42d738df888/web/WebGov_MainToolbar-AdminSettingsButton.png)
    
2.  Click **Action** on the same row as the domain you want to scan.
    
    ![The location of the Action button and the expanded list of options, on the same row as a domain.](https://acquia.widen.net/content/5721cbb7-fb99-405b-af36-756fe71843bb/web/WebGov_DomainOverview-ActionButtonAndMenu.png)
    
3.  Select **Edit Domain** from the drop-down list.
    
    ![The location of the Edit Domain option in the Action drop-down menu.](https://acquia.widen.net/content/bdb381db-4101-44a6-85fa-2cbb99ffccd6/web/Mon_Opt_AdminSettings-Action-EditDomain.png)
    
    The _Edit Domain_ page opens.
    
4.  Scroll to the **Advanced Domain Options** section.
    
    ![The Advanced Domain Options section on the domain setup page.](https://acquia.widen.net/content/3dd70796-f804-450a-b340-8b6544861259/web/Mon_Opt_AdminSettings-EditDomain-AdvancedDomainOptions.png)
    
5.  Locate the **Path Constraints** section:
    
    *   **Search**: Enter a search parameter for matching strings within the _Constraint Patterns_ list.
    *   **Constraint pattern**: Enter a constraint pattern.
    *   **\+ Add**: Click + to add a new _Constraint pattern_. An empty row is added to the list.
        
        Note
        
        The window only shows the first five items. If the list contains more than five items, it is paginated.
        
    *   **Delete**: Click the trashcan icon to delete an item from the list.
    
    Note
    
    Some CMS selections come with pre-filled parameters, based on what most users of those CMS operators need. Click **Default** to remove them if needed. The **Default** button only appears for certain operators.
    
    For more information and examples, visit the external article [Class Pattern by Oracle](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html).
    

Link excludes
-------------

This section provides instructions and examples on how to set up link exclusions. Link exclusions are regular expressions.

A regular expression (regex) is a sequence of characters that defines a string pattern. In Java, regexes are used to search, manipulate, and edit strings. Email and password validation are two examples where a regex can define the constraints.

Note

Regular Expressions in source-code exclusions are not 100% compatible with the Policies module. The languages are different (Java and Ruby).

A link exclude can consist of a word or a regular expression. Use _Link Excludes_ to instruct the crawler to completely ignore links that match the pattern. Pages that match the pattern are not tested.

Use link exclusions to:

*   Filter out print pages with a pattern such as `print=true`.
    
    This instructs the scan to ignore (and not test) any URL with the pattern, for example `http://foo.com/bar?print=true`.
    
*   Filter out redirected login pages with a pattern such as `login.aspx?return_url=zyx`.
    
    This instructs the scan to ignore all URLs with the pattern, for example `http://foo(dot)com/bar/login.aspx?return_url=zyx`.
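
This article does not state whether an exclude pattern is matched as a plain word or as a regex. If a pattern is evaluated as a Java regex, characters such as `.` and `?` are metacharacters and may need escaping. The sketch below illustrates the difference with plain `java.util.regex` calls. The class and method names are hypothetical, and `foo.com` stands in for the placeholder domain.

```java
import java.util.regex.Pattern;

public class ExcludeEscaping {
    // Returns true if the regex occurs anywhere in the URL.
    static boolean found(String regex, String url) {
        return Pattern.compile(regex).matcher(url).find();
    }

    public static void main(String[] args) {
        String url = "http://foo.com/bar/login.aspx?return_url=zyx";

        // Unescaped: "x?" means "an optional x", so the literal "?" in the URL is not matched.
        System.out.println(found("login.aspx?return_url=zyx", url));                // false

        // Escaped by hand: "\\." and "\\?" match the literal characters.
        System.out.println(found("login\\.aspx\\?return_url=zyx", url));            // true

        // Pattern.quote() escapes the whole string as a literal.
        System.out.println(found(Pattern.quote("login.aspx?return_url=zyx"), url)); // true

        // A pattern without metacharacters behaves the same either way.
        System.out.println(found("print=true", "http://foo.com/bar?print=true"));   // true
    }
}
```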
    

_Tip!_ If "Scan subdomains" is turned on for the domain, put the **§** sign in front of the exclude pattern to match against the full URL instead of the relative one. For example, to exclude the "blog" subdomain from the scan, enter the following pattern:

    §http://blog.foo.bar
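
One way to picture the effect of the **§** prefix is the sketch below. It is a hypothetical model, not the product's implementation: it assumes that a pattern with the prefix is searched for in the full URL, and that a pattern without it is searched for only in the path and query string. All class and method names are illustrative.

```java
import java.net.URI;
import java.util.regex.Pattern;

public class FullUrlExclude {
    // Hypothetical model of the section sign prefix ("\u00A7").
    static boolean isExcluded(String pattern, String url) {
        boolean fullUrl = pattern.startsWith("\u00A7");
        String regex = fullUrl ? pattern.substring(1) : pattern;
        // Assumption: without the prefix, only the relative part is compared.
        String target = fullUrl ? url : relativePart(URI.create(url));
        return Pattern.compile(regex).matcher(target).find();
    }

    // Path plus query string, for example "/post/1?page=2".
    static String relativePart(URI uri) {
        String query = uri.getRawQuery();
        return uri.getRawPath() + (query == null ? "" : "?" + query);
    }

    public static void main(String[] args) {
        String pattern = "\u00A7http://blog.foo.bar";
        System.out.println(isExcluded(pattern, "http://blog.foo.bar/post/1")); // true
        System.out.println(isExcluded(pattern, "http://www.foo.bar/post/1"));  // false
        // Without the prefix, the host is not part of the compared text in this model.
        System.out.println(isExcluded("http://blog.foo.bar", "http://blog.foo.bar/post/1")); // false
    }
}
```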

### Instructions

1.  Click **Admin Settings** (gear icon) at the top of the _Domain Overview_ page. The _Admin Settings_ page opens.
    
    ![The location of the Admin Settings button on the main menu bar.](https://acquia.widen.net/content/44dfb321-2b2f-48da-8d3d-d42d738df888/web/WebGov_MainToolbar-AdminSettingsButton.png)
    
2.  Click **Action** on the same row as the domain to scan.
    
    ![The location of the Action button and the expanded list of options, on the same row as a domain.](https://acquia.widen.net/content/5721cbb7-fb99-405b-af36-756fe71843bb/web/WebGov_DomainOverview-ActionButtonAndMenu.png)
    
3.  Select **Edit Domain** from the drop-down list.
    
    ![The location of the Edit Domain option in the Action drop-down menu.](https://acquia.widen.net/content/bdb381db-4101-44a6-85fa-2cbb99ffccd6/web/Mon_Opt_AdminSettings-Action-EditDomain.png)
    
    The _Edit Domain_ page opens.
    
4.  Scroll to the **Advanced Domain Options** section.
    
    ![The Advanced Domain Options section on the domain setup page.](https://acquia.widen.net/content/3dd70796-f804-450a-b340-8b6544861259/web/Mon_Opt_AdminSettings-EditDomain-AdvancedDomainOptions.png)
    
5.  In the _Link Excludes_ section:
    
    *   **Search**: Enter a search parameter for matching strings within the _Link excludes_ list.
    *   **Exclude pattern**: Enter a pattern to exclude from the scan.
        
        Note
        
        The window only shows the first five items. If the list contains more than five items, it is paginated.
        
    *   **\+ Add**: Click + to add a new _Exclude pattern_. An empty row appears in the list.
    *   **Delete**: Click the trashcan icon to delete an item from the list.
    *   **Internal URLs:**
        *   **Operator**: Click the drop-down arrow to select _Contains_, _Starts with_, or _Regex_.
        *   **URL**: Type a URL in the field.
        *   **Delete**: Click the trashcan icon to delete the row.
        *   **\+ Add**: Click to add a new _Input Selector_. An empty row appears in the list.
    
    Important
    
    You can exclude a link that is attached to an image. The _link_ is then excluded from the scan. However, the image itself can still appear on the SEO and QA pages as an issue to fix if it does not meet other requirements (for example, if it is missing ALT text).
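
The three operators listed for **Internal URLs** (_Contains_, _Starts with_, and _Regex_) can be compared with the sketch below. This article does not describe how each operator is evaluated, so the mapping to `String.contains`, `String.startsWith`, and a regex search is an assumption. The enum, class, and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class InternalUrlOperators {
    enum Operator { CONTAINS, STARTS_WITH, REGEX }

    // Hypothetical evaluation of one Operator/URL row against a candidate URL.
    static boolean matches(Operator op, String value, String url) {
        switch (op) {
            case CONTAINS:
                return url.contains(value);
            case STARTS_WITH:
                return url.startsWith(value);
            case REGEX:
                return Pattern.compile(value).matcher(url).find();
            default:
                throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        String url = "http://foo.com/fr/bar";
        System.out.println(matches(Operator.CONTAINS, "/fr/", url));              // true
        System.out.println(matches(Operator.STARTS_WITH, "http://foo.com", url)); // true
        System.out.println(matches(Operator.STARTS_WITH, "/fr", url));            // false
        System.out.println(matches(Operator.REGEX, "/(en|fr)/", url));            // true
    }
}
```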
    

For more information and examples, visit the external article [Class Pattern by Oracle](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html).

Additional resources
--------------------

For more information, visit:

*   [Configure Domain Scans](https://docs.acquia.com/acquia-optimize/admin/configure-scans/domain-scans)
*   [Source code exclusions](https://docs.acquia.com/acquia-optimize/admin/configure-scans/source-code-exclusions)

For advanced instructions on this topic, visit [CMS Integration](https://docs.acquia.com/acquia-optimize/developer/cms-integration).