Acquia Optimize

Path constraints and link exclusions

Introduction

This article gives instructions and examples on how to set up Path Constraints and Link Exclusions for the scan.

Note

If you plan to scan a path-based domain, you must set up path constraints; otherwise, the crawler crawls everything it can find on the domain. For more information on how to set up path constraints, contact your customer service manager.

Path Constraints

This section gives instructions and examples on how to set up path constraints. Path constraints are regular expressions.

A regular expression (regex) is a pattern that describes a set of strings. In Java, the java.util.regex API uses regexes to search, manipulate, and edit strings. Email addresses and passwords are two examples of strings whose format a regex can constrain.
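As a minimal illustration of the search, validate, and edit operations mentioned above, the sketch below uses the standard Java classes java.util.regex.Pattern and java.util.regex.Matcher. The class name RegexBasics and its helper methods are hypothetical, and the email pattern is deliberately simplified; it is not a complete email validator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the java.util.regex API.
class RegexBasics {

    // Deliberately simplified email pattern, for illustration only.
    // Real email address rules are far more complex than this.
    private static final Pattern SIMPLE_EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Search: return the first substring that matches the regex, or null if none does.
    static String findFirst(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Validate: the WHOLE input must match the pattern (matches(), not find()).
    static boolean isSimpleEmail(String input) {
        return SIMPLE_EMAIL.matcher(input).matches();
    }

    // Edit: replace every match of the regex with the replacement text.
    static String replaceAll(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(findFirst("/[a-z]{2}/", "http://foo.com/en/bar")); // prints /en/
        System.out.println(isSimpleEmail("jane@example.com"));              // prints true
        System.out.println(replaceAll("\\d+", "page 12 of 30", "N"));       // prints page N of N
    }
}
```

Note the difference between find(), which looks for a match anywhere in the string, and matches(), which requires the entire string to match.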

Note

Regular expressions in source-code exclusions are not fully compatible with Acquia Optimize Policies, because the two use different regular expression languages (Java and Ruby).

Introduction

Use Path Constraints to instruct the scan to process only parts of a domain. The scan compares each URL on the site with the given pattern. URLs that match the pattern are scanned, and URLs that do not match are treated as external links.

A Path Constraint can be a word or a regular expression. In most cases, users set up Path Constraints to:

  • Restrict the scan to the parts of a site that match a pattern. For example:

    ^/en

    This instructs the scan to handle any URL that does not begin with /en (for example, http://foo.com/fr/bar) as an external link. This means that the crawler tests the link but does not follow any links on http://foo.com/fr/bar.

  • Exclude the parts of a site that match a pattern. For example:

    !^/fr

    This instructs the scan to handle any URL that begins with /fr (for example, http://foo.com/fr/bar) as an external link. This means that the scan tests the link but does not follow any links on http://foo.com/fr/bar.

The difference between the two is that in the first case, pages under /en are scanned and nothing else. In the second case, ALL pages EXCEPT those under /fr are scanned.
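The following Java sketch shows one way the two example patterns could classify URLs. It assumes, without confirmation from the product, that a constraint is searched in the URL path with Matcher.find() and that a leading ! inverts the result. PathConstraintSketch and isScanned are hypothetical names, not part of Acquia Optimize.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical sketch of path-constraint evaluation.
// Assumptions (not confirmed by Acquia Optimize):
//   - the constraint is tested against the URL path, for example "/fr/bar";
//   - the pattern is a Java regex searched with Matcher.find();
//   - a leading "!" inverts the result.
class PathConstraintSketch {

    static boolean isScanned(String url, String constraint) {
        boolean negated = constraint.startsWith("!");
        String regex = negated ? constraint.substring(1) : constraint;
        String path = URI.create(url).getPath();
        boolean found = Pattern.compile(regex).matcher(path).find();
        // Without "!", matching URLs are scanned; with "!", matching URLs become external.
        return negated ? !found : found;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://foo.com/en/bar",
            "http://foo.com/fr/bar",
            "http://foo.com/de/bar"
        };
        for (String url : urls) {
            System.out.printf("%-25s ^/en -> %-8s !^/fr -> %s%n",
                    url,
                    isScanned(url, "^/en") ? "scanned" : "external",
                    isScanned(url, "!^/fr") ? "scanned" : "external");
        }
    }
}
```

For http://foo.com/de/bar, ^/en reports external while !^/fr reports scanned, which is the difference described above. Also note that, as a regex, ^/en matches any path that starts with those characters, such as /english. If the scanner evaluates constraints the way this sketch assumes, a stricter pattern such as ^/en(/|$) avoids that.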

Important

Make sure that the URL for the domain is set to a page that matches the constraint.

Otherwise, only one page is included in the scan, because the crawler cannot proceed beyond the page it starts on.

For example, with a constraint of:

^/en/booking

the crawler cannot start on http://foo.com. The crawler requests http://foo.com, receives the page, and finds no links that match the constraint (links under http://foo.com/en/booking), so only the first page is scanned.
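To see why the start page matters, the sketch below simulates a crawl over a small, made-up site held in memory. The site structure, the class name StartUrlSketch, and the rule that non-matching links are never followed are assumptions that mirror the description above, not Acquia Optimize internals. Patterns are searched in each path with Matcher.find(), which is also an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical crawl over an in-memory site, showing why the start URL must match the constraint.
class StartUrlSketch {

    // Made-up site structure: each path maps to the paths it links to.
    static final Map<String, List<String>> SITE = Map.of(
            "/", List.of("/en", "/fr"),
            "/en", List.of("/", "/en/booking"),
            "/fr", List.of("/"),
            "/en/booking", List.of("/en/booking/rooms", "/en"),
            "/en/booking/rooms", List.of("/en/booking"));

    // Returns the scanned pages in crawl order. The start page is always scanned;
    // any other link is followed only if it matches the constraint.
    static Set<String> crawl(String start, String constraint) {
        Pattern pattern = Pattern.compile(constraint);
        Set<String> scanned = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        scanned.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.poll();
            for (String link : SITE.getOrDefault(page, List.of())) {
                // Non-matching links count as external: tested, never followed.
                if (pattern.matcher(link).find() && scanned.add(link)) {
                    queue.add(link);
                }
            }
        }
        return scanned;
    }

    public static void main(String[] args) {
        System.out.println(crawl("/", "^/en/booking"));           // prints [/]
        System.out.println(crawl("/en/booking", "^/en/booking")); // prints [/en/booking, /en/booking/rooms]
    }
}
```

Starting on / stops after one page because neither /en nor /fr matches ^/en/booking; starting on /en/booking lets the crawl continue through the matching pages.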

Instructions

  1. Click Settings (gear icon) at the top of the Domain Overview page. The Admin Settings page opens.

    Note

    The Settings button is only available to site admins.

  2. Click Action on the same row as the domain you want to scan.

  3. Select Edit Domain from the drop-down list.

    The Edit Domain page opens.

  4. Scroll to the Advanced Domain Options section.

  5. Locate the Path Constraints section:

    • Search: Enter a search parameter for matching strings within the Constraint Patterns list.
    • Constraint pattern: Enter a constraint pattern.
    • + Add: Click + to add a new Constraint pattern. An empty row is added to the list.

      Note

      The window shows only the first five items. When the list has more than five items, the remaining items are shown on additional pages.

    • Delete: Click the trashcan icon to delete an item from the list.

    Note

    Some CMS selections come with pre-filled parameters, based on what most users of that CMS need. Click Default to remove them if needed. The Default button appears only for certain CMS selections.

    For more information and examples, see the external article:

Link excludes

This section gives instructions and examples on how to set up link exclusions. Link exclusions are regular expressions.

A regular expression (regex) is a pattern that describes a set of strings. In Java, the java.util.regex API uses regexes to search, manipulate, and edit strings. Email addresses and passwords are two examples of strings whose format a regex can constrain.

Note

Regular expressions in source-code exclusions are not fully compatible with Acquia Optimize Policies, because the two use different regular expression languages (Java and Ruby).

Introduction

Use Link Excludes to instruct the crawler to completely ignore matching links on the pages. An exclude pattern can be a word or a regular expression. URLs that match the pattern are not tested.

Use link exclusions to:

  • Filter out print pages with a pattern such as print=true

    This instructs the scan to ignore (and not test) any URL that contains the pattern, for example:

    http://foo.com/bar?print=true

  • Filter out redirected login pages with a pattern such as:

    login.aspx?return_url=zyx

    This instructs the scan to ignore all URLs that contain the pattern, for example: http://foo.com/bar/login.aspx?return_url=zyx.
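The two example patterns differ in one important way if they are evaluated as Java regular expressions: print=true contains no special characters, but in login.aspx?return_url=zyx the . matches any character and the ? makes the preceding x optional, so the pattern no longer matches the literal text. The sketch below demonstrates this, assuming that exclude patterns are Java regexes searched in the relative URL (path plus query string). The product may treat plain words differently, and LinkExcludeSketch is a hypothetical name.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical sketch of link-exclude matching.
// Assumptions: the pattern is a Java regex, searched with Matcher.find()
// in the relative URL (path plus query string).
class LinkExcludeSketch {

    // Builds the relative URL, for example "/bar?print=true".
    static String relative(String url) {
        URI uri = URI.create(url);
        String query = uri.getRawQuery();
        return query == null ? uri.getRawPath() : uri.getRawPath() + "?" + query;
    }

    static boolean isExcluded(String url, String pattern) {
        return Pattern.compile(pattern).matcher(relative(url)).find();
    }

    public static void main(String[] args) {
        String print = "http://foo.com/bar?print=true";
        String login = "http://foo.com/bar/login.aspx?return_url=zyx";

        System.out.println(isExcluded(print, "print=true"));                    // prints true
        // "." and "?" are regex metacharacters, so this raw pattern does NOT match.
        System.out.println(isExcluded(login, "login.aspx?return_url=zyx"));     // prints false
        // Escaping the metacharacters matches the literal text.
        System.out.println(isExcluded(login, "login\\.aspx\\?return_url=zyx")); // prints true
        // Pattern.quote() escapes the whole string as a literal.
        System.out.println(isExcluded(login, Pattern.quote("login.aspx?return_url=zyx"))); // prints true
    }
}
```

In the settings field itself, the escaped form would be typed with single backslashes: login\.aspx\?return_url=zyx (the doubled backslashes above are only Java string-literal syntax).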

Tip! If "Scan subdomains" is turned on for the domain, put the § sign in front of the exclude pattern to match it against the full URL instead of the relative one. For example, to exclude the "blog" subdomain from the scan, enter this pattern:

§http://blog.foo.bar
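The sketch below illustrates the difference between relative and full-URL matching. It assumes that a pattern without § is searched in the relative URL (path plus query string) and that a pattern with § is searched in the full URL. These rules and the FullUrlExcludeSketch name are illustrative assumptions, not documented product behavior. The code writes § as the Unicode escape \u00A7 so that the source file stays ASCII-only.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical sketch of the section-sign prefix for link excludes.
// Assumption: without the prefix, the pattern is searched in the relative URL;
// with the prefix, it is searched in the full URL, including scheme and host.
class FullUrlExcludeSketch {

    static final String FULL_URL_MARKER = "\u00A7"; // the section sign

    static boolean isExcluded(String url, String pattern) {
        String regex;
        String target;
        if (pattern.startsWith(FULL_URL_MARKER)) {
            regex = pattern.substring(FULL_URL_MARKER.length());
            target = url; // full URL, for example "http://blog.foo.bar/post/1"
        } else {
            URI uri = URI.create(url);
            String query = uri.getRawQuery();
            regex = pattern;
            target = query == null ? uri.getRawPath() : uri.getRawPath() + "?" + query;
        }
        return Pattern.compile(regex).matcher(target).find();
    }

    public static void main(String[] args) {
        String withMarker = FULL_URL_MARKER + "http://blog.foo.bar";
        System.out.println(isExcluded("http://blog.foo.bar/post/1", withMarker));            // prints true
        System.out.println(isExcluded("http://foo.bar/post/1", withMarker));                 // prints false
        System.out.println(isExcluded("http://blog.foo.bar/post/1", "http://blog.foo.bar")); // prints false
    }
}
```

The last line shows why the prefix is needed under these assumptions: without it, the host name is never part of the text being searched.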

Instructions

  1. Click Settings (gear icon) at the top of the Domain Overview page. 

    Note

    The Settings button is only available to site admins.

  2. Click Action on the same row as the domain to scan.

  3. Select Edit Domain from the drop-down list.

    The Edit Domain page opens.

  4. Scroll to the Advanced Domain Options section.

  5. In the Link Excludes section:

    • Search: Enter a search parameter for matching strings within the Link excludes list.
    • Exclude pattern: Enter a pattern to exclude from the scan.

      Note

      The window shows only the first five items. When the list has more than five items, the remaining items are shown on additional pages.

    • + Add: Click + to add a new Exclude pattern. An empty row appears in the list.
    • Delete: Click the trashcan icon to delete an item from the list.
    • Internal URLs:
      • Operator: Click the drop-down arrow to select Contains, Starts with, or Regex.
      • URL: Type a URL in the field.
      • Delete: Click the trashcan icon to delete the row.
      • + Add: Click + to add a new internal URL. An empty row appears in the list.

    Important

    You can exclude a link that is attached to an image. The link is then excluded from the scan. However, the image itself can still appear on the SEO and QA pages as an issue to fix if it does not meet other requirements (for example, missing ALT text).
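The three Internal URLs operators can be modeled as a Java enum, as in the sketch below. It shows how Contains, Starts with, and Regex would typically compare a URL with the value typed in the URL field. The UrlOperator and InternalUrlSketch names, the example URL, and the use of an unanchored Matcher.find() for Regex are assumptions, not confirmed Acquia Optimize behavior.

```java
import java.util.regex.Pattern;

// Hypothetical model of an Internal URLs row: an operator plus the URL value.
// Assumption: Regex uses an unanchored search (Matcher.find()); the product may differ.
enum UrlOperator {
    CONTAINS, STARTS_WITH, REGEX;

    boolean test(String url, String value) {
        switch (this) {
            case CONTAINS:    return url.contains(value);
            case STARTS_WITH: return url.startsWith(value);
            case REGEX:       return Pattern.compile(value).matcher(url).find();
            default:          throw new IllegalStateException("Unknown operator: " + this);
        }
    }
}

class InternalUrlSketch {
    public static void main(String[] args) {
        String url = "https://cdn.foo.com/assets/logo.png";
        System.out.println(UrlOperator.CONTAINS.test(url, "foo.com"));                   // prints true
        System.out.println(UrlOperator.STARTS_WITH.test(url, "https://foo.com"));        // prints false
        System.out.println(UrlOperator.REGEX.test(url, "^https://[a-z]+\\.foo\\.com/")); // prints true
    }
}
```

Contains and Starts with compare plain text, so characters such as . and ? need no escaping there; only the Regex operator interprets them as metacharacters.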

For more information and examples, see the external article:

Class Pattern (java.util.regex.Pattern) by Oracle.

Additional resources

For more information, see the user guide articles:

For advanced instructions on this topic, see the associated article in the Acquia Optimize for Developers collection:

CMS Integration
