---
title: "How can I exclude links and pages from the Web Governance scan?"
date: "2025-02-05T10:52:54+00:00"
summary: "Learn how to exclude specific links and pages from Web Governance scans using link excludes, path constraints, and canonical tags."
image:
type: "article"
url: "/web-governance/help/61226-how-can-i-exclude-links-and-pages-web-governance-scan"
id: "4e9ee94d-eef0-4843-8ff2-40162d1d146f"
---

Answer
------

You can exclude specific links, URL paths, and pages from a scan. Use the following options to stop the scan from following, scanning, and counting those links and pages.

### Link excludes

Link excludes let you specify patterns that instruct the crawler to ignore every URL that matches the pattern. The link is still recorded as present on the page, but it is not checked or followed.

For more information, see the user guide articles:

*   [Path constraints and link exclusions](https://docs.acquia.com/acquia-optimize/docs/admin/configure-scans/path-constraints-and-link-exclusions?cid=1a105)
*   [Ignored links](https://docs.acquia.com/acquia-optimize/docs/features/quality-assurance/links/automatic-ignored-links?cid=cd416).

### Path constraints

These give you the option to control the pages that are covered by the scan. With a _regular expression_, you can include or exclude content from the scan.

Example 1:

If you only want to scan the news section of your website, `https://domain.tld/news`, add the following constraint to restrict the crawler to content there:

    ^/news

Important

The start URL needs to be a part of the constraints. This can be done in either of the following ways:

*   Change the start URL to `https://domain.tld/news`.
*   Add an extra constraint with `^/$`, which includes the front page.
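The following is a minimal sketch of how the two constraints from this example behave as regular expressions. It assumes that each constraint is searched against the URL path and that a page is in scope when at least one constraint matches; the function name `in_scope` and the constraint list are hypothetical and are not part of the product.

```python
import re
from urllib.parse import urlsplit

# Hypothetical constraint list: the two patterns from the example above.
CONSTRAINTS = [r"^/news", r"^/$"]


def in_scope(url, constraints=CONSTRAINTS):
    """Return True if the URL path matches at least one constraint.

    Assumption: constraints are regular expressions searched against
    the path part of the URL only.
    """
    path = urlsplit(url).path or "/"
    return any(re.search(pattern, path) for pattern in constraints)


for url in [
    "https://domain.tld/",                  # front page, matched by ^/$
    "https://domain.tld/news/2025/launch",  # matched by ^/news
    "https://domain.tld/about",             # matched by neither pattern
]:
    print(url, in_scope(url))
```

Under these assumptions, `^/news` also matches paths such as `/newsletter`, because the pattern only anchors the start of the path.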

Example 2:

Suppose you have a search results page and want to remove all the results from the scan. A results page URL could look something like this:

    https://domain.tld/search/results?query=test

You can create a negative constraint to do this. It could look something like this: `!search/results?`
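As a rough illustration, the sketch below treats the text after the leading `!` as a regular expression and searches it against the path plus query string. That interpretation is an assumption, not documented behavior, and the helper `is_excluded` is hypothetical.

```python
import re
from urllib.parse import urlsplit


def is_excluded(url, negative_constraint):
    """Return True if the URL matches the negative constraint.

    Assumption: the leading '!' marks the constraint as negative and
    the rest is a regular expression searched against path + query.
    """
    pattern = negative_constraint[1:] if negative_constraint.startswith("!") else negative_constraint
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return re.search(pattern, target) is not None


constraint = "!search/results?"
print(is_excluded("https://domain.tld/search/results?query=test", constraint))
print(is_excluded("https://domain.tld/news", constraint))
```

In standard regular expression syntax, an unescaped `?` makes the preceding character optional, so under this assumption the pattern matches any URL that contains `search/result`, with or without a query string. Escaping it as `\?` is the usual regular expression way to require a literal question mark; this article does not state whether the product requires that escaping.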

For more information, see the user guide article:

[Path constraints and link exclusions](https://docs.acquia.com/acquia-optimize/docs/admin/configure-scans/path-constraints-and-link-exclusions?cid=1a105).

Canonical links
---------------

Canonical links are a way to indicate the preferred version of a webpage when there are duplicate or similar versions under different URLs. Canonical links are mostly used to help search engines understand which version to index and display in the search results, which improves SEO.

The following external resources provide more information about canonical links:

*   Google: [What is Canonicalization](https://developers.google.com/search/docs/crawling-indexing/canonicalization)
*   Google: [How to specify a canonical with rel="canonical" and other methods](https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls).

Example uses of canonical links:

Example 1: Print version of a page

Consider this URL:

    https://domain.tld/page_id=32

When a print version of this page is created, many content management systems add a print parameter to the URL that looks something like this:

    https://domain.tld/page_id=32?print=yes

In many real-world cases, the content of these two pages is effectively or exactly the same. In the example above, these URLs register as two separate pages for web crawlers or search engines.

A common way to address this is to add a canonical tag to the print page:

    https://domain.tld/page_id=32?print=yes

The tag points to the primary page:

    https://domain.tld/page_id=32

To do this, insert a tag into the head section of the HTML of the print page. Here is an example:

    <link rel="canonical" href="https://domain.tld/page_id=32">

This canonical tag tells web crawlers and search engines that:

*   These pages contain duplicate content.
*   The URL without the print parameter is the main version of the page.
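To check that a page declares the canonical you expect, you can read the tag back out of its HTML. The sketch below uses only the Python standard library; the class `CanonicalFinder` and the function `find_canonical` are hypothetical names for illustration and do not describe how any particular crawler is implemented.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing.
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


print_page = (
    "<html><head>"
    '<link rel="canonical" href="https://domain.tld/page_id=32">'
    "</head><body>Print version</body></html>"
)
print(find_canonical(print_page))
```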

Example 2: Sortable lists

Another example is a page that displays a sortable list of items, for example, a news site with a list of articles or a store with a list of products.

Assume that `https://domain.tld/list` contains a list that you can sort by color, price, or size. The content on the page remains the same, but each sorted version of the page has a unique URL, such as the following:

https://domain.tld/list?sort=**colors**   
https://domain.tld/list?sort=**price**   
https://domain.tld/list?sort=**size**

In this case, you could add a canonical link to the main list like this:

`<link rel="canonical" href="https://domain.tld/list">`.

This tag indicates that the default sort version of the page should be considered the primary version.

Add this canonical tag to alert search engines or web crawlers (like Web Governance) that each of these URLs is actually a link to a page that has the same content.
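The effect of the canonical tag on page counts can be pictured with a short sketch. It assumes that a crawler counts each URL under its canonical target when one is declared; the function `unique_pages` and the sample data are hypothetical and only illustrate the idea.

```python
def unique_pages(canonical_by_url):
    """Collapse scanned URLs onto their canonical target.

    canonical_by_url maps each scanned URL to the href of its canonical
    tag, or to None when the page declares no canonical.
    """
    return sorted({canonical or url for url, canonical in canonical_by_url.items()})


# Hypothetical scan result: every sorted view points at the main list.
scanned = {
    "https://domain.tld/list": "https://domain.tld/list",
    "https://domain.tld/list?sort=colors": "https://domain.tld/list",
    "https://domain.tld/list?sort=price": "https://domain.tld/list",
    "https://domain.tld/list?sort=size": "https://domain.tld/list",
    "https://domain.tld/about": None,
}
print(unique_pages(scanned))
```

In this sample, the five scanned URLs collapse to two pages: the main list and the page without a canonical tag.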

You can read more about canonical links on the following external pages:

*   [https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls](https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls)
*   [https://en.wikipedia.org/wiki/Canonical\_link\_element](https://en.wikipedia.org/wiki/Canonical_link_element).

It is also possible to configure canonical tags to exclude URLs that point to identical content.