How can I exclude links and pages from the Acquia Optimize scan?

Answer

You can exclude specific links, URL paths, and pages from a scan. Use these options to stop the crawler from following, scanning, and counting them.

Link excludes

Link excludes let you specify patterns for links that the crawler should ignore. A link that matches a pattern is still recorded as present on the page, but it is not checked or followed.

Path constraints

Path constraints control which pages the scan covers. Each constraint is a regular expression that includes content in the scan or excludes content from it.

Example 1:

If you only want to scan the news section of your website, https://domain.tld/news, add the constraint

^/news

to restrict the crawler to content under that path.
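The effect of the constraint in Example 1 can be sketched in Python. This is only an illustration under stated assumptions: the in_scope helper is hypothetical, and it assumes the pattern is applied to the path part of each URL using standard regular-expression matching. The article does not describe how Acquia Optimize evaluates constraints internally.

```python
import re
from urllib.parse import urlsplit

# The include constraint from Example 1.
INCLUDE = re.compile(r"^/news")


def in_scope(url: str) -> bool:
    """Return True when the URL path matches the include constraint.

    Assumption: the constraint is tested against the path only.
    """
    path = urlsplit(url).path or "/"
    return INCLUDE.search(path) is not None


for url in (
    "https://domain.tld/news",
    "https://domain.tld/news/2024/launch",
    "https://domain.tld/about",
):
    print(url, in_scope(url))
```

Under standard regular-expression semantics, ^/news also matches a path such as /newsletter. If that is not wanted, a stricter pattern such as ^/news(/|$) limits the match to the news section itself.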

Example 2:

Suppose you have a search results page and want to exclude all of its results from the scan. A results URL might look like this:

https://domain.tld/search/results?query=test

You can create a negative constraint to do this. It could look something like this:

!search/results?
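A negative constraint of this kind can be sketched as follows. The sketch assumes that a leading exclamation mark negates the pattern and that the rest is a standard regular expression tested against the path and query string. Both points are assumptions made for illustration, and the allowed helper is hypothetical rather than part of Acquia Optimize.

```python
import re
from urllib.parse import urlsplit


def allowed(url: str, constraint: str) -> bool:
    """Return True when the URL stays in the scan under one constraint.

    Assumption: a leading "!" turns the pattern into an exclusion.
    """
    negate = constraint.startswith("!")
    pattern = re.compile(constraint[1:] if negate else constraint)

    parts = urlsplit(url)
    # Assumption: the pattern is tested against path plus query string.
    target = parts.path + ("?" + parts.query if parts.query else "")

    matched = pattern.search(target) is not None
    return not matched if negate else matched


CONSTRAINT = "!search/results?"
print(allowed("https://domain.tld/search/results?query=test", CONSTRAINT))
print(allowed("https://domain.tld/news", CONSTRAINT))
```

In standard regular-expression syntax, an unescaped question mark makes the preceding character optional, so this pattern would also match search/result. To match a literal question mark, it is normally escaped as \?.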

For more information, see the user guide article:

Path constraints and link exclusions.

Canonical links

Canonical links are a way to indicate the preferred version of a webpage when there are duplicate or similar versions under different URLs. Canonical links are mostly used to help search engines understand which version to index and display in the search results, which improves SEO.

Example uses of canonical links:

Example 1: Print version of a page

Consider this URL:

https://domain.tld/page_id=32

When a print version of this page is created, many content management systems add a print parameter to the URL, which then looks something like this:

https://domain.tld/page_id=32?print=yes

In many real-world cases, the content of these two pages is effectively or exactly the same. In the example above, these URLs register as two separate pages for web crawlers or search engines.

A common way to address this is to add a canonical tag to the print page:

https://domain.tld/page_id=32?print=yes

that points to the primary page:

https://domain.tld/page_id=32

To do this, insert a link tag into the head section of the print page's HTML. For example:

<link rel="canonical" href="https://domain.tld/page_id=32">

This canonical tag tells web crawlers and search engines that:

  • These pages contain duplicate content.
  • The URL without the print parameter is the main version of the page.
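To check that a page actually declares the canonical URL you expect, you can read the tag from its HTML. The following is a minimal sketch that uses only the Python standard library. The CanonicalFinder class, the find_canonical helper, and the PRINT_PAGE sample markup are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical markup for the print version of the page.
PRINT_PAGE = """<html><head>
<title>Page 32 (print)</title>
<link rel="canonical" href="https://domain.tld/page_id=32">
</head><body>...</body></html>"""

print(find_canonical(PRINT_PAGE))
```

If the function returns None for a print page, the page has no canonical tag and crawlers are likely to treat it as a separate page.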

Example 2: Sortable lists

Another example is a page that displays a sortable list of items, for example, a news site with a list of articles or a store with a list of products.

Assume that https://domain.tld/list contains a list that you can sort by color, price, or size. The content of the page remains the same, but each sorted version has a unique URL, such as the following:

https://domain.tld/list?sort=colors
https://domain.tld/list?sort=price
https://domain.tld/list?sort=size

In this case, you could add a canonical link to the main list like this:

<link rel="canonical" href="https://domain.tld/list">

This tag indicates that the default sort version of the page should be considered the primary version.

Add this canonical tag to alert search engines or web crawlers (like Acquia Optimize) that each of these URLs is actually a link to a page that has the same content.
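The way a crawler can use this information may be sketched as follows: URLs that declare the same canonical URL are grouped and treated as one page. The DECLARED mapping is hypothetical sample data standing in for canonical tags already read from each page, and the grouping rule is an illustration, not a description of how Acquia Optimize counts pages.

```python
from collections import defaultdict

# Hypothetical crawl data: page URL -> canonical URL declared on that page.
# None means the page has no canonical tag.
DECLARED = {
    "https://domain.tld/list": "https://domain.tld/list",
    "https://domain.tld/list?sort=colors": "https://domain.tld/list",
    "https://domain.tld/list?sort=price": "https://domain.tld/list",
    "https://domain.tld/list?sort=size": "https://domain.tld/list",
    "https://domain.tld/about": None,
}


def group_by_canonical(declared):
    """Group page URLs under the canonical URL they point to.

    A page without a canonical tag forms its own group.
    """
    groups = defaultdict(list)
    for url, canonical in declared.items():
        groups[canonical or url].append(url)
    return dict(groups)


for canonical, urls in group_by_canonical(DECLARED).items():
    print(canonical, len(urls))
```

With this sample data, the four list URLs collapse into a single group, while the page without a canonical tag remains on its own.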

It is also possible to configure canonical tags to exclude URLs that point to identical content.
