The Problem: Unwanted URLs in Google Index
Managing indexed URLs is a crucial part of SEO, especially when Google indexes unwanted pages like dynamically generated or shopping cart URLs. Google's John Mueller recently provided expert advice on handling such issues efficiently.
An SEO audit revealed that more than half of a client’s 1,430 indexed pages were either paginated URLs or ‘add to cart’ URLs. These URLs often contain query parameters and look like this:
example.com/product/page-5/?add-to-cart=example
Despite using the rel=canonical tag to suggest the correct URL for indexing, Google continued to index the unwanted pages. This illustrates a common SEO issue: Google treats canonical tags as hints, not strict directives.
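For reference, a canonical link element on one of these parameterized pages would point back to the preferred URL; the target path below is illustrative rather than taken from the audited site:
<link rel="canonical" href="https://example.com/product/">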
Proposed SEO Solution: Noindex & Robots.txt
To fix this, the SEO suggested:
- Applying a noindex tag to all unwanted pages.
- Blocking these URLs in robots.txt once they are deindexed (the order matters: blocking the URLs first would stop Google from recrawling them and seeing the noindex tag).
However, John Mueller had a different take on this approach.
John Mueller’s SEO Advice
Mueller emphasized that blindly applying a general fix is ineffective. Instead, he recommended analyzing the URLs for patterns and implementing a specific solution tailored to the website. Here’s his approach:
Block ‘Add to Cart’ URLs Using Robots.txt
Since these URLs serve no purpose in search results, blocking them at the crawl level is ideal; a sample robots.txt rule appears under Best Practices below.
Address Pagination and Filtering Issues
If indexed URLs are a result of faceted navigation, site owners should consult Google’s official documentation on handling URL parameters.
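As a rough illustration only (the color and size parameters are hypothetical, and whether blocking faceted URLs is appropriate depends on the site, per Google's faceted navigation guidance), filter-generated URLs can be kept out of crawling with wildcard rules such as:
User-agent: *
Disallow: /*?*color=
Disallow: /*?*size=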
Understand Why Google is Indexing These URLs
Investigating why Google is indexing dynamic URLs can reveal underlying issues related to the shopping cart platform.
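A quick way to see which of these URLs Google has indexed is a site: query combined with the inurl: operator (example.com stands in for the actual domain):
site:example.com inurl:add-to-cart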
Why Does Google Index URLs with Query Parameters?
Google sometimes indexes pages with query parameters due to:
- Poor internal linking structure.
- Lack of proper robots.txt implementation.
- Canonical tags that Google chooses to ignore.
Best Practices to Prevent Unwanted URL Indexing
Use Robots.txt to Block Crawling
- Example:
User-agent: *
Disallow: /*?add-to-cart=*
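This rule relies on the wildcard support in Google's robots.txt parsing and matches URLs whose query string begins with add-to-cart. If the parameter can also appear after other parameters, a broader rule (again illustrative) would be needed:
Disallow: /*?*add-to-cart=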
Implement Meta Noindex for Non-Essential Pages
- Example:
<meta name="robots" content="noindex, follow">
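Where editing page templates is impractical, the same directive can be sent as an HTTP response header instead of a meta tag; the exact server configuration depends on the stack and is not covered in the original article:
X-Robots-Tag: noindex, follow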
URL Parameters in Google Search Console
- Note that Google retired the URL Parameters tool (formerly under Legacy Tools > URL Parameters) in 2022, so parameter handling can no longer be configured in Search Console; rely on robots.txt rules, noindex, and canonical signals instead.
Use Internal Linking Wisely
- Avoid linking internally to URLs with query parameters; point internal links at the clean, canonical versions instead, as illustrated below.
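A small illustration (the paths are placeholders): internal links should target the clean URL rather than a parameterized variant.
<!-- Preferred: links to the canonical product page -->
<a href="https://example.com/product/">View product</a>
<!-- Avoid: links to an add-to-cart variant that should not be indexed -->
<a href="https://example.com/product/?add-to-cart=example">View product</a>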
Conclusion
Managing indexed URLs is essential for maintaining a clean, high-quality website structure. Instead of relying solely on canonical tags, SEO experts should use a combination of robots.txt rules, noindex tags, and Search Console's reporting and removal tools. By implementing a tailored approach, websites can ensure that only relevant pages appear in search results.
For further details, read John Mueller’s official advice.
Credits
- Article by Roger Montti, originally published on Search Engine Journal.