Thursday 11 April 2013

Mitigating Mixed Signals: Effectively Consolidating Paginated URLs

Pagination has long been a topic of discussion within the SEO community. Be it an e-commerce product category, an aggregation of blog posts that meet given criteria, or a lengthy news article, there are countless examples of pagination across the web. Why paginate? Well, from a usability standpoint, it makes perfect sense. No endless scrolling for users. Faster page load time. Fewer links on the first page (and subsequent pages) in the sequence. A better experience for mobile users. The list goes on. Notice the overlap between user experience and SEO best practices in pagination.
While many have been down the cobblestone alley of pagination issue remediation, I thought it might be beneficial to take a look at the different signals we use (correctly and incorrectly) to tell Google that our content is paginated. Specifically, we'll be looking at the rel="prev" and rel="next" link elements, the implementation of the rel="canonical" link element, and the use of URL parameter suggestions in Google Webmaster Tools. Let's have at it.

Pagination. Simplified.

With Google's September 2011 announcement of the rel="prev" and rel="next" link elements came a collective sigh of relief from SEOs the world over. Rather than leave pagination up to Googlebot's interpretation, we could now provide some direction in the code.
While the implementation here is likely burned into the back of your brain, let's take a second to look at a use case. The fictitious URL we'll be working with (throughout this post) is http://www.shinybucketsthatcarrydreams.com. Don't waste your time struggling to find a reference (I promise you; there is none). Let's say that this is an e-commerce site that sells buckets of very rare varieties, and buckets are segmented into categories. One such category - we'll say 'Happy Buckets for the Forlorn' - has 40 different types of bucket. That's a lot of buckets! And rare buckets, no less.
[Image: one of the shiny red buckets offered by the site owners in our pagination example.]
Rather than displaying all of these happy, hope-instilling buckets on one category page, the bucketeers (who manage their own site) decide that they'd like to display 10 buckets per page within the category. For now, they want to keep things simple - no advanced sorting options, view all page, etc. Rather than creating additional directories for each page (e.g., /page/2), they decide to use the page URL parameter to paginate the category. Since there are 40 buckets, with 10 buckets to be displayed on each page, they'll need four pages in total.
As bucketeers who are mindful of SEO (as most bucketeers are), these guys decide to implement the rel="prev" and rel="next" link elements in the head of each page. Let's see how it's done.

Page 1 - http://www.shinybucketsthatcarrydreams.com/happy-buckets
Notice that the first page (the top-level category page) URL doesn't include the page query parameter. Frankly, this is because it doesn't need it. We, the users, and the backend of the website understand that this is the first page in the sequence. Remember, do what you can to keep those URLs squeaky clean!
Okay, onto the HTML. In the <head>, we should encounter the following link element:
<link rel="next" href="http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=2"/>
The presence of this element, along with the absence of a rel="prev" link element, tells crawlers that this is the first page in a series of paginated URLs, the next URL in the sequence being that showcased in the href attribute.
Page 2 - http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=2
In the <head> of the second page in the series, we are greeted with:
<link rel="prev" href="http://www.shinybucketsthatcarrydreams.com/happy-buckets"/>
<link rel="next" href="http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=3"/>
Here, we have two link elements - one with rel="prev"; the other with rel="next". In this case, the rel="prev" link element points to the page before in the series, while the rel="next" link element points to the subsequent page in the series.
Page 3 - http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=3
It looks like more of the same on Page 3:
<link rel="prev" href="http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=2"/>
<link rel="next" href="http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=4"/>
Notice that our href attribute in the rel="prev" link element now points to a URL with the page parameter. On Page 2, this wasn't the case, because the prior page (Page 1) was the top-level category page.
Page 4 - http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=4
It is on Page 4 that we find our finish:
<link rel="prev" href="http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=3"/>
Since there is no next page in the sequence, Page 4 only requires the rel="prev" link element. Pretty intuitive, right?
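
The four-page walk-through above can be sketched as a small helper. This is only an illustration, not code from the bucketeers' site: the function name pagination_links and its arguments are made up for this example, and it assumes (as in the walk-through) that Page 1 lives at the clean category URL with no page parameter.

```python
import math
from urllib.parse import urlencode


def pagination_links(category_url, page, total_pages, param="page"):
    """Return the rel="prev"/rel="next" link elements for one page in a series."""
    if not 1 <= page <= total_pages:
        raise ValueError("page is outside the series")

    def page_url(n):
        # Page 1 is the clean top-level category URL, with no page parameter.
        return category_url if n == 1 else f"{category_url}?{urlencode({param: n})}"

    links = []
    if page > 1:  # every page except the first points back
        links.append(f'<link rel="prev" href="{page_url(page - 1)}"/>')
    if page < total_pages:  # every page except the last points forward
        links.append(f'<link rel="next" href="{page_url(page + 1)}"/>')
    return links


CATEGORY = "http://www.shinybucketsthatcarrydreams.com/happy-buckets"
TOTAL_PAGES = math.ceil(40 / 10)  # 40 buckets at 10 per page gives four pages

for n in range(1, TOTAL_PAGES + 1):
    print(f"Page {n}:", *pagination_links(CATEGORY, n, TOTAL_PAGES), sep="\n  ")
```

The two if checks are the whole trick: the first page skips rel="prev", the last page skips rel="next", and every interior page gets both.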

For interior pages - or those pages that aren't the first or last - it might help to think of your link element implementation in terms of building a good, succulent burger, where the page you're working on is the burger, and the link elements are the fresh sesame bun halves. Your goal is to "sandwich" the current page with your link elements. Okay, admittedly, this might be a stretch; however, there's nothing wrong with developing some strange association between pagination and grilled meat. Yum.

Canonical Conundrum

Okay, so, our bucketeers have the correct link elements implemented, and Google is eating it up. Rather than indexing each page in the sequence as its own entity, Google consolidates the series' indexing properties and returns the most relevant page (typically the first page). All is well in Bucketville, USA. As fate would have it, though, there comes a time when the bucketeers see change on the crimson horizon. They come across another successful bucket selling website and notice that, on this site, users can sort each page by bucket color. Instantly, they fall in love with the idea and decide that it's high time that their site had a sorting feature.
Sorting by color makes perfect sense. They decide to use the sort URL parameter, with red, blue, and green as its possible values. When a user selects a bucket color (blue, for example), the page will reload with the buckets sorted by color, with the blue buckets appearing at the top. Additionally, this will add ?sort=blue to the URL. Some URLs that color-sorted pages in our series might generate are:
  • http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=2&sort=green (Page 2)
  • http://www.shinybucketsthatcarrydreams.com/happy-buckets?sort=red (Page 1)
  • http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=4&sort=blue (Page 4)
At this point, it is important to note that, while the page parameter changes the content that appears at each URL, the sort parameter simply reorders the content. This is an important distinction to make when dealing with URL parameters and canonicalization. In fact, it's something that the bucketeers need to consider immediately. If a user happens to link to a sorted page (one whose URL contains the sort parameter), the webmasters could have a duplicate content problem. While the rel="next" and rel="prev" link elements signal that the page parameter causes pagination (and thus different content), there is no indication as to how Google should handle the sort parameter. This is where rel="canonical" proves useful in dealing with pagination.
Let's look at a couple of pages in our series and see how the rel="canonical" link element can be utilized to combat duplicate content issues.

Page 1 (Sorted by Red) - http://www.shinybucketsthatcarrydreams.com/happy-buckets?sort=red
Okay, so, we've decided that the sort parameter doesn't actually change the page's content, but rather reorders it. Thus, the content on a page at a URL with the parameter will be the same as the content on a page at that same URL without the parameter. Smell a canonical implementation? I do. Let's see how the rel="canonical" link element complements our rel="next" link element.
<link rel="canonical" href="http://www.shinybucketsthatcarrydreams.com/happy-buckets"/>
<link rel="next" href="http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=2&sort=red"/>
Our rel="canonical" link element tells Google that the page at this URL (with the sort parameter) is a duplicate of the page at the top-level category URL (in the href attribute), and that the top-level category URL is the preferred (canonical) version. While this is a general best practice when it comes to canonical URLs, it will prove to be even more useful when we get to dynamically generated rel="next" and rel="prev" link tags. Let's explore this concept now.
Note: In our case, the sort parameter reorders the content on a page-by-page basis, rather than the entire category of pages. This means that the content on each page - though sortable - will not change. If the sort parameter were to reorder the entire category - listing buckets of the chosen color on Page 1 and other-colored buckets on subsequent pages - we'd need to treat it as a parameter that changes page content and include it in the href URLs of our canonical link elements.
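
To make the rule concrete, here is a rough sketch of how a canonical URL could be derived by dropping any parameter that only reorders a page. The names canonical_url and REORDER_ONLY_PARAMS are invented for this example, and the set of reorder-only parameters is an assumption you would have to confirm for your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption for this example: sort only reorders a page's content.
REORDER_ONLY_PARAMS = {"sort"}


def canonical_url(url, reorder_only=REORDER_ONLY_PARAMS):
    """Drop reorder-only parameters; keep the ones that change content (e.g. page)."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in reorder_only]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def canonical_link(url):
    """Render the rel="canonical" link element for a working URL."""
    return f'<link rel="canonical" href="{canonical_url(url)}"/>'


print(canonical_link("http://www.shinybucketsthatcarrydreams.com/happy-buckets?sort=red"))
print(canonical_link("http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=2&sort=green"))
```

If sort reordered the entire category instead, you would pass an empty set for reorder_only, and the parameter would stay in the canonical URL, in line with the note above.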
Page 2 (Sorted by Green) - http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=2&sort=green
Again, we'll want to check out our implementation of the rel="canonical" link tag with rel="next" and rel="prev". At this juncture, however, we're going to assume that the "next" and "prev" link tags are being generated dynamically. So, essentially, when the page reloads, it will do so with these link tags set to:
<link rel="canonical" href="http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=2"/>
<link rel="prev" href="http://www.shinybucketsthatcarrydreams.com/happy-buckets?sort=green"/>
<link rel="next" href="http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=3&sort=green"/>

Notice that, while the value of the page parameter changes accordingly within the href attributes, the value of the sort parameter in the rel="prev" and rel="next" link elements stays the same. This is where our rel="canonical" implementation pays off. If we were to load either of these URLs, we'd see that the preferred version of the page exists at the same URL, minus the sort parameter. Effectively, we're sending the search engines that support these tags the same signals. Yippee!
Note: This implementation of rel="next" and rel="prev" - wherein all parameters from the working URL are included in the href URLs - is actually recommended. In this case, given that the working URL contains a sort parameter, it would be inadvisable to chop that parameter from the href URLs.
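
Here is one way the dynamic generation described above might look. Treat it as a sketch under stated assumptions rather than the site's real code: sorted_page_links and rebuild are hypothetical names, the total page count is passed in by hand, and Page 1 is assumed to be addressed without the page parameter, just like the top-level category URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def rebuild(url, query):
    """Return url with its query string replaced by the given (key, value) pairs."""
    return urlunsplit(urlsplit(url)._replace(query=urlencode(query)))


def sorted_page_links(url, total_pages, page_param="page", reorder_only=("sort",)):
    """Build canonical, prev and next link elements for a working URL."""
    query = parse_qsl(urlsplit(url).query)
    page = int(dict(query).get(page_param, 1))
    others = [(key, value) for key, value in query if key != page_param]

    def neighbour(n):
        # Keep every other parameter from the working URL (sort included).
        # Assumption: page 1 is addressed without the page parameter.
        return rebuild(url, ([(page_param, n)] if n > 1 else []) + others)

    # The canonical URL drops reorder-only parameters but keeps the page.
    canonical = rebuild(url, [(k, v) for k, v in query if k not in reorder_only])
    links = [f'<link rel="canonical" href="{canonical}"/>']
    if page > 1:
        links.append(f'<link rel="prev" href="{neighbour(page - 1)}"/>')
    if page < total_pages:
        links.append(f'<link rel="next" href="{neighbour(page + 1)}"/>')
    return links


WORKING_URL = "http://www.shinybucketsthatcarrydreams.com/happy-buckets?page=3&sort=green"
print(*sorted_page_links(WORKING_URL, total_pages=4), sep="\n")
```

The sort parameter rides along in the prev and next URLs, while the canonical URL sheds it, so each sorted page in the chain still points back to its own unsorted twin.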

Exercise care and caution when using the canonical link tag with paginated URLs. Unless you're utilizing a view all page, stay away from pointing all pages in a series to a single page. Doing so will result in the majority of your content (from all pages except the preferred version) being excluded from Google's index. This is not consolidation. It's oversimplification and cannibalization.
For more on using rel="canonical" with paginated URLs, check out Google's Pagination documentation. Maile Ohye's video (at the top) is especially helpful!
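
If you want a quick sanity check against the mistake described above, something like the following could flag pages whose canonical points at a different page in the series. It is a rough sketch: collapsed_canonicals is a made-up helper, the URL-to-canonical mapping is typed in by hand, and it assumes there is no view all page (which would be a legitimate canonical target).

```python
from urllib.parse import parse_qs, urlsplit


def page_number(url, param="page"):
    """Read the page number from a URL; a missing parameter means page 1."""
    values = parse_qs(urlsplit(url).query).get(param)
    return int(values[0]) if values else 1


def collapsed_canonicals(canonicals, param="page"):
    """Return the URLs whose canonical points at a different page in the series."""
    return [url for url, target in canonicals.items()
            if page_number(url, param) != page_number(target, param)]


BASE = "http://www.shinybucketsthatcarrydreams.com/happy-buckets"

# Every page canonicalised to Page 1: the setup to stay away from.
risky = {BASE: BASE, BASE + "?page=2": BASE, BASE + "?page=3": BASE}
print(collapsed_canonicals(risky))

# Each page canonicalised to its own unsorted URL: nothing is flagged.
safe = {BASE + "?page=2&sort=green": BASE + "?page=2", BASE + "?sort=red": BASE}
print(collapsed_canonicals(safe))
```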

URL Parameters, People

Buckets of rain become buckets of golden sunshine, and the bucketeers are seeing major success with their new sorted setup. As a final measure of caution, they decide to check their working URL parameters in Google Webmaster Tools. They look specifically at the page and sort parameters, which have already been added to GWT. Their goal is to make sure that the "suggestions" that they're offering Google with their URL parameter setup coincide with the signals being sent via the rel="canonical", rel="prev", and rel="next" link elements.
First, they look to edit the page parameter setup. Let's take a peek at the settings.




Pretty simple. We're indicating that the page parameter changes page content by pagination, and that Googlebot should crawl every URL for which the parameter value differs as a unique URL. The 'Let Googlebot decide' (default) option would probably work here, as well. Again, exercise caution when implementing these parameter recommendations. A small misstep can create massive indexation issues.
Now, let's move on to the sort parameter settings - perhaps a bit less intuitive than those for page.


[Screenshot: settings for the sort parameter in GWT.]


Since the sort parameter does reorder the page content, we don't want to lie to Google and indicate otherwise. However, as is indicated by the signals we've discussed thus far, we don't want Googlebot to crawl URLs with the sort parameter as unique URLs. While the sort option is great for users, serving the same content with specific sorts enabled in the search engines isn't something we're aiming to do. So, in this case, we recommend that Googlebot crawl no URLs that contain the sort parameter.
This setting and a disallow in our robots.txt for URLs that contain the sort parameter are the primary preventative measures that we can take against these URLs being indexed. If Google should happen to forgo one of these recommendations, we have other signals in place to provide direction.
Note: It is incredibly important that you first consider the where and what of your site's URL parameter configuration. Where is the given parameter used throughout the site? What does it do when used in these different circumstances? In our case, the sort parameter is used exclusively to sort our buckets by color, so we're safe in setting this recommendation. However, if it were to be used elsewhere in some different way - say, a way that actually changes page content - we'd have to be more careful in implementing our recommendations.
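
One way to keep these suggestions honest is to write them down next to your code and check URLs against them. The table below is only a hand-written mirror of the two settings discussed in this section; it is not a GWT export or API, and PARAMETER_SETTINGS and suggested_for_crawl are names invented for this sketch.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical, hand-maintained mirror of the two parameter settings above.
PARAMETER_SETTINGS = {
    "page": {"effect": "paginates", "crawl": "every URL"},
    "sort": {"effect": "sorts", "crawl": "no URLs"},
}


def suggested_for_crawl(url, settings=PARAMETER_SETTINGS):
    """True unless the URL carries a parameter whose setting is 'no URLs'."""
    names = {key for key, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)}
    return all(settings.get(name, {}).get("crawl") != "no URLs" for name in names)


BASE = "http://www.shinybucketsthatcarrydreams.com/happy-buckets"
for url in (BASE, BASE + "?page=2", BASE + "?page=2&sort=green"):
    print(url, "->", "crawl" if suggested_for_crawl(url) else "skip")
```

Every URL the check says to skip should be one that your rel="canonical" element already points away from, which is a cheap way to spot mixed signals before Google does.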
--
In our efforts to consolidate paginated URLs, it is vital that we avoid sending mixed signals to Google and other search engines. By using the rel="canonical", rel="next", and rel="prev" link elements in cooperation with GWT URL parameter settings, we can accomplish just this. Remember, every case will be a tad different. Be careful, but don't be afraid. There has never been a better time to cross "deal with paginated duplicate content" off your SEO bucket list.

Reference :- http://www.seomoz.org/ugc/mitigating-mixed-signals-effectively-consolidating-paginated-urls
