Posted by randfish
Lately I’ve been surprised to hear concerns from a number of SEOs that using the canonical URL tag on the canonical version of the page can somehow cause problems. When I’ve talked to folks about it, there seems to be confusion that only duplicates should use the rel="canonical" specification and the original must remain rel="canonical"-free. This isn’t the case.
Let’s look at a few diagrams to help explain:

This is the standard way rel=canonical is employed. Different versions of a page, whether on your own site, on partner sites, or in places where you’re licensing content (note: this is an update Google launched on Dec. 17th, 2009), can all reference back to the original to help tell the search engines where to find that piece. However, it’s also perfectly OK to do this:

Looking through Google’s blog post on the subject, this isn’t explicitly stated. However, you can see that even the example website, Wikia, employs this practice on the page Google points out. You can also see Googler Maile Ohye answering a comment on this:
@Wade: Yes, it’s absolutely okay to have a self-referential rel="canonical". It won’t harm the system and additionally, by including a self-reference you better ensure that your mirrors have a rel="canonical" to you.
Maile’s got really good advice here. If you run into situations where third parties are referencing your posts and appending strings of data to the URL, it can be really helpful to have the canonical URL tag on these pages by default. In fact, we’ve worked with many companies recently who found it helpful to employ the tag sitewide as a best practice, just to prevent future iterations or less SEO-savvy development from producing versions of the page that lack the rel=canonical and potentially lose link juice or cause canonicalization issues.
One last piece – it’s a really, really good way to make sure Google indexes the http rather than https version of your page (and counts link juice to the proper one). This had historically been a royal pain in the butt for many SEOs, and we’ve heard enough positive stories now to feel confident recommending it.
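A minimal sketch of the sitewide approach described above, in Python. The function name `canonical_link_tag`, the choice of http as the preferred scheme, and the example URLs are assumptions for illustration; none of them come from Google's post. The sketch drops whatever query string or fragment a third party appended, forces the preferred scheme, and emits the same tag on every variant, including the canonical page itself.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(requested_url: str, preferred_scheme: str = "http") -> str:
    """Build a rel="canonical" tag for whatever URL variant was requested.

    Assumption: the query string and fragment never change the content,
    so both are dropped. A site that serves distinct content per query
    parameter would need an allow-list of parameters instead.
    """
    parts = urlsplit(requested_url)
    canonical = urlunsplit(
        (preferred_scheme, parts.netloc.lower(), parts.path or "/", "", "")
    )
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


# The canonical page and a tracked https variant yield the same tag.
print(canonical_link_tag("http://www.example.com/post"))
print(canonical_link_tag("https://www.example.com/post?ref=partner#comments"))
```

Because the tag is computed from the request rather than hard-coded per page, a self-referential canonical falls out for free on the original URL.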
Welcome to 2010! Hope everyone had a great holiday break
What Makes a Quality URL?
12/03/09
In setting up a website, the URL is one of the most important elements. As Frank Bauer of Web Tuning Garage points out, webmasters need to think about the URL before they get their site up and running.
In an interview with WebProNews, Bauer advises webmasters to consider the following factors:
- What makes a quality URL?
- Canonicalization issues
- Site taxonomy
- Keywords
One of the most problematic areas with URLs is with canonicalization. Bauer said the most difficult canonicalization issue is “www” versus “non-www.”
To avoid other obstacles, Bauer advises against exposing too much of your underlying technology in your URLs; this bears on decisions about extensionless URLs, file names, and directories.
Ultimately, a domain needs to possess usable, typeable, and guessable qualities. Learn more about quality URLs in the above interview with Frank Bauer.
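The "www" versus "non-www" issue Bauer mentions is commonly handled by picking one host and permanently redirecting the other to it. A rough sketch in Python, where `preferred_host_redirect`, the choice of the www form as the preferred one, and the `example.com` domain are all assumptions made for illustration:

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def preferred_host_redirect(url: str, bare_domain: str = "example.com") -> Optional[str]:
    """Return the URL to 301-redirect to, or None if no redirect is needed.

    Assumption: the www form is preferred. Swap the logic if the site
    standardizes on the bare domain instead.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() != bare_domain:
        return None  # already on www (or on a different host entirely)
    return urlunsplit(
        (parts.scheme, "www." + bare_domain, parts.path or "/", parts.query, parts.fragment)
    )


print(preferred_host_redirect("http://example.com/books?page=2"))
```

Which of the two hosts wins matters less than choosing one and applying it consistently.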
Posted by Nick Gerner
As we rapidly approach the end of 2009 and the opening of 2010, we’ve got a much-anticipated index update ready to roll out, gang. Say it with me: "twenty-ten". Oh yeah, I’m so gonna get a flying car and a cyberpunk android.
…Ahem. I thought this would be a great time to take a look back at the year and ask, "where did all those pages go?" Being a data-driven kind of guy, I want to take a look at some numbers about churn, freshness and what it means for the size of the web and web indexes over the last year, and the hundreds of billions, indeed trillion plus urls we’ve gotten our hands on.
This index update has a lot going on, so I’ve broken things out section by section:
- Analysis of the Web’s Churn (or why having ten trillion URLs isn’t very useful)
- Canonicalization, De-Duping & Choosing Which Pages to Keep
- Statistics on our December Linkscape Update
- New Updates to the FREE SEOmoz API (and a 90% price drop on the paid API)
An Analysis of the Web’s Churn Rate
Not too long ago, at SMX East, I heard Joachim Kupke (senior software engineer on Google’s indexing team) say that "a majority of the web is duplicate content". I made great use of that point at a Jane and Robot meet up shortly after. Now, I’d like to add my own corollary to that statement: "most of the web is short-lived".

After just a single month, a full 25% of the URLs are what we call "unverifiable". By that I mean that the content was either duplicate, included session parameters, or for some reason could not be retrieved (verified) again (404s, 500s, etc.). Six months later, 75% of the tens of billions of URLs we’ve seen are "unverifiable" and a year later, only 20% qualifies for "verified" status. As Rand noted earlier this week, Google’s doing a lot of verifying themselves.
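To make the three "unverifiable" conditions concrete, here is a rough Python sketch. It is not Linkscape's actual pipeline: the function name `is_verifiable`, the list of session parameter names, and the use of a content hash to detect duplicates are all assumptions for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical session parameter names; a real crawler would use a longer list.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def is_verifiable(url: str, status_code: int, content_hash: str, seen_hashes: set) -> bool:
    """Apply the three "unverifiable" conditions to one re-fetched URL."""
    if status_code >= 400:  # could not be retrieved again (404s, 500s, etc.)
        return False
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if {key.lower() for key in query} & SESSION_PARAMS:  # carries a session parameter
        return False
    if content_hash in seen_hashes:  # duplicate of content already kept
        return False
    seen_hashes.add(content_hash)  # first time this content is seen: keep it
    return True


seen = set()
print(is_verifiable("http://example.com/a", 200, "hash-1", seen))
print(is_verifiable("http://example.com/a?PHPSESSID=abc", 200, "hash-2", seen))
```

Re-running a check like this over the same URL set at one, six, and twelve months is the kind of measurement behind the percentages quoted above.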
To visualize this dramatic churn, imagine the web six months ago…
Using Joachim’s point, plus what we’ve observed, that six-month old content today looks something like this:

What this means for you as a marketer is that some of the links you build and content you share across the web are not permanent. If you engage heavily with high-churn portions of the web, the statistics you monitor over time can vary pretty wildly. It’s important to understand the difference between getting links (and republishing content) in places that will make a splash now, but fade away, versus engaging in lasting ways. Of course, both are important (as high-churn areas may drive traffic that turns into more permanent value), but the distinction shouldn’t be overlooked.
Canonicalization, De-Duping & Choosing Which Pages to Keep
Regarding Linkscape’s indices, we capture both of these cases:
- We’ve got an up-to-date crawl including fresh content that’s making waves right now. Blogscape helps power this, monitoring 10 million+ feeds and sending those back to Linkscape for inclusion in our crawl.
- We include the lasting content which will continue to support your SEO efforts by analyzing which sites and pages are "unverifiable" and removing these from each new index. This is why our index growth isn’t cumulative — we re-crawl the web each cycle to make sure that the links + data you’re seeing are fresh and verifiable.
To put it another way, consider the quality of most of the pages on the web, as measured, for instance, by mozRank:
I think the graph speaks for itself. The vast majority of pages have very little "importance" as defined by a measure of link juice. So it doesn’t surprise me (now at least) that most of these junk pages are disappearing after not too long. Of course, there are still plenty of really important pages that do stick around.
But what does this say about the pages we’re keeping? First off, let’s take out any discussion of the pages that we saw over a year ago (as we’ve seen above, there’s likely less than 1/5th of them remaining on the web). In just the past 12 months, we’ve seen between 500 billion and well over 1 trillion pages depending on how you count it (via Danny at Search Engine Land).
So in just a year we’ve provided 500 billion unique urls through Linkscape and the Linkscape powered tools (Competitive Link Finder, Visualization, Backlink Analysis, etc.). And what’s more, this represents less than half of the URLs we’ve seen in total, as the "scrubbing" we do for each index cuts approx. 50% of the "junk" (including canonicalization, de-duping, and straight tossing for spam and other reasons). There’s likely many trillions of URLs out there, but the engines (and Linkscape) certainly don’t want anything close to all of these in an index.
Linkscape’s December Index Update:
From this latest index (compiled over approx. the last 30 days) we’ve included:
- 47,652,586,788 unique URLs (47.6 billion)
- 223,007,523 subdomains (223 million)
- 58,587,013 root domains (58.6 million)
- 547,465,598,586 links (547 billion)
We’ve checked that all of these URLs and links existed within the last month or so. And I call out this notion of "verified" because we believe that’s what matters for a lot of reasons:
- Our own research on how search engines rank documents
- Your impact on the web (as in traditional marketing) and ability to compare progress over time
- Sharing reliable, trust-worthy data with customers, both for self and competitive analysis
- Measuring progress and areas for improvement in search acquisition and SEO
I hope you’ll agree. Or, at least, share your thoughts
New Updates to the Free & Paid Versions of our API
I also want to give a shout-out to Sarah, who’s been hard at work on repackaging our site intelligence API suite. She’s got all kinds of great stuff planned for early in the coming year, including tons of data in our free APIs. Plus she’s dropped the prices on our paid suite by nearly 90%.
Both of these items are great news to some of our many partners, including:
- Buzzstream – a tool for social media, PR and link management
- Brandwatch – a reputation monitoring tool
- Grader.com – Hubspot’s popular site analysis tool
- Quirk’s Search Status Bar
- And at least three of these top "10 Link Building Tools for Tracking Inbound Links"
Thanks to these partners we’ve doubled the traffic to our APIs to over 4 million hits per day, more than half of which are from external partners! We’re really excited to be working with so many of you.
A Dozen Don’ts for SEOs
07/02/09
Posted by randfish
I’m not always a fan of Guy Kawasaki’s work, but really enjoyed his post on the OPEN Forum – A Dozen Don’ts for Entrepreneurs. I thought I’d take a stab at replicating it with some of my biggest warnings for those in our field.
For the list below, the word "clients" is interchangeable with "marketing manager" or "executive team" for in-house SEOs.
- Don’t Create False Expectations
Clients are just like everyone else – when you exceed their expectations, they love you. When you disappoint, they’re angry. Make it easy for yourself and don’t oversell. If anything, undersell your abilities to do great things and let them be surprised. It’s a hard thing to do, particularly in a competitive bidding environment, but humility and hard work often shine through in presentations and good clients will see that and honor it.
- Don’t Ignore Analytics
Website analytics, both visitor traffic and third party metrics, are important parts of SEO. When things are going well, even if best practices aren’t being followed, it can be wise to match up data and trends to see what’s made a real difference. Don’t undertake an SEO project unless you have at least the essential data points (this also comes in handy once changes have been implemented and your work starts to have an impact).
- Don’t Always Take Your Client at Their Word
If you talk to lots of clients, you’ll find that none of them have ever spammed the engines, bought a link, accidentally cloaked for Googlebot or hidden text, yet the statistics tell another story. Never assume your clients are being dishonest, but always watch out for activities they might not be aware of (or might not have realized were problematic). This goes beyond just white and black hat – we had a client who thought they had a couple dozen active domains; turns out they had nearly a hundred – canonicalization alone has been a big project and a big return.
- Don’t Get Into Projects with People You Don’t Like
If ever you get a "funny feeling" about a client, move on if you can possibly afford it. Some people just don’t click together, and when interpersonal relationships aren’t working, projects have a way of not working out, either. It’s always better to get out before something’s signed than after.
- Don’t Give an Unqualified Answer Unless You’re Extremely Certain You’re Right
If you’ve been reading SEOmoz lately or hearing me speak at conferences, you’ll notice that my advice comes with a lot more caveats than it used to. It’s been a tough lesson, but there’s very rarely a "this is ALWAYS better than that" in the field of SEO. Exceptions abound, so couch your language accordingly.
- Don’t Confuse SEO & Sales
If your client comes to you wanting to drive sales with SEO, make sure they’re keenly aware of the multiple responsibilities inherent in such a request. Yes – SEO can drive lots of high quality, targeted traffic at the perfect moment for capturing the sale. But NO – SEO cannot convert that visit into dollars. If the website sucks at turning visitors into leads, do the right thing and recommend CRO (Conversion Rate Optimization) before they dive into SEO.
- Don’t Rest on Your Laurels
If you’re not paying attention in the SEO world, even for just a few weeks, you can miss massive changes. Look at June! We’ve had a reversal of position on nofollow and JavaScript links from Google, a new engine/algorithm/brand from Microsoft, adoption of rich text formatting in the SERPs, evidence that header tags may not be as valuable as we thought and data suggesting that alt attributes are highly correlated with good rankings. Stay ahead of the curve and devote some resources to industry news – you owe it to your clients and yourselves.
- Don’t Undervalue Your Work
SEO is hard work. For every consulting hour, there are days of research, testing, reading, surfing and experimenting. Don’t undersell your services or accept that what you do doesn’t provide tremendous value. If you’re being undervalued now, consider how terrifically trackable SEO really is and show them the data. It’s almost always on your side.
- Don’t Believe Everything You Read
Yes, even here at SEOmoz! We certainly try our best to provide high quality, accurate information, as do many other great sites on SEO, but no one is right 100% of the time, and, more importantly, not every piece of advice is applicable for every business or every situation.
- Don’t Underestimate Dev Contributions
I was recently asked "what’s the biggest roadblock to SEO," and didn’t need to think for 10 seconds before quoting Mr. Ballmer’s infamous adage "Developers! Developers! Developers!" If you get bandwidth cycles for SEO projects, use them wisely. If the developers have made critical SEO errors, don’t be quick to criticize – you’ll make enemies, and, oftentimes, be guilty of hypocrisy. Stay humble, prioritize the big pieces and make sure you have the resources before you commit to improving traffic.
- Don’t Overstate Your Influence or Abilities
Just because you have the ear of some important minds at Google/Yahoo!/Facebook/etc. doesn’t mean you can influence change within these large organizations. I’ve heard a lot of stories from companies that worked with SEOs of how they promised to get their penalty lifted or special treatment from an engine because they got a response to an email they sent to a search engineer. Perhaps an even better rule is – don’t promise something you can’t personally control and deliver.
- Don’t Get Overconfident and Dismiss Other Marketing Channels
OK, yes – SEO rocks. But don’t forget how valuable other marketing activities like email, PPC, CRO, affiliate programs, even display advertising can be for the right scenario. Once you’ve found the SEO hammer, it’s easy to see every problem as a nail – I’ve certainly been guilty of it. If you can resist, think holistically and provide the best answer from a strategic (rather than tactical) level, you’ll become even better and more valuable to your clients.
Your turn – any "don’ts" you’d recommend to fellow SEOs?
p.s. If you haven’t read the whole Malcolm Gladwell vs. Chris Anderson with Seth Godin weighing in thing, it’s pretty worthwhile
Posted by randfish
It’s been a long time since we had a differential diagnosis post here on SEOmoz, but we’ve been getting lots of comments and emails requesting some mysteries, so here goes:
#1 – Who, Exactly, is Awesome, and Why?
I agree with the sentiment of the second result, Google is an awesome product, but this ranking is very bizarre given the content and links pointing to this page/site. The other engines certainly don’t agree that it belongs anywhere near the top of the SERPs.
#2 – Bing and the Hash
In the past, search engines have been known to ignore the hash in URLs and treat internal anchors as invisible to their link graph. While Bing has done a lot of things right and earned Microsoft some of the best praise they’ve received in years on the search front, treating internal anchors as separate URLs could cause a lot of problems. In this example, it’s a relevancy issue, but in other cases it could seriously screw with canonicalization in the link graph (and force webmasters to re-think their use of the hash in URLs).
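For illustration, this is roughly what "ignoring the hash" looks like when building a link graph. It's a sketch in Python using the standard library's `urldefrag`; the helper name `link_graph_key` and the example URLs are made up for this post, and it says nothing about how Bing or any other engine actually implements this.

```python
from urllib.parse import urldefrag


def link_graph_key(url: str) -> str:
    """Collapse internal anchors so one page maps to one node in a link graph."""
    return urldefrag(url).url


links = [
    "http://www.example.com/guide",
    "http://www.example.com/guide#pricing",
    "http://www.example.com/guide#faq",
]

# All three links resolve to the same page once the fragment is dropped.
unique_pages = {link_graph_key(link) for link in links}
print(unique_pages)
```

An engine that skips this step would count the three links above as pointing to three different URLs, which is the canonicalization risk described here.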
#3 - Where are the Cheap Books?
The Half.Ebay.com URL is an odd one to have at the top of these results. Not only is there a much more relevant "books" page at http://books.half.ebay.com, there’s also no mention of the word cheap anywhere here (and precious little anchor text pointing to this page with that term either). It almost makes me wonder if Google’s doing something with synonyms to rank this page here.
OK – now it’s your turn to solve the mysteries above. Please reward valiant efforts and great insight with thumbs up!



