A member describes a case of near-identical URLs found in Google search results. GoogleGuy responds.
Interesting case, killroy. Thanks for passing it back to me via stickymail. I think the difference is that one url has a trailing slash and one url doesn't.
I can practically hear folks asking "But isn't www.foo.com/path the same as www.foo.com/path/?" In practice, they almost always are the same, but technically, according to the HTTP standards, I don't think that they have to be the same.
I've got a few minutes free, so let's go into detective mode for a bit. Most webservers are configured to append the "/" automatically via a 301 redirect. For example, if you try to fetch www.google.com/webmasters, our web server will do a permanent 301 redirect to the canonical page, which is www.google.com/webmasters/ (note the trailing slash).
Just to illustrate the point, let's use the same imitate-the-browser-using-telnet technique that I posted about in
It's a really good debugging technique. What actually happens when you request a directory without the trailing slash looks like this:
telnet www.google.com 80
Connected to www.google.com (22.214.171.124).
Escape character is '^]'.
GET /webmasters HTTP/1.0
HTTP/1.0 301 Moved Permanently
Date: Sun, 03 Aug 2003 22:11:43 GMT
The document has moved
So the server basically said "Instead of fetching this page, try it again with a trailing slash." That's why it's ever-so-slightly faster if you go to "www.webmasterworld.com/forum3/" instead of "www.webmasterworld.com/forum3"--because your browser doesn't have to get the redirect and do another fetch of the new url.
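If you'd rather script the telnet technique than type it by hand, here is a rough Python sketch of the same idea. It sends a bare HTTP/1.0 GET over a socket and pulls the status code and headers out of the reply. The function names are made up for illustration, and the added "Host:" header is an assumption on my part (the telnet session above doesn't send one, but many virtual-hosted servers expect it).

```python
import socket


def build_request(host: str, path: str) -> bytes:
    """Build the minimal HTTP/1.0 request one would type into telnet.

    The Host header is an assumption added for virtual-hosted servers.
    """
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_response_head(raw: bytes):
    """Return (status_code, headers) parsed from a raw HTTP response."""
    # Everything before the first blank line is the status line plus headers.
    head, _, _ = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.0 301 Moved Permanently
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return status_code, headers


def fetch_head(host: str, path: str, port: int = 80, timeout: float = 10.0):
    """Send the request over a plain TCP socket and parse what comes back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HTTP/1.0 servers close the connection when done
                break
            chunks.append(data)
    return parse_response_head(b"".join(chunks))
```

Calling something like fetch_head("www.example.com", "/somedir") (a placeholder host and path) should hand back the status code and a dict of lower-cased header names, so you can look at "location" directly.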
So to make a long story not quite as long, I noticed that the webserver for this domain returns a 301, but it looks like it doesn't add the trailing slash correctly in either the "Location:" field in the HTTP headers or in the text of the page. So that's the main thing I'd check on your web server.
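As a rough way to run that check, here is a small Python sketch that takes the url you requested plus the status code and "Location:" value the server sent back, and tells you whether the redirect really points at the same path with a trailing slash added. The function name and the example.com urls are only placeholders, and treating only a 301 as correct is my assumption based on the behavior described above.

```python
from urllib.parse import urljoin, urlsplit


def redirect_adds_trailing_slash(requested_url: str, status: int, location) -> bool:
    """True if the response is a 301 whose Location is requested_url plus '/'."""
    if status != 301 or not location:
        return False
    # Location may be absolute or relative, so resolve it against the request.
    target = urlsplit(urljoin(requested_url, location))
    source = urlsplit(requested_url)
    return (
        target.netloc.lower() == source.netloc.lower()
        and target.path == source.path + "/"
    )


# A correct redirect: same host, same path, slash appended.
print(redirect_adds_trailing_slash(
    "http://www.example.com/path", 301, "http://www.example.com/path/"))

# The kind of problem described above: a 301 that never adds the slash.
print(redirect_adds_trailing_slash(
    "http://www.example.com/path", 301, "http://www.example.com/path"))
```

The first call prints True and the second prints False. A missing "Location:" header (passed as None) also comes back False.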
On the other hand, even if we get duplicate content for two nearly identical urls, we have heuristics that normally detect that sort of thing. That's why the search collapses those two urls together unless you do "&filter=0". So the duplicate content filter was cleaning things up in this case. I think if you switch the webserver to do the 301 to the trailing-slash url, you should be in good shape in the future too.