Google Forum

Unrecognized file type by Google


Author Message
GoogleGuy Says

Posted: July 30, 2003 11:49 AM

Importance: Medium

A member reports that Google lists their site as an "unrecognized file type". GoogleGuy offers some troubleshooting advice.

GoogleGuy Says:

Hmm. If I had to take a guess, I'd look for a misconfigured webserver. Just a shot in the dark, but I would guess that the webserver isn't returning text/html as the content type.
Here's how you can debug it yourself from Unix/Linux--you basically imitate a web browser or spider. Here's an example of fetching a page by hand from Google:

telnet www.google.com 80
Connected to www.google.com.
Escape character is '^]'.
GET / HTTP/1.1
Host: www.google.com
(hit return twice; the blank line ends the request, and the response will look like the text below:)

HTTP/1.1 200 OK
Date: Wed, 30 Jul 2003 16:38:18 GMT
Cache-control: private
Content-Type: text/html <--- this line says what type of file it is.
Server: GWS/2.1
Content-length: 2691

Now if your page is www.foo.com/user1/test.html, you would type
telnet www.foo.com 80
and then do
GET /user1/test.html HTTP/1.1
Host: www.foo.com

and see what the webserver returns. This is all that a crawler does, except it also looks for links and follows them several billion times. ;)
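The same by-hand fetch can be scripted. Below is a minimal Python sketch, not anything Google's crawler actually runs: the function names (`build_request`, `parse_headers`, `fetch_headers`) are made up for illustration, and the `Connection: close` header is an addition that is not in the telnet session above (it just asks the server to hang up after answering).

```python
import socket


def build_request(host, path="/"):
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: not in the original telnet example
        "",  # the two empty strings produce the blank line
        "",  # that ends the request headers
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_headers(raw):
    """Split a raw response into (status_line, {lowercased header name: value})."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return status_line, headers


def fetch_headers(host, path="/", port=80, timeout=10):
    """Connect like telnet would, send the request, and read until the headers end."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        raw = b""
        while b"\r\n\r\n" not in raw:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection early
                break
            raw += chunk
    return parse_headers(raw)
```

For the page in the example you would call `fetch_headers("www.foo.com", "/user1/test.html")` and look at `headers.get("content-type")` in the result.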

By the way, the "Host:" line is what allows an ISP to support virtual hosting--the bot says which domain it wants to fetch the page from. That's what allows an ISP to host many domains on one IP address. You can also use this technique to verify that an ISP is doing virtual hosting correctly. If you ask for pages from foo.com and get pages from someothercompany.com or yourisp.net, then tell your ISP to fix their virtual hosting. If you find virtual hosting errors, it could be that your ISP made a mistake, or maybe you didn't pay your ISP bill, so they've started serving their own content instead of yours. :)
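To make the "Host:" point concrete, here is a rough Python sketch of how a server doing virtual hosting might choose which site to serve. It is a toy model under stated assumptions, not how any particular webserver is implemented: `host_from_request`, `pick_site`, and the `sites` mapping are hypothetical names, and the sketch ignores details such as a port number in the Host value.

```python
def host_from_request(raw_request):
    """Return the lowercased Host header value from raw request bytes, or None."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the "GET ..." request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip().lower()
    return None


def pick_site(raw_request, sites, default):
    """Pick the content for the requested domain, falling back to a default site.

    `sites` is a hypothetical mapping of domain name -> content for that domain.
    """
    host = host_from_request(raw_request)
    return sites.get(host, default)


# Two domains on one IP address: only the Host line tells them apart.
sites = {"www.foo.com": "foo pages", "someothercompany.com": "other pages"}
request = b"GET /user1/test.html HTTP/1.1\r\nHost: www.foo.com\r\n\r\n"
print(pick_site(request, sites, "isp default page"))
```

If the mapping is wrong or your domain has been dropped from it, a request for foo.com falls through to some other site's pages, which is the symptom described above.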

So try that out. If the Content-Type: line doesn't say text/html, that's what needs to be fixed. If it does say text/html, then you might want to look into whether the webserver is sending binary data (e.g. an executable, or bad character encodings for non-English pages, etc.). Let us know what you find out, and good question! :)
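Those two checks can be roughed out in Python as well. This is only a heuristic sketch and says nothing about how Google itself classifies a page: the function name `diagnose` is made up, treating a NUL byte as a sign of binary data is an assumption, and the charset test only applies when the Content-Type header declares one.

```python
def diagnose(content_type, body):
    """Return a list of likely problems; an empty list means nothing obvious."""
    problems = []

    # "text/html; charset=UTF-8" -> media type "text/html", params " charset=UTF-8"
    media_type, _, params = (content_type or "").partition(";")
    media_type = media_type.strip().lower()
    if media_type != "text/html":
        shown = media_type or "missing"
        problems.append(f"Content-Type is {shown}, not text/html")

    # Look for a declared character set among the header parameters.
    charset = None
    for param in params.split(";"):
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset":
            charset = value.strip().strip('"')

    if b"\x00" in body:
        # Assumption: HTML text should never contain NUL bytes.
        problems.append("body contains NUL bytes, which suggests binary data")
    elif charset:
        try:
            body.decode(charset)
        except (UnicodeDecodeError, LookupError):
            problems.append(f"body does not decode as declared charset {charset}")

    return problems
```

For example, `diagnose("application/octet-stream", b"<html></html>")` reports the wrong content type, while `diagnose("text/html", b"<html></html>")` returns an empty list.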

P.S. I kinda spilled that out fast; definitely let me know if I made a typo/mistake in the above.
