Beautification Revisited

Beauty comes in many forms. For normal people, maybe it’s Ashley Judd in a bedsheet on a Sunday morning. For web dorks, however, it can be something as mundane as extensionless URLs or intelligent error pages. Sad as that may be, most of us don’t have the Ashley Judd option available anyway, so we shouldn’t feel too bad about deifying code.

Last week’s post on dirified URLs was supposed to bring about some sort of consensus opinion on smart URL-naming conventions. Thanks to everyone who posted their very helpful and enlightening comments, but in the end, we only discovered more options and came to no mutual conclusions. It appears that people just look for different things in their URLs and what you do with yours is up to you.

Having said that, I have completely redone my URL structure and 404 strategy at Mike Industries based on the comments received and some additional research.

URLs

I’ve reached the conclusion that URLs should neither be automatically generated from post titles nor should they use an automatically incrementing number system. The inability to change a post title after publishing makes the former method too limiting. Combine that with the fact that auto-generated dirified URLs can get needlessly long or truncated awkwardly and you have a recipe for headaches.

On the other hand, the latter method, which uses an incrementing numbering system, is not the least bit useful to humans, who will inevitably need to interact with the raw URLs. URLs are not like ISBNs or database key fields, after all. Why? Because ISBNs and database key fields are almost always dealt with by computers and not humans. The raw URL (beyond a simple domain name) was ideally supposed to be hidden from users, but the reality of our world is that it is not. It’s in the location bar, it’s on web pages, it’s in your history, it’s on TV, it’s everywhere. To not include at least some human-readable content in your URLs is to ignore how prevalent the raw URL has become in our visual routines.

The other hot issue with URLs was whether or not (and how) to hide file extensions. There did seem to be a consensus that hiding extensions was necessary to properly future-proof URLs, but people disagreed on the best way to do it. Some people create directories based on the title of the post and throw an index.php file in the directory like so:

Structure: "/of-mice-and-men/index.php" Via the web: "/of-mice-and-men/" or "/of-mice-and-men"

Others remove the file extensions completely from the files themselves like so:

Structure: "/of-mice-and-men" Via the web: "/of-mice-and-men"

And still others keep the file extensions on the files themselves and use an htaccess file or other server filter to serve the correct file like so:

Structure: "/of-mice-and-men.php" Via the web: "/of-mice-and-men"

The first example seems like the most popular, but it is also the most wrong. First, creating new directories for every entry is extraneous, and it introduces a convention whereby one should then encase all other site files in their own directories as well. Hassle. Pain. Not good.

Secondly, and much more importantly, it creates two URLs and not one. “/of-mice-and-men/” and “/of-mice-and-men” are not technically the same location. One points to where a file should be and one points to a directory. Deploy a system like this and you’ll have people pointing to both locations, depending on whether they use the trailing slash when linking to you.

The second example seems the most extreme, and I like it, but it’s really URL-permanence we’re seeking here and not necessarily local filename permanence. If I switch publishing systems, like PhotoMatt keeps telling me I should, I will of course be re-exporting all of my pages from a database, so the files themselves don’t matter so much. Additionally, removing file extensions from the files themselves introduces other minor problems like FTP mode negotiation, local file editing, and other non-crucial, but annoying side-effects.

The third example is what I settled on. It provides the future-proof URLs I was looking for, without introducing other issues into my life. I’m not going to give a full tutorial on making this conversion here, since not all of the info is mine, but here’s an overview of what needs to be done:

Take care of .htaccess

Add this to the .htaccess file on the root of your server:

RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1\.php [L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ /$1 [R=301,L]

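This tells the server to resolve any request for “http://yoursite/whatever” to “http://yoursite/whatever.php”, provided that file exists. Keep in mind this is not a redirect and the user will never see the .php extension. The page will simply be served and the URL will remain clean. The final RewriteCond and RewriteRule handle the trailing-slash case: a request for “http://yoursite/whatever/” that isn’t a real directory is 301-redirected to the version without the slash. Obviously you can use this method with non-PHP pages as well… just change the file extension in the code above.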
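To make the two rules easier to reason about, here is a rough PHP model of the decisions they make. It is only a sketch: `resolve_request()` and the `$isFile`/`$isDir` callbacks are hypothetical stand-ins for Apache's `-f` and `-d` tests, and real mod_rewrite behavior has more edge cases than this covers.

```php
<?php
/**
 * Illustrative model of the .htaccess rules above, not a replacement for them.
 * $isFile and $isDir are hypothetical callbacks playing the role of -f and -d.
 */
function resolve_request(string $uri, callable $isFile, callable $isDir): array
{
    $endsWithSlash = substr($uri, -1) === '/';

    // Rule 1: "/whatever" is served from "/whatever.php" when that file exists.
    if (!$endsWithSlash && $isFile($uri . '.php')) {
        return ['action' => 'serve', 'target' => $uri . '.php'];
    }

    // Rule 2: "/whatever/" that is not a real directory gets a 301 to "/whatever".
    if ($endsWithSlash && strlen($uri) > 1 && !$isDir($uri)) {
        return ['action' => 'redirect', 'status' => 301, 'target' => substr($uri, 0, -1)];
    }

    // Everything else is left to the server's normal handling.
    return ['action' => 'passthrough', 'target' => $uri];
}

// A pretend document root with one page and one real directory.
$files  = ['/of-mice-and-men.php'];
$dirs   = ['/images/'];
$isFile = function (string $p) use ($files): bool { return in_array($p, $files, true); };
$isDir  = function (string $p) use ($dirs): bool { return in_array($p, $dirs, true); };

print_r(resolve_request('/of-mice-and-men', $isFile, $isDir));   // serve /of-mice-and-men.php
print_r(resolve_request('/of-mice-and-men/', $isFile, $isDir));  // redirect 301 to /of-mice-and-men
print_r(resolve_request('/images/', $isFile, $isDir));           // passthrough
```

The point of the second rule is visible here: the slashed and slashless forms collapse into a single public URL instead of two.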

Turn Movable Type on to your extension-hatin’ ways

First, head on over to Már Örlygsson’s site and read through his excellent tutorial. Már is a believer in pure numeric URLs (based on date/time) so I didn’t follow his steps verbatim, but I was able to achieve what I wanted by merely substituting these values in during Step 1 of his process:

Individual: <MTEntryDate format="%Y">/<MTEntryDate format="%m">/<MTEntryKeywords>.php
Monthly: <MTEntryDate format="%Y">/<MTEntryDate format="%m">/index.php
Category: <MTCategoryLabel dirifyplus="pld">/index.php

… and then changing the RegEx line in Step 2 to this:

<MTAddRegex name="stripFile">s/index\.php|\.php//g</MTAddRegex>

So what’s the <MTEntryKeywords> tag doing in there? Ah, that is the key to solving the auto-dirification issue! Movable Type has a field called “Keywords” which is hidden by default in the “Create an Entry” screen. Click the customize button on your “Create an Entry” screen, check the “Keywords” box and boom… you’ve got another field to play with. Using the above steps, whatever you enter into the Keywords field will become the filename of your post. For instance, “a-walk-in-the-park” or “motorhead”.

We now have complete control over our filenames, human-readable unique URLs, and no reliance on entry title permanence. Life is good.
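As a rough illustration of how the pieces fit together, the sketch below builds the on-disk path from an entry date plus the Keywords value, then applies the same substitution as the stripFile regex to get the public URL. The function names `entry_file_path()` and `public_url()` are made up for this example; Movable Type does this work through its own template tags.

```php
<?php
// Hypothetical helper mirroring the Individual archive mapping:
// year/month/keywords.php. Not part of Movable Type.
function entry_file_path(string $date, string $keywords): string
{
    $ts = strtotime($date);
    return date('Y', $ts) . '/' . date('m', $ts) . '/' . $keywords . '.php';
}

// Same substitution as the stripFile regex: drop "index.php" or ".php".
function public_url(string $filePath): string
{
    return preg_replace('/index\.php|\.php/', '', $filePath);
}

$file = entry_file_path('2004-08-02', 'a-walk-in-the-park');
echo $file, "\n";                            // 2004/08/a-walk-in-the-park.php
echo public_url($file), "\n";                // 2004/08/a-walk-in-the-park
echo public_url('2004/08/index.php'), "\n";  // 2004/08/
```

Because the filename comes from Keywords rather than the title, the title can be edited later without touching either string.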

404s

With clean URLs taken care of, I wanted to follow through on some code that would thoroughly smarten-up my 404s. The idea, as spelled out in my previous post, would be to create a system whereby:

https://mikeindustries.com/vegemite sandwich

… would automatically query all blog entries on my site, and if only one entry matched the words “vegemite sandwich”, it would instantly redirect the user to that entry. If more than one entry matched, it would take the user to a page listing all matching pages.

Using Erik Barzeski’s code as a starting point, I put together a smart 404 page which does exactly that. For some reason, Erik’s code wasn’t working for me, so I turned to my favorite PHP function “file_get_contents()” and all was well.

Here’s what’s necessary to get the job done:

Take care of .htaccess

Add this to the .htaccess file on the root of your server:

ErrorDocument 404 /404.php

Create your PHP-based 404 page (404.php)

Here is the full text of the 404.php page I created:

<?php
// Strip a trailing slash, then keep only the last path segment as the search term.
$search_term = preg_replace("#/$#", "", $_SERVER['REQUEST_URI']);
preg_match("#([^/]*)$#", $search_term, $regs);
$search_term = $regs[1];

// Replace with your own search URL. file_get_contents() needs the full
// http:// form of the URL to fetch the results page over HTTP.
$search_url = '/search?smart404=1&dosearch=1&IncludeBlogs=1&search=';
$full_search_url = $search_url . $search_term;
$full_page = file_get_contents($full_search_url);

// Count the result links marked with class="searchresult".
$search_string = '/<a class="searchresult" href="([^"]*)"/';
$count = preg_match_all($search_string, $full_page, $matches);

header("HTTP/1.1 301 Moved Permanently");
if (1 == $count) {
    // Exactly one match: go straight to that entry.
    header("Location: {$matches[1][0]}");
} else {
    // Zero or several matches: show the search results page.
    header("Location: $full_search_url");
}
header("Connection: close");
?>

The only things which require customizing are the following:

Replace my search URL with your own search URL.
The parameter smart404=1 tells your search output form that the search came from a 404, so if you’d like, you can add code into your search output page to write out different text depending on this condition. I write out “Were you looking for something?” instead of “Search Results”, for instance.
The line with class="searchresult" in it is used to identify how many pages contain the search term(s). In order for it to work, you’ll have to modify the anchor links in your search output page so they have class="searchresult" in them.
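If you want to try the matching logic without a live search page, the two decisions in 404.php can be pulled into small functions and fed sample HTML. This is only a sketch: `search_term_from_uri()` and `redirect_target()` are names invented here, and the sample markup and URLs are assumptions rather than real output from my search template.

```php
<?php
// Pull the last path segment out of a request URI, ignoring a trailing slash.
function search_term_from_uri(string $requestUri): string
{
    $trimmed = preg_replace('#/$#', '', $requestUri);
    preg_match('#([^/]*)$#', $trimmed, $m);
    return $m[1];
}

// Given the HTML of the search results page, pick the redirect target:
// the single matching entry if there is exactly one, else the search page.
function redirect_target(string $resultsHtml, string $searchUrl): string
{
    $count = preg_match_all('/<a class="searchresult" href="([^"]*)"/', $resultsHtml, $matches);
    return ($count === 1) ? $matches[1][0] : $searchUrl;
}

// Made-up sample data for illustration.
$oneHit  = '<a class="searchresult" href="/archive/2004/06/example">Example</a>';
$twoHits = $oneHit . '<a class="searchresult" href="/archive/2004/07/other">Other</a>';

echo search_term_from_uri('/vegemite%20sandwich/'), "\n";     // vegemite%20sandwich
echo redirect_target($oneHit, '/search?search=x'), "\n";      // /archive/2004/06/example
echo redirect_target($twoHits, '/search?search=x'), "\n";     // /search?search=x
```

Note that a search with no hits at all takes the same path as one with several hits: the visitor lands on the search results page.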

*For further information on turning your Movable Type search results page into a PHP page in the first place, please see my previous article on the subject.

That’s it! Say hello to your new friend: the smarter 404. Not quite Ashley Judd in a bedsheet, but quite useful nonetheless. Here it is in action:

Multiple hits:
https://mikeindustries.com/espn

One hit with “I’m Feeling Lucky” redirect:

As always, I’m open to posting improvements on the above methods. Please feel free to comment.

UPDATE: I’ve updated the smart 404 routine to place a location bar at the top of the page when a user is redirected intelligently. It appears on the Search Results page when there are multiple matches and the individual entry page when “I’m Feeling Lucky” kicks in. Here are the three quick steps to implement it:

1. Paste this into the top of your 404.php file:

setcookie("origurl", "http://www.yourdomain.com" . $_SERVER['REQUEST_URI'], time()+60*60*24, "/", "yourdomain.com", 0);

This captures the URL the user typed in and stores it in a cookie called “origurl”.

2. Paste this into the top of your Individual Entry Template and your Search Results template (before any of the HTML):

<?php
$origurl = '';
if (!empty($_COOKIE['origurl'])) {
    $origurl = $_COOKIE['origurl'];
    setcookie('origurl', '', time()-60*60*24, '/', 'yourdomain.com', 0);
}
?>

This checks for the ‘origurl’ cookie, stores the cookie value as a PHP variable, and deletes the cookie.

3. Paste this right after the open <body> tag on both your Individual Entry Template and your Search Results template:

<?php
if (!empty($origurl)) {
?>
<div style="width: 100%; background-color: #A7A7A7; text-align: center; padding: 8px 0 8px 0; color: #000; font-size: 10px;">You have been taken to the closest match(es). You typed:
<form action="" onsubmit="self.location.replace(this.urlbar.value); return false;">
<input name="urlbar" type="text" value="<?php echo htmlspecialchars(str_replace('%20', ' ', $origurl)); ?>" style="width: 300px; font-size: 10px; margin-right: 9px" />
<input type="submit" name="go" style="font-size: 10px" value="Go" />
</form>
</div>
<?php
}
?>

This checks for the origurl variable and then writes out the div with the location bar if necessary.

53 comments on “Beautification Revisited”. Leave your own?

Keith says:

Thanks for the followup Mike — but I wish you’d have posted it yesterday! I could have used some of the info. I spent a few hours working on similar issues myself and came to a similar conclusion with a bit of a different implementation. I looked at the regex solution to remove the file extension and it would have been quite a bit of work to get going and because I need to redirect all of my old entries I didn’t want to mess with adding even more stuff to my .htaccess.

But the end result is almost the same. Also, I use MTShortTitle to do the same thing you do with the keywords. What’s nice is that it allows you to keep your keyword functionality. Read more here. It might serve you a bit better. I’ve migrated my entries, changed the url scheme and will have my site up on the new server soon.

(You’ll be happy to note that the comment lag is gone!)

I’m going to look into your smart 404 and see if maybe I can use it in conjunction with what I’ve got already. Might work well, so thanks for that.

August 2, 2004 at 7:25 pm
Patrick H. Lauke says:

on the issue of extensionless urls: there’s a method that avoids re-write rules completely, and works for anything and everything (you can use it for .php, .html, .jpg…anything)

simply enable multiviews (either in httpd.conf or .htaccess)

Options +MultiViews

careful though, this can potentially have interesting side effects (read: rewrite rule loops) if combined with other existing heavy rewrite rules (particularly the ones you describe above, where it matches some name to the same name + extension)

August 2, 2004 at 7:36 pm
Mike D. says:

Keith,

Sorry. I would have posted it yesterday but Ashley’s tattoo scar hadn’t quite healed yet.

One thing that might help you with your redirect issue is to create an entirely new archive folder for your new entries. My old folder was “archives” and my new folder is “archive”, so I can throw an .htaccess file in the old folder to handle the redirects without it affecting the new folder.

August 2, 2004 at 7:48 pm
Keith says:

That’s an idea. I think my biggest issue with the methods you talk about was the use of regex. I’m just too damn lazy to get in there and monkey with all of my templates. I think what I have will work well.

Now, about your custom smart 404. I’m thinking it might be a good idea to combine what you’ve got with something a bit more traditional. I mean there could be many cases when someone hits the 404 with something that doesn’t return any search results. Might it be a good idea, in addition to the custom heading also add some custom 404 text and recent entries or something?

So your 404 might read like:

Looking for something?

(search results if any)

Nothing in there?

Here are my last 5 entries (via mt).

If you still can’t find what you’re looking for, blah, blah, yackety schmakity…

August 2, 2004 at 8:19 pm
Jonathan Snook says:

One small thing to note… I think your RSS feed is still including the .php extension. Nice informative article… your 404 technique could also have a positive effect on slightly truncated URLs (due to improper cut and pasting) because the search will more than likely redirect to the proper article. A bonus in my books.

August 2, 2004 at 8:24 pm
Mike D. says:

Keith,

The other good way I’ve found to get the extensions off of your files in MT is to hack the “util.pm” file so that if your preferred extension field in the MT configuration screen is blank, no extension is written (currently, it defaults to .html if left blank). This is a pretty easy solution. I went with the RegEx one instead because I have a common MT template module (the header) which appears on every page so adding the RegEx line was just a change to one file.

As for your search results suggestion… yes… definitely. More text is necessary. I’m also going to throw a prepopulated search field in there at the top so you can adjust your search.

August 2, 2004 at 8:29 pm
Keith says:

Yeah, RegEx might be easier than I’m making it out to be. I’ve just spent so much time in MT this last week I’m being lazy.

Well, I’d be interested in seeing what you come up with. The prepopulated search is a great idea…

Also, a stupid, but related, PHP question (I’m still pretty new to PHP): in your search.php template, how are you using that smart404=1 parameter to display the 404 title?

August 2, 2004 at 8:37 pm
Anil says:

Thanks for the update! Interesting that you independently arrived at the same conclusion that Jason Kottke did a while back. (Me? I’m lazy, so I just use dirified post titles. They don’t change *that* frequently once I’ve finished my initial edits.)

I’m gonna see if we can’t edit the defaults in MT so that a blank file extension actually means no extension and then HTML is prefilled by default. That makes more sense anyway.

August 2, 2004 at 9:27 pm
Mike D. says:

Anil,

Yeah, the web is getting a lot more like print now in that most “new ideas” have already been thought of independently by other people. That’s one of the things that really bores me about the print design and advertising worlds. Thanks for looking into the default file extension improvement for MT… I agree, seems logical.

Keith,

The code which goes in the search page to catch the “smart404=1” is:

<h2>
<?php
if (isset($_GET['smart404']) && $_GET['smart404'] == '1') {
echo "Were You Looking for Something?";
} else {
echo "Search Results";
}
?>
</h2>

That’s it…

August 2, 2004 at 9:53 pm
Keith says:

Thanks Mike. I’ll play around with that. I really need to get myself a good PHP book.

August 3, 2004 at 12:14 am
Goldfinch says:

Mike,

This is a very helpful post (like most of the things I read in your blog) but I think you might want to reconsider how you’re handling the 404 errors.

IMHO I don’t think you should be automatically redirecting people or even replacing the location in the address bar.

You should be informing people that they’ve made a mistake or that the page has moved but you should still give them the chance to correct that mistake themselves.

I.e. if someone mistypes a URL, they should be able to correct their mistake in the address bar without having to re-type the whole thing. This isn’t possible with your current setup.

I’m afraid I can’t offer any pointers to any PHP code that would help you do this though, but I thought it was worth mentioning.

August 3, 2004 at 4:46 am
Mike D. says:

UP

1. Jonathan: Thanks. Feeds are now fixed. Sorry to everyone who thinks I now have 15 new articles :)

2. I’m thinking about making Ashley Judd a permanent fixture on the smart404-induced search results page.

3. Goldfinch: Being able to retype a URL would be nice and I think I have a way to accomplish your suggestion without losing automatic rerouting. Later today, I should have an enhancement posted whereby I write out a slim horizontal div at the very top of the page containing the previously typed URL and a “Go” button. This will appear completely outside the design, as a Google toolbar would.

August 3, 2004 at 8:28 am
Brad Daily says:

Wouldn’t you know it, the one time my wife is looking over the shoulder when I am reading blogs I come here, and she thinks I am looking at “questionable content”. You should have seen me trying to explain it was an article on dirified URLs, Regex, and Rewrite rules….not a chance.

Good article nonetheless Mike…

August 3, 2004 at 8:40 am
David Schontzler says:

Mike –

You can accomplish the don’t-change-the-url-for-multiple by just outputting $full_page to the page instead of redirecting to it. The added advantage is that you won’t be performing the search twice!

I would suggest automatic redirection to single-page matches though and consider adding a you got here the “wrong way” bit. Often times, a visitor arrives at a page through a link that they have no control over and you want search engines to update their links anyhow.
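One way to picture this suggestion is as a function that plans the response rather than sending it. The sketch below is an illustration of the commenter's idea, not the code the article uses: `plan_response()` is a made-up name, and answering the multiple-match case with a 404 status is an assumption.

```php
<?php
// Sketch only: returns what the 404 handler would send, without sending it.
function plan_response(string $resultsHtml): array
{
    $count = preg_match_all('/<a class="searchresult" href="([^"]*)"/', $resultsHtml, $matches);

    if ($count === 1) {
        // Exactly one hit: redirect so browsers and crawlers learn the real URL.
        return ['status' => 301, 'location' => $matches[1][0], 'body' => ''];
    }

    // Zero or several hits: keep the typed URL and print the results in place,
    // which also avoids running the search a second time.
    // Using 404 here is an assumption, not something the original code does.
    return ['status' => 404, 'location' => null, 'body' => $resultsHtml];
}

// Made-up sample markup for illustration.
$single = '<a class="searchresult" href="/archive/2004/06/example">Example</a>';
$plan   = plan_response($single);
echo $plan['status'], ' ', $plan['location'], "\n"; // 301 /archive/2004/06/example
```

The caller would then emit the status line, an optional Location header, and the body in that order.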

August 3, 2004 at 11:59 am
Ben says:

Just FYI, adding a trailing slash (as I probably would if typing a URL from memory) kills your system.

August 3, 2004 at 2:10 pm
Mike D. says:

Thanks Ben… that is now fixed. I just had to strip the trailing slash out in the 404.php code using a regular expression. I’ve updated the 404.php code above to reflect this.

August 3, 2004 at 2:37 pm
Jeff Minard says:

Check it – I missed out on commenting on the last piece, but since I thought a lot about it, I thought I would suggest an ultimate combination of the URL methods here.

We like the article numbering system because it never changes and is very easy to keep a handle on.
But we like the /year/month/day/short-title method because it’s nicer to read.

Well, I like both of them:
```
/archives/45/2004/07/21/cool-urls
```
Notice the 45 there? Yeah, that’s the article number. So, what’s all the other crap – oh that would be ignored by apache actually. It’s just there to make the article’s url *LOOK* nice. You could actually just link to
```
/archives/45
```
and it would work too. Heck, so would
```
/archive/45/January,2nd-2004/cool-urls
```
That would be a rather nifty solution. I mean, it may look funny after post 1000 (/archives/1000/2004/03/02/joes-1000th-post) but it’s not too much of a hassle to read past it. Or for the more inventive, code the article number in hex, that’ll get ya 4096 entries before you break 3 characters ;-)
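A minimal sketch of that idea in PHP, assuming the ID is always the first segment after /archives/ (or /archive/) and everything after it is decoration. The function name `article_id_from_path()` is hypothetical; in practice the same thing could be done with a rewrite rule.

```php
<?php
// Hypothetical parser: the first path segment after /archives/ is the ID;
// whatever follows it only makes the URL look nice and is ignored.
function article_id_from_path(string $path): ?int
{
    if (preg_match('#^/archives?/(\d+)(?:/|$)#', $path, $m)) {
        return (int) $m[1];
    }
    return null;
}

var_dump(article_id_from_path('/archives/45/2004/07/21/cool-urls'));      // int(45)
var_dump(article_id_from_path('/archives/45'));                           // int(45)
var_dump(article_id_from_path('/archive/45/January,2nd-2004/cool-urls')); // int(45)
var_dump(article_id_from_path('/archives/cool-urls'));                    // NULL

// The hex remark checks out: IDs 0 through 4095 fit in three hex digits.
echo dechex(4095), "\n"; // fff
echo dechex(4096), "\n"; // 1000
```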

August 4, 2004 at 12:27 am
Matt says:

Jeff, but that misses the biggest benefit of going to all this trouble: forward compatibility. As someone who has switched software many times, you learn that those autogenerated IDs mean nothing between systems. By focusing on platform-independent variables such as local date/time and the post title you have portability between any sophisticated system. Because Mark Pilgrim put thought and effort into his URIs (like Mike has here) his switch to WordPress was completely transparent. He could have not told anybody and seen how long it took for people to notice. It’s not just a vanity thing, it’s a service to the web. Linkrot severely diminishes the utility of the web. (and blogs!) Just like you wouldn’t build a house without a floor plan, you shouldn’t create a website without a solid plan of how you’re going to address resources within your domain. And you don’t want to build a house on a shaky foundation like application-specific IDs.

I would question the need for “/archive/” in the structure here, but otherwise looks good.

August 4, 2004 at 7:18 am
Milan Negovan says:

Wow, interesting timing. I’ve revisited my 404s just recently, too. Since I use a CMS of my own making I assign pretty much any “file names” to my blog posts.

When a missing page is requested it’s easy enough to parse these “file names” and feed them to a search which produces a list of relevant URLs. It’s been working pretty well so far.

August 4, 2004 at 7:37 am
Mike D. says:

Matt is correct. Jeff, the system you propose is similar, I believe, to the system CNET is using right now, but they openly admit that the reason they switched their URLs was merely for Google juice… a reason I don’t subscribe to.

I think your system of having Apache ignore the middle part of the URL is fine, but the only change I’d consider making if I were you was instead of using the post number (45), use some derivation of hour/minute. That will be just as unique as an ID but will at least have its root in something meaningful.

August 4, 2004 at 10:00 am
Jina Bolton says:

Is there a way to implement this for WordPress? I tried, and failed miserably. I really like the idea and would love to give it a go…

August 4, 2004 at 10:59 am
Martin Lucas-Smith says:

I’m interested that you see the need to use .php rather than .html for the underlying files before any rewrite rules are applied. See my earlier post here.

Incidentally, typing in a wrong address gives a 302 redirection, not a 404 as it should. You should consider the effect this will have on search engines for those pages – in theory they will enter google etc even though they are pages you presumably don’t want in google, as they ought to be 404s…

August 5, 2004 at 4:13 am
Mike D. says:

Martin,

1. The reason I use .php for the underlying files is twofold: Number one, it’s more technically correct. They are PHP files after all, so why not name them as such. Since they are generated by a CMS and the user is not exposed to the extension, there is no downside to this. The second reason is for syntax coloring and convenience. If I’m editing a file locally, my text editor knows what kind of file I’m dealing with and treats it accordingly.

2. I’m interested to know more about the 404/302 thing you mentioned. Although the issue you bring up doesn’t bother me much, I was under the impression that a 404 was indeed getting sent. Doesn’t this line take care of that:

ErrorDocument 404 /404.php

I would think so, but then again, I don’t know a lot about HTTP server codes. If you have a way to accomplish what you’re talking about and still do a redirect as I’m doing, please let me know. I was under the impression a 404 was getting sent, followed by a 302.

August 5, 2004 at 8:49 am
Martin Lucas-Smith says:

Mike,

1. I suppose it depends which way you see it – I would say that the user is getting HTML, not PHP. I accept though, in your case, since you’re doing rewriting, it doesn’t really matter after all.

2. Hmm..!

Using http://web-sniffer.net/ and entering a non-existent URL on your domain, I definitely get a 302, though I agree using
ErrorDocument 404 /404.php
ought to be giving a 404. It looks to me as if your
header(“Location …
line is overriding the apache-supplied header.

You could probably create a robots.txt file with
user-agent: *
disallow: /blog/search?smart404
if you do find pages start to enter google, if you regard that as an issue.

August 5, 2004 at 10:55 am
Matt says:

Jina, the best place for future discussion on this is the WordPress forums, but enabling a permalink structure just like Mike’s is two clicks away under Options > Permalink Structure. As for the 404 search Mark blogged about this earlier today. :)

August 5, 2004 at 11:34 am
david says:

Ahem, the “I’m feeling lucky” example link doesn’t match the text.

It says: “…/smart news aggregators”

But it links to: “…/blog/archive/2004/06/smart news aggregators“

August 5, 2004 at 12:22 pm
Mike D. says:

David,

Oops, sorry… that was a problem with my example link, not the system. Example link is now correct. Thanks.

August 5, 2004 at 12:29 pm
David Schontzler says:

A couple things:
1. The reason that specifying a 404 page doesn’t send a 404 is because the 404 page decides. You should be serving up a 404 in multiple match cases and a 301 (moved) in the single match cases.
2. Why are you adding the top bar with JavaScript? The PHP already decides whether or not to include the content and should probably be in charge of writing the content and deleting the cookie.
August 5, 2004 at 5:14 pm
Mike D. says:

David,

Thanks much. Gotta love peer review.

1. Apparently, you can’t set a 404 and also redirect in PHP. You can, however, set a 301 (Moved Permanently), which in my opinion is just about as good. You’re telling the user agent “There is nothing at this address”, so that’s good enough for me. I’ve modified the above 404.php code so that it now sets a 301 before redirecting.

2. Done. No more javascript required. The above code now reflects a cookie-checking/div-writing/cookie-deleting routine done entirely in PHP.

August 5, 2004 at 11:27 pm
Kyle says:

What do I love about this entry?

1. The content

2. The fashionable ‘404’ tattoo the Miss has on her lower back (this didn’t escape me)

Nicely done (as usual) Mike.

August 6, 2004 at 2:04 am
Faruk Ateş says:

Good to see you’ve got something working for yourself there, Mike. :) Looking sharp!

August 6, 2004 at 6:48 am
Jeff Croft says:

Forgive me if this has already been mentioned, but I’ve got another reason why creating a directory and then adding an index file is not the best option: it’s hard on your server.

When someone enters, say, http://jeffcroft.com/something/, the webserver recognizes this as a directory and has to manually search for a default (index) file within it, which takes time and processing power.

However, when you point to an actual file name, say http://jeffcroft.com/something/index.php, the server doesn’t have to do such a search, reducing processing time and power.

August 6, 2004 at 8:52 am
Jeff Minard says:

Matt & Mike,

Jeff, but that misses the biggest benefit of going to all this trouble:

forward compatibility

While at first glance you would do well to use “system-independent” variables like date and time, I do not believe that a numbering system fails to be “forward compatible”.

I don’t know of any system that doesn’t use a numbering system for posts (WP, TxP, my own, MT, even Drupal), so it’s a fairly safe bet that using numbers would be an acceptable way to easily move posts between systems and retain your old URL structure.

The other reason I like basing it on the post number instead of dates or short titles is that otherwise we circle back to the problem of “What if I change the original date or short-name?” Sure, any system would be able to cope by rewriting URLs once you have done it, and some more advanced ones will even use “modified at” instead of changing it, but you may still get people who had bookmarked or searched for it linking to the wrong place.

With a “catch number” in the /archives/### format, you need not worry because that number should never, ever change – regardless of system or changes to that post.

In short, I disagree that using a numbering system and ignoring the end of the URL is “build[ing] a house on a shaky foundation.” Mike even said it well in his last post:

…a persistent ID never needs to be changed since it contains no information about the entry. Anything about the entry can be changed and it won’t affect the ID.

That “anything” includes the system you use, that date it’s on, or the short-name you use.

August 6, 2004 at 9:48 am
Henrik Lied says:

An error-document should always be sent with a 404 – Not Found HTTP header.
The problem that often occurs when this isn’t done, is that Google might spider a document which doesn’t exist.

August 15, 2004 at 2:45 pm
Stewart Vardaman says:

I agree with the 404 page always returning 404 code.

My only (minor) nitpick is that in the example you send an HTTP 1.1 header. Seems like for a 404 page, 1.0 is fine, and a lot of my 404s come from robots that don’t speak anything other than 1.0.

I could be wrong, but better safe than sorry.

August 31, 2004 at 10:41 pm
Mike D. says:

Stewart:

I’d love to return a 404, but if you do that, you can’t also initiate a smart redirect in PHP. I therefore return a 301, which is, for these purposes, just as good. It tells the requesting server that there is no page at that address, and then redirects. It’s very nice for me to be able to search for any article I’ve written on my site by using the location bar. E.G:

https://mikeindustries.com/ashley

So nice!

August 31, 2004 at 10:46 pm
Paul Mayne says:

Whenever I add the lines to my .htaccess file

RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1\.php [L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ /$1 [R=301,L]

I get 403 errors on all pages below it. Anyone know why? Any direction is greatly appreciated. I’m just interested in removing the file extensions.
Thanks

September 7, 2004 at 11:41 am

Rob Jones says:

For the url rewriting code to work:

RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1\.php [L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ /$1 [R=301,L]

your apache server must have the “mod_rewrite.c” module installed.

September 22, 2004 at 1:59 pm

Kevin says:

What do you mean by a common MT template module?

…because I have a common MT template module (the header) which appears on every page so adding the RegEx line was just a change to one file.

Sorry if this is a dumb question, I’m new to all this.

Thanks!

October 3, 2004 at 7:45 pm
Sina Eetezadi says:

Nice Pic…

Where is it from?

Sina

December 18, 2004 at 10:02 am
Rajaselvam.M says:

Hi Guys,
I want to redirect a page without affecting the url at the Address Bar.
For example, I am now in testone.php; after checking a condition, it may get redirected to testtwo.php, with the address bar still showing testone.php.

Rajaselvam.M

June 6, 2005 at 3:34 am
Didier says:

Redirection is a matter of browser, you can NOT redirect without changing the URL in the address bar. But you can use an include(), which will be invisible.

October 10, 2005 at 1:37 pm
lucas says:

Is there an additional setting that needs to be enabled to get the file extension stripper to work? Because I keep getting 404 errors whenever I try to directly type in my urls without the .php extension.

ie.
http://www.mydomain.com/file.php works fine
http://www.mydomain.com/file = 404

I have mod_rewrite.c enabled in apache.

Do I need to disable MultiViews in httpd.conf? Or add some specific option there?

I have the .htaccess file copied, verbatim.

Thanks!

November 7, 2005 at 4:05 pm
lucas says:

I had a mis-configured httpd.conf

November 8, 2005 at 2:07 pm
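
lucas does not say what was misconfigured, but a frequent culprit when a .htaccess file seems to be ignored is AllowOverride None in httpd.conf, which makes Apache skip the file entirely, so the extensionless URL falls through to a 404. A minimal sketch, assuming a document root of /var/www/html (a placeholder path):

```apache
# httpd.conf -- the directory path is a placeholder for your document root
<Directory "/var/www/html">
    # FileInfo permits the mod_rewrite directives in .htaccess;
    # Options permits an Options line there as well
    AllowOverride FileInfo Options
    # FollowSymLinks is required for per-directory rewrites.
    # Disabling MultiViews is optional: it just ensures extensionless
    # requests are handled by the rewrite rules, not content negotiation.
    Options +FollowSymLinks -MultiViews
</Directory>
```

Apache has to be restarted or reloaded after changing httpd.conf.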
Albert says:

Hi,

I’ve also added the code to my .htaccess file. Like Paul Mayne, I’m getting 403 errors on all my pages. I’m certain that the mod_rewrite module is installed and enabled on my server. So what could be causing the problem?

Any help would be greatly appreciated.

Thanks!

January 3, 2006 at 11:04 pm
John says:

thx for code.. no comments (c)

February 13, 2006 at 8:55 am
Mike Farho says:

Thanks for the tips about dropping extensions and the code to add to my .htaccess file. I just started using PHP, but the custom 404 idea looks good for future use. Hopefully I’ll find the time.

I noticed one drawback to using URLs without extensions. The Google rank on a page without extension is zero. When the complete URL is entered, it is nearly 1/3.

Does that mean anything?
Will using links without extensions affect our search engine results?

August 6, 2006 at 5:20 pm
Mike D. says:

Mike: That’s not correct. Not sure what’s happening but extensionless URLs are not penalized by Google.

August 7, 2006 at 6:23 pm
John says:

And if you want to beautify your blog, just put Ashley’s snap wearing nothing on her body and it would earn you a lot of traffic. Sorry but I didn’t mean to offend her lol

September 4, 2006 at 9:30 am
Tom Fallowfield says:

Of course no-one will ever speak to me again for this but I still use good old “classic” ASP for all my server-side needs.

Ever since I read your wisdom regarding URLs my company – Turtle Media – has started using decent folder structures for all its sites. Each folder contains a “default.asp” so all pages on all our sites can be referred to in a nice clean manner. E.g.

http://www.turtle-media.com/portfolio/chinawheelie/

or

http://www.chinawheelie.com/archive/2006/12/Plant/

(big up the trailing slash)

I firmly agree that URLs are there for humans to read, understand and remember. The machines should work around us, not vice versa. No serial numbers, no alphanumeric gobbledegook and NO UNDERSCORES. Mike, you’re a champion of common sense and basic human decency and I’m with you all the way.

I wrote all the blogging code for Chinawheelie in ASP, more as a personal challenge than anything else. It works pretty well I think, but I’d welcome any ideas for improvement. It’s still a little clunky.

I don’t know of any way to make ASP automatically query for data in the URL (I mean without using a query string, of course), as you described above for PHP. Does anyone else? OR should I get round to learning a decent language? Answers on a postcard…

I’ve been dubbed a “URL pervert” for my retentive insistence on clean and readable URLs, but frankly I don’t care. Let’s work together to make the internet a happier and more logical place.

PS Chinawheelie is a great read. This guy is cycling through every province in China on a tricycle with a dog he rescued. Totally barmy.

December 12, 2006 at 10:55 pm
Dan Laughlin says:

Good evening!

I desperately need help with using my .htaccess file to detect mobile devices and redirect them to a wap version of a site.

I’ve set up a subdomain that pulls from a subdirectory containing xhtml mp files.

I’m looking for code that will detect and redirect mobile devices from the domain to the subdomain (i.e. http://domainname.com/index.html to http://wap.domainname.com).

Any help is greatly appreciated!
-Dan

December 30, 2006 at 8:35 pm
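
One rough way to approach Dan's question with mod_rewrite is to test the User-Agent and Accept request headers. This is a sketch, not a tested recipe: the user-agent substrings below are illustrative guesses rather than a complete device list, and the hostnames are the placeholders from Dan's comment.

```apache
RewriteEngine on
# Skip requests that already arrived on the wap subdomain (avoids a redirect loop)
RewriteCond %{HTTP_HOST} !^wap\. [NC]
# Illustrative, incomplete list of mobile user-agent fragments...
RewriteCond %{HTTP_USER_AGENT} (nokia|blackberry|palm|up\.browser) [NC,OR]
# ...or a client that advertises support for XHTML Mobile Profile
RewriteCond %{HTTP_ACCEPT} application/vnd\.wap\.xhtml\+xml [NC]
# Temporary redirect, so misdetected desktop browsers are not sent away permanently
RewriteRule ^ http://wap.domainname.com/ [R=302,L]
```

A 302 is used rather than a 301 because header sniffing is unreliable, and a cached permanent redirect would be hard to undo for a misdetected browser.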
writing my name in water says:

Friday is for freading

Here’s some stuff I’m reading today: Benoit Mandelbrot, who discovered the famous Mandelbrot Set of fractal geometry, as interviewed in New Scientist magazine Q: Your work has covered many areas. Would you describe yourself as a pure or an …

November 12, 2004 at 12:01 pm
Nokrev says:

Zapping Ugliness

Alright, I just published the redesign. I feel this one doesn’t have as many bugs, and the layout is much less size-dependent. There are several things I considered when doing this redesign that I hadn’t thought about before. I went…

July 28, 2005 at 7:25 pm
