{"id":26,"date":"2004-08-02T00:00:41","date_gmt":"2004-08-02T08:00:41","guid":{"rendered":""},"modified":"2016-05-25T23:34:40","modified_gmt":"2016-05-26T06:34:40","slug":"smart-urls-and-smarter-404s","status":"publish","type":"post","link":"https:\/\/mikeindustries.com\/blog\/archive\/2004\/08\/smart-urls-and-smarter-404s","title":{"rendered":"Beautification Revisited"},"content":{"rendered":"<div style=\"position: absolute; background-image: url(\/blog\/images\/inline\/ashley.jpg); left: -45px; top: -20px; height: 439px; width: 419px; background-repeat: no-repeat;\"><span><\/span><\/div>\n<div style=\"z-index: 10; padding-top: 125px; position: relative\">\n<div style=\"float: left; width: 300px; height: 285px; position: relative\"><span><\/span><\/div>\n<p>Beauty comes in many forms. For normal people, maybe it&#8217;s Ashley Judd in a bedsheet on a Sunday morning. For web dorks, however, it can be something as mundane as extensionless URLs or intelligent error pages. Sad as that may be, most of us don&#8217;t have the Ashley Judd option available anyway, so we shouldn&#8217;t feel too bad about deifying code.<\/p>\n<p><a href=\"https:\/\/mikeindustries.com\/blog\/archive\/2004\/07\/beautification-by-dirification\">Last week&#8217;s post on dirified URLs<\/a> was supposed to bring about some sort of consensus opinion on smart URL-naming conventions.  Thanks to everyone who posted their very helpful and enlightening comments, but in the end, we only discovered more options and came to no mutual conclusions. 
It appears that people just look for different things in their URLs and what you do with yours is up to you.<\/p>\n<p>Having said that, I have completely redone my URL structure and 404 strategy at Mike Industries based on the comments received and some additional research.<\/p>\n<\/div>\n<p><!--more--><\/p>\n<div>\n<h3>URLs<\/h3>\n<p>I&#8217;ve reached the conclusion that URLs should neither be automatically generated from post titles nor should they use an automatically incrementing number system.  The inability to change a post title after publishing makes the former method too limiting.  Combine that with the fact that auto-generated dirified URLs can get needlessly long or truncated awkwardly and you have a recipe for headaches.<\/p>\n<p>On the other hand, the latter method, which uses an incrementing numbering system, is not the least bit useful to humans who will inevitably need to interact with the raw URLs.  URLs are not like ISBN numbers or database keyfields after all.  Why?  Because ISBN numbers and database keyfields are almost always dealt with by computers and not humans.  The raw URL (besides as a simple domain name) was ideally supposed to be hidden from users, but the reality of our world is that it is not.  It&#8217;s in the location bar, it&#8217;s on web pages, it&#8217;s in your history, it&#8217;s on TV, it&#8217;s everywhere.  To not include at least some human-readable content in your URLs is to ignore how prevalent the raw URL has become in our visual routines.<\/p>\n<p>The other hot issue with URLs was whether or not (and how) to hide file extensions. There did seem to be a consensus that hiding extensions was necessary to properly future-proof URLs, but people disagreed on the best way to do it.  
Some people create directories based on the title of the post and throw an index.php file in the directory like so:<\/p>\n<p><code><br \/>\nStructure: \"\/of-mice-and-men\/index.php\"<br \/>\nVia the web: \"\/of-mice-and-men\/\" or \"\/of-mice-and-men\"<br \/>\n<\/code><\/p>\n<p>Others remove the file extensions completely from the files themselves like so:<\/p>\n<p><code><br \/>\nStructure: \"\/of-mice-and-men\"<br \/>\nVia the web: \"\/of-mice-and-men\"<br \/>\n<\/code><\/p>\n<p>And still others keep the file extensions on the files themselves and use an htaccess file or other server filter to serve the correct file like so:<\/p>\n<p><code><br \/>\nStructure: \"\/of-mice-and-men.php\"<br \/>\nVia the web: \"\/of-mice-and-men\"<br \/>\n<\/code><\/p>\n<p>The first example seems like the most popular, but it is also the most wrong. First, creating new directories for every entry is extraneous, and it introduces a convention whereby one should then encase all other site files in their own directories as well. Hassle. Pain. Not good.<\/p>\n<p>Secondly, and much more importantly, it creates two URLs and not one.  &#8220;\/of-mice-and-men\/&#8221; and &#8220;\/of-mice-and-men&#8221; are <em>not<\/em> technically the same location.  One points to where a file should be and one points to a directory. Deploy a system like this and you&#8217;ll have people pointing to both locations, depending on if they use the trailing slash when linking to you.<\/p>\n<p>The second example seems the most extreme, and I like it, but it&#8217;s really URL-permanence we&#8217;re seeking here and not necessarily local filename permanence.  If I switch publishing systems, like <a href=\"http:\/\/photomatt.net\" target=\"_blank\">PhotoMatt<\/a> keeps telling me I should, I will of course be re-exporting all of my pages from a database, so the files themselves don&#8217;t matter so much.  
Additionally, removing file extensions from the files themselves introduces other minor problems like FTP mode negotiation, local file editing, and other non-crucial, but annoying side-effects.<\/p>\n<p>The third example is what I settled on.  It provides the future-proof URLs I was looking for, without introducing other issues into my life.  I&#8217;m not going to give a full tutorial on making this conversion here, since not all of the info is mine, but here&#8217;s an overview of what needs to be done:<\/p>\n<h4>Take care of .htaccess<\/h4>\n<p>Add this to the .htaccess file in the root of your server:<\/p>\n<p><code><br \/>\nRewriteEngine on<br \/>\nRewriteBase \/<br \/>\nRewriteCond %{REQUEST_FILENAME}.php -f<br \/>\nRewriteCond %{REQUEST_URI} !\/$<br \/>\nRewriteRule (.*) $1\\.php [L]<br \/>\nRewriteCond %{REQUEST_FILENAME} !-d<br \/>\nRewriteRule ^(.+)\/$ \/$1 [R=301,L]<br \/>\n<\/code><\/p>\n<p>This makes every request for &#8220;http:\/\/yoursite\/whatever&#8221; resolve to &#8220;http:\/\/yoursite\/whatever.php&#8221;, provided that .php file actually exists.  Keep in mind this is not a redirect and the user will never see the .php extension.  The page will simply be served and the URL will remain clean.  The last two lines are a different story: they 301-redirect any trailing-slash request that isn&#8217;t a real directory to its slashless twin, so every page keeps a single address.  Obviously you can use this method with non-PHP pages as well&#8230; just change the file extension in the code above.<\/p>\n<h4>Turn Movable Type on to your extension-hatin&#8217; ways<\/h4>\n<p>First, head on over to <a href=\"http:\/\/mar.anomy.net\/entry\/2003\/06\/22\/17.15.00\/\" target=\"_blank\">M\u00e1r \u00d6rlygsson&#8217;s site<\/a> and read through his excellent tutorial.  
M\u00e1r is a believer in pure numeric URLs (based on date\/time) so I didn&#8217;t follow his steps verbatim, but I was able to achieve what I wanted by merely substituting these values in during Step 1 of his process:<\/p>\n<p><code><br \/>\nIndividual: &lt;MTEntryDate format=&quot;%Y&quot;&gt;\/&lt;MTEntryDate format=&quot;%m&quot;&gt;\/&lt;MTEntryKeywords&gt;.php<br \/>\nMonthly: &lt;MTEntryDate format=&quot;%Y&quot;&gt;\/&lt;MTEntryDate format=&quot;%m&quot;&gt;\/index.php<br \/>\nCategory: &lt;MTCategoryLabel dirifyplus=&quot;pld&quot;&gt;\/index.php<br \/>\n<\/code><\/p>\n<p>&#8230; and then changing the RegEx line in Step 2 to this:<\/p>\n<p><code><br \/>\n&lt;MTAddRegex name=&quot;stripFile&quot;&gt;s\/index\\.php|\\.php\/\/g&lt;\/MTAddRegex&gt;<br \/>\n<\/code><\/p>\n<p>So what&#8217;s the &lt;MTEntryKeywords&gt; tag doing in there?  Ah, that is the key to solving the auto-dirification issue!  Movable Type has a field called &#8220;Keywords&#8221; which is hidden by default in the &#8220;Create an Entry&#8221; screen.  Click the customize button on your &#8220;Create an Entry&#8221; screen, check the &#8220;Keywords&#8221; box and boom&#8230; you&#8217;ve got another field to play with. Using the above steps, whatever you enter into the Keywords field will become the filename of your post.  For instance, &#8220;a-walk-in-the-park&#8221; or &#8220;motorhead&#8221;.<\/p>\n<p>We now have complete control over our filenames, human-readable unique URLs, and no reliance on entry title permanence. Life is good.<\/p>\n<h3>404s<\/h3>\n<p>With clean URLs taken care of, I wanted to follow through on some code that would thoroughly smarten-up my 404s. 
The idea, as spelled out in my previous post, would be to create a system whereby:<\/p>\n<p>https:\/\/mikeindustries.com\/vegemite sandwich<\/p>\n<p>&#8230; would automatically query all blog entries on my site, and if only one entry matched the words &#8220;vegemite sandwich&#8221;, it would instantly redirect the user to that entry. If more than one entry matched, it would take the user to a page listing all matching pages.<\/p>\n<p>Using <a href=\"http:\/\/nslog.com\/archives\/2003\/02\/26\/404_search_function_code.php\" target=\"_blank\">Erik Barzeski&#8217;s code<\/a> as a starting point, I put together a smart 404 page which does exactly that. For some reason, Erik&#8217;s code wasn&#8217;t working for me, so I turned to my favorite PHP function &#8220;file_get_contents()&#8221; and all was well.<\/p>\n<p>Here&#8217;s what&#8217;s necessary to get the job done:<\/p>\n<h4>Take care of .htaccess<\/h4>\n<p>Add this to the .htaccess file in the root of your server:<\/p>\n<p><code>ErrorDocument 404 \/404.php<\/code><\/p>\n<h4>Create your PHP-based 404 page (404.php)<\/h4>\n<p>Here is the full text of the 404.php page I created:<\/p>\n<div><code><textarea style=\"width: 375px; height: 240px\"><br \/>\n&lt;?php<br \/>\n\/\/ Strip any trailing slash, then keep only the last path segment as the search term<br \/>\n$search_term = preg_replace(\"#\/$#\",\"\",$_SERVER['REQUEST_URI']);<br \/>\npreg_match(&quot;#([^\/]*)$#&quot;, $search_term, $regs);<br \/>\n$search_term = $regs[1];<br \/>\n$search_url = '\/search?smart404=1&amp;dosearch=1&amp;IncludeBlogs=1&amp;search=';<br \/>\n$full_search_url = $search_url . 
$search_term;<br \/>\n$full_page = file_get_contents($full_search_url);<br \/>\n$search_string = '\/&lt;a class=&quot;searchresult&quot; href=&quot;([^&quot;]*)&quot;\/';<br \/>\n$count = preg_match_all($search_string, $full_page, $matches);<br \/>\nheader(&quot;HTTP\/1.1 301 Moved Permanently&quot;);<br \/>\nif(1 == $count) {<br \/>\nheader(&quot;Location: {$matches[1][0]}&quot;);<br \/>\n} else {<br \/>\nheader(&quot;Location: $full_search_url&quot;);<br \/>\n}<br \/>\nheader(&quot;Connection: close&quot;);<br \/>\n?&gt;<br \/>\n<\/textarea><\/code><\/p>\n<\/div>\n<p>The only things which require customizing are the following:<\/p>\n<ol>\n<li>Replace my search URL with your own search URL.<\/li>\n<li>The parameter <code>smart404=1<\/code> tells your search output form that the search came from a 404, so if you&#8217;d like, you can add code into your search output page to write out different text depending on this condition.  I write out &#8220;Were you looking for something?&#8221; instead of &#8220;Search Results&#8221;, for instance.<\/li>\n<li>The line with <code>class=\"searchresult\"<\/code> in it is used to identify how many pages contain the search term(s). In order for it to work, you&#8217;ll have to modify the anchor links in your search output page so they have <code>class=\"searchresult\"<\/code> in them.<\/li>\n<\/ol>\n<p><em>*For further information on turning your Movable Type search results page into a PHP page in the first place, please see <a href=\"https:\/\/mikeindustries.com\/blog\/archive\/2004\/06\/mt-cgi-to-php.php\">my previous article on the subject<\/a>.<\/em><\/p>\n<p>That&#8217;s it!  Say hello to your new friend: the smarter 404. Not quite Ashley Judd in a bedsheet, but quite useful nonetheless.  
Here it is in action:<\/p>\n<p>Multiple hits:<br \/>\n<a href=\"https:\/\/mikeindustries.com\/espn\">https:\/\/mikeindustries.com\/espn<\/a><\/p>\n<p>One hit with &#8220;I&#8217;m Feeling Lucky&#8221; redirect:<\/p>\n\r\n<script language=\"Javascript\" type=\"text\/javascript\">var hidethis='sm'+'art ne'+'ws aggreg'+'ators';<\/script><script language=\"Javascript\" type=\"text\/javascript\">document.write('<a href=\"https:\/\/mikeindustries.com\/'+hidethis+'\">https:\/\/mikeindustries.com\/'+hidethis+'<\/a>');<\/script>\r\n\n<p>As always, I&#8217;m open to posting improvements on the above methods. Please feel free to comment.<\/p>\n<div class=\"update\"><strong>UPDATE:<\/strong> I&#8217;ve updated the smart 404 routine to place a location bar at the top of the page when a user is redirected intelligently. It appears on the Search Results page when there are multiple matches and on the individual entry page when &#8220;I&#8217;m Feeling Lucky&#8221; kicks in. Here are the three quick steps to implement it:<\/p>\n<p><strong>1. Paste this into the top of your 404.php file, just after the opening &lt;?php tag:<\/strong><\/p>\n<p><code><textarea style=\"width: 375px; height: 60px\"><br \/>\nsetcookie(&quot;origurl&quot;, &quot;http:\/\/www.yourdomain.com&quot;.$_SERVER['REQUEST_URI'], time()+60*60*24, &quot;\/&quot;, &quot;yourdomain.com&quot;, 0);<br \/>\n<\/textarea><\/code><\/p>\n<p>This captures the URL the user typed in and stores it in a cookie called &#8220;origurl&#8221;.<\/p>\n<p><strong>2. 
Paste this into the top of your Individual Entry Template and your Search Results template (before <em>any<\/em> of the HTML):<\/strong><\/p>\n<p><code><textarea style=\"width: 375px; height: 85px\"><br \/>\n&lt;?php<br \/>\nif (!empty($_COOKIE['origurl'])) {<br \/>\n$origurl = $_COOKIE['origurl'];<br \/>\nsetcookie('origurl', '', time()-60*60*24, '\/', 'yourdomain.com', 0);<br \/>\n}<br \/>\n?&gt;<br \/>\n<\/textarea><\/code><\/p>\n<p>This checks for the &#8216;origurl&#8217; cookie, stores the cookie value as a PHP variable, and deletes the cookie.<\/p>\n<p><strong>3.  Paste this right after the opening &lt;body&gt; tag on both your Individual Entry Template and your Search Results template:<\/strong><\/p>\n<p><code><textarea style=\"width: 375px; height: 225px\"><br \/>\n&lt;?php<br \/>\nif (!empty($origurl)) {<br \/>\n?&gt;<br \/>\n&lt;div style=&quot;width: 100%; background-color: #A7A7A7; text-align: center; padding: 8px 0 8px 0; color: #000; font-size: 10px;&quot;&gt;You have been taken to the closest match(es). You typed: &lt;form action=&quot;&quot; onsubmit=&quot;self.location.replace(this.urlbar.value); return false;&quot;&gt;&lt;input name=&quot;urlbar&quot; type=&quot;text&quot; value=&quot;&lt;?php echo htmlspecialchars(str_replace('%20',' ',$origurl)) ?&gt;&quot; style=&quot;width: 300px; font-size: 10px; margin-right: 9px&quot; \/&gt;&lt;input type=&quot;submit&quot; name=&quot;go&quot; style=&quot;font-size: 10px&quot; value=&quot;Go&quot; \/&gt;&lt;\/form&gt;&lt;\/div&gt;<br \/>\n&lt;?php<br \/>\n}<br \/>\n?&gt;<br \/>\n<\/textarea><\/code><\/p>\n<p>This checks for the origurl variable and then writes out the div with the location bar if necessary. Because the value comes straight from a cookie, it is escaped before being written into the page.<\/p><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Beauty comes in many forms. For normal people, maybe it&#8217;s Ashley Judd in a bedsheet on a Sunday morning. For web dorks, however, it can be something as mundane as extensionless URLs or intelligent error pages. 
Sad as that may be, most of us don&#8217;t have the Ashley Judd option available anyway, so we shouldn&#8217;t [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37,282],"tags":[],"class_list":["post-26","post","type-post","status-publish","format-standard","hentry","category-code","category-original"],"_links":{"self":[{"href":"https:\/\/mikeindustries.com\/blog\/wp-json\/wp\/v2\/posts\/26","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mikeindustries.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mikeindustries.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mikeindustries.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/mikeindustries.com\/blog\/wp-json\/wp\/v2\/comments?post=26"}],"version-history":[{"count":0,"href":"https:\/\/mikeindustries.com\/blog\/wp-json\/wp\/v2\/posts\/26\/revisions"}],"wp:attachment":[{"href":"https:\/\/mikeindustries.com\/blog\/wp-json\/wp\/v2\/media?parent=26"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mikeindustries.com\/blog\/wp-json\/wp\/v2\/categories?post=26"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mikeindustries.com\/blog\/wp-json\/wp\/v2\/tags?post=26"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}