Bye bye Drupal: my journey from Drupal 5.7 to WordPress 2.5.1

I am now one website away from never having to look at the underside of a Drupal website ever again. I’m excited. But a bit… tired. This is an explanation of how I migrated from Drupal 5.7 to WordPress 2.5.

The method advice narrative chatter on this page comes completely without warranty or support. It may not work for you and it may even make things worse. I can’t help that and am not responsible for your decision to follow the words on this page.

My blog (cmkl.ca) had been running Drupal for a while. Quite a while. It used to run Geeklog. I’ve also tried various Nuke variants. Don’t laugh. While I’m sharing I might as well tell you I used to use a program called Adobe Pagemill too.

At the time, Drupal seemed like a good choice. It offered XHTML-compliant, template-driven output, a highly-extensible system with a huge, vibrant, award-winning developer community. Sounds great, huh?

But here’s the thing: I like lightweight, semantic markup styled with CSS. I like my presentation to be separated from my structure and my content separated from my presentation.

And I don’t want a bunch of extraeneous crap – markup, applications, ‘features’ whatever – floating around just itching to cause me grief.

So if you’re like me, you don’t want Drupal.

I had flirted with WordPress for a while, futzing about about with it on test blogs and was suddenly green with envy for all the people I knew who’d adopted it for their blogs/CMS.

But I reasoned that moving would be more of a pain than just suffering through Drupals many annoyances. Until Drupal’s vibrant, numerous development community took the project from 4.7 to 5.7 to 6.2 in the space of six months. And I tried to follow along with this other website I’m responsible for.

There was no way I was going to go through all that work again and still end up with Drupal.

So I started Googling.

Migrating from Drupal (4.7) to WordPress had a lot of promise, and Dave Dash’s MySQL-only solution suggested you could move to WordPress with a dozen queries.

Sure the article covered Drupal 4.7 and some unspecified version of WordPress, but how different could the data structure be?

Answer: Quite.

For example, this query is supposed to turn your Drupal taxonomy into WordPress categories:

USE [your_wordpress_db_name];
INSERT INTO 
    wp_categories (cat_ID, cat_name, category_nicename, category_description, category_parent)
SELECT term_data.tid, name, name, description, parent 
FROM [your_drupal_db_name].term_data, drupal.term_hierarchy 
WHERE term_data.tid=term_hierarchy.tid;

The select statement for Drupal still works (assuming you don’t have some elaborately hierarchical taxonomy), but the data structure in wordpress is now very different.

I decided I wanted to make all my drupal terms into tags. That’s more “Web 2.0” and web 2.0 is cool and I desperately desperately want to be cool.

The first wrinkle: WP now has a column for the ‘term’ and a human-readable version they call the ‘slug’. In drupal, the ‘name’ field wants to be the ‘slug’ field. How to remove the url-unfriendly characters without getting into using something like PHP.

So there are some MySQL string functions listed here that work:

And in fact, the lower() and replace() functions work wonderfully.

This gets you the listing of drupal tags:

SELECT term_data.tid, lower(replace(name," ", "-")) as name, name as slug, parent 
FROM [your_drupal_db_name].term_data, [your_drupal_db_name].term_hierarchy 
WHERE term_data.tid=term_hierarchy.tid;

And this is the query I ended up using instead of Mr. Dash’s first one:

INSERT INTO wp.wp_terms(term_ID, name, slug, term_group)
SELECT term_data.tid, lower(replace(name," ", "-")) as name, name as slug, parent 
FROM  [your_drupal_db_name].term_data,  [your_drupal_db_name].term_hierarchy 
WHERE term_data.tid=term_hierarchy.tid;

This was supposed to get the terms and bring them into wordpress, which it does, however the terms didn’t show up in the listing. There’s another query that has to be executed.

I discovered that every row in the terms table has to have an entry in the wp_term_taxonomy table. Yes indeed. So I did this:

insert into wp_term_taxonomy (term_id)
select term_id from wp_terms;
update wp_term_taxonomy set taxonomy ="post_tag" where taxonomy = ""

There’s more elegant SQL syntax I’m sure. And if you wanted your Drupal terms to become WordPress categories, you’d set taxonomy = 'category'

And now the content. Nodes in Drupalese, posts in WordPresspeak.

Mister Dash says do this:

SELECT DISTINCT
    n.nid, FROM_UNIXTIME(created), body, n.title, 
    teaser, 
    REPLACE(REPLACE(REPLACE(REPLACE(LOWER(n.title),' ', '_'),'.', '_'),',', '_'),'+', '_'),
    FROM_UNIXTIME(changed) 
FROM [your_drupal_db_name].node n, [your_drupal_db_name].node_revisions r
WHERE n.vid = r.vid
    AND type='story' OR type='page' ;

…to pull the contend from Drupal. But for me that just pulled in the first title of the first node. So I tried:

SELECT DISTINCT
    n.nid, FROM_UNIXTIME(created), body, n.title, 
    teaser, 
    REPLACE(REPLACE(REPLACE(REPLACE(LOWER(n.title),' ', '-'),'.', '-'),',', '-'),'+', '-'),
    FROM_UNIXTIME(changed) 
FROM cmkldrupal.node n, cmkldrupal.node_revisions r
WHERE n.vid = r.vid
    AND (type='story' OR type='page' OR type='blog' OR type='image');

This is what actually worked for me because most of my nodes are blog entries. The brackets are necessary for my version of MySQL too. I changed the underscores to hyphens because that’s supposed to produce more findable URLs.

Note: I added type='image' after the fact so that you don’t make the same mistake I made. I wish I had thought of it before heading so much further down the road.

And now the insert:

INSERT INTO 
    wp_posts (id, post_date, post_content, post_title, 
    post_excerpt, post_name, post_modified)

Mister Dash’s query references fewer columns, I suspect, because the version of WordPress he was writing about had fewer columns.

This is the query as it actually ran:

INSERT INTO wp_posts(id,post_date,post_content,post_title,post_excerpt,post_name,post_modified)
SELECT DISTINCT n.nid, FROM_UNIXTIME( created ) , body, n.title, teaser,
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( REPLACE( REPLACE( REPLACE( LOWER( n.title ) , ' ', '-' ) , '.', '-' ) , ',', '-' ) , '+', '-' ),'?',''),'!',''),'--','-'),':',''), FROM_UNIXTIME(CHANGED )
FROM [your_drupal_db_name].node n, [your_drupal_db_name].node_revisions r
WHERE n.vid = r.vid AND ( TYPE = 'story' OR TYPE = 'page' OR TYPE = 'blog' OR TYPE = 'image');

This is basically Dave Dash’s query plus the extra columns for WordPress 2.5.1 and a fair bit more weeding out of characters used to make permalinks in WordPress. Dave showed me how, but I guess he doesn’t use much punctuation in his headlines because you need to make sure you weed out colons, question marks, hyphens, parentheses, exclamation marks and instances of two or more hypens used in your post titles to construct your WordPress permalinks.

Otherwise your permalinks won’t work. I didn’t realize this until I started testing my new WP blog and noticed a lot of articles that would show up in home or archive listings but would give me “no articles matched your query” when I clicked on the title.

So I exported a CSV file of my post ids and post_names, grepped them for punctuation, then made the results into SQL queries to update the permalinks stored in the wp_posts table.

If you write your drupal import query you should be able to skip this step and live a relatively normal life.

Tagging the posts

INSERT INTO wp_term_relationships (object_id,term_taxonomy_id) SELECT nid,tid FROM [your_drupal_db_name].term_node ;

…from Dave Dash’s site seems promising. It tags your new WordPress posts the way they were tagged in drupal. Now, of course, this only works if you remembered to empty your WordPress tables before doing any imports, or adding any test posts. I didn’t and all my tags were about other posts. I had to redo the import.

Bringing comments over

INSERT INTO 
    wp_comments 
    (comment_post_ID, comment_date, comment_content, comment_parent, comment_author_IP, comment_author, comment_author_email, comment_author_url)
SELECT 
    nid, FROM_UNIXTIME(timestamp),concat('',subject, '
', comment), thread, hostname, name, mail, homepage 
FROM [your_drupal_db_name].comments ;

WP seems to have evolved a lot from this Dave Dash query. The columns in this post are all still there but the table seems to hold lot more information.

Now drupal’s comments look like:

 `cid` int(10) NOT NULL auto_increment,
  `pid` int(10) NOT NULL default '0',
  `nid` int(10) NOT NULL default '0',
  `uid` int(10) NOT NULL default '0',
  `subject` varchar(64) NOT NULL default '',
  `comment` longtext NOT NULL,
  `hostname` varchar(128) NOT NULL default '',
  `timestamp` int(11) NOT NULL default '0',
  `score` mediumint(9) NOT NULL default '0',
  `status` tinyint(3) unsigned NOT NULL default '0',
  `format` int(4) NOT NULL default '0',
  `thread` varchar(255) NOT NULL default '',
  `users` longtext,
  `name` varchar(60) default '',
  `mail` varchar(64) default '',
  `homepage` varchar(255) default '',

And wordpress’s looks like:

  `comment_ID` bigint(20) unsigned NOT NULL auto_increment,
  `comment_post_ID` int(11) NOT NULL default '0',
  `comment_author` tinytext NOT NULL,
  `comment_author_email` varchar(100) NOT NULL default '',
  `comment_author_url` varchar(200) NOT NULL default '',
  `comment_author_IP` varchar(100) NOT NULL default '',
  `comment_date` datetime NOT NULL default '0000-00-00 00:00:00',
  `comment_date_gmt` datetime NOT NULL default '0000-00-00 00:00:00',
  `comment_content` text NOT NULL,
  `comment_karma` int(11) NOT NULL default '0',
  `comment_approved` varchar(20) NOT NULL default '1',
  `comment_agent` varchar(255) NOT NULL default '',
  `comment_type` varchar(20) NOT NULL default '',
  `comment_parent` bigint(20) NOT NULL default '0',
  `user_id` bigint(20) NOT NULL default '0',

So the above works, but I find that people seldom use meaningful subjects in drupal comments so I skipped the concat thing and added some of the extra info.

INSERT INTO 
    wp_comments 
    (comment_post_ID, comment_date, comment_content, comment_parent, comment_author_IP, comment_author, comment_author_email, comment_author_url)
SELECT 
    nid, FROM_UNIXTIME(timestamp), comment, thread, hostname, name, mail, homepage 
FROM [your_drupal_db_name].comments ;

Now where are we?

  1. Posts are all in.
  2. Posts are correctly tagged
  3. Comments are there, but counts are wrong
  4. Posts have comments but that’s not indicated.
  5. Posts don’t indicate the author

Adding the author info

Easy. I only have one author:

update `wp_posts` set post_author = 2 where post_author = 0

Fixing the comment counts

Maybe there’s a way to do this just with SQL commands, but I don’t know it.

So I wrote a dorky PHP script using Justin Vincent’s ezSQL class to do the job for me. Imagine this in PHP.

get_col("select ID from wp_posts ORDER BY id",0) as $id )
{
          $ccountquery = "SELECT count(comment_id) as CommentCount FROM wp_comments WHERE comment_post_id = " . $id;
          $currentcommentcount = $db->get_var($ccountquery);
          $commentcountupdatequery = "UPDATE wp_posts set comment_count = " . $currentcommentcount . " WHERE ID = " . $id . " LIMIT 1";
          $db->query($commentcountupdatequery);
          echo "

Post ID” . $id . ” Number of comments: ” . $currentcommentcount . ”

";       
}

Incredibly, this worked.

When putting together my “Tag cloud” I discovered another problem. WordPress builds the tag clouds based on the count column in wp_term_taxonomy. None of my tags were showing up in the cloud.

So I built another disposable PHP script:

get_col("select term_id from wp_terms ORDER BY term_id",0) as $id )
{
          $ccountquery = "SELECT count(term_taxonomy_id) as TermUseCount FROM wp_term_relationships WHERE term_taxonomy_id = " . $id;
          $currentermcount = $db->get_var($ccountquery);
          $commentcountupdatequery = "UPDATE wp_term_taxonomy set count = " . $currentermcount . " WHERE term_id = " . $id . " LIMIT 1";
          $db->query($commentcountupdatequery);
          echo "
" . $ccountquery . "

“;
echo ”

Term ID: ” . $id . ” Number of posts with that term: ” . $currentermcount . ”

“;
}

Feeling like we’re making progress?

  1. mod_rewrite rules to send people from old to new URLs.
  2. do something about the photos/videos
  3. add in the links to other blogs and sidebar widgetry

1.) Me, I’ve written a few mod_rewrite rules in my life, but I wouldn’t say we’re close friends. So I take a trip to the Apache URL rewriting guide.

And I come up with this:

RewriteRule ^node/feed$  feed [L,R=301]
RewriteRule ^node/(.*)$  ?p=$1 [L,R=301]

I added the above two rules to my .htaccess file in my WordPress website root directory. You could, theoretically, write a rewrite rule for each URL, mapping them to the permalink, but I’m thinking that might be a bit more expensive in terms of processor time and what not.

So the whole thing looks like this:

# BEGIN WordPress
RewriteEngine On
RewriteBase /
RewriteRule ^node/feed$  feed [L,R=301]
RewriteRule ^node/(.*)$  ?p=$1 [L,R=301]
RewriteCond % !-f
RewriteCond % !-d
RewriteRule . /index.php [L]

The first rule is for the RSS feed, which I noticed was broken too. But the regular rule won’t work because of the WordPress URL.

2.) do something about photos/videos

This doesn’t look promising. Over my years of trying to find an easy and efficient way to add images to pages I have tried many things. Never had any luck. But the result of these experiments is that no two images appear to have been created the same way.

So there doesn’t seem to be an easy way to bring stuff over. Some images continue to show in thumbnail because they’re referencing images my old blog. But the javascripts used to open the full size images are broken. And as soon as I take my old blog app offline they’ll be gone.

Of course, externally linked things like YouTube videos are all fine. Another thumb up for contracting out your website.

When I did my Drupal node import I forgot that a lot of my posts were created as Image nodes, and I didn’t bring them over. So I went through and re-created them, back-dated them and re-uploaded the images. Very tedious work, though I took it as an excuse to re-read some old posts about my daughter, so I didn’t mind. Your mileage may vary and if I were doing this again I would definitely bring the image nodes over with all the other kinds.

Widgetry, what of it? Nothing for it

The sidebar widgetry and blogroll stuff won’t offer any repeatable lessons. I had got so frustrated with Drupal’s various devices to create blog rolls and RSS feeds and what not that I just created all my sidebar stuff using static HTML.

Known problems

  • The accented characters in my post titles didn’t translate. They came over unencoded or in the wrong Charset and ended up as question marks.
  • I also didn’t bring over comments on those forgotten image nodes, but the above queries, corrected to include image nodes, should work.

domainsatcost.ca does subdomain hijacking

This is disappointing. The domain name registrar I use – and have been telling friends and clients to use – is one of those ones that does subdomain hijacking.

I know because a client asked me to set up a website for him that he is eventually going to move to my server. He registered the domain himself at domainsatcost.ca and left the DNSing up to them.

Check out the domain and see what you get. Dig it and you get 66.116.125.150, which belongs to some company in Seattle.

Eventually, my friend will set up his website, but the subdomain hijack will cause him all sorts of problems with his Google ranking.

And this is only a problem because he parked his domain with them. If he’d changed to external DNS hosting and listed my primary and secondary DNS servers as the authorities for his domain name, he wouldn’t have had this problem. And as soon as he does this, the problem will go away.

I’m just sad that an otherwise reliable, local company would be so sleazy. And they’re so cheap too. Maybe that’s why.

I’m not going to recommend them any more and will transfer my domains away from them as soon as it’s economically practical and would urge anyone else to do so too.

Interarchy 9: please tell me I have a copy of 8.5 around somewhere

I have been using this file transfer client since it was in version 1.x. The latest re-release is a total disaster. It’s just wasted an hour of my life.

I rarely write about geeky stuff like this but this is so aggravating and disappointing that I can’t help it. It’s such a stupid thing, and such a small bit of software, but it’s so central to what I do that it’s become an integral part of every work day of mine.

And now it’s dead to me.

I’m not the only one. See the support forums for Interarchy’s new, appropriately named owners, Nolobe.

This guy points out my issue:

Edit button doesn’t work. I do almost 100-200 edits a day… having
to crawl through menus and submenus each time is a complete
frustration. I can’t even choose my preferred edit application.

Dreamweaver. Interarchy now wants me to use Dreamweaver for my edits and I cannot change it.

There isn’t a single thing about Interarchy 9 that I like.

Even under its original publishers they’ve come up with inane features, features with much promise that delivered nothing or worse than nothing (Netdisks, mirroring, etc etc) but they’ve always kept innovating and improving the program’s basic functionality.

Now, in addition to messing up basic stuff, they’ve gone and blown a fashion fuse. They have made the Interarchy window look like the Mac OS X window, which I’m sure they’re really pleased about, but I hate hate hate hate hate it.

I like the way OS X looks. It’s fine I suppose but I liked the fact that Interarchy windows looked different. That way I could tell which listing of files was local and which listing was on my server without averting my gaze. Now, I have to look up at the menu bar to check to see whether I’m in Interarchy or in the Finder.

Who finds that helpful?

I’m furious. Bring back Peter Lewis. Bring back Stairways. Please.

Apparently they’re promising to go back to the old Command-J in version 9.01. And they’re promising a beta “really soon.” But what a bizarre step to take. Do they really think that someone who wants to edit a file on a remote server is going to be using Dreamweaver – with its built-in client and publishing rigmarole – to do it?

Tangled in Our Dreams/Teresa Healy

Tangled in our Dreams website screen capture

Tangled in our Dreams website screen captureI’ve put together a website for Ottawa singer-songwriter Teresa Healy and her musical and life partner Tom Juravich for their recently-launched CD, Tangled in Our Dreams.

It’s pretty much driven by Roberta Huebner’s artwork, which lends a richness and a texture to the site. So I can’t take much credit for the look. It was, however an enjoyable project to work on. I wish the both of them all the success in the world with their music.