I am now one website away from never having to look at the underside of a Drupal website ever again. I’m excited. But a bit… tired. This is an explanation of how I migrated from Drupal 5.7 to WordPress 2.5.
The method advice narrative chatter on this page comes completely without warranty or support. It may not work for you and it may even make things worse. I can’t help that and am not responsible for your decision to follow the words on this page.
My blog (cmkl.ca) had been running Drupal for a while. Quite a while. It used to run Geeklog. I’ve also tried various Nuke variants. Don’t laugh. While I’m sharing I might as well tell you I used to use a program called Adobe Pagemill too.
At the time, Drupal seemed like a good choice. It offered XHTML-compliant, template-driven output, a highly-extensible system with a huge, vibrant, award-winning developer community. Sounds great, huh?
But here’s the thing: I like lightweight, semantic markup styled with CSS. I like my presentation to be separated from my structure and my content separated from my presentation.
And I don’t want a bunch of extraeneous crap – markup, applications, ‘features’ whatever – floating around just itching to cause me grief.
So if you’re like me, you don’t want Drupal.
I had flirted with WordPress for a while, futzing about about with it on test blogs and was suddenly green with envy for all the people I knew who’d adopted it for their blogs/CMS.
But I reasoned that moving would be more of a pain than just suffering through Drupals many annoyances. Until Drupal’s vibrant, numerous development community took the project from 4.7 to 5.7 to 6.2 in the space of six months. And I tried to follow along with this other website I’m responsible for.
There was no way I was going to go through all that work again and still end up with Drupal.
So I started Googling.
Migrating from Drupal (4.7) to WordPress had a lot of promise, and Dave Dash’s MySQL-only solution suggested you could move to WordPress with a dozen queries.
Sure the article covered Drupal 4.7 and some unspecified version of WordPress, but how different could the data structure be?
Answer: Quite.
For example, this query is supposed to turn your Drupal taxonomy into WordPress categories:
USE [your_wordpress_db_name]; INSERT INTO wp_categories (cat_ID, cat_name, category_nicename, category_description, category_parent) SELECT term_data.tid, name, name, description, parent FROM [your_drupal_db_name].term_data, drupal.term_hierarchy WHERE term_data.tid=term_hierarchy.tid;
The select statement for Drupal still works (assuming you don’t have some elaborately hierarchical taxonomy), but the data structure in wordpress is now very different.
I decided I wanted to make all my drupal terms into tags. That’s more “Web 2.0” and web 2.0 is cool and I desperately desperately want to be cool.
The first wrinkle: WP now has a column for the ‘term’ and a human-readable version they call the ‘slug’. In drupal, the ‘name’ field wants to be the ‘slug’ field. How to remove the url-unfriendly characters without getting into using something like PHP.
So there are some MySQL string functions listed here that work:
And in fact, the lower() and replace() functions work wonderfully.
This gets you the listing of drupal tags:
SELECT term_data.tid, lower(replace(name," ", "-")) as name, name as slug, parent FROM [your_drupal_db_name].term_data, [your_drupal_db_name].term_hierarchy WHERE term_data.tid=term_hierarchy.tid;
And this is the query I ended up using instead of Mr. Dash’s first one:
INSERT INTO wp.wp_terms(term_ID, name, slug, term_group) SELECT term_data.tid, lower(replace(name," ", "-")) as name, name as slug, parent FROM [your_drupal_db_name].term_data, [your_drupal_db_name].term_hierarchy WHERE term_data.tid=term_hierarchy.tid;
This was supposed to get the terms and bring them into wordpress, which it does, however the terms didn’t show up in the listing. There’s another query that has to be executed.
I discovered that every row in the terms table has to have an entry in the wp_term_taxonomy table. Yes indeed. So I did this:
insert into wp_term_taxonomy (term_id) select term_id from wp_terms; update wp_term_taxonomy set taxonomy ="post_tag" where taxonomy = ""
There’s more elegant SQL syntax I’m sure. And if you wanted your Drupal terms to become WordPress categories, you’d set taxonomy = 'category'
And now the content. Nodes in Drupalese, posts in WordPresspeak.
Mister Dash says do this:
SELECT DISTINCT n.nid, FROM_UNIXTIME(created), body, n.title, teaser, REPLACE(REPLACE(REPLACE(REPLACE(LOWER(n.title),' ', '_'),'.', '_'),',', '_'),'+', '_'), FROM_UNIXTIME(changed) FROM [your_drupal_db_name].node n, [your_drupal_db_name].node_revisions r WHERE n.vid = r.vid AND type='story' OR type='page' ;
…to pull the contend from Drupal. But for me that just pulled in the first title of the first node. So I tried:
SELECT DISTINCT n.nid, FROM_UNIXTIME(created), body, n.title, teaser, REPLACE(REPLACE(REPLACE(REPLACE(LOWER(n.title),' ', '-'),'.', '-'),',', '-'),'+', '-'), FROM_UNIXTIME(changed) FROM cmkldrupal.node n, cmkldrupal.node_revisions r WHERE n.vid = r.vid AND (type='story' OR type='page' OR type='blog' OR type='image');
This is what actually worked for me because most of my nodes are blog entries. The brackets are necessary for my version of MySQL too. I changed the underscores to hyphens because that’s supposed to produce more findable URLs.
Note: I added type='image' after the fact so that you don’t make the same mistake I made. I wish I had thought of it before heading so much further down the road.
And now the insert:
INSERT INTO wp_posts (id, post_date, post_content, post_title, post_excerpt, post_name, post_modified)
Mister Dash’s query references fewer columns, I suspect, because the version of WordPress he was writing about had fewer columns.
This is the query as it actually ran:
INSERT INTO wp_posts(id,post_date,post_content,post_title,post_excerpt,post_name,post_modified) SELECT DISTINCT n.nid, FROM_UNIXTIME( created ) , body, n.title, teaser, REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( REPLACE( REPLACE( REPLACE( LOWER( n.title ) , ' ', '-' ) , '.', '-' ) , ',', '-' ) , '+', '-' ),'?',''),'!',''),'--','-'),':',''), FROM_UNIXTIME(CHANGED ) FROM [your_drupal_db_name].node n, [your_drupal_db_name].node_revisions r WHERE n.vid = r.vid AND ( TYPE = 'story' OR TYPE = 'page' OR TYPE = 'blog' OR TYPE = 'image');
This is basically Dave Dash’s query plus the extra columns for WordPress 2.5.1 and a fair bit more weeding out of characters used to make permalinks in WordPress. Dave showed me how, but I guess he doesn’t use much punctuation in his headlines because you need to make sure you weed out colons, question marks, hyphens, parentheses, exclamation marks and instances of two or more hypens used in your post titles to construct your WordPress permalinks.
Otherwise your permalinks won’t work. I didn’t realize this until I started testing my new WP blog and noticed a lot of articles that would show up in home or archive listings but would give me “no articles matched your query” when I clicked on the title.
So I exported a CSV file of my post ids and post_names, grepped them for punctuation, then made the results into SQL queries to update the permalinks stored in the wp_posts table.
If you write your drupal import query you should be able to skip this step and live a relatively normal life.
Tagging the posts
INSERT INTO wp_term_relationships (object_id,term_taxonomy_id) SELECT nid,tid FROM [your_drupal_db_name].term_node ;
…from Dave Dash’s site seems promising. It tags your new WordPress posts the way they were tagged in drupal. Now, of course, this only works if you remembered to empty your WordPress tables before doing any imports, or adding any test posts. I didn’t and all my tags were about other posts. I had to redo the import.
Bringing comments over
INSERT INTO wp_comments (comment_post_ID, comment_date, comment_content, comment_parent, comment_author_IP, comment_author, comment_author_email, comment_author_url) SELECT nid, FROM_UNIXTIME(timestamp),concat('',subject, ' ', comment), thread, hostname, name, mail, homepage FROM [your_drupal_db_name].comments ;
WP seems to have evolved a lot from this Dave Dash query. The columns in this post are all still there but the table seems to hold lot more information.
Now drupal’s comments look like:
`cid` int(10) NOT NULL auto_increment, `pid` int(10) NOT NULL default '0', `nid` int(10) NOT NULL default '0', `uid` int(10) NOT NULL default '0', `subject` varchar(64) NOT NULL default '', `comment` longtext NOT NULL, `hostname` varchar(128) NOT NULL default '', `timestamp` int(11) NOT NULL default '0', `score` mediumint(9) NOT NULL default '0', `status` tinyint(3) unsigned NOT NULL default '0', `format` int(4) NOT NULL default '0', `thread` varchar(255) NOT NULL default '', `users` longtext, `name` varchar(60) default '', `mail` varchar(64) default '', `homepage` varchar(255) default '',
And wordpress’s looks like:
`comment_ID` bigint(20) unsigned NOT NULL auto_increment, `comment_post_ID` int(11) NOT NULL default '0', `comment_author` tinytext NOT NULL, `comment_author_email` varchar(100) NOT NULL default '', `comment_author_url` varchar(200) NOT NULL default '', `comment_author_IP` varchar(100) NOT NULL default '', `comment_date` datetime NOT NULL default '0000-00-00 00:00:00', `comment_date_gmt` datetime NOT NULL default '0000-00-00 00:00:00', `comment_content` text NOT NULL, `comment_karma` int(11) NOT NULL default '0', `comment_approved` varchar(20) NOT NULL default '1', `comment_agent` varchar(255) NOT NULL default '', `comment_type` varchar(20) NOT NULL default '', `comment_parent` bigint(20) NOT NULL default '0', `user_id` bigint(20) NOT NULL default '0',
So the above works, but I find that people seldom use meaningful subjects in drupal comments so I skipped the concat thing and added some of the extra info.
INSERT INTO wp_comments (comment_post_ID, comment_date, comment_content, comment_parent, comment_author_IP, comment_author, comment_author_email, comment_author_url) SELECT nid, FROM_UNIXTIME(timestamp), comment, thread, hostname, name, mail, homepage FROM [your_drupal_db_name].comments ;
Now where are we?
- Posts are all in.
- Posts are correctly tagged
- Comments are there, but counts are wrong
- Posts have comments but that’s not indicated.
- Posts don’t indicate the author
Adding the author info
Easy. I only have one author:
update `wp_posts` set post_author = 2 where post_author = 0
Fixing the comment counts
Maybe there’s a way to do this just with SQL commands, but I don’t know it.
So I wrote a dorky PHP script using Justin Vincent’s ezSQL class to do the job for me. Imagine this in PHP.
get_col("select ID from wp_posts ORDER BY id",0) as $id ) { $ccountquery = "SELECT count(comment_id) as CommentCount FROM wp_comments WHERE comment_post_id = " . $id; $currentcommentcount = $db->get_var($ccountquery); $commentcountupdatequery = "UPDATE wp_posts set comment_count = " . $currentcommentcount . " WHERE ID = " . $id . " LIMIT 1"; $db->query($commentcountupdatequery); echo "
Post ID” . $id . ” Number of comments: ” . $currentcommentcount . ”
"; }
Incredibly, this worked.
When putting together my “Tag cloud” I discovered another problem. WordPress builds the tag clouds based on the count column in wp_term_taxonomy. None of my tags were showing up in the cloud.
So I built another disposable PHP script:
get_col("select term_id from wp_terms ORDER BY term_id",0) as $id ) { $ccountquery = "SELECT count(term_taxonomy_id) as TermUseCount FROM wp_term_relationships WHERE term_taxonomy_id = " . $id; $currentermcount = $db->get_var($ccountquery); $commentcountupdatequery = "UPDATE wp_term_taxonomy set count = " . $currentermcount . " WHERE term_id = " . $id . " LIMIT 1"; $db->query($commentcountupdatequery); echo "
" . $ccountquery . "
“;
echo ”
Term ID: ” . $id . ” Number of posts with that term: ” . $currentermcount . ”
“;
}
Feeling like we’re making progress?
- mod_rewrite rules to send people from old to new URLs.
- do something about the photos/videos
- add in the links to other blogs and sidebar widgetry
1.) Me, I’ve written a few mod_rewrite rules in my life, but I wouldn’t say we’re close friends. So I take a trip to the Apache URL rewriting guide.
And I come up with this:
RewriteRule ^node/feed$ feed [L,R=301] RewriteRule ^node/(.*)$ ?p=$1 [L,R=301]
I added the above two rules to my .htaccess file in my WordPress website root directory. You could, theoretically, write a rewrite rule for each URL, mapping them to the permalink, but I’m thinking that might be a bit more expensive in terms of processor time and what not.
So the whole thing looks like this:
# BEGIN WordPress RewriteEngine On RewriteBase / RewriteRule ^node/feed$ feed [L,R=301] RewriteRule ^node/(.*)$ ?p=$1 [L,R=301] RewriteCond % !-f RewriteCond % !-d RewriteRule . /index.php [L]
The first rule is for the RSS feed, which I noticed was broken too. But the regular rule won’t work because of the WordPress URL.
2.) do something about photos/videos
This doesn’t look promising. Over my years of trying to find an easy and efficient way to add images to pages I have tried many things. Never had any luck. But the result of these experiments is that no two images appear to have been created the same way.
So there doesn’t seem to be an easy way to bring stuff over. Some images continue to show in thumbnail because they’re referencing images my old blog. But the javascripts used to open the full size images are broken. And as soon as I take my old blog app offline they’ll be gone.
Of course, externally linked things like YouTube videos are all fine. Another thumb up for contracting out your website.
When I did my Drupal node import I forgot that a lot of my posts were created as Image nodes, and I didn’t bring them over. So I went through and re-created them, back-dated them and re-uploaded the images. Very tedious work, though I took it as an excuse to re-read some old posts about my daughter, so I didn’t mind. Your mileage may vary and if I were doing this again I would definitely bring the image nodes over with all the other kinds.
Widgetry, what of it? Nothing for it
The sidebar widgetry and blogroll stuff won’t offer any repeatable lessons. I had got so frustrated with Drupal’s various devices to create blog rolls and RSS feeds and what not that I just created all my sidebar stuff using static HTML.
Known problems
- The accented characters in my post titles didn’t translate. They came over unencoded or in the wrong Charset and ended up as question marks.
- I also didn’t bring over comments on those forgotten image nodes, but the above queries, corrected to include image nodes, should work.