A Tool for Importing Tags from Squarespace into WordPress

I recently migrated a website from Squarespace to WordPress.  As part of that process, I used a tool to import the blog posts into WordPress.  Unfortunately, Squarespace does not include tags in its export format.  With Scrapy, I was able to configure a tool that crawled the Squarespace website, matched tags using XPath selectors, and dumped the results into a JSON file containing a list of post titles and the tags associated with each post.
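To illustrate the idea of matching titles and tags with XPath, here is a minimal sketch that uses only Python's standard library as a stand-in for Scrapy's selectors. The markup, the class names, and the extract_item function are all hypothetical; real Squarespace pages are structured differently and are not guaranteed to be well-formed XML, which is one reason to use Scrapy for the real crawl.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for one blog post page.
# Real Squarespace markup and class names will differ.
PAGE = """
<article>
  <h1 class="entry-title">Roads</h1>
  <div class="tags">
    <a href="/blog/tag/travel">travel</a>
    <a href="/blog/tag/learning">learning</a>
  </div>
</article>
""".strip()


def extract_item(markup):
    """Return a dict shaped like one line of the JSON Lines output."""
    root = ET.fromstring(markup)
    # ElementTree supports only a subset of XPath; Scrapy's selectors
    # accept full XPath 1.0 expressions.
    titles = [h.text for h in root.findall(".//h1[@class='entry-title']")]
    tags = [a.text for a in root.findall(".//div[@class='tags']/a")]
    # The title stays a list and the tags are joined with commas,
    # mirroring the shape of the items the tool writes out.
    return {"title": titles, "tags": ",".join(tags)}


print(json.dumps(extract_item(PAGE)))
```

Joining the tags into one comma-separated string is convenient later on, because that is the form passed to WP-CLI when updating a post.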

The key part of this is the spider configuration. Running the tool results in a JSON Lines file like this:

{"title": ["Roads"], "tags": "homeschooling,self discovery,self-directed learning,staff post,travel"}
{"title": ["Do Something\u00a0Projects"], "tags": "Social Issues,classes,learning,news"}
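Each line of that file is a separate JSON document, the title is wrapped in a one-element list, and titles may contain non-breaking spaces (\u00a0). A small sketch of reading this format follows; parse_items is a hypothetical helper name and the sample data is a shortened version of the lines above.

```python
import json

# Shortened versions of the sample lines, as they would appear on disk.
SAMPLE = (
    '{"title": ["Roads"], "tags": "homeschooling,travel"}\n'
    '{"title": ["Do Something\\u00a0Projects"], "tags": "classes,news"}\n'
)


def parse_items(text):
    """Yield (title, tags) pairs from JSON Lines text."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        item = json.loads(line)
        # "title" is a one-element list; swap non-breaking spaces
        # for plain spaces so titles compare cleanly later.
        title = item["title"][0].replace("\u00a0", " ")
        yield title, item["tags"].split(",")


for title, tags in parse_items(SAMPLE):
    print(title, tags)
```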

Then I used WP-CLI, a command-line interface for WordPress, to generate a list of the posts with their IDs and titles.

$ php ~/wp-cli.phar post list --fields=ID,post_title --format=json > ~/post_ids.json

The resulting file looks like:

[{"ID":1370,"post_title":"Talking to Teens and Parents When School Isn't Working"},{"ID":1369,"post_title":"Philanthropy at North Star"}]
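Because this output is a single JSON array, it can be loaded in one call and turned into a lookup table keyed by title. The following is only a sketch of that idea: index_by_title is a hypothetical helper, and the sample string stands in for the contents of post_ids.json.

```python
import json

# Sample WP-CLI output; a real run would read post_ids.json instead.
POST_LIST = """[{"ID":1370,"post_title":"Talking to Teens and Parents When School Isn't Working"},{"ID":1369,"post_title":"Philanthropy at North Star"}]"""


def index_by_title(raw_json):
    """Map each post title to its WordPress post ID."""
    index = {}
    for post in json.loads(raw_json):
        # setdefault keeps the first ID seen if two posts share a title.
        index.setdefault(post["post_title"], post["ID"])
    return index


titles_to_ids = index_by_title(POST_LIST)
print(titles_to_ids.get("Philanthropy at North Star"))
```

A dictionary like this replaces a scan over every post for every scraped item with a single lookup, though it assumes titles are unique enough to identify a post.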

A quick Python script matches each set of tags to the post with the same title and uses WP-CLI to update that post:

import json
from subprocess import call

# WP-CLI writes the post list as a single JSON array.
with open('post_ids.json') as f:
  ids = json.load(f)

# items.jl holds one JSON object per line (title and tags).
with open('items.jl') as f:
  for line in f:
    post = json.loads(line)
    for item in ids:
      # Replace unicode non-breaking spaces with ascii chars.
      if item["post_title"] == post["title"][0].replace(u"\u00a0", " "):
        call(["/usr/bin/php", "/path/to/wp-cli.phar", "--path=/to/wordpress/root", "post", "update", str(item["ID"]), "--tags_input=" + post["tags"]])

You can find this code on GitHub here.