A Tool for Importing Tags from Squarespace into WordPress

I recently migrated a website from Squarespace to WordPress. As part of that process, I used a tool to import the blog posts into WordPress. Unfortunately, Squarespace does not include tags in its export format. With Scrapy, I was able to configure a tool that crawled the Squarespace website, matched tags using XPath selectors, and dumped them into a JSON file containing a list of post titles and the tags associated with each post.

The key part of this is the spider configuration. Running the tool results in a JSON Lines file like this:

{"title": ["Roads"], "tags": "homeschooling,self discovery,self-directed learning,staff post,travel"}
{"title": ["Do Something\u00a0Projects"], "tags": "Social Issues,classes,learning,news"}
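The spider itself isn't reproduced here. As a rough, stdlib-only stand-in for the XPath matching Scrapy does, the sketch below pulls a title and tag names out of one post page with Python's html.parser and emits a record in the same shape as the lines above. The markup it looks for (title in the first <h1>, tag links whose href contains "?tag=") is a hypothetical assumption, as are the class and function names; adapt them to the real site.

```python
import json
from html.parser import HTMLParser


class PostTagParser(HTMLParser):
    """Collect the post title and tag names from one blog-post page.

    Hypothetical markup: the title is the text of the first <h1>, and each
    tag is a link whose href contains "?tag=". Adjust both for a real site.
    """

    def __init__(self):
        super().__init__()
        self.title = []
        self.tags = []
        self._in_h1 = False
        self._in_tag_link = False

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "h1" and not self.title:
            self._in_h1 = True
        elif tag == "a" and "?tag=" in href:
            self._in_tag_link = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False
        elif tag == "a":
            self._in_tag_link = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_h1:
            self.title.append(text)
        elif self._in_tag_link:
            self.tags.append(text)


def post_record(html):
    """Return one JSON Lines record: title as a one-item list, tags comma-joined."""
    parser = PostTagParser()
    parser.feed(html)
    return json.dumps({"title": [" ".join(parser.title)], "tags": ",".join(parser.tags)})
```

html.parser decodes &nbsp; to U+00A0, and json.dumps escapes it as \u00a0 by default, which is one way a non-breaking space like the one in the second example line can end up in the file.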

Then I used WP-CLI, a command line interface to WordPress, to generate a list of the posts containing their ID and title.

$ php ~/wp-cli.phar post list --fields=ID,post_title --format=json > ~/post_ids.json

The resulting file looks like:

[{"ID":1370,"post_title":"Talking to Teens and Parents When School Isn't Working"},{"ID":1369,"post_title":"Philanthropy at North Star"}]

A quick Python script matches up the tags with the appropriate title and uses WP-CLI to update each post:

import json
from subprocess import call

# post_ids.json holds one JSON array of {"ID": ..., "post_title": ...} objects.
with open('post_ids.json') as f:
    posts = json.load(f)

with open('items.jl') as f:
    for line in f:
        post = json.loads(line)
        # Replace Unicode non-breaking spaces with ordinary ASCII spaces.
        title = post["title"][0].replace(u"\u00a0", " ")
        for item in posts:
            if item["post_title"] == title:
                call(["/usr/bin/php", "/path/to/wp-cli.phar", "--path=/to/wordpress/root",
                      "post", "update", str(item["ID"]),
                      "--tags_input=" + post["tags"]])
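If you want to review the changes before touching the database, here is a hedged variant of the same matching step that builds the WP-CLI commands without running them and reports titles that found no match. The function name build_updates and the default wp-cli invocation tuple are illustrative assumptions; it expects the contents of post_ids.json as a string and items.jl as an iterable of lines.

```python
import json


def normalize(title):
    # Same normalization as the script above (NBSP -> space), plus trimming.
    return title.replace("\u00a0", " ").strip()


def build_updates(post_ids_json, items_lines, wp=("php", "wp-cli.phar")):
    """Return (commands, unmatched_titles) without running anything.

    post_ids_json: the JSON array produced by `wp post list --format=json`.
    items_lines:   JSON Lines records like {"title": [...], "tags": "a,b"}.
    """
    # Index WordPress posts by title once instead of rescanning for every item.
    by_title = {normalize(p["post_title"]): p["ID"] for p in json.loads(post_ids_json)}
    commands, unmatched = [], []
    for line in items_lines:
        item = json.loads(line)
        title = normalize(item["title"][0])
        if title in by_title:
            commands.append([*wp, "post", "update", str(by_title[title]),
                             "--tags_input=" + item["tags"]])
        else:
            unmatched.append(title)
    return commands, unmatched
```

The dictionary replaces the nested loop, so each post is looked up once; if two posts share a title, it keeps only the last ID, so check for duplicates first. Print shlex.join(cmd) to eyeball a command, then hand the list to subprocess.call once you're satisfied.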

You can find this code on GitHub.

WordPress Optimization and Monitoring

I recently spent some time improving the performance of a WordPress installation. I had set up a new server at DigitalOcean, a relative newcomer to the virtual server world. In general, I’ve been pleased with their product: the pricing is good, the interface is intuitive and easy to use, and uptime has been good.

WordPress has traditionally been installed on the Apache web server, and it ships with the .htaccess rewrite rules Apache needs for nice-looking URLs. Unfortunately, Apache doesn’t come configured out of the box with reasonable memory usage parameters and can quickly consume as much RAM as you throw at it.

Each Apache server process was using about 35M of RAM. On a 512M virtual server, I’m going to allocate about 350M (roughly two-thirds of memory) to the web server, which at 35M per process works out to a MaxClients of 10. The configuration looks like this:

# prefork MPM
# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# MaxClients: maximum number of server processes allowed to start
# MaxRequestsPerChild: maximum number of requests a server process serves

<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients           10
    MaxRequestsPerChild   0
</IfModule>
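The sizing arithmetic behind MaxClients 10 is just the RAM budget divided by the per-process footprint, rounded down. A small sketch; the helper names are hypothetical, and the RSS figures are assumed to come from something like `ps -C apache2 -o rss=`, which reports kilobytes per process:

```python
def average_rss_mb(rss_kb_values):
    """Average resident memory per process, in MB, from per-process RSS in KB."""
    return sum(rss_kb_values) / len(rss_kb_values) / 1024


def max_clients(ram_budget_mb, per_process_mb):
    """Largest number of prefork workers whose combined RSS fits in the budget."""
    return int(ram_budget_mb // per_process_mb)


# Figures from the text: ~35M per process, ~350M of a 512M server for Apache.
per_process = 35
budget = 350
print(max_clients(budget, per_process))  # 10
print(round(100 * budget / 512))         # 68 (percent of total RAM)
```

Rounding down matters: one worker too many can push the server into swap, which usually hurts response times far more than a slightly smaller pool.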

This configuration keeps memory usage under control, but performance was still not great. Load testing using ApacheBench with 10 concurrent requests showed an average response time just over 2500ms.

ab -n 100 -c 10 http://healthloop.com/index.php
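`ab -n 100 -c 10` sends 100 requests while keeping 10 in flight at a time. For anyone without ab handy, here is a rough Python approximation using a thread pool that records each request's wall-clock latency. It is a sketch, not a replacement for ab (no keep-alive, no percentile report), and the function names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def timed_get(url):
    """Fetch url once and return the elapsed wall-clock time in milliseconds."""
    start = time.perf_counter()
    with urlopen(url) as resp:
        resp.read()  # include the body transfer in the timing
    return (time.perf_counter() - start) * 1000.0


def load_test(url, requests=100, concurrency=10):
    """Rough analogue of `ab -n requests -c concurrency url`: issue `requests`
    GETs with at most `concurrency` in flight; return each latency in ms."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_get, [url] * requests))
```

statistics.mean(load_test("http://example.com/index.php")) then gives a figure roughly comparable to ab's mean time per request at the same concurrency.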

I’ve used Nginx on a lot of sites recently and wanted to see whether it would help performance here. Configuring Nginx for WordPress isn’t too complicated, but it is less widely known than the Apache configuration:

server {
        listen 80;
        server_name example.com;

        root /var/www/wordpress;
        index index.php index.html index.htm;

        access_log /var/log/nginx/example.com.access.log;
        error_log /var/log/nginx/example.com.error.log;

        # Use pretty permalinks.
        location / {
            try_files $uri $uri/ /index.php?q=$uri&$args;
        }

        error_page 404 /404.html;
        error_page 500 502 503 504 /50x.html;

        location = /50x.html {
            root /usr/share/nginx/www;
        }

        # pass the PHP scripts to php5-fpm
        location ~ \.php$ {
            try_files $uri =404;
            fastcgi_pass unix:/var/run/php5-fpm.sock;
            fastcgi_index index.php;
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        }

        # Set Expire for static assets.
        location ~*  \.(jpg|jpeg|png|gif|ico|css|js)$ {
           expires 365d;
        }

}
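The try_files line is what makes pretty permalinks work: Nginx serves a request directly if it names a real file or directory and otherwise hands it to index.php. A simplified Python model of that fallback order; the function is hypothetical and ignores Nginx's later index lookup and location matching:

```python
import os


def try_files(root, uri, args=""):
    """Simplified model of `try_files $uri $uri/ /index.php?q=$uri&$args;`.

    Returns what Nginx would go on to serve for `uri` under document `root`.
    """
    path = os.path.join(root, uri.lstrip("/"))
    if os.path.isfile(path):
        return uri                        # $uri: an existing static file
    if os.path.isdir(path):
        return uri.rstrip("/") + "/"      # $uri/: a directory, served via `index`
    return f"/index.php?q={uri}&{args}"   # fallback: let WordPress route it
```

So a stylesheet under wp-content is served straight from disk, while a permalink such as /2013/12/some-post/ has no matching file and ends up at index.php, where WordPress works out which post was requested.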

Unfortunately, I didn’t see any performance gains from switching to Nginx. RAM consumption dropped, but testing showed average response times several hundred ms slower than with Apache.

Enter Batcache and the APC Object Cache. Batcache is a full-page caching plugin for WordPress that caches the pages served to anonymous users; it stores those pages in the WordPress object cache, which is why it needs a persistent backend such as APC. Authenticated visitors see the non-cached version, so this might not be the ideal solution for every WordPress site, but it was perfect for this scenario. After installing Batcache and the WordPress plugin for APC support, testing showed the average response time had dropped to about 600ms per request.
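To make the anonymous/authenticated split concrete, here is a toy full-page cache in Python. It is not Batcache's code: the class name, the five-minute TTL, and the rule of bypassing the cache whenever a cookie starting with wordpress_logged_in is present are simplifying assumptions for illustration.

```python
import time


class PageCache:
    """Toy full-page cache: anonymous requests are served from cache, while
    visitors carrying a WordPress login cookie always get a fresh render."""

    def __init__(self, render, ttl_seconds=300):
        self.render = render      # callable(url) -> HTML; stands in for WordPress
        self.ttl = ttl_seconds
        self.store = {}           # url -> (expires_at, html)

    def get(self, url, cookies=()):
        # Logged-in users bypass the cache entirely.
        if any(name.startswith("wordpress_logged_in") for name in cookies):
            return self.render(url)
        now = time.monotonic()
        hit = self.store.get(url)
        if hit and hit[0] > now:
            return hit[1]         # fresh cached copy
        html = self.render(url)   # miss or expired: render and remember
        self.store[url] = (now + self.ttl, html)
        return html
```

In the real setup the cached copies live in APC via the object cache rather than in a Python dict, which is what lets every PHP-FPM worker share them.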

Here’s the APC configuration I added to /etc/php5/fpm/php.ini. Initially, I had the shm_size at 32M, but noticed that the APC cache was getting highly fragmented. Since doubling the cache size, fragmentation has stayed low, in the 2-3% range.

[APC]
extension=apc.so
apc.enabled=1
apc.shm_segments=1

;size per WordPress install
apc.shm_size=64M

I’ve also been experimenting with monitoring both server and application status with New Relic. New Relic provides nice charts displaying application response time, CPU and RAM usage, and a number of other useful metrics, along with configurable notifications. Soon after installing the New Relic agent, I got an alert about unusually high activity, checked the log file, and discovered an attack on /wp-login.php. I thwarted it by blocking the attacking IP address with iptables:

sudo /sbin/iptables -I INPUT -s 74.208.246.118 -j DROP
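Spotting an attack like this in the access log can be scripted. A hedged sketch, assuming the log uses the combined log format that Nginx and Apache commonly write: it counts POST requests to /wp-login.php per client IP and formats the same iptables command for any IP at or over a threshold. The default threshold of 20 and the function names are arbitrary assumptions; review the list before blocking anything.

```python
import re
from collections import Counter

# Client IP, method and path from a combined-format line such as:
# 203.0.113.5 - - [20/Dec/2013:21:02:37 +0000] "POST /wp-login.php HTTP/1.1" 200 ...
LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)')


def login_attackers(lines, threshold=20):
    """Return IPs that POSTed to /wp-login.php at least `threshold` times."""
    hits = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group(2) == "POST" and m.group(3).startswith("/wp-login.php"):
            hits[m.group(1)] += 1
    return [ip for ip, n in hits.most_common() if n >= threshold]


def drop_rules(ips):
    """Format the iptables command used above for each offending IP."""
    return [f"sudo /sbin/iptables -I INPUT -s {ip} -j DROP" for ip in ips]
```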

Here’s an example of New Relic’s rather elegant charts:

Screen Shot 2013-12-20 at 9.02.37 PM

Another option I’ve been exploring recently is Cloudflare. Its free account level includes caching of static assets on their CDN along with some basic threat protection. So far it seems to be working out well, though perhaps without improvements as dramatic as the ones I saw here.

WordPressing “The Homesnewser”

Screen Shot 2013-02-04 at 10.43.46 AM

I built this site for a homeschool newspaper that needed an online presence.


It’s built with WordPress using a responsive design to accommodate everything from phones to tablets to desktop browsers. There are a few interesting plugins:

The Facebook Like Box allows visitors to share the articles with their friends on Facebook.  Getting the word out to friends and subscribers is one of the key use cases for this website.  A Google Analytics plugin allows staff to evaluate how successful social media campaigns have been in driving visitors to the site.

FlexPaper, a web-based document viewer, allows visitors to browse the PDF version of newspapers without leaving the current page they are viewing.

Screen Shot 2013-02-04 at 10.47.54 AM

Using WordPress in multiple environments


When developing software, it’s important to have separate environments so that changes a developer makes stay in their own “sandbox” and don’t affect other developers or the “production” server. Best practices generally dictate three tiers: a local development environment for each developer, a staging environment to integrate changes from multiple developers, and a production environment.

WordPress is one of the best blogging platforms available, but it isn’t really designed to run in multiple environments. Setting them up isn’t too difficult, though: WordPress can read its configuration from environment variables, so each environment supplies its own settings.

With Apache, add the environment variables to your .htaccess file:

SetEnv WP_DB your_database_name
SetEnv WP_USERNAME your_username
SetEnv WP_PASSWORD your_password

Then you can grab these settings in the WordPress configuration file, wp-config.php:

// The name of the database for WordPress
define('DB_NAME', $_ENV['WP_DB']);
// MySQL database username
define('DB_USER', $_ENV['WP_USERNAME']);
// MySQL database password
define('DB_PASSWORD', $_ENV['WP_PASSWORD']);
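// For developers: WordPress debugging. A non-empty string such as "false"
// is truthy in PHP, so parse the value explicitly; unset means off.
define('WP_DEBUG', isset($_ENV['WP_DEBUG']) && filter_var($_ENV['WP_DEBUG'], FILTER_VALIDATE_BOOLEAN));
// Override the siteurl and home values stored in wp_options
define('WP_SITEURL', 'http://' . $_SERVER['SERVER_NAME']);
define('WP_HOME', 'http://' . $_SERVER['SERVER_NAME']);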
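For comparison, the same pattern sketched in Python (the function and constant names are illustrative, not part of WordPress). The two points worth copying into any language are failing loudly when a required variable is missing, and parsing the debug flag explicitly, since a non-empty string such as "false" is truthy in Python just as it is in PHP.

```python
import os

# Variables every environment must provide (names taken from the examples above).
REQUIRED = ("WP_DB", "WP_USERNAME", "WP_PASSWORD")


def env_flag(value):
    """Treat "1", "true", "yes" or "on" (any case) as True; anything else,
    including a missing value, is False."""
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


def load_wp_settings(env=os.environ):
    """Map environment variables to WordPress-style settings, failing loudly
    if a required one is missing or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "DB_NAME": env["WP_DB"],
        "DB_USER": env["WP_USERNAME"],
        "DB_PASSWORD": env["WP_PASSWORD"],
        "WP_DEBUG": env_flag(env.get("WP_DEBUG", "")),
    }
```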

With Nginx, you can pass the same variables to PHP as FastCGI parameters:

include fastcgi_params;
fastcgi_param WP_DB your_database;
fastcgi_param WP_USERNAME your_username;
fastcgi_param WP_PASSWORD your_password;

You will probably also need to tell PHP to populate $_ENV by including E in variables_order in php.ini:

variables_order = "EGPCS"

Now you can keep the same WordPress files in version control across all of your environments, with no per-environment edits to WordPress’s configuration.

WordPress on OS X with Nginx, PHP, and MySQL


Recently, I wanted to do some WordPress development on my Mac.  I’ve got Ubuntu installed on a virtual machine, but I decided to get the stack running on OS X.

I’ve been using Nginx for a while now as a web server, caching proxy, and load balancer for some Plone sites. It’s been fantastic: fast, reliable, and easy to configure, greatly simplifying my life as a sysadmin. I was interested to see how it would work for serving WordPress.

If you haven’t checked out Homebrew, you should. No, it’s not beer; it’s a package manager for OS X. Homebrew is, they claim and I agree, the easiest and most flexible way to install the UNIX tools Apple didn’t include with OS X.

First, install and start the MySQL server:

brew install mysql
/usr/local/bin/mysqld_safe --datadir=/usr/local/Cellar/mysql/5.5.10/data

Installing PHP is a little more complicated, since there isn’t an official Homebrew formula. Instead, grab this formula:

wget https://github.com/ampt/homebrew/raw/php/Library/Formula/php.rb
mv php.rb `brew --prefix`/Library/Formula

Then build PHP with MySQL and FastCGI (FPM) support:

brew install php --with-mysql --with-fpm

Tell PHP to listen on port 9000:

/usr/local/Cellar/php/5.3.6/bin/php-cgi -b 9000

Installing nginx is as simple as:

brew install nginx

You’ll need to edit nginx.conf, which lives in /usr/local/etc/nginx, and add this server block inside its http block:

server {
    listen       8080;
    server_name  localhost;

    location / {
        root   /path/to/wordpress;
        index  index.php index.html index.htm;
    }

    # Hand PHP scripts to the php-cgi process listening on port 9000.
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME /path/to/wordpress/$fastcgi_script_name;
    }
}

Start the Nginx server like this:

/usr/local/sbin/nginx -c /usr/local/etc/nginx/nginx.conf
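Before installing WordPress, it can help to confirm that all three services are actually listening. A minimal sketch using Python’s socket module; 3306 is MySQL’s default port, and 9000 and 8080 are the PHP and Nginx ports configured above. The function names are hypothetical.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


# Ports used in this setup: MySQL's default 3306, php-cgi on 9000, Nginx on 8080.
SERVICES = {"mysql": 3306, "php-cgi": 9000, "nginx": 8080}


def check_stack(host="127.0.0.1", services=SERVICES):
    """Map each service name to whether its port is accepting connections."""
    return {name: is_listening(host, port) for name, port in services.items()}
```

A False entry in check_stack() points at the service that didn’t start.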

There are more robust ways to start these services, but since this is just a development environment, I prefer not to have them running unless I am actively working with them. Now you should be ready to install WordPress.  Happy developing!