Gearman

2009.07.25

Users have high expectations of web apps: performance, responsiveness and tons of features. Normally you only get to pick two out of any three really cool things, and web apps are no exception. Most settle on some compromise between performance/responsiveness and features; depending on the feature and the scale, more features usually means less responsiveness.

Enter Gearman. Gearman is a queuing system that allows work to be farmed out to other servers. Most importantly, it allows intensive tasks to be queued and performed in the background. This means that when a user performs an action that could take a long time (sending notification emails, updating full-text indexes, etc.), that slow task can be queued to run in the background while the page is sent to the user right away, keeping things snappy.
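Gearman does this with a dedicated job server: clients submit jobs to it, and it hands them out to workers that can live on any number of machines. As a rough, single-machine illustration of the same pattern, and explicitly not Gearman’s API, here is a spool-directory queue in shell. The spool path and the enqueue, handle_job and work functions are all hypothetical.

```shell
#!/bin/sh
# Toy single-machine job queue illustrating the pattern (NOT Gearman's API).
# Jobs are files in a spool directory: the web request only writes a file and
# returns, and a separate worker drains the directory later.
SPOOL=${SPOOL:-/tmp/demo_spool}   # hypothetical spool location
mkdir -p "$SPOOL"

# Called during the request: cheap, returns immediately.
enqueue() {
  tmp=$(mktemp "$SPOOL/job.XXXXXX") || return 1
  printf '%s\n' "$*" > "$tmp"
  mv "$tmp" "$tmp.job"   # rename within one directory is atomic, so workers never see half-written jobs
}

# Stand-in for the slow task (sending email, updating a full-text index, ...).
handle_job() {
  printf 'processed: %s\n' "$1"
}

# Run from cron or a loop: process each published job, deleting it once handled.
work() {
  for job in "$SPOOL"/*.job; do
    [ -e "$job" ] || continue   # the glob matched nothing
    handle_job "$(cat "$job")" && rm -f "$job"
  done
}
```

A request handler would call something like enqueue "welcome_email 42" and render the page right away, while cron runs work every minute. Unlike Gearman, nothing here stops two workers from grabbing the same job, and everything stays on one box; assigning each job to exactly one worker, on whatever server, is what the job server takes care of.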

Gearman is pretty simple to install on Red Hat.

Download the gearmand source from Launchpad:
> wget http://launchpad.net/gearmand/trunk/0.8/+download/gearmand-0.8.tar.gz

Unpack it and move into the directory:
> tar -xvzf gearmand-0.8.tar.gz
> cd gearmand-0.8

My Red Hat box was missing a couple of dependencies, so the next few steps will vary depending on your *nix distro.

Install the libevent developer library.
> yum install libevent-devel

Install the e2fsprogs developer library
> yum install e2fsprogs-devel

configure and install
> ./configure
> make
> make install

/** PHP extension (pecl/gearman) **/

Download the PHP extension from PECL:
> wget http://pecl.php.net/get/gearman-0.4.0.tgz

Untar it and move into the directory:
> tar -xvzf gearman-0.4.0.tgz
> cd gearman-0.4.0

build the extension
> phpize
> ./configure
> make
> make test
> make install

Add the extension to the php.ini

[gearman]
extension=gearman.so

Restart Apache (or whatever is running PHP) so the extension gets loaded, and you’re all set!

How you integrate it will depend on whether you use the PHP extension (the alternative being the pure-PHP Net_Gearman package from PEAR) and on how well encapsulated your code base is. I highly recommend the pecl extension, as it provides great implementations of both the client and the worker. Whichever route you take, Gearman will save you.

Save MySQL

2009.07.16

Runaway queries on MySQL can be a real problem. If a long-running query locks up important tables, other queries against those tables will be placed in a queue. Each waiting query holds open its own connection to MySQL, so once you hit max_connections, your MySQL connection code will start to fail. Depending on how errors are handled at this stage of the request, this could mean total disaster for a site.

Although there is no way to fix this within the MySQL server itself, a bit of clever scripting run via cron can check for the problem and kill the offending queries. Presenting save_mysql:

/usr/bin/mysql -e 'show full processlist\G' 2> /dev/null |
grep -A1 -B5 -E 'Time: [1-9][0-9]{2,}' |
grep -E '\bId: |\bState: ' |
/usr/bin/perl -n -e 'if ($. % 2) { chomp; print } else { print }' |
grep -E ' State: Sending data$| State: Sorting result$' |
awk '{print $2}' |
xargs -r -I THREAD /usr/bin/mysqladmin kill THREAD > /dev/null 2>&1

/usr/bin/mysql -e 'show full processlist\G' 2> /dev/null
This line grabs a list of all the queries and commands currently running on the MySQL server. The \G terminator prints each field on its own line, and 2> /dev/null sends any error output to the black hole. It produces output like so:

*************************** 1. row ***************************
     Id: 842863
   User: admin
   Host: localhost
     db: NULL
Command: Query
   Time: 0
  State: NULL
   Info: show full processlist

grep -A1 -B5 -E 'Time: [1-9][0-9]{2,}'
If a query’s Time is 100 seconds or more, this grep grabs the line directly below it (State) and the 5 lines above it (Id through Command). The threshold can be tweaked to catch slow queries sooner; my preference is between 30 seconds and a minute. So instead of
[1-9][0-9]{2,}
you’d have
([3-9][0-9]|[1-9][0-9]{2,}) (30 seconds) OR ([6-9][0-9]|[1-9][0-9]{2,}) (60 seconds)
The second alternative matters: without it, a query that has been running for, say, 150 seconds would slip past the check.

grep -E '\bId: |\bState: '
This filters out the other lines from the previous grep and keeps just the MySQL process ID and its State.

/usr/bin/perl -n -e 'if ($. % 2) { chomp; print } else { print }'
A quick Perl one-liner that puts the Id and State from the step above on the same line: it strips the newline from every odd-numbered (Id) line, so the State line that follows gets appended to it. Keeping the Perl code in single quotes stops the shell from trying to expand its $ variables.

grep -E ' State: Sending data$| State: Sorting result$'
This keeps only the queries whose state is ‘Sending data’ or ‘Sorting result’. These are both states where it’s safe to kill the query.

awk '{print $2}'
This grabs the query ID, which is the second whitespace-separated field of each joined line.

xargs -r -I THREAD /usr/bin/mysqladmin kill THREAD > /dev/null 2>&1
Lastly, xargs hands each ID to mysqladmin kill, effectively killing the query. The -r flag (GNU xargs) skips running mysqladmin when nothing matched.
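If you’d rather not encode the threshold in a regex, the Time and State checks can be done in a single awk pass that compares Time as a number. Here’s a minimal sketch along those lines. The function names long_running_ids and kill_long_running are hypothetical, it assumes the \G layout shown in the sample output above, and it keeps the same two “safe” states as the script. Splitting out a dry-run function makes it easy to see what would be killed before trusting it to cron.

```shell
#!/bin/sh
# Dry-run variant of save_mysql (hypothetical): reads "show full processlist\G"
# output on stdin and prints the Ids it would kill instead of killing them.
# $1 = minimum Time in seconds (default 60), compared as a number.
long_running_ids() {
  awk -v min="${1:-60}" '
    /^\*+ [0-9]+\. row \*+$/ { id = ""; t = 0; next }   # start of a new row
    $1 == "Id:"   { id = $2 }
    $1 == "Time:" { t = $2 + 0 }
    $1 == "State:" {
      state = $0
      sub(/^ *State: /, "", state)                     # full state text, spaces included
      if (id != "" && t >= min && (state == "Sending data" || state == "Sorting result"))
        print id
    }
  '
}

# The live version, as cron would run it (defined here, not called):
kill_long_running() {
  /usr/bin/mysql -e 'show full processlist\G' 2> /dev/null |
    long_running_ids "${1:-60}" |
    xargs -r -I THREAD /usr/bin/mysqladmin kill THREAD > /dev/null 2>&1
}
```

Piping a saved copy of the processlist output through long_running_ids 30 lists the Ids a 30-second threshold would catch, without touching the server.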

Do one thing, but do it really well

2009.07.07

Some people consider themselves jacks of all trades. But as the saying goes, they’re masters of none.

Websites almost always start out as a codebase that does everything. There will be a couple of files that add users, encode video to Flash, pull RSS feeds, assemble HTML, update products, charge users, manipulate images, redirect old links, handle file uploads, calculate shipping, delete categories, create RSS feeds, search the database, etc. Sometimes the code that does these things will be organized into files, sometimes it won’t. The whole site will run on a single server, or more likely, a slice of a single server.

None of the things in the above list will be done well. None. This is mostly because there is too little code and too little hardware focused on doing too much. On top of that, every piece of code is tightly coupled, so if any one of those features gets a ton of traffic, or hits a bump, and starts consuming a ton of resources, it’s safe to assume the whole thing will go down in flames.

So to avoid the Fail Whale, it’s really important to build sites as a group of components that work together. Architecture is key, and when carefully thought out, it can ensure that the most important parts of the site stay up. Even when the image manipulation script on the backend freaks out, the home page should continue to load flawlessly.

With database-driven apps (almost every major site on the web), particular attention needs to be paid to the caching layer. Again, since most sites start out with a jumbled codebase, it’s unlikely that all the code that manages data lives in one place. Given the complexity of managing cache objects, making sure that objects are invalidated on update is crucial to making updates look seamless; if any code path can write to the database without touching the cache, users will see stale data. So there needs to be a set of code that’s good at one thing: managing data and its cache.
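A minimal sketch of that “one thing”, assuming a cache-aside design: reads check the cache first and fill it on a miss, and every write goes through the same layer, which invalidates the cached copy. Directories of files stand in for the database and for a cache such as memcached; get_record, save_record and both paths are hypothetical.

```shell
#!/bin/sh
# Toy cache-aside data layer: the one place that reads and writes records.
# Plain directories stand in for the database and the cache (hypothetical paths).
DB_DIR=${DB_DIR:-/tmp/demo_db}
CACHE_DIR=${CACHE_DIR:-/tmp/demo_cache}
mkdir -p "$DB_DIR" "$CACHE_DIR"

# Read path: serve from the cache, or fall back to the "database" and fill the cache.
get_record() {
  if [ -f "$CACHE_DIR/$1" ]; then
    cat "$CACHE_DIR/$1"                 # cache hit
  elif [ -f "$DB_DIR/$1" ]; then
    cp "$DB_DIR/$1" "$CACHE_DIR/$1"     # cache miss: populate from the source of truth
    cat "$CACHE_DIR/$1"
  else
    return 1                            # no such record
  fi
}

# Write path: update the source of truth, then delete the cached copy
# so the next read cannot return stale data.
save_record() {
  printf '%s\n' "$2" > "$DB_DIR/$1"
  rm -f "$CACHE_DIR/$1"
}
```

Deleting the cache entry on save, rather than rewriting it, is a deliberate choice: the next read repopulates it from the database, so only one code path ever fills the cache.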

Search is another area that commonly relies on the database, and it can eat a ton of resources. If search is performed in SQL, difficult queries can lock tables and keep other queries from being answered. As good as some DBMSes have gotten at handling search (e.g. MySQL’s FULLTEXT indexes), they still can’t meet the concurrency demands of a site with heavy traffic. So, again, the solution is to isolate a resource-intensive feature from the rest of the code. There are a few different ways to do this. One is running replication, so that search queries hit a replica instead of the master; that may not be possible in smaller hosting environments. Another is to use a dedicated full-text search engine (Lucene, Sphinx, etc.), which again may not be possible in smaller hosting environments.

Using code that’s already good at managing and retrieving data, an interface can be built to query your data. A second hosting environment that’s suitable for running the search tool of your choice can then query that interface for the updates it needs to keep its index current. In turn, this server answers search requests without tying up any of the resources needed for the important stuff, like serving the home page.

So in these two short examples, we’ve sketched a theoretical architecture that can absorb heavy, site-breaking traffic to search and still keep serving the home page. That is, until the Apache server becomes so inundated with requests that it can’t do anything. Then it’s time to get that load balancer in place…

Explain your code

2009.07.01

In my search to expand my dev team, I use a code sample as one of the main determining factors. During an interview, I will always make the same request:

“Give us a code sample. It can be something that you think is really great, or something you think really sucks. Most importantly, tell us why you think it’s great, or why you think it sucks.”

No one seems to be able to do it. I have received code samples consisting of stream wrappers, database wrappers, complete websites, etc. Some have been really good, and some have been outright scary. But very few candidates have been able to communicate what they think of their own code and why.

Which is very surprising, given most developers’ proclivity to judge others’ code as total crap without a second thought (guilty).