MySQL Error 28

2011.11.02

Yesterday, I had to run a query for some statistics I needed. This was a query that I knew were going to be particularly nasty as it required sorting 1.3M rows. Normally I run these sorts of queries on a reporting slave I keep around for this reason, but for some reason I chose to run this query on a production slave. When I ran my query, I got the following error;

ERROR 3 (HY000): Error writing file ‘/tmp/MYNcSyQ9′ (Errcode: 28)

Oh. *&^%. After some Googling, a bit of shitting my pants, and a wild grep session through as many application logs as I could find, I was able to figure out that problem seemed limited to this particular query. My Googling turned up the fact that the error code indicated that the server was out of disk space.

As a rapidly growing company, we’ve had our fair share of issues with managing (or failing to manage) rapidly filling disks, failed RAID controllers, and the like. However, I had recently done audits of this particular cluster of servers, and ascertained that the situation with disks was nominal. I was confident the disk wasn’t full, and permissions were correct. Our particular disk layout puts /tmp on its own 2GB partition, and after running the query, that partition was 2% full.

It turns out that during the execution of the query, MySQL was creating a temporary table that was 2GB, hence the error. By default MySQL will write temporary tables to /tmp, which in many cases, is its own small partition. The solution here was to set the tmpdir to a folder on the main partition adjacent to the MySQL datadir. This solution obviously has its own problems (ie you could fill your main partition, which is way worse than filling /tmp) However, for this type of ad hoc query, this was exactly what we needed.

MySQL and Linux swappiness

2010.07.29

MySQL’s InnoDB engine is really great. Row-level locking is amazing in tables where there is heavy concurrency. Write buffering is also awesome for cases where a table needs to accept a lot of data. InnoDB’s use of memory to store indexes or sometimes the entire table can also make reads incredibly fast, especially on tables that need to support complex queries where even the best placed indexes do nothing.

However, when tables get large, the innodb_buffer_pool is set to close to amount of memory on the server, Linux has a tendency to remove your data from memory for no good reason. The symptoms are unmistakable: a query that was known to be pretty quick, but hasn’t run in a while, will take long. Too long. Run it again, and it becomes snappy. What’s happening is that when the query initially runs, the necessary data isn’t in memory, so it’s read in from disk, and the query is performed. Once it’s in memory, that second run is quick.

Actually there is a good reason Linux behaves like this:

“My point is that decreasing the tendency of the kernel to swap stuff out is wrong. You really don’t want hundreds of megabytes of BloatyApp’s untouched memory floating about in the machine. Get it out on the disk, use the memory for something useful.”

http://kerneltrap.org/node/3000

This all makes sense, as most systems need to reclaim memory from applications that aren’t doing anything. Except in the case where you have a large dataset in InnoDB that you’d really like to be in memory when you query it. Luckily, there is a tunable that you can change to dictate how aggressive Linux is reclaiming memory from applications. /proc/sys/vm/swappiness stores a number for 0 to 100, where 100 means that Linux will be extremely aggressive in reclaiming memory, and 0 means that memory won’t be reclaimed all that much.

For servers that need to keep datasets in memory all the time, this variable can be extremely helpful. With an InnoDB table / indexes that consume ~80% of memory on the machine, a swappiness value of 30 is sufficient to allow MySQL to keep most of that in memory. Of course, I don’t recommend this for a machine that is not 100% dedicated to a single task. However, on dedicated MySQL machines, tuning this variable can be really helpful.

Categories : MySQL

MySQL Slave Delay and Maatkit

2010.05.02

This post could alternately be titled: ‘How to make developers hate you.’

A very common criticism of MySQL is that there is no support for delayed replication. Delaying data flowing from master to slave can be very useful in certain cases. For example, running a co-located slave for backups is still susceptible to data problems that caused by a DELETE with no where or a mistaken executed DROP. However, by running the slave anywhere from an hour to a day behind, you have the opportunity to catch whatever problems caused and have a good copy of your data ready to go.

In sandbox environments, a consistent slave delay is a great way to reproduce race conditions. In fact, running slave delay gives you the opportunity to ensure that data will be out of sync between the master and slave. When you can count on this part of the environment, developers can test and write code against this condition. Of course, in reality, working in this type of environment is reaally annoying, but necessary.

Delayed MySQL replication can be accomplished by using a tool from the maatkit library. Documentation for the tool can be found at http://www.maatkit.org/doc/mk-slave-delay.html. What’s great about this tool is that can be run as a daemon, so that it can be easily run for an extended period of time, without have to do any serious management.

Categories : MySQL

MySQL skip-name-resolve

2010.04.02

Small, obscure optimizations sometimes have the potential to make the greatest impact. For example, every time a connection is made, MySQL will do a DNS lookup of the host that is trying to connect. If MySQL is handling many connections, the overhead of an extra DNS lookup can be hefty, simply because of the number of extra operations that have to be performed before MySQL can actually start doing actual work.

Thankfully, there is an option in recent versions (4.1+) of MySQL that will instruct MySQL to skip the extra DNS lookup. It’s a fairly obscure option called skip-name-resolve. The only caveat to using this option is that the users defined the GRANT tables can only use IP addresses as hostnames. For most MySQL users, this shouldn’t be an issue.

Categories : Best Practices  MySQL

Save MySQL

2009.07.16

Runaway queries on MySQL can be a real problem. If a long-running query locks up important tables, other queries trying to query the table will will placed in a queue. Each new query is a new connection to MySQL. Once you hit max_connections, your MySQL connection code will start to fail. Depending on how errors are handled at this stage of the request, this could mean total disaster for a site.

Although there is no way to fix this within the MySQL server itself, a bit of clever scripting can be run via cron to check if there is a problem. Presenting : save_mysql

/usr/bin/mysql -e ‘show full processlist \G;’ 2> /dev/null |
grep -A1 -B5 -E “Time: [1-9][0-9][0-9]?” |
grep -E “\bId\:\ |\bState\:\ ” |
/usr/bin/perl -n -e “if( $. % 2 ) { chomp $_;print $_; } else { print $_; }” |
grep -E “\ State\:\ Sending\ data$|\ State\:\ Sorting\ result$” |
awk {‘print $2′} |
xargs -iTHREAD -r -n1 /usr/bin/mysqladmin kill THREAD &> /dev/null

/usr/bin/mysql -e ‘show full processlist \G;’ 2> /dev/null
This line will grab a list of all the currently running queries and commands from the MySQL server. It also redirects any error output to the blackhole. It produces output like so:

*************************** 1. row ***************************
Id: 842863
User: admin
Host: localhost
db: NULL
Command: Query
Time: 0
State: NULL
Info: show full processlist

grep -A1 -B5 -E “Time: [1-9][0-9][0-9]?”
The grep here will grab line directly below and the 5 above if the time is over 100 seconds. This line can be tweaked to grep for less time. My preference is between 30 seconds and a minute. So instead of
[1-9][0-9][0-9]
you’d have
[3-9][0-9] (30 seconds) OR [6-9][0-9] (60 seconds)

grep -E “\bId\:\ |\bState\:\ ”
This will filter out the other lines from the previous grep and just grab the MySQL process ID and it’s State.

/usr/bin/perl -n -e “if( $. % 2 ) { chomp $_;print $_; } else { print $_; }”
Quick Perl script to put id and state from the step above on the same line.

grep -E “\ State\:\ Sending\ data$|\ State\:\ Sorting\ result$”
This line will filter out the queries being run that are in the state ‘Sending Data’ or ‘Sorting Result’. These are both states where it’s safe to kill the query.

awk {‘print $2′}
This line grabs the query ID from the output.

xargs -iTHREAD -r -n1 /usr/bin/mysqladmin kill THREAD &> /dev/null
Lastly, this line will grab the ID from above to the mysqladmin kill command, effectively killing the query.