Yesterday, I had to run a query for some statistics I needed. This was a query that I knew were going to be particularly nasty as it required sorting 1.3M rows. Normally I run these sorts of queries on a reporting slave I keep around for this reason, but for some reason I chose to run this query on a production slave. When I ran my query, I got the following error;
ERROR 3 (HY000): Error writing file ‘/tmp/MYNcSyQ9’ (Errcode: 28)
Oh. *&^%. After some Googling, a bit of shitting my pants, and a wild grep session through as many application logs as I could find, I was able to figure out that problem seemed limited to this particular query. My Googling turned up the fact that the error code indicated that the server was out of disk space.
As a rapidly growing company, we’ve had our fair share of issues with managing (or failing to manage) rapidly filling disks, failed RAID controllers, and the like. However, I had recently done audits of this particular cluster of servers, and ascertained that the situation with disks was nominal. I was confident the disk wasn’t full, and permissions were correct. Our particular disk layout puts /tmp on its own 2GB partition, and after running the query, that partition was 2% full.
It turns out that during the execution of the query, MySQL was creating a temporary table that was 2GB, hence the error. By default MySQL will write temporary tables to /tmp, which in many cases, is its own small partition. The solution here was to set the tmpdir to a folder on the main partition adjacent to the MySQL datadir. This solution obviously has its own problems (ie you could fill your main partition, which is way worse than filling /tmp) However, for this type of ad hoc query, this was exactly what we needed.