New Year’s Actionable (non-whiner) Resolutions

2012.01.03

Usually, I’ve considered New Year’s Resolutions to be for whiners, people who never actually accomplish anything, people who are normally on the whhhaaaambulance. Most resolutions fall somewhere in the “stop being fat” to “be fluent in Mesopotamian glyphs” range. They’re vague, completely un-actionable, and just describe a slightly unattainable goal / end result / dumbass want. This year, however, I’ve discovered a few serious problems in my life I need to fix. So, in the spirit of actually doing shit, I’ve provided a list of non-whiner resolutions + ways to actually make them happen.

1. Go to bed early(ish). This is a tough one for me. I’m naturally a night owl, but I might be able to fool myself into getting to bed early by showering early(ish). I love me a good shower. I’ve even been known to drink a beer in the shower. I also really like being in really warm, comfortable, if slightly embarrassing, PJs. All these things put me in a good mood and generally make me want to relax, which is not all that far from being asleep. Action Steps: Just get in the fucking shower, don’t look at the mail / email / dirty dishes / messy apartment / Twitter, just get in the shower (bring a beer).

2. See some doctors. I’ve avoided doctors for a while, mostly because my lifestyle is a cross between Dennis Nedry from Jurassic Park and a barfly. I consider this one pretty simple. Action Steps: Make appointments with the following: dentist, general physician, eye doctor, nutritionist. Do it. Do what they say, even if it sucks. Follow up as often as the quacks say to.

3. Go to the gym regularly. OK, this one is, without a doubt, the most cliché, whine-tastic resolution evar. I know, because I have been to the gym in January. I’ve also been to the gym in April, when all the kids who were at the gym in January are nowhere to be found. Also, since the gym is a #creepy and #gross place to shower, resolution #1 becomes even more important. Action Steps: Put that ish on the Google calendar with the following reminders: 2 hours, 1 hour, 30 minutes, 15 minutes, 10 minutes, and 5 minutes before. Keep gym clothes in the office. Don’t care how bad they smell.

4. Blog more. Writing has been a great way for me to collect my thoughts, find some hindsight, and maybe, just maybe, help some other folks who have the same demented thoughts / stupid problems. As a technical guy, “the inspiration” doesn’t hit me so often, and when it does, I’m often busy, y’know, actually doing shit. However, as I’ve noted to myself more than once, keeping track of my day and journaling how I spend my time is incredibly important for introspection. Action Steps: Write that thought down. Write down what you did 30 seconds ago, especially if it was different from regularly scheduled programming. Keep a sticky on your monitor to write shite down. Ask the dude next to you (@bossjones) to remind you to write shit down. Then, a glass (or 7) of white wine, the notebook in which all your shit is written, and WordPress should convene regularly. Google Calendar #ftw, again. Lastly, check Google Analytics on posts. The un-monitored blog post is not worth writing.

5. Read more. Once upon a time (yesterday) I didn’t know nearly as much as I do now. Most of that knowledge came from reading shit-tons of blogs, books, bathroom graffiti, articles, and whitepapers related to web development. I read everything with a goal: How can I use or leverage this to help me / my business work a little better? The Action Steps here are a bit tougher, and slightly conflict with non-whiner resolution #6: Keep Google Reader open. Curate my list of feeds with relevant sources. Prune feeds that have stopped providing useful information. Lastly, and perhaps most important, find tidbits of information that make a difference in my life and / or business.

6. Don’t be distracted by bold numbers in parentheses. Simple (kinda). Action Steps: close Gmail, close Twitter. Try, and #fail, to delete my Facebook account.

7. Stop playing so much fucking air guitar by myself, alone in my apt, and start playing some real guitar, and actually learn the songs I normally rock out to. Action Steps: restring the Epiphone, buy a new amp, find tabs for shit I want to learn. If I’m feeling really frisky, get back into a band.

Distributed Updates

2011.06.25

Part of managing any large site involves writing scripts that go through your data and make changes, merge things, remove things, do type transformations, etc. Most of the time, in PHP, iterating through rows or objects will do just fine. However, when there are lots of rows or objects, you could be faced with a script that takes hours or days to run. Depending on how active the site is, you may need to restrict access to ensure that the data before and after the transformation remains consistent. In other words, if someone changes a record in the old format after the script has already transformed it, and the new feature only looks at the transformed data, that user has just lost their changes. That is Very Bad.

As sites get larger and problems like this loom, taking the site offline becomes less and less of an option. This is what the business team calls a luxury problem, and what the ops team refers to simply as a problem. One option is to write a more efficient script. You can get pretty far by making sure you’re reading from the fastest data source available, making good use of caches, and ensuring that the tables being read for the transformation are properly indexed. All of these are great places to start. Additionally, grabbing data in chunks can give the database time to breathe. There’s nothing worse than getting stuck in MySQL’s “sending data” phase simply because it needs to read several thousand rows from disk. MySQL configuration can also be your friend here. If using InnoDB, increasing the insert buffer is a great way to speed up writes.*
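To make the “grab data in chunks” advice concrete, here is a minimal sketch in Python. It uses sqlite3 only as a stand-in for MySQL so it runs anywhere, and the `users` table, its columns, and the chunk size are hypothetical. It walks the table by primary key (keyset pagination) rather than with `LIMIT ... OFFSET`, because a large OFFSET makes the database read and throw away every skipped row. The optional pause between chunks is one way to give the database time to breathe.

```python
import sqlite3
import time
from typing import Iterator, List, Tuple


def iter_chunks(
    conn: sqlite3.Connection, chunk_size: int = 1000, pause_s: float = 0.0
) -> Iterator[List[Tuple[int, str]]]:
    """Yield rows from the (hypothetical) users table in primary-key order, chunk_size at a time."""
    last_id = 0
    while True:
        # Keyset pagination: resume after the last id seen instead of using OFFSET.
        rows = conn.execute(
            "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]
        if pause_s:
            time.sleep(pause_s)  # let other queries get a turn between chunks


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [(f"user{i}",) for i in range(2500)])
    print([len(chunk) for chunk in iter_chunks(conn, chunk_size=1000)])  # [1000, 1000, 500]
```

Each chunk is a bounded query that an index on the primary key can answer directly, so no single read turns into a multi-thousand-row scan.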

However, no matter how much you speed up a single transaction, the fact remains that you have to execute each transformation serially, one after another. Your bottleneck is the transformation itself: the job takes (# of transformations × # of objects to transform) operations, run back to back. No matter how well tuned the database is, it will only be performing one of your operations at a time, which means that the other (max connections – 1) connections are doing precisely crap. So the next logical step is to change your update script to distribute the update operations so a few can run in parallel.

Rewriting the update script does require thinking about your update differently, and it will not work in every case. For example, if you are simply moving a large amount of data from one table to another, and there is no transformation, or the transformation can be done with a built-in MySQL function, just use that (an INSERT ... SELECT, for instance). Be prepared to deal with locking issues, though, and with the source data potentially being unavailable while the transformation is taking place. If your transformation is complicated and requires per-case logic, however, distributing it is definitely a good route to take. The biggest difference is how the code for the update is organized. The update script needs to be separated into code that applies the transformation to exactly one entity, and code that manages which entities get transformed and when. Ideally, the transformation code is idempotent, so failures can be handled by simply resubmitting the entity / object to be transformed again.
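Here is a minimal sketch of that split, written in Python rather than PHP. `transform_one` is a hypothetical transformation (normalizing an email address) that is idempotent: running it twice leaves the record exactly as running it once would. `run_update` is the manager. It hands entities to a pool of workers and resubmits any that failed. A `ThreadPoolExecutor` stands in for a real job queue here. In production, the manager would submit jobs to something like Gearman instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List

Record = Dict[str, object]


def transform_one(record: Record) -> None:
    """Hypothetical, idempotent transformation for exactly one entity."""
    record["email"] = str(record["email"]).strip().lower()


def run_update(
    records: List[Record],
    transform: Callable[[Record], None] = transform_one,
    workers: int = 4,
    max_attempts: int = 3,
) -> List[Record]:
    """Manager: run `transform` on every record in parallel, resubmitting failures.

    Returns the records that still failed after max_attempts rounds.
    """
    pending = list(records)
    for _ in range(max_attempts):
        failed: List[Record] = []
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {pool.submit(transform, r): r for r in pending}
            for future in as_completed(futures):
                if future.exception() is not None:
                    # Safe to retry blindly because the transformation is idempotent.
                    failed.append(futures[future])
        if not failed:
            return []
        pending = failed
    return pending
```

Because `transform_one` is idempotent, the manager doesn't need to know whether a failed job died before or after it touched the record. It just submits that record again.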

Accomplishing parallel processing in PHP can be kind of tricky. PHP’s pcntl_exec function has always felt a bit finicky to me. Of course, exec on its own is blocking, so that’s out. Additionally, neither of these solutions offers any sort of baked-in communication between the process that submitted the job and the process carrying it out. That leaves us with a queuing system. Popular options include RabbitMQ and Gearman. Personally, I’ve made great use of Gearman. It’s easy to install, as is its PHP module.

To sum up, performing large data updates via a distributed system is the way to go if you have complex requirements per transformation, and the option to perform these processes in parallel.

*If using MySQL’s MyISAM engine, this isn’t necessarily true, as writes will block, and the database could become the bottleneck. However, since MySQL is continuing to push InnoDB, this is getting increasingly unlikely. So if your tables are all InnoDB, you’re probably in good shape.

Categories : Best Practices  Ops  Process

The Progress Bar Psych

2011.03.13

A classic UX problem is communicating to users how long they’ll have to wait before their task completes. A spinner provides feedback that the system is, in fact, doing something. A progress bar also shows roughly how long the task may take. Psychologically, progress bars create tension while progressing, and resolution when completed.

From a technical standpoint, progress bars are black magic. The developer is attempting to estimate a task based on potentially thousands of variables. In the case of a file upload, the developer has to deal with differing network conditions, disk performance, etc., etc., etc. Then they have to write the code to communicate what is happening to the browser. Not a trivial task. When executed well, though, a progress bar can give the user reasonable feedback about their task.
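The simplest estimate behind an upload progress bar assumes that the rate observed so far will hold. Here is a minimal sketch with hypothetical names. Real uploaders usually smooth the rate, because network conditions change mid-transfer.

```python
from typing import Optional, Tuple


def upload_progress(
    bytes_done: int, bytes_total: int, elapsed_s: float
) -> Tuple[float, Optional[float]]:
    """Return (percent complete, estimated seconds remaining).

    The estimate assumes the average rate so far stays constant. It is None
    until there is enough information to compute a rate.
    """
    percent = 100.0 * bytes_done / bytes_total if bytes_total else 100.0
    if bytes_done <= 0 or elapsed_s <= 0:
        return percent, None
    rate = bytes_done / elapsed_s  # average bytes per second so far
    return percent, (bytes_total - bytes_done) / rate


print(upload_progress(25, 100, 10.0))  # (25.0, 30.0)
```

Every one of those “thousands of variables” is folded into a single observed rate. That is why the estimate jumps around whenever conditions change.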

Lately, sites like LinkedIn, Mint.com, and OKCupid have used that same tension to motivate users to completely fill out their profiles. During profile creation, a progress bar is displayed indicating how far the user has come along. Once the user completely fills out their profile, the progress bar hits 100%, and what changes? In most cases, nothing. The progress bar is just a psychological hack to entice users to go through the entire process.
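Under the hood, a profile progress bar like this can be as simple as a weighted checklist. Here is a minimal sketch. The field names and weights are made up for illustration and are not taken from any of those sites. Nothing happens at 100% except the number itself, which is exactly the point above.

```python
from typing import Dict, Optional

# Hypothetical profile fields and weights (they sum to 100).
PROFILE_WEIGHTS: Dict[str, int] = {
    "name": 20,
    "photo": 25,
    "headline": 15,
    "location": 10,
    "summary": 30,
}


def profile_completeness(profile: Dict[str, Optional[str]]) -> int:
    """Percent of the weighted profile that has been filled in, from 0 to 100."""
    total = sum(PROFILE_WEIGHTS.values())
    # Missing, None, and empty-string fields all count as not filled in.
    filled = sum(w for field, w in PROFILE_WEIGHTS.items() if profile.get(field))
    return round(100 * filled / total)


print(profile_completeness({"name": "Ann", "photo": "ann.jpg"}))  # 45
```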

The question is: Exactly how effective is the progress bar at enticing users to fully complete the task at hand? And are they actually worth it?

Categories : Best Practices  sxsw  Tools

Emergencies will audit the shit out of you

2010.10.22

Things never go wrong at convenient times, like when you’re auditing the latest, coolest version of your app and looking for bugs. Things have a funny way of working out fine then. However, as soon as you look the other way, a multitude of problems come out of the woodwork. It usually goes something like this:

One server goes down, and the system that was supposed to fail silently starts screaming. The application it was supporting goes down, because the proper timeouts and error handling were never written. You can’t fail over, because failing over will take down 2 other applications. When that first server comes back up, nothing works, because the proper startup scripts were never put in place. Once the right services start, if you can remember what the hell they were, you find the original application is configured wrong. Not only is it configured wrong, it’s always been configured wrong, and no one noticed. No one noticed because it only explodes in the exact set of horrible circumstances you have right now. Which is, by the way, being down.

It’s an all-too-familiar story, and one that even the most anal of admins has dealt with. The fact of the matter is that it is going to happen, and there’s not a whole lot you can do to prepare, other than randomly pulling plugs out of servers. But any mistake that causes downtime should only happen once. A proper postmortem is needed to figure out what went wrong, and where. Once all the variables are understood, the next step is to duplicate the same set of circumstances in your sandbox, and apply the necessary error handling.
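One piece of that “necessary error handling” is putting a timeout and a fallback on every call to a supporting service. That way, when the service dies, the calling application degrades instead of going down with it. Here is a minimal sketch in Python. `call_with_fallback` and its parameters are hypothetical, and the thread pool is just one way to impose a timeout on a call that doesn’t offer its own.

```python
import concurrent.futures
import logging
from typing import Callable, TypeVar

T = TypeVar("T")

# Shared pool for calls to supporting services (the size is an arbitrary assumption).
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_with_fallback(fn: Callable[[], T], timeout_s: float, fallback: T) -> T:
    """Run fn(), but return fallback if it takes longer than timeout_s or raises.

    Note: a call that times out keeps running in its worker thread; we just stop waiting.
    """
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        logging.warning("dependency timed out after %.2fs; using fallback", timeout_s)
    except Exception:
        logging.exception("dependency failed; using fallback")
    return fallback
```

Pulling the plug on the dependency in a sandbox and confirming that you get the fallback, not a hung page, is exactly the kind of postmortem reproduction described above.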

Downtime and emergencies are a part of running any site. What’s really important is to treat emergencies as an opportunity to learn about what happens when systems fail, for real.

Ode to the Environment

2010.08.24

Consistency is key, in everything. People come to rely on trains, because they come on time. Devs rely on the environment in which they develop, because its stability allows them to be productive. Changing or upgrading that environment can be the same as changing the train schedule. Sometimes people get where they’re going faster, but sometimes, the change makes their life miserable.

Environment upgrades can be just like that. Speed, security, features. Everybody likes those. The ugly side to upgrades is that they have a tendency to break things that are already working, may be incompatible with current code, and disrupt the work of the team. As a struggling sysadmin / developer, all I really want in life is a stable platform to build my app on.

Hence, the Ode to the Environment:

The Environment is the basis for my business. Without it and its consistency, there is uncertainty, chaos, and ultimately, failure.

I need to be able to replicate the Environment quickly, identify when issues are caused by it, sandbox it, and be comfortable building it from scratch, if it comes to that. (Hopefully, it never does.)

I need to be confident in the set of packages I’ve come to love, loathe, and rely on, and make sure they work for my business’s app.

I know that the Environment’s well-being will affect my application’s uptime, developers relying on it, and my business’s reputation.

I need to know the flaws and shortcomings in the Environment, and weigh how to fix them against the cost of change.

When it comes time to upgrade the Environment, there had better be a damn good reason. I need to be thoroughly convinced that my business will see benefits immediately.

Once I upgrade the Environment, I need to love and loathe it the same as the old one, embrace whatever change it brings, advocate for it, and fix whatever issues arise.

Above all, I will maintain the Environment that best suits my business, and ensure that it always meets the goals of my business, no matter the cost.

Gmail actually gets something really wrong.

2010.08.16

I’m a huge fan of Gmail and Google Apps for many reasons. I love the new redesign, and how they’re finally promoting consistency across their major webapps. It makes me feel like the web could really be a viable alternative to desktop software. I can even deal with slowness in Gmail, given the amount of work they need to do in order to keep your inbox snappy. They need to index every message, which means parsing every message, converting every attachment, and linking it into the search architecture. In real time. Not easy…

However, what I found today was completely inexcusable: Gmail’s clipping “feature”. This is definitely a feature that sounds a lot more like a bug than a helpful tool.

[Screenshot: Gmail message clipping]

What should be here are a few more links, plus some fine print containing our mailing address and unsubscribe links. What this screenshot doesn’t show is how destructive this feature can be to HTML emails. When an email is ‘clipped’, the HTML is cut off at an arbitrary point, and everything after that point is not displayed. If your message is clipped at an inopportune place, there goes your entire HTML layout. In the best case, your HTML is simply truncated, leaving users with only a piece of their email.

As the entity sending this email, the responsibility falls on me to make sure that I send emails that are accessible, conform to CAN-SPAM, and are pleasing to the eye. Gmail bones me on all three of these goals, thanks to a lack of documentation on how long an email can be before it gets clipped. Most importantly, my users have no clear way to unsubscribe from the list, since the links most likely to be clipped are the unsubscribe links.

I agree that performance is king, but never at the cost of the user.

Update: It seems like Gmail clips messages at around 102k characters. So the solution seems to be running the HTML through a compressor. I found a pretty good one here.
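Here is a rough pre-send check along those lines, as a minimal Python sketch. The 102 KB threshold is the approximate figure from the update above, not a documented Gmail limit. The minifier is deliberately naive: it strips HTML comments (except those starting with `<!--[if`, as Outlook conditional comments do), removes whitespace between tags, and collapses runs of whitespace. That can change rendering wherever whitespace matters, such as `<pre>` blocks or the space between two inline elements, so a real HTML compressor is the safer choice.

```python
import re

# Approximate clipping threshold observed for Gmail (an assumption, not documented).
GMAIL_CLIP_BYTES = 102 * 1024


def naive_minify(html: str) -> str:
    """Shrink HTML email markup. Naive: see the caveats above."""
    html = re.sub(r"<!--(?!\[if).*?-->", "", html, flags=re.S)  # drop comments, except <!--[if ...
    html = re.sub(r">\s+<", "><", html)  # whitespace between tags
    html = re.sub(r"\s{2,}", " ", html)  # remaining runs of whitespace
    return html.strip()


def likely_clipped(html: str) -> bool:
    """True if the UTF-8 encoded body is over the assumed clipping threshold."""
    return len(html.encode("utf-8")) > GMAIL_CLIP_BYTES
```

Running `likely_clipped` on the final, minified body just before sending is a cheap way to catch a template that has quietly grown past the limit and would push the unsubscribe links below the fold.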

Accountability is a Feedback Loop

2010.06.22

Accountability is a word that’s getting tossed around a lot lately. You hear people saying things like:

– That developer should be held accountable for the validation problems.

– The tester should be accountable for not finding that bug.

– BP needs to be accountable for destroying an ecosystem.

The term seems to be thrown around most often when parts of a system fail. BP is part of a larger industry that’s regulated: a government agency is responsible for ensuring that companies like BP follow safety regulations. So when BP made their whoopsie daisy, the fingers were pointed squarely at them. However, where were the regulators? There were tons of opportunities for the government to push feedback to BP regarding the safety of their operation. But it seemed like no one was talking.

The development process is strikingly similar. Any development team worth their bits has a process that puts any issue in front of at least two parties at all times. Joel Spolsky’s infamous Bug 1203, a quick story about the interactions between a dev and a tester, is the picture of accountability, and shows that without active management and constant feedback being exchanged, things don’t get done.

A quick synopsis and commentary: Jill the tester finds a bug, and provides feedback to the dev team via the ticket system. In doing so, Jill has started the feedback loop, and made it the responsibility of the dev team to investigate the issue. The dev team, as they are prone to doing, deny responsibility for the issue, and mark the issue as ‘NOT A BUG’. Having done so, they’ve put the onus on Jill to prove it’s really a bug, which she does (probably in about 2 seconds). It’s again the responsibility of the dev team to fix the bug, which they do. Jill confirms the fix, and thereby closes the loop.
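The loop in that story can be written down as a tiny state machine where every open state has exactly one owner holding the ball. Here is a minimal sketch. The state names and owners are my own reading of the story, not the schema of any real bug tracker.

```python
from typing import Dict, List, Optional, Set

# Who is holding the ball in each state (None once the loop is closed).
OWNER: Dict[str, Optional[str]] = {
    "open": "dev",
    "not_a_bug": "tester",
    "reopened": "dev",
    "fixed": "tester",
    "closed": None,
}

# Which moves are allowed from each state.
ALLOWED: Dict[str, Set[str]] = {
    "open": {"not_a_bug", "fixed"},
    "not_a_bug": {"reopened", "closed"},
    "reopened": {"not_a_bug", "fixed"},
    "fixed": {"reopened", "closed"},
    "closed": set(),
}


class Ticket:
    def __init__(self) -> None:
        self.state = "open"
        self.history: List[str] = ["open"]

    @property
    def ball_holder(self) -> Optional[str]:
        return OWNER[self.state]

    def move(self, new_state: str) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot move from {self.state!r} to {new_state!r}")
        self.state = new_state
        self.history.append(new_state)


# Bug 1203, roughly: dev pushes back, tester proves it, dev fixes, tester closes.
bug_1203 = Ticket()
for step in ("not_a_bug", "reopened", "fixed", "closed"):
    bug_1203.move(step)
print(bug_1203.history)  # ['open', 'not_a_bug', 'reopened', 'fixed', 'closed']
```

A ticket that sits in any state other than `closed` always names someone, which is the whole point: the loop only breaks when that person stops pushing feedback along.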

What’s important to realize is that in this type of process, it is the responsibility of anyone and everyone involved to be accountable for their role, and be focused on pushing feedback to the next person. Once there’s a break in the loop, the issue is likely to be dropped, and never fixed. The last person holding the ball is the screwup. I’m sure someone somewhere is really upset they didn’t ask BP about that little safety measure.

Categories : Best Practices  Process

I'll buy time any day.

2010.06.15

Many people value money as the most important thing in life, and will gladly trade time for it. The pursuit of saving money is an extremely American one. People will spend time in line for free stuff, just because it’s free. Motorists getting tickets will spend days in court, just to avoid a fine. Clipping coupons has become an art form, and has even extended to the digital world in the form of sites like SlickDeals and Groupon.

Me, I like time. To me, time is way more valuable than the almighty dollar. Reason: I can’t get it back. Evar.

If I get a parking ticket, I know that if I pony up those 55 greenbacks, chances are there will be a check with my name on it in the next couple of weeks that puts those 55 American pesos back in my pocket. If I went to court, I’d never get back those 3-4 hours of my life. I’d also probably lose the case. I’d also probably spend that time sitting next to someone who smells like cheese. Paying up gives me a net gain of 3-4 hours of my life, which I could spend doing stuff I like.

Being on a development team in a startup is pretty much the same thing. Your team should be focusing as much time as possible on actually developing your product. That means doing the things unique to your business and focusing on what your company decides its core competency should be. However, there’s tons of work that is hard and generally unpleasant. Worse, it can be incredibly time-consuming, because chances are you’re not good at it, or find it kind of icky. Leave that stuff to someone else. Even better, find someone who likes doing that stuff and pay them to do it.

In the cloud-infested webscape that exists today, there are any number of companies that have decided their core competency is something specialized that you probably need. Companies that specialize in IT management, video encoding, DNS, storage, billing, etc. all exist and are willing to accept a chunk of your cold hard cash to provide a service. The most important thing to realize is that if time is of the essence (and it always is), you’re not just buying a service, you’re also buying the time it would take you to build that service yourself. So don’t be a crafty coupon-clipper and build it yourself. Buy back that precious, precious time and spend it doing something you really like.

MySQL skip-name-resolve

2010.04.02

Small, obscure optimizations sometimes have the potential to make the greatest impact. For example, every time a connection is made, MySQL does a DNS lookup of the host that is trying to connect. If MySQL is handling many connections, the overhead of those extra DNS lookups can be hefty, simply because of the number of extra operations that have to be performed before MySQL can start doing actual work.

Thankfully, there is an option in recent versions (4.1+) of MySQL that instructs MySQL to skip the extra DNS lookup. It’s a fairly obscure option called skip-name-resolve. The only caveat to using this option is that the users defined in the GRANT tables can only use IP addresses as hostnames. For most MySQL users, this shouldn’t be an issue.
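The option goes in the `[mysqld]` section of the server’s option file. Before turning it on, it’s worth checking which accounts would stop matching. Here is a minimal Python sketch that takes the `Host` values (for example, from `SELECT DISTINCT Host FROM mysql.user`) and flags any that aren’t an IP address, an IPv4 pattern using `%`, or an IP/netmask pair. It will flag `localhost` as well. How `localhost` accounts behave with name resolution turned off is worth confirming in the MySQL documentation for your version rather than assuming.

```python
import ipaddress
from typing import Iterable, List


def _is_ip(text: str) -> bool:
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def _is_ipv4_wildcard(host: str) -> bool:
    """Patterns like '192.168.1.%' or '10.%': numeric octets or '%', at least one '%'."""
    parts = host.split(".")
    if "%" not in parts or not 1 <= len(parts) <= 4:
        return False
    return all(p == "%" or (p.isascii() and p.isdigit() and int(p) <= 255) for p in parts)


def host_ok_without_dns(host: str) -> bool:
    """True if the Host value can match without resolving host names."""
    if host == "%" or _is_ip(host):
        return True
    if "/" in host:  # IP/netmask form, e.g. 10.0.0.0/255.255.255.0
        addr, _, mask = host.partition("/")
        return _is_ip(addr) and _is_ip(mask)
    return _is_ipv4_wildcard(host)


def hosts_needing_review(hosts: Iterable[str]) -> List[str]:
    """Host values that look like host names and may stop matching."""
    return [h for h in hosts if not host_ok_without_dns(h)]
```

Anything this returns needs an equivalent IP-based account (or a closer look) before the option goes live. Otherwise, those users simply won’t be able to connect.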

Categories : Best Practices  MySQL

Sandboxes

2010.02.22

Setting up a sandbox environment has normally been a trivial task. Set up a vhost, get a copy of the database, build out the app, and start doing stuff. When ‘Stuff’ is ‘Done’, push the changes onto production, and bask in your own crapulence. That is, until your data set exceeds the limits of the sandbox, and the SOA that is saving the day in production becomes nothing but a headache in development.

The basic sandbox environment for an app includes a reasonably recent data set, similar hardware (underpowered can be OK, depending), and the exact same versions of PHP, MySQL and Apache. PHP, MySQL and Apache need to be configured exactly the same way as in production. In fact, as part of this process, it might be useful to pull down those ever so important configuration files and put them in a safe place. Perhaps source control (cough). Consistent configuration is extremely important. Bugs produced by configuration problems are notoriously hard to reproduce, and result in devs combing through code looking for bugs that don’t exist.
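Once production’s config files live somewhere safe, checking a sandbox against them can be automated. Here is a minimal Python sketch that compares two INI-style files (the format `my.cnf` and `php.ini` use) and reports every setting that differs or is missing on one side. It relies on the standard library’s `configparser`, which is only an approximation: it doesn’t understand MySQL’s `!include` directives, and it treats `-` and `_` in option names as different characters, even though MySQL accepts them interchangeably.

```python
import configparser
from typing import Dict, Optional, Tuple

MISSING = "<missing>"
Setting = Tuple[str, str]  # (section, option)


def load_settings(text: str) -> Dict[Setting, Optional[str]]:
    """Parse INI-style text into {(section, option): value}."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # bare flags such as skip-name-resolve
        interpolation=None,   # treat % literally
        strict=False,         # tolerate repeated keys; the last one wins
    )
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(text)
    return {(s, k): v for s in parser.sections() for k, v in parser.items(s)}


def config_drift(
    prod_text: str, sandbox_text: str
) -> Dict[Setting, Tuple[Optional[str], Optional[str]]]:
    """Return {(section, option): (prod_value, sandbox_value)} for every mismatch."""
    prod, sandbox = load_settings(prod_text), load_settings(sandbox_text)
    drift = {}
    for key in sorted(prod.keys() | sandbox.keys()):
        p, s = prod.get(key, MISSING), sandbox.get(key, MISSING)
        if p != s:
            drift[key] = (p, s)
    return drift
```

An empty result means the two files agree setting for setting. Anything else is a candidate for exactly the kind of “bug that doesn’t exist in the code” described above.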

Maintaining a few sandboxes should be a trivial endeavor. That is, until your project gets too big. A natural response to ever-growing problems is to use a Service Oriented Architecture; that is, to shard off aspects of the app and dedicate hardware and resources to each. However, three or four shards later, multiplied by an environment for each developer, the guy who was doing sys admin work as needed has just become a full-timer. Unfortunately, there’s no way around this, even with a clever sys admin who can leave enough automated scripts around that developers can *mostly* maintain their own environments.

The fact of the matter is that maintaining the development environment is one of the most important things a company can do. Close attention to detail in the sandbox will make all the difference in the deployment process. Changes to the codebase, file permissions and configuration can all be tested and deployed the same way as in production. So every build to every sandbox (everyone builds daily, right?) is a chance for the development team to catch mistakes, and learn from them, before the big push. And if that fails, we all know how to handle a crisis.