2010.08.16
I’m a huge fan of Gmail and Google Apps for many reasons. I love the new redesign, and how they’re finally promoting consistency across their major webapps. It makes me feel like the web could really be a viable alternative alternative to desktop software. I can even deal with slowness in Gmail, given the amount of work they need to do in order to keep your inbox snappy. They need to index every message, which means parsing every message, converting every attachment, and linking it the search architecture. In real time. Not easy…
However, what I found today, was completely inexcusable: Gmail’s clipping “feature”. This is definitely a feature that sounds a lot more like a bug than a helpful tool.

What should be here is a few more links, some mouse text that contains our mailing address and unsubscribe links. What I did not show in this screenshot is the capacity for destruction this feature has on HTML emails. When the email is ‘clipped’, the HTML is broken at a random place, and not displayed. If your message is clipped at an inopportune place, there goes your entire HTML layout. In the best case, your HTML is simply truncated, leaving users with only a piece of their email.
As the entity sending this email, the responsibility falls on me to make sure that I send emails that are accessible, conform to CAN-SPAM, and are pleasing to the eye. Gmail bones me on three of these goals. Thanks to a lack of documentation as to how long an email can be without invoking the clipping feature. Most importantly, my users have no clear to unsubscribe from the list, since the most likely links to be clipped are the unsubscribe links.
I agree that performance is king, but never at the cost of the user.
2010.06.22
Accountability is a word that’s getting tossed around a lot lately. You hear people saying things like:
– That developer should be held accountable for the validation problems.
– The tester should be accountable for not finding that bug.
– BP needs to be accountable for destroying an ecosystem.
The term seems to be thrown around most often when parts of a system fail. BP is part of a larger industry which that’s regulated. The government agency responsible for monitoring safety measures is responsible for ensuring they follow safety regulations. So when BP made their whoopsie daisy, the fingers were pointed squarely at them. However, where were the regulators? There were tons of opportunities for the government to push feedback to BP regarding the safety of their operation. But it seemed like no one was talking.
The development process is strikingly similar. Any development team worth their bits has a process that puts any issue in front of at least two parties at all times. Joel Spolsky’s infamous Bug 1203, a quick story about the interactions between a dev and a tester, is the picture of accountability, and shows that without active management and constant feedback being exchanged, things don’t get done.
A quick synopsis and commentary: Jill the tester finds a bug, and provides feedback to the dev team via the ticket system. In doing so, Jill has started the feedback loop, and made it the responsibility of the dev team to investigate the issue. The dev team, as they are prone to doing, deny responsibility for the issue, and mark the issue as ‘NOT A BUG’ Having done so, they’ve put the onus on Jill to prove it’s really a bug, which she does (probably in about 2 seconds). It’s again the responsibility of the dev team to fix the bug, which they do. Jill confirms the fix, and thereby closes the loop.
What’s important to realize is that in this type of process, it is the responsibility of anyone and everyone involved to be accountable for their role, and be focused on pushing feedback to the next person. Once there’s a break in the loop, the issue is likely to be dropped, and never fixed. The last person holding the ball is the screwup. I’m sure someone somewhere is really upset they didn’t ask BP about that little safety measure.
2010.06.15
Many people value money as the most important thing in life, and will gladly trade time for it. The pursuit of saving money is an extremely American one. People will spend time in line for free stuff, just because it’s free. Motorists getting tickets will spend days in court, just to avoid a fine. Clipping coupons has become an art form, and even extended to the digital world in the form of sites like SlickDeals and Groupon.
Me, I like time. To me, time is way more valuable than the almighty dollar. Reason: I can’t get it back. Evar.
If I get a parking ticket, I know that if I pony up that those 55 greenbacks, chances are there will be a check with my name on it in the next couple of weeks that puts that those 55 American pesos back in my pocket. If went to court, I’d never get back that 3-4 hours of my life. I’d also probably lose the case. I’d also probably spend that time sitting next to someone who smells like cheese. Paying up gives me a net gain of 3-4 hours of my life, which I could spend doing stuff I like.
Being on a development team in a startup is pretty much the same thing. Your team should be focusing as much time as possible on actually developing your product. That means doing the things unique to your business and focusing on what your company decides it’s core competency should be. However, there’s tons of work that is hard, time-consuming, and generally unpleasant. Not only is it unpleasant, but it can be incredibly time-consuming, because chances are, you’re not good at it, or find it kind of icky. Leave that stuff to someone else. Even better, find someone who likes doing that stuff and pay them to do it.
In the cloud-infested webscape that exists today, there are any number of companies that have decided that their core competency is something specialized that you probably need. Companies that specialize in IT management, video encoding, DNS, storage, billing, etc. all exist and are willing to accept a chunk of your cold hard cash to provide a service. The most important thing to realize, is that if time is of the essence (and it always is), you’re not just buying a service, you’re also buying the time it would take to you to build that service yourself. So don’t be a crafty coupon-clipper and build it yourself. Buy back that precious, precious time and spend it doing something you really like.
2010.04.02
Small, obscure optimizations sometimes have the potential to make the greatest impact. For example, every time a connection is made, MySQL will do a DNS lookup of the host that is trying to connect. If MySQL is handling many connections, the overhead of an extra DNS lookup can be hefty, simply because of the number of extra operations that have to be performed before MySQL can actually start doing actual work.
Thankfully, there is an option in recent versions (4.1+) of MySQL that will instruct MySQL to skip the extra DNS lookup. It’s a fairly obscure option called skip-name-resolve. The only caveat to using this option is that the users defined the GRANT tables can only use IP addresses as hostnames. For most MySQL users, this shouldn’t be an issue.
2010.02.22
Setting up a sandbox environment has normally been a trivial task. Set up a vhost, get a copy of the database, build out the app, and start doing stuff. When ‘Stuff’ is ‘Done’ push the changes onto production, and bask in your own crapulence. That is, until your data set exceeds the limits of the sandbox, and the SOA which is saving the day in production is becoming nothing but a headache in development.
The basic sandbox environment for an app includes a reasonably recent data set, similar ( underpowered can be OK, depending ) hardware, the exact same versions of php, mysql and apache. Php, mysql and apache need to be configured exactly the same way as in production. In fact, as part of this process, it might be useful to pull down those ever so important configuration files, put them in a safe place. Perhaps source control (cough). Consistent configuration is extremely important. Bugs produced by configuration problems are notoriously hard to reproduce, and result in devs combing through code looking for bugs that don’t exist.
Maintaining a few sandboxes should be a trivial endeavor. That is, until your project gets too big. A natural response to handling ever-growing problems is use a Service Oriented Architecture; that is, to shard off aspects of the app and dedicate hardware and resources to it. However, three or four shards later, multiplied by an environment for each developer, the guy who was doing sys admin work as needed just became full timer. Unfortunately, there’s no way around this, even with a clever sys admin, who can leave enough automated scripts around so that developers can *mostly* maintain their own environment.
The fact of the matter is maintaining the development environment is one of the most important things a company can do. Close attention to detail in the sandbox will make all the difference in the deployment process. Changes in code base, file permissions and configuration can all be tested and deployed the same as in production. So every build to every sandbox (everyone builds daily, right?) is a chance for the development team to catch mistakes, and learn from them, before the big push. And if that fails, we all know how to handle a crisis.
2010.02.11
Everybody procrastinates. Some tasks that get pushed off don’t matter, it just gets done later. Some tasks that go over deadline result in profanities and bloody noses. But every once in a while, tasks come along that have an expiration date. As in, if it doesn’t get done by a certain time, it doesn’t matter.
You ever heard of a “Japanese Inspection?” Japanese Inspection, you see, when the Japs take in a load of lettuce they’re not sure they wanna let in the country, why they’ll just let it sit there on the dock ’til they get good and ready to look at, But then of course, it’s all gone rotten… ain’t nothing left to inspect. You see, lettuce is a perishable item…like you two monkeys.
– Big John, Days of Thunder
What Big John was referring to was the fact that all he had to do was ignore Cole and Rowdy until they didn’t any fight left in them. Tasks can be just the same way. Eventually, the need for the task to be done just goes away, or starts to smell. The only thing that really matters is being able to tell the difference between the things that really need to get done, and the things that just aren’t important enough to get done.
2009.12.12
Occasionally, a project will come across my plate with the criteria, ‘Make sure this works everywhere, is completely template-able, and is something we can grow with.’ Normally this is coupled with ‘We need this to work with X *right now*, and Y and Z later.’ What I really hear is ‘Make it work for X, and ship the damn thing.’ After all, hitting those deadlines is really important.
Of course, this has a whole bunch of ugly assumptions tied to it. The first is: when I get to Y, everything I did for X will work. All I need to do is drop in a few config changes, and tweak a few parameters, and I’m done. (Yea, right) Secondly: that every case Y needs to cover is contained within X. (Not Likely) Third: All of this will be so well documented that any literate individual will be able to implement Y by osmosis.
So. Do we spend time now or later? Shipping X seems pretty simple, so why not just build X, satisfy the business dude and call it a day. Spending time now means that deadlines may have to shift, and something that should be simple becomes complex. We have other, more important, projects to work on.
Eventually Y comes calling. So let me introduce you to…Future Web Dev Guy Person Girl! If you’re lucky, that person is you. If you’re not, it’s another dev. The assumptions we made back in paragraph 2 have reared their ugly heads. Since they were assumptions, you’re probably boned. If not, you’re probably one of these guys. If you’re the rest of us, Future Web Dev Guy Person Girl definitely hates your guts, because the groundwork that was supposed to be laid out is not there. They’re running through a lice-infested rat’s nest of procedural functions trying to pass the additional variable that will make this all work.
The best way to keep Future Web Dev Guy Person Girl from cursing like a sailor is implement correctly, test thoroughly, and deal with Y before it’s due. Deadlines need to be managed according to project scope, and if project scope includes Y, it needs to be accounted for now, before you lose a friend in Future Web Dev Guy Person Girl.
2009.10.24
Sometimes bugs come along that require significant work to fix. Depending on what project timelines are like at the moment, sometimes fixing the bug isn’t the best option. For example, a race condition in the caching architecture causes pages to be stale. The persistent data store is correct, but the cache is not. To the person who just triggered the update, there’s a bug. The information on the public side is not in sync with the information they just entered.
So, like any other bug, a report will eventually percolate down to the dev team. People scream, fortunes are lost, the svn blame command is used, and the devs who wrote the code pee their pants. Once the chaos dies down, the actual prognosis of this issue can turn out to be extremely grim.
A shortcoming of the caching architecture shows that there’s a race condition when the system is under heavy load. In order to fix it, the dev team needs to plumb the depths of the data access layer, and probably change some parameters. But that’ll probably break everything. Everywhere. Or the layer manipulating the data could be fixed to replace the cache instead of invalidating. Except the methods to manipulate that entity live in 3 different codebases. It’ll probably break the editor. Either way, the actual solution doesn’t matter.The dev team certainly needs to do something, and it needs to be released three days ago.
The correct way to fix this issue will vary widely depending on circumstances. But in this particular case, the best answer was to not fix it, just manage it. Our team was busy, there were other projects that were more pressing. Plus the codebase was being rewritten. So instead of flogging a dead horse, a simple script was thrown together that compared the cache and the database. If they were out of sync, the cache would be cleared, and would be repopulated with the correct information the next time it was requested. Once it was implemented, the bug was still there, but the cache seemed to be up to date.
Every dev team will face bugs that have enormous costs to fix. The way to deal with these bugs will be different every time they come up. It’s important to remember that managing bugs can be almost as effective as fixing them.
2009.07.25
Users have high expectations of web apps in terms of performance, responsiveness and tons of features. Normally, you’re only allowed two of any list of three really cool things. In the case of Web Apps, that would be true. Most will find some compromise of between performance / responsiveness and tons of features. More features usually equals less responsiveness, depending on the feature and scale.
Enter Gearman. Gearman is a queuing system that allows work to be farmed out to other servers. Most importantly, it allows for intense tasks to be queued and performed in the background. This means that when a user performs an action that could potentially take a long time (sending notification emails, updating Full Text indexes, etc), that slow task can be queued to run in the background, and the page can be sent to the user, keeping things snappy.
Gearman is pretty simple to install on Red Hat.
download gearman from server
> wget http://launchpad.net/gearmand/trunk/0.8/+download/gearmand-0.8.tar.gz
unzip and move into the directory
> tar -xvzf gearmand-0.8.tar.gz
> cd gearmand-0.8
Red Hat didn’t have some dependencies. The next few steps will vary depending on your *nix distro.
Install the libevent developer library.
> yum install libevent-devel
Install the e2fsprogs developer library
> yum install e2fsprogs-devel
configure and install
> ./configure
> make
> make install
/** Net Gearman **/
download php extension from the pecl repo
> wget http://pecl.php.net/get/gearman-0.4.0.tgz
untar
> tar -xvf gearman-0.4.0.tgz
build the extension
> phpize
> ./configure
> make
> make test
> make install
Add the extension to the php.ini
[gearman]
extension=gearman.so
And you’re all set!
Integration will depend on if you decide to use the php extension, and how encapsulated the code base is. I highly recommend using the pecl extension, as it provides great implementations of the client and worker. and Gearman will save you.
2009.06.07
When things go wrong, there is always a reason. Sometimes it’s a good one. (I had no idea $_SERVER wasn’t available from the CLI) Sometimes it’s a really bad one. (I left 2 seconds after running the build script, and I didn’t test beforehand.) Whatever the reason is, the question needs to be asked ‘How can we never, ever let this happen again?’ Once we arrive at a good answer to that question, it’s really important that you hold that answer in a vise-like grip, and never let it go. Because knowing how to prevent problems, and actually taking the steps to prevent them are two very different things.
That is why I keep a Book of Fail. I will write down the consequences and circumstances of a particular catastrophe as a permanent record. I make sure to physically write it down because the act of writing with a pencil, unlike typing, requires significant effort, and helps to deeply ingrain what I have chosen to record. Secondly, keeping these records in a physical journal means that however many crappy Macs I burn through, or hosting plans I forget to pay for, I will still be able to carry my Moleskine around.
Periodically reading through the Book is also crucial to keeping these painful moments fresh. Many people will claim to have learned from a mistake, but will continually repeat the same doomed process over and over without a second’s thought. Once having a made a mistake a couple times, it should never be repeated.
|