The Argument for Generalists in Web Development

“An expert is one who knows more and more about less and less until he knows absolutely everything about nothing” -@mastersje

My grandfather was a carpenter. He had an immense workbench, with a pegboard that held hundreds of tools: hammers, screwdrivers, wrenches, saws, drills, planes, awls. Some were powered, some hand powered; circular, rotary, and reciprocating. Each of them had its own beauty, and its own use. The man could build a picture frame, a desk, a dock, or an entire house. He may not have wielded any single one of them expertly, but he knew which one to use when the situation called for it, and he certainly knew how to use them in concert.

Two generations later, I have become a web developer, and I can’t help but notice the similarities between the two trades. He worked in wood; I work in bits and bytes. Web sites, especially big ones, are not the product of a single technology, but the result of many technologies seamlessly interacting. Building a house is no different. Framing, roofing, and laying a foundation are all separate skills that require vastly different tools. Part of the challenge of being a web developer is managing lots of technologies at once.

Great web developers are carpenters. Just like in carpentry, there is always a right tool for the job. Infrastructure that doesn’t play well with software is bound for failure. Software that doesn’t use or fit the hardware, kernel, and OS well won’t run well. As experts in web development, we can know less about each individual technology, but we should know more about how they work together. Tailoring software, or combinations of software packages, to the problem at hand is the magic bullet that solves problems quickly and scalably.

In the context of early-stage startups, generalists are a better bet for getting a product up and running. Even as a team grows, having generalists around means you can task a single developer with building an entire feature. With a bit of support from specialists, they can ship a feature faster than a team of a couple of specialists. When widespread issues occur, I prefer to have a generalist in my corner, because they typically understand the connective tissue of a website very well, and they’re willing to put in the time debugging from a variety of perspectives.

None of this is to say that specialists don’t have their place in web development. The web is an innovative medium that has spawned dozens of technologies (node.js, for one) that require a deep pool of expertise to work in. There are vast arrays of techniques and frameworks that cater to working in a single technology. Generalists typically won’t have the depth necessary to pull off an awe-inspiring, truly nasty implementation. However, they’ll have the right instincts to pair it with other pieces, deploy it, and make it do something meaningful, which a specialist might not be able to do by themselves.

Try Adding Distance

Part of my New Year’s (non-whiny) resolution was to go to the gym more often. It is a resolution that I have (mostly) adhered to over the past six months. When I go to the gym, I row. I love the rhythm, the fight against the machine, and the fact that no one else at most gyms knows what to do with the rowing machine. Rowing is a grueling workout, which is one of the reasons I like it. Even when the workout is bad, it’s good. It also gives me a chance to crawl inside myself for 20 or so minutes and muddle through the tumult in my head.

For a couple of weeks I kept my workout distance consistent at 4k, and I was able to drop my time on almost every workout. When I added another 1,000 meters, the distance completely outstripped my ability to improve. Rowing the 4k, I was able to row every split evenly, with very little variance in performance. I was even able to throw in some intervals rowed well above my normal split, just for funsies. But rowing the 5k, my splits varied widely, arced, and crashed, and stayed that way for a few months.

Measurement on an ergometer is pretty easy. The average split time is recorded and displayed at the end (if you can still see through the sweat). Also, for every 500m you row, there’s a split and a strokes-per-minute figure displayed. Essentially, it tells you how hard you went out, at what point in your workout you died, where you rallied, and whether you pushed hard at the end. There are no lies, deception, or excuses on the LCD, only numbers. My takeaway from that screen is always that I was not consistent enough.

I’ve seen parallels in my own life and career as well. While working on Creative Portfolio Display, an extremely intense but relatively short project, I was able to push hard, crash, and push hard again. The entire project was feature complete in about three weeks. While handling the ever-growing DBA responsibilities of be.net, I’ve had an extremely difficult time being consistent with a task that requires constant work over years. Performance there has been inconsistent as the data outgrew its architecture.

In personalities and friendships, I’ve come to value the people who remain consistent over time. People who keep struggling through meaningful work of constant quality, year after year. Of intensity and consistency, intensity is often hailed as the more desirable. But the people who have been the most valuable in my life and career are the ones who have stood beside me for years.

In the end, it is the miles that make champions.

Jun 23rd, 2012

Calliope NYC

On the corner of East 4th Street, the newest of the New American restaurants, Calliope, has soft launched. It’s in a beautiful space in a great part of the East Village. The dining room is open and airy, thanks to the corner location and windows on two of the room’s four sides. At one end of the space is a gorgeous curved blond wood bar. The outdoor seating is more comfortable than usual.

Even before launch, they have already acquired a liquor license, though they do not yet have a cocktail list. The bar is surprisingly well stocked with a number of classic European tipples; Pimm’s and Campari were featured. The old-fashioned left a bit to be desired, but the bartender made a decent manhattan.

The food was mostly well executed. I had a slow-cooked rabbit pappardelle with fava beans. It seems like they’re still tweaking the menu, as the dish was listed with peas in place of the fava beans. The roast chicken with carrots and cabbage was also stellar. The skin was crispy and well seasoned, the carrots were the perfect texture, and the chicken-stuffed cabbage lent some color.

Once the staff settles in and the menu is finalized, Calliope is going to be a great addition to an already incredible neighborhood.

May 31st, 2012

Workflow With Synergy

As a web developer working in lots of different areas, I have to use lots of different programs to perform various tasks: Terminal for SSH, Eclipse for PHP, Chrome for all things web browsing, including Gmail. Eventually, the number of tabs I needed to keep open during the day meant my Gmail tab got lost pretty easily, even with the pin feature. Then I stumbled upon Mailplane, a stellar program that turns Gmail into a desktop app.

After adding yet another app to my stable, it seemed necessary to divide my screen real estate into spaces. Part of the problem of having so many different programs is context. Once I’ve situated the necessary programs in a way that makes sense for a particular task, changing context can completely knock me out of my flow. For a period of time, it didn’t even make sense to use Mailplane, as it was easier to switch over to my phone and read and respond to email there.

Then it hit me: why not just bring in my laptop and deal with email there? So I started bringing my laptop. This was awesome, as it allowed me to deal with email on a relatively normal-sized screen and keyboard. The next issue was dealing with links in emails. After asking the twitterverse and Stack Exchange, I finally settled on Synergy. Synergy lets you seamlessly share your keyboard and mouse between any number of machines. So I started running Mailplane in fullscreen mode on a laptop situated next to my desktop. To get to my email, I just mouse all the way to the left, and voilà, I can use a full keyboard and mouse with the laptop. If I need to open links on the big screens, I just copy the URL and use Chrome’s paste & go feature.
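
For reference, the Synergy side of this is just a tiny config file. Here’s a sketch of the setup described above, with made-up screen names (yours are whatever your machines’ hostnames are):

    # synergy.conf: the laptop sits to the left of the desktop, so mousing
    # off the desktop's left edge lands on the laptop
    section: screens
        desktop:
        laptop:
    end

    section: links
        desktop:
            left = laptop
        laptop:
            right = desktop
    end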

This works tremendously well, as I can leave any development task as is on my desktop without having to worry about shuffling things around. It also helps tremendously when applying pesky software updates, as you always have one machine that will be up and running.

Rig at the Behance Office

May 26th, 2012

Why SOPA Is Fucking Stupid

At its core, SOPA represents a universal, base desire of men to protect what they own. To some degree, it’s valid, something that anyone can respect. However, as with most things, ideas mean precisely shite, and execution is what really matters. In the case of SOPA, a misguided, poorly informed government is attempting to execute an idea in a way that could do the most damage possible. Let me count the ways SOPA is fucking stupid.

SOPA threatens free speech. This has been said by many folks who are much smarter than I. Any bill that allows the silencing of voices should not see the light of day. ‘Nuff said.

SOPA will starve and fracture an industry. A measure that could be taken against an “infringing” site is forcing advertising networks to stop serving ads for said site. For many sites, advertising is their meat and potatoes. An equally despicable measure is forcing payment services (can you say PayPal?) to stop accepting transactions from the infringing site. Any site that relies on e-commerce is now proper fucked. Any service that provides a payment gateway or ad serving is now unreliable. There will be a rift between providers that cooperate with SOPA and those that don’t.

SOPA will break the one thing that makes the Internet accessible to humans: DNS. DNS is the system whereby domain names are translated to network addresses. Giving IP addresses easy-to-remember names is one of the reasons the Internet has become a viable medium. As an extreme measure, SOPA would alter a site’s DNS records to point somewhere else. This last measure makes it pretty clear that the authors of these bills are complete dipshits.

While altering DNS will render a site inaccessible to most, it does not remove the existence or accessibility of its content from the Internet. This very post is available here, whether DNS is up or not. To cope with a broken DNS system, the Internet will respond, and it will not be pleasant. Hardware vendors will ship devices with hosts files set up to protect their own interests. Rogue DNS resolvers will pop up. The Internet will turn into Bartertown. Two browsers will enter, neither will find Facebook.

The internet industry and e-commerce have proven to be among the country’s highest-growth sectors in the past few years. One of the main contributors to that growth has been the availability of honest, reasonably reliable, interconnected services. They’ve given the classic humble entrepreneur + code monkey team the tools to build a business that yields riches. Compromise those tools, and you will destroy an industry, not to mention perhaps the last golden vestige of American opportunity. Creating a market-based system to punish violators will only destroy the system. To help combat SOPA, contact your local congressperson, or go here. To read the bill, click here.

Jan 12th, 2012

New Years Actionable (Non-whiner) Resolutions

Normally, I’ve considered New Year’s resolutions to be for whiners, people who never actually accomplish anything, people who are normally on the whhhaaaambulance. Most resolutions fall somewhere in the “stop being fat” to “be fluent in Mesopotamian glyphs” range. They’re vague, completely un-actionable, and just describe a slightly unattainable goal / end result / dumbass want. This year, however, I’ve discovered a few serious problems in my life I need to fix. So, in the spirit of actually doing shit, I’ve provided a list of non-whiner resolutions + ways to actually make them happen.

  1. Go to bed early(ish). This is a tough one for me. I’m naturally a night owl, but I might be able to fool myself into getting to bed early by showering early(ish). I love me a good shower. I’ve even been known to drink a beer in the shower. I also really like being in really warm, comfortable, if slightly embarrassing, PJs. All these things put me in a good mood and generally make me want to relax, which is not all that far from being asleep. Action Steps: Just get in the fucking shower; don’t look at the mail / email / dirty dishes / messy apartment / Twitter, just get in the shower (bring a beer).

  2. See some doctors. I’ve avoided doctors for a while, mostly because my lifestyle is a cross between Dennis Nedry from Jurassic Park and a barfly. I consider this one pretty simple. Action Steps: Make appointments with the following: dentist, general physician, eye doctor, nutritionist. Do it. Do what they say, even if it sucks. Follow up as often as the quacks say to.

  3. Go to the gym regularly. OK, this one is, without a doubt, the most cliche, whine-tastic resolution evar. I know, because I have been to the gym in January. I’ve also been to the gym in April, when all the kids who were at the gym in January are nowhere to be found. Also, since the gym is a #creepy and #gross place to shower, resolution #1 becomes even more important. Action Steps: Put that ish on the Google calendar with the following reminders: 2 hours, 1 hour, 30 minutes, 15 minutes, 10 minutes, and 5 minutes before. Keep gym clothes in the office. Don’t care how bad they smell.

  4. Blog more. Writing has been a great way to get me to collect my thoughts, find some hindsight, and maybe, just maybe, help some other folks who have the same demented thoughts / stupid problems. As a technical guy, “the inspiration” doesn’t hit me so often, and when it does, I’m often busy, y'know, actually doing shit. However, as I’ve noted to myself more than once, keeping track of my day and journaling how I spend my time is incredibly important for introspection. Action Steps: Write that thought down. Write down what you did 30 seconds ago, especially if it was different from the regularly scheduled programming. Keep a sticky on your monitor to write shite down. Ask the dude next to you (@bossjones) to remind you to write shit down. A glass (or 7) of white wine, the notebook in which all your shit is written, and WordPress should convene regularly. Google Calendar #ftw, again. Lastly, check Google Analytics on posts. The un-monitored blog post is not worth writing.

  5. Read more. Once upon a time (yesterday) I didn’t know nearly as much as I do now. Most of that knowledge came from reading shit-tons of blogs, books, bathroom graffiti, articles, and whitepapers related to web development. I read everything with a goal: how can I use or leverage this to help me / my business work a little better? The Action Steps here are a bit tougher, and slightly conflict with non-whiner resolution #6: Keep Google Reader open. Curate my list of feeds with relevant sources. Prune feeds that have stopped providing useful information. Lastly, and perhaps most important, find the tidbits of information that make a difference in my life and / or business.

  6. Don’t be distracted by bold numbers in parentheses. Simple (kinda). Action Steps: close Gmail, close Twitter. Try, and #fail, to delete my Facebook account.

  7. Stop playing so much fucking air guitar by myself, alone in my apt, and start playing some real guitar, and actually learn the songs I normally rock out to. Action Steps: restring the Epiphone, buy a new amp, find tabs for shit I want to learn. If I’m feeling really frisky, get back into a band.

MySQL Error 28

Yesterday, I had to run a query for some statistics I needed. It was a query that I knew was going to be particularly nasty, as it required sorting 1.3M rows. Normally I run these sorts of queries on a reporting slave I keep around for exactly this reason, but for some reason I chose to run this one on a production slave. When I ran the query, I got the following error:

ERROR 3 (HY000): Error writing file ‘/tmp/MYNcSyQ9’ (Errcode: 28)

Oh. *&^%. After some Googling, a bit of shitting my pants, and a wild grep session through as many application logs as I could find, I was able to figure out that the problem seemed limited to this particular query. My Googling also turned up the fact that the error code meant the server was out of disk space.
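
(Handy aside: MySQL ships with a little utility called perror that translates these numeric OS error codes, which would have saved me the Googling. Error code 28 is ENOSPC, “No space left on device.”)

    # translate the OS error code from the (Errcode: 28) part of the message
    perror 28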

As a rapidly growing company, we’ve had our fair share of issues with managing (or failing to manage) rapidly filling disks, failed RAID controllers, and the like. However, I had recently done audits of this particular cluster of servers, and ascertained that the situation with disks was nominal. I was confident the disk wasn’t full, and permissions were correct. Our particular disk layout puts /tmp on its own 2GB partition, and after running the query, that partition was 2% full.

It turns out that during the execution of the query, MySQL was creating a temporary table that was 2GB, hence the error. By default, MySQL writes temporary tables to /tmp, which in many cases is its own small partition. The solution here was to set tmpdir to a folder on the main partition, adjacent to the MySQL datadir. This solution obviously has its own problems (i.e., you could fill your main partition, which is way worse than filling /tmp). However, for this type of ad hoc query, it was exactly what we needed.
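
For the record, the change itself is a couple of lines in my.cnf. The paths below are made up, and since tmpdir isn’t a dynamic variable, MySQL needs a restart to pick it up:

    [mysqld]
    # hypothetical paths: the point is that tmpdir now lives on the big data
    # partition instead of a tiny /tmp (the directory must exist and be
    # writable by the mysql user)
    datadir = /var/lib/mysql
    tmpdir  = /var/lib/mysql-tmp

    # after restarting, verify with:
    #   mysql> SHOW VARIABLES LIKE 'tmpdir';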

Nov 2nd, 2011

Three Tough Projects and Bands That Got Me Through Them

Creative Index

The now defunct Creative Index was a search engine aimed at indexing portfolio sites. It was perhaps the most open-ended project I have ever taken on. The goal was to allow people to list their various portfolio sites, have the Creative Index scrape them, index all the textual content, and make it searchable via a Google-like interface. Of course, the project was doomed from the beginning, as results would be measured against Google, and we all know how that goes. I had also never taken on a project that required handling such unstructured data, another reason it was doomed. Most portfolio sites contain very little text, which makes matching and ranking difficult.

And that’s when I discovered the Mars Volta. While writing the engine to handle the retrieval of web pages, I learned just how chaotic the underbelly of the Internet is. Circular redirects, 404s, bad links, and authenticated pages made my code check hundreds of variables in the most paranoid, chaotic way possible. The Mars Volta’s drug-induced, hallucination-inspired, free-form rock-jazz-samba was a great soundtrack to the chaos I was trying to make sense of.

Creative Portfolio Display

In July of 2010, I had the distinct honor of developing one of the few InApps for LinkedIn. LinkedIn’s InApp platform runs on Google OpenSocial. OpenSocial is a great way to plug in third-party apps in a secure way. However, the normal development workflow changes quite a bit, since OpenSocial acts as a caching proxy. In order to get changes in your app down to the user/tester/you, you need to set an additional variable that forces OpenSocial to re-retrieve your app’s specification. To do that, you need to find the URL of the iframe that contains your app, which is only available in a JavaScript block, add the cache-busting variable, drop the URL in your browser, and hard refresh. That only worked sometimes. And when it did work, it was pretty much guaranteed that the change you made didn’t.

Needless to say, the workflow was painful, even on the best days. Add to that some weird firewall issues, and you had a situation that would make St. Francis of Assisi murder kittens. That’s where Passion Pit came in. Their music is just so…damn…happy. In most cases Passion Pit saved me from putting my fist through my monitor.

RightScale, Rackspace Cloud configuration

In an attempt to save ourselves some money, and to automate a lot of the sysadmin work I’d been doing by hand over the past couple of years, I undertook a partnership with RightScale. Since the servers I was deploying never came with PHP, or any other language I knew, by default, I had to resort to bash, which I didn’t know. This project took me way out of my comfort zone, had a Mt. Everest of a learning curve, and was so essential to our growth that it couldn’t fail. There were also tons of moving parts that were out of my control. RightScale’s integration with the Rackspace cloud was in beta, which meant that in addition to struggling through a language I didn’t know, I had to differentiate my own errors from problems with the server images. Tons ‘o fun.

In stepped The Bronx (III), probably one of the most solid rock bands I’ve heard in a long time. Their tracks had a real sense of purpose, and the lyrics echoed a lot of my desperation. In particular, the line in Pleasure Seekers where desperation is cited as inspiration totally got me through.

Jul 31st, 2011

Distributed Updates

Part of managing any large site involves writing scripts that will go through your data and make changes, merge things, remove things, do type transformations, and so on. Most of the time, in PHP, iterating through rows or objects will do just fine. However, when there are lots of rows or objects, you could be faced with a script that takes hours or days to run. Depending on how active the data is, you may need to restrict access to it to ensure that the data before and after the transformation remains consistent. In other words, if someone makes a change to the data before the transformation, and the new feature only looks at data after the transformation, that user has just lost their changes. That is Very Bad.

As sites get larger and problems like this loom, taking the site offline becomes less and less of an option. This is what the business team calls a luxury problem, and what the ops team refers to simply as a problem. One option is to write a more efficient script. You can get pretty far by simply ensuring you’re reading from the fastest data source available, making good use of cache, and making sure the tables being read for the transformation are properly indexed. All of these are great places to start. Additionally, grabbing the data in chunks gives the database time to breathe. There’s nothing worse than getting stuck in MySQL’s “sending data” phase simply because it needs to read several thousand rows from disk. MySQL configuration can also be your friend here. If you’re using InnoDB, increasing the insert buffer is a great way to speed up writes.*
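
To make the chunking idea concrete, here’s a rough sketch of walking a big table by primary key instead of slurping everything in one query. The table, columns, and transformation are all stand-ins:

    <?php
    // Process a big table in primary-key chunks so MySQL never has to hand
    // back the whole result set in one shot.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $chunkSize = 1000;
    $lastId    = 0;

    do {
        $stmt = $pdo->prepare(sprintf(
            'SELECT id, payload FROM items WHERE id > :last ORDER BY id LIMIT %d',
            $chunkSize
        ));
        $stmt->bindValue(':last', $lastId, PDO::PARAM_INT);
        $stmt->execute();
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

        foreach ($rows as $row) {
            // ...apply the transformation to $row here...
            $lastId = $row['id'];
        }

        usleep(100000); // a tenth of a second for the database to breathe
    } while (count($rows) === $chunkSize);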

However, as much as you can do to speed up a single transaction, the fact remains that you have to execute each transformation serially, one after another. Your bottleneck is the transformation itself. It will take (time per transformation * number of objects to transform) to complete the job. No matter how well tuned the database is, it will only be performing one operation at a time, which means your database, which is more than capable of handling thousands of parallel operations, is not being used well. So the next logical step is to change your update script to distribute the update operations so that several can run in parallel.

Rewriting the update script does require thinking about your update differently, and it will not work in every case. For example, if you are simply moving a large amount of data from one table to another, and there is no transformation, or the transformation can be accomplished via a built-in MySQL function, use that. Just be prepared to deal with locking issues, and with the source data potentially not being available while the transformation is taking place. But if your transformation is complicated and requires per-case logic, this is definitely a good route to take. The biggest difference is how the code for the update is organized. The update script needs to be separated into code that applies the transformation for exactly one entity, and code that manages which entities get transformed and when. Ideally, the code for the transformation is idempotent, so failures can be handled by simply resubmitting the entity / object to be transformed again.
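
Here’s a sketch of that separation, again with a made-up schema. The function knows how to migrate exactly one id, and the WHERE clause is what makes re-running it harmless:

    <?php
    // One entity per call, and idempotent: a row that has already been
    // migrated matches nothing in the UPDATE, so resubmitting a failed or
    // duplicate job is always safe.
    function transform_entity(PDO $pdo, $entityId)
    {
        $select = $pdo->prepare('SELECT id, legacy_blob FROM items WHERE id = :id');
        $select->execute(array(':id' => (int) $entityId));
        $row = $select->fetch(PDO::FETCH_ASSOC);
        if (!$row) {
            return; // gone, or never existed; nothing to do
        }

        // the per-case logic lives here (pretend we're converting an old
        // serialized PHP blob into JSON)
        $payload = json_encode(unserialize($row['legacy_blob']));

        $update = $pdo->prepare(
            'UPDATE items SET payload = :payload, migrated_at = NOW()
             WHERE id = :id AND migrated_at IS NULL'
        );
        $update->execute(array(':payload' => $payload, ':id' => $row['id']));
    }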

Accomplishing parallel processing in PHP can be kind of tricky. PHP’s pcntl_exec function has always felt a bit finicky to me. Of course, exec on its own is blocking, so that’s out. Additionally, neither of these solutions offers any sort of baked-in communication between the process that submitted the job and the process carrying out the job. That leaves us with a queuing system. Popular systems include RabbitMQ and Gearman. Personally, I’ve made great use of Gearman. It’s easy to install, as is the PHP module.
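
A bare-bones Gearman setup using the pecl extension looks roughly like the following. It assumes gearmand is running on its default port, and that a transform_entity() function along the lines of the sketch above is available:

    <?php
    // worker.php: run several of these side by side for real parallelism
    // (include transform_entity() however your project does includes)
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    $worker->addFunction('transform', function (GearmanJob $job) use ($pdo) {
        transform_entity($pdo, (int) $job->workload());
    });
    while ($worker->work());

    <?php
    // dispatch.php: queue one background job per entity id and walk away
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);
    foreach (range(1, 100000) as $id) { // or however you enumerate the ids
        $client->doBackground('transform', (string) $id);
    }

Fire up a handful of workers (screen, supervisord, whatever floats your boat), and the database finally gets to chew on several transformations at once.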

To sum up, performing large data updates via a distributed system is the way to go if you have complex requirements per transformation, and the option to perform these processes in parallel.

*If you’re using MySQL’s MyISAM engine, this isn’t necessarily true, as writes will block, and the database could become the bottleneck. However, since MySQL is continuing to push InnoDB, this is increasingly unlikely. So if your tables are all InnoDB, you’re probably in good shape.

Jun 25th, 2011

Development Without Internet Access

While flying to Austin for SXSW, I had a small programming task: take a string of a few search terms, break it apart, and highlight those terms in another string. It’s a straightforward task, and probably a wheel that’s been reinvented thousands of times in the history of computer science. I approached it as an exercise, to see if I could add another squeaky wheel to the pile. My goal was to do it without using any third-party code or any resources. I had no access to documentation, Google, Stack Overflow, or any of the other resources I use constantly to get my job done every day.
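
(For contrast: with the documentation in reach, the whole exercise collapses into something like the sketch below, which is emphatically not the code I wrote on the plane.)

    <?php
    // Split the search string on whitespace and wrap every occurrence of each
    // term in the subject string with <mark> tags, case-insensitively.
    function highlight_terms($query, $subject)
    {
        $terms = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($terms as $term) {
            $pattern = '/' . preg_quote($term, '/') . '/i';
            $subject = preg_replace($pattern, '<mark>$0</mark>', $subject);
        }
        return $subject;
    }

    echo highlight_terms('rabbit pappardelle', 'Slow cooked rabbit pappardelle with fava beans');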

The code that I produced was bloated, naive, and horribly inefficient (I suspect). While writing it, I knew I wasn’t really on the right path. When I got back to New York, I took a look at it and more or less decided I had wasted my time. Then I remembered I had written it on a plane, where I’d had nothing better to do. I had simply gotten myself into the zone and wanted to work through a problem until it was solved. After I got over my initial disgust, I wondered what, aside from boredom and stubbornness, had prompted me to complete the task.

I never really came to any conclusions until a few days later. I was going about my day normally: fixing bugs, writing emails, troubleshooting. When I hit a hard spot, something I couldn’t figure out, I gave up staring at the code and turned to Google. A bit later I came across a built-in PHP function that was giving me a strange result. After puzzling over it for a few seconds, I dropped the function into Google. A little while later, I was examining the results of an EXPLAIN statement in MySQL, and the output was something I hadn’t seen before. I found the answer on Stack Overflow a few minutes later.

Then it dawned on me. Maybe I don’t actually have the skills to be a web developer, and I’ve faked it all these years. Maybe I don’t know all that much about MySQL, and perhaps I only know enough about Linux to cause problems for Rackspace. Whether or not that’s true, I did realize that I’m pretty good at finding solutions to problems in the collective experience, wisdom, and flames of the Internet. Maybe it’s not entirely fair to say that I faked my way through several years of a career. After all, the code that I’ve put together over the years to answer various questions, or to sift through or collect data, serves a purpose, performs relatively well, and is serving people every day. Also, that disgusting snippet of string-highlighting code works pretty well, despite the fact that I hate its face and want it to die.

After I got myself out of my existential development funk, more questions came to mind. First, how the F did anyone get answers to tough questions before the Internet? Second, how did programmers back in the day find any sort of direction? Books on technology and programming are great, don’t get me wrong, but you can’t get answers to complicated questions out of them. After having these thoughts crop up, I spent a little bit of time looking over other devs' shoulders at the office. What I saw was very reassuring, as the Google machine was often hard at work for the rest of the team too. The PHP site, Stack Overflow, and QuirksMode were up in browsers constantly.

Which begs yet another question: what exactly does it take to be a web programmer? Based on my experience, it seems to boil down to an Internet connection, Google, tenacity to the point of stupidity, and decent search skills. To back up even further: is it possible to take on a job you know nothing about and learn how to do it via the Internet?

Jun 12th, 2011