Development without Internet Access

2011.06.13

While flying to Austin for SXSW, I had a small programming task: take a string of a few search terms, break it apart, and highlight those terms in another string. It’s a straightforward task, and probably a wheel that’s been reinvented thousands of times in the history of computer science. I approached it as an exercise, to see if I could add another squeaky wheel to the pile. My goal was to do it without using any third-party code or any resources. I had no access to documentation, Google, Stack Overflow, or any of the other resources I use constantly to get my job done every day.
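For illustration, here is a minimal sketch of that task in Python. It is not the code written on the plane; the function name `highlight` and the `<b>` markers are assumptions made up for the example, and it leans on the standard library's `re` module rather than doing the matching by hand.

```python
import re


def highlight(text, query, before="<b>", after="</b>"):
    """Wrap every occurrence of each search term in `text` with markers."""
    # Break the query apart on whitespace and drop duplicates, ignoring case.
    terms = {term.lower() for term in query.split()}
    if not terms:
        return text
    # Try longer terms first so "javascript" wins over "java".
    ordered = sorted(terms, key=len, reverse=True)
    # Escape each term so characters like "+" or "." are matched literally.
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: before + m.group(0) + after, text)


print(highlight("The quick brown fox", "quick fox"))
```

Matching is case-insensitive and keeps the original casing of the highlighted text. A real version would also need to escape HTML in `text` before adding markup.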

The code that I produced was bloated, naive, and horribly inefficient (I suspect). While writing it, I knew I wasn’t really on the right path. When I got back to New York, I took a look at it, and more or less decided I had wasted my time. Then I realized I had written it on a plane, and had nothing better to do. I simply got myself into the zone, and wanted to work through a problem until it was solved. After I got over my initial disgust, I wondered what, aside from boredom and stubbornness, had prompted me to complete the task.

I never really came to any conclusions until a few days later. I was going about my day normally, fixing bugs, writing emails, troubleshooting. As I hit a hard spot, something I couldn’t figure out, I gave up staring at the code, and turned to Google. Then I came across a built-in PHP function that was giving me a strange result. After puzzling for a few seconds, I dropped the function into Google. A little while later, I was examining the results of an EXPLAIN statement in MySQL, and the output was something I hadn’t seen before. I found the answer on Stack Overflow a few minutes later.

Then it dawned on me. Maybe I don’t actually have the skills to be a web developer, and I’ve faked it all these years. Maybe I don’t know all that much about MySQL, and perhaps I only know enough about Linux to cause problems for Rackspace. Whether or not that’s true, I did realize that I’m pretty good at finding solutions to problems from the collective experiences, wisdom, and flames of the Internet. Maybe it’s not entirely fair to say that I faked my way through several years of a career. After all, the code that I’ve put together over the years to answer various questions, or to sift through or collect data, serves a purpose, performs relatively well, and is serving people every day. Also, that disgusting snippet of string highlighting code works pretty well, despite the fact that I hate its face and want it to die.

After I got myself out of my existential development funk, more questions came to mind. First, how the F did anyone get any answers to tough questions before the Internet? Second, how did programmers back in the day find any sort of direction? Books on technology and programming are great, don’t get me wrong, but you can’t get answers to complicated questions from them. After having these thoughts crop up, I spent a little bit of time looking over other devs’ shoulders at the office. What I saw was very reassuring, as the Google machine was often hard at work for the rest of the team. The PHP site, Stack Overflow, and QuirksMode were in browsers constantly.

Which raises yet another question: what exactly does it take to be a web programmer? Based on my experience, it seems to boil down to an Internet connection, Google, tenacity to the point of stupidity, and decent search skills. To back up even further, is it possible to take on a job you know nothing about, and learn how to do it via the Internet?

Out of the Terminal

2010.07.20

Every once in a while I get to leave server-land and do some fun projects that involve the front end. The latest was building an embed script for the Behance Job List. Projects like this, that get me out of the terminal and into a space that requires a bit more interaction between domains, are particularly appealing. As much as I think the Same Origin Policy is a reasonable rule for security, I love looking at ways to get around it.

The technique I chose for this was JSONP, or JSON with Padding. I’m a huge fan of JSON as a transport, as I feel it is compact, flexible and stupidly simple to generate and consume. In fact, I’ve sworn to never touch another XML file as long as I live. JSONP is really convenient from an API implementation perspective, because when the request for the data is made (via the script tag), all the client has to do is pass a callback, and it can use the data in any way it chooses. The server doesn’t have to be aware of what the callback actually does, although I do recommend checking against a list of pre-approved callbacks, just to make sure.
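As a rough sketch of the server side of that exchange, the "padding" is nothing more than a function call wrapped around the JSON payload. The example below is in Python for brevity and is not the Behance implementation; the callback names and the `jsonp_response` helper are hypothetical.

```python
import json

# Hypothetical whitelist; these callback names are made up for illustration.
APPROVED_CALLBACKS = {"renderJobList", "renderJobCount"}


def jsonp_response(callback, data):
    """Return the JavaScript body a JSONP endpoint would send back."""
    # Refuse anything that is not on the pre-approved list, so the
    # endpoint cannot be used to run arbitrary script on the embedding page.
    if callback not in APPROVED_CALLBACKS:
        raise ValueError("callback not approved: %r" % callback)
    # The padding: wrap the JSON payload in a call to the client's function.
    return "%s(%s);" % (callback, json.dumps(data))


print(jsonp_response("renderJobList", {"jobs": []}))
```

The embedding page would define `renderJobList` itself and then load the endpoint through a script tag, passing the callback name as a query parameter.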

Like any semi-decent developer, I have dog-fooded my own work, and implemented the Behance Joblist embed code right here.

A little about the Behance Joblist:

Top global companies find and hire talent on Behance, the world’s leading network for creative professionals.

Sphinx Full Text Search Engine

2010.01.28

For a very long time, I was convinced that a FULLTEXT index in MySQL was the best solution for all your searching needs. Then I realized that it was horribly slow, and that mixing it with complex joins completely destroyed any chance of using MySQL indexes in a way that would make sense or get decent results. The solution to fast and scalable free text search on any website is, of course, a full text search engine.

There are a few different ones out there. After a brief affair with Lucene, I settled on Sphinx. Sphinx is easy to install, even on 64-bit machines, and is architected in a way that makes a lot of sense for the web. The following steps were performed on a Red Hat machine. Don’t skip the mysql-dev install, even if you already have MySQL installed.

> yum install gcc-c++
> yum install mysql-dev*
> wget http://www.sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz
> tar xzvf sphinx-0.9.9.tar.gz
> cd sphinx-0.9.9
> mkdir /usr/local/sphinx
> ./configure --prefix=/usr/local/sphinx --with-mysql
> make
> make install

Once installed, it’s fairly simple to start playing with the packaged example data and queries. The PHP APIs make integration easy, either to build a service, or to use locally as a substitute for MySQL. In fact, as long as the index can be kept reasonably up to date, Sphinx is a better choice for complicated sorts than MySQL.

What's in a Name?

2009.10.13

It’s easy to get caught up in semantics. Figuring out the best names for variables, tables, columns, classes, etc. is something that can eat up hours or even days of a development schedule. The idea is that the more precise the name, the better it is. The arguments for precision naming are many:

* Clear names help other developers read your code.
* New developers who come on will immediately understand what’s happening.
* Calls to well-named methods of classes will read like a sentence, further increasing readability.
* Clear names will be able to help developers relate things in the UI to the code.

Keep in mind, I’m not talking about naming conventions. Naming conventions are simply rules for choosing the character sequences. They don’t dictate what words you should assign to things in your code.

Whatever names developers choose, they will get strewn throughout the layers of the application. Database, table and column names will be impacted. Variables in server-side scripts. Organization of classes into folders. JavaScript file names. Memcache keys. URLs. Just like sand at the beach, the labels the dev team decided on go everywhere you can think of. Inevitably, the marketing team will bound down the hall, and announce the product is being rebranded. Jobs will become Gigs. Friends will become Followers. Application code will become confusing.

New devs won’t get it anyway.

The fact of the matter is that overthinking naming is a good way to get nowhere fast. Keeping it simple, and taking just enough time to make sure that things make sense, will give devs more time to focus on important stuff. Like being able to articulate the thought process behind code.

Categories: Development, Web Dev Teams

Do one thing, but do it really well

2009.07.07

In life there are people who will consider themselves Jacks of all Trades. But as the saying goes, they are masters of none.

Websites will always start out as a codebase that does everything. There will be a couple of files that add users, encode video to Flash, pull RSS feeds, assemble HTML, update products, charge users, manipulate images, redirect old links, handle file uploads, calculate shipping, delete categories, create RSS feeds, search the database, etc. Sometimes the code to do these will be organized into files, sometimes it won’t. The whole site will run on a single server, or more likely, a slice of a single server.

None of the things in the above list will be done well. None. This is mostly because there is too little code and too little hardware focused on doing too much. Also, every piece of code will be tightly coupled. So any one of those features could potentially get a ton of traffic, or hit a bump, and consume a ton of resources. Once that happens, it’s safe to assume the whole thing will go down in flames.

So to avoid the Fail Whale, it’s really important to build sites as a group of components that work together. Architecture is key, and when carefully thought out, can ensure that the most important parts of the site stay up. Even when your image manipulation script on the backend freaks out, the home page should continue to load flawlessly.

With database-driven apps (almost every major site on the web), particular attention needs to be paid to a caching layer. Again, since most sites start out with a jumbled codebase, it is unlikely that all the code to manage data is in the same place. Given the complexities of managing cache objects, making sure that objects are invalidated on update is crucial to making updates look seamless. So there needs to be a set of code that’s good at one thing: managing data and its cache.
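To make the idea concrete, here is a minimal sketch in Python of one class that owns both the data and its cache. Everything in it is an assumption for illustration: the `UserStore` name is made up, and plain dictionaries stand in for the real database and for something like Memcache.

```python
class UserStore:
    """Single place that owns both the data and its cache."""

    def __init__(self):
        self._db = {}      # stand-in for the real database
        self._cache = {}   # stand-in for a cache such as Memcache
        self.db_reads = 0  # counts trips to the "database"

    def get(self, user_id):
        # Serve from the cache when possible.
        if user_id in self._cache:
            return self._cache[user_id]
        # Cache miss: read from the database and remember the result.
        self.db_reads += 1
        record = self._db.get(user_id)
        if record is not None:
            self._cache[user_id] = record
        return record

    def update(self, user_id, record):
        self._db[user_id] = dict(record)
        # Invalidate on write so the next read picks up fresh data.
        self._cache.pop(user_id, None)


store = UserStore()
store.update(1, {"name": "Ann"})
store.get(1)           # first read goes to the database
store.get(1)           # second read is served from the cache
print(store.db_reads)
```

Because every write goes through `update`, there is exactly one place where invalidation can be forgotten, which is the point of keeping the data code and the cache code together.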

Search is another area that commonly relies on the database, and can eat a ton of resources. If performing search in SQL, difficult queries can lock tables and keep other queries from being answered. As good as some DBMSes have gotten at handling search (e.g. MySQL’s FULLTEXT), they still can’t fulfill the concurrency demands of a site with heavy traffic. So, again, the solution is to isolate a resource-intensive feature from other code. There are a few different ways to do this. One is running replication, which may not be possible in smaller hosting environments. Another is to use a full text search engine (Lucene, Sphinx, etc.). Again, this may not be possible in smaller hosting environments.

Using code that’s already good at managing and retrieving data, an interface can be built to query your data. A second hosting environment that’s suitable for running the search tool of your choice can then query the data code for updates it needs to keep itself updated. In turn, this server will return search results without tying up any resources necessary for doing important stuff, like serving the home page.
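One way to picture that arrangement is a data layer that can report what changed, and a separate search component that pulls those changes on its own schedule. The Python sketch below is only an illustration under stated assumptions: `DataLayer`, `SearchIndex`, and `changed_since` are hypothetical names, and a dictionary with substring matching stands in for a real engine such as Sphinx or Lucene.

```python
class DataLayer:
    """Stand-in for the code that owns the site's data."""

    def __init__(self):
        self._rows = {}    # row id -> (version, text)
        self._version = 0  # bumped on every save

    def save(self, row_id, text):
        self._version += 1
        self._rows[row_id] = (self._version, text)

    def changed_since(self, version):
        """Return (row_id, version, text) for rows saved after `version`."""
        return sorted(
            (rid, ver, text)
            for rid, (ver, text) in self._rows.items()
            if ver > version
        )


class SearchIndex:
    """Stand-in for a separate search server."""

    def __init__(self, data_layer):
        self._data = data_layer
        self._docs = {}
        self._seen = 0  # highest version already indexed

    def refresh(self):
        # Pull only what changed since the last refresh.
        changes = self._data.changed_since(self._seen)
        for row_id, version, text in changes:
            self._docs[row_id] = text.lower()
            self._seen = max(self._seen, version)
        return len(changes)

    def search(self, term):
        # Queries touch only the index, never the data layer.
        term = term.lower()
        return sorted(rid for rid, text in self._docs.items() if term in text)


data = DataLayer()
data.save(1, "PHP developer")
data.save(2, "Designer")
index = SearchIndex(data)
index.refresh()
print(index.search("php"))
```

Searches never reach back into the data layer, so a burst of search traffic costs the main site nothing beyond the periodic `refresh` calls.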

So in these two short examples, we’ve created a theoretical architecture that can sustain heavy, site-breaking traffic to the search, and still continue to serve the home page. That is, of course, until the Apache server becomes so inundated with requests that it can’t do anything. Then it’s time to get that load balancer in place…