Death of the CTR

Since the dawn of online advertising, the gold standard of effectiveness has been the CTR. This has made a lot of sense: for the first time ever, advertisers could leverage technology to figure out exactly how well they were communicating. A user would click, and that click would be recorded. The total number of clicks is compared against the total number of ads put on the screen, and bingo, you know exactly how effective the campaign was. Combine that number with more advanced analytics, such as tracking the user past the initial click, onto the advertiser's site, and on to really interesting places like the confirmation page of an ecommerce site, and advertisers had a really effective way to quantify the effectiveness of a campaign.
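The arithmetic behind a CTR is about as simple as it gets. As a toy example (the click and impression counts here are invented):

```shell
# CTR = clicks / impressions. Say 240 clicks on 120,000 impressions:
awk 'BEGIN { clicks = 240; impressions = 120000;
             printf "CTR: %.2f%%\n", 100 * clicks / impressions }'
# prints: CTR: 0.20%
```

Everything past that raw ratio (conversions, revenue per click) is layered on top of these two counters.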

What has been more difficult to quantify is the value of advertising in creating engagement and awareness. Simply seeing a brand association between host site and banner advertisement creates awareness of a brand. Seeing a banner ad doesn't necessarily trigger an immediate reaction (a click), but it can trigger actions by the user later. Users may be inclined to purchase products later on because of the brand awareness created by seeing banner ads. This is awesome for advertisers, who get a return, albeit indirect, from banner advertising, but it's far less awesome for publishers. A publisher puts a great deal of work into creating content that people want to see, and advertising fees are a very common way of monetizing that work. However, CPMs are commonly determined by CTR. If that publisher has a lousy CTR, as a result of something terrible, like having a savvy demographic that knows not to click on ads, then that publisher suffers.

Microsoft has long been a proponent of measuring engagement, and Google has recently mentioned rolling out tools that will track a user across sessions on multiple sites. It’s clear that the industry needs to move in this direction, although hopefully it will move slowly and find ways to avoid becoming a ubiquitous, Minority Report-style system where Skynet knows who you are, and will show you ads for Banana Republic after you’ve purchased khakis from the Gap. However, the rewards for publishers could be great if the larger players in the industry were able to track users across websites, and even devices.

May 31st, 2010

MySQL Slave Delay and Maatkit

This post could alternately be titled: ‘How to make developers hate you.’

A very common criticism of MySQL is that it has no support for delayed replication. Delaying the data flowing from master to slave can be very useful in certain cases. For example, a co-located slave run for backups is still susceptible to data problems caused by a DELETE with no WHERE clause or a mistakenly executed DROP. However, by running the slave anywhere from an hour to a day behind, you have the opportunity to catch whatever problem occurred while you still have a good copy of your data ready to go.

In sandbox environments, a consistent slave delay is a great way to reproduce race conditions. In fact, running with slave delay guarantees that data will be out of sync between the master and the slave. When developers can count on this part of the environment, they can write and test code against this condition. Of course, in reality, working in this type of environment is really annoying, but necessary.

Delayed MySQL replication can be accomplished by using mk-slave-delay, a tool from the Maatkit library. Documentation for the tool can be found at [http://www.maatkit.org/doc/mk-slave-delay.html](http://www.maatkit.org/doc/mk-slave-delay.html). What's great about this tool is that it can be run as a daemon, so it can easily run for an extended period of time without having to do any serious management.
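As a sketch, keeping a slave an hour behind its master could look something like this. The host, credentials, and log paths are made-up examples; check the mk-slave-delay documentation for the authoritative option list:

```shell
# Keep the slave at least one hour behind its master, checking every
# 15 seconds, and run as a daemon so it needs no babysitting.
mk-slave-delay --delay 1h --interval 15s \
  --daemonize \
  --log /var/log/mk-slave-delay.log \
  --pid /var/run/mk-slave-delay.pid \
  h=slave1.example.com,u=maatkit,p=secret
```

The tool works by repeatedly stopping and starting the slave SQL thread, so the I/O thread keeps downloading binlogs while the replay lags behind.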

May 3rd, 2010

MySQL Skip-name-resolve

Small, obscure optimizations sometimes have the potential to make the greatest impact. For example, every time a connection is made, MySQL will do a reverse DNS lookup on the host that is trying to connect. If MySQL is handling many connections, the overhead of that extra DNS lookup can be hefty, simply because of the number of extra operations that have to be performed before MySQL can start doing actual work.

Thankfully, there is an option in recent versions (4.1+) of MySQL that will instruct MySQL to skip the extra DNS lookup. It's a fairly obscure option called skip-name-resolve. The only caveat to using this option is that the users defined in the GRANT tables can then only use IP addresses as hostnames. For most MySQL users, this shouldn't be an issue.
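The change itself is a single line in my.cnf (the path below is the typical Red Hat location; yours may differ):

```ini
# /etc/my.cnf
[mysqld]
# Skip the reverse DNS lookup on each new connection.
# Users in the GRANT tables must then be defined by IP address.
skip-name-resolve
```

Restart mysqld after the change, and double-check that no existing grants rely on hostnames before you do.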

Apr 2nd, 2010

Sandboxes

Setting up a sandbox environment has normally been a trivial task. Set up a vhost, get a copy of the database, build out the app, and start doing stuff. When 'Stuff' is 'Done,' push the changes onto production and bask in your own crapulence. That is, until your data set exceeds the limits of the sandbox, and the SOA that is saving the day in production becomes nothing but a headache in development.

The basic sandbox environment for an app includes a reasonably recent data set, similar (underpowered can be OK, depending) hardware, and the exact same versions of PHP, MySQL and Apache, configured exactly the same way as in production. In fact, as part of this process, it might be useful to pull down those ever-so-important configuration files and put them in a safe place. Perhaps source control (cough). Consistent configuration is extremely important. Bugs produced by configuration problems are notoriously hard to reproduce, and result in devs combing through code looking for bugs that don't exist.
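Pulling down those configuration files can be as simple as the sketch below. The host name, file paths and repo layout are all hypothetical; adjust for your own setup:

```shell
# Snapshot the production configs for PHP, MySQL and Apache
# into a local directory that lives under source control.
scp prod1.example.com:/etc/php.ini                configs/php.ini
scp prod1.example.com:/etc/my.cnf                 configs/my.cnf
scp prod1.example.com:/etc/httpd/conf/httpd.conf  configs/httpd.conf

cd configs
git add php.ini my.cnf httpd.conf
git commit -m "snapshot of production configs"
```

Re-running this periodically also catches config drift on the production boxes themselves.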

Maintaining a few sandboxes should be a trivial endeavor. That is, until your project gets too big. A natural response to handling ever-growing problems is to use a Service Oriented Architecture; that is, to shard off aspects of the app and dedicate hardware and resources to each. However, three or four shards later, multiplied by an environment for each developer, and the guy who was doing sys admin work as needed just became a full-timer. Unfortunately, there's no way around this, even with a clever sys admin who can leave enough automated scripts around that developers can mostly maintain their own environments.

The fact of the matter is that maintaining the development environment is one of the most important things a company can do. Close attention to detail in the sandbox will make all the difference in the deployment process. Changes to the code base, file permissions and configuration can all be tested and deployed the same as in production. So every build to every sandbox (everyone builds daily, right?) is a chance for the development team to catch mistakes, and learn from them, before the big push. And if that fails, we all know how to handle a crisis.

Japanese Inspection

Everybody procrastinates. Some tasks that get pushed off don't matter; they just get done later. Some tasks that go over deadline result in profanities and bloody noses. But every once in a while, a task comes along that has an expiration date. As in, if it doesn't get done by a certain time, it doesn't matter anymore.

> You ever heard of a "Japanese Inspection?" Japanese Inspection, you see, when the Japs take in a load of lettuce they're not sure they wanna let in the country, why, they'll just let it sit there on the dock 'til they get good and ready to look at it. But then, of course, it's all gone rotten… ain't nothing left to inspect. You see, lettuce is a perishable item… like you two monkeys.

What Big John was referring to was the fact that all he had to do was ignore Cole and Rowdy until they didn't have any fight left in them. Tasks can be just the same way. Eventually, the need for the task to be done just goes away, or starts to smell. The only thing that really matters is being able to tell the difference between the things that really need to get done and the things that just aren't important enough to get done.

Feb 12th, 2010

Sphinx Full Text Search Engine

For a very long time, I was convinced that a FULLTEXT index in MySQL was the best solution for all your searching needs. Then I realized that it was horribly slow, and mixing it with complex joins completely destroyed any chance of using MySQL indexes in a way that would make sense or return decent results. The solution to fast and scalable free text search on any website is, of course, a full text search engine.

There are a few different ones out there. After a brief affair with Lucene, I settled on Sphinx. Sphinx is easy to install, even on 64-bit machines, and is architected in a way that makes a lot of sense for the web. The following steps were performed on a Red Hat machine. Don't skip the mysql-dev install, even if you already have MySQL installed.

Sphinx installation

```shell
yum install gcc-c++
yum install mysql-dev*
wget http://www.sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz
tar xzvf sphinx-0.9.9.tar.gz
cd sphinx-0.9.9
./configure --prefix=/usr/local/sphinx --with-mysql
make
make install
```

Once installed, it's fairly simple to start playing with the packaged example data and queries. The PHP API makes integration easy, either to build a service or to use locally as a substitute for MySQL. In fact, as long as the index can be kept reasonably up to date, Sphinx is a better choice for complicated sorts than MySQL.
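As a rough sketch, kicking the tires on that example data goes something like this. The paths assume the --prefix used above, and the file names may vary between Sphinx versions:

```shell
cd /usr/local/sphinx/etc
cp sphinx.conf.dist sphinx.conf    # sample config shipped with Sphinx
mysql -u root test < example.sql   # load the sample documents into MySQL

cd /usr/local/sphinx
bin/indexer --all                  # build the example index
bin/search test                    # query it from the command line
bin/searchd                        # start the daemon so the PHP API can connect
```

Once searchd is up, the bundled PHP API connects to it over the network, which is what makes it easy to split search onto its own box later.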

Jan 29th, 2010

Google Short Links

Having your own domain and server is a really fun thing. You can keep a blog, store files, post photos: all the good ole down-home interwebz fun you can dream up. With the advent of micro-blogging, everyone now has an audience they need to communicate with in the briefest fashion.

Short links have become the norm for sharing, but at a huge price. Short links break the interconnectedness of the Internet. Search results that depend on the count of links become incorrect. Some services try to reconnect links between domains by using some DNS trickery, but the issue remains that there is a middleman that can't help make the connection.

Unless, of course, that middleman is the search engine. Google's recently announced URL shortener could solve many of the problems inherent in URL shorteners. By being the middleman, Google would have all the necessary information to put the pieces back together. It's easy enough to set up, provided you have a Google Apps account. It does not currently have an API, but here's hoping.

Jan 7th, 2010

30 Seconds to Mars :: This Is War

30 Seconds to Mars recently released their third album, This Is War, and it is quite a departure from their first album. Brand New Day was raw and angry, with amazing guitar sounds, great composition, and a real sense of urgency in the writing. The album was really exciting to listen to, and the live performances were great. I saw them at Avalon in NYC a few years ago, and it's still one of my all-time favorite shows.

This Is War is mild and boring in comparison. The effects-driven distorted guitars characteristic of Brand New Day are missing, seemingly replaced with over-produced electronics. Jared Leto has less than half the intensity he had on Brand New Day, and the lyrics have lost their edge. The focus on deeply layered choruses of what sounds like children singing lacks impact. The collaboration with Kanye and his 808 didn't really go anywhere, and just seemed to pull the band further from their roots. It's sad to hear that a band with such a unique sound has gone so far off track.

Dec 22nd, 2009

Changing Criteria

Occasionally, a project will come across my plate with the criteria, ‘Make sure this works everywhere, is completely template-able, and is something we can grow with.’ Normally this is coupled with ‘We need this to work with X right now, and Y and Z later.’ What I really hear is ‘Make it work for X, and ship the damn thing.’ After all, hitting those deadlines is really important.

Of course, this has a whole bunch of ugly assumptions tied to it. The first is: when I get to Y, everything I did for X will work. All I need to do is drop in a few config changes, and tweak a few parameters, and I’m done. (Yea, right) Secondly: that every case Y needs to cover is contained within X. (Not Likely) Third: All of this will be so well documented that any literate individual will be able to implement Y by osmosis.

So. Do we spend time now or later? Shipping X seems pretty simple, so why not just build X, satisfy the business dude and call it a day? Spending time now means that deadlines may have to shift, and something that should be simple becomes complex. We have other, more important, projects to work on.

Eventually Y comes calling. So let me introduce you to…Future Web Dev Guy Person Girl! If you’re lucky, that person is you. If you’re not, it’s another dev. The assumptions we made back in paragraph 2 have reared their ugly heads. Since they were assumptions, you’re probably boned. If not, you’re probably one of these guys. If you’re the rest of us, Future Web Dev Guy Person Girl definitely hates your guts, because the groundwork that was supposed to be laid out is not there. They’re running through a lice-infested rat’s nest of procedural functions trying to pass the additional variable that will make this all work.

The best way to keep Future Web Dev Guy Person Girl from cursing like a sailor is to implement correctly, test thoroughly, and deal with Y before it's due. Deadlines need to be managed according to project scope, and if project scope includes Y, it needs to be accounted for now, before you lose a friend in Future Web Dev Guy Person Girl.

Dec 12th, 2009

The Technician, Now on a Cloud Server

I am pleased to announce that this site is now hosted on the Rackspace Cloud. It was a simple migration from MediaTemple, and has given me the level of control I want. I got to choose my OS (CentOS) and my versions of PHP and MySQL, and set up Apache how I like it. I'm free of Plesk and the limitations therein.

The one thing I would really like to see from the Rackspace Cloud is DNS Support. My goal when migrating [http://chr.ishenry.com](http://chr.ishenry.com) was to move entirely off of MediaTemple. The one thing I really did like about hosting with them was that DNS was integrated directly into the service. With the Rackspace Cloud, there was no such convenience. However, a quick signup with DynDNS and a tweak to my domain registrar solved that.

Big thanks to Ryan Kearney’s video tutorial for the yum command that brought everything together.

Nov 29th, 2009