The Progress Bar Psych

2011.03.13

A classic UX problem is communicating to users how long they’ll have to wait before their task completes. A spinner provides feedback that the system is, in fact, doing something; a progress bar also indicates how long that task may take. Psychologically, progress bars create tension while progressing, and resolution when completed.

From a technical standpoint, progress bars are black magic. The developer is attempting to estimate the duration of a task based on potentially thousands of variables. In the case of a file upload, the developer has to deal with differing network conditions, disk performance, and so on. Then they have to write the code to communicate what is happening to the browser. Not a trivial task. However, when executed well, a progress bar can provide the user with reasonable feedback about their task.
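To make the estimation problem concrete, here is a minimal sketch of the naive approach for an upload: assume the average transfer rate so far will hold for the rest of the file. The function names and parameters are hypothetical, and a real implementation would likely smooth the rate over a recent window rather than trust the overall average.

```python
from typing import Optional


def percent_complete(bytes_done: int, bytes_total: int) -> int:
    """Whole-number percentage for the bar, clamped to the 0-100 range."""
    if bytes_total <= 0:
        return 0
    return max(0, min(100, 100 * bytes_done // bytes_total))


def estimate_remaining_seconds(
    bytes_done: int, bytes_total: int, elapsed_seconds: float
) -> Optional[float]:
    """Naive ETA: assume the average rate so far continues unchanged."""
    if bytes_done <= 0 or elapsed_seconds <= 0:
        return None  # nothing transferred yet, so no rate to extrapolate from
    average_rate = bytes_done / elapsed_seconds  # bytes per second so far
    remaining_bytes = max(bytes_total - bytes_done, 0)
    return remaining_bytes / average_rate
```

Every variable mentioned above (network conditions, disk performance) shows up here as a change in the average rate, which is why the estimate jumps around in practice.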

Lately, sites like LinkedIn, Mint.com, and OKCupid have used that same tension to motivate users to completely fill out their profiles. During profile creation, a progress bar is displayed indicating how far the user has come along. Once the user completely fills out their profile, the progress bar hits 100%, and what changes? In most cases, nothing. The progress bar is just a psychological hack to entice users to go through the entire process.
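The number behind a profile progress bar can be as simple as the share of fields that have been filled in. A minimal sketch, assuming a made-up list of profile fields (none of the sites above publish how they compute theirs):

```python
# Hypothetical profile fields; a real site would pick its own list.
PROFILE_FIELDS = ("name", "photo", "location", "bio")


def profile_completeness(profile: dict) -> int:
    """Percentage of the tracked fields that have a non-empty value."""
    filled = sum(1 for field in PROFILE_FIELDS if profile.get(field))
    return 100 * filled // len(PROFILE_FIELDS)
```

Weighting some fields more heavily than others is a common variation, but the psychological effect is the same: the bar is not at 100% until the user gives up more data.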

The question is: exactly how effective is the progress bar at enticing users to fully complete the task at hand? And is it actually worth it?

Categories : Best Practices  sxsw  Tools

Search is Hard

2010.10.10

The title of this post is a direct quote from a Facebook engineer presenting at the SXSW panel Beyond Lamp. Search is a critical function of any site, but it’s gotten much, much harder as Google has gotten better. To quote the Beyond Lamp panel one more time:

Search is always compared against Google, which is like comparing the canoe you just built to the QE2.

The difficulty of search is made apparent by how many sites, even major ones, get it wrong. A large factor in the success of search is relevancy. Google takes into account 500 million variables in determining how relevant content is. Not only that, but they also know who you are and what you’ve clicked, and can use that to present pages that are more relevant to you. Facebook’s EdgeRank and LinkedIn’s Signal are other examples of search implementations that are vast in scale.

In a startup, where time is of the essence and resources need to be begged, borrowed or stolen, search is a huge challenge. It’s like trying to build the QE2 with nothing but a Swiss Army knife. Basic tools normally don’t cut it. MySQL’s FULLTEXT indexes are helpful, but start trying to implement basic IR techniques like boolean queries, and MySQL’s built-in functionality starts to lack the ability to get the results you want.

There are ways to simplify building search. Sphinx provides great matching capabilities and incredibly fast sorting. When combined with other data, Sphinx can be a great way to get users fast, meaningful results. The one downside of using a document-based search engine is that there is little room for returning completely tailored results. Unlike MySQL, which allows you to slice and dice data in any way you choose, it is more difficult to return results that take into account relationships specific to users and documents. However, for most search tasks, it should function very well.
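For readers unfamiliar with the IR technique mentioned here, boolean retrieval boils down to set operations over an inverted index. The following is a toy sketch, not how MySQL or any search engine implements it; the function names and the whitespace tokenizer are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(documents: dict) -> dict:
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_search(index: dict, must, must_not=()) -> set:
    """Documents containing every term in `must` and none in `must_not`."""
    required = [index.get(term.lower(), set()) for term in must]
    if not required:
        return set()
    matches = set.intersection(*required)  # AND across the required terms
    for term in must_not:
        matches = matches - index.get(term.lower(), set())  # NOT
    return matches
```

Ranking the matches by relevance is the hard part that this sketch leaves out entirely.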

Gearman

2009.07.25

Users have high expectations of web apps in terms of performance, responsiveness and tons of features. Normally, you’re only allowed two of any list of three really cool things, and web apps are no exception. Most will find some compromise between performance / responsiveness and tons of features. More features usually means less responsiveness, depending on the feature and scale.

Enter Gearman. Gearman is a queuing system that allows work to be farmed out to other servers. Most importantly, it allows for intense tasks to be queued and performed in the background. This means that when a user performs an action that could potentially take a long time (sending notification emails, updating Full Text indexes, etc), that slow task can be queued to run in the background, and the page can be sent to the user, keeping things snappy.
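The pattern itself, independent of Gearman, looks roughly like the sketch below. It uses Python’s standard queue and threading modules as a stand-in for the job server and worker, so none of these names are Gearman’s API; the job name and the uppercase transformation are placeholders for real slow work.

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for the job server's queue
completed = []         # records finished jobs so the effect is visible


def worker():
    """Pull jobs off the queue and run them outside the request cycle."""
    while True:
        name, payload = jobs.get()
        # Placeholder for slow work such as sending a notification email.
        completed.append((name, payload.upper()))
        jobs.task_done()


def handle_request(user_email: str) -> str:
    """Queue the slow task and return the page immediately."""
    jobs.put(("send_notification", user_email))
    return "page rendered"


threading.Thread(target=worker, daemon=True).start()
response = handle_request("user@example.com")
jobs.join()  # only needed here so the sketch can show the job finished
```

With Gearman, the queue lives in a separate daemon and the workers can run on other servers, which is what makes it useful beyond a single process.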

Gearman is pretty simple to install on Red Hat.

download gearman from server
> wget http://launchpad.net/gearmand/trunk/0.8/+download/gearmand-0.8.tar.gz

unzip and move into the directory
> tar -xvzf gearmand-0.8.tar.gz
> cd gearmand-0.8

Red Hat didn’t have some dependencies. The next few steps will vary depending on your *nix distro.

Install the libevent developer library.
> yum install libevent-devel

Install the e2fsprogs developer library
> yum install e2fsprogs-devel

configure and install
> ./configure
> make
> make install

/** Gearman PHP extension (PECL) **/

download php extension from the pecl repo
> wget http://pecl.php.net/get/gearman-0.4.0.tgz

untar and move into the directory
> tar -xvf gearman-0.4.0.tgz
> cd gearman-0.4.0

build the extension
> phpize
> ./configure
> make
> make test
> make install

Add the extension to the php.ini

[gearman]
extension=gearman.so

And you’re all set!

Integration will depend on whether you decide to use the PHP extension, and on how encapsulated the code base is. I highly recommend using the PECL extension, as it provides great implementations of the client and worker. Either way, Gearman will save you.

Save MySQL

2009.07.16

Runaway queries on MySQL can be a real problem. If a long-running query locks up important tables, other queries trying to access those tables will be placed in a queue. Each new query is a new connection to MySQL. Once you hit max_connections, your MySQL connection code will start to fail. Depending on how errors are handled at this stage of the request, this could mean total disaster for a site.

Although there is no way to fix this within the MySQL server itself, a bit of clever scripting can be run via cron to check if there is a problem. Presenting : save_mysql

/usr/bin/mysql -e 'show full processlist \G;' 2> /dev/null |
grep -A1 -B5 -E "Time: [1-9][0-9][0-9]?" |
grep -E "\bId\:\ |\bState\:\ " |
/usr/bin/perl -n -e 'if( $. % 2 ) { chomp $_;print $_; } else { print $_; }' |
grep -E "\ State\:\ Sending\ data$|\ State\:\ Sorting\ result$" |
awk '{print $2}' |
xargs -iTHREAD -r -n1 /usr/bin/mysqladmin kill THREAD &> /dev/null

/usr/bin/mysql -e 'show full processlist \G;' 2> /dev/null
This line will grab a list of all the currently running queries and commands from the MySQL server. It also redirects any error output to /dev/null. It produces output like so:

*************************** 1. row ***************************
Id: 842863
User: admin
Host: localhost
db: NULL
Command: Query
Time: 0
State: NULL
Info: show full processlist

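grep -A1 -B5 -E "Time: [1-9][0-9][0-9]?"
The grep here will grab the line directly below and the 5 lines above any Time line that matches. Note that the pattern matches digits, not numbers: with the trailing ? the third digit is optional, so as written it matches any time of 10 seconds or more. Drop the ? to require at least three digits (100 seconds or more):
[1-9][0-9][0-9]
This line can be tweaked to grep for less time. My preference is between 30 seconds and a minute. A two-digit pattern on its own, such as
[3-9][0-9] (30 seconds) OR [6-9][0-9] (60 seconds)
only looks at the leading digits, so it would miss a query that has been running for 150 seconds. To catch those too, combine it with the three-digit pattern:
([3-9][0-9]|[1-9][0-9][0-9]) (30 seconds) OR ([6-9][0-9]|[1-9][0-9][0-9]) (60 seconds)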
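Because grep matches digit patterns rather than comparing numbers, it is worth checking which elapsed times each pattern actually catches. A small sketch using Python’s re module (the helper name and sample times are made up for illustration; for these simple patterns Python’s regex syntax behaves the same as grep -E):

```python
import re


def matching_times(pattern: str, times) -> list:
    """Return the sample times whose 'Time: N' line the pattern would match."""
    regex = re.compile(pattern)
    return [t for t in times if regex.search(f"Time: {t}")]


sample_times = [5, 10, 45, 99, 100, 150, 300]

# Third digit optional: everything from 10 seconds up.
ten_and_up = matching_times(r"Time: [1-9][0-9][0-9]?", sample_times)

# Three digits required: 100 seconds and up.
hundred_and_up = matching_times(r"Time: [1-9][0-9][0-9]", sample_times)

# Two-digit pattern alone: catches 45, 99 and 300, but misses 100 and 150.
thirty_naive = matching_times(r"Time: [3-9][0-9]", sample_times)
```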

grep -E "\bId\:\ |\bState\:\ "
This will filter out the other lines from the previous grep and just grab the MySQL process ID and its State.

/usr/bin/perl -n -e 'if( $. % 2 ) { chomp $_;print $_; } else { print $_; }'
Quick Perl script to put the Id and State from the step above on the same line. It is wrapped in single quotes so that the shell does not expand $_ before Perl sees it.

grep -E "\ State\:\ Sending\ data$|\ State\:\ Sorting\ result$"
This line keeps only the queries that are in the state ‘Sending data’ or ‘Sorting result’. These are both states where it’s safe to kill the query.

awk '{print $2}'
This line grabs the process ID from the output.

xargs -iTHREAD -r -n1 /usr/bin/mysqladmin kill THREAD &> /dev/null
Lastly, this line passes each ID from above to the mysqladmin kill command, effectively killing the query.

Success!

2009.05.30

When making changes to a large site, it’s really helpful to have tools to measure how those changes affect performance. One of my favorite tools is Cacti. This is a graph of the load average of one of our database servers:

[Graph: Database Load Average]

We done good…

A tool to DRY off

2009.05.19

Every developer worth their bits knows that code repeated is a maintenance problem waiting to happen. However, code written by a group of devs under tight deadlines tends to get pretty ugly pretty quick, with lots of snippets being copy/pasted because ‘they work’. The allure of getting things up and running quickly is a siren call that constantly lures us away from the all-important refactoring and integration that makes code maintainable. But once the dust has settled, and there is a spare moment to re-read and consider what should be changed, the task of refactoring seems too daunting to even bother.

Thankfully, Sebastian Bergmann has created a tool that will find every dirty little Ctrl-V. It’s called the PHP Copy/Paste Detector, and it can be installed using PEAR or by downloading the source from git.

What’s really interesting is when you play with the number of tokens and number of lines that constitutes a copy-paste. For my purposes, I used a minimum of 5 lines. In quite a few cases, the copy/paste turned out to be declarations, or the same style sheets and scripts included on different pages. But when it was PHP, it was abundantly clear what needed to be refactored, and how.
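To get a feel for why the minimum-lines setting matters, here is a deliberately simplified line-based sketch of duplicate detection. It is not how the PHP Copy/Paste Detector works internally (the real tool also counts tokens, as noted above); the function name and the dictionary input are assumptions for illustration.

```python
from collections import defaultdict


def find_duplicate_blocks(files: dict, min_lines: int = 5) -> dict:
    """Map each repeated run of min_lines lines to the places it occurs.

    files maps a filename to its source text; each place is a
    (filename, 1-based starting line) pair.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        lines = [line.strip() for line in text.splitlines()]
        for start in range(len(lines) - min_lines + 1):
            window = lines[start:start + min_lines]
            if not all(window):
                continue  # skip windows that contain blank lines
            seen["\n".join(window)].append((name, start + 1))
    return {block: places for block, places in seen.items() if len(places) > 1}
```

Lowering min_lines quickly floods the report with trivial matches such as repeated includes, which mirrors the experience described above.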