A tool to DRY off

2009.05.19

Every developer worth their bits knows that repeated code is a maintenance problem waiting to happen. However, code written by a group of devs under tight deadlines tends to get pretty ugly pretty quickly, with lots of snippets being copy/pasted because ‘they work’. The allure of getting things up and running quickly is a siren call that constantly lures us away from the all-important refactoring and integration that make code maintainable. And once the dust has settled, and there is a spare moment to re-read and consider what should be changed, the task of refactoring seems too daunting to even bother with.

Thankfully, Sebastian Bergmann has created a tool that will find every dirty little Ctrl-V. It’s called the PHP Copy/Paste Detector (phpcpd), and it can be installed using PEAR. Alternatively, you can download the source from git.

What’s really interesting is when you play with the number of tokens and the number of lines that constitute a copy-paste. For my purposes, I used a minimum of 5 lines. In quite a few cases, the copy/paste turned out to be declarations, or the same style sheets and scripts included on different pages. But when it was PHP, it was abundantly clear what needed to be refactored, and how.
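To get a feel for what the minimum-lines threshold does, here is a rough sketch in Python. It is not how phpcpd works internally; it is a naive illustration that only compares whitespace-stripped lines. The function name `find_duplicate_blocks`, the `min_lines` parameter, and the sample file names are all made up for this example.

```python
from collections import defaultdict


def find_duplicate_blocks(files, min_lines=5):
    """Return groups of (file, line) locations that start an identical
    run of `min_lines` non-blank lines.

    `files` maps a file name to its source text. Lines are compared
    after stripping leading/trailing whitespace; blank lines are skipped.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Keep (line_number, stripped_line) pairs for non-blank lines only.
        lines = [(number, line.strip())
                 for number, line in enumerate(text.splitlines(), start=1)
                 if line.strip()]
        # Slide a window of min_lines lines over the file.
        for start in range(len(lines) - min_lines + 1):
            window = tuple(content
                           for _, content in lines[start:start + min_lines])
            seen[window].append((name, lines[start][0]))
    # A window counts as a copy-paste only if it occurs more than once.
    return [locations for locations in seen.values() if len(locations) > 1]


# Two hypothetical files that share their first five lines.
sample = {
    "a.php": "<?php\n$x = 1;\n$y = 2;\n$z = $x + $y;\necho $z;\nreturn $z;\n",
    "b.php": "<?php\n$x = 1;\n$y = 2;\n$z = $x + $y;\necho $z;\nexit;\n",
}

for threshold in (3, 5, 6):
    hits = find_duplicate_blocks(sample, min_lines=threshold)
    print(threshold, len(hits))
```

Lowering the threshold reports more (and noisier) matches, since every overlapping short window counts; raising it past the length of the shared block makes the match disappear entirely. That is the same trade-off you are tuning when you adjust the line and token minimums on a real detector.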

  • http://www.semanticdesigns.com/Clone Ira Baxter

    AFAICT, Bergmann’s CPD only finds bits of code that consist of source lines that are exactly the same. Running it on Joomla produces something like 1% detected clones.

    See CloneDR at http://www.semanticdesigns.com/Products/Clone/PHPCloneDR.html for a copy-paste detector that works on huge code bases, and detects clones that are copy-paste-*edit*, even if the variables are renamed, the code is reformatted, and/or comments are changed/inserted/deleted. Running this on Joomla finds that 15% of the code is made of clones.