If you are a Java (or Groovy) developer, your home directory probably contains at least a few GB of downloaded JAR files.

Every dependency management tool (and IDE) downloads its own copy of the JAR files it needs into a different directory. This leads to a situation where you might have several copies of the same file wasting your precious disk space!

In such a situation, one can take advantage of one of the nicest features of the BTRFS filesystem: deduplication.

Deduplication is not yet supported by the official BTRFS tools, although the filesystem itself, being a COW (copy-on-write) filesystem, provides all the needed capabilities.

To leverage them, there’s a handy tool called duperemove.

Basically, what this amazing tool does is scan one or more directories (or subvolumes), find extents with identical content, and deduplicate them. Deduplicating an extent means that every reference to it points to the same physical extent on disk, so the content is physically written only once.
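To get a rough feeling for how much there is to gain before touching anything, one can look for JAR files that are byte-for-byte identical. The sketch below is only an approximation of the idea: it compares whole files by SHA-256 checksum, while duperemove works on extents, so the real tool may find more (or less). The function name duplicate_jar_bytes is made up for this example, and it assumes GNU sha256sum and file names without newlines or backslashes.

```shell
# Hypothetical helper (not part of duperemove): print how many bytes are
# taken by *.jar files that are exact copies of another *.jar file.
# Assumes GNU sha256sum and file names without newlines or backslashes.
duplicate_jar_bytes() {
    find "$@" -type f -name '*.jar' -exec sha256sum {} + |
        sort |
        awk '
            # Lines are sorted by hash, so copies are adjacent. The hash is
            # 64 hex characters plus a 2-character separator, so the file
            # name starts at column 67.
            $1 == prev { print substr($0, 67) }
            { prev = $1 }
        ' |
        while IFS= read -r file; do
            wc -c < "$file"    # size in bytes of each redundant copy
        done |
        awk '{ total += $1 } END { print total + 0 }'
}

# Example (the paths are only an illustration):
# duplicate_jar_bytes "$HOME/.m2" "$HOME/.gradle" "$HOME/.ivy2"
```

The first file of each group is treated as the "original" and only the extra copies are counted, which is roughly what deduplication could give back in the best case.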

To give you some numbers, I’ll show how I used this on my commuting laptop (which has a small 300GB disk, so every GB is precious).

In my home directory I have the following directories (related to the topic):

.eclipse
.gradle
.grails
.groovy
.IdeaIC13
.IdeaIC14
.IntelliJIdea13
.IntelliJIdea14
.ivy2
.m2

for a total usage of 4.8GB.
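A total like this can be obtained with du. Here is a minimal sketch, assuming a POSIX shell; the function name total_usage_kib is my own invention, and directories that do not exist are simply skipped. Keep in mind that, as far as I know, plain du is not aware of shared extents, so it is fine for the "before" figure but should not be expected to shrink after deduplication.

```shell
# Hypothetical helper: sum the disk usage, in KiB, of the given directories.
# Directories that do not exist are skipped.
total_usage_kib() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            du -sk "$dir"    # prints "<KiB><TAB><path>"
        fi
    done | awk '{ total += $1 } END { print total + 0 }'
}

# Example (the paths are only an illustration):
# total_usage_kib "$HOME/.m2" "$HOME/.gradle" "$HOME/.ivy2"
```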

I used duperemove to deduplicate them with this command:

./duperemove -rdh /home/federico/{\
.eclipse,\
.gradle,\
.grails,\
.groovy,\
.IdeaIC13,\
.IdeaIC14,\
.IntelliJIdea13,\
.IntelliJIdea14,\
.ivy2,\
.m2\
}

after the execution the result was:

Comparison of extent info shows a net change in shared extents of: 704.0M

which is roughly 14%, but, as they say, YMMV.
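For the record, the percentage comes from dividing the shared extents by the initial usage. A small sketch of the arithmetic, assuming both figures use binary units (1G = 1024M); percent_saved is just a name picked for this example.

```shell
# Hypothetical helper: percentage saved, given the saved amount in MiB
# and the initial usage in GiB (assumes 1 GiB = 1024 MiB).
percent_saved() {
    LC_ALL=C awk -v saved="$1" -v total="$2" \
        'BEGIN { printf "%.1f%%\n", saved / (total * 1024) * 100 }'
}

percent_saved 704 4.8   # prints 14.3%
```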

One might also suggest that I clean those directories from time to time, or just delete old releases, but when I’m commuting I work offline, and being able to patch and build a project I haven’t worked on recently is quite handy. Also, with such a small disk, even 700M is useful!

UPDATE: including /usr/share/java (where my OS stores all packaged Java libraries) and /usr/local (where I usually install binary distributions of software) further improves the results.
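With the list of directories growing, and some of them possibly missing on a given machine, a small wrapper can help. This is only a sketch: dedupe_existing and the DUPEREMOVE variable are names I made up, not part of duperemove. It keeps only the directories that exist and passes them on with whatever flags you give it. As far as I know, leaving out -d makes duperemove only report what it finds, which is a nice way to do a dry run first.

```shell
# Hypothetical wrapper: run duperemove only on the directories that exist.
# First argument: the flags to pass (e.g. -rh or -rdh); the rest: directories.
# DUPEREMOVE is an assumed override for the binary path (e.g. ./duperemove).
dedupe_existing() {
    flags=$1
    shift
    # Rebuild the argument list, keeping only the existing directories.
    for dir in "$@"; do
        shift
        if [ -d "$dir" ]; then
            set -- "$@" "$dir"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "dedupe_existing: no existing directories given" >&2
        return 1
    fi
    "${DUPEREMOVE:-duperemove}" "$flags" "$@"
}

# Example (the paths are only an illustration):
# dedupe_existing -rh  "$HOME/.m2" "$HOME/.gradle" /usr/share/java   # report only
# dedupe_existing -rdh "$HOME/.m2" "$HOME/.gradle" /usr/share/java   # deduplicate
```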

