
Faster file deletion with rsync

343 words · 2 mins
Dev Programming

Introduction

As a bioinformatician, I often work with huge datasets – think terabytes of intermediate files, reads, or assemblies that I no longer need after a workflow run. Back in graduate school, working on our High Performance Computing (HPC) cluster running Linux, I learned that deleting large directories with rm -rf could take far longer than expected. It was slow enough that I’d sometimes submit the deletion as a separate compute job.

After some searching, I came across a tip suggesting an alternative: use rsync. It may sound odd at first, since rsync is usually used for copying or syncing files rather than deleting them, but it offers a clever shortcut that can wipe out files much faster, in some cases by orders of magnitude.

It should go without saying: use caution when deleting data. Always double-check paths before running potentially destructive commands. And never use sudo here unless you really know what’s happening.
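To make the “double-check paths” advice concrete, here is a minimal sketch of a few sanity checks to run before emptying a directory. The check_delete_target function is hypothetical (it is not part of rsync or any standard tool), and the set of refused paths (the filesystem root and your home directory) is an assumption you should adapt to your own system.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of rsync or any standard tool: basic sanity
# checks on a directory before pointing a destructive command at it.
set -eu

check_delete_target() {
    local target=${1:-} resolved home_resolved=""
    # Reject an empty argument, e.g. from an unset or misspelled variable.
    if [ -z "$target" ]; then
        echo "refusing: empty path" >&2
        return 1
    fi
    # Resolve to an absolute physical path so ".", ".." and symlinks are visible.
    if ! resolved=$(cd -- "$target" 2>/dev/null && pwd -P); then
        echo "refusing: $target is not an accessible directory" >&2
        return 1
    fi
    # Refuse the filesystem root and your home directory outright.
    if [ -n "${HOME:-}" ] && [ -d "$HOME" ]; then
        home_resolved=$(cd -- "$HOME" && pwd -P)
    fi
    if [ "$resolved" = "/" ] || [ "$resolved" = "$home_resolved" ]; then
        echo "refusing: $resolved is too dangerous to empty" >&2
        return 1
    fi
    # Print the resolved path so you can eyeball it before deleting anything.
    echo "$resolved"
}

# Example with the post's example path; prints an error if it doesn't exist.
if resolved=$(check_delete_target /data/big-ol-directory); then
    echo "would empty: $resolved"
fi
```

The function only prints the resolved path; it deliberately deletes nothing, so you still decide whether to run the real command.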

The rsync method

What rsync is meant for

rsync was designed for efficiently syncing files and directories, especially over networks. It compares a source with a destination and transfers only what differs. That same compare-and-sync mechanism also makes it good at deleting: sync your target directory with an empty one using --delete, and rsync removes everything the empty directory doesn’t contain.

Using rsync to delete data

Here’s how to do it safely and effectively.

Create an empty directory (this will act as the “template” for deletion). It must really be empty, because rsync -a will copy anything inside it into the target:

mkdir /tmp/empty
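A fixed name like /tmp/empty may already exist on a shared machine, possibly with someone else’s files in it. One way to guarantee a fresh, empty source is mktemp -d. The sketch below is an alternative to the command above, not the original tip; the variable name empty is just for illustration.

```shell
#!/usr/bin/env bash
# Sketch: create a fresh, guaranteed-empty source directory with mktemp -d
# instead of a fixed name that might already exist and contain files
# (rsync -a would then copy those files INTO the target).
set -eu

empty=$(mktemp -d)

# Refuse to continue unless the directory really is empty.
if [ -n "$(ls -A "$empty")" ]; then
    echo "refusing: $empty is not empty" >&2
    exit 1
fi
echo "using empty source directory: $empty"
```

You would then use "$empty/" in place of /tmp/empty/ in the rsync commands, and remove it with rmdir "$empty" when you are done.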

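Now, tell rsync to make your target directory (/data/big-ol-directory) match the empty one. Keep the trailing slash on the source path: /tmp/empty/ means “the contents of /tmp/empty”, whereas /tmp/empty without the slash would copy the empty directory itself into the target and leave the target’s existing files alone: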

rsync -a --delete /tmp/empty/ /data/big-ol-directory/

rsync will go through /data/big-ol-directory and remove everything that doesn’t exist in /tmp/empty, deleting all files and subdirectories while leaving the directory itself in place.

If you want to preview what would be deleted before actually removing anything, do a dry run. Include -v (verbose); without it, a dry run prints nothing about the files it would delete:

rsync -a -v --delete --dry-run /tmp/empty/ /data/big-ol-directory/

That’s it! This approach is often faster than rm -rf, especially when deleting many small files or on file systems with heavy metadata overhead.

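One nice bonus is that rsync can tell you what it is doing: add -v and it prints a “deleting” line for every entry it removes, which gives a rough sense of progress on long-running deletions. It also copes fine with unusual filenames, since the names never pass through the shell.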
