eBay's TSV Utilities

Command line tools for large tabular data files.



Visit the main page

Other open-source tools

There are a number of open-source toolkits with functionality similar to the TSV Utilities. Several are listed below:

A much more comprehensive list of tools can be found here: Structured text tools.

The different toolkits are worth investigating if you work with tabular data files. Several have quite extensive feature sets. Each toolkit has its own strengths, and your workflow and preferences are likely to fit some toolkits better than others.

File format is perhaps the most important dimension. CSV files are very common. However, CSV files cannot be processed reliably by standard Unix tools, because a quoted CSV field may itself contain the delimiter or a newline. For this reason, CSV toolkit functionality typically extends into the space of traditional Unix tools. For example, CSV toolkits often have their own "sort" operation, as Unix sort does not operate reliably on CSV files. This is unfortunate, as creating a program with the speed and quality of a program like GNU sort is a significant undertaking.
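As a rough illustration of the problem, the Python sketch below splits one CSV record two ways: on every comma, which is effectively what delimiter-based tools such as `cut -d,` do, and with a CSV-aware parser. The record itself is a made-up example, not data from any of the toolkits.

```python
import csv
import io

# Hypothetical record: the second field contains a comma, so CSV quotes it.
line = 'widget,"red, large",4.50'

# Splitting on every comma, as a delimiter-based tool would.
naive_fields = line.split(",")

# Parsing with CSV quoting rules applied.
csv_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields), naive_fields)
print(len(csv_fields), csv_fields)
```

The naive split reports four fields and leaves stray quote characters in the data, while the CSV parser recovers the intended three. Any tool that sorts or selects by column position hits the same mismatch.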

Many CSV toolkits also support TSV files, which is certainly appealing. Unfortunately, usage can be complicated and error-prone due to the need to specify delimiters and CSV-style escape rules. Another issue is that not all CSV toolkits support fully turning off CSV escape syntax. This is usually not obvious and can lead to subtle errors when processing TSV files containing quotes.
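The quote problem can be sketched with Python's standard `csv` module, used here only as a stand-in for a generic CSV parser pointed at a TSV file. The input line is a made-up example in which the quote characters are ordinary data.

```python
import csv
import io

# Hypothetical TSV line: the quotes in the second field are literal data.
tsv_line = '1\t"Hello"\tok'

# Tab delimiter, but CSV quote handling left at its default.
with_csv_quoting = next(csv.reader(io.StringIO(tsv_line), delimiter="\t"))

# Tab delimiter with quote handling switched off.
without_quoting = next(
    csv.reader(io.StringIO(tsv_line), delimiter="\t", quoting=csv.QUOTE_NONE)
)

print(with_csv_quoting)
print(without_quoting)
```

With default settings the parser silently strips the quotes from the second field; only the `QUOTE_NONE` variant returns the field exactly as it appears in the file. No error is raised in either case, which is why this kind of data change is easy to miss.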

Tradeoffs between file formats are a topic of their own. The appropriate choice of format often depends on the specifics of the environment and the tasks being performed. See Comparing TSV and CSV formats for a discussion of the two formats. The brendano/tsvutils README (Brendan O'Connor) has a nice discussion of the rationale for using TSV files.