XLS Tools

The "XLS Tools" applications let you process data held in Microsoft Excel files directly from the command line, i.e. in the same way that you would manipulate text data using UNIX commands like sed, grep, and awk.

Download

These tools accompany the Excel on the Command Line talk which I presented at YAPC::Europe 2006.

XLSperl

Perl "one-liners" have a great many uses for quick data processing tasks - for me, Perl commands have even replaced the classic UNIX commands by virtue of Perl's greater flexibility. For example, a simple "grep" function can be written as:

perl -lne '/pattern/ and print' file.txt

which improves on the standard UNIX grep command by allowing the extended features of Perl regular expressions to be used. To learn more about Perl "one-liners", see http://minimalperl.com.

However, as with the traditional UNIX tools this form of processing is only suitable for source data that is in some sort of plain text format. XLSperl allows these "one-liner" Perl commands to be used to process data held in Microsoft Excel (.xls) files.

Usage
Basic usage of XLSperl is as follows:

XLSperl [options] -e 'perl_code' file.xls

Additionally Microsoft Excel files can be piped in to XLSperl

cat file.xls | XLSperl [options] -e 'perl_code'

Features
XLSperl supports many of Perl's standard command-line options to reduce the amount of code needed to make a useful command.

The most important of these are the looping options. Using "-n" will loop through every cell in the Excel file, executing your Perl code (specified with the "-e" option) for each one. So a basic "Excel grep" command can be written as simply as:

XLSperl -nle '/pattern/ and print' file.xls

I've also used the "-l" option here which, as with Perl, enables automatic line-end processing (no need to include "\n" on the end of every print statement!).

While looping on every cell is quite handy, it is often even more useful to loop through a file row by row instead. By using the "-a" (autosplit) option we can do just that. Autosplit runs your Perl code on each row, with the cell values available in the %F and @F variables (these are the same, i.e. $F{A} is equal to $F[0]). This lets you extract columns from Excel files:

XLSperl -nale 'print $F{C}' file.xls

Special variables
As well as %F and @F, XLSperl adds the following special variables to the standard ones available in Perl:

$WS     = Worksheet name
$ROW    = Current row (1 .. x)
$COL    = Current column name (A .. x)
$COLNUM = Current column number (1 .. x)
$CELL   = Current cell (A1 .. ZZx)
@F      = Array of cell values (in autosplit mode)
%F      = Hash of cell values (in autosplit mode)

With these you can do things like make an enhanced "grep" function that gives the cell position for each match that occurs:

XLSperl -nale '/pattern/ and print "$ARGV:$WS:$CELL $_"' *.xls

Creating Excel files
New in version 0.3 is the ability to create Excel files as well as read them. To do this, use the XLSprint command, which takes a list of values and writes them as a row to an Excel document, e.g.

XLSperl -e 'XLSprint qw/Hello World/' >hello.xls

So for a "grep" function that inputs and outputs XLS files, use:

XLSperl -nale '$F{A} =~ /pattern/ and XLSprint @F' in.xls >out.xls

You can even process text files and output the results in Excel format - XLSperl will automatically detect the format of each input file and "do the right thing". The following command will output an Excel document containing details of user accounts from a UNIX system:

XLSperl -F: -nale 'next if /^#/; XLSprint @F' /etc/passwd >pw.xls

Note that for this example I've used the "-F" option, which sets the record separator when dealing with text files in autosplit mode.

Documentation
Full documentation for XLSperl is contained in the README.txt file in the distribution archive, and can also be read on-line here.

To do
XLSperl works well but it isn't quite perfect yet... :-), these are the features I'm working on for future versions:

  • Include a mechanism for in-place editing of Excel documents (like the "-i" option to Perl).
  • Ability to use XLSperl as a command interpreter ("#!/usr/bin/XLSperl" in scripts).

Dependencies
The binary packages of XLSperl have no external dependencies, but if you want to run from source you'll need to install the CPAN modules Spreadsheet::ParseExcel, Variable::Alias, and Spreadsheet::WriteExcel::Simple.

 

XLSgrep

XLSgrep is an implementation of a basic "grep" command for Microsoft Excel (.XLS) files.

Usage:

XLSgrep search_pattern file1.xls ... fileX.xls
cat file.xls | XLSgrep search_pattern

where search_pattern is a Perl regular expression.

To use XLSgrep you'll need to install the Spreadsheet::ParseExcel module from CPAN.


Copyright and License
Copyright (C) 2007 Jon Allen.
XLSperl and XLSgrep are licensed under the terms of the Artistic License version 2.0.