Perl "one-liners" have a great many uses for quick data processing tasks - for me, Perl commands have even replaced the classic UNIX commands by virtue of Perl's greater flexibility. For example, a simple "grep" function can be written as:
perl -lne '/pattern/ and print' file.txt
which improves on the standard UNIX grep command by allowing the extended features of Perl regular expressions to be used. To learn more about Perl "one-liners", see http://minimalperl.com.
However, as with the traditional UNIX tools this form of processing is only suitable for source data that is in some sort of plain text format. XLSperl allows these "one-liner" Perl commands to be used to process data held in Microsoft Excel (.xls) files.
XLSperl [options] -e 'perl_code' file.xls cat file.xls | XLSperl [options] -e 'perl_code'
On Microsoft Windows you need to use double quotes (") instead of single quotes (') around the Perl command, e.g.
XLSperl -e "perl_code" file.xls
XLSperl supports many of Perl's standard command-line options to reduce the amount of code needed to make a useful command.
The most important of these are the looping options. Using "-n" will loop through every cell in the Excel file, executing your Perl code (specified with the "-e" option) for each one. So a basic "Excel grep" command can be written as simply as:
XLSperl -nle '/pattern/ and print' file.xls
I've also used the "-l" option here which, as with Perl, enables automatic line-end processing (no need to include "\n" on the end of every print statement).
While looping on every cell is quite handy, it is often even more useful to loop through a file row by row instead. By using the "-a" (autosplit) option we can do just that. Autosplit runs your Perl code on each row, with the cell values available in the %F and @F variables (these are the same, i.e. $F{A} is equal to $F[0]). This lets you extract columns from Excel files:
XLSperl -nale 'print $F{C}' file.xls
As well as %F and @F, XLSperl adds the following special variables to the standard ones available in Perl:
$WS = Worksheet name $ROW = Current row (1 .. x) $COL = Current column name (A .. x) $COLNUM = Current column number (1 .. x) $CELL = Current cell (A1 .. ZZx) @F = Array of cell values (in autosplit mode) %F = Hash of cell values (in autosplit mode)
With these you can do things like make an enhanced "grep" function that gives the cell position for each match that occurs:
XLSperl -nale '/pattern/ and print "$ARGV:$WS:$CELL $_"' *.xls
XLSperl can create Excel files as well as read them. To do this, use the XLSprint command, which takes a list of values and writes them as a row to an Excel document, e.g.
XLSperl -e 'XLSprint qw/Hello World/' >hello.xls
So for a "grep" function that inputs and outputs XLS files, use:
XLSperl -nale '$F{A} =~ /pattern/ and XLSprint @F' in.xls >out.xls
You can even process text files and output the results in Excel format - XLSperl will automatically detect the format of each input file and "do the right thing". The following command will output an Excel document containing details of user accounts from a UNIX system:
XLSperl -F: -nale 'next if /^#/; XLSprint @F' /etc/passwd >pw.xls
Note that for this example I've used the "-F" option, which sets the record separator when dealing with text files in autosplit mode.
Full documentation for XLSperl is contained in the README.txt file in the distribution archive.
Excel on the command line, presentation about the development and use of XLSperl.
Jon Allen - Perl & Catalyst developer, web designer, and technical manager.
Contact jj@jonallen.info.
Follow me on Twitter at twitter.com/JJ_Perl.