useful Perl5 oneliners

Writing shell scripts, I often find myself in need of doing some string manipulation. Of course there are sed, awk, cut, grep, etc., and they all have their moments. But those are a bunch of separate tools that often have to be chained together for the desired effect, because each has its own capabilities and limitations.

In situations where chaining and piping the standard tools gets ugly, I just use Perl. Perl is a full-blown scripting language with great string manipulation capabilities, and its version 5 comes pre-installed on pretty much any GNU/Linux system. This keeps the oneliners portable and makes them easy to extend with anything Perl is capable of. Plus, Perl code can be written very concisely, which comes in handy for oneliners.

Keep in mind that there is also Perl6 (since renamed Raku), originally conceived as the big upgrade to the Perl programming language, which differs from Perl5 in significant ways. Perl6 does not come pre-installed, which is why we focus on Perl5 here.

Perl5 basics

Let's just have a quick recap of Perl's capabilities and concepts important for our oneliners.

Perl has the clever concept of a default variable, $_, which is used whenever no variable is specified. So if you just call print, for example, it actually prints $_ for you.

The same is true for regexes: a match regex like m/\d+/ searches the default variable $_ for digits and returns a true value if it found any. A substitution regex like s/,/\./g replaces every comma in the default variable $_ with a dot and returns a true value (the number of substitutions made) if it actually replaced anything.

Keep in mind that regex delimiters don't need to be forward slashes; they can be pretty much any punctuation character. This helps because forward slashes can be annoying when dealing with paths. To get the directory name of a full file path, for example, one could use s#/[^/]+$##, which removes the last element of the path (replaces it with nothing).

For full Perl5 documentation, visit perldoc.perl.org.

Perl command-line switches

To run Perl code from the command line, hand it to the perl binary via the -e switch:

perl -e 'print "This is Perl code!\n"'

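To read from STDIN, one possibility is an explicit while(<>){} loop that reads line by line (strictly speaking, <> reads from the files named on the command line, or from STDIN if there are none):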

perl -e 'while(<>){print}'

Inside the loop, Perl sets the default variable to the current line. This would be a simple cat imitation, reading and directly printing again.

Since reading from STDIN and doing something to every line is a very common task, there is the -n switch, which spares you writing the loop. The following would be exactly equivalent to the above:

perl -ne 'print'

You could even condense this further with the -p switch, which makes Perl print the default variable at the end of every loop cycle.

perl -pe ''

Pretty useless oneliner anyway :-)

The oneliners

Now let's do something sensible!

joining lines

perl -pe 's#\n$# #g'

This joins STDIN lines together with a space.

splitting lines

perl -pe 's#\s+#\n#g'

This splits every STDIN line at whitespace.

skipping empty lines

perl -ne 'print unless m#^\s*$#'

This skips lines that are empty or contain nothing but whitespace.

extracting information

Sometimes you have a bunch of lines with some information buried among them that you would like to extract. This information may not be present in every line. To get the wanted bits, you can make use of Perl's regex capabilities. Imagine, for example, that you would like to get the vendor:device IDs of the Realtek devices on your machine, which can be found in the lsusb output after the word ID:

lsusb | grep 'Realtek'
# Bus 001 Device 005: ID 0bda:0129 Realtek Semiconductor Corp. RTS5129 Card Reader Controller

In keeping with Perl's TIMTOWTDI principle ("There's more than one way to do it"), there are various ways to get these IDs. The easiest is to perform a match that describes only the minimum necessary parts, here m/ID\s+(\w{4}:\w{4})\s+Realtek/. We tell Perl to look for the string ID, followed by some whitespace, then the ID pair, which we capture with parentheses, then again some whitespace and finally the word Realtek. After a successful match, Perl fills the variables $1, $2, $3, and so on with the contents of the capture groups, in order. Thus, $1 is set to our ID pair 0bda:0129. And since a match expression m// returns a true value if it matched, we can use it as a condition to print it:

lsusb | perl -ne 'print "$1\n" if m/ID\s+(\w{4}:\w{4})\s+Realtek/'
# 0bda:0129

Or, you could describe the whole string with a regex and replace it with solely the wanted bits:

lsusb | perl -ne 'print if s/^.*\s+(\w{4}:\w{4})\s+Realtek.*$/$1/'
# 0bda:0129

This has the advantage that you can do further manipulation in one go, for example swapping the two IDs.


[more oneliners to follow...]

:-)

/nobodyinperson