I highly recommend The AWK Programming Language by Aho, Kernighan and Weinberger[1][2]. It has a very similar style to The C Programming Language, also co-written by Kernighan. Even as a general book on programming it is pretty good: it presents the reader with useful exercises, from simple text processing all the way up to more advanced database implementations. I would not recommend it as a first programming book, but it shows how powerful the AWK language is in a very accessible way.
I would love it if there were some sort of course to 'Become a Command Line Ninja' where you work through various tasks using command-line tools, maybe with an element of competition.
Obviously people pick this up on the job, but not everyone who works with Unix terminals has a fully fledged developer/sysadmin role. My 'bag of tricks' evolves at a snail's pace.
UNIX Power Tools by O'Reilly is a text I strongly recommend for this. It's a sort of hardcopy hypertext narrative through a number of command-line and terminal-focused tools.
Parts of it are dated now (it's ... gasp, 22 years since its first edition), but since it's one of those books based on UNIX philosophy it's aged quite well.
It, Linux in a Nutshell, and the sed & awk book are still a pretty good introduction.
Something to add to your 'bag-of-tricks' -- this was once handed down to me by a master by the name of .ike, as it was handed down to him before. Maybe you'd be interested:
"The 3-finger Claw Technique" The sweetest 3 functions, ever.
In the above example, using 'safe' suddenly gives the user the following special powers:
1. if `/some/dir` does not exist, the script will safely exit before that un-tarring does any damage.
2. the actual 'directory does not exist' error will go to stderr (as opposed to just the useless fact that the enclosing script failed).
Now the Bourne shell starts behaving like a modern language (à la Python/Ruby tracebacks, Perl errors, etc.)!
Additionally, if you 'exit 0' at the end, you can run non-safe operations, and always be guaranteed that if the shell exits, it was 'complete'.
An example:
# consider the following lines
safe cd /some/dir
tar xjvpf /my/big/tarball.tbz
Now, safe was removed from the un-tar command, right? So, imagine tarballs created by people using a Mac (with HFS+ and some filesystem-specific sticky bit somewhere in the tree that was tar'd up). Now you un-tar it on some BSD or other *NIX box, and tar complains as it goes and exits non-zero. One way to handle this is to consider the tar 'reliable' and not use safe for it.
Now, if we stay conscious of this simple safe/not-safe distinction, these scripts can be called by other safe scripts, and behave just like any respectable UNIX program!
If it fails, (for bad perms, no tarball to unpack, etc…) cron will actually have a reasonable message to email/log on failure!
Thanks to the noble efforts of the Clan of the White Lotus, these 3 functions originated in the late thirteenth century. The original author, William Baxter, explained that these three functions took 15 years to boil down to what they are, which I can believe after mercilessly abusing them for several years myself…
Below is another example with more bits.
#!/bin/sh
shout() { echo "$0: $*" >&2; }       # print a message to stderr, prefixed with the script name
barf() { shout "$*"; exit 111; }     # report the message and bail out
safe() { "$@" || barf "cannot $*"; } # run the command; on failure, barf with the failing command line

# "$@" expands to one word per argument ("before" glued to the first, "after" to the last)
for i in "before $@ after"
do
echo "arg <$i>"
done

# "$*" joins all the arguments into a single word
for i in "before $* after"
do
echo "arg <$i>"
done

safe echo "this is ok"
safe echo "this is ok, too" || safe echo "so is this"
safe bad_echo "this is bad"          # bad_echo does not exist, so safe barfs and exits 111
exit 0                               # only reached if everything above succeeded
You don't need to copy those boilerplate functions everywhere you go. Bash and others have "set -e" which will make the overall script exit with an error if one of its commands returns an error. Also useful is "set -u" to detect the use of unset variables (though some people write shell scripts which depend on those, so it's not as universally applicable).
Try putting "set -eu" at the top of your shell scripts. It's sort of like compiling with "-Wall -Werror": not the default, but perhaps it should have been.
Knowing bash, sed (single-line), grep quite well, I still regularly run into problems (mostly multiline complex regex substitutions I guess) where I could use a more expressive or more efficient tool. I am thinking that I should perhaps invest in learning AWK. But then, why not go all the way and learn Perl? What are the advantages and disadvantages?
If you've got bash, sed, grep, and add awk, then you are 80-90% of the way to Perl 4 proficiency -- that's basic Perl circa 1990, adequate for short scripts (and more efficient than bash/sed/awk scripts). Perl started life as a superset of awk, with sed, grep, and some weird extras (streams) thrown in. Indeed, the perl distro ships with a2p, a program for automatically converting awk scripts into perl scripts.
Perl 5 adds variable scoping, references, modules, OOP and a bunch of powerful expressive stuff and abstractions -- but if you learn awk on top of what you've already got, you're about 90% of the way to introductory Perl proficiency, and you can add the advanced stuff later.
So by all means learn awk! Just remember that it's not suitable for really big jobs -- if you need to create awk scripts that go beyond a couple of dozen lines of code, that's when you'll probably want to upgrade to Perl.
While I agree that awk and Perl 4 have roughly comparable powers, I would not call Perl "a superset of awk" (at least not Perl 3/4; I'm not familiar with the syntax of earlier versions).
Where awk really shines is situations that call for a set of record-based production rules, and if your problem is in fact in that domain, you could write hundreds or even thousands of rules without it becoming unsuitable for awk. As with any domain-specific language, you can treat the limitations awk imposes as valuable discipline that keeps you in the problem domain.
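As a sketch of that rule-based style (the data and threshold here are made up), each pattern { action } pair is a production rule that awk applies to every input record in order:

```shell
# Two production rules; every record is run through them in order.
printf 'alice 50\nbob 200\ncarol 120\n' | awk '
  $2 > 100 { print $1, "over limit"; next }   # rule 1: matches large amounts
           { print $1, "ok" }                 # rule 2: default action for everything else
'
```

Adding a new rule is just adding another pattern/action line; there is no control flow to restructure.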
While perl can be written in an awk-like style, it's really much more of a general purpose language, so it can accommodate workflows that deviate substantially from the "sequentially process records" style that awk excels at.
The autosplit and loop options to Perl make it a very suitable awk replacement. It even has BEGIN and END pseudo-patterns.
Once you've tried Perl you won't want to flip back and forth to awk, because the inconsistencies in syntax (both internal and between the two) will drive you mad.
What you said, but really once you've started Perl there's no reason to go back to sed or awk.
Any program you ever want to distribute has to have some decent error handling and option/argument parsing, and that precludes them both.
Even when I think in sed or awk, I'll generally write in Perl; or, if I have an old awk script I'll use a2p to convert it.
a2p (awk to perl), s2p (sed to perl) and find2perl (find to perl) are all excellent ways to get you into Perl.
Nowadays my preferred language for almost anything is Python, but I still find myself writing Perl for some throwaways.
I think there is little use in mastering bash, sed and awk. When a task becomes tricky (beyond entry level) in sed, it is often easy in awk. When a task becomes tricky in bash or awk, switch to a more powerful scripting language like perl/python/ruby/?. I love perl, but I feel it is getting outdated. I am trying to improve my programming speed in python, but I am still far from my speed in perl.
Awk, more often than anything else in my toolbox, has been the thing that earns respect (and sometimes awe) when I come into a new environment, and it is a routine time-saver in practice.
sed, grep, and awk are such useful utilities. A lot of developers I work with tend to reach for Python first, but in most cases these tools are all that you need.
Using Python is probably a better choice though. It's so easy to call other programs, emulate a bash script and much much more... all in code that is easily readable and maintainable!
I like Python well enough, but at Localytics we use Ruby where I'd previously used Python and I'm finding it a lot nicer. Backticks alone make my life way, way easier.
And here I've been using Awk as a nicer grep in my CLI for searching for simple strings... This is pretty interesting, I can see it really being useful too.
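For reference, the grep-like use is just a bare pattern (the default action is to print the matching record), and the nice part is how little it takes to go one step beyond grep:

```shell
# bare pattern: behaves like grep
printf 'alpha\nbeta\ngamma\n' | awk '/beta/'
# one step further: match and pick out a field in the same pass
printf 'alpha 1\nbeta 2\n' | awk '/beta/ { print $2 }'
```

The first prints the matching line "beta"; the second prints just "2", which with grep would need a second pipe through cut or sed.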
[1] http://cm.bell-labs.com/cm/cs/awkbook/
[2] https://www.goodreads.com/book/show/703101.The_awk_Programmi...