Understanding Bash (2018)

fooblat · on Jan 16, 2020

I love bash and I know I'm not alone.

Sure, it's syntax is based on ideas from the 1960s[0] and it has a weird core library of functions[1]. Certainly writing code that is a blend of logic and managing other programs takes some getting used to. However, it is very well documented[2] and I've had great success with it over the past 20 years.

If one takes the time to get to know it, it is actually easy and fun to write scripts that are robust and easy to maintain.

I'd venture to say that bash is the putty in the little gaps of the internet. It is the fitting that glues programs to the various systems on which they run.

0. https://en.wikipedia.org/wiki/ALGOL

1. https://en.wikipedia.org/wiki/POSIX

2. https://www.gnu.org/software/bash/manual/

edit: formatting

bordercases · on Jan 16, 2020

It's also extremely fast.

kazinator · on Jan 16, 2020

Sure, until you need to do something advanced like have a function return a string and use it in the caller.

Test case:

  #!/bin/bash

  fun()
  {
    echo "result"
  }

  RES=$(fun)
  echo $RES

Snippet from system call trace (via "strace bash ./test.sh"):

  pipe([3, 4])                            = 0
  rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
  rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
  rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
  lseek(255, -10, SEEK_CUR)               = 51
  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fccbed72a10) = 3870
  rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
  rt_sigaction(SIGCHLD, {0x446240, [], SA_RESTORER|SA_RESTART, 0x7fccbe3b0ff0}, {0x446240, [], SA_RESTORER|SA_RESTART, 0x7fccbe3b0ff0}, 8) = 0
  close(4)                                = 0
  read(3, "result\n", 128)                = 7
  read(3, "", 128)                        = 0

In other words, this requires a pipe to be created a child process to be spawned, and its output to be read from that pipe.

In any reasonable scripting language, and even some unreasonable ones, this is all done in one address space. How about, oh, Awk:

  function fun() { return "result" }

One of the most common mantras for effective shell programming is "avoid writing loops; orchestrate external utilities".

Spivak · on Jan 16, 2020

The way one typically accomplishes this in bash is the following since the $() construct spawns a subshell.

    #!/bin/bash
    
    fun() {
        res="result"
    }

    fun && echo "$res"

The same strace snippet.

    openat(AT_FDCWD, "./test.sh", O_RDONLY) = 3
    stat("./test.sh", {st_mode=S_IFREG|0755, st_size=53, ...}) = 0
    ioctl(3, TCGETS, 0x7ffd65c6d890)        = -1 ENOTTY (Inappropriate ioctl for device)
    lseek(3, 0, SEEK_CUR)                   = 0
    read(3, "#!/bin/bash\n\nfun() {\n  res=\"resu"..., 80) = 53
    lseek(3, 0, SEEK_SET)                   = 0
    prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=512*1024}) = 0
    fcntl(255, F_GETFD)                     = -1 EBADF (Bad file descriptor)
    dup2(3, 255)                            = 255
    close(3)                                = 0
    fcntl(255, F_SETFD, FD_CLOEXEC)         = 0
    fcntl(255, F_GETFL)                     = 0x8000 (flags O_RDONLY|O_LARGEFILE)
    fstat(255, {st_mode=S_IFREG|0755, st_size=53, ...}) = 0
    lseek(255, 0, SEEK_CUR)                 = 0
    read(255, "#!/bin/bash\n\nfun() {\n  res=\"resu"..., 53) = 53
    fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x2), ...}) = 0
    write(1, "result\n", 7result
    )                 = 7
    read(255, "", 53)                       = 0
    rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    exit_group(0)                           = ?

If you want use the old magics to return values like the built-ins then it's a little uglier but a lot cleaner.

    #!/bin/bash

    fun() {
      eval $1=\""result"\"
    }

    fun res && echo $res

kazinator · on Jan 16, 2020

Assignment to hard-coded global variable is not a return mechanism, and generally a nonstarter. It's not a viable approach for generally returning strings out shell functions everywhere in a codebase as a matter of habit.

Use of eval: still slow, because the shell eval re-processes input from the character level and up. eval should be generally avoided as much as possible in shell programming. Careless use of eval can introduce security holes (piece of untrusted datum gets evaled as an expression). You really need to have your black belt in "shell escaping karate".

Producing output and capturing with command substitution is the primary idiom for getting text out of a shell function. It has no visible side effect. The rebinding of standard output is scoped to the process substitution (and is confined to the child process, in fact), and the creation of the temporary process and pipe, expensive as they might be, are invisible to the program semantics.

> The way one typically accomplishes this in bash

In summary, what you propose here is not only vanishingly atypical, but also bad coding practice.

Spivak · on Jan 16, 2020

> In summary, what you propose here is not only vanishingly atypical, but also bad coding practice.

What I'm describing has been SOP in bash for twenty years. You can program in a more modern style and replace some uses of eval with ${!indirect_references}, declare, and associative arrays but eval is still in wide use today. Search through /etc for eval with grep and you'll return plenty of results in the wild.

For example the eval in my previous post could be rewritten as the following

    declare -g "$1"="result"

but this form, although safer, is the less common usage.

Assigning to global variables or passing in a variable name to get the return value is just how bash works. See $MAPFILE $OPTIND $OPTARG for examples of the former and read as an example of the latter.

LukeShu · on Jan 16, 2020

That [declare -g] is bad too, because it means you can't use it to set a local variable (see my sibling comment about dynamic scoping), which is surprising and confusing to the caller.

While I agree with the parent that the whole thing is disgusting, I also recognize that it's a valid optimization technique, and it's sometimes necessary. So, if you do go down that route, I'd encourage you to do it as

    printf -v "$1" '%s' "$result"

LukeShu · on Jan 16, 2020

Without disagreeing with the essence of what you said:

> Assignment to hard-coded global variable ...

Because Bash is dynamically-scoped, it's not necessarily global; it could be scoped to the calling function. For example:

    setit() {
     myvar=bar
    }
    
    myfunc() {
     local myvar
     setit
     echo myfunc $myvar # will print: myfunc bar
    }
    
    myvar=foo
    myfunc             # will print: myfunc bar
    echo global $myvar # will print: global foo

kazinator · on Jan 20, 2020

Most shell programmers do not know what "dynamic scope" is, lacking a Lisp background. Those familiar with lexical languages like C or Java will naively expect "local" to be lexical.

Unlike in dynamically scoped Lisps, in shell programming there isn't any widely recognized and applied naming convention to avoid accidental name capture due to dynamic scope (binding a variable, not knowing that a function which is then used accesses that as a global).

POSIX has no "local", and so makes no recommendation in this regard, nor sets any naming precedent for users programs to follow. It is more concerned with the separation between system variables (named with ALL_CAPS) and user variables (not so named).

Put these two together and you have a recipe for bugs.

3xblah · on Jan 16, 2020

The article mentions loops with an incrementing variable, e.g., i=$((i+1)). It is also possible to do this without loops. Is it faster. Left to the reader to decide. I used this technique sometimes so I could check on the progress of a script while it is running and also continue a script later where I left off. It also allowed me to stop the script by removing a file. At the time, the shell I used had not yet implemented LINENO, which is also very useful when scripts are terminated before finishing.

For example, if I wanted to "loop" 15 times^1

To begin, first create a file that stores the count.^2 It also acts as a way to stop the "loop" from advancing if it is removed.

     echo 0 > x

Then create a file that acts as an "on/off switch" to stop the script, e.g., if it is not runnning in the foreground.

     > x-on

Then run the script. The "prologue" and "epilogue" use only shell built-ins. No external programs or Bash-isms are required.^3

     #! /bin/sh

     CRAWL_DELAY=30;
     test -f x||exit $LINENO;
     test -f x-on||exit $LINENO;
     read x < x;
     test $x -le 14||exit $LINENO;

     # do stuff;
     echo https://example.com/$x;

     sleep CRAWL_DELAY;
     echo $((x+1)) >x;
     test -f x-on||exit $LINENO;
     test -f $0||exit $LINENO;
     $0

To see progress,

     cat x

To stop the script before it finishes,

     rm x-on

To restart the script and continue where left off,

     > x-on

and run the script.

1. Sometimes I did not have access to a program like seq or jot so something like this was not possible

     for x in $(seq 15);do
     echo https://example.com/$x;
     done

2. I always work in a tmpfs-mounted directory so these "files" are just memory, they are not saved on "disk".

3. Bash has too many features for this author to keep track of, like, e.g., base conversion

     echo $((16#a))

aidenn0 · on Jan 16, 2020

I find it to be quite slow; I wrote a command-line arguments parser library using shell metaprogramming, and I had to precompile the output for bash, while in dash it was fast enough to run as-is.

LukeShu · on Jan 16, 2020

I suspect that the slowness was caused by Bash having a couple of pathological cases where it converts back and forth between the LANG/LC encoding and internal wide-characters repeatedly, which can cause it to be stupidly slow. I suspect that setting LC_ALL=C for that portion of the code would speed Bash up to Dash speeds.

aidenn0 · on Jan 17, 2020

Factor of 3 speedup, thanks!

0xbadcafebee · on Jan 16, 2020

> [ ] is a command—basically another way to call the built-in test command.

Literally, '/usr/bin/[', which takes as an argument a final ']', but actually a different binary than '/usr/bin/test'. And is different from the bash builtin. Crazy shit.

  vagrant@vagrant:~$ [ --version
  -bash: [: missing `]'
  vagrant@vagrant:~$ /usr/bin/[ --version
  [ (GNU coreutils) 8.28
  Copyright (C) 2017 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  
  Written by Kevin Braunsdorf and Matthew Bradburn.

kohtatsu · on Jan 17, 2020

I think it's cute.

JoelMcCracken · on Jan 16, 2020

This seems like a very basic introduction to bash, which is fine, but I was hoping for something called "Understanding Bash" to help with some of the understanding about why it behaves so weirdly at times.

For me, the biggest problem when writing bash is when I need to do something just a little more complex than what is there, but there is no good way to do it. An example that usually trips me up is this: https://mywiki.wooledge.org/BashFAQ/050.

FWIW, the above link is probably the most useful thing I have found that actually helps me understand bash.

nixpulvis · on Jan 16, 2020

I've been writing a (hopefully one day) POSIX compatible shell using a parser generator library. Having to try and fit sh into a parser framework makes you make some choices, and it's a great way to find these weird things. For example, I was surprised at first to learn that `{ ls }` isn't a valid program. Or:

    FOO=1 echo $FOO
    # vs
    FOO=1; echo $FOO
    # vs
    FOO=1 printenv FOO

fanf2 · on Jan 17, 2020

If you think you might want to use arrays in bash, or any kind of structured data, stop. If your script is getting much more than 100 lines, stop.

Rewrite it in a better programming language before it is too late!

gbacon · on Jan 16, 2020

“Too much for bash, Goose; I’m switching to Perl!”

empath75 · on Jan 16, 2020

> Note that the exit value of true is 0, and the exit value of false is 1. This is somewhat counterintuitive, and it's the exact opposite of most programming languages.

"Happy families are all alike; every unhappy family is unhappy in its own way."

https://en.wikipedia.org/wiki/Anna_Karenina_principle

reportgunner · on Jan 16, 2020

This is there to allow more than just true and false, like warnings and custom exit codes.

https://stackoverflow.com/questions/4419952/difference-betwe...

I have also encountered this in C#

https://docs.microsoft.com/en-us/dotnet/api/system.environme...

AdmiralAsshat · on Jan 16, 2020

It took me awhile to wrap my head around when I initially encountered it, but the idea made sense after I thought about it. If the process exited successfully, we don't care--it did what it was supposed to do. But if it failed, we'd probably like as much info about why it failed as possible; and so having the exit status be more like an error code clearly has some value.

elmigranto · on Jan 16, 2020

Good point. I think the fact that it’s named “exit status” instead of “error code” is primary confusion factor.

empath75 · on Jan 16, 2020

I was being somewhat oblique about it but that was what my quote was saying.

michaelcampbell · on Jan 16, 2020

Some older versions of BASIC did this as well, except false was -1.

I think internally it was represented as all 0 bits for true, all 1 bits for false.

To be pendantic, at least in terms of bash if's, ['s, and test's, I think 0 is true and non-0 is non-true though, no?

flingo · on Jan 16, 2020

I certainly prefer this to the reverse, and having truth values and return values mean something different.

banachtarski · on Jan 16, 2020

The article is wrong though. For error codes, 0 being a success value is the CONVENTION. A non-zero code is an error. This isn't about "programming languages."

SAI_Peregrinus · on Jan 16, 2020

The problem is that for bash `true` is a function (or builtin) that returns 0, while for C (and thus most languages) any non-zero value (typically 1 for the builtins) is true. In c99 and later with stdbool.h, `true` is almost always defined as `#define true 1`.

aidenn0 · on Jan 16, 2020

the existence of the && and || operators (not to mention the true and false utilities) would indicate it's more than just a convention.

gbacon · on Jan 16, 2020

Wrong is a bit harsh for a convention or semantics, more like imprecisely worded.

m4r35n357 · on Jan 16, 2020

It is far simpler to just learn the POSIX shell first - much shorter man page. The the Bash stuff is then just a few additions if you ever need them (other shells are available).

aidenn0 · on Jan 16, 2020

Indeed, I an older version of https://pubs.opengroup.org/onlinepubs/9699919799.2018edition... in two weekends. (I never got around to implementing job control).

While doing so, I found a part of the specification that (as far as I can tell) no shell implements (sourcing my /etc/profile failed with my shell due to the difference): the specification blesses (at the time)[1] 20 utilities that should be run without regards to the PATH, but any other built-ins should not be run if the command does not exist in the path.

So, for example:

    PATH=""
    echo "Hello"

Should fail on every POSIX compliant shell. I haven't found a shell that implements echo as a builtin in which the above fails though.

1: Now it has a larger list for which it is unspecified what happens. See "Command search and execution" for the list. "echo" is not on the list though, so my example is still well-defined in POSIX yet breaks on all shells I tried.

JdeBP · on Jan 16, 2020

You want the Watanabe shell in its POSIX-conformant mode.

* https://unix.stackexchange.com/a/496291/5132

* https://unix.stackexchange.com/a/496377/5132

Also observe the /opt/ast/bin mechanism in the '93 Korn shell.

aidenn0 · on Jan 17, 2020

Wow, someone actually did it! When I reported this non-conformance to the dash mailing-list, I first got pushback that I was wrong about the spec, and then once I convinced them I was right the response was something along the lines of "that's stupid why would we do that"

frou_dh · on Jan 16, 2020

Notably, such a concept as a function having a local variable is beyond POSIX shell.

JdeBP · on Jan 16, 2020

Indeed, and local is one of the few non-POSIX things that Debian people famously could not live without, and so is required to exist in all Debian shells that can be used as /bin/sh such as the Debian Almquist shell.

* https://debian.org/doc/debian-policy/ch-files.html#s-scripts

m4r35n357 · on Jan 16, 2020

"Variables may be declared to be local to a function by using a 'local' command."

frou_dh · on Jan 16, 2020

Here's the spec. Where is it? https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V...

m4r35n357 · on Jan 16, 2020

It is in both ash and dash, the shells I mentioned. Your attempt at point scoring is irrelevant but expected.

ash & dash are about 100kB executables (POSIX plus a tiny number of non-interactive improvments like 'local'), bash is 1100kB. Every shell and subshell.

frou_dh · on Jan 16, 2020

My point is that POSIX shell purity is masochism. Understanding how it's the core and other shells build on top of it is certainly valuable, but it's not inherently particularly good to use.

m4r35n357 · on Jan 16, 2020

Maybe we are at cross purposes. My default shell is bash, but my default "sh" is dash. This is as it should be (and is default in Debian & Ubuntu I think).

If you are talking about interactive use, of course bash is the one to use, but for writing shell scripts, it is a 10X interactivity overhead" over dash.

frou_dh · on Jan 16, 2020

IMO it's a big win for scripts to at least be able to use Arrays in that otherwise "stringly-typed" world.

aidenn0 · on Jan 16, 2020

As a POSIX shell masochist myself, Arrays are the single biggest missing feature. POSIX shells actually do have one array: $@. Having more would make my life so much easier.

The amount of gyrations I go through to account for the lack of arrays is easily 1000x worse than accounting for not having local variables.

commandlinefan · on Jan 16, 2020

What man page is that? On OS/X and redhat, man sh brings up the Bourne Shell man page.

JdeBP · on Jan 16, 2020

There really isn't a POSIX sh manual page; and you are conflating Bourne with Bourne Again on RedHat.

The closest that anyone comes is, I believe, OpenBSD. The ksh(1) manual page is the PD Korn shell manual. And the separate sh(1) manual page "describes only the features [of ksh] relevant to a POSIX shell".

On others, sh(1) is usually the manual page for one of the named shells.

Debian and Ubuntu's sh(1), for example, is the dash(1) for the Debian Almquist shell, which is famously not a POSIX sh, because it explicitly includes 3 extensions that Debian people could not bear to part with in their big project of over a decade ago to remove bashisms from package maintainer scripts and suchlike.

m4r35n357 · on Jan 16, 2020

If the default sh is set to bash (yuk!) you will need to use ash or dash to get the basic POSIX shell and man page. On my Xubuntu 18.04 system, "man sh" brings up the dash man page.

Thanks for nothing RedHat & Apple!

nixpulvis · on Jan 16, 2020

On my distribution (Arch Linux) the `sh` tool comes from the `bash` package, and is partially shared code I believe, though I'm unsure exactly how they are related at this point.

LukeShu · on Jan 17, 2020

On Arch, `/bin/sh` is a symlink to `bash`. When Bash is invoked with the name `sh`, it behaves closer to the historical Bourne shell. The differences between normal Bash behavior and the behavior when invoked as `sh`:

- It behaves as if `--posix` was given.

- It behaves as if `--norc` was given.

- If it looks for a user profile file (i.e. it is a login shell), then it only looks at ~/.profile (instead of the usual behavior of giving precedence to ~/.bash_profile then ~/.bash_login then ~/.profile)

- At startup, if $ENV is set, it `source "$ENV"`.

0xbadcafebee · on Jan 17, 2020

I've been using bash for nearly two decades, and I just found out that Bash is apparently the only shell whose built-in echo uses the -e option. I thought others used this too, but it seems every other shell's echo just implicitly interpolates escaped characters. POSIX actually says anything following '\' is undefined behavior. Apparently printf is the only portable way to interpolate escaped characters in output.

Here's a bunch of very useful tips like that: https://www.etalabs.net/sh_tricks.html

kohtatsu · on Jan 17, 2020

That resource is lovely, thank you.

RMPR · on Jan 21, 2020

> Second, in principle, there's nothing to enforce that a UNIX shell must have echo as a built-in, and therefore, it's important to have the external utility /bin/echo as a fallback.

I read here https://en.wikipedia.org/wiki/POSIX#Overview that echo has been standardized for POSIX, can you elaborate further ?

RMPR · on Jan 20, 2020

Amazing article, I like to understand the why behind things like that. Kudos to you.

ReedJessen · on Jan 16, 2020

This is a really good article and worth the read.