Sure, its syntax is based on ideas from the 1960s[0] and it has a weird core library of functions[1]. Certainly, writing code that is a blend of logic and managing other programs takes some getting used to. However, it is very well documented[2] and I've had great success with it over the past 20 years.
If one takes the time to get to know it, it is actually easy and fun to write scripts that are robust and easy to maintain.
I'd venture to say that bash is the putty in the little gaps of the internet. It is the fitting that glues programs to the various systems on which they run.
Assignment to a hard-coded global variable is not a return mechanism, and generally a nonstarter. It's not a viable approach for routinely returning strings out of shell functions throughout a codebase.
Use of eval: still slow, because the shell re-processes the input from the character level up. eval should generally be avoided as much as possible in shell programming. Careless use of eval can introduce security holes (a piece of untrusted data gets evaled as an expression). You really need a black belt in "shell escaping karate".
Producing output and capturing it with command substitution is the primary idiom for getting text out of a shell function. It has no visible side effect. The rebinding of standard output is scoped to the command substitution (and is confined to the child process, in fact), and the creation of the temporary process and pipe, expensive as they may be, are invisible to the program's semantics.
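A minimal sketch of that idiom (the function name is made up for illustration):

```shell
# Return a string by writing it to stdout; the caller captures it
# with command substitution, which runs the function in a subshell.
make_greeting() {
    printf '%s' "Hello, $1"
}

greeting=$(make_greeting "world")
echo "$greeting"    # Hello, world
```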
> The way one typically accomplishes this in bash
In summary, what you propose here is not only vanishingly atypical, but also bad coding practice.
> In summary, what you propose here is not only vanishingly atypical, but also bad coding practice.
What I'm describing has been SOP in bash for twenty years. You can program in a more modern style and replace some uses of eval with ${!indirect_references}, declare, and associative arrays, but eval is still in wide use today. Search through /etc for eval with grep and you'll find plenty of results in the wild.
For example the eval in my previous post could be rewritten as the following
declare -g "$1"="result"
but this form, although safer, is the less common usage.
Assigning to global variables or passing in a variable name to get the return value is just how bash works. See $MAPFILE $OPTIND $OPTARG for examples of the former and read as an example of the latter.
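For instance, here is one eval-free sketch of that name-passing style (bash-specific; the function and variable names are invented for illustration):

```shell
# The caller passes the *name* of the destination variable, the way
# read(1) does; printf -v assigns to that name without using eval.
to_upper() {
    printf -v "$1" '%s' "$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')"
}

to_upper result "hello"
echo "$result"    # HELLO
```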
That [declare -g] is bad too, because it means you can't use it to set a local variable (see my sibling comment about dynamic scoping), which is surprising and confusing to the caller.
While I agree with the parent that the whole thing is disgusting, I also recognize that it's a valid optimization technique, and it's sometimes necessary. So, if you do go down that route, I'd encourage you to do it as
Without disagreeing with the essence of what you said:
> Assignment to hard-coded global variable ...
Because Bash is dynamically-scoped, it's not necessarily global; it could be scoped to the calling function. For example:
setit() {
    myvar=bar
}

myfunc() {
    local myvar
    setit
    echo myfunc $myvar   # will print: myfunc bar
}

myvar=foo
myfunc               # will print: myfunc bar
echo global $myvar   # will print: global foo
Most shell programmers do not know what "dynamic scope" is, lacking a Lisp background. Those familiar with lexical languages like C or Java will naively expect "local" to be lexical.
Unlike in dynamically scoped Lisps, in shell programming there isn't any widely recognized and applied naming convention to avoid accidental name capture due to dynamic scope (binding a variable, not knowing that a function which is then used accesses that as a global).
POSIX has no "local", and so makes no recommendation in this regard, nor does it set any naming precedent for user programs to follow. It is more concerned with the separation between system variables (named in ALL_CAPS) and user variables (not so named).
Put these two together and you have a recipe for bugs.
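A tiny sketch of the kind of capture bug this produces (function and variable names are invented):

```shell
# count_args forgets to declare i local, so under dynamic scoping it
# silently clobbers the caller's local i.
count_args() {
    i=0
    for arg in "$@"; do i=$((i+1)); done
}

main() {
    local i=10
    count_args a b c
    echo "$i"    # prints 3, not 10
}
```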
The article mentions loops with an incrementing variable, e.g., i=$((i+1)). It is also possible to do this without loops. Is it faster? That's left to the reader to decide. I sometimes used this technique so I could check on the progress of a script while it was running and also continue a script later where I left off. It also allowed me to stop the script by removing a file. At the time, the shell I used had not yet implemented LINENO, which is also very useful when scripts are terminated before finishing.
For example, if I wanted to "loop" 15 times^1
To begin, first create a file that stores the count.^2 It also acts as a way to stop the "loop" from advancing if it is removed.
echo 0 > x
Then create a file that acts as an "on/off switch" to stop the script, e.g., if it is not running in the foreground.
> x-on
Then run the script. The "prologue" and "epilogue" use only shell built-ins. No external programs or Bash-isms are required.^3
#! /bin/sh
CRAWL_DELAY=30;
test -f x||exit $LINENO;
test -f x-on||exit $LINENO;
read x < x;
test $x -le 14||exit $LINENO;
# do stuff;
echo https://example.com/$x;
sleep "$CRAWL_DELAY";
echo $((x+1)) >x;
test -f x-on||exit $LINENO;
test -f $0||exit $LINENO;
$0
To see progress,
cat x
To stop the script before it finishes,
rm x-on
To restart the script and continue where left off,
> x-on
and run the script.
1. Sometimes I did not have access to a program like seq or jot, so something like this was not possible:
for x in $(seq 15);do
echo https://example.com/$x;
done
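Without seq, the same loop can be written with nothing but shell builtins and POSIX arithmetic expansion; a sketch:

```shell
# seq-free counting loop using only POSIX shell arithmetic
x=0
while [ "$x" -le 14 ]; do
    echo "https://example.com/$x"
    x=$((x+1))
done
```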
2. I always work in a tmpfs-mounted directory so these "files" are just memory, they are not saved on "disk".
3. Bash has too many features for this author to keep track of, e.g., base conversion.
I find it to be quite slow; I wrote a command-line arguments parser library using shell metaprogramming, and I had to precompile the output for bash, while in dash it was fast enough to run as-is.
I suspect that the slowness was caused by Bash having a couple of pathological cases where it converts back and forth between the LANG/LC encoding and internal wide-characters repeatedly, which can cause it to be stupidly slow. I suspect that setting LC_ALL=C for that portion of the code would speed Bash up to Dash speeds.
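If that hunch is right, the fix can be scoped narrowly: a per-command environment assignment applies LC_ALL=C to just the hot step, leaving the rest of the script in the user's locale. A sketch (the function here is invented for illustration):

```shell
# Hypothetical hot step doing byte-oriented text processing
slow_step() {
    tr '[:lower:]' '[:upper:]'
}

# LC_ALL=C applies only to this one command invocation,
# so the rest of the script keeps the user's locale.
printf '%s' "abc" | LC_ALL=C slow_step
```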
> [ ] is a command—basically another way to call the built-in test command.
Literally '/usr/bin/[', which takes a final ']' as an argument, but it's actually a different binary from '/usr/bin/test'. And it's different again from the bash builtin. Crazy shit.
vagrant@vagrant:~$ [ --version
-bash: [: missing `]'
vagrant@vagrant:~$ /usr/bin/[ --version
[ (GNU coreutils) 8.28
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Kevin Braunsdorf and Matthew Bradburn.
This seems like a very basic introduction to bash, which is fine, but I was hoping for something called "Understanding Bash" to help with some of the understanding about why it behaves so weirdly at times.
For me, the biggest problem when writing bash is when I need to do something just a little more complex than what is there, but there is no good way to do it. An example that usually trips me up is this: https://mywiki.wooledge.org/BashFAQ/050.
FWIW, the above link is probably the most useful thing I have found that actually helps me understand bash.
I've been writing a (hopefully, one day) POSIX-compatible shell using a parser generator library. Having to fit sh into a parser framework forces some choices, and it's a great way to find these weird things. For example, I was surprised at first to learn that `{ ls }` isn't a valid program. Or:
FOO=1 echo $FOO      # prints nothing (if FOO was unset): $FOO expands before FOO is set for echo
# vs
FOO=1; echo $FOO     # prints 1: FOO is a shell variable by the time echo runs
# vs
FOO=1 printenv FOO   # prints 1: printenv reads FOO from its environment
> Note that the exit value of true is 0, and the exit value of false is 1. This is somewhat counterintuitive, and it's the exact opposite of most programming languages.
"Happy families are all alike; every unhappy family is unhappy in its own way."
It took me awhile to wrap my head around when I initially encountered it, but the idea made sense after I thought about it. If the process exited successfully, we don't care--it did what it was supposed to do. But if it failed, we'd probably like as much info about why it failed as possible; and so having the exit status be more like an error code clearly has some value.
The article is wrong though. For error codes, 0 being a success value is the CONVENTION. A non-zero code is an error. This isn't about "programming languages."
The problem is that for bash `true` is a builtin (or function) that exits with 0, while in C (and thus most languages) any non-zero value is truthy, with 1 being typical. In C99 and later, with stdbool.h, `true` is almost always defined as `#define true 1`.
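A quick demo of the inversion at the prompt:

```shell
# In the shell, exit status 0 means success and non-zero means failure,
# so `if cmd` takes the branch when cmd *succeeds* (exits 0).
true;  echo $?    # 0
false; echo $?    # 1

if true; then echo "if runs on exit status 0"; fi
```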
It is far simpler to just learn the POSIX shell first - much shorter man page. The Bash stuff is then just a few additions if you ever need them (other shells are available).
While doing so, I found a part of the specification that (as far as I can tell) no shell implements (sourcing my /etc/profile failed with my shell due to the difference): the specification blesses (at the time)[1] 20 utilities that should be run without regards to the PATH, but any other built-ins should not be run if the command does not exist in the path.
So, for example:
PATH=""
echo "Hello"
Should fail on every POSIX compliant shell. I haven't found a shell that implements echo as a builtin in which the above fails though.
1: Now it has a larger list for which it is unspecified what happens. See "Command search and execution" for the list. "echo" is not on the list though, so my example is still well-defined in POSIX yet breaks on all shells I tried.
Wow, someone actually did it! When I reported this non-conformance to the dash mailing-list, I first got pushback that I was wrong about the spec, and then once I convinced them I was right the response was something along the lines of "that's stupid why would we do that"
Indeed, and local is one of the few non-POSIX things that Debian people famously could not live without, and so it is required to exist in all Debian shells that can be used as /bin/sh, such as the Debian Almquist shell.
It is in both ash and dash, the shells I mentioned. Your attempt at point scoring is irrelevant but expected.
ash & dash are about 100kB executables (POSIX plus a tiny number of non-interactive improvements like 'local'); bash is 1100kB. Every shell and subshell.
My point is that POSIX shell purity is masochism. Understanding how it's the core and other shells build on top of it is certainly valuable, but it's not inherently particularly good to use.
Maybe we are at cross purposes. My default shell is bash, but my default "sh" is dash. This is as it should be (and is default in Debian & Ubuntu I think).
If you are talking about interactive use, of course bash is the one to use, but for writing shell scripts it is a 10X overhead over dash.
As a POSIX shell masochist myself, arrays are the single biggest missing feature. POSIX shells actually do have one array: $@. Having more would make my life so much easier.
The amount of gyrations I go through to account for the lack of arrays is easily 1000x worse than accounting for not having local variables.
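For what it's worth, the one trick that makes the $@ pseudo-array bearable is rebuilding it with set --; inside a function that gives you a scratch array with proper element boundaries. A sketch (the function name is invented):

```shell
# POSIX-only: use the positional parameters as a scratch array.
# set -- rewrites the list; "$@" preserves element boundaries,
# even when elements contain spaces.
print_elements() {
    set -- "first item" "second item" "third"
    for elem in "$@"; do
        printf '%s\n' "$elem"
    done
}

print_elements
```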
There really isn't a POSIX sh manual page; and you are conflating Bourne with Bourne Again on RedHat.
The closest that anyone comes is, I believe, OpenBSD. The ksh(1) manual page is the PD Korn shell manual. And the separate sh(1) manual page "describes only the features [of ksh] relevant to a POSIX shell".
On others, sh(1) is usually the manual page for one of the named shells.
Debian and Ubuntu's sh(1), for example, is the dash(1) for the Debian Almquist shell, which is famously not a POSIX sh, because it explicitly includes 3 extensions that Debian people could not bear to part with in their big project of over a decade ago to remove bashisms from package maintainer scripts and suchlike.
If the default sh is set to bash (yuk!) you will need to use ash or dash to get the basic POSIX shell and man page. On my Xubuntu 18.04 system, "man sh" brings up the dash man page.
On my distribution (Arch Linux) the `sh` tool comes from the `bash` package, and is partially shared code I believe, though I'm unsure exactly how they are related at this point.
On Arch, `/bin/sh` is a symlink to `bash`. When Bash is invoked with the name `sh`, it behaves closer to the historical Bourne shell. The differences between normal Bash behavior and the behavior when invoked as `sh`:
- It behaves as if `--posix` was given.
- It behaves as if `--norc` was given.
- If it looks for a user profile file (i.e. it is a login shell), then it only looks at ~/.profile (instead of the usual behavior of giving precedence to ~/.bash_profile then ~/.bash_login then ~/.profile)
I've been using bash for nearly two decades, and I just found out that Bash is apparently the only shell whose built-in echo uses the -e option. I thought others used this too, but it seems every other shell's echo just implicitly interpolates escaped characters. POSIX actually says the behavior when an operand contains '\' is implementation-defined. Apparently printf is the only portable way to interpolate escaped characters in output.
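A small portability sketch:

```shell
# Portable: printf interprets escapes in its *format* string on every
# POSIX shell, so there is no need for echo -e.
printf 'line one\nline two\n'

# Arguments substituted via %s are NOT escape-processed:
printf '%s\n' 'a literal \n stays literal here'
```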
> Second, in principle, there's nothing to enforce that a UNIX shell must have echo as a built-in, and therefore, it's important to have the external utility /bin/echo as a fallback.
0. https://en.wikipedia.org/wiki/ALGOL
1. https://en.wikipedia.org/wiki/POSIX
2. https://www.gnu.org/software/bash/manual/