Regex
Edit: to everyone who responded, I use regex infrequently enough that the knowledge never really crystalizes. By the time I need it for this one thing again, I haven’t touched it in like a year.
No. Learn it properly once and you’re good. Also it’s super handy in vim.
interns gonna intern
Don’t let the gatekeepers keep you out. This site helps.
Nice! This is the one I use: https://regexr.com/
Though it appears to be very similar on the face of it.
ChatGPT helps even more
I know that LLMs are probably very helpful for people who are just getting started, but you will never understand it if you can’t grasp the fundamentals. Don’t let “AI” make you lazy. If you do use LLMs make sure you understand the output it’s giving you enough to replicate it yourself.
This may not be applicable to you specifically, but I think this is nice info to have here for others.
I have no interest in learning regex ever in my life, I have better things to dedicate my brain capacity to haha
Most of regex is pretty basic and easy to learn, it’s the look ahead and look behind that are the killers imo
(?=) for positive lookahead and (?!) for negative lookahead. Stick a < in the middle for lookbehind.
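To make the four lookaround forms concrete, here is a small sketch using grep's -P (PCRE) mode. It assumes a GNU grep built with PCRE support, which is common on Linux but not guaranteed elsewhere; the sample function and its strings are made up for illustration.

```shell
# Made-up sample input: three short lines.
sample() { printf '%s\n' foobar foobaz xbar; }

# (?=...)  positive lookahead: "foo" only where "bar" follows.
sample | grep -oP 'foo(?=bar)'

# (?!...)  negative lookahead: lines with a "foo" that is not followed by "bar".
sample | grep -P 'foo(?!bar)'

# (?<=...) positive lookbehind: "ba" plus one character, only right after "foo".
sample | grep -oP '(?<=foo)ba.'

# (?<!...) negative lookbehind: lines with a "bar" that is not preceded by "foo".
sample | grep -P '(?<!foo)bar'
```

The lookaround itself never becomes part of the match, which is why the first command prints only foo rather than foobar.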
You get used to it, I don’t even see the code—I just see: group… pattern… read-ahead…
You always forget regex syntax?
I’ve always found it simple to understand and remember. Even over many years and decades, I’ve never had issues reading or writing simple regex syntax (excluding the flags and shorthands) even after long regex breaks.
It’s not about the syntax itself, it’s about which syntax to use. There are different ones and remembering which one is for which language is tough.
This is exactly it. Regex is super simple. The difficulty is maintaining a mental mapping between language/util <-> regex engine <-> engine syntax & character class names. It gets worse when utils also conditionally enable extended syntaxes with flags or options.
The hardest part is remembering whether you need to use \w or [:alnum:].
Way too few utils actually mention which syntax they use too. Most just say something accepts a “regular expression”, which is totally ambiguous.
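As a small illustration of that mapping problem, the sketch below matches the same words once with a POSIX character class and once with the Perl-style shorthand. The input string is made up. The POSIX form should work in any grep -E; the \w form is assumed to be available because GNU grep accepts it as an extension.

```shell
text='user_42 logged-in'   # made-up input

# POSIX class: it only works inside a bracket expression, hence [[:alnum:]].
# \w also matches "_", so the underscore is added explicitly to get the same result.
printf '%s\n' "$text" | grep -oE '[[:alnum:]_]+'

# Perl-style shorthand: fine in GNU grep and PCRE-style engines,
# but not something POSIX promises for every tool.
printf '%s\n' "$text" | grep -oE '\w+'
```

Both commands print user_42, logged and in on separate lines here, but only the first spelling is portable across strictly POSIX tools.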
I give you that, true. I wish vim had PCRE
There is the “very magic” mode for vim regexes. It’s not the exact PCRE syntax, but it’s pretty close. You only need to add \v before the expression to use it. There is no permanent mode / option though. (I think you can remap the commands, like / to /\v)
For me I spent one hour of ADHD hyper focusing to get the gist of regex. Python.org has good documentation. It’s been like 2 years so I’ve forgotten it too lol.
twitch
I just use the regex101 site. I don’t need anything too complicated ever. Has all the common syntax and shows matches as you type. Supports the different languages and globals.
This is one of the best uses for LLMs imo. They do all my regex for me.
So true. Every time I have to look up how to write a bash for loop. Where does the semicolon go? Where is the newline? Is it terminated with done? Or with end? The worst part with bash is that when you do it wrong, most of the time there is no error but something completely wrong happens.
It all makes sense when you think about the way it will be parsed. I prefer to use newlines instead of semicolons to show the blocks more clearly.
for file in *.txt
do
    cat "$file"
done
The do and done serve as the loop block delimiters, such as { and } in many other languages. Without them, the shell parser couldn’t know where stuff starts/ends.
Edit: I agree that the then/fi, do/done, case/esac are very inconsistent.
Also to fail early and raise errors on uninitialized variables, I recommend adding this to the beginning of your bash scripts:
set -euo pipefail
Or only this for regular sh scripts:
set -eu
-e: Exit on error
-u: Error on access to undefined variable
-o pipefail: Make a pipeline return a failure status if any command in it fails, not just the last one.
There is also -x that can be very useful for debugging as it shows a trace of every command, with its expanded arguments, as it is executed.
set -euo pipefail
Fun fact, if you’re forced to write against POSIX shell, you aren’t allowed to use these options, since they’re not a thing, which is (part of) the reason why for example Google doesn’t allow any shell language but bash, lol.
Btw, all three set options given above are included in POSIX since 2024: https://pubs.opengroup.org/onlinepubs/9799919799/
Ooh, you’re totally right!! I forgot about that since it’s not in the older versions.
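A minimal sketch of what each of the options discussed above changes. Every case runs in its own bash -c child process so a failure cannot take down the surrounding shell; it assumes bash is installed, and the variable name is hypothetical.

```shell
# -e: execution stops at the first failing command, so "after" is never printed.
if ! bash -c 'set -e; false; echo "after"'; then
    echo "-e: stopped at the failing command"
fi

# -u: expanding an unset variable is an error instead of an empty string.
if ! bash -c 'set -u; echo "value: $surely_unset_variable"' 2>/dev/null; then
    echo "-u: unset variable rejected"
fi

# pipefail: without it, a pipeline's status is that of its last command only.
if bash -c 'false | true'; then
    echo "no pipefail: pipeline reported success"
fi
if ! bash -c 'set -o pipefail; false | true'; then
    echo "pipefail: pipeline reported failure"
fi
```

All four messages should be printed. Note that pipefail only changes the exit status of the pipeline; the commands in it still run to completion.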
I can only remember this because I initially didn’t learn about xargs, so any time I need to loop over something I tend to use for var in $(cmd) instead of cmd | xargs. It’s more verbose but somewhat more flexible IMHO.
So I run loops a lot on the command line, not just in shell scripts.
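The two looping styles mentioned above, sketched as they might be typed on the command line. list_items is a hypothetical stand-in for cmd. One caveat: for var in $(cmd) relies on word splitting, so it only behaves well when the output contains no spaces or glob characters.

```shell
# Hypothetical stand-in for "cmd": prints one item per line.
list_items() { printf '%s\n' alpha beta gamma; }

# for-loop form. On a single line, the semicolons go before "do" and before "done".
for item in $(list_items); do echo "item: $item"; done

# xargs form: -n 1 runs the command once per input item.
list_items | xargs -n 1 echo "item:"
```

Both forms print the same three lines. The loop body can hold arbitrary shell code, which is where the extra flexibility comes from.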
Knowing that there is still a bash script I wrote around 5 years ago running the entirety of my high school lab makes me sorry for the poor bastard that will need to fix those hieroglyphs as soon as some package breaks the script. I hate that I used bash, but it was the easiest option at the time on that desolate server.
Bash scripts survive because often times they are the easiest option on an abandoned server
I mastered and forgot almost entirely RegEx several times now
Finite rules for perfectly sifting infinite options
“mastered”
Mastered as in I could teach it to others, and assemble many complicated rules for many complicated patterns.
I am always impressed with folks that retain it.
I would use a ton of it for a month or two, and then do nothing with it again for a year or more.
It takes a lot for permanent burn-in for me. That’s the curse. The blessing is that I learn very quickly.
Ever since I switched to Fish Shell, I’ve had no issues remembering anything. Ported my entire catalogue of custom scripts over to fish and everything became much cleaner. More legible, and less code to accomplish the same things. Easier argument parsing, control structures, everything. Much less error prone IMO.
Highly recommend it. It’s obviously not POSIX or anything, but I find that the cost of installing fish on every machine I own is lower than maintaining POSIX-compliant scripts.
Enjoy your scripting!
If you’re going to write scripts that require installing software, might as well use something like python though? Most Linux distros also ship with python installed
A shell script can be much more agile, potent, and concise, depending on the use case.
E.g. if you want to make a facade (wrapper) around a program, that’s much cleaner in $SHELL. All you’re doing is checking which keyword/command the user wanted, and then executing the commands associated with what you want to achieve, like maybe displaying a notification and updating a global environment variable or something.
Executing a bunch of commands and chaining their output together in python is surely much more cumbersome than just typing them out next to each other separated by a pipe character. It’s higher-level. 👍
If it’s just text in text out though, sure, mostly equivalent, but for me this is rarely the use case for a script.
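A minimal sketch of such a facade, assuming a hypothetical wrapper called media with made-up subcommands. The real commands are replaced by echo and printf so the dispatch logic can be tried anywhere.

```shell
# Hypothetical facade: "media <subcommand>" dispatches to the underlying commands.
# echo/printf stand in for the real player and notification calls.
media() {
    case "${1:-}" in
        play)  echo "starting playback" ;;
        pause) echo "pausing playback" ;;
        status)
            # Chaining output is just a pipe between two commands.
            printf '%s\n' "track: example" "state: playing" | grep '^state:'
            ;;
        *)
            echo "usage: media {play|pause|status}" >&2
            return 2
            ;;
    esac
}

media play
media status
```

The case statement is the whole "which keyword did the user want" check; everything else is ordinary commands typed out as they would be on the command line.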
I’m not anti bash or fish, I’ve written in both just this week, but if we’re talking about readability/syntax as this post is about, and you want an alternative to bash, I’d say python is a more natural alternative. Fish syntax is still fairly ugly compared to most programming languages in my opinion.
Different strokes for different folks I suppose.
Fish syntax is still fairly ugly compared to most programming languages in my opinion.
subprocess.run(["fd", "-t", "d", "some_query"])
vs
fd -t d some_query
Which is cleaner? Not to mention if you want to take the output from the command and pipe it into another one.
It’s not about folks with weird opinions or otherwise, it’s about use cases. 🙂 I don’t think python is any more “natural” than most other imperative languages.
Fish is probably even more natural, actually, due to it being more high level and the legibility of the script is basically dependent on the naming of the commands and options and variables used within it, rather than something else, just like python. They probably have similarly legible keywords. Fish I imagine has fewer, which is a good thing for legibility. A script does a lot more with a lot less, due to the commands themselves doing so much behind the scenes. There’s a lot more boilerplate to a “proper” programming language than a scripting language.
But if you want to do something that python is better suited for, like advanced data processing or number crunching, or writing a whole application, then I would say that would be the better choice. It’s not about preference for me when it comes to python vs fish, it’s about the right tool for the job. But if we’re talking about bash vs fish, then I’m picking fish purely by preference. 👍
I’ve been meaning to check out fish. Thanks for the reminder!
Happy adventuring! ✨
I wish I could but since I use bash at work (often on embedded systems so no custom scripts or anything that isn’t source code) I just don’t want to go back and forth between the two.
Yeah, using one tool and then another one can be confusing at times. 😅
I switched to fish a while back, but haven’t learned how to script in it yet. Sounds like I should learn
Give it a shot after reading through the manual! (Extremely short compared to bash’s!) It’s a joy in my opinion. ☺️👌
It’s the default on CachyOS and I’ve been enjoying it. I typically use zsh.
Yeah I also went bash -> zsh -> fish. Zsh was just too complicated to configure for my taste. Couldn’t do it, apart from copy pasting stuff I didn’t understand myself, and that just didn’t sit right.
I love fish but sadly it has no proper equivalent of set -e as far as I know. Putting ; or return; in every line is not a solution.
Me with powershell. I’ll write a pretty complex script, not write powershell for 3 months, come back and have to completely relearn it.
And I thought I was the only one… for smaller bash scripts ChatGPT/DeepSeek does a good enough job at it. Though I still haven’t tried VS Code’s Copilot on bash scripts. I have only tried it with C code and it kiiiinda did an ass job at helping…
AI does decently enough on scripting languages if you spell it out enough for it lol, but IMO it tends to not do so well when it comes to compiled languages
I’ve tried Python with VScode Copilot (Claude) and it did pretty good
Yeah I tried that, Claude with some C code. Unfortunately the AI only took me from point A to point A. And it only took a few hours :D
That’s because scripted languages are more forgiving in general.
I was chalking it up to some scripting languages just tending to be more popular (like python) and thus having more training data for them to draw from
But that’s a good point too lol
Both can be true, Python does have a lot of examples floating online.
There’s always the old piece of wisdom from the Unix jungle: “If you write a complex shellscript, sooner or later you’ll wish you wrote it in a real programming language.”
I wrote a huge PowerShell script over the past few years. I was like “Ooh, guess this is a resume item if anyone asks me if I know PowerShell.” …around the beginning of the year I rewrote the bloody thing in Python and I have zero regrets. It’s no longer a Big Mush of Stuff That Does a Thing. It’s got object orientation now. Design patterns. Things in independent units. Shit like that.
I consider python a scripting language too.
They’re all programming languages, they all have their places.
All scripting languages are programming languages but not all programming languages are scripting languages
I use it for scripting too. I don’t need Python as much as before nowadays.
I initially read “UNIX jungle” as “UNIX jingle” and thought I had been really missing out!
You have, look up the SuSE songs.
Today I tried to write bash (I think)
I grabbed a bunch of commands, slapped a bunch of “&&” to string them together and saved them to a .sh file.
It didn’t work as expected and I did not, at all, look at any documentation during the process. (This is obviously on me, I’ll try harder next time)
Remember to make the .sh file executable with chmod +x
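For what it’s worth, here is a small sketch of how && chains behave, since that is an easy place for such a script to go wrong: each command only runs if the one before it succeeded, so a single failure silently skips everything after it. The step function is made up for the example.

```shell
# Hypothetical step: prints its name and succeeds unless told to fail.
step() {
    echo "running $1"
    [ "${2:-ok}" = "ok" ]
}

# All three run, because each one succeeds.
step one && step two && step three

# "three" never runs: the chain stops at the first failure.
if ! { step one && step two fail && step three; }; then
    echo "chain stopped early"
fi
```

If the commands should all run regardless of earlier failures, separating them with newlines or ; instead of && is the usual alternative.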
I try to remember to use man when learning a new command/program. And I almost always half-ass it and press the search button immediately to find whatever flag I need.
PSA: Run ShellCheck on your shell scripts. It turns up a shocking number of programming errors. https://www.shellcheck.net/
Thank you for this. About a year ago I came across ShellCheck thanks to a comment just like this on Reddit. I also happened to be getting towards the end of a project which included hundreds of lines of shell scripts across dozens of files.
It turns out that despite my workplace having done quite a bit of shell scripting for previous projects, no one had heard about ShellCheck. We had been using similar analysis tools for other languages but nothing for shell scripts. As you say, it turned up a huge number of errors, including some pretty spicy ones when we first started using it. It was genuinely surprising to see how many unique and terrible ways the scripts could have failed.
I wish it had a more comprehensive auto correct feature. I maintain a huge bash repository and have tried to use it, and it commonly makes mistakes. None of us maintainers have time to rewrite the scripts to match standards.
Then you’ll have to find the time later when this leads to bugs. If you write against bash while declaring it POSIX shell, but then a random system’s sh doesn’t implement a certain thing, you’ll be SOL. Or what exactly do you mean by “match standards”?
I honestly think autocorrecting your scripts would do more harm than good. ShellCheck tells you about potential issues, but it’s up to you to determine the correct behavior.
For example, how could it know whether cat $foo should be cat "$foo", or whether the script actually relies on word splitting? It’s possible that $foo intentionally contains multiple paths.
Maybe there are autofixable errors I’m not thinking of.
FYI, it’s possible to gradually adopt ShellCheck by setting --severity=error and working your way down to warnings and so on. Alternatively, you can add one-off # shellcheck disable=SC1234 comments before offending lines to silence warnings.
For example, how could it know whether cat $foo should be cat "$foo", or whether the script actually relies on word splitting? It’s possible that $foo intentionally contains multiple paths.
Last time I used ShellCheck (yesterday funnily enough) I had written ports+=($(get_elixir_ports)) to split the input since get_elixir_ports returns a string of space separated ports. It worked exactly as intended, but ShellCheck still recommended to make the splitting explicit rather than implicit.
The ShellCheck docs recommended
IFS=" " read -r -a elixir_ports <<< "$(get_elixir_ports)"
ports+=("${elixir_ports[@]}")
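The difference between implicit and explicit splitting can be sketched like this. get_elixir_ports is stubbed with made-up port numbers, since the real function is not shown in the thread, and count_words is a hypothetical helper. The sketch assumes bash, because arrays and read -a are not POSIX.

```shell
# Stub for illustration only; the real get_elixir_ports is not shown in the thread.
get_elixir_ports() { echo "4000 4001 4369"; }

# Hypothetical helper: prints how many arguments it received.
count_words() { echo "$#"; }

# Implicit: the unquoted $(...) is split on IFS (and would also be glob-expanded).
implicit=()
implicit+=($(get_elixir_ports))

# Explicit: read splits on spaces into an array, with no globbing involved.
explicit=()
IFS=" " read -r -a elixir_ports <<< "$(get_elixir_ports)"
explicit+=("${elixir_ports[@]}")

echo "implicit: ${#implicit[@]} ports, explicit: ${#explicit[@]} ports"

# Quoting decides whether a value stays one word or is split into several.
foo="a.txt b.txt"
echo "unquoted: $(count_words $foo) words"
echo "quoted: $(count_words "$foo") words"
```

Both arrays end up with three elements here. The results only diverge when the output contains glob characters such as *, which is the case the explicit form protects against.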
Bash substitution is regex-level wizardry.
Slapping a $ before an environment variable name is “wizardry?”
interns be interning
Not quite that, but more the ${variable##.*} sort of thing.
Back then, a pain in the ass. Nowadays, I just let an AI handle that. I used this crap for years and years and still cannot remember, which symbols you need in which order. And why should I remember? I’m not the computer. The computer should know, not me.
Right, so that’s just the string manipulation functions. I already posted a link to the bible for this following a different reply to the same comment to which you replied.
Nope, the whole ${variable/regex/replacement} syntax
All the string manipulation functions are easy: https://tldp.org/LDP/abs/html/string-manipulation.html
This one is my bookmark
Every time I have to do this I always go here. I can never remember the prefix suffix parts next time I do parameter substitution.
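A short cheat sheet of the prefix/suffix forms, sketched with a made-up path. Two things worth noting: the patterns in these expansions are shell glob patterns rather than regular expressions, and the ${var/pattern/replacement} forms are a bash extension rather than POSIX.

```shell
path="/home/user/archive.tar.gz"   # made-up example value

echo "${path#*/}"      # shortest prefix match removed  -> home/user/archive.tar.gz
echo "${path##*/}"     # longest prefix match removed   -> archive.tar.gz
echo "${path%.*}"      # shortest suffix match removed  -> /home/user/archive.tar
echo "${path%%.*}"     # longest suffix match removed   -> /home/user/archive
echo "${path/user/me}" # first match replaced (bash)    -> /home/me/archive.tar.gz
echo "${path//a/A}"    # all matches replaced (bash)    -> /home/user/Archive.tAr.gz
```

One mnemonic that may help: on a US keyboard # sits to the left of $ and % to the right, so # trims from the front and % from the end; doubling the symbol makes the match greedy.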
i used powershell, and even after trying every other shell and as a die hard Linux user I’ve considered going back to powershell cause damn man
Yeah. The best way to write any bash script is:
apt/yum install PowerShell; pwsh script.ps1
I am a huge fan of using PowerShell for scripting on Linux. I use it a ton on Windows already and it allows me to write damn near cross-platform scripts with no extra effort. I still usually use a Bash or Fish shell but for scripting I love being able to utilize powershell.
Maybe applies more to regex, the write only language.
Back when I did a lot of Perl, those were okay-ish to parse. Nowadays, not so much. I guess it’s like Bash. If you write a lot of it (maybe some people do), it’s probably simple. If it’s only once every six months or less, eeehhh…
It all boils down to familiarity, which comes from repetition.
The copy paste language. AI writes better regex than I do
and you won’t get better if you use ai for it
Meh I rarely use it. Even if I don’t use AI I wouldn’t get better at it, since I will forget everything the next time I will use it.
Bash was the first language I learned, got pretty decent at it. Now what happens is I think of a tiny script I need to write, I start writing it in Bash, I have to do string manipulation, I say fuck this shit and rewrite in Python lol
This. But Pandas and Numpy.
Pandas and Numpy and Bash.
.loc and .iloc queries are a fun syntax adventure every time
Oh my!