-rw-r--r-- content/posts/2021-07-29-option-parsing-on-a-budget.md 102
1 files changed, 102 insertions, 0 deletions
diff --git a/content/posts/2021-07-29-option-parsing-on-a-budget.md b/content/posts/2021-07-29-option-parsing-on-a-budget.md
new file mode 100644
index 0000000..9fe6d7c
--- /dev/null
+++ b/content/posts/2021-07-29-option-parsing-on-a-budget.md
@@ -0,0 +1,102 @@
+$title "Option Parsing On a Budget"
+$tags c information
+
+Recently I was writing a little code generation utility which took lots of
+positional arguments. I wanted to add two optional features to this utility,
+each enabled by an option which takes no argument. I decided to use `getopt`
+but realised that this would make the code depend on POSIX. I liked the idea of
+staying dependency free, so I quickly investigated really simple solutions for
+option parsing (without compromises) which would be equivalent to POSIX and GNU
+`getopt`.
+
+$pre
+
+The first iteration of the code used `getopt`, and it was pretty standard
+`getopt` code, portable to all systems which implement the basic POSIX
+`getopt`.
+
+```.c
+while (c = getopt(argc, argv, "dl"), c != -1) {
+ switch (c) {
+ case 'd': des_init = true; break;
+ case 'l': comp_lit = true; break;
+ case '?': usage();
+ default: assert("Option not implemented" == NULL);
+ }
+}
+```
+
+My first attempt at replacing this code looked similar to the following:
+
+```.c
+int opti;
+for (opti = 1; argv[opti] != NULL && argv[opti][0] == '-'; opti++) {
+ if (strcmp(argv[opti], "--") == 0) {
+ opti++;
+ break;
+ }
+ for (const char *opt = &argv[opti][1]; *opt != '\0'; opt++) {
+ switch (*opt) {
+ case 'd': des_init = true; break;
+ case 'l': comp_lit = true; break;
+ default:
+ fprintf(stderr, "%s: unknown option -- %c\n", argv0, *opt);
+ usage();
+ }
+ }
+}
+```
+
+This replacement was POSIX `getopt` compliant in that it parsed options until it
+hit `--` or until the first non-option argument. This replacement was twice as
+long as the `getopt` version but did mean that the code no longer relied on
+POSIX. The `opti` variable had the same purpose as `optind` in `getopt`-style
+code.
+
+I would have been happy with this version, but my program never takes positional
+arguments beginning with `-`, so options could safely be mixed in among them, and
+I was up for the challenge. That said, I don't think this is an essential feature.
+
+The final version, after a few unreadable iterations, ended up being only 21
+lines long. This version handles mixed positional and optional arguments by
+relying on the C standard, which allows modification of `argv`. Additionally,
+this version made the code which followed it more readable than the `getopt`
+version did. It really seems like a win-win.
+
+```.c
+bool opts_end = false;
+argc = 0;
+for (int i = 1; argv[i] != NULL; i++) {
+ if (opts_end || argv[i][0] != '-') {
+ argv[argc++] = argv[i];
+ continue;
+ }
+ if (strcmp(argv[i], "--") == 0) {
+ opts_end = true;
+ continue;
+ }
+ for (const char *opt = &argv[i][1]; *opt != '\0'; opt++) {
+ switch (*opt) {
+ case 'd': des_init = true; break;
+ case 'l': comp_lit = true; break;
+ default:
+ fprintf(stderr, "%s: unknown option -- %c\n", argv0, *opt);
+ usage();
+ }
+ }
+}
+```
+
+That being said, implementing mixed options and non-options could be considered
+a misfeature. It can cause unexpected problems more often than it solves them.
+Additionally, although GNU `getopt` does this, modifying `argv` is considered by
+some to be a bit of a dirty trick. But, as mentioned before, positional
+arguments in this particular codebase could not start with a hyphen, and
+implementing this feature seemed like a fun task.
+
+Obviously this code does not handle option arguments, because I didn't have a
+need for those. Had I needed option arguments, I would likely have gone with
+`arg.h` or `getopt`.
+
+As a final note, the code in this post is taken from an MIT-licensed codebase,
+and should be published as part of [pack](https://the-tk.com/cgit/pack) shortly.
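
The final loop from the post can be tried in isolation. What follows is a minimal sketch under stated assumptions, not the code from pack: `parse_opts` is a hypothetical wrapper name, the two flags are passed by pointer rather than being the post's variables, an unknown option returns -1 instead of printing a message and calling `usage()`, and the closing `argv[argc] = NULL` is an addition so that the compacted vector stays NULL-terminated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper around the post's final loop.
 * Compacts the positional arguments to argv[0..n-1] (overwriting argv[0],
 * as the loop in the post does) and returns n, or -1 on an unknown option.
 */
static int parse_opts(char **argv, bool *des_init, bool *comp_lit)
{
	bool opts_end = false;
	int argc = 0;

	assert(argv != NULL && argv[0] != NULL);
	for (int i = 1; argv[i] != NULL; i++) {
		/* Positional argument: move it down into the next free slot. */
		if (opts_end || argv[i][0] != '-') {
			argv[argc++] = argv[i];
			continue;
		}
		/* "--" ends option parsing; everything after it is positional. */
		if (strcmp(argv[i], "--") == 0) {
			opts_end = true;
			continue;
		}
		/* Grouped single-letter flags such as "-dl". */
		for (const char *opt = &argv[i][1]; *opt != '\0'; opt++) {
			switch (*opt) {
			case 'd': *des_init = true; break;
			case 'l': *comp_lit = true; break;
			default: return -1; /* assumption: the caller reports the error */
			}
		}
	}
	argv[argc] = NULL; /* addition: not part of the loop in the post */
	return argc;
}
```

Since `argc` always trails `i`, each write lands on a slot that has already been read, which is why the compaction can be done in place. One thing to note is that a lone `-` is consumed as an option group with no flags in it, which is harmless only because, as the post says, this program never takes positional arguments beginning with a hyphen.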