Diffstat (limited to 'content/posts')
-rw-r--r-- | content/posts/2021-07-29-option-parsing-on-a-budget.md | 102 |
1 files changed, 102 insertions, 0 deletions
diff --git a/content/posts/2021-07-29-option-parsing-on-a-budget.md b/content/posts/2021-07-29-option-parsing-on-a-budget.md
new file mode 100644
index 0000000..9fe6d7c
--- /dev/null
+++ b/content/posts/2021-07-29-option-parsing-on-a-budget.md
@@ -0,0 +1,102 @@
+$title "Option Parsing On a Budget"
+$tags c information
+
+Recently I was writing a little code generation utility which took lots of
+positional arguments. I wanted to add two optional features to this utility;
+these options would take no arguments. I decided to use `getopt` but realised
+that this would make the code depend on POSIX. I liked the idea of staying
+dependency-free, so I quickly investigated really simple solutions for option
+parsing (without compromises) which would be equivalent to POSIX and GNU
+`getopt`.
+
+$pre
+
+The first iteration of the code used `getopt`; this was some pretty standard
+`getopt` code, portable to all systems which implement the basic POSIX
+`getopt`.
+
+```.c
+while (c = getopt(argc, argv, "dl"), c != -1) {
+    switch (c) {
+    case 'd': des_init = true; break;
+    case 'l': comp_lit = true; break;
+    case '?': usage();
+    default: assert("Option not implemented" == NULL);
+    }
+}
+```
+
+My first attempt at replacing this code looked similar to the following:
+
+```.c
+int opti;
+for (opti = 1; argv[opti] != NULL && argv[opti][0] == '-'; opti++) {
+    if (strcmp(argv[opti], "--") == 0) {
+        opti++;
+        break;
+    }
+    for (const char *opt = &argv[opti][1]; *opt != '\0'; opt++) {
+        switch (*opt) {
+        case 'd': des_init = true; break;
+        case 'l': comp_lit = true; break;
+        default:
+            fprintf(stderr, "%s: unknown option -- %c\n", argv0, *opt);
+            usage();
+        }
+    }
+}
+```
+
+This replacement was POSIX `getopt` compliant in that it parsed options until it
+hit `--` or until the first non-option argument. This replacement was twice as
+long as the `getopt` version but did mean that the code no longer relied on
+POSIX. The `opti` variable had the same purpose as `optind` in `getopt`-style
+code.
+
+I would have been happy with this version, but I noticed that my program did
+not permit any positional arguments beginning with `-`, and I was up for the
+challenge of handling options mixed in among them. That being said, I don't
+think handling this is an essential feature.
+
+The final version, after a few unreadable iterations, ended up being only 21
+lines long. This version handles mixed positional and optional arguments by
+relying on the C standard which allows modification of `argv`. Additionally,
+this version made code which followed it more readable than the `getopt` version.
+It really seems like a win-win.
+
+```.c
+bool opts_end = false;
+argc = 0;
+for (int i = 1; argv[i] != NULL; i++) {
+    if (opts_end || argv[i][0] != '-') {
+        argv[argc++] = argv[i];
+        continue;
+    }
+    if (strcmp(argv[i], "--") == 0) {
+        opts_end = true;
+        continue;
+    }
+    for (const char *opt = &argv[i][1]; *opt != '\0'; opt++) {
+        switch (*opt) {
+        case 'd': des_init = true; break;
+        case 'l': comp_lit = true; break;
+        default:
+            fprintf(stderr, "%s: unknown option -- %c\n", argv0, *opt);
+            usage();
+        }
+    }
+}
+```
+
+That being said, implementing mixed options and non-options could be considered
+a misfeature. It can cause unexpected problems more often than it solves them.
+Additionally, although GNU `getopt` does this, modifying `argv` is considered by
+some to be a bit of a dirty trick. But, as mentioned before, positional
+arguments in this particular codebase could not start with a hyphen, and
+implementing this feature seemed like a fun task.
+
+Obviously this code does not handle option arguments, because I didn't need
+them. If I had, I would likely have gone with `arg.h` or `getopt`.
+
+As a final note, the code in this post is taken from an MIT-licensed codebase,
+and should be published as part of [pack](https://the-tk.com/cgit/pack) shortly.
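
For anyone wanting to try the final approach in isolation, here is a minimal sketch of the same `argv`-compaction idea wrapped in a function. It is not the code from the commit: the `parse_opts` name and the `struct opts` container are hypothetical, it returns -1 on an unknown option instead of calling `usage()`, it treats a lone `-` as a positional argument (as POSIX `getopt` does), and it re-terminates the compacted list with `NULL`. Those last three behaviours are assumptions added for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag container; the post uses two plain booleans instead. */
struct opts {
    bool des_init; /* set by -d */
    bool comp_lit; /* set by -l */
};

/*
 * Parse -d/-l options mixed with positional arguments, moving the
 * positionals to the front of argv (overwriting argv[0], as the post does).
 * argv must be NULL-terminated, which the C standard guarantees for main().
 * Returns the number of positionals, or -1 on an unknown option.
 */
static int parse_opts(char **argv, struct opts *o)
{
    bool opts_end = false;
    int n = 0;

    assert(argv != NULL && o != NULL);
    for (int i = 1; argv[i] != NULL; i++) {
        /* Positional: anything after "--", no leading '-', or a lone "-". */
        if (opts_end || argv[i][0] != '-' || argv[i][1] == '\0') {
            argv[n++] = argv[i]; /* n < i always, so nothing unread is lost */
            continue;
        }
        if (strcmp(argv[i], "--") == 0) {
            opts_end = true;
            continue;
        }
        /* Walk a cluster of single-letter flags such as "-dl". */
        for (const char *opt = &argv[i][1]; *opt != '\0'; opt++) {
            switch (*opt) {
            case 'd': o->des_init = true; break;
            case 'l': o->comp_lit = true; break;
            default:
                fprintf(stderr, "unknown option -- %c\n", *opt);
                return -1;
            }
        }
    }
    argv[n] = NULL; /* assumption: keep the compacted list NULL-terminated */
    return n;
}
```

A caller would save `argv[0]` first if it needs the program name, then do something like `argc = parse_opts(argv, &o);` and iterate over `argv[0]` to `argv[argc - 1]` as plain positional arguments.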