$title "Option Parsing On a Budget"
$tags c information

Recently I was writing a little code generation utility which took lots of
positional arguments. I wanted to add two optional features to this utility,
each enabled by an option that takes no argument. I decided to use `getopt` but
realised that this would make the code depend on POSIX. I liked the idea of
staying dependency free, so I quickly investigated really simple solutions for
option parsing (without compromises) which would be equivalent to POSIX and GNU
`getopt`.

$pre

The first iteration of the code used `getopt`. This was some pretty standard
`getopt` code, portable to all systems which implement the basic POSIX
`getopt`.

```.c
while (c = getopt(argc, argv, "dl"), c != -1) {
    switch (c) {
    case 'd': des_init = true; break;
    case 'l': comp_lit = true; break;
    case '?': usage();
    default: assert("Option not implemented" == NULL);
    }
}
```

My first attempt at replacing this code looked similar to the following:

```.c
int opti;
for (opti = 1; argv[opti] != NULL && argv[opti][0] == '-'; opti++) {
    if (strcmp(argv[opti], "--") == 0) {
        opti++;
        break;
    }
    for (const char *opt = &argv[opti][1]; *opt != '\0'; opt++) {
        switch (*opt) {
        case 'd': des_init = true; break;
        case 'l': comp_lit = true; break;
        default:
            fprintf(stderr, "%s: unknown option -- %c\n", argv0, *opt);
            usage();
        }
    }
}
```

This replacement was POSIX `getopt` compliant in that it parsed options until it
hit `--` or until the first non-option argument. It was twice as long as the
`getopt` version, but it meant that the code no longer relied on POSIX. The
`opti` variable had the same purpose as `optind` in `getopt`-style code.
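To make the `opti`/`optind` parallel concrete, here is a minimal sketch of the same loop wrapped in a function. The names `struct flags` and `parse_leading_opts` are made up for this illustration, and the sketch returns `-1` on an unknown option instead of calling `usage()` so that it stands on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the two flags used in this post. */
struct flags {
    bool des_init;
    bool comp_lit;
};

/*
 * Parse leading options from a NULL-terminated argv.
 * Returns the index of the first positional argument (the role optind
 * plays with getopt), or -1 on an unknown option.
 */
static int parse_leading_opts(char **argv, struct flags *f)
{
    int opti;

    assert(argv != NULL && argv[0] != NULL && f != NULL);
    for (opti = 1; argv[opti] != NULL && argv[opti][0] == '-'; opti++) {
        if (strcmp(argv[opti], "--") == 0) {
            opti++; /* skip the terminator itself */
            break;
        }
        /* Walk every letter of a grouped option such as -dl. */
        for (const char *opt = &argv[opti][1]; *opt != '\0'; opt++) {
            switch (*opt) {
            case 'd': f->des_init = true; break;
            case 'l': f->comp_lit = true; break;
            default:
                fprintf(stderr, "%s: unknown option -- %c\n", argv[0], *opt);
                return -1;
            }
        }
    }
    return opti;
}
```

A caller would then skip past the options with `argc -= opti; argv += opti;`, just as it would with `optind`.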

I would have been happy with this version, but I noticed that my program did not
actually permit any positional arguments beginning with `-`, so options could
safely be mixed in among them, and I was also up for the challenge. That being
said, I don't think handling this is an essential feature.

The final version, after a few unreadable iterations, ended up being only 21
lines long. This version handles mixed options and positional arguments by
relying on the C standard, which allows modification of `argv`: positional
arguments are shuffled down to the front of `argv` and `argc` is reset to count
only them. Additionally, this version made the code which followed it more
readable than the `getopt` version did. It really seems like a win-win.

```.c
bool opts_end = false;
argc = 0;
for (int i = 1; argv[i] != NULL; i++) {
    if (opts_end || argv[i][0] != '-') {
        argv[argc++] = argv[i];
        continue;
    }
    if (strcmp(argv[i], "--") == 0) {
        opts_end = true;
        continue;
    }
    for (const char *opt = &argv[i][1]; *opt != '\0'; opt++) {
        switch (*opt) {
        case 'd': des_init = true; break;
        case 'l': comp_lit = true; break;
        default:
            fprintf(stderr, "%s: unknown option -- %c\n", argv0, *opt);
            usage();
        }
    }
}
```

That being said, implementing mixed options and non-options could be considered
a misfeature, since it tends to cause more unexpected problems than it solves.
Additionally, although GNU `getopt` does this, modifying `argv` is considered by
some to be a bit of a dirty trick. But, as mentioned before, positional
arguments in this particular codebase could not start with a hyphen, and
implementing this feature seemed like a fun task.
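To show what the compaction leaves behind, here is a rough sketch of the same loop wrapped in a function. The names `struct opts` and `parse_mixed_opts` are invented for this illustration. It returns `-1` rather than calling `usage()`, and it re-terminates `argv` with `NULL` after the loop, which is an addition of the sketch and not part of the original code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the two flags used in this post. */
struct opts {
    bool des_init;
    bool comp_lit;
};

/*
 * Parse options anywhere in a NULL-terminated argv and compact the
 * positional arguments to the front of the array.
 * Returns the number of positionals (the new argc), or -1 on an
 * unknown option.
 */
static int parse_mixed_opts(char **argv, struct opts *o)
{
    const char *argv0;
    bool opts_end = false;
    int argc = 0;

    assert(argv != NULL && argv[0] != NULL && o != NULL);
    argv0 = argv[0]; /* saved first: argv[0] is about to be overwritten */
    for (int i = 1; argv[i] != NULL; i++) {
        if (opts_end || argv[i][0] != '-') {
            argv[argc++] = argv[i]; /* keep positional, shifted down */
            continue;
        }
        if (strcmp(argv[i], "--") == 0) {
            opts_end = true; /* everything after this is positional */
            continue;
        }
        for (const char *opt = &argv[i][1]; *opt != '\0'; opt++) {
            switch (*opt) {
            case 'd': o->des_init = true; break;
            case 'l': o->comp_lit = true; break;
            default:
                fprintf(stderr, "%s: unknown option -- %c\n", argv0, *opt);
                return -1;
            }
        }
    }
    argv[argc] = NULL; /* sketch-only addition: keep argv NULL-terminated */
    return argc;
}
```

For `prog a -d b -- -l` this leaves `a`, `b` and `-l` in `argv[0]` to `argv[2]` and returns 3, so the code that follows can index positionals from zero without any `optind` arithmetic.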

Obviously this code does not handle option arguments. That's because I didn't
have a need for those. If I had needed option arguments I would likely have
gone with `arg.h` or `getopt`.
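For a sense of scale, here is a rough sketch of what bolting a single option argument onto the simpler, leading-options loop might look like. The `-o` option, `struct settings` and `parse_with_optarg` are all hypothetical and do not come from the original utility. The sketch accepts both `-ofile` and `-o file`, and returns `-1` on errors instead of calling `usage()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical settings: the two flags from this post plus an invented -o. */
struct settings {
    bool des_init;
    bool comp_lit;
    const char *output; /* argument of the hypothetical -o option */
};

/*
 * Parse leading options, where -o takes an argument.
 * Returns the index of the first positional argument, or -1 on error.
 */
static int parse_with_optarg(char **argv, struct settings *s)
{
    int opti;

    assert(argv != NULL && argv[0] != NULL && s != NULL);
    for (opti = 1; argv[opti] != NULL && argv[opti][0] == '-'; opti++) {
        if (strcmp(argv[opti], "--") == 0) {
            opti++;
            break;
        }
        for (const char *opt = &argv[opti][1]; *opt != '\0'; opt++) {
            switch (*opt) {
            case 'd': s->des_init = true; break;
            case 'l': s->comp_lit = true; break;
            case 'o':
                if (opt[1] != '\0') {
                    s->output = &opt[1];     /* attached form: -ofile */
                } else if (argv[opti + 1] != NULL) {
                    s->output = argv[++opti]; /* detached form: -o file */
                } else {
                    fprintf(stderr, "%s: option requires an argument -- %c\n",
                            argv[0], *opt);
                    return -1;
                }
                goto next_arg; /* the rest of this argument is consumed */
            default:
                fprintf(stderr, "%s: unknown option -- %c\n", argv[0], *opt);
                return -1;
            }
        }
next_arg:;
    }
    return opti;
}
```

Even for one option the loop picks up an extra branch, an error path and a jump out of the inner loop, which is a fair argument for reaching for `arg.h` or `getopt` once option arguments are needed.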

As a final note, the code in this post is taken from an MIT-licensed codebase,
and should be published as part of [pack](https://the-tk.com/cgit/pack) shortly.