Tomasz Kramkowski

Option Parsing On a Budget

Recently I was writing a little code generation utility which took lots of positional arguments. I wanted to add two optional features to this utility, controlled by options which would take no arguments. I decided to use getopt but realised that this would make the code depend on POSIX. I liked the idea of staying dependency free, so I quickly investigated really simple solutions for option parsing (without compromises) which would be equivalent to POSIX and GNU getopt.

The first iteration of the code used getopt. This was some pretty standard getopt code, portable to any system which implements the basic POSIX getopt.

while (c = getopt(argc, argv, "dl"), c != -1) {
    switch (c) {
    case 'd': des_init = true; break;
    case 'l': comp_lit = true; break;
    case '?': usage();
    default: assert("Option not implemented" == NULL);
    }
}

My first attempt at replacing this code looked similar to the following:

int opti;
for (opti = 1; argv[opti] != NULL && argv[opti][0] == '-'; opti++) {
    if (strcmp(argv[opti], "--") == 0) {
        opti++;
        break;
    }
    for (const char *opt = &argv[opti][1]; *opt != '\0'; opt++) {
        switch (*opt) {
        case 'd': des_init = true; break;
        case 'l': comp_lit = true; break;
        default:
            fprintf(stderr, "%s: unknown option -- %c\n", argv0, *opt);
            usage();
        }
    }
}

This replacement was POSIX getopt compliant in that it parsed options until it hit -- or the first non-option argument. It was twice as long as the getopt version, but it meant that the code no longer relied on POSIX. The opti variable served the same purpose as optind in getopt-style code.
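To make the role of opti concrete, here is a minimal sketch of the same loop wrapped in a function. The names struct flags and parse_leading_opts are hypothetical, invented for this illustration, and the sketch returns -1 on an unknown option instead of calling usage(). It is not the code from the original utility.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the two flags used in this post. */
struct flags {
    bool des_init;
    bool comp_lit;
};

/*
 * Parse leading options in the POSIX style: stop at "--" or at the first
 * argument which does not begin with '-'.
 * Returns the index of the first positional argument (the equivalent of
 * optind), or -1 if an unknown option is seen.
 */
static int parse_leading_opts(char **argv, struct flags *f)
{
    assert(argv != NULL && argv[0] != NULL && f != NULL);

    int opti;
    for (opti = 1; argv[opti] != NULL && argv[opti][0] == '-'; opti++) {
        if (strcmp(argv[opti], "--") == 0) {
            opti++; /* skip the "--" itself */
            break;
        }
        /* Each character after the '-' is a separate flag, so "-dl" works. */
        for (const char *opt = &argv[opti][1]; *opt != '\0'; opt++) {
            switch (*opt) {
            case 'd': f->des_init = true; break;
            case 'l': f->comp_lit = true; break;
            default:
                fprintf(stderr, "%s: unknown option -- %c\n", argv[0], *opt);
                return -1;
            }
        }
    }
    return opti;
}
```

A caller would then read its positional arguments from argv[opti] up to the terminating NULL, just as it would from argv[optind] after a getopt loop. One difference worth knowing about: a lone - is consumed here as an empty group of options, whereas POSIX getopt stops and leaves it as a positional argument.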

I would have been happy with this version, but I noticed that none of my program's positional arguments could begin with -, which meant that options could safely be mixed in among them. I was also up for the challenge. That being said, I don't think handling this is an essential feature.

The final version, after a few unreadable iterations, ended up being only 21 lines long. This version handles mixed positional and optional arguments by relying on the C standard, which allows modification of argv. Additionally, this version made the code which followed it more readable than the getopt version did. It really seems like a win-win.

bool opts_end = false;
argc = 0;
for (int i = 1; argv[i] != NULL; i++) {
    if (opts_end || argv[i][0] != '-') {
        argv[argc++] = argv[i];
        continue;
    }
    if (strcmp(argv[i], "--") == 0) {
        opts_end = true;
        continue;
    }
    for (const char *opt = &argv[i][1]; *opt != '\0'; opt++) {
        switch (*opt) {
        case 'd': des_init = true; break;
        case 'l': comp_lit = true; break;
        default:
            fprintf(stderr, "%s: unknown option -- %c\n", argv0, *opt);
            usage();
        }
    }
}
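To show what argv looks like once the loop has finished, here is a rough sketch of the same technique as a function. The names struct flags and split_args are hypothetical, and two details are my own additions rather than part of the listing above: the function returns -1 on an unknown option instead of calling usage(), and it writes a NULL after the last positional argument so the compacted array stays terminated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the two flags used in this post. */
struct flags {
    bool des_init;
    bool comp_lit;
};

/*
 * Parse options anywhere in argv and move the positional arguments to the
 * front of the array, starting at argv[0].
 * Returns the number of positional arguments, or -1 on an unknown option.
 */
static int split_args(char **argv, struct flags *f)
{
    assert(argv != NULL && argv[0] != NULL && f != NULL);

    /* argv[0] is overwritten below, so keep the program name for messages. */
    const char *argv0 = argv[0];
    bool opts_end = false;
    int argc = 0;

    for (int i = 1; argv[i] != NULL; i++) {
        if (opts_end || argv[i][0] != '-') {
            /* argc is always less than i, so this never clobbers unread input. */
            argv[argc++] = argv[i];
            continue;
        }
        if (strcmp(argv[i], "--") == 0) {
            opts_end = true; /* everything after "--" is positional */
            continue;
        }
        for (const char *opt = &argv[i][1]; *opt != '\0'; opt++) {
            switch (*opt) {
            case 'd': f->des_init = true; break;
            case 'l': f->comp_lit = true; break;
            default:
                fprintf(stderr, "%s: unknown option -- %c\n", argv0, *opt);
                return -1;
            }
        }
    }
    argv[argc] = NULL; /* assumption: not in the original listing */
    return argc;
}
```

Given the arguments a -d b -- -l, this leaves a, b and -l in argv[0] to argv[2] and returns 3, with only des_init set. The code which follows can then index positional arguments from zero, without carrying an offset like optind around.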

That being said, implementing mixed options and non-options could be considered a misfeature. It can cause unexpected problems more often than it solves them. Additionally, although GNU getopt does this, modifying argv is considered by some to be a bit of a dirty trick. But, as mentioned before, positional arguments in this particular codebase could not start with a hyphen, and implementing this feature seemed like a fun task.

Obviously this code does not handle option arguments; that's because I didn't have a need for those. If I had needed option arguments, I would likely have gone with arg.h or getopt.
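For comparison, this is roughly what the getopt route could look like with an option argument. The -o option, the out_path field and the names struct settings and parse_with_getopt are all invented for this illustration; the utility described here has no such option. The sketch assumes a system which provides POSIX getopt in unistd.h.

```c
#define _POSIX_C_SOURCE 200809L /* ask for the POSIX declarations (getopt) */

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical settings; out_path belongs to an invented -o option. */
struct settings {
    bool des_init;
    bool comp_lit;
    const char *out_path;
};

/*
 * Parse -d, -l and a hypothetical -o <path> with POSIX getopt.
 * Returns the index of the first positional argument (optind),
 * or -1 if getopt reported an error.
 */
static int parse_with_getopt(int argc, char **argv, struct settings *s)
{
    assert(argv != NULL && s != NULL);

    int c;
    optind = 1; /* the initial value; set explicitly so the start is obvious */
    /* The ':' after 'o' tells getopt that -o takes an argument. */
    while ((c = getopt(argc, argv, "dlo:")) != -1) {
        switch (c) {
        case 'd': s->des_init = true; break;
        case 'l': s->comp_lit = true; break;
        case 'o': s->out_path = optarg; break; /* optarg points into argv */
        default:
            return -1; /* '?': unknown option or missing argument */
        }
    }
    return optind;
}
```

The point of the comparison is that getopt takes care of both -o path and -opath, which is exactly the kind of bookkeeping the hand-written loops above get to skip because neither flag takes an argument.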

As a final note, the code in this post is taken from an MIT-licensed codebase, and should be published as part of pack shortly.