Argument parsing

SQLTrack includes argument parsing functions based on docopt-ng, a fork of the original docopt that is actively maintained.

Instead of writing an argument parser in code, docopt parses help texts in POSIX syntax to know what arguments exist, whether they are switches or parameters, etc. For our purposes the help texts are extracted from docstrings in the main script file. This means we don’t need to run the script to obtain its arguments, so we can add them to the database even if the run has not started yet, e.g., because it is in the queue of a batch scheduling system.
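To illustrate how a help text can be obtained without executing the script, here is a minimal sketch using Python's standard ast module. It is not SQLTrack's actual implementation; the function name extract_help_text and the embedded script are hypothetical and exist only for this example.

```python
import ast

# Hypothetical script source; in practice this would be read from the script file.
SCRIPT = '''
import sys

def main(args):
    """
    usage: example [options]

    options:
        -e N, --epochs N  Number of training epochs [default: 90]
    """
    sys.exit("never reached, the script is only parsed, not executed")
'''


def extract_help_text(source, func_name="main"):
    """Return the cleaned docstring of the top-level function func_name, or None."""
    tree = ast.parse(source)  # builds a syntax tree, runs no code from the script
    for node in tree.body:
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == func_name:
            return ast.get_docstring(node)
    return None


help_text = extract_help_text(SCRIPT)
print(help_text)
```

Because the source is only parsed, the sys.exit call inside main never runs, which is what makes it possible to record arguments for a run that is still waiting in a queue.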

Here’s a simple example:

from sqltrack.args import docopt_main

@docopt_main
def main(args):
    """
    usage: example [options] [--learning-rate N...]

    options:
        -h --help                Print help text.
        --model M                Which model to train [default: resnet18]
        -e N, --epochs N         Number of training epochs [default: 90]
        -b N, --batch-size N     Mini-batch size
        -l R, --learning-rate R  Learning rate  [default: 0.1]
        --amp                    Use AMP (Automatic Mixed Precision)
    """
    print(args)
    print(args.epochs)

# This will run main a second time with AMP forced on and set epochs to 360.
# You should not do this in practice, since docopt_main already calls the
# main function, but this is only a silly example anyway.
main({"amp": True, "epochs": 360})

Our main function is decorated with docopt_main, which parses the command line arguments defined in the docstring and immediately calls main (with the usual if __name__ == "__main__" guard). For the sake of completeness we also call main({"amp": True, "epochs": 360}) ourselves. This is a bit silly, since you would normally run the main function only once, but it illustrates how to call the decorated function from code.

$ python examples/argument_parsing.py -e 180
{'amp': False,
 'batch_size': None,
 'epochs': 180,
 'help': False,
 'learning_rate': ['0.1'],
 'model': 'resnet18'}
180
{'amp': True,
 'batch_size': None,
 'epochs': 360,
 'help': False,
 'learning_rate': ['0.1'],
 'model': 'resnet18'}
360

The output tells us that the args object passed to main is a dictionary. More precisely, it is of type ParsedOptions, a dictionary subclass whose items can also be accessed as attributes, like you would with an argparse.Namespace object. In the example above, we print the number of epochs with print(args.epochs).
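The idea of a dictionary with attribute access can be sketched in a few lines. The AttrDict class below is a hypothetical stand-in written for this example; it is not the ParsedOptions implementation, which may behave differently in its details.

```python
class AttrDict(dict):
    """Hypothetical stand-in for ParsedOptions: a dict with attribute access."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .keys() or .items() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


args = AttrDict({"epochs": 180, "batch_size": None, "amp": False})
print(args["epochs"])  # item access, as with any dict
print(args.epochs)     # attribute access, as with argparse.Namespace
```

Both lookups return the same value, so code written for argparse.Namespace and code written for plain dictionaries can share the same args object.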

Warning

One caveat of docopt_main is that it immediately calls the decorated function, so that function must be defined after everything else in your script. If this is not what you want, you can use docopt_arguments instead and add the if __name__ == "__main__" guard yourself as usual. docopt_arguments parses arguments in the same way, but does not call the decorated function.

Argument types

POSIX help texts do not define types for arguments, so docopt simply returns parsed values as strings. While this is perfectly safe, it is inconvenient in practice, and converting values in your own code would undermine the main reason why we opted to use docopt in the first place: to parse arguments without running code.

SQLTrack therefore includes what we believe to be a reasonable mechanism to guess types. First, we try to match the suffix of the argument name against a number of explicit conversion functions. If none of these succeed, we try to convert the value to a number type (integer, float, complex), and finally to parse it as JSON.

Here’s an overview of all conversions that are attempted by default:

The final trial and error stage is fixed. While you can append new conversions (or replace existing ones, without changing the order) with sqltrack.args.register_conversion(), we recommend you don’t, as your changes to the conversion logic cannot be replicated without running your code.
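The two stages can be sketched roughly as follows. This is an illustration under stated assumptions, not SQLTrack's code: the guess_type function and the SUFFIX_CONVERSIONS table are hypothetical, the single "str" entry is inferred from the --versionstr hint, and we assume that the first matching suffix decides the conversion.

```python
import json

# Hypothetical suffix table for illustration only; SQLTrack's defaults may differ.
SUFFIX_CONVERSIONS = [
    ("str", str),  # names ending in "str" are kept as strings
]


def guess_type(name, value):
    """Guess a Python value from a raw string, using the argument name first."""
    if not isinstance(value, str):
        return value  # switches (bool) and missing values (None) pass through
    # Stage 1: the first matching name suffix decides the conversion.
    for suffix, convert in SUFFIX_CONVERSIONS:
        if name.endswith(suffix):
            return convert(value)
    # Stage 2: trial and error, number types first, then JSON.
    for convert in (int, float, complex, json.loads):
        try:
            return convert(value)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            continue
    return value  # nothing applied, keep the original string


print(repr(guess_type("epochs", "180")))      # 180
print(repr(guess_type("version", "3.0")))     # 3.0
print(repr(guess_type("versionstr", "3.0")))  # '3.0'
print(repr(guess_type("model", "resnet18")))  # 'resnet18'
```

Because the logic depends only on the argument name and the raw string, the same result can be reproduced from the help text and the command line alone, without running the script.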

Hint

You can use suffix matching to avoid edge cases. E.g., to avoid the conversion of --version 3.0 to float, use --versionstr 3.0 instead.

Limitations

Multiple values

Many argument parsers (like argparse) allow arguments with multiple values. One argument with three values could be represented on the command line as --arg 1 2 3. Docopt does not support this, as it always expects argument-value pairs if the argument is not a simple switch.

You can instead specify that an argument may be repeated in the usage part of the help text like so:

Usage: example [options] [--arg VALUE...]

The equivalent command line would then be --arg 1 --arg 2 --arg 3. Values for repeatable arguments are passed as lists, even if there is only one value.

Another alternative that avoids repeating the argument name is to use the conversion from JSON built into SQLTrack, e.g., --arg "[1, 2, 3]". The value is quoted so that the shell passes it as a single argument instead of splitting it at the spaces. In this case you should not specify the argument as repeatable, or else the result would be a nested list [[1, 2, 3]].
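The following sketch uses only the Python standard library to show two details of the JSON approach: how a POSIX shell tokenizes the value (emulated here with shlex.split) and why a repeatable argument would produce a nested list. It does not call SQLTrack or docopt; the variable names are made up for this example.

```python
import json
import shlex

# Unquoted, the JSON text is split at the spaces into separate tokens.
unquoted = shlex.split("--arg [1, 2, 3]")
print(unquoted)  # ['--arg', '[1,', '2,', '3]']

# Quoted, the whole JSON text arrives as a single value.
quoted = shlex.split("--arg '[1, 2, 3]'")
print(quoted)  # ['--arg', '[1, 2, 3]']

# Non-repeatable argument: the value is one string, parsed as JSON.
single = json.loads(quoted[1])
print(single)  # [1, 2, 3]

# Repeatable argument: values arrive as a list of strings, so converting
# each element yields a list inside a list.
repeated = [json.loads(v) for v in [quoted[1]]]
print(repeated)  # [[1, 2, 3]]
```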