Argument parsing
SQLTrack includes argument parsing functions based on docopt-ng, a fork of the original docopt that is actively maintained.
Instead of writing an argument parser in code, docopt parses help texts in POSIX syntax to know what arguments exist, whether they are switches or parameters, etc. For our purposes the help texts are extracted from docstrings in the main script file. This means we don’t need to run the script to obtain its arguments, so we can add them to the database even if the run has not started yet, e.g., because it is in the queue of a batch scheduling system.
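To make the "without running the script" point concrete, here is a minimal sketch that reads a docstring from script source using only the standard library. It is not SQLTrack's own extraction code; the helper name extract_docstring and the embedded script text are made up for illustration.

```python
import ast

# Source of a hypothetical script; it is parsed, never imported or executed.
SCRIPT = '''
from sqltrack.args import docopt_main

@docopt_main
def main(args):
    """
    usage: example [options]

    options:
      -e N, --epochs N  Number of training epochs [default: 90]
    """
    print(args)
'''


def extract_docstring(source, function_name="main"):
    """Return the docstring of a top-level function without executing the source."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            # get_docstring also strips the common indentation.
            return ast.get_docstring(node)
    return None


help_text = extract_docstring(SCRIPT)
print(help_text)
```

Because the source is only parsed into a syntax tree, the import of sqltrack inside the script text is never resolved, which is what makes this approach usable for runs that are still waiting in a queue.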
Here’s a simple example using docopt_main:
from sqltrack.args import docopt_main


@docopt_main
def main(args):
    """
    usage: example [options] [--learning-rate N...]

    options:
      -h --help                Print help text.
      --model M                Which model to train [default: resnet18]
      -e N, --epochs N         Number of training epochs [default: 90]
      -b N, --batch-size N     Mini-batch size
      -l R, --learning-rate R  Learning rate [default: 0.1]
      --amp                    Use AMP (Automatic Mixed Precision)
    """
    print(args)
    print(args.epochs)


# This will run main a second time with AMP forced on and set epochs to 360.
# You should not do this in practice, since docopt_main already calls the
# main function, but this is only a silly example anyways.
main({"amp": True, "epochs": 360})
Our main function is decorated with docopt_main, which parses the command line arguments defined in the docstring and immediately calls main (with the usual if __name__ == "__main__" guard). For the sake of completeness we also call main({"amp": True, "epochs": 360}) ourselves, which is a bit silly, since you would normally run the main function only once, but illustrates how to call the decorated function from code.
$ python examples/argument_parsing.py -e 180
{'amp': False,
 'batch_size': None,
 'epochs': 180,
 'help': False,
 'learning_rate': ['0.1'],
 'model': 'resnet18'}
180
{'amp': True,
 'batch_size': None,
 'epochs': 360,
 'help': False,
 'learning_rate': ['0.1'],
 'model': 'resnet18'}
360
The output tells us that the args object passed to main is a dictionary. More precisely, it is of type ParsedOptions, a dictionary subclass whose entries can also be accessed as attributes, as you would with an argparse.Namespace object. In the example above, we print the number of epochs with print(args.epochs).
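For readers unfamiliar with the pattern, a dictionary subclass with attribute access can be sketched in a few lines. The class name AttrDict is hypothetical and this is not the ParsedOptions implementation, only an illustration of the idea.

```python
class AttrDict(dict):
    """Hypothetical dict subclass whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # so regular dict methods such as .items() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


args = AttrDict({"amp": False, "batch_size": None, "epochs": 180})
print(args.epochs)     # attribute access
print(args["epochs"])  # plain dictionary access still works
```

Raising AttributeError for unknown names keeps helpers such as hasattr() and getattr() with a default behaving as they do for ordinary objects.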
Warning
One caveat of docopt_main is that it immediately calls the decorated function, so the function must be defined after everything else in your script. If this is not what you want, you can use docopt_arguments instead and add the if __name__ == "__main__" guard yourself as usual. It does the same thing, but doesn’t call the decorated function.
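The ordering constraint is a general property of any decorator that calls its function at definition time. The following sketch uses a made-up decorator, call_immediately, as a stand-in; it does not use or reproduce docopt_main.

```python
def call_immediately(func):
    """Hypothetical decorator that runs the function as soon as it is defined."""
    func()
    return func


calls = []

try:
    @call_immediately
    def too_early():
        calls.append(helper())  # helper does not exist yet at this point
except NameError as error:
    calls.append(f"failed: {error}")


def helper():
    return "helper ran"


@call_immediately
def late_enough():
    calls.append(helper())  # fine: helper is defined above


print(calls)
```

The first function fails because it runs before helper is defined, while the second succeeds, which is why an immediately-called main function belongs at the end of the script.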
Argument types
POSIX help texts do not define types for arguments, so docopt simply returns parsed values as strings. While this is completely safe, it is quite annoying to use in practice and undermines the main reason why we opted to use docopt in the first place: to parse arguments without running code.
SQLTrack therefore includes a mechanism to guess types that we believe to be reasonable. First, we try a suffix match of the argument name against a number of explicit conversion functions. If all of these fail, we try to convert the value to a number type (integer, float, complex) and finally to parse it as JSON.
Here’s an overview of all conversions that are attempted by default:
name matches *int → int
name matches *float → float
name matches *complex → complex
name matches *path → pathlib.Path
name matches *json → json.loads()
name matches *str → str
try int
try float
try complex
try json.loads()
The final trial and error stage is fixed. While you can append new conversions (or replace existing ones, without changing the order) with sqltrack.args.register_conversion(), we recommend you don’t, as your changes to the conversion logic cannot be replicated without running your code.
Hint
You can use suffix matching to avoid edge cases. E.g., to avoid the conversion of --version 3.0 to float, use --versionstr 3.0 instead.
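The conversion order described in this section can be sketched with the standard library alone. This is an illustration under stated assumptions, not SQLTrack's implementation: the names SUFFIX_CONVERSIONS, FALLBACKS, and guess_type are hypothetical, a matching suffix is assumed to decide the conversion outright, and a value that survives every attempt is assumed to stay a string.

```python
import json
from fnmatch import fnmatchcase
from pathlib import Path

# Suffix patterns in the order listed above (hypothetical table).
SUFFIX_CONVERSIONS = [
    ("*int", int),
    ("*float", float),
    ("*complex", complex),
    ("*path", Path),
    ("*json", json.loads),
    ("*str", str),
]

# Converters tried in order when no suffix pattern matches.
FALLBACKS = [int, float, complex, json.loads]


def guess_type(name, value):
    """Convert a string by argument-name suffix, else by trial and error."""
    if not isinstance(value, str):
        return value  # switches (bool) and missing values (None) are left alone
    for pattern, convert in SUFFIX_CONVERSIONS:
        if fnmatchcase(name, pattern):
            return convert(value)  # assumption: a suffix match is final
    for convert in FALLBACKS:
        try:
            return convert(value)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            continue
    return value  # assumption: unconvertible values stay strings


print(guess_type("epochs", "180"))
print(guess_type("version", "3.0"))
print(guess_type("versionstr", "3.0"))
```

In this sketch the name version falls through to the trial and error stage and becomes a float, while versionstr matches the *str pattern and remains a string, which is the edge case the hint describes.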
Limitations
Multiple values
Many argument parsers (like argparse) allow arguments with multiple values. One argument with three values could be represented on the command line as --arg 1 2 3. Docopt does not support this, as it always expects argument-value pairs if the argument is not a simple switch.
You can instead specify that an argument may be repeated in the usage part of the help text like so:
Usage: example [options] [--arg VALUE...]
The equivalent command line would then be --arg 1 --arg 2 --arg 3. Values for repeatable arguments are passed as lists, even if there is only one value.
Another alternative that avoids repeating the argument name is to use the conversion from JSON built into SQLTrack, e.g., --arg '[1, 2, 3]' (quoted, so that the shell passes the list as a single value). In this case you should not specify the argument as repeatable, or else the result would be a nested list [[1, 2, 3]].
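The difference between the two approaches, and the reason the JSON value needs quoting, can be checked with the standard library. This sketch only mimics shell word splitting with shlex and JSON conversion with json.loads; applying the conversion to each element of a repeatable argument is an assumption about how the nested list arises.

```python
import json
import shlex

# Unquoted, a POSIX shell splits the JSON list into three separate words.
unquoted = shlex.split("--arg [1, 2, 3]")

# Quoted, the whole list arrives as a single value.
quoted = shlex.split("--arg '[1, 2, 3]'")

# Non-repeatable argument: one string, converted once.
single = json.loads(quoted[1])

# Repeatable argument: values are collected in a list first, so converting
# each element (assumed behavior) produces a nested list.
repeated = [json.loads(value) for value in [quoted[1]]]

# The repeatable form spelled out: --arg 1 --arg 2 --arg 3
words = shlex.split("--arg 1 --arg 2 --arg 3")
spelled_out = [json.loads(value) for value in words[1::2]]

print(unquoted)
print(single)
print(repeated)
print(spelled_out)
```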