vsupalov

A Luigi Task Which Uses Multiple Outputs of Other Tasks

An easy way to combine similar data from multiple Luigi tasks.

January 9, 2016 [ luigi ]

If you want a single Luigi Task to consume data from several dependencies, return a list of Tasks from its requires() method. Each Task can take its own parameters, and self.input() then returns their outputs as a list in the same order, so you can process them one after the other:

def requires(self):
    return [
        ATask(),
        AnotherTask(),
        OneMore(with_a_parameter)
    ]

Alternatively, you can yield the required tasks. The effect is the same, only the syntax differs slightly. Yielding is also handy when the set of required tasks is computed on the fly, for example inside a loop.

def requires(self):
    yield ATask()
    yield AnotherTask()
    yield OneMore(with_a_parameter)

Make sure you can process the outputs of all the tasks involved; ideally, they share a similar structure. If you don't need the data from the other tasks and only want them to have finished beforehand, requires() is still the right place to declare them: simply don't read self.input() in run().
