vsupalov


Create Luigi Task Dependencies Without Having to Handle Their Output

How to use the _requires() function to create non-input dependencies for a Luigi task.

December 4, 2015 [ luigi ]

Say you have a Luigi task which already requires() data from another task, but it should also wait until a set of other tasks has finished beforehand. No need to get on the hacky side of things!

The _requires() method can be overridden in a task to specify additional tasks which should finish first. However, unlike the tasks listed in requires(), you will not have to handle their output data. Here is an example of using the _requires() method:

# this function is used to collapse a
# potentially multi-nested item collection (lists, ...)
from luigi.task import flatten

# in a Luigi Task based class
def _requires(self):
    # this method needs to return an iterable
    # which contains the _requires of the superclass
    return flatten([
        # a few tasks you'd like to run before this one but
        # honestly don't want to use their output data here
        SomeTaskWhichYouWantToRunBeforeThisOne(),
        SomeOtherTask(),
        YetAnotherTask(),
        # important! "CLASSNAME" needs to be adjusted
        super(CLASSNAME, self)._requires()
        ])

You can find the original method and its documentation directly in the Luigi GitHub repository. The final call to the superclass _requires() method ensures that the tasks specified in your usual requires() method are still scheduled as dependencies, so nothing breaks in interesting ways.
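
To see why that superclass call matters, here is a minimal sketch in plain Python that imitates the mechanism without importing Luigi. MiniTask and its flatten() helper are simplified stand-ins written for this illustration, not Luigi's real implementation, and the task names (LoadData, WarmUpCache, Report, BrokenReport) are hypothetical. The sketch assumes what the snippet above relies on: the scheduler's dependency list is built from _requires(), while the data a task reads is built from requires() alone.

```python
def flatten(items):
    """Collapse nested lists/tuples into one flat list (simplified stand-in)."""
    if isinstance(items, (list, tuple)):
        flat = []
        for item in items:
            flat.extend(flatten(item))
        return flat
    return [items]


class MiniTask:
    """Toy stand-in for a Luigi task, for illustration only."""

    @property
    def name(self):
        return type(self).__name__

    def requires(self):
        # tasks whose output this task wants to read
        return []

    def _requires(self):
        # default: wait for exactly the tasks listed in requires()
        return flatten(self.requires())

    def deps(self):
        # what the scheduler waits for before running this task
        return flatten(self._requires())

    def input(self):
        # what the task gets to read: derived from requires() only
        return [task.name for task in flatten(self.requires())]


class LoadData(MiniTask):
    pass


class WarmUpCache(MiniTask):
    pass


class Report(MiniTask):
    def requires(self):
        return LoadData()

    def _requires(self):
        # extra non-input dependency, plus everything from requires()
        return flatten([WarmUpCache(), super()._requires()])


class BrokenReport(Report):
    def _requires(self):
        # the superclass call is missing, so LoadData is no longer awaited
        return flatten([WarmUpCache()])


def dep_names(task):
    """Names of all tasks the scheduler would wait for."""
    return [dep.name for dep in task.deps()]
```

In this sketch, dep_names(Report()) lists both WarmUpCache and LoadData, while Report().input() only contains LoadData. BrokenReport still expects LoadData as input, but its dependency list no longer includes it, which is the kind of interesting breakage the superclass call prevents.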
