Create Luigi Task Dependencies Without Having to Handle Their Output
Suppose you have a Luigi task which already requires() data from another task, but which should also wait until a set of other tasks have finished. No need to get hacky!
You can override the _requires() method in a task to specify additional tasks which must finish first. Unlike the tasks listed in requires(), these tasks are not passed to your task's input(), so you don't have to handle their output data. Here is an example of overriding _requires():
```python
# flatten() collapses a potentially multi-nested collection
# (lists, tuples, dicts, ...) into a flat list
from luigi.task import flatten

# in a class based on luigi.Task
def _requires(self):
    # this method needs to return an iterable
    # which contains the _requires() of the superclass
    return flatten([
        # a few tasks you'd like to run before this one, but
        # whose output data you don't want to use here
        SomeTaskWhichYouWantToRunBeforeThisOne(),
        SomeOtherTask(),
        YetAnotherTask(),
        # important! "CLASSNAME" needs to be adjusted to your class
        # (on Python 3, a plain super()._requires() works as well)
        super(CLASSNAME, self)._requires(),
    ])
```
You can find the original method and its documentation in the Luigi GitHub repository (luigi/task.py). The final call to the superclass _requires() matters: the default implementation returns the flattened result of requires(), so including it ensures that the tasks from your regular requires() method are still scheduled as dependencies and nothing breaks in interesting ways.
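To see why that super call matters, here is a minimal, stdlib-only toy model of the relevant parts of a Luigi task. It is not Luigi itself: the base class below is a simplified imitation of how luigi.Task derives input() from requires() and the scheduler's dependency list (deps()) from _requires(). The task names (Data, Cleanup, Notify, Report, BrokenReport) and the string "targets" are hypothetical stand-ins, and real Luigi's input() preserves the structure of requires() instead of always returning a list.

```python
# Toy model, NOT luigi itself: a simplified imitation of how
# luigi.Task wires requires(), _requires(), input() and deps().


def flatten(struct):
    """Collapse nested lists/tuples/dicts into a flat list (simplified)."""
    if struct is None:
        return []
    if isinstance(struct, dict):
        struct = list(struct.values())
    if isinstance(struct, (list, tuple)):
        flat = []
        for item in struct:
            flat += flatten(item)
        return flat
    return [struct]


class Task:
    def requires(self):
        return []  # default: no data dependencies

    def _requires(self):
        # base implementation: the dependencies are exactly requires()
        return flatten(self.requires())

    def output(self):
        return f"{type(self).__name__}.out"  # hypothetical stand-in for a Target

    def input(self):
        # only requires() feeds into input()
        return [task.output() for task in flatten(self.requires())]

    def deps(self):
        # what the scheduler waits for: everything returned by _requires()
        return flatten(self._requires())


class Data(Task):
    pass


class Cleanup(Task):
    pass


class Notify(Task):
    pass


class Report(Task):
    def requires(self):
        return Data()  # the data this task actually consumes

    def _requires(self):
        # extra tasks to wait for, plus the superclass result (i.e. requires())
        return flatten([Cleanup(), Notify(), super()._requires()])


class BrokenReport(Report):
    def _requires(self):
        # forgets the super() call: Data silently drops out of deps()
        return flatten([Cleanup(), Notify()])


if __name__ == "__main__":
    print([type(t).__name__ for t in Report().deps()])        # ['Cleanup', 'Notify', 'Data']
    print(Report().input())                                   # ['Data.out']
    print([type(t).__name__ for t in BrokenReport().deps()])  # ['Cleanup', 'Notify']
```

In this model, Report waits for all three tasks but only receives Data's output through input(). BrokenReport still reads Data's output, yet nothing makes the scheduler wait for Data, which is exactly the kind of subtle breakage the super call prevents.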