vsupalov

Will Using Luigi Limit Your Pipeline Performance and Language Choice Flexibility?

January 9, 2016

You are currently looking around for the best foundation for your company’s future data pipeline, and Luigi is looking like a solid choice. But it’s in Python and that’s also the language you have to write all the data processing code in, right? Python is a blast to develop in, but it’s not the fastest language around. Will it hold you back eventually? Does using Luigi mean that all the data crunching needs to happen in Python as well?

No reason to worry. You are not limiting yourself to a single language. While Luigi provides the means to implement data processing code right inside of its Tasks, you don’t have to. Think of it as glue that saves you effort by connecting the parts of your pipeline which do the actual heavy lifting. For one, there is well-supported functionality for running Hadoop-ecosystem jobs and for interacting with Spark, both of which are more than suited to processing huge amounts of data.

There are of course other ways. The simplest one is to use subprocesses to execute other programs from the Python code. You can run programs written in Scala, R, C, Julia, Lua, Ruby, JavaScript, Python 3, or anything else which can be wrapped in a bash script. Luigi will take care of orchestration issues, such as dependency resolution or alerting.

More advanced approaches can involve the awesome power of containers. Using Docker, you can delegate any heavy lifting to code running inside a containerized environment, which in turn can be set up to support any programming language you might need. For an example of a very advanced Luigi-powered setup, check out this excellent talk by Ville Tuulos from AdRoll. He provides a tour of AdRoll’s data processing systems, which use Luigi tasks to orchestrate Docker containers in the AWS cloud. With their setup, AdRoll are able to handle data at petabyte scale while giving their developers a free choice of tools.
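To make the subprocess approach a bit more concrete, here is a minimal sketch of a Luigi task which hands the actual work to another program. The names `run_external` and `ExternalStep` are made up for this illustration, and the command is just a stand-in which prints a line. In a real pipeline it would be your Rscript call, JAR, or shell script. The Luigi import is optional here, so you can try the helper without having Luigi installed.

```python
import os
import subprocess
import sys
import tempfile

try:
    import luigi  # optional, so the helper below can be tried without Luigi
except ImportError:
    luigi = None


def run_external(command, output_path):
    """Run an external program and store its stdout at output_path.

    The output goes to a temporary file first and is moved into place only
    if the program exits successfully, so a failed run never leaves a
    half-written output file behind.
    """
    out_dir = os.path.dirname(os.path.abspath(output_path))
    fd, tmp_path = tempfile.mkstemp(dir=out_dir)
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            # check=True raises CalledProcessError on a non-zero exit code
            subprocess.run(command, stdout=tmp_file, check=True)
        os.replace(tmp_path, output_path)
    except BaseException:
        os.remove(tmp_path)
        raise


if luigi is not None:

    class ExternalStep(luigi.Task):
        """Hypothetical task: the heavy lifting happens in another process."""

        output_path = luigi.Parameter()

        def output(self):
            # Luigi considers the task done once this file exists
            return luigi.LocalTarget(self.output_path)

        def run(self):
            # Stand-in command, replace it with the program you want to run
            command = [sys.executable, "-c", "print('crunched elsewhere')"]
            run_external(command, self.output_path)
```

With Luigi installed, something like `luigi.build([ExternalStep(output_path="result.txt")], local_scheduler=True)` should run the task locally. Because the output file only shows up after the external program has finished successfully, a crashed run will not be mistaken for a completed task.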
