Will Using Luigi Limit Your Pipeline Performance and Language Choice Flexibility?
No. You will be able to handle large amounts of data, using whichever tools you need.
You are currently looking around for the best foundation for your company’s future data pipeline, and Luigi is looking like a solid choice. But Luigi is written in Python, so that’s also the language you have to write all the data processing code in, right? Python is a blast to develop in, but it’s not the fastest language around. Will it hold you back eventually? Does using Luigi mean that all the data crunching needs to happen in Python as well?
No reason to worry. You are not limiting yourself to a single language. While Luigi provides the means to implement data processing code right inside its Tasks, you don’t have to. Think of it as glue that saves you effort by tying together the parts of your pipeline that do the actual heavy lifting. For one, Luigi has well-supported integrations for running Hadoop-ecosystem jobs and for interacting with Spark, both of which are more than capable of processing huge amounts of data.
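To make the "glue" idea concrete, here is a minimal sketch of the kind of helper a Task could call to hand the real work to a non-Python program. It uses only the Python standard library and assumes a POSIX `sort` binary is on the PATH; the function name `sort_with_external_tool` and the file paths are made up for illustration and are not part of Luigi.

```python
import os
import subprocess
import tempfile


def sort_with_external_tool(input_path, output_path):
    """Sort a text file with the system `sort` binary instead of Python code.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    out_dir = os.path.dirname(os.path.abspath(output_path))
    # Write to a temporary file first, so a half-finished result
    # is never mistaken for a completed output.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failing tool makes the caller fail instead of passing silently.
            subprocess.run(
                ["sort", input_path],
                stdout=tmp_file,
                check=True,
                env=dict(os.environ, LC_ALL="C"),  # fixed locale for stable ordering
            )
        # Rename into place only after the tool has finished successfully.
        os.replace(tmp_path, output_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Inside a pipeline, a Task’s run method would do little more than call a helper like this with its input and output paths. Python only orchestrates; the sorting itself happens in the external process, and the same pattern applies to any command-line tool, whatever language it is written in.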