You’ve built your Flask web app and are working on deploying the site - either on Heroku or on a VPS of your choice. It’s your first, small app and you kinda expected that setting debug to False in the app.run call should be enough. Maybe enable threaded too?
You really shouldn’t rely on that. The official docs advise against it as well.
While lightweight and easy to use, Flask’s built-in server is not suitable for production as it doesn’t scale well and by default serves only one request at a time.
What now? Well, no need to be confused. All is fine, you just need to understand what the Flask development web server is meant for, what it lacks and what to use instead.
Flask’s Built-In Web Server
The built-in Flask web server is provided for development convenience.
With it you can make your app accessible on your local machine without having to set up other services and make them play together nicely. However, it is only meant to be used by one person at a time, and is built this way. It can also serve static files, but does so very slowly compared to tools which are built to do it quickly. This does not matter when only one person is accessing it, so it’s perfect for what it is meant for.
When running a web app in production, you want it to be able to handle multiple users and many requests, without those fine people having to wait noticeable amounts of time for the pages and static files to load.
A Production Stack
A production setup usually consists of multiple components, each designed and built to be really good at one specific thing. They are fast, reliable and very focused.
Communication with the whole thing, as in the case of the built-in web server, happens via HTTP. A request comes in and arrives at the first component - a dedicated web server. It is great at reading static files from disk (your CSS files for example) and handling multiple requests. When a request is not for a static file, but meant for your app, it gets passed on down the stack.
The application server gets those fancy requests and converts the information from them into Python objects which are usable by frameworks. How this is supposed to happen is described by a specification people agreed on - WSGI.
Your Flask app does not actually run as you would think a server would - waiting for requests and reacting to them. It can be seen as a function which is called by the application server, being provided the request object.
The output of running your app is then packaged up into an HTTP response by the application server and passed back to the web server to be delivered back to the user.
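To make the "app as a function" idea a bit more tangible, here is a rough sketch using only Python's standard library. The plain function app is a stand-in for your Flask app object (which can be called the same way under WSGI), and handle_request is a made-up helper that plays the application server's part for a single request. A real application server like Gunicorn does far more than this, so treat it as an illustration of the flow rather than how any particular server is implemented.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Stand-in for a Flask app object: a callable following the WSGI spec."""
    body = f"Hello from {environ['PATH_INFO']}".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def handle_request(wsgi_app, path):
    """Hypothetical helper: play the application server for one request."""
    # Describe the incoming request as a plain dict, as WSGI expects.
    environ = {"PATH_INFO": path, "REQUEST_METHOD": "GET"}
    setup_testing_defaults(environ)  # fills in the remaining required keys

    captured = {}

    def start_response(status, response_headers, exc_info=None):
        # The app reports its status line and headers through this callback.
        captured["status"] = status
        captured["headers"] = response_headers

    # Call the app like a function and collect the body chunks it returns.
    body = b"".join(wsgi_app(environ, start_response))

    # Package status line, headers and body into raw HTTP response bytes.
    head = [f"HTTP/1.1 {captured['status']}"]
    head += [f"{name}: {value}" for name, value in captured["headers"]]
    return "\r\n".join(head).encode("latin-1") + b"\r\n\r\n" + body


response = handle_request(app, "/hello")
print(response.decode("utf-8"))
```

Notice that app never opens a socket or waits for anything. It only runs when something calls it, which is exactly the job the application server takes over in production.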
Which Means
If you want to run Flask in production, be sure to use a production-ready web server like Nginx, and let your app be handled by a WSGI application server like Gunicorn.
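As a starting point, a Gunicorn configuration could look something like the sketch below. Gunicorn config files are plain Python, and bind and workers are two of its settings. The file name gunicorn.conf.py, the address and the worker count are assumptions for illustration, so adjust them to your own setup.

```python
# gunicorn.conf.py - hypothetical file name and example values
import multiprocessing

# Listen on localhost only; the web server (Nginx, for example)
# is the one that faces the outside world and forwards requests here.
bind = "127.0.0.1:8000"

# Run several worker processes so more than one request can be
# handled at a time. (2 x cores) + 1 is a commonly suggested start.
workers = multiprocessing.cpu_count() * 2 + 1
```

Assuming your Flask object is named app and lives in app.py, you could then start the application server with gunicorn -c gunicorn.conf.py app:app and point your web server at the address from bind.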
If you plan on running on Heroku, a web server is provided implicitly. You just need to specify a command to run the application server (again, Gunicorn is fine) in the Procfile.