Python 3, Bottle.py, Apache and WSGI - Import Nightmares

At some point in a webdev’s life, one might consider moving away from classic web development in for example PHP, and move on to more convenient frameworks. If you already know a language, learning another one just for web development can also seem like a waste of time. Alternatives might be more elegant, flexible, and better integrated with the things you’re currently writing. I found myself in the position of having to demo my work on author profiling that was completely written in Python 3 (omesa + sklearn). The problem with framework switching, I found, is learning how to do frustratingly arbitrary operations all over again. Moreover, however sluggish LAMP set-ups might seem, they are very robust, well-understood and have plenty of documentation. Changing to more obscure environments like I did with bottle.py can shovel you in the face at any point. It’s been a bumpy road to say the least, so to help any pursuers of this path in the future, I hereby present you with my findings thus far.

Bottle.py

Bottle.py is a very minimalistic web- framework for Python. It is so small that delivering hello world to a port is literally this:

"""file: helloworldinabottle.py"""
from bottle import route, run

@route('/')
def index():
    return "hello world"

run(host='localhost', port=8080)

It works with all your existing code and libraries that Python has. So say that you want to test an already trained SVM classifier with bigram features on data fed by a query it can be as much as:

"""file: some_ml_example.py"""
from bottle import route, run
import sklearn
import pickle

clf = pickle.load(open('/somedir/model.pickle', 'rb'))
big = pickle.load(open('/somedir/bigram_vectorizer.pickLe', 'rb'))

@route('/<query>')
def index(query):
    v = big.transform([query]).toarray()
    return clf.predict(v)[0]

run(host='localhost', port=8080)

That’s right, no POST, no GET, you can directly feed a string to localhost if you desire (you can definitely also use POST and GET). And with these few lines of code I was completely blown away by the elegance of the framework. Everything pretty much peachy, until you actually want to implement your app in an existing web server environment. Chances are huge that this will be in Apache. Traditional protocols for Apache do not handle these kind of apps, though, and so WSGI was brought to life.

Apache & WSGI

Without filling this post with the specifics and motivations behind WSGI, let’s immediately jump into how to configure Bottle.py and Apache to run this stuff. First we need to do a bit of work in our some_ml_example.py that had the bottle code. After renaming it to app.wsgi it should contain the following:

"""file: app.wsgi"""
import os, sys
import bottle
from beaker.middleware import SessionMiddleware

os.chdir(os.path.dirname(__file__))
sys.path.append(os.path.dirname(__file__))

application = SessionMiddleware(bottle.app())

clf = pickle.load(open('/somedir/model.pickle', 'rb'))
big = pickle.load(open('/somedir/bigram_vectorizer.pickLe', 'rb'))

@bottle.route('/<query>')
def index(query):
    v = big.transform([query]).toarray()
    return clf.predict(v)[0]

So obviously we are not serving to a port anymore. Apache will pick up the app and serve it to wherever you wish. The path part is important as the directory orientation changes when everything is loaded into Apache, so your new app still needs to know where all the files are at. Notice that we’ve also added a dependency; beaker. Beaker does… Now that our file is done, we need to configure Apache to fetch it. In Linux, this is done as follows:

cd /etc/apache2/sites-available
nano yourappname.conf

Around the web, I found the tendency for people to explain that adding a bottle application to Apache requires the following config:

WSGIDaemonProcess yourapp user=you group=somegroup processes=1 threads=5
<Directory /home/directory/to/webserver/html/appdir/app.wsgi>
  WSGIProcessGroup somegroup
  WSGIApplicationGroup %{GLOBAL}
  Order deny,allow
  Allow from all
</Directory>

If all is well, and mod_wsgi is installed and loaded into Apache, at least the route("/") we used in helloworldinabottle.py should be working after an Apache restart or reload. Now, however that may be true and may or not work for everyone, it was working fine for my previous websites. Not for this one, though.

The WSGI Deadlock

When I started to work on the demo, and tried to load an example similar to some_ml_example.py that included import sklearn, the bottle page where it was loaded would hang up. Just loading forever, and the Apache error log would not report anything. After searching all the possible query combinations of “apache bottle import module page hangs should I quit building websites starting now”, I found one particular page that described part the issue on stackoverflow. Case in point is what’s called a deadlock; when applications are not assigned to a system-wide application group, they are considered to be an instance of Apache. Now, Apache doesn’t like doing any heavy lifting in terms of loading of interpreting, caching and running processes; that’s why most of that stuff is capped to a maximum amount of CPU and Memory load. This is why imports of very large packages take forever and will probably never resolve in any time that users of your website would want to wait. Now, the issue should actually already be resolved! After all, we included the WSGI Applictiongroup to be %{GLOBAL}, which should do processing on the local system. Apparently that’s not working very well now is it? I carried on looking up what might be the issue, when I struck upon another stackoverflow post that mentioned specifying the Process and Applicationgroup separately in newer versions of mod_wsgi. The changes required for this are again done in the /etc/apache2/sites-enabled/yourapp.wsgi, as follows:

WSGIDaemonProcess yourapp user=you group=somegroup processes=1 threads=5
WSGIApplication /yourapp /dir/to/app/app.wsgi process-group=yourapp application-group=%{GLOBAL}
<Directory /home/directory/to/webserver/html/appdir/app.wsgi>
  WSGIProcessGroup %{GLOBAL}
  WSGIApplicationGroup %{GLOBAL}
  Order deny,allow
  Allow from all
</Directory>

Now after a good sudo service apache2 restart you application should be working as expected! Hope this helped!