Wednesday, February 1, 2012

Changing Python search path - for good. Or: the magic of the 'site' package.

After using Python for about two years now and being a somewhat active developer, I still frequently run into problems with my Python search path.
Luckily, I usually have root on the boxes I work on, so I could do some hacks.

At the moment, I'm at IST Austria, where I can use the cluster, but I don't have root. So this time I needed a real solution.

Here it goes:
The problem is the following: I have some locally installed packages, like scikit-learn and joblib, that are not globally installed. That would be easy enough to solve by setting the PYTHONPATH environment variable in my .profile, pointing to the install location.

But I also have newer versions of already installed packages, like IPython. Here, setting the PYTHONPATH environment variable didn't help: the globally installed version still came earlier in the search path than my own. A somewhat hacky solution is to insert your package dir into the search path at the beginning of each script, like so:
import sys
sys.path.insert(0, "MYPATH")
That is somewhat nasty, as you have to insert it into every file. Also, I found it gives you trouble when trying to use parallel computing.
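If you do go with the hack, a small helper keeps it at least idempotent. This is only a sketch: the function name prepend_to_sys_path is made up for illustration, and "MYPATH" is the same placeholder as above. Note that it only changes sys.path of the interpreter process it runs in.

```python
import os
import sys


def prepend_to_sys_path(directory):
    """Put `directory` at the front of sys.path, dropping any earlier copies.

    Hypothetical helper for illustration; it affects only the current process.
    """
    directory = os.path.abspath(os.path.expanduser(directory))
    # Remove existing occurrences so repeated calls do not pile up duplicates.
    sys.path[:] = [p for p in sys.path if p != directory]
    sys.path.insert(0, directory)
    return directory


# "MYPATH" is the placeholder used in the post, not a real location.
added = prepend_to_sys_path("MYPATH")
prepend_to_sys_path("MYPATH")  # calling it twice is harmless
print(sys.path[0] == added, sys.path.count(added))
```

You still have to call it from every script, which is exactly the annoyance described above.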

So here is the REAL solution:
*drumroll*
Check out the site package. It tells you how to configure "sites", which are places where Python looks for packages.
Additionally, "site" directories can hold ".pth" files. Those are files that tell Python where to look for additional packages.
You can add sites with site.addsitedir, but each user also has a standard site. On Linux it is
~/.local/lib/pythonX.Y/site-packages
You can also get this dir by looking at site.USER_SITE. What I did was create a link at
~/.local/lib/pythonX.Y/site-packages
to point at my local install location.
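To find out where the user site lives on a given machine, something like the following should do. It is a minimal sketch assuming a Python 3 interpreter; site.getusersitepackages() is newer than the Python 2.6 used in this post, where site.USER_SITE is the thing to look at.

```python
import site
import sys

# Ask the site module for the per-user site-packages directory.
user_site = site.getusersitepackages()

print("user site:      ", user_site)
# ENABLE_USER_SITE is False or None when the user site is disabled.
print("enabled:        ", site.ENABLE_USER_SITE)
# The directory is only put on sys.path at startup if it actually exists.
print("on sys.path now:", user_site in sys.path)
```

If the last line prints False, the directory most likely does not exist yet or the user site is disabled for this interpreter.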

As not all my packages are installed there (some are git checkouts that are built in place), I also added a .pth file there that points to my other directories.

To make this a bit more concrete:

My locally installed packages are in ~/python_packages/lib/python2.6/site-packages/. So I
$ ln -s ~/python_packages/lib/python2.6/site-packages ~/.local/lib/python2.6/site-packages
And created a new .pth file
~/python_packages/lib/python2.6/site-packages/local.pth
containing:
import sys; sys.__plen = len(sys.path)
/clusterhome/amueller/checkout/joblib
/clusterhome/amueller/checkout/scikit-learn
./IPython
import sys; new=sys.path[sys.__plen:]; del sys.path[sys.__plen:]; p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; sys.__egginsert = p+len(new)
The first and last lines are taken from easy-install.pth; together they move the newly added paths to the front of the search path. (Only lines starting with import are executed in .pth files, and each such line has to be a single physical line.) The other lines add directories to the path. You can check whether you were successful in IPython:
In [1]: import sys

In [2]: sys.path
Out[2]: 
['',
 '/clusterhome/amueller/python_packages/bin',
 '/clusterhome/amueller/.local/lib/python2.6/site-packages/pyflakes-0.5.0-py2.6.egg',
 '/clusterhome/amueller/checkout/joblib',
 '/clusterhome/amueller/checkout/scikit-learn',
 '/clusterhome/amueller/.local/lib/python2.6/site-packages/IPython',
 '/usr/local/lib/python2.6/dist-packages/scikit_learn-0.9-py2.6-linux-x86_64.egg',
 '/usr/lib/python2.6',
 '/usr/lib/python2.6/plat-linux2',
 '/usr/lib/python2.6/lib-tk',
 '/usr/lib/python2.6/lib-old',
 '/usr/lib/python2.6/lib-dynload',
 '/clusterhome/amueller/.local/lib/python2.6/site-packages',
 '/usr/local/lib/python2.6/dist-packages',
 '/usr/lib/python2.6/dist-packages',
 '/usr/lib/python2.6/dist-packages/PIL',
 '/usr/lib/pymodules/python2.6',
 '/usr/lib/pymodules/python2.6/gtk-2.0',
 '/usr/lib/python2.6/dist-packages/wx-2.8-gtk2-unicode',
 '/clusterhome/amueller/.local/lib/python2.6/site-packages/IPython/extensions']

Success :) Hope that helped anyone!
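If you want to try the .pth mechanism without touching your real setup, here is a minimal sketch that builds a throwaway site directory and loads it with site.addsitedir. The directory names (my_site, checkout_pkg, does_not_exist) are invented for the demo, and I'm assuming an interpreter that still executes import lines in .pth files. The two import lines are the ones from local.pth above, each written as one line.

```python
import os
import site
import sys
import tempfile

# The two "import" lines from local.pth, each as ONE physical line.
REMEMBER_LEN = "import sys; sys.__plen = len(sys.path)"
MOVE_TO_FRONT = (
    "import sys; new=sys.path[sys.__plen:]; del sys.path[sys.__plen:]; "
    "p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; "
    "sys.__egginsert = p+len(new)"
)


def demo_pth():
    """Create a temporary site dir with a local.pth file and register it."""
    base = os.path.realpath(tempfile.mkdtemp())
    site_dir = os.path.join(base, "my_site")        # hypothetical site directory
    checkout = os.path.join(base, "checkout_pkg")   # stands in for a git checkout
    missing = os.path.join(base, "does_not_exist")  # never created on purpose
    os.makedirs(site_dir)
    os.makedirs(checkout)

    lines = [REMEMBER_LEN, checkout, missing, MOVE_TO_FRONT]
    with open(os.path.join(site_dir, "local.pth"), "w") as f:
        f.write("\n".join(lines) + "\n")

    # Appends site_dir to sys.path, then processes every .pth file inside it.
    site.addsitedir(site_dir)
    return site_dir, checkout, missing


site_dir, checkout, missing = demo_pth()
print("site dir index:", sys.path.index(site_dir))
print("checkout index:", sys.path.index(checkout))
print("missing dir on path:", missing in sys.path)
```

The checkout directory should end up ahead of the site directory itself, because the last line of the .pth file moves it forward, while the path that does not exist is skipped.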

6 comments:

  1. I feel virtualenv is somewhat overkill for this simple problem. I just want one environment, which is mine.

    1. That is true. But wouldn't a virtualenv be easier to back up and transfer to other machines?

  2. I think using virtualenv has a lot of advantages.

    The first off the top of my head would be the ability to replicate your environment very quickly, using pip freeze to get all the packages, and then upgrade only the one you need to try, without losing your current working environment.

    Virtualenvwrapper also makes it super easy to create a new virtualenv and switch between them.

    However you knew the package already, so maybe I'm missing something here!

    1. Maybe I should use virtualenv. To be honest, I never really used it. I try to keep things simple and just adding some directories to my search path was all I wanted ;)
