Django setup with rdkit for chemoinformatics studies

by Stéphane

Although RDKit and Django are each well documented on their own, a newcomer can quickly run into trouble getting a simple RDKit + Django application working behind a real Apache backend. This article details how to set them up together and use them in a small demonstration application.

Django setup

We will first create our django sample application using virtualenv and pip.

virtualenv venv
source venv/bin/activate
pip install "django>=1.8,<1.9"

Now that the minimum requirements are met, it is time to start a new project called chemoinformaticsDemo.

django-admin startproject chemoinformaticsDemo

This creates the Django files, including a subdirectory also named chemoinformaticsDemo that holds the project settings. These settings will need updating later; for now we can keep them as they are.

At this point we only need a few directories that will be filled in later.

cd chemoinformaticsDemo
mkdir static
mkdir chemoinformaticsDemo/templates

The default infrastructure for the Django project is now in place (without any RDKit calls yet, see below). To get a proper Django setup, we need to initialize the database for the Django models, which at this step mostly serve authentication management.

python manage.py makemigrations
python manage.py migrate

If you are really impatient, you can already start your application in development mode using:

python manage.py runserver

To see the result you can fire up your browser and point to the location

http://localhost:8000

To get something more specific, you can add a template in chemoinformaticsDemo/templates, for instance called base.html.

chemoinformaticsDemo/templates/base.html

<!DOCTYPE html>
<html lang="en">
<head>
   <meta charset="utf-8">
   <title>{% block title %}chemoinformaticsDemo{% endblock title %}</title>
</head>
<body>
This is the starting page of the project.
   <div id="content">
       {% block content %}{% endblock content %}
   </div>
</body>
</html>
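
Django only finds base.html if its template loader knows about the chemoinformaticsDemo/templates directory. One possible way to arrange this, sketched here under the assumption that settings.py is still the Django 1.8-style file generated by startproject, is to list that directory in the DIRS entry of the TEMPLATES setting:

```python
# chemoinformaticsDemo/settings.py (excerpt)
import os

# Already defined by startproject: the directory that contains manage.py.
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        # Assumption: point Django at the project-level templates directory.
        'DIRS': [os.path.join(BASE_DIR, 'chemoinformaticsDemo', 'templates')],
        # Also look into the templates/ directory of every installed app.
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
```

Adding 'chemoinformaticsDemo' to INSTALLED_APPS would be an alternative, since APP_DIRS makes Django search the templates directory of each installed application.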

The base.html page is displayed through Django's URL mapping, so you have to provide a simple urls.py and views.py, as shown below.

chemoinformaticsDemo/urls.py
from django.conf.urls import include, url
from django.contrib import admin

from chemoinformaticsDemo import views

# Map the site root to the home view and keep the admin site reachable.
urlpatterns = [
   url(r'^$', views.home, name='home'),
   url(r'^admin/', include(admin.site.urls)),
]

The corresponding views.py provides the view function that this URL mapping calls.


chemoinformaticsDemo/views.py
from django.shortcuts import render

def home(request):
    # Render the base template created above as the landing page.
    return render(request, 'base.html')

If you refresh the previously opened http page, you should now get a small message welcoming you:



This is the starting page of the project.

It is now time to add some content to the project, and especially to link rdkit with django.

RdKit installation (http://www.rdkit.org)

This installation was performed on Ubuntu 14.04.4 LTS (64-bit).

First download the latest RDKit tarball (https://sourceforge.net/projects/rdkit/files/latest/download).
Decompress the archive, enter the rdkit directory and create a build directory, then build with the default options. Only the location of RDKit (here under /opt) is specific to this setup:


cd /opt
tar -zxvf RDKit_2016_03_1.tgz
cd rdkit-Release_2016_03_1
mkdir build
cd build
sudo cmake  ..
sudo make install

You will need sudo / root rights to write to the /opt directory.

It is important to make the RDKit libraries available system-wide so that Apache can also find them. The easiest way is to tell the dynamic linker where to search for them, by adding a file to its configuration directory, /etc/ld.so.conf.d/ (https://howtolamp.com/articles/adding-shared-libraries-to-system-library-path/). We create a simple file called rdkit.conf, then update the dynamic loader cache so that the RDKit libraries are picked up. In short:


echo "/opt/rdkit-Release_2016_03_1/lib" | sudo tee /etc/ld.so.conf.d/rdkit.conf
sudo ldconfig

If you want to be sure the library search now contains the RdKit path, just grep for it.


grep -i rdkit /etc/ld.so.cache
Binary file /etc/ld.so.cache matches

RDKit must also be importable from Python programs outside of your Django environment.


Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.append('/opt/rdkit-Release_2016_03_1')
>>> from rdkit import Chem
>>> quit()

Please note that the /lib directory does not appear in the sys.path.append command, because the Python bindings live at the RDKit root.
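
The same check can be scripted, which is handy for running it again later under another account or environment. The sketch below is only an illustration: the helper name rdkit_available is hypothetical, and the optional argument is the RDKit root directory used in the interactive session above.

```python
import importlib
import sys


def rdkit_available(rdkit_root=None):
    """Return True if `from rdkit import Chem` would succeed.

    rdkit_root is an optional directory appended to sys.path first,
    mirroring the sys.path.append call of the interactive session.
    """
    if rdkit_root and rdkit_root not in sys.path:
        sys.path.append(rdkit_root)
    try:
        importlib.import_module("rdkit.Chem")
    except ImportError:
        # Raised when the Python package is missing, and also when its
        # compiled modules cannot load the RDKit shared libraries.
        return False
    return True


if __name__ == "__main__":
    # Optional first command-line argument: the RDKit root directory.
    root = sys.argv[1] if len(sys.argv) > 1 else None
    print("rdkit importable: %s" % rdkit_available(root))
```

A False result with the correct root directory usually points back to the dynamic linker configuration described earlier.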

Testing rdkit inside django

Now that everything is in place, we can test a simple but functional application. First we create a dedicated application called rdkitDemo (and not simply rdkit, otherwise there would be namespace collisions).


python manage.py startapp rdkitDemo
mkdir rdkitDemo/templates

Edit chemoinformaticsDemo/settings.py (using any text editor) to add rdkitDemo to INSTALLED_APPS.
Django now has to take this new application into account, so run the migrations again.
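
As an illustration, the relevant part of settings.py could look like the excerpt below. It assumes the default Django 1.8 settings generated by startproject. The STATICFILES_DIRS entry is an addition that the steps above do not mention; it lets the development server serve files from the project-level static directory created earlier.

```python
# chemoinformaticsDemo/settings.py (excerpt)
import os

# Already defined by startproject: the directory that contains manage.py.
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rdkitDemo',  # the new demonstration application
)

STATIC_URL = '/static/'

# Assumption: serve the project-level static/ directory in development.
STATICFILES_DIRS = (
    os.path.join(BASE_DIR, 'static'),
)
```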


python manage.py makemigrations
python manage.py migrate

Since no changes have been made to the models so far (and therefore to the tables), there should be no error messages and no migrations detected.
To enrich your Django project, we will add new URLs for rdkitDemo and a sample HTML file showing a simple demonstration of RDKit.


update to chemoinformaticsDemo/urls.py
from django.conf.urls import include, url
from django.contrib import admin

from chemoinformaticsDemo import views

urlpatterns = [
   url(r'^$', views.home, name='home'),
   url(r'^admin/', include(admin.site.urls)),
   # New: delegate everything under /rdkit/ to the rdkitDemo application.
   url(r'^rdkit/', include('rdkitDemo.urls')),
]

The corresponding entry in the new application involves its own urls.py file.


rdkitDemo/urls.py
from django.conf.urls import url

from rdkitDemo import views

urlpatterns = [
   url(r'^demo', views.simple_example, name='rdkit_simple_example'),
]

We can now set up a small demonstration of what RDKit can do: starting from the SMILES description of beta-D-galactose found in PubChem (https://pubchem.ncbi.nlm.nih.gov/compound/beta-D-galactose), we build the molecule, generate and optimize a 3D geometry, and save a 2D depiction of the molecule as a PNG file on disk.

To store the PNG file, add a subdirectory to the static directory seen above.


mkdir static/demo
chmod 777 static/demo

Important note: this chmod only matters later, for the Apache deployment. It is not needed for now and should NOT be done on a production server.
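
Instead of opening the directory to everyone, one can check from Python whether the account running the application is actually able to write there. The helper below is a hypothetical sketch (the name ensure_writable_dir is not part of Django or RDKit): it creates the directory if needed and reports whether the current process can write into it.

```python
import os


def ensure_writable_dir(path):
    """Create path if it is missing, then report write access.

    Returns True when the current process may create files in path.
    """
    if not os.path.isdir(path):
        os.makedirs(path)
    # W_OK: may modify the directory; X_OK: may enter it.
    return os.access(path, os.W_OK | os.X_OK)
```

Called with the static/demo directory from the account that runs the server, a False result indicates that ownership or permissions need adjusting before any image can be written.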

Now that everything is in place, we can add the necessary files in rdkitDemo.


rdkitDemo/views.py
import os

from django.shortcuts import render
from django.conf import settings

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit import DataStructs
from rdkit.Chem.Fingerprints import FingerprintMols
from rdkit.Chem import Draw

from PIL import Image
# Create your views here.

def simple_example(request):
   # SMILES code taken from PubChem (beta-D-galactose)
   smiles_data = 'C([C@@H]1[C@@H]([C@@H]([C@H]([C@@H](O1)O)O)O)O)O'
   m = Chem.MolFromSmiles(smiles_data)
   # data = Chem.MolToMolBlock(m) # Unused, not pretty to display (yet)
   # 3D conversion and optimization, on a copy with explicit hydrogens
   m2 = Chem.AddHs(m)
   AllChem.EmbedMolecule(m2)
   AllChem.MMFFOptimizeMolecule(m2) # more complex, better?
   # AllChem.UFFOptimizeMolecule(m2) # faster, weaker :-)
   # Save the 2D depiction of the original molecule in the static directory
   Draw.MolToFile(m, os.path.join(settings.BASE_DIR, 'static', 'demo', 'test.png'))
   fp = FingerprintMols.FingerprintMol(m) # computed as an example, not used
   return render(request, 'demo.html', {'data': smiles_data})



To display the result of the view, a new template is created, based on the base.html file already available.


rdkitDemo/templates/demo.html
{% extends "base.html" %}

{% load static %}

{% block title %}Simple rdkit Demo{% endblock title %}

{% block content %}

Welcome to this demo project for the molecule described by {{ data }}.<br />

<img src="{% static 'demo/test.png' %}" alt="Generated image" />

<footer style="text-align:right">
<a href="{% url 'home' %}">Back</a>
</footer>

{% endblock content %}

If everything has worked so far, you can start your Django project again in development mode. Remember to update your manage.py file so that it extends the system path for RDKit. Also pay attention to whitespace if you copy/paste the code above, since Python uses spaces (or tabs) for indentation to determine logical blocks.


manage.py
#!/usr/bin/env python
import os
import sys

if __name__ == "__main__":
   os.environ.setdefault("DJANGO_SETTINGS_MODULE", "chemoinformaticsDemo.settings")
   sys.path.append('/opt/rdkit-Release_2016_03_1')

   from django.core.management import execute_from_command_line

   execute_from_command_line(sys.argv)

You should now have a working RDKit + Django installation in Django's development mode.


(venv) python manage.py runserver

Reminder: we are still in the virtual environment created at the beginning. If you get errors about PIL not being found, just install it using pip, and do the same for any other missing dependency.


pip install Pillow

Only one more step is needed to move this small application to a real server available to the community.

Migration to apache

If everything is in place, you can now configure Apache to use your virtual environment and RDKit. Assuming the virtual environment was stored in /home/stephane/example, one needs to write the following Apache configuration file.


/etc/apache2/sites-available/rdkitDemo.conf
<VirtualHost *:80>

   ServerName localhost
   ServerAlias localhost

   # Very important, see https://code.google.com/archive/p/modwsgi/wikis/ApplicationIssues.wiki#Python_Simplified_GIL_State_API
   WSGIApplicationGroup %{GLOBAL}

   # https://docs.djangoproject.com/en/dev/howto/deployment/wsgi/modwsgi/

   WSGIDaemonProcess essaiRdkit python-path=/home/stephane/example/chemoinformaticsDemo:/opt/rdkit-Release_2016_03_1:/home/stephane/example/venv/lib/python2.7/site-packages display-name=essaiRdkit
   WSGIProcessGroup essaiRdkit
   WSGIScriptAlias /essai '/home/stephane/example/chemoinformaticsDemo/chemoinformaticsDemo/wsgi.py'

   <Directory /home/stephane/example/chemoinformaticsDemo/>
       WSGIProcessGroup essaiRdkit
       <Files wsgi.py>
           Options +ExecCGI -MultiViews +FollowSymLinks
           AllowOverride None
           Require all granted
       </Files>
   </Directory>

   Alias /static /home/stephane/example/chemoinformaticsDemo/static/

   <Location /static/>
       WSGIProcessGroup essaiRdkit
       Options -Indexes +FollowSymLinks
       Require all granted
   </Location>

   # Dedicated log files, to separate the wheat from the chaff

   ErrorLog ${APACHE_LOG_DIR}/djangoprojects-error.log
   CustomLog ${APACHE_LOG_DIR}/djangoprojects-access.log combined
   
</VirtualHost>
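
For reference, the wsgi.py file targeted by WSGIScriptAlias is the one generated by startproject. Assuming it has not been modified, it should look roughly like the following sketch, where the settings module name follows from the project name used in this article:

```python
"""WSGI entry point for the chemoinformaticsDemo project."""
import os

from django.core.wsgi import get_wsgi_application

# Tell Django which settings module to load before building the application.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "chemoinformaticsDemo.settings")

# mod_wsgi looks for a module-level callable named "application".
application = get_wsgi_application()
```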

Remember that we opened the door to Apache writes with the chmod 777 on the demo subdirectory earlier. THIS HAS TO BE CHANGED by using a proper location that the www-data user can access.

Everything is now in place: enable the new Apache virtual host and reload the Apache configuration.


sudo a2ensite rdkitDemo
sudo service apache2 reload

You should now have a great Django web site to play with: just integrate your RDKit applications!
To check that everything is up and running, open your browser at the location


http://localhost/essai/

If you run into trouble, check the Apache logs in /var/log/apache2/djangoprojects-error.log; they are kept separate to make mistakes and copy/paste errors easier to identify.
Enjoy.