Saturday, October 24, 2020

Generative Programming with Templates in ARIA

 


generative
/ˈdʒɛn(ə)rətɪv/
adjective
  1. relating to or capable of production or reproduction.

programming
/ˈprəʊɡramɪŋ/
noun
  1. the process or activity of writing computer programs.

template
/ˈtɛmpleɪt,ˈtɛmplət/
noun
  1. a shaped piece of rigid material used as a pattern for processes such as cutting out, shaping, or drilling.


From two of these definitions we could say a generative program is a program that literally gives birth to another program. So generative programming, at its simplest, is using one computer program to write another program, which you then run. Though you might think this sounds complicated and esoteric, it is often a useful technique that allows us to control other programs, decouple one program from another, and run complex processes.

In fact most programmers will already be aware of the process, as it is fundamental to how programs are built. For example, when we write a program in a compiled programming language we use another program, the compiler, to transform text into another file, the executable, which we then run. In the case of Python, which is both compiled and interpreted, the interpreter converts a Python file <file>.py into a byte code file <file>.pyc, which the interpreter then reads and executes. This avoids having to recompile each time the code is run if there is no change in the text of the .py file.
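The same idea can be seen in miniature inside a single Python process. The sketch below is only an illustration (the names source_text, code_object and namespace are made up for this example): it holds the text of a tiny program in a string, compiles it to byte code with the built-in compile(), and runs the result with exec().

```python
# Text of a small program, held as an ordinary string.
source_text = "result = 6 * 7\n"

# compile() turns the text into a code object (Python byte code).
code_object = compile(source_text, "<generated>", "exec")

# exec() runs the code object; any names it defines end up in `namespace`.
namespace = {}
exec(code_object, namespace)

print(namespace["result"])
```

Here the "generated program" never touches the disk, but the steps are the same ones used in the rest of this post: produce program text, then hand it to something that can run it.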

The picture below shows the general case for two Python interpreters. Python interpreter A takes a template and uses it, along with some programming logic in the script generating_script.py, to produce generated_script.py. The script generated_script.py is then run by interpreter B to produce some output.





So how would we do this in a simple Python program, in this case one that produces a csh script? The easiest way is with strings, dictionaries and templates. Consider this Python program.

Python-[template.py]:------------------------------------------------

#!/bin/python

template = '''

    #!/bin/csh


    echo the template parameter a was %(a)s

'''

parameters = {'a' : 'wibble'}

text = template % parameters

with open('my_script.csh', 'w') as file_handle:

    file_handle.write(text)

---------------------------------------------------------:EndPython

Here we create a string template in a variable called template, and a dictionary of replacement text called parameters. We then combine the two using the % operator, which replaces %(a)s in the template with the value stored under the key "a" in the dictionary parameters (here "wibble"). This gives us the final text, which is then written to my_script.csh.

Csh-[my_script.csh]:-------------------------------------------------

    #!/bin/csh


    echo the template parameter a was wibble

--------------------------------------------------------------:EndCsh

If we now run my_script.csh by typing

csh my_script.csh

we get the following printed on the terminal

the template parameter a was wibble
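Two details of the % operator are worth knowing when templating shell scripts. A literal percent sign in the template has to be written as %%, and a placeholder with no matching key in the dictionary raises a KeyError rather than being silently left blank. The short sketch below illustrates both; the template text and the parameter names in it are invented for the example.

```python
# '%%' in the template becomes a single literal '%' in the output.
template = "echo %(name)s is %(percent)s%% complete\n"

text = template % {"name": "wibble", "percent": 50}
print(text, end="")

# A placeholder without a matching dictionary key is an error, so a
# script is never written out with a hole in it.
try:
    template % {"name": "wibble"}
    missing_key = None
except KeyError as error:
    missing_key = error.args[0]

print("missing parameter:", missing_key)
```

Failing early like this is usually what you want from a generator: it is easier to debug a KeyError in the generating script than a half-filled csh script that fails later.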

Of course the generated program can be quite flexible. For example, we could write a program that reads parameters from a file or the command line and does different things depending on what it finds there, or even on more esoteric things such as the status of a device or web page. Consider the following more complicated case from the real world: a toy implementation of the process used to generate a psf file (protein structure file) in the program ARIA, which generates protein structures from NMR data. In this case we will consider what would happen if ARIA used xplor-nih as its structure generation program (currently it uses CNS or Yasara).





Multiple steps occur (note that we are using a simplified version of how things work to make it clearer).

First ARIA reads a project file that kicks the whole process off. Then:

1. ARIA writes a template pdb file which contains sequence data but no usable coordinates (all coordinates are 0.000 0.000 0.000). In this example it is provided in test_templates/data/sequence/hrdc.pdb

2. ARIA writes run.json. This file contains parameters for all the fixed data used by ARIA during the multiple stages of calculation (structure generation and multiple rounds of structure calculation). An example would be something like

Json-[run.json]:-----------------------------------------------------

{
    "data": {
        "pdb_or_sequence": "PDB",
        "initial_pdb": "/Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates/data/sequence/hrdc.pdb",
        "initial_seq": "/Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates/data/hrdc.seq",

        "filenames": {
            "project_root": "/Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates",
            "xplor_root": "/Users/gst9/programs/xplor-nih/2.51",
            "file_root": "hrdc"
        }
    }
}

-------------------------------------------------------------:EndJson

Note this contains all the information about where files can be found and where results should be put.
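Because every later step depends on this file, it can be worth checking it as soon as it is read. The following is a minimal sketch of such a check, not part of ARIA: the function name load_run_parameters is hypothetical, and the paths in the embedded JSON are shortened placeholders standing in for the real ones above.

```python
import json

# Stand-in for the contents of run.json; the paths are placeholders.
run_json_text = """
{
    "data": {
        "pdb_or_sequence": "PDB",
        "initial_pdb": "/path/to/hrdc.pdb",
        "initial_seq": "/path/to/hrdc.seq",
        "filenames": {
            "project_root": "/path/to/project",
            "xplor_root": "/path/to/xplor-nih",
            "file_root": "hrdc"
        }
    }
}
"""

def load_run_parameters(text):
    """Parse the JSON text and check the entries the scripts rely on."""
    data = json.loads(text)["data"]

    required = ("project_root", "xplor_root", "file_root")
    missing = [key for key in required if key not in data["filenames"]]
    if missing:
        raise KeyError("run.json is missing: " + ", ".join(missing))

    if data["pdb_or_sequence"] not in ("PDB", "SEQ"):
        raise ValueError("pdb_or_sequence must be 'PDB' or 'SEQ'")

    return data

run_data = load_run_parameters(run_json_text)
print(run_data["filenames"]["file_root"])
```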

3. ARIA generates a driver csh script that is going to run the structure generation engine xplor-nih, using the Python script make_generate_template.py

Python-[make_generate_template.py]:----------------------------------

import json
import sys


#1.a assume we are in generate_template as working directory
with open('run.json') as fh:
    json_data = json.load(fh)


project_root = json_data['data']['filenames']['project_root']
tmp_dir = project_root + '/tmp'
output_dir = tmp_dir + '/generate_template'

py_xplor = json_data['data']['filenames']['xplor_root'] + '/' + 'bin/pyXplor'

data = {
    "py_xplor" : py_xplor,
    "tmp_dir" : tmp_dir,
    "project_root" : project_root,
}


template = """
# SGE facility
#$ -N generate_template
#$ -S /bin/csh

## results will be stored here
setenv NEWIT ./

## project path
setenv RUN %(project_root)s

## individual run.cns is stored here
setenv RUN_CNS %(tmp_dir)s

## CNS working directory
cd ${RUN_CNS}/generate_template

## solves some NFS sync problems
cat %(project_root)s/protocols/generate_template.py > /dev/null

## command line
%(py_xplor)s %(project_root)s/protocols/generate_template.py >! generate_template.out

touch done
""" % data

with open('generate_template.csh', 'w') as out_file:
    out_file.write(template)

-----------------------------------------------------------:EndPython

Again, in our case we are going to pull parameters from run.json, though in reality the data is pulled from ARIA's internal data structures (effectively the file run1.xml and data distributed with the program, such as force fields). The script is parameterised by:

  1. the location of the xplor-nih distribution on the users computer (xplor_root)
  2. the location of the project directory (project_root)
  3. the root name for the files produced by the project (file_root)
All other required names and paths are then derived from these.
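The derivation of paths from these three values can be collected in one place. The sketch below is one way to do it; the function derive_paths and its dictionary keys are hypothetical names, while the directory layout (tmp/generate_template, bin/pyXplor, protocols/generate_template.py, xplor/begin) is the one used by the scripts in this post. It uses posixpath so the result always has forward slashes, as a csh script expects.

```python
import posixpath

def derive_paths(project_root, xplor_root, file_root):
    """Build every path the scripts need from the three root parameters."""
    tmp_dir = posixpath.join(project_root, "tmp")
    return {
        "tmp_dir": tmp_dir,
        "working_dir": posixpath.join(tmp_dir, "generate_template"),
        "py_xplor": posixpath.join(xplor_root, "bin", "pyXplor"),
        "protocol": posixpath.join(project_root, "protocols",
                                   "generate_template.py"),
        "psf_file": posixpath.join(project_root, "xplor", "begin",
                                   file_root + ".psf"),
    }

# Placeholder roots, chosen only to keep the example short.
paths = derive_paths("/proj", "/opt/xplor", "hrdc")
for name, value in sorted(paths.items()):
    print(name, value)
```

A dictionary like this can be passed straight to the % operator to fill in a template, so the path logic lives in one function rather than being spread through string concatenations.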

4. ARIA runs generate_template.csh, which changes to the directory <project_root>/tmp/generate_template and runs generate_template.py with the pyXplor executable whose path was filled in from <data.filenames.xplor_root>. The Python script now gets <project_root>/tmp/generate_template from os.getcwd(), because generate_template.csh has changed to this directory using cd.

Csh-[generate_template.csh]:-----------------------------------------

# SGE facility
#$ -N generate_template
#$ -S /bin/csh

## results will be stored here
setenv NEWIT ./

## project path
setenv RUN /Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates

## individual run.cns is stored here
setenv RUN_CNS /Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates/tmp

## CNS working directory
cd ${RUN_CNS}/generate_template

## solves some NFS sync problems
cat /Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates/protocols/generate_template.py > /dev/null

## command line
/Users/gst9/programs/xplor-nih/2.51/bin/pyXplor /Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates/protocols/generate_template.py >! generate_template.out

touch done

--------------------------------------------------------------:EndCsh
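How ARIA itself launches this script and notices that it has finished is not shown here; the final touch done line suggests a marker file. Under that assumption, a driver could look something like the sketch below. The function run_and_check is a hypothetical name, and so that the example runs without csh or xplor-nih installed, a small Python one-liner stands in for the real command ["csh", "generate_template.csh"].

```python
import os
import subprocess
import sys
import tempfile

def run_and_check(command, work_dir, marker="done"):
    """Run command with work_dir as its working directory and report
    whether the marker file exists afterwards."""
    # cwd= sets the child's working directory, which it passes on to
    # any programs it starts in turn.
    subprocess.run(command, cwd=work_dir, check=True)
    return os.path.exists(os.path.join(work_dir, marker))

# Stand-in for ["csh", "generate_template.csh"]: it only creates 'done'.
stand_in = [sys.executable, "-c", "open('done', 'w').close()"]

with tempfile.TemporaryDirectory() as work_dir:
    finished = run_and_check(stand_in, work_dir)

print(finished)
```

The stand-in writes done using a relative path, so the file only appears in work_dir because the child process inherited that working directory. This is the same mechanism that lets generate_template.py find run.json in the next step.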

5. protocols/generate_template.py is run. It reads run.json and decides how to run based on the data it finds there. It knows where to find run.json because the file is in the working directory returned by os.getcwd() (remember that the working directory was set by generate_template.csh and is inherited from this parent process). The decisions are:

  1. whether to use a pdb file or a sequence file for structure generation (<data.pdb_or_sequence> in the json file)
  2. where the initial pdb or sequence file is stored (<data.initial_pdb> or <data.initial_seq> in the json file)
  3. where the final data should be stored (derived from <data.filenames.project_root> in the json file)
  4. what the root of the output file name should be (<data.filenames.file_root> in the json file)
  5. where to find the xplor-nih programs (using <data.filenames.xplor_root> in the json file)
generate_template.py creates the psf file by running pdb2psf or seq2psf on the input file defined by run.json, and puts it in <data.filenames.project_root>/xplor/begin/<data.filenames.file_root>.psf

 
Python-[generate_template.py]:---------------------------------------

import json
import os
import subprocess

# 0. let's grab the starting directory from the current working directory
root_directory = os.getcwd()
print(root_directory)

# 1.a read run.json from the working directory
with open(root_directory + '/' + 'run.json') as fh:
    json_data = json.load(fh)

# 1.b pull out the parameters we need
pdb_or_sequence = json_data['data']['pdb_or_sequence']
xplor_root = json_data['data']['filenames']['xplor_root']
out_dir = json_data['data']['filenames']['project_root'] + '/xplor/begin'
psf_file_name = out_dir + '/' + json_data['data']['filenames']['file_root'] + '.psf'

# 2. use seq2psf or pdb2psf to create a psf file <file_root>.psf in the begin directory in project_root
if pdb_or_sequence == 'PDB':
    pdb_file = json_data['data']['initial_pdb']

    pdb2psf = xplor_root + '/bin/pdb2psf'

    subprocess.call([pdb2psf, pdb_file, '-outfile', psf_file_name])

elif pdb_or_sequence == 'SEQ':
    # the key in run.json is "initial_seq"
    sequence_file = json_data['data']['initial_seq']

    seq2psf = xplor_root + '/bin/seq2psf'

    subprocess.call([seq2psf, sequence_file, '-outfile', psf_file_name])

else:
    raise Exception('unexpected choice for sequence source file %(pdb_or_sequence)s' % {'pdb_or_sequence' : pdb_or_sequence})

-----------------------------------------------------------:EndPython
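One possible refinement, offered only as a sketch, is to separate deciding what to run from actually running it. The function build_psf_command below is a hypothetical name; it returns the argument list that would be handed to subprocess, which makes the decision logic easy to test on a machine without xplor-nih. The program names and the -outfile argument are simply copied from the listing above, and the paths in the example dictionary are placeholders.

```python
def build_psf_command(data):
    """Return the argument list for pdb2psf or seq2psf without running it."""
    filenames = data["filenames"]
    psf_file = "%s/xplor/begin/%s.psf" % (filenames["project_root"],
                                          filenames["file_root"])

    # One entry per allowed value of pdb_or_sequence:
    # (program name, key of the input file in run.json).
    choices = {
        "PDB": ("pdb2psf", "initial_pdb"),
        "SEQ": ("seq2psf", "initial_seq"),
    }

    source = data["pdb_or_sequence"]
    if source not in choices:
        raise ValueError(
            "unexpected choice for sequence source file %s" % source)

    program, input_key = choices[source]
    executable = "%s/bin/%s" % (filenames["xplor_root"], program)
    return [executable, data[input_key], "-outfile", psf_file]

# Placeholder data in the same shape as the "data" section of run.json.
example = {
    "pdb_or_sequence": "PDB",
    "initial_pdb": "/proj/hrdc.pdb",
    "initial_seq": "/proj/hrdc.seq",
    "filenames": {
        "project_root": "/proj",
        "xplor_root": "/opt/xplor",
        "file_root": "hrdc",
    },
}

command = build_psf_command(example)
print(command)
```

The returned list can be passed directly to subprocess.call, or to subprocess.run with check=True if a failure of the external program should stop the script.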












