Saturday, September 18, 2021

Hiding code in IPython notebooks on binder

Binder is a really neat tool for putting IPython notebooks on the web so anyone else can see them and play with them. All you do is put a link to a repository (most probably GitHub but there are other alternatives ...

However, if you upload a code-heavy IPython notebook you are greeted by something that looks like this:

This isn't great for an interactive teaching environment, so how do you hide the Python code but still retain it for those who are curious? There are lots of answers and I tried a few before I found a reasonable result. So here is my best solution: there is a notebook extension called appmode which works really well, as you can go from this

to this

However, it requires some mucking about with your Binder URLs and converting to a conda-based workflow. So...

1. Add an environment.yml to the root of your GitHub repository. It should contain the packages you use plus appmode (these will be installed from conda). I have

channels:
  - conda-forge
dependencies:
  - appmode
  - numpy
  - scipy
  - ipywidgets
  - matplotlib

2. Then change the format of your binder badge, I started with

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/varioustoxins/fft_demos/HEAD)

and had to alter it to

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/varioustoxins/fft_demos/HEAD?urlpath=apps%2Ffft_window.ipynb)

Note how the end of the URL becomes HEAD?urlpath=apps%2Ffft_window.ipynb rather than just HEAD: apps%2F (%2F is the URL-encoded form of /) is inserted between urlpath= and the path to your IPython notebook (fft_window.ipynb in my case). That's it! The notebook I used this on provides interactive NMR processing demos; check it out on GitHub. More of that another time when it's finished...
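If you have several notebooks to badge, the URL change in step 2 can be scripted. The sketch below uses only the standard library's urllib.parse.quote to percent-encode the apps/<notebook> path (so / becomes %2F) and wraps the result in the badge markdown. The function names are my own, and I'm assuming the same gh/<user>/<repo>/HEAD URL layout as the badge above.

```python
from urllib.parse import quote


def appmode_binder_url(user, repo, notebook, ref="HEAD"):
    """Build a mybinder.org URL that opens `notebook` in appmode.

    safe="" makes quote encode '/' as well, so 'apps/x.ipynb'
    becomes 'apps%2Fx.ipynb'.
    """
    urlpath = quote(f"apps/{notebook}", safe="")
    return f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}?urlpath={urlpath}"


def binder_badge(url):
    """Wrap a Binder launch URL in the standard badge markdown."""
    return f"[![Binder](https://mybinder.org/badge_logo.svg)]({url})"


url = appmode_binder_url("varioustoxins", "fft_demos", "fft_window.ipynb")
print(binder_badge(url))
```

The printed badge should match the altered badge above, so the same helper can be reused for each notebook in the repository.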

Saturday, June 19, 2021

The NEF Header

NEF (NMR Exchange Format¹) files have headers (one per file) that define which programs wrote the file and its history. However, there are a few things that are not clear.

Here’s the header

save_nef_nmr_meta_data
   _nef_nmr_meta_data.sf_category      nef_nmr_meta_data
   _nef_nmr_meta_data.sf_framecode     nef_nmr_meta_data
   _nef_nmr_meta_data.format_name      nmr_exchange_format
   _nef_nmr_meta_data.format_version   1.1
   _nef_nmr_meta_data.program_name     NEFPipelines
   _nef_nmr_meta_data.program_version  0.0.1
   _nef_nmr_meta_data.creation_date    2021-06-19T17:36:39.073848
   _nef_nmr_meta_data.uuid             NEFPipelines-2021-06-19T17:36:39.073848-9006508160

   loop_
      _nef_run_history.run_number
      _nef_run_history.program_name
      _nef_run_history.program_version
      _nef_run_history.script_name

     1   NEFPipelines   0.0.1   header.py    

   stop_

save_

Please note firstly the entries sf_category and sf_framecode: these are mandatory for the frame to be recognised.

The first item that isn’t clear in its format is _nef_nmr_meta_data.creation_date. It appears to be an ISO format date time, and the most reasonable interpretation is that it is a UTC² date time, as there is no time zone information and UTC is the same worldwide. The simple way to make this in Python is

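from datetime import datetime, timezone

# the current time in UTC, written without a time zone offset
utc_date_time = datetime.now(timezone.utc).replace(tzinfo=None).isoformat()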

The second question is: what is the _nef_nmr_meta_data.uuid tag? This is a UUID³ which uniquely identifies this version of the file apart from any other⁴. It has the form NEFPipelines-2021-06-19T17:36:39.073848-9006508160. The first part is obvious, it's our program's name, and the second part is the current time. However, what's the third part, 9006508160? Well, it's just a 10 digit random number to ensure that the uuid is unique (think of creating the file at the same time on multiple threads… without the random number they would all have the same Universally Unique Identifier!).

from random import randint
from datetime import datetime, timezone

utc_date_time = datetime.now(timezone.utc).replace(tzinfo=None).isoformat()
# ten random decimal digits
random_value = ''.join(str(randint(0, 9)) for _ in range(10))
uuid = f'NEFPipelines-{utc_date_time}-{random_value}'
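One thing worth noticing in the example header is that creation_date and the time inside the uuid are identical. A minimal sketch, assuming you want that to hold for the files you write too, reads the clock once and uses the value for both. The function name make_creation_date_and_uuid is hypothetical, and timespec='microseconds' keeps the six fractional digits even when the microsecond count happens to be zero (plain isoformat() drops them in that case).

```python
from datetime import datetime, timezone
from random import randint


def make_creation_date_and_uuid(program_name):
    """Return (creation_date, uuid) built from a single UTC timestamp."""
    # naive ISO-8601 string in UTC, matching the header (no offset)
    utc_date_time = (datetime.now(timezone.utc)
                     .replace(tzinfo=None)
                     .isoformat(timespec='microseconds'))
    # ten random decimal digits to separate files made at the same instant
    random_value = ''.join(str(randint(0, 9)) for _ in range(10))
    return utc_date_time, f'{program_name}-{utc_date_time}-{random_value}'


creation_date, uuid = make_creation_date_and_uuid('NEFPipelines')
print(creation_date)
print(uuid)
```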

Finally there is the loop

   loop_
      _nef_run_history.run_number
      _nef_run_history.program_name
      _nef_run_history.program_version
      _nef_run_history.script_name

     1   NEFPipelines   0.0.1   header.py    

   stop_

This just lists the programs that have edited the file, in order from latest to oldest.

So a complete header would be

data_new

save_nef_nmr_meta_data
   _nef_nmr_meta_data.sf_category      nef_nmr_meta_data
   _nef_nmr_meta_data.sf_framecode     nef_nmr_meta_data
   _nef_nmr_meta_data.format_name      nmr_exchange_format
   _nef_nmr_meta_data.format_version   1.1
   _nef_nmr_meta_data.program_name     NEFPipelines
   _nef_nmr_meta_data.program_version  0.0.1
   _nef_nmr_meta_data.creation_date    2021-06-19T17:36:39.073848
   _nef_nmr_meta_data.uuid             NEFPipelines-2021-06-19T17:36:39.073848-9006508160

   loop_
      _nef_run_history.run_number
      _nef_run_history.program_name
      _nef_run_history.program_version
      _nef_run_history.script_name

     1   NEFPipelines   0.0.1   header.py    

   stop_
...
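Since the header is the same every time apart from a handful of values, it can be produced from a string template with the % operator. This is a sketch, not NEFPipelines' own code: render_nef_header and NEF_HEADER_TEMPLATE are hypothetical names, the frame is closed with save_ as in the first header above, and it assumes no value contains whitespace (NEF is STAR based, so such values would need quoting).

```python
NEF_HEADER_TEMPLATE = '''save_nef_nmr_meta_data
   _nef_nmr_meta_data.sf_category      nef_nmr_meta_data
   _nef_nmr_meta_data.sf_framecode     nef_nmr_meta_data
   _nef_nmr_meta_data.format_name      nmr_exchange_format
   _nef_nmr_meta_data.format_version   %(format_version)s
   _nef_nmr_meta_data.program_name     %(program_name)s
   _nef_nmr_meta_data.program_version  %(program_version)s
   _nef_nmr_meta_data.creation_date    %(creation_date)s
   _nef_nmr_meta_data.uuid             %(uuid)s

   loop_
      _nef_run_history.run_number
      _nef_run_history.program_name
      _nef_run_history.program_version
      _nef_run_history.script_name

%(history_rows)s

   stop_

save_
'''


def render_nef_header(program_name, program_version, creation_date, uuid,
                      history, format_version='1.1'):
    """Fill in the header template.

    `history` is a list of (run_number, program_name, program_version,
    script_name) tuples, one per row of the _nef_run_history loop.
    """
    history_rows = '\n'.join('     %s   %s   %s   %s' % row for row in history)
    return NEF_HEADER_TEMPLATE % {
        'format_version': format_version,
        'program_name': program_name,
        'program_version': program_version,
        'creation_date': creation_date,
        'uuid': uuid,
        'history_rows': history_rows,
    }


header = render_nef_header(
    'NEFPipelines', '0.0.1', '2021-06-19T17:36:39.073848',
    'NEFPipelines-2021-06-19T17:36:39.073848-9006508160',
    [(1, 'NEFPipelines', '0.0.1', 'header.py')])
print('data_new\n\n' + header)
```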

Oh, and one last thing: this header should be renewed each time a program reads and modifies the file, giving a new program name, creation date and uuid, plus an extra line in the _nef_run_history loop.
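Updating the run history on each rewrite can be sketched as a small list operation. This assumes the history is held as (run_number, program_name, program_version, script_name) tuples kept latest to oldest, as described above, and that the new run takes the next run number; renew_history, the version 0.0.2 and the script name assign.py are made up for illustration. The creation_date and uuid would be regenerated in the same way as in the uuid snippet above.

```python
def renew_history(history, program_name, program_version, script_name):
    """Return a new run-history list with an entry for the current program.

    The new entry takes the next run number and is placed first, keeping
    the list ordered latest to oldest. The input list is not modified.
    """
    next_run = max((row[0] for row in history), default=0) + 1
    return [(next_run, program_name, program_version, script_name)] + list(history)


old = [(1, 'NEFPipelines', '0.0.1', 'header.py')]
new = renew_history(old, 'NEFPipelines', '0.0.2', 'assign.py')
for row in new:
    print(row)
```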

NMR Exchange Format a unified and open standard for representation of NMR restraint data. ↩︎
Coordinated Universal Time, the world's primary time standard and the effective successor of Greenwich Mean Time (GMT). ↩︎
UUIDs are Universally Unique Identifiers ↩︎
You could use a hash, but then you would need to know what the hash of the file is before the file is complete, and adding the hash to the file would change the value of the file's hash… ↩︎

Saturday, October 24, 2020

Generative Programming with templates in ARIA


generative

/ˈdʒɛn(ə)rətɪv/

adjective

1.
relating to or capable of production or reproduction.

programming

/ˈprəʊɡramɪŋ/

noun

1.
the process or activity of writing computer programs.

template

/ˈtɛmpleɪt,ˈtɛmplət/

noun

1. a shaped piece of rigid material used as a pattern for processes such as cutting out, shaping, or drilling.

From two of these definitions we could say a generative program is a program that literally gives birth to another program. So a generative program, at its simplest, uses one computer program to write another program which you then run. Though you might think this sounds complicated and esoteric, it's quite often a useful technique that allows us to control other programs, decouple one program from another and run complex processes. In fact most programmers should be aware of the process, as it is fundamental to how programs are built. For example, when we write a program in a compiled programming language we use another program, the compiler, to transform text into another file, the executable, which we then run. In the case of Python, which is both compiled and interpreted, the interpreter converts a Python file <file>.py into a byte code file <file>.pyc, which the interpreter then reads and executes (Python caches this byte code for the modules a script imports). This avoids having to recompile each time the code is run if there is no change in the text of the .py file.
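The .py to .pyc step can be triggered by hand with the standard library's py_compile module, which makes the "one program writes another file" idea concrete. In this sketch hello.py is a throwaway module written to a temporary directory; py_compile.compile returns the path of the byte code file it wrote, normally inside a __pycache__ directory next to the source.

```python
import os
import py_compile
import tempfile

# write a tiny module to a temporary directory
source_dir = tempfile.mkdtemp()
source_path = os.path.join(source_dir, 'hello.py')
with open(source_path, 'w') as fh:
    fh.write("print('hello from hello.py')\n")

# ask Python's own compiler to produce the byte code file;
# doraise=True turns compile errors into exceptions
pyc_path = py_compile.compile(source_path, doraise=True)
print(pyc_path)
```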

The picture below shows the general case for two Python interpreters. Python interpreter A takes a template and uses it, along with some programming logic in the script generating_script.py, to produce generated_script.py. The script generated_script.py is then run by interpreter B to produce some output.

So how would we do this in a simple python program, in this case one that produces a csh script? The easiest way is with strings, dictionaries and templates. Consider this python program.

Python-[template.py]:------------------------------------------------

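#!/usr/bin/env python

# the template: %(a)s marks where the value stored under key 'a' goes
template = '''#!/bin/csh

echo the template parameter a was %(a)s
'''

# the replacement values, keyed by the names used in the template
parameters = {'a': 'wibble'}

text = template % parameters

with open('my_script.csh', 'w') as file_handle:
    file_handle.write(text)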

---------------------------------------------------------:EndPython

Here we create a string template in a variable called template and a dictionary of replacement text called parameters. We then combine the two using the % operator, which replaces %(a)s in the template with the value stored under the key "a" in the dictionary parameters (the value "wibble"), giving us the final text, which is then written to my_script.csh.

Csh-[my_script.csh]:-------------------------------------------------

#!/bin/csh

echo the template parameter a was wibble

--------------------------------------------------------------:EndCsh

If we now run my_script.csh by typing

csh my_script.csh

we get the following printed on the terminal

the template parameter a was wibble
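The same idea works with Python on both sides, as in the interpreter A / interpreter B picture described earlier. In this sketch the generating code fills a template and writes generated_script.py, then starts a second interpreter (a fresh process running sys.executable) to execute it. generate_and_run and the one-line template are illustrative only; values are pasted in verbatim, so a name containing a double quote would break the generated code.

```python
import os
import subprocess
import sys
import tempfile

# the program we want to generate; %(name)s is filled in below
TEMPLATE = '''print("hello %(name)s from the generated script")
'''


def generate_and_run(name, work_dir):
    """Interpreter A: write generated_script.py from the template,
    then run it in a separate interpreter B and return what it printed."""
    generated_path = os.path.join(work_dir, 'generated_script.py')
    with open(generated_path, 'w') as fh:
        fh.write(TEMPLATE % {'name': name})

    # interpreter B is a new process running the same Python executable
    result = subprocess.run([sys.executable, generated_path],
                            capture_output=True, text=True, check=True)
    return result.stdout


output = generate_and_run('world', tempfile.mkdtemp())
print(output, end='')
```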

Of course the generated program can be quite flexible: for example, we could write a program that reads parameters from a file or the command line and does different things depending on what they contain, or even on more esoteric things such as the status of a device or web page. Consider the following more complicated case from the real world, a toy implementation of the process used to generate a psf file (protein structure file) in the program ARIA, which generates protein structures from NMR data. In this case we will consider what would happen if ARIA used xplor-nih as its structure generation program (currently it uses CNS or Yasara).
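For example, the csh generator from template.py could take its parameter from the command line. A minimal sketch, assuming argparse as the parser; render is a hypothetical name, and it returns the output path and script text rather than writing them so it can be tried out directly.

```python
import argparse

# the same template as in template.py
TEMPLATE = '''#!/bin/csh

echo the template parameter a was %(a)s
'''


def render(argv=None):
    """Parse command-line style arguments and return (output_path, script_text).

    With argv=None argparse reads sys.argv, as a real command-line tool would.
    """
    parser = argparse.ArgumentParser(
        description='generate a csh script from a template')
    parser.add_argument('a', help='value for the template parameter a')
    parser.add_argument('--output', default='my_script.csh',
                        help='where to write the generated script')
    args = parser.parse_args(argv)
    return args.output, TEMPLATE % {'a': args.a}


path, text = render(['wibble'])
print(path)
print(text, end='')
```

To turn this into a tool, call render() with no arguments and write text to the returned path, as template.py does.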

Multiple steps occur (note we are using a simplified version of how things work to make it clearer)

First ARIA reads a project file that kicks the whole process off. Then

1. ARIA writes a template pdb file which contains sequence data but no usable coordinates (all coordinates are 0.000 0.000 0.000). This is provided in this example in test_templates/data/sequence/hrdc.pdb.

2. ARIA writes run.json. This file contains parameters for all the fixed data used by ARIA during multiple stages of calculation (structure generation and multiple rounds of structure calculation). An example would be something like

Json-[run.json]:-----------------------------------------------------

{
    "data": {
        "pdb_or_sequence": "PDB",
        "initial_pdb": "/Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates/data/sequence/hrdc.pdb",
        "initial_seq": "/Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates/data/hrdc.seq",
        "filenames": {
            "project_root": "/Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates",
            "xplor_root": "/Users/gst9/programs/xplor-nih/2.51",
            "file_root": "hrdc"
        }
    }
}

-------------------------------------------------------------:EndJson

Note this contains all the information about where files can be found and where results should be put.
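ARIA builds run.json from its own internal data structures, which I won't reproduce here, but writing a file of this shape from Python is straightforward with the json module. In this sketch write_run_json is a hypothetical helper, not ARIA's code, and the paths are placeholders; the nesting matches the example above.

```python
import json
import os
import tempfile


def write_run_json(path, pdb_or_sequence, initial_pdb, initial_seq,
                   project_root, xplor_root, file_root):
    """Write run parameters in the nested layout of the run.json example
    and return the dictionary that was written."""
    run = {
        'data': {
            'pdb_or_sequence': pdb_or_sequence,  # 'PDB' or 'SEQ'
            'initial_pdb': initial_pdb,
            'initial_seq': initial_seq,
            'filenames': {
                'project_root': project_root,
                'xplor_root': xplor_root,
                'file_root': file_root,
            },
        }
    }
    with open(path, 'w') as fh:
        json.dump(run, fh, indent=4)
    return run


# placeholder locations; a real run would use the user's own paths
run_path = os.path.join(tempfile.mkdtemp(), 'run.json')
run = write_run_json(run_path, 'PDB',
                     '/project/data/sequence/hrdc.pdb', '/project/data/hrdc.seq',
                     '/project', '/opt/xplor-nih/2.51', 'hrdc')

with open(run_path) as fh:
    print(json.load(fh)['data']['filenames']['file_root'])
```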

3. ARIA generates a driver csh script, which will run the structure generation engine xplor-nih, using the Python script make_generate_template.py.

Python-[make_generate_template.py]:----------------------------------

import json


# 1.a assume we are in generate_template as the working directory,
#     so run.json can be opened by name
with open('run.json') as fh:
    json_data = json.load(fh)


project_root = json_data['data']['filenames']['project_root']
tmp_dir = project_root + '/tmp'
# the directory generate_template.csh changes into (for reference)
output_dir = tmp_dir + '/generate_template'

py_xplor = json_data['data']['filenames']['xplor_root'] + '/' + 'bin/pyXplor'

# values for the %(name)s placeholders in the template below
data = {
    "py_xplor" : py_xplor,
    "tmp_dir" : tmp_dir,
    "project_root" : project_root,
}


template = """
# SGE facility
#$ -N generate_template
#$ -S /bin/csh

## results will be stored here
setenv NEWIT ./

## project path
setenv RUN %(project_root)s

## individual run.cns is stored here
setenv RUN_CNS %(tmp_dir)s

## CNS working directory
cd ${RUN_CNS}/generate_template

## solves some NFS sync problems
cat %(project_root)s/protocols/generate_template.py > /dev/null

## command line
%(py_xplor)s %(project_root)s/protocols/generate_template.py >! generate_template.out

touch done

""" % data

with open ('generate_template.csh', 'w') as out_file:
    out_file.write(template)

-----------------------------------------------------------:EndPython

Again, in our case we are going to pull parameters from run.json, though in reality the data is pulled from ARIA's internal data structures (effectively the file run1.xml and data distributed with the program, such as force fields). The script is parameterised by

the location of the xplor-nih distribution on the users computer (xplor_root)
the location of the project directory (project_root)
the root name for the files produced by the project (file_root)

All other required names and paths are then derived from these.
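Collecting those derivations in one place makes it easier to see that three parameters really are enough. This is a sketch: derive_paths is a hypothetical helper and the dictionary keys are my own names. It uses posixpath so the results keep the / separators the csh script expects, and it builds the same paths that the scripts in this post build by string concatenation.

```python
import posixpath


def derive_paths(xplor_root, project_root, file_root):
    """Derive the paths used by the scripts in this post from the
    three parameters xplor_root, project_root and file_root."""
    tmp_dir = posixpath.join(project_root, 'tmp')
    return {
        'tmp_dir': tmp_dir,
        'working_dir': posixpath.join(tmp_dir, 'generate_template'),
        'protocol': posixpath.join(project_root, 'protocols', 'generate_template.py'),
        'py_xplor': posixpath.join(xplor_root, 'bin', 'pyXplor'),
        'pdb2psf': posixpath.join(xplor_root, 'bin', 'pdb2psf'),
        'seq2psf': posixpath.join(xplor_root, 'bin', 'seq2psf'),
        'psf_file': posixpath.join(project_root, 'xplor', 'begin', file_root + '.psf'),
    }


paths = derive_paths('/Users/gst9/programs/xplor-nih/2.51',
                     '/Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates',
                     'hrdc')
for name, path in paths.items():
    print(f'{name:12s} {path}')
```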

4. ARIA runs generate_template.csh, which changes to the directory <project_root>/tmp/generate_template, finds the xplor-nih instance on the computer from <data.filenames.xplor_root>, and runs generate_template.py (which now gets <project_root>/tmp/generate_template from os.getcwd(), because generate_template.csh has changed to this directory using cd).

Csh-[generate_template.csh]:-----------------------------------------

# SGE facility
#$ -N generate_template
#$ -S /bin/csh

## results will be stored here
setenv NEWIT ./

## project path
setenv RUN /Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates

## individual run.cns is stored here
setenv RUN_CNS /Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates/tmp

## CNS working directory
cd ${RUN_CNS}/generate_template

## solves some NFS sync problems
cat /Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates/protocols/generate_template.py > /dev/null

## command line
/Users/gst9/programs/xplor-nih/2.51/bin/pyXplor  /Users/gst9/Dropbox/git/ariaxc/ariaxc/tests/test_templates/protocols/generate_template.py >! generate_template.out

touch done

--------------------------------------------------------------:EndCsh
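The trick that lets generate_template.py find run.json is that a child process starts in its parent's working directory. Here is a small sketch to check this from Python: the parent chooses the directory with subprocess.run's cwd argument, standing in for the cd in the csh script, and the child simply reports os.getcwd(). child_cwd is a hypothetical name.

```python
import os
import subprocess
import sys
import tempfile


def child_cwd(start_dir):
    """Start a child Python process in start_dir and return the
    working directory it reports via os.getcwd()."""
    result = subprocess.run(
        [sys.executable, '-c', 'import os; print(os.getcwd())'],
        cwd=start_dir, capture_output=True, text=True, check=True)
    return result.stdout.strip()


work_dir = tempfile.mkdtemp()
print(child_cwd(work_dir))
```

The comparison below uses os.path.realpath because on some systems the temporary directory is reached through a symbolic link.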

5. protocols/generate_template.py is run, reads run.json and decides how to run based on the data read. It knows where to find run.json because the file is in the working directory returned by os.getcwd() (remember the working directory was set by generate_template.csh and is inherited from this parent process). It decides:

whether to use a pdb file or a sequence file for structure generation (<data.pdb_or_sequence> in the json file)
where the initial pdb or sequence file is stored (<data.initial_pdb> or <data.initial_seq> in the json file)
where the final data should be stored (derived from <data.filenames.project_root> in the json file)
what the root of the output file name should be (<data.filenames.file_root> in the json file)
where to find the pdb2psf and seq2psf programs (using <data.filenames.xplor_root> in the json file)

generate_template.py creates the psf file by running pdb2psf or seq2psf on the input files defined by run.json and writes it to <data.filenames.project_root>/xplor/begin/<data.filenames.file_root>.psf

Python-[generate_template.py]:---------------------------------------

import json
import os
import subprocess

# 0. grab the working directory, inherited from generate_template.csh
root_directory = os.getcwd()
print(root_directory)

#1.a read run.json from the working directory
with open(root_directory + '/' + 'run.json') as fh:
    json_data = json.load(fh)

#1.b pull out the settings that control the run
pdb_or_sequence = json_data['data']['pdb_or_sequence']
xplor_root = json_data['data']['filenames']['xplor_root']
out_dir = json_data['data']['filenames']['project_root'] + '/xplor/begin'
psf_file_name = out_dir + '/' + json_data['data']['filenames']['file_root'] + '.psf'

#2 use seq2psf or pdb2psf to create a psf file <file_root>.psf in the begin directory in project_root
if pdb_or_sequence == 'PDB':
    pdb_file = json_data['data']['initial_pdb']

    pdb2psf = xplor_root + '/bin/pdb2psf'

    subprocess.call([pdb2psf, pdb_file, '-outfile', psf_file_name])

elif pdb_or_sequence == 'SEQ':
    # the key in run.json is initial_seq
    sequence_file = json_data['data']['initial_seq']

    seq2psf = xplor_root + '/bin/seq2psf'

    subprocess.call([seq2psf, sequence_file, '-outfile', psf_file_name])

else:
    raise ValueError('unexpected choice for sequence source file %(pdb_or_sequence)s' % {'pdb_or_sequence' : pdb_or_sequence})

--------------------------------------------------------------:EndPython
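One way to make the decision logic in generate_template.py testable without an xplor-nih installation is to separate building the command from running it. The sketch below swaps the if/elif chain for a small lookup table and returns the argument list that would be passed to subprocess.call. psf_command is a hypothetical name, and it assumes the run.json layout shown earlier (including the initial_seq key).

```python
import posixpath

# pdb_or_sequence value -> (xplor-nih tool, run.json key of its input file)
PSF_TOOLS = {
    'PDB': ('pdb2psf', 'initial_pdb'),
    'SEQ': ('seq2psf', 'initial_seq'),
}


def psf_command(json_data):
    """Return the command generate_template.py would run, without running it."""
    data = json_data['data']
    filenames = data['filenames']
    choice = data['pdb_or_sequence']
    if choice not in PSF_TOOLS:
        raise ValueError(f'unexpected choice for sequence source file {choice}')
    program, input_key = PSF_TOOLS[choice]
    executable = posixpath.join(filenames['xplor_root'], 'bin', program)
    psf_file_name = posixpath.join(filenames['project_root'], 'xplor', 'begin',
                                   filenames['file_root'] + '.psf')
    return [executable, data[input_key], '-outfile', psf_file_name]


example = {'data': {'pdb_or_sequence': 'SEQ',
                    'initial_pdb': '/proj/data/sequence/hrdc.pdb',
                    'initial_seq': '/proj/data/hrdc.seq',
                    'filenames': {'project_root': '/proj',
                                  'xplor_root': '/xplor',
                                  'file_root': 'hrdc'}}}
print(psf_command(example))
```

generate_template.py could then call subprocess.call(psf_command(json_data)), and the table can be checked on its own.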

Thursday, April 01, 2010

So nmr isn't entirely safe??

This arrived in the lab today. So maybe I can't describe NMR as absolutely safe any more...

have a moral easter

Tuesday, June 23, 2009

whats good about galileo

Well, it's amazing: the latest eclipse release train has come round the tracks and, to use an old marketing cliché, 'it's good to talk'. So what's good about the new version?

Well, I have been doing an awful lot of emf work and this has improved a lot:

much faster conversion of genmodels into java source code. I used to watch this menu bar a lot (tens of minutes)

It's now much, much faster, which is great when you have really big models...
Also, since I tend to keep up with the M[1-7] versions, this one looks like an excellent addition. I will have to try it soon (as soon as I can get through to the friends of eclipse server or get a good torrent feed)
Links in java doc headers yeah! but still no search function in the external html viewer which is hard to setup if you are offline...
rectangular selections, I won't suffer from nedit withdrawal pangs any further
p2 the replacement for the update manager is much better, much more reliable. However, the UI is still a bit strange
- why do you type to add to the site selection, rather than filter the selection?
- why is the menu item in help called 'Install New Software...' when it leads to a dialog to 'Install and Manage Software...'?

Some pain points

when generating code with emf, debugging problems can be truly painful

errors appear in projects that are hidden
errors get reported in dialog boxes which refer to line numbers in source text which doesn't have line numbers and has to be cut into other editors for analysis.
errors which don't have line numbers or filenames
errors in dialog boxes rather than in the problems pane
its really hard to rerun a source generation run from a gen model
no usable text editor for jet
limitations in jet and no sign of jet2
etc

also

you can't jump to a super class in the ecore editor
navigating the ecore editor doesn't have the keyboard shortcuts the navigator has
in mint if you go to the ecore file from the genmodel it doesn't take you to the item you right-clicked on
the ecore sample editor doesn't notice resource changes...

now don't get me wrong these are all minor niggles compared to the size and breadth of the features implemented this year especially for example the introduction of emf databinding....

so all I have to do now for next year is to participate more and look forward to a sunny helios

regards
gary

Friday, July 04, 2008

A problem of layout?

Today there has been a certain amount of debate about the lack of visibility for certain projects on the Eclipse Ganymede download page so here is a 'quick fix' with all the other packages as a pseudo package

Monday, June 23, 2008

Looking for the Wood not the Trees

So what do I like about eclipse Ganymede? It's new and sparkly and has lots of new features (in fact there are so many neat tweaks that I can't keep up with them, and I expect I won't use some just because they are too hard to find).

Diversity

So why the excitement? Well, I don't think it's the bling; it's the fact that there is a faster, more versatile version of the platform that I use for so many things. You see, I used to have a range of strategies for editing and dealing with the heterogeneous data and programming languages that I use.

Can we say nedit, Komodo, IntelliJ IDEA, wing etc. Most of them were open source, though I must admit I used IntelliJ IDEA (and damn good it was too [eclipse still isn't as polished but is more versatile and open]). However, now I have one tool suite that allows me to deal with most of these at the same time within the same editing environment, so I have fewer things to remember, fewer windows to find on my desktop and more support for what I am doing, and there's more...

Community & Open Source

I have found the eclipse community (especially the emf mailing list http://www.eclipse.org/newsportal/thread.php?group=eclipse.tools.emf and in particular Ed Merks) to be quite wonderful. Dealing with a newbie who makes quite a few mistakes while still being friendly, helpful and authoritative all at the same time is a wonderful skill.

Having an open source platform has also been a vital asset on the project I have been working on. For example, how do I replace <%packageA.packageB.Class%> with the correct declaration in a custom jet template? Go look at the source code ;-) (though it would be even nicer if I could go and look at the javadoc!) (see the getBody function). So another one of my answers is: it's the whole, not the trees.

In conclusion, where else could I have a workflow that goes python -> model -> java all in the same environment, with such ease?
