Saturday, June 19, 2021

The NEF Header

NEF (NMR Exchange Format1) files have a header (one per file) that records which programs wrote the file and its history. However, a few things about it are not clear.

Here’s the header:

save_nef_nmr_meta_data
   _nef_nmr_meta_data.sf_category      nef_nmr_meta_data
   _nef_nmr_meta_data.sf_framecode     nef_nmr_meta_data
   _nef_nmr_meta_data.format_name      nmr_exchange_format
   _nef_nmr_meta_data.format_version   1.1
   _nef_nmr_meta_data.program_name     NEFPipelines
   _nef_nmr_meta_data.program_version  0.0.1
   _nef_nmr_meta_data.creation_date    2021-06-19T17:36:39.073848
   _nef_nmr_meta_data.uuid             NEFPipelines-2021-06-19T17:36:39.073848-9006508160

   loop_
      _nef_run_history.run_number
      _nef_run_history.program_name
      _nef_run_history.program_version
      _nef_run_history.script_name

     1   NEFPipelines   0.0.1   header.py    

   stop_

save_

Note first the entries sf_category and sf_framecode: these are mandatory for the frame to be recognised.
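
As a rough illustration of that requirement, here is a minimal sketch that scans the text of a frame and reports which of the two mandatory tags are missing. The function name missing_mandatory_tags and the plain whitespace splitting are assumptions made for this sketch; a real NEF reader would use a proper STAR parser.

```python
MANDATORY_TAGS = ("sf_category", "sf_framecode")


def missing_mandatory_tags(frame_text, frame_name="nef_nmr_meta_data"):
    """Return the mandatory tags that do not appear in the frame text.

    Hypothetical helper: it only understands simple 'tag value' lines.
    """
    found = set()
    for line in frame_text.splitlines():
        fields = line.split()
        # a tag line looks like: _nef_nmr_meta_data.sf_category  nef_nmr_meta_data
        if len(fields) == 2 and fields[0].startswith(f"_{frame_name}."):
            found.add(fields[0].split(".", 1)[1])
    return [tag for tag in MANDATORY_TAGS if tag not in found]
```

An empty list means both tags are present.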

The first tag whose format isn’t clear is _nef_nmr_meta_data.creation_date. It appears to be an ISO format date-time, and the most reasonable reading is that it is a UTC 2 date-time, as there is no time zone information and UTC is unique worldwide. A simple way to make this in Python is:

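from datetime import datetime, timezone
# Take the current time in UTC (datetime.now() alone gives local time), then drop the offset to match the header
utc_date_time = datetime.now(timezone.utc).replace(tzinfo=None).isoformat()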

The second question is: what is the _nef_nmr_meta_data.uuid tag? This is a UUID3 which uniquely identifies this version of the file apart from any other4. It has the form NEFPipelines-2021-06-19T17:36:39.073848-9006508160. The first part is obvious: it’s our program’s name. The second part is the current time. But what is the third part, 9006508160? It’s just a 10-digit random number that ensures the uuid is unique. Think of creating the file at the same time on multiple threads: without the random number they would all have the same Universally Unique Identifier!

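from random import randint
from datetime import datetime, timezone

# Current time in UTC, with the offset dropped to match creation_date
utc_date_time = datetime.now(timezone.utc).replace(tzinfo=None).isoformat()
# Ten random digits keep files written at the same instant apart
random_value = ''.join(str(randint(0, 9)) for _ in range(10))
uuid = f'NEFPipelines-{utc_date_time}-{random_value}'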

Finally, there is the loop:

   loop_
      _nef_run_history.run_number
      _nef_run_history.program_name
      _nef_run_history.program_version
      _nef_run_history.script_name

     1   NEFPipelines   0.0.1   header.py

   stop_

This just lists the programs that have edited the file, in order from latest to oldest.
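
To make the structure of the loop concrete, here is a minimal sketch that reads the rows of a _nef_run_history loop back into Python dictionaries, in the order they appear in the file. The function name read_run_history is made up for this sketch, and it assumes values contain no spaces or quotes, which holds for the example above but not for STAR files in general.

```python
def read_run_history(header_text):
    """Return the rows of the _nef_run_history loop as dicts, in file order."""
    columns, rows, in_loop = [], [], False
    for line in header_text.splitlines():
        stripped = line.strip()
        if stripped == "loop_":
            # a new loop starts: forget any columns seen so far
            in_loop, columns = True, []
        elif stripped == "stop_":
            in_loop = False
        elif in_loop and stripped.startswith("_nef_run_history."):
            # column names come first, one per line
            columns.append(stripped.split(".", 1)[1])
        elif in_loop and stripped and columns:
            # data rows: pair each whitespace-separated value with its column
            rows.append(dict(zip(columns, stripped.split())))
    return rows
```

Given the order described above, the first dictionary in the result would be the most recent program to touch the file.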

So a complete header would be:

data_new

save_nef_nmr_meta_data
   _nef_nmr_meta_data.sf_category      nef_nmr_meta_data
   _nef_nmr_meta_data.sf_framecode     nef_nmr_meta_data
   _nef_nmr_meta_data.format_name      nmr_exchange_format
   _nef_nmr_meta_data.format_version   1.1
   _nef_nmr_meta_data.program_name     NEFPipelines
   _nef_nmr_meta_data.program_version  0.0.1
   _nef_nmr_meta_data.creation_date    2021-06-19T17:36:39.073848
   _nef_nmr_meta_data.uuid             NEFPipelines-2021-06-19T17:36:39.073848-9006508160

   loop_
      _nef_run_history.run_number
      _nef_run_history.program_name
      _nef_run_history.program_version
      _nef_run_history.script_name

     1   NEFPipelines   0.0.1   header.py

   stop_
...

Oh, and one last thing: this header should be renewed each time a program reads and modifies the file, so it gets a new program name, date, and uuid, plus an extra line in the _nef_run_history loop.
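
Putting the pieces together, renewing the header might look something like the sketch below. It is only an illustration: renew_header and its parameters are names invented here, the format version is copied from the example, and numbering the history rows 1, 2, ... in listing order is an assumption (the post only shows a single row numbered 1).

```python
from datetime import datetime, timezone
from random import randint


def renew_header(program_name, program_version, script_name, previous_runs=()):
    """Build a fresh nef_nmr_meta_data frame as text.

    previous_runs holds (program_name, program_version, script_name)
    tuples taken from the old header, newest first.
    """
    # naive UTC timestamp, matching the format used in the example header
    now = datetime.now(timezone.utc).replace(tzinfo=None).isoformat()
    random_value = "".join(str(randint(0, 9)) for _ in range(10))

    tags = {
        "sf_category": "nef_nmr_meta_data",
        "sf_framecode": "nef_nmr_meta_data",
        "format_name": "nmr_exchange_format",
        "format_version": "1.1",
        "program_name": program_name,
        "program_version": program_version,
        "creation_date": now,
        "uuid": f"{program_name}-{now}-{random_value}",
    }

    lines = ["save_nef_nmr_meta_data"]
    for tag, value in tags.items():
        lines.append(f"   _nef_nmr_meta_data.{tag:<17}{value}")
    lines += ["", "   loop_"]
    for column in ("run_number", "program_name", "program_version", "script_name"):
        lines.append(f"      _nef_run_history.{column}")
    lines.append("")
    # newest program first; row numbering in listing order is an assumption
    runs = [(program_name, program_version, script_name), *previous_runs]
    for number, run in enumerate(runs, start=1):
        lines.append("     " + "   ".join([str(number), *run]))
    lines += ["", "   stop_", "", "save_"]
    return "\n".join(lines)
```

Each call produces a new creation date and uuid, and the calling program always lands on the first row of the history.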


  1. NMR Exchange Format: a unified and open standard for the representation of NMR restraint data. ↩︎

  2. Coordinated Universal Time, the world’s primary time standard and the effective successor of Greenwich Mean Time (GMT). ↩︎

  3. UUIDs are Universally Unique Identifiers ↩︎

  4. You could use a hash, but then you would need to know the hash of the file before the file is complete, and adding the hash to the file would change the value of the file’s hash… ↩︎