Chapter 11: Nuclear Magnetic Resonance with NMRglue#

Nuclear magnetic resonance (NMR) spectroscopy is one of the most common and powerful analytical methods used in modern chemistry. Up to this point, we having been primarily dealing with text-based data files - that is, files that can be opened with a text editor and still contain human comprehensible information. If you open most files that come out of an NMR instrument in a text editor, it will look more like gibberish than anything a human should be able to read. This is because they are binary files; they are written in computer language rather than human language.

We need a specialized module to be able to import and read these data, and luckily a Python module called NMRglue to does exactly this. The module contains submodules for dealing with data from each of the major NMR spectroscopy file types which includes Bruker, Pipe, Sparky, and Varian. It does not read JOEL files, but as of this writing, JOEL spectrometers support exporting data into at least one of the above file types supported by NMRglue.

Currently, NMRglue is not included with Anaconda, so you will need to fetch NMRglue and install it yourself. Instructions are included on the following website or you can use pip to install it. If Jupyter is installed on your computer, you should be able to install it through the terminal using pip install nmrglue, and if you are using Google Colab, you should include !pip install nmrglue in the first code cell of the notebook (see section 0.2). NMRglue requires you have NumPy and SciPy already installed, and matplotlib should also be installed for visualization.

https://nmrglue.readthedocs.io/en/latest/install.html

All use of NMRglue below assumes the following import with alias. NMRglue is not a major library in the SciPy ecosystem, so the ng alias is not a strong convention but is used here for convenience and to be consistent with the online documentation.

import nmrglue as ng
import numpy as np
import matplotlib.pyplot as plt

11.1 NMR Processing#

The general procedure for collecting NMR data is to excite a given type of NMR-active nuclei with a radio-frequency pulse and allow them to relax. As they precess, their rotation leads to a voltage oscillation in the instrument at characteristic frequencies, and the spectrometer records these oscillations as a free induction decay (fid) depicted below (Figure 1, left). It is the frequency of these oscillations that we are interested in because they are informative to a trained chemist as to the chemical environment of the nuclei. One challenge is that all the different signals from each of the nuclei are stacked on top of each other making it difficult to distinguish one from the other or to determine the wave frequency. This is similar to the problem of a computer discerning a single instrument in an entire orchestra playing at once. Fortunately, there is a mathematical equation called the Fourier transform that converts the above fid into a graph showing all of the different frequencies (Figure 1, right). This is what is known as converting the time domain to the frequency domain.

Figure 1 Raw NMR spectroscopy data is converted from the time domain (left) to the frequency domains (right) using a Fourier transform.

The general steps for dealing with NMR spectroscopic data in Python are outlined below.

  1. Load the fid data into a NumPy array using NMRglue

  2. Fourier transform the data to the frequency domain

  3. Phase the spectrum

  4. Reference the spectrum

  5. Measure the chemical shifts and integrals of the peaks

11.2 Importing Data with NMRglue#

The importing of data using NMRglue is performed by the read function from one of the submodules shown in Table 1. Additional modules can be found in the NMRglue documention. The choice of module is dictated by the data file type.

Table 1 Examples of NMRglue Modules

Module

Description

bruker

Bruker data as a single file

pipe

Pipe data as a single file with an .fid extension

sparky

Sparky NMR file format with .ucsf extension

varian

Varian/Agilent data as a folder of data with an .fid extension

jcampdx

JCAMP-DX files with .dx or .jdx extensions

The read() function loads the NMR file and returns a tuple containing a dictionary of metadata and data in an ndarray. The dictionary includes information required to complete the processing of the NMR date. Looking at the NMR data shown below, you may noticed each point includes both both real and imaginary components (i.e., the mathematical terms with j). Both are necessary for phasing the spectrum later on, so don’t discard any of the data.

dic, data = ng.pipe.read('data/EtPh_1H_NMR_CDCl3.fid')
data
array([-0.00194889-0.00471539j, -0.00192186-0.00472489j,
       -0.00191337-0.00473085j, ..., -0.00189737+0.00591656j,
       -0.00191882+0.005872j  , -0.00191135+0.00587132j], dtype=complex64)
# Reversed the Fourier transform for demo purposes being as this data 
# was colleced on a spectrometer that already Fourier transformed the data.

from scipy.fft import ifft
data = ifft(data)[::-1]

The dictionary, dic, above contains a very long list of values, and the dictionary keys can be different among different file formats. To maintain a shorter, more useful, and more consistent dictionary of metadata, NMRglue provides the guess_udic() function for generating a universal dictionary among all file formats.

udic = ng.pipe.guess_udic(dic, data)
udic
{'ndim': 1,
 0: {'sw': 5994.65478515625,
  'complex': True,
  'obs': 399.7821960449219,
  'car': 1998.9109802246094,
  'size': 13107,
  'label': 'Proton',
  'encoding': 'direct',
  'time': False,
  'freq': True}}

The universal dictionary is a nested dictionary. The first key is ndim which provides the number of dimensions in the NMR spectrum. Most NMR spectra are one dimensional, but two dimensional is also fairly common. Subsequent key(s) are for each dimension in the NMR spectrum with the value as a nested dictionary of metadata. Because the data for the above spectrum is one-dimensional, there is only one nested dictionary. Table 2 below provides a description of each piece of metadata contained in the universal dictionary.

Table 2 udic Dictionary Keys for Single Dimensions*

Key

Description

Data Type

car

Carrier frequency (Hz)

Float

complex

Indicates if the data contain complex values

Boolean

encoding

Encoding format

String

freq

Indicates if the data are in the frequency domain

Boolean

label

Observed nucleus

String

obs

Observed frequency (MHz)

Float

size

Number of data points in spectrum

Integer

sw

Spectral width (Hz)

Float

time

Indicates if the data are in the time domain**

Boolean

* That is, it is assumed that we are looking at single dimensions from the NMR data, so for example, we are looking at udic[0].

** Being that the data must be in either the frequency or time domain, the freq and time keywords effectively provide the same information.

11.3 Fourier Transforming Data#

When the data is first imported, it is often in the time domain. You can confirm this by checking that the time value in the udic is set to True like below.

udic[0]['time']

We can also view the data by plotting with matplotlib.

fig0 = plt.figure(figsize=(16,6))
ax0 = fig0.add_subplot(1,1,1)
ax0.plot(data.real);
../../_images/e460e69b1f2d1022e0ad41e2f47ed27c9f90bb18e58f030e11ac69ed5badadd9.svg

To convert the data to the frequency domain, we will use the fast Fourier transform function (fft) from the fft SciPy module. NMRglue also contains Fourier transform functions, but we will use SciPy here. The plot below inverts the x-axis with

from scipy.fft import fft  # imports only fft function
fdata = fft(data)
fig1 = plt.figure(figsize=(16,6))
ax1 = fig1.add_subplot(1,1,1)
ax1.plot(fdata.real)
plt.gca().invert_xaxis() # reverses direction of x-axis to conform to NMR plotting norms
../../_images/4740d000cecff82bb99865dae4bee0994d01126d81d850f2ef41723e0ce2589d.svg

When you plot the Fourier transformed data, you may get a ComplexWarning error message because the Fourier transform will return complex values (i.e., values with real and imaginary components). To only work with the real components, use the .real method as is done above. The plot now looks more like an NMR spectrum, but most of the resonances are out of phase. The next step is to phase the spectrum.

11.4 Phasing Data#

Phasing is the post-processing procedure for making all peaks point upward as shown in Figure 2. There is more to it than taking the absolute value as that would not always generate a single peak, so NMRglue contains a series of functions for phasing spectra.

Figure 2 Phasing an NMR spectrum results in all the signals pointing in the positive direction.

11.4.1 Autophasing#

The simplest method to phase your NMR spectrum is to allow the autophasing function to handle it for you. Below is the function which takes the data and the phasing algorithm as the arguments.

ng.process.proc_autophase.autops(data, algorithm)

The permitted phasing algorithms can be either acme or peak_minima. It is important to feed the autops() function the data array with both the real and imaginary components.

phased_data = ng.process.proc_autophase.autops(fdata,'acme')
Optimization terminated successfully.
         Current function value: 0.001729
         Iterations: 118
         Function evaluations: 233
fig2 = plt.figure(figsize=(16,6))
ax2 = fig2.add_subplot(1,1,1)
ax2.plot(phased_data.real)
plt.gca().invert_xaxis()
../../_images/54449e29c166a5a02cc359afd60027ea491556cf5878bfc68fa1692e10535709.svg

You should try both algorithms to see which works best for you. The above spectrum is the result of the acme autophasing algorithm which is close but still slightly off. If neither of the provided autophasing algorithms work for you, you will need to instead manually phase the NMR spectrum as discussed below.

11.4.2 Manual Phasing#

Manually phasing the NMR spectrum is a two-step process. First, you need to call the manual_ps phasing() function and adjust the p0 and p1 sliders until the spectrum appears phased.

%matplotlib  # exists inline plotting
p0, p1 = ng.process.proc_autophase.manual_ps(fdata.real)

After closing the window, the function will return values for p0 and p1 that you found to properly phase the spectrum. Second, input those p0 and p1 values into the ps() phasing function to actually phase the spectrum.

phased_data = ng.proc_base.ps(fdata, p0=p0, p1=p1)

%matplotlib inline  # reinstates inline plotting

fig3 = plt.figure(figsize=(16,6))
ax3 = fig3.add_subplot(1,1,1)
ax3.plot(phased_data.real)
plt.gca().invert_xaxis()

You can then plot the phased_data to get your NMR spectrum with all the peaks pointing upward.

11.5 Chemical Shift#

Even though the NMR spectrum is now phased, it is unlikely to be properly referenced. That is, the peaks are not currently located at the correct chemical shift. Referencing is often performed by knowing the accepted chemical shifts of the solvent resonances or an internal standard (e.g., tetramethylsilane, TMS) and adjusting the spectrum by a correction factor. Currently, we are plotting our data against the index of each data point, so first we need to create a frequency scaled x-axis as an array followed by adjusting the location of the spectrum so that it is properly referenced.

11.5.1 Generate the X-Axis#

The x-axis is the frequency scale, so this axis is sometimes presented in hertz (Hz). However, because the frequency of NMR resonances depends upon the instrument field strength, the same sample will exhibit different frequencies in different instruments. To make the frequency axis independent of the spectrometer field strength, NMR spectra are often presented on a ppm scale which is the ratio of the observed chemical shift (Hz) versus a standard over the spectrometer frequency (MHz) at which that particular nucleus is observed.

\[ ppm = \frac{Observed \, Frequency \, (Hz)}{Spectrometer \, Frequency \, (MHz)} \]

This makes the locations of the peaks consistent from spectrometer to spectrometer no matter the strength of the magnet. This is where the udic from section 11.2 is important because we can obtained the observed frequency width (Hz) of the spectrum, and the resolution of the data. The latter is how many data points are in the spectrum which is important so that we avoid a plotting error (we all know this one: ValueError: x and y must have same first dimension,...). If any of the values from the udic are 999.99, this means the spectrometer did not record this piece of information and you will need to find it elsewhere.

size = udic[0]['size'] # points in data
sw = udic[0]['sw']     # width in Hz
obs = udic[0]['obs']   # carrier frequency
from math import floor
hz = np.linspace(0, floor(sw), size) # x-axis in Hz 
ppm = hz / obs                       # x-axis in ppm

Now if we plot the spectrum, we see it in a ppm scale.

fig4 = plt.figure(figsize=(16,6))
ax4 = fig4.add_subplot(1,1,1)
ax4.plot(ppm, phased_data.real)
ax4.set_xlabel('Chemical Shift, ppm')
plt.gca().invert_xaxis()
../../_images/77fa758cd672f3e3c200b738fe2846d7a6935d6a45293632d6e235d9b120e997.svg

Alternatively, NMRglue contains an object called a unit conversion object that can be created and used to convert between ppm, Hz, and point index values for any position in an NMR spectrum. To create a unit unit conversion object, use the make_uc() function which takes two arguments – the dictionary, dic, and the original data array, data, generated from reading the NMR file in section 11.2.

unit_conv = ng.pipe.make_uc(dic, data)   

ppm = unit_conv.ppm_scale()

The last line of the above code generates an array of ppm values required for the x-axis to plot the NMR data. The following example uses the ppm scale generated by doing the math ourselves.

11.5.2 Referencing the Data#

In the above spectrum, the small resonance at 2.58 ppm is internal TMS (tetramethylsilane) standard which should be located at 0.00 ppm. The temptation is to subtract 2.58 ppm from the x-axis, but the spectrum is not simply moved over but instead is rolled. That is, as the spectrum is moved, some of it disappears off one end and reappears on the other (Figure 3).

Figure 3 Referencing an NMR spectrum is performing by rolling it until the peaks reside at the correct shifts. As a signal falls of one end of the spectrum, it reappears at the other end.

Conveniently for us, NumPy has a function np.roll() that does exactly this to array data.

np.roll(array, shift)

The np.roll() function takes two required arguments. The first is the array containing the data and the second is the amount to shift or roll the data. The shift is not in ppm but rather positions in the data array. If you know your referencing correction in ppm (Δppm), use the following equation which describes the relationship between the correction in ppm (Δppm) and the correction in number of data points (Δpoints). The size is the number of point in a spectrum, obs is the observed carrier frequency, and sw is the sweep width in Hz. These values are all available from the universal dictionary.

\[ \Delta_{points} = \frac{\Delta_{ppm} \times size \times obs}{sw} \]

The example below requires the spectrum to be shifted by -2.58 ppm.

point_shift = int((-2.5782 * size * obs) / sw)
data_ref = np.roll(phased_data, point_shift)
fig5 = plt.figure(figsize=(16,6))
ax6 = fig5.add_subplot(1,1,1)
ax6.plot(ppm, data_ref.real)
ax6.set_xlabel('Chemical Shift, ppm')
plt.xlim(8,0)
plt.xticks(np.arange(0,8,0.2))
plt.show()
../../_images/2ad569ac7b792880941e42db58583197b6d90338cc7ce2c50327e9d9ac1cd9ce.svg

If you want to narrow the plot to where the resonances are located, you can use the plt.xlim(8,0) function. Notice that 8 is first to indicate that the plot is from 8 ppm \(\rightarrow\) 0 ppm. The use of plt.xlim(8,0) removes the need to use plt.gca().invert_xaxis() to flip the x-axis.

11.6 Integration#

Integration of the area under the peaks can be performed using either integration functions from the scipy.integrate module or though NMRglue’s integration function(s). Because the integration function in NMRglue supports limit values in the ppm scale, it is probably the most convenient and is demonstrated below.

The integration is performed using the integrate() function below where data is your NMR data as a NumPy array, the conv_obj is an NMRglue unit conversion object (see section 11.5.1), and limits is a list or array of limits for integration.

ng.analysis.integration.integrate(data, conv_obj, limits)
uc = ng.pipe.make_uc(dic, data)
limits = np.array([[7.07,7.37], [1.10, 1.35], [2.50,2.75]])
fig6 = plt.figure(figsize=(16,6))
ax6 = fig6.add_subplot(1,1,1)
ax6.plot(ppm, data_ref.real)
ax6.set_xlabel('Chemical Shift, ppm')
plt.xlim(8,0)
plt.xticks(np.arange(0,8,0.2))
for lim in limits.flatten():
    plt.axvline(lim, c='r')
../../_images/decdd961cffc456b3326e0a643d90625ba588d84348533cfa11e02e4830736b4.svg

The limits are in ppm, so take a look at the spectrum above and decide where you want to put the integration limits. An NMR spectrum with the chosen integration limits are shown above as vertical red lines.

Now to integrate our NMR spectrum.

area = ng.analysis.integration.integrate(data_ref.real, uc, limits)
area
array([-0.00061345, -0.0003549 , -0.00034152], dtype=float32)

These values are probably not what you expected, but if we divide all of them by the smallest value, we are left with the relative areas under the peaks like you would get from most NMR processing software.

area / np.min(area)
array([1.       , 0.578522 , 0.5567217], dtype=float32)

The spectrum above is the \(^1\)H NMR of ethylbenzene in CDCl\(_3\) which has five aromatic protons, and the other two resonances should have three and two protons. If we do some math to make the integrations total to ten protons and round to the nearest integer, we get 5:3:2. There is a small amount of error likely due to the solvent resonance (CHCl\(_3\), 7.27 ppm) being included in the integration of the aromatic protons.

11.7 Peak Picking#

Another piece of information that is commonly extracted from NMR spectra is the chemical shift of the resonances. Similar to integration, SciPy contains functions such as scipy.signal.argrelmax() that can find peaks in spectra, but again, NMRglue contains a function, below, designed for the task of locating peaks in NMR spectra.

ng.analysis.peakpick.pick(data, pthres=)

There are numerous optional arguments for the peak picking function, but the two mandatory pieces of information required are the data array and a positive threshold (pthres) above which any peak will be identified. Glancing at the spectrum below, all peaks are above 0.1 (green dotted line) and the baseline is below 0.1, so this seems like a reasonable threshold.

fig7 = plt.figure(figsize=(16,6))
ax7 = fig7.add_subplot(1,1,1)
ax7.plot(ppm, data_ref.real)
ax7.set_xlabel('Chemical Shift, ppm')
plt.xlim(8,0)
plt.xticks(np.arange(0,8,0.2))
plt.axhline(0.1, c='C2', ls='--');
../../_images/d60c43b1329f63040c176e2981ea3130b877e674eab3f6bd5621070d7b4ba9f2.svg

Note

The ng.analysis.peakpick.pick() does not work with NumPy versions 1.24 and later if you are using a version of nmrglue before 0.10. Consider upgrading your version of nmrglue if ng.analysis.peakpick.pick() raises an error.

peaks = ng.analysis.peakpick.pick(data_ref.real, pthres=0.1)
peaks
rec.array([(1077., 1, 34.18577215, 27.63014793),
           (2299., 2, 31.36229393, 16.8708477 ),
           (2332., 3,  5.        ,  0.88378489),
           (6275., 4, 35.96966773, 25.60334969),
           (6355., 5, 32.13851119, 17.82110214)],
          dtype=[('X_AXIS', '<f8'), ('cID', '<i8'), ('X_LW', '<f8'), ('VOL', '<f8')])

The output of this function is an array of tuples with each tuple containing information about an identified peak. From this, we can already tell there are four peaks identified. Each tuple contains an index for the peak, a peak number, a line width of the peak, and an estimate of the areas of each peak. We can use the index values to index the ppm array for the chemical shifts.

peak_loc = []
for x in peaks:
    peak_loc.append(ppm[int(x[0])])
print(peak_loc)
[1.2320797763091693, 2.6300384454361936, 2.6677901934568085, 7.178552085738197, 7.2700714748790825]
fig8 = plt.figure(figsize=(16,6))
ax8 = fig8.add_subplot(1,1,1)
ax8.plot(ppm, data_ref.real)
ax8.set_xlabel('Chemical Shift, ppm')
plt.xlim(8,0)
plt.xticks(np.arange(0,8,0.2))
for p in peak_loc:
    plt.axvline(p, c='C1', ls='--', alpha=1)
../../_images/372ab3fa9fbf66af0de61f1da14b4ee03d4fcb2ee67f3a38bc21292a4129b106.svg

We can plot the NMR spectrum with these chemical shifts marked with vertical dotted lines shown below. Looks like it did a pretty good job locating the resonances! If NMRglue fails to properly identify the peaks, there are a number of parameters described in the NMRglue documentation that can be adjusted.

Further Reading#

  1. NMRglue Website. https://www.nmrglue.com/ (free resource)

  2. NMRglue Documentation Page. http://nmrglue.readthedocs.io/en/latest/tutorial.html (free resource)

  3. J.J. Helmus, C.P. Jaroniec, Nmrglue: An open source Python package for the analysis of multidimensional NMR data, J. Biomol. NMR 2013, 55, 355-367, http://dx.doi.org/10.1007/s10858-013-9718-x. (paper on nmrglue)

Exercises#

Complete the following exercises in a Jupyter notebook and NMRglue library. Any data file(s) refered to in the problems can be found in the data folder in the same directory as this chapter’s Jupyter notebook. Alternatively, you can download a zip file of the data for this chapter from here by selecting the appropriate chapter file and then clicking the Download button.

  1. Open the 1H NMR spectrum of ethanol, EtOH_1H_NMR.fid, taken in CDCl\(_3\) with TMS using NMRglue. Use the pipe module.

    a) Plot the resulting spectrum and be sure to properly reference it if not done already.

    b) Integrate the methyl (-CH\(_3\)) versus the methylene (-CH\(_2\)-) resonances and calculate the ratio.

  2. Open the \(^1\)H and \(^{13}\)C NMR spectra of 2-ethyl-1-hexanol, 2-ethyl-1-hexanol_1H_NMR_CDCl3.fid and 2-ethyl-1-hexanol_13C_NMR_CDCl3.fid, in CDCl\(_3\) with TMS and plot them on a ppm scale. Be sure to properly phase and reference the spectra if not done already. Use the pipe module.