Graphene Raman: peak fitting of D, G and 2D

This is a fitting script for Raman spectra of graphene, to determine important characteristics like the ratio of the defect (D) peak to the graphite (G) peak or the symmetry of the the 2D peak.

Graphene Raman peaks

The main graphene peaks are modeled as lmfit Pseudo Voigt peaks.

The script is aimed at a standard evaluation of these properties for multiple spectra, to obtain reasonable statistics.



In [1]:

    
#The default definition of peaks and ranges in the graphene spectrum
GRAPHENEPEAKS = [
    {
        #The name of the peak
        'prefix': 'D',
        #The range in the spectrum: baseline correction will irgnore this range
        'range': {'min': 1290.0, 'max': 1420.0},
        #constraints for the peak position
        'center': {'min': 1340.0, 'init': 1350, 'max': 1360.0, 'vary': True},
        #constraints for the peak width
        'sigma': {'min': 5.0, 'init': 30.0, 'max': 40.0, 'vary': True},
        #constraints for the gaussian to lorentzian ratio
        'fraction': {'min': 0.01, 'init': 0.8, 'max': 1.0, 'vary': True},
        #constraints for the peak height
        'amplitude': {'min': 0.0, 'vary': True},
        #If set to False, the peak will not be fitted, only initialized
        'vary': True
    },
    {
        #The name of the peak
        'prefix': 'G',
        #The range in the spectrum: baseline correction will irgnore this range
        'range': {'min': 1480.0, 'max': 1700.0},
        #constraints for the peak position
        'center': {'min': 1560.0, 'init': 1585.0, 'max': 1610.0, 'vary': True},
        #constraints for the peak width
        'sigma': {'min': 5.0, 'init': 25.0, 'max': 50.0, 'vary': True},
        #constraints for the gaussian to lorentzian ratio
        'fraction': {'min': 0.01, 'init': 0.4, 'max': 1.0, 'vary': True},
        #constraints for the peak height
        'amplitude': {'min': 0.0, 'vary': True},
        #If set to False, the peak will not be fitted, only initialized
        'vary': True
    },
    {
        #The name of the peak
        'prefix': 'Dplus',
        #The range in the spectrum: baseline correction will irgnore this range
        'range': {'min': 2350.0, 'max': 2540.0},
        #constraints for the peak position
        'center': {'min': 2420.0, 'init': 2450.0, 'max': 2500.0, 'vary': True},
        #constraints for the peak width
        'sigma': {'min': 5.0, 'init': 25.0, 'max': 50.0, 'vary': True},
        #constraints for the gaussian to lorentzian ratio
        'fraction': {'min': 0.01, 'init': 0.4, 'max': 1.0, 'vary': True},
        #constraints for the peak height
        'amplitude': {'min': 0.0, 'vary': True},
        #If set to False, the peak will not be fitted, only initialized
        'vary': False
    },
    {
        #The name of the peak
        'prefix': 'twoD',
        #The range in the spectrum: baseline correction will irgnore this range
        'range': {'min': 2580.0, 'max': 2790.0},
        #constraints for the peak position
        'center': {'min': 2650.0, 'init': 2670, 'max': 2750.0, 'vary': True},
        #constraints for the peak width
        'sigma': {'min': 5.0, 'init': 25.0, 'max': 80.0, 'vary': True},
        #constraints for the gaussian to lorentzian ratio
        'fraction': {'min': 0.01, 'init': 0.4, 'max': 1.0, 'vary': True},
        #constraints for the peak height
        'amplitude': {'min': 0.0, 'vary': True},
        #If set to False, the peak will not be fitted, only initialized
        'vary': True
    }
]

The required packages are mostly part of the anaconda distribution. lmfit is installed separately. Tested with Python 3.5.2 and lmfit 0.9.5



In [2]:

    
%load_ext autoreload
%autoreload 2

Multi-peak module

For clear arrangement, the multi-peak module is in the "multipeak" folder. Make changes there in the .py files, if you want to customize the evaluation.



In [3]:

    
#from multipeak import Dataset, MultiPseudoVoigtModel, GrapheneModelResults, printmd
import multipeak
from copy import deepcopy



In [4]:

    
from sys import version as pythonVersion
from lmfit._version import get_versions
print('Python version:', pythonVersion)
print('lmfit version:', get_versions()['version'])









    



Python version: 3.5.2 | packaged by conda-forge | (default, Jul 26 2016, 02:06:09) [MSC v.1900 64 bit (AMD64)]
lmfit version: 0.9.5

This is to prevent cell scrolling for longer outputs:



In [5]:

    
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999;

Fitting task

µ-Raman spectra (100X objective) of graphene have to be supplied in a text file, with the Ramanshift in cm^-1 and the intensity in separate columns. The resolution should be better than 1.5 cm^-1, the range from at least from 1265 cm^-1 to 2810 cm^-1 for fitting of the default peaks.

The file can contain multiple datasets with an equal number of datapoints, each dataset is then individually processed and average values and histograms are generated.

Optionally, the data will be cut to the range specified by TASK['xmin'] and TASK['xmax'].

The default fitting uses separate fitting tasks with specific baseline subtraction for the D + G band and the 2D band. In general this leads to more robust results, compared to model a sigle baseline with a higher order polynomial.



In [6]:

    
TASK = {
    #Filename as full path. The file can contain multiple datasets with an equal number of datapoints
    #'filename': 'examples/CVD_graphene_example.txt',  #optional
    
    #If filename is left unspecified a file dialog will open in:
    'initialDir': './',
    
    #Columns in the file to be used
    'posx_column': 0,  #optional
    'posy_column': 1,  #optional
    'x_column': 2, #raman shift
    'y_column': 3, #raman intensity
    
    #Axis labels for plotting
    'xlabel': 'Raman shift',
    'xunit': 'cm^{-1}',
    'ylabel': 'Intensity',
    'yunit': 'counts',
    
    #Set to the order of polynomial for baseline subtraction, 0 for no baseline
    #Maxium allowed value is 7
    #Note that this is ignored if a partially defined baseline is used
    'baselineOrder': 6,
    
    #Minimum signal-to-noise ratio, if 0 all speactra are used
    'snr': 100,
    
    #The minimum number of remaining spectra after signal-to-noise selection 
    'minimumStatisctics': 100, #evaluation according to the standard must include 100 spectra
    
    #Minium x value
    'xmin': 1200.0,  #optional
    #Maximum x value
    'xmax': 2810.0,  #optional
    
    #Dictionary where the peaks are defined
    'peaks': deepcopy(GRAPHENEPEAKS)
}

Customizations for pyplot figures:



In [7]:

    
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (12.0, 8.0)
plt.rcParams.update({'font.size': 14})

Reading of the datasets

The datasets are read from a text file using TASK['filename'] with x and y columns defined in TASK['x_column'] and TASK['y_column'].



In [8]:

    
#we first read the data
DATASET = multipeak.Dataset(task=TASK, maxNumber=1000)









    



Reading  C:/Users/hoffmann/Tempo Box/jupyter/graphenestandards/raman/examples/CVD_graphene_example.txt
Data  C:/Users/hoffmann/Tempo Box/jupyter/graphenestandards/raman/examples/CVD_graphene_example.txt  read
Data reversed
121  datasets detected
Data cut:  1200.0 2810.0
Data reshaped
Weights arrays initialized

Baseline subtraction

A partially defined polynomial baseline is subtracted from every dataset, using the following sequences:



In [9]:

    
"""Partially defined polynomial baseline"""
BASELINE = [
    {
        #partial baseline is defined
        'untilX': 1750.0,  #in wavenumbers
        #and has
        'polynomialOrder': 2,
        #and requires minimum signal-to-noise ratio
        'snr': 100  #which can be 0 to include all spectra
    },
    {
        'untilX': 2555.0,
        'polynomialOrder': 4,
        'snr': 0
    },
    {
        'untilX': 2810.0,
        'polynomialOrder': 1,
        'snr': 100
    }    
]



In [10]:

    
DATASET.subtractMultiBaseline(baselineParts=BASELINE, exampleSpec=0)









    




Example baseline






    












    




Example baseline






    












    




Example baseline






    












    




Example for corrected spectrum

Filtering

Datasets are omitted if the signal to noise ratio is lower than the value defined in the baseline parts. This is recommended, because for low signal spectra the evaluation is not reliable. The signal to noise ratio is determined by the squared variance of the data divided by the squared variance of the data points excluding the peak ranges.



In [11]:

    
DATASET.filterDatasetBySNR()









    



0 spectra were excluded from 121 datasets
121 datasets left for fitting

Building of the multi-peak model

Reads the peaks defined in TASK['peaks'] and builds and lmfit CompositeModel from them.



In [12]:

    
#finally we build and fit the model
GRAPHENEMODEL = multipeak.MultiPseudoVoigtModel(DATASET, weights=DATASET.weights['peaks'])

2D peak fitting

Here starts the actual data processing. Depending on the number of datasets this can take a while (for example 0.2s per spectrum).



In [13]:

    
GRAPHENEMODEL.runFit()









    



--- Fitting datasets started  Thu, 15 Dec 2016 16:52:54 +0000
--- Fit completed in 23.34633445739746 seconds ---

Building graphene fit results

This object takes the standard fit results from lmfit and provides multi-peak, multi-spectra and graphene specific evaluations.



In [14]:

    
#build the graphene specific evaluations from the array of fit results
RESULTS = multipeak.GrapheneModelResults(GRAPHENEMODEL, DATASET)

Reporting of fit results



In [15]:

    
multipeak.printmd('### Average fit results')
RESULTS.printPeakResultsTable()

multipeak.printmd('### Peak ratios')
RESULTS.printDerivedResultsTable(derivedParams=[
    'DtoG_area',
    'DtoG_height',
    'twoDtoG_area',
    'twoDtoG_height'
])

multipeak.printmd('### 2D symmetry')
RESULTS.printDerivedResultsTable(derivedParams=[
    'twoDr2',
    'twoDmaxRes'
])

multipeak.printmd('### Fitted spectra overview')
RESULTS.plot3D(save=True)

multipeak.printmd('### Fit of average spectrum')
RESULTS.plotAvgFit(saveName='avgFit')
print(GRAPHENEMODEL.avgFitResult.fit_report())









    




Average fit results






    




Results per peak from 121 datasets:
center sigma fwhm amplitude fraction height
D 1343.014 +/- 2.570 16.500 +/- 4.403 32.999 +/- 8.806 33718.450 +/- 28134.878 0.414 +/- 0.269 968.670 +/- 625.238
G 1583.340 +/- 3.753 10.753 +/- 4.176 21.506 +/- 8.352 240500.102 +/- 80259.279 0.831 +/- 0.157 11574.772 +/- 1939.070
twoD 2673.818 +/- 3.952 20.850 +/- 3.486 41.701 +/- 6.973 710343.370 +/- 132302.904 0.568 +/- 0.070 17747.490 +/- 5001.802






    




Peak ratios






    




Derived results from 121 datasets:
D/G area ratio 0.140 +/- 0.132
D/G height ratio 0.097 +/- 0.123
2D/G area ratio 3.196 +/- 1.010
2D/G height ratio 1.518 +/- 0.335







    




2D symmetry






    




Derived results from 121 datasets:
Coefficient of determination of 2D single peak fitting 0.999 +/- 0.002
Normalized squared maximum 2D residual 0.001 +/- 0.001







    




Fitted spectra overview






    












    




Fit of average spectrum






    












    



[[Model]]
    ((Model(pvoigt, prefix='D_') + Model(pvoigt, prefix='G_')) + Model(pvoigt, prefix='twoD_'))
[[Fit Statistics]]
    # function evals   = 175
    # data points      = 1014
    # variables        = 16
    chi-square         = 11473272.277
    reduced chi-square = 11496.265
    Akaike info crit   = 9496.547
    Bayesian info crit = 9575.293
[[Variables]]
    D_height:          989.832021 +/- 0        (0.00%)  == 'D_amplitude / D_fwhm'
    D_fraction:        0.56503839 +/- 0        (0.00%) (init= 0.8)
    D_sigma:           17.4112029 +/- 0        (0.00%) (init= 30)
    D_amplitude:       34468.3324 +/- 0        (0.00%) (init= 2.392861e+07)
    D_center:          1342.66663 +/- 0        (0.00%) (init= 1350)
    D_fwhm:            34.8224058 +/- 0        (0.00%)  == 'D_sigma * 2.0'
    G_height:          11360.8310 +/- 0        (0.00%)  == 'G_amplitude / G_fwhm'
    G_fraction:        0.95222191 +/- 0        (0.00%) (init= 0.4)
    G_sigma:           10.7691079 +/- 0        (0.00%) (init= 25)
    G_amplitude:       2.4469e+05 +/- 0        (0.00%) (init= 2.392861e+07)
    G_center:          1583.60782 +/- 0        (0.00%) (init= 1585)
    G_fwhm:            21.5382158 +/- 0        (0.00%)  == 'G_sigma * 2.0'
    Dplus_height:      4.7857e+05 +/- 0        (0.00%)  == 'Dplus_amplitude / Dplus_fwhm'
    Dplus_fraction:    0.40000000 +/- 0        (0.00%) (init= 0.4)
    Dplus_sigma:       25         +/- 0        (0.00%) (init= 25)
    Dplus_amplitude:   2.3929e+07 +/- 0        (0.00%) (init= 2.392861e+07)
    Dplus_center:      2450       +/- 0        (0.00%) (init= 2450)
    Dplus_fwhm:        50         +/- 0        (0.00%)  == 'Dplus_sigma * 2.0'
    twoD_height:       17408.5352 +/- 0        (0.00%)  == 'twoD_amplitude / twoD_fwhm'
    twoD_fraction:     0.59131728 +/- 0        (0.00%) (init= 0.4)
    twoD_sigma:        20.4875679 +/- 0        (0.00%) (init= 25)
    twoD_amplitude:    7.1332e+05 +/- 0        (0.00%) (init= 2.392861e+07)
    twoD_center:       2674.40499 +/- 0        (0.00%) (init= 2670)
    twoD_fwhm:         40.9751359 +/- 0        (0.00%)  == 'twoD_sigma * 2.0'
[[Correlations]] (unreported correlations are <  0.100)



In [16]:

    
exampleFitNumber = 0
#multipeak.printmd('## Example fit')
#RESULTS.plotFit(exampleFitNumber)
#print(GRAPHENEMODEL.fitResults[exampleFitNumber].fit_report())

2D peak symmetry



In [17]:

    
multipeak.printmd('### Normalized squared maximum residuals (NSR)')
RESULTS.plot2Dresiduals()

multipeak.printmd('### 2D R2 intervals')
RESULTS.print2Dr2Intervals(0.995)
RESULTS.print2Dr2Intervals(0.99)
RESULTS.print2Dr2Intervals(0.98)
RESULTS.print2Dr2(save=True)

multipeak.printmd('### 2D symmetry outliers')
RESULTS.plot2DOutliers(3, save=True)









    




Normalized squared maximum residuals (NSR)






    












    




2D R2 intervals






    



Symmetric 2D shapes (R2 > 0.995): 95.87  %
Symmetric 2D shapes (R2 > 0.99): 99.17  %
Symmetric 2D shapes (R2 > 0.98): 100.00  %






    




2D peak R2: 0.999 +/- 0.002






    












    




2D symmetry outliers

Reporting of peak ratios



In [18]:

    
multipeak.printmd('### D/G intervals')
RESULTS.printDtoGIntervals(0.2)
RESULTS.printDtoGIntervals(0.5)
RESULTS.printDtoGIntervals(0.8)
RESULTS.printDtoGIntervals(1.0)

multipeak.printmd('## Peak ratios')
RESULTS.printPeakRatios(save=True)

multipeak.printmd('## D/G outliers')
RESULTS.plotDtoGOutliers(3, save=True)









    




D/G intervals






    



D/G < 0.2): 83.47  %
D/G < 0.5): 98.35  %
D/G < 0.8): 98.35  %
D/G < 1.0): 99.17  %






    




Peak ratios






    



D/G ratio by area:  0.140 +/- 0.132
D/G ratio by height:  0.097 +/- 0.123






    












    



2D/G ratio by area:  3.196098293  +/-  1.00997137911
2D/G ratio by height:  1.51768893776  +/-  0.334821312659






    












    




D/G outliers

Reporting of other results



In [19]:

    
multipeak.printmd('### Peak widths')
RESULTS.printParamResults('fwhm')
RESULTS.plotPeakParamHist('G', 'fwhm', xunit='cm-1', bins=50, saveName='G-width-hist')
RESULTS.plotPeakParamHist('twoD', 'fwhm', xunit='cm-1', bins=50, saveName='2D-width-hist')

multipeak.printmd('### Peak positions')
RESULTS.printParamResults('center')
RESULTS.plotPeakParamHist('G', 'center', xunit='cm-1', bins=50, saveName='G-position-hist')
RESULTS.plotPeakParamHist('twoD', 'center', xunit='cm-1', bins=50, saveName='2D-position-hist')









    




Peak widths






    



D fwhm: 32.999 +/- 8.806
G fwhm: 21.506 +/- 8.352
twoD fwhm: 41.701 +/- 6.973






    












    












    




Peak positions






    



D center: 1343.014 +/- 2.570
G center: 1583.340 +/- 3.753
twoD center: 2673.818 +/- 3.952

Write result files



In [20]:

    
RESULTS.writeOutput('fitReport')
RESULTS.writePeakResultsTable('peakResults')









    



Results for each sprectrum saved as:  C:/Users/hoffmann/Tempo Box/jupyter/graphenestandards/raman/examples/CVD_graphene_example_fitReport.csv
Peak results saved as:  C:/Users/hoffmann/Tempo Box/jupyter/graphenestandards/raman/examples/CVD_graphene_example_peakResults.csv

Evaluation average values as custom string

Print result String in desired format for import to Excel, Origin, ...



In [21]:

    
RESULTS.printAvgRes([
        'DtoG_area',
        'twoDr2',
        'twoDtoG_area',
        'G_fwhm',
        'twoD_fwhm',
        'G_amplitude',
        'twoDtoG_height'
    ], delimiter=' ')









    




Format:  | D/G area ratio +/-  | Coefficient of determination of 2D single peak fitting +/-  | 2D/G area ratio +/-  | G_fwhm +/-  | twoD_fwhm +/-  | G_amplitude +/-  | 2D/G height ratio +/-







    











    



0.139917009745 0.132036891485 0.998607438914 0.00150956422691 3.196098293 1.00997137911 21.5055799413 8.3517300045 41.7006193683 6.97297214821 240500.102224 80259.2786442 1.51768893776 0.334821312659 






    











    



Possible names are:
twoD_sigma | D_fwhm | twoD_center | G_fwhm | G_center | D_center | D_height | twoD_height | D_amplitude | twoD_amplitude | G_amplitude | G_sigma | G_height | D_sigma | D_fraction | G_fraction | twoD_fraction | twoD_fwhm | DtoG_height | twoDmaxRes | twoDtoG_height | twoDr2 | twoDtoG_area | DtoG_area |

Finally, a hacky solution to save the evaluation result as a html page. Anyone who knows a better solution, go ahead!



In [22]:

    
%%javascript
IPython.notebook.save_notebook();



In [23]:

    
from nbconvert import HTMLExporter
import codecs
import nbformat
import time

time.sleep(1)
exporter = HTMLExporter()
output_notebook = nbformat.read('DtoG-2Dsymmetry.ipynb', as_version=4)
output, resources = exporter.from_notebook_node(output_notebook)
codecs.open(RESULTS.baseFilename + '.html', 'w', encoding='utf-8').write(output)

Good sciencing everyone!

	center	sigma	fwhm	amplitude	fraction	height
D	1343.014 +/- 2.570	16.500 +/- 4.403	32.999 +/- 8.806	33718.450 +/- 28134.878	0.414 +/- 0.269	968.670 +/- 625.238
G	1583.340 +/- 3.753	10.753 +/- 4.176	21.506 +/- 8.352	240500.102 +/- 80259.279	0.831 +/- 0.157	11574.772 +/- 1939.070
twoD	2673.818 +/- 3.952	20.850 +/- 3.486	41.701 +/- 6.973	710343.370 +/- 132302.904	0.568 +/- 0.070	17747.490 +/- 5001.802

D/G area ratio	0.140 +/- 0.132
D/G height ratio	0.097 +/- 0.123
2D/G area ratio	3.196 +/- 1.010
2D/G height ratio	1.518 +/- 0.335

Coefficient of determination of 2D single peak fitting	0.999 +/- 0.002
Normalized squared maximum 2D residual	0.001 +/- 0.001