PyMKS Tutorial

Daniel Wheeler

https://materialsinnovation.github.io/pymks

PyMKS Overview

  • PyMKS (Materials Knowledge Systems in Python) is a materials informatics toolkit written in Python

  • PyMKS provides the MKS machine learning technique, which is unavailable in other machine learning libraries

  • The core of PyMKS is a set of convolution kernels; the rest is an API that works with Scikit-Learn

  • PyMKS provides documentation and testing for MKS use cases

Tutorial

  • live coding - follow along - lots of typing

  • should have software installed - running a Jupyter notebook

    • install on your laptop

    • use Matin

    • use Binder

  • please stop me if I go too fast

  • feel free to ask questions

Syllabus

  1. Python Intro?
  2. Composing microstructure into a digital signal
  3. Classify microstructures using 2-point stats
  4. Learn a Cahn-Hilliard simulation using regression

Tools: PyMKS, Scikit-Learn, Dask, NumPy

1. Python Intro

  • How to use the Jupyter Notebook
  • How many know Python?
    • Variables
    • Containers
    • Loops
    • Conditionals
    • Arrays, vector calculations
    • Plotting using Matplotlib
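
As a warm-up for the topics above, here is a minimal sketch covering a variable, a container, a loop with a conditional, and the equivalent NumPy vector calculation. The values are made up for illustration, and plotting with Matplotlib is left out to keep the snippet short.

```python
import numpy as np

# Variable and container: a plain Python list of made-up grain sizes
grain_sizes = [1.0, 2.0, 3.0, 4.0]

# Loop with a conditional: add up the sizes larger than 2.0
total = 0.0
for size in grain_sizes:
    if size > 2.0:
        total += size

# The same calculation as a vectorized NumPy expression
sizes = np.array(grain_sizes)
vector_total = sizes[sizes > 2.0].sum()

print(total, vector_total)
```

The vectorized form avoids the explicit loop, which is the style the rest of the tutorial relies on.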

2. Composing microstructure into a digital signal

  • Microstructure function
  • Why do we need to generate features for each data set?
  • Generate features for each sample of data
  • Using PyMKS microstructure functions
    • continuous data
    • discrete data
    • periodic data
    • multiple variable data
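
One way to picture a microstructure function is as a set of weights per pixel, one weight per local state. The sketch below is plain NumPy and is not the PyMKS API: `primitive_basis` is a hypothetical helper that uses hat (triangle) functions centered on evenly spaced states, which reduces to one-hot encoding for discrete phase labels. The state range and the sample arrays are assumptions chosen for illustration.

```python
import numpy as np


def primitive_basis(data, n_state, min_=0.0, max_=1.0):
    """Hypothetical helper: hat-function weights for each local state.

    Returns an array with one extra trailing axis of length n_state.
    """
    data = np.clip(np.asarray(data, dtype=float), min_, max_)
    states = np.linspace(min_, max_, n_state)  # evenly spaced local states
    delta = (max_ - min_) / (n_state - 1)      # spacing between states
    weights = 1.0 - np.abs(data[..., None] - states) / delta
    return np.clip(weights, 0.0, 1.0)


# Discrete data: two phases labeled 0 and 1 become one-hot vectors
phases = np.array([[0, 1], [1, 0]])
discrete = primitive_basis(phases, n_state=2)

# Continuous data: each value is shared between its two nearest states
field = np.array([[0.25, 0.9]])
continuous = primitive_basis(field, n_state=3)

print(discrete[0, 0], continuous[0, 0])
```

In both cases the weights at each pixel sum to one, so the discretized array can be read as a probability of finding each local state at that location.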

3. Classify microstructures using 2-point stats

  • Generate artificial microstructures
  • Discretize the data using a discrete microstructure function
  • Create features for each sample using 2-point stats
  • Dimensionality reduction on the 2-point stats using PCA
  • Classification
    • Train test splits
    • Data pipelines
    • Cross-validation
  • Speed things up using Dask
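
The steps above can be sketched end to end with NumPy and Scikit-Learn alone. This is an illustration under stated assumptions, not the PyMKS workflow: `make_sample` is a made-up generator that elongates features along one axis, `two_point_features` computes only the periodic autocorrelation of one phase with an FFT, and the sample counts, grid size, and classifier are arbitrary choices. The Dask step is omitted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

SIZE = 32  # edge length of each square, periodic sample (assumed)


def make_sample(rng, axis, width=6):
    """Illustrative generator: binary microstructure elongated along `axis`."""
    noise = rng.random((SIZE, SIZE))
    smooth = sum(np.roll(noise, shift, axis=axis) for shift in range(width))
    return (smooth > np.median(smooth)).astype(float)


def two_point_features(flat_samples):
    """Periodic autocorrelation of phase 1 for each flattened sample."""
    micro = flat_samples.reshape(-1, SIZE, SIZE)
    spectrum = np.fft.fftn(micro, axes=(1, 2))
    stats = np.fft.ifftn(spectrum * np.conj(spectrum), axes=(1, 2)).real
    return (stats / (SIZE * SIZE)).reshape(len(micro), -1)


# Two classes of artificial microstructures, labeled by elongation axis
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 40)
samples = np.array([make_sample(rng, axis).ravel() for axis in labels])

# Train test split
x_train, x_test, y_train, y_test = train_test_split(
    samples, labels, test_size=0.25, stratify=labels, random_state=0
)

# Data pipeline: 2-point stats, then PCA, then a classifier
model = make_pipeline(
    FunctionTransformer(two_point_features),
    PCA(n_components=3),
    LogisticRegression(),
)

# Cross-validation on the training set, then a final held-out score
cv_scores = cross_val_score(model, x_train, y_train, cv=5)
model.fit(x_train, y_train)
test_accuracy = model.score(x_test, y_test)

print(cv_scores.mean(), test_accuracy)
```

Keeping the feature generation inside the pipeline means the PCA is refit on each cross-validation fold, so no information from the validation samples leaks into the reduced features.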

4. Learn a Cahn-Hilliard simulation using regression

  • Generate sample Cahn-Hilliard data
  • Discretize the data using a continuous microstructure function
  • Regression analysis to avoid repeating the simulations
    • generating the MKS "kernel" or "coefficients" using MKS "localization"
    • scaling up the "coefficients"
  • Compare 2 microstructure functions
    • Train test splits
    • Data pipelines
    • Cross-validation
  • Speed things up with Dask
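
The localization idea above can be sketched in plain NumPy. Everything here is an assumption made for illustration rather than the PyMKS implementation: `cahn_hilliard` is a simple semi-implicit spectral solver, `hat_basis` is a hat-function microstructure function with three states on [-1, 1], and the coefficients are fitted with an independent least-squares problem at each Fourier frequency. Scaling up is done by zero-padding the real-space kernel. Grid sizes, time step, and sample counts are arbitrary, and the Dask step is omitted.

```python
import numpy as np


def cahn_hilliard(conc, n_steps=10, dt=1e-3, dx=0.25, gamma=1.0):
    """Semi-implicit spectral Cahn-Hilliard steps on periodic square grids."""
    k = 2 * np.pi * np.fft.fftfreq(conc.shape[-1], d=dx)
    k2 = k[:, None] ** 2 + k[None, :] ** 2
    for _ in range(n_steps):
        explicit = np.fft.fft2(conc) - dt * k2 * np.fft.fft2(conc**3 - conc)
        conc = np.fft.ifft2(explicit / (1 + dt * gamma * k2**2)).real
    return conc


def hat_basis(conc, n_state=3, lo=-1.0, hi=1.0):
    """Continuous microstructure function: hat-function weights per state."""
    states = np.linspace(lo, hi, n_state)
    delta = states[1] - states[0]
    return np.clip(1 - np.abs(conc[..., None] - states) / delta, 0, 1)


def fit_coefficients(micro, response):
    """Least-squares fit of the coefficients, one problem per frequency."""
    m_hat = np.moveaxis(np.fft.fft2(micro, axes=(1, 2)), 0, 2)
    p_hat = np.moveaxis(np.fft.fft2(response, axes=(1, 2)), 0, 2)
    # The pseudo-inverse handles the redundancy of weights that sum to one
    return (np.linalg.pinv(m_hat, rcond=1e-8) @ p_hat[..., None])[..., 0]


def predict(coeff, micro):
    """Apply the coefficients as a convolution in Fourier space."""
    m_hat = np.fft.fft2(micro, axes=(1, 2))
    return np.fft.ifft2((m_hat * coeff).sum(axis=-1), axes=(1, 2)).real


def resize_coefficients(coeff, new_size):
    """Scale up: zero-pad the centered real-space kernel to a larger grid."""
    kernel = np.fft.fftshift(np.fft.ifft2(coeff, axes=(0, 1)), axes=(0, 1))
    pad = (new_size - coeff.shape[0]) // 2
    kernel = np.pad(kernel, ((pad, pad), (pad, pad), (0, 0)))
    return np.fft.fft2(np.fft.ifftshift(kernel, axes=(0, 1)), axes=(0, 1))


def relative_error(predicted, actual):
    return np.linalg.norm(predicted - actual) / np.linalg.norm(actual)


# Sample data: small random concentration fields and their simulated response
rng = np.random.default_rng(0)
x_small = 0.01 * rng.standard_normal((30, 32, 32))
y_small = cahn_hilliard(x_small)

# Fit on 24 samples, check on the remaining 6
coeff = fit_coefficients(hat_basis(x_small[:24]), y_small[:24])
test_error = relative_error(predict(coeff, hat_basis(x_small[24:])), y_small[24:])

# Reuse the coefficients on a larger domain without retraining
x_large = 0.01 * rng.standard_normal((1, 64, 64))
y_large = cahn_hilliard(x_large)
large_coeff = resize_coefficients(coeff, 64)
large_error = relative_error(predict(large_coeff, hat_basis(x_large)), y_large)

print(test_error, large_error)
```

The resizing step works here only because the learned kernel decays well inside the small training domain; a kernel with longer range would need a larger training grid before it could be padded safely.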