Example Aggregations

Aggregating data with MDF

Searches with Forge.search() return at most 10,000 results. Two methods work around this limit: Forge.aggregate_sources(), which collects every entry from one or more named sources, and Forge.aggregate(), which collects every entry matching a query.


In [1]:
import json
from mdf_forge.forge import Forge

In [2]:
mdf = Forge()

aggregate_sources - NIST XPS DB

Example: We want to collect all records from the NIST XPS Database and analyze the binding energies. This database has almost 30,000 records, more than a single search can return, so we have to use aggregate_sources().


In [3]:
# First, let's aggregate all the nist_xps_db data.
all_entries = mdf.aggregate_sources("nist_xps_db")
print(len(all_entries))


29190

In [4]:
# Now, let's parse out the energy_uncertainty_ev and print the results for analysis.
uncertainties = {}
for record in all_entries:
    if record["mdf"]["resource_type"] == "record":
        unc = record.get("nist_xps_db_v1", {}).get("energy_uncertainty_ev", 0)
        if not uncertainties.get(unc):
            uncertainties[unc] = 1
        else:
            uncertainties[unc] += 1
print(json.dumps(uncertainties, sort_keys=True, indent=4, separators=(',', ': ')))


{
    "0": 29189
}
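
Because the cell above falls back to 0 when energy_uncertainty_ev is absent, its output cannot tell a reported uncertainty of 0 apart from a missing value. The following is a minimal sketch of one way to keep the two apart, using collections.Counter. The sample entries, the "missing" label, and the helper name tally_uncertainties are hypothetical stand-ins for illustration; only the field names are taken from the cell above.

```python
from collections import Counter

# Hypothetical sample entries standing in for the result of aggregate_sources().
# The field layout mirrors the cell above; the values are made up.
sample_entries = [
    {"mdf": {"resource_type": "dataset"}},
    {"mdf": {"resource_type": "record"}, "nist_xps_db_v1": {"energy_uncertainty_ev": 0}},
    {"mdf": {"resource_type": "record"}, "nist_xps_db_v1": {"energy_uncertainty_ev": 0.2}},
    {"mdf": {"resource_type": "record"}, "nist_xps_db_v1": {}},
    {"mdf": {"resource_type": "record"}},
]


def tally_uncertainties(entries):
    """Count energy_uncertainty_ev values, keeping missing values separate from 0."""
    counts = Counter()
    for entry in entries:
        if entry["mdf"]["resource_type"] != "record":
            continue  # skip dataset-level entries
        value = entry.get("nist_xps_db_v1", {}).get("energy_uncertainty_ev", "missing")
        # String keys avoid mixing numbers and text, which would break
        # json.dumps(..., sort_keys=True) when the tally is printed.
        counts[str(value)] += 1
    return counts


uncertainty_counts = tally_uncertainties(sample_entries)
print(dict(uncertainty_counts))
```

Run against the real all_entries list, a tally like this would show whether the 29189 zeros above are reported values or defaults.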

aggregate - Multiple Datasets

Example: We want to analyze how often other elements are studied together with gallium (Ga), and which elemental pairing is the most frequent. More than 10,000 records contain gallium data, so Forge.search() would not return all of them and we use aggregate() instead.


In [5]:
# First, let's aggregate everything that has "Ga" in the list of elements.
all_results = mdf.aggregate("material.elements:Ga")
print(len(all_results))


18232

In [6]:
# Now, let's tally every element in each record (Ga itself included) and print the running totals.
elements = {}
for record in all_results:
    if record["mdf"]["resource_type"] == "record":
        elems = record["material"]["elements"]
        for elem in elems:
            if elem in elements:
                elements[elem] += 1
            else:
                elements[elem] = 1
print(json.dumps(elements, sort_keys=True, indent=4, separators=(',', ': ')))


{
    "Ac": 267,
    "Ag": 323,
    "Al": 322,
    "Ar": 2,
    "As": 872,
    "Au": 372,
    "B": 301,
    "Ba": 342,
    "Be": 281,
    "Bi": 4172,
    "Br": 38,
    "C": 87,
    "Ca": 370,
    "Cd": 174,
    "Ce": 325,
    "Cl": 57,
    "Co": 381,
    "Cr": 315,
    "Cs": 160,
    "Cu": 403,
    "Dy": 317,
    "Er": 321,
    "Eu": 304,
    "F": 84,
    "Fe": 2989,
    "Ga": 18232,
    "Gd": 156,
    "Ge": 333,
    "H": 159,
    "Hf": 310,
    "Hg": 282,
    "Ho": 323,
    "I": 41,
    "In": 364,
    "Ir": 305,
    "K": 313,
    "La": 312,
    "Li": 469,
    "Lu": 291,
    "Mg": 683,
    "Mn": 4357,
    "Mo": 437,
    "N": 137,
    "Na": 339,
    "Nb": 296,
    "Nd": 179,
    "Ni": 363,
    "Np": 252,
    "O": 1390,
    "On": 6,
    "Os": 288,
    "Ox": 39,
    "P": 153,
    "Pa": 272,
    "Pb": 278,
    "Pd": 361,
    "Pm": 273,
    "Pr": 312,
    "Pt": 338,
    "Pu": 280,
    "Rb": 163,
    "Re": 134,
    "Rh": 320,
    "Ru": 304,
    "S": 161,
    "Sb": 327,
    "Sc": 331,
    "Se": 138,
    "Si": 412,
    "Sm": 330,
    "Sn": 303,
    "Sr": 221,
    "Ta": 160,
    "Tb": 174,
    "Tc": 139,
    "Te": 361,
    "Th": 287,
    "Ti": 211,
    "Tl": 295,
    "Tm": 312,
    "U": 223,
    "V": 1646,
    "Va": 2,
    "W": 259,
    "Xe": 1,
    "Y": 332,
    "Yb": 324,
    "Zn": 315,
    "Zr": 167
}
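
The tally answers the pairing question once Ga itself is excluded and the remaining counts are ranked. The following is a minimal sketch of that step. The dictionary holds only a handful of counts copied from the output above, and the helper name top_partners is hypothetical; passing the full elements dictionary should work the same way.

```python
# A few counts copied from the output above; the full dictionary works the same way.
element_counts = {
    "Ga": 18232,
    "Mn": 4357,
    "Bi": 4172,
    "Fe": 2989,
    "V": 1646,
    "O": 1390,
}


def top_partners(counts, target="Ga", n=3):
    """Return the n most frequent elements other than the target, highest count first."""
    partners = {elem: count for elem, count in counts.items() if elem != target}
    return sorted(partners.items(), key=lambda item: item[1], reverse=True)[:n]


top_three = top_partners(element_counts)
print(top_three)
```

In the output above, Mn (4357 records) is the most frequent partner of Ga, followed by Bi (4172) and Fe (2989).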
