In [1]:
from mdf_forge.forge import Forge
In [2]:
mdf = Forge()
In [3]:
mdf.match_field("mdf.source_name", "oqmd")
mdf.current_query()
Out[3]:
In [4]:
mdf.reset_query()
In [5]:
mdf.current_query()
Out[5]:
In [6]:
mdf.exclude_field("mdf.source_name", "sluschi").match_field("material.elements", "Al").exclude_field("mdf.source_name", "oqmd")
res, info = mdf.search(limit=10, info=True)
When you use the info=True argument, search() will return a tuple instead of a list. The first element in the tuple will be the same list of results you're used to, but the second tuple element will be a dictionary of query info.
In [7]:
res[0]
Out[7]:
In [8]:
info
Out[8]:
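Because search() returns a plain list by default and a (results, info) tuple with info=True, code that accepts either form has to tell them apart. The sketch below shows one way to normalize both shapes. split_search_result is a hypothetical helper, not part of mdf_forge, and the sample values are stand-ins rather than real MDF output.

```python
from typing import Any, Dict, List, Tuple


def split_search_result(result: Any) -> Tuple[List[dict], Dict[str, Any]]:
    """Return (results, info) for either return shape of a search.

    Hypothetical helper for illustration; not part of mdf_forge.
    """
    if isinstance(result, tuple):
        # info=True case: (list of results, dictionary of query info)
        results, info = result
        return list(results), dict(info)
    # Default case: a plain list of results, so there is no info to report
    return list(result), {}


# Stand-in values shaped like the two return forms (not real MDF data).
plain = [{"mdf": {"source_name": "example"}}]
with_info = (plain, {"query": "material.elements:Al"})

results, info = split_search_result(with_info)
print(len(results), info["query"])
```

With a real client, the value passed in would be whatever mdf.search(...) returned.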
In [9]:
mdf.match_field("mdf.source_name", "nist_xps_db")
Out[9]:
In [10]:
res, info = mdf.search(limit=10, info=True, reset_query=False)
info["query"]
Out[10]:
In [11]:
res, info = mdf.search(limit=10, info=True)
info["query"]
Out[11]:
In [12]:
mdf.show_fields()
Out[12]:
If you give show_fields() a top-level block, it will show you the mapping for that block, including the expected datatypes.
In [13]:
mdf.show_fields("mdf")
Out[13]:
To learn more about specific fields, use describe_field(). This method can tell you what a field means, what unit of measurement it uses, or other useful information. When you call describe_field(), you must pass in the resource_type you're interested in (such as dataset or record). Since the full schema for a resource_type is very long, you can also pass in a field you're interested in, in the standard dot notation (if you don't, you will get the full schema for the resource_type instead).
In [14]:
mdf.describe_field("dataset", field="mdf")
If you want your results in a dictionary instead of being printed out, you can set raw=True.
In [15]:
mdf.describe_field("record", field="mdf.source_name", raw=True)
Out[15]:
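The dot notation used for the field argument can be read as a path through nested dictionaries. The sketch below resolves such a path against a small stand-in schema. lookup_dotted is a hypothetical helper, and the nested layout of example_schema is an assumption made for illustration; the actual structure returned with raw=True is not shown here and may differ.

```python
from typing import Any, Mapping


def lookup_dotted(schema: Mapping[str, Any], field: str) -> Any:
    """Walk a nested mapping using a dotted path such as 'mdf.source_name'.

    Hypothetical helper for illustration; not part of mdf_forge.
    """
    node: Any = schema
    for part in field.split("."):
        if not isinstance(node, Mapping) or part not in node:
            raise KeyError(f"{field!r} not found (stopped at {part!r})")
        node = node[part]
    return node


# Stand-in schema; the keys and values are assumptions, not the real MDF schema.
example_schema = {
    "mdf": {
        "source_name": {"type": "string", "description": "placeholder text"},
    },
}

print(lookup_dotted(example_schema, "mdf.source_name")["type"])
```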
To learn more about an organization registered with MDF, use describe_organization(). This method can tell you more about an organization, including the provided description, homepage, and submission rules. When you call describe_organization(), you just pass in the name or alias of an organization (capitalization doesn't matter).
In [16]:
mdf.describe_organization("argonne national laboratory")
In [17]:
mdf.describe_organization("CHiMaD")
You can also get a brief overview of an organization without the technical details by setting summary=True. describe_organization() also supports the raw argument to get results back as a dictionary (raw overrides summary).
In [18]:
mdf.describe_organization("NIST", summary=True)
In [19]:
mdf.describe_organization("NIST MDR", raw=True)
Out[19]:
The fetch_datasets_from_results() method automatically collects the dataset entries for all records returned from a search. In other words, if you search for material.elements:Al and a record from OQMD is returned, you can pass that record to fetch_datasets_from_results() and get the OQMD dataset entry back.
In [20]:
records = mdf.search("dft.converged:true AND mdf.resource_type:record")
In [21]:
res = mdf.fetch_datasets_from_results(records)
res[0]
Out[21]:
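The idea of going from records to their parent datasets can be illustrated without a network call. The sketch below collects the distinct mdf.source_name values from a list of records, which is the information needed to identify each record's dataset. It is not how Forge implements fetch_datasets_from_results(): unique_source_names is a hypothetical helper, and the records are stand-ins that assume each record nests source_name under an mdf block.

```python
from typing import Dict, Iterable, List


def unique_source_names(records: Iterable[Dict]) -> List[str]:
    """Return each distinct mdf.source_name once, in first-seen order.

    Hypothetical helper for illustration; not part of mdf_forge.
    """
    seen = set()
    names = []
    for record in records:
        # Records without an mdf block or source_name are skipped
        name = record.get("mdf", {}).get("source_name")
        if name is not None and name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Stand-in records (assumed layout, not real MDF output).
example_records = [
    {"mdf": {"source_name": "oqmd"}},
    {"mdf": {"source_name": "nist_xps_db"}},
    {"mdf": {"source_name": "oqmd"}},
]

print(unique_source_names(example_records))
```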
If you don't want to keep the results at all, you can also use fetch_datasets_from_results() to execute a search and use those results instead of passing it your own results.
In [22]:
res = mdf.match_field("material.elements", "Al").fetch_datasets_from_results()
res[0]
Out[22]:
Queries submitted with search() are limited to returning 10,000 results. If this limit is too low, you can use aggregate() to retrieve all results from a query, no matter how many there are. Please be careful with this function, as it is easy to retrieve a very large number of results without meaning to. Consider using search(your_query, limit=0, info=True) first to discover how many results you will get (see Query info above for more information).
For this example, we will see how many results the query will retrieve before aggregating.
In [23]:
mdf.match_field("mdf.source_name", "oqmd*").match_field("material.elements", "Pb").exclude_field("material.elements", "Al")
res, info = mdf.search(limit=0, info=True, reset_query=False)
print("Number of results:", info["total_query_matches"])
Assuming we want all of these results, we can use aggregate() on the same query.
In [24]:
res = mdf.aggregate()
print("Number of results:", len(res))
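The count-then-aggregate pattern above can be wrapped in a small guard so that a large query is not pulled by accident. This is a minimal sketch: within_budget and the max_results threshold are hypothetical, and example_info is a stand-in for the info dictionary, assuming only the total_query_matches key used earlier.

```python
from typing import Any, Mapping


def within_budget(info: Mapping[str, Any], max_results: int) -> bool:
    """Return True if the query's total match count is at most max_results.

    Hypothetical helper for illustration; not part of mdf_forge.
    """
    return int(info["total_query_matches"]) <= max_results


# Stand-in for the info dictionary returned with info=True (made-up count).
example_info = {"total_query_matches": 12500}

if within_budget(example_info, max_results=50000):
    print("OK to aggregate")
else:
    print("Too many results; narrow the query first")

# With a live client, the same check would sit between the two calls:
#   res, info = mdf.search(limit=0, info=True, reset_query=False)
#   if within_budget(info, max_results=50000):
#       res = mdf.aggregate()
```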