Python is a dynamically typed, multi-paradigm programming language that supports procedural, object-oriented, and functional styles.
Pandas is a high-performance library for manipulating labeled, multidimensional data.
Other notable libraries:
There are a number of other important packages, but these are the core building blocks for a Python data scientist and a good place to start.
Let's explore the power of these tools by examining a problem that many of us have seen before: processing the result of a molecular dynamics (MD) simulation.
MD simulations are an integral part of the tools at the computational chemist's disposal. These simulations provide a description of atomic and molecular behavior.
They can be used to predict novel properties, aid in chemical design, and support experimental conclusions. Performing these simulations can be tricky from a theoretical standpoint, but assuming we can get past that, let's consider analyzing the result of a simulation, with a focus on reproducibility.
Given an XYZ trajectory file containing 2000 frames (geometry snapshots) with 195 atoms each, where each frame is in a 12.55 Å × 12.55 Å × 12.55 Å cell (thanks Adam!),
2) compute all of the interatomic distances (accounting for periodic boundary conditions), and
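To make the distance step concrete, here is a minimal sketch of how interatomic distances with periodic boundary conditions might be computed for a single frame. It assumes a cubic cell and applies the minimum image convention with NumPy broadcasting. The function name `minimum_image_distances` is our own, and the random coordinates are only a stand-in for a real frame read from the trajectory.

```python
import numpy as np


def minimum_image_distances(xyz, box_length):
    """Return the (n, n) matrix of pairwise distances in a cubic periodic cell.

    Hypothetical helper for illustration; assumes a cubic cell of side
    box_length and coordinates given as an (n, 3) array in the same units.
    """
    # Displacement vectors between every pair of atoms, shape (n, n, 3).
    delta = xyz[:, np.newaxis, :] - xyz[np.newaxis, :, :]
    # Minimum image convention: shift each component by a whole number of
    # box lengths so that it lies within half a box length of zero.
    delta -= box_length * np.round(delta / box_length)
    return np.sqrt((delta ** 2).sum(axis=-1))


box_length = 12.55  # cell edge in Å, taken from the problem statement

# Placeholder data (an assumption): 195 random positions inside the cell,
# standing in for one frame of the real trajectory.
rng = np.random.default_rng(0)
frame = rng.uniform(0.0, box_length, size=(195, 3))

distances = minimum_image_distances(frame, box_length)
```

The result is a symmetric 195 × 195 array with zeros on the diagonal. Because every displacement component is folded into the range of half a box length, no distance can exceed √3/2 times the cell edge. Repeating this for each of the 2000 frames is a simple loop, although holding every pairwise distance in memory at once may call for a more compact representation.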