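Table is Hail's distributed analogue of a data frame or SQL table. It will be familiar if you've used R or pandas, but Table differs in three important ways: it is distributed, so a Hail table can store far more data than fits on a single computer; it carries global fields; and it is immutable.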
Table has two different kinds of fields: global fields and row fields.
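Here is a rough picture of global fields versus row fields as a plain-Python toy model. It is not Hail's API. A global field holds one value for the whole table, while each row carries its own value for every row field. The field names (age_threshold, id, age, over_threshold) and their values are hypothetical.

```python
from typing import Any

# Toy model only (NOT Hail's API).
# Global fields: a single value that belongs to the table as a whole.
toy_globals: dict[str, Any] = {'age_threshold': 30}  # hypothetical global field

# Row fields: every row carries its own value for each row field.
toy_rows: list[dict[str, Any]] = [  # hypothetical row fields `id` and `age`
    {'id': 1, 'age': 24},
    {'id': 2, 'age': 53},
    {'id': 3, 'age': 30},
]

# A per-row computation can read both the row's own fields and the globals.
# Building new dicts leaves `toy_rows` untouched, loosely mirroring how Hail
# methods return a new Table instead of modifying the existing one.
annotated_rows = [
    {**row, 'over_threshold': row['age'] > toy_globals['age_threshold']}
    for row in toy_rows
]

for row in annotated_rows:
    print(row)
```

In Hail itself, the analogous step would be written roughly as ht.annotate(over_threshold=ht.age > ht.age_threshold) for a hypothetical table ht with those fields. annotate returns a new Table and leaves ht unchanged.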
Hail can import data from many sources: TSV and CSV files, JSON files, FAM files, databases, Spark, etc. It can also read (and write) a native Hail format.
You can read a dataset with hl.read_table. It takes a path and returns a Table.
ht stands for Hail Table.
We've provided a method to download and import the MovieLens dataset of movie ratings in the Hail native format. Let's read it!
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=https://dx.doi.org/10.1145/2827872.
In [ ]:import hail as hl
hl.init()
In [ ]:hl.utils.get_movie_lens('data/')
In [ ]:users = hl.read_table('data/users.ht')
In [ ]:users.describe()
In [ ]:users.show()
In [ ]:users.count()
You can access the fields of a table with Python attribute notation, table.field, or with index notation, table['field']. The latter is useful when a field name is not a valid Python identifier (for example, when it contains a space).
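A field can be reached with attribute notation only if Python lets you write its name after a dot. That means it must be a valid identifier that is not a reserved keyword. The helper below, needs_index_notation, makes that rule explicit. It is a hypothetical name written in plain Python and is not part of Hail. The field names in the loop are made-up examples.

```python
import keyword


def needs_index_notation(field_name: str) -> bool:
    """Return True if `field_name` cannot be written as `table.field_name`.

    Attribute notation needs a valid Python identifier that is not a
    reserved keyword; anything else (spaces, hyphens, a leading digit,
    `class`, ...) has to be written as `table['field_name']`.
    """
    return not field_name.isidentifier() or keyword.iskeyword(field_name)


# Hypothetical field names, used only to illustrate the rule.
for name in ['occupation', 'zip code', 'age-group', '2nd_rating', 'class']:
    style = f"table[{name!r}]" if needs_index_notation(name) else f"table.{name}"
    print(f"{name!r:14} -> {style}")
```

The check covers Python syntax only. A field whose name matches an existing Table method, such as count, may also need index notation, because users.count already refers to the method.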
In [ ]:users.occupation.describe()
In [ ]:users['occupation'].describe()
In [ ]:users.occupation.show()