Adrasteia

An attempt at using Dask to read Gaia TGAS data from CSV files.


In [2]:
import dask

In [1]:
from distributed import Client

In [3]:
client = Client()

In [7]:
client.close()

In [1]:
#client.scheduler_info()

In [4]:
import dask.dataframe as dd

In [3]:
%%time
df = dd.read_csv('../data/TgasSource_000-000-*.csv')


CPU times: user 163 ms, sys: 26.8 ms, total: 189 ms
Wall time: 229 ms

In [4]:
df.head()


Out[4]:
hip tycho2_id solution_id source_id random_index ref_epoch ra ra_error dec dec_error ... scan_direction_mean_k4 phot_g_n_obs phot_g_mean_flux phot_g_mean_flux_error phot_g_mean_mag phot_variable_flag l b ecl_lon ecl_lat
0 13989.0 NaN 1635378410781933568 7627862074752 243619 2015.0 45.034330 0.305989 0.235392 0.218802 ... 26.201841 77 1.031233e+07 10577.365273 7.991378 NOT_AVAILABLE 176.740413 -48.714422 42.641825 -16.121052
1 NaN 55-28-1 1635378410781933568 9277129363072 487238 2015.0 45.165007 2.583882 0.200068 1.197789 ... 22.890602 62 9.495646e+05 1140.173576 10.580959 NOT_AVAILABLE 176.916420 -48.645004 42.761180 -16.193033
2 NaN 55-1191-1 1635378410781933568 13297218905216 1948952 2015.0 45.086155 0.213836 0.248825 0.180326 ... 26.715704 60 8.178376e+05 1827.383676 10.743102 NOT_AVAILABLE 176.780400 -48.667845 42.697502 -16.123363
3 NaN 55-624-1 1635378410781933568 13469017597184 102321 2015.0 45.066542 0.276039 0.248211 0.200958 ... 25.878560 61 6.020535e+05 905.877286 11.075682 NOT_AVAILABLE 176.760412 -48.682365 42.677791 -16.118216
4 NaN 55-849-1 1635378410781933568 15736760328576 409284 2015.0 45.136038 0.170697 0.335044 0.170130 ... 26.755468 96 1.388122e+06 2826.428866 10.168701 NOT_AVAILABLE 176.739184 -48.572035 42.773370 -16.055481

5 rows × 59 columns


In [7]:
df.phot_g_mean_mag.mean().compute()


Out[7]:
10.814073743893747