In [ ]:
```python
from fastai.gen_doc.nbdoc import *
from fastai.tabular.models import *
```

`tabular` contains all the necessary classes to deal with tabular data, across two modules:

- `tabular.transform`: defines the `TabularTransform` class to help with preprocessing;
- `tabular.data`: defines the `TabularDataset` that handles that data, as well as the methods to quickly get a `TabularDataBunch`.

To create a model, you'll need to use `models.tabular`. See below for an end-to-end example using all these modules.

First, let's import everything we need for the tabular application.

In [ ]:
```python
from fastai.tabular import *
```

In [ ]:
```python
path = untar_data(URLs.ADULT_SAMPLE)
path
```


In [ ]:
```python
df = pd.read_csv(path/'adult.csv')
df.head()
```


Here all the information that will form our input is in the first 14 columns, and the dependent variable is the last column. We will split our input variables into two types: categorical and continuous.

- Categorical variables will be replaced by a category - a unique id that identifies them - before they are passed through an embedding layer.
- Continuous variables will be normalized and then directly fed to the model.

We also need to handle missing values: our model can't accept NaNs, so we should replace them in a sensible way. All of this preprocessing is done by `TabularTransform` objects and `TabularDataset`.

We can define a bunch of Transforms that will be applied to our variables. Here we transform all categorical variables into categories. We also replace missing values for continuous variables by the median column value and normalize those.

In [ ]:
```python
procs = [FillMissing, Categorify, Normalize]
```
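To make the three steps concrete, here is a minimal sketch in plain pandas of what filling missing values, mapping categories to ids, and normalizing amount to. It is an illustration only, not fastai's implementation: the toy frame and its values are invented, and reserving id 0 for missing categories is a choice made for this sketch.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the adult dataset (values are made up).
toy = pd.DataFrame({
    "workclass": ["Private", "State-gov", None, "Private"],
    "age": [39.0, np.nan, 50.0, 31.0],
})

# Fill-missing step: replace NaNs in a continuous column by the column median.
median_age = toy["age"].median()
toy["age"] = toy["age"].fillna(median_age)

# Categorify step: map each distinct category to an integer id.
# pandas codes missing values as -1; adding 1 reserves id 0 for "missing"
# (an assumption of this sketch).
toy["workclass"] = toy["workclass"].astype("category")
workclass_ids = toy["workclass"].cat.codes.to_numpy() + 1

# Normalize step: subtract the mean and divide by the standard deviation.
age_mean, age_std = toy["age"].mean(), toy["age"].std()
age_normed = ((toy["age"] - age_mean) / age_std).to_numpy()

print(workclass_ids)
print(age_normed)
```

In practice the median, the category-to-id mapping, and the mean and standard deviation should be computed on the training rows only and then reused for validation rows.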

To split our data into training and validation sets, we pass the indexes of the validation rows. Here we hold out the last 2,000 rows:

In [ ]:
```python
valid_idx = range(len(df)-2000, len(df))
```
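As a rough illustration of what an index-based split does, the sketch below applies the same pattern to a ten-row toy frame. The frame and the `n_valid` name are made up for the example; how fastai applies `valid_idx` internally is not shown here.

```python
import pandas as pd

# Ten-row stand-in for the real dataframe.
toy = pd.DataFrame({"x": range(10)})

# Hold out the last n_valid rows, mirroring range(len(df)-2000, len(df)).
n_valid = 3
valid_idx = range(len(toy) - n_valid, len(toy))

valid_df = toy.iloc[list(valid_idx)]
train_df = toy.drop(index=toy.index[list(valid_idx)])

print(len(train_df), len(valid_df))
```

Holding out the tail of the file is only a fair validation set if the rows are not sorted in a way that correlates with the target; otherwise a random sample of indexes is the safer choice.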

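Next we name the dependent variable and the categorical variables. Every remaining column is treated as continuous unless we pass an explicit list to the `cont_names` parameter when constructing our `DataBunch`.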

In [ ]:
```python
dep_var = 'salary'
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']
```

We pass this information to `TabularDataBunch.from_df` to create the `DataBunch` that we'll use for training.

In [ ]:
```python
data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=valid_idx, procs=procs, cat_names=cat_names)
print(data.train_ds.cont_names) # `cont_names` defaults to: set(df)-set(cat_names)-{dep_var}
```
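The default noted in the comment above is plain set arithmetic on column names. The sketch below reproduces it without fastai; the `columns` list is an assumption about the layout of `adult.csv` (14 input columns plus `salary`) and should be checked against `df.columns`.

```python
# Assumed column layout of adult.csv; verify against df.columns.
columns = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "salary",
]
dep_var = "salary"
cat_names = ["workclass", "education", "marital-status", "occupation",
             "relationship", "race", "sex", "native-country"]

# Whatever is neither categorical nor the dependent variable is continuous.
# Sets are unordered, so sort for a stable result.
cont_names = sorted(set(columns) - set(cat_names) - {dep_var})
print(cont_names)
```

Because the default is computed by exclusion, any identifier-like or free-text column left out of `cat_names` would silently be treated as continuous, which is a good reason to pass `cont_names` explicitly on unfamiliar data.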


We can grab a mini-batch of data and take a look (`to_np` here converts from a PyTorch tensor to a NumPy array):

In [ ]:
```python
(cat_x,cont_x),y = next(iter(data.train_dl))
for o in (cat_x, cont_x, y): print(to_np(o[:5]))
```


After being processed by `TabularDataset`, the categorical variables are replaced by ids and the continuous variables are normalized. The codes corresponding to the categorical variables are grouped into one tensor, and the continuous variables into another.
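The grouping described above can be pictured as two arrays per batch: an integer array of category ids and a float array of normalized values. The following NumPy sketch uses invented column names and values purely to show the shapes and dtypes involved.

```python
import numpy as np
import pandas as pd

# Already-processed toy rows: ids for categorical columns,
# normalized floats for continuous ones (all values are made up).
processed = pd.DataFrame({
    "workclass_id": [1, 2, 1],
    "sex_id": [1, 1, 2],
    "age": [0.1, -1.2, 1.1],
    "hours-per-week": [0.0, 0.5, -0.5],
})

# One integer array for all categorical codes...
cat_x = processed[["workclass_id", "sex_id"]].to_numpy(dtype=np.int64)
# ...and one float array for all continuous values.
cont_x = processed[["age", "hours-per-week"]].to_numpy(dtype=np.float32)

print(cat_x.shape, cat_x.dtype)
print(cont_x.shape, cont_x.dtype)
```

Keeping the two groups separate lets the model route the integer ids through embedding layers while the floats are used directly.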

With our data ready in a `DataBunch`, we just need to create a model, define a `Learner`, and start training. The fastai library has a flexible and powerful `TabularModel` in `models.tabular`. To use it through `tabular_learner`, we specify the embedding sizes for our categorical variables; here we set only the size for `native-country`.

In [ ]:
```python
learn = tabular_learner(data, layers=[200,100], emb_szs={'native-country': 10}, metrics=accuracy)
learn.fit_one_cycle(1, 1e-2)
```
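An embedding size is the width of a lookup table that turns each category id into a vector. The NumPy sketch below shows that lookup for a single categorical variable and how the result might be joined with the continuous inputs. The number of categories (42) and the random values are assumptions for illustration; in a real model the table is learned, and this is not `TabularModel`'s actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical variable with 42 distinct ids, embedded in 10 dimensions
# (10 matches the size chosen for 'native-country' above).
n_categories, emb_size = 42, 10
emb_table = rng.normal(size=(n_categories, emb_size))

# A batch of three rows: one id per row for this variable.
cat_ids = np.array([3, 0, 41])
embedded = emb_table[cat_ids]            # shape (3, 10): one vector per row

# Six normalized continuous features per row (made-up values).
cont_x = rng.normal(size=(3, 6))

# The first dense layer sees embeddings and continuous values side by side.
model_input = np.concatenate([embedded, cont_x], axis=1)
print(model_input.shape)
```

A larger embedding size gives the model more capacity to distinguish categories at the cost of more parameters, which is why it is worth setting per variable.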


We can use the `Learner.predict` method to get predictions. Here we need to pass a dataframe row that has the same categorical and continuous variable names as our training or validation dataframe.

In [ ]:
```python
learn.predict(df.iloc[0])
```
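Since prediction depends on the row carrying the expected column names, it can help to check a row before passing it in. The helper below, `missing_columns`, is hypothetical and not part of fastai; it is a small sketch of such a check on a pandas `Series`.

```python
import pandas as pd

def missing_columns(row, cat_names, cont_names):
    """Return the expected column names that are absent from `row`.

    Hypothetical helper for illustration; `row` is a pandas Series
    such as the result of df.iloc[0].
    """
    required = list(cat_names) + list(cont_names)
    return [name for name in required if name not in row.index]

# Made-up row that lacks the 'education' column.
row = pd.Series({"age": 39, "workclass": "Private"})
print(missing_columns(row, ["workclass", "education"], ["age"]))
```

An empty list means every expected variable is present by name; it says nothing about whether the values themselves are valid.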
