  #hide
  !pip install -Uqq fastbook kaggle waterfallcharts treeinterpreter dtreeviz
  import fastbook
  fastbook.setup_book()

  #hide
  from fastbook import *
  from kaggle import api
  from pandas.api.types import is_string_dtype, is_numeric_dtype, is_categorical_dtype
  from fastai.tabular.all import *
  from sklearn.ensemble import RandomForestRegressor
  from sklearn.tree import DecisionTreeRegressor
  from dtreeviz.trees import *
  from IPython.display import Image, display_svg, SVG
  pd.options.display.max_rows = 20
  pd.options.display.max_columns = 8

[[chapter_tabular]]

Tabular Modeling Deep Dive

Tabular modeling takes data in the form of a table (like a spreadsheet or CSV file). The objective is to predict the value in one column based on the values in the other columns. In this chapter we will look not only at deep learning but also at more general machine learning techniques such as random forests, because, depending on your problem, they can give better results.
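As a minimal sketch of this setup, one column is chosen as the target and a scikit-learn random forest learns to predict it from the remaining columns. The table, the column names `year_made`, `hours_used` and `price`, and the model settings are all made-up assumptions for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy table: each row is one item, each column one attribute
df = pd.DataFrame({
    "year_made":  [1995, 2001, 2005, 2010, 2012, 1998],
    "hours_used": [5000, 3000, 2500, 1200, 800, 4500],
    "price":      [10.0, 14.0, 16.0, 22.0, 25.0, 11.0],
})

# The column we want to predict (the dependent variable)...
y = df["price"]
# ...and the columns we predict it from (the independent variables)
X = df.drop(columns="price")

# Small forest with a fixed seed so the sketch is reproducible
model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(X, y)

# Predict the price of a new, unseen row
new_row = pd.DataFrame({"year_made": [2008], "hours_used": [1800]})
pred = model.predict(new_row)
print(pred)
```

Because each tree in a regression forest predicts an average of training targets, the prediction always lies within the range of `price` values seen during training.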

We will look at how to preprocess and clean the data, and at how to interpret the results of our models after training. But first, we will see how to use embeddings to feed columns that contain categories into a model that expects numbers.
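To make the idea concrete, here is a minimal NumPy sketch, assuming a hypothetical `colors` column and a randomly initialised 3-dimensional embedding: each category is mapped to an integer code, and that code selects a row of a matrix. In a trained model those rows are learned parameters, and fastai's own preprocessing differs in details (for example, how it handles missing values).

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column
colors = pd.Series(["red", "green", "blue", "green", "red"])

# Step 1: turn each category into an integer code (an index into the vocabulary)
cat = pd.Categorical(colors)
codes = cat.codes                  # one integer per row
n_categories = len(cat.categories)

# Step 2: an embedding is a matrix with one row of numbers per category.
# Here it is randomly initialised; in a real model these values are trained.
emb_dim = 3
rng = np.random.default_rng(0)
embedding = rng.normal(size=(n_categories, emb_dim))

# Step 3: the embedding of each row is simply the matrix row for its code
vectors = embedding[codes]         # shape (n_rows, emb_dim)

# Indexing gives the same result as multiplying a one-hot encoding by the matrix,
# without materialising the one-hot vectors
one_hot = np.eye(n_categories)[codes]
assert np.allclose(one_hot @ embedding, vectors)

print(cat.categories.tolist(), codes.tolist())
```

The assertion shows why this lookup is useful: it is mathematically a one-hot encoding followed by a matrix multiplication, but it avoids building large, mostly-zero vectors when a column has many categories.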