Awesome enty

It feels like some cascade model (widely used in ads & recommendation), by putting one deep model on top of some simple & very fast model, with the input of the former being the output of the latter. This architecture simplifies computation by magnitude of degrees, but also restricted the power of the deep model. A best practice then is to make the simple model fitting the output of the deep one, and retrain deep model on top of new input, and repeat...