Awesome entry!

I tried a few (pretty simple) variants:
* Multiplying the features by a weight matrix. One nice property of this is that a diagonal matrix recovers standard feature weighting, so this model should be at least as expressive as per-feature weights.
* A one-hidden-layer feedforward network.
* Summing the embeddings from the two variants above.
I totally agree that allowing multiplicative feature interactions as you suggest should work better though!
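
For anyone who wants to play with these, here is a rough sketch of the three variants in plain Python. It is not my actual code: the function names, the tanh activation, the bias terms, and the layer sizes are all assumptions made up for illustration. The demo at the end checks the diagonal-matrix point from the first bullet.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def linear_embed(W, x):
    # Variant 1: embedding = W x.
    # With a diagonal W this reduces to plain per-feature weighting.
    return matvec(W, x)


def mlp_embed(W1, b1, W2, b2, x):
    # Variant 2: one hidden layer. The tanh activation and the biases
    # are assumptions; the comment above does not specify them.
    hidden = [math.tanh(h + b) for h, b in zip(matvec(W1, x), b1)]
    return [o + b for o, b in zip(matvec(W2, hidden), b2)]


def summed_embed(W, W1, b1, W2, b2, x):
    # Variant 3: elementwise sum of the two embeddings above.
    # Both embeddings must have the same dimension.
    lin = linear_embed(W, x)
    mlp = mlp_embed(W1, b1, W2, b2, x)
    return [a + b for a, b in zip(lin, mlp)]


def random_matrix(rows, cols, rng):
    # Hypothetical helper: small random weights for the demo.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Demo with 3 input features, 4 hidden units, 3 output dimensions.
rng = random.Random(0)
x = [1.0, 2.0, 3.0]

# A diagonal weight matrix built from per-feature weights.
feature_weights = [0.5, 2.0, -1.0]
W_diag = [[feature_weights[i] if i == j else 0.0 for j in range(3)]
          for i in range(3)]
diag_out = linear_embed(W_diag, x)
per_feature_out = [w * xi for w, xi in zip(feature_weights, x)]

W1, b1 = random_matrix(4, 3, rng), [0.0] * 4
W2, b2 = random_matrix(3, 4, rng), [0.0] * 3
summed_out = summed_embed(W_diag, W1, b1, W2, b2, x)
```

Here `diag_out` and `per_feature_out` come out identical, which is the sense in which the weight-matrix model contains per-feature weighting as a special case.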