I tried a few (pretty simple) variants:
- Multiplying the features by a weight matrix. One nice property of this is that a diagonal matrix recovers standard feature weighting, so this model should be at least as expressive as per-feature weights.
- A one-hidden-layer feedforward network.
- Summing up the embeddings from the above two.
I totally agree that allowing multiplicative feature interactions as you suggest should work better though!
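For readers following along, the three variants listed above could be sketched roughly as below. This is only an illustration, not the actual BeepBoop code: the sizes, the random (untrained) parameters, the ReLU activation, and the names `linear_embed`, `mlp_embed`, and `summed_embed` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 4   # hypothetical number of input features
embed_dim = 3    # hypothetical embedding size
hidden_dim = 8   # hypothetical hidden-layer width

# Variant 1: multiply the features by a weight matrix.
# In practice W would be learned; here it is random for illustration.
W = rng.normal(size=(embed_dim, n_features))

def linear_embed(x, W):
    return W @ x

# Variant 2: a one-hidden-layer feedforward network (ReLU is an assumption).
W1 = rng.normal(size=(hidden_dim, n_features))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(size=(embed_dim, hidden_dim))
b2 = np.zeros(embed_dim)

def mlp_embed(x):
    h = np.maximum(0.0, W1 @ x + b1)
    return W2 @ h + b2

# Variant 3: sum the embeddings from the two variants above.
def summed_embed(x):
    return linear_embed(x, W) + mlp_embed(x)

x = np.array([0.1, 0.4, -0.2, 0.9])

# A diagonal weight matrix reduces variant 1 to plain per-feature weighting.
weights = np.array([2.0, 0.5, 1.0, 3.0])
assert np.allclose(linear_embed(x, np.diag(weights)), weights * x)
```

The final assertion checks the point made in the first bullet: with a diagonal matrix, the matrix product is the same as multiplying each feature by its own weight.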
One more detail: are you doing any encoding of the features before feeding them into the NN part? I remember Darkcanuck had a rather successful attempt with an end-to-end NN, binning the features like the old VCS approaches.
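To make the binning idea concrete, here is a minimal sketch of encoding each continuous feature as a one-hot vector over equal-width bins before it reaches the network. It is not a reconstruction of Darkcanuck's encoding; the feature ranges, bin counts, and the helper names `one_hot_bins` and `encode` are hypothetical.

```python
import numpy as np

def one_hot_bins(value, low, high, n_bins):
    """One-hot encode a scalar over n_bins equal-width bins on [low, high]."""
    frac = (value - low) / (high - low)
    # Clamp so that out-of-range values and value == high land in a valid bin.
    idx = int(np.clip(frac * n_bins, 0, n_bins - 1))
    vec = np.zeros(n_bins)
    vec[idx] = 1.0
    return vec

def encode(features, specs):
    """Concatenate the one-hot encodings of all features."""
    return np.concatenate(
        [one_hot_bins(v, lo, hi, n) for v, (lo, hi, n) in zip(features, specs)]
    )

# Hypothetical specs: (low, high, n_bins) for a distance and a lateral velocity.
specs = [(0.0, 1000.0, 5), (-8.0, 8.0, 4)]
encoded = encode([250.0, 8.0], specs)
print(encoded)  # distance falls in bin 1 of 5, lateral velocity in bin 3 of 4
```

The resulting vector has one active entry per feature, so its length is the sum of the bin counts.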
And since most features are essentially tabular, apart from the NN approaches with explicit feature interaction, GBDT can work very well as a feature transformation and interaction tool. There are also approaches that use GBDT for clustering, by recasting clustering as classifying regions of space as “dense” or “sparse”.
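One common way to use GBDT as a feature transformation, sketched here with scikit-learn, is to fit the trees and then one-hot encode the index of the leaf each sample lands in, so every leaf acts as a learned feature interaction. The synthetic data, the label, and the hyperparameters below are assumptions for illustration only, and the clustering variant is not shown.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)

# Synthetic tabular data: 4 features, label depends on an interaction.
X = rng.uniform(size=(200, 4))
y = (X[:, 0] * X[:, 1] > 0.25).astype(int)

gbdt = GradientBoostingClassifier(n_estimators=10, max_depth=3, random_state=0)
gbdt.fit(X, y)

# apply() returns leaf indices with shape (n_samples, n_estimators, n_classes);
# for binary classification the last axis has size 1.
leaves = gbdt.apply(X)[:, :, 0]

# One-hot encode the leaf index of every tree; each row then has exactly
# one active column per tree.
encoder = OneHotEncoder(handle_unknown="ignore")
leaf_features = encoder.fit_transform(leaves)  # sparse matrix
```

The sparse `leaf_features` matrix can then be fed to a downstream model (a linear model or an NN) in place of, or alongside, the raw features.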
--Kev (talk)