I tried a few (pretty simple) variants:
- Multiplying the features by a weight matrix. One nice feature of this is that a diagonal matrix recovers standard feature weighting, so this model should be strictly better than per-feature weights.
- A one-hidden-layer feedforward network.
- Summing up the embeddings from the above two.
I totally agree that allowing multiplicative feature interactions as you suggest should work better though!
One more detail, are you doing any encoding before inputting them into the NN part? I remembered Darkcanuck had some rather succesful attempt in NN (end-to-end), by binning features like the old VCS ways.
And since most features are essentially tabular, apart from the NN approaches with explicit feature interaction, GBDT can work very well as some feature transformation & interaction tool. There are also approaches using GBDT to do clustering, by converting clustering into classifying “dense” & “sparse” of space.
I'm using no special encoding, just normalizing the features so they are between 0 and 1. Decision-tree-like algorithms have been tried in robocode before (e.g. Wiki_Targeting/Dynamic_Segmentation), but not in conjunction with clustering/KNN as far as I know.