Kernel density is important
My solution to your problem was 2-fold:
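1: Use a faster smoothing function. I've ended up at 1/(1+sqr(x)), where sqr(x) is x squared, so the kernel is 1/(1+x^2). It has the same bell shape but needs no exp() call.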
2: A bit of dynamic programming: pre-calculate a single 'function profile', centred at GF0 and running from GF-2 to GF+2, and store it in a set of bins. Then whatever your GF is, you just need to scale your GF to figure out where on the profile to draw your value from. So rather than evaluating the entire smoothing function for each hit, log all your hits (without smoothing) into one set of bins, then do the smoothing afterwards into a second set of bins: check each bin of the first set, and if it is non-zero, overlay the 'function profile' onto the second set, weighted by that bin's count. If you're really sneaky you can even keep the bin index of each hit instead of the actual GF ;-)
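Here is a rough Python sketch of the two steps above, in case it helps. It is only an illustration, not my actual code: the function names, the bin range (0 to 2), and the resolution of 10 bins per GF unit are all made up for the example, and the profile is simply cut off at +/-2 with no normalisation.

```python
import math


def make_profile(half_width, bins_per_unit):
    """Pre-compute 1/(1+x^2) once, sampled per bin from -half_width to +half_width."""
    n = round(half_width * bins_per_unit)
    return [1.0 / (1.0 + (k / bins_per_unit) ** 2) for k in range(-n, n + 1)]


def bin_hits(values, lo, hi, n_bins):
    """Log raw hits into bins without any smoothing; only the bin index is kept."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = math.floor((v - lo) / width)
        if 0 <= idx < n_bins:  # hits outside [lo, hi) are dropped
            counts[idx] += 1
    return counts


def smooth(counts, profile):
    """Overlay the pre-computed profile on every non-zero bin, weighted by its count."""
    half = len(profile) // 2
    out = [0.0] * len(counts)
    for i, weight in enumerate(counts):
        if weight == 0:
            continue  # empty bins cost nothing
        for k, p in enumerate(profile):
            j = i + k - half  # output bin that this profile sample lands in
            if 0 <= j < len(out):
                out[j] += weight * p
    return out


# Hypothetical example: three hits, bins 0.1 wide, profile spanning -2..+2.
profile = make_profile(half_width=2.0, bins_per_unit=10)
counts = bin_hits([0.05, 0.05, 1.05], lo=0.0, hi=2.0, n_bins=20)
density = smooth(counts, profile)
```

The cost of the smoothing pass depends on the number of non-zero bins times the profile length, not on the number of hits, which is where the saving comes from when many hits land in the same bins.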