User talk:Rednaxela/FastTrig

From Robowiki
Jump to navigation Jump to search

Sound good! But, will it skipped turns at first tick of the first round? » Nat | Talk » 10:41, 3 March 2009 (UTC)

It shouldn't. It will always be as fast or faster than standard trig functions at least so long as you do the following:

  • 'inline' the index calculations like was said and used in the example testfor
  • Run init() BEFORE the battle begins (i.e. in static initialization code of your bot class)

An example of that static initialization code is like this:

public class SomeBot extends AdvancedRobot {
    static {
        FastTrig.init();
    }

    public void run() {
        // Not in here
    }
{

The initialization code doesn't take long (0.0017 seconds on my computer at 2880 divisions) but just to be sure it doesn't hurt to place the initialization outside of where the bot is timed --Rednaxela 16:27, 3 March 2009 (UTC)

My take

Nice work. From what I remember Simonton did something similar. I'm not sure if he had any luck though. One thing I noticed, I'm not sure if it works properly for negative angles, you will be rounding the wrong way. Also, the most useful trig function would be atan2. --Skilgannon 20:43, 3 March 2009 (UTC)

Ahh yes, sine doesn't work right at all yet for negative numbers. Cosine does work correctly except for rounding. I'll fix this now. :) --Rednaxela 21:51, 3 March 2009 (UTC)

Bugs

I was going to change the User:Rednaxela/FastTrig to fix the bug that Skilgannon said, and another one that happens with large DIVISION numbers and saw your edit, I haven't tested the fixed version, but I think it is still buggy. I think it should be:

  public static final double sin(double value) {
    return sineTable[(int)(((value/K + 0.5) % DIVISIONS + DIVISIONS)%DIVISIONS)];
  }
  public static final double cos(double value) {
    return cosineTable[(int)(((value/K + 0.5) % DIVISIONS + DIVISIONS)%DIVISIONS)];
  }

I'm testing it with 62832 DIVISIONS, and much bigger loops than yours and is much faster. Here is the test I used:

package ags.util;
public class FastTrigTest {
  public static void main(String[] args) {
    FastTrig.init();
    double v=0;
    long ms;
    int A = 10000;
    int B = 1000;
    double absolute = 0;
    for ( int i = 0; i < A; ++i ) for ( int j = 0; j < B; ++j ) {
      double angle = Math.random() * Math.PI * 200 - Math.PI * 100;
      absolute = Math.max(absolute, Math.abs(Math.sin(angle) - FastTrig.sin(angle)));
      absolute = Math.max(absolute, Math.abs(Math.cos(angle) - FastTrig.cos(angle)));
    }
    System.out.printf("Wrose error: %.12f\n", absolute);
    v=0;
    ms = -System.nanoTime();
    for ( int i = 0; i < A; ++i ) for ( int j = 0; j < B; ++j )
      v += FastTrig.sin(i * j * Math.PI - Math.PI / 3);
    ms += System.nanoTime();
    System.out.printf("FastTrig time: %.2f seconds\n", ms / 1E9);
    v = 0;
    ms = -System.nanoTime();
    for ( int i = 0; i < A; ++i ) for ( int j = 0; j < B; ++j )
      v += Math.sin(i * j * Math.PI - Math.PI / 3);
    ms += System.nanoTime();
    System.out.printf("Math time: %.2f seconds\n", ms/1E9);
  }
}

It also checks the error, so it was easy to the bugs. But anyway, including this will be a priority for me now :). Good work. --zyx 22:15, 3 March 2009 (UTC)

Thanks! Well, my +40*DIVISIONS one DOES work unless the angle is extremely negative, to a magnitude you wouldn't generally expect to see, but your double-modulus solution is more robust. Now making an updated version that includes that increased robustness, and atan(), and maybe atan2(). Will update the page when that's ready. --Rednaxela 00:05, 4 March 2009 (UTC)

Also discovered speed was further increased by using value*K instead of value/K and adjusting the constant to compensate, and also that the cosine table is redundant. Using the sine table for cosine with a shifted index doesn't affect speed but decreases init time and memory use. --Rednaxela 00:12, 4 March 2009 (UTC)

Hi, I have some problem. I do create new function:

public static final double tan(double value) {
	return FastTrig.sin(value) / FastTrig.cos(value);
}

... then I add following line to test ...

absolute = Math.max(absolute, Math.abs(Math.tan(angle) - FastTrig.tan(angle)));

... then, I run, result in: Worst error: Infinity! What wrong? » Nat | Talk » 03:47, 5 March 2009 (UTC)

Well Nat, at some angles cosine is equal to 0. Due to this you can't simply divide those. You should create a seperate table for tangent and use that and that would be faster anyway. --Rednaxela 04:11, 5 March 2009 (UTC)

Ahh, I forgot. I try your new way, too, but it still have a much of error, I don't understand why! » Nat | Talk » 04:38, 5 March 2009 (UTC)

What about the fact that tangent gives undefined at certain values? The error-checking code isn't designed the cope with that because (Undefined - Undefined) isn't equal to 0. --Rednaxela 04:49, 5 March 2009 (UTC)

Really? because I make all with high diff (over than 1) out, and all is number! Note that some calculator (like Google) return 1.6.....E16 for tan(PI / 2), and my machine return that, too! I don't know because trig function are java native method so it is platform dependent. » Nat | Talk » 05:06, 5 March 2009 (UTC)

Ahh. At PI / 2 it is undefined in actual math, HOWEVER it seems that beause computer can't quite perfetly represent PI/2 in floating point binary, it rounds to a number just SLIGHTLY smaller, which makes the result of the function simply become a very-large-number. Which come to think of it reminds me of why the error would be so high for tangent: Tangent by nature includes some very large values which mean error is naturally similarly larger. Really, "absolute error" shouldn't be what we measure, instead "angle error" is more relevant in reality with the kinds of geometry we'd be doing in robocode. --Rednaxela 05:16, 5 March 2009 (UTC)

OK, I understand. One more I just want to know, can we create an atan map? It seem that sin,cos and atan is most use trig function in robocode. » Nat | Talk » 06:17, 5 March 2009 (UTC)

You cannot post new threads to this discussion page because it has been protected from new threads, or you do not currently have permission to edit.

Contents

Thread titleRepliesLast modified
Faster normalXAngle --> faster sin,cos,tan1118:39, 15 August 2018
Why init instead of static721:08, 20 February 2012
Fast gaussian kernel function219:21, 19 February 2012
Fast Math.exp623:42, 18 February 2012

Faster normalXAngle --> faster sin,cos,tan

Inspired by recent discussion I did a little profiling of DrussGT, and was horrified to find that plain old robocode.util.Utils.normalRelativeAngle() was using a huge percentage of my processing time. So, I tried various methods to see how they were for speed, and came up with one which takes about 1/4 of the time:

   public static final double normalRelativeAngle(double d){
      d += Math.PI;
      double i = Math.floor(d*(1/(2*Math.PI)));
      d -= i*(2*Math.PI);
      return d-Math.PI;
   }
   public static final double normalAbsoluteAngle(double d){
      double i = Math.floor(d*(1/(2*Math.PI)));
      d -= i*(2*Math.PI);
      return d;
   }

I think the big improvement comes from no divisions, no % and no decisions, all of which are slow. Of course, it isn't ulp-accurate (although I'm not sure the previous one was either?), but over 1M calls the largest difference I saw between this and the old one was -3.552713678800501E-15 which is small enough for me to call 0.

Of course, this technique can then be applied to sin and cos, which then run about 2x the speed of previously, and on my system this is now ~4x the speed of Math.sin/cos:

   public static final double sin(double value) {
      value = value*K + 0.5;
      double i = Math.floor(value*(1.0/TRIG_DIVISIONS));
      value -= i*TRIG_DIVISIONS;
      return sineTable[(int)value];
   }

   public static final double cos(double value) {
      value = value*K + 0.5 + Math.PI*0.5*K;
      double i = Math.floor(value*(1.0/TRIG_DIVISIONS));
      value -= i*TRIG_DIVISIONS;
      return sineTable[(int)value];
   }

It can also be applied to the tan method, but I hardly use tan so I haven't bothered with it. Additionally, TRIG_DIVISIONS is no longer restricted to power-of-2 numbers because we lost the &.

This code is public domain, do with it as you like, and give credit if you think I deserve it! Enjoy, and write speedy bots!

Skilgannon11:15, 7 April 2013

Further experimentation with the profiler tells me that our lookup-table sin and cos are actually slower than an expanded polynomial version, I suspect this is due to occasional cache misses. However, the faster normalRelativeAngle code is still useful, here are my latest sin/cos implementations:

 public static final double sin(double d) {
      d += Math.PI;
      double x2 = Math.floor(d*(1/(2*Math.PI)));
      d -= x2*(2*Math.PI);
      d-=Math.PI;
   
      x2 = d * d;
   
      return //accurate to 6.82e-8, 3.3x faster than Math.sin, 
         //faster than lookup table in real-world conditions due to no cache misses
         //all values from "Fast Polynomial Approximations to Sine and Cosine", Garret, C. K., 2012
         (((((-2.05342856289746600727e-08*x2 + 2.70405218307799040084e-06)*x2
         - 1.98125763417806681909e-04)*x2 + 8.33255814755188010464e-03)*x2
         - 1.66665772196961623983e-01)*x2 + 9.99999707044156546685e-01)*d;
   }


   public static final double cos(double d) {
      d += Math.PI;
      double x2 = Math.floor(d*(1/(2*Math.PI)));
      d -= x2*(2*Math.PI);
      d-=Math.PI;
   
      d *= d;
   
      return //max error 5.6e-7, 4x faster than Math.cos, 
         //faster than lookup table in real-world conditions due to less cache misses
         //all values from "Fast Polynomial Approximations to Sine and Cosine", Garret, C. K., 2012
         ((((- 2.21941782786353727022e-07*d + 2.42532401381033027481e-05)*d
         - 1.38627507062573673756e-03)*d + 4.16610337354021107429e-02)*d
         - 4.99995582499065048420e-01)*d + 9.99999443739537210853e-01;
   }
Skilgannon22:48, 7 April 2013

Upon doing my own research (heavy googling) I've decided to try a modified version of the LUT code for now. The key difference is the overall footprint is much smaller than the original implementation, hopefully allowing it to remain in memory. I'm doing this by reducing the number of divisions pretty drastically, storing the values as single-precision, and then doing linear interpolation between the nearest two divisions to get the final output. The asymptotic functions will use polynomial implementations, because I fear the interpolation will totally wreck accuracy. And of course, testing will prove whether this is a good approach or not.

Enamel 32 (talk)02:57, 15 August 2018

Have a look at Polynomial Approximations to Sine and Cosine. This uses no memory and is even faster than the lookup table approach. Skilgannon uses this as far as I know.

Also have a look at Chebyshev Approximation (http://metamerist.com/cheby/example38.htm archive) which works pretty well for atan, asin and acos.

Xor (talk)06:17, 15 August 2018

That is what I'm referring to as the "polynomial implementation" above. My intuition is that the LUT version is only slower in practice because it gets evicted out of the cache due to being so large. I haven't done a side-by-side comparison test yet, but I've shrunk it by roughly a factor of 32 from the smallest version I've seen on this page, and I've added a couple other tricks to reduce the calculations required to find the lookup index. Once I truly need the speed boost I'll begin benchmarking the two against each other.

Enamel 32 (talk)18:39, 15 August 2018
 
 
 

Table lookups are slow I suspect due to array index out-of-bounds checking.

It also happens in kNN search. I was able to double the speed of my bot by replacing a few arrays with objects with hardcoded number of attributes. Only works with constant sized arrays, like data point coordinates.

MN00:18, 8 April 2013

No, I'm fairly sure it's caching, because profiling shows LUT twice as slow as polynomial, but a micro-benchmark shows LUT ~30% faster.

Skilgannon10:21, 8 April 2013
 

How did you do the micro-benchmark?

MN15:24, 8 April 2013
 

I generate a list of random numbers in an array and then time how long it takes for each of the functions to process the entire array in a loop. The result of each function I add to a running total, and I print the total at the end to make sure it doesn't get optimised away, and also to see how accurate they are. I could also calculate the overhead if I just add the results of the array, but I haven't tried that yet, and I think it might get optimised into SSE ops anyway.

Skilgannon17:21, 8 April 2013
 

Just a thought, but why not incorporate the faster normalRelativeAngle and normalAbsoluteAngle into the codebase of Robocode. That way everybody profits from it in the future when a newer version than 1.8.1.0 is standard.

GrubbmGait01:52, 8 April 2013

Personally, I wouldn't have any issues with it, but the fact that using a * instead of / probably loses a few ULPS and I'm not sure that is what we want for inside of the game engine. It is still faster than the standard method even using / , but by doing the - in a separate operation there is probably some ULPS loss there as well.

Skilgannon10:24, 8 April 2013
 

Fast-math classes/functions have lower precision.

MN03:51, 8 April 2013
 

Why init instead of static

Chase, I believe the reason to use the init method instead of a static block within this class is that the static block won't be called until this class is loaded by the JVM the first time it's used. By using an init method and calling it from a static block in your bot class, you ensure it's called when your bot is loaded, before the battle starts and not causing any skipped turns. Hopefully somebody can confirm/deny that.

Voidious15:25, 18 September 2011

Voidious, you right in common, but there're an issue - robots are loaded in theirs threads, so you can skip turns any way.

Jdev15:33, 18 September 2011
 

IIRC, while there is a limit on the time for a robot's class to load, it is far greater than the time allowed per turn.

Rednaxela16:05, 18 September 2011
 

If I recall, the static blocks are called when the class is loaded (in order they are in the source code).

I forget if it is loaded at the same time or on first reference. I think in robocode all class files are loaded at load time. Which means it doesn't matter where it is initialized in this case.

Chase-san01:34, 19 September 2011
 

It's been a long time, but when first writing FastTrig, I believe I tested this, and found that classes like this were initialized upon first reference.

Rednaxela01:36, 19 September 2011
 

I'll test it later.

Chase-san01:40, 19 September 2011
 

I debugged initialization order. Looks like everything is initialized before the battle starts thanks to some field cleaning algorithm inside the engine (net.sf.robocode.host.security.RobotClassLoader#cleanStaticReference(java.lang.reflect.Field)).

MN23:11, 19 February 2012
 

Yeah I added my little init hack awhile back so people could have it either way. So if you forgot to init it wouldn't break on you, and you could init if you really wanted to/it made you feel better.

Chase-san21:08, 20 February 2012
 

Fast gaussian kernel function

Patched a gaussian kernel to work together with that fast exp.

private static final double SQRT_2_PI = Math.sqrt(2 * Math.PI);
private static final double EXP_LIMIT = 700;
private static final double GAUSSIAN_LIMIT = Math.sqrt(EXP_LIMIT * 2);

public static double gaussian(final double u) {
    if (u > GAUSSIAN_LIMIT || u < -GAUSSIAN_LIMIT) {
        return 0;
    }
    return exp(-(u * u) / 2) / SQRT_2_PI;
}

public static double exp(double val) {
    final long tmp = (long) (1512775 * val + 1072632447);
    return Double.longBitsToDouble(tmp << 32);
}
MN23:54, 18 February 2012

A suggestion: get rid of divisions, they are almost as slow as sqrt and slow the code down a lot. Also, sometimes it is important that the gaussian doesn't return 0, so change that to something a tiny bit bigger =)


private static final double SQRT_2_PI = Math.sqrt(2 * Math.PI);
private static final double EXP_LIMIT = 700;
private static final double GAUSSIAN_LIMIT = Math.sqrt(EXP_LIMIT * 2);

public static double gaussian(final double u) {
    if (u > GAUSSIAN_LIMIT || u < -GAUSSIAN_LIMIT) {
        return 2e-200;
    }
    return exp((u * u) * -0.5) * (1 / SQRT_2_PI);
}

public static double exp(double val) {
    final long tmp = (long) (1512775 * val + 1072632447);
    return Double.longBitsToDouble(tmp << 32);
}
Skilgannon11:12, 19 February 2012

That "get rid of divisions" tip doubled the speed of my bot. O.o'

And for a mathematical purism, returning the value at the boundaries as minimum to keep the function smooth:

private static final double SQRT_2_PI_INVERSE = 1 / Math.sqrt(2 * Math.PI);
private static final double EXP_LIMIT = 700;
private static final double GAUSSIAN_LIMIT = Math.sqrt(EXP_LIMIT * 2);
private static final double MIN_GAUSSIAN_VALUE = gaussian(GAUSSIAN_LIMIT);

public static double gaussian(final double u) {
    if (u > GAUSSIAN_LIMIT || u < -GAUSSIAN_LIMIT) {
        return MIN_GAUSSIAN_VALUE;
    }
    return exp(u * u * -0.5) * SQRT_2_PI_INVERSE;
}
 
public static double exp(final double val) {
    final long tmp = (long) (1512775 * val + 1072632447);
    return Double.longBitsToDouble(tmp << 32);
}
MN19:21, 19 February 2012
 
 

Fast Math.exp

I started on a FastMath class and then caved and started playing with this one. Just thought I'd offer up a fast Math.exp I found from this blog (via StackOverflow):

public static double exp(double val) {
    final long tmp = (long) (1512775 * val + 1072632447);
    return Double.longBitsToDouble(tmp << 32);
}

Probably doesn't matter to most of you, but if you're doing DC with gaussians it might. I actually use a different kernel density in my main gun because 20,000 Math.exp's each time I aim would be pretty slow. (Not sure the above would solve that, but anyway...)

Voidious21:40, 17 September 2011

Didn't I post that just above?

Darkcanuck21:55, 17 September 2011
 

FYI - testing for values between -20 and 0, which would be the range if you're doing gaussians with bandwidth = 1/2 bot width and up to 40 bot widths.

Math.exp() time: 0.30864 seconds
FastMath.exp() time: 0.06371 seconds
FastMath.exp() worst error: 0.02899
FastMath.exp() avg error: 0.00073

Or for -3 to 0:

== exp Test ==
Math.exp() time: 0.38425 seconds
FastMath.exp() time: 0.07076 seconds
FastMath.exp() worst error: 0.02899
FastMath.exp() avg error: 0.00464
Voidious22:00, 17 September 2011
 

Oh, haha - sorry, I didn't see it in the file...

Voidious22:00, 17 September 2011
 

Yeah, I didn't update the main page, didn't want to overwrite others' functions. Try the atan/atan2 I posted, they're much more accurate I think.

Darkcanuck23:56, 17 September 2011
 

Yep - already tested them, using them, and running a long benchmark with a FastMath Diamond. =) I actually haven't noticed any huge speed increase in Diamond, so I'm hesitant to keep all this FastMath stuff. But it is faster and the error rate looks pretty low, so if I don't take any score hit I'll definitely keep it.

Voidious00:03, 18 September 2011
 

Just tried that fast exp with a dc/gaussian kernel/swarm targeting gun. 5,5% APS decrease in meleerumble, 12,4% APS decrease in teamrumble. :( Too imprecise for swarm targeting, and you notice the gun shaking a lot more. But in 1v1 there was no APS hit (0,1% APS increase).

Found out its because that function doesn´t work for values outside -700 and 700.

MN02:27, 16 February 2012