User talk:Rednaxela/FastTrig
Sounds good! But will it skip turns on the first tick of the first round? » Nat | Talk » 10:41, 3 March 2009 (UTC)
It shouldn't. It will always be as fast as or faster than the standard trig functions, at least so long as you do the following:
- 'Inline' the index calculations, as was said and used in the example test
- Run init() BEFORE the battle begins (i.e. in static initialization code of your bot class)
An example of that static initialization code is like this:
public class SomeBot extends AdvancedRobot {
static {
FastTrig.init();
}
public void run() {
// Not in here
}
}
The initialization code doesn't take long (0.0017 seconds on my computer at 2880 divisions) but just to be sure it doesn't hurt to place the initialization outside of where the bot is timed --Rednaxela 16:27, 3 March 2009 (UTC)
My take
Nice work. From what I remember Simonton did something similar. I'm not sure if he had any luck though. One thing I noticed, I'm not sure if it works properly for negative angles, you will be rounding the wrong way. Also, the most useful trig function would be atan2. --Skilgannon 20:43, 3 March 2009 (UTC)
Ahh yes, sine doesn't work right at all yet for negative numbers. Cosine does work correctly except for rounding. I'll fix this now. :) --Rednaxela 21:51, 3 March 2009 (UTC)
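For anyone following along, the negative-angle rounding problem comes from how Java's (int) cast behaves: it truncates toward zero rather than flooring, so a naive table index sends small negative angles into the same bin as small positive ones. A minimal illustration (class name is just for the demo):

```java
public class CastDemo {
    public static void main(String[] args) {
        // (int) truncates toward zero, so negative values round the "wrong" way
        // for table indexing: -0.7 and +0.7 both land in bin 0.
        System.out.println((int) 0.7);              // 0
        System.out.println((int) -0.7);             // 0, not -1
        System.out.println((int) Math.floor(-0.7)); // -1, what a table index needs
    }
}
```

This is why the fixes below either add a large positive offset before casting or apply a double modulus to make the operand non-negative first.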
Bugs
I was going to change User:Rednaxela/FastTrig to fix the bug that Skilgannon mentioned, and another one that happens with large DIVISIONS numbers, and saw your edit. I haven't tested the fixed version, but I think it is still buggy. I think it should be:
public static final double sin(double value) {
return sineTable[(int)(((value/K + 0.5) % DIVISIONS + DIVISIONS)%DIVISIONS)];
}
public static final double cos(double value) {
return cosineTable[(int)(((value/K + 0.5) % DIVISIONS + DIVISIONS)%DIVISIONS)];
}
I'm testing it with 62832 DIVISIONS, and much bigger loops than yours, and it is much faster. Here is the test I used:
package ags.util;
public class FastTrigTest {
public static void main(String[] args) {
FastTrig.init();
double v=0;
long ms;
int A = 10000;
int B = 1000;
double absolute = 0;
for ( int i = 0; i < A; ++i ) for ( int j = 0; j < B; ++j ) {
double angle = Math.random() * Math.PI * 200 - Math.PI * 100;
absolute = Math.max(absolute, Math.abs(Math.sin(angle) - FastTrig.sin(angle)));
absolute = Math.max(absolute, Math.abs(Math.cos(angle) - FastTrig.cos(angle)));
}
System.out.printf("Worst error: %.12f\n", absolute);
v=0;
ms = -System.nanoTime();
for ( int i = 0; i < A; ++i ) for ( int j = 0; j < B; ++j )
v += FastTrig.sin(i * j * Math.PI - Math.PI / 3);
ms += System.nanoTime();
System.out.printf("FastTrig time: %.2f seconds\n", ms / 1E9);
v = 0;
ms = -System.nanoTime();
for ( int i = 0; i < A; ++i ) for ( int j = 0; j < B; ++j )
v += Math.sin(i * j * Math.PI - Math.PI / 3);
ms += System.nanoTime();
System.out.printf("Math time: %.2f seconds\n", ms/1E9);
}
}
It also checks the error, so it was easy to see the bugs. But anyway, including this will be a priority for me now :). Good work. --zyx 22:15, 3 March 2009 (UTC)
Thanks! Well, my +40*DIVISIONS one DOES work unless the angle is extremely negative, to a magnitude you wouldn't generally expect to see, but your double-modulus solution is more robust. Now making an updated version that includes that increased robustness, and atan(), and maybe atan2(). Will update the page when that's ready. --Rednaxela 00:05, 4 March 2009 (UTC)
Also discovered speed was further increased by using value*K instead of value/K and adjusting the constant to compensate, and also that the cosine table is redundant. Using the sine table for cosine with a shifted index doesn't affect speed but decreases init time and memory use. --Rednaxela 00:12, 4 March 2009 (UTC)
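A minimal sketch of the layout Rednaxela describes (multiply by K instead of divide, one shared sine table, cosine as a shifted sine lookup). The table size and names here are assumptions for illustration, not the actual updated source:

```java
public final class FastTrigSketch {
    static final int DIVISIONS = 8192;                 // assumed table size
    static final double K = DIVISIONS / (2 * Math.PI); // index = angle * K (no division)
    static final double[] sineTable = new double[DIVISIONS];

    static {
        for (int i = 0; i < DIVISIONS; i++)
            sineTable[i] = Math.sin(i / K);
    }

    public static double sin(double value) {
        // double modulus keeps the index valid for any finite angle, per zyx's fix
        return sineTable[(int) (((value * K + 0.5) % DIVISIONS + DIVISIONS) % DIVISIONS)];
    }

    public static double cos(double value) {
        // reuse the sine table with a quarter-turn shift: cos(x) = sin(x + PI/2)
        return sin(value + Math.PI / 2);
    }
}
```

Sharing one table this way halves memory use and init time without costing lookup speed, which seems to be the point being made.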
Hi, I have a problem. I created a new function:
public static final double tan(double value) {
return FastTrig.sin(value) / FastTrig.cos(value);
}
... then I add the following line to the test ...
absolute = Math.max(absolute, Math.abs(Math.tan(angle) - FastTrig.tan(angle)));
... then I run it, and the result is: Worst error: Infinity! What's wrong? » Nat | Talk » 03:47, 5 March 2009 (UTC)
Well Nat, at some angles cosine is equal to 0. Due to this you can't simply divide those. You should create a separate table for tangent and use that, which would be faster anyway. --Rednaxela 04:11, 5 March 2009 (UTC)
Ahh, I forgot. I tried your new way too, but it still has a lot of error, and I don't understand why! » Nat | Talk » 04:38, 5 March 2009 (UTC)
What about the fact that tangent is undefined at certain values? The error-checking code isn't designed to cope with that, because (Undefined - Undefined) isn't equal to 0. --Rednaxela 04:49, 5 March 2009 (UTC)
Really? I filtered out everything with a high difference (more than 1), and everything is a number! Note that some calculators (like Google) return 1.6.....E16 for tan(PI / 2), and my machine returns that too! I don't know; the trig functions are Java native methods, so they are platform dependent. » Nat | Talk » 05:06, 5 March 2009 (UTC)
Ahh. At PI / 2 it is undefined in actual math, HOWEVER since computers can't quite perfectly represent PI/2 in floating point binary, it rounds to a number just SLIGHTLY smaller, which makes the result of the function simply become a very large number. Which, come to think of it, reminds me why the error would be so high for tangent: tangent by nature includes some very large values, which means the error is naturally similarly large. Really, "absolute error" shouldn't be what we measure; "angle error" is more relevant to the kinds of geometry we'd be doing in Robocode. --Rednaxela 05:16, 5 March 2009 (UTC)
OK, I understand. One more thing I want to know: can we create an atan table? It seems that sin, cos and atan are the most used trig functions in Robocode. » Nat | Talk » 06:17, 5 March 2009 (UTC)
I've pondered atan, but it's tricky because it doesn't repeat, and I'm not sure how much approximation is permissible near the edges --Rednaxela 06:53, 5 March 2009 (UTC)
Arc Cosine Implementation
I've made a modified version that also includes acos, because that's used in so many applications where the cos rule comes up, e.g. non-iterative wall smoothing, non-iterative wall angle, and my linear play-it-forward function. Because the function is so steep at the edges I had to increase the number of bins substantially, but because it is non-repeating it executes extremely quickly. Just doing a linear sweep, I'm getting execution 80X faster than Math.acos(). My max error is around 0.001, and the average is on the order of 1E-5.
public static final int acos_DIVISIONS = 131072;//uses 1 MB of memory
public static final double acosK = DIVISIONS/2;
public static final double[] acosTable = new double[acos_DIVISIONS + 1];
static {
for(int i = 0; i < acos_DIVISIONS; i++){
acosTable[i] = Math.acos(i/acosK - 1);
}
}
public static final double acos(double value){
return acosTable[(int)((value + 1)*acosK + 0.5)] ;
}
--Skilgannon 21:22, 8 April 2009 (UTC) An improvement to acos:
public static final double acos(double value) {
return acosTable[(int)(value*acosK + (acosK + 0.5))] ;
}
This requires one less addition (and one less loading of a double literal) --Starrynte 01:10, 10 April 2009 (UTC)
Nice! I didn't see that! --Skilgannon 01:21, 10 April 2009 (UTC)
Ah! I just realized: shouldn't it be public static final double acosK = acos_DIVISIONS/2;
, and not public static final double acosK = DIVISIONS/2;
? Or is this some Performance Enhancing Bug :) --Starrynte 18:09, 10 April 2009 (UTC)
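For what it's worth, the index math says it does need to be acos_DIVISIONS/2: (value + 1) * acosK has to map the domain [-1, 1] onto indices [0, acos_DIVISIONS]. A corrected sketch of the table above (looping to <= so the last entry is filled; the class wrapper is just for illustration):

```java
public final class AcosSketch {
    public static final int acos_DIVISIONS = 131072;
    // must be acos_DIVISIONS / 2, not DIVISIONS / 2: (value + 1) * acosK
    // has to span indices 0 .. acos_DIVISIONS as value goes from -1 to +1
    public static final double acosK = acos_DIVISIONS / 2.0;
    public static final double[] acosTable = new double[acos_DIVISIONS + 1];
    static {
        for (int i = 0; i <= acos_DIVISIONS; i++) {
            acosTable[i] = Math.acos(i / acosK - 1);
        }
    }
    public static final double acos(double value) {
        return acosTable[(int) (value * acosK + (acosK + 0.5))];
    }
}
```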
Arc Tangent Implementation
Here are some (hopefully) speed optimizations, mainly very minor ones, based on the code posted on main page. (However, tiny optimizations add up!)
public static final void init() {
//for (int i=DIVISIONS-1; i>=0; i--) { // comparing i to 0 is faster, but this is a very minor thing
for (int i=0; i<DIVISIONS; i++) {
sineTable[i] = Math.sin(i/K); //no need to assign "value" to a variable first
}
}
You may want to consider making DIVISIONS a power of two (maybe 8192), since then you can replace "% DIVISIONS" (modulo DIVISIONS) with "& (DIVISIONS - 1)" (bitwise AND with (DIVISIONS-1), which is the same thing as modulo DIVISIONS). Not sure if AND is necessarily faster, you may need to check.
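A quick demonstration of that equivalence, and of the negative-operand caveat discussed further down (the values here are arbitrary):

```java
public class MaskModDemo {
    public static void main(String[] args) {
        final int DIVISIONS = 8192; // must be a power of two for the mask trick
        for (int i = 0; i < 100000; i++) {
            // identical results for non-negative operands
            assert (i % DIVISIONS) == (i & (DIVISIONS - 1));
        }
        // but they differ for negatives: % keeps the dividend's sign, & never does
        System.out.println(-5 % DIVISIONS);       // -5
        System.out.println(-5 & (DIVISIONS - 1)); // 8187
    }
}
```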
Also in main(), "Wrost error" should be "Worst error"
Here is an atan2 method that I made based on what I read on the internet. The catch is the Math.sqrt; perhaps it would be possible to make a sqrt table. (Uses the acos method posted by Skilgannon above.)
public static final double atan2(double x, double y){
//return asin(y / Math.sqrt(x*x + y*y));
//robocode angles are different, if i understand correctly you just need to swap usage of sines and cosines for robocode-style angles
return acos(y / Math.sqrt(x*x + y*y));
}
Information from http://www.analyzemath.com/Calculators/Rect_Polar.html Please correct me if I'm wrong in how to use trig in robocode. --Starrynte 00:47, 9 April 2009 (UTC)
I don't really know, but it seems that we need to swap y with x too. And for that atan2 implementation, I think using the native Java method is faster, since this one needs to call sqrt, which is slow. » Nat | Talk » 01:02, 9 April 2009 (UTC)
I've done some tests, and the custom atan2 seems to be slightly faster, with a sacrifice in init time and memory (due to sqrt and acos tables):
FastTrig init time: 0.07575 seconds
FastTrig.atan2(5000,5000) = 0.7852348917896085 Took 0.00003 seconds
Math.atan2(5000,5000) = 0.7853981633974483 Took 0.00005 seconds
(The maximum values that can be used with this atan2 is (5000,5000) , since that's as far as I set the sqrt tables to go up to. Likewise, the maximum value for a sqrt is 50000000)
I will post code as soon as I confirm it is working
--Starrynte 16:59, 9 April 2009 (UTC)
public static final double atan2(double x, double y){
//return Math.asin(y / Math.sqrt(x*x + y*y));
//robocode angles are different, swap usage of sines and cosines
return acos(y / Math.sqrt(x*x + y*y)) * (y>=0?1:-1);
}
Basically the same thing as above, except for the sign handling (Math.signum(y) is slower than just (y>=0?1:-1)). It turns out that the FastTrig acos outweighs the time needed for Math.sqrt (sqrt tables were slower or way too large for the accuracy needed). I compared it with using atan2 in Robocode and it works. About 2 to 4 times faster than Math.atan2().
I estimate (haven't tested) the worst error is about 0.05 degrees, which, in the worst case (when you're shooting from a corner to the corner diagonally across it on a 5000 x 5000 field, a distance of about 7071 units) gives an error of 6 units. Compare that with a bot width of 36 units.
--Starrynte 17:45, 9 April 2009 (UTC)
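That worst-case estimate checks out; at 0.05 degrees of angular error, the lateral miss at the ~7071-unit diagonal of a 5000x5000 field is about 6.17 units:

```java
public class ErrorEstimate {
    public static void main(String[] args) {
        double errRad = Math.toRadians(0.05);        // worst angular error quoted above
        double distance = Math.hypot(5000, 5000);    // corner-to-corner diagonal, ~7071 units
        double miss = distance * Math.sin(errRad);   // lateral offset at that range
        System.out.printf("miss = %.2f units%n", miss); // about 6.17
    }
}
```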
Nice stuff Starrynte and Skilgannon! Hmm, the acos and atan2 error is slightly more than I'd like considering how many divisions they use, but still, not bad! I'll incorporate these things into the main version on the page some time soon. About that power-of-two optimization, I've considered that myself actually, though one needs to keep in mind that "& (DIVISIONS - 1)" wouldn't behave the same as modulus for negative numbers --Rednaxela 18:44, 9 April 2009 (UTC)
- cos is an even function, so negative numbers should be fine for cos at least, because "& (DIVISIONS - 1)" is the equivalent of also taking the mod. --Skilgannon 01:21, 10 April 2009 (UTC)
- I blame my high school comp teacher for explaining that the negatives of 2s complement are just like the positives except with a negative bit at the front as 1 instead of 0. I took that to mean that the binary representation was exactly the same for x and -x except with that first bit 1 instead of 0 for -x. I guess not :-/ --Skilgannon 08:07, 10 April 2009 (UTC)
- OK. But couldn't you use it for the second modulus, when you know everything is going to be positive anyways? You just have to cast to int first. --Skilgannon 10:36, 10 April 2009 (UTC)
Here is another, slightly less accurate version, which may be faster depending on the computer you're using (it's about 5% slower on my computer). It replaces the divide by a square root with an inlined multiply by the inverse square root (described here: http://en.wikipedia.org/wiki/Fast_inverse_square_root)
public static final double atan2(double x, double y){
return acos(y * (1.5 - 0.5 * (x=x*x + y*y)* (x = Double.longBitsToDouble(0x5fe6ec85e7de30daL - (Double.doubleToLongBits(x)>>1)))*x)*x)*(y>=0?1:-1);
}
--Starrynte 20:46, 9 April 2009 (UTC)
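For reference, the magic-constant trick in that one-liner can be unpacked like this. This is just the generic double-precision fast inverse square root with a single Newton refinement step, written out for readability rather than taken from the thread:

```java
public class InvSqrtDemo {
    // double-precision fast inverse square root: a bit-level initial
    // guess via the 0x5fe6ec85e7de30da constant, then one Newton step
    public static double invSqrt(double x) {
        double half = 0.5 * x;
        double y = Double.longBitsToDouble(
                0x5fe6ec85e7de30daL - (Double.doubleToLongBits(x) >> 1));
        y = y * (1.5 - half * y * y); // Newton iteration for 1/sqrt(x)
        return y;
    }

    public static void main(String[] args) {
        System.out.println(invSqrt(25.0));          // close to 0.2
        System.out.println(1.0 / Math.sqrt(25.0));  // 0.2
    }
}
```

One Newton step leaves a relative error of roughly 0.2%, which is why the atan2 above trades a little accuracy for speed.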
From atan2 arises atan (up to twice as fast), plus a (slight) optimization in both:
public static final double atan(double value){
//return (value>=0 ? acos(1 / value) : -acos(1 / value)); OK if you approximate "Math.sqrt(value*value + 1)" as "value"
return (value>=0 ? acos(1 / Math.sqrt(value*value + 1)) : -acos(1 / Math.sqrt(value*value + 1)));
}
public static final double atan2(double x, double y){
return (y>=0 ? acos(y / Math.sqrt(x*x + y*y)) : -acos(y / Math.sqrt(x*x + y*y)));
}
asin can also be calculated from atan, though not as fast as making a table for it:
Math.asin(x) = atan(x / Math.sqrt(1 - x*x));
Now I wonder if you could make a log table... --User:Starrynte 02:22, 10 April 2009 (UTC)
Just thinking, wouldn't it be:
public static final double atan2(double x, double y){
return (x>=0 ? acos(y / Math.sqrt(x*x + y*y)) : 2*Math.PI -acos(y / Math.sqrt(x*x + y*y)));
}
because it is a 'negative' bearing if x is negative not y, and atan2 is meant to return between 0 and 2PI. --Skilgannon 10:10, 10 April 2009 (UTC)
That may be correct. By the way, I've found (yet) another better implementation of atan2, I'll look at it when I have time (in a few hours) --Starrynte 20:09, 10 April 2009 (UTC)
OK. New atan2 implementation, and from it atan, faster (20-40% faster than old version, and about 120x faster than Math version):
public static final double atan2(double y, double x) {
double abs_y = abs(y);
double angle = ((x>=0d)?QUARTER_PI:QUARTER_PI*3d) + 0.1963*(x = (x + ((x>=0d)?-abs_y:abs_y)) / (abs(x)+abs_y)) *x*x - 0.9817*x;
return y < 0d ? -angle : angle;
}
public static final double atan(double value){
double angle = abs(value);
angle = QUARTER_PI + 0.1963 * (angle = (1 - angle) / (1 + angle)) *angle*angle - 0.9817*angle;
return value < 0d ? -angle:angle;
}
public static final double abs(double a){
return (a <= 0.0D) ? 0.0D - a : a;
}
(Calling abs somehow seems to be faster than inlining it, just like calling sin from cos seems to be faster than inlining it...)
Source: http://dspguru.com/comp.dsp/tricks/alg/fxdatan2.htm, then optimized slightly (note that this probably can be optimized further, but I'm too lazy to).
Now testing all the improved methods for accuracy just to make sure... --Starrynte 23:34, 10 April 2009 (UTC)
Yet another new atan2/atan, with 50-100% better accuracy than the one above and slightly faster.
public static final double newatan2(double y, double x){
if ( x == 0d ) {
return y == 0d ? 0d:(y > 0d ? HALF_PI:-HALF_PI);
}
double atan; double z;
if ( abs( z = y/x ) < 1d ) {
atan = z/(1d + 0.28d*z*z);
if ( x < 0d ) {
return y < 0d ? atan - Math.PI:atan + Math.PI;
}
} else {
if ( y < 0d ) return -HALF_PI - z/(z*z + 0.28d);
atan = HALF_PI - z/(z*z + 0.28d);
}
return atan;
}
public static final double atan(double value){
double atan;
if ( abs(value) < 1d ) {
atan = value/(1d + 0.28d*value*value);
} else {
if ( value < 0d ) return -HALF_PI - value/(value*value + 0.28d);
atan = HALF_PI - value/(value*value + 0.28d);
}
return atan;
}
--Starrynte 03:27, 14 April 2009 (UTC)
FastTrig Re-create
See main page for source.
I'm quite sure both acos and tan need a higher number of divisions, otherwise they get a lot of error due to the steepness of their functions. Your asin function looks right to me, although I cleaned it a bit by calling acos. I also changed the tan because it has a period of PI, not 2PI, so now it's more accurate, although yours did work. I think a lookup table with a single Newton's iteration might be faster and more accurate for sqrt, but that's just speculation. --Skilgannon 10:01, 10 April 2009 (UTC)
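For readers without the main page handy, a period-PI tan table along the lines Skilgannon describes might look like this. The size and names are my assumptions, and as he notes, the steep regions near ±PI/2 need a lot of divisions to stay accurate:

```java
public final class TanSketch {
    static final int TAN_DIVISIONS = 65536;              // assumed size; steepness wants many bins
    static final double TAN_K = TAN_DIVISIONS / Math.PI; // tan repeats every PI, not 2*PI
    static final double[] tanTable = new double[TAN_DIVISIONS];
    static {
        for (int i = 0; i < TAN_DIVISIONS; i++)
            tanTable[i] = Math.tan(i / TAN_K);
    }
    public static double tan(double value) {
        // double modulus handles negative angles, as with the sin table
        return tanTable[(int) (((value * TAN_K + 0.5) % TAN_DIVISIONS
                + TAN_DIVISIONS) % TAN_DIVISIONS)];
    }
}
```

Using period PI instead of 2*PI doubles the effective resolution for free, which seems to be the accuracy gain mentioned.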
- Thanks. The reason for not calling acos was to reduce overhead, though only a little. I don't really know about sqrt; I just copied it over from above. Anyway, my code is not tested in any way yet. » Nat | Talk » 11:36, 10 April 2009 (UTC)
- Math.sqrt(value) is probably faster, since the divide is expensive. If you can find a quick way of doing Double.longBitsToDouble and Double.doubleToLongBits, then it may be worth considering doing it that way (which, by the way, I edited slightly)
By the way, expect a slightly faster sine and cosine soon *without* using any tables at all! (Once I find the source of robocode's normalRelativeAngle) --Starrynte 17:34, 10 April 2009 (UTC)
- OK, if you're fine with the worst error being 0.001090292603 instead of 0.000436313959, this version of sine is 20-30% faster AND uses no tables at all (cosine coming up soon)
public static final double sin(double value) {
//normalRelativeAngle =)
value=(value %= (2*Math.PI)) < 0.0D ? value < (-Math.PI) ? value + (2*Math.PI) : value : value >= Math.PI ? value - (2*Math.PI) : value;
//approximate sine through a parabola (more details at http://lab.polygonal.de/2007/07/18/fast-and-accurate-sinecosine-approximation/)
return 0.225 * ( ( value*= (4/Math.PI) * (1 - abs(value)/Math.PI) ) * abs(value) - value) + value;
}
--Starrynte 18:00, 10 April 2009 (UTC)
- It turns out that
public static final double cos(double value) {
return sin(value + (Math.PI/2));
}
is the same or (on my computer) even slightly faster than inlining the sine method, which wouldn't be too hard anyways, so here is the cos function. --Starrynte 18:11, 10 April 2009 (UTC)
- Yeah, that is cos, I think. The reason the old one didn't work is that there was no normalization in the sin function (adding Math.PI/2 would always give a result over PI/2). Could you please test every method in the class above and make sure they work? Also, some statistics on run time compared to Math.xxx would be nice. :-) » Nat | Talk » 18:28, 10 April 2009 (UTC)
Some results (more to be added):
atan2: worst error = 0.010149566515914185, average error = 0.003435479898560425
atan : worst error = 0.01014956651659027 , average error = 0.007611438369067416
sin  : worst error = 0.001090292602593357, average error = 0.000512039207871299
cos  : worst error = 0.001090292602468734, average error = 0.000498927373420323
--Starrynte 03:10, 11 April 2009 (UTC)
Cool result:
Initializing FastMath...
FastMath init time: 0.26825 seconds
== Sine Test ==
Math.sin() time: 22.32937 seconds
FastMath.sin() time: 0.82147 seconds
FastMath.sin() worst error: 0.00044
FastMath.sin() without table time: 0.86786 seconds
FastMath.sin() worst error: 0.00109
== Cosine Test ==
Math.cos() time: 22.45272 seconds
FastMath.cos() time: 0.82310 seconds
FastMath.cos() worst error: 0.00036
FastMath.cos() without table time: 0.83544 seconds
FastMath.cos() worst error: 0.00076
== Tangent Test ==
Math.tan() time: 29.05201 seconds
FastMath.tan() time: 0.83758 seconds
FastMath.tan() max error: 0.00004
== Arcsine Test ==
Math.asin() time: 15.22509 seconds
FastMath.asin() time: 0.18033 seconds
FastMath.asin() worst error: 0.00390
== Arccosine Test ==
Math.asin() time: 14.74417 seconds
FastMath.acos() time: 0.16350 seconds
FastMath.acos() worst error: 0.00390
== Arctangent Test ==
Math.atan() time: 2.98773 seconds
FastMath.atan() time: 0.53018 seconds
FastMath.atan() worst error: 0.00001
new FastMath.atan() time: 0.36471 seconds
new FastMath.atan() worst error: 0.01014
== Arctangent2 Test ==
Math.atan2() time: 4.55493 seconds
FastMath.atan2() time: 0.71803 seconds
FastMath.atan2() worst error: 0.00391
new FastMath.atan2() time: 0.54249 seconds
new FastMath.atan2() worst error: 0.01015
It seems that Java's atan is really fast, and sin/cos without table lookup is pretty slow. "New atan" refers to Starrynte's new one above. » Nat | Talk » 07:57, 12 April 2009 (UTC)
I'm not using FastTrig in my bots anymore because I didn't really see it saving me ticks, but the test results kept making me curious why the LUT seems so much faster than the native calls while I see no gain in practice. So I started reading some Java specs and found out that Java uses the HW sin/cos when it can, but sometimes uses a relatively expensive algorithm to be more accurate.
- If the parameter value is less than PI/5 it seems to always use HW.
- Between PI/5 and PI/4 it's hard to know.
- For bigger values it uses a not-so-fast algorithm, but if the parameter is always less than PI in magnitude it's still not too slow.
I recommend trying the tests with angle parameters in the range [-PI, PI] (which are the most relevant in Robocode anyway) and seeing how much faster Math becomes; not faster than FastTrig, but the gap closes a lot. --zyx 09:44, 12 April 2009 (UTC)
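To try zyx's suggestion, a test harness only needs its random angles drawn from [-PI, PI) instead of a huge range. A minimal sketch (class name and loop counts are arbitrary):

```java
public class RangeBench {
    // draw angles only from [-PI, PI), per zyx's observation about HW sin/cos
    static double randomRelativeAngle() {
        return Math.random() * 2 * Math.PI - Math.PI;
    }

    public static void main(String[] args) {
        int n = 2_000_000;
        double[] angles = new double[n];
        for (int i = 0; i < n; i++) angles[i] = randomRelativeAngle();
        double sum = 0; // checksum so the JIT can't skip the loop
        long t = -System.nanoTime();
        for (double a : angles) sum += Math.sin(a);
        t += System.nanoTime();
        System.out.printf("Math.sin over [-PI, PI): %.3f s (checksum %.3f)%n", t / 1e9, sum);
    }
}
```

Generating the angles up front keeps the random-number cost out of the timed region, which the earlier tests on this page did not always do.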
The original FastTrig doesn't save you any ticks, like you mentioned, but the newer one might, because the inverse trig functions are a lot faster! Here's a result:
Initializing FastMath...
FastMath init time: 0.26595 seconds
== Sine Test ==
Math.sin() time: 2.38096 seconds
FastMath.sin() time: 0.76675 seconds
FastMath.sin() worst error: 0.00044
FastMath.sin() without table time: 0.98345 seconds
FastMath.sin() worst error: 0.00109
== Cosine Test ==
Math.cos() time: 2.29882 seconds
FastMath.cos() time: 0.76020 seconds
FastMath.cos() worst error: 0.00044
FastMath.cos() without table time: 1.04117 seconds
FastMath.cos() worst error: 0.00109
== Tangent Test ==
Math.tan() time: 5.53648 seconds
FastMath.tan() time: 0.79689 seconds
FastMath.tan() max error: 16331239355788294.00000
== Arcsine Test ==
Math.asin() time: 15.34787 seconds
FastMath.asin() time: 0.20251 seconds
FastMath.asin() worst error: 0.00383
== Arccosine Test ==
Math.asin() time: 14.63279 seconds
FastMath.acos() time: 0.18518 seconds
FastMath.acos() worst error: 0.00383
== Arctangent Test ==
Math.atan() time: 3.07183 seconds
FastMath.atan() time: 0.87246 seconds
FastMath.atan() worst error: 0.00388
new FastMath.atan() time: 0.77300 seconds
new FastMath.atan() worst error: 0.01015
== Arctangent2 Test ==
Math.atan2() time: 5.08507 seconds
FastMath.atan2() time: 1.13635 seconds
FastMath.atan2() worst error: 6.28318
new FastMath.atan2() time: 0.98630 seconds
new FastMath.atan2() worst error: 1.57080
Here, sin/cos/tan are limited to [-PI,PI], asin and acos are limited to [-1,1], atan is limited to [-8000,8000] and atan2 is limited to [-5000,5000] on both parameters. But this test runs ten million calculations. (NOTE: I can't do any more; heap size here is 1024 MB, because I use the same angles for every function, so seven double[10000000] arrays.)
Don't mind the tan error; it always gets that error at the value PI.
The error increase with atan/atan2 is probably caused by negative values, since the old test never tested with negatives.
From my point of view, with the new results: use everything but atan/atan2, because of their large error factor. If you don't want to use sin/cos then use asin/acos. The current error percentage is ~0.122%, an acceptable rate, and they are ~76 times faster! But sin/cos are only ~2.875 times faster. » Nat | Talk » 13:44, 12 April 2009 (UTC)
- I've fixed the atan2 to work correctly with Robocode co-ordinates. I also switched the second modulus in sin/cos/tan to be a bitwise & instead, as the value is always positive here, so they should be slightly faster now. (Do a diff to see all the changes I made.) I'm using it in my test bot with no problems. I think for it to be relevant to Robocode, sin/cos should be tested (-PI,TWO_PI] (because often absolute angles are used, not just relative), tan tested (-HALF_PI,HALF_PI). BTW, I'm not sure if you've covered this in school, round bracket ( or ) means not inclusive, square bracket [ or ] means inclusive. --Skilgannon 13:18, 12 April 2009 (UTC)
- What do you mean by inclusive? I'm not really good at English (I'm 13). Here's a new result based on your change.
Initializing FastMath...
FastMath init time: 0.26234 seconds
== Sine Test ==
Math.sin() time: 2.72282 seconds
FastMath.sin() time: 0.46620 seconds
FastMath.sin() worst error: 0.00038
FastMath.sin() without table time: 1.09060 seconds
FastMath.sin() worst error: 0.00109
== Cosine Test ==
Math.cos() time: 2.59886 seconds
FastMath.cos() time: 0.45760 seconds
FastMath.cos() worst error: 0.00038
FastMath.cos() without table time: 1.03119 seconds
FastMath.cos() worst error: 0.00109
== Tangent Test ==
Math.tan() time: 4.24820 seconds
FastMath.tan() time: 0.80408 seconds
FastMath.tan() max error: 16331239362674482.00000
== Arcsine Test ==
Math.asin() time: 15.49524 seconds
FastMath.asin() time: 0.20813 seconds
FastMath.asin() worst error: 0.00386
== Arccosine Test ==
Math.asin() time: 14.95285 seconds
FastMath.acos() time: 0.22340 seconds
FastMath.acos() worst error: 0.00386
== Arctangent Test ==
Math.atan() time: 3.06622 seconds
FastMath.atan() time: 0.85496 seconds
FastMath.atan() worst error: 0.00093
new FastMath.atan() time: 0.67158 seconds
new FastMath.atan() worst error: 0.01015
== Arctangent2 Test ==
Math.atan2() time: 4.96311 seconds
FastMath.atan2() time: 1.10923 seconds
FastMath.atan2() worst error: 0.00391
new FastMath.atan2() time: 0.96854 seconds
new FastMath.atan2() worst error: 1.57080
- A little faster for sin/cos, and corrected error for atan/atan2. Starrynte, please check your new atan2 implementation, as it results in an error at about 90 degrees! I added my main() to the class above so anyone can run the test (but without the new atan/atan2 and sin/cos without table lookup. WARNING: Requires -Xmx1024M) » Nat | Talk » 13:44, 12 April 2009 (UTC)
- OK, now I understand what you mean by inclusive. Actually, I do test with inclusive bounds, but the result won't show it, since I've never seen Math.random() return 0 or 1 ;) Rednaxela, please consider merging this into your page. The inverse trig is getting better and better. » Nat | Talk » 13:50, 12 April 2009 (UTC)
- (Yet another) new atan2 found, slightly faster than the current atan2, yet (according to my tests) 50-100% more accurate depending on the parameters. See above for the code. I also moved the recreate to the main page (while waiting for Rednaxela to either merge the two or approve of it), as the wiki is warning me about the size of the page. Too lazy to test atan2/atan using now-established ways, if someone would be kind enough to test the new atan2/atan methods... --Starrynte 03:27, 14 April 2009 (UTC)
Three new sine/cosine approximation methods, all without array:
1.
Sine: return (x%=(Math.PI*2)) * (0.98864467697616454 + x * x * (-0.16057654076500594 + x * x * (0.007410226601174971 + x * x * (-.00013957972749774987 + x * x * .00000098419665532389341))));
Cosine: return (x=(x + HALF_PI) % (Math.PI*2)) * (0.98864467697616454 + x * x * (-0.16057654076500594 + x * x * (0.007410226601174971 + x * x * (-.00013957972749774987 + x * x * .00000098419665532389341))));
2.
Sine: x * (0.98441649960736333 + x * x * (-0.15346258175416733 + x * x * (0.005465389576719881)));
Cosine: (x>=HALF_PI ? x - HALF_PI:x + HALF_PI) * (0.98441649960736333 + x * x * (-0.15346258175416733 + x * x * (0.005465389576719881)));
3.
Sine: return x * (0.98864467697616454 + x * x * (-0.16057654076500594 + x * x * (0.007410226601174971 + x * x * (-.00013957972749774987 + x * x * .00000098419665532389341))));
Cosine: return (x>=THREE_OVER_TWO_PI ? x - HALF_PI:x + HALF_PI) * (0.98864467697616454 + x * x * (-0.16057654076500594 + x * x * (0.007410226601174971 + x * x * (-.00013957972749774987 + x * x * .00000098419665532389341))));
1. is a general-purpose one that can be used for all values (15% slower than the array version, max error 0.00638). 2. can be used when the value is known to be between -PI and PI (not tested). 3. can be used when the value is known to be between -TWO_PI and TWO_PI (not tested). --Starrynte 01:01, 26 April 2009 (UTC)
Additional contributions
First of all, thanks to Rednaxela and everyone who contributed to the recreate version. After lots of effort expended on my own version, I found that these routines were much faster and more accurate too. So I scooped them up and used them in my code! A few of the routines I was using were faster or more accurate, I've pasted them below. --Darkcanuck 04:25, 12 August 2011 (UTC)
/* Inverse Trig Approximations */
private static final double chebyshev_atan(double r) {
// Chebyshev 7th-order polynomial approximation; only valid for -1 < r < +1
// from Matlab demo: http://www.mathworks.com/products/fixed/demos.html?file=/products/demos/shipping/fixedpoint/fixpt_atan2_demo.html
double r2 = r * r;
double r3 = r2 * r;
double r5 = r3 * r2;
double r7 = r5 * r2;
return (0.999133448222780 * r
- 0.320533292381664 * r3
+ 0.144982490144465 * r5
- 0.038254464970299 * r7);
}
public static final double atan(double r) {
if (r < 0.0)
return -atan(-r);
if (r > 1.0)
return HALF_PI - chebyshev_atan(1.0 / r);
return chebyshev_atan(r);
}
public static final double atan2(double y, double x) {
if (x==0.0) {
if (y==0.0)
return 0.0; // should be Double.NaN but Math.atan2 returns 0.0 in this case;
return (y > 0.0) ? HALF_PI : -HALF_PI;
}
double absX = abs(x);
double absY = abs(y);
boolean inv = (absY > absX) && (absY > 0.0);
double absAtan = chebyshev_atan( (inv) ? absX/absY : absY/absX);
if (inv)
absAtan = HALF_PI - absAtan;
if (x > 0.0) {
return (y >= 0.0) ? absAtan : -absAtan;
} else {
if (y >= 0.0)
return PI - absAtan;
else
return -PI + absAtan;
}
}
I'm not actually using the exp() function below (it's not smooth enough for my radial basis functions) but it's much faster than Math.exp() and seems to work well in neural nets:
/* fast approximation of Math.exp()
* based on Java code by Martin Ankerl (http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/)
* algorithm by Nicol N. Schraudolph ("A Fast, Compact Approximation of the Exponential Function", 1998)
* http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.57.1569)
*/
private static final double EXP_A = Math.pow(2, 20) / Math.log(2);
private static final double EXP_B = 1023.0 * Math.pow(2, 20);
private static final double EXP_C = 45799.0;
public static double exp(double val) {
final long tmp = (long) (EXP_A * val + (EXP_B - EXP_C));
return Double.longBitsToDouble(tmp << 32);
}
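A quick sanity check of that exp() approximation can be run standalone; this is the same code wrapped in a small driver that scans a moderate argument range and reports the worst relative error (the error level observed is a few percent, in line with Schraudolph's design):

```java
public class FastExpCheck {
    private static final double EXP_A = Math.pow(2, 20) / Math.log(2);
    private static final double EXP_B = 1023.0 * Math.pow(2, 20);
    private static final double EXP_C = 45799.0;

    // Schraudolph's trick: build the IEEE 754 bit pattern of e^val directly
    public static double exp(double val) {
        final long tmp = (long) (EXP_A * val + (EXP_B - EXP_C));
        return Double.longBitsToDouble(tmp << 32);
    }

    public static void main(String[] args) {
        double worst = 0;
        for (double x = -10; x <= 10; x += 0.01) {
            double rel = Math.abs(exp(x) - Math.exp(x)) / Math.exp(x);
            worst = Math.max(worst, rel);
        }
        System.out.printf("worst relative error on [-10, 10]: %.4f%n", worst);
    }
}
```

The piecewise-linear shape of the error is also why it is "not smooth enough" for radial basis functions, as noted above, while being perfectly usable inside a neural net.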
Faster normalXAngle --> faster sin,cos,tan
Inspired by recent discussion I did a little profiling of DrussGT, and was horrified to find that plain old robocode.util.Utils.normalRelativeAngle() was using a huge percentage of my processing time. So, I tried various methods to see how they were for speed, and came up with one which takes about 1/4 of the time:
public static final double normalRelativeAngle(double d) {
    d += Math.PI;
    double i = Math.floor(d * (1 / (2 * Math.PI)));
    d -= i * (2 * Math.PI);
    return d - Math.PI;
}

public static final double normalAbsoluteAngle(double d) {
    double i = Math.floor(d * (1 / (2 * Math.PI)));
    d -= i * (2 * Math.PI);
    return d;
}
I think the big improvement comes from no divisions, no % and no branches, all of which are slow. Of course, it isn't ulp-accurate (although I'm not sure the previous one was either?), but over 1M calls the largest difference I saw between this and the old one was -3.552713678800501E-15, which is small enough for me to call 0.
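A quick way to convince yourself the floor-based version agrees with a plain %-and-branch normalizer; the reference below is my own restatement, not Robocode's actual Utils source, and the comparison is circular so a harmless opposite-wrap at the ±PI boundary doesn't show up as a 2*PI "error":

```java
public class NormalAngleCheck {
    public static double normalRelativeAngle(double d) {
        d += Math.PI;
        double i = Math.floor(d * (1 / (2 * Math.PI)));
        d -= i * (2 * Math.PI);
        return d - Math.PI;
    }

    // straightforward reference normalizer into [-PI, PI) using % and branches
    public static double reference(double d) {
        d %= 2 * Math.PI;
        if (d < -Math.PI) d += 2 * Math.PI;
        else if (d >= Math.PI) d -= 2 * Math.PI;
        return d;
    }

    public static void main(String[] args) {
        java.util.Random rnd = new java.util.Random(42);
        double worst = 0;
        for (int i = 0; i < 1_000_000; i++) {
            double a = (rnd.nextDouble() - 0.5) * 200 * Math.PI;
            double diff = Math.abs(normalRelativeAngle(a) - reference(a));
            if (diff > Math.PI) diff = 2 * Math.PI - diff; // same angle, opposite wrap
            worst = Math.max(worst, diff);
        }
        System.out.println("worst circular difference: " + worst);
    }
}
```

The printed difference should sit at floating-point rounding level, consistent with the ~1e-15 figure quoted above.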
Of course, this technique can then be applied to sin and cos, which then run about 2x the speed of previously, and on my system this is now ~4x the speed of Math.sin/cos:
public static final double sin(double value) {
    value = value * K + 0.5;
    double i = Math.floor(value * (1.0 / TRIG_DIVISIONS));
    value -= i * TRIG_DIVISIONS;
    return sineTable[(int) value];
}

public static final double cos(double value) {
    value = value * K + 0.5 + Math.PI * 0.5 * K;
    double i = Math.floor(value * (1.0 / TRIG_DIVISIONS));
    value -= i * TRIG_DIVISIONS;
    return sineTable[(int) value];
}
It can also be applied to the tan method, but I hardly use tan so I haven't bothered with it. Additionally, TRIG_DIVISIONS is no longer restricted to power-of-2 numbers because we lost the &.
This code is public domain, do with it as you like, and give credit if you think I deserve it! Enjoy, and write speedy bots!
Further experimentation with the profiler tells me that our lookup-table sin and cos are actually slower than an expanded polynomial version; I suspect this is due to occasional cache misses. However, the faster normalRelativeAngle code is still useful. Here are my latest sin/cos implementations:
public static final double sin(double d) {
    d += Math.PI;
    double x2 = Math.floor(d * (1 / (2 * Math.PI)));
    d -= x2 * (2 * Math.PI);
    d -= Math.PI;
    x2 = d * d;
    // accurate to 6.82e-8, 3.3x faster than Math.sin,
    // faster than lookup table in real-world conditions due to no cache misses
    // all values from "Fast Polynomial Approximations to Sine and Cosine", Garret, C. K., 2012
    return (((((-2.05342856289746600727e-08 * x2 + 2.70405218307799040084e-06) * x2
            - 1.98125763417806681909e-04) * x2 + 8.33255814755188010464e-03) * x2
            - 1.66665772196961623983e-01) * x2 + 9.99999707044156546685e-01) * d;
}
public static final double cos(double d) {
    d += Math.PI;
    double x2 = Math.floor(d * (1 / (2 * Math.PI)));
    d -= x2 * (2 * Math.PI);
    d -= Math.PI;
    d *= d;
    // max error 5.6e-7, 4x faster than Math.cos,
    // faster than lookup table in real-world conditions due to less cache misses
    // all values from "Fast Polynomial Approximations to Sine and Cosine", Garret, C. K., 2012
    return ((((-2.21941782786353727022e-07 * d + 2.42532401381033027481e-05) * d
            - 1.38627507062573673756e-03) * d + 4.16610337354021107429e-02) * d
            - 4.99995582499065048420e-01) * d + 9.99999443739537210853e-01;
}
Upon doing my own research (heavy googling) I've decided to try a modified version of the LUT code for now. The key difference is the overall footprint is much smaller than the original implementation, hopefully allowing it to remain in memory. I'm doing this by reducing the number of divisions pretty drastically, storing the values as single-precision, and then doing linear interpolation between the nearest two divisions to get the final output. The asymptotic functions will use polynomial implementations, because I fear the interpolation will totally wreck accuracy. And of course, testing will prove whether this is a good approach or not.
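A minimal sketch of what such a small interpolated table might look like. The class name, division count, and layout here are my assumptions, not the poster's actual code: a single-precision table with the floor-based range reduction from above, and linear interpolation between the two nearest entries.

```java
public class InterpolatedTrig {
    private static final int DIVISIONS = 2048; // far smaller than the earlier tables
    private static final double K = DIVISIONS / (2 * Math.PI);
    // +2 guard entries so idx+1 is always in range, even with rounding at the seam
    private static final float[] sineTable = new float[DIVISIONS + 2];

    static {
        for (int i = 0; i < sineTable.length; i++) {
            sineTable[i] = (float) Math.sin(i / K);
        }
    }

    public static double sin(double value) {
        // reduce to [0, DIVISIONS) in index space using the floor trick
        value *= K;
        double i = Math.floor(value * (1.0 / DIVISIONS));
        value -= i * DIVISIONS;
        int idx = (int) value;
        double frac = value - idx;
        // linear interpolation between the two nearest table entries
        return sineTable[idx] + frac * (sineTable[idx + 1] - sineTable[idx]);
    }

    public static void main(String[] args) {
        double worst = 0;
        for (double x = -10; x < 10; x += 0.001) {
            worst = Math.max(worst, Math.abs(sin(x) - Math.sin(x)));
        }
        System.out.println("worst error: " + worst);
    }
}
```

With 2048 divisions the interpolation error is on the order of 1e-6, which supports the idea that the footprint (8 KB of floats here) can shrink dramatically before accuracy becomes the limiting factor.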
Have a look at Polynomial Approximations to Sine and Cosine. This uses no memory and is even faster than the lookup table approach. Skilgannon uses this as far as I know.
Also have a look at Chebyshev Approximation (http://metamerist.com/cheby/example38.htm archive) which works pretty well for atan, asin and acos.
That is what I'm referring to as the "polynomial implementation" above. My intuition is that the LUT version is only slower in practice because it gets evicted out of the cache due to being so large. I haven't done a side-by-side comparison test yet, but I've shrunk it by roughly a factor of 32 from the smallest version I've seen on this page, and I've added a couple other tricks to reduce the calculations required to find the lookup index. Once I truly need the speed boost I'll begin benchmarking the two against each other.
I suspect table lookups are slow due to array index bounds checking.
It also happens in kNN search. I was able to double the speed of my bot by replacing a few arrays with objects that have a hardcoded set of fields. This only works with constant-sized arrays, like data point coordinates.
No, I'm fairly sure it's caching, because profiling shows LUT twice as slow as polynomial, but a micro-benchmark shows LUT ~30% faster.
I generate a list of random numbers in an array and then time how long it takes for each of the functions to process the entire array in a loop. The result of each function I add to a running total, and I print the total at the end to make sure it doesn't get optimised away, and also to see how accurate they are. I could also calculate the overhead if I just add the results of the array, but I haven't tried that yet, and I think it might get optimised into SSE ops anyway.
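The methodology described above can be sketched roughly like this (class and method names are mine; the accumulated total is returned and printed so the JIT cannot treat the calls as dead code):

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

public class TrigBenchmark {
    // Times f over the whole array and returns the running total,
    // so the work cannot be optimised away.
    public static double timeIt(String name, DoubleUnaryOperator f, double[] inputs) {
        double total = 0;
        long start = System.nanoTime();
        for (double x : inputs) {
            total += f.applyAsDouble(x);
        }
        long elapsed = System.nanoTime() - start;
        System.out.println(name + ": " + elapsed / 1e9 + " s");
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        double[] inputs = new double[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) {
            inputs[i] = (rnd.nextDouble() - 0.5) * 100; // random angles
        }
        double total = timeIt("Math.sin", Math::sin, inputs);
        System.out.println("total = " + total); // printing the total keeps it live
    }
}
```

One caveat with this sketch: dispatching through a `DoubleUnaryOperator` adds call overhead that a hand-inlined loop would not have, so absolute timings will be pessimistic even though relative comparisons stay useful.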
Just a thought, but why not incorporate the faster normalRelativeAngle and normalAbsoluteAngle into the Robocode codebase? That way everybody profits from them in the future, once a version newer than 1.8.1.0 is standard.
Personally, I wouldn't have any issues with it, but using a * instead of / probably loses a few ULPs, and I'm not sure that is what we want inside the game engine. It is still faster than the standard method even using /, but doing the - in a separate operation probably costs a few ULPs as well.
Fast-math classes/functions have lower precision.
Chase, I believe the reason to use the init method instead of a static block within this class is that the static block won't be called until this class is loaded by the JVM the first time it's used. By using an init method and calling it from a static block in your bot class, you ensure it's called when your bot is loaded, before the battle starts and not causing any skipped turns. Hopefully somebody can confirm/deny that.
Voidious, you're right in general, but there is an issue: robots are loaded in their own threads, so turns can be skipped anyway.
IIRC, while there is a limit on the time for a robot's class to load, it is far greater than the time allowed per turn.
If I recall, the static blocks are called when the class is loaded (in order they are in the source code).
I forget whether it is loaded at the same time or on first reference. I think in Robocode all class files are loaded at load time, which means it doesn't matter where it is initialized in this case.
It's been a long time, but when first writing FastTrig, I believe I tested this, and found that classes like this were initialized upon first reference.
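That behavior is easy to check directly. A small sketch (names are mine) showing that a class's static initializer runs on first active use, not when the JVM starts:

```java
public class InitOrderDemo {
    static class Lazy {
        static boolean initialized = false;
        static {
            // runs on the first active use of Lazy, not at JVM startup
            initialized = true;
            System.out.println("Lazy initialized");
        }
        static void touch() { }
    }

    public static void main(String[] args) {
        System.out.println("before first reference");
        Lazy.touch(); // the static block above runs here
        System.out.println("after: " + Lazy.initialized);
    }
}
```

This matches the Java Language Specification's lazy-initialization rules, though as noted below, the Robocode engine's own loading machinery can still change when this happens in practice.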
I debugged initialization order. Looks like everything is initialized before the battle starts thanks to some field cleaning algorithm inside the engine (net.sf.robocode.host.security.RobotClassLoader#cleanStaticReference(java.lang.reflect.Field)).
Patched a gaussian kernel to work together with that fast exp.
private static final double SQRT_2_PI = Math.sqrt(2 * Math.PI);
private static final double EXP_LIMIT = 700;
private static final double GAUSSIAN_LIMIT = Math.sqrt(EXP_LIMIT * 2);
public static double gaussian(final double u) {
    if (u > GAUSSIAN_LIMIT || u < -GAUSSIAN_LIMIT) {
        return 0;
    }
    return exp(-(u * u) / 2) / SQRT_2_PI;
}

public static double exp(double val) {
    final long tmp = (long) (1512775 * val + 1072632447);
    return Double.longBitsToDouble(tmp << 32);
}
A suggestion: get rid of divisions, they are almost as slow as sqrt and slow the code down a lot. Also, sometimes it is important that the gaussian doesn't return 0, so change that to something a tiny bit bigger =)
private static final double SQRT_2_PI = Math.sqrt(2 * Math.PI);
private static final double EXP_LIMIT = 700;
private static final double GAUSSIAN_LIMIT = Math.sqrt(EXP_LIMIT * 2);
public static double gaussian(final double u) {
    if (u > GAUSSIAN_LIMIT || u < -GAUSSIAN_LIMIT) {
        return 2e-200;
    }
    return exp((u * u) * -0.5) * (1 / SQRT_2_PI);
}

public static double exp(double val) {
    final long tmp = (long) (1512775 * val + 1072632447);
    return Double.longBitsToDouble(tmp << 32);
}
That "get rid of divisions" tip doubled the speed of my bot. O.o'
And for mathematical purity, return the value at the boundary as the minimum, keeping the function continuous:
private static final double SQRT_2_PI_INVERSE = 1 / Math.sqrt(2 * Math.PI);
private static final double EXP_LIMIT = 700;
private static final double GAUSSIAN_LIMIT = Math.sqrt(EXP_LIMIT * 2);
private static final double MIN_GAUSSIAN_VALUE = gaussian(GAUSSIAN_LIMIT);
public static double gaussian(final double u) {
    if (u > GAUSSIAN_LIMIT || u < -GAUSSIAN_LIMIT) {
        return MIN_GAUSSIAN_VALUE;
    }
    return exp(u * u * -0.5) * SQRT_2_PI_INVERSE;
}

public static double exp(final double val) {
    final long tmp = (long) (1512775 * val + 1072632447);
    return Double.longBitsToDouble(tmp << 32);
}
I started on a FastMath class and then caved and started playing with this one. Just thought I'd offer up a fast Math.exp I found from this blog (via StackOverflow):
public static double exp(double val) {
    final long tmp = (long) (1512775 * val + 1072632447);
    return Double.longBitsToDouble(tmp << 32);
}
Probably doesn't matter to most of you, but if you're doing DC with gaussians it might. I actually use a different kernel density in my main gun because 20,000 Math.exp's each time I aim would be pretty slow. (Not sure the above would solve that, but anyway...)
FYI - testing for values between -20 and 0, which would be the range if you're doing gaussians with bandwidth = 1/2 bot width and up to 40 bot widths.
Math.exp() time: 0.30864 seconds
FastMath.exp() time: 0.06371 seconds
FastMath.exp() worst error: 0.02899
FastMath.exp() avg error: 0.00073
Or for -3 to 0:
== exp Test ==
Math.exp() time: 0.38425 seconds
FastMath.exp() time: 0.07076 seconds
FastMath.exp() worst error: 0.02899
FastMath.exp() avg error: 0.00464
Yeah, I didn't update the main page, didn't want to overwrite others' functions. Try the atan/atan2 I posted, they're much more accurate I think.
Yep - already tested them, using them, and running a long benchmark with a FastMath Diamond. =) I actually haven't noticed any huge speed increase in Diamond, so I'm hesitant to keep all this FastMath stuff. But it is faster and the error rate looks pretty low, so if I don't take any score hit I'll definitely keep it.
Just tried that fast exp with a DC/gaussian kernel/swarm targeting gun. 5.5% APS decrease in meleerumble, 12.4% APS decrease in teamrumble. :( Too imprecise for swarm targeting, and you notice the gun shaking a lot more.
But in 1v1 there was no APS hit (0.1% APS increase).
Found out it's because that function doesn't work for values outside -700 to 700.
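Since the bit trick only behaves in roughly that range (the long cast overflows or wraps outside it), one fix is simply to clamp the input before applying it. A hedged sketch of a guarded version (the class name, guard thresholds, and clamping policy are mine, not from the thread):

```java
public class GuardedFastExp {
    // Schraudolph-style bit-trick approximation, constants as posted in this thread
    private static double rawExp(double val) {
        final long tmp = (long) (1512775 * val + 1072632447);
        return Double.longBitsToDouble(tmp << 32);
    }

    // Guard the range where the bit trick is valid (roughly +-700).
    // Outside it, fall back to the limiting values of the true exponential.
    public static double exp(double val) {
        if (val < -700) return 0;
        if (val > 700) return Double.POSITIVE_INFINITY;
        return rawExp(val);
    }

    public static void main(String[] args) {
        System.out.println("fast exp(1)  = " + exp(1.0));
        System.out.println("Math.exp(1)  = " + Math.exp(1.0));
        System.out.println("fast exp(-1000) = " + exp(-1000));
    }
}
```

For a gaussian kernel, returning 0 (or a tiny positive floor, as suggested earlier in this thread) for very negative inputs is usually the right behavior, since the true value underflows anyway.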