# Faster normalXAngle --> faster sin,cos,tan

← Thread:User talk:Rednaxela/FastTrig/Faster normalXAngle -- faster sin,cos,tan/reply

Further experimentation with the profiler tells me that our lookup-table sin and cos are actually slower than an expanded polynomial version, I suspect this is due to occasional cache misses. However, the faster normalRelativeAngle code is still useful, here are my latest sin/cos implementations:

public static final double sin(double d) { d += Math.PI; double x2 = Math.floor(d*(1/(2*Math.PI))); d -= x2*(2*Math.PI); d-=Math.PI; x2 = d * d; return //accurate to 6.82e-8, 3.3x faster than Math.sin, //faster than lookup table in real-world conditions due to no cache misses //all values from "Fast Polynomial Approximations to Sine and Cosine", Garret, C. K., 2012 (((((-2.05342856289746600727e-08*x2 + 2.70405218307799040084e-06)*x2 - 1.98125763417806681909e-04)*x2 + 8.33255814755188010464e-03)*x2 - 1.66665772196961623983e-01)*x2 + 9.99999707044156546685e-01)*d; } public static final double cos(double d) { d += Math.PI; double x2 = Math.floor(d*(1/(2*Math.PI))); d -= x2*(2*Math.PI); d-=Math.PI; d *= d; return //max error 5.6e-7, 4x faster than Math.cos, //faster than lookup table in real-world conditions due to less cache misses //all values from "Fast Polynomial Approximations to Sine and Cosine", Garret, C. K., 2012 ((((- 2.21941782786353727022e-07*d + 2.42532401381033027481e-05)*d - 1.38627507062573673756e-03)*d + 4.16610337354021107429e-02)*d - 4.99995582499065048420e-01)*d + 9.99999443739537210853e-01; }

You do not have permission to edit this page, for the following reasons:

You can view and copy the source of this page.

Return to Thread:User talk:Rednaxela/FastTrig/Faster normalXAngle -- faster sin,cos,tan/reply (9).

Have a look at *Polynomial Approximations to Sine and Cosine*. This uses no memory and is even faster than the lookup table approach. Skilgannon uses this as far as I know.

Also have a look at Chebyshev Approximation (~~http://metamerist.com/cheby/example38.htm~~ archive) which works pretty well for atan, asin and acos.

That is what I'm referring to as the "polynomial implementation" above. My intuition is that the LUT version is only slower in practice because it gets evicted out of the cache due to being so large. I haven't done a side-by-side comparison test yet, but I've shrunk it by roughly a factor of 32 from the smallest version I've seen on this page, and I've added a couple other tricks to reduce the calculations required to find the lookup index. Once I truly need the speed boost I'll begin benchmarking the two against each other.