Fixed all 502518670 errors of more than 1 ulp for cbrtf() on amd64.

The maximum error was 3.56 ulps. The bug was another translation error. The double precision version has a comment saying "new cbrt to 23 bits, may be implemented in single precision". This means exactly what it says -- that the 23 bit second approximation for the double precision cbrt() may be implemented in single (i.e., float) precision. It doesn't mean what the translation assumed -- that this approximation, when implemented in float precision, is good enough for the final approximation in float precision.

First, float precision needs a 24 bit approximation. The "23 bit" approximation is actually good to 24 bits on float precision args, but only if it is evaluated in double precision.

Second, the algorithm requires a cleanup step to ensure its error bound. In float precision, any reasonable algorithm works for the cleanup step. Use the same algorithm as for double precision, although this is much more than enough and is a significant pessimization, and don't optimize or simplify anything using double precision to implement the float case, so that the whole double precision algorithm can be verified in float precision.

A maximum error of 0.667 ulps is claimed for cbrt() and the max for cbrtf() using the same algorithm shouldn't be different, but the actual max for cbrtf() on amd64 is now 0.9834 ulps. (On i386 -O1 the max is 0.5006 (down from < 0.7) due to extra precision.)
svn path=/head/; revision=153303
2024-10-05 08:00:30 +00:00 · 2005-12-11 13:22:01 +00:00 · 6de073b4ef · 2020-12-20 02:59:44 +00:00
parent 1a787460ba
commit 6de073b4ef
1 changed files with 13 additions and 1 deletions
--- a/lib/msun/src/s_cbrtf.c
+++ b/lib/msun/src/s_cbrtf.c
@@ -1,5 +1,6 @@
 /* s_cbrtf.c -- float version of s_cbrt.c.
 * Conversion to float by Ian Lance Taylor, Cygnus Support, ian@cygnus.com.
+ * Debugged by Bruce D. Evans.
 */

 /*
@@ -37,7 +38,7 @@ G =  3.5714286566e-01; /* 5/14      = 0x3eb6db6e */
 float
 cbrtf(float x)
 {
-	float r,s,t;
+	float r,s,t,w;
 	int32_t hx;
 	u_int32_t sign;
 	u_int32_t high;
@@ -64,6 +65,17 @@ cbrtf(float x)
 	s=C+r*t;
 	t*=G+F/(s+E+D/s);

+    /* chop t to 12 bits and make it larger than cbrt(x) */
+	GET_FLOAT_WORD(high,t);
+	SET_FLOAT_WORD(t,high+0x00001000);
+
+    /* one step Newton iteration to 24 bits with error less than 0.984 ulps */
+	s=t*t;		/* t*t is exact */
+	r=x/s;
+	w=t+t;
+	r=(r-t)/(w+r);	/* r-t is exact */
+	t=t+t*r;
+
    /* retore the sign bit */
 	GET_FLOAT_WORD(high,t);
 	SET_FLOAT_WORD(t,high|sign);
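
The refinement step added in the last hunk can be tried in isolation. The sketch below is an illustration only, not the committed code: `cbrt_refine` is a hypothetical helper that repeats the same sequence of operations (s, r, w, then the update) in double precision on a caller-supplied estimate, and it leaves out the chopping of t and the sign handling. Algebraically the update equals t*(t^3 + 2x)/(2t^3 + x), so the error of a decent estimate is roughly cubed by each step.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: one refinement step for cbrt(x), starting from a
 * nonzero estimate t.  Same operation order as the hunk above, but in
 * double precision and without the bit-level chopping of t. */
static double cbrt_refine(double x, double t)
{
    double s, r, w;

    assert(t != 0.0);
    s = t * t;              /* t^2 */
    r = x / s;              /* x / t^2, equal to t when t is exact */
    w = t + t;              /* 2t */
    r = (r - t) / (w + r);  /* relative correction */
    return t + t * r;       /* improved estimate */
}
```

For example, starting from the rough estimate 1.25 for cbrt(2), one step already agrees with cbrt(2.0) to better than 1e-6, and an exact estimate such as 2.0 for x = 8.0 is returned unchanged.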