Commit graph

60 commits

Author SHA1 Message Date
Mark Murray e38f230827 lib/msun: Fix tgammal(3) on IEEE 128-bit platforms
Undo the 80-bit "stub" implementation of the 128-bit long double
tgammal(3) function. The latest (as of Feb 2024) version of the
src/contrib/arm-optimised-routines library includes a standalone,
full 128-bit replacement. This needs a small bit of wrapping to
fit it in, but is otherwise a drop-in replacement.

Testing this is hard, as most maths packages blow up as soon as
their 80-bit floating-point capability is exceeded. With 128-bit
tgammal(), this is easy to do, and this is the range that needs to
be checked the most carefully. Using my copy of Maple, I was able
to check that the output was within a few ULP of the correct answer,
right up to the point of 128-bit over- and underflow. Additionally,
the results are no worse, and indeed better than the 80-bit version.

Steve Kargl sent me his libm testing code, which I used to verify
that the excpetions for certain key values were correct. Tested in
this case were +-Inf, +-NaN, +-1 and +-0.

Differential Revision:	https://reviews.freebsd.org/D44168
Reviewed by:	theraven, andrew, imp
2024-03-18 09:48:43 +00:00
Steve Kargl 0dd5a5603e lib/msun: Cleanup after $FreeBSD$ removal
Remove no longer needed explicit inclusion of sys/cdefs.h.

PR:	276669
MFC after:	1 week
2024-01-28 17:00:23 +02:00
Warner Losh dc36d6f9bb lib: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by:		Netflix
2023-11-26 22:23:28 -07:00
Warner Losh 1d386b48a5 Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:42 -06:00
Warner Losh b3e7694832 Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:16 -06:00
Steve Kargl 2d3b0a687b Fixes for bugs in sinpi/cospi/tanpi
patch to fix half-cycle trigonometric functions

Paul Zimmermann, a MPFR developer, contacted me about large errors in
the half-cycle trigonometric functions.  I've have investigated these
issues and developed the attached patch. The float, double, and ld80
(long double) changes have been tested.

Caveat emptor: The ld128 changes have not been compiled.  The ld128
changes have not been tested.  I do not have access to a system that
uses ld128 floating point.

Here is an itemized list of changes:

* lib/msun/src/math_private.h:
  . Add fast floor macros to compute the integer part of |x| for
    0 <= |x| 01xp(N-1), where N is the precision of the type of x.
    These macros are used in the half-cycle trigonometric functions
    (e.g., sinpi(x)).
  . The FFLOOR80 macros is used with the Intel 80-bit extended double
    functions.  This macors corrects an off-by-one error, which led to
    enormous error for |x| > 0x1p32.

* lib/msun/src/s_cospif.c:
* lib/msun/src/s_cospi.c:
* lib/msun/ld80/s_cospil.c:
  . Update Copyright years.
  . Use FFLOOR*() macro to get integer part of |x|.
  . Correct handle the range 0x1p(N-1) <= |x| < 0x1pN.  Here, one needs
    to determine if the integral value of |x| is even or odd to choose
    +1 or -1.  If |x| >= 0x1pN, always return +1.

* lib/msun/src/s_sinpif.c:
* lib/msun/src/s_sinpi.c:
* lib/msun/ld80/s_sinpil.c:
  . Update Copyright years.
  . Use FFLOOR*() macro to get integer part of |x|.

* lib/msun/src/s_tanpif.c:
* lib/msun/src/s_tanpi.c:
* lib/msun/ld80/s_tanpil.c:
  . Update Copyright years.
  . For +-0.5, return +-inf.  Previously, tanpi[fl]() returned an NaN.
  . Use FFLOOR*() to get integer part of |x|.  Need to determine if the
    integer part is even or odd.  This is used to set +-0 for |x|
integral
    and +-inf for (n+1/2).
  . For 0x1p(N-1) <= |x| < 0x1pN need to determine if x is an even or
odd
    integer to select +0 or -0.  For |x| >= 0x1pN, it is always an even
    integer, select 0.
  . Note, tanpi[fl](x) is an odd function, so one needs to consider
    tanpi[fl](-|x|) = - tanpi[fl](|x|).

* lib/msun/ld128/s_cospil.c:
* lib/msun/ld128/s_sinpil.c:
* lib/msun/ld128/s_tanpil.c:
  . Update Copyright years.
  . These routines use an FFLOOR128 macros, which likely should be
    replaced by a bit twiddling algorithm.
  . The same considerations above are applied to 0x1p112 <= |x| <
0x1p113,
    and |x| >= 0x1p113 cases.
  . Note, even and odd determination used fmodl(x,2.), which is likely
    slow.

PR:	272742
MFC after:	1 week
2023-08-03 07:27:58 +03:00
Steve Kargl c66a499e03 Cleanup debugging code in libm
David Das (das@) committed Bruce Evan's (bde's) WIP code for
expl() and logl() in git revision 25a4d6bfda.  That code
included instrumentation that allowed bde to generate pari
scripts used in testing/debugging.  This patch removes that
instrumentation as it is unlikely that others will ever use it.

* math/libm/msun/src/math_private.h:
  . Remove bde's macros for the generation of pari scripts.

* math/libm/msun/ld128/s_expl.c:
* math/libm/msun/ld128/s_logl.c:
* math/libm/msun/ld80/s_expl.c:
* math/libm/msun/ld80/s_logl.c:
  . Remove bde's DOPRINT_START macro.
  . Change RETURNP to RETURNF.
  . Change RETURN2P to RETURNF.  Adjust arguments as needed.
  . Change RETURNPI to RETURNI.
  . Change RETURN2PI to RETURNI.  Adjust arguments as needed.

PR:	272765
MFC after:	1 week
2023-08-03 07:27:58 +03:00
Warner Losh 4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Mark Murray 03a88e3de9 * lib/msun/Makefile b/lib/msun/Makefile:
. Disconnect imprecise.c from the build.  This file can be deleted.
  . Add b_tgammal.c to the build for ld80 and ld128 targets.  The ld128
    is a 'git mv' of imprecise.c to ld128/b_tgammal.c.

* lib/msun/ld80/b_expl.c:
  . New file.  Implement __exp__D for ld80 targets.  This is based on
    bsdsrc/b_exp.c.

* lib/msun/ld80/b_logl.c:
  . New file.  Implement __log__D for ld80 targets.  This is based on
    bsdsrc/b_log.c.

* lib/msun/ld80/b_tgammal.c b/lib/msun/ld80/b_tgammal.c
  . New file.  Implement tgammal(x) for ld80 targets.

Submitted by:           Steve Kargl
Differential Revision:  https://reviews.freebsd.org/D33444
Reviewed by:            pfg
2021-12-15 18:36:20 +00:00
Steve Kargl 046e2d5db1 Implementations of cexpl()
The change implements cexpl() for both ld80 and ld128 architectures.
Testing was done on x86_64 and aarch64 systems.

Along the way sincos[fl]() use an optimization that reduces the argument
to being done one rather than twice.  This optimization actually pointed
to a bug in the ld128 version of sincosl(), which is now fixed.  In
addition, the minmax polynomial coefficients for sincosl() have been
updated.

A concise log of the file-by-file changes follows.

* include/complex.h:
  . Add a prototype for cexpl().

* lib/msun/Makefile:
  . Add s_cexpl.c to the build.
  . Setup a link for cexpl.3 to cexp.3.

* lib/msun/Symbol.map:
  . Expose cexpl symbol in libm shared library.

* lib/msun/ld128/s_cexpl.c:
  * Implementation of cexpl() for 128-bit long double architectures.
    Tested on an aarch64 system.

* lib/msun/ld80/s_cexpl.c:
  * Implementation of cexpl() for Intel 80-bit long double.

* lib/msun/man/cexp.3:
  . Document cexpl().

* lib/msun/man/complex.3:
  . Add a BUGS section about cpow[fl].

* lib/msun/src/s_cexp.c:
  . Include float.h for weak references on 53-bit long double targets.
  . Use sincos() to reduce argument reduction cost.

* lib/msun/src/s_cexpf.c:
  . Use sincosf() to reduce argument reduction cost.

* lib/msun/src/k_sincosl.h:
  . Catch up with the new minmax polynomial coefficients for the kernel for
    the 128-bit cosl() implementation.
  . BUG FIX: *cs was used where *sn should have been.  This means that sinl()
    was no computed correctly when iy != 0.

* lib/msun/src/s_cosl.c:
  . Include fpmath.h to get access to IEEEl2bits.
  . Replace M_PI_4 with pio4,  a 64-bit or 113-bit approximation for pi / 4.

PR:	216862
MFC after:	1 week
2021-11-05 13:51:42 +02:00
Steve Kargl 6d04e1422e cosl(): fix polynomial approximation coefficients for ld128 version
As mention previously, the minmax polynomial approximation
in the kernel for cosl() seem to have a bad set of coefficients.

In testing, cosl() in the interval [0.785, pi/4] for 1 million
values and pi/4 written to 37 decimal digits.  The old version
on an aarch64 system gave

% tlibm/tlibm_lmath -l -x 0.78 -X
7.85398163397448309615660845819875721e-1L cos
Interval tested for cosl: [0.78,0.785398]
count: 1000000
  xm =  7.80213913234863919029058821396125599e-01L
  libm =  7.10763080972549562455058499280609083e-01L
  mpfr =  7.10763080972549562455058499280608983e-01L
  ULP = 1.04431

The max ULP exceeds 1, which is not good.  So, I rinsed off a 10
year code and recomputed coefficients.  The new minmax polynomial
now yields

% tlibm/tlibm_lmath -l -x 0.78 -X
7.85398163397448309615660845819875721e-1L cos
Interval tested for cosl: [0.78,0.785398]
count: 1000000
  xm =  7.82916198746768272588844890973704219e-01L
  libm =  7.08859615479571058183956453286628396e-01L
  mpfr =  7.08859615479571058183956453286628469e-01L
  ULP = 0.75407

which is very good.

PR:	218514
MFC after:	1 week
2021-11-02 10:54:10 +02:00
Steve Kargl 4f889260c3 sinpi[fl] etc: Fix the ld128 implementations
Mark Murray graciously provided access to an aarch64 system
to test the ld128 implementations.  This patch address
* Misuses of copysignl() in sinpil() and tanpil().
* Redo the splitting of argument 'x' into an integer part and
  remainder.  The remainder must satify 0 <= r < 1.
* Update the reduction of the integer part to something that can
  easily be seen as even or odd, e.g., sin(pi*x) = (-1)^n*sin(pi*r)
  with n <= 2^112 and we an reduce n by subtracting integer powers
  of 2.
* In s_cospil.c, fix typos where 'x' is used where 'ax', the
  remainder, is required.
* In tanpil(), fix the use of an uninitialized variable, ax = fabsl(ax),
  ax should be x in fabsl().

One item of note, in the limited tested on aarch64, the max ULP
for sinpil() and cospil() were less than 1.1 ULP, which is higher
that the desired max ULP less than 1.  This was traced to the
kernel for cosl() in the fundamental interval [0,pi/4].
The coefficients in the minmax polynomial likely need refinement.

PR:	218514
MFC after:	1 week
2021-11-01 04:38:19 +02:00
Konstantin Belousov 6312d14461 lib/msun/ld128/s_tanpil.c: make it compile.
Declare local, add missed ';'.
Name function properly.

Fixes:	dce5f3abed
Reviewed by:	kargl
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-27 01:34:12 +03:00
Steve Kargl dce5f3abed [LIBM] implementations of sinpi[fl], cospi[fl], and tanpi[fl]
Both IEEE-754 2008 and ISO/IEC TS 18661-4 define the half-cycle
trignometric functions cospi, sinpi, and tanpi.  The attached
patch implements cospi[fl], sinpi[fl], and tanpi[fl].  Limited
testing on the cospi and sinpi reveal a max ULP less than 0.89;
while tanpi is more problematic with a max ULP less than 2.01
in the interval [0,0.5].  The algorithms used in these functions
are documented in {ks}_cospi.c, {ks}_sinpi.c, and s_tanpi.c.

Note.  I no longer have access to a system with ld128 and
adequate support to compile and test the ld128 implementations
of these functions.  Given the almost complete lack of input from
others on improvements to libm, I doubt that anyone cares.  If
someone does care, the ld128 files contain a number of FIXME comments,
and in particular, while the polynomial coefficients are given
I did not update the polynomial algorithms to properly use the
coefficients.

PR:	218514
MFC after:	2 weeks
2021-10-26 02:50:20 +03:00
Alex Richardson 221622ec0c lib/msun: Avoid FE_INEXACT for x86 log2l/log10l
This fixes tests/lib/msun/logarithm_test after compiling the test with
-fno-builtin (D28577). Adding invln10_lo + invln10_10 results in
FE_INEXACT (for all inputs) and the same for the log2l invln2_lo + invln2_hi.
This patch avoids FE_INEXACT (for exact results such as 0) by defining a
constant and using that.

Reviewed By:	dim
Differential Revision: https://reviews.freebsd.org/D28786
2021-03-08 09:39:32 +00:00
Stefan Eßer a67cc94327 Apply fix for ld80 and ld128 submitted by Steve Kargl:
- Micro-optimization: use sincosl(x) instead of a call to cosl(x) and
  a call to sinl(x).  Argument reduction is done once not twice.

- Use a long double constant instead of an invalid double constant.

- Spell scale2 correctly

He could not test ld128, so that patch is untested.

Submitted by:	sgk at troutmask.apl.washington.edu (Steve Kargl)
2020-09-20 05:28:31 +00:00
Warner Losh a8197ad3aa Remove sparc64 specific parts of libm and fix comments
Once upon a time, sparc64 was the only ld128 architecture. However,
both aarch64 and riscv are now such architectures. Many of the
comments about how slow multiplication was on old sparc64 processors
are now no longer true. However, since no evaluation has been done for
aarch64 yet, it's unclear if they are still relevant or not. If not,
the code should be changed. If so, the comments should remove the
uncertainty.

Reviewed by: emaste@
Differential Revision: https://reviews.freebsd.org/D23658
2020-02-26 18:55:03 +00:00
Pedro F. Giffuni 50757b1452 msun: Fix some old typos.
Seen in a posting from July 27 by "CM Graff" in musl-libc.
2018-12-31 15:43:06 +00:00
Bruce Evans 27aa844253 Centralize the complications for special efficient rounding to integers.
This was open-coded in range reduction for trig and exp functions.  Now
there are 3 static inline functions rnint[fl]() that replace open-coded
expressions, and type-generic irint() and i64rint() macros that hide the
complications for efficiently using non-generic irint() and irintl()
functions and casts.

Special details:

ld128/e_rem_pio2l.h needs to use i64rint() since it needs a 46-bit integer
result.  Everything else only needs a (less than) 32-bit integer result so
uses irint().

Float and double cases now use float_t and double_t locally instead of
STRICT_ASSIGN() to avoid bugs in extra precision.

On amd64, inline asm is now only used for irint() on long doubles.  The SSE
asm for irint() on amd64 only existed because the ifdef tangles made the
correct method of simply casting to int for this case non-obvious.
2018-07-20 12:42:24 +00:00
Bruce Evans 6f1b8a0792 Add a macro nan_mix() and use it to get NaN results that are (bitwise)
independent of the precision in most cases.  This is mainly to simplify
checking for errors.  r176266 did this for e_pow[f].c using a less
refined expression that often didn't work.  r176276 fixes an error in
the log message for r176266.  The main refinement is to always expand
to long double precision.  See old log messages (especially these 2)
and the comment on the macro for more general details.

Specific details:
- using nan_mix() consistently for the new and old pow*() functions was
  the only thing needed to make my consistency test for powl() vs pow()
  pass on amd64.

- catrig[fl].c already had all the refinements, but open-coded.

- e_atan2[fl].c, e_fmod[fl].c and s_remquo[fl] only had primitive NaN
  mixing.

- e_hypot[fl].c already had a different refined version of r176266.  Refine
  this further.  nan_mix() is not directly usable here since we want to
  clear the sign bit.

- e_remainder[f].c already had an earlier version of r176266.

- s_ccosh[f].c,/s_csinh[f].c already had a version equivalent to r176266.
  Refine this further.  nan_mix() is not directly usable here since the
  expression has to handle some non-NaN cases.

- s_csqrt.[fl]: the mixing was special and mostly wrong.  Partially fix the
  special version.

- s_ctanh[f].c already had a version of r176266.
2018-07-17 07:42:14 +00:00
Matt Macy 6813d08ff5 msun: add ld80/ld128 powl, cpow, cpowf, cpowl from openbsd
This corresponds to the latest status (hasn't changed in 9+
years) from openbsd of ld80/ld128 powl, and source cpowf, cpow,
cpowl (the complex power functions for float complex, double
complex, and long double complex) which are required for C99
compliance and were missing from FreeBSD. Also required for
some numerical codes using complex numbered Hamiltonians.

Thanks to jhb for tracking down the issue with making
weak_reference compile on powerpc.

When asked to review, bde said "I don't like it" - but
provided no actionable feedback or superior implementations.

Discussed with: jhb
Submitted by: jmd
Differential Revision: https://reviews.freebsd.org/D15919
2018-07-15 00:23:10 +00:00
Pedro F. Giffuni 5e53a4f90f lib: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using mis-identified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-26 02:00:33 +00:00
Ed Schouten 2cec876a59 Rename cpack*() to CMPLX*().
The C11 standard introduced a set of macros (CMPLX, CMPLXF, CMPLXL) that
can be used to construct complex numbers from a pair of real and
imaginary numbers. Unfortunately, they require some compiler support,
which is why we only define them for Clang and GCC>=4.7.

The cpack() function in libm performs the same task as CMPLX(), but
cannot be used to generate compile-time constants. This means that all
invocations of cpack() can safely be replaced by C11's CMPLX(). To keep
the code building with GCC 4.2, provide copies of CMPLX() that can at
least be used to generate run-time complex numbers.

This makes it easier to build some of the functions outside of libm.
2014-12-16 09:21:56 +00:00
Steve Kargl a4e4b355f4 The value small=2**-(p+3), where p is the precision, can be determine from
lgamma(x) = -log(x) - log(1+x) + x*(1-g) + x**2*P(x) with g = 0.57...
being the Euler constant and P(x) a polynomial.  Substitution of small
into the RHS shows that the last 3 terms are negligible in comparison to
the leading term.  The choice of 3 may be conservative.

The value large=2**(p+3) is detemined from Stirling's approximation
lgamma(x) = x*(log(x)-1) - log(x)/2 + log(2*pi)/2 + P(1/x)/x
Again, substitution of large into the RHS reveals the last 3 terms
are negligible in comparison to the leading term.

Move the x=+-0 special case into the |x|<small block.

In the ld80 and ld128 implementaion, use fdlibm compatible comparisons
involving ix, lx, and llx.  This replaces several floating point
comparisons (some involving fabsl()) and also fixes the special cases
x=1 and x=2.

While here
  . Remove unnecessary parentheses.
  . Fix/improve comments due to the above changes.
  . Fix nearby whitespace.

* src/e_lgamma_r.c:
  . Sort declaration.
  . Remove unneeded explicit cast for type conversion.
  . Replace a double literal constant by an integer literal constant.

* src/e_lgammaf_r.c:
  . Sort declaration.

* ld128/e_lgammal_r.c:
  . Replace a long double literal constant by a double literal constant.

* ld80/e_lgammal_r.c:
  . Remove unused '#include float.h'
  . Replace a long double literal constant by a double literal constant.

Requested by:	bde
2014-10-09 22:39:52 +00:00
Steve Kargl f382031d34 For targets that have a signed zero, lgamma_r(-0, &signgamp) should
set signgamp = -1.

Submitted by:	enh at google dot com (e_lgamma[f]_r.c)
2014-09-17 19:01:22 +00:00
Steve Kargl f7efd14df1 * Makefile:
. Hook e_lgammal[_r].c to the build.
  . Create man page links for lgammal[-r].3.

* Symbol.map:
  . Sort lgammal to its rightful place.
  . Add FBSD_1.4 section for the new lgamal_r symbol.

* ld128/e_lgammal_r.c:
  . 128-bit implementataion of lgammal_r().

* ld80/e_lgammal_r.c:
  . Intel 80-bit format implementation of lgammal_r().

* src/e_lgamma.c:
  . Expose lgammal as a weak reference to lgamma for platforms
    where long double is mapped to double.

* src/e_lgamma_r.c:
  . Use integer literal constants instead of real literal constants.
    Let compiler(s) do the job of conversion to the appropriate type.
  . Expose lgammal_r as a weak reference to lgamma_r for platforms
    where long double is mapped to double.

* src/e_lgammaf_r.c:
  . Fixed the Cygnus Support conversion of e_lgamma_r.c to float.
    This includes the generation of new polynomial and rational
    approximations with fewer terms.  For each approximation, include
    a comment on an estimate of the accuracy over the relevant domain.
  . Use integer literal constants instead of real literal constants.
    Let compiler(s) do the job of conversion to the appropriate type.
    This allows the removal of several explicit casts of double values
    to float.

* src/e_lgammal.c:
  . Wrapper for lgammal() about lgammal_r().

* src/imprecise.c:
  . Remove the lgamma.

* src/math.h:
  . Add a prototype for lgammal_r().

* man/lgamma.3:
  . Document the new functions.

Reviewed by:	bde
2014-09-15 23:21:57 +00:00
Steve Kargl 3b5e0d0f96 * Makefile:
. Add s_erfl.c to building libm.
  . Add MLINKS for erfl.3 and erfcl.3.

* Symbol.map:
  . Move erfl and erfcl to their proper location.

* ld128/s_erfl.c:
  . Implementations of erfl and erfcl in the IEEE 754 128-bit format.

* ld80/s_erfl.c:
  . Implementations of erfl and erfcl in the Intel 80-bit format.

* man/erf.3:
  . Document the new functions.
  . While here, remove an incomplete sentence.

* src/imprecise.c:
  . Remove the stupidity of mapping erfl and erfcl to erf and erfc.

* src/math.h:
  . Move the declarations of erfl and erfcl to their proper place.

* src/s_erf.c:
  . For architectures where double and long double are the same
    floating point format, use weak references to map erfl to
    erf and ercl to erfc.

Reviewed by:	bde (many earlier versions)
2014-07-13 17:05:03 +00:00
Dimitry Andric 6202fb7bd3 In lib/msun/ld128/s_expl.c, remove '/*' within block comment, to avoid a
warning.
2014-02-21 21:54:36 +00:00
Steve Kargl 5f63fbd67f * ld80/k_expl.h:
* ld128/k_expl.h:
  . Split out a computational kernel,__k_expl(x, &hi, &lo, &k) from expl(x).
    x must be finite and not tiny or huge.  The kernel returns hi and lo
    values for extra precision and an exponent k for a 2**k scale factor.
  . Define additional kernels k_hexpl() and hexpl() that include a 1/2
    scaling and are used by the hyperbolic functions.

* ld80/s_expl.c:
* ld128/s_expl.c:
  . Use the __k_expl() kernel.

Obtained from:	bde
2013-12-30 00:51:25 +00:00
Steve Kargl 1a287d1ddf Change a comma to a semicolon.
Remove a blank line that crept into the declarations.

Fix a comment to show a sign on a NaN.
2013-06-03 20:09:22 +00:00
Steve Kargl 3ffff4bad5 ld80 and ld128 implementations of expm1l(). This code started life
as a fairly faithful implementation of the algorithm found in

PTP Tang, "Table-driven implementation of the Expm1 function
in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18,
211-222 (1992).

Over the last 18-24 months, the code has under gone significant
optimization and testing.

Reviewed by:	bde
Obtained from:	bde (most of the optimizations)
2013-06-03 19:51:32 +00:00
Steve Kargl 42e4111cab Fix two comments that got lost in the disentanglement of the larger diff. 2013-06-03 19:29:03 +00:00
Steve Kargl 8cc74771f2 ld80/s_expl.c:
* Use integral numerical constants, and let the compiler do the
  conversion to long double.

ld128/s_expl.c:

* Use integral numerical constants, and let the compiler do the
  conversion to long double.
* Use the ENTERI/RETURNI macros, which are no-ops on ld128.  This
  however makes the ld80 and ld128 identical.

Reviewed by:	bde (as part of larger diff)
2013-06-03 19:13:44 +00:00
Steve Kargl 35cbca6a7f Micro-optimization: move the unary mius operator to operate
on a literal constant.

Obtained from:	bde
2013-06-03 18:57:35 +00:00
Steve Kargl a3f70b4ed8 Add a comment to note that bde supplied most, if not all,
of the optimizations.
2013-06-03 18:53:40 +00:00
Steve Kargl 1783063f18 ld80/s_expl.c:
* In the special case x = -Inf or -NaN, use a micro-optimization
  to eliminate the need to access u.xbits.man.

* Fix an off-by-one for small arguments |x| < 0x1p-65.

ld128/s_expl.c:

* In the special case x = -Inf or -NaN, use a micro-optimization
  to eliminate the need to access u.xbits.manh and u.xbits.manl.

* Fix an off-by-one for small arguments |x| < 0x1p-114.

Obtained from:	bde
2013-06-03 18:51:34 +00:00
Steve Kargl 31407861b8 ld80/s_expl.c:
* Update the evaluation of the polynomial.  This allows the removal
  of the now unused variables t23 and t45.

ld128/s_expl.c:

* Update the evaluation of the polynomial and the intermediate
  result t.  This update allows several numerical constants to be
  written as double rather than long double constants.   Update
  the constants as appropriate.

Obtained from:	bde
2013-06-03 18:40:00 +00:00
Steve Kargl f3049ab5f3 Update a comment to reflect that we are using an endpoint of
an interval instead of a midpoint.
2013-06-03 18:14:18 +00:00
Steve Kargl 4aa8c9453f Introduce the macro LOG2_INTERVAL, which is log2(number of intervals).
Use the macroi as a micro-optimization to convert a subtraction and
division to a shift.

Obtained from:	bde
2013-06-03 17:51:08 +00:00
Steve Kargl 03e1315345 Whitespace. 2013-06-03 17:40:52 +00:00
Steve Kargl bb23de67bb * Rename the polynomial coefficients from P2, P3, ... to A2, A3, ....
The names now coincide with the name used in PTP Tang's paper.

* Rename the variable from s to tbl to better reflect that
  this is a table, and to be consistent with the naming scheme
  in s_exp2l.c

Reviewed by:	bde (as part of larger diff)
2013-06-03 17:36:26 +00:00
Steve Kargl a1d69112c1 ld80/s_expl.c:
* Update Copyright years to include 2013.

ld128/s_expl.c:

* Correct and update Copyright years.  This code originated from
  the ld80 version, so it should reflect the same time period.

Reviewed by:	bde (as part of larger diff)
2013-06-03 17:21:43 +00:00
David Schultz 25a4d6bfda Add logl, log2l, log10l, and log1pl.
Submitted by:	bde
2013-06-03 09:14:31 +00:00
David Schultz 7dbbb6dde3 Fix some regressions caused by the switch from gcc to clang. The fixes
are workarounds for various symptoms of the problem described in clang
bugs 3929, 8100, 8241, 10409, and 12958.

The regression tests did their job: they failed, someone brought it
up on the mailing lists, and then the issue got ignored for 6 months.
Oops. There may still be some regressions for functions we don't have
test coverage for yet.
2013-05-27 08:50:10 +00:00
Steve Kargl dba466c344 * ld80/s_expl.c:
. Fix the threshold for expl(x) where |x| is small.
  . Also update the previously incorrect comment to match the
    new threshold.

* ld128/s_expl.c:
  . Re-order logic in exceptional cases to match the logic used in
    other long double functions.
  . Fix the threshold for expl(x) where is |x| is small.
  . Also update the previously incorrect comment to match the
    new threshold.

Submitted by:	bde
Approved by:	das (mentor)
2012-09-23 18:32:03 +00:00
Steve Kargl 8f647ffd7f * ld80/s_expl.c:
. Guard a comment from reformatting by indent(1).
  . Re-order variables in declarations to alphabetical order.
  . Remove a banal comment.

* ld128/s_expl.c:
  . Add a comment to point to ld80/s_expl.c for implementation details.
  . Move the #define of INTERVAL to reduce the diff with ld80/s_expl.c.
  . twom10000 does not need to be volatile, so move its declaration.
  . Re-order variables in declarations to alphabetical order.
  . Add a comment that describes the argument reduction.
  . Remove the same banal comment found in ld80/s_expl.c.

Reviewed by:	bde
Approved by:	das (mentor)
2012-09-23 18:06:27 +00:00
Steve Kargl ca50c4b871 Whitespace.
Submitted by:	bde
Approved by:	das (pre-approved)
2012-07-30 21:55:49 +00:00
Steve Kargl 8345cbd275 Replace the macro name NUM with INTERVALS. This change provides
compatibility with the INTERVALS macro used in the soon-to-be-commmitted
expm1l() and someday-to-be-committed log*l() functions.

Add a comment into ld128/s_expl.c noting at gcc issue that was
deleted when rewriting ld80/e_expl.c as ld128/s_expl.c.

Requested by:	bde
Approved by:	das (mentor)
2012-07-26 04:05:08 +00:00
Steve Kargl f7cfe68f59 * ld80/expl.c:
. Remove a few #ifdefs that should have been removed in the initial
    commit.
  . Sort fpmath.h to its rightful place.

* ld128/s_expl.c:
  . Replace EXPMASK with its actual value.
  . Sort fpmath.h to its rightful place.

Requested by:	bde
Approved by:	das (mentor)
2012-07-26 03:59:33 +00:00
Steve Kargl b83ccea32c Compute the exponential of x for Intel 80-bit format and IEEE 128-bit
format.  These implementations are based on

PTP Tang, "Table-driven implementation of the exponential function
in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 15,
144-157 (1989).

PR: standards/152415
Submitted by: kargl
Reviewed by: bde, das
Approved by: das (mentor)
2012-07-23 19:13:55 +00:00