Re: Bug report: double floating point arithmetic errors under Linux

From: MARRE Bruno <Bruno.Marre_at_cea.fr>
Date: Tue 22 Oct 2002 02:48:37 PM GMT
Message-ID: <3DB56545.5090700@cea.fr>

Hi Warwick,

Thank you for replying so fast.

> First, may I ask why you are surprised that different processors do floating
> point numbers slightly differently, particularly when pushing the limits of
> precision?
> 

I thought that Eclipse floats were implemented with the GMP library 
(which is used inside Eclipse, as stated in the banner), in conformance 
with the IEEE standard, in a processor/system independent way 
(portability across Eclipse platforms?).

> Solaris SPARC:
> 
>>[eclipse 1]: N is breal(0_1).
>>
>>N = -4.94065645841247e-324__4.94065645841247e-324
>>Yes (0.00s cpu)
>>
> 
> Linux P4:
> 
>>[eclipse 1]: N is breal(0_1).
>>
>>N = -2.2250738585072014e-308__2.2250738585072014e-308
>>Yes (0.00s cpu)
>>
> 
> This difference is a result of Solaris defining MINDOUBLE to be the smallest
> denormalised double, while Linux defines it to be the smallest normalised
> double, and only crops up in the special case where a zero bound is being
> "widened" for numerical safety.  We will look into whether this should be
> made consistent.

I think that since breal is a specific Eclipse type, this point could be 
made consistent across different versions of Eclipse. But since breals 
are defined through floating point interval arithmetic, does this mean 
that portability cannot be achieved as long as the underlying floating 
point numbers are processor/system dependent?
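
For reference, the two widened bounds quoted above match the two smallest 
positive IEEE doubles of each kind. A minimal sketch in Python (used here 
only for illustration, it is not Eclipse code, and it assumes Python 3.9 or 
later for `math.ulp`) shows both values:

```python
import math
import sys

# Smallest positive *normalised* double: the bound printed on Linux above.
smallest_normal = sys.float_info.min

# Smallest positive *denormalised* double: the bound printed on Solaris above.
# For an argument of 0.0, math.ulp returns exactly this value.
smallest_denormal = math.ulp(0.0)

print(smallest_normal)    # 2.2250738585072014e-308
print(smallest_denormal)  # 5e-324

# Widening a zero bound by one value or the other gives two different,
# but equally safe, enclosures of zero.
widened_with_normal = (-smallest_normal, smallest_normal)
widened_with_denormal = (-smallest_denormal, smallest_denormal)
```

Python prints the shortest string that round-trips, so `5e-324` is the same 
double as the `4.94065645841247e-324` shown in the Solaris output.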

> Solaris SPARC:
>>
>>[eclipse 2]: N is breal(rational(16.0)),  %% Interval containing 16.0
>>	breal_max(N,SN),                  %% Successor float of 16.0
>>	Delta is (SN - 16.0)/2.0,         %% Right error of 16.0
>>	Sixteen is 16.0 + Delta,          %% Are we still on 16.0 ?
>>	BrDelta is breal(rational(Delta)),%% Interval enclosing Delta
>>	breal_max(BrDelta,SDelta),        %% Successor of Delta
>>	NotSixteen is 16.0 + SDelta.
>> ...
> Now, 16.0 + SDelta is 10.00000000000080000000000008.  On the SPARC, this is
> rounded up, to 10.000000000001, since that is the closer representable
> double.  On the i386, the result is first rounded to the internal register's
> precision, resulting in 10.0000000000008000 (the closest internally
> representable number).  When it comes time to write this out to memory, a
> further rounding must occur, except that the rounded result is, like the
> previous case, half-way between two representable doubles, so it rounds to
> the "even" one again: 10.000000000000.  Hence the difference in the
> results.
> 
> If you think this is wrong, complain to Intel.  :)  

I am not a floating point expert, but this seems to be (almost) wrong 
with respect to the IEEE standard?
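
The effect described in the quoted explanation is usually called double 
rounding. The following Python sketch does not touch the FPU at all. It 
models the two behaviours with exact rational arithmetic, assuming a 53-bit 
significand for doubles and a 64-bit significand for the extended internal 
format. The helper name `round_to_precision` is made up for this 
illustration, and `math.nextafter` requires Python 3.9 or later.

```python
import math
from fractions import Fraction


def round_to_precision(x: Fraction, bits: int) -> Fraction:
    """Round a positive rational to `bits` significant binary digits,
    rounding to nearest with ties going to the even significand."""
    # Find e such that 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    # Spacing between neighbouring representable values near x.
    quantum = Fraction(2) ** (e - bits + 1)
    # round() on a Fraction rounds halves to the even integer.
    return round(x / quantum) * quantum


delta = 2.0 ** -49                         # half the gap above 16.0
sdelta = math.nextafter(delta, math.inf)   # successor of delta

# The mathematically exact sum 16.0 + SDelta.
exact = Fraction(16.0) + Fraction(sdelta)

# One rounding, straight to double precision (the SPARC behaviour).
direct = round_to_precision(exact, 53)

# Two roundings: first to the 64-bit significand, then to double
# (the default i386 behaviour described above).
twice = round_to_precision(round_to_precision(exact, 64), 53)

print(direct == Fraction(math.nextafter(16.0, math.inf)))  # True
print(twice == Fraction(16.0))                             # True
```

In this model the first rounding discards the tiny excess over the half-way 
point, so the second rounding sees an exact tie and goes to the even 
neighbour, 16.0.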

> ...
> One can put the i386 into a 64-bit internal double mode, which would
> presumably give the same result as for the SPARC, but this is a trade-off:
> the default 80-bit mode can be expected to have lower average error over a
> sequence of operations, but the 64-bit mode has (apparently)
> ever-so-slightly better worst-case error.
> 

I would really like to have the same accuracy across Eclipse platforms 
(at least for SPARC SunOS and PC Linux).
How can I do this from Eclipse? How can I put the i386 into a 64-bit 
internal mode?

> Is this level of accuracy important to you?  If so, why?  What is it you are
> trying to do?

I am trying to write my own floating point resolution procedure, which 
will be used for test generation purposes (thus I need to strictly follow 
IEEE for basic operations, and I need portability across SPARC and Intel).

> I'm also not sure why you're converting these things to rationals and back.
> What is it you are trying to achieve by doing this?

It was just an easy (bad :-( ) way to get the floats immediately before 
and after a given float, in order to compute the left/right errors of 
this float value (for floating point interval arithmetic). I will write 
a cleaner definition of these functions later.
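
As an illustration of what such a cleaner definition could look like, here 
is a sketch in Python rather than Eclipse. It relies on `math.nextafter` 
(Python 3.9 or later) instead of a detour through rationals. The function 
names `neighbours` and `half_gaps` are invented for this example.

```python
import math


def neighbours(x: float) -> tuple[float, float]:
    """Return the doubles immediately below and immediately above x."""
    return math.nextafter(x, -math.inf), math.nextafter(x, math.inf)


def half_gaps(x: float) -> tuple[float, float]:
    """Return half the distance from x to its lower and upper neighbour,
    i.e. the left and right errors of x."""
    below, above = neighbours(x)
    return (x - below) / 2.0, (above - x) / 2.0


below, above = neighbours(16.0)
left_error, right_error = half_gaps(16.0)

print(left_error == 2.0 ** -50)    # True
print(right_error == 2.0 ** -49)   # True
print(16.0 + right_error == 16.0)  # True: an exact tie rounds to even
```

Because 16.0 is a power of two, the gap below it is half the gap above it, 
so the left and right errors differ.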

> P.S.  You're lucky I've just been doing some work with floats and rounding
> modes and so on, so have learnt about all this stuff --- a month ago I
> probably wouldn't have been able to help you!

Again, thank you for the detailed answer. It helps me a lot in 
discovering the hidden things about floats.

I would like to know for what purpose you use GMP. Only for non-basic 
float operations?
Would it be reasonable to use it (inside Eclipse) in order to provide a 
portable IEEE implementation of floats?
If it is not reasonable, is there a way to call GMP operations (basic 
and others) and to set/get GMP accuracy parameters and rounding modes 
from Eclipse?
This would be really convenient for me.

Cheers,

	Bruno