Performance of floating point instructions
Re: Performance of floating point instructions

Sivan Greenberg
Hi Alberto!
On Wed, Mar 10, 2010 at 9:55 AM, Alberto Mardegan <
mardy@users.sourceforge.net> wrote:
> Hi all,
> in maemo-mapper I have a lot of code involved in doing transformations
> from latitude/longitude to Mercator coordinates (used in google maps, for
> example), calculation of distances, etc.
>
> I'm trying to use integer arithmetic as much as possible, but sometimes
> it's a bit impractical, and I wonder if it's really worth the trouble.
>
> Does anyone have any figures on how the performance of the FPU compares
> to integer operations?
>
> A practical question: should I use this way of computing the square root:
>
>
> http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Binary_numeral_system_.28base_2.29
>
> (but operating on 32 or even 64 bits), or would I be better off using sqrtf()
> or sqrt()?
>
>
> Does anyone know any tricks to optimize certain operations on arrays of
> data?
>
Basically, what we did with ThinX OS is have a full-blown soft-float
toolchain, which then used GCC's already proven and highly optimized
stack floating point operations. However, Maemo is not soft-float, so I'd
recommend experimenting with rebuilding Mapper using such a soft-float
enabled toolchain, statically linked to avoid glitches with the system's
libc, or with a separate LD_LIBRARY_PATH to avoid memory hogging, and
seeing where it gets you.
IMHO this is the best way to do FP optimization. We have experimented with
it a lot, including sqrtf and friends, to no significant improvement.
Sivan
Re: Performance of floating point instructions

Ove Kaaven
Alberto Mardegan wrote:
> Does anyone know any tricks to optimize certain operations on arrays of
> data?
The answer to that is, obviously, to use the Cortex-A-series SIMD
engine, NEON.
Supposedly you may be able to make gcc generate NEON instructions with
-mfpu=neon -ffast-math -ftree-vectorize (and perhaps -mfloat-abi=softfp,
but that's the default in the Fremantle SDK anyway), but it's still not
very good at it, so writing the asm by hand is still better... and I'm
not sure if it can automatically vectorize library calls like sqrt.
RE: Performance of floating point instructions
2010-03-10 09:50 UTC
> > in maemo-mapper I have a lot of code involved in doing
> > transformations from latitude/longitude to Mercator
> > coordinates (used in google maps, for example), calculation
> > of distances, etc.
> >
> > I'm trying to use integer arithmetic as much as
> > possible, but sometimes it's a bit impractical, and I wonder
> > if it's really worth the trouble.
Is the code slow at the moment and is it specifically the fp stuff that's
slowing it down? If not, I'd say it's probably not worth the effort unless
you're doing this for fun/out of interest.
> > Does anyone have any figures on how the performance of
> > the FPU compares to integer operations?
> >
> > A practical question: should I use this way of
> > computing the square root:
> >
> > http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Binary_numeral_system_.28base_2.29
> >
> > (but operating on 32 or even 64 bits), or would I be
> > better off using sqrtf() or sqrt()?
I'd suggest writing some benchmark code for the functions you wish to
compare.
> > Does anyone know any tricks to optimize certain
> > operations on arrays of data?
There are SIMD extensions
(http://www.arm.com/products/processors/technologies/dsp-simd.php).
> Basically, what we did with ThinX OS is have a full-blown
> soft-float toolchain, which then used GCC's already proven and
> highly optimized stack floating point operations.
> However, Maemo is not soft-float, so I'd recommend
> experimenting with rebuilding Mapper using such a soft-float
> enabled toolchain, statically linked to avoid glitches with the
> system's libc, or with a separate LD_LIBRARY_PATH to avoid
> memory hogging, and see where it gets you.
Soft-float is significantly slower than using VFP hard-float (using the
mfpu etc. flags on GCC, on the N900 and the N8x0 for that matter); there
should be emails containing benchmarks on the list from a long while back,
otherwise I can dig them out again. But Alberto's situation is slightly
different: his integer-only code need not deal with arbitrary fp numbers
(as the soft-float code must), since he knows what his inputs' ranges
will be. He should therefore be able to write more efficient, specialised
fixed-point integer functions that avoid conversion to and from fp form
and that trim significant figures to the minimum he requires.
Cheers,
Simon
Re: Performance of floating point instructions

Laurent Desnogues
On Wed, Mar 10, 2010 at 10:46 AM, Ove Kaaven <ovek@arcticnet.no> wrote:
> Alberto Mardegan wrote:
>> Does anyone know any tricks to optimize certain operations on arrays of
>> data?
>
> The answer to that is, obviously, to use the Cortex-A-series SIMD
> engine, NEON.
>
> Supposedly you may be able to make gcc generate NEON instructions with
> -mfpu=neon -ffast-math -ftree-vectorize (and perhaps -mfloat-abi=softfp,
> but that's the default in the Fremantle SDK anyway), but it's still not
> very good at it, so writing the asm by hand is still better... and I'm
> not sure if it can automatically vectorize library calls like sqrt.
One has to be careful with that approach: a Cortex-A9 SoC won't
necessarily come with a NEON SIMD unit, as it's optional. So it'd
be better to also include code that doesn't assume a NEON unit
is present.
Laurent
Re: Performance of floating point instructions
2010-03-10 10:25 UTC
On Wednesday, 10 March 2010 at 11:14:14, Laurent Desnogues wrote:
> One has to be careful with that approach: Cortex-A9 SoC won't
> necessarily come with a NEON SIMD unit, as it's optional. So it'd
> be better to also include code that doesn't assume one has a
> NEON unit.
Or if someone tries to run a new version of maemo-mapper on an N8x0, for example.
Regards,
--
JID: hrw@jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz
Re: Performance of floating point instructions
2010-03-10 11:39 UTC
On Wed, 2010-03-10 at 10:46 +0100, ext Ove Kaaven wrote:
> Alberto Mardegan wrote:
> > Does anyone know any tricks to optimize certain operations on arrays of
> > data?
>
> The answer to that is, obviously, to use the Cortex-A-series SIMD
> engine, NEON.
>
> Supposedly you may be able to make gcc generate NEON instructions with
> -mfpu=neon -ffast-math -ftree-vectorize (and perhaps -mfloat-abi=softfp,
> but that's the default in the Fremantle SDK anyway), but it's still not
> very good at it, so writing the asm by hand is still better... and I'm
> not sure if it can automatically vectorize library calls like sqrt.
You can also put the CPU into a "fast floats" mode, see hd_fpu_set_mode()
in
http://maemo.gitorious.org/fremantle-hildon-desktop/hildon-desktop/blobs/master/src/main.c
The N900 also has support for NEON instructions.
-Kimmo
Re: Performance of floating point instructions
2010-03-10 11:57 UTC
Kimmo Hämäläinen wrote:
> You can also put the CPU into a "fast floats" mode, see hd_fpu_set_mode()
> in
> http://maemo.gitorious.org/fremantle-hildon-desktop/hildon-desktop/blobs/master/src/main.c
>
> N900 has support for NEON instructions also.
This sounds interesting!
Is there any performance penalty if this switch is done often?
Ciao,
Alberto
--
http://www.mardy.it <-- geek in un lingua international!
Re: Performance of floating point instructions
2010-03-10 12:53 UTC
On Wed, 2010-03-10 at 12:57 +0100, ext Alberto Mardegan wrote:
> Kimmo Hämäläinen wrote:
> > You can also put the CPU into a "fast floats" mode, see hd_fpu_set_mode()
> > in
> > http://maemo.gitorious.org/fremantle-hildon-desktop/hildon-desktop/blobs/master/src/main.c
> >
> > N900 has support for NEON instructions also.
>
> This sounds interesting!
>
> Is there any performance penalty if this switch is done often?
IIRC, there was not. Leonid Moiseichuk was testing this about a year
ago, and he noticed almost 50% speed-up for floats. Notice that this
affects only floats, not doubles, and that there is a small accuracy
penalty.
-Kimmo
Re: Performance of floating point instructions
2010-03-10 16:20 UTC
Hi,
ext Alberto Mardegan wrote:
> Kimmo Hämäläinen wrote:
>> You can also put the CPU into a "fast floats" mode, see hd_fpu_set_mode()
>> in
>> http://maemo.gitorious.org/fremantle-hildon-desktop/hildon-desktop/blobs/master/src/main.c
>>
>> N900 has support for NEON instructions also.
>
> This sounds interesting!
>
> Is there any performance penalty if this switch is done often?
Why would you switch it off?
Operations on "fast floats" aren't IEEE-compatible, but as far as
I've understood, they should differ only for numbers that are very close
to zero: close enough that repeating your algorithm a few more times
would produce a divide by zero even with IEEE semantics (i.e. if "fast
float" causes you issues, it most likely indicates an issue in your
algorithm).
- Eero
in maemo-mapper I have a lot of code involved in doing transformations from
latitude/longitude to Mercator coordinates (used in google maps, for example),
calculation of distances, etc.
I'm trying to use integer arithmetic as much as possible, but sometimes it's a
bit impractical, and I wonder if it's really worth the trouble.
Does anyone have any figures on how the performance of the FPU compares to
integer operations?
A practical question: should I use this way of computing the square root:
http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Binary_numeral_system_.28base_2.29
(but operating on 32 or even 64 bits), or would I be better off using sqrtf() or sqrt()?
Does anyone know any tricks to optimize certain operations on arrays of data?
Ciao,
Alberto
--
http://www.mardy.it <-- geek in un lingua international!