[maemo-developers] Optimized memory copying functions for Nokia 770 (final part)
From: Siarhei Siamashka (siarhei.siamashka at gmail.com)
Date: Tue Dec 5 09:25:24 EET 2006
- Previous message: [maemo-developers] Err http://repository.maemo.org scirocco/free Packages 404 Not Found
- Next message: [maemo-developers] maemo-stars: problems with package dependency
Hello All,

Here is an old link with some benchmarks and initial information:
http://maemo.org/pipermail/maemo-developers/2006-March/003269.html

For completeness, a memcpy equivalent is now also available, and the functions exist in two flavours (either gcc inline macros or plain assembly code). All the sources are here:
https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/fastmem-arm9/?root=mplayer

The easiest way to try this code is to link 'fastmem-arm9.S' with your program. It will override the glibc 'memcpy' and 'memset' functions with this optimized implementation. However, it will probably not affect code contained in other shared libraries; for example, SDL will most likely still use the glibc functions.

If you decide to try the gcc inline macros, be careful: they may not be safe because of compiler bugs. More details and testcases are here:
https://maemo.org/bugzilla/show_bug.cgi?id=733

Anyway, this code may be useful for games, emulators, or any software that needs to clear, initialize, or copy large memory blocks fast. Those who are interested may scavenge something useful there :)

At least, adding a variation of this code to the bitmap blitting/clearing functions of the allegro game programming library improved the framerate in ufo2000 quite a lot. Admittedly, the gain is that large because of a non-optimal full-screen update method, which is neither very fast nor battery friendly and should be changed to update only the parts of the screen that have changed. But sometimes you have to update the full screen anyway, for example when it is filled with a fire and smoke animation. So having fast bitmap blitting code, and being able to update the full screen without performance problems, is a good thing.

The technical explanation (at least my understanding of it) is the following. The Nokia 770 CPU has a small write-back cache, but it is not write-allocate.
That means if a memory block is already cached, a write to it is fast and the data is stored immediately in the cache. But if a memory block is not cached, it can get into the CPU data cache only after a read operation, not after a write (read-allocate cache behaviour). If the destination buffer is not in the cache, writes to it go directly to memory through the write buffer. Transfers to memory are performed in blocks of 4, 16, or 32 bytes, and these blocks should be aligned. See '5.7 TCM write buffer' and '6.2.2 Transfer size' in http://www.arm.com/pdfs/DDI0198D_926_TRM.pdf

So if you write to memory one byte at a time, memory bandwidth is wasted: you get only one byte written per memory bus transfer, while you could just as easily get 4 bytes written instead. Here is the worst possible memcpy implementation as an example; if you benchmark it, you will get some interesting numbers:

    #include <stdint.h>

    /* Copies one byte per iteration, so with an uncached destination
       every store becomes its own 1-byte memory bus transfer. */
    void memcpy_trivial(uint8_t *dst, const uint8_t *src, int count)
    {
        while (--count >= 0)
            *dst++ = *src++;
    }

But the best performance is achieved with 16-byte transfers aligned on a 16-byte boundary (otherwise they are just split into several 4-byte transfers). This can't be coded in C; it requires the assembly STM instruction with 4 registers as operands (or any number of registers that is a multiple of 4).
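To make the alignment point concrete, here is a minimal C sketch of a memset-style fill. The name fill_aligned16 is hypothetical and this is not code from fastmem-arm9. It writes single bytes until the destination reaches a 16-byte boundary, then fills 16 bytes per iteration with four aligned 32-bit stores, then finishes the tail byte by byte. It only illustrates the head/body/tail split: plain C cannot force the compiler to emit a single 4-register STM, which is why the real code is written in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration, not code from fastmem-arm9.
 * Fills 'count' bytes at 'dst' with byte value 'c', arranging for
 * most of the writes to be aligned 16-byte groups.
 * Assumes dst points into malloc'ed memory, so the uint32_t stores
 * below do not violate C's strict aliasing rules. */
void fill_aligned16(uint8_t *dst, uint8_t c, size_t count)
{
    /* Replicate the byte into a 32-bit word: 0xAB -> 0xABABABAB. */
    const uint32_t w = (uint32_t)c * 0x01010101u;
    uint32_t *p;

    /* Head: single-byte stores until dst reaches a 16-byte boundary. */
    while (count > 0 && ((uintptr_t)dst & 15u) != 0) {
        *dst++ = c;
        count--;
    }

    /* Body: 16 bytes per iteration as four aligned 32-bit stores.
     * gcc may or may not combine these into one STM; C gives no
     * guarantee, which is why the real code uses assembly. */
    p = (uint32_t *)dst;
    while (count >= 16) {
        p[0] = w;
        p[1] = w;
        p[2] = w;
        p[3] = w;
        p += 4;
        count -= 16;
    }

    /* Tail: the remaining 0..15 bytes, one at a time. */
    dst = (uint8_t *)p;
    while (count > 0) {
        *dst++ = c;
        count--;
    }
}
```

For example, if buf is 16-byte aligned, fill_aligned16(buf + 3, 0, 100) writes 13 single bytes to reach the boundary, 80 bytes as five 16-byte groups, and the remaining 7 bytes one at a time.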