
  1. OS2007 Hacker Edition Archives
    1. Reproducing the effort with N800 kernel and initfs
    2. What is done?
    3. Initfs
    4. rootfs
    5. What works
    6. What doesn't work
    7. What needs to be done
    8. New what needs to be done (so, what's left)
  2. LOG (describes the process done, not in particular order)
    1. Getting N800 image to run on 770
    2. First feelings
    3. Major issues
    4. New kernel
    5. Arnaud Patard's kernel patches

OS2007 Hacker Edition Archives

This page contains early information about the OS2007 Hacker Edition project. Up-to-date information can be found on the Os2007On770 page. The contents of this page are likely outdated and do not apply to the current Hacker Edition release. The purpose of this page is to document the difficulties encountered while hacking the kernel and initfs.

Reproducing the effort with N800 kernel and initfs

Initfs wasn't that bad after all. No source code modifications were made to any component (so, no patches for initfs); just recompiling a couple of packages was enough. The Initfs section below still applies. Third-party developers cannot use the package mentioned there, but the jffs2 filesystem image can be extracted with the flasher tool. An initfs created this way spammed some error messages to the serial console while booting.

Those interested in the old kernel effort can check the patch, but it is unlikely to be very usable.


What is done?

This section describes what we have made and how. Check the LOG section for more details (if any).

Basically, we first flashed a normal 770 device with the latest official image (39-14) and then flashed modified versions of the kernel, initfs and rootfs.

Kernel

This is the most important part. We used the N800 kernel source from the apt repository (kernel-source-rx-34-2.6.18) as a base. Important things:

  • The default configuration provided for 770 is the original one, not the one from OS 2006. Thus it's not a good starting point.
  • It's not possible to compile the kernel for 770 without modifications. This is likely because those parts come from the original 770 kernel and much has changed since then. One needs to modify many parts (mostly by copying code from omap head or the OS2006 tree) in order to make the kernel compile.
    • In addition to fixing the compilation problems, more changes are needed to get the drivers to work. Some board-specific files are missing platform data, which prevents the LCD and touchscreen drivers from working.

Our kernel configuration can be found here. It's not necessarily the "right" one, but it seems to do its job.

The following parts need to be modified:

  • lcd driver
  • nand driver
  • omap1 specific board files (platform data missing for lcd/touchscreen).

These modifications should be easy for somebody who knows these drivers.

Initfs

We used rx-34-initfs from the apt repository as a base. The tgz was extracted, modified and then packaged into a jffs2 image. The following changes are needed for initfs:

  • uClibc needs to be recompiled without VFP usage. Otherwise the system hangs at boot. The recompiled versions are simply copied on top of the old ones.
  • Kernel modules need to be copied from our new kernel installation. Just copy them on top of the old ones.
  • The WLAN driver needs to be recompiled/reconfigured. I checked out cx3110x from a Nokia internal git repo. It didn't compile/link out of the box; it had the following problems:
    • Missing includes when compiling for 770.
    • Linking didn't recognize the correct EABI version, so I needed to edit Makefile.k26 by hand.
  • The firmware for the BT chip needs to be copied on top of the N800 one. This file is not compiled, it's just a binary blob, so one can copy the 770 version on top of the N800 version.
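
The "copy on top of the old ones" steps can be sketched as a small shell helper. This is only an illustration: the function name overlay_tree and both directory arguments are made up for this sketch, and it assumes the recompiled files have been collected into a tree that mirrors the initfs layout.

```shell
#!/bin/sh
# Hypothetical helper: copy every file from an overlay tree on top of an
# extracted initfs tree, replacing files that already exist there.
# Both directory names are placeholders chosen by the caller.
overlay_tree() {
    overlay="$1"
    target="$2"
    [ -d "$overlay" ] && [ -d "$target" ] || return 1
    # "dir/." copies the contents of dir (including dotfiles).
    # -R recurses, -P copies symlinks as symlinks, -p keeps modes and times.
    cp -RPp "$overlay/." "$target/"
}
```

For example, overlay_tree ./uclibc-novfp ./initfs would replace the uClibc libraries in the extracted tree while leaving everything else untouched (both directory names are hypothetical).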

After these modifications initfs should mostly work, but some errors are still spammed by cal-tool. They do not seem to be showstoppers, though...

rootfs

We used the root filesystem tgz image 1.2006.47-20 as a base and modified the files that needed it. Since the lower levels took so much time, we have not yet done any actual recompilation for the rootfs, but we have disabled the functionality that has caused problems. The following changes have been made:

  • Made a symlink to /usr/lib/hotplug/firmware that points to 770 BT firmware that we put onto initfs.

Disable the following services (by returning immediately from their startup script):

  • dsp-init (we haven't yet tried to get the DSP to work)
  • esd (depends on dsp)
  • osso-hss
  • osso-ias
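
Returning immediately from a startup script can be done by hand in an editor; the helper below is a minimal sketch of the same idea. The name disable_script is made up here, and the sketch assumes the script's first line is its "#!" interpreter line.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Maemo package): make a startup
# script return immediately by inserting "exit 0" right after its first
# line, which is assumed to be the "#!" interpreter line.
disable_script() {
    script="$1"
    tmp="$script.disable.$$"
    {
        head -n 1 "$script"
        echo "exit 0  # disabled: causes problems on 770"
        tail -n +2 "$script"
    } > "$tmp" || return 1
    # Write back through the original file so its mode and owner are kept.
    cat "$tmp" > "$script" && rm -f "$tmp"
}
```

The rest of the script is left in place below the inserted line, so re-enabling a service later only means deleting that one line.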

ke-recv needs modifications to its startup script, since the sysfs paths are different for 770 and N800. The bad thing is that the contents of the sysfs files are different as well, and this cannot be configured from the startup script. So, ke-recv needs to be modified and recompiled.

What works

Many things, you can (for example):

  • Boot the system and desktop comes up.
  • Use most applications (notes, paint, images...)
  • Pair the device with a phone and connect to the internet
  • Connect to WLAN and access the internet :) Earlier problems were most likely caused by bad signal strength or config...
  • Browser (with OS2006 engine): Maemo.org loaded successfully :)

What doesn't work

  • Work on ke-recv is not yet finished. MMC-related functionality does not fully work (using both the mmc cover and the usb cable makes ke-recv lose the current state: it believes that no mmc is inserted any more...). However, this is something that one can certainly fix...
  • Browser crashes at startup, it just flashes briefly. This was first believed to be a connectivity issue, but since connectivity mostly works, this is something else. Perhaps another floating point problem.

Actually it looks like the browser doesn't crash, it exits by itself?! I remember seeing a similar effect in low-on-memory situations... I closed some services, which increased the free memory from 4MB to 12MB, but it didn't change anything... Perhaps this was not the cause after all.

Opera libraries create several background threads. One of them executes an illegal instruction (SIGILL), which causes the process to terminate. The hard part was that the return code is just zero, with no sign of the signal that killed the background thread. The illegal instruction itself is likely to be some N800-specific (floating point?) instruction...

This case is now closed: you can run the browser and access the internet (via either WLAN or BT).

  • Mediaplayer doesn't work, since there is no dsp
  • System sounds do not work
  • Sometimes (usually after you close a program or some system dialog comes up) the system freezes for about half a minute. Do not know why.
  • While going through the startup wizards, dsme (or whoever is handling /dev/watchdog) gets killed for some reason. This causes the omap watchdog to reboot the device after some time. This seems to be somehow related to the config partition contents, since malloc returns NULL (likely because of len 0 from config). Flashing with the OS2006 image, setting the time etc. and reflashing seems to make the problem go away. Still no clue how this is related to the dsme exit/crash.

What needs to be done

  • Investigate the DSP stuff. I briefly tested with the old DSP modules without success, but that was with the old kernel and initfs and the new rootfs... This can trigger many new tasks, since some programs may depend on specific DSP services (and these were different on the two devices).

I needed to apply two patches in order to get it to do anything. Now I'm running out of DSP TLB entries... This seems to be because mapping the framebuffer consumes almost all TLB entries. I had to disable framebuffer mapping by removing all relevant lines from dsp_dld_avs.conf and avs_kernelcfg.cmd.

The next problem was that loading the DSP kernel/modules failed, because something tried to access a location outside the DSP memory device region (the limit was 0x28000 and the attempted address was 0x100000). This was caused by a difference in how 2.6.16 and 2.6.18 handle memory enabling: 2.6.18 returned -EIO, whereas 2.6.16 touched some mutex and returned success.

Now the DSP kernel loads OK, but doesn't start. The Linux kernel doesn't receive a reply (interrupt) from the DSP within some predefined timeout. I'm wondering whether both kernel versions use the same mailbox protocol, or whether its implementation is just buggy...

  • Try to recompile the browser with the old libopera, if possible (we are not allowed to distribute the new library with the old device anyway...). This is now done (at least most of the functionality works). Some smaller issues may have slipped through unnoticed... The following steps were needed:

    • Install the old version of libopera (osso-browser-opera package).
    • Install the matching version of the opera adapter (we cannot recompile libopera, so we need to take the same version that was used when it was compiled).
    • Recompile opera-eal against libopera and opera-adapter. Minor code changes were needed to get that to work.
    • Copy the new libraries (libopera, libopera-adapter, libopera-eal) to the root image.
  • The occasional freeze that usually takes place when you close a program is caused by an attempt to reach the esd daemon: the process that is about to freeze opens /tmp/.esd/socket and tries to authenticate with the server. Since there is nobody on the other end, the process freezes until some timeout expires.

New what needs to be done (so, what's left)

  • DSP still not working...
  • esd not working: Either fix the DSP or else make a dummy esd (otherwise we experience ugly freezes).
  • If we get dsp to work, check how multimedia works now.
  • Recompile browser to use launcher again, remove tracing data.
  • Remove tracing data from kernel
  • Remove those sw packages that we are not allowed to ship for the old hw.
  • Prepare "final" kernel, initfs and rootfs images

LOG (describes the process done, not in particular order)

Getting N800 image to run on 770

First I tried to create a minimal image for 770 and then add packages on top of that. This, however, had the following problems:

  • There are many packages
  • Building the image inside scratchbox causes all kinds of fancy permission problems etc.

So, I decided to use ready made images as starting point:

  • Latest OS2006 image
  • Sales version of N800 image

I first flashed the OS2006 image onto the target. Then I used the root filesystem from the N800 image. So, the system currently has the kernel and initfs from OS 2006 and the rootfs from N800. This setup didn't (of course) boot, so I started removing stuff from the rootfs until the system booted.

The N800 based rootfs was built as follows:

  • Extracted rootfs.tbz into a local directory
  • Placed extra exit calls into several startup scripts in order to disable functionality that caused problems
  • Used mkfs.jffs2 to build a new root filesystem, which was written to target
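
The steps above can be sketched as a shell function. This is a sketch under assumptions rather than the exact commands used: the file names are placeholders, the 128KiB erase block size is a guess that must be checked against the target flash, and extracting as root (or under fakeroot) is assumed so that ownership and device nodes survive. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi
}

# build_rootfs ARCHIVE WORKDIR IMAGE [ERASEBLOCK]
build_rootfs() {
    archive="$1"            # e.g. the rootfs.tbz taken from the N800 image
    workdir="$2"            # scratch directory for the unpacked tree
    image="$3"              # jffs2 image to create
    erase="${4:-128KiB}"    # ASSUMPTION: erase block size of the target flash

    run mkdir -p "$workdir" || return 1
    run tar -xjf "$archive" -C "$workdir" || return 1
    # Startup scripts under $workdir would be edited at this point.
    # -l: little endian, -n: no cleanmarkers (usual for NAND flash).
    run mkfs.jffs2 -r "$workdir" -o "$image" -e "$erase" -l -n
}
```

Running DRY_RUN=1 build_rootfs rootfs.tbz work rootfs.jffs2 first is a cheap way to review the exact mkfs.jffs2 invocation before anything is written.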

Modified scripts (note: a script was disabled if hangs/reboots appeared near its startup, so it may be that the script/service was not actually related to the problem):

  • (alarmd) => Enabled this again, there were no problems with this after all...
  • bluez-utils
  • btcond
  • esd => Enabled again, similar case to osso-hss (restarts constantly, with return code 2).
  • ke-recv (steals all CPU) => Copied old ke-recv from OS2006: Not API compatible with new one, but didn't notice too severe problems.
  • mediaplayer-daemon
  • metalayer-crawler
  • obexsrv
  • osso-hss => Enabled again, restarts constantly (with exit code 1) every 15 seconds or so... Looks like it fails to init dsp: it tries 10 times and sleeps 2 seconds between attempts.
  • (osso-ias) => Enabled this again, restarts every now and then (at least after esd was enabled again)...
  • osso-connectivity-ui-gwwizard-startup.sh
  • osso-connectivity-ui.sh => Re-enabled this, starts up and scans wlans nicely, connecting fails, though...
  • osso-media-server.sh
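
The retry behaviour observed for osso-hss (10 attempts, 2 seconds apart, then exit with an error so that the service gets restarted) follows a common pattern, sketched here in shell. This only illustrates the behaviour seen from the outside; it is not osso-hss code, and the retry function is hypothetical.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds, at most
# ATTEMPTS times, sleeping DELAY seconds between failed attempts.
retry() {
    attempts="$1"; delay="$2"; shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        # No need to sleep after the final failed attempt.
        [ "$n" -lt "$attempts" ] && sleep "$delay"
        n=$((n + 1))
    done
    return 1    # the caller sees a failure, like the exit code 1 noted above
}
```

With 10 attempts and a 2 second delay, a service that can never initialise the DSP fails after roughly 18 seconds plus the time spent in each attempt, which would fit the "every 15 seconds or so" restart interval noted above only loosely.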

After modifying the rootfs, the system boots up to the desktop and one can log in via the serial console.

First feelings

  • Notes works otherwise, but since the keyboard is not running, you cannot type anything => Something killed the keyboard during boot; after restarting it manually it works fine...
  • File manager, bookmark manager: Seem to work, didn't test memory card/gateway related functionality yet
  • Imageviewer: Works, but the rocker arrows do not do anything (this is not an imageviewer issue anyway).
  • Media player: No luck so far, takes ages to start, doesn't do anything useful. Occasionally crashes
  • Help: Works fine
  • Email client: At least starts up when trying to send an image via mail.
  • Screen calibration works
  • changing themes/background work
  • clock works, but alarms do not (since I disabled the daemon)
  • Calculator works
  • Screen brightness control works
  • Sketch works
  • Pdf reader works
  • Chess works
  • Marbles segfaults
  • Connectivity UI was disabled, so browser etc. don't work
  • Powering the system down from power-button works.
  • Screen/button locking/unlocking works
  • Locking/unlocking with lock code works.
  • Setting alarms works, alarms take place as expected.

Major issues

  • DSP doesn't work => No sound
  • ke-recv: The only kernel dependent package within af. We can:
    • Use the old kernel with the old version of ke-recv => Not fully API compatible (don't know what breaks), but works quite well in practice: MMC insertion/removal is detected, as is insertion of the USB cable. One can rename the card from the file manager. The only issue detected is that ke-recv is restarted once during system boot.
    • Use the new ke-recv, which requires the N800 kernel; not tried yet.

New kernel

  • ke-recv (at least) depends on new kernel

Compiling the N800 kernel for 770 seems to require a small code change to the kernel. This is because a change was made for OMAP2 only, while OMAP1 has stayed the same. Making a similar change for OMAP1 allows us to build the kernel:

    ~/kernel-source-rx-34-2.6.18 > find | (while read i; do grep omap1_ext_if $i; done)
    extern struct lcd_ctrl_extif omap1_ext_if;
    fbdev->ext_if = &omap1_ext_if;
    ~/kernel-source-rx-34-2.6.18 >

So, the symbol omap1_ext_if is declared and used, but it is not defined anywhere. By looking at how omapfb_main.c and rfbi.c have changed, we can make a similar change to sossi.c:

    -struct lcd_ctrl_extif sossi_extif = {
    +struct lcd_ctrl_extif omap1_ext_if = {
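
The symbol search above can be written more compactly with a recursive grep that also prints file names and line numbers, which makes it easier to tell a definition from a declaration or a use. A minimal sketch: find_symbol is a hypothetical helper name, and the -r and --include options are GNU grep features.

```shell
#!/bin/sh
# find_symbol SYMBOL [DIR]: list every line in C sources and headers under
# DIR (default: current directory) that mentions SYMBOL as a whole word,
# prefixed with file name and line number. A symbol that only shows up in
# "extern" declarations and uses has no definition in the tree.
find_symbol() {
    grep -rnw --include='*.[ch]' -e "$1" "${2:-.}"
}
```

Run as find_symbol omap1_ext_if in the kernel source directory; after the sossi.c change, the output should additionally contain the line with the definition.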

Now the kernel builds, and starts, but doesn't really work:

    [110.370513] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(31,3)
    [110.378967]

This seems to happen because I screwed up the flash configuration... Too bad, back to configuration.

It looks like something similar is going on in the flash config: structure members have been changed, but only the N800 version (onenand) has been updated to use the new members. The 770 version gives errors about missing members... However, this time the number of errors/changes is much larger.

The reason for this seems to be that the Nokia-provided driver for the omap1710 internal NAND flash (drivers/mtd/nand/omap-hw.c) is not compatible with the 2.6.18 kernel. This driver has been updated in the public omap-patch git repository, but N800 is using some older version.

After patching the NAND driver files the kernel compiles and runs (until it tries to execute linuxrc from initfs).

Initfs

  • linuxrc from the old OS2006 initfs starts, but gets confused because of the different kernel and reboots. This was expected behaviour.
  • linuxrc from the OS2007 initfs does absolutely nothing. The kernel tries to start it, but after that nothing is printed to the serial console.

linuxrc is a script executed by /bin/sh, which in turn is a symlink to busybox. It looks like exec'ing busybox succeeds (since the kernel doesn't try to find new init candidates), but it fails to run for some reason.

I built a statically linked version of busybox and put it into initfs (I had to delete testserver and some kernel modules to make room for the static binary). Now busybox starts and linuxrc gets executed. But dsme reports an illegal instruction...

I'm wondering what this busybox problem was in the first place. Since exactly the same thing happens with the plain N800 initfs (which works nicely on N800), there has to be some kind of hardware dependency in busybox or in one of the libraries it is linked against. Anyway, I cannot think of a problem that would allow the binary to be exec'ed but still fail before the first instruction is actually run. Or perhaps the dynamic loader just hangs? No idea.

I noticed that I really should be using uClibc instead of glibc. OK, I installed an armel+uClibc capable toolchain and built the whole initfs from scratch (using the provided gar build system). The same effect: busybox didn't start. But this time even a statically linked helloworld didn't work at all. So, it looks like the N800 version of the uClibc library doesn't work with 770 hw.

It turned out that the uClibc library is not compiled as part of initfs creation, but is taken directly from the scratchbox toolchain (it is just stripped). So, the library in the toolchain is incompatible with 770 and this can only be fixed by creating a new toolchain. I'm wondering what kind of incompatibility scratchbox-toolchain-cs2005q3.2-uclibc-arm-eabi_0.9.8.5-3_i386.deb actually contains...

Solved: n770_defconfig is apparently for the original 770 release, not for OS2006, so the kernel was missing EABI support. Busybox still doesn't work when invoked as sh: the program seems to hang/crash when it executes the setjmp call in the ash shell main function. This is because uClibc was compiled with VFP support, so as part of the setjmp call it tries to save the state of the floating point unit using VFP-specific instructions. The bad thing is that these instructions currently cannot be emulated (the system crashes without mercy), so the toolchain needs to be recompiled without VFP support. After recompiling uClibc the shell starts fine.

The next problem is that dsme says it cannot load a library (which is there)... This was because there was a wrong runpath in the uClibc configuration, so the library was searched for in a wrong (nonexistent) directory.

/usr/bin/bme: can't resolve symbol '__aeabi_d2iz': This was caused by my recompilation... Going back to the original image and replacing only uClibc makes this disappear.

The next problem appears to be that /dev/fb0 cannot be opened. This is because no framebuffer drivers are registered, which in turn is because the LCD panel driver is not registered. For some reason probing is never done for the panel...

This panel stuff was pretty weird. Two things:

  • Hardware didn't match device: lcd_lph8923 <=> lcd_mipid
  • For some reason there was no platform data for the lcd driver?

Both of these are pretty weird.

Solved: This weird behaviour was caused by the fact that arch/arm/mach-omap1/board-nokia770.c didn't define these! I wonder which version of this file is actually included in the N800 version of the kernel.

Initializing the touchscreen fails, since the platform data doesn't contain a get_pendown_state function. This is likely why some of the /dev/input/eventX devices are not responding... But under N800 there are 6 devices, under 770 just 3. I wonder what those extra devices do?

Now the boot process stops, because the device thinks that the battery voltage is not high enough for booting. Actually the reason is that the battery type is not known (error type 65535 is returned). It turns out that the battery management utility (bme) relies on a library that contains a huge number of ifdefs. So, it's possible that 770 requires recompilation of bme with a different configuration. => No, this was just because of the wrong order of switches. No recompilation needed, the same version works for both.

One can get around this by faking the battery type with command line tools. However, this is something that needs to be fixed by somebody who knows how this battery stuff works.

Now even X starts, but you also get a pile of errors. It became clear that the WLAN drivers are binary blobs with no source, so recompiling them is not an option. So, using the new kernel is difficult in the end :( Perhaps it would eventually be easier to go with the old OS2006 kernel/initfs and just hack the applications to survive with that... ?

Compiling the latest WLAN driver succeeded, after the following problems:

  • Minor code modifications were needed (N800-specific code needed to be changed to its 770 equivalent). Mostly ifdefs separated the 770 and N800 specific blocks nicely.
  • For some reason linking failed, since the correct EABI version was not detected. Changing the makefile fixed this.

After this the modules were copied to the modified initfs image. They seemed to work (somehow at least)...

Firmware binaries are not compiled, but are taken from a package "as they are". Need to check if similar binaries exist for the old hw... OK, actually one can take the binary blob for the old hw from the old repository, copy it to initfs and make a symlink to it from rootfs. Using the old firmware makes hcid come up and be happy :)

After the first boot (when running the setup wizard), the process that has /dev/watchdog open gets killed or somehow exits (or for some other reason closes the watchdog device). As a result, omap_wdt reboots the device after a timeout (at least 30s). If you skip through the wizards fast, the reset will happen, but on the next boot they are not executed and the reboot doesn't happen again... I assume that this is dsme, since all dsme_socket_connect calls fail if you skip through the wizards fast...
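
While debugging this, a stand-in feeder can keep the device from resetting once the original watchdog owner has gone. The sketch below assumes the standard Linux watchdog behaviour, where any write to the device counts as a keepalive; feed_watchdog is a hypothetical helper, and it was not verified whether omap_wdt lets a second process open the device while the first one still holds it.

```shell
#!/bin/sh
# feed_watchdog DEVICE INTERVAL [COUNT]: write one byte to DEVICE every
# INTERVAL seconds. With COUNT omitted (or 0) the loop runs forever.
feed_watchdog() {
    dev="$1"; interval="$2"; count="${3:-0}"
    [ -e "$dev" ] || return 1
    exec 3> "$dev"          # keep the device open on file descriptor 3
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        printf '.' >&3      # each write acts as a keepalive
        i=$((i + 1))
        sleep "$interval"
    done
    exec 3>&-               # per the log above, closing leads to a reset
}
```

On the device this would be started as feed_watchdog /dev/watchdog 10 & from the serial console; the interval must stay well below the watchdog timeout.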

Good news: I was able to pair the device with a phone, establish an internet connection and update the package list over that connection! The browser doesn't work yet; it still doesn't start. Looks like it's not a connectivity problem after all... Some connectivity dialogs stay empty for almost half a minute, I wonder why that is... I also managed to establish a WLAN connection, but using it failed for some reason.

About ke-recv again

The new N800 version uses /sys/devices/platform/musb_hdrc/cable to detect the USB cable state. This sysfs path depends on the TUSB6010 USB controller, which is not available on the old hw. 770 uses /sys/devices/platform/tahvo-usb/vbus_state for the same task. One can configure the path easily, but the problem is that the new location says "disconnected" while the old one said "0" when the cable was disconnected. These strings are hardcoded into the binary, so the same binary cannot work on both platforms. I made a quick hack with a hex editor and changed the string. This makes the situation better: now ke-recv notices when the mmc cover opens/closes, but it cannot handle this in parallel with the USB cable. I wonder if there is some other hardcoded string that needs to be modified...
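
The mismatch is easy to see when the check is written out. The function below is only a sketch of the kind of comparison ke-recv has to make, not its actual code; cable_attached is a hypothetical name, and treating every value other than the "detached" string as attached is an assumption.

```shell
#!/bin/sh
# cable_attached STATE_FILE DETACHED_VALUE
# Succeeds (exit 0) when the sysfs state file reports anything other than
# the platform's "detached" value. Both arguments differ per platform, so
# a single hardcoded string cannot serve the 770 and the N800 at once.
cable_attached() {
    state_file="$1"
    detached="$2"
    [ -r "$state_file" ] || return 2    # node missing: wrong platform?
    state=$(cat "$state_file")
    [ "$state" != "$detached" ]
}
```

On 770 the call would be cable_attached /sys/devices/platform/tahvo-usb/vbus_state 0, on N800 cable_attached /sys/devices/platform/musb_hdrc/cable disconnected. Using the N800 string against the 770 node reports "attached" all the time, because "0" never equals "disconnected".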

This case is now closed. In order to make ke-recv work, we needed to patch the following things:

  • ke-recv needs to be recompiled, and DETACHED_STR needs to be changed from "detached" to "0" before compiling...
  • The ke-recv startup scripts and all osso-usb* scripts need their paths edited
  • The kernel needs patching to allow removing the cable while the other end has the drive mounted. Additionally, the kernel must be compiled without the CONFIG_USB_FILE_STORAGE_TEST option. Otherwise ke-recv doesn't re-enable mmc after the other end ejects the device...

After these steps ke-recv seems to work nicely!

Arnaud Patard's kernel patches

http://git.rtp-net.org/?p=n770.git;a=tree