Planet maemo: category "feed:43af5b2374081abdd0dbc4ba26a0b54c"

Philip Van Hoof

I believe it was the QtContacts Tracker team who requested this feature. When they have to unset the value of a resource’s property and at the same time set a bunch of other properties, they need to use a DELETE statement upfront an INSERT OR REPLACE. The DELETE increases the amount of queries and introduces a SQL SELECT internally for solving the SPARQL DELETE’s WHERE.

Instead of that they wanted a way to express this in the INSERT OR REPLACE, and that way gain a bit of performance. Today I implemented this.

So let’s say we start with:

INSERT { <subject> a nie:InformationElement ; nie:title 'test' }

And then we replace the nie:title:

INSERT OR REPLACE { <subject> nie:title 'new test' }

Then of course we get ‘new test’ for the nie:title of the resource:

SELECT ?title { <subject> nie:title ?title }

Then let’s say we want to unset the nie:title, we can either use:

DELETE { <subject> nie:title ?title } WHERE { <subject> nie:title ?title }

or we can now also use this (and avoid an extra internal SQL SELECT to solve the SPARQL DELETE’s WHERE):

INSERT OR REPLACE { <subject> nie:title null }

For multi value properties will a null object in INSERT OR REPLACE results in a reset of the entire list of objects. There is still a SQL SELECT happening internally to get the so called old values, but that one is sort of unavoidable and is also used by a normal DELETE. I hope this feature helps the QtContacts Tracker team gain performance for their contact synchronization use cases.

You can find this in a branch, it might take some time before it reaches master as most of the Tracker team is at the Berlin Desktop Summit; it must be reviewed, of course. Since it doesn’t really change any of the existing APIs, as it only adds a feature, we might also bring it to 0.10. Although now that we started with 0.11, I think it probably belongs in 0.11 only. Distributions should probably just upgrade, wait for the new features until they decide to bump the version of their packages, or backport features themselves.

Categories: english
Philip Van Hoof

Refactoring our writeback system

2011-07-14 22:15 UTC  by  Philip Van Hoof
0
0

Tracker writes back certain metadata to your files. It for example writes back in XMP the title of a JPeg file, among other fields that XMP supports.

We had a service that runs in the background waiting for signals coming from the RDF store that tell it to perform a writeback.

To avoid that our FS miner would pick up the changes that the writeback service made, and that way index the file again, we introduced a D-Bus API for our FS miner called IgnoreNextUpdate. When the API is issued will the FS miner ignore the first next filesystem event that would otherwise be handled on a specific file.

That API is now among our biggest sources of race conditions. Although we wont remove it from 0.10 due to API promises, we don’t like it and want to get rid of it. Or at least we want to replace all its users.

To get rid of it we of course had to change the writeback service in a way that it wouldn’t need the API call on the FS miner any longer.

The solution we came up with was to move the handling of the signal and the queuing to the FS miner‘s process. There we have all the control we need.

The original reason why writing back was done as a service was to be robust against the libraries, used for the actual writeback, crashing or hanging. We wanted to keep this capability, so just like the extractor is a portion of the writeback system going to run out of process of the FS miner.

When a queued writeback task is to be run, an IPC call to a writeback process is made and returns only when it’s finished. Then the next task in the queue, in the FS miner, is selected. A lot like how the extracting of metadata works.

We have and will be working on this in the writeback-refactor branches next few days.

Categories: controversial
Philip Van Hoof

The ever growing journal problem

2011-07-11 14:55 UTC  by  Philip Van Hoof
0
0

Current upstream situation

Click to read 1544 more words
Categories: condescending
Philip Van Hoof

We delivered

2011-06-22 22:46 UTC  by  Philip Van Hoof
0
0

Damned guys, we’re too shy about what we delivered. When the N900 was made public we flooded the planets with our blogs about it. And now?

I’m proud of the software on this device. It’s good. Look at what Engadget is writing about it! Amazing. We should all be proud! And yes, I know about the turbulence in Nokia-land. Deal with it, it’s part of our job. Para-commandos don’t complain that they might get shot. They just know. It’s called research and development! (I know, bad metaphor)

I don’t remember that many good reviews about even the N900, and that phone was by many of its owners seen as among the best they’ve ever owned. Now is the time to support Harmattan the same way we passionately worked on the N900 and its predecessor tablets (N810, N800 and 770). Even if the N9′s future is uncertain: who cares? It’s mostly open source! And not open source in the ‘Android way’. You know what I mean.

The N9 will be a good phone. The Harmattan software is awesome. Note that Tracker and QSparql are being used by many of its standard applications. We have always been allowed to develop Tracker the way it’s supposed to be done. Like many other similar projects: in upstream.

As for short term future I can announce that we’re going to make Michael Meeks happy by finally solving the ever growing journal problem. Michael repeatedly and rightfully complained about this to us at conferences. Thanks Michael. I’ll write about how we’ll do it, soon. We have some ideas.

We have many other plans for long term future. But let’s for now work step by step. Our software, at least what goes to Harmattan, must be rock solid and very stable from now on. Introducing a serious regression would be a catastrophe.

I’m happy because with that growing journal – problem, I can finally focus on a tough coding problem again. I don’t like bugfixing-only periods. But yeah, I have enough experience to realize that sometimes this is needed.

And now, now we’re going to fight.

Categories: condescending
Philip Van Hoof

INSERT OR REPLACE explained in more detail

2011-03-25 16:30 UTC  by  Philip Van Hoof
0
0

A few weeks ago we were asked to improve data entry performance of Tracker’s RDF store.

Click to read 2124 more words
Categories: controversial
Philip Van Hoof

INSERT OR REPLACE explained in more detail

2011-03-17 23:38 UTC  by  Philip Van Hoof
0
0

A few weeks ago we were asked to improve data entry performance of Tracker’s RDF store.

Click to read 2204 more words
Categories: Informatics and programming
Philip Van Hoof

SPARQL Update has INSERT and DELETE. To update an existing triple in RDF you need to DELETE it first. You of course already have our INSERT-SILENT but that just ignores certain errors; it doesn’t replace triples.

Click to read 1568 more words
Categories: controversial
Philip Van Hoof

A few months ago we added the implicit tracker:modified property to all resources. This property is an auto-increment. It used to be that the property was incremented on ~ each SQL update-query that happens. The value is stored per resource.Synchronization in water

We are now changing this to be per transaction. A transaction in Tracker is one set of SPARQL-Update INSERT or DELETE queries. You can do inserts and deletes about multiple resources in one such sentence (a sentence can contain multiple space delimited Update queries). An exception is everything related to ontology changes. These ontology changes get the first increment as their value for tracker:modified. This is also for ontology changes that happen after the initial ontology transaction (at the first start, is this first transaction made). The exception is made for supporting future ontology changes and the possibly needed data conversions.

The per-resource tracker:modified value is useful for application’s synchronization purposes: you can test your application’s stored tracker:modified value against the always increasing (w. exception at int. overflow) Tracker’s tracker:modified value to know whether or not your version is older.

The reason why we are changing this to per-transaction is because this way we can guarantee that the value will be restored after a journal replay and/or a backup’s restore without having to store it in either the journal nor the backup. This means that we now guarantee the value being restored without having to change either the backup’s format nor the journal’s format.

Having a persistent journal we actually make a simple copy of the journal to deliver you a backup in a fast file-copy. But let this deception be known only by the people who care about the implementation. Sssht!

We’re already rotating and compressing the rotated chunks for reducing the journal size. We’re working on not journaling data that is embedded in local files this week. A re-index of that local file will re-insert the data anyway. This will significantly reduce the size of the journal too.

Categories: condescending
Philip Van Hoof

All quiet on the Tracker front

2011-01-26 12:49 UTC  by  Philip Van Hoof
0
0

It has been a long time since we wrote propaganda about the Tracker project. That has a lot to do with both the holiday-season and the fact that we’re preparing for a stable release. This means that we are increasingly reluctant to new features.All quiet on the Western Front

We still made quite some progress, though. We for example ported almost everything from dbus-glib and dbus-1 to GDBus and GVariant. This was quite a work; next few weeks will be used for cleaning up and regression fixing. Jürg decided that it was more easy to simply port ~ all of tracker-store to Vala than to port tracker-store’s dbus-glib and dbus-1 C code to GDBus. This should please contributors who will now have a much more easy to understand codebase.

Tracker’s tracker-store is mostly an IPC layer above libtracker-data plus the signaling mechanism. The public libtracker-sparql is a API layer above the same private libtracker-data that protects you from doing things that you should not do. Like trying to do writes without going over the IPC layer.

This week we’re working on transient properties and adding tracker:modified to the journal and backup file. The tracker:modified property is an auto-incremented value. It’s useful for synchronization purposes. Right now when you replay the journal or restore a backup, we reset the count. This means that in between journal replays and/or backup restores you wont have the same tracker:modified values for your resources. This is what we are changing. We plan to restore the tracker:modified value during journal replay and backup-restore.

Transient properties are properties that are reset after restart of tracker-store (per Desktop session), they aren’t backed up (and therefor not restored either) and they aren’t journaled. They are useful for for example presence status of a contact.

The nice guys and girls at the QSparql team are working on a simple cursor that doesn’t do any buffering nor uses any threads. This is useful for Qt applications that want maximum performance while reading the results of a query.

Last week we introduced and ported to our newest location ontology, SLO.

Categories: condescending
Philip Van Hoof

Although with SQLite WAL we have direct-access now, we don’t support direct-access for insert and delete SPARQL queries. Those queries when made using libtracker-sparql still go over D-Bus using Adrien’s FD passing D-Bus IPC technique. The library will do that for you.

After investigating a performance analysis by somebody from Intel we learned that there is still a significant overhead per each IPC call. In the analysis the person made miner-fs combine multiple insert transactions together and then send it over as a single big transaction. This was noticeably faster than making many individual IPC requests.

The problem with this is that if one of the many insert queries fail, they all fail: not good.

We’re now experimenting with a private API that allows you to pass n individual insert transactions, and get n errors back, using one IPC call.

The numbers are promising even on Desktop D-Bus (the test):

$ cd tests/functional-tests/
$ ./update-array-performance-test
First run (first update then array)
Array: 0.103675, Update: 0.139094
Reversing run (first array then update)
Array: 0.290607, Update: 0.161749
$ ./update-array-performance-test
First run (first update then array)
Array: 0.105920, Update: 0.137554
Reversing run (first array then update)
Array: 0.118785, Update: 0.130630
$ ./update-array-performance-test
First run (first update then array)
Array: 0.108501, Update: 0.136524
Reversing run (first array then update)
Array: 0.117308, Update: 0.151192
$

We’re now deciding whether or not the API will become public; returning arrays of errors isn’t exactly ‘nice’ or ‘standard’.

Categories: condescending
Philip Van Hoof

While trying to handle a bug that had a description like “if I do this, tracker-store’s memory grows to 80MB and my device starts swapping”, we where surprised to learn that a sqlite3_stmt consumes about 5 kb heap. Auwch.

Before we didn’t think that those prepared statements where very large, so we threw all of them in a hashtable for in case the query was ran again later. However, if you collect thousands of such statements, memory consumption obviously grows.

We decided to implement a LRU cache for these prepared statements. For clients that access the database using direct-access the cache will be smaller, so that max consumption is only a few megabytes. Because our INSERT and DELETE queries are more reusable than SELECT queries, we split it into two different caches per thread.

The implementation is done with a simple intrinsic linked ring list. We’re still testing it a little bit to get good cache-size numbers. I guess it’ll go in master soon. For your testing pleasure you can find the branch here.

Categories: condescending
Philip Van Hoof

We have a feature request to support return types and to give back variable names. We currently return an array (of array) of just strings, with no typing. This doesn’t work very well for knowing whether a cell is (for example) unbound. Empty string isn’t the same as unbound. So what can you do?

With direct-access the implementation is easy, we’ll just read it from the SPARQL engine. We have all this info already anyway. For filedescriptor passing with D-Bus we need to marshal it over the protocol.

Although we might come back to this decision short term, we wont yet do it for our “normal” (non-FD passing) D-Bus query method. SPARQL’s type system is different from D-Bus’s, so we shouldn’t try to match them somehow. Any custom format that we’d come up with, would be arbitrary.

Maybe someday we’ll add another “normal” D-Bus method that gives you a big string with SPARQL Query Results in JSON or SPARQL Query Results in XML back. Right now this has no priority for us, plus it would be a lot slower due to serialization. Post 0.9 everybody should be using libtracker-sparql and that’ll select either FD passing or direct-access.

Anyway, this will likely be the API for Sparql.Cursor. The methods get_value_type and get_variable_name got added.

public enum Tracker.Sparql.ValueType {
	UNBOUND, URI, STRING, INTEGER,
	DOUBLE, DATETIME, BLANK_NODE
}

public abstract class Tracker.Sparql.Cursor : Object {
	public Connection connection { get; }
	public abstract int n_columns { get; }
+	public abstract ValueType get_value_type (int column);
+	public abstract unowned string? get_variable_name (int column);
	public abstract unowned string? get_string (int column, out long length = null);
	public abstract bool next (Cancellable? cancellable = null) throws GLib.Error;
	public async abstract bool next_async (Cancellable? cancellable = null) throws GLib.Error;
	public abstract void rewind ();
}

I usually post about work in progress, not about something that is done. Same this time, of course. You can find the branch where we’re working on this here.

Categories: english