Planet maemo: category "feed:43af5b2374081abdd0dbc4ba26a0b54c"

Philip Van Hoof

The original SPARQL regex support of Tracker is using a custom SQLite function. But of course back when we wrote it we didn’t yet think much about optimizing. As a result, we were using g_regex_match_simple which of course recompiles the regular expression each time.

Today Jürg and me found out about sqlite3_get_auxdata and sqlite3_set_auxdata which allows us to cache a compiled value for a specific custom SQLite function for the duration of the query.

This is much better:

static void
function_sparql_regex (sqlite3_context *context,
                       int              argc,
                       sqlite3_value   *argv[])
{
  gboolean ret;
  const gchar *text, *pattern, *flags;
  GRegexCompileFlags regex_flags;
  GRegex *regex;

  if (argc != 3) {
    sqlite3_result_error (context, “Invalid argument count”, -1);
    return;
  }

  regex = sqlite3_get_auxdata (context, 1);
  text = sqlite3_value_text (argv[0]);
  flags = sqlite3_value_text (argv[2]);
  if (regex == NULL) {
    gchar *err_str;
    GError *error = NULL;
    pattern = sqlite3_value_text (argv[1]);
    regex_flags = 0;
    while (*flags) {
      switch (*flags) {
      case ’s’: regex_flags |= G_REGEX_DOTALL; break;
      case ‘m’: regex_flags |= G_REGEX_MULTILINE; break;
      case ‘i’: regex_flags |= G_REGEX_CASELESS; break;
      case ‘x’: regex_flags |= G_REGEX_EXTENDED; break;
      default:
        err_str = g_strdup_printf (”Invalid SPARQL regex flag ‘%c’”, *flags);
        sqlite3_result_error (context, err_str, -1);
        g_free (err_str);
        return;
      }
      flags++;
    }
    regex = g_regex_new (pattern, regex_flags, 0, &error);
    if (error) {
      sqlite3_result_error (context, error->message, error->code);
      g_clear_error (&error);
      return;
    }
    sqlite3_set_auxdata (context, 1, regex, (void (*) (void*)) g_regex_unref);
  }
  ret = g_regex_match (regex, text, 0, NULL);
  sqlite3_result_int (context, ret);
  return;
}

Before (this was a test on a huge amount of resources):

$ time tracker-sparql -q "select ?u { ?u a rdfs:Resource . FILTER (regex(?u, '^titl', 'i')) }"
real	0m3.337s
user	0m0.004s
sys	0m0.008s

After:

$ time tracker-sparql -q "select ?u { ?u a rdfs:Resource . FILTER (regex(?u, '^titl', 'i')) }"
real	0m1.887s
user	0m0.008s
sys	0m0.008s

This will hit Tracker’s master today or tomorrow.

Categories: Informatics and programming
Philip Van Hoof

Working hard at the Tracker project

2010-03-17 17:41 UTC  by  Philip Van Hoof
0
0

Today we improved journal replaying from 1050s for my test of 25249 resources to 58s.

Journal replaying happens when your cache database gets corrupted. Also when you restore a backup: restore uses the same code the journal replaying uses, backup just makes a copy of your journal.

During the performance improvements we of course found other areas related to data entry. It looks like we’re entering a period of focus on performance, as we have a few interesting ideas for next week already. The ideas for next week will focus on performance of some SPARQL functions like regex.

Meanwhile are Michele Tameni and Roberto Guido working on a RSS miner for Tracker and has Adrien Bustany been working on other web miners like for Flickr, GData, Twitter and Facebook.

I think the first pieces of the RSS- and the other web miners will start becoming available in this week’s unstable 0.7 release. Martyn is still reviewing the branches of the guys, but we’re very lucky with such good software developers as contributors. Very nice work Michele, Roberto and Adrien!

Categories: Informatics and programming
Philip Van Hoof

Tinymail 1.0!

2010-03-05 17:35 UTC  by  Philip Van Hoof
0
0

Tinymail’s co-maintainer Sergio Villar just released Tinymail’s first release.

psst. I have inside information that I might not be allowed to share that 1.2 is being prepared already, and will have bodystructure and envelope summary fetch. And it’ll fetch E-mail body content per requested MIME part, instead of always entire E-mails. Whoohoo!

Categories: Informatics and programming
Philip Van Hoof

An ode to our testers

2010-03-02 13:49 UTC  by  Philip Van Hoof
0
0

You know about those guys that use your software against huge datasets like their entire filesystem, with thousands of files?

We do. His name is Tshepang Lekhonkhobe and we owe him a few beers for reporting to us many scalability issues.

Today we found and fixed such a scalability issue: the update query to reset the availability of file resources (this is for support for removable media) was causing at least a linear increase of VmRss usage per amount of file resources. For Tshepang’s situation that meant 600 MB of VmRss. Jürg reduced this to 30 MB of peak VmRss in the same use-case, and a performance improvement from minutes to a second or two, three. Without memory fragmentation as glibc is returning almost all of the VmRss back to the kernel.

Thursday is our usual release day. I invite all of the 0.7 pioneers to test us with your huge filesystems, just like Tshepang always does.

So long and thanks for all the testing, Tshepang! I’m glad we finally found it.

Categories: Informatics and programming
Philip Van Hoof

Invisible costs

2010-03-01 17:49 UTC  by  Philip Van Hoof
0
0


We would rather suffer the visible costs of a few bad decisions than incur the many invisible costs that come from decisions made too slowly - or not at all - because of a stifling bureaucracy.

Letter by Warren E. Buffett to the shareholders of Berkshire, February 26, 2010

Categories: Art & culture
Philip Van Hoof

Working hard

2010-02-18 22:09 UTC  by  Philip Van Hoof
0
0

I don’t decide about Tracker’s release. The team of course does.

But when you look at our roadmap you notice one remaining ‘big feature’. That’s coping with modest ontology changes.

Right now if we’d make even a small ontology change all of our users would have to recreate their databases. We also don’t support restoring a backup of your metadata over a modified ontology.

This is about to change. This week I started working in a branch on supporting class and property ontology additions.

I finished it today and it appears to be working. The patches obviously need a thorough review by the other team members, and then testing of course. I invite all the contributors and people who have been testing Tracker 0.7’s releases to tryout the branch. It only supports additions, so don’t try to change properties or classes, or remove them. You can only add new ones. You might have noticed the nao:deprecated property in the ontology files? That’s what we do with deleted properties.

Anyway

Meanwhile are Martyn and Carlos working on a bugfix in the miner about duplicate entries for file resources and on a timeout for the extractor so that extraction of large or complicated documents doesn’t block the entire filesystem miner.

Jürg is working on timezone storage for xsd:dateTime fields and last few days he implemented limited support for named graphs.

By the looks of it, one would almost believe that Tracker’s first new stable release is almost ready!

Categories: Informatics and programming
Philip Van Hoof

This (super) cool .NET developer and good friend came to me at the FOSDEM bar to tell me he was confused about why during the Tracker presentation I was asking people to replace F-Spot and Banshee.

I hope I didn’t say it like that, I would never intent to say that. But I’ll review the video of the presentation as soon as Rob publishes it.

Anyway, to ensure everybody understood correctly what I did wanted to say (whether or not I did, is another question):

The call was to inspire people to reimplement or to provide different implementations of F-Spot’s and Banshee’s data backends, so that they would use an RDF store like tracker-store instead of each app its own metadata database.

I think I also mentioned Rhythmbox in the same sentence because the last thing I would want is to turn this into a .NET vs. anti-.NET debate. It just happens to be that the best GNOME softwares for photo and music management are written in .NET (and that has a good reason).

People who know me also know that I think those anti-.NET people are disruptive ignorable people. I also actively and willingly ignore them (and they should know this). I’m actually a big fan of the Mono platform.

I’ll try to ensure that I don’t create this confusion during presentations anymore.

Categories: Informatics and programming
Philip Van Hoof

Hmrrr

2010-02-07 11:39 UTC  by  Philip Van Hoof
0
0

In line with what I usually do at conferences, I lost my glasses at the GNOME Beer event this year. If somebody found it, and maybe even has it, please let me know. It’s kinda hard to see presentations without it.

Categories: Informatics and programming
Philip Van Hoof

FWD: [Tracker] tracker-miner-rss 0.3

2010-01-27 01:39 UTC  by  Philip Van Hoof
0
0

This is the kind of stuff that needs a forward on the planets:

From: Roberto -MadBob- Guido

This is just an update about tracker-miner-rss effort, already mentioned in this list some time ago.

Website, SVN, Last release (0.3)

Since 0.2 we (Michele and me) have just dropped dependency from rss-glib due some limitation found, and created our own Glib-oriented feeds handling library, libgrss, starting from the code of Liferea and adding nice stuffs such as a PubSub subscriber implementation. At the moment it is shipped with tracker-miner-rss itself, in the future may be splitted so to easy usage by other developers.

Next will come integration with libchamplain to describe geographic points found in geo-rss enabled feeds, integration with libedataserver to better handle “person” rappresentation (suggestions for a better PIM-like shared library with useful objects?), and perhaps a first full-featured feed reader using Tracker as backend.

Enjoy :-)

Roberto is doing a demo on FSter at FOSDEM during our presentation. My role in the presentation will be light this year. I decided to give most of the talk away to Rob Taylor and Roberto. I will probably demo Debarshi Ray’s Solang and if time permits his work on the Nautilus integration. Regretfully Debarshi can’t come and so he asked me to do the demo.

Categories: Informatics and programming
Philip Van Hoof

Solang, a photo manager

2010-01-18 21:59 UTC  by  Philip Van Hoof
0
0

For the last few weeks has Debarshi Ray contributed to Tracker’s Nautilus plugin and worked on Solang, a photo manager that will start using Tracker’s SPARQL capability to get a language to query for metadata about the photos and the photos themselves.

Debarshi explains it all very well himself on his own blog.

We’ll probably do a lightening demo during our Tracker presentation at FOSDEM about how Solang did this integration. We’re also planning to demo the code of a few other applications that are working on integrating with Tracker’s store.

Somebody should port Solang to the next version of Maemo!

Categories: Informatics and programming
Philip Van Hoof

SPARQL subqueries

2009-12-09 16:54 UTC  by  Philip Van Hoof
0
0

This style of subqueries will also work (you can do this one without a subquery too, but it’s just an example of course):

SELECT ?name COUNT(?msg)
WHERE {
	?from a nco:Contact  ;
	          nco:hasEmailAddress ?name . {
		SELECT ?from
		WHERE {
			?msg a nmo:Email ;
			         nmo:from ?from .
		}
	}
} GROUP BY ?from  

The same query in QtTracker will look like this (I have not tested this, let me know if it’s wrong Iridian):

#include <QObject>
#include <QtTracker/Tracker>
#include <QtTracker/ontologies/nco.h>
#include <QtTracker/ontologies/nmo.h>

void someFunction () {
	RDFSelect outer;
	RDFVariable from;
	RDFVariable name = outer.newColumn<nco::Contact>("name");
	from.isOfType<nco::Contact>();
	from.property<nco::hasEmailAddress>(name);
	RDFSelect inner = outer.subQuery();
	RDFVariable in_from = inner.newColumn("from");
	RDFVariable msg;
	msg.property<nmo::from>(in_from);
	msg.isOfType<nmo::Email>();
	outer.addCountColumn("total messages", msg);
	outer.groupBy(from);
	LiveNodes from_and_count = ::tracker()->modelQuery(outer);
}

What you find in this branch already supports it. You can find early support for subqueries in QtTracker in this branch.

To quickly put some stuff about Emails into your RDF store, read this page (copypaste the turtle examples in a file and use the tracker-import tool). You can also enable our Evolution Tracker plugin, of course.

ps. Yes, somebody should while building a GLib/GObject based client library for Tracker copy ideas from QtTracker.

Categories: Informatics and programming
Philip Van Hoof

Bla bla bla, subqueries in SPARQL, bla bla

2009-12-08 15:33 UTC  by  Philip Van Hoof
0
0

Coming to you in a few days is what Jürg has been working on for last week.

Yeah, you guess it right by looking at the query below: subqueries!

This example shows you the amount of E-mails each contact has ever sent to you:

SELECT ?address
    (SELECT COUNT(?msg) AS ?msgcnt WHERE { ?msg nmo:from ?from })
WHERE {
    ?from a nco:Contact ;
          nco:hasEmailAddress ?address .
}

The usual warnings apply here: I’m way early with this announcement. It’s somewhat implemented but insanely experimental. The SPARQL spec has something for this in a draft wiki page. Due to lack of error reporting and detection it’s easy to make stuff crash or to get it to generate wrong native SQL queries.

But then again, you guys are developers. You like that!

Why are we doing this? Ah, some team at an undisclosed company was worried about performance and D-Bus overhead: They had to do a lot of small queries after doing a parent query. You know, a bunch of aggregate functions for counts, showing the last message of somebody, stuff like that.

I should probably not mention this feature yet. It’s too experimental. But so exciting!

Anyway, here’s the messy branch and here’s the reviewed stuff for bringing this feature into master.

ps. I wish I could show you guys the query that we support for that team. It’s awesome. I’ll ask around.

Categories: Informatics and programming