<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="FeedCreator 1.7.6(BH)" -->
<rss version="2.0">
    <channel xmlns:g="http://base.google.com/ns/1.0">
        <title>Planet Maemo: category &quot;feed:46b1d6b26651a331cde2ad188d699e0c&quot;</title>
        <description>Blog entries from Maemo community</description>
        <link>http://maemo.org/news/planet-maemo/</link>
        <lastBuildDate>Sun, 24 May 2026 07:02:14 +0000</lastBuildDate>
        <generator>FeedCreator 1.7.6(BH)</generator>
        <language>en</language>
        <managingEditor>planet@maemo.org</managingEditor>
        <item>
            <title>Large-scale sparse multiclass classification</title>
            <link>http://www.mblondel.org/journal/2013/05/12/large-scale-sparse-multiclass-classification/</link>
            <description><![CDATA[
<p>I&#8217;m thrilled to announce that my paper &#8220;Block Coordinate Descent Algorithms for Large-scale Sparse Multiclass Classiﬁcation&#8221; (published in the Machine Learning journal) is now online: <a href="http://www.mblondel.org/publications/mblondel-mlj2013.pdf">PDF</a>, <a href="http://www.mblondel.org/publications/bib/mblondel-mlj2013.txt">BibTeX</a> [*].</p>
<h3>Abstract</h3>
<p>Over the past decade, l1 regularization has emerged as a powerful way to learn classifiers with implicit feature selection. More recently, mixed-norm (e.g., l1/l2) regularization has been utilized as a way to select entire groups of features. In this paper, we propose a novel direct multiclass formulation specifically designed for large-scale and high-dimensional problems such as document classification. Based on a multiclass extension of the squared hinge loss, our formulation employs l1/l2 regularization so as to force weights corresponding to the same features to be zero across all classes, resulting in compact and fast-to-evaluate multiclass models. For optimization, we employ two globally-convergent variants of block coordinate descent, one with line search (Tseng and Yun, 2009) and the other without (Richtárik and Takáč, 2012). We present the two variants in a unified manner and develop the core components needed to efficiently solve our formulation. The end result is a couple of block coordinate descent algorithms specifically tailored to our multiclass formulation. Experimentally, we show that block coordinate descent performs favorably to other solvers such as FOBOS, FISTA and SpaRSA. Furthermore, we show that our formulation obtains very compact multiclass models and outperforms l1/l2- regularized multiclass logistic regression in terms of training speed, while achieving comparable test accuracy.
</p></blockquote>
<h3>Code</h3>
<p>The code of the proposed multiclass method is available in my Python/Cython machine learning library, <a href="https://github.com/mblondel/lightning">lightning</a>. Below is an example of how to use it on the News20 dataset.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> sklearn.<span style="color: black;">datasets</span> <span style="color: #ff7700;font-weight:bold;">import</span> fetch_20newsgroups_vectorized
<span style="color: #ff7700;font-weight:bold;">from</span> lightning.<span style="color: black;">primal_cd</span> <span style="color: #ff7700;font-weight:bold;">import</span> CDClassifier
&nbsp;
bunch = fetch_20newsgroups_vectorized<span style="color: black;">&#40;</span>subset=<span style="color: #483d8b;">&quot;all&quot;</span><span style="color: black;">&#41;</span>
X = bunch.<span style="color: black;">data</span>
y = bunch.<span style="color: black;">target</span>
&nbsp;
clf = CDClassifier<span style="color: black;">&#40;</span>penalty=<span style="color: #483d8b;">&quot;l1/l2&quot;</span>,
                   loss=<span style="color: #483d8b;">&quot;squared_hinge&quot;</span>,
                   multiclass=<span style="color: #008000;">True</span>,
                   max_iter=<span style="color: #ff4500;">20</span>,
                   alpha=1e-4,
                   C=<span style="color: #ff4500;">1.0</span> / X.<span style="color: black;">shape</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>,
                   tol=1e-3<span style="color: black;">&#41;</span>
clf.<span style="color: black;">fit</span><span style="color: black;">&#40;</span>X, y<span style="color: black;">&#41;</span>
<span style="color: #808080; font-style: italic;"># accuracy</span>
<span style="color: #ff7700;font-weight:bold;">print</span> clf.<span style="color: black;">score</span><span style="color: black;">&#40;</span>X, y<span style="color: black;">&#41;</span> 
<span style="color: #808080; font-style: italic;"># percentage of selected features</span>
<span style="color: #ff7700;font-weight:bold;">print</span> clf.<span style="color: black;">n_nonzero</span><span style="color: black;">&#40;</span>percentage=<span style="color: #008000;">True</span><span style="color: black;">&#41;</span></pre></div></div>

<p>To use the variant without line search (as presented in the paper), add the max_steps=0 option to CDClassifier.</p>
<h3>Data</h3>
<p>I also released the <a href="http://www.mblondel.org/data/">Amazon7</a> dataset used in the paper. It contains 1,362,109 reviews of Amazon products. Each review may belong to one of 7 categories (apparel, book, dvd, electronics, kitchen &#038; housewares, music, video) and is represented as a 262,144-dimensional vector. It is, to my knowledge, one of the largest publically available multiclass classification dataset.</p>
<p>[*] The final publication is available <a href="http://link.springer.com/article/10.1007%2Fs10994-013-5367-2">here</a>.</p>
<span class="net_nemein_favourites">0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=1e2bb0438d91ac6bb0411e29818a7abbf70e57fe57f&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/1e2bb0438d91ac6bb0411e29818a7abbf70e57fe57f/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=1e2bb0438d91ac6bb0411e29818a7abbf70e57fe57f&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/1e2bb0438d91ac6bb0411e29818a7abbf70e57fe57f/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Sun, 12 May 2013 12:52:56 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-1e2bb0438d91ac6bb0411e29818a7abbf70e57fe57f</guid>
        </item>
        <item>
            <title>Transparent system-wide proxy</title>
            <link>http://www.mblondel.org/journal/2011/06/27/transparent-system-wide-proxy/</link>
            <description><![CDATA[
<p>Proxies can be a powerful way to enforce anonymity or to bypass various kinds of restrictions on Internet (government censorship, regional contents, &#8230;). In this post, I&#8217;ll describe a simple technique to create a transparent proxy at the system level. It&#8217;s especially useful in cases when you want to make sure that all connections make it through the proxy or when your application of interest doesn&#8217;t have proxy support. I don&#8217;t think this technique is that well-known, hence this post.<br />
<span id="more-172"></span></p>
<h3>1. Create SOCKS proxy with OpenSSH</h3>
<p>If you have a user account on a remote machine, the simplest way to create a proxy is to use the following command.</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;">$ ssh -N -D 1080 username@serverhost</pre></div></div>

<p>It creates a <a href="http://en.wikipedia.org/wiki/SOCKS">SOCKS</a> proxy listening to port 1080 on your local machine. The main advantage of the SOCKS protocol is that it can route connections from any port between the client and the server.</p>
<h3>2. Forward connection transparently with iptables or ipfw</h3>
<p>Most modern browsers can let the user define a SOCKS proxy in their advanced networking preference section. Yet, many applications may not support the SOCKS protocol at all. The solution is to use system tools such as iptables (Linux) or ipfw (FreeBSD, OS X) to enforce the routing at the system level. For example, on OS X, I use the following command to redirect port 80 (HTTP).</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;">$ sudo ipfw add 100 fwd 127.0.0.1,12345 dst-port 80</pre></div></div>

<h3>3. Redirect connections to SOCKS proxy with redsocks</h3>
<p>iptables and ipfw don&#8217;t have built-in support for the SOCKS protocol. It is thus necessary to use an additional program to transform the connections on the fly. This is where <a href="https://github.com/darkk/redsocks">redsocks</a> comes in.</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;">$ redsocks -c config_file</pre></div></div>

<p>In the configuration file, you need to configure redsocks to listen to port 12345 and to redirect to port 1080. &#8220;generic&#8221; can be used as the redirector option. The github repository includes a sample configuration file. </p>
<p>And voila! We now have configured the system to transparently route connections for us. To summarize, here is the big picture:</p>

<div class="wp_syntax"><div class="code"><pre class="shell" style="font-family:monospace;">local machine -&gt; redsocks -&gt; SOCKS proxy -&gt; target server</pre></div></div>

<span class="net_nemein_favourites">4 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=6ada701aa04e11e081e4878977b042924292&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/6ada701aa04e11e081e4878977b042924292/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>1 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=6ada701aa04e11e081e4878977b042924292&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/6ada701aa04e11e081e4878977b042924292/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Sun, 26 Jun 2011 23:18:29 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-6ada701aa04e11e081e4878977b042924292</guid>
        </item>
        <item>
            <title>Regularized Least Squares</title>
            <link>http://www.mblondel.org/journal/2011/02/09/regularized-least-squares/</link>
            <description><![CDATA[
<p>Recently, I&#8217;ve contributed a bunch of improvements (sparse matrix support, classification, generalized cross-validation) to the ridge module in <a href="http://scikit-learn.sourceforge.net/">scikits.learn</a>. Since I&#8217;m receiving good feedback regarding my posts on Machine Learning, I&#8217;m taking this as an opportunity to summarize some important points about Regularized Least Squares, and more precisely Ridge regression.<br />
<span id="more-135"></span></p>
<h3>Ridge regression</h3>
<p>Given a centered input <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D%5E%2A%20%5Cin%20%5Cmathbf%7BR%7D%5ED&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}^* \in \mathbf{R}^D' title='\mathbf{x}^* \in \mathbf{R}^D' class='latex' />, a linear regression model predicts the output <img src='http://s.wordpress.com/latex.php?latex=y%5E%2A%20%5Cin%20%5Cmathbf%7BR%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y^* \in \mathbf{R}' title='y^* \in \mathbf{R}' class='latex' /> by</p>
<img src='http://s.wordpress.com/latex.php?latex=y%5E%2A%20%3D%20f%28%5Cmathbf%7Bx%7D%5E%2A%29%20%3D%20%5Cmathbf%7Bx%7D%5E%2A%20%5Ccdot%20%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y^* = f(\mathbf{x}^*) = \mathbf{x}^* \cdot \mathbf{w}' title='y^* = f(\mathbf{x}^*) = \mathbf{x}^* \cdot \mathbf{w}' class='latex' />
<p>where <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D%20%5Cin%20%5Cmathbf%7BR%7D%5ED&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w} \in \mathbf{R}^D' title='\mathbf{w} \in \mathbf{R}^D' class='latex' /> is a weight vector.</p>
<p>Given a set of training instances <img src='http://s.wordpress.com/latex.php?latex=X%20%3D%20%28%5Cmathbf%7Bx%7D_1%2C%5Cdots%2C%5Cmathbf%7Bx%7D_N%29%5ET&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X = (\mathbf{x}_1,\dots,\mathbf{x}_N)^T' title='X = (\mathbf{x}_1,\dots,\mathbf{x}_N)^T' class='latex' /> and its associated set of outputs <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7By%7D%20%3D%20%28y_1%2C%5Cdots%2Cy_N%29%5ET&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{y} = (y_1,\dots,y_N)^T' title='\mathbf{y} = (y_1,\dots,y_N)^T' class='latex' />, the method of least-squares finds the weight vector <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' /> which minimizes the sum of square differences between the predictions and the actual labels. </p>
<img src='http://s.wordpress.com/latex.php?latex=L_%7BLS%7D%28%5Cmathbf%7Bw%7D%29%20%3D%20%5Csum_%7Bi%3D1%7D%5EN%20%28y_i%20-%20f%28%5Cmathbf%7Bx%7D_i%29%29%5E2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_{LS}(\mathbf{w}) = \sum_{i=1}^N (y_i - f(\mathbf{x}_i))^2' title='L_{LS}(\mathbf{w}) = \sum_{i=1}^N (y_i - f(\mathbf{x}_i))^2' class='latex' />
<p>However, minizing the sum of square errors on the training set doesn&#8217;t necessarily mean that the model will do well on new data. For this reason, Ridge regression adds a regularization term, which controls the complexity of the weight vector. The objective function becomes </p>
<img src='http://s.wordpress.com/latex.php?latex=L_%7BRidge%7D%28%5Cmathbf%7Bw%7D%29%20%3D%20%5Csum_%7Bi%3D1%7D%5EN%20%28y_i%20-%20f%28%5Cmathbf%7Bx%7D_i%29%29%5E2%20%2B%20%5Cfrac%7B%5Clambda%7D%7B2%7D%20%5C%7C%5Cmathbf%7Bw%7D%5C%7C%5E2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_{Ridge}(\mathbf{w}) = \sum_{i=1}^N (y_i - f(\mathbf{x}_i))^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2' title='L_{Ridge}(\mathbf{w}) = \sum_{i=1}^N (y_i - f(\mathbf{x}_i))^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2' class='latex' />
<p>where <img src='http://s.wordpress.com/latex.php?latex=%5Clambda&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\lambda' title='\lambda' class='latex' /> is a parameter which controls the regularization strength. Minimizing <img src='http://s.wordpress.com/latex.php?latex=L_%7BRidge%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_{Ridge}' title='L_{Ridge}' class='latex' /> is thus a trade-off between minimizing the sum of errors and keeping the weight vector simple (i.e., small). Put differently, by introducing some <em>bias</em> (the regularization), we are purposely limiting the freedom of the model and thus reducing the <em>variance</em> of the weight vector.  Regularization is a simple way to avoid overfitting and often improves the prediction accuracy on new data, especially if the training set is small.</p>
<p>Calculating the gradient of <img src='http://s.wordpress.com/latex.php?latex=L_%7BRidge%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_{Ridge}' title='L_{Ridge}' class='latex' /> with respect to <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' /> and solving for <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' />, we find that <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' /> has a closed-form solution</p>
<img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D%20%3D%20H%5E%7B-1%7D%20X%5ET%20%5Cmathbf%7By%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w} = H^{-1} X^T \mathbf{y}' title='\mathbf{w} = H^{-1} X^T \mathbf{y}' class='latex' />
<p>where <img src='http://s.wordpress.com/latex.php?latex=H%20%3D%20%28X%5ET%20X%20%2B%20%5Clambda%20I%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H = (X^T X + \lambda I)' title='H = (X^T X + \lambda I)' class='latex' />. Note that the matrix inverse is just notational convenience. In practice, we simply solve the corresponding system of linear equations <img src='http://s.wordpress.com/latex.php?latex=H%5Cmathbf%7Bw%7D%3D%20X%5ET%20%5Cmathbf%7By%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H\mathbf{w}= X^T \mathbf{y}' title='H\mathbf{w}= X^T \mathbf{y}' class='latex' />. Since <img src='http://s.wordpress.com/latex.php?latex=H&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H' title='H' class='latex' /> is symmetric positive-definite, one typically first finds the <a href="http://en.wikipedia.org/wiki/Cholesky_decomposition">Cholesky decomposition</a> <img src='http://s.wordpress.com/latex.php?latex=H%3DLL%5ET&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H=LL^T' title='H=LL^T' class='latex' />. Then since <img src='http://s.wordpress.com/latex.php?latex=L&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L' title='L' class='latex' /> is lower triangular, solving the system is simply a matter of applying <a href="http://en.wikipedia.org/wiki/Triangular_matrix#Forward_and_back_substitution">forward and backward substitution</a>. Another commonly used method is the <a href="http://en.wikipedia.org/wiki/Conjugate_gradient_method">conjugate gradient</a>.</p>
<p>Note that not only does regularization improve accuracy, it is often necessary because it makes <img src='http://s.wordpress.com/latex.php?latex=H&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H' title='H' class='latex' /> invertible when <img src='http://s.wordpress.com/latex.php?latex=X%5ET%20X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X^T X' title='X^T X' class='latex' /> is not. See <a href="http://en.wikipedia.org/wiki/Tikhonov_regularization">Tikhonov regularization</a>.</p>
<h3>Kernel ridge</h3>
<p>As per the representer theorem, just like we did for <a href="http://www.mblondel.org/journal/2010/10/31/kernel-perceptron-in-python/">kernel perceptron</a>, we can replace <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' /> with <img src='http://s.wordpress.com/latex.php?latex=X%5ET%5Cboldsymbol%7B%5Calpha%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X^T\boldsymbol{\alpha}' title='X^T\boldsymbol{\alpha}' class='latex' />. Substituting into <img src='http://s.wordpress.com/latex.php?latex=L_%7BRidge%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_{Ridge}' title='L_{Ridge}' class='latex' /> then deriving with respect to <img src='http://s.wordpress.com/latex.php?latex=%5Cboldsymbol%7B%5Calpha%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\boldsymbol{\alpha}' title='\boldsymbol{\alpha}' class='latex' /> and solving for <img src='http://s.wordpress.com/latex.php?latex=%5Cboldsymbol%7B%5Calpha%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\boldsymbol{\alpha}' title='\boldsymbol{\alpha}' class='latex' />, we find</p>
<img src='http://s.wordpress.com/latex.php?latex=%5Cboldsymbol%7B%5Calpha%7D%20%3D%20G%5E%7B-1%7D%20%5Cmathbf%7By%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\boldsymbol{\alpha} = G^{-1} \mathbf{y}' title='\boldsymbol{\alpha} = G^{-1} \mathbf{y}' class='latex' />
<p>where <img src='http://s.wordpress.com/latex.php?latex=G%20%3D%28K%20%2B%20%5Clambda%20I%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G =(K + \lambda I)' title='G =(K + \lambda I)' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=K&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='K' title='K' class='latex' /> is the kernel matrix. The prediction function becomes</p>
<img src='http://s.wordpress.com/latex.php?latex=y%5E%2A%20%3D%20f%28%5Cmathbf%7Bx%7D%5E%2A%29%20%3D%20%5Csum_%7Bi%3D1%7D%5EN%20%5Calpha_i%20k%28%5Cmathbf%7Bx%7D%5E%2A%2C%5Cmathbf%7Bx%7D_i%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y^* = f(\mathbf{x}^*) = \sum_{i=1}^N \alpha_i k(\mathbf{x}^*,\mathbf{x}_i)' title='y^* = f(\mathbf{x}^*) = \sum_{i=1}^N \alpha_i k(\mathbf{x}^*,\mathbf{x}_i)' class='latex' />
<p>where <img src='http://s.wordpress.com/latex.php?latex=k&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='k' title='k' class='latex' /> is the kernel function. However, in contrast to SVM, <img src='http://s.wordpress.com/latex.php?latex=%5Cboldsymbol%7B%5Calpha%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\boldsymbol{\alpha}' title='\boldsymbol{\alpha}' class='latex' /> is not sparse. This means that the whole training set needs to be used at prediction time. </p>
<h3>Multiple output</h3>
<p>So far we have considered the case when there is only one output <img src='http://s.wordpress.com/latex.php?latex=y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y' title='y' class='latex' /> per instance. We can easily extend to the <img src='http://s.wordpress.com/latex.php?latex=M&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='M' title='M' class='latex' /> output case. To do that, we simply need to solve <img src='http://s.wordpress.com/latex.php?latex=HW%3D%20X%5ET%20Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='HW= X^T Y' title='HW= X^T Y' class='latex' />, where <img src='http://s.wordpress.com/latex.php?latex=W&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W' title='W' class='latex' /> is <img src='http://s.wordpress.com/latex.php?latex=D%20%5Ctimes%20M&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='D \times M' title='D \times M' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' /> is <img src='http://s.wordpress.com/latex.php?latex=N%20%5Ctimes%20M&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='N \times M' title='N \times M' class='latex' />. Note that the Cholesky decomposition needs to be computed only once. Since it takes <img src='http://s.wordpress.com/latex.php?latex=O%28D%5E3%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='O(D^3)' title='O(D^3)' class='latex' /> time, it is much more efficient to solve <img src='http://s.wordpress.com/latex.php?latex=HW%3DX%5ET%20Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='HW=X^T Y' title='HW=X^T Y' class='latex' /> directly than to solve <img src='http://s.wordpress.com/latex.php?latex=H%5Cmathbf%7Bw%7D%3DX%5ET%20%5Cmathbf%7By%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H\mathbf{w}=X^T \mathbf{y}' title='H\mathbf{w}=X^T \mathbf{y}' class='latex' />, <img src='http://s.wordpress.com/latex.php?latex=M&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='M' title='M' class='latex' /> times. </p>
<h3>Classification</h3>
<p>M-class classification can easily be achieved by combining multiple-output ridge regression and the 1-of-M coding scheme. This is sometimes referred by some authors as Regularized Least Square Classifier (RLSC). Concretely, we just need to construct a matrix <img src='http://s.wordpress.com/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' /> by setting <img src='http://s.wordpress.com/latex.php?latex=Y_%7Bij%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y_{ij}' title='Y_{ij}' class='latex' /> to <img src='http://s.wordpress.com/latex.php?latex=1&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='1' title='1' class='latex' /> if the <img src='http://s.wordpress.com/latex.php?latex=i%5E%7Bth%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='i^{th}' title='i^{th}' class='latex' /> instance is associated with the <img src='http://s.wordpress.com/latex.php?latex=j%5E%7Bth%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='j^{th}' title='j^{th}' class='latex' /> label and to <img src='http://s.wordpress.com/latex.php?latex=-1&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='-1' title='-1' class='latex' /> otherwise. Then prediction is just a matter of  taking the most confident label for each instance, just like one-vs-all SVM.</p>
<p>Note that the square error is sometimes criticized for classification because it penalizes overconfident predictions. For example, predicting 3 instead of 1 is penalized just as much as predicting -1 instead of 1, even though 3 would still give the correct prediction and -1 would not. However, in practice, RLSC often achieves accuracy on a par with other classifiers such as SVM.</p>
<h3>Efficient leave-one-out cross-validation</h3>
<p>Estimating <img src='http://s.wordpress.com/latex.php?latex=%5Clambda&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\lambda' title='\lambda' class='latex' /> is a model selection problem, usually performed by k-fold or leave-one-out cross-validation. The naive approach to leave-one-out is to learn <img src='http://s.wordpress.com/latex.php?latex=N&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='N' title='N' class='latex' /> models by holding out one instance at a time. Obviously, this is computationally very expensive. Fortunately, there exist nice computational tricks. Below, I show how to use them in the kernel case but they can equally be applied to the normal case.</p>
<p>The first trick is to take the eigendecomposition </p>
<img src='http://s.wordpress.com/latex.php?latex=K%3DQ%20%5CLambda%20Q%5ET.&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='K=Q \Lambda Q^T.' title='K=Q \Lambda Q^T.' class='latex' />
<p>Using basic linear algebra, we obtain</p>
<img src='http://s.wordpress.com/latex.php?latex=G%5E%7B-1%7D%3DQ%28%5CLambda%20%2B%20%5Clambda%20I%29%5E%7B-1%7D%20Q%5ET.&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G^{-1}=Q(\Lambda + \lambda I)^{-1} Q^T.' title='G^{-1}=Q(\Lambda + \lambda I)^{-1} Q^T.' class='latex' />
<p>Since <img src='http://s.wordpress.com/latex.php?latex=%28%5CLambda%20%2B%20%5Clambda%20I%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(\Lambda + \lambda I)' title='(\Lambda + \lambda I)' class='latex' /> is diagonal, it can easily be inverted by taking its reciprocal. Therefore by first paying the price of an eigendecomposition, <img src='http://s.wordpress.com/latex.php?latex=G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G' title='G' class='latex' /> can thus be inverted inexpensively for many different <img src='http://s.wordpress.com/latex.php?latex=%5Clambda&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\lambda' title='\lambda' class='latex' />.</p>
<p>Let <img src='http://s.wordpress.com/latex.php?latex=f_i%28%5Cmathbf%7Bx%7D_i%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f_i(\mathbf{x}_i)' title='f_i(\mathbf{x}_i)' class='latex' /> be the prediction value when the model was fitted to all training instances but <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D_i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}_i' title='\mathbf{x}_i' class='latex' />. The second trick, sometimes called generalized cross-validation, will allow us to calculate the <img src='http://s.wordpress.com/latex.php?latex=f_i%28%5Cmathbf%7Bx%7D_i%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f_i(\mathbf{x}_i)' title='f_i(\mathbf{x}_i)' class='latex' /> almost for free.</p>
<p>First, let <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7By%7D%5Ei&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{y}^i' title='\mathbf{y}^i' class='latex' /> be the label vector of the model when <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D_i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}_i' title='\mathbf{x}_i' class='latex' /> is held out.</p>
<img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7By%7D%5Ei%20%3D%20%28y_1%2C%5Cdots%2Cf_i%28%5Cmathbf%7Bx%7D_i%29%2C%5Cdots%2Cy_N%29%5ET&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{y}^i = (y_1,\dots,f_i(\mathbf{x}_i),\dots,y_N)^T' title='\mathbf{y}^i = (y_1,\dots,f_i(\mathbf{x}_i),\dots,y_N)^T' class='latex' />
<p>When we replace <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7By%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{y}' title='\mathbf{y}' class='latex' /> by <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7By%7D%5Ei&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{y}^i' title='\mathbf{y}^i' class='latex' /> in <img src='http://s.wordpress.com/latex.php?latex=L_%7BRidge%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_{Ridge}' title='L_{Ridge}' class='latex' />, clearly, <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx_i%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x_i}' title='\mathbf{x_i}' class='latex' /> doesn&#8217;t affect the solution (as desired) and we have</p>
<img src='http://s.wordpress.com/latex.php?latex=%5Cboldsymbol%7B%5Calpha%7D%5Ei%20%3D%20G%5E%7B-1%7D%20%5Cmathbf%7By%7D%5Ei&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\boldsymbol{\alpha}^i = G^{-1} \mathbf{y}^i' title='\boldsymbol{\alpha}^i = G^{-1} \mathbf{y}^i' class='latex' /><br />
<br/><br />
<img src='http://s.wordpress.com/latex.php?latex=f_i%28%5Cmathbf%7Bx%7D_i%29%20%3D%20%28K%20%5Cboldsymbol%7B%5Calpha%7D%5Ei%29_i.&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f_i(\mathbf{x}_i) = (K \boldsymbol{\alpha}^i)_i.' title='f_i(\mathbf{x}_i) = (K \boldsymbol{\alpha}^i)_i.' class='latex' />
<p>However, this is a circular definition, since <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7By%7D%5Ei&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{y}^i' title='\mathbf{y}^i' class='latex' /> itself depends on <img src='http://s.wordpress.com/latex.php?latex=f_i%28%5Cmathbf%7Bx%7D_i%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f_i(\mathbf{x}_i)' title='f_i(\mathbf{x}_i)' class='latex' />. We can calculate the difference with <img src='http://s.wordpress.com/latex.php?latex=f%28%5Cmathbf%7Bx%7D_i%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(\mathbf{x}_i)' title='f(\mathbf{x}_i)' class='latex' />, which will allow us to factorize <img src='http://s.wordpress.com/latex.php?latex=f_i%28%5Cmathbf%7Bx%7D_i%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f_i(\mathbf{x}_i)' title='f_i(\mathbf{x}_i)' class='latex' />.</p>
<img src='http://s.wordpress.com/latex.php?latex=f_i%28%5Cmathbf%7Bx%7D_i%29%20-%20f%28%5Cmathbf%7Bx%7D_i%29%20%3D%20%28K%20G%5E%7B-1%7D%20%5Cmathbf%7By%7D%5Ei%29_i%20-%20%28KG%5E%7B-1%7D%5Cmathbf%7By%7D%29_i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f_i(\mathbf{x}_i) - f(\mathbf{x}_i) = (K G^{-1} \mathbf{y}^i)_i - (KG^{-1}\mathbf{y})_i' title='f_i(\mathbf{x}_i) - f(\mathbf{x}_i) = (K G^{-1} \mathbf{y}^i)_i - (KG^{-1}\mathbf{y})_i' class='latex' />
<p>By definition, the <img src='http://s.wordpress.com/latex.php?latex=i%5E%7Bth%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='i^{th}' title='i^{th}' class='latex' /> element of <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7By%7D%5Ei&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{y}^i' title='\mathbf{y}^i' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7By%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{y}' title='\mathbf{y}' class='latex' /> are <img src='http://s.wordpress.com/latex.php?latex=f_i%28%5Cmathbf%7Bx%7D_i%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f_i(\mathbf{x}_i)' title='f_i(\mathbf{x}_i)' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=y_i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y_i' title='y_i' class='latex' />, respectively. Therefore,</p>
<img src='http://s.wordpress.com/latex.php?latex=f_i%28%5Cmathbf%7Bx%7D_i%29%20-%20f%28%5Cmathbf%7Bx%7D_i%29%20%3D%20%28K%20G%5E%7B-1%7D%29_%7Bii%7D%20%28f_i%28%5Cmathbf%7Bx%7D_i%29%20-%20y_i%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f_i(\mathbf{x}_i) - f(\mathbf{x}_i) = (K G^{-1})_{ii} (f_i(\mathbf{x}_i) - y_i)' title='f_i(\mathbf{x}_i) - f(\mathbf{x}_i) = (K G^{-1})_{ii} (f_i(\mathbf{x}_i) - y_i)' class='latex' />
<p>and finally, by simplifying, we obtain</p>
<img src='http://s.wordpress.com/latex.php?latex=f_i%28%5Cmathbf%7Bx%7D_i%29%20%3D%20%5Cfrac%7Bf%28%5Cmathbf%7Bx%7D_i%29%20-%28K%20G%5E%7B-1%7D%29_%7Bii%7D%20y_i%7D%7B1%20-%20%28K%20G%5E%7B-1%7D%29_%7Bii%7D%7D.&#038;bg=ffffff&#038;fg=000000&#038;s=1' alt='f_i(\mathbf{x}_i) = \frac{f(\mathbf{x}_i) -(K G^{-1})_{ii} y_i}{1 - (K G^{-1})_{ii}}.' title='f_i(\mathbf{x}_i) = \frac{f(\mathbf{x}_i) -(K G^{-1})_{ii} y_i}{1 - (K G^{-1})_{ii}}.' class='latex' />
<p>Similar computational tricks exist for Gaussian Process Regression as well.</p>
<h3>References</h3>
<p><a href="http://www.mit.edu/~9.520/spring07/Classes/rlsslides.pdf">Regularized Least Squares</a>, Ryan Rifkin</p>
<span class="net_nemein_favourites">1 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=36362a30345a11e0bad41d41a7b549214921&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/36362a30345a11e0bad41d41a7b549214921/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>2 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=36362a30345a11e0bad41d41a7b549214921&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/36362a30345a11e0bad41d41a7b549214921/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Wed, 09 Feb 2011 14:20:28 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-36362a30345a11e0bad41d41a7b549214921</guid>
        </item>
        <item>
            <title>Kernel Perceptron in Python</title>
            <link>http://www.mblondel.org/journal/2010/10/31/kernel-perceptron-in-python/</link>
            <description><![CDATA[
<p>The Perceptron (Rosenblatt, 1957) is one of the oldest and simplest Machine Learning algorithms. It&#8217;s also trivial to <em>kernelize</em>, which makes it an ideal candidate to gain insights on kernel methods.<br />
<span id="more-129"></span></p>
<h3>Perceptron</h3>
<p>The Perceptron predicts the class of an input <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D%20%5Cin%20%5Cmathcal%7BX%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x} \in \mathcal{X}' title='\mathbf{x} \in \mathcal{X}' class='latex' /> with the function</p>
<img src='http://s.wordpress.com/latex.php?latex=f%28%5Cmathbf%7Bx%7D%29%20%3D%20sign%28%5Cmathbf%7Bw%7D%5Ccdot%5Cphi%28%5Cmathbf%7Bx%7D%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(\mathbf{x}) = sign(\mathbf{w}\cdot\phi(\mathbf{x}))' title='f(\mathbf{x}) = sign(\mathbf{w}\cdot\phi(\mathbf{x}))' class='latex' />
<p>where sign(y) = +1 if y > 0 and -1 if y < 0, <img src='http://s.wordpress.com/latex.php?latex=%5Cphi&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\phi' title='\phi' class='latex' /> is a feature-space transformation and  <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' /> is a feature weight vector. If  <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}' title='\mathbf{x}' class='latex' /> is already a feature vector and a projection to another space is not needed, then <img src='http://s.wordpress.com/latex.php?latex=%5Cphi%28%5Cmathbf%7Bx%7D%29%3D%5Cmathbf%7Bx%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\phi(\mathbf{x})=\mathbf{x}' title='\phi(\mathbf{x})=\mathbf{x}' class='latex' />.</p>
<p>Given a data set comprising <img src='http://s.wordpress.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='n' title='n' class='latex' /> training examples <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D_1%2C%5Cdots%2C%5Cmathbf%7Bx%7D_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}_1,\dots,\mathbf{x}_n' title='\mathbf{x}_1,\dots,\mathbf{x}_n' class='latex' /> and their corresponding labels <img src='http://s.wordpress.com/latex.php?latex=y_1%2C%5Cdots%2Cy_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y_1,\dots,y_n' title='y_1,\dots,y_n' class='latex' />, where <img src='http://s.wordpress.com/latex.php?latex=y_n%20%5Cin%20%5C%7B-1%2C%2B1%5C%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y_n \in \{-1,+1\}' title='y_n \in \{-1,+1\}' class='latex' />, the Perceptron makes a prediction for each <img src='http://s.wordpress.com/latex.php?latex=%28%5Cmathbf%7Bx%7D_i%2C%20y_i%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(\mathbf{x}_i, y_i)' title='(\mathbf{x}_i, y_i)' class='latex' /> using the current estimate of <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' />. When the prediction is correct (equal to the label <img src='http://s.wordpress.com/latex.php?latex=y_i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y_i' title='y_i' class='latex' />), the algorithm jumps to the next example. When the prediction is incorrect, in order to correct for the mistake, if <img src='http://s.wordpress.com/latex.php?latex=y_i%20%3D%20%2B1&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y_i = +1' title='y_i = +1' class='latex' /> then <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' /> is incremented by <img src='http://s.wordpress.com/latex.php?latex=%5Cphi%28%5Cmathbf%7Bx%7D_i%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\phi(\mathbf{x}_i)' title='\phi(\mathbf{x}_i)' class='latex' />, otherwise it is decremented by <img src='http://s.wordpress.com/latex.php?latex=%5Cphi%28%5Cmathbf%7Bx%7D_i%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\phi(\mathbf{x}_i)' title='\phi(\mathbf{x}_i)' class='latex' />. Since <img src='http://s.wordpress.com/latex.php?latex=y_i%20%5Cin%20%5C%7B-1%2C%2B1%5C%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y_i \in \{-1,+1\}' title='y_i \in \{-1,+1\}' class='latex' />, the update rule is thus:</p>
<img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D%20%3D%20%5Cmathbf%7Bw%7D%20%2B%20y_i%20%5Cphi%28%5Cmathbf%7Bx%7D_i%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w} = \mathbf{w} + y_i \phi(\mathbf{x}_i)' title='\mathbf{w} = \mathbf{w} + y_i \phi(\mathbf{x}_i)' class='latex' />
<p><center><img src="http://www.mblondel.org/images/perceptron_update.png" /></center><br />
Figure 1: The decision boundary (hyperplane), to which <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' /> is normal, partitions the vector space into two sets, one for each class. An update changes the direction of the decision boundary to correct for the incorrect prediction. (From this <a href="http://www.cse.wustl.edu/~mgeorg/readPapers/kernelTrick/kernels.pdf">document</a>)</p>
<p><center><img src="http://www.mblondel.org/images/perceptron_linear.png" /></center><br />
Figure 2: Decision boundary in the linear case. </p>
<h3>Kernel Perceptron</h3>
<p>From the update rule above, we clearly see that, if <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' /> is initialized to the zero vector, it is a linear combination of the training examples.</p>
<img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D%20%3D%20%5Csum_i%20%5Calpha_i%20y_i%20%5Cphi%28%5Cmathbf%7Bx%7D_i%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w} = \sum_i \alpha_i y_i \phi(\mathbf{x}_i)' title='\mathbf{w} = \sum_i \alpha_i y_i \phi(\mathbf{x}_i)' class='latex' />
<p>Injecting <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' /> into the prediction function <img src='http://s.wordpress.com/latex.php?latex=f%28%5Cmathbf%7Bx%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(\mathbf{x})' title='f(\mathbf{x})' class='latex' />, we get</p>
<img src='http://s.wordpress.com/latex.php?latex=f%28%5Cmathbf%7Bx%7D%29%20%3D%20sign%28%5Csum_i%20%5Calpha_i%20y_i%20%5Cphi%28%5Cmathbf%7Bx%7D_i%29%5Ccdot%5Cphi%28%5Cmathbf%7Bx%7D%29%29%20%3D%20sign%28%5Csum_i%20%5Calpha_i%20y_i%20K%28%5Cmathbf%7Bx%7D_i%2C%5Cmathbf%7Bx%7D%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(\mathbf{x}) = sign(\sum_i \alpha_i y_i \phi(\mathbf{x}_i)\cdot\phi(\mathbf{x})) = sign(\sum_i \alpha_i y_i K(\mathbf{x}_i,\mathbf{x}))' title='f(\mathbf{x}) = sign(\sum_i \alpha_i y_i \phi(\mathbf{x}_i)\cdot\phi(\mathbf{x})) = sign(\sum_i \alpha_i y_i K(\mathbf{x}_i,\mathbf{x}))' class='latex' />
<p>where <img src='http://s.wordpress.com/latex.php?latex=K%28%5Cmathbf%7Bx%7D_i%2C%5Cmathbf%7Bx%7D%29%20%3D%20%5Cphi%28%5Cmathbf%7Bx%7D_i%29%5Ccdot%5Cphi%28%5Cmathbf%7Bx%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='K(\mathbf{x}_i,\mathbf{x}) = \phi(\mathbf{x}_i)\cdot\phi(\mathbf{x})' title='K(\mathbf{x}_i,\mathbf{x}) = \phi(\mathbf{x}_i)\cdot\phi(\mathbf{x})' class='latex' /> is a Mercer kernel.</p>
<p>The update rule for when a mistake is made predicting <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D_i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}_i' title='\mathbf{x}_i' class='latex' /> now simply becomes</p>
<img src='http://s.wordpress.com/latex.php?latex=%5Calpha_i%20%3D%20%5Calpha_i%20%2B%201&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\alpha_i = \alpha_i + 1' title='\alpha_i = \alpha_i + 1' class='latex' />
<p>i.e., <img src='http://s.wordpress.com/latex.php?latex=%5Calpha_i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\alpha_i' title='\alpha_i' class='latex' /> is the number of times a mistake has been made with respect to <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D_i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}_i' title='\mathbf{x}_i' class='latex' />.</p>
<p>A few remarks:</p>
<ul>
<li>Instead of learning a weight vector <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' /> with respect to features, we&#8217;re now learning a weight vector <img src='http://s.wordpress.com/latex.php?latex=%5Cboldsymbol%7B%5Calpha%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\boldsymbol{\alpha}' title='\boldsymbol{\alpha}' class='latex' /> with respect to examples.</li>
<li>To predict the class of an input <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}' title='\mathbf{x}' class='latex' />, we need to store the training examples <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D_i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}_i' title='\mathbf{x}_i' class='latex' />. Kernel methods are memory-based methods, like k-NN. However, we only need to store examples for which a mistake has been made, i.e. <img src='http://s.wordpress.com/latex.php?latex=%7B%5Calpha%7D_i%20%5Cneq%200%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='{\alpha}_i \neq 0 ' title='{\alpha}_i \neq 0 ' class='latex' />. In the context of Support Vector Machines (SVMs), these are called support vectors. SVMs, however, not only store examples for which a mistake has been made, they also store examples that lie inside the margin, i.e. <img src='http://s.wordpress.com/latex.php?latex=y_i%20%28%5Cmathbf%7Bw%7D%20%5Ccdot%20%5Cphi%28%5Cmathbf%7Bx%7D_i%29%29%20%5Cle%201&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y_i (\mathbf{w} \cdot \phi(\mathbf{x}_i)) \le 1' title='y_i (\mathbf{w} \cdot \phi(\mathbf{x}_i)) \le 1' class='latex' />. (See Figure 4 from my <a href="http://www.mblondel.org/journal/2010/09/19/support-vector-machines-in-python/">post on SVMs</a>)</li>
<li>In the online learning setting, the number of support vectors can grow and grow as more mistakes are made. The Forgetron is an extension of the Kernel Perceptron which can learn with a &#8220;memory budget&#8221;. When the budget is exceeded, some support vectors are &#8220;forgotten&#8221;.</li>
<li>For some kinds of objects like sequences, trees and graphs, it might be difficult to map objects to feature vectors while it is easy to come up with a similarity measure <img src='http://s.wordpress.com/latex.php?latex=K%28%5Cmathbf%7Bx%7D_i%2C%5Cmathbf%7Bx%7D_j%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='K(\mathbf{x}_i,\mathbf{x}_j)' title='K(\mathbf{x}_i,\mathbf{x}_j)' class='latex' /> between any two objects <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D_i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}_i' title='\mathbf{x}_i' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}_j' title='\mathbf{x}_j' class='latex' />. In this case, kernel methods are very useful, since they can be used &#8220;as is&#8221;.</li>
</ul>
<p>For a brief and intuitive introduction to kernel classifiers, I highly recommend <a href="http://128.148.32.110/courses/csci1950-f/docs/lecture_10-27.pdf">these slides</a>, by Mark Johnson.</p>
<p><center><img src="http://www.mblondel.org/images/perceptron_gaussian.png" /></center><br />
Figure 3: Decision boundary when using a gaussian kernel. Green dots indicate support vectors.</p>
<p>The voted and average Perceptrons are two straightforward extensions to the Perceptron, which for some applications are competitive with SVMs. For details, see &#8220;Large Margin Classiﬁcation Using the Perceptron Algorithm&#8221; by Freund and Schapire.</p>
<h3>Source</h3>
<p><a href="http://gist.github.com/656147">http://gist.github.com/656147</a></p>
<span class="net_nemein_favourites">0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=457b67d41b6011e0a0e7696c1a6799069906&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/457b67d41b6011e0a0e7696c1a6799069906/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=457b67d41b6011e0a0e7696c1a6799069906&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/457b67d41b6011e0a0e7696c1a6799069906/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Sun, 31 Oct 2010 05:36:32 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-457b67d41b6011e0a0e7696c1a6799069906</guid>
        </item>
        <item>
            <title>Support Vector Machines in Python</title>
            <link>http://www.mblondel.org/journal/2010/09/19/support-vector-machines-in-python/</link>
            <description><![CDATA[
<p><a href="http://en.wikipedia.org/wiki/Support_vector_machine">Support Vector Machines</a> (SVM) are the state-of-the-art classifier in many applications and have become ubiquitous thanks to the wealth of open-source libraries implementing them. However, you learn a lot more by actually doing than by just reading, so let&#8217;s play a little bit with SVM in Python! To make it easier to read, we will use the same notation as in Christopher Bishop&#8217;s book &#8220;Pattern Recognition and Machine Learning&#8221;.<br />
<span id="more-128"></span></p>
<h3>Maximum margin</h3>
<p>In SVM, the class of an input vector <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}' title='\mathbf{x}' class='latex' /> can be decided by evaluating the sign of <img src='http://s.wordpress.com/latex.php?latex=y%28%5Cmathbf%7Bx%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y(\mathbf{x})' title='y(\mathbf{x})' class='latex' />.</p>
<img src='http://s.wordpress.com/latex.php?latex=%287.1%29~y%28%5Cmathbf%7Bx%7D%29%20%3D%20%5Cmathbf%7Bw%7D%5ET%5Cphi%28%5Cmathbf%7Bx%7D%29%2Bb&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(7.1)~y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x})+b' title='(7.1)~y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x})+b' class='latex' />
<p>If <img src='http://s.wordpress.com/latex.php?latex=y%28%5Cmathbf%7Bx%7D%29%20%3E%200&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y(\mathbf{x}) &gt; 0' title='y(\mathbf{x}) &gt; 0' class='latex' /> we assign <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}' title='\mathbf{x}' class='latex' />  to class +1 and if <img src='http://s.wordpress.com/latex.php?latex=y%28%5Cmathbf%7Bx%7D%29%20%3C%200&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y(\mathbf{x}) &lt; 0' title='y(\mathbf{x}) &lt; 0' class='latex' />, we assign it to class -1. Here <img src='http://s.wordpress.com/latex.php?latex=%5Cphi%28%5Cmathbf%7Bx%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\phi(\mathbf{x})' title='\phi(\mathbf{x})' class='latex' /> is a feature-space transformation, which can map <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}' title='\mathbf{x}' class='latex' /> to a space of higher, possibly infinite, dimensions.</p>
<p>Given a data set comprising <img src='http://s.wordpress.com/latex.php?latex=N&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='N' title='N' class='latex' /> input vectors <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D_1%2C%5Cdots%2C%5Cmathbf%7Bx%7D_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}_1,\dots,\mathbf{x}_n' title='\mathbf{x}_1,\dots,\mathbf{x}_n' class='latex' /> and their corresponding labels <img src='http://s.wordpress.com/latex.php?latex=t_1%2C%5Cdots%2Ct_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='t_1,\dots,t_n' title='t_1,\dots,t_n' class='latex' />, where <img src='http://s.wordpress.com/latex.php?latex=t_n%20%5Cin%20%5C%7B-1%2C%2B1%5C%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='t_n \in \{-1,+1\}' title='t_n \in \{-1,+1\}' class='latex' />, we would like to find <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=b&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='b' title='b' class='latex' /> such that it explains the training data: <img src='http://s.wordpress.com/latex.php?latex=y%28%5Cmathbf%7Bx%7D_n%29%20%5Cge%201&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y(\mathbf{x}_n) \ge 1' title='y(\mathbf{x}_n) \ge 1' class='latex' /> when <img src='http://s.wordpress.com/latex.php?latex=t_n%3D%2B1&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='t_n=+1' title='t_n=+1' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=y%28%5Cmathbf%7Bx%7D_n%29%20%5Cle%201&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y(\mathbf{x}_n) \le 1' title='y(\mathbf{x}_n) \le 1' class='latex' /> when <img src='http://s.wordpress.com/latex.php?latex=t_n%3D-1&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='t_n=-1' title='t_n=-1' class='latex' />. This can be rewritten in a single constraint:</p>
<img src='http://s.wordpress.com/latex.php?latex=%287.5%29~t_n%20%28%5Cmathbf%7Bw%7D%5ET%5Cphi%28%5Cmathbf%7Bx%7D%29%2Bb%29%20%5Cge%201%2C~n%3D1%2C%5Cdots%2CN&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(7.5)~t_n (\mathbf{w}^T\phi(\mathbf{x})+b) \ge 1,~n=1,\dots,N' title='(7.5)~t_n (\mathbf{w}^T\phi(\mathbf{x})+b) \ge 1,~n=1,\dots,N' class='latex' />
<p>In addition, <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=b&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='b' title='b' class='latex' /> are chosen so that the distance between the decision boundary <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D%5ET%5Cphi%28%5Cmathbf%7Bx%7D%29%2Bb%3D0&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}^T\phi(\mathbf{x})+b=0' title='\mathbf{w}^T\phi(\mathbf{x})+b=0' class='latex' /> (a line in the 2-d case, a plane in the 3-d case, a hyperplane in the n-d case) and the closest points to it is maximized. This distance is called the margin, hence the name <em>maximum margin</em> classifier. Geometrically, the margin is found to be <img src='http://s.wordpress.com/latex.php?latex=2%2F%5C%7C%5Cmathbf%7Bw%7D%5C%7C%5E2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='2/\|\mathbf{w}\|^2' title='2/\|\mathbf{w}\|^2' class='latex' /> and so the maximum margin problem can be equivalently expressed as the minimization problem:</p>
<img src='http://s.wordpress.com/latex.php?latex=%287.6%29~arg%20min_%7B%5Cmathbf%7Bw%7D%2Cb%7D%5Cfrac%7B1%7D%7B2%7D%5C%7C%5Cmathbf%7Bw%7D%5C%7C%5E2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(7.6)~arg min_{\mathbf{w},b}\frac{1}{2}\|\mathbf{w}\|^2' title='(7.6)~arg min_{\mathbf{w},b}\frac{1}{2}\|\mathbf{w}\|^2' class='latex' />
<p>subject to constraint (7.5).</p>
<p><img src="http://www.mblondel.org/images/svm_linear.png" /></p>
<p>Figure 1: The linearly separable case. The decision line is the plain line and the margin is the gap between the two dotted lines. Only 3 support vectors (green dots) out of 180 training examples are necessary.</p>
<h3>Dual representation</h3>
<p>Since this is a constrained optimization problem, we can introduce <a href="http://en.wikipedia.org/wiki/Lagrange_multipliers">Lagrange multipliers</a> <img src='http://s.wordpress.com/latex.php?latex=a_n%20%5Cge%200&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='a_n \ge 0' title='a_n \ge 0' class='latex' /> (one per training example), differentiate the Lagrangian function with respect to <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=b&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='b' title='b' class='latex' /> and inject the solution back in the Lagrangian function (equations (7.7) to (7.9) in Bishop&#8217;s book), so that it depends only on the Lagrange multipliers. Doing so, we find that the maximum margin equivalently emerges from the maximization of</p>
<img src='http://s.wordpress.com/latex.php?latex=%287.10%29~%5Ctilde%7BL%7D%28%5Cmathbf%7Ba%7D%29%20%3D%20%5Csum_%7Bn%3D1%7D%5ENa_n-%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bn%3D1%7D%5EN%5Csum_%7Bm%3D1%7D%5EN%20a_n%20a_m%20t_n%20t_m%20k%28%5Cmathbf%7Bx%7D_n%2C%5Cmathbf%7Bx%7D_m%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(7.10)~\tilde{L}(\mathbf{a}) = \sum_{n=1}^Na_n-\frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N a_n a_m t_n t_m k(\mathbf{x}_n,\mathbf{x}_m)' title='(7.10)~\tilde{L}(\mathbf{a}) = \sum_{n=1}^Na_n-\frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N a_n a_m t_n t_m k(\mathbf{x}_n,\mathbf{x}_m)' class='latex' />
<p>subject to the constraints</p>
<img src='http://s.wordpress.com/latex.php?latex=%287.11%29~a_n%20%5Cge%200%2C~n%3D1%2C%5Cdots%2CN&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(7.11)~a_n \ge 0,~n=1,\dots,N' title='(7.11)~a_n \ge 0,~n=1,\dots,N' class='latex' /><br />
</br><br />
<img src='http://s.wordpress.com/latex.php?latex=%287.12%29~%5Csum_%7Bn%3D1%7D%5ENa_nt_n%20%3D%200&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(7.12)~\sum_{n=1}^Na_nt_n = 0' title='(7.12)~\sum_{n=1}^Na_nt_n = 0' class='latex' />
<p>This is the so-called <em>dual representation</em> and is a <a href="http://en.wikipedia.org/wiki/Quadratic_programming">quadratic programming</a> (QP) problem. <img src='http://s.wordpress.com/latex.php?latex=k%28%5Cmathbf%7Bx%7D_n%2C%5Cmathbf%7Bx%7D_m%29%3D%20%5Cphi%28%5Cmathbf%7Bx%7D_n%29%5ET%5Cphi%28%5Cmathbf%7Bx%7D_m%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='k(\mathbf{x}_n,\mathbf{x}_m)= \phi(\mathbf{x}_n)^T\phi(\mathbf{x}_m)' title='k(\mathbf{x}_n,\mathbf{x}_m)= \phi(\mathbf{x}_n)^T\phi(\mathbf{x}_m)' class='latex' /> is called the kernel function.</p>
<p>Similarly to the objective function, <img src='http://s.wordpress.com/latex.php?latex=y%28%5Cmathbf%7Bx%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y(\mathbf{x})' title='y(\mathbf{x})' class='latex' /> can also be re-expressed solely in terms of the Lagrange multipliers.</p>
<img src='http://s.wordpress.com/latex.php?latex=%287.13%29~y%28%5Cmathbf%7Bx%7D%29%3D%20%5Csum_%7Bn%3D1%7D%5EN%20a_n%20t_n%20k%28%5Cmathbf%7Bx%7D%2C%5Cmathbf%7Bx%7D_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(7.13)~y(\mathbf{x})= \sum_{n=1}^N a_n t_n k(\mathbf{x},\mathbf{x}_n)' title='(7.13)~y(\mathbf{x})= \sum_{n=1}^N a_n t_n k(\mathbf{x},\mathbf{x}_n)' class='latex' />
<p>The important thing to notice here is that we&#8217;ve gone from a sum over <img src='http://s.wordpress.com/latex.php?latex=M&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='M' title='M' class='latex' /> dimensions (the <a href="http://en.wikipedia.org/wiki/Dot_product">dot product</a> in equation (7.1)) to a sum over <img src='http://s.wordpress.com/latex.php?latex=N&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='N' title='N' class='latex' /> points. This may seem like a disadvantage as the number of training examples <img src='http://s.wordpress.com/latex.php?latex=N&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='N' title='N' class='latex' /> is usually bigger than the number of dimensions <img src='http://s.wordpress.com/latex.php?latex=M&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='M' title='M' class='latex' />. However, this is very useful and is called the <a href="http://en.wikipedia.org/wiki/Kernel_trick">kernel trick</a>: this allows to use SVM, originally a linear classifier, to solve a non-linear problem by mapping the original non-linear observations into a higher-dimensional space, but without explicitly computing <img src='http://s.wordpress.com/latex.php?latex=%5Cphi%28%5Cmathbf%7Bx%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\phi(\mathbf{x})' title='\phi(\mathbf{x})' class='latex' />.</p>
<p>In many situations, only a small proportion of the Lagrange multipliers <img src='http://s.wordpress.com/latex.php?latex=a_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='a_n' title='a_n' class='latex' /> will be non-zero, therefore we only need to store the corresponding training examples <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}_n' title='\mathbf{x}_n' class='latex' />. These are called the <em>support vectors</em> and this is why SVMs are sometimes called sparse kernel machines.</p>
<p>That being said, in the linear case, i.e. when <img src='http://s.wordpress.com/latex.php?latex=%5Cphi%28%5Cmathbf%7Bx%7D%29%3D%5Cmathbf%7Bx%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\phi(\mathbf{x})=\mathbf{x}' title='\phi(\mathbf{x})=\mathbf{x}' class='latex' />, it is faster to directly compute <img src='http://s.wordpress.com/latex.php?latex=y%28%5Cmathbf%7Bx%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y(\mathbf{x})' title='y(\mathbf{x})' class='latex' /> from equation (7.1). <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=b&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='b' title='b' class='latex' /> can be computed in terms of the Lagrange multipliers by equations (7.8) and (7.18) in Bishop&#8217;s book.</p>
<p><img src="http://www.mblondel.org/images/svm_non_linear.png" /></p>
<p>Figure 2: The non-linearly separable case. Example of a gaussian kernel with parameter sigma=5.0. Perfect prediction is achieved on the held-out 20 data points.</p>
<h3>QP solver</h3>
<p>We want to find the Lagrange multipliers <img src='http://s.wordpress.com/latex.php?latex=a_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='a_n' title='a_n' class='latex' /> maximizing equation (7.10) subject to the constraints (7.11) and (7.11). This can be done by a standard QP solver such as <a href="http://abel.ee.ucla.edu/cvxopt/">cvxopt</a>.</p>
<p>Minimize</p>
<img src='http://s.wordpress.com/latex.php?latex=%5Cfrac%7B1%7D%7B2%7D%20%5Cmathbf%7Bx%7D%5ET%20P%5Cmathbf%7Bx%7D%20%2B%20%5Cmathbf%7Bq%7D%5ET%20%5Cmathbf%7Bx%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\frac{1}{2} \mathbf{x}^T P\mathbf{x} + \mathbf{q}^T \mathbf{x}' title='\frac{1}{2} \mathbf{x}^T P\mathbf{x} + \mathbf{q}^T \mathbf{x}' class='latex' />
<p>subject to</p>
<img src='http://s.wordpress.com/latex.php?latex=G%5Cmathbf%7Bx%7D%20%5Cleq%20%5Cmathbf%20h&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G\mathbf{x} \leq \mathbf h' title='G\mathbf{x} \leq \mathbf h' class='latex' /> (inequality constraint)<br />
</br><br />
<img src='http://s.wordpress.com/latex.php?latex=A%5Cmathbf%7Bx%7D%20%3D%20%5Cmathbf%20b&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='A\mathbf{x} = \mathbf b' title='A\mathbf{x} = \mathbf b' class='latex' /> (equality constraint)</p>
<p>The unknow is <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bx%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{x}' title='\mathbf{x}' class='latex' />, which in our case correspond to the Lagrange multipliers <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Ba%7D%3Da_1%2C%5Cdots%2Ca_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{a}=a_1,\dots,a_n' title='\mathbf{a}=a_1,\dots,a_n' class='latex' />. We just need to rework the formulation a little bit to use matrix notation and be a minimization (hence the -1 multiplicative factors).</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;"># Gram matrix</span>
K = np.<span style="color: black;">zeros</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>n_samples, n_samples<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>n_samples<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">for</span> j <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>n_samples<span style="color: black;">&#41;</span>:
        K<span style="color: black;">&#91;</span>i,j<span style="color: black;">&#93;</span> = <span style="color: #008000;">self</span>.<span style="color: black;">kernel</span><span style="color: black;">&#40;</span>X<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span>, X<span style="color: black;">&#91;</span>j<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
&nbsp;
P = cvxopt.<span style="color: black;">matrix</span><span style="color: black;">&#40;</span>np.<span style="color: black;">outer</span><span style="color: black;">&#40;</span>y,y<span style="color: black;">&#41;</span> <span style="color: #66cc66;">*</span> K<span style="color: black;">&#41;</span>
q = cvxopt.<span style="color: black;">matrix</span><span style="color: black;">&#40;</span>np.<span style="color: black;">ones</span><span style="color: black;">&#40;</span>n_samples<span style="color: black;">&#41;</span> <span style="color: #66cc66;">*</span> -<span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>
A = cvxopt.<span style="color: black;">matrix</span><span style="color: black;">&#40;</span>y, <span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>,n_samples<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
b = cvxopt.<span style="color: black;">matrix</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0.0</span><span style="color: black;">&#41;</span>
G = cvxopt.<span style="color: black;">matrix</span><span style="color: black;">&#40;</span>np.<span style="color: black;">diag</span><span style="color: black;">&#40;</span>np.<span style="color: black;">ones</span><span style="color: black;">&#40;</span>n_samples<span style="color: black;">&#41;</span> <span style="color: #66cc66;">*</span> -<span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
h = cvxopt.<span style="color: black;">matrix</span><span style="color: black;">&#40;</span>np.<span style="color: black;">zeros</span><span style="color: black;">&#40;</span>n_samples<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Solve QP problem</span>
solution = cvxopt.<span style="color: black;">solvers</span>.<span style="color: black;">qp</span><span style="color: black;">&#40;</span>P, q, G, h, A, b<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Lagrange multipliers</span>
a = np.<span style="color: black;">ravel</span><span style="color: black;">&#40;</span>solution<span style="color: black;">&#91;</span><span style="color: #483d8b;">'x'</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Note here that <img src='http://s.wordpress.com/latex.php?latex=P&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='P' title='P' class='latex' /> is a <img src='http://s.wordpress.com/latex.php?latex=N%20%5Ctimes%20N&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='N \times N' title='N \times N' class='latex' /> matrix. Thus, a standard QP solver can&#8217;t be used for a large number of training examples, as P needs to be stored in memory. There exists a number of algorithms in order to decompose the original QP problem into smaller QP problems that target only a few training samples at a time. One such algorithm is <a href="http://en.wikipedia.org/wiki/Sequential_Minimal_Optimization">Sequential Minimal Optimization</a> (SMO). One advantage of SMO is that the smaller QP problems can be solved analytically and so SMO doesn&#8217;t even need a QP solver.</p>
<h3>Soft margin</h3>
<p>The problem with the formulation we have used thus far is that it doesn&#8217;t allow for misclassification of the training examples. This can lead to poor generalization if there is overlap between the distributions of the two classes. To solve this problem, we can rework constraint (7.5) as</p>
<p><img src='http://s.wordpress.com/latex.php?latex=%287.20%29~t_n%20y%28%5Cmathbf%7Bx%7D_n%29%20%5Cge%201%20-%20%5Cxi_n%2C~n%3D1%2C%5Cdots%2CN&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(7.20)~t_n y(\mathbf{x}_n) \ge 1 - \xi_n,~n=1,\dots,N' title='(7.20)~t_n y(\mathbf{x}_n) \ge 1 - \xi_n,~n=1,\dots,N' class='latex' /><br />
</br><br />
<img src='http://s.wordpress.com/latex.php?latex=%5Cxi_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\xi_n' title='\xi_n' class='latex' /> are called slack variables and are introduced to allow the misclassification of some examples. If <img src='http://s.wordpress.com/latex.php?latex=%5Cxi_n%3D0&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\xi_n=0' title='\xi_n=0' class='latex' />, the corresponding training example is correctly classified. If <img src='http://s.wordpress.com/latex.php?latex=0%20%3C%20%5Cxi_n%20%5Cle%201&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='0 &lt; \xi_n \le 1' title='0 &lt; \xi_n \le 1' class='latex' />, the training example lies inside the margin but is still on the correct side of the decision boundary. If <img src='http://s.wordpress.com/latex.php?latex=%5Cxi_n%20%3E%201&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\xi_n &gt; 1' title='\xi_n &gt; 1' class='latex' />, the training example is misclassified. Equation (7.6) then becomes</p>
<p><img src='http://s.wordpress.com/latex.php?latex=%287.21%29~C%20%5Csum_%7Bn%3D1%7D%5EN%20%5Cxi_n%20%2B%20%5Cfrac%7B1%7D%7B2%7D%5C%7C%5Cmathbf%7Bw%7D%5C%7C%5E2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(7.21)~C \sum_{n=1}^N \xi_n + \frac{1}{2}\|\mathbf{w}\|^2' title='(7.21)~C \sum_{n=1}^N \xi_n + \frac{1}{2}\|\mathbf{w}\|^2' class='latex' /><br />
</br></p>
<p><img src='http://s.wordpress.com/latex.php?latex=C%20%3E%200&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='C &gt; 0' title='C &gt; 0' class='latex' /> is the parameter which controls the trade-off between the slack variable penality and the margin. Again, we can introduce Lagrange multipliers, derive the Lagrangian function with respect to <img src='http://s.wordpress.com/latex.php?latex=%5Cmathbf%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mathbf{w}' title='\mathbf{w}' class='latex' />, <img src='http://s.wordpress.com/latex.php?latex=b&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='b' title='b' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=%5Cxi_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\xi_n' title='\xi_n' class='latex' />, and inject the solutions back in the Lagragian function (equations (7.22) to (7.31) in Bishop&#8217;s book).</p>
<p><img src='http://s.wordpress.com/latex.php?latex=%287.32%29~%5Ctilde%7BL%7D%28%5Cmathbf%7Ba%7D%29%20%3D%20%5Csum_%7Bn%3D1%7D%5ENa_n-%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bn%3D1%7D%5EN%5Csum_%7Bm%3D1%7D%5EN%20a_n%20a_m%20t_n%20t_m%20k%28%5Cmathbf%7Bx%7D_n%2C%5Cmathbf%7Bx%7D_m%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(7.32)~\tilde{L}(\mathbf{a}) = \sum_{n=1}^Na_n-\frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N a_n a_m t_n t_m k(\mathbf{x}_n,\mathbf{x}_m)' title='(7.32)~\tilde{L}(\mathbf{a}) = \sum_{n=1}^Na_n-\frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N a_n a_m t_n t_m k(\mathbf{x}_n,\mathbf{x}_m)' class='latex' />
<p>Which is identical to the hard margin case! The constraints become:</p>
<img src='http://s.wordpress.com/latex.php?latex=%287.33%29~0%20%5Cle%20a_n%20%5Cle%20C&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(7.33)~0 \le a_n \le C' title='(7.33)~0 \le a_n \le C' class='latex' /><br />
</br><br />
<img src='http://s.wordpress.com/latex.php?latex=%287.34%29~%5Csum_%7Bn%3D1%7D%5EN%20a_n%20t_n%20%3D%200&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(7.34)~\sum_{n=1}^N a_n t_n = 0' title='(7.34)~\sum_{n=1}^N a_n t_n = 0' class='latex' />
<p>Interestingly, the slack variables <img src='http://s.wordpress.com/latex.php?latex=%5Cxi_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\xi_n' title='\xi_n' class='latex' /> have vanished and the only difference with the hard margin is that the inequality constraint now has an upper bound, <img src='http://s.wordpress.com/latex.php?latex=C&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='C' title='C' class='latex' />.</p>
<p>The attentive reader will have noticed that the inequality constraints in cvxopt have an upper bound but no lower bound.</p>
<img src='http://s.wordpress.com/latex.php?latex=G%5Cmathbf%7Bx%7D%20%5Cleq%20%5Cmathbf%20h&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G\mathbf{x} \leq \mathbf h' title='G\mathbf{x} \leq \mathbf h' class='latex' />
<p>The trick is to rewrite constraint (7.33) as a system of inequations, in matrix notation. Example with 2 training examples:</p>
<img src='http://s.wordpress.com/latex.php?latex=%5Cbegin%7Bpmatrix%7D-1%20%26%200%20%5C%5C%200%20%26%20-1%20%5C%5C%201%20%26%200%20%20%5C%5C%200%20%26%201%5Cend%7Bpmatrix%7D%5Cbegin%7Bpmatrix%7Da_1%5C%5C%20a_2%5Cend%7Bpmatrix%7D%20%5Cle%20%5Cbegin%7Bpmatrix%7D0%20%5C%5C%200%20%5C%5C%20C%20%5C%5C%20C%5Cend%7Bpmatrix%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\begin{pmatrix}-1 &amp; 0 \\ 0 &amp; -1 \\ 1 &amp; 0  \\ 0 &amp; 1\end{pmatrix}\begin{pmatrix}a_1\\ a_2\end{pmatrix} \le \begin{pmatrix}0 \\ 0 \\ C \\ C\end{pmatrix}' title='\begin{pmatrix}-1 &amp; 0 \\ 0 &amp; -1 \\ 1 &amp; 0  \\ 0 &amp; 1\end{pmatrix}\begin{pmatrix}a_1\\ a_2\end{pmatrix} \le \begin{pmatrix}0 \\ 0 \\ C \\ C\end{pmatrix}' class='latex' />
<p><img src="http://www.mblondel.org/images/svm_hard.png" /></p>
<p>Figure 3: The hard margin case. 180 vectors out of 180 are support vectors! And the classifier only achieves 11 correct predictions out of 20, on held-out data.</p>
<p><img src="http://www.mblondel.org/images/svm_soft.png" /></p>
<p>Figure 4: The soft margin case (C=0.1). 36 vectors out of 180 are support vectors. The classifier achieves 19 correct predictions out of 20!</p>
<h3>Source</h3>
<p><a href="http://gist.github.com/586753">http://gist.github.com/586753</a></p>
<h3>References</h3>
<p>Pattern Recognition and Machine Learning, Christopher Bishop, 2006.</p>
<p><a href="http://d.hatena.ne.jp/aidiary/20100503/1272889097">ソフトマージンSVM</a>, 人工知能に関する断想録</p>
<span class="net_nemein_favourites">0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=43ea3d6e1b6011e0a0e7696c1a6799069906&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/43ea3d6e1b6011e0a0e7696c1a6799069906/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=43ea3d6e1b6011e0a0e7696c1a6799069906&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/43ea3d6e1b6011e0a0e7696c1a6799069906/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Sun, 19 Sep 2010 14:07:15 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-43ea3d6e1b6011e0a0e7696c1a6799069906</guid>
        </item>
        <item>
            <title>Latent Dirichlet Allocation in Python</title>
            <link>http://www.mblondel.org/journal/2010/08/21/latent-dirichlet-allocation-in-python/</link>
            <description><![CDATA[
<p>Like Latent Semantic Analysis (LSA) and probabilistic LSA (pLSA) &#8211; see my previous post &#8220;<a href="http://www.mblondel.org/journal/2010/06/13/lsa-and-plsa-in-python/">LSA and pLSA in Python</a>&#8220;, Latent Dirichlet Allocation (LDA) is an algorithm which, given a collection of documents and nothing more (no supervision needed), can uncover the &#8220;topics&#8221; expressed by documents in that collection. LDA can be seen as a <a href="http://en.wikipedia.org/wiki/Bayesian_inference">Bayesian</a> extension of pLSA.<br />
<span id="more-127"></span><br />
As Blei, the author of LDA, points out, the topic proportions in pLSA are tied with the training documents. This is problematic: 1) the number of parameters grows linearly with the number of training documents, which can cause serious <a href="http://en.wikipedia.org/wiki/Overfitting">overfitting</a> 2) it is difficult to generalize to new documents and requires so-called &#8220;folding-in&#8221;. LDA fixes those issues by being a fully generative model: where pLSA uses a matrix of P(topic|document) probabilities, LDA uses a distribution over topics.</p>
<p>To date, there exists several parameter estimation schemes for LDA: variational Bayes, expectation propagation and Gibbs sampling. I&#8217;ve chosen to implement the latter. It has first been described in a paper entitled &#8220;Finding scientific topics&#8221;, by Griffiths and Steyvers.</p>
<h3>Artificial data</h3>
<p>As with all model-based algorithms, during the early development phase, it is useful to work with artificial data, generated by following the model assumptions. In the case of LDA (and pLSA), the core assumption is that words (w) in documents are generated by mixture of topics (z). In other words, the probability of a word is:</p>
<img src='http://s.wordpress.com/latex.php?latex=P%28w%29%20%3D%20%5Csum_%7Bz%7D%20P%28w%7Cz%29%20P%28z%29&#038;bg=ffffff&#038;fg=000000&#038;s=1' alt='P(w) = \sum_{z} P(w|z) P(z)' title='P(w) = \sum_{z} P(w|z) P(z)' class='latex' />
<p>The generative process can be summarized as follows: 1) set the topic proportions once for all when the collection is instantiated and 2) for each document and for as many words as needed, sample a topic from the topic distribution and sample a word from the word distribution of the selected topic. Obviously, this is only an approximation of how documents are created in reality.</p>
<p>To generate an artificial dataset, we can fix the word distribution of each topic and then generate documents as explained above. Since we generated documents by sticking to the generative assumption of the model, if the algorithm is correctly implemented, it should be able to recover the word distribution of each topic, from the generated documents.</p>
<h3>Graphical example</h3>
<p>To gain insight and intuition, we can reuse the graphical example from Griffiths and Steyvers&#8217; paper. </p>
<p>In the <a href="http://en.wikipedia.org/wiki/Bag_of_words_model">bag-of-words model</a>, documents are represented by vectors of dimension <img src='http://s.wordpress.com/latex.php?latex=V&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='V' title='V' class='latex' />, where <img src='http://s.wordpress.com/latex.php?latex=V&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='V' title='V' class='latex' /> is the vocabulary size. Moreover, an image of size <img src='http://s.wordpress.com/latex.php?latex=%5Csqrt%7BV%7D%20%5Ctimes%20%5Csqrt%7BV%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\sqrt{V} \times \sqrt{V}' title='\sqrt{V} \times \sqrt{V}' class='latex' /> has <img src='http://s.wordpress.com/latex.php?latex=V&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='V' title='V' class='latex' /> pixels: it can thus be stored as a string/vector of length/size <img src='http://s.wordpress.com/latex.php?latex=V&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='V' title='V' class='latex' />. This means that a document in the bag-of-words model can be represented as an image, where pixels correspond to words and pixel intensities correspond to word counts!</p>
<p>As put previously, we first need to fix the word distribution of each topic. Let&#8217;s arbitrarily create <strong>10</strong> topics.</p>
<p><strong>5</strong> with &#8220;vertical&#8221; bars:</p>
<table border="0" cellspacing="10">
<tr>
<td><img src="http://www.mblondel.org/images/lda/topic0.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic1.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic2.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic3.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic4.png" style="border:1px solid" /></td>
</tr>
</table>
<p>and another <strong>5</strong> with &#8220;horizontal&#8221; bars:</p>
<table border="0" cellspacing="10">
<tr>
<td><img src="http://www.mblondel.org/images/lda/topic5.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic6.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic7.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic8.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic9.png" style="border:1px solid" /></td>
</tr>
</table>
<p>Each topic distribution is represented by a <strong>5&#215;5</strong> image, so the vocabulary is of size <strong>25</strong>. Black pixels correspond to words that the topic will never possibly generate. White pixels correspond to words that the topic can generate with probability 1/5.</p>
<p>Now let&#8217;s generate <strong>500</strong> documents using the generative process previously described. Here are 3 examples of such generated documents.</p>
<table border="0" cellspacing="10">
<tr>
<td><img src="http://www.mblondel.org/images/lda/doc0.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/doc1.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/doc2.png" style="border:1px solid" /></td>
</tr>
</table>
<p>We clearly see bars emerging from the documents and can thus confirm that documents are mixtures of topics.</p>
<p>We can now use the generated documents as training data. If the Gibbs sampler is correctly implemented, we should be able to recover the original topics. Here are the results for the 1st, 6th and 26th iterations. The number between brackets is the log-likelihood.</p>
<p>1st iteration (-278541.7835):</p>
<table border="0" cellspacing="10">
<tr>
<td><img src="http://www.mblondel.org/images/lda/topic0-0.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic0-1.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic0-2.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic0-3.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic0-4.png" style="border:1px solid" /></td>
</tr>
<tr>
<td><img src="http://www.mblondel.org/images/lda/topic0-5.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic0-6.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic0-7.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic0-8.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic0-9.png" style="border:1px solid" /></td>
</tr>
</table>
<p>5th iteration (-165139.56193):</p>
<table border="0" cellspacing="10">
<tr>
<td><img src="http://www.mblondel.org/images/lda/topic5-0.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic5-1.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic5-2.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic5-3.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic5-4.png" style="border:1px solid" /></td>
</tr>
<tr>
<td><img src="http://www.mblondel.org/images/lda/topic5-5.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic5-6.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic5-7.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic5-8.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic5-9.png" style="border:1px solid" /></td>
</tr>
</table>
<p>[...]</p>
<p>26th iteration (-129272.328181):</p>
<table border="0" cellspacing="10">
<tr>
<td><img src="http://www.mblondel.org/images/lda/topic25-0.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic25-1.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic25-2.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic25-3.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic25-4.png" style="border:1px solid" /></td>
</tr>
<tr>
<td><img src="http://www.mblondel.org/images/lda/topic25-5.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic25-6.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic25-7.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic25-8.png" style="border:1px solid" /></td>
<td><img src="http://www.mblondel.org/images/lda/topic25-9.png" style="border:1px solid" /></td>
</tr>
</table>
<p>After a few iterations, we see that the algorithm recovered the topics correctly. Also, the log-likelihood increases: as the number of iterations increases, it becomes more and more likely that the model generated the data. The fact that it works pretty well is not surprising: the data used were generated by sticking to the model assumptions.</p>
<h3>Gibbs sampling</h3>
<p>The <a href="http://en.wikipedia.org/wiki/Gibbs_sampling">Gibbs sampler</a> used is said to be collapsed: the parameters of interest are not sampled directly. Instead we sample the topic assignments and the parameters can be computed in terms of those.</p>
<p>It is not necessarily obvious from the equation of the full conditional distribution (from which the topic assignments are sampled) but the sampler is naturally sparse: it doesn&#8217;t need to iterate over words with zero-count. This is a nice property, given that sampling algorithms are often considered slow.</p>
<h3>Source code</h3>
<p><a href="http://gist.github.com/542786">http://gist.github.com/542786</a></p>
<p>Fairly readable and compact code but to be considered a toy implementation.</p>
<h3>Useful Resources</h3>
<h4>MCMC</h4>
<p>- &#8220;<a href="http://videolectures.net/mlss09uk_murray_mcmc/">MCMC lecture at MLSS09</a>&#8221; (Iain Murray). Nice for a first general overview and the insights.</p>
<p>- &#8220;Gibbs sampling for the uninitiated&#8221; (Resnik and Hardisty). Nice for a first general overview and the insights.</p>
<p>- &#8220;Pattern Recognition and Machine Learning&#8221; (Bishop), Chapters 8 and 11 on graphical models and sampling methods. Excellent chapters.</p>
<p>- &#8220;<a href="http://users.aims.ac.za/~ioana/">Review Course: Markov Chains and Monte Carlo Methods</a>&#8221; (Cosma and Evers). Very nice free online course and solutions to exercises in Python and R!</p>
<h4>LDA</h4>
<p>- &#8220;Latent Dirichlet Allocation&#8221; (Blei et al, 2003). By Blei himself.</p>
<p>- &#8220;Finding scientific topics&#8221; (Griffiths and Steyvers). Insightful comments and nice intuitive graphical example.</p>
<p>- &#8220;Parameter Estimation for text analysis&#8221; (Heinrich). Very nice introduction to Bayesian thinking. Pseudo-code for the LDA Gibbs sampler.</p>
<p>- &#8220;On an equivalence between PLSI and LDA&#8221; (Girolami and Kaban). Connections between pLSA and LDA.</p>
<p>- &#8220;Integrating Out Multinomial Parameters in Latent Dirichlet Allocation and Naive Bayes for Collapsed Gibbs Sampling&#8221; (Carpenter). Very detailed, step-by-step derivation of the collapsed Gibbs samplers for LDA and NB.</p>
<p>- &#8220;Distributed Gibbs Sampling of Latent Dirichlet Allocation: The Gritty Details&#8221; (Wang). Insightful comments and pseudo-code of the LDA Gibbs sampler.</p>
<h4>Other Python implementations</h4>
<p>- <a href="http://github.com/nrolland/pyLDA">nrolland&#8217;s pyLDA</a>. Works fine but mixes Python-style and Numpy-style.</p>
<p>- <a href="http://github.com/alextp/pylda">alextp&#8217;s pylda</a>. Numpy-style but not tested.</p>
<span class="net_nemein_favourites">0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=425f4f021b6011e0a0e7696c1a6799069906&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/425f4f021b6011e0a0e7696c1a6799069906/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=425f4f021b6011e0a0e7696c1a6799069906&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/425f4f021b6011e0a0e7696c1a6799069906/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Sat, 21 Aug 2010 20:52:00 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-425f4f021b6011e0a0e7696c1a6799069906</guid>
        </item>
        <item>
            <title>Semi-supervised Naive Bayes in Python</title>
            <link>http://www.mblondel.org/journal/2010/06/21/semi-supervised-naive-bayes-in-python/</link>
            <description><![CDATA[
<h3>Expectation-Maximization</h3>
<p>The <a href="http://en.wikipedia.org/wiki/Expectation-maximization_algorithm">Expectation-Maximization</a> (EM) algorithm is a popular algorithm in statistics and machine learning to estimate the parameters of a model that depends on latent variables. (A latent variable is a variable that is not expressed in the dataset and thus that you can&#8217;t directly count. For example, in <a href="http://www.mblondel.org/journal/2010/06/13/lsa-and-plsa-in-python/">pLSA</a>, the document topics z are latent variables.) EM is very intuitive. It works by pretending that we know what we&#8217;re looking for: the model parameters. First, we make an initial guess, which can be either random or &#8220;our best bet&#8221;. Then, in the E-step, we use our current model parameters to estimate some &#8220;measures&#8221;, the ones we would have used to compute the parameters, had they been available to us. In the M-step, we use these measures to compute the model parameters. The beauty of EM is that by iteratively repeating these two steps, the algorithm will provably converge to a local maximum for the likelihood that the model generated the data.<br />
<span id="more-126"></span></p>
<h3>Naive Bayes trained with EM</h3>
<p>In their paper &#8220;Semi-supervised Text Classification Using EM&#8221;, Nigam et al. describe how to use EM to train a Naive Bayes classifier in a <a href="http://en.wikipedia.org/wiki/Semi-supervised_learning">semi-supervised</a> fashion, that is with both labeled and unlabeled data. The algorithm is very intuitive:</p>
<ul>
<li>Train a classifier with your labeled data</li>
<li>While the model likelihood increases:
<ul>
<li>E-step: Use your current classifier to find P(c|x) for all classes c and all unlabeled examples x. These can be thought as probabilistic/fractional labels.</li>
<li>M-step: Train your classifier with the union of your labeled and probabilistically-labeled data.</li>
</ul>
</ul>
<p>The hope is that using (abundantly available) unlabeled data, in addition to (labor-intensive) labeled data, improves the quality of the classifier.</p>
<h3>Code</h3>
<p>I made a simple implementation of it in Python + Numpy. The code is fairly optimized.</p>
<p>$ git clone http://www.mblondel.org/code/seminb.git</p>
<p><a href="http://www.mblondel.org/gitweb?p=seminb.git">web interface</a></p>
<h3>Implementation details</h3>
<p>Here are implementation details that were not mentioned in the original paper and that I found necessary to get a correct implementation.</p>
<p>Naive Bayes is called naive because of the (obviously wrong) assumption that words are conditionally independent given the class:</p>
<img src='http://s.wordpress.com/latex.php?latex=P%28x_i%7Cc_j%29%20%3D%20%5Cprod_%7Bt%7D%5EV%20P%28w_t%7Cc_j%29%5E%7Bx_%7Bit%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=2' alt='P(x_i|c_j) = \prod_{t}^V P(w_t|c_j)^{x_{it}}' title='P(x_i|c_j) = \prod_{t}^V P(w_t|c_j)^{x_{it}}' class='latex' />
<p>However, since the vocabulary size V can be pretty big and the probabilities P(w|c) can be pretty small, P(x|c) can quickly exceed the precision of the computer and become zero. The solution is to perform the computations in the log domain:</p>
<img src='http://s.wordpress.com/latex.php?latex=%5Clog%20P%28x_i%7Cc_j%29%20%3D%20%5Csum_%7Bt%7D%5EV%20x_%7Bit%7D%20%5Clog%20P%28w_t%7Cc_j%29&#038;bg=ffffff&#038;fg=000000&#038;s=2' alt='\log P(x_i|c_j) = \sum_{t}^V x_{it} \log P(w_t|c_j)' title='\log P(x_i|c_j) = \sum_{t}^V x_{it} \log P(w_t|c_j)' class='latex' />
<p>To turn around P(x|c), we use Bayes&#8217;rule:</p>
<img src='http://s.wordpress.com/latex.php?latex=P%28y_i%3Dc_j%7Cx_i%29%20%3D%20%5Cfrac%7BP%28c_j%29P%28x_i%7Cc_j%29%7D%7BP%28x_i%29%7D%20%3D%20%5Cfrac%7BP%28c_j%29P%28x_i%7Cc_j%29%7D%7B%5Csum_k%20P%28c_k%29P%28x_i%7Cc_k%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=2' alt='P(y_i=c_j|x_i) = \frac{P(c_j)P(x_i|c_j)}{P(x_i)} = \frac{P(c_j)P(x_i|c_j)}{\sum_k P(c_k)P(x_i|c_k)}' title='P(y_i=c_j|x_i) = \frac{P(c_j)P(x_i|c_j)}{P(x_i)} = \frac{P(c_j)P(x_i|c_j)}{\sum_k P(c_k)P(x_i|c_k)}' class='latex' />
<p>By posing <img src='http://s.wordpress.com/latex.php?latex=z_j%20%3D%20%5Clog%20P%28c_j%29%20%2B%20%5Clog%20P%28x_i%7Cc_j%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='z_j = \log P(c_j) + \log P(x_i|c_j)' title='z_j = \log P(c_j) + \log P(x_i|c_j)' class='latex' />, we get:</p>
<img src='http://s.wordpress.com/latex.php?latex=P%28y_i%3Dc_j%7Cx_i%29%20%3D%20%5Cfrac%7Be%5E%7Bz_j%7D%7D%7B%5Csum_k%20e%5E%7Bz_k%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=2' alt='P(y_i=c_j|x_i) = \frac{e^{z_j}}{\sum_k e^{z_k}}' title='P(y_i=c_j|x_i) = \frac{e^{z_j}}{\sum_k e^{z_k}}' class='latex' />
<p>This is the <a href="http://en.wikipedia.org/wiki/Softmax_activation_function">softmax</a> function. However, we are back to our initial problem because, since the <img src='http://s.wordpress.com/latex.php?latex=z_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='z_j' title='z_j' class='latex' /> are likely to tend to -inf, the exponentials are likely to in turn underflow. The trick is to multiply the numerator and denominator by the same constant <img src='http://s.wordpress.com/latex.php?latex=e%5E%7B-m%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='e^{-m}' title='e^{-m}' class='latex' />:</p>
<img src='http://s.wordpress.com/latex.php?latex=P%28y_i%3Dc_j%7Cx_i%29%20%3D%20%5Cfrac%7Be%5E%7Bz_j-m%7D%7D%7B%5Csum_k%20e%5E%7Bz_k-m%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=2' alt='P(y_i=c_j|x_i) = \frac{e^{z_j-m}}{\sum_k e^{z_k-m}}' title='P(y_i=c_j|x_i) = \frac{e^{z_j-m}}{\sum_k e^{z_k-m}}' class='latex' />
<p>Setting m to <img src='http://s.wordpress.com/latex.php?latex=max_j~z_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='max_j~z_j' title='max_j~z_j' class='latex' />, the <img src='http://s.wordpress.com/latex.php?latex=z_j-m&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='z_j-m' title='z_j-m' class='latex' /> values will get closer to zero. The rationale for this is that computation of the exponential overflows earlier and is less precise for big values (positive or negative) than for small values.</p>
<p>In case this is not enough, we can further define:</p>
<img src='http://s.wordpress.com/latex.php?latex=P%28y_i%3Dc_j%7Cx_i%29%20%3D%20%5Cbegin%7Bcases%7D%200%2C%20%26%20%5Cmbox%7Bif%20%7D%20%20z_j%20-%20m%20%20%5Cle%20t%20%20%5C%5C%20%5Cfrac%7Be%5E%7Bz_j-m%7D%7D%7B%5Csum_%7B%5C%7Bk~%3A~z_k-m%20%3E%20t%5C%7D%7D%20e%5E%7Bz_k-m%7D%7D%2C%20%20%26%20%5Cmbox%7Botherwise%7D%20%5Cend%7Bcases%7D&#038;bg=ffffff&#038;fg=000000&#038;s=2' alt='P(y_i=c_j|x_i) = \begin{cases} 0, &amp; \mbox{if }  z_j - m  \le t  \\ \frac{e^{z_j-m}}{\sum_{\{k~:~z_k-m &gt; t\}} e^{z_k-m}},  &amp; \mbox{otherwise} \end{cases}' title='P(y_i=c_j|x_i) = \begin{cases} 0, &amp; \mbox{if }  z_j - m  \le t  \\ \frac{e^{z_j-m}}{\sum_{\{k~:~z_k-m &gt; t\}} e^{z_k-m}},  &amp; \mbox{otherwise} \end{cases}' class='latex' />
<p>This truncates the exponentials to zero when <img src='http://s.wordpress.com/latex.php?latex=e%5E%7Bz_j-m%7D%20%5Cle%20e%5Et&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='e^{z_j-m} \le e^t' title='e^{z_j-m} \le e^t' class='latex' />. For t=-10, this corresponds to 0.000045. Equivalently we can see it as setting the exponentials to zero when <img src='http://s.wordpress.com/latex.php?latex=e%5E%7Bz_j%7D%20%5Cle%20e%5E%7Bt%2Bm%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='e^{z_j} \le e^{t+m}' title='e^{z_j} \le e^{t+m}' class='latex' />. Since both t and and m are negative, this shows that subtracting the maximum m, as explained before, does help improving the precision.</p>
<h3>Reference</h3>
<blockquote><p>Kamal Nigam, Andrew McCallum and Tom Mitchell. Semi-supervised Text Classification Using EM. In Chapelle, O., Zien, A., and Scholkopf, B. (Eds.) Semi-Supervised Learning. MIT Press: Boston. 2006.</p></blockquote>
<span class="net_nemein_favourites">0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=40d496b01b6011e0a0e7696c1a6799069906&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/40d496b01b6011e0a0e7696c1a6799069906/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=40d496b01b6011e0a0e7696c1a6799069906&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/40d496b01b6011e0a0e7696c1a6799069906/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Mon, 21 Jun 2010 17:47:50 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-40d496b01b6011e0a0e7696c1a6799069906</guid>
        </item>
        <item>
            <title>LSA and pLSA in Python</title>
            <link>http://www.mblondel.org/journal/2010/06/13/lsa-and-plsa-in-python/</link>
            <description><![CDATA[
<p><a href="http://en.wikipedia.org/wiki/Latent_semantic_analysis">Latent Semantic Analysis</a> (LSA) and its <a href="http://en.wikipedia.org/wiki/Probabilistic_latent_semantic_analysis">probabilistic counterpart</a> pLSA are two well known techniques in Natural Language Processing that aim to analyze the co-occurrences of terms in a corpus of documents in order to find hidden/latent factors, regarded as topics or concepts. Since the number of topics/concepts is usually greatly inferior to the number of words and since it is not necessary to know the document categories/classes, LSA and pLSA are thus <a href="http://en.wikipedia.org/wiki/Unsupervised_learning">unsupervised</a> <a href="http://en.wikipedia.org/wiki/Dimension_reduction">dimensionality reduction</a> techniques. Applications include <a href="http://en.wikipedia.org/wiki/Information_Retrieval">information retrieval</a>, document classification and <a href="http://en.wikipedia.org/wiki/Collaborative_filtering">collaborative filtering</a>.<br />
<span id="more-125"></span><br />
Note: LSA and pLSA are also known in the Information Retrieval community as LSI and pLSI, where I stands for Indexing. </p>
<h3>Comparison</h3>
<table border="1" style="margin-top:10px">
<tr>
<th>&nbsp;</th>
<th>LSA</th>
<th>pLSA</th>
</tr>
<tr>
<td>1. Theoretical background</td>
<td>Linear Algebra</td>
<td>Probabilities and Statistics</td>
</tr>
<tr>
<td>2. Objective function</td>
<td>Frobenius norm</td>
<td>Likelihood function</td>
</tr>
<tr>
<td>3. Polysemy</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr>
<td>4. Folding-in</td>
<td>Straightforward</td>
<td>Complicated</td>
</tr>
</table>
<p>1. LSA stems from Linear Algebra as it is nothing more than a <a href="http://en.wikipedia.org/wiki/Singular_value_decomposition">Singular Value Decomposition</a>. On the other hand, pLSA has a strong probabilistic grounding (latent variable models).</p>
<p>2. SVD is a least squares method (it finds a low-rank matrix approximation that minimizes the <a href="http://en.wikipedia.org/wiki/Matrix_norm">Frobenius norm</a> of the difference with the original matrix). Moreover, as it is well known in Machine Learning, the <a href="http://en.wikipedia.org/wiki/Least_squares">least squares</a> solution corresponds to the <a href="http://en.wikipedia.org/wiki/Maximum_likelihood">Maximum Likelihood</a> solution when experimental errors are gaussian. Therefore, LSA makes an implicit assumption of gaussian noise on the term counts. On the other hand, the objective function maximized in pLSA is the likelihood function of multinomial sampling. </p>
<p>The values in the concept-term matrix found by LSA are not normalized and may even contain negative values. On the other hand, values found by pLSA are probabilities which means they are interpretable and can be combined with other models.</p>
<p>Note: SVD is equivalent to PCA (Principal Component Analysis) when the data is centered (has zero-mean).</p>
<p>3. Both LSA and pLSA can handle synonymy but LSA cannot handle polysemy, as words are defined by a unique point in a space.</p>
<p>4. LSA and pLSA analyze a corpus of documents in order to find a new low-dimensional representation of it. In order to be comparable, new documents that were not originally in the corpus must be projected in the lower-dimensional space too. This is called &#8220;folding-in&#8221;. Clearly, new documents folded-in don&#8217;t contribute to learning the factored representation so it is necessary to rebuild the model using all the documents from time to time. </p>
<p>In LSA, folding-in is as easy as a matrix-vector product. In pLSA, this requires several iterations of the EM algorithm. </p>
<h3>Implementation in Python</h3>
<p>LSA is straightforward to implement as it is nothing more than a SVD and Numpy&#8217;s Linear Algebra module has a function &#8220;svd&#8221; already. This function has an argument full_matrices which when set to False greatly reduces the time required. This argument doesn&#8217;t mean that the SVD is not full, just that the returned matrices don&#8217;t contain vectors corresponding to zero singular values. Scipy&#8217;s Linear Algebra package unfortunately doesn&#8217;t seem to have a sparse SVD. Likewise, there&#8217;s no truncated SVD (there exists fast algorithms to directly compute a truncated SVD rather than computing the full SVD then taking the top K singular values).</p>
<p>pLSA&#8217;s source code is a bit longer although quite compact too. Although the Python/Numpy code was quite optimized, it took half a day to compute on a 50000 x 8000 term-document matrix. I rewrote the training part in C and it now takes half an hour. Keeping the Python version is quite nice for checking the correctness of the C version and as a reference as the C version is a straightforward port of it.</p>
<p>The implementation is sparse. It works with both Numpy&#8217;s ndarrays and Scipy&#8217;s sparse matrices.</p>
<p>$ git clone http://www.mblondel.org/code/plsa.git</p>
<p><a href=" http://www.mblondel.org/gitweb?p=plsa.git">web interface</a></p>
<p>Next, I would like to explore Fisher Kernels as there seems to have nice interactions with pLSA. I would also like to implement Latent Dirichlet Allocation (LDA), although it&#8217;s more challenging. LDA is a Bayesian extension of pLSA : pLSA is equivalent to LDA under a uniform Dirichlet prior distribution.</p>
<span class="net_nemein_favourites">0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=3f3c2f7a1b6011e0a0e7696c1a6799069906&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/3f3c2f7a1b6011e0a0e7696c1a6799069906/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=3f3c2f7a1b6011e0a0e7696c1a6799069906&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/3f3c2f7a1b6011e0a0e7696c1a6799069906/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Sun, 13 Jun 2010 17:42:58 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-3f3c2f7a1b6011e0a0e7696c1a6799069906</guid>
        </item>
        <item>
            <title>Seam Carving in Python</title>
            <link>http://www.mblondel.org/journal/2010/02/09/seam-carving-in-python/</link>
            <description><![CDATA[
<p><a href="http://en.wikipedia.org/wiki/Seam_carving">Seam Carving</a> is an algorithm for image resizing introduced in 2007 by S. Avidan and A. Shamir in their paper &#8220;<a href="http://www.shaiavidan.org/papers/imretFinal.pdf">Seam Carving for Content-Aware Image Resizing</a>&#8220;.</p>
<p><a href="http://www.flickr.com/photos/ippei-janine/2165260667/"><img src="http://www.mblondel.org/images/seam-carving/okinawa.jpg" style="border: 1px solid black" /></a><br />
<em>Miyako Island, Okinawa, Japan.</em><br />
<span id="more-123"></span><br />
The principle is very simple. Find the connected paths of low energy pixels (&#8220;the seams&#8221;). This can be done efficiently by <a href="http://en.wikipedia.org/wiki/Dynamic_programming">dynamic programming</a> (see my post on <a href="http://www.mblondel.org/journal/2009/08/31/dynamic-time-warping-theory/">DTW</a>). </p>
<p><img src="http://www.mblondel.org/images/seam-carving/okinawa_gradient.jpg" /><br />
<em>Same image in the gradient domain showing the vertical and horizontal seams of lowest cumulated energy.</em></p>
<p>The seams of lowest cumulated energy can be seen as the pixels contributing the least to an image. By repeatedly removing or adding seams, it is thus possible to perform &#8220;content-aware&#8221; image reduction or extension. The resulting images feel more natural, less &#8220;streched&#8221;.</p>
<p><img src="http://www.mblondel.org/images/seam-carving/okinawa_good.jpg" style="border: 1px solid black" /><br />
<em>Height reduced by 50% by seam carving.</em></p>
<p><img src="http://www.mblondel.org/images/seam-carving/okinawa_bad.jpg" style="border: 1px solid black"  /><br />
<em>Height reduced by 50% by traditional rescaling.</em></p>
<p>Although seam carving doesn&#8217;t need human intervention, in the original paper, a graphical user interface (GUI) was also developed to let the user define areas that can&#8217;t be removed, or conversely, that must be removed.</p>
<p>In my opinion, seam carving is simple and elegant. No sophisticated object recognition algorithm was used, yet the results are quite impressive.</p>
<p>You can find my implementation in 250 lines of Python in my git repo:</p>
<p>$ git clone http://www.mblondel.org/code/seam-carving.git</p>
<p><a href="http://www.mblondel.org/gitweb?p=seam-carving.git;a=tree">web interface</a></p>
<p>Unfortunately, it&#8217;s too slow to be real-time.</p>
<span class="net_nemein_favourites">0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=3be15b481b6011e0a0e7696c1a6799069906&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/3be15b481b6011e0a0e7696c1a6799069906/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=3be15b481b6011e0a0e7696c1a6799069906&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/3be15b481b6011e0a0e7696c1a6799069906/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Tue, 09 Feb 2010 15:57:20 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-3be15b481b6011e0a0e7696c1a6799069906</guid>
        </item>
        <item>
            <title>Easy parallelization with data decomposition</title>
            <link>http://www.mblondel.org/journal/2009/11/27/easy-parallelization-with-data-decomposition/</link>
            <description><![CDATA[
<p>Recently I came across this <a href="http://gael-varoquaux.info/blog/?p=119">blog post</a> which introduced me to the new <a href="http://docs.python.org/library/multiprocessing.html">multiprocessing</a> module in Python 2.6, a module to execute multiple concurrent processes.  It makes parallelizing your programs very easy. The author also provided a smart code snippet that makes using multiprocessing even easier. I studied how the snippet works and I came up with an alternative solution which is in my opinion very elegant and easy to read. I&#8217;m so excited about the new possibilities provided by this module that I had to spread the word. But first, off to some background.</p>
<p><span id="more-120"></span></p>
<h3>The multi-core trend</h3>
<p>Moore&#8217;s law states that:</p>
<blockquote><p>The density of transistors on chips doubles every 24 month. </p></blockquote>
<p>Although <a href="http://en.wikipedia.org/wiki/Moore%27s_law">Moore&#8217;s law</a>, contrary to what is often thought <a href="http://www.thinkingparallel.com/2007/07/09/moores-law-is-dead-long-live-moores-law/">still holds true</a>, the exponential processor transistor growth predicted by Moore does not always translate into exponentially greater practical computing performance. Therefore parallel computation has recently become necessary to take full advantage of the gains allowed by Moore&#8217;s law. This explains the recent <a href="http://en.wikipedia.org/wiki/Multi-core">multi-core</a> trend: most recent computers are now equipped with 2 or more cores.</p>
<p>The problem is that you can&#8217;t just use multi-core equipped computers and hope that your programs will run faster on them. Programs need be modified to operate in a parallel fashion as opposed to a sequential fashion.</p>
<p>At the same time, languages like Ruby and Python are famous for their GIL (<a href="http://en.wikipedia.org/wiki/Global_Interpreter_Lock">Global Interpreter Lock</a>). Because of the GIL, even programs that are designed to be parallel can effectively use only one core at a time, resulting in no speed improvement. Parallelism here is just an illusion: the processor switches between threads but does so frequently that the user perceive the operations as being performed in parallel. </p>
<p>The novelty of the multiprocessing module in Python 2.6 is that is uses processes instead of threads (see <a href="http://en.wikipedia.org/wiki/Thread_%28computer_science%29#Threads_compared_with_processes">Threads compared with processes</a>) and it does not suffer from the GIL. Programs running on multi-cores can therefore operate in a truly parallel fashion.</p>
<h3>Parallelizing programs</h3>
<p>To make things simpler, let me quote the excellent blog post <a href="http://www.thinkingparallel.com/2007/09/06/how-to-split-a-problem-into-tasks/">How-to Split a Problem into Tasks</a>.</p>
<blockquote><p>
The very first step in every successful parallelization effort is always the same: you take a look at the problem that needs to be solved and start splitting it into tasks that can be computed in parallel. [...] what I am describing here is also called problem decomposition. The goal here is to divide the problem into several smaller subproblems, called tasks that can be computed in parallel later on. The tasks can be of different size and must not necessarily be independent.
</p></blockquote>
<p>And, about data decomposition:</p>
<blockquote><p>
When data structures with large amounts of similar data need to be processed, data decomposition is usually a well-performing decomposition technique. The tasks in this strategy consist of groups of data. These can be either input data, output data or even intermediate data, decompositions to all varieties are possible and may be useful. All processors perform the same operations on these data, which are often independent from one another. This is my favorite decomposition technique, because it is usually easy to do, often has no dependencies in between tasks and scales really well.
</p></blockquote>
<p>Data decomposition is so straightforward that it can without any doubt be called <a href="http://en.wikipedia.org/wiki/Embarrassingly_parallel">embarrassingly parallel</a>.</p>
<h3>Map</h3>
<p>If you are a Python user, you most probably know <a href="http://en.wikipedia.org/wiki/List_comprehension">list comprehensions</a>:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">math</span> <span style="color: #ff7700;font-weight:bold;">import</span> sqrt
<span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: black;">&#91;</span>sqrt<span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">9</span>, <span style="color: #ff4500;">16</span><span style="color: black;">&#93;</span><span style="color: black;">&#93;</span> 
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1.0</span>, <span style="color: #ff4500;">2.0</span>, <span style="color: #ff4500;">3.0</span>, <span style="color: #ff4500;">4.0</span><span style="color: black;">&#93;</span></pre></div></div>

<p>In this example, sqrt is applied to each element of the list and a list is returned. The resulting list and the input list are therefore the same size.</p>
<p>Probably less known are generator comprehensions, which can be written by replacing the outer brackets with parentheses:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> gen = <span style="color: black;">&#40;</span>sqrt<span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">9</span>, <span style="color: #ff4500;">16</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span> 
<span style="color: #66cc66;">&lt;</span>generator <span style="color: #008000;">object</span> at 0xb7cec56c<span style="color: #66cc66;">&gt;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> gen: <span style="color: #ff7700;font-weight:bold;">print</span> i
<span style="color: #ff4500;">1.0</span>
<span style="color: #ff4500;">2.0</span>
<span style="color: #ff4500;">3.0</span>
<span style="color: #ff4500;">4.0</span></pre></div></div>

<p>The difference between list and generator comprehensions is that list comprehensions are evaluated entirely before returning, while generator comprehensions yield results one by one. Generators are therefore more &#8220;lazy&#8221; and can results in big memory savings when iterating over large lists.</p>
<p>The outer parentheses can even be omitted when calling functions with only 1 argument:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span>sqrt<span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">9</span>, <span style="color: #ff4500;">16</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span> 
<span style="color: #66cc66;">&lt;</span>generator <span style="color: #008000;">object</span> at 0xb7cec68c<span style="color: #66cc66;">&gt;</span></pre></div></div>

<p>For those more familiar with functional programming, list comprehensions are similar to the <a href="http://en.wikipedia.org/wiki/Map_%28higher-order_function%29">map</a> higher-order function.</p>

<div class="wp_syntax"><div class="code"><pre class="scheme" style="font-family:monospace;">&nbsp;</pre></div></div>

<p>In fact, Python has a built-in map function too.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #008000;">map</span><span style="color: black;">&#40;</span>sqrt, <span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">9</span>, <span style="color: #ff4500;">16</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1.0</span>, <span style="color: #ff4500;">2.0</span>, <span style="color: #ff4500;">3.0</span>, <span style="color: #ff4500;">4.0</span><span style="color: black;">&#93;</span></pre></div></div>

<h3>Reduce</h3>
<p>While map applies a function to each element of a list and returns the resulting list, <a href="http://en.wikipedia.org/wiki/Fold_%28higher-order_function%29">reduce</a> is a higher-order function that uses another function to combine the elements of a list in some way.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> plus = concatenate = <span style="color: #ff7700;font-weight:bold;">lambda</span> x,y: x+y
<span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #008000;">reduce</span><span style="color: black;">&#40;</span>plus, <span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>,<span style="color: #ff4500;">2</span>,<span style="color: #ff4500;">3</span>,<span style="color: #ff4500;">4</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
<span style="color: #ff4500;">10</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #008000;">reduce</span><span style="color: black;">&#40;</span>concatenate, <span style="color: black;">&#91;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>,<span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span>,<span style="color: #ff4500;">4</span><span style="color: black;">&#93;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">4</span><span style="color: black;">&#93;</span></pre></div></div>

<h3>multiprocessing.Pool &#8216;s map</h3>
<p>Applying a function to each element of a list with map kind of assumes that the function is <a href="http://en.wikipedia.org/wiki/Pure_function">pure</a>, i.e. that the result output by the function is only function of its input arguments. Although nothing prevents you from giving an impure function as argument to map, it is dirty, potentially dangerous and not the functional philosophy. Concretely, it means that you&#8217;d better not use global variables or anything the state of which may be changed during the program execution, in your functions. This thus also includes instance methods (an object, in essence, encapsulates a state).</p>
<p>To reuse the terminology above, if we think of applying our function to each element of the list as tasks, then our tasks are independent from each other and so there&#8217;s is no reason to operate over the list sequentially. Independence is also very nice because communication and collaboration between threads/processes happen to be one of the most difficult aspect of concurrent programming. Here, no communication between threads/processes is required.</p>
<p>And here comes the new multiprocessing module and more particularly its Pool class. This class represents pools of worker processes and has a map method, which is similar to the map built-in function.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">from</span> multiprocessing <span style="color: #ff7700;font-weight:bold;">import</span> Pool
<span style="color: #66cc66;">&gt;&gt;&gt;</span> pool = Pool<span style="color: black;">&#40;</span>processes=<span style="color: #ff4500;">4</span><span style="color: black;">&#41;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> pool.<span style="color: #008000;">map</span><span style="color: black;">&#40;</span>sqrt, <span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>,<span style="color: #ff4500;">4</span>,<span style="color: #ff4500;">9</span>,<span style="color: #ff4500;">16</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1.0</span>, <span style="color: #ff4500;">2.0</span>, <span style="color: #ff4500;">3.0</span>, <span style="color: #ff4500;">4.0</span><span style="color: black;">&#93;</span></pre></div></div>

<p>The difference with the built-in map here is that 4 processes are used. This will result in about a 4x speedup if the computer running the program has at least 4 cores. Of course, sqrt is a toy example but here&#8217;s a real-life example in a Machine Learning context.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> image_sets = <span style="color: black;">&#91;</span>set1, ..., setn<span style="color: black;">&#93;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> preprocessed = pool.<span style="color: #008000;">map</span><span style="color: black;">&#40;</span>preprocess, images_sets<span style="color: black;">&#41;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> feat_sets = pool.<span style="color: #008000;">map</span><span style="color: black;">&#40;</span>feat_extract, preprocessed<span style="color: black;">&#41;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> models = pool.<span style="color: #008000;">map</span><span style="color: black;">&#40;</span>train, feat_sets<span style="color: black;">&#41;</span></pre></div></div>

<p>As long as you can write your code as list comprehensions, you can apply the data decomposition approach. It&#8217;s easy, abuse it!</p>
<p>However, spawning a process has a cost because of context switching. Therefore, when the function to be applied on each element returns quasi instantaneously, it may be worth splitting the data into larger chunks, run each chunk in a separate process and then recombine the results with reduce. (See also <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a>)</p>
<h3>Helpers</h3>
<p>Here are some helpers which make parallelizing your list comprehensions even more straightforward and easy to read.</p>
<p>As mentioned before, the <a href="http://gael-varoquaux.info/blog/?p=119">blog post</a> that introduced me to this new multiprocessing module also came with a smart code snippet. I reworked it to fit my liking and this is what it looks like now:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> sqrtd = delayed<span style="color: black;">&#40;</span>sqrt<span style="color: black;">&#41;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> powd = delayed<span style="color: black;">&#40;</span><span style="color: #008000;">pow</span><span style="color: black;">&#41;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> squares = <span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">9</span>, <span style="color: #ff4500;">16</span><span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #66cc66;">&gt;&gt;&gt;</span> pool_parallelize<span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>sqrtd<span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> squares<span style="color: black;">&#93;</span>, njobs=<span style="color: #ff4500;">4</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1.0</span>, <span style="color: #ff4500;">2.0</span>, <span style="color: #ff4500;">3.0</span>, <span style="color: #ff4500;">4.0</span><span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #66cc66;">&gt;&gt;&gt;</span> pool_parallelize<span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>powd<span style="color: black;">&#40;</span>i, <span style="color: #ff4500;">0.5</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> squares<span style="color: black;">&#93;</span>, njobs=<span style="color: #ff4500;">4</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1.0</span>, <span style="color: #ff4500;">2.0</span>, <span style="color: #ff4500;">3.0</span>, <span style="color: #ff4500;">4.0</span><span style="color: black;">&#93;</span></pre></div></div>

<p>Contrary to Pool&#8217;s map, this supports parallelizing functions of any arity.</p>
<p>Then I came up with this solution, which reduces the typing and is quite elegant.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> sqrtp = parallelized<span style="color: black;">&#40;</span>sqrt<span style="color: black;">&#41;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> powp = parallelized<span style="color: black;">&#40;</span><span style="color: #008000;">pow</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #66cc66;">&gt;&gt;&gt;</span> sqrtp<span style="color: black;">&#40;</span>squares, njobs=<span style="color: #ff4500;">4</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1.0</span>, <span style="color: #ff4500;">2.0</span>, <span style="color: #ff4500;">3.0</span>, <span style="color: #ff4500;">4.0</span><span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #66cc66;">&gt;&gt;&gt;</span> powp<span style="color: black;">&#40;</span><span style="color: black;">&#91;</span><span style="color: black;">&#40;</span>i, <span style="color: #ff4500;">0.5</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> squares<span style="color: black;">&#93;</span>, njobs=<span style="color: #ff4500;">4</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1.0</span>, <span style="color: #ff4500;">2.0</span>, <span style="color: #ff4500;">3.0</span>, <span style="color: #ff4500;">4.0</span><span style="color: black;">&#93;</span></pre></div></div>

<p>Code available <strong><a href="http://www.mblondel.org/files/parallel.py.txt">here</a></strong>.</p>
<h3>Conclusion</h3>
<p>You really gotta love functional programming in Python! (By the way, see also Charming Python: Functional programming in Python <a href="http://www.ibm.com/developerworks/library/l-prog.html">part 1</a> and <a href="http://www.ibm.com/developerworks/library/l-prog2.html">part 2</a>)</p>
<span class="net_nemein_favourites">0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=efb7588c268c11e2a45d8963ed4b32ac32ac&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/efb7588c268c11e2a45d8963ed4b32ac32ac/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=efb7588c268c11e2a45d8963ed4b32ac32ac&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/efb7588c268c11e2a45d8963ed4b32ac32ac/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Fri, 27 Nov 2009 18:17:28 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-efb7588c268c11e2a45d8963ed4b32ac32ac</guid>
        </item>
        <item>
            <title>First release of Tegaki</title>
            <link>http://www.mblondel.org/journal/2009/02/11/first-release-of-tegaki/</link>
            <description><![CDATA[
<p>Today I&#8217;m releasing Tegaki 0.1. Tegaki is an ongoing project which aims to develop a free and open-source modern implementation of handwriting recognition software, that is suitable for both the desktop and mobile devices, and that is designed from the ground up to work well with Chinese and Japanese.</p>
<p><img src="http://tegaki.sourceforge.net/tegaki.png" /></p>
<p><strong>Screencast video</strong>: <a href="http://tegaki.sourceforge.net/tegaki.ogv">ogg</a> or <a href="http://www.youtube.com/watch?v=FkheUhvRjsc">youtube</a>.</p>
<p>This release features desktop and <a href="http://www.scim-im.org">SCIM</a> integration. However, the main &#8220;innovation&#8221; brought to you by this release is the user interface. It uses two drawing areas for continuous writing. The user can eventually fix recognition errors by choosing alternative candidates or editing characters. Since a video is worth a thousand words, see the screencast above. This interface is largely inspired from the Nintendo DS game &#8220;Kanji Sono Mama Rakubiki Jiten&#8221; (漢字そのまま楽引辞典).</p>
<p>Tegaki is designed to be able to use several recognition engines. However so far it only supports <a href="http://zinnia.sourceforge.net">Zinnia</a>, which is the only recognition engine that I know with acceptable recognition accuracy and good performance on mobile devices. One challenge of the project in the future will be to create a new recognition engine that can yield better results than Zinnia.</p>
<p>A take that I have on this project is to use Python whenever this is possible and only use C or C++ when performance is critical, like in recognition engines. Compared to <a href="http://tomoe.sourceforge.jp">Tomoe</a>, which implements everything in C and provides bindings for several languages, this means less reusability of the components but I hope this will make the project go forward faster.</p>
<p>There are still a lot of things that can be done in various areas but I really wanted to release the code I&#8217;ve put together so far because I think it can already be useful to end-users. By the way, Maemo supports both pygtk and SCIM through third-party projects, thus Tegaki is just a few Debian packages away from being available on Maemo.</p>
<p>For further details:<br />
<a href="http://tegaki.sourceforge.net/">http://tegaki.sourceforge.net/</a>
</p>
<span class="net_nemein_favourites">8 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=1248426ef89011dd9c5f419bdb2cb23bb23b&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/1248426ef89011dd9c5f419bdb2cb23bb23b/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=1248426ef89011dd9c5f419bdb2cb23bb23b&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/1248426ef89011dd9c5f419bdb2cb23bb23b/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Wed, 11 Feb 2009 22:33:25 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-1248426ef89011dd9c5f419bdb2cb23bb23b</guid>
        </item>
        <item>
            <title>Linux in a Virtual Machine</title>
            <link>http://www.mblondel.org/journal/2008/12/26/linux-in-a-virtual-machine/</link>
            <description><![CDATA[
<p>I own a Macbook on which I&#8217;ve been running Linux 99% of the time for over a year now. Although a Macbook is not necessarily the best choice to run Linux, I made that decision because installing Linux on a Macbook is very well documented. However, as far as you can get, it&#8217;s always difficult to get a configuration you are 100% happy with (no subwoofer support, flaky suspend&#8230;). With recent advances in virtualization technologies, both in software and hardware, I&#8217;ve been willing to test running Linux and Windows (the guest OSes) inside Mac OS X (the host OS).</p>
<p><a id="more-97"></a></p>
<h3>Feature set</h3>
<p>Here are some of the features you can expect from virtualization software.</p>
<p>- <strong>Near-native speed</strong>: thanks to recent hardware support for virtualization, it&#8217;s now possible to run a x86-based guest OS inside a host x86 OS at considerably reduced speed penalty. As a result, the guest OS is perfectly comfortable to use. See <a href="http://en.wikipedia.org/wiki/X86_virtualization">X86 virtualization</a>.</p>
<p>- <strong>Hardware emulation</strong>: in order to run unmodified guest OSes, virtual machines emulate hardware for which most of the OSes have drivers. As a result, it&#8217;s very easy to install the guest OS because the hardware emulated is standard. The virtual machine can be seen has an abstraction layer between the actual hardware and the guest OS.</p>
<p>- <strong>Growing-size hard disk</strong>:  while it&#8217;s usually possible to boot directly from a disk partition, most of the virtualization softwares support growing-size file disks. Those files contain the contents of the emulated hard disk and are able to grow dynamically in size. If you allocated 32 GB to your virtual hard disk but it only contains 8 GB so far, the guest OS will see a 32 GB disk available but it will only take 8 GB on the actual hard disk.</p>
<p>Personally, I have a partition for Windows on my Macbook but I used it only once to release <a href="http://projects.gnome.org/fantasdic/">Fantasdic</a> for Windows. With their dynamically growing file disk, virtual machines would have allowed me to gain a lot of space. Some softwares like QEMU support compressed disk files, to gain even more space.</p>
<p>- <strong>NAT</strong>: this allows to transfer network traffic from and to the virtual machine. Whether you&#8217;re connected to Internet through Ethernet, Wifi or 3G, for the guest OS, this is the same. It only sees an Ethernet network card and you can connect to Internet from the guest OS transparently. However, this also has some drawbacks because you wouldn&#8217;t be able to run software that directly needs the Wifi card (e.g. network scanner).</p>
<p>- <strong>Bridge networking</strong> or <strong>Host interface networking</strong>: this allows your virtual machines to have their own IP address in your local network. If you install the Apache HTTP server in your Linux virtual machine, you can access it from Mac OS X or any other computer on the local network. This also allows me to ssh into my Nokia internet tablet or copy files to it.</p>
<p>Although most virtualization software have built-in convenience support for <strong>shared folders</strong>, bridge networking allows you to share folders across virtual machines and even real machines over Samba or SSH.</p>
<p>- <strong>USB</strong>: if you plug USB devices in your computer, they can be redirected at will to the virtual machine.</p>
<p>- <strong>CD Rom</strong>: you can redirect your real CD Rom to the virtual machine or select an .iso image and it will be seen as a CD Rom from the virtual machine.</p>
<p>- <strong>Seamless integration</strong> in the host OS: seamless mouse pointer integration, copy-paste between host and guest OS&#8230;</p>
<p>- <strong>3D acceleration</strong>: although I don&#8217;t play games on computers, this may be of interest for some people.</p>
<h3>Parallels Desktop and Vmware Fusion</h3>
<p>The first thing I did was to try out the trial version of commercial products Parallels Desktop and Vmware Fusion. Both work well, are easy to use, are highly-integrated in Mac OS X and share a fair amount of features. Both cost $79.99. I didn&#8217;t test them in full depth but from what I saw, it would be difficult for me to choose one between those two if had to.</p>
<p>I digress but I was the witness of a shameless commercial practice at Parallels Desktop. Since my browser is in French, it was automatically redirected to the French version of Parallels Desktop&#8217;s webiste and the software was priced 79.99 euros. However, if you go to the English version, the software costs 79.99 dollars. That is 56.98 euros&#8230; Is that the cost of localization?!</p>
<h3>QEMU</h3>
<p>I got all excited about virtualization but not to the point of paying this prohibitive price so I started to have a look at open-source solutions. I tried out Q, which is a Mac OS X port of QEMU. QEMU does more than Parallels Desktop and Vmware Fusion. It allows to run a guest OS for a processor architecture inside a host OS with another processor architecture. It&#8217;s thus also a processor emulator.</p>
<p>Although it worked well, Q was terribly slow. While looking for acceleration extensions for QEMU, I found QVM86 but Wikipedia mentions that the developer ceased development when VirtualBox was released so I quickly switched to trying out VirtualBox.</p>
<h3>VirtualBox</h3>
<p>VirtualBox was THE pleasant surprise of all the software I tested. It&#8217;s free, it&#8217;s open-source, it&#8217;s easy to use, it&#8217;s well integrated in Mac OS X. Of all the feature set mentioned above, VirtualBox has everything. I&#8217;m simply amazed by this software.</p>
<p>A limited set of components such as USB support are closed-source. This means that if you install VirtualBox on a Linux host using the distribution package, you get the open-source edition, which misses some features. However, it is always possible to download the binaries from the official website and install them by hand.</p>
<p>VirtualBox was originally developed by German company innotek. However, the company has been acquired by Sun in February 2008. Also note that VirtualBox includes code from QEMU.</p>
<h3>Maemo development</h3>
<p>I wanted to check whether VirtualBox could be used for <a href="http://www.maemo.org">Maemo</a> development or not. Since the Maemo SDK itself relies on virtualization software (QEMU) to emulate the ARMEL architecture, my concern was that running virtualization software inside virtualization software may not work well or be too slow.</p>
<p>I installed Scratchbox and the Maemo SDK inside my Linux virtual machine and it turned out that my concerns were not justified. The SDK works well and is very responsive, even when running graphical applications with Xephyr. </p>
<p>As mentioned before, Bridge networking allowed me to ssh into my tablet and copy files to it with scp, from the virtual machine. While the USB is correctly redirected to the virtual machine when the tablet is in USB mass storage mode, I was unable to get USB networking working because Mac OS X intercepts/detects the network connection.</p>
<p>I think VirtualBox is a good solution for Mac OS X users willing to develop Maemo applications.
</p>
<span class="net_nemein_favourites">1 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=02e41a7cd35211ddaad9358fef02c8d3c8d3&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/02e41a7cd35211ddaad9358fef02c8d3c8d3/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>1 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=02e41a7cd35211ddaad9358fef02c8d3c8d3&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/02e41a7cd35211ddaad9358fef02c8d3c8d3/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Fri, 26 Dec 2008 12:39:14 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-02e41a7cd35211ddaad9358fef02c8d3c8d3</guid>
        </item>
        <item>
            <title>Zinnia</title>
            <link>http://www.mblondel.org/journal/2008/09/20/zinnia/</link>
            <description><![CDATA[
<p>In my <a href="http://www.mblondel.org/journal/2008/09/09/hccr-using-ajax-and-svm/">last post</a>, I was writing about this impressive Chinese character recognition demo using AJAX on the client side and Support Vector Machines (SVM) on the server side, for the recognition process. Well, I don&#8217;t know if it&#8217;s just a coincidence (this demo was from 2 years ago) but Taku Kudo released last week the backend he&#8217;s using as free software. Needless to say that this was awesome news for me! I know the basic principle of SVM but time to learn more about it I guess&#8230;</p>
<p>His project, called <a href="http://zinnia.sourceforge.net/">Zinnia</a>, has been rewritten from scratch to be more flexible and reusable. Models for Japanese and Chinese are included but models for other languages can be built easily provided that you have training data. I&#8217;m pretty sure that this package could also be useful for Gesture Recognition because it&#8217;s so close to Handwriting Recognition&#8230;</p>
<p>For the sake of comparison, I wanted to evaluate how Zinnia performs compared to both Tomoe and my own HMM experiment. I used the same evaluation corpus as I wrote about in earlier posts, that is two sets of 50 kanjis written by a Japanese friend of mine and me. The characters have the correct stroke order and were drawn carefully. Therefore, the results below indicate how the different recognizers perform in ideal conditions and don&#8217;t indicate how robust they would be in more difficult conditions.</p>
<h3>Tomoe - Zinnia</h3>
<p></p>
<table border="1">
<tr>
<th>&nbsp;</th>
<th>Tomoe</th>
<th>Zinnia</th>
</tr>
<tr>
<td>1st match accuracy</td>
<td>61%</td>
<td>77%</td>
</tr>
<tr>
<td>5 matches accuracy</td>
<td>74%</td>
<td>92%</td>
</tr>
<tr>
<tr>
<td>10 matches accuracy</td>
<td>74%</td>
<td>93%</td>
</tr>
<tr>
<td>Recognition time</td>
<td>21 / 100 = 0.21 s</td>
<td>3 / 100 = 0.03 s</td>
</tr>
<tr>
<td>Total number of kanji</td>
<td>3000</td>
<td>3000</td>
</tr>
</table>
<p>1st match accuracy is the percentage of characters that were recognized as first match.<br />
5 matches accuracy is the percentage of characters that were recognized in the first 5 matches.</p>
<p>You can download my evaluation script for Zinnia <a href="http://www.mblondel.org/files/zinnia-eval.tar.gz">here</a>. Tomoe&#8217;s evaluation script is sitting in Tomoe&#8217; SVN, in the benchmark/ folder.</p>
<p>A few remarks:<br />
- Zinnia is notably better than Tomoe in terms of accuracy<br />
- Zinnia is about 7 times faster than Tomoe, making it a good candidate for an embedded platform<br />
- In both cases, 5 matches and 10 matches accuracy are about the same, meaning that it would be enough for the user interface to display the first 5 matches only.</p>
<h3> Project Tegaki - Zinnia</h3>
<p>Due to lack of training data, my personal HMM experiment (project Tegaki) was only conducted over a set of 50 characters. However, Zinnia supports over 3000 characters.  For fair comparison, I thus created new models for Zinnia using the same training data as I used for my experiment.</p>
<p>Zinnia was trained with only one sample per character, using the same data as Tomoe, which is template-based. While SVM seems to be able to cope with only one sample per character, it&#8217;s a little bit more complicated to do that with HMM because of the need to find the parameters of the Observation Probability Density Function (e.g. mean and variance for a Gaussian).<br />
</p>
<table border="1">
<tr>
<th>&nbsp;</th>
<th>Project Tegaki</th>
<th>Zinnia</th>
</tr>
<tr>
<td>1st match accuracy</td>
<td>92%</td>
<td>100%</td>
</tr>
<tr>
<td>5 matches accuracy</td>
<td>100%</td>
<td>100%</td>
</tr>
<tr>
<tr>
<td>10 matches accuracy</td>
<td>100%</td>
<td>100%</td>
</tr>
<tr>
<td>Recognition time</td>
<td>14 / 100 = 0.14 s</td>
<td>1.50 / 100 = 0.015 s</td>
</tr>
<tr>
<td>Total number of kanji</td>
<td>50</td>
<td>50</td>
</tr>
</table>
<p>A few remarks:<br />
- My experiment is slow, which is probably due to the fact that I&#8217;m using Character-level models. Stroke-level models are known to scale much better.<br />
- My experiment has slightly worse accuracy, which is probably because I&#8217;m only using two features per observation.</p>
<h3>Handwriting database</h3>
<p>If you follow my <a href="http://www.mblondel.org/journal/category/handwriting-recognition/">adventures</a> in the world of handwritten Chinese character recognition, you probably know that I&#8217;m planning to create a handwriting database website. This database will aim to 1) make it easy and attractive for people to contribute their handwriting samples and 2) make it easy for the database staff to manage and organize what is supposed to become a large collection of handwriting samples.</p>
<p>The database will use a client/server architecture. So far I&#8217;m thinking of four important clients:<br />
- A client that people will be able to use directly in their web browser, using my <a href="http://www.mblondel.org/journal/2008/08/01/web-canvas/">web canvas</a><br />
- A client for the Maemo platform<br />
- A client for the Iphone<br />
- A multi-platform client for the Destkop</p>
<p>A client of slightly lesser priority would be a Facebook application.</p>
<p>The handwriting samples collected will be distributed in free software license. For projects like Zinnia or Project Tegaki, this will mean more training data and more means to evaluate the performance. I consider this database one of my priorities among my free software projects but it&#8217;s going to be quite hard for me to find time for that before December&#8230; </p>
<h3>Contribute</h3>
<p>As always, more people are welcome to contribute. </p>
<p>To download the source code of my work, </p>
<p>$ git clone http://www.mblondel.org/code/hwr.git</p>
<p><a href="http://www.mblondel.org/gitweb?p=hwr.git">web interface</a>
</p>
<span class="net_nemein_favourites">0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=26177ea486ee11dda7ef552dc5602c7a2c7a&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/26177ea486ee11dda7ef552dc5602c7a2c7a/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>1 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=26177ea486ee11dda7ef552dc5602c7a2c7a&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/26177ea486ee11dda7ef552dc5602c7a2c7a/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Sat, 20 Sep 2008 07:55:45 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-26177ea486ee11dda7ef552dc5602c7a2c7a</guid>
        </item>
        <item>
            <title>Web Canvas</title>
            <link>http://www.mblondel.org/journal/2008/08/01/web-canvas/</link>
            <description><![CDATA[
<p>In my <a href="http://www.mblondel.org/journal/2008/07/13/handwriting-renderers/">last post</a>, I was calling for contributors to write a web canvas using the &lt;canvas&gt; tag. If you don&#8217;t know it, &lt;canvas&gt; is a new tag specified in HTML5 which allows you to draw using a Javascript API. It is already supported in Firefox, Opera, Safari and is supported in Internet Explorer through a third-party Javascript. </p>
<p>Since nobody responded to my call (sic), I decided to tackle it by myself. It turns out that it was a nice little project. The canvas Javascript API is very similar to the cairo API so it was easy to use. I also improved my level in Javascript a lot. So far the web canvas supports draw, import (JSON), export (XML), save as an image and replay (stroke by stroke animation).</p>
<p>You can try it by using the online <strong><a href="http://www.mblondel.org/files/tegaki-webcanvas/test.html">DEMO</a></strong>.</p>
<p>What can it be useful for?</p>
<p>- I&#8217;m planning to use it for the handwriting database website that I wrote about some time ago. While it will be possible to contribute your handwriting using a pygtk client (Desktop or Maemo), you will also be able to contribute your handwriting using your browser directly. Not having to install any program should help increase the number of people contributing their handwriting.</p>
<p>- A second way of using it would be to do handwriting recognition directly in the browser. For example, one could install <a href="http://tomoe.sourceforge.jp/">Tomoe</a> (or my recognizer when it&#8217;s ready ;-)) on the server side and the web canvas on the client side. Since Tomoe has Python and Ruby bindings, this is fairly easy!</p>
<p>You can reuse the web canvas for your own projects if you like but I would appreciate if you could send me any feature improvement. In particular, the web canvas has a bug under Internet Explorer that I couldn&#8217;t figure out&#8230;</p>
<p><strong>Source code</strong> (GPL) : <a href="http://www.mblondel.org/gitweb?p=hwr.git;a=tree">gitweb</a>
</p>
<span class="net_nemein_favourites">6 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=9bcaa71e5fd611ddb8b10b44f0f1aea4aea4&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/9bcaa71e5fd611ddb8b10b44f0f1aea4aea4/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=9bcaa71e5fd611ddb8b10b44f0f1aea4aea4&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/9bcaa71e5fd611ddb8b10b44f0f1aea4aea4/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Fri, 01 Aug 2008 13:46:35 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-9bcaa71e5fd611ddb8b10b44f0f1aea4aea4</guid>
        </item>
        <item>
            <title>Handwriting renderers</title>
            <link>http://www.mblondel.org/journal/2008/07/13/handwriting-renderers/</link>
            <description><![CDATA[
<h3>Canvas</h3>
<p>If you didn&#8217;t read my <a href="http://www.mblondel.org/journal/2008/07/04/a-roadmap-for-project-tegaki/">previous post</a>, for short, project Tegaki is a framework for handwritten Chinese character recognition (HCCR) written in Python. It includes reusable components and is a placeholder for experimentation. The goal is to create the next-generation open-source HCCR software but it may be useful for academic researchers as well.</p>
<p>One reusable component is the Canvas. This is the user interface component that allows to draw characters. In addition, the Canvas supports &#8220;replaying&#8221; the character (stroke by stroke animation) and setting a background model (to help users draw an unknown character). It is multi-platform.</p>
<p><img src="http://www.mblondel.org/images/tegaki/tegaki-canvas.png" alt="The Canvas" /><br />
<em>Example of a character drawn using the Canvas provided by libtegaki-gtk</em></p>
<p>The Canvas has a get_writing() method. It allows to retrieve the Writing object for the handwriting currently displayed in the Canvas. </p>
<h3>XML representation</h3>
<p>The Writing object supports reading from and writing to an XML file. The XML file can optionally be compressed using gzip or bz2. On my hard drive, I have a small set of handwriting samples. 500 characters take about 10 MB. That&#8217;s why compression is very useful.</p>
<p>The XML representation of a handwriting sample looks like that.</p>
<pre>
&lt;character&gt;
  &lt;utf8&gt;無&lt;/utf8&gt;
  &lt;strokes&gt;
    &lt;stroke&gt;
      &lt;point x="306" y="163" timestamp="0" /&gt;
      &lt;point x="303" y="163" timestamp="21" /&gt;
      &lt;point x="303" y="166" timestamp="29" /&gt;
      [...]
    &lt;/stroke&gt;
    &lt;stroke&gt;
      &lt;point x="266" y="240" timestamp="912" /&gt;
      &lt;point x="270" y="240" timestamp="917" /&gt;
      &lt;point x="273" y="240" timestamp="925" /&gt;
      [...]
    &lt;/stroke&gt;
    [...]
  &lt;/strokes&gt;
&lt;/character&gt;
</pre>
<h3>Renderers</h3>
<p>I&#8217;ve recently added support for what I named &#8220;renderers&#8221;. They take a Writing object as parameter and generate a visual representation of it. Since I used the cairo graphics library as drawing backend, the representation can be saved to PNG, SVG and PDF! Those renderers will be very useful for the handwriting database website that I wrote about in my <a href="http://www.mblondel.org/journal/2008/07/04/a-roadmap-for-project-tegaki/">previous post</a>!</p>
<h3>Complete character renderer</h3>
<p><img src="http://www.mblondel.org/images/tegaki/cairo_kanji.png" style="border:1px solid black" alt="Kanji" /></p>
<h3>Stroke order renderer</h3>
<p><img src="http://www.mblondel.org/images/tegaki/cairo_steps.png" style="border:1px solid black" alt="Kanji" /><br />
<em>Stroke order with each single stroke</em></p>
<p><img src="http://www.mblondel.org/images/tegaki/cairo_steps_2.png" style="border:1px solid black" alt="Kanji" /><br />
<em>Stroke order with stroke groups</em></p>
<p>Strokes can be grouped together when the stroke order is obvious. However, this requires to know which strokes to combine together. A dictionary must be created for that. A entry example would be:</p>
<p>駅 1,1,3,1,4,2,2</p>
<h3>&lt;canvas&gt; HTML tag</h3>
<p>The canvas I was writing about above is written in pygtk and is intended to be used for the Desktop or for Maemo. However, in the case of the handwriting database website, since we want as many people to contribute their handwriting as possible, it would be nice to not require any particular installation. For that, a canvas directly in the browser would be the ideal solution.</p>
<p>One solution would be to use Flash but I would prefer to use the &lt;canvas&gt; tag. It can be used in combination with Javascript to do drawing in the browser. It is supported natively by Firefox, Opera and Safari. It is supported in Internet Explorer through a third-party Javascript called ExplorerCanvas. </p>
<p><strong>I am looking for a contributor</strong> to create a new canvas using this technology. The canvas should support drawing, displaying existing handwriting and replay (stroke by stroke animation).</p>
<p>For more information:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Canvas_(HTML_element)">Canvas (HTML element)</a> (Wikipedia)</li>
<li><a href="http://developer.mozilla.org/en/docs/Canvas_tutorial">Canvas tutorial</a> (developer.mozilla.org)</li>
<li><a href="http://caimansys.com/painter/">Canvas painter</a> (a paint-like application in the browser)</li>
<li><a href="http://excanvas.sourceforge.net/">ExplorerCanvas</a> (by Google)</li>
</ul>
<h3>GIF stroke animation</h3>
<p>Even though GIF uses a patented compression, GIF is still the only format with support for animations and wide support in the browsers. Therefore it would be very cool to be able to generate GIF stroke animations from a writing object.</p>
<p>I had a look at python-imagemagick and Python Imaging Library (PIL) but they both seem to have very limited support for GIF animations. So I&#8217;m thinking of writing my own library for GIF generation in Python. <a href="http://people.freedesktop.org/~company/byzanz/">Byzanz</a>, a software to create screencasts as GIF animations, can be used as inspiration because it includes a GIF encoder. It also supports color quantization (using octrees) and dithering. From what I see, it should take less than 1000 lines of Python code.</p>
<p>I read a little bit about color quantization. I found it very interesting. Here&#8217;s a short explanation about color quantization for those who don&#8217;t know about it. Basically, each pixel in an image may have three components Red Blue Green. For a 400&#215;400 picture, this is about 400*400*3=480KB. To gain space, an idea is to store colors in a palette (a table index => color). Then each pixel only needs to refer to the index in the palette instead of having to define the three components. For a 256-color palette, this saves two bytes for each pixel. However, since we now use 256 colors only instead of 256 * 256 * 256 = 16,777,216 colors, there&#8217;s a color precision loss. The challenge is thus to find what colors to put in the palette to have the smallest precision loss possible. For example, we may want to put in the palette colors that are the closest to the most frequently used colors. This is a 3-dimensional clustering problem, thus it reminded me of Machine Learning, a topic in which I&#8217;ve been very interested recently.</p>
<p>For more information, I recommend the reading of those Wikipedia articles:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Palette_(computing)">Palette</a></li>
<li><a href="http://en.wikipedia.org/wiki/Indexed_color">Indexed color</a></li>
<li><a href="http://en.wikipedia.org/wiki/Color_quantization">Color quantization</a></li>
<li><a href="http://en.wikipedia.org/wiki/Dithering">Dithering</a></li>
<li><a href="http://en.wikipedia.org/wiki/Octree">Octree</a></li>
</ul>
<span class="net_nemein_favourites">3 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=bf4869a450b511dd9b7673c89937aca7aca7&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/bf4869a450b511dd9b7673c89937aca7aca7/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=bf4869a450b511dd9b7673c89937aca7aca7&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/bf4869a450b511dd9b7673c89937aca7aca7/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Sun, 13 Jul 2008 07:03:22 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-bf4869a450b511dd9b7673c89937aca7aca7</guid>
        </item>
        <item>
            <title>A roadmap for project Tegaki</title>
            <link>http://www.mblondel.org/journal/2008/07/04/a-roadmap-for-project-tegaki/</link>
            <description><![CDATA[
<h3>Codename Project Tegaki</h3>
<p>I wrote in a previous post about my <a href="http://www.mblondel.org/journal/2008/05/25/first-hmm-experiment/">first experiment</a> with applying a modern technique, namely Hidden Markov Models, for handwritten Chinese character recognition. I&#8217;m quite motivated in making this more than just a single isolated experiment so I decided to give a name to the project. I named it <strong>Project Tegaki</strong>. This is going to be the codename for the effort starting from now. Tegaki means Handwriting in Japanese.</p>
<h3>Project statement</h3>
<p>The aim of Project Tegaki is to push forward the creation of <strong>the next-generation open-source handwritten Chinese character  recognition (HCCR) software</strong>. </p>
<p>Currently, the only open-source package for HCCR is <a href="http://tomoe.sourceforge.jp/">Tomoe</a>. This is a project that I have been contributing to and that I used for my <a href="http://www.mblondel.org/journal/category/google-soc/">Google Summer of Code</a> project, &#8220;Japanese/Chinese handwriting recognition on maemo&#8221;. Maemo is the open-source platform used by Nokia PDAs. I have decided to start Project Tegaki as an external effort because I considered that Tomoe would not be a good environment to welcome the effort. However, if the Tomoe community is ready to help me in this effort, I will be happy to merge Project Tegaki back into Tomoe once Project Tegaki becomes ready for prime-time.</p>
<p><img src="http://www.mblondel.org/images/tomoe_vkb.png" /><br />
<em>Handwritten Chinese character recognition in a PDA&#8230;</em></p>
<p>Here are some goals for the project:</p>
<p>- <strong>Free and open-source</strong>. The goal is to produce the next-generation free and open-source HCCR software.</p>
<p>- <strong>Modern</strong>. The software should use modern approaches to Handwriting recognition and be in tight connection with research.</p>
<p>- <strong>Embedded</strong>. The project must be designed to work with devices with restricted resources such as cell phones or PDAs.</p>
<p>- <strong>Online</strong>, as opposed to offline. In online recognition, characters are drawn using a device, typically a mouse, a tablet or a PDA stylus. In this setting, characters can be represented as <strong>sequences  of points</strong>. In offline recognition, characters are scanned a posteriori. In this setting, characters are represented as images (width * height pixels).</p>
<p>- <strong>Isolated Chinese character recognition</strong>. Here Chinese character doesn&#8217;t restrict to Chinese language, since Japanese kanji are also Chinese characters! Even though the package should theoretically be generalizable to any kind of character, Chinese characters have some specific challenges and some approaches that give good results for Chinese characters may not give good results with other kinds of characters, due to the unique properties of Chinese characters. &#8220;Isolated character recognition&#8221; means that user will have to draw one character at a time in a separate box, as opposed to continuous handwriting recognition. This makes things much easier and in the case of Chinese characters, this is a reasonable limitation. </p>
<p>- <strong>Stroke order dependent and independent</strong>. Both situations have useful applications so Project Tegaki should ideally support both. </p>
<h3>Python?</h3>
<p>Usually I&#8217;m more of a Ruby fan but the project was started in Python due to dependencies on third-party libraries that only exist in Python. Even though I&#8217;m slowly getting away from those dependencies, I don&#8217;t want to re-implement everything just for the sake of using Ruby. So I keep up with Python. </p>
<p>As it was emphasized, this project is highly experimental. Moreover, a collaborative website will be created (see below) and it will reuse number of existing components. It thus makes sense to use a high-level language to focus on the experiments and to create the website.</p>
<h3>Subprojects</h3>
<p>Project Tegaki is now split into several subprojects.</p>
<h4>libtegaki</h4>
<p>This Python library contains functionality that will be useful to other subprojects. This includes array manipulation, character input/output, viterbi decoder&#8230;</p>
<h4>libtegaki-gtk</h3>
<p>This Python library contains user interface elements that will be useful to other subprojects. So far it only includes a Canvas, which can be used to draw characters. It is replacement for TomoeCanvas with some additional benefits:</p>
<p>- Truly reusable. TomoeCanvas assumes that a recognizer is connected to the canvas. However, there are situations when a recognizer is not needed.</p>
<p>- Resizable. TomoeCanvas cannot be resized at will.</p>
<p>- Animation. A stroke animation of a character can be displayed.</p>
<p>- Background character. A background character can be set as a model and animations will be displayed to help draw the same character stroke by stroke.</p>
<p>- Features other than (x,y) coordinates are supported such as pen pressure and pen inclination when available, stroke duration, point timestamp.</p>
<p>libtegaki-gtk is written in pygtk and depends on libtegaki.</p>
<h4>tegaki-db</h4>
<p>The most successful handwriting recognition systems nowadays use a &#8220;learn by example&#8221; philosophy. For each character supported, several samples of the handwritten character must be provided to the system in order to learn from them. Because those samples are used to train the system, they are called &#8220;training samples&#8221;. The challenge for the final recognizer is to be able to recognize unseen handwritten instances of the same characters. This is the ability of the recognizer to &#8220;generalize&#8221; the acquired knowledge.</p>
<p> A &#8220;training corpus&#8221; is a set of training samples. A good corpus should contain dozens of handwritten samples for each character. The corpus should be representative enough of all handwriting styles. Collecting all the handwriting samples and designing a good corpus is a huge task for Chinese characters because there exist thousands of them!</p>
<p>Such handwritten Chinese character databases do exist but they have a fee and they are usually restricted to academic research. They are by no means suitable for free software. The goal of the tegaki-db subproject is to create a collaborative web platform to collect handwriting samples. Native speakers and learners alike will be able to log in and contribute their own handwriting. The collected data will be published in a free license so that it can benefit to academic research as well. The tegaki-db will use a client / server architecture.</p>
<h4>tegaki-db-client</h4>
<p>tegaki-db-client is a client for people to input their handwriting. It will be written in Python and use the canvas provided by libtegaki-gtk. The client will communicate with the server through web services. The client should be distributed for several platforms such as Linux, Windows and Maemo to increase the number of potential contributors. A detailed  specification of tegaki-db and tegaki-db-client will be provided later in a separate post.</p>
<h4>tegaki-models</h4>
<p>tegaki-models is by no means an end-user package and will only be used by developers. It is the placeholder for experimentation. Thanks to this package, model ideas will be tested and evaluated.</p>
<p>I continued to work on new model ideas&#8230; However, because my current training corpus is so small, it&#8217;s kind of irrelevant to spend to much time on models. The top priority now is to create tegaki-db.</p>
<h4>tegaki-decoder</h4>
<p>tegaki-decoder is going to be a high-performance decoder (recognizer). It should be a fast implementation of the Viterbi decoder. It will be written in C and designed to work with embedded systems. This is going to be the end-product that people will use. Once sufficient data have been collected, good models have been generated and the tegaki decoder is ready, then Project Tegaki will be ready for real use! Currently, implementing tegaki-decoder is not the top priority.</p>
<h3>Roadmap</h3>
<p>- Launch tegaki-db and tegaki-db-client.<br />
- Hope that the collaborative effort is successfull and collect lots of handwriting samples from many different people.<br />
- Create new models, especially stroke-based models.<br />
- Implement tegaki-decoder. </p>
<p>If I continue to be the only one interested in this project, at this rate it will take from several months to a couple of years to achieve everything. That&#8217;s why I hope I can attract a few contributors.</p>
<h3>Download</h3>
<p>The work completed so far is still very experimental and thus targets potential contributors. If you want to test it with your own handwriting anyway, please see <a href="http://www.mblondel.org/journal/2008/05/25/first-hmm-experiment/">my previous post</a>. </p>
<p>To download the source code, you can use </p>
<p>$ git clone http://www.mblondel.org/code/hwr.git</p>
<p>or </p>
<p>$ git pull</p>
<p>from the repository folder if you already have the repository on your computer.</p>
<p>The code can be browsed online using <a href="http://www.mblondel.org/gitweb?p=hwr.git">gitweb</a>. By clicking the &#8220;snapshot&#8221; links you can get a complete copy of the source code at a given revision.</p>
<p>See my <a href="http://www.mblondel.org/journal/2008/05/25/git-memo/">memo</a> on git if you don&#8217;t know it yet. </p>
<p>I published my work under GPL license.
</p>
<span class="net_nemein_favourites">8 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=32aed96649f711dd953ef508345370657065&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/32aed96649f711dd953ef508345370657065/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=32aed96649f711dd953ef508345370657065&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/32aed96649f711dd953ef508345370657065/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Fri, 04 Jul 2008 18:16:49 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-32aed96649f711dd953ef508345370657065</guid>
        </item>
        <item>
            <title>Handwriting recognition inside the VKB</title>
            <link>http://www.mblondel.org/journal/2007/09/08/handwriting-recognition-inside-the-vkb/</link>
            <description><![CDATA[
<p>Since my last post, MaemoCJK has been <a href="http://maemocjk.garage.maemo.org/beta_release.html">released</a> for beta-testing with support for Japanese/Chinese handwriting recognition, together with a few new interesting features such as switch between input methods at runtime. </p>
<p><a href="http://www.mblondel.org/images/tomoe_vkb.png"><img src="http://www.mblondel.org/images/tomoe_vkb.png" alt="Tomoe inside the VKB"  /></a><br />
<a id="more-70"></a><br />
This screenshot showcases the latest development I made. This is not shipped with the beta-release mentioned above but should hopefully be part of the upcoming release. As suggested in the comments of my last post, I tried to include the input pad directly inside the VKB (Virtual Keyboard) rather than using a separate window. Quite unexpectedly (for me), it gave very good results in terms of usability. The main advantage of this approach is that the input pad doesn&#8217;t take the whole screen and therefore you can still see the characters getting input when you click on them. Even though, the input pad is much smaller than before, it doesn&#8217;t seem to affect recognition accuracy, which is a very good point. As you can see from the screenshot, you can switch between VKB mode and handwriting mode with the button on the right. You can also switch between Chinese and Japanese with the menu on the left. A positive side-effect of the inclusion in the VKB is that we are sure that the dictionary gets loaded only once.</p>
<p>Attentive readers will have noticed that there is a small bug with fonts. Even though on this screenshot Chinese is selected, Pango (GTK&#8217;s text layout engine) picks up a Japanese font for some characters and a Chinese font for some others. This explains why the characters&#8217; font don&#8217;t look consistent and the first candidate is displayed as <span lang="ja" style="font-size: 40px">真</span> (Japanese) instead of  <span lang="zh" style="font-size: 40px">真</span> (Chinese). The problem comes from the fact that that those two characters share the same Unicode value so if Pango cannot detect the language from the context, no assumption can be made as to which font will be selected to render the character (see <a href="http://en.wikipedia.org/wiki/Han_unification">Han unification controversy</a> for more details). To solve this problem, we just have to tell Pango explicitly which language it is. </p>
<p>Apart from inclusion in the VKB, I gave a hand for other parts of the MaemoCJK project. I added scim-anthy support, added a French keyboard layout (had to fix a problem in libfakekey for that) and improved the scim-config UI on small screens, among other things.</p>
<p>In other news, Nokia announced this week that they released their proprietary Hildon IM engine under LGPL. I wonder how this will affect the overall MaemoCJK project.</p>
<p>The Google Summer of Code is now over. I think overall it went really not bad. I reached my objectives which were adding handwriting recognition in MaemoCJK with particular focus on performance and smooth integration. Performance is now quite reasonable and integration is pretty good with inclusion in the VKB. One big regret that I have is that many of my changes to Tomoe have not been merged yet to the upstream project and I think I could have contributed much more to the project if the maintainers had been ready for that. However, I now have a good knowledge of the source base so I&#8217;m quite confident that I can help the project in the future. My main interest is in adding handwriting recognition using Hidden Markov Model. And of course, I&#8217;ll continue to maintain the handwriting recognition portion of MaemoCJK.
</p>
<span class="net_nemein_favourites">1 <a href="http://maemo.org/news/?net_nemein_favourites_execute=fav&net_nemein_favourites_execute_for=e63073085f6c11dcbc634fcaab77edbeedbe&net_nemein_favourites_url=https://maemo.org/news/favorites//json/fav/midgard_article/e63073085f6c11dcbc634fcaab77edbeedbe/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-favorite.png" style="border: none;" alt="Add to favourites" title="Add to favourites" /></a>0 <a href="http://maemo.org/news/?net_nemein_favourites_execute=bury&net_nemein_favourites_execute_for=e63073085f6c11dcbc634fcaab77edbeedbe&net_nemein_favourites_url=https://maemo.org/news/favorites//json/bury/midgard_article/e63073085f6c11dcbc634fcaab77edbeedbe/" class="net_nemein_favourites_create"><img src="http://static.maemo.org:81/net.nemein.favourites/not-buried.png" style="border: none;" alt="Bury" title="Bury" /></a></span>]]></description>
            <author>Mathieu Blondel &lt;mblondel@rubyforge.org&gt;</author>
            <category>feed:46b1d6b26651a331cde2ad188d699e0c</category>
            <pubDate>Sat, 08 Sep 2007 12:49:59 +0000</pubDate>
            <guid>http://maemo.org/midcom-permalink-e63073085f6c11dcbc634fcaab77edbeedbe</guid>
        </item>
    </channel>
</rss>
