Vincent Zoonekynd's Blog

Sat, 01 Jun 2013: Collocations and Map-Reduce

Learning a foreign language is time-consuming.

If you were learning English as a foreign language, you would have a wealth of resources at your disposal. You would be using monolingual dictionaries built with a controlled vocabulary, not unlike Wikipedia's "simple English", i.e., the definitions would use a limited set of 2000 words, assumed to be known by the users, with exceptions dutifully highlighted. You would be provided with sample sentences, showing in which context and with which grammatical structure words should be used. You would also have lists of collocations -- collocations are pairs of words that often appear together: you can think of them as "minimal sample sentences".

For other languages, the learning materials are less adequate.

In this note, we will see how to automatically generate some of those missing resources, taking Japanese (and sometimes English) as an example.

2011-12-02_NLP_hive.png

More...

posted at: 19:18 | path: /Linux | permanent link to this entry

Sat, 01 Jun 2013: Gentoo 2007.0

I decided to reinstall Gentoo Linux on my computer, but this time I wanted the whole drive to be encrypted -- including the system partition and the swap, and not only the data partition, as before. Among the other non-standard details, I also have a Wacom tablet and want to be able to type Japanese text. I proceeded as follows (if you do not want that much encryption, just use the livecd: you have a complete, useable system in less than one hour, instead of one or two days).

More...

posted at: 19:17 | path: /Linux | permanent link to this entry

Sat, 01 Jun 2013: Programming languages

Programmers should learn a new programming language every year -- not necessary to ditch the one(s) they are currently using, but we might glean new ideas, new programming paradigms and a more informed, objective, critical view of the programming landscape and its latest fads. In the past few months, I "learned" three of them (it might sound a lot, but because of lack of time, my survey remained very superficial): Oz, Erlang and Chuck.

More...

posted at: 19:17 | path: /Linux | permanent link to this entry

Sat, 01 Jun 2013: Blosxom

Since I decided to revive my web page, to update it on a more regular basis, to resume a frequent web activity, I have a look at a couple of Blog engines. I want one that allows me to write in a text editor, not in a web browser, that allows me to type text, not HTML, that allows me to use my own (pre-wiki) tagging, that produces static files.

Most blog engines are dynamic (meaning that the contents are stored on the web server, as text files or inside a database, and that the actual HTML pages are only generated when requested), but Blosxom (and probably PyBlosxom, its Python cousin, as well) also allows for the generation of static web sites (i.e., we can generate all the HTML pages at once and only transfer them to the web server).

More...

posted at: 19:17 | path: /Linux | permanent link to this entry

Sat, 01 Jun 2013: Haskell

Since I had not learnt a new programming langage for years (the previous one was Ruby, three years ago), I decided to try a new one: Haskell. It is a functionnal language, i.e., one keeping the good ideas of Lisp and forgetting the awful syntax -- Lisp is still being used, in domains where, in spite of its excessive use of parentheses, it keeps an edge on easier programming languages -- for instance in computer music. Haskell often wins the ICFP contest, which is not limited to functional languages.

http://www.amazon.com/Notes-Metalevel-Studies-Music-Research/dp/9026519753
http://en.wikipedia.org/wiki/International_Conference_on_Functional_Programming_Contest
http://linuxfr.org/2007/01/11/21899.html

Other marginal (i.e., not yet mainstream) languages one might want to learn are OCaml (another functional language, as fast as C or C++ according to the programming languages shootout and used in some real-world applications, e.g., in finance), Erlang (another functional language, developped and used by Ericson, that allows you to write multi-threaded or even distributed applications without really thinking about it) or Squeak (a Smalltalk offspring, that prominently features in the OLPC (One laptop per child) initiative).

http://www.galois.com/cufp/slides/2006/YaronMinsky.pdf
http://shootout.alioth.debian.org/
http://www.algorithm.com.au/downloads/talks/Concurrency-and-Erlang-LCA2007-andrep.pdf
http://www.squeak.org/
http://linuxfr.org/2007/01/06/21860.html

Here are the notes I took while learning Haskell.

More...

posted at: 19:17 | path: /Linux | permanent link to this entry

Sat, 01 Jun 2013: Gentoo

After ditching Mandriva because my machine was becoming slower and slower (instead of faster and faster), after experiencing serious latency problems while trying to watch videos on Suse, after straining my eyes on Ubuntu (I do not remember if it was because of the fonts or because I was desperately looking for a useable application), I finally turn to Gentoo.

It is a very different breed of distribution: everything is compiled from source, so that you install only the applications you need and you can fine-tune the compile flags for them (distributions on CDs are often compiled with -Os so as to minimize disk space -- this might not be what you want). It also gets rid of the patent-related problems.

That sounds very appealing, but be warned: should you decide to install this distribution, you must have several days in front of you, you must not be afraid of the command line, you must not be bewildered when you are told that the system is "finally installed" even though there is no media player, no web browser, no KDE, no Gnome, no word processor, no X -- yet.

More...

posted at: 19:17 | path: /Linux | permanent link to this entry

Sat, 01 Jun 2013: Ubuntu

Ubuntu was the next stop on my journey away from Mandriva. Though the distribution has a very good reputation, though it is touted as rock-solid and easy-to-use even for beginners, it left my computer in an appalling state: it is of course plagued by the patent-induced lack of media players that infects more and more distributions, but the rare present applications lacked most of the features, most of the appeal of their KDE counterparts. Even worse, the distribution failled to install -- a power-user will manage to fix things, but the begininners, the very target of this distribution, will throw the CDs in the bin.

While you might keep Gnome alongside KDE on your computer, for the rare worthy applications (Gimp, Gnumeric) and to keep an eye on the competitors, Ubuntu is, for me, a no-go.

More...

posted at: 19:17 | path: /Linux | permanent link to this entry

Sat, 01 Jun 2013: Suse 10.0

Following my problems with Mandriva 2006, I start looking around for other Linux distributions. First on my list is Suse.

More...

posted at: 19:17 | path: /Linux | permanent link to this entry

Sat, 01 Jun 2013: Mandriva 2006

I have been using Mandriva (formerly known as Mandrake) for several years, but I am less and less happy with it. I usually reinstall the system twice a year, in order to have (the impression to have) a brand new, faster computer each time: here are the notes I took during the latest (last?) install -- and the reasons why I am drifting away from this distribution.

More...

posted at: 19:17 | path: /Linux | permanent link to this entry

Sat, 01 Jun 2013: Mandrake 10.1

Here are the notes I took while installing Mandrake 10.1 (well, actually, it is a slight rewriting of the previous notes).

More...

posted at: 19:17 | path: /Linux | permanent link to this entry