KDE and the Semantic Desktop (vhanda.in)
80 points by _abnl on March 14, 2015 | 38 comments



I've been a KDE user for a couple of years now. I've moved between DEs a lot in the past: XFCE, LXDE, Gnome2/3, Cinnamon, Unity... I think I've tried them all, even more obscure WMs like AwesomeWM, XMonad and the like. I have to say KDE beats them all on plenty of things. It is absolutely amazing and I couldn't be happier using it.

However, on the other side of the coin, I've always hated semantic search and indexing. Maybe it's because I'm a power user, but even when I was using Windows I hated the indexing of files. I hate the idea of something going through my entire hard drive, multiple times and at intervals, to track and scan all my data, index it and make it readily available for me. Why? Because it's a perfect invitation for people to snoop on your stuff more easily, because it consumes resources, spins up disk I/O unnecessarily and introduces unexplainable slowdowns. All of this for a comfort that I really don't require. I know where I put my files, thank you very much. If I can't find a file, good ol' find/grep can do the job equally well.

I had this massive problem with Baloo on KDE slowing down my entire machine, I had to run iotop to figure out what the issue was. I run multiple sshfs mounted partitions in my home and Baloo in its indexing would constantly crawl through them multiple times every day, which means it'd send a lot of network requests, slow down I/O operations everywhere and grind my machine to a halt.

It took me a while to figure out how to disable the entire thing (I wasn't even aware KDE did indexing before that, to be honest) but now my machine is better than ever. The first thing I do when I install a fresh KDE setup is to turn all that stuff off, and I would advise every power user on KDE to do the same.
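
For anyone wanting to do the same, here is a minimal sketch of what turning it off can look like on a Plasma 5 setup; it assumes balooctl and kwriteconfig5 are available (otherwise the System Settings > Search module does the same job):

    # minimal sketch, assuming a Plasma 5 install with balooctl available
    balooctl disable    # stop the file indexer and keep it disabled

    # config-file equivalent: set Indexing-Enabled=false in baloofilerc
    kwriteconfig5 --file baloofilerc --group "Basic Settings" --key "Indexing-Enabled" false

    # if you only want to keep it away from slow mounts (e.g. sshfs under ~/mnt),
    # add them to the excluded folders in System Settings > Search instead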


Leaving aside the fact that anybody intent on snooping on you doesn't really need the semantic desktop to accomplish that goal, I would argue that while on your setup disabling Baloo is clearly the right thing to do, when dealing with terabytes of (local) storage Baloo is faster than find/grep (also, grep cannot read inside ODF files). In this sense it is similar to Akonadi: there is a threshold after which, if you have a fast connection, it makes dealing with 20GB of email better than using an email client which uses SQLite.

In a sense I always felt that while Nepomuk and Akonadi were plagued with implementation issues, part of the hostility in the KDE community was due to selection bias: people who like KDE generally grok the "hierarchical filesystem" metaphor, viz. how KCM is structured. However, the random Windows/OSX user (to whom a desktop environment like KDE also has to cater, to a certain extent) more often than not has a terrible time organizing files: he creates "dumping ground" folders and then nests them eight levels deep, or simply puts everything on the desktop and ends up relying on Finder for everything. Thankfully Baloo can now be easily disabled (and works better than the old Nepomuk), and users who need a fast Qt-based IMAP client can use Trojità.


Turning on search for special cases like 20GB of local email sounds like the perfect reason to opt in to some sort of specialized search system of the user's choosing.


Even if you're "good at" hierarchical filesystems, they have a few inherent problems:

1) there's overhead associated with coming up with a hierarchical file structure, keeping it updated whenever your problems or your understanding of them change, etc.

2) there's overhead associated with always remembering to put a file in the "right" place every time you save it, actually navigating there to save it, and then fishing it out of the "right" place every time you need a file (getting to a file with search can sometimes be faster than navigating to it even if you know exactly where it is)

3) there is no perfect hierarchical structure for information; there will always be situations where x/y makes as much sense as y/x and both are sometimes inconvenient

This is not to suggest that search, tags and other alternative mechanisms don't have their own problems.


There is also the locate command. When combined with grep, they make a fast way to find files and search within them. For example, to find a Java class anywhere on the system:

    sudo updatedb

    # "while read" handles paths with spaces (for i in $(locate ...) does not)
    locate "*.jar" | while read -r jar; do
      echo "$jar"; jar -tvf "$jar" | grep -si classname
    done
The best you can do with vanilla Windows is search from the root directory, but it could take a while:

    for /R %G in (*.jar) do @jar -tvf %G | find "ClassName" > NUL && echo %G
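
Back on the Linux side: if, like the earlier commenter, you have slow sshfs or NFS mounts under your home directory, it's worth making sure updatedb skips them too; mlocate reads its prune lists from /etc/updatedb.conf. The filesystem types and paths below are only examples, adjust them to your own mounts:

    # /etc/updatedb.conf - keep updatedb (and therefore locate) off network mounts
    PRUNEFS = "NFS nfs nfs4 cifs fuse.sshfs"
    PRUNEPATHS = "/tmp /var/spool /media /home/user/mnt"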


For an experience similar to locate on Windows + NTFS systems, there is Everything [1]. It also has a command line tool, so you can use it from the command line as well.

[1] http://www.voidtools.com/


The many times I did whatever it took to make Nepomuk and/or strigi stop using CPU time, I always felt a vague sense of guilt.

Am I a bad person for depending on the combination of having an SSD + endless variations of the find command, strings, and grep? Such as find . \( -iname '*.c' -o -iname '*.h' \) -exec grep -Hn pattern \{\} \; ?

I just want my data in text; I don't care about semantic anything, and I'm sorry :( I wish I had time to appreciate whatever the hell it is these processes I must stop are trying to accomplish, but I'm relieved they are going away or being cut down to doing just one thing in a well defined role.
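
For what it's worth, wrapping that find incantation in a tiny function keeps it reusable; a minimal sketch, where the function name and extensions are just an example:

    # hypothetical helper: grep a pattern in C sources under the current tree
    # (the parentheses group the -o so -exec applies to both extensions)
    srcgrep() {
        find . \( -iname '*.c' -o -iname '*.h' \) -exec grep -Hn -- "$1" {} +
    }

    # usage: srcgrep some_pattern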


Don't feel guilty. I read the article twice and still have no real idea what it was about - a sea of names and acronyms that just doesn't tell a story.

The Zen of Python has a line: if the implementation is hard to explain, it's a bad idea.


Good riddance. It was a bad idea at the time and still is, just like the Semantic Web itself. Nepomuk was complex and fragile, as was Akonadi; I've lost count of the number of times I was unable to read my email because this supposedly "optional" piece failed (usually due to an akonadi problem). In the end I resorted to https://www.trinitydesktop.org/ - KDE3 which was less flashy, but worked.

This post is a good sign - maybe KDE is belatedly paying attention to user-facing functionality rather than academic technology exercises. Maybe I can switch back to "mainline" KDE.


Akonadi really annoys me. Who thought it would be a good idea to run MySQL on each desktop to store email metadata? That's just ridiculous bloat. MySQL is among the top 10 entries in powertop for me. Why would MySQL be needed to deal with the metadata? All other mail applications seem to do fine with Maildir and an index.
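
For what it's worth, Akonadi can in principle be pointed at SQLite instead of MySQL, if your distro's build ships that backend; a sketch from memory (the driver name and file location may differ on your setup, so double-check against your own akonadiserverrc):

    # sketch only: switch Akonadi's storage backend from MySQL to SQLite,
    # assuming the SQLite driver was compiled in
    akonadictl stop
    # in ~/.config/akonadi/akonadiserverrc, under [%General], set:
    #   Driver=QSQLITE3
    akonadictl start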

And in case anybody thought that using MySQL would allow them to scale (you know, because people receiving a million mails/s is such a common use case), the answer is no! Thanks to Nepomuk, many operations were bound by the RDF triple store it used. Maybe it has improved now thanks to Baloo. But once I tried to delete a folder containing a mailing list with a few thousand mails, and I wondered why my laptop got so hot until I realised that Nepomuk was struggling.

Maybe the situation has improved now though. But the whole KMail transition was really painful for no tangible benefits to the user. This really feels like a prime example of overengineering.


> Maybe the situation has improved now though. But the whole KMail transition was really painful for no tangible benefits to the user. This really feels like a prime example of overengineering.

That's how I felt about most of KDE4.


Nepomuk was, to me, nothing more than a series of repeated segfaults.

The Semantic Web brings ideas for structuring information, enabling machine-exploitable databases to exist. The Semantic Desktop was - maybe - ahead of its time, but certainly too poorly implemented.


http://nepomuk.semanticdesktop.org/Project+Summary.html

The semantic desktop was nothing more than a way to leech EU grants for "innovation". They kept looking for use cases long after the money ran out.


Almost everybody I know who was using KDE did it because of the pure desktop and not because of the semantic desktop. Every time the discussion came to the latter, it was about how to disable it.

I think it was a bold idea, but not well implemented. Maybe it was also too soon for such a bold move. With the advent of more cores and faster disks (e.g. SSDs) there might come a time for another desktop to implement such a thing. When you have eight or more cores, you don't have to worry if one is doing an indexing job all the time.


For me the semantic desktop was too resource hungry, so it was either turn it off or move to a different DE. At one point I had to remove the actual binaries in order to disable it, the devs seemed so keen to push their ideal DE.

TBH, if I'd been using the full KDE communication suite then perhaps it would have made sense, but the refusal to support sending HTML email in KMail (did they change that yet?) moved me back to Thunderbird a long time ago.

I applaud the innovation and feel that the community would have really been behind it if it had been optional from the get-go. Having never used Activities, I still appreciate that the KDE devs tried something different; the great thing with Activities is that – whilst initially you couldn't turn them off – they kept out of the way and used little to no resources.

The desktop search integrated into Dolphin is great. Usually I use locate/find/grep, but I'd use an integrated search if its resource usage could be tamed; that's really the limit of the "semantic desktop" that I find useful in my desktop use at present.


KMail sends HTML emails just fine. With a nice WYSIWYG editor.


Cool, I'm glad they caved in the end. Next time I'm trying new MUAs I'll include it in the list.


When I hear the words "desktop" and "innovation", I reach for my revolver.

My current fervent hope is that Xfce just does 4.x versions forever and never goes to a CADT-cursed 5.x.


I think the Desktop needs innovation more than ever. But right now everything IMO just gets worse. I'm using a reactionary and dead simple XFCE. But I don't want this to be the future.


The important thing to remember - and that the CADT development model forgets - is that most new ideas are bad.

I recall the hilarity when GNOME 3, after claiming "no no, we're making it tablet-friendly!", was busted for clearly not having a tablet among any of its developers, because it was literally impossible to get out of the screensaver using the on-screen keyboard. http://news.slashdot.org/comments.pl?sid=3017371&cid=4083522... Their claims of developing for tablet users were literally delusional.

"You misunderstand. Our goal is to make computers easier to use, not to make them more useful." http://commandcenter.blogspot.co.uk/2011/09/we-open-in-well-...


I would say "Baloo", based on Xapian search engine and SQLite, is still a "desktop search": http://en.wikipedia.org/wiki/Desktop_search

The older KDE desktop search implementations - Strigi (based on a C++ Lucene port) and the EU-sponsored, ontology-based Nepomuk research project - failed. http://en.wikipedia.org/wiki/NEPOMUK_(framework). I remember D-Bus causing many problems in the early days of Strigi and Nepomuk.

Gnome and Ubuntu use/used the Nepomuk ontology as well: http://en.wikipedia.org/wiki/MetaTracker

The desktop search engine war era (2003-2006) between Microsoft Windows Longhorn WinFS (2003-2006), Microsoft Windows Vista desktop search (2006), Apple MacOS X 10.4+ Spotlight (2005) and Google desktop search (2004-2011) also brought desktop search to the Linux desktop. http://en.wikipedia.org/wiki/Windows_Search#Windows_Desktop_... , http://en.wikipedia.org/wiki/Spotlight_(software) , http://en.wikipedia.org/wiki/Google_Desktop

One of the reasons why WinFS failed was its complex ontology (and it was coded in C# in user mode, so it was very slow). Windows Vista shipped with a traditional desktop search, with an advanced search dialog and a very good basic ontology (sadly, the advanced search dialog has been missing since Windows 7). WinXP already had the optional "Indexing Service" predecessor, and the desktop search was also available as an "MSN" addon download. http://en.wikipedia.org/wiki/Windows_Search#Windows_Desktop_...

Edit: to the downvoter: D-Bus was inspired by OLE, DCOM, CORBA, KParts and Bonobo (http://en.wikipedia.org/wiki/D-Bus). Search on Google for "strigi dbus problem" or "nepomuk dbus problem". D-Bus caused problems in the early days of the KDE4, Strigi and later Nepomuk era.


D-Bus is totally separate from Strigi and Nepomuk -- it's an IPC bus, and a GNOME invention.


Dbus was created by freedesktop.org and was based on KDE's DCOP. I don't think it's fair to call it a Gnome invention.


Which was itself based off of things like COM and CORBA. How far back do we want to go?


It's very rare that I have to search my whole machine. Mostly my search need is to find a string in a specific directory using grep.

I am happy to get rid of the automatic indexing, it has caused me nothing but pain.


I have used KDE for a few years now. Fundamentally, I don't mind the idea of a semantic desktop, but I don't want the indexer to be running unpredictably, and I want to construct my own layers of meaning.

The way I have addressed this is: I have a large blob of files, and I've ordered them through typical directories. This is portable across Linux/Windows/OSX (some of these files have migrated from DOS). The directory structure and naming itself is my semantic categorization. It's a bit hinky in places, but it is (1) portable and (2) supported by any operating system I work with on the desktop.

At some point I will probably write an indexing system designed to handle tagging for my files; however, at present, I get what I want when it comes to files.
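
As a starting point for that kind of thing, a flat tag file plus grep goes a surprisingly long way; a minimal sketch, where the file name, format and function names are all made up:

    # home-grown tagging layer: one "path<TAB>tags" line per file
    TAGDB=~/.file-tags.tsv

    tag() {      # tag <file> <tag>... - record tags for a file
        printf '%s\t%s\n' "$(realpath "$1")" "${*:2}" >> "$TAGDB"
    }

    tagged() {   # tagged <tag> - list files whose tag line matches <tag>
        grep -i -- "$1" "$TAGDB" | cut -f1
    }

    # usage:
    #   tag notes/semantic-desktop.txt kde search
    #   tagged kde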


I feel about Activities the same way I feel about nepomuk and virtual desktops: great idea, but useless for my workflow.

That said, KDE 5 (even with huge showstopping kwin_x11 bugs in latest Arch) is light years beyond every other desktop, including Yosemite and Aero.


Too bad this semantic desktop concept did not work out.

I have two Linux laptops, and as I started reading the article, I was thinking of installing KDE, only to be disappointed to read that the project is basically dead.

I have written two Semantic Web books and have had some semantic markup on my web sites for about ten years, but my view of the SW is changing. Google Knowledge Graph, which I worked with in 2013, is basically a huge triple store, but different from the SW because it is one giant curated repository, and not a distributed, interlinked graph comprised of many sources.


Since you mentioned Google Knowledge Graph: it's based on the open Freebase ontology. Google bought the company behind Freebase, and the ontology will be no more as of March 31, 2015: http://www.freebase.com , http://en.wikipedia.org/wiki/Freebase

A rather sad demise. There is a half-hearted(?) attempt to donate some data to Wikipedia's Wikidata. But given Wikidata's track record and how different the ontology is, it looks like Freebase will go offline forever and be used as an internal data source for the Google Knowledge Graph.

It would be great if Google would donate the data and the tools used to generate Freebase to Archive.org or another open source community, so that they could regenerate the ontology at a monthly interval from Wikipedia and various other data sources. Given that Freebase is also what powers Siri, Cortana and Watson, maybe another corporation can help.


I agree. I have used Freebase since Metaweb became a company. Really sorry to see Freebase disappear. Archives are runnable on a publicly available AWS EC2 image, but there will be no more community contributions to Freebase.


Is there a way to migrate Freebase into a community-funded project? Having that kind of knowledge set disappear is very concerning.


Is there evidence that the migration to Wikidata is half-hearted? This was just announced in December.

Not sure this is evidence of a "sad demise". Wikidata is a non-commercial project managed by a foundation. Freebase/Metaweb was always a commercial project. Seems like this is a move towards openness, if anything.

Full Freebase data dumps have long been available.

https://developers.google.com/freebase/data
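
The dumps are just gzipped N-Triples, so you can poke at them without any Freebase tooling; a quick sketch (the exact file name is whatever the download page offers at the time, so adjust it):

    # pull a handful of triples mentioning a topic out of the RDF dump
    zgrep -F 'douglas_adams' freebase-rdf-latest.gz | head -20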


Freebase updated the data dumps monthly. The main data source is Wikipedia, with many other data sources that update their content frequently.

Without releasing the tools and without a community that adds to and fixes the data, it will be a lot less useful, or even useless. It's like reading a years-old printed lexicon.

There is a lot of evidence that the upcoming shutdown will be a big loss - do a web search. A lot of software has been written and a lot of effort has been made at e.g. http://schema.org , all based on the Freebase ontology with links to the Freebase website. Freebase is basically the machine-readable form of Wikipedia plus a lot of other data sources.

It's basically like closing down Wikipedia. A data dump of Wikipedia from March 2015 will be useless in 2020!

Do Facebook, Microsoft, Nuance, IBM and Wolfram Research all already have an in-house Freebase clone? If not, how will they update their databases in the near future - the data used for Siri, Cortana, Watson and WolframAlpha? Would one of these companies be so kind as to bootstrap/support an open Freebase rescue project?


Freebase doesn't have an ontology in the Semantic Web sense. It's always been a specialized tool with non-standard protocols like MQL. Schema.org is somewhat related but not tied to Freebase whatsoever.

Wikidata is fully editable and has an ecosystem for updating data, so concerns about stale data aren't valid. Since Wikipedia will be pulling from it (and already is in some cases), you could argue that there will be more sunlight on the data.

There's been a lot of code written against the Freebase APIs (mine included), but it isn't sad that it's going away. It's the risk you take when relying on 3rd-party services. I think you're confusing short-term developer convenience with a real loss of open data.


Freebase has 2,751,750,754 entries; Wikidata has only 13,734,841.

Lydia Pintscher (Wikidata manager) admits that the Wikipedia/Wikidata notability guideline is a real problem and the process of re-using Freebase data is slow: https://groups.google.com/forum/#!topic/freebase-discuss/s_B...

Who cares about an API? It's all about the data dump, which is available for download and used to be updated monthly.

I am not sure what your agenda is, whether you are somehow related to Wikidata or Google, or why this is only your eighth post in 1526 days on HN. But it cannot be denied that there is evidence that the demise of Freebase will hurt us all (except Google) in the long term, and that a lot of data won't survive/be included in Wikidata.

(I do like Wikipedia, but I have noticed that their notability guidelines and some admins gone wrong hurt, for example, the German Wikipedia a lot, which is actually shrinking as more pages get deleted than added. Some projects from the German Wikipedia, like the Toolserver which hosts the map data and the geo-database of all cities and landmarks, are great. But Wikidata, with its long development history, should have been done by Wikimedia itself, not Wikimedia Germany. It started from the Semantic MediaWiki research project, took way too long, and is still not that good.) http://www.heise.de/newsticker/meldung/Blutet-Wikipedia-aus-..., http://www.heise.de/newsticker/foren/S-Artikel-schrumpfen/fo...


I am not sure what your second paragraph is referring to. KDE might be many things, but dead is not a valid descriptor. It's been going strong with a new iteration based on Qt 5; the project just released Frameworks 5.8, which supports compilation against Qt 5.5.

In addition to Frameworks 5, the KDE project is porting their applications to Qt 5 while adding security fixes, usability improvements and new features.

The KDE project has a full-featured mail client, including smart OpenPGP integration, an IDE, a modern XMPP client with OTR integration and a pretty wonderful general document viewer.

There are people who don't like KDE for whatever reason, but it is far from dead.

On a related note: I've interacted with several KDE developers when submitting bug reports over the years and I have found them to be really nice, responsive and respectful overall.

[Edited: added valid to first paragraph]


He's saying the semantic desktop in KDE is dead, not KDE itself.


The semantic approach seems to work quite well for GNOME 3 with Tracker.



