Reading “Programming The Semantic Web”

I have written about one of Toby Segaran’s books before. Back then, I was pretty enthusiastic about both his style of writing and his way of passing on knowledge. It’s nice to see this happen again with another book of his, “Programming the Semantic Web”.

At the moment, for various reasons, I am refreshing my knowledge of ontologies, RDF, inference, reasoning and most of the technologies and concepts usually referred to as the “semantic web”. I was looking for a quick guide, ideally a “hands-on” one covering one of the technologies we’re already using. It took just a couple of moments to see that this book is a “match” as far as the latter is concerned: like the “Collective Intelligence” book, the “Semantic Web” one provides an extensive amount of sample code implemented in Python, which is fine for us as we use both Python and Jython in a production environment and I’m quite familiar with both. So it feels like home from this point of view, even though “Semantic Web” is less “Python only” than its predecessor.


Aside from this, the book is rather good in terms of being “hands-on” as well. Following the same style as “Collective Intelligence”, the “Semantic Web” book provides a large number of step-by-step samples for trying out all the concepts and approaches it introduces. Again, the authors provide plenty of well-prepared sample data, mainly in .csv and .txt, on the book’s web site. And, again, I am amazed by the refreshingly pragmatic way of hacking things up to make them work initially, giving the reader something to play with and to figure out how things go, and only then diving deeper into what happens, extending and optimizing here and there. Python surely has a sweet spot here in its interactive mode, which allows for trying out all the examples almost immediately without having to master too big a stack of technology before getting anything to run (which, unfortunately, seems to be the case once in a while in the Java+IDE+Application-Server world). Despite enjoying solving “real-world” problems using Java, I quite often admire the straightforward approach to doing things in Python, and this book once again shows me why Python is quite a good learning language, too: for most of the explanations, it doesn’t seem a long way from a more or less formal description of a problem to a working prototype implementation that can be run on a sample dataset. Technically, this is the definite strength of this book.

As far as the non-technical things go, it is obvious that Segaran and his co-authors have profound experience in explaining complex things in an easy, straightforward fashion. I am still astounded by the way they get from “structured”, schema-driven database design to modeling RDF triples, pointing out both why one might want to follow this path and how to do so step by step. I have seen quite a few explanations of what triple stores are about, and most of them are far more arcane and far worse. If you have suffered from such explanations, “Programming the Semantic Web” is definitely able to set that right. And this is the style throughout the book: concise, smart explanations that build upon each other, with each subchapter placed and structured so that it is just the “logical next step” if you have consciously read the chapters before. The whole book reads as one continuous flow of information without any obvious breaks or loose ends. I guess this is quite a subjective point of view, especially in terms of how one wants to dive into things (again), but at least to me this seems close to optimum.
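As a rough illustration of that step from table rows to triples, here is a minimal sketch in plain Python. It is not code from the book: the restaurant row, the “serves_wine” fact and the helper names row_to_triples and match are all made up for this example, and a real triple store would of course use indexes rather than a plain list.

```python
# Hypothetical example data: one row of a schema-driven "restaurants" table.
row = {"id": "rest1", "name": "Deli Llama", "city": "Berkeley", "cuisine": "Deli"}


def row_to_triples(row, key="id"):
    """Turn one table row into (subject, predicate, object) triples.

    The key column becomes the subject; every other column becomes
    one triple with the column name as predicate.
    """
    subject = row[key]
    return [(subject, column, value)
            for column, value in row.items()
            if column != key]


def match(triples, sub=None, pred=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (sub is None or s == sub)
        and (pred is None or p == pred)
        and (obj is None or o == obj)
    ]


triples = row_to_triples(row)

# A new kind of fact needs no schema change, just one more triple.
triples.append(("rest1", "serves_wine", "yes"))

# "Which subjects are located in Berkeley?"
in_berkeley = match(triples, pred="city", obj="Berkeley")
```

The point of the sketch is only the “why”: in the table, a new attribute means altering the schema, whereas in the triple list it is just another entry that the same wildcard query can find.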

Plus, there’s another good thing that strongly reminds me of the “Collective Intelligence” book: alongside the main subject of the book, you get plenty of chances to sharpen your (Python) coding skills, and you are given quick-and-dirty explanations of a bunch of technologies along the way. No matter whether it is geocoding, graphviz, rdflib, networkx, dbpedia or freebase: reading this book provides a fair understanding of these things as a mere “by-product”. And it is a rather valuable by-product, as most of these tools and services are simply around, mostly free and ready to be used to solve problems. So the book doesn’t just provide you with a clear understanding of what happens (theoretically) but also, again, with a set of tools to really solve these problems in day-to-day life. Ultimately, you even find a framework like CherryPy described here as a way to quickly serve dynamic content.

So, overall, this book doesn’t differ too much from the “Collective Intelligence” one in many ways and is just as highly recommended. No matter whether you are refreshing your knowledge or, I dare say, learning these things from scratch, Segaran et al. will provide you with a profound and extensive introduction to the subject. It leaves you with lots of things to try yourself, with lots of ways to apply what you have learnt to real-world problems and, last but not least, with plenty of new ideas for things that could, and possibly should, be done in your business applications. Very inspiring.