Write a book review using schema.org and RDFa. An HTML fragment useful as a template to generate Linked Open Data

2013-09-16

I want to write reviews of the books I recently read. I want these reviews to be:

  1. Open Data as CC0
  2. Linked Open Data
  3. Using schema.org

As I could not find a web app for doing this (e.g. you can't write reviews at Open Library), I decided to do it on my own blog. And it's easy enough, that is, if you have ever written HTML by hand. It feels kind of nineties, but then the internet was nicer in those days (no ads, no spam, no Flash, less JavaScript, less tracking, no PRISM ... ok, I'll stop this nostalgia nonsense now). Here is how it's done.

Prerequisites:

Using a fairly uncommon blog software named Pebble, I wasn't too surprised that RDFa is not supported in any way. But the software allows me to input HTML source blocks, and that's all you need to write a review using the schema.org vocabulary and RDFa.

To be able to use Linked Open Data (LOD), we need other LOD that we can link to. I read a German edition of Lamentation by Ken Scholes. Sadly, I wasn't able to find that manifestation as LOD. But hey: it is Open Library, and I can simply add the missing resource on my own! Which I did, thus creating "http://openlibrary.org/books/OL25430867M" (btw, I added the "about the book" description and some tags at the work level of the manifestation).

Why RDFa and not microdata?

Because RDFa is as simple as microdata but far more powerful.

How to?

I inserted the following HTML fragment:

<div vocab="http://schema.org/" typeof="WebPage Review" about="http://www.dr0i.de/lib/2013/09/14/ken_scholes_sndenfall.html">
    This is a review about
    <a property="itemReviewed" href="http://openlibrary.org/books/OL25430867M/">
        'Ken Scholes: Lamentation'.
    </a>
    <span property="reviewRating">
        2
    </span>
    stars. Written by
    <a property="author" href="http://lobid.org/person/pc">
        dr0i
    </a>
    <span content="2013-09-14" property="publishDate">
        on September 2013
    </span>
    <span property="reviewBody" xml:lang="en">
        The setting of the book .... blah ... reminds me of
        <a property="citation" href="http://viaf.org/viaf/52700">
            Paulo Coelho
        </a>
        ... it should not be recommended by
        <a href="http://libraryjournal.com/" property="citation" >
            library journal
        </a>
    </span>
</div>

The first line is necessary to indicate that the schema.org vocabulary is used and that this is a web page containing one review. The review is explicitly given its URI using "about", so this becomes the "subject" of the triples. Here that URI is the same as the URI of the web page of the review. Without "about", an RDFa processor picks a default subject, and for an element that carries "typeof", an RDFa 1.1 processor would, as far as I can tell, create a blank node instead of using the URI of the web page, so stating the URI explicitly is the safer choice. (There may be situations where you need to distinguish between the thing you want to talk about and the thing you talk with, e.g. when providing metadata for another web page or a non-web page, or when having a list of reviews where each review has its own web page, to be able to make statements about each single review.)

You can understand the HTML fragment above on your own if you look up the URIs: append the value of the "property" attribute to the vocab URL "http://schema.org/", so you get e.g. "http://schema.org/reviewBody". The HTML attribute "property" can be used on HTML tags: use the link tag "<a>" when you link to something, or the "<span>" tag if you just want to make literal statements. For literals it makes sense to give them language tags, e.g. 'xml:lang="en"'.
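The lookup rule described above (vocab URL plus "property" value) can be sketched in a few lines of Python. This is only a minimal sketch using the standard library, not a real RDFa processor: the names `PropertyCollector` and `expand_properties` are made up for this illustration, and the sketch assumes that the last "vocab" seen applies to everything that follows, ignoring prefixes and nesting.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect RDFa 'property' values and expand them against 'vocab'."""

    def __init__(self):
        super().__init__()
        self.vocab = ""
        self.uris = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Simplification: the last 'vocab' seen applies to all later properties.
        if attrs.get("vocab"):
            self.vocab = attrs["vocab"]
        # 'property' may hold several space-separated terms.
        for term in (attrs.get("property") or "").split():
            self.uris.append(self.vocab + term)


def expand_properties(html_text):
    """Return the full property URIs found in html_text, in document order."""
    collector = PropertyCollector()
    collector.feed(html_text)
    collector.close()
    return collector.uris


# A shortened version of the review fragment, just for the demonstration.
fragment = (
    '<div vocab="http://schema.org/" typeof="WebPage Review">'
    '<a property="itemReviewed" href="http://openlibrary.org/books/OL25430867M/">Lamentation</a>'
    '<span property="reviewBody" xml:lang="en">The setting ...</span>'
    '</div>'
)

for uri in expand_properties(fragment):
    print(uri)
```

Each printed URI can be opened in a browser to read what the property means.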

Now, have a look at http://www.dr0i.de/lib/2013/09/14/ken_scholes_sndenfall.html to see how it renders for you. And what does a machine see, say Google? Let us use a tool from the W3C, the "Distiller", which shows us the Turtle notation of the underlying RDF:

@prefix cc: <http://creativecommons.org/ns#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix schema: <http://schema.org/> .
<http://www.dr0i.de:80/lib/> cc:license <http://creativecommons.org/publicdomain/zero/1.0/>;
    dc:creator <http://lobid.org/person/pc>;
    rdfa:usesVocabulary schema: .
<http://www.dr0i.de/lib/2013/09/14/ken_scholes_sndenfall.html> a schema:Review, schema:WebPage;
    schema:author <http://lobid.org/person/pc>;
    schema:citation <http://libraryjournal.com/>,
        <http://viaf.org/viaf/52700>;
    schema:itemReviewed <http://openlibrary.org/books/OL25430867M/>;
    schema:publishDate "2013-09-14";
    schema:reviewBody "The setting ..."@en;
    schema:reviewRating "2".

Nice and clearly structured, semantics all clear, isn't it? If you think so, you are probably a machine! 0110100001101001!

Btw, of course you are allowed to use RDFa anywhere in your web page. Pebble shows an info box about the blog on the upper right side, and nothing prevents you from pasting some HTML in there. That is where the first block of the Turtle output above comes from. It holds two statements about the whole blog: 1. the license of the content and 2. the author of the blog.

Now, I am not totally happy with that. What bothers me is not writing RDFa and HTML by hand, but that the source of the web page looks like a mess and is not really valid RDFa in HTML. Rapper, a nice tool for working with RDF, refuses to parse the RDFa in this web page. But using the Distiller, we can get the RDF even from the command line (and remember: the Distiller comes from the W3C, so this RDFa is "valid enough" to serve its purpose):

$ curl 'http://www.w3.org/2012/pyRdfa/extract?uri=http://www.dr0i.de/lib/2013/09/14/ken_scholes_sndenfall.html'
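The same request can be made from a script. The following is a minimal sketch in Python using only the standard library. The function names `distiller_url` and `fetch_rdf` are made up for this illustration, the response is assumed to be UTF-8 text, and the sketch does not check whether the service still answers at this address. Unlike the plain curl call, it percent-encodes the page URI before putting it into the query string.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl call above.
DISTILLER = "http://www.w3.org/2012/pyRdfa/extract"


def distiller_url(page_uri):
    """Build the Distiller request URL, percent-encoding the page URI."""
    return DISTILLER + "?" + urlencode({"uri": page_uri})


def fetch_rdf(page_uri, timeout=30):
    """Return the Distiller response for page_uri (needs network access).

    Assumption: the response body is UTF-8 encoded text.
    """
    with urlopen(distiller_url(page_uri), timeout=timeout) as response:
        return response.read().decode("utf-8")


url = distiller_url("http://www.dr0i.de/lib/2013/09/14/ken_scholes_sndenfall.html")
print(url)
```

Calling `fetch_rdf` with the same page URI should then return whatever the Distiller sends back for that page.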

So this is good enough for me. I will go on writing book reviews using the template above. For more information, examples and tutorials see http://schema.rdfs.org. Yes, the site is a little outdated, but it is nonetheless helpful, since the schema.org examples still only show microdata (you can see at a glance that it is microdata when the "itemprop" attribute is used instead of "property"), even though microdata may be discontinued.
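Reusing the template for the next review could look roughly like the following Python sketch. It is not how this blog does it (the fragment here was written by hand); `TEMPLATE`, `render_review` and all field names are made up for this illustration. The property names are the ones from the fragment above. Every value is HTML-escaped, so in this simple form the review body cannot contain its own "citation" links.

```python
from html import escape

# Same structure and property names as the hand-written fragment above.
TEMPLATE = """<div vocab="http://schema.org/" typeof="WebPage Review" about="{page}">
  This is a review about
  <a property="itemReviewed" href="{book}">'{title}'.</a>
  <span property="reviewRating">{rating}</span> stars.
  Written by <a property="author" href="{author}">{author_name}</a>
  <span content="{date}" property="publishDate">on {date}</span>
  <span property="reviewBody" xml:lang="{lang}">{body}</span>
</div>"""


def render_review(**fields):
    """Fill TEMPLATE; every value is HTML-escaped before insertion."""
    safe = {key: escape(str(value), quote=True) for key, value in fields.items()}
    return TEMPLATE.format(**safe)


html_fragment = render_review(
    page="http://www.dr0i.de/lib/2013/09/14/ken_scholes_sndenfall.html",
    book="http://openlibrary.org/books/OL25430867M/",
    title="Ken Scholes: Lamentation",
    rating=2,
    author="http://lobid.org/person/pc",
    author_name="dr0i",
    date="2013-09-14",
    lang="en",
    body="The setting of the book ...",
)
print(html_fragment)
```

The printed fragment can be pasted into an HTML source block of the blog software like the hand-written one.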