Borort


What is RDF?


RDF is a very simple way of describing things to a computer. An RDF document contains one or more statements, each consisting of a subject (what am I describing?), a predicate (which property of the subject am I describing?), and an object (what is the value of this property?).

The interesting thing about RDF is that each of these three parts can be a URL. For example, the subject of a statement could be the URL of a web page – very straightforward. The predicate is more interesting. Let’s imagine for now that we want to write an RDF statement that a web page has the title “Foo”. We could simply use the string “title” as the predicate, but human language is very ambiguous. In one context, the word title means the name of a document, but in a different context, title could mean a rank or position in some kind of hierarchy (e.g. King).

Since RDF is about describing things so that computers can understand them, we should be very exact about the meaning of our statement. Luckily, there is a group called the Dublin Core Metadata Initiative (DCMI), which has compiled a list of terms like title, description, creator, along with short definitions and, more importantly, a URL for each term. We can use that URL in RDF to make sure we agree about the definition of title.

This is how our RDF statement might look:

  • Subject: <http://example.org/my_document>
  • Predicate: <http://purl.org/dc/elements/1.1/title>
  • Object: “My Sample Document”

Of course, that is not the actual syntax of RDF. In fact, there are quite a few ways to write RDF down, and they all express the same statements. The simplest and most straightforward RDF representation is called N-Triples, and it looks like this:

<http://example.org/my_document> <http://purl.org/dc/elements/1.1/title> "My Sample Document" .

As you can see, N-Triples is little more than the subject, predicate, and object on one line, separated by whitespace and terminated with a period.
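To make the format concrete, here is a minimal Python sketch that writes one statement as an N-Triples line. The Triple class and the to_ntriples function are names made up for this illustration, and the sketch assumes the object is always a plain string literal (no language tags, datatypes, or URI objects), so it covers only the case shown above rather than the full N-Triples grammar.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one RDF statement."""
    subject: str    # a URI
    predicate: str  # a URI
    obj: str        # assumed to be a plain string literal


def to_ntriples(t: Triple) -> str:
    """Serialize one triple with a plain literal object as an N-Triples line."""
    # Escape backslashes first, then quotes and line breaks inside the literal.
    literal = (
        t.obj.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'<{t.subject}> <{t.predicate}> "{literal}" .'


statement = Triple(
    "http://example.org/my_document",
    "http://purl.org/dc/elements/1.1/title",
    "My Sample Document",
)
line = to_ntriples(statement)
print(line)
```

For the sample statement, this prints the same line as the N-Triples example above.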

Probably the most important RDF format is RDF/XML. It looks like this:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="http://example.org/my_document">
        <dc:title>My Sample Document</dc:title>
    </rdf:Description>
</rdf:RDF>

RDF/XML makes heavy use of XML namespaces. In the example above, the prefix “rdf” is bound to the namespace URI <http://www.w3.org/1999/02/22-rdf-syntax-ns#>, and the prefix “dc” is bound to <http://purl.org/dc/elements/1.1/>. When an RDF/XML parser encounters the element <dc:title>, it looks up its namespace URI and appends the element name to it, resulting in the predicate URL <http://purl.org/dc/elements/1.1/title>.
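The namespace expansion described here can be tried out with Python's standard XML parser. This is only a rough sketch, not a real RDF/XML parser: it assumes the document uses nothing but rdf:Description elements with an rdf:about attribute and literal-valued child elements, as in the example above. The function name extract_triples is made up for the illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="http://example.org/my_document">
        <dc:title>My Sample Document</dc:title>
    </rdf:Description>
</rdf:RDF>
"""


def extract_triples(xml_text):
    """Return (subject, predicate, object) tuples from very simple RDF/XML."""
    triples = []
    root = ET.fromstring(xml_text)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree reports each tag as "{namespace-uri}local-name".
            namespace_uri, local_name = prop.tag[1:].split("}", 1)
            # Appending the element name to the namespace URI gives the predicate.
            triples.append((subject, namespace_uri + local_name, prop.text))
    return triples


triples = extract_triples(RDF_XML)
print(triples)
```

The single tuple it yields has the predicate <http://purl.org/dc/elements/1.1/title>, which is the namespace URI bound to “dc” with the element name “title” appended.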

Both documents above have the same meaning to an RDF-aware application. They say: There is a thing identified by the URL <http://example.org/my_document>, and it has a property <http://purl.org/dc/elements/1.1/title> whose value is My Sample Document.

It is important to understand that the URLs themselves do not convey any automatic meaning to the RDF application. The application does not download them and look at their contents; in fact, they might not even exist. That is why RDF speaks of URIs, Uniform Resource Identifiers, and not URLs, Uniform Resource Locators. They have the same syntax, but their purpose is not to point to or locate any specific information; they are just meant to identify something uniquely. We might as well give our document a randomly-generated unique ID and assign it the URI <uuid:A72E998D-59EA-11D9-ABB5-000393935A56>. But of course there is nothing wrong with using the actual URL of a document on the internet – in fact, it makes perfect sense. Just keep in mind that URIs in RDF are not restricted to existing HTTP URLs.

The biggest advantage of RDF is its very simple data model. Everything in RDF is just a collection of statements: subject, predicate, object. That makes it very easy to merge statements from different sources. Imagine a large link database of web pages, which stores the title of each page. Also imagine a separate database which does not know anything about the contents of web pages, but knows who wrote them. If both databases are available as RDF, an application can simply collect the RDF statements of both into one large database. Now you have a database of titles and authors, without having to remodel anything, because you are simply collecting triples of subjects, predicates, and objects.
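The merge described above can be sketched in a few lines of Python if each database is modeled as a set of (subject, predicate, object) tuples. The page URLs, titles, and author names below are invented for the example, and describe is a hypothetical helper; the predicates are the Dublin Core terms title and creator.

```python
DC = "http://purl.org/dc/elements/1.1/"

# Database 1 knows page titles (invented sample data).
titles_db = {
    ("http://example.org/page1", DC + "title", "Page One"),
    ("http://example.org/page2", DC + "title", "Page Two"),
}

# Database 2 knows who wrote each page (invented sample data).
authors_db = {
    ("http://example.org/page1", DC + "creator", "Alice"),
    ("http://example.org/page2", DC + "creator", "Bob"),
}

# Merging is a plain set union; no schema has to be redesigned.
merged = titles_db | authors_db


def describe(graph, subject):
    """Collect predicate -> object for one subject.

    Simplification: keeps only one value per predicate.
    """
    return {p: o for s, p, o in graph if s == subject}


page1 = describe(merged, "http://example.org/page1")
print(page1)
```

Because both sources identify the page by the same URI, the title and the author end up attached to the same subject without any extra work.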

In RDF, everyone can make statements about everything. I can put an RDF document on my web site that provides information about a completely different site on the internet. If anyone knows the location of my document, they can merge it into their RDF database, and combine it with whatever information they already have.

Of course, whether they believe the statements I make is a completely different matter. RDF does not have any built-in way of tracking provenance (who said what), but most RDF databases have the ability to store a fourth component along with each statement: the context, which is usually another URI. That makes it possible to have contradictory statements in your database, but through the context you can now see where a statement comes from and decide whether you want to trust it or not.
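As a rough illustration of the context idea, the following Python sketch stores each statement as a four-part tuple and filters by source. It does not reflect how any particular RDF database works; the two source URIs, the conflicting title, and the helper names values_by_context and trusted_values are all assumptions made for the example.

```python
from collections import defaultdict

DC = "http://purl.org/dc/elements/1.1/"
PAGE = "http://example.org/my_document"

# Each statement carries a fourth element: the context it came from.
quads = [
    (PAGE, DC + "title", "My Sample Document", "http://example.org/source_a"),
    (PAGE, DC + "title", "A Different Title", "http://example.net/source_b"),
]


def values_by_context(quads, subject, predicate):
    """Group the objects stated for subject/predicate by their context."""
    result = defaultdict(list)
    for s, p, o, context in quads:
        if s == subject and p == predicate:
            result[context].append(o)
    return dict(result)


def trusted_values(quads, subject, predicate, trusted_contexts):
    """Keep only the objects that come from contexts we chose to trust."""
    return [
        o
        for s, p, o, context in quads
        if s == subject and p == predicate and context in trusted_contexts
    ]


by_source = values_by_context(quads, PAGE, DC + "title")
trusted = trusted_values(quads, PAGE, DC + "title", {"http://example.org/source_a"})
print(by_source)
print(trusted)
```

Both contradictory titles stay in the store; the decision about which one to believe is made at query time by choosing which contexts to trust.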

There are of course more efficient ways to store information in databases, but RDF was modeled especially with the internet in mind: it is meant to represent a large variety of information using a very simple format, and it allows you to combine knowledge from many different sources.

RDF and RSS

So what does RDF have to do with RSS? The two terms are often used as if they meant the same thing, which is wrong. RSS is a syndication format; RDF is a generic language for expressing knowledge. However, there is a connection: RSS 1.0 is based on RDF/XML, meaning that you can read it with an RDF parser and extract statements from it. RSS 2.0 and the other big syndication format, Atom, are not RDF-based.

Pointers

This article skips over quite a few important details. If you would like to learn more about RDF, here are a few useful links:


Source: http://kianga.kcore.de/2004/12/30/rdf

Written by borort

February 25, 2008 at 1:07 pm

Posted in Semantic Web
