Semantic Crawler Library

Over the last few months I have written code that parses FOAF files at least three times, each time slightly differently, with different frameworks (Jena, Jaxen) and on different platforms. I have now decided to consolidate at least the parsing in a central library.

The library, named Semantic Crawler (scrawler for short), should parse not only FOAF files but also the other main ontologies. It is more a parser than a crawler.

Advantages

  • You can skip the parsing step and use the class representation directly. Only the URL/IRI of the RDF file is needed.
  • Easier use of Semantic Web technology for software engineers and programmers.
  • Easy to extend the parsing step.
  • Open source (like everything on this website).
  • More time for the essential parts of Semantic Web programs, for example reasoning.

Interface Example

For the interface I have introduced Java annotations so that an RDF file can be generated from the object. For this purpose it is not enough to store the raw data; the metadata describing how the subtree looked before parsing is needed as well.

An RDF file that represents the Java interface should look like the following example (only an excerpt, with the data that is also shown in the interface):

The automatic transformation from a Java object to RDF with the help of the annotations has not been written yet.

At the moment the Java annotations describe only the classes and the properties. The implemented interface is only a first approach to handling the transformation from Java to RDF and back efficiently. The name value and the lat/long values are returned by the methods.

  • subNodeDeep:

Indicates how far the node is from the root node (described by value). 1 (one) means that the subnode is a child; 2 (two) means that it is the child of a child.

  • subNodeType/type:

The type that can be stored in this node. Possible values: RESOURCE, ROOTNODE, LITERAL, RESOURCE_LITERAL, UNDEFINED

  • ontoURI:

The URI of the ontology. Stored in the Ontologies class.
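
To make the three parameters above more concrete, here is a minimal sketch of how such an annotation could be declared and read back via reflection. It is an illustration under assumptions, not the library's actual code: the names AnnotationSketch, RdfProperty, NodeType, Person and describe are hypothetical, and the real scrawler annotations may be named and shaped differently.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class AnnotationSketch {

    // Node types as listed in the text above.
    enum NodeType { RESOURCE, ROOTNODE, LITERAL, RESOURCE_LITERAL, UNDEFINED }

    // Hypothetical annotation (assumption): not the real scrawler annotation.
    @Retention(RetentionPolicy.RUNTIME) // keep it available for reflection
    @Target(ElementType.METHOD)
    @interface RdfProperty {
        String value();                 // local name of the RDF property
        int subNodeDeep() default 1;    // distance from the root node
        NodeType subNodeType() default NodeType.UNDEFINED;
        String ontoURI();               // URI of the ontology
    }

    // Hypothetical interface standing in for one of the library's interfaces.
    interface Person {
        @RdfProperty(value = "name", subNodeType = NodeType.LITERAL,
                     ontoURI = "http://xmlns.com/foaf/0.1/")
        String getName();

        // lat sits two levels below the person (child of a child), hence 2.
        @RdfProperty(value = "lat", subNodeDeep = 2, subNodeType = NodeType.LITERAL,
                     ontoURI = "http://www.w3.org/2003/01/geo/wgs84_pos#")
        double getLatitude();
    }

    // Builds a one-line description of the RDF metadata attached to a method.
    static String describe(Class<?> type, String methodName) {
        try {
            Method method = type.getMethod(methodName);
            RdfProperty p = method.getAnnotation(RdfProperty.class);
            return p.ontoURI() + p.value()
                + " depth=" + p.subNodeDeep() + " type=" + p.subNodeType();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Person.class, "getName"));
        System.out.println(describe(Person.class, "getLatitude"));
    }
}
```

Keeping the metadata on the getter means the same interface can serve both directions: the parser fills the values, and a later Java-to-RDF writer could walk the annotated methods to rebuild the tree.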

Example for Programmers/Software Engineers

GIT WebAccess

The library is a Maven project. The two important Maven commands for the project are listed below:

  • mvn package

Creates the library, the sources and the JavaDoc, each as a jar.

  • mvn site

Creates a website with useful information.

If you want to learn which ontologies are supported and which values you can get from such an RDF file, read the JavaDoc and/or look into the interface package to.networld.scrawler.interfaces. It is possible that not all interfaces have an implementation yet, but I am working on it.

Now to the interesting part. The following code reads my name from my FOAF file and prints it to STDOUT. It is only an excerpt of the important part.
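
As a rough illustration of the task itself, the sketch below extracts a name from a FOAF document using only the JDK's XML parser. It does not use the scrawler API: the class FoafNameSketch, the method readName and the sample data are assumptions made up for this example, and a real FOAF file would be loaded from its URL/IRI instead of a string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class FoafNameSketch {

    static final String FOAF_NS = "http://xmlns.com/foaf/0.1/";

    // Made-up sample data (assumption), not the author's FOAF file.
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
      + " xmlns:foaf=\"http://xmlns.com/foaf/0.1/\">"
      + "<foaf:Person><foaf:name>Jane Example</foaf:name></foaf:Person>"
      + "</rdf:RDF>";

    // Returns the first foaf:name in the RDF/XML text, or null if there is none.
    static String readName(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
            NodeList names = doc.getElementsByTagNameNS(FOAF_NS, "name");
            return names.getLength() == 0 ? null : names.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not parseable as RDF/XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readName(SAMPLE));
    }
}
```

This only handles the RDF/XML serialization and a single property; hiding exactly this kind of plumbing behind a class representation is what the library is for.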

Looks easy? That was the intention: to write a simple library that can be used in more complex and more interesting applications.

Please feel free to contact me if you have questions or improvements. I will try to answer them or fix the problem. And of course let me know in which applications you use my library. Keep in mind that at the moment the best-working part is the parsing of FOAF files, but I will certainly extend the library with other useful ontologies.

About Alex Oberhauser

Alex Oberhauser’s current private and professional interests are research and development in the area of Semantic Web and Service Oriented Architecture, with a focus on the applicability of new technologies in real-world scenarios. He is the founder of the meta project Networld and the Crisis Information Platform Sigimera (http://www.sigimera.org).



