Developer:OntologyOverview

Publishing knowledge to the Web in RDF

myExperiment's ability to share information is one of its key advantages when it comes to closing the loop of the experimental lifecycle. However, it is important to consider the mechanisms by which this information is shared. The myExperiment website is designed for human users, whereas the REST API is machine-oriented and designed for developing new interfaces in the form of "mashups" and "gadgets". Both of these interfaces can be quite monolithic and provide documents rather than knowledge. To publish knowledge, it must be possible to “join up” data. RDF [1] provides a very simple subject-predicate-object (triple) structure for exactly this purpose.
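
As a rough illustration of how triples allow data to be “joined up”, the following Python sketch stores a handful of triples as tuples and follows a link from a workflow to its owner's name. The identifiers (workflow/1, user/7) and the choice of properties are hypothetical and are not taken from myExperiment's actual RDF; the helper names match and owner_names are likewise invented for this example.

```python
# Triples as (subject, predicate, object) tuples.
# All identifiers and property choices here are hypothetical.
TRIPLES = [
    ("workflow/1", "dcterms:title", "BLAST search"),
    ("workflow/1", "sioc:has_owner", "user/7"),
    ("user/7", "foaf:name", "Alice"),
    ("workflow/2", "sioc:has_owner", "user/9"),
    ("user/9", "foaf:name", "Bob"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every position that is not None."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def owner_names(triples, workflow):
    """Join two patterns: workflow -> owner, then owner -> name."""
    names = []
    for _, _, owner in match(triples, s=workflow, p="sioc:has_owner"):
        for _, _, name in match(triples, s=owner, p="foaf:name"):
            names.append(name)
    return names


print(owner_names(TRIPLES, "workflow/1"))
```

The join works only because both statements refer to the same identifier for the user, which is the property of RDF that a document-oriented interface lacks.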

myExperiment publishes all its public data as RDF at http://rdf.myexperiment.org/. To make sense of this data, myExperiment also provides a meta-structure in the form of an OWL [2] ontology to formalize relationships within this data. The myExperiment ontology, http://rdf.myexperiment.org/ontologies/, reuses properties from more generic ontologies/schemas, in particular:

  • FOAF and SIOC for representing the social network
  • Creative Commons for contribution licenses
  • Dublin Core for common metadata properties
  • OAI-ORE for representing packs and experiments

Through this reuse it is possible to make some sense of myExperiment data outside its domain, allowing data from different sources to be collated.

Ontology Modules Architecture

Instead of being written as a single monolith, the myExperiment ontology is built as a set of modules that can be assembled to provide a comprehensive representation. An initial base module defines and reconciles basic terms for content management, object annotation and social networking. On top of this, a number of modules relate to specific aspects of the ontology (types of contribution, types of annotation, creditation and attribution, usage statistics, packs, experiments and workflow components). A final module performs the assembly using OWL's import property (owl:imports) and adds the most specific terms.
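
The layering described above can be sketched as a small import graph. The Python below is only an illustration: the module names and the dependencies between them are hypothetical stand-ins, not the ontology's real module list, and import_closure is an invented helper. It computes which modules a consumer ends up loading when it follows owl:imports from a given starting module.

```python
# Hypothetical module names; each maps to the modules it imports (owl:imports).
IMPORTS = {
    "base": [],
    "contributions": ["base"],
    "annotations": ["base"],
    "packs": ["base", "contributions"],
    "experiments": ["base", "packs"],
    "assembly": ["contributions", "annotations", "packs", "experiments"],
}


def import_closure(module, imports):
    """Return the module plus everything it imports transitively.

    For an acyclic graph, imported modules appear before their importers.
    The seen set also stops the walk from looping if imports are cyclic.
    """
    ordered = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in imports[name]:
            visit(dep)
        ordered.append(name)

    visit(module)
    return ordered


print(import_closure("assembly", IMPORTS))
print(import_closure("packs", IMPORTS))
```

Starting from an intermediate module such as the hypothetical packs pulls in only the part of the graph it depends on, which is what lets another project reuse a subset of the ontology without taking on the rest.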

Modularizing the myExperiment ontology makes it less restrictive and more suitable for reuse, allowing analogous projects (see section 5) to map their data in a very similar way. Significant effort is currently going into how to represent experiments and the data they produce in such a way that their insights can be shared across multiple fields. The Scientific Discourse subgroup of the W3C's Health Care and Life Sciences Interest Group (http://esw.w3.org/topic/HCLSIG/SWANSIOC) has been considering how to reconcile a number of ontologies, including the myExperiment ontology, that treat experiments as first-class objects.

Finding knowledge

SPARQL (http://www.w3.org/TR/rdf-sparql-query/) is a query language that facilitates the “joining up” of RDF data to find knowledge. A SPARQL endpoint is an interface through which such queries are performed. Many well-known knowledge bases, including Wikipedia [3], the CIA World Factbook and the Gene Ontology Database, provide a SPARQL endpoint to their data [4].

myExperiment's SPARQL endpoint, http://rdf.myexperiment.org/sparql, allows queries to be performed across all its RDF data. SPARQL's flexible nature (it essentially just matches networks in which one or more of the nodes or links are unknown) allows anything from simple queries, comparable to REST API calls, to much more complex bespoke queries. For example, myExperiment's RDF provides a listing of the components (sources, sinks, processors and links) of Taverna 1 workflows. It is possible to construct a SPARQL query that represents the interlinking of these components in a specific user-defined way, allowing workflows to be found that are tailored to a particular person's requirements.
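
As a minimal sketch of a simple query, the Python below builds a GET request URL for the endpoint named above without sending it. The endpoint URL comes from this page; everything else is an assumption made for illustration: the mecontrib prefix and its namespace, the Workflow class name, the use of dcterms:title for workflow titles, and the helper names build_query_url and query_from_url. The query parameter name follows the standard SPARQL protocol, which the endpoint is assumed to accept.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://rdf.myexperiment.org/sparql"  # endpoint named in the text

# The mecontrib namespace and the Workflow class are assumptions; check the
# published ontology before relying on them.
QUERY = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX mecontrib: <http://rdf.myexperiment.org/ontologies/contributions/>
SELECT ?workflow ?title
WHERE {
  ?workflow rdf:type mecontrib:Workflow .
  ?workflow dcterms:title ?title .
}
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


def query_from_url(url):
    """Recover the query text from a URL built by build_query_url."""
    return parse_qs(urlsplit(url).query)["query"][0]


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The two triple patterns in the WHERE clause share the variable ?workflow, so the query is itself a small network with unknown nodes. A component-level query of the kind described above would add further patterns that link sources, processors and sinks in the same way.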