ExperimentObjects

Encapsulated myExperiment Objects (a.k.a. Research Objects)

We've focused first on Taverna workflows because that's the community we're engaging with, and workflows are what make myExperiment more than other social networking sites. But really - like it says on the tin - myExperiment is about experiments. Which means not just workflows but all sorts of associated stuff in various forms in various places.

  • We know that workflows aren't standalone and that when people upload them they'll want to associate other stuff too (data, documentation, pointers to web sites, PowerPoint slides, YouTube videos...)
  • We know that workflows aren't singular items - we'll need some kind of grouping structure above workflows, like workflow packs.
  • When people run a workflow they have inputs and get products (and a service invocation log which provides data for provenance functionality) - and they may execute a workflow many times
  • In our chemistry use case there are items in the physical world to link in too (like samples).

We could work with workflows, and have other stuff hanging off them. Or we could work with experiment objects which include workflows. We call these Encapsulated myExperiment Objects or EMOs. So, can we define an EMO to work with in myExperiment?

What follows are the results of our discussions about EMOs.

Background Reading

Features

  • EMOs are the basic unit of sharing in myExperiment
  • In their lifecycle, EMOs can include online and offline (local) data
  • An EMO is a workflow together with the information needed to make it as reusable as possible; e.g. documentation, descriptions of services, data, provenance, validation, authentication, tags
  • EMOs have a number of standard operations defined on them and others can be added
  • EMOs support core metadata elements so that they can be published and can be described in OAI-HP
  • EMOs support EMO lifecycle/provenance/versioning
  • EMOs may have multiple representations and visualisations, including for example a web page, a folder or a tarball
  • They can be indexed by Google
  • They can be linked to
  • EMOs are easy to use across Web 2.0 interfaces
    • RESTful, comfortable in AJAX, JSON etc
    • Can be used in mashups
  • EMOs are extensible so they can evolve with use, informing the evolving EMO definition
  • EMOs relate directly to Research Objects being developed in SharePoint (by Jits)
  • The EMO metamodel design is being conducted in the context of science ontologies (e.g. EXPO)

Challenges

  • What happens when the parts are scattered across multiple stores?
  • What happens if someone updates a part?
  • How will my EMO be discovered on the Web?
  • How can I work with an EMO offline?
  • What is the provenance of the EMO and its parts?
  • What happens if a part is unavailable?
  • How do I send an EMO by email?
  • Can I turn an EMO into a tarball?
  • Can I archive an EMO to a CDROM?
  • If I delete this file will it break anyone’s EMOs?
  • How do I trust an EMO?
  • How do I handle an EMO RESTfully?
  • Can my EMO link to objects outside the EMO?

Approach

  • Designed for compatibility with Linked Data and with Open Archives Initiative – Object Reuse and Exchange (OAI-ORE) which deals with compound object information and aims to build standardised and interoperable mechanisms
  • An EMO file is a Resource Map describing all the distinct parts contained in the EMO and links elsewhere
  • It’s an open, extensible format using RDF
  • It supports provenance functionality & versioning
  • There can be local offline copies of EMO parts
  • Content metadata includes hashes for checking EMO integrity
  • EMOs map to the familiar folders and files interface

Although we aim for OAI-ORE compatibility, there may be multiple serialisations. For example, an EMO might be represented in RSS, Atom or XHTML (and thus be searchable) using a microformat, RDFa or GRDDL, with links to data objects which may in turn be remote or encapsulated with the EMO. Also see POWDER.

Proposal

After much discussion in chats and on Dave's whiteboard in July, we have chosen a useful point in the design space which we propose to explore further by trying it out. This is described here. The reason we chose this over many alternatives is that - like all of them - it raises a set of issues, but with this proposal the issues coincide with the purpose of myExperiment.

The basic idea is that everything lives in its native formats and the only new thing we introduce is the EMO file itself, which is simply a "Resource Map" file which contains URIs for all the objects to be included in that EMO, and for each URI it states the relationship to the EMO. This could be represented in XML, for example.
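As an illustration only, the Resource Map idea could be sketched in Python as below. The element and attribute names (resourceMap, part, relationship), the relationship values and the example.org URIs are all assumptions made for this sketch; the proposal does not define a schema.

```python
import xml.etree.ElementTree as ET


def build_resource_map(emo_uri, parts):
    """Return an XML string listing each part URI and its relationship to the EMO.

    The vocabulary used here (resourceMap, part, relationship) is
    hypothetical and only illustrates the shape of the file.
    """
    root = ET.Element("resourceMap", {"emo": emo_uri})
    for uri, relationship in parts:
        ET.SubElement(root, "part", {"uri": uri, "relationship": relationship})
    return ET.tostring(root, encoding="unicode")


# Example: an EMO pointing at a workflow and one input file (made-up URIs).
xml_text = build_resource_map(
    "http://example.org/emo/1",
    [
        ("http://example.org/workflows/42", "hasWorkflow"),
        ("http://example.org/data/input.csv", "hasInputData"),
    ],
)
print(xml_text)
```

Note that the file holds only pointers and relationships; the workflow and data stay where they are, in their native formats.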

Some of the entities used by myExperiment will need to be represented and pointed to in this way, such as tag clouds, people and projects.

A minimal core set of relationships will be defined. These can draw upon

And we could add something like

A Resource Map is similar to OAI-ORE but provides:

  • Alternatives for URIs in case the original is not available
  • A hash (e.g. SHA1, MD5 or some other ID) to confirm the version
  • A pointer to a local (offline) copy on the user's local filesystem

In outline...

EMO description
EMO provenance
EMO attribution
URI, Alternative URI, local copy, version information e.g. hash
URI, Alternative URI, local copy, version information e.g. hash
URI, Alternative URI, local copy, version information e.g. hash
URI, Alternative URI, local copy, version information e.g. hash
RDF Graph (OAI-ORE)
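
One way to picture a single entry in the outline above is the Python sketch below. The class name, field names and lookup order (local copy first, then the original URI, then the alternatives) are assumptions for illustration, as is the choice of SHA-1 from the hashes mentioned.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class PartEntry:
    """One line of the outline: URI, alternative URIs, local copy, hash."""
    uri: str
    alternative_uris: List[str] = field(default_factory=list)
    local_copy: Optional[str] = None   # path on the user's filesystem
    sha1: Optional[str] = None         # hex digest recorded for this version


def sha1_of(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def matches_recorded_hash(entry: PartEntry, data: bytes) -> bool:
    """True when no hash is recorded or the content's SHA-1 matches it."""
    return entry.sha1 is None or sha1_of(data) == entry.sha1


def candidate_locations(entry: PartEntry) -> List[str]:
    """Places a client might try, in order (an assumed policy):
    an existing local copy, the original URI, then the alternatives."""
    locations = []
    if entry.local_copy and Path(entry.local_copy).exists():
        locations.append(entry.local_copy)
    locations.append(entry.uri)
    locations.extend(entry.alternative_uris)
    return locations


# Example with made-up URIs and content.
content = b"a,b\n1,2\n"
entry = PartEntry(
    uri="http://example.org/data/input.csv",
    alternative_uris=["http://mirror.example.org/input.csv"],
    sha1=sha1_of(content),
)
print(candidate_locations(entry))
print(matches_recorded_hash(entry, content))
```

The hash lets a client notice when a part has changed since the EMO was written, whichever location the part was fetched from.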

An EMO file does not contain any information relating to user identification, access rights etc. This is between the user, the web servers and the local filesystem.

Interface

EMOs can use a familiar folder and files interface. For example, a number of objects can be dragged into an EMO folder (and their relationships to the EMO will take default values).

The interface could also enable typed links between objects. These can be created automatically or manually (by clicking on the objects and choosing a relationship from a drop down menu). This interface would also enable you to drag objects from one EMO to another, or to a folder, desktop etc. It may be possible for the relationships to be inferred.
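
A rough sketch of the folder mapping, assuming that every file dropped into an EMO folder becomes a part with a default relationship. The relationship name "isPartOf" and the dictionary layout are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

DEFAULT_RELATIONSHIP = "isPartOf"  # hypothetical default, not defined by the proposal


def entries_from_folder(folder):
    """List every file under `folder` as an EMO part with a file URI,
    the default relationship and a SHA-1 hash of its contents."""
    entries = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            entries.append({
                "uri": path.resolve().as_uri(),
                "relationship": DEFAULT_RELATIONSHIP,
                "sha1": hashlib.sha1(path.read_bytes()).hexdigest(),
            })
    return entries


# Example: a temporary folder standing in for an EMO folder on a laptop.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "workflow.xml").write_text("<workflow/>")
    (Path(tmp) / "notes.txt").write_text("run 1")
    demo_entries = entries_from_folder(tmp)

for item in demo_entries:
    print(item["relationship"], item["uri"])
```

Typed links chosen in the interface would then simply overwrite the default relationship on individual entries.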

Notes

  • When you upload a workflow, myExperiment could create the EMO file automatically. It would contain a URI to the uploaded workflow and other metadata.
  • Data security is separate to EMO security. The EMO file can be mailed around, stored, pointed at etc and is valid as long as the URIs inside it are valid to the person using it. So if you send an EMO to someone who doesn't have access to your data, they can't get at your data. Same as sending emails with URIs.
  • We need to think about the lifecycle of EMOs all the way from the laptop to the digital library - myExperiment supports this lifecycle. On the laptop you could simply have a folder which you upload to myExperiment - indeed, all the way through the lifecycle it might look like a folder; it's just that the data may be elsewhere.
  • URIs can point at local files, so it is meaningful to have an EMO file pointing at objects in the local file system. We have thought about transport, archiving etc; e.g. you could make a tarball out of an EMO, or send an EMO file along with some data. Of course the problem is to keep everything pointing at the right things with the right visibility, but then that's exactly what myExperiment is for.
  • Under this model, if you have a workflow and want to find the associated EMOs you need to ask myExperiment - i.e. myExperiment works as a kind of EMO registry. It could also provide services for matching based on workflow features (a sophisticated area and something that should be part of myExperiment's added value).
  • Yes, you can drop this almost directly into an RDF triplestore, which could also carry info about who owns what, who can execute etc etc - the other aspects of myExperiment (people, projects, tags).
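
To illustrate the triplestore note, here is a minimal sketch that turns (part URI, relationship) pairs into N-Triples lines. The vocabulary namespace and relationship names are made up for the example, and no particular triplestore is assumed.

```python
EMO_NS = "http://example.org/emo-terms#"  # hypothetical vocabulary namespace


def to_ntriples(emo_uri, parts):
    """Emit one N-Triples statement per part: <EMO> <relationship> <part> ."""
    lines = []
    for part_uri, relationship in parts:
        lines.append(f"<{emo_uri}> <{EMO_NS}{relationship}> <{part_uri}> .")
    return "\n".join(lines)


# Example with made-up URIs.
nt = to_ntriples(
    "http://example.org/emo/1",
    [
        ("http://example.org/workflows/42", "hasWorkflow"),
        ("http://example.org/data/input.csv", "hasInputData"),
    ],
)
print(nt)
```

Statements about people, projects and tags could sit alongside these in the same store, using the same subject URIs.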

Trying it out

We can implement this, but we need real examples of content in order to try it out. Are there Web pages out there which have everything we need? If you have what we need, please get in touch! We especially need an example that takes us into services and service descriptions, because myExperiment hasn't gone there sufficiently yet.

Acknowledgements

Thanks to Jits and Simon Coles.