Developer:Contribution Model



Contribution Model

By Sergejs Aleksejevs, 12th August 2008




Contributors, Contributions and Contributables

A contribution is a generalised container for any of the specialised resources that can be stored within myExperiment (the contributables [1]), together with metadata about them. The idea is similar to the way subclasses extend a superclass in object-oriented programming. When a contributor [2] first uploads a contributable to the system, a contribution object is created around the contributable to abstract it. This gives two levels at which resources in the system can be viewed: a high level of abstraction, where all contributions can be listed with a limited set of details (without displaying all the metadata specific to each type of contributable); and a low level, where each specific contributable can be visited and all the metadata stored for it retrieved.


[1] - At the moment, the system allows the following resources to act as contributables:

   * Workflows
   * Files
   * Packs

[2] - The backend of the system supports various types of contributors: potentially, any actor in the system can be a contributor, including users, groups, etc. This makes it possible to record, for example, a blog being contributed by a group. At the moment, however, the only type of contributor used in the system is 'user'.

 Note: the uploader might not be the original creator/author of the resource - see section Credits and Attributions below.
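The two levels of abstraction described at the start of this section can be illustrated with a minimal in-memory Ruby sketch. It is not taken from the myExperiment code: the class names, attributes and sample data below are hypothetical, chosen only to show a contribution wrapping a contributable of any type.

```ruby
# Hypothetical type-specific contributables; attribute names are illustrative.
Workflow = Struct.new(:id, :title, :content_type, :version, keyword_init: true)
Blob     = Struct.new(:id, :title, :file_name, :content_type, keyword_init: true)

# A contribution wraps exactly one contributable of any type.
Contribution = Struct.new(:id, :contributor, :contributable, keyword_init: true) do
  # High level: only details that every type of contributable shares.
  def summary
    "#{contributable.class.name}: #{contributable.title} (by #{contributor})"
  end
end

contributions = [
  Contribution.new(id: 1, contributor: "alice",
                   contributable: Workflow.new(id: 7, title: "BLAST pipeline",
                                               content_type: "taverna", version: 2)),
  Contribution.new(id: 2, contributor: "bob",
                   contributable: Blob.new(id: 3, title: "Results",
                                           file_name: "results.csv", content_type: "text/csv"))
]

# High level: list every contribution without type-specific metadata.
contributions.each { |c| puts c.summary }

# Low level: visit one contributable and read all of its own fields.
puts contributions.first.contributable.to_h
```

The listing loop never needs to know whether it is dealing with a workflow or a file; only the low-level view touches type-specific fields.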


Structural Constraints

There is a set of structural constraints in this model:

  • For contributions:
    • A contribution has only one contributable;
    • It is an object in the backend database containing all the necessary details about the original submission of a resource:
      • type and id of the contributor;
      • type and id of the contributable that this contribution abstracts;
      • submission date;
      • policy (see Ownership, Sharing and Permissions);
      • download and viewing counts.
  • For contributables:
    • A contributable belongs to one contributor;
    • A contributable has only one contribution (therefore, there is a 1:1 relationship between contributions and contributables);
    • There is no separate table for contributables in the backend database. Instead, the contributable id and type are stored in each entry of the contributions table, and the type-specific data for each contributable is kept in separate tables named after the contributable type, e.g. workflows, blobs (standing for files), etc.
    • Only some types of resources can act as contributables - these are listed in [1] above;
  • For contributors:
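As a rough illustration of the constraints listed above, the following Ruby sketch models one row of the contributions table and checks the 1:1 rule between contributions and contributables. The column names and the type strings "User", "Workflow" and "Blob" are assumptions made for this example, not the actual schema.

```ruby
require "time"

# Hypothetical row layout for the contributions table (column names assumed).
ContributionRow = Struct.new(
  :id,
  :contributor_type, :contributor_id,     # who submitted the resource
  :contributable_type, :contributable_id, # which resource this row abstracts
  :created_at,                            # submission date
  :policy_id,                             # see Ownership, Sharing and Permissions
  :downloads_count, :viewings_count,
  keyword_init: true
)

# 1:1 constraint: a (type, id) pair may appear in at most one row.
def one_to_one?(rows)
  keys = rows.map { |r| [r.contributable_type, r.contributable_id] }
  keys.uniq.size == keys.size
end

rows = [
  ContributionRow.new(id: 1, contributor_type: "User", contributor_id: 10,
                      contributable_type: "Workflow", contributable_id: 7,
                      created_at: Time.parse("2008-08-12"), policy_id: 1,
                      downloads_count: 0, viewings_count: 0),
  ContributionRow.new(id: 2, contributor_type: "User", contributor_id: 10,
                      contributable_type: "Blob", contributable_id: 7,
                      created_at: Time.parse("2008-08-12"), policy_id: 2,
                      downloads_count: 0, viewings_count: 0)
]

# Same id 7, but different types, so each contributable still has one row.
puts one_to_one?(rows)
```

Because ids are only unique within a type, the uniqueness check has to use the type and id together.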


Design Decisions

Original Relationship Schema

Originally, the system was designed to have the structure shown in the image below. This supported the following idea: after being updated, a resource is re-uploaded by a new contributor, so it should be possible to hold information about the last uploader (of the current version of the resource) as well as to keep a record of the original uploader of the resource.

Image:Original_Design.png


Current Design Problem

At the moment there is a problem with the design (see image below). When an updated version of a resource is uploaded, the system still points to the original uploader as if they were the last person to edit the resource. This makes it impossible to identify the contributor who actually made the most recent changes to the contributable. As a result, the original contributor is held accountable for the most recent changes, even though they may have had nothing to do with them.
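The problem can be reproduced in a few lines of Ruby. This is a hypothetical sketch, not myExperiment code: it assumes only that the contribution stores a single contributor, set at first upload, and that uploading a new version does not record who uploaded it.

```ruby
# Hypothetical model: one contributor, fixed at first upload.
ContributionRecord = Struct.new(:contributor, :versions)

def upload_new_version(contribution, _uploader, data)
  # _uploader is deliberately ignored: this mirrors the design problem,
  # where the person uploading the new version is not recorded.
  contribution.versions << data
  contribution
end

# The only answer this model can give to "who changed it last?".
def last_contributor(contribution)
  contribution.contributor
end

c = ContributionRecord.new("alice", ["v1 data"])
upload_new_version(c, "bob", "v2 data")

puts last_contributor(c) # alice, although bob uploaded the latest version
puts c.versions.size     # 2
```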

Image:Current_Design.png


Data Structure in More Detail

Looking at the relationships between contributors, contributions and contributables from another point of view, it is worth explaining how the data is stored and used internally. Contributables are split into different types, such as workflows and files. For each contributable, the system also stores the corresponding contribution as an entry point for further queries about that contributable.


The Contributions table in the database links together all the information about contributors and contributables, making it possible to refer to any particular contributable by a single ID (Contributions.id). This table stores the contributable type (which avoids the need for a separate Contributables table) together with the contributable's ID within that type. As a result, from an entry in the Contributions table we can track down the specific record in the table that holds the data for that type of resource.
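A minimal sketch of that lookup, using in-memory hashes in place of database tables. The type strings and the mapping from types to table names are assumptions for illustration; in a Rails application this resolution would normally be handled by a polymorphic association rather than by hand.

```ruby
# In-memory stand-ins for the type-specific tables (hypothetical data).
TABLES = {
  "workflows" => { 7 => { title: "BLAST pipeline" } },
  "blobs"     => { 3 => { title: "Results", file_name: "results.csv" } }
}.freeze

# Assumed mapping from contributable_type values to table names.
TABLE_FOR_TYPE = { "Workflow" => "workflows", "Blob" => "blobs" }.freeze

CONTRIBUTIONS = {
  1 => { contributable_type: "Workflow", contributable_id: 7 },
  2 => { contributable_type: "Blob",     contributable_id: 3 }
}.freeze

# Starting from Contributions.id alone, follow (type, id) to the record.
def find_contributable(contribution_id)
  row   = CONTRIBUTIONS.fetch(contribution_id)
  table = TABLES.fetch(TABLE_FOR_TYPE.fetch(row[:contributable_type]))
  table.fetch(row[:contributable_id])
end

p find_contributable(2) # the blobs record with id 3
```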


Consider the following scenario: a review needs to be added for a contributable that already exists in the system. There are two ways to link the review with the parent contributable: either by supplying the Contributions.id value, or by supplying the pair {contributable type, ID within that type}. The scenario makes it clear that there is some data redundancy. However, this redundancy also introduces some flexibility. For instance, to find all contributables by a particular user, we can query just one table (Contributions) instead of searching the table for each resource type one by one for records about that user. On the other hand, in some cases it is useful to refer to a particular contributable directly (by type and id), rather than tracing it all the way from the Contributions table.
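The scenario can be sketched in Ruby as follows. The rows and field names are hypothetical; the point is that both ways of linking a review resolve to the same contributions entry, and that finding everything by one contributor needs only a single table scan.

```ruby
# Hypothetical contributions rows (field names assumed).
CONTRIBUTIONS = [
  { id: 1, contributor_id: 10, contributable_type: "Workflow", contributable_id: 7 },
  { id: 2, contributor_id: 11, contributable_type: "Blob",     contributable_id: 3 },
  { id: 3, contributor_id: 10, contributable_type: "Blob",     contributable_id: 4 }
].freeze

# Way 1: link by Contributions.id.
def contribution_for_id(id)
  CONTRIBUTIONS.find { |c| c[:id] == id }
end

# Way 2: link by the {type, id} pair.
def contribution_for_contributable(type, id)
  CONTRIBUTIONS.find { |c| c[:contributable_type] == type && c[:contributable_id] == id }
end

review_a = { contribution_id: 1 }
review_b = { contributable_type: "Workflow", contributable_id: 7 }

# Both references identify the same row: the redundancy noted above.
same = contribution_for_id(review_a[:contribution_id]) ==
       contribution_for_contributable(review_b[:contributable_type],
                                      review_b[:contributable_id])
puts same

# Flexibility: all contributions by one user come from one table.
def contributions_by(contributor_id)
  CONTRIBUTIONS.select { |c| c[:contributor_id] == contributor_id }
end

puts contributions_by(10).map { |c| c[:id] }.inspect
```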

Image:DataStructure.png


Versioning Support

Some types of contributables are inherently updated over time, which brings in the notion of versioning a resource. It is useful for users to be able to see how a resource has developed from the initial submission to its latest state. Therefore, the system needs to store the current version and all previous versions of the resource. This concept is currently implemented only for workflows (and is explained in more detail below), but versioning of files is also expected to be introduced.

The diagram below shows how versioning of workflows is currently implemented in myExperiment. Two tables in the DB are used for this purpose: workflows and workflow_versions. These tables have very similar sets of fields, except that the workflow_versions table also stores the id of the workflow (as a foreign key into the workflows table), its version number, and textual comments for that particular version. The two tables are kept in sync: the workflow_versions table stores all versions of each workflow (starting with the first), while the workflows table holds the latest version of that workflow. When a new version of a workflow is uploaded, its entry in the workflows table is replaced with the new data and a corresponding entry is added to the workflow_versions table.
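The sync rule can be sketched with plain Ruby collections standing in for the two tables. This is not the explicit_versioning library: the field names current_version and comments are hypothetical, and only the update pattern (append to workflow_versions, replace the workflows entry) follows the description above.

```ruby
# In-memory stand-ins for the two tables.
workflows         = {} # workflow id => data of the latest version
workflow_versions = [] # every version, oldest first

def upload_version(workflows, workflow_versions, workflow_id, data, comments)
  number = workflow_versions.count { |v| v[:workflow_id] == workflow_id } + 1
  # Append a new version row: foreign key, version number, comments.
  workflow_versions << data.merge(workflow_id: workflow_id, version: number,
                                  comments: comments)
  # Replace the workflows entry so it always mirrors the newest version.
  workflows[workflow_id] = data.merge(id: workflow_id, current_version: number)
end

upload_version(workflows, workflow_versions, 7, { title: "BLAST pipeline" }, "initial upload")
upload_version(workflows, workflow_versions, 7, { title: "BLAST pipeline v2" }, "added filtering step")

puts workflows[7][:title]   # BLAST pipeline v2
puts workflow_versions.size # 2
```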

For completeness, it is worth presenting the whole picture for workflows. Returning to the concept of contributions and contributables: the entry for each workflow in the contributions table has a contributable_type identifying it as a workflow, and the id of its row in the workflows table as the value of the contributable_id field. The actual contributable is stored in the workflows table and all its versions in the workflow_versions table, as explained in the paragraph above.

  Note: myExperiment uses a custom-made explicit_versioning library to drive the versioning mechanism described above.
(This library can be found in "\lib\explicit_versioning.rb")

Image:Workflows_and_Workflow_Versions.png


Implications

With this model in place, the myExperiment codebase currently provides the following features relating to contributors and contributables:

  • searching for all the contributions by a particular contributor;
  • checking whether a user is authorized to perform particular actions (see Ownership, Sharing and Permissions about policies and permissions);
  • checking whether a user belongs to the 'public' or 'protected' user group (see Ownership, Sharing and Permissions about policies and permissions);
  • getting the original contributor (the user who uploaded the initial version of the contributable) of a particular contributable;
  • getting the last contributor to update a particular contributable (currently not possible; see Current Design Problem above).


Credits and Attributions

When a user posts a new resource on the myExperiment website, they can record:

  • Credits - to mention people or groups that should get the credit (if they helped make the resource, or if they are authors of its original version, which has now been updated by the current contributor);
  • Attributions - to refer to other resources (like files or workflows) that were used during the creation of the current resource (or of its current updated version).
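The distinction between the uploader, credits and attributions can be sketched as separate pieces of data. The struct and field names below are hypothetical and do not reflect myExperiment's actual tables; they only show that credits point at people or groups while attributions point at other resources.

```ruby
# Hypothetical records; names and fields are illustrative only.
Credit      = Struct.new(:credited_type, :credited_id) # a user or a group
Attribution = Struct.new(:source_type, :source_id)     # another resource used

Resource = Struct.new(:title, :uploader, :credits, :attributions)

wf = Resource.new("BLAST pipeline v2", "bob",
                  [Credit.new("User", 10), Credit.new("Group", 5)],
                  [Attribution.new("Blob", 3)])

# The uploader is stored separately from whoever receives the credit.
credited = wf.credits.map { |c| "#{c.credited_type}##{c.credited_id}" }
puts credited.join(", ") # User#10, Group#5
```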