Versioning metadata and/or files? #19

@LukasKalbertodt

Opencast currently has a kind of versioning for files, so the question came up whether we want versioning in the new data model or not. When we discussed this in an architecture meeting, no one had compelling arguments or use cases for the feature, except one:

Data recovery after human errors, bugs or hardware failure. Of course, everyone should have DB backups, but a full DB backup is large and thus won't be created very frequently. And even if a suitable backup exists, it's usually infeasible to use it selectively, e.g. to restore only one event. DB backups are rather meant for catastrophic failures where you want to roll back the whole DB. A versioning feature inside Opencast could potentially allow for fine-grained data recovery and a more complete "backup" (e.g. recovering a video that was uploaded and then accidentally deleted on the same day, which a periodic backup would likely not contain yet).

Of course, files and metadata are completely different beasts: keeping old file versions around seriously eats into your storage budget, while keeping old metadata around costs very few server resources but is likely more complex to implement.

Files

Because files are large, admins usually don't want to keep old versions around for too long. So Opencast should support tools that automatically clean up old versions based on rules (file age, file size, storage usage, ...). Depending on the outcome of our file system discussion, we might never delete files directly and instead have a garbage collector that prunes files no longer referenced by anything.
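To make the rule-based cleanup more concrete, here is a minimal Python sketch assuming a hypothetical `FileVersion` record (not an existing Opencast type). It implements only one rule, an age limit that always keeps the newest version of each file, plus the garbage-collector variant as a plain set difference between stored and referenced paths. Size- or quota-based rules would plug in the same way.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record for one stored version of a file; not an Opencast type.
@dataclass(frozen=True)
class FileVersion:
    file_id: str        # logical file, e.g. one track of an event
    version: int        # increases with every new upload of that file
    created: datetime
    path: str           # location in storage


def versions_to_prune(versions, now, max_age, keep_latest=1):
    """Age rule: prune versions older than max_age, but always keep the
    newest `keep_latest` versions of each file regardless of age."""
    by_file = {}
    for v in versions:
        by_file.setdefault(v.file_id, []).append(v)
    prune = []
    for file_versions in by_file.values():
        # Newest first, so the slice below skips the versions we must keep.
        file_versions.sort(key=lambda fv: fv.version, reverse=True)
        for old in file_versions[keep_latest:]:
            if now - old.created > max_age:
                prune.append(old)
    return prune


def unreferenced_paths(stored_paths, referenced_paths):
    """Garbage-collector variant: everything in storage that no metadata
    entry references anymore can be deleted."""
    return set(stored_paths) - set(referenced_paths)


now = datetime(2024, 1, 31)
versions = [
    FileVersion("track-1", 1, datetime(2023, 12, 1), "/store/t1-v1"),
    FileVersion("track-1", 2, datetime(2024, 1, 30), "/store/t1-v2"),
    FileVersion("track-2", 1, datetime(2023, 11, 1), "/store/t2-v1"),
]
pruned = versions_to_prune(versions, now, max_age=timedelta(days=30))
print([v.path for v in pruned])  # ['/store/t1-v1']
print(unreferenced_paths(["/store/a", "/store/b"], ["/store/a"]))  # {'/store/b'}
```

Note that `track-2` survives even though its only version is old: deleting the last version would be data loss, not cleanup.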

Metadata

(To be clear: in this discussion, metadata is all data about Opencast items that is not a file, and thus includes ACLs and more.)

This is mostly a question of implementation and whether it's worth the complexity. From a software architecture viewpoint, I think it's important that metadata versioning is implemented mostly as an independent layer on top of the core Opencast logic. I don't want it to seep into every piece of code dealing with metadata; ideally it should be handled in one central place. One idea would be DB triggers (in the DBMS itself or as an ORM layer in code) that, whenever some data is modified, write the old data to a separate table. Alternatively, they could write only the change set to a separate table.
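A minimal sketch of the trigger idea, assuming SQLite (through Python's standard `sqlite3` module) and a hypothetical, heavily simplified `event` table. The table names, columns and the `restore_latest` helper are illustrative only, not Opencast's schema. Each trigger copies the full old row into a history table (the first of the two variants mentioned; the change-set variant is not shown), and the helper demonstrates the kind of single-event recovery that a full DB backup makes hard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (
    id    TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    acl   TEXT NOT NULL
);

-- Old row states land here; core code never writes to this table itself.
CREATE TABLE event_history (
    history_id INTEGER PRIMARY KEY AUTOINCREMENT,
    id         TEXT NOT NULL,
    title      TEXT NOT NULL,
    acl        TEXT NOT NULL,
    operation  TEXT NOT NULL,   -- 'UPDATE' or 'DELETE'
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER event_archive_update BEFORE UPDATE ON event
FOR EACH ROW BEGIN
    INSERT INTO event_history (id, title, acl, operation)
    VALUES (OLD.id, OLD.title, OLD.acl, 'UPDATE');
END;

CREATE TRIGGER event_archive_delete BEFORE DELETE ON event
FOR EACH ROW BEGIN
    INSERT INTO event_history (id, title, acl, operation)
    VALUES (OLD.id, OLD.title, OLD.acl, 'DELETE');
END;
""")


def restore_latest(conn, event_id):
    """Selectively roll back one event to its state before its last change."""
    row = conn.execute(
        "SELECT title, acl FROM event_history WHERE id = ? "
        "ORDER BY history_id DESC LIMIT 1",
        (event_id,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no history for event {event_id!r}")
    title, acl = row
    # UPDATE fires the archive trigger, so the restore itself can be undone.
    cur = conn.execute(
        "UPDATE event SET title = ?, acl = ? WHERE id = ?", (title, acl, event_id)
    )
    if cur.rowcount == 0:  # the event was deleted, so re-create it
        conn.execute(
            "INSERT INTO event (id, title, acl) VALUES (?, ?, ?)",
            (event_id, title, acl),
        )


conn.execute("INSERT INTO event VALUES ('e1', 'Lecture 1', 'public')")
conn.execute("UPDATE event SET title = 'Lecture 1: Intro' WHERE id = 'e1'")
conn.execute("DELETE FROM event WHERE id = 'e1'")  # the accidental delete
restore_latest(conn, "e1")
print(conn.execute("SELECT title, acl FROM event WHERE id = 'e1'").fetchone())
# ('Lecture 1: Intro', 'public')
```

Because the triggers live in the DBMS, none of the code that updates or deletes events has to know about versioning. The price is that every versioned table needs a mirrored history table that must be kept in sync with schema changes.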


Having written this all down, it occurred to me that this issue very likely does not need to be discussed as part of the core data model! We want this to be a layer on top anyway, and we won't expose versioning information in a new API anytime soon. It's an admin tool, at least for the foreseeable future. So maybe this is rather low priority.

Previous discussion:

Metadata

