Multi-tenancy #10

@LukasKalbertodt

Reasons we want multi-tenancy:

The main argument is that we want to share resources between "multiple Opencasts". Especially if someone wants to offer Opencast to many small institutions, setting up a separate installation for each is extremely wasteful. For one, Opencast is quite a heavy application, so running it multiple times would waste lots of resources, most importantly system memory. Second, we want different institutions to share workers, to increase both worker utilization and processing speed for individual tenants.

In this discussion I first mentioned Kubernetes as a silver bullet to make all that happen, but Opencast today is woefully inadequate for that. It has slow startup/shutdown, uses lots of memory, workers cannot be shared, upgrades are tricky, and more. Yes, ideally, most of these should be fixed, but that's a huge undertaking, even larger than the data model change. So I think we need to offer some kind of multi-tenancy solution in OC itself.

Problems with the current implementation

  • Opencast's implementation of multi-tenancy is not great:
    • Isolation between tenants is sometimes broken.
    • All code in Opencast has to separately handle multi-tenancy, making it more complex and error-prone. The way it is implemented, the complexity spreads through the whole code base.
    • It's not uncommon that code paths forget to deal with multi-tenancy.
    • Support for tenant-specific config options needs to be implemented manually for each option, instead of (almost) anything being configurable per tenant automatically.
  • Often breaks: due to the previously mentioned problems, the low number of users and the fact that hardly any developer tests stuff in multi-tenancy mode, Opencast updates for multi-tenancy installations are spicy! There is a decent chance something breaks in multi-tenancy.
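
To make the "forgotten code path" problem concrete, here is a minimal sketch. It is not Opencast code: the `events` table, its columns and both functions are made up for illustration, and SQLite only stands in for the real database. It shows how a single query that omits the tenant filter silently returns other tenants' rows.

```python
import sqlite3

# Hypothetical shared table: every row carries its tenant in an "organization" column.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT, organization TEXT)")
db.executemany(
    "INSERT INTO events (title, organization) VALUES (?, ?)",
    [("Lecture 1", "uni-a"), ("Seminar", "uni-b")],
)


def list_events(organization):
    # Correct: the query remembers to filter by tenant.
    rows = db.execute(
        "SELECT title FROM events WHERE organization = ? ORDER BY id",
        (organization,),
    )
    return [title for (title,) in rows]


def list_events_forgot_filter(organization):
    # Bug: the tenant argument is ignored, so rows of all tenants are returned.
    rows = db.execute("SELECT title FROM events ORDER BY id")
    return [title for (title,) in rows]
```

Nothing in the schema stops the second function from being written, which is why this kind of isolation has to be re-checked in every code path.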

Ideas how to improve the situation

My main idea is to "lift up" multi-tenancy to a higher level, instead of having it "at the bottom" of every piece of code. For example, right now, events have an organization column in the database. Instead, I would suggest having one database (not DBMS!) per tenant. The OC process would then simply connect to multiple DBs (which is not a problem). This would very cleanly separate per-tenant data. Most OC code could then also be written as if there was only one tenant: it would always run in the context of a single tenant. Only very little code would actually need to care about multi-tenancy. Files on the file system would be separated by tenant as well, and links between files of different tenants would be disallowed. Ideally, the topmost file system level has one folder per tenant.
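
A rough sketch of what this could look like, under loud assumptions: `TenantContext`, `add_event`, `list_events` and the `opencast.db` file name are all hypothetical, and one SQLite file per tenant stands in for "one database per tenant" in a real DBMS. The point is only that the business code receives a single-tenant context and never mentions tenants itself.

```python
import sqlite3
import tempfile
from pathlib import Path


class TenantContext:
    """Everything one tenant may touch: its own database and its own folder."""

    def __init__(self, name, base_dir):
        self.name = name
        # The topmost level under base_dir has one folder per tenant.
        self.root = (Path(base_dir) / name).resolve()
        self.root.mkdir(parents=True, exist_ok=True)
        # One database per tenant (here: one SQLite file per tenant).
        self.db = sqlite3.connect(str(self.root / "opencast.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, title TEXT)"
        )

    def path(self, relative):
        # Resolve the path (including symlinks) and reject anything that
        # would leave this tenant's folder.
        candidate = (self.root / relative).resolve()
        if not candidate.is_relative_to(self.root):  # Python 3.9+
            raise PermissionError(f"{relative!r} escapes tenant {self.name!r}")
        return candidate


# "Business" code: written as if there was only one tenant. No organization column.
def add_event(ctx, title):
    ctx.db.execute("INSERT INTO events (title) VALUES (?)", (title,))
    ctx.db.commit()


def list_events(ctx):
    return [t for (t,) in ctx.db.execute("SELECT title FROM events ORDER BY id")]


# The only multi-tenancy-aware part: picking the context for each tenant.
base = tempfile.mkdtemp()
tenants = {name: TenantContext(name, base) for name in ("uni-a", "uni-b")}
add_event(tenants["uni-a"], "Lecture 1")
add_event(tenants["uni-b"], "Seminar")
```

With this layout, `list_events(tenants["uni-a"])` can only ever see that tenant's rows, and `tenants["uni-a"].path("../uni-b/opencast.db")` raises `PermissionError`. How the context is selected per request or per job is one of the details left open here.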

That should fix or improve basically all current problems.

But of course there are lots of details to figure out here (not to mention: decide whether this approach is actually good). Hence this issue.


Previous discussions:

Labels: discuss, needs-research