Description
Reasons we want to have multi-tenancy:
The main argument is that we want to share resources between "multiple Opencasts". Especially if someone wants to offer Opencast to many small institutions, setting up a separate installation for each is extremely wasteful. First, Opencast is quite a heavy application, so running it multiple times would waste lots of resources, most importantly system memory. Second, we want different institutions to share workers, to increase both worker utilization and processing speed for individual tenants.
In this discussion I first mentioned Kubernetes as a silver bullet to make all that happen, but Opencast today is woefully inadequate for that. It has slow startup/shutdown, uses lots of memory, workers cannot be shared, upgrades are tricky, and more. Yes, ideally most of these should be fixed, but that's a huge undertaking, even larger than the data model change. So I think we need to offer some kind of multi-tenancy solution in OC itself.
Problems with the current implementation
- Opencast's implementation of multi-tenancy is not great:
- Isolation between tenants is sometimes broken.
- All code in Opencast has to separately handle multi-tenancy, making it more complex and error-prone. The way it is implemented, the complexity spreads through the whole code base.
- It's not uncommon that code paths forget to deal with multi-tenancy.
- Support for tenant-specific config options needs to be implemented manually for each option, instead of just being able to configure (almost) anything per tenant automatically.
- Often breaks: due to the previously mentioned problems, the low number of users, and the fact that hardly any developer tests stuff in multi-tenancy mode, Opencast updates for multi-tenancy installations are spicy! There is a decent chance something breaks in multi-tenancy.
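To illustrate the "code paths forget to deal with multi-tenancy" failure mode, here is a minimal sketch. It is not actual Opencast code: `EventRow`, `SharedEventStore` and both query methods are hypothetical names, and an in-memory list stands in for a shared table with an organization column.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical row with a per-row organization column; not Opencast's real entity. */
final class EventRow {
  final String id;
  final String organization;
  final String title;

  EventRow(String id, String organization, String title) {
    this.id = id;
    this.organization = organization;
    this.title = title;
  }
}

/** Hypothetical in-memory store standing in for a table shared by all tenants. */
final class SharedEventStore {
  private final List<EventRow> rows;

  SharedEventStore(List<EventRow> rows) {
    this.rows = rows;
  }

  /** Correct: the caller's organization is part of the filter. */
  List<EventRow> findByTitle(String organization, String title) {
    return rows.stream()
        .filter(r -> r.organization.equals(organization))
        .filter(r -> r.title.equals(title))
        .collect(Collectors.toList());
  }

  /** Buggy: a query that forgot the organization filter, so it returns every tenant's rows. */
  List<EventRow> findByTitleUnscoped(String title) {
    return rows.stream()
        .filter(r -> r.title.equals(title))
        .collect(Collectors.toList());
  }
}
```

Nothing in the types stops the second method from being written, which is why every query has to get the filter right on its own.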
Ideas how to improve the situation
My main idea is to "lift up" the multi-tenancy to a higher level, instead of having it "at the bottom" of every piece of code. For example, right now, events have an organization column in the database. Instead, I would suggest having one database (not DBMS!) per tenant. The OC process would then simply connect to multiple DBs (which is not a problem). This would very cleanly separate per-tenant data. Most OC code could then also be written as if there was only one tenant: it would always run in the context of a single tenant. Only very little code would actually need to care about multi-tenancy. Files on the file system would be separated by tenant as well, disallowing links between these files. Ideally, the topmost file system level has one folder per tenant.
That should fix or improve basically all current problems.
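To make the idea a bit more concrete, here is a rough sketch of what "running in the context of a single tenant" could look like. Everything in it is an assumption for illustration, not existing Opencast API: the names `TenantContext`, `TenantRegistry` and `runAs`, the `opencast_<tenant>` database naming, and the tenant ID pattern. It only builds JDBC URLs and paths and does not open real connections.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-tenant resources: one database and one top-level folder. */
final class TenantContext {
  final String tenantId;
  final String jdbcUrl;
  final Path storageRoot;

  TenantContext(String tenantId, String jdbcUrl, Path storageRoot) {
    this.tenantId = tenantId;
    this.jdbcUrl = jdbcUrl;
    this.storageRoot = storageRoot;
  }

  /**
   * Resolves a path inside this tenant's folder and rejects lexical escapes
   * such as "../other_tenant/...". Symlinks are not checked in this sketch.
   */
  Path resolve(String relative) {
    Path resolved = storageRoot.resolve(relative).normalize();
    if (!resolved.startsWith(storageRoot)) {
      throw new IllegalArgumentException("Path escapes tenant root: " + relative);
    }
    return resolved;
  }
}

/** The small part of the code that knows about multiple tenants. */
final class TenantRegistry {
  private final Path baseDir;
  private final String jdbcUrlPrefix;
  private final Map<String, TenantContext> contexts = new ConcurrentHashMap<>();

  TenantRegistry(Path baseDir, String jdbcUrlPrefix) {
    this.baseDir = baseDir.normalize();
    this.jdbcUrlPrefix = jdbcUrlPrefix;
  }

  /** Returns the cached context for a tenant, creating it on first use. */
  TenantContext contextFor(String tenantId) {
    // Assumed ID format; keeps tenant IDs safe to use in DB and folder names.
    if (!tenantId.matches("[a-z0-9_]+")) {
      throw new IllegalArgumentException("Invalid tenant id: " + tenantId);
    }
    return contexts.computeIfAbsent(tenantId, id ->
        new TenantContext(id, jdbcUrlPrefix + "opencast_" + id, baseDir.resolve(id)));
  }

  /** Runs single-tenant code; the callback only ever sees its own tenant's resources. */
  <T> T runAs(String tenantId, Function<TenantContext, T> work) {
    return work.apply(contextFor(tenantId));
  }
}
```

Code passed to `runAs` receives only its own tenant's database URL and folder, so it can be written as if there was only one tenant; only `TenantRegistry` has to know that several exist.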
But of course there are lots of details to figure out here (not to mention: decide whether this approach is actually good). Hence this issue.
Previous discussions: