How to create data sharing and collaboration services in Kubernetes environments

May 16, 2022


Introduction

Customer requirements are constantly changing. The digital transformation of a company requires not only cultural support for collaboration between departments and colleagues, but also the adaptation of processes, the modernisation of applications and the management of company data. In my article “How to create a data pipeline for Next Generation Sequencing”, I described how to create one or more data lakes across departments and what benefits this brings in terms of data security and availability.

Link: https://open011prod.wpengine.com/2020/12/14/industry-use-case-next-generation-sequencing-for-a-distributed-data-pipeline/

In addition to central data exchange, colleagues also collaborate on individual files. These include office documents and project-specific files, as well as sharing them internally and externally so that end customers or service providers can be included in the exchange of information quickly and securely. Ways of working differ considerably between sectors. Here are a few rough examples of how data exchange differs:

Media & Entertainment: Playlists based on XML files in the kilobyte range, or video segments up to entire videos of 1 to 200 gigabytes, exchanged via USB hard disks

Research and Development: Research data in the range of 1 to 4 gigabytes

Public service: Microsoft Word/Excel or LibreOffice files from kilobytes to megabytes

Manufacturing: Quality assurance based on production images in the megabyte range

Use Case: Data Sharing and Collaboration in Kubernetes environments

In the following two chapters I will talk about the company Nextcloud, how it provides its applications as containers, and the current state of cloud-native Kubernetes (k8s) support. There are many examples based on Docker / Podman; some of you might use them at home to synchronize files or share calendars, because you can’t or don’t always want to use a cloud service. Two of the most important aspects are data protection and the cost of cloud services.

Nextcloud

What is Nextcloud?

Nextcloud is a widely used open source software solution for private and public clouds. The focus is on exchanging and processing data inside and outside a company. With this solution, customers keep control over their data: encryption (including client-side end-to-end encryption), file access control, auditing mechanisms and sharing controls enable secure operation in closed or publicly restricted corporate environments. Compliance, security and transparency are priorities for Nextcloud. You can read more about this at https://opensource.com/tags/nextcloud.

How does Nextcloud work as a company?

Nextcloud, like many other companies, relies on a sustainable and future-proof subscription model. Such a subscription includes direct and unlimited access to their engineers. The cost of a subscription depends on the number of users.

How engaged is Nextcloud in the community as an Open Source company?

The community is the mainstay for advancing Nextcloud and implementing customer requirements in the long term. There are many ways to be part of this and other communities: you can help with compiling, bug fixing, enhancements, designing the interface for better UX (user experience), testing or helping others with deployment. Get, pull and go: https://github.com/nextcloud

What could a sync & share stack based on OpenShift / Kubernetes (k8s) look like?

Requirements & Recommendations:

  • 2 to 4 application servers
    • Read-only LDAP replicas (slaves) on each application server
  • 2 DB servers (e.g. MariaDB, PostgreSQL)
  • 1 load balancer (HAProxy)
  • S3 storage server as needed for all the user-created data
  • Compute with low-latency storage is recommended

An architecture design for enterprise customers with multiple locations and varying numbers of employees could look like the following picture:

Necessity of persistent storage? Long-term consideration of object storage

Since we are working with data that is exchanged between departments and many people, it must always be available. Applications such as database servers must therefore replicate their data and are ideally operated in a master/slave setup. The underlying hardware that provides compute and (internal) storage must be designed with corresponding redundancy. Databases require low latency to ensure fast response times, which nowadays is almost exclusively achieved with flash / SSD or hybrid in-memory storage. Since not every read has to reach the storage, caches such as Redis are also deployed and connected.

Redundancy also includes the availability of data across multiple locations, regardless of where the data is created and stored. At petabyte scale this cannot be managed with classic storage systems, which is why object storage is ideal for the data itself. An object store provides a practically unlimited, scalable storage pool, usually with S3 or Swift as the protocol interface. Nextcloud supports object storage as primary storage and then uses one of these APIs instead of a traditional file server or file system.

https://docs.nextcloud.com/server/latest/admin_manual/configuration_files/primary_storage.html
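As a rough illustration of what object storage as primary storage means in practice, the following sketch writes an additional Nextcloud configuration file with an S3 `objectstore` section, following the structure described in the admin manual linked above. The file name `objectstore.config.php`, the bucket name, hostname, port and credentials are placeholders chosen for this example, not values from a real environment.

```shell
#!/bin/sh
# Sketch: generate an extra Nextcloud config file that selects S3 as primary storage.
# All values below are placeholders (assumptions) and must be replaced.
workdir=$(mktemp -d)
config_file="$workdir/objectstore.config.php"

# The quoted 'EOF' keeps backslashes and the dollar sign literal for PHP.
cat > "$config_file" <<'EOF'
<?php
$CONFIG = [
  'objectstore' => [
    'class' => '\\OC\\Files\\ObjectStore\\S3',
    'arguments' => [
      'bucket' => 'nextcloud-data',
      'autocreate' => true,
      'key' => 'REPLACE_WITH_ACCESS_KEY',
      'secret' => 'REPLACE_WITH_SECRET_KEY',
      'hostname' => 's3.example.internal',
      'port' => 443,
      'use_ssl' => true,
      'use_path_style' => true,
    ],
  ],
];
EOF

echo "Wrote $config_file"
```

Nextcloud merges files ending in `.config.php` from its `config/` directory, so a file like this could be copied or mounted there alongside the main `config.php`.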

How to deploy Nextcloud in OpenShift or OKD?

There is a basic framework for how Nextcloud is usually deployed, but it is based on bare metal or on containers run with Podman / Docker. Most examples exist for Docker and OpenShift 3.x.

Images for Docker / Podman are the foundation for a deployment on Kubernetes (k8s). However, I would like to deploy not only on plain Kubernetes, but on OpenShift or OKD, which is used in many highly secure and (partly) closed environments, for example in finance and insurance. The ecosystem around OpenShift allows customers not only to build a deployment according to the principle “set it and forget it”, but also to run strategic large-scale projects through a complete CI/CD pipeline with a GitOps approach. This results in a completely standardized life cycle concept with a continuous approach that can also react to short-notice release changes, e.g. in case of an important CVE (Common Vulnerabilities and Exposures) entry.

As it stands, there is unfortunately no recommended way to deploy Nextcloud on OpenShift 4.x in a scalable and highly available way. Consultants are needed to modify the image and build a solution based on customer requirements.

What is the reason for this?

The Nextcloud image listens on port 80 and runs with root privileges, which OpenShift does not allow by default. One of the principles of OpenShift is to provide security, so pods cannot simply run with root privileges. This poses a problem if you use the Nextcloud image from DockerHub, as its Apache server uses port 80. All ports below 1024 are considered privileged ports, and the first thing the container does after startup is try to bind to port 80; this fails, and the pod ends up in a CrashLoopBackOff state.

There are queries about this in the Nextcloud forum and on GitHub:

https://help.nextcloud.com/t/how-to-change-apache-port-for-openshift-4-x-deployment/

https://github.com/nextcloud/docker/issues/760
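To make the port problem more concrete: one direction for a custom image is to move Apache to an unprivileged port, as the forum question above asks. The sketch below only writes such a Containerfile. The port 8080, the image tag `nextcloud-unprivileged` and the Apache file locations are assumptions based on the usual Debian/Apache layout, and the change does not by itself address the root privileges the image expects.

```shell
#!/bin/sh
# Sketch: write a Containerfile that moves Apache in the Nextcloud image
# from port 80 to an unprivileged port. Port 8080 is an assumption.
workdir=$(mktemp -d)
containerfile="$workdir/Containerfile"

cat > "$containerfile" <<'EOF'
FROM docker.io/library/nextcloud:latest
# Let Apache listen on 8080 instead of the privileged port 80.
RUN sed -i 's/^Listen 80$/Listen 8080/' /etc/apache2/ports.conf \
 && sed -i 's/:80>/:8080>/' /etc/apache2/sites-available/000-default.conf
EXPOSE 8080
EOF

echo "Wrote $containerfile"
# Build it afterwards with, for example:
#   podman build -t nextcloud-unprivileged -f "$containerfile" "$workdir"
```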

If Nextcloud addresses this point in one of its next releases, the deployment steps for Nextcloud on OpenShift with OpenShift Data Foundation could look as follows. Otherwise, a custom image is unavoidable.

Draft files for initial deployment:

https://github.com/mschindl/nextcloud-on-ocp4

Note: Further changes will be made in collaboration with the Nextcloud community and will be shared in another post.

1. Create the persistent volume claim (PVC) for the Apache files and the database:

# oc create -f nextcloud-shared-pvc.yaml

https://github.com/mschindl/nextcloud-on-ocp4/blob/main/nextcloud-shared-pvc.yaml
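For orientation, a PVC manifest of this kind could look like the sketch below, which writes an example file locally. It is not the file from the repository: the claim name, the 10Gi size, the ReadWriteMany access mode and the storage class `ocs-storagecluster-cephfs` (a typical OpenShift Data Foundation CephFS class name) are assumptions.

```shell
#!/bin/sh
# Sketch: write an example PVC manifest for shared Nextcloud storage.
workdir=$(mktemp -d)
pvc_file="$workdir/example-shared-pvc.yaml"

cat > "$pvc_file" <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-shared-pvc
spec:
  accessModes:
    - ReadWriteMany            # assumed: shared between several pods
  resources:
    requests:
      storage: 10Gi            # assumed size
  storageClassName: ocs-storagecluster-cephfs   # assumed ODF storage class
EOF

echo "Wrote $pvc_file"
# Apply it the same way as in step 1: oc create -f "$pvc_file"
```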

2. Create a secret called nextcloud-db-secret for the database with the following variables:

MYSQL_ROOT_PASSWORD

MYSQL_USER

MYSQL_PASSWORD

Note: mind the capitalisation and the underscores.

# oc create -f nextcloud-secret-db.yaml

MYSQL_ROOT_PASSWORD:  # Test1234

MYSQL_USER:  # testadm

MYSQL_PASSWORD:  # Test1234

The comments show the decoded passwords.

https://github.com/mschindl/nextcloud-on-ocp4/blob/main/nextcloud-secret-db.yaml
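Values in the `data` section of a Kubernetes secret are base64-encoded, which is easy to get wrong by hand. The following sketch encodes the example values from above and writes a manifest with the three required keys. It is an illustration, not the file from the repository, and the output file name is made up.

```shell
#!/bin/sh
# Sketch: build a secret manifest with base64-encoded example values.
workdir=$(mktemp -d)
secret_file="$workdir/example-secret-db.yaml"

# printf avoids the trailing newline that echo would add to the encoded value.
root_pw=$(printf '%s' 'Test1234' | base64)
db_user=$(printf '%s' 'testadm' | base64)
db_pw=$(printf '%s' 'Test1234' | base64)

# Unquoted EOF so that the shell variables are expanded into the manifest.
cat > "$secret_file" <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: nextcloud-db-secret
type: Opaque
data:
  MYSQL_ROOT_PASSWORD: $root_pw
  MYSQL_USER: $db_user
  MYSQL_PASSWORD: $db_pw
EOF

cat "$secret_file"
```

Base64 is an encoding, not encryption, so example passwords like these should only be used in a test setup.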

3. Create the MariaDB pod using the file nextcloud-db.yaml:

# oc create -f nextcloud-db.yaml

https://github.com/mschindl/nextcloud-on-ocp4/blob/main/nextcloud-db.yaml

4. Create the web server pod using the file nextcloud-server.yaml:

# oc create -f nextcloud-server.yaml

https://github.com/mschindl/nextcloud-on-ocp4/blob/main/nextcloud-server.yaml

5. After that, create the routes/ingresses to the web server on port 8443.
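On OpenShift, step 5 could be covered by a Route similar to the sketch below. The route name, the service name `nextcloud-server` and the edge TLS termination are assumptions for illustration; the real names depend on the service that exposes the web server pod.

```shell
#!/bin/sh
# Sketch: write an example OpenShift Route that points to port 8443.
workdir=$(mktemp -d)
route_file="$workdir/example-route.yaml"

cat > "$route_file" <<'EOF'
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: nextcloud
spec:
  to:
    kind: Service
    name: nextcloud-server     # assumed service name
  port:
    targetPort: 8443
  tls:
    termination: edge          # assumed: TLS ends at the router
EOF

echo "Wrote $route_file"
# Apply it with: oc create -f "$route_file"
```

With edge termination the router forwards plain HTTP to the pod. If the web server itself serves TLS on port 8443, passthrough or reencrypt termination would be the options to look at instead.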

6. As the last step, open the address where Nextcloud is running (defined in the ingress/route), configure the admin user and password, and select MySQL as the database, using the information stored in the secret under the variables above (MYSQL_ROOT_PASSWORD, MYSQL_USER and MYSQL_PASSWORD). After that, you should be able to start using the environment.

Since the deployment with the original Nextcloud image from DockerHub does not work smoothly in OpenShift at the moment, I have added a short tutorial based on Podman. This one is for private use and will not fulfil the requirements of enterprise customers.

Deployment example via Podman on RHEL8

Prepare network

# sudo vi /etc/containers/registries.conf

unqualified-search-registries = ["registry.fedoraproject.org", "registry.access.redhat.com", "registry.centos.org", "docker.io"]

# podman network create host_local 

# podman network ls

Content of /etc/cni/net.d/host_local.conflist:

{
   "cniVersion": "0.4.0",
   "name": "host_local",
   "plugins": [
      {
         "type": "macvlan",
         "master": "enp2s0",
         "ipam": {
            "type": "host-local",
            "ranges": [
                [
                    {
                        "subnet": "10.1.1.0/24",
                        "rangeStart": "10.1.1.212",
                        "rangeEnd": "10.1.1.214",
                        "gateway": "10.1.1.254" 
                    }
                ]
            ],
            "routes": [
                {"dst": "0.0.0.0/0"}
            ]
         }
      },
      {
         "type": "tuning",
         "capabilities": {
            "mac": true
         }
      }
   ]
}

Create Container with Podman

Database (MariaDB)

locpath=/PersistentStorage/nextcloud-db
mkdir -p "$locpath"

podman run -d \
-e MYSQL_ROOT_HOST='%' \
-e MYSQL_ROOT_PASSWORD=root_pw \
-e MYSQL_USER=db_user \
-e MYSQL_PASSWORD=db_pw \
-e MYSQL_DATABASE=mysql \
-e TZ='Europe/Berlin' \
-v $locpath:/var/lib/mysql \
--net host_local \
--restart unless-stopped \
--name nc-db \
mariadb:10.5.12

Server (Nextcloud and Apache)

locpath=/PersistentStorage/nextcloud-data/var/www/html
mkdir -p "$locpath"

podman run -d \
-p 8888:80 \
-e TZ='Europe/Berlin' \
-e NEXTCLOUD_ADMIN_USER=admin \
-e NEXTCLOUD_ADMIN_PASSWORD=admin_pw \
-e MYSQL_DATABASE=mysql \
-e MYSQL_USER=root \
-e MYSQL_PASSWORD=root_pw \
-e MYSQL_HOST=nc-db \
-e NEXTCLOUD_TRUSTED_DOMAINS=0.0.0.0 \
-v $locpath:/var/www/html \
-v /etc/localtime:/etc/localtime:ro \
--net host_local \
--restart unless-stopped \
--name nc \
nextcloud:latest

Any issues during deployment?

Check the log files of the two containers (named nc-db and nc above) with

# podman logs -f nc-db

# podman logs -f nc

Open the address where Nextcloud is running (<server/IP>:8888), configure the admin user and password, and select MySQL as the database.

Conclusion

Nextcloud is a great solution for sync and share as well as for collaborative working, but its deployment methods and documentation for Kubernetes or OpenShift are unfortunately still lacking. Here it is important to work actively with the community, as the company itself does little to change this.

As it stands, it is necessary to build a custom image for your OpenShift and ideally integrate it into the GitOps approach through a fully automated pipeline.
