Installation

This chapter describes the installation of each application.

Installing Kuha Document Store

This guide provides step-by-step instructions for installing Kuha Document Store and a MongoDB database. The operating system used in this guide is Ubuntu 20.04, but other modern Linux variants may be used.

If you only need to install the Python package, see Install Document Store. If you wish to upgrade an existing Document Store install, see Upgrade Document Store.

Note

While this manual provides step-by-step instructions for MongoDB installation, always refer to the official MongoDB manual for the proper installation procedure.

In this guide the installation of the database is done on a separate server. However, Document Store and MongoDB may be installed on the same server.

There are multiple ways to set up the MongoDB service. This guide describes two alternative setups. Refer to Install MongoDB replicaset and Setup MongoDB Replicaset for Document Store to set up Document Store with MongoDB replicas. For a more straightforward setup, see Install Standalone MongoDB Instance and Setup Standalone MongoDB for Document Store. Please note that the installation guides are meant to be used as examples.

It is recommended to use the latest version of MongoDB, which can be obtained from MongoDB’s own repository. Refer to the MongoDB manual on how to install MongoDB on your operating system. At the time of writing, the installation on Ubuntu 20.04 was done as follows.

Install MongoDB replicaset

Note

These actions should be done on the MongoDB server.

This is an example setup with a single virtual machine containing three MongoDB replicas. The example uses the following configuration parameters for the replicas:

Replica  Port   Configuration file    Log file                 Database path
r1       27017  /etc/mongodb/r1.conf  /var/lib/mongodb/r1.log  /var/lib/mongodb/r1
r2       27018  /etc/mongodb/r2.conf  /var/lib/mongodb/r2.log  /var/lib/mongodb/r2
r3       27019  /etc/mongodb/r3.conf  /var/lib/mongodb/r3.log  /var/lib/mongodb/r3

The replicaset will be configured to use the name rs_kuha. The keyfile for replica authentication will be stored in /var/lib/mongodb/auth_key.

Replica services will be controlled by SystemD. Unit definitions will be stored to /etc/systemd/system/. Each replica will have its own dedicated unit file mongod_r1.service, mongod_r2.service and mongod_r3.service.
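The per-replica settings above follow a regular pattern, so they can be derived from the replica number. The following is a minimal sketch under that assumption; replica_params is a hypothetical helper written for illustration and is not part of Kuha or MongoDB.

```shell
#!/bin/sh
# Print the settings this guide assigns to replica number N (1, 2 or 3).
# replica_params is a hypothetical helper, not part of Kuha or MongoDB.
replica_params() (
    n="$1"
    port=$((27016 + n))    # r1 -> 27017, r2 -> 27018, r3 -> 27019
    printf 'name=r%s port=%s conf=/etc/mongodb/r%s.conf log=/var/lib/mongodb/r%s.log dbpath=/var/lib/mongodb/r%s unit=mongod_r%s.service\n' \
        "$n" "$port" "$n" "$n" "$n" "$n"
)

# List the settings for all three replicas.
for n in 1 2 3; do
    replica_params "$n"
done
```

The output is one line per replica and can serve as a checklist while working through the steps below.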

  1. Obtain MongoDB public key.

wget -qO - https://www.mongodb.org/static/pgp/server-5.0.asc | sudo apt-key add -
  2. Add MongoDB source.

echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/5.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
  3. Update indexes and install.

sudo apt-get update && sudo apt-get install -y mongodb-org
  4. Create directories for replica data.

sudo mkdir /var/lib/mongodb/{r1,r2,r3}
sudo chown mongodb:mongodb /var/lib/mongodb/{r1,r2,r3}
sudo chmod 0755 /var/lib/mongodb/{r1,r2,r3}
  5. Configure a single MongoDB instance to use the r1 replica data directory.

sudo sed -i 's#dbPath: /var/lib/mongodb#dbPath: /var/lib/mongodb/r1#' /etc/mongod.conf
  6. Start MongoDB. Leave it running in this terminal.

sudo -u mongodb /usr/bin/mongod --config /etc/mongod.conf
  7. Open another terminal to create the rootadmin user using the mongo shell. Replace <user> and <password> with proper credentials. Replace <mongodb-ip> with the IP of your MongoDB server. Close the terminal after completing this step.

mongo <mongodb-ip>
use admin
db.createUser({user: '<user>', pwd: '<password>', roles: [{role: 'root', db: 'admin'}]})
exit
  8. Stop MongoDB by pressing CTRL+C in the terminal running MongoDB.

  9. Create a directory for the MongoDB replica configuration.

sudo mkdir /etc/mongodb
sudo chmod 0755 /etc/mongodb
  10. Generate a keyfile for replica authentication.

sudo openssl rand -base64 756 | sudo tee /var/lib/mongodb/auth_key
sudo chown mongodb:mongodb /var/lib/mongodb/auth_key
sudo chmod 0600 /var/lib/mongodb/auth_key
  11. Configure replicas. Below is an example for /etc/mongodb/r1.conf, which instructs that the replica belongs to a replicaset named rs_kuha. Create similar configurations for other replicas as well (r2.conf, r3.conf). Note that each replica must use a distinct port when serving from a single host. You may use 27018 for r2, and 27019 for r3.

storage:
  dbPath: /var/lib/mongodb/r1
  journal:
    enabled: true

systemLog:
  destination: file
  logAppend: true
  path: /var/lib/mongodb/r1.log

net:
  port: 27017
  bindIp: 0.0.0.0

processManagement:
  timeZoneInfo: /usr/share/zoneinfo

security:
  authorization: enabled
  keyFile: /var/lib/mongodb/auth_key

replication:
  replSetName: rs_kuha
  12. Ensure permissions.

sudo chmod 0644 /etc/mongodb/{r1,r2,r3}.conf
  13. Create systemd units for replicas. Below is an example for /etc/systemd/system/mongod_r1.service. Create similar units for other replicas as well (mongod_r2.service, mongod_r3.service). Note that the ExecStart directive must point to the correct replica configuration file and each unit must have a distinct PIDFile path.

[Unit]
Description=MongoDB Database Server
Documentation=https://docs.mongodb.org/manual
After=network.target

[Service]
Type=simple
User=mongodb
Group=mongodb
ExecStart=/usr/bin/mongod --config /etc/mongodb/r1.conf
Restart=always
PIDFile=/var/run/mongodb/mongod_r1.pid
# file size
LimitFSIZE=infinity
# cpu time
LimitCPU=infinity
# virtual memory size
LimitAS=infinity
# open files
LimitNOFILE=64000
# processes/threads
LimitNPROC=64000
# locked memory
LimitMEMLOCK=infinity
# total threads (user+kernel)
TasksMax=infinity
TasksAccounting=false
# Recommended limits for mongod as specified in
# http://docs.mongodb.org/manual/reference/ulimit/#recommended-settings

[Install]
WantedBy=multi-user.target
  14. Ensure permissions.

sudo chmod 0644 /etc/systemd/system/mongod_r{1,2,3}.service
  15. Enable replica services.

sudo systemctl enable mongod_r1.service
sudo systemctl enable mongod_r2.service
sudo systemctl enable mongod_r3.service
  16. Reload systemd manager configuration.

sudo systemctl daemon-reload
  17. Start services.

sudo systemctl start mongod_r1.service
sudo systemctl start mongod_r2.service
sudo systemctl start mongod_r3.service

MongoDB replicas are now running and configured to work as a replicaset. The next step is to Install Document Store and then Setup MongoDB Replicaset for Document Store.
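In this example the r2.conf and r3.conf files differ from r1.conf only in the data path, the log path and the port. As a minimal sketch under that assumption, the other configurations can be derived from r1.conf with sed; render_replica_conf is a hypothetical helper, not something shipped with Kuha or MongoDB.

```shell
#!/bin/sh
# Derive the configuration for replica N from the r1 example by rewriting
# the three values that differ: data path, log path and port.
# render_replica_conf is a hypothetical helper; it reads r1.conf on stdin
# and writes the derived configuration to stdout.
render_replica_conf() (
    n="$1"
    port=$((27016 + n))
    sed -e "s#/var/lib/mongodb/r1\$#/var/lib/mongodb/r${n}#" \
        -e "s#/var/lib/mongodb/r1\\.log#/var/lib/mongodb/r${n}.log#" \
        -e "s#port: 27017#port: ${port}#"
)

# Demonstration on the three lines that change.
printf '  dbPath: /var/lib/mongodb/r1\n  path: /var/lib/mongodb/r1.log\n  port: 27017\n' |
    render_replica_conf 2
```

On the server this could be used as render_replica_conf 2 < /etc/mongodb/r1.conf | sudo tee /etc/mongodb/r2.conf. The unit files need the same kind of substitution for the ExecStart and PIDFile values.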

Install Standalone MongoDB Instance

Note

The MongoDB manual advises against using a standalone database in production environments. Consider using a Replica Set for production and setting up the database with scripts/setup_mongodb_replicaset.sh.

Note

These actions should be done on the MongoDB server.

This guide installs a standalone instance of MongoDB, which is not recommended for production use. This guide uses the default MongoDB port 27017.

  1. Obtain MongoDB public key.

wget -qO - https://www.mongodb.org/static/pgp/server-5.0.asc | sudo apt-key add -
  2. Add MongoDB source.

echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/5.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
  3. Update indexes and install.

sudo apt-get update && sudo apt-get install -y mongodb-org
  4. Configure MongoDB to accept incoming connections. Replace <mongodb-ip> with the IP of your MongoDB server.

sudo sed -i 's/  bindIp: 127.0.0.1/  bindIp: <mongodb-ip>/' /etc/mongod.conf
  5. Start MongoDB.

sudo systemctl start mongod
  6. Create the rootadmin user using the mongo shell. Replace <user> and <password> with proper credentials and <mongodb-ip> with your MongoDB server IP.

mongo <mongodb-ip>
use admin
db.createUser({user: '<user>', pwd: '<password>', roles: [{role: 'root', db: 'admin'}]})
exit
  7. Enable authentication.

sudo sed -i 's/#security:/security:\n  authorization: enabled/' /etc/mongod.conf
  8. Restart MongoDB.

sudo systemctl restart mongod

The MongoDB instance is now running. The next step is to Install Document Store and then Setup Standalone MongoDB for Document Store.
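The two sed edits in the steps above modify /etc/mongod.conf in place. A minimal sketch for previewing their effect first is shown here; preview_standalone_conf is a hypothetical helper, the address 192.0.2.10 is a placeholder, and the sample input only assumes that the stock configuration contains the bindIp and #security: lines that the sed expressions target. GNU sed is assumed, as in the original commands.

```shell
#!/bin/sh
# Apply the two sed edits from the standalone steps to text on stdin, so
# the result can be inspected before /etc/mongod.conf itself is modified.
# preview_standalone_conf is a hypothetical helper. It relies on GNU sed,
# because "\n" in the replacement text is a GNU extension.
preview_standalone_conf() (
    ip="$1"
    sed -e "s/  bindIp: 127.0.0.1/  bindIp: ${ip}/" \
        -e 's/#security:/security:\n  authorization: enabled/'
)

# Sample input resembling the relevant lines of a stock mongod.conf.
printf 'net:\n  port: 27017\n  bindIp: 127.0.0.1\n\n#security:\n' |
    preview_standalone_conf 192.0.2.10
```

Against the real file, the preview could be compared with the original using preview_standalone_conf <mongodb-ip> < /etc/mongod.conf | diff /etc/mongod.conf -.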

Install Document Store

Note

These actions should be done on the Document Store server.

  1. Create a directory for Document Store and the Python virtualenv.

mkdir kuha2
  2. Clone package to subdirectory.

You can clone the latest release with the following command.

git clone --depth 1 --branch releases https://gitlab.tuni.fi/fsd/kuha_document_store kuha2/kuha_document_store

Or clone a specific release by tag. Change <tag> to the release version.

git clone --depth 1 --branch <tag> https://gitlab.tuni.fi/fsd/kuha_document_store kuha2/kuha_document_store
  3. Install Python virtual environment.

sudo apt install -y python3-venv
  4. Make installation script executable.

chmod +x ./kuha2/kuha_document_store/scripts/install_kuha_document_store_virtualenv.sh
  5. Install Kuha Document Store to virtual environment.

./kuha2/kuha_document_store/scripts/install_kuha_document_store_virtualenv.sh
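After the installation script finishes, it can be useful to confirm that a virtual environment was actually created. The sketch below is an assumption-laden check: venv_ready is a hypothetical helper, and the path kuha2/kuha_document_store-env is inferred from the KUHA_VENV_PATH value shown in scripts/runtime_env later in this chapter, not from the installation script itself.

```shell
#!/bin/sh
# Report whether a directory looks like a Python virtual environment.
# venv_ready is a hypothetical helper; it only checks for the files that
# "python3 -m venv" creates (pyvenv.cfg and bin/python).
venv_ready() {
    [ -f "$1/pyvenv.cfg" ] && [ -x "$1/bin/python" ]
}

# Assumed location, inferred from KUHA_VENV_PATH in scripts/runtime_env.
if venv_ready ./kuha2/kuha_document_store-env; then
    echo "virtual environment found"
else
    echo "virtual environment missing; re-run the installation script"
fi
```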

Upgrade Document Store

Note

These actions should be done on the Document Store server.

To upgrade an existing installation, fetch changes to the code repository, check out a version and reinstall.

  1. Change directory to package directory.

cd kuha2/kuha_document_store
  2. Fetch changes and check out a version to upgrade to.

git fetch --all --tags
git checkout <version>
  3. Leave package directory, make installation script executable and install.

cd ../..
chmod +x ./kuha2/kuha_document_store/scripts/install_kuha_document_store_virtualenv.sh
./kuha2/kuha_document_store/scripts/install_kuha_document_store_virtualenv.sh
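The upgrade steps follow the same pattern for every package that ships an install_<package>_virtualenv.sh script. As a rough sketch, they can be wrapped in one function with a dry-run mode; upgrade_kuha_package, run and the DRY_RUN variable are hypothetical names, and git -C is used here in place of changing directories.

```shell
#!/bin/sh
# Execute a command, or only print it when DRY_RUN=1.
# run and DRY_RUN are hypothetical names used for this sketch.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run the upgrade steps for one package checked out under ./kuha2.
# Stops at the first failing step.
upgrade_kuha_package() (
    pkg="$1"        # e.g. kuha_document_store
    version="$2"    # tag or branch name to check out
    script="./kuha2/${pkg}/scripts/install_${pkg}_virtualenv.sh"
    run git -C "kuha2/${pkg}" fetch --all --tags &&
    run git -C "kuha2/${pkg}" checkout "$version" &&
    run chmod +x "$script" &&
    run "$script"
)

# Print the commands for review without executing them.
DRY_RUN=1
upgrade_kuha_package kuha_document_store '<version>'
```

With DRY_RUN unset or set to 0, the same call executes the commands. It is meant to be run from the directory that contains kuha2.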

Setup MongoDB Replicaset for Document Store

Note

These actions should be done on the Document Store server.

Document Store provides a script which helps to set up MongoDB. The script will create the required collections and database users. It will also set up indexes for the collections to speed up database queries.

The script will prompt for MongoDB rootadmin credentials. You may wish to provide them via configuration options for a noninteractive setup. See --help for configuration reference.

Give the hostname/IP and port of each of your MongoDB replicas as command line parameters. Pass in the name of the configured replicaset as well.

Note

You may wish to provide DB credentials for editor and reader. Give parameter --help to see how.

  1. Make the setup script executable.

chmod +x ./kuha2/kuha_document_store/scripts/setup_mongodb_replicaset.sh

  2. Run the MongoDB setup script. Repeat --replica for each replicaset member. Replace <replica-ip-port> with the IP & port combination of your MongoDB replicas. Replace <replicaset> with the configured replicaset.

./kuha2/kuha_document_store/scripts/setup_mongodb_replicaset.sh --replica <replica-ip-port> --replicaset <replicaset>

Now the database is ready to be used with Document Store. Care should be taken to secure the MongoDB instance. For Kuha2 the only IP that needs access to the database is Kuha Document Store’s IP.
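Because --replica is repeated once per member, the argument list grows with the size of the replicaset. The following sketch assembles it from a list of members; replica_args is a hypothetical helper, the 192.0.2.10 address is a placeholder, and the ports and the rs_kuha name come from the replicaset example earlier in this guide.

```shell
#!/bin/sh
# Print "--replica <member>" for each member, followed by the replicaset
# name. replica_args is a hypothetical helper, not part of Kuha.
# Usage: replica_args <replicaset> <host:port>...
replica_args() (
    name="$1"
    shift
    for member in "$@"; do
        printf '%s %s ' --replica "$member"
    done
    printf '%s %s\n' --replicaset "$name"
)

# Print the full command for review instead of running it.
echo ./kuha2/kuha_document_store/scripts/setup_mongodb_replicaset.sh \
    "$(replica_args rs_kuha 192.0.2.10:27017 192.0.2.10:27018 192.0.2.10:27019)"
```

To run the script rather than print it, the output of replica_args would be passed unquoted so that the shell splits it into separate arguments. This is safe only while the member addresses contain no whitespace.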

Setup Standalone MongoDB for Document Store

Note

These actions should be done on the Document Store server.

Document Store provides a script which helps to set up MongoDB. Give the hostname/IP of your MongoDB server as a command line parameter.

The script will prompt for MongoDB rootadmin credentials. You may wish to provide them via configuration options for a noninteractive setup. See --help for configuration reference.

The script will create the needed collections and database users. It will also set up indexes for the collections to speed up database queries.

Note

You may wish to provide DB credentials for editor and reader. Give parameter --help to see how.

  1. Make the setup script executable.

chmod +x ./kuha2/kuha_document_store/scripts/setup_mongodb.sh
  2. Run the MongoDB setup script. Replace <mongodb-ip> with the IP of your MongoDB server.

./kuha2/kuha_document_store/scripts/setup_mongodb.sh --replica <mongodb-ip>:27017 --replicaset ''

Now the database is ready to be used with Document Store. Care should be taken to secure the MongoDB instance. For Kuha2 the only IP that needs access to the database is Kuha Document Store’s IP.

Running the Document Store

Note

These actions should be done on the Document Store server.

  1. Make the run-script executable.

chmod +x ./kuha2/kuha_document_store/scripts/run_kuha_document_store.sh
  2. Start serving Document Store.

Connect to a replicaset. Repeat --replica option for each replicaset member. Replace <replica-ip-port> with the hostname/IP & port combination of your MongoDB replicas. Replace <replicaset> with the configured replicaset.

./kuha2/kuha_document_store/scripts/run_kuha_document_store.sh --replica=<replica-ip-port> --replicaset <replicaset>

Or connect to a standalone MongoDB instance. Replace <mongodb-ip-port> with the hostname/IP & port combination of your MongoDB server.

./kuha2/kuha_document_store/scripts/run_kuha_document_store.sh --replica=<mongodb-ip-port> --replicaset ''

Install as a service

Note

These actions should be done on the Document Store server.

SystemD is used to manage services in Ubuntu. A server application can be installed as a SystemD Unit to make it a background process that gets launched every time the operating system boots up.

This is an example of creating and enabling a SystemD Unit, which controls the Kuha Document Store server process. You must have Document Store installed before completing these steps.

  1. Add the configuration to runtime_env as environment variables. At a minimum, configure the database replicas. Replace <replica_host_port_1>, <replica_host_port_2> and <replica_host_port_3>. See the Document Store documentation for the full configuration reference.

echo 'export KUHA_DS_DBREPLICAS="[<replica_host_port_1>, <replica_host_port_2>, <replica_host_port_3>]"' >> kuha2/kuha_document_store/scripts/runtime_env

Afterwards the file contents are similar to the following:

#!/bin/bash

HERE="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

KUHA_VENV_PATH="${HERE}/../../kuha_document_store-env"
export KUHA_DS_DBREPLICAS="[<replica_host_port_1>, <replica_host_port_2>, <replica_host_port_3>]"
  2. Define a systemd unit file. Create a new file kuha_document_store.service and open it in a text editor. This example uses nano.

nano kuha_document_store.service
  3. Write the following contents to the file. Make sure the path in ExecStart is correct. It should point to the run_kuha_document_store.sh script. Also define a User and Group that have execute permissions to the file. Replace <path-to-script>, <user> and <group> with correct values.

[Unit]
Description=Kuha document store
After=network.target

[Service]
Type=simple
ExecStart=<path-to-script>
KillSignal=SIGINT
TimeoutStopSec=5
User=<user>
Group=<group>

[Install]
WantedBy=multi-user.target
  4. To write the contents to the file in nano, press CTRL+o. It prompts for the file name, which should be “kuha_document_store.service”. Press ENTER to confirm.

  5. To exit nano, press CTRL+x.

  6. Copy the file to the /etc/systemd/system folder.

sudo cp kuha_document_store.service /etc/systemd/system/kuha_document_store.service
  7. Give correct permissions and owner.

sudo chmod 0644 /etc/systemd/system/kuha_document_store.service
sudo chown root:root /etc/systemd/system/kuha_document_store.service
  8. Reload systemd unit files and configuration.

sudo systemctl daemon-reload
  9. Enable the kuha_document_store.service.

sudo systemctl enable kuha_document_store
  10. Start the kuha_document_store.service.

sudo systemctl start kuha_document_store

Now you may confirm that the service is running and listening to port 6001 (default):

curl localhost:6001/v0/studies
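Immediately after systemctl start returns, the server may not yet be accepting connections, so a single curl call can fail even though the service is healthy. A small retry wrapper is one way to handle this; retry is a hypothetical helper written for this sketch.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempts run out.
# retry is a hypothetical helper.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    retry_attempts="$1"
    retry_delay="$2"
    shift 2
    retry_count=1
    while ! "$@"; do
        # Give up once the allowed number of attempts has been used.
        if [ "$retry_count" -ge "$retry_attempts" ]; then
            return 1
        fi
        retry_count=$((retry_count + 1))
        sleep "$retry_delay"
    done
    return 0
}

# Example (not executed here): poll the default port for up to 10 seconds.
#   retry 10 1 curl -sf -o /dev/null localhost:6001/v0/studies && echo "up"
```

The -s and -f options make curl quiet and cause it to return a non-zero status on HTTP errors, which is what lets retry detect a failed attempt.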

The Document Store is now installed as a service. It will be restarted when the server is rebooted. Starting, stopping, enabling and disabling the service are handled with the systemctl command. The logs can be read with the journalctl command.
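The KUHA_DS_DBREPLICAS value written to runtime_env in the first step is a bracketed, comma-separated list. As a sketch, it can be assembled from individual host:port arguments; format_replicas is a hypothetical helper that only reproduces the literal format shown in this guide, and the addresses are placeholders.

```shell
#!/bin/sh
# Join host:port arguments into the bracketed list format used for
# KUHA_DS_DBREPLICAS in this guide.
# format_replicas is a hypothetical helper, not part of Kuha.
format_replicas() (
    out=""
    for member in "$@"; do
        # Prefix a separator for every element except the first.
        out="${out:+$out, }$member"
    done
    printf '[%s]\n' "$out"
)

format_replicas 192.0.2.10:27017 192.0.2.10:27018 192.0.2.10:27019
```

The printed value can then be placed inside the export KUHA_DS_DBREPLICAS="..." line in runtime_env.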

Installing Kuha OSMH Repo Handler

The operating system used in these steps is Ubuntu 16.04. Other modern Linux variants may be used.

  1. Create a directory for OSMH Repo Handler and the Python virtualenv.

mkdir kuha2
  2. Clone package to subdirectory.

git clone --single-branch --branch releases https://bitbucket.org/tietoarkisto/kuha_osmh_repo_handler kuha2/kuha_osmh_repo_handler
  3. Install Python virtual environment.

sudo apt install -y python3-venv
  4. Make install script executable.

chmod +x ./kuha2/kuha_osmh_repo_handler/scripts/install_kuha_osmh_repo_handler_virtualenv.sh
  5. Install Kuha OSMH Repo Handler to virtual environment.

./kuha2/kuha_osmh_repo_handler/scripts/install_kuha_osmh_repo_handler_virtualenv.sh

To run Kuha OSMH Repo Handler you need access to Kuha Document Store. First make the run script executable.

chmod +x ./kuha2/kuha_osmh_repo_handler/scripts/run_kuha_osmh_repo_handler.sh

Run by calling the script. Replace <document-store-url> with the URL to the Document Store.

./kuha2/kuha_osmh_repo_handler/scripts/run_kuha_osmh_repo_handler.sh --document-store-url=<document-store-url>

Upgrade OSMH Repo Handler

To upgrade an existing installation, fetch changes to the code repository, check out a version and reinstall.

  1. Change directory to package directory.

cd kuha2/kuha_osmh_repo_handler
  2. Fetch changes and check out a version to upgrade to.

git fetch --all --tags
git checkout <version>
  3. Leave package directory, make installation script executable and install.

cd ../..
chmod +x ./kuha2/kuha_osmh_repo_handler/scripts/install_kuha_osmh_repo_handler_virtualenv.sh
./kuha2/kuha_osmh_repo_handler/scripts/install_kuha_osmh_repo_handler_virtualenv.sh

Installing Kuha OAI-PMH Repo Handler

The operating system used in these steps is Ubuntu 16.04. Other modern Linux variants may be used.

  1. Create a directory for OAI-PMH Repo Handler and the Python virtualenv.

mkdir kuha2
  2. Clone package to subdirectory.

You can clone the latest release with the following command.

git clone --depth 1 --branch releases https://gitlab.tuni.fi/fsd/kuha_oai_pmh_repo_handler kuha2/kuha_oai_pmh_repo_handler

Or clone a specific release by tag. Change <tag> to the release version.

git clone --depth 1 --branch <tag> https://gitlab.tuni.fi/fsd/kuha_oai_pmh_repo_handler kuha2/kuha_oai_pmh_repo_handler
  3. Install Python virtual environment.

sudo apt install -y python3-venv
  4. Make install script executable.

chmod +x ./kuha2/kuha_oai_pmh_repo_handler/scripts/install_kuha_oai_pmh_repo_handler_virtualenv.sh
  5. Install Kuha OAI-PMH Repo Handler to virtual environment.

./kuha2/kuha_oai_pmh_repo_handler/scripts/install_kuha_oai_pmh_repo_handler_virtualenv.sh

Upgrade OAI-PMH Repo Handler

To upgrade an existing installation, fetch changes to the code repository, check out a version and reinstall.

  1. Change directory to package directory.

cd kuha2/kuha_oai_pmh_repo_handler
  2. Fetch changes and check out a version to upgrade to.

git fetch --all --tags
git checkout <version>
  3. Leave package directory, make installation script executable and install.

cd ../..
chmod +x ./kuha2/kuha_oai_pmh_repo_handler/scripts/install_kuha_oai_pmh_repo_handler_virtualenv.sh
./kuha2/kuha_oai_pmh_repo_handler/scripts/install_kuha_oai_pmh_repo_handler_virtualenv.sh

To run Kuha OAI-PMH Repo Handler you need access to Kuha Document Store. First make the run script executable.

chmod +x ./kuha2/kuha_oai_pmh_repo_handler/scripts/run_kuha_oai_pmh_repo_handler.sh

Run by calling the script. Replace <document-store-url> with the URL to the Document Store. You also need to specify a few configuration values for OAI-PMH: base_url and admin_email.

./kuha2/kuha_oai_pmh_repo_handler/scripts/run_kuha_oai_pmh_repo_handler.sh --document-store-url=<document-store-url> --oai-pmh-base-url=<base_url> --oai-pmh-admin-email=<email>

Installing Kuha Client

  1. Create a directory for Kuha Client and the Python virtualenv.

mkdir kuha2
  2. Clone package to subdirectory.

You can clone the latest release with the following command.

git clone --depth 1 --branch releases https://gitlab.tuni.fi/fsd/kuha_client kuha2/kuha_client

Or clone a specific release by tag. Change <tag> to the release version.

git clone --depth 1 --branch <tag> https://gitlab.tuni.fi/fsd/kuha_client kuha2/kuha_client
  3. Install Python virtual environment.

sudo apt install -y python3-venv
  4. Install Kuha Client to virtual environment.

cd kuha2
python3 -m venv kuha_client-env
source ./kuha_client-env/bin/activate
cd kuha_client
pip install -r requirements.txt
pip install .
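The pip commands above install into whichever Python environment is active, so they should run only after the virtual environment has been activated. A small guard can make that explicit; require_venv is a hypothetical helper, and VIRTUAL_ENV is the variable exported by the venv activate script.

```shell
#!/bin/sh
# Refuse to continue unless a virtual environment is active, so that
# "pip install" does not touch the system Python.
# require_venv is a hypothetical helper; VIRTUAL_ENV is set by the
# "activate" script of a Python virtual environment.
require_venv() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "no virtual environment active; source kuha_client-env/bin/activate first" >&2
        return 1
    fi
}

# Example (not executed here):
#   require_venv && pip install -r requirements.txt && pip install .
```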

Upgrade Kuha Client

To upgrade an existing installation, fetch changes to the code repository, check out a version and reinstall.

  1. Change directory to package directory.

cd kuha2/kuha_client
  2. Fetch changes and check out a version to upgrade to.

git fetch --all --tags
git checkout <version>
  3. Activate Kuha Client virtual environment.

source ../kuha_client-env/bin/activate
  4. Upgrade.

pip3 install -r requirements.txt --upgrade --upgrade-strategy=only-if-needed
pip3 install . --upgrade --upgrade-strategy=only-if-needed