Installation
This chapter describes the installation of each application.
Installing Kuha Document Store
This guide provides step-by-step instructions for installing Kuha Document Store and a MongoDB database. The operating system used in this guide is Ubuntu 20.04, but other modern Linux variants may be used.
If you only need to install the Python package, see Install Document Store. If you wish to upgrade an existing Document Store install, see Upgrade Document Store.
Note
While this manual provides step-by-step instructions for MongoDB installation, always refer to the official MongoDB manual for proper installation procedure.
In this guide the installation of the database is done on a separate server. However, Document Store and MongoDB may be installed on the same server.
There are multiple ways to set up the MongoDB service. This guide describes two alternative setups. Refer to Install MongoDB replicaset and Setup MongoDB Replicaset for Document Store to set up Document Store with MongoDB replicas. For a more straightforward setup see Install Standalone MongoDB Instance and Setup Standalone MongoDB for Document Store. Please note that the installation guides are meant to be used as examples.
It is recommended to use the latest version of MongoDB, which can be obtained from MongoDB’s own repository. Refer to the MongoDB manual on how to install MongoDB on your operating system. At the time of writing, the installation on Ubuntu 20.04 was done as follows.
Install MongoDB replicaset
Note
These actions should be done on the MongoDB server.
This is an example setup with a single virtual machine containing three MongoDB replicas. The example uses the following configuration parameters for the replicas:
Replica | Port | Configuration file | Log file | Database path |
---|---|---|---|---|
r1 | 27017 | /etc/mongodb/r1.conf | /var/lib/mongodb/r1.log | /var/lib/mongodb/r1 |
r2 | 27018 | /etc/mongodb/r2.conf | /var/lib/mongodb/r2.log | /var/lib/mongodb/r2 |
r3 | 27019 | /etc/mongodb/r3.conf | /var/lib/mongodb/r3.log | /var/lib/mongodb/r3 |
The replicaset will be configured to use the name rs_kuha. Keyfile for replica authorization will be stored to /var/lib/mongodb/auth_key.
Replica services will be controlled by SystemD. Unit definitions will be stored to /etc/systemd/system/. Each replica will have its own dedicated unit file mongod_r1.service, mongod_r2.service and mongod_r3.service.
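The naming scheme in the table is regular, so the values for each replica can be derived from the replica number. The following is a minimal sketch of that idea; the function name replica_params is hypothetical and is not part of Kuha or MongoDB.

```shell
#!/bin/sh
# Hypothetical helper: print the parameters of replica N (1, 2 or 3)
# in the order used by the table: name, port, configuration file,
# log file, database path.
replica_params() {
    n="$1"
    port=$((27017 + n - 1))
    echo "r${n} ${port} /etc/mongodb/r${n}.conf /var/lib/mongodb/r${n}.log /var/lib/mongodb/r${n}"
}

for n in 1 2 3; do
    replica_params "$n"
done
```

The loop prints one line per replica, which can be compared against the table before creating any files.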
Obtain MongoDB public key.
wget -qO - https://www.mongodb.org/static/pgp/server-5.0.asc | sudo apt-key add -
Add MongoDB source.
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/5.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
Update indexes and install.
sudo apt-get update && sudo apt-get install -y mongodb-org
Create directories for replica data.
sudo mkdir /var/lib/mongodb/{r1,r2,r3}
sudo chown mongodb:mongodb /var/lib/mongodb/{r1,r2,r3}
sudo chmod 0755 /var/lib/mongodb/{r1,r2,r3}
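The {r1,r2,r3} syntax above relies on brace expansion, which bash provides but a plain POSIX sh may not. As a sketch, the same directories could be created with a loop. BASE_DIR is a hypothetical variable that defaults to a scratch directory here, so the loop can be tried without touching /var/lib/mongodb; the chown step is left out because it needs root privileges.

```shell
#!/bin/sh
# Sketch: create the replica data directories without brace expansion.
# BASE_DIR is an assumed variable name; by default a scratch directory
# is used instead of /var/lib/mongodb.
BASE_DIR="${BASE_DIR:-$(mktemp -d)}"

for r in r1 r2 r3; do
    mkdir -p "${BASE_DIR}/${r}"
    chmod 0755 "${BASE_DIR}/${r}"
done

ls "${BASE_DIR}"
```

On the actual MongoDB server the loop would be run with root privileges against /var/lib/mongodb, followed by the chown command shown above.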
Configure a single MongoDB instance to use r1 replica data directory.
sudo sed -i 's#dbPath: /var/lib/mongodb#dbPath: /var/lib/mongodb/r1#' /etc/mongod.conf
Start MongoDB. Leave it running in this terminal.
sudo -u mongodb /usr/bin/mongod --config /etc/mongod.conf
Open another terminal to create a rootadmin user using the mongo shell. Replace <user> and <password> with proper credentials. Replace <mongodb-ip> with the IP of your MongoDB server. Close the terminal after completing this step.
mongo <mongodb-ip>
use admin
db.createUser({user: '<user>', pwd: '<password>', roles: [{role: 'root', db: 'admin'}]})
exit
Stop MongoDB by pressing CTRL+C in the terminal running MongoDB.
Create directory for MongoDB replica configuration.
sudo mkdir /etc/mongodb
sudo chmod 0755 /etc/mongodb
Generate keyfile for replica authentication.
sudo openssl rand -base64 756 | sudo tee /var/lib/mongodb/auth_key
sudo chown mongodb:mongodb /var/lib/mongodb/auth_key
sudo chmod 0600 /var/lib/mongodb/auth_key
Configure replicas. Below is an example for /etc/mongodb/r1.conf, which instructs that the replica belongs to a replicaset named rs_kuha. Create similar configurations for other replicas as well (r2.conf, r3.conf). Note that each replica must use a distinct port when serving from a single host. You may use 27018 for r2, and 27019 for r3.
storage:
  dbPath: /var/lib/mongodb/r1
  journal:
    enabled: true
systemLog:
  destination: file
  logAppend: true
  path: /var/lib/mongodb/r1.log
net:
  port: 27017
  bindIp: 0.0.0.0
processManagement:
  timeZoneInfo: /usr/share/zoneinfo
security:
  authorization: enabled
  keyFile: /var/lib/mongodb/auth_key
replication:
  replSetName: rs_kuha
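Since r2.conf and r3.conf differ from r1.conf only in the port and the paths, they could be generated instead of edited by hand. The following is a sketch under that assumption; the function name render_replica_conf and the OUT_DIR variable are hypothetical, and the files are written to a scratch directory for review rather than to /etc/mongodb.

```shell
#!/bin/sh
# Hypothetical helper: print a configuration for replica N that mirrors
# the r1.conf example, with the port and the paths adjusted.
render_replica_conf() {
    n="$1"
    port=$((27017 + n - 1))
    cat <<EOF
storage:
  dbPath: /var/lib/mongodb/r${n}
  journal:
    enabled: true
systemLog:
  destination: file
  logAppend: true
  path: /var/lib/mongodb/r${n}.log
net:
  port: ${port}
  bindIp: 0.0.0.0
processManagement:
  timeZoneInfo: /usr/share/zoneinfo
security:
  authorization: enabled
  keyFile: /var/lib/mongodb/auth_key
replication:
  replSetName: rs_kuha
EOF
}

# Write r2.conf and r3.conf to a scratch directory and show their ports.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
for n in 2 3; do
    render_replica_conf "$n" > "${OUT_DIR}/r${n}.conf"
done
grep 'port:' "${OUT_DIR}/r2.conf" "${OUT_DIR}/r3.conf"
```

After reviewing the generated files, they can be copied to /etc/mongodb with root privileges.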
Ensure permissions.
sudo chmod 0644 /etc/mongodb/{r1,r2,r3}.conf
Create systemd units for replicas. Below is an example for /etc/systemd/system/mongod_r1.service. Create similar units for other replicas as well (mongod_r2.service, mongod_r3.service). Note that the ExecStart directive must point to the correct replica configuration file and each unit must have a distinct PIDFile path.
[Unit]
Description=MongoDB Database Server
Documentation=https://docs.mongodb.org/manual
After=network.target
[Service]
Type=simple
User=mongodb
Group=mongodb
ExecStart=/usr/bin/mongod --config /etc/mongodb/r1.conf
Restart=always
PIDFile=/var/run/mongodb/mongod_r1.pid
# file size
LimitFSIZE=infinity
# cpu time
LimitCPU=infinity
# virtual memory size
LimitAS=infinity
# open files
LimitNOFILE=64000
# processes/threads
LimitNPROC=64000
# locked memory
LimitMEMLOCK=infinity
# total threads (user+kernel)
TasksMax=infinity
TasksAccounting=false
# Recommended limits for mongod as specified in
# http://docs.mongodb.org/manual/reference/ulimit/#recommended-settings
[Install]
WantedBy=multi-user.target
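The units for r2 and r3 differ from the r1 unit only in the ExecStart configuration path and the PIDFile path. As a sketch, they could be derived from the r1 unit with sed; the function name derive_unit is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read the mongod_r1.service text on stdin and print
# the unit for replica N, changing only the configuration file path and
# the PIDFile path.
derive_unit() {
    n="$1"
    sed -e "s#/etc/mongodb/r1\.conf#/etc/mongodb/r${n}.conf#" \
        -e "s#mongod_r1\.pid#mongod_r${n}.pid#"
}

# Demonstrate on the two lines that differ between replicas.
printf '%s\n' \
    'ExecStart=/usr/bin/mongod --config /etc/mongodb/r1.conf' \
    'PIDFile=/var/run/mongodb/mongod_r1.pid' | derive_unit 2
```

With the function defined, a full unit could be produced with, for example, derive_unit 2 < /etc/systemd/system/mongod_r1.service | sudo tee /etc/systemd/system/mongod_r2.service.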
Ensure permissions.
sudo chmod 0644 /etc/systemd/system/mongod_r{1,2,3}.service
Enable replica services.
sudo systemctl enable mongod_r1.service
sudo systemctl enable mongod_r2.service
sudo systemctl enable mongod_r3.service
Reload systemd manager configuration.
sudo systemctl daemon-reload
Start services.
sudo systemctl start mongod_r1.service
sudo systemctl start mongod_r2.service
sudo systemctl start mongod_r3.service
MongoDB replicas are now running and configured to work as a replicaset. The next step is to Install Document Store and Setup MongoDB Replicaset for Document Store.
Install Standalone MongoDB Instance
Note
MongoDB manual instructs that a standalone database should not be used for production environments. Consider using a Replica Set for production and setting up the database with scripts/setup_mongodb_replicaset.sh.
Note
These actions should be done on the MongoDB server.
This guide installs a standalone instance of MongoDB, which is not recommended for production use. This guide uses the default MongoDB port 27017.
Obtain MongoDB public key.
wget -qO - https://www.mongodb.org/static/pgp/server-5.0.asc | sudo apt-key add -
Add MongoDB source.
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/5.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
Update indexes and install.
sudo apt-get update && sudo apt-get install -y mongodb-org
Configure MongoDB to accept incoming connections. Replace <mongodb-ip> with the IP of your MongoDB server.
sudo sed -i 's/ bindIp: 127.0.0.1/ bindIp: <mongodb-ip>/' /etc/mongod.conf
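Before editing /etc/mongod.conf in place, the substitution can be previewed on a scratch file. The sketch below assumes GNU sed (as shipped with Ubuntu); the sample file only imitates the net section of the default configuration, and 192.0.2.10 is a placeholder documentation address.

```shell
#!/bin/sh
# Sketch: preview the bindIp substitution on a scratch file.
# 192.0.2.10 is a placeholder address, not a real server.
MONGODB_IP="192.0.2.10"
sample="$(mktemp)"
printf 'net:\n  port: 27017\n  bindIp: 127.0.0.1\n' > "$sample"

# Same substitution as in the guide, applied to the scratch file.
sed -i "s/ bindIp: 127.0.0.1/ bindIp: ${MONGODB_IP}/" "$sample"
cat "$sample"
```

If the bindIp line in the output shows the expected address, the same command can be run against /etc/mongod.conf.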
Start MongoDB.
sudo systemctl start mongod
Create rootadmin user using the mongo shell. Replace <user> and <password> with proper credentials and <mongodb-ip> with your MongoDB server IP.
mongo <mongodb-ip>
use admin
db.createUser({user: '<user>', pwd: '<password>', roles: [{role: 'root', db: 'admin'}]})
exit
Enable authentication.
sudo sed -i 's/#security:/security:\n authorization: enabled/' /etc/mongod.conf
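The command above replaces the commented-out #security: line with an active security section. The sketch below previews the effect on a scratch file; it assumes GNU sed, which expands \n in the replacement to a newline, and the sample content only imitates the relevant part of /etc/mongod.conf.

```shell
#!/bin/sh
# Sketch: preview the authorization substitution on a scratch file.
# The sample imitates the commented-out "#security:" line that the
# substitution looks for.
sample="$(mktemp)"
printf '#security:\n\n#operationProfiling:\n' > "$sample"

# GNU sed expands \n in the replacement to a newline.
sed -i 's/#security:/security:\n authorization: enabled/' "$sample"
cat "$sample"
```

The authorization line must remain indented under security: for the YAML to be valid.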
Restart MongoDB.
sudo systemctl restart mongod
MongoDB instance is now running. Next step is to Install Document Store and Setup Standalone MongoDB for Document Store.
Install Document Store
Note
These actions should be done on the Document Store server.
Create directory for document store and Python virtualenv.
mkdir kuha2
Clone package to subdirectory.
You can clone the latest release with the following command.
git clone --depth 1 --branch releases https://gitlab.tuni.fi/fsd/kuha_document_store kuha2/kuha_document_store
Or clone a specific release by tag. Change <tag> to the release version.
git clone --depth 1 --branch <tag> https://gitlab.tuni.fi/fsd/kuha_document_store kuha2/kuha_document_store
Install Python virtual environment.
sudo apt install -y python3-venv
Make installation script executable.
chmod +x ./kuha2/kuha_document_store/scripts/install_kuha_document_store_virtualenv.sh
Install Kuha Document Store to virtual environment.
./kuha2/kuha_document_store/scripts/install_kuha_document_store_virtualenv.sh
Upgrade Document Store
Note
These actions should be done on the Document Store server.
To upgrade an existing installation, fetch changes to the code repository, check out a version and reinstall.
Change directory to package directory.
cd kuha2/kuha_document_store
Fetch changes and checkout a version to upgrade to.
git fetch --all --tags
git checkout <version>
Leave package directory, make installation script executable and install.
cd ../..
chmod +x ./kuha2/kuha_document_store/scripts/install_kuha_document_store_virtualenv.sh
./kuha2/kuha_document_store/scripts/install_kuha_document_store_virtualenv.sh
Setup MongoDB Replicaset for Document Store
Note
These actions should be done on the Document Store server.
Document Store provides a script that helps set up MongoDB. The script will create the required collections and database users. It will also set up indexes for the collections to speed up database queries.
The script will prompt for MongoDB rootadmin credentials. You may wish to provide them via configuration options for a noninteractive setup. See --help for configuration reference.
Give hostname/IP & port of your MongoDB replicas as command line parameters. Pass in the configured replicaset as well.
Note
You may wish to provide DB credentials for editor and reader. Give parameter --help to see how.
Make the setup script executable.
chmod +x ./kuha2/kuha_document_store/scripts/setup_mongodb_replicaset.sh
Run the MongoDB setup script. Repeat --replica for each replicaset member.
Replace <replica-ip-port> with the IP & port combination of your MongoDB replicas.
Replace <replicaset> with the configured replicaset.
./kuha2/kuha_document_store/scripts/setup_mongodb_replicaset.sh --replica <replica-ip-port> --replicaset <replicaset>
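With several members, the --replica option has to be repeated once per member. The following sketch assembles the full command line from a list of members and prints it for review; the function name build_setup_command is hypothetical, and the addresses are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print the setup command with one --replica option
# per replicaset member.
# Arguments: replicaset name, followed by host:port of each member.
build_setup_command() {
    replicaset="$1"
    shift
    cmd="./kuha2/kuha_document_store/scripts/setup_mongodb_replicaset.sh"
    for member in "$@"; do
        cmd="${cmd} --replica ${member}"
    done
    echo "${cmd} --replicaset ${replicaset}"
}

# Placeholder addresses for three replicas on one host.
build_setup_command rs_kuha 192.0.2.10:27017 192.0.2.10:27018 192.0.2.10:27019
```

The printed command can be checked and then run as is.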
Now the database is ready to be used with Document Store. Care should be taken to secure the MongoDB instance. For Kuha2 the only IP that needs access to the database is Kuha Document Store’s IP.
Setup Standalone MongoDB for Document Store
Note
These actions should be done on the Document Store server.
Document Store provides a script that helps set up MongoDB. Give the hostname/IP of your MongoDB server as a command line parameter.
The script will prompt for MongoDB rootadmin credentials. You may wish to provide them via configuration options for a noninteractive setup. See --help for configuration reference.
The script will create the needed collections and database users. It will also set up indexes for the collections to speed up database queries.
Note
You may wish to provide DB credentials for editor and reader. Give parameter --help to see how.
Make the setup script executable.
chmod +x ./kuha2/kuha_document_store/scripts/setup_mongodb.sh
Run the MongoDB setup script. Replace <mongodb-ip> with the IP of your MongoDB server.
./kuha2/kuha_document_store/scripts/setup_mongodb.sh --replica <mongodb-ip>:27017 --replicaset ''
Now the database is ready to be used with Document Store. Care should be taken to secure the MongoDB instance. For Kuha2 the only IP that needs access to the database is Kuha Document Store’s IP.
Running the Document Store
Note
These actions should be done on the Document Store server.
Make the run-script executable.
chmod +x ./kuha2/kuha_document_store/scripts/run_kuha_document_store.sh
Start serving Document Store.
Connect to a replicaset. Repeat --replica option for each replicaset member.
Replace <replica-ip-port> with the hostname/IP & port combination of your MongoDB replicas.
Replace <replicaset> with the configured replicaset.
./kuha2/kuha_document_store/scripts/run_kuha_document_store.sh --replica=<replica-ip-port> --replicaset <replicaset>
Or connect to a standalone MongoDB instance. Replace <mongodb-ip-port> with the hostname/IP & port combination of your MongoDB server.
./kuha2/kuha_document_store/scripts/run_kuha_document_store.sh --replica=<mongodb-ip-port> --replicaset ''
Install as a service
Note
These actions should be done on the Document Store server.
SystemD is used to manage services in Ubuntu. A server application can be installed as a SystemD Unit to make it a background process that gets launched every time the operating system boots up.
This is an example of creating and enabling a SystemD Unit, which controls the Kuha Document Store server process. You must have Document Store installed before completing these steps.
Add the configuration as environment variables to runtime_env. You need to at least configure the database replicas. Replace <replica_host_port_1>, <replica_host_port_2>, <replica_host_port_3> with the host and port of each replica. See the Document Store documentation for full configuration reference.
echo 'export KUHA_DS_DBREPLICAS="[<replica_host_port_1>, <replica_host_port_2>, <replica_host_port_3>]"' >> kuha2/kuha_document_store/scripts/runtime_env
Afterwards the file contents should look similar to the following.
#!/bin/bash
HERE="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
KUHA_VENV_PATH="${HERE}/../../kuha_document_store-env"
export KUHA_DS_DBREPLICAS="[<replica_host_port_1>, <replica_host_port_2>, <replica_host_port_3>]"
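The value of KUHA_DS_DBREPLICAS is a bracketed, comma-separated list. As a sketch, it could be assembled from individual host:port pairs as below. The function name format_dbreplicas is hypothetical, the addresses are placeholders, and the output format simply follows the example above.

```shell
#!/bin/sh
# Hypothetical helper: join host:port pairs into the bracketed list
# format shown for KUHA_DS_DBREPLICAS.
format_dbreplicas() {
    list=""
    for member in "$@"; do
        if [ -z "$list" ]; then
            list="$member"
        else
            list="${list}, ${member}"
        fi
    done
    echo "[${list}]"
}

# Placeholder addresses; replace with your replica hosts and ports.
format_dbreplicas 192.0.2.10:27017 192.0.2.10:27018 192.0.2.10:27019
```

The printed value can be used in the export line in place of the placeholder list.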
Define a systemd unit file. Create a new file kuha_document_store.service and open it in a text editor. This example uses nano.
nano kuha_document_store.service
Write the following contents to the file. Make sure the path in ExecStart is correct. It should point to the run_kuha_document_store.sh script. Also define a User and Group that have execute permissions to the file. Replace <path-to-script>, <user> and <group> with correct values.
[Unit]
Description=Kuha document store
After=network.target
[Service]
Type=simple
ExecStart=<path-to-script>
KillSignal=SIGINT
TimeoutStopSec=5
User=<user>
Group=<group>
[Install]
WantedBy=multi-user.target
To write the contents to the file in nano, press CTRL+o. It prompts for the file name, which should be “kuha_document_store.service”. Press ENTER to confirm.
To exit nano, press CTRL+x.
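As an alternative to typing the unit in an editor, the same contents could be written with a here-document. This is a sketch only: the function name write_unit is hypothetical, and the script path, user and group in the example call are placeholder values that must be adjusted to your installation.

```shell
#!/bin/sh
# Hypothetical helper: write the unit file contents to the given path.
# Arguments: output file, path to run_kuha_document_store.sh, user, group.
write_unit() {
    cat > "$1" <<EOF
[Unit]
Description=Kuha document store
After=network.target
[Service]
Type=simple
ExecStart=$2
KillSignal=SIGINT
TimeoutStopSec=5
User=$3
Group=$4
[Install]
WantedBy=multi-user.target
EOF
}

# Placeholder values: an assumed home directory, user and group.
unit_file="$(mktemp)"
write_unit "$unit_file" /home/kuha/kuha2/kuha_document_store/scripts/run_kuha_document_store.sh kuha kuha
cat "$unit_file"
```

Passing kuha_document_store.service as the first argument would create the file in the current directory, ready for the copy step.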
Copy the file to /etc/systemd/system folder.
sudo cp kuha_document_store.service /etc/systemd/system/kuha_document_store.service
Give correct permissions and owner.
sudo chmod 0644 /etc/systemd/system/kuha_document_store.service
sudo chown root:root /etc/systemd/system/kuha_document_store.service
Reload systemd unit files and configuration.
sudo systemctl daemon-reload
Enable the kuha_document_store.service.
sudo systemctl enable kuha_document_store
Start the kuha_document_store.service:
sudo systemctl start kuha_document_store
Now you may confirm that the service is running and listening to port 6001 (default):
curl localhost:6001/v0/studies
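The service may need a moment to start before it answers requests, so a single curl call right after starting it can fail. The following is a sketch of a small retry helper; the function name retry and the RETRY_DELAY variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to N times and stop at the first
# success. After each failed try it sleeps RETRY_DELAY seconds (default 1).
# Arguments: number of attempts, followed by the command to run.
retry() {
    attempts="$1"
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "${RETRY_DELAY:-1}"
    done
    return 1
}
```

With the function defined, the check could be run as retry 10 curl -sf localhost:6001/v0/studies, where -s silences progress output and -f makes curl return a non-zero status on HTTP errors.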
The Document Store is now installed as a service and will be started again whenever the server is rebooted. Use the systemctl command to start, stop, enable or disable the service, and the journalctl command to read its logs.
Installing Kuha OSMH Repo Handler
The operating system used in these steps is Ubuntu 16.04. Other modern Linux variants may be used.
Create directory for OSMH Repo Handler and Python virtualenv.
mkdir kuha2
Clone package to subdirectory.
git clone --single-branch --branch releases https://bitbucket.org/tietoarkisto/kuha_osmh_repo_handler kuha2/kuha_osmh_repo_handler
Install Python virtual environment.
sudo apt install -y python3-venv
Make install script executable.
chmod +x ./kuha2/kuha_osmh_repo_handler/scripts/install_kuha_osmh_repo_handler_virtualenv.sh
Install Kuha OSMH Repo Handler to virtual environment.
./kuha2/kuha_osmh_repo_handler/scripts/install_kuha_osmh_repo_handler_virtualenv.sh
To run Kuha OSMH Repo Handler you need access to Kuha Document Store. First make the run script executable.
chmod +x ./kuha2/kuha_osmh_repo_handler/scripts/run_kuha_osmh_repo_handler.sh
Run by calling the script. Replace <document-store-url> with the URL to the Document Store.
./kuha2/kuha_osmh_repo_handler/scripts/run_kuha_osmh_repo_handler.sh --document-store-url=<document-store-url>
Upgrade OSMH Repo Handler
To upgrade an existing installation, fetch changes to the code repository, check out a version and reinstall.
Change directory to package directory.
cd kuha2/kuha_osmh_repo_handler
Fetch changes and checkout a version to upgrade to.
git fetch --all --tags
git checkout <version>
Leave package directory, make installation script executable and install.
cd ../..
chmod +x ./kuha2/kuha_osmh_repo_handler/scripts/install_kuha_osmh_repo_handler_virtualenv.sh
./kuha2/kuha_osmh_repo_handler/scripts/install_kuha_osmh_repo_handler_virtualenv.sh
Installing Kuha OAI-PMH Repo Handler
The operating system used in these steps is Ubuntu 16.04. Other modern Linux variants may be used.
Create directory for OAI-PMH Repo Handler and Python virtualenv.
mkdir kuha2
Clone package to subdirectory.
You can clone the latest release with the following command.
git clone --depth 1 --branch releases https://gitlab.tuni.fi/fsd/kuha_oai_pmh_repo_handler kuha2/kuha_oai_pmh_repo_handler
Or clone a specific release by tag. Change <tag> to the release version.
git clone --depth 1 --branch <tag> https://gitlab.tuni.fi/fsd/kuha_oai_pmh_repo_handler kuha2/kuha_oai_pmh_repo_handler
Install Python virtual environment.
sudo apt install -y python3-venv
Make install script executable.
chmod +x ./kuha2/kuha_oai_pmh_repo_handler/scripts/install_kuha_oai_pmh_repo_handler_virtualenv.sh
Install Kuha OAI-PMH Repo Handler to virtual environment.
./kuha2/kuha_oai_pmh_repo_handler/scripts/install_kuha_oai_pmh_repo_handler_virtualenv.sh
Upgrade OAI-PMH Repo Handler
To upgrade an existing installation, fetch changes to the code repository, check out a version and reinstall.
Change directory to package directory.
cd kuha2/kuha_oai_pmh_repo_handler
Fetch changes and checkout a version to upgrade to.
git fetch --all --tags
git checkout <version>
Leave package directory, make installation script executable and install.
cd ../..
chmod +x ./kuha2/kuha_oai_pmh_repo_handler/scripts/install_kuha_oai_pmh_repo_handler_virtualenv.sh
./kuha2/kuha_oai_pmh_repo_handler/scripts/install_kuha_oai_pmh_repo_handler_virtualenv.sh
To run Kuha OAI-PMH Repo Handler you need access to Kuha Document Store. First make the run script executable.
chmod +x ./kuha2/kuha_oai_pmh_repo_handler/scripts/run_kuha_oai_pmh_repo_handler.sh
Run by calling the script. Replace <document-store-url> with the URL to the Document Store. You also need to specify a few configuration values for OAI-PMH: base_url and admin_email.
./kuha2/kuha_oai_pmh_repo_handler/scripts/run_kuha_oai_pmh_repo_handler.sh --document-store-url=<document-store-url> --oai-pmh-base-url=<base_url> --oai-pmh-admin-email=<email>
Installing Kuha Client
Create directory for Kuha Client and Python virtualenv.
mkdir kuha2
Clone package to subdirectory.
You can clone the latest release with the following command.
git clone --depth 1 --branch releases https://gitlab.tuni.fi/fsd/kuha_client kuha2/kuha_client
Or clone a specific release by tag. Change <tag> to the release version.
git clone --depth 1 --branch <tag> https://gitlab.tuni.fi/fsd/kuha_client kuha2/kuha_client
Install Python virtual environment.
sudo apt install -y python3-venv
Install Kuha Client to virtual environment.
cd kuha2
python3 -m venv kuha_client-env
source ./kuha_client-env/bin/activate
cd kuha_client
pip install -r requirements.txt
pip install .
Upgrade Kuha Client
To upgrade an existing installation, fetch changes to the code repository, check out a version and reinstall.
Change directory to package directory.
cd kuha2/kuha_client
Fetch changes and checkout a version to upgrade to.
git fetch --all --tags
git checkout <version>
Activate Kuha Client virtual environment.
source ../kuha_client-env/bin/activate
Upgrade.
pip3 install -r requirements.txt --upgrade --upgrade-strategy=only-if-needed
pip3 install . --upgrade --upgrade-strategy=only-if-needed