Kuha OAI-PMH Repo Handler

Kuha OAI-PMH Repo Handler is an HTTP API written in Python for serving Kuha Document Store records through OAI-PMH.

Kuha OAI-PMH Repo Handler is part of the open source software bundle Kuha2.

Features

OAI-PMH features:

  • Selective harvesting with Sets & Datestamps.

  • List request sequence with ResumptionTokens.

  • OAI-Identifiers.

  • Deleted records.
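Selective harvesting and resumption tokens are driven by the standard OAI-PMH request arguments (set, from, until, metadataPrefix, resumptionToken). The following is a minimal sketch of how a harvester might build ListRecords request URLs for this repository. The function name and the example base URL are hypothetical; the argument rules follow the OAI-PMH 2.0 specification, where resumptionToken is an exclusive argument.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix=None, set_spec=None,
                           from_date=None, until_date=None,
                           resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token is not None:
        # resumptionToken is exclusive: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        if metadata_prefix is None:
            raise ValueError("metadataPrefix is required without a resumptionToken")
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        # Selective harvesting: optional set membership and datestamp range.
        if set_spec is not None:
            params["set"] = set_spec
        if from_date is not None:
            params["from"] = from_date
        if until_date is not None:
            params["until"] = until_date
    return base_url + "?" + urlencode(params)
```

For example, build_list_records_url("https://example.org/oai", "oai_dc", from_date="2020-01-01") requests oai_dc records with a datestamp on or after 2020-01-01.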

Supported metadata standards:

  • DDI-C 2.5

  • EAD3

  • OAI-DC

  • DataCite
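Only the oai_dc metadata prefix is named in this README. Rather than guessing the prefixes used for the other standards, a harvester can ask the repository with the standard OAI-PMH ListMetadataFormats verb. This is a minimal standard-library sketch; the function names are hypothetical and base_url stands for the value configured with --oai-pmh-base-url.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Namespace of OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def parse_metadata_prefixes(response_xml):
    """Return the metadataPrefix values found in a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter(OAI_NS + "metadataPrefix")]


def fetch_metadata_prefixes(base_url, timeout=10):
    """Send ListMetadataFormats to base_url and parse the reply (hypothetical helper)."""
    url = base_url + "?" + urlencode({"verb": "ListMetadataFormats"})
    with urlopen(url, timeout=timeout) as response:
        return parse_metadata_prefixes(response.read())
```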

Dependencies & requirements

  • Python 3.8 or newer

The software is continuously tested against supported Python versions.

Python packages

The following packages can be obtained from the Python Package Index (PyPI).

  • tornado (License: Apache License 2.0)

  • Genshi (License: BSD)

Kuha Common is a library used by the Kuha2 software. It can be obtained from https://bitbucket.org/tietoarkisto/kuha_common

  • kuha_common (License: EUPL)

License

Kuha OAI-PMH Repo Handler is available under the EUPL. See LICENSE.txt for the full license.

Configuration

The application can be configured with a configuration file, command line arguments or environment variables. If an option is specified in more than one place, the value with the highest precedence is used: command line arguments override environment variables, environment variables override configuration file values, and configuration file values override defaults.
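The precedence rule can be illustrated with a small sketch that resolves a single option from the four sources. This is not the application's own parser; the function and its dictionary arguments are hypothetical and only model the documented order.

```python
import os


def resolve_option(name, env_var, cli_args, config_file, defaults, environ=None):
    """Resolve one option: command line > environment > config file > default."""
    environ = os.environ if environ is None else environ
    # A command line value wins whenever it was actually given.
    if cli_args.get(name) is not None:
        return cli_args[name]
    if env_var in environ:
        return environ[env_var]
    if name in config_file:
        return config_file[name]
    return defaults.get(name)
```

For example, with --port 9000 on the command line, KUHA_OPRH_PORT=8000 in the environment and port = 7000 in the configuration file, the resolved port is 9000.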

Note

The configuration options --oai-pmh-base-url and --oai-pmh-admin-email are required.

Some of the configuration options configure the OAI-PMH repository. Refer to the OAI-PMH protocol specification for more information.

This section lists some of the available configuration options. Use --help to list all available options.

-h, --help

Show help message and exit.

--print-configuration

Print active configuration and exit.

--port <port>

Port for serving OAI-PMH Repo Handler. Defaults to 6003. May also be controlled by setting environment variable: KUHA_OPRH_PORT.

--oai-pmh-base-url <base_url>

OAI-PMH base url. Required configuration value. May also be controlled by setting environment variable: KUHA_OPRH_OP_BASE_URL.

--oai-pmh-admin-email <email>

OAI-PMH administrator email address. Required configuration value. Repeat to give multiple addresses. May also be controlled by setting environment variable: KUHA_OPRH_OP_EMAIL_ADMIN.

--oai-pmh-repo-name <repo_name>

OAI-PMH repository name. Defaults to Kuha2 oai-pmh repository. May also be controlled by setting environment variable: KUHA_OPRH_OP_REPO_NAME.

--oai-pmh-namespace-identifier <namespace_id>

Namespace identifier to use with OAI-Identifiers. Set to None to disable the use of OAI-Identifiers. Defaults to None. May also be controlled by setting environment variable: KUHA_OPRH_OP_NAMESPACE_ID.

--document-store-url <url>

Full URL to the Kuha Document Store database. Defaults to http://localhost/v0. May also be controlled by setting environment variable: KUHA_DS_URL.

--loglevel <loglevel>

Lowest level of log messages that get output. Valid values are the logging levels supported by Python's logging module: CRITICAL, ERROR, WARNING, INFO and DEBUG. Defaults to INFO. May also be controlled by setting environment variable: KUHA_LOGLEVEL.
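When --oai-pmh-namespace-identifier is set, record identifiers are exposed as OAI-Identifiers. The generic form defined by the OAI Identifier Format guidelines is oai:<namespace-identifier>:<local-identifier>. The sketch below shows that mapping under the assumption that this repository follows the generic format; how the Repo Handler derives local identifiers internally is not described here, and the helper names are hypothetical.

```python
def to_oai_identifier(namespace_id, local_id):
    """Build an identifier of the form oai:<namespace-identifier>:<local-identifier>."""
    if namespace_id is None:
        # Mirrors a namespace identifier of None: OAI-Identifiers are not used.
        return local_id
    return "oai:{}:{}".format(namespace_id, local_id)


def from_oai_identifier(identifier, namespace_id):
    """Return the local identifier, or raise ValueError if the namespace does not match."""
    if namespace_id is None:
        return identifier
    prefix = "oai:{}:".format(namespace_id)
    if not identifier.startswith(prefix):
        raise ValueError("not an identifier in namespace {!r}: {!r}".format(namespace_id, identifier))
    # The local part may itself contain colons, so only the prefix is stripped.
    return identifier[len(prefix):]
```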

Configuration file

Arguments that start with '--' (e.g. --document-store-port) can also be set in a configuration file. The configuration file is looked up from the current working directory and from the package directory. The name of the configuration file is kuha_oai_pmh_repo_handler.ini.

Note

Invoke with --help to print out config file lookup paths.

Environment variables

If the program is run using the scripts in the scripts subdirectory, the runtime environment can be controlled via scripts/runtime_env. This file is created at installation time by scripts/install_kuha_oai_pmh_repo_handler_virtualenv.sh, which copies it from scripts/runtime_env.dist.
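Outside the provided scripts, the same environment variables can be set from Python when launching the server. This sketch assumes that kuha_oai_serve from the installed virtualenv is on PATH; the helper names are hypothetical. Only a single admin email is passed, because the format for giving several addresses through KUHA_OPRH_OP_EMAIL_ADMIN is not documented here.

```python
import os
import subprocess


def build_environment(base_url, admin_email, port=None, base_env=None):
    """Return a copy of base_env (default: os.environ) with Kuha settings added."""
    env = dict(os.environ if base_env is None else base_env)
    env["KUHA_OPRH_OP_BASE_URL"] = base_url
    env["KUHA_OPRH_OP_EMAIL_ADMIN"] = admin_email
    if port is not None:
        env["KUHA_OPRH_PORT"] = str(port)
    return env


def serve(base_url, admin_email, port=None):
    """Run kuha_oai_serve with the settings above (assumes it is on PATH)."""
    env = build_environment(base_url, admin_email, port)
    return subprocess.run(["kuha_oai_serve"], env=env, check=True)
```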

Running the Server

This guide uses the convenience scripts from the scripts subdirectory. It assumes that the program was installed using scripts/install_kuha_oai_pmh_repo_handler_virtualenv.sh.

Run OAI-PMH Repo Handler server:

./scripts/run_kuha_oai_pmh_repo_handler.sh --oai-pmh-base-url=<base-url> --oai-pmh-admin-email=<admin-email>

The script sources scripts/runtime_env and activates the installed virtualenv. Finally, it calls kuha_oai_serve with the given command line arguments.
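Once the server is running, an OAI-PMH Identify request is a quick way to check that it answers and reports the configured repository name, base URL and administrator email addresses. This is a minimal standard-library sketch; the helper names are hypothetical and base_url is whatever was passed as --oai-pmh-base-url.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Namespace of OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def parse_identify(response_xml):
    """Extract selected fields from an OAI-PMH Identify response."""
    root = ET.fromstring(response_xml)
    identify = root.find(OAI_NS + "Identify")
    if identify is None:
        raise ValueError("response does not contain an Identify element")
    return {
        "repositoryName": identify.findtext(OAI_NS + "repositoryName"),
        "baseURL": identify.findtext(OAI_NS + "baseURL"),
        # adminEmail may repeat, one element per configured address.
        "adminEmail": [el.text for el in identify.findall(OAI_NS + "adminEmail")],
    }


def identify(base_url, timeout=10):
    """Send an Identify request to a running server and parse the reply."""
    with urlopen(base_url + "?verb=Identify", timeout=timeout) as response:
        return parse_identify(response.read())
```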

Ensuring OAI-PMH serves correct records

The program contains a helper script that runs through all records served by OAI-PMH Repo Handler using the OAI-PMH verb ListRecords. The script prints every identifier it encounters and logs the time it took to complete the full ListRecords sequence. Note that the OAI-PMH Repo Handler server must be running and accessible for the results to be correct.

If any errors are encountered, the best place to determine the cause is the Kuha OAI-PMH Repo Handler server log.
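The kind of check the helper script performs can be sketched in Python: follow the ListRecords sequence through its resumption tokens, print each identifier and report the elapsed time. This is an illustration under stated assumptions, not the implementation of scripts/list_records.sh; the function names are hypothetical, and the fetch parameter exists only so the loop can be exercised without a live server.

```python
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Namespace of OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    with urlopen(url, timeout=30) as response:
        return response.read()


def iter_record_identifiers(base_url, metadata_prefix, fetch=http_fetch):
    """Yield record identifiers across the whole ListRecords sequence."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        error = root.find(OAI_NS + "error")
        if error is not None:
            # noRecordsMatch is a normal outcome for an empty result set.
            if error.get("code") == "noRecordsMatch":
                return
            raise RuntimeError("OAI-PMH error {}: {}".format(error.get("code"), error.text))
        list_records = root.find(OAI_NS + "ListRecords")
        if list_records is None:
            raise RuntimeError("response contains neither ListRecords nor error")
        # Deleted records still carry a header (status="deleted"), so they are listed too.
        for header in list_records.iter(OAI_NS + "header"):
            yield header.findtext(OAI_NS + "identifier")
        # An empty or missing resumptionToken marks the end of the sequence.
        token = list_records.findtext(OAI_NS + "resumptionToken")
        if not token:
            return
        # resumptionToken is exclusive: only the verb is sent alongside it.
        params = {"verb": "ListRecords", "resumptionToken": token}


def main(base_url, metadata_prefix):
    start = time.monotonic()
    count = 0
    for identifier in iter_record_identifiers(base_url, metadata_prefix):
        print(identifier)
        count += 1
    print("Listed {} records in {:.2f} s".format(count, time.monotonic() - start))
```

Calling main(<base-url>, "oai_dc") against a running server roughly corresponds to the list_records.sh invocation shown next.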

Run through all records using the oai_dc metadata prefix:

./scripts/list_records.sh oai_dc

See help for more information and configuration options:

./scripts/list_records.sh --help