Kuha OAI-PMH Repo Handler

Kuha OAI-PMH Repo Handler is an HTTP API written in Python for serving Kuha Document Store records through OAI-PMH.

Kuha OAI-PMH Repo Handler is part of the open source software bundle Kuha2.

Features

OAI-PMH features:

  • Selective harvesting with Sets & Datestamps.
  • List request sequence with ResumptionTokens.
  • OAI-Identifiers.
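The sketch below shows how a harvester might use these features together. It sends a ListRecords request restricted by set and datestamps, then follows resumptionToken values until the list sequence ends. It relies only on the OAI-PMH 2.0 protocol and the Python standard library. The fetch parameter is an assumption of the sketch that lets the HTTP call be swapped out, and the base URL, set and metadata prefix used with it are placeholders.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix=None, set_spec=None,
                           from_date=None, until_date=None, resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH 2.0 resumptionToken is an exclusive argument: when it is
    given, no other arguments besides the verb may be sent.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
        if from_date is not None:
            params["from"] = from_date
        if until_date is not None:
            params["until"] = until_date
    return base_url + "?" + urllib.parse.urlencode(params)


def http_fetch(url):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest(base_url, metadata_prefix, fetch=http_fetch, **selective):
    """Yield <record> elements, following resumption tokens to the end of the list."""
    url = build_list_records_url(base_url, metadata_prefix, **selective)
    while True:
        root = ET.fromstring(fetch(url))
        yield from root.iterfind("oai:ListRecords/oai:record", OAI_NS)
        token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
        # An empty or absent resumptionToken marks the end of the list sequence.
        if token is None or not (token.text or "").strip():
            return
        url = build_list_records_url(base_url, resumption_token=token.text.strip())
```

For example, harvest("<base-url>", "oai_dc", set_spec="<set-spec>", from_date="2017-01-01") would yield every matching record, however many list responses the repository splits them into.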

Supported metadata standards:

  • DDI-C 2.5
  • OAI-DC
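The metadataPrefix values this repository uses for each standard are not listed here. A harvester can discover them with the ListMetadataFormats verb (a request to <base-url>?verb=ListMetadataFormats). The sketch below assumes only the OAI-PMH 2.0 response format and extracts the advertised prefixes. In OAI-PMH 2.0, oai_dc is the mandatory prefix for Dublin Core.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def metadata_prefixes(response_body):
    """Return the metadataPrefix values advertised in a ListMetadataFormats response."""
    root = ET.fromstring(response_body)
    return [
        (element.text or "").strip()
        for element in root.iterfind(
            "oai:ListMetadataFormats/oai:metadataFormat/oai:metadataPrefix", OAI_NS)
    ]
```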

Dependencies & requirements

The versions specified here are the ones the software has been developed with. Newer versions may be compatible.

  • Python 3.5
  • Recommended: python3-venv 3.5.1

Python packages

The following packages can be obtained from the Python Package Index (PyPI).

  • tornado (License: Apache License 2.0)
  • Genshi (License: BSD)

Kuha Common is a library used by the Kuha2 software. It can be obtained from https://bitbucket.org/tietoarkisto/kuha_common

  • kuha_common (License: EUPL)

License

Kuha OAI-PMH Repo Handler is available under the EUPL. See LICENSE.txt for the full license.

Configuration

The application can be configured with a configuration file, with command line arguments or with environment variables. If a configuration option is specified in more than one place, command line values override environment variables, which override configuration file values, which override defaults.
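The precedence rule can be sketched as a standalone Python function. This illustrates the rule only and is not the application's own configuration code. The port option and KUHA_OPRH_PORT come from the option list below, while the dict-based inputs are assumptions of the sketch. Type conversion (for example turning the port into an integer) is left out.

```python
import os


def resolve_option(name, cli_args, env_var, file_values, default):
    """Resolve one option: command line > environment variable > config file > default.

    cli_args and file_values are plain dicts; a missing key or a None value
    means the option is not set at that level.
    """
    if cli_args.get(name) is not None:
        return cli_args[name]
    if env_var in os.environ:
        return os.environ[env_var]
    if file_values.get(name) is not None:
        return file_values[name]
    return default
```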

Note

The configuration options --oai-pmh-base-url and --oai-pmh-admin-email are required.

Some of the configuration options configure the OAI-PMH repository itself. Refer to the OAI-PMH protocol specification for more information.

The following configuration options are available:

-h, --help

Show help message and exit.

--print-configuration

Print active configuration and exit.

--port <port>

Port for serving OAI-PMH Repo Handler. Defaults to 6003. May also be controlled by setting environment variable: KUHA_OPRH_PORT.

--template-folder <folder>

Absolute path to Genshi templates. Defaults to <package-installation-dir>/templates. May also be controlled by setting environment variable: KUHA_OPRH_TEMPLATES.

--oai-pmh-repo-name <repo_name>

OAI-PMH repository name. Defaults to Kuha2 oai-pmh repository. May also be controlled by setting environment variable: KUHA_OPRH_OP_REPO_NAME.

--oai-pmh-base-url <base_url>

OAI-PMH base url. Required configuration value. May also be controlled by setting environment variable: KUHA_OPRH_OP_BASE_URL.

--oai-pmh-namespace-identifier <namespace_id>

Namespace identifier to use with OAI-Identifiers. Set to None to disable the use of OAI-Identifiers. Defaults to None. May also be controlled by setting environment variable: KUHA_OPRH_OP_NAMESPACE_ID.
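OAI-Identifiers follow the oai:<namespace-identifier>:<local-identifier> syntax from the OAI Identifier Format guidelines, so this option presumably supplies the middle part. The sketch below builds and parses such identifiers. Treating None as a pass-through mirrors the option's "disable" setting but is an assumption, and the namespace example.org and local identifier study-1 used with it are hypothetical.

```python
def to_oai_identifier(namespace_id, local_id):
    """Build 'oai:<namespace-identifier>:<local-identifier>'.

    Assumption: with namespace_id None (OAI-Identifiers disabled) the local
    identifier is used as-is.
    """
    if namespace_id is None:
        return local_id
    return "oai:{}:{}".format(namespace_id, local_id)


def from_oai_identifier(namespace_id, identifier):
    """Return the local identifier, or raise ValueError if the prefix does not match."""
    if namespace_id is None:
        return identifier
    prefix = "oai:{}:".format(namespace_id)
    if not identifier.startswith(prefix):
        raise ValueError("not an OAI-Identifier for namespace {!r}".format(namespace_id))
    return identifier[len(prefix):]
```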

--oai-pmh-protocol-version <version>

OAI-PMH protocol version. Note that currently only 2.0 is supported. Defaults to 2.0. May also be controlled by setting environment variable: KUHA_OPRH_OP_PROTO_VERSION.

--oai-pmh-results-per-list <results_per_list>

Maximum number of results in each list response. Defaults to 500. May also be controlled by setting environment variable: KUHA_OPRH_OP_LIST_SIZE.

--oai-pmh-admin-email <email>

OAI-PMH administrator email address. Required configuration value. Repeat to give multiple addresses. May also be controlled by setting environment variable: KUHA_OPRH_OP_EMAIL_ADMIN.
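A repeatable option like this one behaves like an argparse option declared with action="append": every occurrence adds one value to a list. The parser below is a hypothetical stand-in built with the standard argparse module, not the application's actual parser, and the addresses are placeholders.

```python
import argparse

# Hypothetical stand-in parser; only shows how a repeated option collects values.
parser = argparse.ArgumentParser()
parser.add_argument("--oai-pmh-admin-email", action="append", dest="oai_pmh_admin_email")

# Equivalent to passing the option twice on the command line.
args = parser.parse_args([
    "--oai-pmh-admin-email=admin@example.org",
    "--oai-pmh-admin-email", "support@example.org",
])
print(args.oai_pmh_admin_email)  # ['admin@example.org', 'support@example.org']
```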

--oai-pmh-api-version <api_version>

API version for OAI-PMH Repo Handler. This gets prepended to the URL path. Defaults to v0. May also be controlled by setting environment variable: KUHA_OPRH_API_VERSION.

--document-store-host <host>

Host & scheme of Kuha Document Store. Defaults to http://localhost. May also be controlled by setting environment variable: KUHA_DS_HOST.

--document-store-port <port>

Port of Kuha Document Store. Defaults to 6001. May also be controlled by setting environment variable: KUHA_DS_PORT.

--document-store-api-version <api_version>

API version for Kuha Document Store. This gets prepended to the URL path. Defaults to v0. May also be controlled by setting environment variable: KUHA_DS_API_VERSION.
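Taken together, the three Document Store options describe where the repository handler sends its requests. The sketch below assumes they are combined as <host>:<port>/<api_version>. That layout is an assumption, and the endpoint paths below that prefix are not documented here. With the defaults, the result is http://localhost:6001/v0.

```python
def document_store_base_url(host="http://localhost", port=6001, api_version="v0"):
    """Compose a Document Store base URL from host (with scheme), port and API version.

    Assumed layout: <host>:<port>/<api_version>.
    """
    return "{}:{}/{}".format(host.rstrip("/"), port, api_version)
```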

--document-store-client-request-timeout <timeout>

Request timeout (in seconds) for Document Store client. Defaults to 120. May also be controlled by setting environment variable: KUHA_DOCUMENT_STORE_CLIENT_REQUEST_TIMEOUT.

--document-store-client-connect-timeout <timeout>

Connect timeout (in seconds) for Document Store client. Defaults to 120. May also be controlled by setting environment variable: KUHA_DOCUMENT_STORE_CLIENT_CONNECT_TIMEOUT.

--document-store-client-max-clients <max_clients>

Maximum number of simultaneous client connections for Document Store client. Defaults to 10. May also be controlled by setting environment variable: KUHA_DOCUMENT_STORE_CLIENT_MAX_CLIENTS.

--loglevel <loglevel>

Lowest logging level of log messages that get output. Valid values are the logging levels supported by Python’s logging module [CRITICAL,ERROR,WARNING,INFO,DEBUG]. Defaults to INFO. May also be controlled by setting environment variable: KUHA_LOGLEVEL.

--logformat <logformat>

Logging format supported by Python’s logging module. Defaults to %(asctime)s %(levelname)s(%(name)s): %(message)s. May also be controlled by setting environment variable: KUHA_LOGFORMAT.
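Both logging options map onto Python's standard logging module. The sketch below applies the documented defaults with logging.basicConfig to show what a log line looks like. It is not the application's own logging setup, and the logger name kuha_example is hypothetical.

```python
import logging

# Documented defaults of --loglevel and --logformat.
LOGLEVEL = "INFO"
LOGFORMAT = "%(asctime)s %(levelname)s(%(name)s): %(message)s"

# Configure the root logger, resolving the level name to its numeric value.
logging.basicConfig(level=getattr(logging, LOGLEVEL), format=LOGFORMAT)

# Writes a line such as "<timestamp> INFO(kuha_example): server starting" to stderr.
logging.getLogger("kuha_example").info("server starting")
```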

Configuration file

Args that start with '--' (e.g. --document-store-port) can also be set in a config file. The configuration file is looked up from the current working directory and from the package directory. The name of the configuration file is kuha_oai_pmh_repo_handler.ini.

Note

Invoke with --help to print out config file lookup paths.
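The lookup can be pictured with the sketch below. It lists which of the two candidate locations actually contain kuha_oai_pmh_repo_handler.ini, current working directory first. It is an illustration only: package_dir is a hypothetical argument, and how values are merged when both files exist is not specified here, so --help remains the authoritative source for the lookup paths.

```python
import os

CONFIG_FILENAME = "kuha_oai_pmh_repo_handler.ini"


def existing_config_files(package_dir, filename=CONFIG_FILENAME):
    """Return the candidate config files that exist, in lookup order:
    the current working directory first, then the package directory."""
    candidates = [os.path.join(directory, filename)
                  for directory in (os.getcwd(), package_dir)]
    return [path for path in candidates if os.path.isfile(path)]
```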

Environment variables

If the program is run using the scripts provided in the scripts subdirectory, the runtime environment can be controlled via scripts/runtime_env. This file is created at installation time by scripts/install_kuha_oai_pmh_repo_handler_virtualenv.sh, which copies it from scripts/runtime_env.dist.

Running the program

This guide uses convenience scripts from the scripts subdirectory. It assumes that the program was installed with scripts/install_kuha_oai_pmh_repo_handler_virtualenv.sh.

Run the OAI-PMH Repo Handler server:

./scripts/run_kuha_oai_pmh_repo_handler.sh --oai-pmh-base-url=<base-url> --oai-pmh-admin-email=<admin-email>

The script sources scripts/runtime_env and activates the installed virtualenv. Finally, it calls kuha_oai_serve with the given command line arguments.
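Once the server is running, an OAI-PMH Identify request is a quick way to check that the repository name, protocol version and administrator emails match the configuration. The sketch below uses only the Python standard library and the OAI-PMH 2.0 response format. Pass it the same value that was given with --oai-pmh-base-url, for example by reading the response of urllib.request.urlopen(identify_url("<base-url>")) and handing it to parse_identify.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identify_url(base_url):
    """Build the URL of an OAI-PMH Identify request."""
    return base_url + "?" + urllib.parse.urlencode({"verb": "Identify"})


def parse_identify(response_body):
    """Extract the Identify fields that correspond to configuration options."""
    identify = ET.fromstring(response_body).find("oai:Identify", OAI_NS)
    return {
        "repositoryName": identify.findtext("oai:repositoryName", namespaces=OAI_NS),
        "protocolVersion": identify.findtext("oai:protocolVersion", namespaces=OAI_NS),
        # adminEmail may repeat, one element per configured address.
        "adminEmail": [e.text for e in identify.iterfind("oai:adminEmail", OAI_NS)],
    }
```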