Kuha OAI-PMH Repo Handler
Kuha OAI-PMH Repo Handler is an HTTP API written in Python for serving Kuha Document Store records through OAI-PMH.
Kuha OAI-PMH Repo Handler is part of the open source software bundle Kuha2.
Features
OAI-PMH features:
- Selective harvesting with Sets & Datestamps.
- List request sequence with ResumptionTokens.
- OAI-Identifiers.
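As an illustration of selective harvesting, the sketch below builds a ListRecords request URL that narrows the harvest by set and datestamp. The argument names (verb, metadataPrefix, set, from, until) come from the OAI-PMH protocol. The base URL and the set name are hypothetical placeholders, not values taken from this software.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix, set_spec=None,
                           from_date=None, until_date=None):
    """Build a ListRecords URL for selective harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Optional OAI-PMH arguments narrow the harvest by set and datestamp.
    if set_spec is not None:
        params["set"] = set_spec
    if from_date is not None:
        params["from"] = from_date
    if until_date is not None:
        params["until"] = until_date
    return base_url + "?" + urlencode(params)


# Hypothetical base URL and set name, for illustration only.
url = build_list_records_url(
    "http://localhost:6003/v0/oai", "oai_dc",
    set_spec="example_set", from_date="2020-01-01")
print(url)
```

Datestamps follow the OAI-PMH convention of UTC dates such as 2020-01-01. Check the running repository's Identify and ListSets responses for the granularity and set names it actually supports.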
Supported metadata standards:
- DDI-C 2.5
- EAD3
- OAI-DC
Dependencies & requirements
- Python 3.5 or newer
- Recommended: python3-venv 3.5.1 or newer
The software is continuously tested against Python versions 3.5, 3.6, 3.7, 3.8, and 3.9.
Python packages
The following packages can be obtained from the Python Package Index.
- tornado (License: Apache License 2.0)
- Genshi (License: BSD)
Kuha Common is a library used with Kuha2 software. It can be obtained from https://bitbucket.org/tietoarkisto/kuha_common
- kuha_common (License: EUPL)
License
Kuha OAI-PMH Repo Handler is available under the EUPL. See LICENSE.txt for the full license.
Configuration
The application can be configured with a configuration file, command line arguments, or environment variables. If a configuration option is specified in more than one place, command line values override environment variables, which override configuration file values, which override defaults.
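The precedence order can be sketched in a few lines of Python. This is an illustration of the rule only, not the application's actual option parser. The function name and the dictionary-based inputs are hypothetical.

```python
def resolve_option(name, cli_args, env, env_var, config_file, defaults):
    """Return the effective value for one option.

    Precedence: command line > environment > configuration file > default.
    """
    if name in cli_args:
        return cli_args[name]
    if env_var in env:
        return env[env_var]
    if name in config_file:
        return config_file[name]
    return defaults[name]


defaults = {"port": "6003"}
config_file = {"port": "7000"}
env = {"KUHA_OPRH_PORT": "7001"}

# No command line value: the environment variable wins over the file.
print(resolve_option("port", {}, env, "KUHA_OPRH_PORT", config_file, defaults))  # 7001
# A command line value overrides everything else.
print(resolve_option("port", {"port": "7002"}, env, "KUHA_OPRH_PORT",
                     config_file, defaults))  # 7002
```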
Note
The configuration options --oai-pmh-base-url and --oai-pmh-admin-email are required.
Some of the configuration options configure the OAI-PMH repository. Refer to the OAI-PMH protocol description for more information.
The following configuration options are available:
- -h, --help
  Show help message and exit.
- --print-configuration
  Print active configuration and exit.
- --port <port>
  Port for serving OAI-PMH Repo Handler. Defaults to 6003. May also be controlled by setting environment variable: KUHA_OPRH_PORT.
- --template-folder <folder>
  Absolute path to Genshi templates. Defaults to <package-installation-dir>/templates. May also be controlled by setting environment variable: KUHA_OPRH_TEMPLATES.
- --oai-pmh-repo-name <repo_name>
  OAI-PMH repository name. Defaults to Kuha2 oai-pmh repository. May also be controlled by setting environment variable: KUHA_OPRH_OP_REPO_NAME.
- --oai-pmh-base-url <base_url>
  OAI-PMH base URL. Required configuration value. May also be controlled by setting environment variable: KUHA_OPRH_OP_BASE_URL.
- --oai-pmh-namespace-identifier <namespace_id>
  Namespace identifier to use with OAI-Identifiers. Set None to disable use of OAI-Identifiers. Defaults to None. May also be controlled by setting environment variable: KUHA_OPRH_OP_NAMESPACE_ID.
- --oai-pmh-protocol-version <version>
  OAI-PMH protocol version. Note that currently only 2.0 is supported. Defaults to 2.0. May also be controlled by setting environment variable: KUHA_OPRH_OP_PROTO_VERSION.
- --oai-pmh-results-per-list <results_per_list>
  Set maximum number of results for each list response. Defaults to 500. May also be controlled by setting environment variable: KUHA_OPRH_OP_LIST_SIZE.
- --oai-pmh-admin-email <email>
  OAI-PMH administrator email address. Required configuration value. Repeat to give multiple addresses. May also be controlled by setting environment variable: KUHA_OPRH_OP_EMAIL_ADMIN.
- --oai-pmh-api-version <api_version>
  API version for OAI-PMH Repo Handler. This gets prepended to the URL path. Defaults to v0. May also be controlled by setting environment variable: KUHA_OPRH_API_VERSION.
- --document-store-host <host>
  Host & scheme of Kuha Document Store. Defaults to http://localhost. May also be controlled by setting environment variable: KUHA_DS_HOST.
- --document-store-port <port>
  Port of Kuha document store database. Defaults to 6001. May also be controlled by setting environment variable: KUHA_DS_PORT.
- --document-store-api-version <api_version>
  API version for document store. This gets prepended to the URL path. Defaults to v0. May also be controlled by setting environment variable: KUHA_DS_API_VERSION.
- --document-store-client-request-timeout <timeout>
  Request timeout (in seconds) for Document Store client. Defaults to 120. May also be controlled by setting environment variable: KUHA_DOCUMENT_STORE_CLIENT_REQUEST_TIMEOUT.
- --document-store-client-connect-timeout <timeout>
  Connect timeout (in seconds) for Document Store client. Defaults to 120. May also be controlled by setting environment variable: KUHA_DOCUMENT_STORE_CLIENT_CONNECT_TIMEOUT.
- --document-store-client-max-clients <max_clients>
  Maximum number of simultaneous client connections for Document Store client. Defaults to 10. May also be controlled by setting environment variable: KUHA_DOCUMENT_STORE_CLIENT_MAX_CLIENTS.
- --loglevel <loglevel>
  Lowest logging level of log messages that get output. Valid values are the logging levels supported by Python's logging module: CRITICAL, ERROR, WARNING, INFO, DEBUG. Defaults to INFO. May also be controlled by setting environment variable: KUHA_LOGLEVEL.
- --logformat <logformat>
  Logging format supported by Python's logging module. Defaults to
  %(asctime)s %(levelname)s(%(name)s): %(message)s)
  May also be controlled by setting environment variable: KUHA_LOGFORMAT.
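To illustrate what --oai-pmh-namespace-identifier affects: the OAI identifier guidelines define identifiers of the form oai:<namespace-identifier>:<local-identifier>. The sketch below shows that mapping in both directions. The function names are hypothetical, and passing identifiers through unchanged when the namespace is None is an assumption about what disabling OAI-Identifiers means, not confirmed behavior of the handler.

```python
def to_oai_identifier(local_id, namespace_id=None):
    """Wrap a local identifier in the oai:<namespace>:<local-id> form."""
    if namespace_id is None:
        # Assumption: with OAI-Identifiers disabled, the local id is used as is.
        return local_id
    return "oai:{}:{}".format(namespace_id, local_id)


def from_oai_identifier(identifier, namespace_id=None):
    """Recover the local identifier from an OAI identifier."""
    if namespace_id is None:
        return identifier
    prefix = "oai:{}:".format(namespace_id)
    if not identifier.startswith(prefix):
        raise ValueError("identifier does not belong to this repository")
    return identifier[len(prefix):]


# Hypothetical namespace and local identifier, for illustration only.
oai_id = to_oai_identifier("study_1", "archive.example.org")
print(oai_id)
print(from_oai_identifier(oai_id, "archive.example.org"))
```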
Configuration file
Arguments that start with '--' (e.g. --document-store-port) can also be set in a configuration file. The lookup searches for the file in the current working directory and in the package directory. The name of the configuration file is kuha_oai_pmh_repo_handler.ini.
Note
Invoke with --help to print out config file lookup paths.
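The lookup described above can be approximated with a short Python sketch. It only shows the idea of checking several directories for a file with a fixed name. The function name, the search order, and the placeholder package path are assumptions; --help prints the paths the program really uses.

```python
import os

CONFIG_FILE_NAME = "kuha_oai_pmh_repo_handler.ini"


def find_config_files(search_dirs, name=CONFIG_FILE_NAME):
    """Return the paths of existing configuration files, in search order."""
    found = []
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            found.append(candidate)
    return found


# The second entry is a placeholder for the package directory.
print(find_config_files([os.getcwd(), "/path/to/package"]))
```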
Environment variables
If the program is run using the scripts provided in the scripts subdirectory, the runtime environment can be controlled via scripts/runtime_env, which is created by copying scripts/runtime_env.dist at installation time by scripts/install_kuha_oai_pmh_repo_handler_virtualenv.sh.
Running the Server
This guide uses the convenience scripts from the scripts subdirectory. It is assumed that the program was installed using scripts/install_kuha_oai_pmh_repo_handler_virtualenv.sh.
Run OAI-PMH Repo Handler server:
./scripts/run_kuha_oai_pmh_repo_handler.sh --oai-pmh-base-url=<base-url> --oai-pmh-admin-email=<admin-email>
The script sources scripts/runtime_env and activates the installed virtualenv. Finally, it calls kuha_oai_serve with the given command line arguments.
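When the server is started from another Python program, the argument list for the run script can be assembled as in the sketch below. It relies only on options documented above, including repeating --oai-pmh-admin-email for multiple addresses. The helper name, the base URL, and the email addresses are hypothetical.

```python
import shlex

RUN_SCRIPT = "./scripts/run_kuha_oai_pmh_repo_handler.sh"


def build_run_command(base_url, admin_emails, port=None):
    """Assemble the argument list for the run script."""
    command = [RUN_SCRIPT, "--oai-pmh-base-url=" + base_url]
    # The admin email option is repeated once per address.
    for email in admin_emails:
        command.append("--oai-pmh-admin-email=" + email)
    if port is not None:
        command.append("--port={}".format(port))
    return command


# Hypothetical values, for illustration only.
command = build_run_command(
    "http://localhost:6003/v0/oai",
    ["admin@example.org", "ops@example.org"])
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be passed to subprocess.run from the installation directory, which avoids shell quoting issues.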
Ensuring OAI-PMH serves correct records
The program contains a helper script to run through all records from OAI-PMH Repo Handler using the OAI verb ListRecords. The script prints all identifiers it encounters and logs the time it took to complete the full ListRecords sequence. Note that the OAI-PMH Repo Handler server must be running and accessible in order to get correct results.
If any error conditions are encountered, the best place to determine the cause is the Kuha OAI-PMH Repo Handler server log.
Run through all records using the oai_dc metadataPrefix:
./scripts/list_records.sh oai_dc
See help for more information and configuration options:
./scripts/list_records.sh --help
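For readers who want to see what a full ListRecords sequence involves, the sketch below follows resumptionTokens until the repository stops returning one. It is not the helper script's implementation. The function names are hypothetical, the base URL in http_fetch is a placeholder, and OAI-PMH error responses are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_identifiers(fetch, metadata_prefix):
    """Yield record identifiers from a full ListRecords sequence.

    `fetch` takes a dict of OAI-PMH request arguments and returns the
    response body as XML text.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(OAI_NS + "header"):
            yield header.findtext(OAI_NS + "identifier")
        token = (root.findtext(".//" + OAI_NS + "resumptionToken") or "").strip()
        if not token:
            # A missing or empty resumptionToken ends the sequence.
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}


def http_fetch(params, base_url="http://localhost:6003/v0/oai"):
    """Fetch one response over HTTP. The default base URL is a placeholder."""
    with urlopen(base_url + "?" + urlencode(params)) as response:
        return response.read().decode("utf-8")


# Example, with a running server:
# for identifier in list_identifiers(http_fetch, "oai_dc"):
#     print(identifier)
```

Passing the fetch function as a parameter keeps the token-handling loop separate from the HTTP transport, so the loop can be exercised against canned responses.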