kuha_oai_pmh_repo_handler

Kuha OAI-PMH Repo Handler application.

Serve records from Kuha Document Store through the OAI-PMH protocol.

constants.py

OAI-PMH Repo handler constants.

serve.py

Main entry point for starting OAI-PMH Repo Handler.

exception kuha_oai_pmh_repo_handler.serve.KuhaEntryPointException[source]

Base for entry point exceptions.

exception kuha_oai_pmh_repo_handler.serve.InvalidMetadataFormatException[source]

A loaded metadataformat is constructed wrong.

Raised, for instance, when a metadataformat implements different sets than other metadataformats in the same repository.

exception kuha_oai_pmh_repo_handler.serve.ConflictingMetadataPrefixException[source]

Raised when non-overridable metadataformats implement the same metadataprefix.

exception kuha_oai_pmh_repo_handler.serve.NoMetadataFormatsException[source]

Unable to load any metadataformats for OAI-PMH Repo Handler

kuha_oai_pmh_repo_handler.serve.load_metadataformats(entry_point_group)[source]

Load metadataformats using plugin discovery via setuptools entry points.

The following constraints apply to loaded metadataformats:

  • Every loaded metadataformat must have a unique mdprefix. The overridable (bool) attribute is consulted to check whether a certain mdprefix can be overridden by another metadataformat. Raises ConflictingMetadataPrefixException if metadataformats have the same mdprefix and are non-overridable.

  • Every loaded metadataformat must implement the same sets; the sets attribute and the list_sets method must be the same for every loaded metadataformat. Note that overridden metadataformats can have different sets than the ones that are finally loaded, as long as the loaded ones implement the same sets. Raises InvalidMetadataFormatException if loaded metadataformats implement different sets.

Also, it is mandatory to load at least one metadataformat. Raises NoMetadataFormatsException if no metadataformats are loaded.
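The constraints above can be illustrated with a minimal, self-contained sketch. It does not import the package: the exception classes and the two metadataformat classes are local stand-ins, and the override order (a later candidate replaces an overridable one) is an assumption, not the documented behaviour. Only the sets attribute is compared here; list_sets is left out.

```python
# Local stand-ins named after the documented exceptions (not imported from the package).
class ConflictingMetadataPrefixException(Exception):
    pass


class InvalidMetadataFormatException(Exception):
    pass


class NoMetadataFormatsException(Exception):
    pass


def select_metadataformats(candidates):
    """Apply the three documented constraints to candidate classes."""
    by_prefix = {}
    for candidate in candidates:
        current = by_prefix.get(candidate.mdprefix)
        if current is None or current.overridable:
            # First format for this prefix, or it replaces an overridable one.
            by_prefix[candidate.mdprefix] = candidate
        elif candidate.overridable:
            # Keep the non-overridable format that is already selected.
            continue
        else:
            raise ConflictingMetadataPrefixException(candidate.mdprefix)
    loaded = list(by_prefix.values())
    if not loaded:
        raise NoMetadataFormatsException('no metadataformats loaded')
    if any(fmt.sets != loaded[0].sets for fmt in loaded):
        raise InvalidMetadataFormatException('loaded metadataformats implement different sets')
    return loaded


# Hypothetical formats used only for illustration.
class BuiltinDC:
    mdprefix = 'oai_dc'
    overridable = True
    sets = ('language',)


class PluginDC:
    mdprefix = 'oai_dc'
    overridable = False
    sets = ('language',)


selected = select_metadataformats([BuiltinDC, PluginDC])
```

In this sketch the plugin format wins regardless of discovery order, because the built-in one is marked overridable.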

Parameters

entry_point_group (str) – Entry point group for metadataformats.

Returns

Loaded metadataformat classes.

Return type

list

kuha_oai_pmh_repo_handler.serve.configure(mdformats)[source]

Configure OAI-PMH Server.

Parameters

mdformats (list) – Metadataformats to serve

Returns

Server settings

Return type

argparse.Namespace

kuha_oai_pmh_repo_handler.serve.main()[source]

Application main function.

Parse commandline for settings. Setup and serve webapp. Exit on exceptions propagated at this level.

Returns

None

controller.py

OAI-PMH Repo Handler controller

Connects backend components together, mainly metadataformats, XML templates and the oai.Protocol. Is responsible for catching oai.errors and routing them to the correct template.

class kuha_oai_pmh_repo_handler.controller.Controller(respond_with_requested_url, mdformats, stylesheet_url=None)[source]

Controls processing of OAI requests

Holds information about the expected behaviour and routes requests to correct handlers using defined metadataformats. Interprets the OAI request and calls the correct metadataformats. Catches oai.errors and routes them to the error handler & template.

Parameters
  • respond_with_requested_url (bool) – Configures how to create the OAI response url. If True, the url is generated based on the incoming OAI request. Otherwise, the configured base_url is used. :seealso: oai.protocol.Response

  • mdformats (list) – Loaded metadataformats

  • stylesheet_url (str) – Optional XML stylesheet URL that is added to templates. Defaults to an empty string.

async static oai_error(ctx, *args, **kwargs)

Handle OAI errors

Parameters

oai_protocol (oai.Protocol) – OAI protocol instantiated with the current request.

Returns

Response context

Return type

dict

async static identify(ctx, *args, **kwargs)

Handle Identify OAI request

Parameters
  • mdformats (Controller._mdfwrapperclass) – Wrapped & loaded metadataformats

  • oai_protocol (oai.Protocol) – OAI protocol instantiated with the current request

Returns

Response context

Return type

dict

async static listsets(ctx, *args, **kwargs)

Handle ListSets OAI request

Parameters
  • mdformats (Controller._mdfwrapperclass) – Wrapped & loaded metadataformats

  • oai_protocol (oai.Protocol) – OAI protocol instantiated with the current request

Returns

Response context

Return type

dict

async static listmetadataformats(ctx, *args, **kwargs)

Handle ListMetadataFormats OAI request

Parameters
  • mdformats (Controller._mdfwrapperclass) – Wrapped & loaded metadataformats

  • oai_protocol (oai.Protocol) – OAI protocol instantiated with the current request

Returns

Response context

Return type

dict

async static listidentifiers(ctx, *args, **kwargs)

Handle ListIdentifiers OAI request

Parameters
  • mdformats (Controller._mdfwrapperclass) – Wrapped & loaded metadataformats

  • oai_protocol (oai.Protocol) – OAI protocol instantiated with the current request

Returns

Response context

Return type

dict

async static listrecords(mdformats, oai_protocol)[source]

Handle ListRecords OAI request

Parameters
  • mdformats (Controller._mdfwrapperclass) – Wrapped & loaded metadataformats

  • oai_protocol (oai.Protocol) – OAI protocol instantiated with the current request

Returns

Response context

Return type

dict

async static getrecord(mdformats, oai_protocol)[source]

Handle GetRecord OAI request

Parameters
  • mdformats (Controller._mdfwrapperclass) – Wrapped & loaded metadataformats

  • oai_protocol (oai.Protocol) – OAI protocol instantiated with the current request

Returns

Response context

Return type

dict

async oai_request(args, correlation_id_header, tornado_request)[source]

Create a request from requested arguments.

Parameters
  • args (list) – List of 2-item tuples [(key, value)] containing request arguments.

  • correlation_id_header (dict) – Correlation id header as dict

  • tornado_request – Current request from tornado handler

Returns

Two-tuple. First item is an iterable with the full HTTP response body. Second item is the current oai.Protocol instance.

kuha_oai_pmh_repo_handler.controller.from_settings(settings, mdformats)[source]

Get Controller from settings.

Parameters
  • settings (argparse.Namespace) – Loaded settings.

  • mdformats (list) – Loaded metadataformats.

Returns

Initialized controller.

Return type

Controller

kuha_oai_pmh_repo_handler.controller.add_cli_args()[source]

Add command line arguments required by controller.

genshi_loader.py

Load genshi templates.

Usage:

from genshi_loader import add_template_folders, GenPlate

add_template_folders('/path/to/genshi/templates')

@GenPlate('identify.xml')
async def handler_identify(genplate_instance):
    return {}

kuha_oai_pmh_repo_handler.genshi_loader.FOLDERS = []

Template folders

kuha_oai_pmh_repo_handler.genshi_loader.add_template_folders(*folders)[source]

Add folders in which to look up templates.

Parameters

folders (str) – absolute paths to folders containing genshi templates.

kuha_oai_pmh_repo_handler.genshi_loader.get_template_folders()[source]

Get template folders.

Returns

template folders.

Return type

list

class kuha_oai_pmh_repo_handler.genshi_loader.KuhaXMLSerializer(**kw)[source]

Subclass XMLSerializer to add a custom filter.

class kuha_oai_pmh_repo_handler.genshi_loader.GenPlate(template_file, **kw)[source]

Genshi template decorator.

Decorate functions that should write output to genshi-templates. The decorated function must be an asynchronous function and it must return a dictionary.

Example:

from genshi_loader import GenPlate
class Handler:
    @GenPlate('error.xml')
    async def build_error_message(self, genplate_instance):
        ...
        return {'msg': 'there was an error'}

Parameters
  • template_file (str) – filename of the template to use.

  • template_folder (str) – optional; a different template folder in which to look up the given template_file.

Raises

ValueError if decorated function returns invalid type.
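The contract described above (an async function that returns a dictionary, with a ValueError for anything else) can be sketched without Genshi. The decorator below, render_with, is a hypothetical stand-in for GenPlate: it renders with string.Template instead of a Genshi template, and it does not pass a genplate_instance to the decorated function.

```python
import asyncio
import functools
from string import Template


def render_with(template_text):
    """Hypothetical stand-in for GenPlate; renders with string.Template, not Genshi."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            context = await func(*args, **kwargs)
            if not isinstance(context, dict):
                # Mirrors the documented ValueError for an invalid return type.
                raise ValueError('decorated function must return a dict')
            return Template(template_text).substitute(context)
        return wrapper
    return decorator


@render_with('<error>$msg</error>')
async def build_error_message():
    return {'msg': 'there was an error'}


@render_with('<error>$msg</error>')
async def broken_handler():
    return 'not a dict'


rendered = asyncio.run(build_error_message())
```

Calling broken_handler() in the same way raises ValueError, because its return value is not a dictionary.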

static set_stylesheet_url(path)[source]

Set stylesheet URL to KuhaXMLSerializer.

Call this to replace ‘${stylesheet_url}’ notation in templates with ‘path’.

Parameters

path – Replaces stylesheet_url in templates.

http_api.py

Declare HTTP API

This module is responsible for interpreting the HTTP request parameters and headers. All OAI-PMH specific parameters are interpreted in controller logic. The controller must be linked to HTTP API via Tornado WebApplication initialization (see get_app()) and is used in OAIRouteHandler via self.application.settings[‘controller’].

class kuha_oai_pmh_repo_handler.http_api.OAIRouteHandler(*args, **kwargs)[source]

Declares the OAI-PMH HTTP API.

Takes in HTTP verbs GET and POST. Gathers their parameters and dispatches the parameters, correlation_id HTTP headers and a tornado request object to the controller.

The dispatch is done by calling the controller’s oai_request async method with parameters: args, correlation_id_header, tornado_request

  • args is a list of two-tuples containing request parameters that were submitted via GET or POST.

  • correlation_id_header is a dictionary that can be used as is for further requests using Tornado’s HTTP clients: {‘X-REQUEST-ID’: <corr_id>}.

  • tornado_request is a tornado request object that wraps the current request. It should be used as a read-only object.
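As an illustration of the shapes of the first two dispatch parameters, the sketch below builds them with the standard library only. How OAIRouteHandler actually gathers the arguments is not shown here; the query string and the correlation id value are made up.

```python
from urllib.parse import parse_qsl

# A made-up OAI-PMH query string, as it could appear after the base URL.
query = 'verb=ListRecords&metadataPrefix=oai_dc&set=language:en'

# args: list of two-tuples, one per submitted request parameter.
args = parse_qsl(query, keep_blank_values=True)

# correlation_id_header: dictionary usable as HTTP headers in further requests.
correlation_id_header = {'X-REQUEST-ID': 'example-correlation-id'}
```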

async prepare()[source]

Prepare response by setting the correct output content type.

async write_output(iterable)[source]

Write output by iterating the parameter.

Parameters

iterable – output to be written as HTTP response.

async get()[source]

HTTP-GET handler

Gathers request arguments. Calls dispatch. Finishes the response.

“URLs for GET requests have keyword arguments appended to the base URL”

http://www.openarchives.org/OAI/openarchivesprotocol.html#ProtocolFeatures

async post()[source]

HTTP-POST handler

Validates request content type. Gathers request arguments. Calls dispatch. Finishes the response.

“Keyword arguments are carried in the message body of the HTTP POST. The Content-Type of the request must be application/x-www-form-urlencoded.”

http://www.openarchives.org/OAI/openarchivesprotocol.html#ProtocolFeatures
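A minimal sketch of the content type validation required by the protocol for POST requests, using only the standard library. The function name parse_post_arguments and the use of ValueError are assumptions for illustration; the handler's actual reaction to an unsupported content type is not documented here.

```python
from urllib.parse import parse_qsl

FORM_CONTENT_TYPE = 'application/x-www-form-urlencoded'


def parse_post_arguments(content_type, body):
    """Return request arguments as two-tuples, or raise on a wrong Content-Type."""
    # Ignore parameters such as "; charset=UTF-8" when comparing media types.
    media_type = (content_type or '').split(';', 1)[0].strip().lower()
    if media_type != FORM_CONTENT_TYPE:
        raise ValueError('unsupported Content-Type: %r' % (content_type,))
    return parse_qsl(body, keep_blank_values=True)


post_args = parse_post_arguments(
    'application/x-www-form-urlencoded; charset=UTF-8',
    'verb=GetRecord&metadataPrefix=oai_dc')
```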

kuha_oai_pmh_repo_handler.http_api.get_app(api_version, controller, app_class=None)[source]

Setup routes and return initialized Tornado web application.

Parameters
  • api_version (str) – HTTP Api version gets prepended to routes.

  • controller – Controller logic for HTTP API.

  • app_class – Use custom WebApplication class. Defaults to kuha_common.server.WebApplication.

Returns

Tornado web application.

Return type

tornado.web.Application

list_records.py

Run list records sequence on-demand against an OAI-PMH Repo Handler.

Helper script runs through the entire list records sequence with a given metadataPrefix and conditions. Can be used to ensure that all records within a repository are good to serve by catching timeouts from Document Store Client and non-serializable Document Store records.

Logs out the time it takes to complete the full sequence. Prints out all identifiers found by the requested conditions.

If any error conditions are encountered, the best place to look for the cause is the Kuha OAI-PMH Repo Handler log output and Kuha Document Store log output.

exception kuha_oai_pmh_repo_handler.list_records.InvalidOAIResponse[source]

The response was not expected.

Raised when:

  • HTTP response code is invalid

  • Result cannot be parsed as XML

  • OAI response has an <error> element
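The three conditions can be sketched with the standard library as follows. The exception class is a local stand-in, the function name check_oai_response is hypothetical, and treating only status 200 as valid is an assumption; the script's real checks may differ.

```python
import xml.etree.ElementTree as ET

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = '{http://www.openarchives.org/OAI/2.0/}'


class InvalidOAIResponse(Exception):
    """Local stand-in for the documented exception."""


def check_oai_response(status_code, body):
    """Return the parsed root element or raise InvalidOAIResponse."""
    if status_code != 200:  # assumption: only 200 counts as a valid status
        raise InvalidOAIResponse('unexpected HTTP status %s' % status_code)
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        raise InvalidOAIResponse('response cannot be parsed as XML') from exc
    error = root.find(OAI_NS + 'error')
    if error is not None:
        raise InvalidOAIResponse('%s: %s' % (error.get('code'), error.text))
    return root
```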

kuha_oai_pmh_repo_handler.list_records.main()[source]

Command line interface entry point.

Gather configuration. Setup application. Run sequence and report encountered identifiers.

Returns

0 on success

Return type

int

metadataformats

Define metadata formats.

Metadataformats create contexts by calling oai_response object and declare templates if needed. Metadataformats raise oai_errors if needed.

exception kuha_oai_pmh_repo_handler.metadataformats.DuplicateSetSpec[source]

Every OAI set must have a unique spec value

class kuha_oai_pmh_repo_handler.metadataformats.MDFormat(oai, corr_id_header)[source]

Base class for metadata formats.

Defines common attributes and methods. Subclass to define metadataformats.

overridable = False

overridable controls how plugin discovery handles metadataformats with the same mdprefix. Built-in metadataformats may be overridable; those developed as plugins should not be.

study_class

alias of kuha_common.document_store.records.Study

variable_class

alias of kuha_common.document_store.records.Variable

question_class

alias of kuha_common.document_store.records.Question

class MDSet(mdformat)

Subclass to define OAI-Sets

classmethod add_cli_args(parser)

Add command line arguments to parser.

Parameters

parser (configargparse.ArgumentParser) – Active command line parser.

classmethod configure(settings)

Configure set using settings.

Consult settings and configure set. Called via MDFormat on server startup. Return False if this set should not be loaded.

Parameters

settings (argparse.Namespace) – Loaded settings

Returns

False to bypass loading of this set.

async fields()

Return list of fields to include when querying for record headers.

This is used when gathering all docstore fields that are needed to construct oai headers.

Returns

list of fields

Return type

list

async filter(value)

Return a query filter that includes all studies matching ‘value’.

This is used when constructing docstore query that will include all records in this OAI-set group. In other words, in selective harvesting.

Parameters

value (str or None) – Request setspec value after colon (setspec = <self.spec>:<value>). If the requested setspec only includes the top-level setspec part (setspec = <self.spec>) the parameter is None.

Returns

query filter

Return type

dict

async get(study)

Get values from record used in setspec: ‘<key>:<value>’. A None item ([None]) will leave out the <value> part: ‘<key>’

This is used when constructing setspecs for a specific record.

Parameters

study – study record to get set values from

Returns

List of values

async query(on_set_cb)

Query and add distinct values for setspecs

This is used when constructing ListSets OAI response.

Parameters

on_set_cb – Async callback with signature (spec, name=None, description=None), where spec is the setSpec value.

Returns

None
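The callback signature used by query() can be illustrated with a small asyncio sketch. fake_query is a hypothetical stand-in for an MDSet.query() implementation and the set values are made up; only the callback signature (spec, name=None, description=None) comes from the documentation above.

```python
import asyncio


async def fake_query(on_set_cb):
    """Hypothetical stand-in for MDSet.query(); the reported values are made up."""
    await on_set_cb('language', name='Language')
    await on_set_cb('language:en', name='English')


async def collect_sets(query):
    """Run a query and collect every set reported through the callback."""
    found = []

    async def on_set_cb(spec, name=None, description=None):
        found.append((spec, name, description))

    await query(on_set_cb)
    return found


collected = asyncio.run(collect_sets(fake_query))
```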

classmethod add_cli_args(parser)[source]

Add command line arguments to parser.

Adds required command line arguments regarding metadataformats & sets.

This should be called on program startup along with other command line argument definitions if the program is allowing configuration of metadataformats & sets.

Parameters

parser (configargparse.ArgumentParser) – Active command line parser.

classmethod configure_sets(settings)[source]

Configure & load sets using settings.

Calls configure() of each MDSet class stored in class variable ‘sets’. The configure() will be called with the ‘settings’ parameter. If configure() returns False, the set will not be loaded but discarded instead. Otherwise, the configured set will be stored in a module level variable and used to serve OAI requests.

Parameters

settings (argparse.Namespace) – Loaded settings

Raises

DuplicateSetSpec if two configured sets have the same value in the ‘spec’ class level variable.

classmethod configure(settings)[source]

Configure metadataformats & sets using settings.

Parameters

settings (argparse.Namespace) – Loaded settings

static get_deleted_record()[source]

Get DeletedRecord OAI-PMH property

classmethod get_set(setspec)[source]

Get set matching ‘setspec’ value.

Parameters

setspec (str) – Set to lookup.

Returns

Found set, which is a subclass of MDSet

Raises

exc.NoSuchSet if a set is not found.

async get_earliest_datestamp()[source]

Get earliest datestamp as python datetime object.

Returns

earliest datestamp for this metadataformat.

Return type

datetime.datetime

async list_sets()[source]

Outputs all sets from all records in the whole repository.

If overridden, this should be overridden in all subclasses. It should also have the same behaviour in all subclasses:

async def _list_sets():
    ...

class MyMetadataFormat(MDFormat):
    list_sets = _list_sets

async list_identifiers()[source]

Query record identifiers from backend.

Queries records and raises NoRecordsMatch oai error if the request is selective and no records were found.

async list_metadata_formats()[source]

Adds information regarding this metadataformat to response.

If the request contains an identifier, first makes sure the record exists in the backend, then adds the metadataformat information to the response.

async get_record()[source]

Adds record to response.

This is an abstract method that must be implemented in a subclass. Note that the correct templates also need to be defined in the subclass via decoration.

The implementation must query the backend for the requested record, raise OAI errors if needed and return the correct oai.response.context.

Raises

NotImplementedError

async list_records()[source]

Adds records to response.

This is an abstract method that must be implemented in subclass. The subclass must also define the correct template via decoration.

The implementation must query the backend for the requested records, raise OAI errors when needed and return the correct oai.response.context.

Raises

NotImplementedError

class kuha_oai_pmh_repo_handler.metadataformats.DCMetadataFormat(oai, corr_id_header)[source]
classmethod add_cli_args(parser)[source]

Add command line arguments to parser.

Adds required command line arguments regarding metadataformats & sets.

This should be called on program startup along with other command line argument definitions if the program is allowing configuration of metadataformats & sets.

Parameters

parser (configargparse.ArgumentParser) – Active command line parser.

classmethod configure(settings)[source]

Configure metadataformats & sets using settings.

Parameters

settings (argparse.Namespace) – Loaded settings

async get_record(*args, **kwargs)

Adds record to response.

This is an abstract method that must be implemented in a subclass. Note that the correct templates also need to be defined in the subclass via decoration.

The implementation must query the backend for the requested record, raise OAI errors if needed and return the correct oai.response.context.

Raises

NotImplementedError

async list_records(*args, **kwargs)

Adds records to response.

This is an abstract method that must be implemented in subclass. The subclass must also define the correct template via decoration.

The implementation must query the backend for the requested records, raise OAI errors when needed and return the correct oai.response.context.

Raises

NotImplementedError

class kuha_oai_pmh_repo_handler.metadataformats.EAD3MetadataFormat(oai, corr_id_header)[source]
classmethod add_cli_args(parser)[source]

Add command line arguments to parser.

Adds required command line arguments regarding metadataformats & sets.

This should be called on program startup along with other command line argument definitions if the program is allowing configuration of metadataformats & sets.

Parameters

parser (configargparse.ArgumentParser) – Active command line parser.

classmethod configure(settings)[source]

Configure metadataformats & sets using settings.

Parameters

settings (argparse.Namespace) – Loaded settings

async get_record(*args, **kwargs)

Adds record to response.

This is an abstract method that must be implemented in a subclass. Note that the correct templates also need to be defined in the subclass via decoration.

The implementation must query the backend for the requested record, raise OAI errors if needed and return the correct oai.response.context.

Raises

NotImplementedError

async list_records(*args, **kwargs)

Adds records to response.

This is an abstract method that must be implemented in subclass. The subclass must also define the correct template via decoration.

The implementation must query the backend for the requested records, raise OAI errors when needed and return the correct oai.response.context.

Raises

NotImplementedError

static get_daterange_pairs(colldates)[source]

Record helper method extracts daterange pairs from a list of Study.collection_periods.

Returns a list of two-tuples [(start, end)]. Both items inside tuple are instances of Study.collection_periods values.

Parameters

colldates (list) – collection periods list

Returns

List of date range pairs in two-tuples (start, end)

Return type

list

static get_singledates(colldates)[source]

Record helper method extracts single dates from a list of Study.collection_periods.

Returns a list of Study.collection_periods values.

Parameters

colldates (list) – collection periods list

Returns

List of single dates

Return type

list

class kuha_oai_pmh_repo_handler.metadataformats.DDICMetadataFormat(oai, corr_id_header)[source]
classmethod add_cli_args(parser)[source]

Add command line arguments to parser.

Adds required command line arguments regarding metadataformats & sets.

This should be called on program startup along with other command line argument definitions if the program is allowing configuration of metadataformats & sets.

Parameters

parser (configargparse.ArgumentParser) – Active command line parser.

classmethod configure(settings)[source]

Configure metadataformats & sets using settings.

Parameters

settings (argparse.Namespace) – Loaded settings

async get_record(*args, **kwargs)

Adds record to response.

This is an abstract method that must be implemented in a subclass. Note that the correct templates also need to be defined in the subclass via decoration.

The implementation must query the backend for the requested record, raise OAI errors if needed and return the correct oai.response.context.

Raises

NotImplementedError

async list_records(*args, **kwargs)

Adds records to response.

This is an abstract method that must be implemented in subclass. The subclass must also define the correct template via decoration.

The implementation must query the backend for the requested records, raise OAI errors when needed and return the correct oai.response.context.

Raises

NotImplementedError

class kuha_oai_pmh_repo_handler.metadataformats.OAIDDI25MetadataFormat(oai, corr_id_header)[source]
classmethod add_cli_args(parser)[source]

Add command line arguments to parser.

Adds required command line arguments regarding metadataformats & sets.

This should be called on program startup along with other command line argument definitions if the program is allowing configuration of metadataformats & sets.

Parameters

parser (configargparse.ArgumentParser) – Active command line parser.

classmethod configure(settings)[source]

Configure metadataformats & sets using settings.

Parameters

settings (argparse.Namespace) – Loaded settings

async get_record(*args, **kwargs)

Adds record to response.

This is an abstract method that must be implemented in a subclass. Note that the correct templates also need to be defined in the subclass via decoration.

The implementation must query the backend for the requested record, raise OAI errors if needed and return the correct oai.response.context.

Raises

NotImplementedError

async list_records(*args, **kwargs)

Adds records to response.

This is an abstract method that must be implemented in subclass. The subclass must also define the correct template via decoration.

The implementation must query the backend for the requested records, raise OAI errors when needed and return the correct oai.response.context.

Raises

NotImplementedError

class kuha_oai_pmh_repo_handler.metadataformats.OAIDataciteMetadataFormat(oai, corr_id_header)[source]

Metadataformat for OpenAIRE DataCite

classmethod add_cli_args(parser)[source]

Add command line arguments to parser.

Adds required command line arguments regarding metadataformats & sets.

This should be called on program startup along with other command line argument definitions if the program is allowing configuration of metadataformats & sets.

Parameters

parser (configargparse.ArgumentParser) – Active command line parser.

classmethod configure(settings)[source]

Configure metadataformats & sets using settings.

Parameters

settings (argparse.Namespace) – Loaded settings

async classmethod get_preferred_identifier(study)[source]

OpenAIRE datacite requires a certain type of ID.

Identifier type must be one of (also the lookup order):
  • DOI

  • ARK

  • Handle

  • PURL

  • URN

  • URL

Parameters

study (kuha_common.document_store.records.Study) – Currently serialized study.

Returns

(<str:type>, <str:id>)

Return type

tuple
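The lookup order listed above can be sketched as a plain function. The input shape (a list of (type, id) tuples) and the None result when nothing matches are assumptions for illustration; how the real method derives identifier types from a Study record is not documented here.

```python
# Lookup order as listed in the documentation above.
LOOKUP_ORDER = ('DOI', 'ARK', 'Handle', 'PURL', 'URN', 'URL')


def pick_preferred_identifier(identifiers):
    """Return the (type, id) pair whose type comes earliest in LOOKUP_ORDER."""
    for wanted in LOOKUP_ORDER:
        for id_type, value in identifiers:
            if id_type == wanted:
                return (id_type, value)
    return None  # assumption: no suitable identifier found


preferred = pick_preferred_identifier([
    ('URL', 'https://example.org/study/1'),
    ('URN', 'urn:nbn:fi:example-1'),
])
```

Here the URN is chosen over the URL because it appears earlier in the lookup order.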

OpenAIRE Datacite requires a certain type of relatedIdentifier.

Parameters

study (kuha_common.document_store.records.Study) – Currently serialized study.

Returns

List of two-tuples containing type & id [(<str:type>, <str:id>)]

Return type

list

async static get_publisher_lang_value_pair(study)[source]

Get publisher language & value pair as tuple.

Parameters

study (kuha_common.document_store.records.Study) – Currently serialized study.

Returns

(<str:language>, <str:publisher>)

Return type

tuple

async static get_funders(study)[source]

Get OpenAIRE Datacite funders.

OpenAIRE Datacite requires a certain nameIdentifier for Contributor. See https://guidelines.openaire.eu/en/latest/data/field_contributor.html#nameidentifier-ma-o for the syntax. This method keeps only the study.grant_number values that conform to the syntax.

Parameters

study (kuha_common.document_store.records.Study) – Currently serialized study.

Returns

list of three-tuples [(<str:language>, <str:nameidentifier>, <str:agency>)]

Return type

list

async get_record(*args, **kwargs)

Adds record to response.

This is an abstract method that must be implemented in a subclass. Note that the correct templates also need to be defined in the subclass via decoration.

The implementation must query the backend for the requested record, raise OAI errors if needed and return the correct oai.response.context.

Raises

NotImplementedError

async list_records(*args, **kwargs)

Adds records to response.

This is an abstract method that must be implemented in subclass. The subclass must also define the correct template via decoration.

The implementation must query the backend for the requested records, raise OAI errors when needed and return the correct oai.response.context.

Raises

NotImplementedError

metadataformats/const.py

metadataformats/exc.py

exception kuha_oai_pmh_repo_handler.metadataformats.exc.NoSuchSet[source]

A nonexisting set was requested.

Raised for programming errors in metadataformats. Not used when an HTTP request asks for a nonexistent set; use OAI errors for such conditions instead.

metadataformats/_mdsets.py

class kuha_oai_pmh_repo_handler.metadataformats._mdsets.MDSet(mdformat)[source]

Subclass to define OAI-Sets

classmethod add_cli_args(parser)[source]

Add command line arguments to parser.

Parameters

parser (configargparse.ArgumentParser) – Active command line parser.

classmethod configure(settings)[source]

Configure set using settings.

Consult settings and configure set. Called via MDFormat on server startup. Return False if this set should not be loaded.

Parameters

settings (argparse.Namespace) – Loaded settings

Returns

False to bypass loading of this set.

async fields()[source]

Return list of fields to include when querying for record headers.

This is used when gathering all docstore fields that are needed to construct oai headers.

Returns

list of fields

Return type

list

async query(on_set_cb)[source]

Query and add distinct values for setspecs

This is used when constructing ListSets OAI response.

Parameters

on_set_cb – Async callback with signature (spec, name=None, description=None), where spec is the setSpec value.

Returns

None

async get(study)[source]

Get values from record used in setspec: ‘<key>:<value>’. A None item ([None]) will leave out the <value> part: ‘<key>’

This is used when constructing setspecs for a specific record.

Parameters

study – study record to get set values from

Returns

List of values
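How a list of values maps to setspecs can be sketched as follows. build_setspecs is a hypothetical helper, not part of the package; it only mirrors the ‘<key>:<value>’ rule and the special meaning of a None item described above.

```python
def build_setspecs(key, values):
    """Turn values returned by a set's get() into setspec strings."""
    # A None item leaves out the <value> part and yields the bare <key>.
    return [key if value is None else '%s:%s' % (key, value) for value in values]


language_specs = build_setspecs('language', ['en', 'fi'])
flat_specs = build_setspecs('openaire_data', [None])
no_specs = build_setspecs('openaire_data', [])
```

An empty list produces no setspecs at all, which corresponds to a record that does not belong to the set.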

async filter(value)[source]

Return a query filter that includes all studies matching ‘value’.

This is used when constructing docstore query that will include all records in this OAI-set group. In other words, in selective harvesting.

Parameters

value (str or None) – Request setspec value after colon (setspec = <self.spec>:<value>). If the requested setspec only includes the top-level setspec part (setspec = <self.spec>) the parameter is None.

Returns

query filter

Return type

dict
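The value handed to filter() can be illustrated by splitting a requested setspec at its first colon. split_setspec is a hypothetical helper; splitting at the first colon only is an assumption about how deeper hierarchies would be handled.

```python
def split_setspec(setspec):
    """Split '<spec>:<value>' into (spec, value); value is None for a bare '<spec>'."""
    spec, _, value = setspec.partition(':')
    return spec, (value or None)


with_value = split_setspec('language:en')
top_level = split_setspec('language')
```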

class kuha_oai_pmh_repo_handler.metadataformats._mdsets.LanguageSet(mdformat)[source]

OAI-Set Language

async fields()[source]

Return a list of fields to include in query for header fields.

These fields are used to build record headers for language OAI-set.

Returns

list of fields

async query(on_set_cb)[source]

Query and add distinct values for languages.

Parameters

on_set_cb – Async callback called for each setspec

Returns

None

async get(study)[source]

Get language oai-set values from docstore record.

Parameters

study – Document store Study

Returns

list of available languages of the study

Return type

list

async filter(value)[source]

Return a query filter that includes all studies matching ‘value’

Parameters

value (str or None) – Request setspec value after colon

Returns

query filter

Return type

dict

class kuha_oai_pmh_repo_handler.metadataformats._mdsets.StudyGroupsSet(mdformat)[source]

OAI-Set Study groups

async fields()[source]

Return a list of fields to include in query for header fields.

Returns

list of fields

Return type

list

async query(on_set_cb)[source]

Query and add distinct values for study groups.

Parameters

on_set_cb – Async callable called for each setspec

Returns

None

async get(study)[source]

Get values from study used to construct setspecs for study groups.

Parameters

study – Document store study

Returns

list of distinct study group values.

async filter(value)[source]

Return a query filter that includes all studies matching ‘value’

Parameters

value (str or None) – Request setspec value after colon

Returns

query filter

Return type

dict

class kuha_oai_pmh_repo_handler.metadataformats._mdsets.DataKindSet(mdformat)[source]

OAI-Set Data kind

async fields()[source]

Return list of fields to include when querying for header fields.

Returns

list of fields

Return type

list

async query(on_set_cb)[source]

Query and add distinct values for Data kinds.

Parameters

on_set_cb – Async callable called for each setspec.

Returns

None

async get(study)[source]

Get values from record for oai-set Data kinds.

Parameters

study – Document store study

Returns

list of study’s data kinds

Return type

list

async filter(value)[source]

Return a query filter that includes all studies matching ‘value’

Parameters

value (str or None) – Request setspec value after colon

Returns

query filter

Return type

dict

class kuha_oai_pmh_repo_handler.metadataformats._mdsets.OpenAIREDataSet(mdformat)[source]

OAI-Set OpenAIRE data

async fields()[source]

Return a list of fields to include in query for header fields

Returns

list of fields

Return type

list

async query(on_set_cb)[source]

Query and add distinct values for OpenAIRE data.

The openaire_data setspec is non-hierarchical; it contains only a single setspec: ‘<setSpec>openaire_data</setSpec>’

Parameters

on_set_cb – Async callable, called for each setspec.

Returns

None

async get(study)[source]

Get values from study used to construct setspecs for openaire_data.

The OpenAIRE set does not have a hierarchy of values. Instead, a study belongs to the set if the study has a suitable identifier.

Parameters

study – Document store study

Returns

[None] if study belongs to the set, [] if not.

Return type

list

async filter(value)[source]

Return a query filter that includes all studies in set.

The parameter ‘value’ is discarded.

Parameters

value – parameter is discarded.

Returns

Query filter

Return type

dict

oai

Defines OAI-PMH protocol.

Provides classes for handling requests and responses supported by the protocol.

oai/errors.py

Errors for OAI-protocol

exception kuha_oai_pmh_repo_handler.oai.errors.OAIError(msg=None, context=None)[source]

Base for OAI errors

get_code()[source]

Get OAI error code

get_msg()[source]

Get OAI error message

get_context()[source]

Get error context

get_contextual_message()[source]

Get error message with possible context.

Returns

message with context.

Return type

str

exception kuha_oai_pmh_repo_handler.oai.errors.MissingVerb(msg=None, context=None)[source]

OAIError for missing verb

exception kuha_oai_pmh_repo_handler.oai.errors.BadVerb(msg=None, context=None)[source]

OAIError for bad verb

exception kuha_oai_pmh_repo_handler.oai.errors.NoMetadataFormats(msg=None, context=None)[source]

OAIError for no metadata formats

exception kuha_oai_pmh_repo_handler.oai.errors.IdDoesNotExist(msg=None, context=None)[source]

OAIError for no such id

exception kuha_oai_pmh_repo_handler.oai.errors.BadArgument(msg=None, context=None)[source]

OAIError for bad argument

exception kuha_oai_pmh_repo_handler.oai.errors.CannotDisseminateFormat(msg=None, context=None)[source]

OAIError for cannot disseminate format

exception kuha_oai_pmh_repo_handler.oai.errors.NoRecordsMatch(msg=None, context=None)[source]

OAIError for no records match

exception kuha_oai_pmh_repo_handler.oai.errors.BadResumptionToken(msg=None, context=None)[source]

OAIError for bad resumption token

exception kuha_oai_pmh_repo_handler.oai.errors.NoSetHierarchy(msg=None, context=None)[source]

OAIError for repositories not implementing sets
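The OAI-PMH 2.0 specification defines eight error codes. The sketch below shows how the exception classes above could map to them. The mapping is inferred from the class names and is an assumption; the value each class actually returns from get_code() is not stated in this reference:

```python
# Assumed mapping from exception class name to OAI-PMH error code.
# The codes themselves come from the OAI-PMH 2.0 specification.
OAI_ERROR_CODES = {
    "MissingVerb": "badVerb",  # the spec reports a missing verb as badVerb
    "BadVerb": "badVerb",
    "NoMetadataFormats": "noMetadataFormats",
    "IdDoesNotExist": "idDoesNotExist",
    "BadArgument": "badArgument",
    "CannotDisseminateFormat": "cannotDisseminateFormat",
    "NoRecordsMatch": "noRecordsMatch",
    "BadResumptionToken": "badResumptionToken",
    "NoSetHierarchy": "noSetHierarchy",
}


def code_for(class_name):
    """Look up the assumed OAI-PMH error code for an exception class name."""
    return OAI_ERROR_CODES[class_name]
```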

oai/constants.py

OAI constants

kuha_oai_pmh_repo_handler.oai.constants.REGEX_OAI_IDENTIFIER = "oai:[a-zA-Z][a-zA-Z0-9\\-]*(\\.[a-zA-Z][a-zA-Z0-9\\-]*)+:[a-zA-Z0-9\\-_\\.!~\\*'\\(\\);/\\?:@&=\\+$,%]+"

Regex to validate oai-identifier. http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm
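A short sketch of the pattern in use. The pattern is copied from the constant above; matching the whole string with re.fullmatch and the helper name looks_like_oai_identifier are assumptions made for this example:

```python
import re

# Same pattern as REGEX_OAI_IDENTIFIER, written as a raw string.
REGEX_OAI_IDENTIFIER = (
    r"oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+"
    r":[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+"
)


def looks_like_oai_identifier(candidate):
    """Return True if the whole string matches the oai-identifier pattern."""
    return re.fullmatch(REGEX_OAI_IDENTIFIER, candidate) is not None


# The namespace part needs at least one dot, so "localhost" is rejected.
ok = looks_like_oai_identifier("oai:example.org:study-1")
no_dot = looks_like_oai_identifier("oai:localhost:study-1")
```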

kuha_oai_pmh_repo_handler.oai.constants.REGEX_SETSPEC = "([A-Za-z0-9\\-_\\.!~\\*'\\(\\)])+(:[A-Za-z0-9\\-_\\.!~\\*'\\(\\)]+)*"

Sets not complying with this regular expression are invalid according to the OAI-PMH schema. See: http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd

oai/protocol.py

Defines the protocol

kuha_oai_pmh_repo_handler.oai.protocol.REGEX_VALID_SETSPEC = re.compile("([A-Za-z0-9\\-_\\.!~\\*'\\(\\)])+(:[A-Za-z0-9\\-_\\.!~\\*'\\(\\)]+)*")

Validation regex for setspec

kuha_oai_pmh_repo_handler.oai.protocol.is_valid_setspec(candidate)[source]

Validates setSpec value.

Parameters

candidate (str) – setSpec value to validate.

Returns

True if valid, False if not.

Return type

bool
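A minimal sketch of setSpec validation with the pattern shown above. Whether the package matches the full string or only a prefix is not documented; this sketch assumes a full match, and setspec_is_valid is a stand-in name, not the package function:

```python
import re

# Same pattern as REGEX_SETSPEC / REGEX_VALID_SETSPEC, as a raw string.
SETSPEC_PATTERN = re.compile(
    r"([A-Za-z0-9\-_\.!~\*'\(\)])+(:[A-Za-z0-9\-_\.!~\*'\(\)]+)*"
)


def setspec_is_valid(candidate):
    """Return True if candidate is a colon-separated list of valid tokens."""
    # Assumption: the entire candidate must match, not just its beginning.
    return SETSPEC_PATTERN.fullmatch(candidate) is not None


flat = setspec_is_valid("openaire_data")
nested = setspec_is_valid("language:en")
trailing_colon = setspec_is_valid("language:")
```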

kuha_oai_pmh_repo_handler.oai.protocol.as_supported_datetime(datetime_str, raise_oai_exc=True)[source]

Convert string representation of datetime to datetime.

Note

If the datetime_str does not come from HTTP-Request, set raise_oai_exc to False.

Note

The legitimate formats are YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ.

Parameters
  • datetime_str (str) – datetime to convert

  • raise_oai_exc (bool) – Catch datetime.strptime errors and reraise as oai-error.

Returns

converted datetime.

Return type

datetime

Raises

kuha_oai_pmh_repo_handler.oai.errors.BadArgument for invalid format if raise_oai_exc is True.
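The conversion described above can be sketched with datetime.strptime and the two legitimate formats. This is an illustration, not the package source: BadArgumentSketch stands in for oai.errors.BadArgument, and raising ValueError when raise_oai_exc is False is an assumption:

```python
from datetime import datetime

# The two formats named in the note above.
SUPPORTED_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%SZ")


class BadArgumentSketch(Exception):
    """Hypothetical stand-in for kuha_oai_pmh_repo_handler.oai.errors.BadArgument."""


def parse_supported_datetime(datetime_str, raise_oai_exc=True):
    """Try each supported format in turn and return the first that parses."""
    for fmt in SUPPORTED_FORMATS:
        try:
            return datetime.strptime(datetime_str, fmt)
        except ValueError:
            continue
    message = f"Unsupported datestamp: {datetime_str!r}"
    if raise_oai_exc:
        # Input came from an HTTP request: report it as an OAI error.
        raise BadArgumentSketch(message)
    # Assumed behaviour for internal callers: surface a plain ValueError.
    raise ValueError(message)


day = parse_supported_datetime("2020-01-31")
second = parse_supported_datetime("2020-01-31T12:30:00Z")
```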

kuha_oai_pmh_repo_handler.oai.protocol.as_supported_datestring(datetime_obj, fmt='%Y-%m-%dT%H:%M:%SZ')[source]

Convert datetime to string representation.

The target format is YYYY-MM-DDThh:mm:ssZ

Parameters

datetime_obj (datetime) – datetime to convert.

Returns

string representation of datetime_obj.

Return type

str
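The reverse direction is a plain strftime call with the default format from the signature. The function name to_datestring is used only for this sketch:

```python
from datetime import datetime


def to_datestring(datetime_obj, fmt="%Y-%m-%dT%H:%M:%SZ"):
    """Format a datetime using the OAI-PMH seconds granularity by default."""
    return datetime_obj.strftime(fmt)


stamp = to_datestring(datetime(2020, 1, 31, 12, 30, 0))
# Passing the day-level format yields the YYYY-MM-DD granularity instead.
day_stamp = to_datestring(datetime(2020, 1, 31, 12, 30, 0), fmt="%Y-%m-%d")
```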

class kuha_oai_pmh_repo_handler.oai.protocol.Response(request_url=None)[source]

Represents the response.

The response is stored in a dictionary which then gets submitted to XML-templates. Thus it is required that the dictionary built within this class is supported by the templates.

Parameters

request_url (str) – Requested url.

classmethod set_repository_name(name)[source]

Set repository name.

Parameters

name (str) – repository name.

classmethod set_base_url(url)[source]

Set base url

Parameters

url (str) – url.

classmethod set_admin_email(email)[source]

Set admin email address.

Parameters

email (list) – Admin email(s)

classmethod set_protocol_version(version)[source]

Set protocol version

Parameters

version (float) – OAI-PMH protocol version.

async identify_response(earliest_datestamp=None, deleted_records='no', granularity='YYYY-MM-DDThh:mm:ssZ')[source]

Prepare and return context for OAI verb Identify.

Parameters
  • earliest_datestamp (datetime.datetime) – Repository earliest datestamp. None if docstore contains no records.

  • deleted_records (str) – Repository support for deleted records. Must be one of ‘no’, ‘transient’, ‘persistent’. Defaults to ‘no’.

  • granularity (str) – Datestamp granularity. Must be ‘YYYY-MM-DD’ or ‘YYYY-MM-DDThh:mm:ssZ’. Defaults to ‘YYYY-MM-DDThh:mm:ssZ’

Returns

Response context

Return type

dict

async add_available_metadata_format(prefix, schema, namespace)[source]

Set supported metadata format.

set_error(oai_error)[source]

Set OAI-PMH error.

Note

These are the errors that are defined in the OAI-protocol. Programming errors are handled separately in higher levels.

Parameters

oai_error (Subclass of kuha_oai_pmh_repo_handler.oai.errors.OAIError) – OAI error.

async add_sets_element(spec, name=None, description=None)[source]

Add sets elements.

Parameters
  • spec (str) – setSpec-subelement value.

  • name (str or None) – setName-subelement value.

  • description (str or None) – setDescription-subelement value.

set_request_args(args)[source]

Request arguments are added to each successful OAI response.

If a request would result in OAI Error, these are not added to response.

These are read in oai_pmh_template.xml.

Parameters

args (list) – List of 2-item tuples [(key, value)] containing request arguments.
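For illustration, a list in the expected shape can be produced from a query string with the standard library. How the handler itself collects the arguments is not documented here, so request_args_from_query is a hypothetical helper:

```python
from urllib.parse import parse_qsl


def request_args_from_query(query_string):
    """Return request arguments as a list of (key, value) tuples."""
    return parse_qsl(query_string)


args = request_args_from_query("verb=ListRecords&metadataPrefix=oai_dc")
```

The resulting list keeps the order of the arguments in the query string.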

class kuha_oai_pmh_repo_handler.oai.protocol.ResumptionToken(cursor=0, from_=None, until=None, complete_list_size=None, metadata_prefix=None, set_=None, from_req=False)[source]

Class representing OAI-PMH Resumption Token.

Holds attributes of the resumption token. Creates a new resumption token with initial values or takes a dictionary of resumption token arguments. Validates the token based on records list size. If the list size has changed between requests, the token is treated as invalid and a kuha_oai_pmh_repo_handler.oai.errors.BadResumptionToken exception is raised.

Note

Since OAIArgument.set_ is not supported by the resumption token, changing the requested set may result in a falsely valid resumption token. However, changing the requested set in the middle of a list request sequence should be seen as bad behaviour by the requester/harvester.

Parameters
  • cursor (int) – Optional parameter for the current position in list.

  • from_ (str) – Optional parameter for from datestamp. Converted to datetime.datetime on init.

  • until (str) – Optional parameter for until datestamp. Converted to datetime.datetime on init.

  • complete_list_size (int) – Optional parameter for the number of records in the complete list.

  • metadata_prefix (str) – Optional parameter for the requested metadata prefix.

  • set_ (str) – Optional parameter containing requested set information.

class Attribute(key, value)

Store ResumptionToken attribute keys and values.

key

Alias for field number 0

value

Alias for field number 1

classmethod load_arg(argument)[source]

Create new resumption token from request arguments.

Use to load resumption token from OAI request.

Parameters

argument (str) – Resumption token argument. This comes from HTTP-request.

Returns

New ResumptionToken

property encoded

Get encoded Resumption Token.

Returns uri-encoded representation of the resumption token if the list request sequence is ongoing. If the list request sequence is over, returns None.

Returns

uri-encoded representation of the token, or None

Return type

str or None
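The wire format of the token is not documented in this reference. The sketch below shows one possible scheme under stated assumptions: attributes are serialized as URL-encoded key/value pairs, the token is None once the cursor reaches the end of the list, and a changed list size invalidates the token. The key names and the functions encode_token, load_token and BadTokenSketch are hypothetical:

```python
from urllib.parse import parse_qsl, urlencode


class BadTokenSketch(Exception):
    """Hypothetical stand-in for oai.errors.BadResumptionToken."""


def encode_token(cursor, complete_list_size, metadata_prefix):
    """Return an encoded token, or None when the list sequence is over."""
    if cursor >= complete_list_size:
        return None
    return urlencode({
        "cursor": cursor,
        "completeListSize": complete_list_size,
        "metadataPrefix": metadata_prefix,
    })


def load_token(argument, current_list_size):
    """Decode a token and reject it if the list size changed meanwhile."""
    fields = dict(parse_qsl(argument))
    try:
        cursor = int(fields["cursor"])
        size = int(fields["completeListSize"])
        prefix = fields["metadataPrefix"]
    except (KeyError, ValueError) as exc:
        raise BadTokenSketch("Malformed resumption token") from exc
    if size != current_list_size:
        raise BadTokenSketch("List size changed between requests")
    return cursor, size, prefix


token = encode_token(100, 250, "oai_dc")
restored = load_token(token, current_list_size=250)
finished = encode_token(250, 250, "oai_dc")
```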

class kuha_oai_pmh_repo_handler.oai.protocol.Arguments(verb, resumption_token=None, identifier=None, metadata_prefix=None, set_=None, from_=None, until=None)[source]

Arguments of OAI-protocol.

Store arguments. Convert datestamp strings to datetime objects. Validate arguments for each verb.

Parameters
  • verb (str) – requested OAI verb.

  • resumption_token (str) – requested resumption token.

  • identifier (str) – requested identifier.

  • metadata_prefix (str) – requested metadata prefix.

  • set_ (str) – requested set.

  • from_ (str) – requested datestamp for from attribute.

  • until (str) – requested datestamp for until attribute.

Raises

kuha_oai_pmh_repo_handler.oai.errors.OAIError for OAI errors.

supported_verbs = ['Identify', 'ListSets', 'ListMetadataFormats', 'ListIdentifiers', 'ListRecords', 'GetRecord']

Define supported verbs

resumable_verbs = ['ListSets', 'ListIdentifiers', 'ListRecords']

Define resumption token verbs

is_verb_resumable()[source]

Is the requested verb a resumable list request?

Returns

True if verb is resumable, False otherwise

Return type

bool
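The two class attributes make the check a simple membership test. A standalone sketch using the lists shown above; the free functions here are illustrative and take the verb as a parameter, unlike the method on Arguments:

```python
# Lists copied from Arguments.supported_verbs and Arguments.resumable_verbs.
SUPPORTED_VERBS = ['Identify', 'ListSets', 'ListMetadataFormats',
                   'ListIdentifiers', 'ListRecords', 'GetRecord']
RESUMABLE_VERBS = ['ListSets', 'ListIdentifiers', 'ListRecords']


def is_supported(verb):
    """Return True if the verb is one of the six OAI-PMH verbs."""
    return verb in SUPPORTED_VERBS


def is_resumable(verb):
    """Return True if the verb is a list request that may use a token."""
    return verb in RESUMABLE_VERBS


list_records = is_resumable("ListRecords")
get_record = is_resumable("GetRecord")
```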

get_local_identifier()[source]

Get requested local identifier.

Local identifier does not have prefixes for oai and namespace. It is used to identify records locally.

Returns

Local identifier if applicable for the request.

Return type

str

Raises

kuha_oai_pmh_repo_handler.oai.errors.IdDoesNotExist for invalid identifier.
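A sketch of stripping the oai and namespace prefixes from an identifier. The namespace example.org, the function local_identifier and the exception IdDoesNotExistSketch are assumptions made for this example; the package's own lookup of the namespace is not shown in this reference:

```python
class IdDoesNotExistSketch(Exception):
    """Hypothetical stand-in for oai.errors.IdDoesNotExist."""


def local_identifier(oai_identifier, namespace="example.org"):
    """Return the part of the identifier after 'oai:<namespace>:'."""
    prefix = f"oai:{namespace}:"
    # Reject identifiers from another namespace or with an empty local part.
    if not oai_identifier.startswith(prefix) or oai_identifier == prefix:
        raise IdDoesNotExistSketch(f"Invalid identifier: {oai_identifier!r}")
    return oai_identifier[len(prefix):]


local_id = local_identifier("oai:example.org:study-1")
```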

is_selective()[source]

Return True if request is selective.

Selective refers to selective harvesting supported by OAI-PMH.

Returns

True if selective, False if not.

Return type

bool
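In OAI-PMH, selective harvesting means limiting a list request by datestamp (from, until) or by set. A minimal sketch of such a check, assuming a request counts as selective when any of the three arguments is present; the standalone function below is illustrative and is not the method on Arguments:

```python
def is_selective(from_=None, until=None, set_=None):
    """Return True if any selective-harvesting argument was given."""
    return any(arg is not None for arg in (from_, until, set_))


plain = is_selective()
by_set = is_selective(set_="openaire_data")
by_date = is_selective(from_="2020-01-01", until="2020-12-31")
```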