kuha_document_store

Kuha Document Store application

Query, manipulate and import Document Store records via HTTP API.

configure.py

Configure Document Store.

kuha_document_store.configure.add_database_configs()[source]

Add database configuration values to be parsed.

kuha_document_store.configure.configure()[source]

Get settings for application configuration.

Declares application-specific configuration options and some common options declared in kuha_common.cli_setup.

Configure application with arguments specified in configuration file, environment variables and command line arguments.

Note:Calling this function multiple times will not initiate new settings to be parsed, but will return previously parsed settings instead.
Returns:settings
Return type:argparse.Namespace
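
The note about repeated calls can be illustrated with a small sketch. This is not the actual implementation of configure(); the module-level cache _SETTINGS, the function name configure_once and the --port option are hypothetical and exist only to show the "parse once, return the cached settings afterwards" behaviour.

```python
import argparse

# Hypothetical module-level cache; not part of kuha_document_store.
_SETTINGS = None


def configure_once(argv=None):
    """Parse settings on the first call, return the cached result afterwards."""
    global _SETTINGS
    if _SETTINGS is None:
        parser = argparse.ArgumentParser()
        # Hypothetical option used only for illustration.
        parser.add_argument('--port', type=int, default=8000)
        _SETTINGS = parser.parse_args(argv)
    return _SETTINGS


first = configure_once(['--port', '8080'])
# Arguments given on later calls are ignored; the first result is returned.
second = configure_once(['--port', '9090'])
```

Under these assumptions both names refer to the same argparse.Namespace, so code that needs different settings would have to arrange that before the first call.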

serve.py

Main entry point for starting Document Store server.

kuha_document_store.serve.get_app(api_version, app_settings=None)[source]

Setup routes and return initialized Tornado web application.

Parameters:
  • api_version (str) – HTTP API version, which gets prepended to routes.
  • app_settings (dict or None) – Settings to store in the application.
Returns:

Tornado web application.

Return type:

tornado.web.Application

kuha_document_store.serve.main()[source]

Application main function.

Parse commandline for settings. Initialize database and web application. Start serving via kuha_common.server.serve(). Exit on exceptions propagated at this level.

Returns:exit code, 1 on error, 0 on success.
Return type:int

handlers.py

Define handlers for responding to HTTP-requests.

class kuha_document_store.handlers.BaseHandler(*args, **kwargs)[source]

BaseHandler to derive from.

Provides common methods for subclasses.

Note:use from a subclass
prepare()[source]

Prepare for each request.

Set output content type.

get_db()[source]

Get database object stored in settings.

Returns:database object.
Return type:kuha_document_store.database.DocumentStoreDatabase
assert_body_not_empty(msg=None)[source]

Assert that request body contains data.

kuha_common.server.BadRequest is raised if body is empty.

Parameters:msg (str) – Optional message for exception.
Raises:kuha_common.server.BadRequest if body is empty.
class kuha_document_store.handlers.RestApiHandler(*args, **kwargs)[source]

Handle requests to REST api.

get(collection, resource_id=None)[source]

HTTP-GET to REST api endpoint.

Respond with single record or multiple records, depending on whether resource_id is requested.

Note:

Results will be streamed.

Parameters:
  • collection (str) – type of the requested collection.
  • resource_id (str or None) – optional ID of the requested resource. If left out of request, will return all records of requested type.
Raises:

kuha_common.server.BadRequest if there are recoverable errors in database operation. The error message is passed to BadRequest. See: kuha_document_store.database.DocumentStoreDatabase.recoverable_errors

Raises:

kuha_common.server.ResourceNotFound if requested resource_id does not return results.

post(collection, resource_id=None)[source]

HTTP-POST to REST api endpoint.

Create new resource from data submitted in request body.

Parameters:
  • collection (str) – collection type to create.
  • resource_id (str or None) – accepted only for completeness in handler configuration. Submitting one results in kuha_common.server.BadRequest.
Raises:

kuha_common.server.BadRequest if request contains resource_id or if database operations raise recoverable errors. See: kuha_document_store.database.DocumentStoreDatabase.recoverable_errors

put(collection, resource_id=None)[source]

HTTP-PUT to REST api endpoint.

Replace existing resource with data in request body.

Parameters:
  • collection (str) – collection type to replace.
  • resource_id (str or None) – resource ID to replace. Optional only for completeness in handler configuration. Leaving it out results in kuha_common.server.BadRequest.
Raises:

kuha_common.server.BadRequest if requested endpoint does not contain resource_id or if database operation raises one of kuha_document_store.database.DocumentStoreDatabase.recoverable_errors

Raises:

kuha_common.server.ResourceNotFound if resource_id returns no results.

delete(collection, resource_id=None)[source]

HTTP-DELETE to REST api endpoint.

Delete resource or all resources of certain type.

Parameters:
  • collection (str) – type of collection
  • resource_id (str or None) – resource ID to delete.
Raises:

kuha_common.server.BadRequest if database operation raises one of kuha_document_store.database.DocumentStoreDatabase.recoverable_errors

Raises:

kuha_common.server.ResourceNotFound if resource_id returns no results.

class kuha_document_store.handlers.ImportHandler(*args, **kwargs)[source]

Handle request to import endpoint.

prepare()[source]

Prepare for each request.

All requests must define content type for XML. All requests must contain body data.

post(importer_id, collection=None)[source]

HTTP-POST to import endpoint.

Lookup correct importer. Load iterative parser. Pass iterative parser to database for processing.

Parameters:
  • importer_id (str) – importer to use for importing.
  • collection (str or None) – Optional parameter that limits the import to a specific collection (resource type).
class kuha_document_store.handlers.QueryHandler(*args, **kwargs)[source]

Handle request to query endpoint.

Note:Results will be streamed.
prepare()[source]

Prepare for each request.

Request content type must be JSON. Request body must not be empty. Requested query type must be supported and query must have valid parameters.

post(collection)[source]

HTTP-POST to query endpoint.

Streams the results one JSON document at a time. Thus, the body of a response containing multiple records will not be a valid JSON document.

Note:Body must be a JSON object.
Parameters:collection (str) – collection (resource type) to query.
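
Because a response with multiple records is a sequence of JSON documents rather than one document, a client cannot pass the whole body to json.loads(). The following is a minimal client-side sketch, assuming the documents arrive concatenated and optionally separated by whitespace (the exact separator is not stated here). The function name iter_json_documents and the sample body are hypothetical.

```python
import json


def iter_json_documents(body):
    """Yield each JSON document found in a string of concatenated documents."""
    decoder = json.JSONDecoder()
    position = 0
    length = len(body)
    while position < length:
        # Skip whitespace between documents; raw_decode does not accept it.
        while position < length and body[position].isspace():
            position += 1
        if position >= length:
            break
        # raw_decode returns the decoded object and the index where it ended.
        document, position = decoder.raw_decode(body, position)
        yield document


# Hypothetical response body holding two records.
sample_body = '{"study_number": "1"}{"study_number": "2"}\n'
records = list(iter_json_documents(sample_body))
```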

database.py

Database module provides access to MongoDB database.

MongoDB Database is accessed through this module. The module also provides convenience methods for easy access and manipulation via Document Store records defined in kuha_common.document_store.records

Database can be used directly, via records or with JSON representation of records.

Note:This module has a strict dependency on kuha_common.document_store.records
kuha_document_store.database.mongodburi(host_port, *hosts_ports, database=None, credentials=None, options=None)[source]

Create and return a mongodb connection string in the form of a MongoURI.

The standard URI connection scheme has the form: mongodb://[username:password@]host1[:port1][,…hostN[:portN]]][/[database][?options]]

Parameters:
  • host_port (str) – One or more host and port of a mongod instance.
  • database (str) – Optional database.
  • credentials (tuple) – Optional credentials (user, pwd).
  • options (list) – Optional options as a list of tuples [(opt_key1, opt_val1), (opt_key2, opt_val2)]
Returns:

MongoURI connection string.

Return type:

str
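
How the parts of the connection string fit together can be shown with a short sketch. It is an independent illustration, not the source of mongodburi(): the function name build_mongodb_uri, the "host:port" string format of the host arguments, the percent-encoding of credentials and the example values are all assumptions.

```python
from urllib.parse import quote_plus, urlencode


def build_mongodb_uri(host_port, *hosts_ports, database=None,
                      credentials=None, options=None):
    """Assemble a MongoDB connection string from its parts."""
    uri = 'mongodb://'
    if credentials:
        user, pwd = credentials
        # Reserved characters in credentials must be percent-encoded.
        uri += '{}:{}@'.format(quote_plus(user), quote_plus(pwd))
    # Hosts are assumed to be given as 'host:port' strings.
    uri += ','.join((host_port,) + hosts_ports)
    if database or options:
        uri += '/'
    if database:
        uri += database
    if options:
        # options is a list of (key, value) tuples.
        uri += '?' + urlencode(options)
    return uri


# Hypothetical hosts, database name, credentials and option.
example_uri = build_mongodb_uri(
    'localhost:27017', 'localhost:27018',
    database='docstore',
    credentials=('reader', 'p@ss'),
    options=[('replicaSet', 'rs0')])
```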

class kuha_document_store.database.RecordsCollection(record_class, indexes_unique=None, indexes=None, validators=None)[source]

Database collection.

Note:Relational Database term table is called a collection in MongoDB.

Contains properties for Document Store collections. Has strict dependency to kuha_common.document_store.records

Parameters:
Returns:

RecordsCollection

isodate_fields = ['_metadata.created', '_metadata.updated']

List common isodate fields

object_id_fields = ['_id']

Fields containing MongoDB ObjectIDs

index_updated = [('_metadata.updated', -1)]

Declare updated field as index.

classmethod bson_to_json(_dict)[source]

Encode BSON dictionary to JSON.

Encodes special type of dictionary that comes from MongoDB queries to JSON representation. Also converts datetimes to strings.

Parameters:_dict (dict) – Source object containing BSON.
Returns:Source object converted to JSON.
Return type:str
get_validator()[source]

Get defined database-level validators.

Note:All validators are combined with AND operator.
Returns:Database level validators to be used on DB setup.
Return type:dict
process_json_for_upsert(json_document, old_metadata=None)[source]

Preprocess JSON for insert/update operations.

Decodes JSON to Python dictionary. Validates the result. Creates metadata for the document if the document has none, otherwise uses the submitted metadata. Decodes submitted metadata datestamps to datetime objects.

Parameters:
  • json_document (str) – JSON representation of a record.
  • old_metadata (dict or None) – old metadata if updating existing record.
Returns:

Document ready to be submitted to database.

Return type:

dict
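
The preprocessing steps described above can be sketched roughly as follows. This is not the method's actual code: validation and the old_metadata parameter are left out, and the function name prepare_for_upsert, the now parameter and the datestamp format are assumptions. Only the _metadata.created and _metadata.updated field names come from isodate_fields.

```python
import json
from datetime import datetime, timezone

# Assumed datestamp format; the real format is not specified here.
DATESTAMP_FORMAT = '%Y-%m-%dT%H:%M:%SZ'


def prepare_for_upsert(json_document, now=None):
    """Decode JSON and make sure the result carries datetime metadata."""
    document = json.loads(json_document)
    metadata = document.get('_metadata')
    if metadata is None:
        # No metadata submitted: create it.
        now = now or datetime.now(timezone.utc)
        document['_metadata'] = {'created': now, 'updated': now}
    else:
        # Metadata submitted: keep it, but decode datestamps to datetimes.
        for key in ('created', 'updated'):
            if key in metadata:
                parsed = datetime.strptime(metadata[key], DATESTAMP_FORMAT)
                metadata[key] = parsed.replace(tzinfo=timezone.utc)
    return document
```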

kuha_document_store.database.RECORD_COLLECTIONS = [<kuha_document_store.database.RecordsCollection object>, <kuha_document_store.database.RecordsCollection object>, <kuha_document_store.database.RecordsCollection object>, <kuha_document_store.database.RecordsCollection object>]

Define Record Collections

class kuha_document_store.database.Database(settings)[source]

MongoDB database.

Provides access to low-level database operations. For fine access control uses two database credentials, one for read-only operations, one for write operations. Chooses the correct credentials to authenticate based on the operation to be performed.

Note:Does not authenticate or connect to the database before actually performing operations that need a connection. Therefore connection/authentication issues will be raised when performing operations and not when initiating the database.
Parameters:settings (argparse.Namespace) – settings for database connections
Returns:Database
close()[source]

Close open sockets to database.

query_single(collection_name, query, fields=None, callback=None)[source]

Query for a single database document.

Parameters:
  • collection_name (str) – Name of database collection.
  • query (dict) – Database query.
  • fields (list or None) – Fields to select. None selects all.
  • callback (function or None) – Result callback. Called with result as parameter. If None this method will return the result.
Returns:

A single document, or None if no matching document is found or if callback is given.

Return type:

dict or None

query_multiple(collection_name, query, callback, fields=None, skip=0, sort_by=None, sort_order=1, limit=0)[source]

Query for multiple database documents.

Note:

has mandatory callback parameter.

Parameters:
  • collection_name (str) – Name of database collection.
  • query (dict) – Database query.
  • callback (Function that receives single record result as argument.) – Result callback. Called with each document as parameter.
  • fields (list or None) – Fields to select. None selects all.
  • skip (int) – Skip documents from the beginning of query.
  • sort_by (str) – Sort by field.
  • sort_order (int) – Sort by ascending or descending order. MongoDB uses 1 to sort ascending, -1 to sort descending.
  • limit (int) – Limit the number of returning documents. 0 returns all documents.
query_distinct(collection_name, fieldname, filter_=None)[source]

Query for distinct values in collection field.

Parameters:
  • collection_name (str) – Name of database collection.
  • fieldname (str) – Field to query for distinct values.
  • filter_ (dict or None) – Optional filter to use with query.
Returns:

distinct values.

Return type:

list

count(collection_name, filter_=None)[source]

Query for document count.

Parameters:
  • collection_name (str) – Name of database collection.
  • filter_ (dict or None) – Optional filter to use for query.
Returns:

Count of documents.

Return type:

int

insert(collection_name, document)[source]

Insert single document to database.

Parameters:
  • collection_name (str) – Name of database collection.
  • document (dict) – Document to insert.
Returns:

Insert result

Return type:

pymongo.results.InsertOneResult

replace(collection_name, oid, document)[source]

Replace single document in database.

Parameters:
  • collection_name (str) – Name of database collection.
  • oid (str) – MongoDB object ID as string.
  • document (dict) – Document to store.
Returns:

Update result.

Return type:

pymongo.results.UpdateResult

insert_or_replace(collection_name, query, document)[source]

Insert or replace a single document in database.

Uses special MongoDB method which will replace an existing document if one is found via query. Otherwise it will insert a new document.

Parameters:
  • collection_name (str) – Name of database collection.
  • query (dict) – Database query.
  • document (dict) – Document to store.
Returns:

The document that was stored.

Return type:

dict

delete_one(collection_name, query)[source]

Delete single document.

Parameters:
  • collection_name (str) – Name of database collection.
  • query (dict) – Database query.
Returns:

Delete result

Return type:

pymongo.results.DeleteResult

delete_many(collection_name, query)[source]

Delete multiple documents.

Parameters:
  • collection_name (str) – Name of database collection.
  • query (dict) – Database query.
Returns:

Delete result

Return type:

pymongo.results.DeleteResult

class kuha_document_store.database.DocumentStoreDatabase(settings)[source]

Subclass of Database

Provides specialized methods extending the functionality of Database. Combines database operations with properties of RecordsCollection. Defines the exceptions after which the HTTP-response operation can continue.

recoverable_errors = (<class 'pymongo.errors.WriteError'>, <class 'json.decoder.JSONDecodeError'>, <class 'bson.errors.InvalidId'>, <class 'kuha_document_store.validation.RecordValidationError'>)

These are exceptions that may be raised in normal database operation, so they are not exceptions that should terminate the HTTP-response process. As such, the caller may want to catch these errors.
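
One way a caller might catch these errors is sketched below. The BadRequest class is a local stand-in for kuha_common.server.BadRequest, the tuple holds only a standard-library exception so that the example runs without pymongo or bson, and the function name decode_or_bad_request is hypothetical.

```python
import json


class BadRequest(Exception):
    """Local stand-in for kuha_common.server.BadRequest."""


# Stand-in tuple; the real one also lists pymongo, bson and validation errors.
RECOVERABLE_ERRORS = (json.JSONDecodeError,)


def decode_or_bad_request(body):
    """Decode a JSON body, turning recoverable errors into BadRequest."""
    try:
        return json.loads(body)
    except RECOVERABLE_ERRORS as exc:
        # Pass the original error message on to the client-facing exception.
        raise BadRequest(str(exc)) from exc
```

Any exception outside the tuple still propagates, which matches the idea that only the listed errors are part of normal operation.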

static json_decode(json_object)[source]

Helper method for converting HTTP input JSON to python dictionary.

Parameters:json_object (str) – json to convert.
Returns:JSON object converted to python dictionary.
Return type:dict
query_multiple(collection_name, query, callback, **kwargs)[source]

Query multiple documents with callback.

Converts resulting BSON to JSON. Calls callback with each resulting record JSON.

Parameters:
  • collection_name (str) – Name of database collection.
  • query (dict) – Database query.
  • callback (function) – Result callback. Called with each document as parameter.
  • **kwargs – additional keyword arguments passed to super method.
query_by_oid(collection_name, oid, callback, fields=None, not_found_exception=None)[source]

Query single record by ObjectID with callback.

Converts BSON result to JSON. Calls the callback with resulting JSON. If parameter for not_found_exception is given, will raise the exception if query ObjectID points to no known database object.

Parameters:
  • collection_name (str) – Name of database collection.
  • oid (str) – ObjectID to query for.
  • callback (function) – function to call with resulting JSON.
  • fields (list or None) – Fields to select. None selects all.
  • not_found_exception (Exception class.) – Raised if ObjectID not found.
query_distinct(collection_name, fieldname, filter_=None)[source]

Query for distinct values in collection field.

If fieldname points to a leaf node, returns a list of values. If it points to a branch node, returns a list of dictionaries.

If fieldname points to leaf node of isodate representations, or to branch node that contains isodates, converts datetimes to datestamps which are JSON serializable.

If ‘fieldname’ points to a leaf node containing MongoDB ObjectID values, cast those values to string.

Note:

Requires changes to logic if collection.object_id_fields should contain paths with multiple components, for example ‘some.path.with.id’. In that case distinct queries that point to branch nodes with OIDs will fail with Exception TypeError: ObjectId(’…’) is not JSON serializable.

Note:

Distinction will not work as expected on datestamp-fields that are stored as signed 64-bit integers with millisecond precision. The returned datestamps are not as precise since they have second precision.

Parameters:
  • collection_name (str) – Name of database collection.
  • fieldname (str) – Field to query for distinct values.
  • filter_ (dict or None) – Optional filter to use with query.
Returns:

distinct values from database

Return type:

list
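
The precision note can be made concrete with a short sketch: two stored values that differ only in their millisecond part are distinct in the database, yet they produce the same datestamp once formatted with second precision. The helper name to_datestamp and the datestamp format are assumptions.

```python
from datetime import datetime


def to_datestamp(value):
    """Format a datetime with second precision (the format is an assumption)."""
    return value.strftime('%Y-%m-%dT%H:%M:%SZ')


# Two distinct stored values that differ only in milliseconds.
stored = [
    datetime(2018, 1, 1, 12, 0, 0, 100000),
    datetime(2018, 1, 1, 12, 0, 0, 900000),
]
# Both values map to the same second-precision datestamp.
stamps = [to_datestamp(value) for value in stored]
```

A caller expecting unique datestamps from such a field would therefore see apparent duplicates in the returned list.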

insert_or_update_record(record)[source]

Insert or update database document by Document Store record.

Special method that takes a Document Store record instance as parameter and determines whether to insert or update the given record.

Makes a query to MongoDB to determine if the record is already in database. If there is a record, calls the record instance’s updates_record method to update the instance with values that are present in database but not in the submitted instance.

Afterwards calls insert_or_replace() with the record instance’s dictionary representation.

Parameters:record (kuha_common.document_store.records.Study or kuha_common.document_store.records.Variable or kuha_common.document_store.records.Question or kuha_common.document_store.records.StudyGroup) – Document Store record instance.
Returns:operation details: {‘operation’: ‘insert’|’update’, ‘id’: <ObjectID>, <records-unique-values>}
Return type:dict
bulk_insert_or_update_record(records)[source]

Run bulk insert/update operations for Document Store records.

Method that takes an iterable parameter yielding Document Store records. Then calls insert_or_update_record() with each record instance.

Parameters:records (iterable) – Document Store records.
Returns:list of the operation details returned by each insert_or_update_record() call.
Return type:list
insert_json(collection_name, json_object)[source]

Insert JSON-encoded document to Database.

Special method that takes a JSON object that is then inserted to database.

Parameters:
  • collection_name (str) – Name of database collection.
  • json_object (str) – JSON object representing collection document.
Returns:

Insert result.

Return type:

pymongo.results.InsertOneResult

replace_json(collection_name, oid, json_object, not_found_exception)[source]

Replace JSON-encoded document in Database.

Special method that replaces a document in database with document given as parameter json_object. The document to be replaced is queried by given oid.

This method also takes a not_found_exception as mandatory parameter. The exception is raised if a document with given oid cannot be found.

Note:

If the submitted JSON does not contain metadata for the document, the metadata gets calculated by RecordsCollection.process_json_for_upsert().

Parameters:
  • collection_name (str) – Name of database collection.
  • oid (str) – MongoDB object ID as string.
  • json_object (str) – JSON object representing collection document.
  • not_found_exception (Exception class.) – exception to raise if document is not found with oid
Returns:

Update result.

Return type:

pymongo.results.UpdateResult

delete_by_oid(collection_name, oid)[source]

Delete database document with ObjectID.

Parameters:
  • collection_name (str) – Name of database collection.
  • oid (str) – MongoDB object ID as string.
Returns:

Delete result

Return type:

pymongo.results.DeleteResult

validation.py

Simple validation for dictionary representation of document store records.

Note:This module has a strict dependency on kuha_common.document_store.records

Validate study record dictionary:

>>> from kuha_common.document_store.records import Study
>>> from kuha_document_store.validation import validate
>>> validate(Study.get_collection(), Study().export_dict(include_metadata=False))
Traceback (most recent call last):
[...]
    def validate(collection, document, raise_error=True, update=False):
kuha_document_store.validation.RecordValidationError: ('Validation of studies failed',
    {'study_number': ['null value not allowed']}
)
class kuha_document_store.validation.RecordValidator(*args, **kwargs)[source]

Subclass cerberus.Validator to customize validation.

JSON does not support sets. Therefore a rule to validate list items for uniqueness is needed.

For the sake of simplicity in raising and handling validation errors this class also overrides cerberus.Validator.validate().

validate(document, **kwargs)[source]

Override cerberus.Validator.validate()

Handle unvalidated _id-field here to simplify error message flow and enable validation messages.

If document is to be updated it is allowed to have an _id field. If document is being inserted it is an error to have an _id field.

Parameters:
  • document (dict) – Document to be validated.
  • **kwargs – keyword arguments passed to cerberus.Validator.validate(). Here it is only checked whether the keyword argument update is present and True.
Returns:

True if validation passes, False if not.

Return type:

bool
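
The _id rule described above can be sketched as a plain function. This illustrates the rule only and is not the validator's code: the function name check_id_field, the shape of the returned error dictionary and the error message are hypothetical.

```python
def check_id_field(document, update=False):
    """Return a dict of errors for the _id field; empty if acceptable."""
    errors = {}
    if '_id' in document and not update:
        # Hypothetical message; the real wording is not given here.
        errors['_id'] = ['not allowed when inserting a new document']
    return errors


# An update may carry an _id; an insert may not.
update_errors = check_id_field({'_id': 'abc', 'study_number': '1'}, update=True)
insert_errors = check_id_field({'_id': 'abc', 'study_number': '1'})
```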

exception kuha_document_store.validation.RecordValidationError(collection, validation_errors, msg=None)[source]

Raised on validation errors.

Parameters:
  • collection (str) – Collection that got validated.
  • validation_errors (dict) – Validation errors from cerberus.Validator.errors. These are stored in RecordValidationError.validation_errors for later processing.
  • msg (str) – Optional message.
Returns:

RecordValidationError

class kuha_document_store.validation.RecordValidationSchema(record_class, *args)[source]

Create validation schema from records in kuha_common.document_store.records to validate user-submitted data.

Schema items are built dynamically by consulting record’s field types.

  • For single value fields the type is string and null values are not accepted.
  • For localizable fields it is required to have a kuha_common.document_store.constants.REC_FIELDNAME_LANGUAGE attribute.
  • Field attributes are strings and they may be null.
  • Subfield values are strings and not nullable.
  • Fallback to string, not null.

Record’s metadata is accepted as input but not required.

Note:kuha_common.document_store.RecordBase._metadata and kuha_common.document_store.RecordBase._id are also validated at database level.
Seealso:kuha_document_store.database.RecordsCollection.get_validator()

Every dynamically built schema item may be overridden by a custom schema item given as a parameter for class constructor.

Parameters:
Returns:

RecordValidationSchema

get_schema()[source]

Get Schema.

Returns:Validation schema supported by cerberus
Return type:dict
kuha_document_store.validation.validate(collection, document, raise_error=True, update=False)[source]

Validate document against collection schema.

Parameters:
  • collection (str) – Collection the document belongs to.
  • document (dict) – Document to validate. Document is a dictionary representation of a document store record.
  • raise_error (bool) – Should a RecordValidationError be raised if validation fails.
  • update (bool) – Validate for an update/replace operation of an existing record?
Returns:

True if document passed validation, False if fails.

Return type:

bool

Raises:

RecordValidationError if raise_error is True and document fails validation.

db_setup.py

Script to help setup Document Store database.

Database administrator may use this script to setup MongoDB instance for usage with Document Store.

kuha_document_store.db_setup.setup_admin_user(admin_username, admin_password, db)[source]

Setup administrator credentials.

Note:

authentication must be disabled in MongoDB to use this operation.

Parameters:
  • admin_username (str) – administrator username.
  • admin_password (str) – administrator password.
  • db (pymongo.database.Database) – MongoDB database
Returns:

MongoDB database command response

kuha_document_store.db_setup.setup_users(settings, client)[source]

Setup database users for Document Store.

Parameters:
  • settings (argparse.Namespace) – Document Store settings.
  • client (pymongo.mongo_client.MongoClient) – MongoDB client
Returns:

list of MongoDB database command responses

kuha_document_store.db_setup.remove_users(settings, client)[source]

Remove Document Store users from database.

Parameters:
  • settings (argparse.Namespace) – Document Store settings.
  • client (pymongo.mongo_client.MongoClient) – MongoDB client
Returns:

list of MongoDB database command responses

kuha_document_store.db_setup.setup_database(settings, client)[source]

Create Document Store database.

Parameters:
  • settings (argparse.Namespace) – Document Store settings.
  • client (pymongo.mongo_client.MongoClient) – MongoDB client
Returns:

PyMongo Database object

kuha_document_store.db_setup.delete_database(settings, client)[source]

Delete Document Store database.

Parameters:
  • settings (argparse.Namespace) – Document Store settings.
  • client (pymongo.mongo_client.MongoClient) – MongoDB client
Returns:

None

kuha_document_store.db_setup.list_databases(settings, client)[source]

List (print) databases.

Note:

Database won’t show in the list before it has a collection.

Parameters:
  • settings (argparse.Namespace) – Document Store settings.
  • client (pymongo.mongo_client.MongoClient) – MongoDB client
Returns:

list of database names

kuha_document_store.db_setup.setup_collections(settings, client)[source]

Setup Document Store collections (tables).

Parameters:
  • settings (argparse.Namespace) – Document Store settings.
  • client (pymongo.mongo_client.MongoClient) – MongoDB client
Returns:

list of results

kuha_document_store.db_setup.delete_collections(settings, client)[source]

Delete Document Store collections (tables).

Parameters:
  • settings (argparse.Namespace) – Document Store settings.
  • client (pymongo.mongo_client.MongoClient) – MongoDB client
Returns:

list of drop_collection results

kuha_document_store.db_setup.list_collections(settings, client)[source]

List Document Store collections (tables).

Parameters:
  • settings (argparse.Namespace) – Document Store settings.
  • client (pymongo.mongo_client.MongoClient) – MongoDB client
Returns:

List of mongodb collections

kuha_document_store.db_setup.list_db_users(settings, client)[source]

List (print) database users.

Parameters:
  • settings (argparse.Namespace) – Document Store settings.
  • client (pymongo.mongo_client.MongoClient) – MongoDB client
Returns:

dictionary containing database users and their properties.

kuha_document_store.db_setup.OPERATIONS = {'delete_collections': <function delete_collections>, 'list_database_users': <function list_db_users>, 'setup_database': <function setup_database>, 'delete_database': <function delete_database>, 'list_collections': <function list_collections>, 'remove_users': <function remove_users>, 'setup_admin_user': <function setup_admin_user>, 'setup_users': <function setup_users>, 'setup_collections': <function setup_collections>, 'list_databases': <function list_databases>}

Supported operations.

kuha_document_store.db_setup.main()[source]

Script main entry point.

importers

Supported importers are defined in this package.

Declare importers here.

kuha_document_store.importers.importers = {'ddi_c': <bound method XMLParserBase.from_string of <class 'kuha_common.document_store.mappings.ddi.DDI25RecordParser'>>, 'ddi_31': <bound method XMLParserBase.from_string of <class 'kuha_common.document_store.mappings.ddi.DDI31RecordParser'>>, 'ddi_122_nesstar': <bound method XMLParserBase.from_string of <class 'kuha_common.document_store.mappings.ddi.DDI122RecordParser'>>}

Register importers here. {importer_id: importer_function} Importer_id must be unique within importers. Importer_function must accept XML body as string for first argument and Document Store collection as an optional second argument. The importer function must return a generator that will iteratively return populated Document Store record instances.
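
The importer contract described above can be illustrated with a minimal sketch. It is hypothetical throughout: the importer name example_xml, the XML layout and the use of plain dictionaries in place of populated Document Store record instances are assumptions made so that the example runs with the standard library alone.

```python
import xml.etree.ElementTree as ET


def example_importer(xml_body, collection=None):
    """Yield one dict per <study> element (stand-in for record instances)."""
    root = ET.fromstring(xml_body)
    # This sketch only knows one collection; anything else yields nothing.
    if collection not in (None, 'studies'):
        return
    for element in root.iter('study'):
        yield {'study_number': element.get('number')}


# Registry keyed by importer_id, following the {importer_id: importer_function} shape.
example_importers = {'example_xml': example_importer}

xml_body = '<studies><study number="1"/><study number="2"/></studies>'
records = list(example_importers['example_xml'](xml_body))
```

Because the importer is a generator function, records are produced one at a time, which lets the caller process a large import iteratively.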