kuha_document_store

Kuha Document Store application

Persist records into a MongoDB instance. Control records via HTTP API.

serve.py

Main entry point for starting Document Store server.

kuha_document_store.serve.get_app(api_version, **kw)[source]

Setup routes and return initialized Tornado web application.

Additional keyword arguments are passed to WebApplication.

Parameters:

api_version (str) – HTTP API version. Gets prepended to routes.

Returns:

Tornado web application.

Return type:

tornado.web.Application

kuha_document_store.serve.main()[source]

Application main function.

Parse commandline for settings. Initialize database and web application. Start serving via kuha_common.server.serve(). Exit on exceptions propagated at this level.

Returns:

exit code, 1 on error, 0 on success.

Return type:

int

handlers.py

Define handlers for responding to HTTP-requests.

class kuha_document_store.handlers.BaseHandler(*args, **kwargs)[source]

BaseHandler to derive from.

Provides common methods for subclasses.

Note:

Use from a subclass.

async prepare()[source]

Prepare for each request.

Set output content type.

get_db()[source]

Get database object stored in settings.

Returns:

database object.

Return type:

kuha_document_store.database.DocumentStoreDatabase

assert_body_not_empty(msg=None)[source]

Assert that request body contains data.

kuha_common.server.BadRequest is raised if body is empty.

Parameters:

msg (str) – Optional message for exception.

Raises:

kuha_common.server.BadRequest if body is empty.

class kuha_document_store.handlers.RestApiHandler(*args, **kwargs)[source]

Handle requests to REST api.

async get(collection, resource_id=None)[source]

HTTP-GET to REST api endpoint.

Respond with single record or multiple records, depending on whether resource_id is requested.

Note:

Results will be streamed.

Parameters:
  • collection (str) – type of the requested collection.

  • resource_id (str or None) – optional ID of the requested resource. If left out of request, will return all records of requested type.

Raises:

kuha_common.server.BadRequest if there are recoverable errors in database operation. The error message is passed to BadRequest. See: kuha_document_store.database.DocumentStoreDatabase.recoverable_errors

Raises:

kuha_common.server.ResourceNotFound if requested resource_id does not return results.

async post(collection, resource_id=None)[source]

HTTP-POST to REST api endpoint.

Create new resource from data submitted in request body.

Parameters:
  • collection (str) – collection type to create.

  • resource_id (str or None) – accepted for completeness in handler configuration. Submitting one raises kuha_common.server.BadRequest.

Raises:

kuha_common.server.BadRequest if request contains resource_id or if database operations raise recoverable errors. See: kuha_document_store.database.DocumentStoreDatabase.recoverable_errors

async put(collection, resource_id=None)[source]

HTTP-PUT to REST api endpoint.

Replace existing resource with data in request body.

Parameters:
  • collection (str) – collection type to replace.

  • resource_id (str or None) – resource ID to replace. Optional for completeness in handler configuration. Leaving it out raises kuha_common.server.BadRequest.

Raises:

kuha_common.server.BadRequest if requested endpoint does not contain resource_id or if database operation raises one of kuha_document_store.database.DocumentStoreDatabase.recoverable_errors

Raises:

kuha_common.server.ResourceNotFound if resource_id returns no results.

async delete(collection, resource_id=None)[source]

HTTP-DELETE to REST api endpoint.

Delete resource or all resources of certain type.

Parameters:
  • collection (str) – type of collection

  • resource_id (str or None) – resource ID to delete.

Raises:

kuha_common.server.BadRequest if database operation raises one of kuha_document_store.database.DocumentStoreDatabase.recoverable_errors

Raises:

kuha_common.server.ResourceNotFound if resource_id returns no results.

class kuha_document_store.handlers.QueryHandler(*args, **kwargs)[source]

Handle request to query endpoint.

Note:

Results will be streamed.

async prepare()[source]

Prepare for each request.

Request content type must be JSON. Request body must not be empty. Requested query type must be supported and query must have valid parameters.

async post(collection)[source]

HTTP-POST to query endpoint.

Streams the results one JSON document at a time. Thus, a response containing multiple records will not be a valid JSON document as a whole.

Note:

Body must be a JSON object.

Parameters:

collection (str) – collection (resource type) to query.
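Because a multi-record response is a sequence of JSON documents rather than one document, a client has to decode it piece by piece. The following is a minimal client-side sketch, not part of this package. It assumes that the documents arrive concatenated and possibly separated by whitespace; the function name iter_json_documents and the sample body are hypothetical.

```python
import json


def iter_json_documents(payload):
    """Yield each JSON document found in a string of concatenated documents."""
    decoder = json.JSONDecoder()
    position = 0
    length = len(payload)
    while position < length:
        # Skip whitespace between documents (an assumed separator).
        while position < length and payload[position].isspace():
            position += 1
        if position >= length:
            break
        # raw_decode returns the decoded object and the index where it ended.
        document, position = decoder.raw_decode(payload, position)
        yield document


# Hypothetical response body holding two streamed records.
body = '{"study_number": "1"}\n{"study_number": "2"}'
records = list(iter_json_documents(body))
```

Calling json.loads() on the whole body would fail here, which is why the sketch decodes one document at a time.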

database.py

Database module provides access to MongoDB database.

MongoDB Database is accessed through this module. The module also provides convenience methods for easy access and manipulation via Document Store records defined in kuha_common.document_store.records.

Database can be used directly, via records or with JSON representation of records.

Note:

This module has a strict dependency on kuha_common.document_store.records

class kuha_document_store.database.Collection(name, validators, indexes_unique, indexes, isodate_fields, object_id_fields)
indexes

Alias for field number 3

indexes_unique

Alias for field number 2

isodate_fields

Alias for field number 4

name

Alias for field number 0

object_id_fields

Alias for field number 5

validators

Alias for field number 1
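The "Alias for field number N" entries suggest that Collection is a named tuple. As an illustration only, the sketch below rebuilds an equivalent type with collections.namedtuple and shows that each attribute maps to the documented index. All field values are hypothetical placeholders.

```python
from collections import namedtuple

# Same field order as in the documented signature.
Collection = namedtuple(
    'Collection',
    ['name', 'validators', 'indexes_unique', 'indexes',
     'isodate_fields', 'object_id_fields'])

# Hypothetical values, chosen only to make the fields distinguishable.
studies = Collection(
    name='studies',
    validators={},
    indexes_unique=[('study_number', 1)],
    indexes=[],
    isodate_fields=['_metadata.updated'],
    object_id_fields=['_id'])

# Attribute access and index access refer to the same value.
name_matches = studies[0] == studies.name
object_id_matches = studies[5] == studies.object_id_fields
```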

kuha_document_store.database.mongodburi(host_port, *hosts_ports, database=None, credentials=None, options=None)[source]

Create and return a mongodb connection string in the form of a MongoURI.

The standard URI connection scheme has the form: mongodb://[username:password@]host1[:port1][,…hostN[:portN]]][/[database][?options]]

Parameters:
  • host_port (str) – One or more hosts and ports of a mongod instance.

  • database (str) – Optional database.

  • credentials (tuple) – Optional credentials (user, pwd).

  • options (list) – Optional options as a list of tuples [(opt_key1, opt_val1), (opt_key2, opt_val2)]

Returns:

MongoURI connection string.

Return type:

str
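To make the URI scheme concrete, here is a minimal sketch of how such a connection string could be assembled. It is not the implementation of mongodburi(); the function name build_mongodb_uri is hypothetical, and percent-encoding the credentials is an assumption made here so that characters like '@' do not break the URI.

```python
from urllib.parse import quote_plus, urlencode


def build_mongodb_uri(host_port, *hosts_ports, database=None,
                      credentials=None, options=None):
    """Assemble a MongoDB connection string from its parts."""
    uri = 'mongodb://'
    if credentials:
        user, pwd = credentials
        # Assumption: percent-encode so '@' and ':' stay unambiguous.
        uri += '{}:{}@'.format(quote_plus(user), quote_plus(pwd))
    # Hosts are joined with commas, as in the scheme above.
    uri += ','.join((host_port,) + hosts_ports)
    if database or options:
        uri += '/'
    if database:
        uri += database
    if options:
        uri += '?' + urlencode(options)
    return uri


# Hypothetical hosts, database name, credentials and option.
uri = build_mongodb_uri('localhost:27017', 'localhost:27018',
                        database='kuha', credentials=('reader', 'p@ss'),
                        options=[('replicaSet', 'rs_kuha')])
```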

kuha_document_store.database.bson_to_json(collection, _dict)[source]

Encode BSON dictionary to JSON.

Encodes special type of dictionary that comes from MongoDB queries to JSON representation. Also converts datetimes to strings.

Parameters:

_dict (dict) – Source object containing BSON.

Returns:

Source object converted to JSON.

Return type:

str
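The datetime conversion can be pictured with the standard library alone. The sketch below is an assumption-laden illustration, not the module's code: it ignores the collection parameter and BSON-specific types, and both the function name encode_with_datetimes and the output format are hypothetical.

```python
import datetime
import json


def encode_with_datetimes(document):
    """Serialize a dictionary to JSON, turning datetimes into strings."""
    def default(value):
        if isinstance(value, datetime.datetime):
            # Assumed output format; the real module may use another one.
            return value.strftime('%Y-%m-%dT%H:%M:%SZ')
        raise TypeError('{!r} is not JSON serializable'.format(value))
    # json.dumps calls default() for every value it cannot encode itself.
    return json.dumps(document, default=default)


encoded = encode_with_datetimes(
    {'study_number': '1',
     'updated': datetime.datetime(2020, 1, 2, 3, 4, 5)})
```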

class kuha_document_store.database.Database(name, reader_uri, editor_uri)[source]

MongoDB database.

Provides access to low-level database operations. For fine-grained access control, uses two sets of database credentials: one for read-only operations and one for write operations. Chooses the correct credentials to authenticate based on the operation to be performed.

Note:

Does not authenticate or connect to the database before actually performing operations that need a connection. Therefore connection and authentication issues are raised when performing operations, not when instantiating the database.

Parameters:

settings (argparse.Namespace) – settings for database connections

Returns:

Database

close()[source]

Close open sockets to database.

async query_single(collection_name, query, fields=None, callback=None)[source]

Query for a single database document.

Parameters:
  • collection_name (str) – Name of database collection.

  • query (dict) – Database query.

  • fields (list or None) – Fields to select. None selects all.

  • callback (function or None) – Result callback. Called with result as parameter. If None this method will return the result.

Returns:

A single document, or None if no matching document is found or if callback is given.

Return type:

dict or None

async query_multiple(collection_name, query, callback, fields=None, skip=0, sort_by=None, sort_order=1, limit=0)[source]

Query for multiple database documents.

Note:

Has a mandatory callback parameter.

Parameters:
  • collection_name (str) – Name of database collection.

  • query (dict) – Database query filter.

  • callback (callable) – Result callback. Called with each document as parameter.

  • fields (list or None) – Fields to select. None selects all.

  • skip (int) – Skip documents from the beginning of query.

  • sort_by (str) – Sort by field.

  • sort_order (int) – Sort in ascending or descending order. MongoDB uses 1 to sort ascending, -1 to sort descending.

  • limit (int) – Limit the number of returning documents. 0 returns all documents.
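The interplay of the parameters above can be tried out without a database. The following in-memory stand-in is only a sketch: fake_query_multiple is a hypothetical function, it takes a list instead of a collection name and query, and it assumes that the callback is awaited once per document. It applies sort, then skip, then limit, which is the order MongoDB uses for cursors.

```python
import asyncio


async def fake_query_multiple(documents, callback, skip=0, sort_by=None,
                              sort_order=1, limit=0):
    """In-memory stand-in that mimics the documented parameters."""
    results = list(documents)
    if sort_by is not None:
        # 1 sorts ascending, -1 sorts descending.
        results.sort(key=lambda doc: doc[sort_by], reverse=sort_order == -1)
    results = results[skip:]
    if limit:
        # A limit of 0 means no limit.
        results = results[:limit]
    for doc in results:
        # Assumption: the callback is awaited once per document.
        await callback(doc)


async def collect():
    seen = []

    async def callback(doc):
        seen.append(doc['n'])

    docs = [{'n': 3}, {'n': 1}, {'n': 2}]
    await fake_query_multiple(docs, callback, skip=1, sort_by='n',
                              sort_order=-1, limit=0)
    return seen


collected = asyncio.run(collect())
```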

async query_distinct(collection_name, fieldname, filter_=None)[source]

Query for distinct values in collection field.

Parameters:
  • collection_name (str) – Name of database collection.

  • fieldname (str) – Field to query for distinct values.

  • filter_ (dict or None) – Optional filter to use with query.

Returns:

distinct values.

Return type:

list

async count(collection_name, filter_=None)[source]

Query for document count.

Parameters:
  • collection_name (str) – Name of database collection.

  • filter_ (dict or None) – Optional filter to use for query.

Returns:

Count of documents.

Return type:

int

async insert(collection_name, document)[source]

Insert single document to database.

Parameters:
  • collection_name (str) – Name of database collection.

  • document (dict) – Document to insert.

Returns:

Insert result

Return type:

pymongo.results.InsertOneResult

async replace(collection_name, oid, document)[source]

Replace single document in database.

Parameters:
  • collection_name (str) – Name of database collection.

  • oid (str) – MongoDB object ID as string.

  • document (dict) – Document to store.

Returns:

Update result.

Return type:

pymongo.results.UpdateResult

async insert_or_replace(collection_name, query, document)[source]

Insert or replace a single document in database.

Uses special MongoDB method which will replace an existing document if one is found via query. Otherwise it will insert a new document.

Parameters:
  • collection_name (str) – Name of database collection.

  • query (dict) – Database query.

  • document (dict) – Document to store.

Returns:

The document that was stored.

Return type:

dict

async update(collection_name, filter_, update_operations)[source]

Update documents in collection matching filter.

async delete(collection_name, filter_)[source]

Delete documents matching filter.

Parameters:
  • collection_name (str) – Name of database collection.

  • filter_ (dict) – Database query filter.

Returns:

Deleted count

Return type:

int

class kuha_document_store.database.DocumentStoreDatabase(collections, **kwargs)[source]

Subclass of Database

Provides specialized methods extending the functionality of Database. Combines database operations with properties of RecordsCollection. Defines the exceptions that may be raised without terminating the HTTP-response operation.

recoverable_errors = (<class 'pymongo.errors.WriteError'>, <class 'json.decoder.JSONDecodeError'>, <class 'bson.errors.InvalidId'>, <class 'kuha_document_store.validation.RecordValidationError'>)

These are exceptions that may be raised in normal database operation, so they are not exceptions that should terminate the HTTP-response process. As such, the caller may want to catch these errors.
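A caller might handle such a tuple of exception classes as sketched below. This is an illustration under stated assumptions: BadRequest is a local stand-in for kuha_common.server.BadRequest, RECOVERABLE_ERRORS contains only the standard-library member of the real tuple, and decode_or_bad_request is a hypothetical helper.

```python
import json


class BadRequest(Exception):
    """Hypothetical stand-in for kuha_common.server.BadRequest."""


# Stand-in tuple; the real one also lists pymongo, bson and
# validation errors.
RECOVERABLE_ERRORS = (json.JSONDecodeError,)


def decode_or_bad_request(body):
    """Decode a request body, mapping recoverable errors to BadRequest."""
    try:
        return json.loads(body)
    except RECOVERABLE_ERRORS as exc:
        # Pass the original error message on to the client-facing error.
        raise BadRequest(str(exc)) from exc
```

A tuple of classes can be given directly to an except clause, so the handler code does not need to name each exception separately.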

static json_decode(json_object)[source]

Helper method for converting HTTP input JSON to python dictionary.

Parameters:

json_object (str) – json to convert.

Returns:

JSON object converted to python dictionary.

Return type:

dict

async query_multiple(collection_name, query, callback, **kwargs)[source]

Query multiple documents with callback.

Converts resulting BSON to JSON. Calls callback with each resulting record JSON.

Parameters:
  • collection_name (str) – Name of database collection.

  • query (dict) – Database query.

  • callback (function) – Result callback. Called with each document as parameter.

  • **kwargs – additional keyword arguments passed to super method.

async query_by_oid(collection_name, oid, callback, fields=None, not_found_exception=None)[source]

Query single record by ObjectID with callback.

Converts BSON result to JSON. Calls the callback with resulting JSON. If parameter for not_found_exception is given, will raise the exception if query ObjectID points to no known database object.

Parameters:
  • collection_name (str) – Name of database collection.

  • oid (str) – ObjectID to query for.

  • callback (function) – function to call with resulting JSON.

  • fields (list or None) – Fields to select. None selects all.

  • not_found_exception (Exception class) – Raised if ObjectID not found.

async query_distinct(collection_name, fieldname, filter_=None)[source]

Query for distinct values in collection field.

If fieldname points to a leaf node, returns a list of values. If it points to a branch node, returns a list of dictionaries.

If fieldname points to leaf node of isodate representations, or to branch node that contains isodates, converts datetimes to datestamps which are JSON serializable.

If ‘fieldname’ points to a leaf node containing MongoDB ObjectID values, cast those values to string.

Note:

Requires changes to logic if collection.object_id_fields should contain paths with multiple components, for example ‘some.path.with.id’. In that case distinct queries that point to branch nodes with OIDs will fail with Exception TypeError: ObjectId(’…’) is not JSON serializable.

Note:

Distinction will not work as expected on datestamp-fields that are stored as signed 64-bit integers with millisecond precision. The returned datestamps are not as precise since they have second precision.

Parameters:
  • collection_name (str) – Name of database collection.

  • fieldname (str) – Field to query for distinct values.

  • filter_ (dict or None) – Optional filter to use with query.

Returns:

distinct values from database

Return type:

list
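The precision note above can be demonstrated with plain datetimes. In this sketch the function to_datestamp and its output format are assumptions; the point is only that two values which differ within the same second become indistinguishable once they are formatted with second precision.

```python
import datetime


def to_datestamp(value):
    """Format a datetime with second precision (assumed format)."""
    return value.strftime('%Y-%m-%dT%H:%M:%SZ')


# Two timestamps that differ only in their sub-second part.
first = datetime.datetime(2020, 1, 2, 3, 4, 5, 100000)
second = datetime.datetime(2020, 1, 2, 3, 4, 5, 900000)

# The stored values are distinct, the formatted datestamps are not.
stamps = [to_datestamp(first), to_datestamp(second)]
```

A distinct query on such a field can therefore return what looks like the same datestamp more than once.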

async insert_json(collection_name, json_object)[source]

Insert JSON-encoded document to Database.

Special method that takes a JSON object that is then inserted to database.

Parameters:
  • collection_name (str) – Name of database collection.

  • json_object (str) – JSON object representing collection document.

Returns:

Insert result.

Return type:

pymongo.results.InsertOneResult

async replace_json(collection_name, oid, json_object, not_found_exception)[source]

Replace JSON-encoded document in Database.

Special method that replaces a document in database with document given as parameter json_object. The document to be replaced is queried by given oid.

This method also takes a not_found_exception as mandatory parameter. The exception is raised if a document with given oid cannot be found.

Note:

If the submitted JSON does not contain metadata for the document, the metadata gets calculated by RecordsCollection.process_json_for_upsert()

Parameters:
  • collection_name (str) – Name of database collection.

  • oid (str) – MongoDB object ID as string.

  • json_object (str) – JSON object representing collection document.

  • not_found_exception (Exception class) – exception to raise if document is not found with oid

Returns:

Update result.

Return type:

pymongo.results.UpdateResult

async delete_records(collection_name, oid=None, hard_delete=False)[source]

Delete database documents.

Parameters:
  • collection_name (str) – Name of database collection.

  • oid (str) – MongoDB object ID as string.

  • hard_delete (bool) – True to physically delete document. False to logically mark the document deleted.

Returns:

Affected records’ count

Return type:

int
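The difference between physical and logical deletion is sketched below on a plain dictionary. Nothing here reflects the actual storage layout: delete_records_sketch is a hypothetical function, and the 'deleted' flag is an invented marker, since how the real module marks a logically deleted document is not described here.

```python
def delete_records_sketch(store, oid=None, hard_delete=False):
    """Delete one record, or all records when oid is None.

    Returns the number of affected records.
    """
    targets = [key for key in store if oid is None or key == oid]
    for key in targets:
        if hard_delete:
            # Physical deletion: the record is removed from the store.
            del store[key]
        else:
            # Logical deletion: the record stays but is flagged.
            # 'deleted' is a hypothetical marker, not the real field.
            store[key]['deleted'] = True
    return len(targets)


store = {'a': {'study_number': '1'}, 'b': {'study_number': '2'}}
soft_count = delete_records_sketch(store, oid='a')
hard_count = delete_records_sketch(store, oid='b', hard_delete=True)
```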

kuha_document_store.database.db_from_settings(settings)[source]

Instantiate DocumentStoreDatabase from loaded settings

Parameters:

settings (argparse.Namespace) – loaded settings

Returns:

Instance of DocumentStoreDatabase

Return type:

DocumentStoreDatabase

kuha_document_store.database.add_cli_args(parser)[source]

Add database configuration values to be parsed.

validation.py

Simple validation for dictionary representation of document store records.

Validate study record dictionary:

>>> from kuha_common.document_store.records import Study
>>> from kuha_document_store.validation import validate
>>> validate(Study.get_collection(), Study().export_dict(include_metadata=False))
Traceback (most recent call last):
[...]
    def validate(collection, document, raise_error=True, update=False):
kuha_document_store.validation.RecordValidationError: ('Validation of studies failed',
    {'study_number': ['null value not allowed']}
)

class kuha_document_store.validation.RecordValidator(*args, **kwargs)[source]

Subclass cerberus.Validator to customize validation.

JSON does not support sets. Therefore a rule to validate list items for uniqueness is needed.

For the sake of simplicity in raising and handling validation errors this class also overrides cerberus.Validator.validate().
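What a uniqueness check on list items involves can be sketched without cerberus. The helper below is hypothetical and is not the rule implemented by RecordValidator. It compares items by their canonical JSON form, an approach chosen here because list items decoded from JSON may be dictionaries, which cannot be placed in a Python set directly.

```python
import json


def has_unique_items(items):
    """Return True if no two list items are equal as JSON values."""
    seen = set()
    for item in items:
        # sort_keys makes dictionaries with equal content compare equal
        # regardless of key order.
        key = json.dumps(item, sort_keys=True)
        if key in seen:
            return False
        seen.add(key)
    return True
```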

validate(document, **kwargs)[source]

Override cerberus.Validator.validate()

Handle unvalidated _id-field here to simplify error message flow and enable validation messages.

If document is to be updated it is allowed to have an _id field. If document is being inserted it is an error to have an _id field.

Parameters:
  • document (dict) – Document to be validated.

  • **kwargs – keyword arguments passed to cerberus.Validator.validate(). Here it is only checked if keyword argument update is present and True.

Returns:

True if validation passes, False if not.

Return type:

bool
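The _id rule described above reduces to a small check. The sketch below is not the overridden validate() method; check_id_field and the error message text are hypothetical, and the error dictionary merely imitates the field-to-messages shape shown in the traceback example earlier in this section.

```python
def check_id_field(document, update=False):
    """Return an error dictionary for a misplaced _id field.

    An empty dictionary means the _id rule is satisfied.
    """
    if '_id' in document and not update:
        # Inserting a document that already carries an _id is an error.
        return {'_id': ['not allowed when inserting']}
    # Updates may carry an _id, and documents without one always pass.
    return {}


insert_errors = check_id_field({'_id': 'abc', 'study_number': '1'})
update_errors = check_id_field({'_id': 'abc', 'study_number': '1'},
                               update=True)
```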

exception kuha_document_store.validation.RecordValidationError(collection, validation_errors, msg=None)[source]

Raised on validation errors.

Parameters:
  • collection (str) – Collection that got validated.

  • validation_errors (dict) – Validation errors from cerberus.Validator.errors. These are stored in RecordValidationError.validation_errors for later processing.

  • msg (str) – Optional message.

Returns:

RecordValidationError

class kuha_document_store.validation.RecordValidationSchema(record_class, base_schema, *args)[source]

Create validation schema from records in kuha_common.document_store.records to validate user-submitted data.

Schema items are built dynamically by consulting record’s field types.

  • For single value fields the type is string and null values are not accepted.

  • For localizable fields it is required to have a kuha_common.document_store.constants.REC_FIELDNAME_LANGUAGE attribute.

  • Field attributes are strings and they may be null.

  • Subfield values are strings and not nullable.

  • Fallback to string, not null.

Record’s metadata is accepted as input but not required.

Note:

kuha_common.document_store.RecordBase._metadata and kuha_common.document_store.RecordBase._id are also validated at database level.

Seealso:

kuha_document_store.database.RecordsCollection.get_validator()

Every dynamically built schema item may be overridden by a custom schema item given as a parameter to the class constructor.

Parameters:
Returns:

RecordValidationSchema

get_schema()[source]

Get Schema.

Returns:

Validation schema supported by cerberus

Return type:

dict

kuha_document_store.validation.validate(rec_val_schema, document, raise_error=True, update=False)[source]

Validate document against collection schema.

Parameters:
  • rec_val_schema (RecordValidationSchema) – Record validation schema to validate against.

  • document (dict) – Document to validate. Document is a dictionary representation of a document store record.

  • raise_error (bool) – Should a RecordValidationError be raised if validation fails.

  • update (bool) – Validate for an update/replace operation of an existing record?

Returns:

True if document passed validation, False if fails.

Return type:

bool

Raises:

RecordValidationError if raise_error is True and document fails validation.

dbadmin

dbadmin/operations.py

Admin operations to setup and manage Kuha Document Store database.

Initial setup:

python -m kuha_document_store.dbadmin initiate_replicaset setup_database setup_collections setup_users

Setup empty database into an existing replicaset:

python -m kuha_document_store.dbadmin setup_database setup_collections setup_users

kuha_document_store.dbadmin.operations.OperationsSetup

alias of client

kuha_document_store.dbadmin.operations.DEFAULT_ARBITER_INDEX = -1

Set to -1 to make comparison with int simpler.