kuha_document_store

Kuha Document Store application

Persist records into a MongoDB instance and control them via an HTTP API.

serve.py

Main entry point for starting the Document Store server.

kuha_document_store.serve.get_app(api_version, **kw)

Set up routes and return an initialized Tornado web application.

Additional keyword arguments are passed to WebApplication.

Parameters

api_version (str) – HTTP API version, prepended to the routes.

Returns

Tornado web application.

Return type

tornado.web.Application

kuha_document_store.serve.main()

Application main function.

Parse the command line for settings. Initialize the database and the web application. Start serving via kuha_common.server.serve(). Exit on exceptions that propagate to this level.

Returns

Exit code: 1 on error, 0 on success.

Return type

int
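A minimal sketch of the exit-code contract described above. The helpers parse_settings() and serve() are hypothetical stand-ins for the real settings parsing, database initialization and kuha_common.server.serve(); this illustrates the pattern, not the actual main().

```python
import sys


class SettingsError(Exception):
    """Hypothetical error raised while parsing settings."""


def parse_settings(argv):
    # Hypothetical stand-in for parsing the command line for settings.
    if "--fail" in argv:
        raise SettingsError("invalid settings")
    return {"argv": list(argv)}


def serve(settings):
    # Hypothetical stand-in for initializing the database and the web
    # application and serving requests until shutdown.
    return None


def main(argv=None):
    """Return 0 on success and 1 if an exception propagates to this level."""
    argv = sys.argv[1:] if argv is None else argv
    try:
        settings = parse_settings(argv)
        serve(settings)
    except Exception as exc:
        print("error: {}".format(exc), file=sys.stderr)
        return 1
    return 0
```

A console script would typically call sys.exit(main()) so that the returned code becomes the process exit status.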

handlers.py

Define handlers for responding to HTTP-requests.

class kuha_document_store.handlers.BaseHandler(*args, **kwargs)

Base handler to derive from.

Provides common methods for subclasses.

Note

Use from a subclass.

async prepare()

Prepare for each request.

Set output content type.

get_db()

Get database object stored in settings.

Returns

database object.

Return type

kuha_document_store.database.DocumentStoreDatabase

assert_body_not_empty(msg=None)

Assert that request body contains data.

kuha_common.server.BadRequest is raised if body is empty.

Parameters

msg (str) – Optional message for exception.

Raises

kuha_common.server.BadRequest if body is empty.

class kuha_document_store.handlers.RestApiHandler(*args, **kwargs)

Handle requests to the REST API.

async get(collection, resource_id=None)

HTTP-GET to REST API endpoint.

Respond with a single record or multiple records, depending on whether resource_id is given.

Note

Results will be streamed.

Parameters
  • collection (str) – type of the requested collection.

  • resource_id (str or None) – Optional ID of the requested resource. If omitted from the request, all records of the requested type are returned.

Raises

kuha_common.server.BadRequest if there are recoverable errors in database operation. The error message is passed to BadRequest. See: kuha_document_store.database.DocumentStoreDatabase.recoverable_errors

Raises

kuha_common.server.ResourceNotFound if requested resource_id does not return results.

async post(collection, resource_id=None)

HTTP-POST to REST API endpoint.

Create new resource from data submitted in request body.

Parameters
  • collection (str) – collection type to create.

  • resource_id (str or None) – Accepted only for completeness in the handler configuration; submitting one raises kuha_common.server.BadRequest.

Raises

kuha_common.server.BadRequest if request contains resource_id or if database operations raise recoverable errors. See: kuha_document_store.database.DocumentStoreDatabase.recoverable_errors

async put(collection, resource_id=None)

HTTP-PUT to REST API endpoint.

Replace existing resource with data in request body.

Parameters
  • collection (str) – collection type to replace.

  • resource_id (str or None) – Resource ID to replace. Optional only for completeness in the handler configuration; omitting it raises kuha_common.server.BadRequest.

Raises

kuha_common.server.BadRequest if requested endpoint does not contain resource_id or if database operation raises one of kuha_document_store.database.DocumentStoreDatabase.recoverable_errors

Raises

kuha_common.server.ResourceNotFound if resource_id returns no results.

async delete(collection, resource_id=None)

HTTP-DELETE to REST API endpoint.

Delete a resource or all resources of a certain type.

Parameters
  • collection (str) – type of collection

  • resource_id (str or None) – resource ID to delete.

Raises

kuha_common.server.BadRequest if database operation raises one of kuha_document_store.database.DocumentStoreDatabase.recoverable_errors

Raises

kuha_common.server.ResourceNotFound if resource_id returns no results.

class kuha_document_store.handlers.QueryHandler(*args, **kwargs)

Handle requests to the query endpoint.

Note

Results will be streamed.

async prepare()

Prepare for each request.

Request content type must be JSON. Request body must not be empty. Requested query type must be supported and query must have valid parameters.

async post(collection)

HTTP-POST to query endpoint.

Streams the results one JSON document at a time. Thus, a response containing multiple records will not be a valid JSON document.

Note

Body must be a JSON object.

Parameters

collection (str) – collection (resource type) to query.
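Because a multi-record response is a concatenation of JSON documents, a client cannot pass it to a single json.loads() call. The sketch below splits such a body with the standard library's json.JSONDecoder.raw_decode(). It assumes the whole body has already been read and that documents are separated by nothing or by whitespace (the exact separator is not stated here); the function name iter_json_documents and the sample body are hypothetical.

```python
import json


def iter_json_documents(stream_text):
    """Yield each JSON document from a concatenation of JSON documents."""
    decoder = json.JSONDecoder()
    index = 0
    length = len(stream_text)
    while True:
        # raw_decode() does not skip leading whitespace, so do it here.
        while index < length and stream_text[index].isspace():
            index += 1
        if index >= length:
            return
        # Decode one document and continue from where it ended.
        document, index = decoder.raw_decode(stream_text, index)
        yield document


body = '{"study_number": "1"}\n{"study_number": "2"}'
studies = list(iter_json_documents(body))
```

When reading a live stream in chunks, a chunk may end in the middle of a document, so a real client would buffer until raw_decode() succeeds.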

database.py

Database module provides access to MongoDB database.

MongoDB Database is accessed through this module. The module also provides convenience methods for easy access and manipulation via Document Store records defined in kuha_common.document_store.records.

Database can be used directly, via records or with JSON representation of records.

Note

This module has a strict dependency on kuha_common.document_store.records

class kuha_document_store.database.Collection(name, validators, indexes_unique, indexes, isodate_fields, object_id_fields)

indexes

Alias for field number 3

indexes_unique

Alias for field number 2

isodate_fields

Alias for field number 4

name

Alias for field number 0

object_id_fields

Alias for field number 5

validators

Alias for field number 1
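Collection is a named tuple, so each "Alias for field number N" entry above is the positional index of that field in the signature order. The sketch below re-creates an equivalent named tuple for illustration only; the field values (including the "studies" name, taken from the validation example in this document) are placeholders, not Kuha's actual collection definitions.

```python
from collections import namedtuple

# Same field order as the documented signature, so positional indexes
# match the "Alias for field number N" entries.
Collection = namedtuple(
    "Collection",
    ["name", "validators", "indexes_unique", "indexes",
     "isodate_fields", "object_id_fields"],
)

# Placeholder values for illustration only.
studies = Collection(
    name="studies",
    validators=None,
    indexes_unique=[],
    indexes=[],
    isodate_fields=[],
    object_id_fields=["_id"],
)
```

Fields can then be read either by name (studies.indexes) or by position (studies[3]).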

kuha_document_store.database.mongodburi(host_port, *hosts_ports, database=None, credentials=None, options=None)

Create and return a mongodb connection string in the form of a MongoURI.

The standard URI connection scheme has the form: mongodb://[username:password@]host1[:port1][,…hostN[:portN]][/[database][?options]]

Parameters
  • host_port (str) – Host and port of a mongod instance. Additional hosts and ports may be given as further positional arguments (hosts_ports).

  • database (str) – Optional database.

  • credentials (tuple) – Optional credentials (user, pwd).

  • options (list) – Optional options as a list of tuples [(opt_key1, opt_val1), (opt_key2, opt_val2)]

Returns

MongoURI connection string.

Return type

str
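A minimal sketch of how the documented parameters map onto the connection-string form above. It percent-escapes the username and password with urllib.parse.quote_plus, because MongoDB requires reserved characters in credentials to be escaped; whether the Kuha function does the same is not stated here. The host names, database name and option values in the usage are hypothetical.

```python
from urllib.parse import quote_plus, urlencode


def mongodburi(host_port, *hosts_ports, database=None, credentials=None,
               options=None):
    """Build a mongodb:// connection string (illustrative sketch)."""
    uri = "mongodb://"
    if credentials is not None:
        user, pwd = credentials
        # Escape reserved characters such as '@' or ':' in credentials.
        uri += "{}:{}@".format(quote_plus(user), quote_plus(pwd))
    # One or more host[:port] entries, comma-separated.
    uri += ",".join((host_port,) + hosts_ports)
    if database is not None or options:
        uri += "/" + (database or "")
    if options:
        # options is a list of (key, value) tuples.
        uri += "?" + urlencode(options)
    return uri
```

For example, mongodburi("db1:27017", "db2:27017", database="kuha", credentials=("reader", "p@ss"), options=[("replicaSet", "rs0")]) yields a URI listing both hosts, the database and the replicaSet option.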

kuha_document_store.database.bson_to_json(collection, _dict)

Encode BSON dictionary to JSON.

Encodes the special type of dictionary returned by MongoDB queries into a JSON representation. Also converts datetimes to strings.

Parameters

_dict (dict) – Source object containing BSON.

Returns

Source object converted to JSON.

Return type

str
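The datetime conversion can be pictured with json.dumps and a default hook, which json calls only for objects it cannot serialize itself. This is a stdlib-only sketch: the function name bson_like_to_json is hypothetical, ISO 8601 output is an assumed format, and BSON-specific types such as ObjectId (which would need the bson package) are not handled.

```python
import datetime
import json


def _encode_special(value):
    # json.dumps calls this only for objects it cannot serialize itself.
    if isinstance(value, datetime.datetime):
        return value.isoformat()
    raise TypeError("{!r} is not JSON serializable".format(value))


def bson_like_to_json(document):
    """Encode a query result dict to JSON, turning datetimes into strings."""
    return json.dumps(document, default=_encode_special)
```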

class kuha_document_store.database.Database(name, reader_uri, editor_uri)

MongoDB database.

Provides access to low-level database operations. For fine-grained access control it uses two sets of database credentials: one for read-only operations and one for write operations. The correct credentials are chosen based on the operation to be performed.

Note

Does not authenticate or connect to the database until an operation that needs a connection is performed. Therefore connection and authentication errors are raised when operations are performed, not when the database object is initialized.

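Parameters
  • name (str) – Database name.

  • reader_uri (str) – MongoDB connection URI with credentials for read-only operations.

  • editor_uri (str) – MongoDB connection URI with credentials for write operations.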

Returns

Database
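The credential selection and deferred connection described above can be sketched as a class that keeps two URIs and creates a client for a role only on first use. LazyDualClientDatabase and client_factory are hypothetical names; in practice the factory would be a MongoDB client class, while here any callable taking a URI works, so the sketch runs without a database.

```python
class LazyDualClientDatabase:
    """Pick reader or editor credentials per operation; connect lazily."""

    def __init__(self, name, reader_uri, editor_uri, client_factory):
        self.name = name
        self._uris = {"reader": reader_uri, "editor": editor_uri}
        self._client_factory = client_factory
        # Nothing is connected at construction time.
        self._clients = {}

    def _client(self, role):
        # Create the client only when an operation first needs it, so
        # connection errors surface here rather than in __init__.
        if role not in self._clients:
            self._clients[role] = self._client_factory(self._uris[role])
        return self._clients[role]

    def client_for(self, write):
        """Return the editor client for writes, the reader client otherwise."""
        return self._client("editor" if write else "reader")
```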

close()

Close open sockets to database.

async query_single(collection_name, query, fields=None, callback=None)

Query for a single database document.

Parameters
  • collection_name (str) – Name of database collection.

  • query (dict) – Database query.

  • fields (list or None) – Fields to select. None selects all.

  • callback (function or None) – Result callback. Called with the result as parameter. If None, this method returns the result.

Returns

A single document, or None if no matching document is found or if a callback is given.

Return type

dict or None

async query_multiple(collection_name, query, callback, fields=None, skip=0, sort_by=None, sort_order=1, limit=0)

Query for multiple database documents.

Note

Has a mandatory callback parameter.

Parameters
  • collection_name (str) – Name of database collection.

  • query (dict) – Database query filter.

  • callback (callable) – Result callback. Called with each document as parameter.

  • fields (list or None) – Fields to select. None selects all.

  • skip (int) – Skip documents from the beginning of query.

  • sort_by (str) – Sort by field.

  • sort_order (int) – Sort in ascending or descending order. MongoDB uses 1 to sort ascending and -1 to sort descending.

  • limit (int) – Limit the number of returned documents. 0 returns all documents.
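The parameters above follow MongoDB's query semantics: sort is applied before skip, skip before limit, and a limit of 0 means no limit. The sketch below emulates those semantics over an in-memory list so they can be tried without a database. It is not the Database implementation, which delegates to MongoDB; the filter supports only equality matching, the projection ignores MongoDB's implicit _id, and the callback is assumed to be a plain (non-async) callable.

```python
import asyncio


async def query_multiple(documents, query, callback, fields=None, skip=0,
                         sort_by=None, sort_order=1, limit=0):
    """In-memory emulation of the documented query_multiple parameters."""
    # Equality-only filter; real MongoDB filters are far richer.
    matches = [doc for doc in documents
               if all(doc.get(key) == value for key, value in query.items())]
    if sort_by is not None:
        # 1 sorts ascending, -1 sorts descending.
        matches.sort(key=lambda doc: doc[sort_by], reverse=(sort_order == -1))
    # Sort first, then skip, then limit; limit=0 means no limit.
    matches = matches[skip:]
    if limit:
        matches = matches[:limit]
    for doc in matches:
        if fields is not None:
            doc = {key: doc[key] for key in fields if key in doc}
        # Called once per resulting document.
        callback(doc)
```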

async query_distinct(collection_name, fieldname, filter_=None)

Query for distinct values in collection field.

Parameters
  • collection_name (str) – Name of database collection.

  • fieldname (str) – Field to query for distinct values.

  • filter_ (dict or None) – Optional filter to use with query.

Returns

distinct values.

Return type

list

async count(collection_name, filter_=None)

Query for document count.

Parameters
  • collection_name (str) – Name of database collection.

  • filter_ (dict or None) – Optional filter to use for query.

Returns

Count of documents.

Return type

int

async insert(collection_name, document)

Insert single document to database.

Parameters
  • collection_name (str) – Name of database collection.

  • document (dict) – Document to insert.

Returns

Insert result

Return type

pymongo.results.InsertOneResult

async replace(collection_name, oid, document)

Replace single document in database.

Parameters
  • collection_name (str) – Name of database collection.

  • oid (str) – MongoDB object ID as string.

  • document (dict) – Document to store.

Returns

Update result.

Return type

pymongo.results.UpdateResult

async insert_or_replace(collection_name, query, document)

Insert or replace a single document in database.

Uses special MongoDB method which will replace an existing document if one is found via query. Otherwise it will insert a new document.

Parameters
  • collection_name (str) – Name of database collection.

  • query (dict) – Database query.

  • document (dict) – Document to store.

Returns

The document that was stored.

Return type

dict

async update(collection_name, filter_, update_operations)

Update documents in collection matching filter.

async delete(collection_name, filter_)

Delete documents matching filter.

Parameters
  • collection_name (str) – Name of database collection.

  • filter_ (dict) – Filter selecting the documents to delete.

Returns

Number of deleted documents.

Return type

int

class kuha_document_store.database.DocumentStoreDatabase(collections, **kwargs)

Subclass of Database

Provides specialized methods extending the functionality of Database. Combines database operations with properties of RecordsCollection. Defines exceptions after which the HTTP response operation can continue.

recoverable_errors = (<class 'pymongo.errors.WriteError'>, <class 'json.decoder.JSONDecodeError'>, <class 'bson.errors.InvalidId'>, <class 'kuha_document_store.validation.RecordValidationError'>)

These exceptions may be raised during normal database operation, so they should not terminate the HTTP response process. The caller may therefore want to catch them.
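The handlers document that recoverable errors are turned into kuha_common.server.BadRequest with the error message passed on. A minimal sketch of that catch-and-translate pattern follows; BadRequest and RecordValidationError here are local stand-ins, the tuple includes only errors available without pymongo or bson, and insert_from_request is a hypothetical helper. Errors outside the tuple propagate unchanged.

```python
import json


class BadRequest(Exception):
    """Stand-in for kuha_common.server.BadRequest."""


class RecordValidationError(Exception):
    """Stand-in for kuha_document_store.validation.RecordValidationError."""


# Subset of the idea behind DocumentStoreDatabase.recoverable_errors.
RECOVERABLE_ERRORS = (json.JSONDecodeError, RecordValidationError)


def insert_from_request(body, insert):
    """Decode a request body and insert it, mapping recoverable errors."""
    try:
        return insert(json.loads(body))
    except RECOVERABLE_ERRORS as exc:
        # Pass the error message on to the client as a bad request.
        raise BadRequest(str(exc)) from exc
```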

static json_decode(json_object)

Helper method for converting HTTP input JSON to a Python dictionary.

Parameters

json_object (str) – json to convert.

Returns

JSON object converted to a Python dictionary.

Return type

dict

async query_multiple(collection_name, query, callback, **kwargs)

Query multiple documents with callback.

Converts resulting BSON to JSON. Calls callback with each resulting record JSON.

Parameters
  • collection_name (str) – Name of database collection.

  • query (dict) – Database query.

  • callback (function) – Result callback. Called with each document as parameter.

  • **kwargs – additional keyword arguments passed to super method.

async query_by_oid(collection_name, oid, callback, fields=None, not_found_exception=None)

Query single record by ObjectID with callback.

Converts the BSON result to JSON and calls the callback with the resulting JSON. If not_found_exception is given, it is raised when the queried ObjectID matches no known database object.

Parameters
  • collection_name (str) – Name of database collection.

  • oid (str) – ObjectID to query for.

  • callback (function) – function to call with resulting JSON.

  • fields (list or None) – Fields to select. None selects all.

  • not_found_exception (Exception class.) – Raised if ObjectID not found.

async query_distinct(collection_name, fieldname, filter_=None)

Query for distinct values in collection field.

If fieldname points to a leaf node, a list of values is returned; if it points to a branch node, a list of dictionaries is returned.

If fieldname points to a leaf node of isodate representations, or to a branch node that contains isodates, datetimes are converted to JSON-serializable datestamps.

If fieldname points to a leaf node containing MongoDB ObjectID values, those values are cast to strings.

Note

The logic requires changes if collection.object_id_fields should contain paths with multiple components, for example ‘some.path.with.id’. In that case, distinct queries that point to branch nodes with OIDs will fail with the exception TypeError: ObjectId(’…’) is not JSON serializable.

Note

Distinct queries will not work as expected on datestamp fields stored as signed 64-bit integers with millisecond precision, because the returned datestamps have only second precision.

Parameters
  • collection_name (str) – Name of database collection.

  • fieldname (str) – Field to query for distinct values.

  • filter_ (dict or None) – Optional filter to use with query.

Returns

distinct values from database

Return type

list
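The precision caveat above can be seen with two stored values that differ only in milliseconds: as stored they are distinct, but once converted to second-precision datestamps they collapse into one returned value. The datestamp format in to_datestamp is an assumption for illustration; the real format is not stated here.

```python
import datetime

# Two distinct stored values differing only by milliseconds.
stored = [
    datetime.datetime(2021, 5, 1, 12, 0, 0, 123000),
    datetime.datetime(2021, 5, 1, 12, 0, 0, 456000),
]


def to_datestamp(value):
    # Second-precision datestamp; the exact format is an assumption.
    return value.strftime("%Y-%m-%dT%H:%M:%S")


distinct_stored = set(stored)
distinct_returned = sorted({to_datestamp(value) for value in stored})
```

Here distinct_stored has two members while distinct_returned has only one.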

async insert_json(collection_name, json_object)

Insert JSON-encoded document to Database.

Special method that takes a JSON object and inserts it into the database.

Parameters
  • collection_name (str) – Name of database collection.

  • json_object (str) – JSON object representing collection document.

Returns

Insert result.

Return type

pymongo.results.InsertOneResult

async replace_json(collection_name, oid, json_object, not_found_exception)

Replace JSON-encoded document in Database.

Special method that replaces a document in database with document given as parameter json_object. The document to be replaced is queried by given oid.

This method also takes a not_found_exception as mandatory parameter. The exception is raised if a document with given oid cannot be found.

Note

If the submitted JSON does not contain metadata for the document, the metadata is calculated by RecordsCollection.process_json_for_upsert()

Parameters
  • collection_name (str) – Name of database collection.

  • oid (str) – MongoDB object ID as string.

  • json_object (str) – JSON object representing collection document.

  • not_found_exception (Exception class.) – exception to raise if document is not found with oid

Returns

Update result.

Return type

pymongo.results.UpdateResult

async delete_records(collection_name, oid=None, hard_delete=False)

Delete database documents.

Parameters
  • collection_name (str) – Name of database collection.

  • oid (str) – MongoDB object ID as string.

  • hard_delete (bool) – True to physically delete document. False to logically mark the document deleted.

Returns

Affected records’ count

Return type

int
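The difference between hard and logical deletion can be sketched over an in-memory list: a hard delete removes the matching documents, while a logical delete keeps them and marks them deleted. delete_records_in_memory is a hypothetical synchronous stand-in for the async method, and the "deleted" key inside "_metadata" is an assumed marker; the actual metadata field Kuha sets is not stated here.

```python
def delete_records_in_memory(collection, oid=None, hard_delete=False):
    """Delete matching documents from a list; return the affected count."""
    # No oid means every document in the collection is affected.
    targets = [doc for doc in collection if oid is None or doc["_id"] == oid]
    if hard_delete:
        # Physical delete: drop the matching documents from the list.
        target_ids = {id(doc) for doc in targets}
        collection[:] = [doc for doc in collection if id(doc) not in target_ids]
    else:
        # Logical delete: keep each document but mark it deleted.
        # The "_metadata"/"deleted" keys are hypothetical here.
        for doc in targets:
            doc.setdefault("_metadata", {})["deleted"] = True
    return len(targets)
```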

kuha_document_store.database.db_from_settings(settings)

Instantiate DocumentStoreDatabase from loaded settings.

Parameters

settings (argparse.Namespace) – Loaded settings.

Returns

Instance of DocumentStoreDatabase.

Return type

DocumentStoreDatabase

kuha_document_store.database.add_cli_args(parser)

Add database configuration values to be parsed.

validation.py

Simple validation for dictionary representation of document store records.

Validate study record dictionary:

>>> from kuha_common.document_store.records import Study
>>> from kuha_document_store.validation import validate
>>> validate(Study.get_collection(), Study().export_dict(include_metadata=False))
Traceback (most recent call last):
[...]
    def validate(collection, document, raise_error=True, update=False):
kuha_document_store.validation.RecordValidationError: ('Validation of studies failed',
    {'study_number': ['null value not allowed']}
)

class kuha_document_store.validation.RecordValidator(*args, **kwargs)

Subclasses cerberus.Validator to customize validation.

JSON does not support sets. Therefore a rule to validate list items for uniqueness is needed.

For the sake of simplicity in raising and handling validation errors this class also overrides cerberus.Validator.validate().
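The uniqueness check itself can be sketched as a plain function. Because JSON objects decode to dicts, which are unhashable, it compares items with == instead of building a set. In cerberus, custom rules are added as _validate_<rulename> methods on a Validator subclass; the rule name Kuha uses is not stated here, so find_duplicate_items is a hypothetical standalone helper rather than the actual rule.

```python
def find_duplicate_items(items):
    """Return items that occur more than once, in order of first repeat.

    Compares with == so unhashable items such as dicts are supported.
    """
    duplicates = []
    for index, item in enumerate(items):
        # An item is a duplicate if an equal item appeared earlier.
        if item in items[:index] and item not in duplicates:
            duplicates.append(item)
    return duplicates
```

A validation rule built on this would report an error whenever the returned list is non-empty.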

validate(document, **kwargs)

Override cerberus.Validator.validate()

Handles the otherwise unvalidated _id field here to simplify the error message flow and enable validation messages.

If document is to be updated it is allowed to have an _id field. If document is being inserted it is an error to have an _id field.

Parameters
  • document (dict) – Document to be validated.

  • **kwargs – keyword arguments passed to cerberus.Validator.validate(). Only the keyword argument update is inspected here: whether it is present and True.

Returns

True if validation passes, False if not.

Return type

bool
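The _id rule above (allowed when updating, an error when inserting) can be sketched together with an error object shaped like the documented RecordValidationError(collection, validation_errors, msg=None). Both classes here are local stand-ins, check_id_field is a hypothetical helper, and the error text for _id is an assumed message; the default message mirrors the "Validation of studies failed" output shown in the module example.

```python
class RecordValidationError(Exception):
    """Stand-in mirroring the documented constructor signature."""

    def __init__(self, collection, validation_errors, msg=None):
        self.collection = collection
        # Kept for later processing, as documented.
        self.validation_errors = validation_errors
        if msg is None:
            msg = "Validation of {} failed".format(collection)
        super().__init__(msg, validation_errors)


def check_id_field(collection, document, update=False):
    """Reject an _id field on insert; allow it on update or replace."""
    if "_id" in document and not update:
        raise RecordValidationError(
            collection, {"_id": ["field not allowed on insert"]})
    return True
```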

exception kuha_document_store.validation.RecordValidationError(collection, validation_errors, msg=None)

Raised on validation errors.

Parameters
  • collection (str) – Collection that got validated.

  • validation_errors (dict) – Validation errors from cerberus.Validator.errors. These are stored in RecordValidationError.validation_errors for later processing.

  • msg (str) – Optional message.

Returns

RecordValidationError

class kuha_document_store.validation.RecordValidationSchema(record_class, base_schema, *args)

Create validation schema from records in kuha_common.document_store.records to validate user-submitted data.

Schema items are built dynamically by consulting record’s field types.

  • For single value fields the type is string and null values are not accepted.

  • For localizable fields it is required to have a kuha_common.document_store.constants.REC_FIELDNAME_LANGUAGE attribute.

  • Field attributes are strings and they may be null.

  • Subfield values are strings and not nullable.

  • Fallback to string, not null.

Record’s metadata is accepted as input but not required.

Note

kuha_common.document_store.RecordBase._metadata and kuha_common.document_store.RecordBase._id are also validated at database level.

See also

kuha_document_store.database.RecordsCollection.get_validator()

Every dynamically built schema item may be overridden by a custom schema item given as a parameter to the class constructor.

Returns

RecordValidationSchema

get_schema()

Get Schema.

Returns

Validation schema supported by cerberus

Return type

dict

kuha_document_store.validation.validate(rec_val_schema, document, raise_error=True, update=False)

Validate document against collection schema.

Parameters
  • rec_val_schema (RecordValidationSchema) – Record validation schema to validate against.

  • document (dict) – Document to validate. Document is a dictionary representation of a document store record.

  • raise_error (bool) – Should a RecordValidationError be raised if validation fails.

  • update (bool) – Whether to validate for an update or replace operation on an existing record.

Returns

True if the document passes validation, False if it fails.

Return type

bool

Raises

RecordValidationError if raise_error is True and document fails validation.

dbadmin

dbadmin/operations.py

Admin operations to setup and manage Kuha Document Store database.

Initial setup:

python -m kuha_document_store.dbadmin initiate_replicaset setup_database setup_collections setup_users

Setup empty database into an existing replicaset:

python -m kuha_document_store.dbadmin setup_database setup_collections setup_users

kuha_document_store.dbadmin.operations.OperationsSetup

alias of kuha_document_store.dbadmin.operations.client

kuha_document_store.dbadmin.operations.DEFAULT_ARBITER_INDEX = -1

Set to -1 to simplify comparison with integers.