kuha_document_store¶
Kuha Document Store application
Query, manipulate and import Document Store records via HTTP API.
configure.py¶
Configure Document Store.
-
kuha_document_store.configure.
add_database_configs
()[source]¶ Add database configuration values to be parsed.
-
kuha_document_store.configure.
configure
()[source]¶ Get settings for application configuration.
Declares application-specific configuration options and some common options declared in
kuha_common.cli_setup
Configure application with arguments specified in configuration file, environment variables and command line arguments.
Note: Calling this function multiple times will not initiate new settings to be parsed, but will return previously parsed settings instead. Returns: settings Return type: argparse.Namespace
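The parse-once behaviour described in the note can be sketched as follows. This is a minimal illustration and not the actual implementation: the option names and defaults are hypothetical, and only command line arguments are handled here, not configuration files or environment variables.

```python
import argparse

_SETTINGS = None  # module-level cache of the parsed settings


def configure(argv=None):
    """Parse settings once; later calls return the cached namespace."""
    global _SETTINGS
    if _SETTINGS is not None:
        return _SETTINGS
    parser = argparse.ArgumentParser(description="Hypothetical settings")
    # Hypothetical option names and defaults, for illustration only.
    parser.add_argument("--port", type=int, default=6001)
    parser.add_argument("--database-name", default="example_db")
    # An explicit list is parsed so the sketch never reads sys.argv.
    _SETTINGS = parser.parse_args(argv if argv is not None else [])
    return _SETTINGS


first = configure(["--port", "7000"])
second = configure(["--port", "8000"])  # ignored: settings already parsed
```

Both calls return the same argparse.Namespace object, so the second argument list has no effect.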
serve.py¶
Main entry point for starting Document Store server.
-
kuha_document_store.serve.
get_app
(api_version, app_settings=None)[source]¶ Setup routes and return initialized Tornado web application.
Parameters: Returns: Tornado web application.
Return type: tornado.web.Application
-
kuha_document_store.serve.
main
()[source]¶ Application main function.
Parse commandline for settings. Initialize database and web application. Start serving via
kuha_common.server.serve()
. Exit on exceptions propagated at this level. Returns: exit code, 1 on error, 0 on success. Return type: int
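The exit-code convention can be sketched like this. The function names run and failing_serve are hypothetical stand-ins; the real main() starts the server via kuha_common.server.serve().

```python
import logging


def run(serve):
    """Call serve(); return 0 on success, 1 if an exception propagates."""
    try:
        serve()
    except Exception:
        # Log the traceback and report failure through the exit code.
        logging.exception("Exception propagated to main")
        return 1
    return 0


def failing_serve():
    """Hypothetical server start that fails."""
    raise RuntimeError("database unavailable")


ok_code = run(lambda: None)
error_code = run(failing_serve)
```

A caller would typically pass the returned value to sys.exit().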
handlers.py¶
Define handlers for responding to HTTP-requests.
-
class
kuha_document_store.handlers.
BaseHandler
(*args, **kwargs)[source]¶ BaseHandler to derive from.
Provides common methods for subclasses.
Note: use from a subclass.
-
get_db
()[source]¶ Get database object stored in settings.
Returns: database object. Return type: kuha_document_store.database.DocumentStoreDatabase
-
assert_body_not_empty
(msg=None)[source]¶ Assert that request body contains data.
kuha_common.server.BadRequest
is raised if body is empty. Parameters: msg (str) – Optional message for exception. Raises: kuha_common.server.BadRequest
if body is empty.
-
-
class
kuha_document_store.handlers.
RestApiHandler
(*args, **kwargs)[source]¶ Handle requests to REST api.
-
get
(collection, resource_id=None)[source]¶ HTTP-GET to REST api endpoint.
Respond with single record or multiple records, depending on whether
resource_id
is requested. Note: Results will be streamed.
Parameters: Raises: kuha_common.server.BadRequest
if there are recoverable errors in database operation. The error message is passed to BadRequest. See: kuha_document_store.database.DocumentStoreDatabase.recoverable_errors
Raises: kuha_common.server.ResourceNotFound
if requested resource_id does not return results.
-
post
(collection, resource_id=None)[source]¶ HTTP-POST to REST api endpoint.
Create new resource from data submitted in request body.
Parameters: - collection (str) – collection type to create.
- resource_id (str or None) – accepted for completeness
in handler configuration. Submitting one raises
kuha_common.server.BadRequest
.
Raises: kuha_common.server.BadRequest
if request contains resource_id or if database operations raise recoverable errors. See: kuha_document_store.database.DocumentStoreDatabase.recoverable_errors
-
put
(collection, resource_id=None)[source]¶ HTTP-PUT to REST api endpoint.
Replace existing resource with data in request body.
Parameters: - collection (str) – collection type to replace.
- resource_id (str or None) – resource ID to replace. Optional for
completeness in handler configuration. Omitting it raises
kuha_common.server.BadRequest
.
Raises: kuha_common.server.BadRequest
if requested endpoint does not contain resource_id or if database operation raises one of kuha_document_store.database.DocumentStoreDatabase.recoverable_errors
Raises: kuha_common.server.ResourceNotFound
if resource_id returns no results.
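The resource_id rules for POST and PUT can be summarised in a small sketch. The BadRequest class below is a local stand-in for kuha_common.server.BadRequest, and check_resource_id is a hypothetical helper, not part of the handlers.

```python
class BadRequest(Exception):
    """Local stand-in for kuha_common.server.BadRequest."""


def check_resource_id(method, resource_id):
    """Apply the documented resource_id rules for POST and PUT."""
    if method == "POST" and resource_id is not None:
        # New resources get their ID from the database.
        raise BadRequest("POST must not include a resource_id")
    if method == "PUT" and resource_id is None:
        # Replacing requires knowing which resource to replace.
        raise BadRequest("PUT requires a resource_id")
    return True


post_ok = check_resource_id("POST", None)
put_ok = check_resource_id("PUT", "some-id")
```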
-
delete
(collection, resource_id=None)[source]¶ HTTP-DELETE to REST api endpoint.
Delete resource or all resources of certain type.
Parameters: Raises: kuha_common.server.BadRequest
if database operation raises one of kuha_document_store.database.DocumentStoreDatabase.recoverable_errors
Raises: kuha_common.server.ResourceNotFound
if resource_id returns no results.
-
-
class
kuha_document_store.handlers.
ImportHandler
(*args, **kwargs)[source]¶ Handle request to import endpoint.
-
prepare
()[source]¶ Prepare for each request.
All requests must define content type for XML. All requests must contain body data.
-
-
class
kuha_document_store.handlers.
QueryHandler
(*args, **kwargs)[source]¶ Handle request to query endpoint.
Note: Results will be streamed.
database.py¶
Database module provides access to MongoDB database.
MongoDB Database is accessed through this module.
The module also provides convenience methods for easy access
and manipulation via Document Store records defined in
kuha_common.document_store.records
Database can be used directly, via records or with JSON representation of records.
Note: This module has a strict dependency on
kuha_common.document_store.records
-
kuha_document_store.database.
mongodburi
(host_port, *hosts_ports, database=None, credentials=None, options=None)[source]¶ Create and return a mongodb connection string in the form of a MongoURI.
The standard URI connection scheme has the form: mongodb://[username:password@]host1[:port1][,…hostN[:portN]][/[database][?options]]
Parameters: Returns: MongoURI connection string.
Return type:
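A connection string of the form shown above could be assembled roughly as follows. This is a sketch under assumptions, not the library's code: build_mongodb_uri is a hypothetical name, credentials is assumed to be a (username, password) tuple, and options is assumed to be a list of (key, value) pairs.

```python
from urllib.parse import quote_plus, urlencode


def build_mongodb_uri(host_port, *hosts_ports, database=None,
                      credentials=None, options=None):
    """Assemble a MongoDB URI from hosts and optional parts."""
    uri = "mongodb://"
    if credentials:
        user, password = credentials
        # Reserved characters in credentials must be percent-encoded.
        uri += f"{quote_plus(user)}:{quote_plus(password)}@"
    uri += ",".join((host_port,) + hosts_ports)
    if database or options:
        # The slash is required before options even without a database.
        uri += "/" + (database or "")
    if options:
        uri += "?" + urlencode(options)
    return uri


simple_uri = build_mongodb_uri("localhost:27017")
full_uri = build_mongodb_uri(
    "localhost:27017", "db2.example.org:27017",
    database="example_db",
    credentials=("reader", "p@ss"),
    options=[("authSource", "admin")],
)
```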
-
class
kuha_document_store.database.
RecordsCollection
(record_class, indexes_unique=None, indexes=None, validators=None)[source]¶ Database collection.
Note: Relational Database term table is called a collection in MongoDB. Contains properties for Document Store collections. Has strict dependency to
kuha_common.document_store.records
Parameters: - record_class (
kuha_common.document_store.records.Study
or kuha_common.document_store.records.Variable
or kuha_common.document_store.records.Question
or kuha_common.document_store.records.StudyGroup
) – Class of a record that belongs to this collection.
- indexes_unique (list or None) – declare unique indexes.
- indexes (list or None) – additional indexes
- validators (list or None) – additional validators
Returns: -
isodate_fields
= ['_metadata.created', '_metadata.updated']¶ List common isodate fields
-
object_id_fields
= ['_id']¶ Fields containing MongoDB ObjectIDs
-
index_updated
= [('_metadata.updated', -1)]¶ Declare updated field as index.
-
classmethod
bson_to_json
(_dict)[source]¶ Encode BSON dictionary to JSON.
Encodes special type of dictionary that comes from MongoDB queries to JSON representation. Also converts datetimes to strings.
Parameters: _dict (dict) – Source object containing BSON. Returns: Source object converted to JSON. Return type: str
-
get_validator
()[source]¶ Get defined database-level validators.
Note: All validators are combined with AND operator. Returns: Database level validators to be used on DB setup. Return type: dict
-
process_json_for_upsert
(json_document, old_metadata=None)[source]¶ Preprocess JSON for insert/update operations.
Decodes JSON to Python dictionary. Validates the result. Creates metadata for the document if the document has none, otherwise uses the submitted metadata. Decodes submitted metadata datestamps to datetime objects.
Parameters: Returns: Document ready to be submitted to database.
Return type:
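The preprocessing steps can be sketched with the standard library alone. The function name prepare_for_upsert, the now parameter, and the ISO 8601 datestamp format are assumptions made for this illustration; schema validation and the old_metadata parameter are left out.

```python
import json
from datetime import datetime, timezone


def prepare_for_upsert(json_document, now=None):
    """Decode JSON and make sure the document carries metadata."""
    document = json.loads(json_document)
    if not isinstance(document, dict):
        raise ValueError("document must be a JSON object")
    now = now or datetime.now(timezone.utc)
    metadata = document.get("_metadata")
    if metadata is None:
        # No metadata submitted: create it.
        document["_metadata"] = {"created": now, "updated": now}
    else:
        # Metadata submitted: decode datestamps to datetime objects.
        for key in ("created", "updated"):
            if isinstance(metadata.get(key), str):
                metadata[key] = datetime.fromisoformat(metadata[key])
    return document


fixed_now = datetime(2021, 5, 1, tzinfo=timezone.utc)
without_meta = prepare_for_upsert('{"study_number": "1"}', now=fixed_now)
with_meta = prepare_for_upsert(
    '{"study_number": "1", "_metadata": '
    '{"created": "2020-01-02T03:04:05+00:00", '
    '"updated": "2020-01-02T03:04:05+00:00"}}'
)
```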
-
kuha_document_store.database.
RECORD_COLLECTIONS
= [<kuha_document_store.database.RecordsCollection object>, <kuha_document_store.database.RecordsCollection object>, <kuha_document_store.database.RecordsCollection object>, <kuha_document_store.database.RecordsCollection object>]¶ Define Record Collections
-
class
kuha_document_store.database.
Database
(settings)[source]¶ MongoDB database.
Provides access to low-level database operations. For fine access control uses two database credentials, one for read-only operations, one for write operations. Chooses the correct credentials to authenticate based on the operation to be performed.
Note: Does not authenticate or connect to the database before actually performing operations that need connecting. Therefore connection/authentication issues will raise when performing operations and not when initiating the database. Parameters: settings ( argparse.Namespace
) – settings for database connections. Returns: Database
-
query_single
(collection_name, query, fields=None, callback=None)[source]¶ Query for a single database document.
Parameters: Returns: A single document, or None if no matching document is found or if callback is given.
Return type:
-
query_multiple
(collection_name, query, callback, fields=None, skip=0, sort_by=None, sort_order=1, limit=0)[source]¶ Query for multiple database documents.
Note: has mandatory callback parameter.
Parameters: - collection_name (str) – Name of database collection.
- query (dict) – Database query.
- callback (Function that receives single record result as argument.) – Result callback. Called with each document as parameter.
- fields (list or None) – Fields to select. None selects all.
- skip (int) – Skip documents from the beginning of query.
- sort_by (str) – Sort by field.
- sort_order (int) – Sort by ascending or descending order. MongoDB uses 1 to sort ascending and -1 to sort descending.
- limit (int) – Limit the number of returning documents. 0 returns all documents.
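The callback-driven interface and the meaning of skip, sort_order and limit can be illustrated with an in-memory stand-in. The function query_multiple_in_memory and the sample documents are hypothetical; the real method runs the query against MongoDB.

```python
def query_multiple_in_memory(documents, query, callback, skip=0,
                             sort_by=None, sort_order=1, limit=0):
    """Call callback once for each document matching an equality query."""
    matches = [doc for doc in documents
               if all(doc.get(key) == value for key, value in query.items())]
    if sort_by is not None:
        # 1 sorts ascending, -1 sorts descending.
        matches.sort(key=lambda doc: doc[sort_by], reverse=(sort_order == -1))
    matches = matches[skip:]
    if limit:
        # A limit of 0 means no limit.
        matches = matches[:limit]
    for document in matches:
        callback(document)


studies = [
    {"study_number": "A1", "lang": "en"},
    {"study_number": "A3", "lang": "en"},
    {"study_number": "A2", "lang": "fi"},
    {"study_number": "A4", "lang": "en"},
]
collected = []
query_multiple_in_memory(studies, {"lang": "en"}, collected.append,
                         sort_by="study_number", sort_order=-1, limit=2)
```

Here collected.append serves as the result callback, so the matching documents end up in a list in the requested order.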
-
query_distinct
(collection_name, fieldname, filter_=None)[source]¶ Query for distinct values in collection field.
Parameters: Returns: distinct values.
Return type:
-
count
(collection_name, filter_=None)[source]¶ Query for document count.
Parameters: Returns: Count of documents.
Return type:
-
insert
(collection_name, document)[source]¶ Insert single document to database.
Parameters: Returns: Insert result
Return type: pymongo.results.InsertOneResult
-
replace
(collection_name, oid, document)[source]¶ Replace single document in database.
Parameters: Returns: Update result.
Return type: pymongo.results.UpdateResult
-
insert_or_replace
(collection_name, query, document)[source]¶ Insert or replace a single document in database.
Uses special MongoDB method which will replace an existing document if one is found via query. Otherwise it will insert a new document.
Parameters: Returns: The document that was stored.
Return type:
-
-
class
kuha_document_store.database.
DocumentStoreDatabase
(settings)[source]¶ Subclass of
Database
Provides specialized methods extending the functionality of
Database
. Combines database operations with properties of RecordsCollection. Defines exceptions after which the HTTP-response operation can continue.
-
recoverable_errors
= (<class 'pymongo.errors.WriteError'>, <class 'json.decoder.JSONDecodeError'>, <class 'bson.errors.InvalidId'>, <class 'kuha_document_store.validation.RecordValidationError'>)¶ These are exceptions that may be raised in normal database operation, so they are not exceptions that should terminate the HTTP-response process. As such, the caller may want to catch these errors.
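A caller might catch such a tuple of exceptions and convert them into a client error, roughly as sketched below. BadRequest and RecordValidationError are local stand-ins for the real classes, RECOVERABLE_ERRORS contains only a subset of the documented tuple, and handle_post is a hypothetical function.

```python
import json


class BadRequest(Exception):
    """Local stand-in for kuha_common.server.BadRequest."""


class RecordValidationError(Exception):
    """Local stand-in for the validation error class."""


# Stand-in for DocumentStoreDatabase.recoverable_errors (subset only).
RECOVERABLE_ERRORS = (json.JSONDecodeError, RecordValidationError)


def handle_post(body):
    """Decode and check a request body; map recoverable errors to BadRequest."""
    try:
        document = json.loads(body)
        if "study_number" not in document:
            raise RecordValidationError("study_number is required")
    except RECOVERABLE_ERRORS as exc:
        # The error message is passed on to the client-facing exception.
        raise BadRequest(str(exc)) from exc
    return document


accepted = handle_post('{"study_number": "1"}')
```

Any exception outside the tuple would still propagate unchanged, which is the point of listing the recoverable ones explicitly.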
-
static
json_decode
(json_object)[source]¶ Helper method for converting HTTP input JSON to python dictionary.
Parameters: json_object (str) – json to convert. Returns: JSON object converted to python dictionary. Return type: dict
-
query_multiple
(collection_name, query, callback, **kwargs)[source]¶ Query multiple documents with callback.
Converts resulting BSON to JSON. Calls callback with each resulting record JSON.
Parameters:
-
query_by_oid
(collection_name, oid, callback, fields=None, not_found_exception=None)[source]¶ Query single record by ObjectID with callback.
Converts BSON result to JSON. Calls the callback with resulting JSON. If parameter for not_found_exception is given, will raise the exception if query ObjectID points to no known database object.
Parameters:
-
query_distinct
(collection_name, fieldname, filter_=None)[source]¶ Query for distinct values in collection field.
If fieldname points to a leaf node, returns a list of values; if it points to a branch node, returns a list of dictionaries.
If fieldname points to leaf node of isodate representations, or to branch node that contains isodates, converts datetimes to datestamps which are JSON serializable.
If ‘fieldname’ points to a leaf node containing MongoDB ObjectID values, cast those values to string.
Note: Requires changes to logic if collection.object_id_fields should contain paths with multiple components, for example ‘some.path.with.id’. In that case distinct queries that point to branch nodes with OIDs will fail with Exception TypeError: ObjectId(’…’) is not JSON serializable.
Note: Distinction will not work as expected on datestamp-fields that are stored as signed 64-bit integers with millisecond precision. The returned datestamps are not as precise since they have second precision.
Parameters: Returns: distinct values from database
Return type:
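The precision caveat in the second note can be demonstrated with two timestamps that differ only in their sub-second part. The helper to_datestamp and its output format are assumptions for this sketch, not the format the library necessarily uses.

```python
from datetime import datetime, timezone


def to_datestamp(value):
    """Format a datetime with second precision (assumed format)."""
    return value.strftime("%Y-%m-%dT%H:%M:%SZ")


# Two stored values that differ only by milliseconds.
first = datetime(2020, 1, 1, 12, 0, 0, 100000, tzinfo=timezone.utc)
second = datetime(2020, 1, 1, 12, 0, 0, 900000, tzinfo=timezone.utc)

distinct_stored = {first, second}
distinct_stamps = {to_datestamp(value) for value in distinct_stored}
```

The database sees two distinct values, but after conversion both map to the same datestamp, so a client receives what looks like duplicates.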
-
insert_or_update_record
(record)[source]¶ Insert or update database document by Document Store record.
Special method that takes a Document Store record instance as parameter and determines whether to insert or update the given record.
Makes a query to MongoDB to determine if the record is already in database. If there is a record, calls the record instance’s updates_record method to update the instance with values that are present in database but not in the submitted instance.
Afterwards calls
insert_or_replace()
with the record instance’s dictionary representation. Parameters: record ( kuha_common.document_store.records.Study
or kuha_common.document_store.records.Variable
or kuha_common.document_store.records.Question
or kuha_common.document_store.records.StudyGroup
) – Document Store record instance. Returns: operation details: {‘operation’: ‘insert’|’update’, ‘id’: <ObjectID>, <records-unique-values>} Return type: dict
-
bulk_insert_or_update_record
(records)[source]¶ Run bulk insert/update operations for Document Store records.
Method that takes an iterable parameter yielding Document Store records. Then calls
insert_or_update_record()
with each record instance. Parameters: records (iterable) – Document Store records. Returns: list of operation details returned by insert_or_update_record. Return type: list
-
insert_json
(collection_name, json_object)[source]¶ Insert JSON-encoded document to Database.
Special method that takes a JSON object that is then inserted to database.
Parameters: Returns: Insert result.
Return type: pymongo.results.InsertOneResult
-
replace_json
(collection_name, oid, json_object, not_found_exception)[source]¶ Replace JSON-encoded document in Database.
Special method that replaces a document in database with document given as parameter json_object. The document to be replaced is queried by given oid.
This method also takes a not_found_exception as mandatory parameter. The exception is raised if a document with given oid cannot be found.
Note: if the submitted JSON does not contain metadata for the document, the metadata gets calculated by
RecordsCollection.process_json_for_upsert()
Parameters: Returns: Update result.
Return type: pymongo.results.UpdateResult
-
validation.py¶
Simple validation for dictionary representation of document store records.
Note: This module has a strict dependency on
kuha_common.document_store.records
Validate study record dictionary:
>>> from kuha_common.document_store.records import Study
>>> from kuha_document_store.validation import validate
>>> validate(Study.get_collection(), Study().export_dict(include_metadata=False))
Traceback (most recent call last):
[...]
def validate(collection, document, raise_error=True, update=False):
kuha_document_store.validation.RecordValidationError: ('Validation of studies failed',
{'study_number': ['null value not allowed']}
)
-
class
kuha_document_store.validation.
RecordValidator
(*args, **kwargs)[source]¶ Subclass
cerberus.Validator
to customize validation. JSON does not support sets. Therefore a rule to validate list items for uniqueness is needed.
For the sake of simplicity in raising and handling validation errors this class also overrides
cerberus.Validator.validate()
.
-
validate
(document, **kwargs)[source]¶ Override
cerberus.Validator.validate()
Handle unvalidated _id-field here to simplify error message flow and enable validation messages.
If document is to be updated it is allowed to have an _id field. If document is being inserted it is an error to have an _id field.
Parameters: - document (dict) – Document to be validated.
- **kwargs – keyword arguments passed to
cerberus.Validator.validate()
. Here it is only checked if keyword argument updated is present and True.
Returns: True if validation passes, False if not.
Return type:
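The _id rule described above amounts to a small check performed before regular schema validation. The function check_id_field and its error message are hypothetical and only mirror the documented behaviour.

```python
def check_id_field(document, update=False):
    """Return an error mapping if _id is present on an insert."""
    errors = {}
    if "_id" in document and not update:
        # Inserted documents receive their _id from the database.
        errors["_id"] = ["not allowed when inserting"]
    return errors


insert_errors = check_id_field({"_id": "abc", "study_number": "1"})
update_errors = check_id_field({"_id": "abc", "study_number": "1"}, update=True)
```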
-
-
exception
kuha_document_store.validation.
RecordValidationError
(collection, validation_errors, msg=None)[source]¶ Raised on validation errors.
Parameters: Returns:
-
class
kuha_document_store.validation.
RecordValidationSchema
(record_class, *args)[source]¶ Create validation schema from records in
kuha_common.document_store.records
to validate user-submitted data. Schema items are built dynamically by consulting record’s field types.
- For single value fields the type is string and null values are not accepted.
- For localizable fields it is required to have a
kuha_common.document_store.constants.REC_FIELDNAME_LANGUAGE
attribute.
- Field attributes are strings and they may be null.
- Subfield values are strings and not nullable.
- Fallback to string, not null.
Record’s metadata is accepted as input but not required.
Note: kuha_common.document_store.RecordBase._metadata
and kuha_common.document_store.RecordBase._id
are also validated at database level. See also: kuha_document_store.database.RecordsCollection.get_validator()
Every dynamically built schema item may be overridden by a custom schema item given as a parameter for class constructor.
Parameters: - record_class (
kuha_common.document_store.records.Study
or kuha_common.document_store.records.Variable
or kuha_common.document_store.records.Question
or kuha_common.document_store.records.StudyGroup
) – class which holds record attributes.
- *args – Custom schema items to override dynamically built schema items.
Returns:
-
kuha_document_store.validation.
validate
(collection, document, raise_error=True, update=False)[source]¶ Validate document against collection schema.
Parameters: - collection (str) – Collection the document belongs to.
- document (dict) – Document to validate. Document is a dictionary representation of a document store record.
- raise_error (bool) – Should a
RecordValidationError
be raised if validation fails.
- update (bool) – Validate for an update/replace operation of an existing record?
Returns: True if document passed validation, False if fails.
Return type: Raises: RecordValidationError
if raise_error is True and document fails validation.
db_setup.py¶
Script to help setup Document Store database.
Database administrator may use this script to setup MongoDB instance for usage with Document Store.
-
kuha_document_store.db_setup.
setup_admin_user
(admin_username, admin_password, db)[source]¶ Setup administrator credentials.
Note: authentication must be disabled in MongoDB to use this operation.
Parameters: Returns: MongoDB database command response
-
kuha_document_store.db_setup.
setup_users
(settings, client)[source]¶ Setup database users for Document Store.
Parameters: - settings (
argparse.Namespace
) – Document Store settings.
- client (
pymongo.mongo_client.MongoClient
) – MongoDB client
Returns: list of MongoDB database command responses
-
kuha_document_store.db_setup.
remove_users
(settings, client)[source]¶ Remove Document Store users from database.
Parameters: - settings (
argparse.Namespace
) – Document Store settings.
- client (
pymongo.mongo_client.MongoClient
) – MongoDB client
Returns: list of MongoDB database command responses
-
kuha_document_store.db_setup.
setup_database
(settings, client)[source]¶ Create Document Store database.
Parameters: - settings (
argparse.Namespace
) – Document Store settings.
- client (
pymongo.mongo_client.MongoClient
) – MongoDB client
Returns: PyMongo Database object
-
kuha_document_store.db_setup.
delete_database
(settings, client)[source]¶ Delete Document Store database.
Parameters: - settings (
argparse.Namespace
) – Document Store settings.
- client (
pymongo.mongo_client.MongoClient
) – MongoDB client
Returns: None
-
kuha_document_store.db_setup.
list_databases
(settings, client)[source]¶ List (print) databases.
Note: Database won’t show in list before it has a collection
Parameters: - settings (
argparse.Namespace
) – Document Store settings.
- client (
pymongo.mongo_client.MongoClient
) – MongoDB client
Returns: list of database names
-
kuha_document_store.db_setup.
setup_collections
(settings, client)[source]¶ Setup Document Store collections (tables).
Parameters: - settings (
argparse.Namespace
) – Document Store settings.
- client (
pymongo.mongo_client.MongoClient
) – MongoDB client
Returns: list of results
-
kuha_document_store.db_setup.
delete_collections
(settings, client)[source]¶ Delete Document Store collections (tables).
Parameters: - settings (
argparse.Namespace
) – Document Store settings.
- client (
pymongo.mongo_client.MongoClient
) – MongoDB client
Returns: list of drop_collection results
-
kuha_document_store.db_setup.
list_collections
(settings, client)[source]¶ List Document Store collections (tables).
Parameters: - settings (
argparse.Namespace
) – Document Store settings.
- client (
pymongo.mongo_client.MongoClient
) – MongoDB client
Returns: list of MongoDB collections
-
kuha_document_store.db_setup.
list_db_users
(settings, client)[source]¶ List (print) database users.
Parameters: - settings (
argparse.Namespace
) – Document Store settings.
- client (
pymongo.mongo_client.MongoClient
) – MongoDB client
Returns: dictionary containing database users and their properties.
-
kuha_document_store.db_setup.
OPERATIONS
= {'delete_collections': <function delete_collections>, 'list_database_users': <function list_db_users>, 'setup_database': <function setup_database>, 'delete_database': <function delete_database>, 'list_collections': <function list_collections>, 'remove_users': <function remove_users>, 'setup_admin_user': <function setup_admin_user>, 'setup_users': <function setup_users>, 'setup_collections': <function setup_collections>, 'list_databases': <function list_databases>}¶ Supported operations.
importers¶
Supported importers are defined in this package.
Declare importers here.
-
kuha_document_store.importers.
importers
= {'ddi_c': <bound method XMLParserBase.from_string of <class 'kuha_common.document_store.mappings.ddi.DDI25RecordParser'>>, 'ddi_31': <bound method XMLParserBase.from_string of <class 'kuha_common.document_store.mappings.ddi.DDI31RecordParser'>>, 'ddi_122_nesstar': <bound method XMLParserBase.from_string of <class 'kuha_common.document_store.mappings.ddi.DDI122RecordParser'>>}¶ Register importers here. {importer_id: importer_function} Importer_id must be unique within importers. Importer_function must accept XML body as string for first argument and Document Store collection as an optional second argument. The importer function must return a generator that will iteratively return populated Document Store record instances.
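The importer contract can be sketched with a toy importer. Everything below is hypothetical: the importer id, the XML layout, and the use of plain dictionaries in place of populated Document Store record instances.

```python
import xml.etree.ElementTree as ET


def titles_importer(xml_body, collection=None):
    """Toy importer: yield one record-like dict per <title> element."""
    root = ET.fromstring(xml_body)
    for element in root.iter("title"):
        # Real importers yield Document Store record instances instead.
        yield {"collection": collection or "studies", "title": element.text}


# Registry in the documented {importer_id: importer_function} shape.
example_importers = {"example_titles": titles_importer}

xml_body = "<codeBook><title>Study A</title><title>Study B</title></codeBook>"
records = list(example_importers["example_titles"](xml_body))
```

Because the importer is a generator, records are produced lazily and can be passed directly to a consumer that iterates over them.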