Kuha Document Store
Kuha Document Store is an HTTP backend API written in Python for serving Document Store records to multiple repo handlers. The Document Store uses MongoDB as persistent storage and provides multiple endpoints for managing the database documents.
Kuha Document Store is part of the open source software bundle Kuha2.
Features
REST API for full control
Kuha Document Store has a REST API that gives end users full control of the records stored in the Document Store. The REST API may be used to build functionality for specific needs, for example, to automatically update a record when the corresponding record is changed in a third-party storage system.
With the REST API, users can submit their records to the Document Store over HTTP with a JSON payload.
Flexible query support
Kuha Document Store provides an endpoint for selectively querying stored records. The Query API is used by client applications which are a part of the Kuha2 software bundle.
Dependencies & requirements
Python 3.8 or newer
MongoDB 3.4 or newer (License: GNU AGPL v3.0)
The software is continuously tested with supported Python versions.
MongoDB 3.4 is the first supported version, but the software is also known to work with 3.6, 4.2 and 5.0. Intermediate versions are most likely suitable.
Python packages
The following can be obtained from the Python Package Index (PyPI).
motor (License: Apache License 2.0)
pymongo (License: Apache License 2.0)
tornado (License: Apache License 2.0)
Cerberus (License: ISC)
python-dateutil (License: Simplified BSD)
Kuha Common is a library used with Kuha2 software. It can be obtained from https://gitlab.tuni.fi/fsd/kuha_common
kuha_common (License: EUPL)
License
Kuha Document Store is available under the EUPL. See LICENSE.txt for the full license.
Configuration
The application can be configured with a configuration file, via command line arguments or by environment variables. All configuration options have default values. If a configuration option is specified in more than one place, then command line values override environment variables which override configuration file values which override defaults.
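The precedence rule can be illustrated with a minimal sketch. The function `resolve_option` and the sample values are hypothetical and not part of Kuha Document Store; the sketch only demonstrates the documented lookup order.

```python
def resolve_option(name, cli_args, env_vars, config_file, defaults):
    """Return the effective value for ``name`` using the documented precedence:
    command line > environment variable > configuration file > default."""
    for source in (cli_args, env_vars, config_file, defaults):
        if source.get(name) is not None:
            return source[name]
    raise KeyError(name)


# Hypothetical values, for illustration only.
defaults = {"port": 6001}
config_file = {"port": 6002}
env_vars = {"port": 6003}
cli_args = {}  # nothing given on the command line

effective_port = resolve_option("port", cli_args, env_vars, config_file, defaults)
print(effective_port)  # the environment value wins over the file and the default
```

To check which values are actually in effect for an installation, use the --print-configuration option listed below.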
The following lists some of the available configuration options. Use --help to list all available options.
- -h, --help
Show help message and exit.
- --print-configuration
Print active configuration and exit.
- --port <port>
Port of Kuha document store database. Defaults to 6001. May also be controlled by setting the environment variable KUHA_DS_PORT.
- --replica <replica host + port>
MongoDB replica host and port. Repeat for multiple replicas. May also be controlled by setting the environment variable KUHA_DS_DBREPLICAS.
Configuration file
Arguments that start with '--' (e.g. --document-store-port) can also be set in a config file. The configuration file lookup searches for the file in the current working directory and in the package directory. The name of the default configuration file is kuha_document_store.ini; a different file can be set via the configuration option --config.
Note
Invoke with --help to print out config file lookup paths.
Environment variables
If the program is run using the scripts provided in the scripts subdirectory, the runtime environment can be controlled via scripts/runtime_env, which is created by copying scripts/runtime_env.dist at installation time by scripts/install_kuha_document_store_virtualenv.sh.
Running the program
This guide uses the convenience scripts from the scripts subdirectory. It is assumed that the program was installed using scripts/install_kuha_document_store_virtualenv.sh.
Run Document Store server:
./scripts/run_kuha_document_store.sh
The script sources scripts/runtime_env and activates the installed virtualenv. Finally, it calls kuha_ds_serve with the given command line arguments.
HTTP API endpoints
The root for requests is configurable and defaults to localhost:6001/v0.
Every endpoint returns HTTP status code 500 on internal errors.
Note
Responses with multiple objects will be streamed one object at a time.
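A client therefore has to split a multi-object response body into individual JSON objects. This documentation does not specify how the streamed objects are delimited, so the following is only a sketch under the assumption that the body consists of JSON objects written one after another, optionally separated by whitespace. The helper name `iter_json_objects` is hypothetical.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value found in ``text``, one after another.

    Assumes back-to-back JSON values, optionally separated by whitespace.
    """
    decoder = json.JSONDecoder()
    index = 0
    length = len(text)
    while index < length:
        # raw_decode() does not skip leading whitespace, so skip it here.
        while index < length and text[index].isspace():
            index += 1
        if index >= length:
            break
        obj, index = decoder.raw_decode(text, index)
        yield obj


# Hypothetical response body with two streamed study documents.
body = '{"study_number": "study_1"}{"study_number": "study_2"}'
for document in iter_json_objects(body):
    print(document["study_number"])
```

The sketch works on a fully buffered body for simplicity; a client that handles large result sets would feed chunks to the decoder incrementally instead.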
REST API
Document Store REST API provides CRUD support to the underlying documents.
- GET /(collection)/(document_id)
Get object from collection with optional document_id. If document_id is not given, endpoint will return all objects in collection.
- Parameters
collection (str) – Document collection. One of studies, variables, questions or study_groups.
document_id (str) – Optional document ID. 24-character hex string.
- Status Codes
200 OK – Success
400 Bad Request – Invalid parameters
404 Not Found – Resource not found
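As an illustration of the parameters above, a client could build and check the request URL before sending it. This is a minimal sketch: the helper `build_get_url` is hypothetical, and the `http://` scheme in front of the default root is an assumption.

```python
import re

# Collections and the ID format as listed in the parameter description above.
COLLECTIONS = ("studies", "variables", "questions", "study_groups")
DOCUMENT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")

# Default root from this documentation; the scheme is assumed.
DEFAULT_ROOT = "http://localhost:6001/v0"


def build_get_url(collection, document_id=None, root=DEFAULT_ROOT):
    """Build the URL for GET /(collection)/(document_id)."""
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection!r}")
    url = f"{root}/{collection}"
    if document_id is not None:
        if not DOCUMENT_ID_RE.fullmatch(document_id):
            raise ValueError("document_id must be a 24-character hex string")
        url = f"{url}/{document_id}"
    return url


print(build_get_url("studies"))
print(build_get_url("studies", "5a82e76e6fb71d06fef00e69"))
```

Validating on the client side avoids a round trip that would end in 400 Bad Request.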
- POST /studies
Create a new object to studies-collection from JSON request body.
Example request:
    POST /studies HTTP/1.1
    Content-Type: application/json

    {"study_number": "study_1"}
Example response:
    HTTP/1.1 201 Created
    Content-Type: application/json; charset=UTF-8

    {
      "result": "insert_successful",
      "error": null,
      "affected_resource": "5a82e76e6fb71d06fef00e69"
    }
- Request Headers
Content-Type – application/json
- Request JSON Object
study_number (string) – Required study number. Used as an identifier. Must be unique within collection.
- Response JSON Object
result (string) – Operation outcome.
error (string) – Errors during operation.
affected_resource (string) – document_id of the created object.
- Status Codes
201 Created – Created successfully.
415 Unsupported Media Type – Invalid content type.
400 Bad Request – Invalid JSON, Validation failed, Duplicate unique value.
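The example request above could be prepared from Python with the standard library alone. This is a sketch, not a client shipped with Kuha2: the helper `build_create_request` is hypothetical and the `http://` scheme in front of the default root is an assumption. The request is only constructed here; sending it requires a running Document Store.

```python
import json
import urllib.request

# Default root from this documentation; the scheme is assumed.
DEFAULT_ROOT = "http://localhost:6001/v0"


def build_create_request(collection, document, root=DEFAULT_ROOT):
    """Prepare a POST request that creates ``document`` in ``collection``."""
    return urllib.request.Request(
        url=f"{root}/{collection}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request("studies", {"study_number": "study_1"})

# To send the request against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["affected_resource"])
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for the 400 and 415 responses listed above, so a real client would catch that exception and inspect the response body.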
- POST /variables
Create a new object to variables-collection from JSON request body.
Example request:
    POST /variables HTTP/1.1
    Content-Type: application/json

    {
      "study_number": "study_1",
      "variable_name": "variable_1"
    }
Example response:
    HTTP/1.1 201 Created
    Content-Type: application/json; charset=UTF-8

    {
      "result": "insert_successful",
      "error": null,
      "affected_resource": "5a82ecf16fb71d06fef00e6a"
    }
- Request Headers
Content-Type – application/json
- Request JSON Object
study_number (string) – Required study number. Used as an identifier combined with variable_name. Their combination must be unique within collection.
variable_name (string) – Required variable name. Used as an identifier combined with study_number. Their combination must be unique within collection.
- Response JSON Object
result (string) – Operation outcome.
error (string) – Errors during operation.
affected_resource (string) – document_id of the created object.
- Status Codes
201 Created – Created successfully.
415 Unsupported Media Type – Invalid content type.
400 Bad Request – Invalid JSON, Validation failed, Duplicate unique value.
- POST /questions
Create a new object to questions-collection from JSON request body.
Example request:
    POST /questions HTTP/1.1
    Content-Type: application/json

    {
      "study_number": "study_1",
      "question_identifier": "question_1"
    }
Example response:
    HTTP/1.1 201 Created
    Content-Type: application/json; charset=UTF-8

    {
      "result": "insert_successful",
      "error": null,
      "affected_resource": "5a82ee1a6fb71d06fef00e6b"
    }
- Request Headers
Content-Type – application/json
- Request JSON Object
study_number (string) – Required study number. Used as an identifier combined with question_identifier. Their combination must be unique within collection.
question_identifier (string) – Required question identifier. Used as an identifier combined with study_number. Their combination must be unique within collection.
- Response JSON Object
result (string) – Operation outcome.
error (string) – Errors during operation.
affected_resource (string) – document_id of the created object.
- Status Codes
201 Created – Created successfully.
415 Unsupported Media Type – Invalid content type.
400 Bad Request – Invalid JSON, Validation failed, Duplicate unique value.
- POST /study_groups
Create a new object to study_groups-collection from JSON request body.
Example request:
    POST /study_groups HTTP/1.1
    Content-Type: application/json

    {"study_group_identifier": "study_group_1"}
Example response:
    HTTP/1.1 201 Created
    Content-Type: application/json; charset=UTF-8

    {
      "result": "insert_successful",
      "error": null,
      "affected_resource": "5a82ee876fb71d06fef00e6c"
    }
- Request Headers
Content-Type – application/json
- Request JSON Object
study_group_identifier (string) – Required. Used as an identifier and must be unique within collection.
- Response JSON Object
result (string) – Operation outcome.
error (string) – Errors during operation.
affected_resource (string) – document_id of the created object.
- Status Codes
201 Created – Created successfully.
415 Unsupported Media Type – Invalid content type.
400 Bad Request – Invalid JSON, Validation failed, Duplicate unique value.
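The four POST endpoints differ in which fields identify a document. A client may check these before submitting, as in the sketch below. The mapping restates the required fields listed above; the helper `missing_identifiers` is hypothetical, and the server-side validation remains authoritative.

```python
# Required identifying fields per collection, as listed for the POST endpoints.
IDENTIFIER_FIELDS = {
    "studies": ("study_number",),
    "variables": ("study_number", "variable_name"),
    "questions": ("study_number", "question_identifier"),
    "study_groups": ("study_group_identifier",),
}


def missing_identifiers(collection, document):
    """Return the required identifier fields that are absent or empty."""
    return [
        field
        for field in IDENTIFIER_FIELDS[collection]
        if not document.get(field)
    ]


# A variable without variable_name would be rejected by the server.
print(missing_identifiers("variables", {"study_number": "study_1"}))
```

This check only covers the presence of identifiers. Uniqueness within the collection can only be determined by the Document Store itself, which reports duplicates with 400 Bad Request.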
- DELETE /(collection)/(document_id)
Delete a document or all documents within a collection. If the optional document_id is left out, all documents within the collection are deleted.
- Query Parameters
delete_type (string) – Optional delete_type parameter. Defaults to soft.
soft: The default delete type. Results in logically deleted records.
hard: Physically deletes records.
- Response JSON Object
result (string) – Operation outcome.
error (string) – Errors during operation.
affected_resource (string) – document_id of the deleted object or number of deleted documents if document_id is not given in request.
- Status Codes
200 OK – Delete successful.
404 Not Found – Resource not found.
409 Conflict – Attempted logical delete of a record that is already logically deleted.
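A delete request with an explicit delete_type could be prepared as follows. This is a sketch using only the standard library: the helper `build_delete_request` is hypothetical and the `http://` scheme in front of the default root is an assumption.

```python
import urllib.request
from urllib.parse import urlencode

# Default root from this documentation; the scheme is assumed.
DEFAULT_ROOT = "http://localhost:6001/v0"


def build_delete_request(collection, document_id=None, delete_type="soft",
                         root=DEFAULT_ROOT):
    """Prepare DELETE /(collection)/(document_id)?delete_type=...

    Omitting ``document_id`` targets every document in the collection.
    """
    if delete_type not in ("soft", "hard"):
        raise ValueError("delete_type must be 'soft' or 'hard'")
    url = f"{root}/{collection}"
    if document_id is not None:
        url = f"{url}/{document_id}"
    url = f"{url}?{urlencode({'delete_type': delete_type})}"
    return urllib.request.Request(url, method="DELETE")


request = build_delete_request(
    "studies", "5a82e76e6fb71d06fef00e69", delete_type="hard"
)
print(request.get_method(), request.full_url)
```

Because a missing document_id deletes the whole collection, a cautious client may want to require an explicit confirmation flag before building such a request.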
- PUT /(collection)/(document_id)
Replace document within collection. See Documents for information on the payload.
- Note
Leave the _metadata field out of the request to let the Document Store handle the updated timestamp automatically.
- Request Headers
Content-Type – application/json
- Response JSON Object
result (string) – Operation outcome.
error (string) – Errors during operation.
affected_resource (string) – document_id of the replaced object.
- Status Codes
200 OK – Replace successful
404 Not Found – Resource not found
400 Bad Request – Invalid JSON, Validation failed, Duplicate unique value.
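A typical update fetches a document, modifies it, and sends it back with PUT. The sketch below strips the _metadata field before building the request, following the note above. The helper `build_replace_request` is hypothetical and the `http://` scheme in front of the default root is an assumption.

```python
import json
import urllib.request

# Default root from this documentation; the scheme is assumed.
DEFAULT_ROOT = "http://localhost:6001/v0"


def build_replace_request(collection, document_id, document, root=DEFAULT_ROOT):
    """Prepare PUT /(collection)/(document_id) without the _metadata field."""
    # Dropping _metadata lets the Document Store set the updated timestamp.
    payload = {key: value for key, value in document.items() if key != "_metadata"}
    return urllib.request.Request(
        url=f"{root}/{collection}/{document_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical document as it might have been fetched earlier.
fetched = {
    "study_number": "study_1",
    "_metadata": {"updated": "2018-01-31T11:37:34Z", "cmm_type": "study"},
}
request = build_replace_request("studies", "5a82e76e6fb71d06fef00e69", fetched)
print(request.data.decode("utf-8"))
```

Whether other server-managed fields also have to be removed from a fetched document is not covered by this documentation, so the sketch removes only _metadata.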
Query API
Query documents or information on documents from collection.
- POST /query/(collection)
Execute query against collection and return results in JSON.
- Request Headers
Content-Type – application/json
- Query Parameters
query_type (string) – Optional query parameter to select the query type.
select: The default query type. Returns all documents found by the filter.
count: Returns the number of documents that match the filter.
distinct: Returns the distinct values of a given field among the documents that match the filter.
- Request JSON Object
_filter (object) – Query filter. Used for all query types. Request may specify multiple filter conditions.
Example request with multiple filter conditions:
    POST /query/variables HTTP/1.1
    Content-Type: application/json

    {
      "_filter": {
        "study_number": "study_1",
        "variable_name": "variable_1"
      }
    }
- Request JSON Object
fields (array) – Optional. Select returned fields. Used in select query type. _id will always be returned. If not set, the full document will be returned.
skip (int) – Optional. Skip documents from the beginning. Used in select query type.
limit (int) – Optional. Limit the number of returned documents. Used in select query type.
sort_by (string) – Optional. Sort the queried documents by field. Used in select query type.
sort_order (int) – Optional. Sort order of the queried documents. Used in select query type.
1: Ascending sort order.
-1: Descending sort order.
fieldname (string) – Mandatory for distinct query type. Return distinct values for this field.
Result depends on the requested query_type.
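The request parameters above can be combined into a query request as sketched below. The helper `build_query_request` is hypothetical, the `http://` scheme in front of the default root is an assumption, and the field names in the example body come from the variables collection described earlier.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Default root from this documentation; the scheme is assumed.
DEFAULT_ROOT = "http://localhost:6001/v0"
QUERY_TYPES = ("select", "count", "distinct")


def build_query_request(collection, body, query_type="select", root=DEFAULT_ROOT):
    """Prepare POST /query/(collection)?query_type=... with a JSON body."""
    if query_type not in QUERY_TYPES:
        raise ValueError(f"invalid query_type: {query_type!r}")
    if query_type == "distinct" and "fieldname" not in body:
        raise ValueError("distinct queries require 'fieldname'")
    url = f"{root}/query/{collection}?{urlencode({'query_type': query_type})}"
    return urllib.request.Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Select the first ten variables of a study, sorted by name.
select_request = build_query_request(
    "variables",
    {
        "_filter": {"study_number": "study_1"},
        "fields": ["variable_name"],
        "limit": 10,
        "sort_by": "variable_name",
        "sort_order": 1,
    },
)
print(select_request.full_url)
```

The same helper covers count and distinct queries by passing `query_type="count"` or `query_type="distinct"` together with the matching body.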
JSON response for select query-type
Results will be streamed one object at a time. The object is a document with requested fields.
JSON response for count query-type
- Response JSON Object
count (int) – Number of documents found with _filter.
JSON response for distinct query-type
If fieldname points to a leaf node of the document, the response is in the following format.

    HTTP/1.1 200 OK
    Content-Type: application/json; charset=UTF-8

    { "<fieldname>": ["<list-of-distinct-values>"] }

If fieldname points to a branch node of the document, the response is in the following format.

    HTTP/1.1 200 OK
    Content-Type: application/json; charset=UTF-8

    { "<fieldname>": ["<list-of-distinct-objects>"] }
Example requests and responses for distinct query-type
    POST /query/studies?query_type=distinct HTTP/1.1
    Content-Type: application/json

    { "fieldname": "_metadata.updated" }

    HTTP/1.1 200 OK
    Content-Type: application/json; charset=UTF-8

    {
      "_metadata.updated": [
        "2018-02-13T13:49:37Z",
        "2018-02-08T10:55:41Z"
      ]
    }

    POST /query/studies?query_type=distinct HTTP/1.1
    Content-Type: application/json

    { "fieldname": "_metadata" }

    HTTP/1.1 200 OK
    Content-Type: application/json; charset=UTF-8

    {
      "_metadata": [
        {
          "updated": "2017-11-09T12:07:48Z",
          "cmm_type": "study",
          "created": "2017-11-09T11:06:03Z"
        },
        {
          "updated": "2017-11-09T11:37:16Z",
          "cmm_type": "study",
          "created": "2017-11-09T11:37:16Z"
        }
      ]
    }
Note
Distinct queries for datetime fields will not work as expected because of the different precision in MongoDB and Document Store JSON. MongoDB stores datetimes with millisecond precision, while Document Store JSON supports second precision.
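The effect of the precision mismatch can be reproduced without a database. In the sketch below, two datetimes differ only in their milliseconds, so they are distinct as stored values but identical once rendered with second precision. The helper `to_second_precision` is hypothetical and only mimics the timestamp format used in the examples above; it is not the Document Store's serializer.

```python
from datetime import datetime, timezone


def to_second_precision(value):
    """Render a datetime like the timestamps in the examples above."""
    return value.strftime("%Y-%m-%dT%H:%M:%SZ")


# Two values that differ only in the millisecond part (hypothetical data).
stored = [
    datetime(2018, 2, 13, 13, 49, 37, 120000, tzinfo=timezone.utc),
    datetime(2018, 2, 13, 13, 49, 37, 870000, tzinfo=timezone.utc),
]

rendered = [to_second_precision(value) for value in stored]
print(len(set(stored)), "distinct stored values")
print(len(set(rendered)), "distinct rendered values")
```

One way this can surface is a distinct result whose entries look identical in JSON although they were distinct in the database.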
- Status Codes
200 OK – OK
400 Bad Request – Message body empty, invalid query_type, invalid query parameters for query type.
415 Unsupported Media Type – Invalid Content-Type
Documents
Documents are objects stored in a collection. Documents support four different types of fields:
key-value pair:
{"study_number": "1200"}
contained key-value pairs:
    {
      "_metadata": {
        "updated": "2018-01-31T11:37:34Z",
        "cmm_type": "study",
        "created": "2018-01-31T11:37:27Z"
      }
    }
localized contained key-value pairs:
    {
      "study_titles": [
        { "language": "en", "study_title": "Study 1983" },
        { "language": "fi", "study_title": "Tutkimus 1983" }
      ]
    }
list of unique values:
{"study_numbers": ["1210", "3134", "1175", "2290", "2498"]}
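Client code usually needs to pick one language from a localized field. The following sketch shows one way to do that for the structure above. The helper `localized_value` is hypothetical, and the sample document combines the field examples from this section.

```python
def localized_value(entries, key, language, fallback=None):
    """Return ``key`` from the first entry whose language matches."""
    for entry in entries:
        if entry.get("language") == language:
            return entry.get(key, fallback)
    return fallback


# Sample document assembled from the field examples above.
study = {
    "study_number": "1200",
    "study_titles": [
        {"language": "en", "study_title": "Study 1983"},
        {"language": "fi", "study_title": "Tutkimus 1983"},
    ],
}

print(localized_value(study["study_titles"], "study_title", "fi"))
print(localized_value(study["study_titles"], "study_title", "sv", fallback="(no title)"))
```

Returning a fallback instead of raising keeps the lookup usable for documents that lack a translation in the requested language.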