Kuha Client

Kuha Client submits records to Kuha Document Store. It is written in Python and communicates with Document Store over HTTP.

Features

  • Support for DDI 3.1, DDI 2.5 and DDI 1.2.2 metadata standards.
  • Import records to Document Store.
  • Update records already stored in Document Store.
  • Delete records in Document Store.
  • Batch process DDI files by recursing into directories:
    • Option to remove records from Document Store not found in the current batch.
    • Option to keep track of previously processed files and bypass processing if modification times have not changed.

Dependencies & requirements

  • Python 3.5 or newer
  • Recommended: python3-venv 3.5.1 or newer

The software is continuously tested against Python versions 3.5, 3.6, 3.7, and 3.8.

Python packages

Kuha Common is a library used with Kuha2 software. It can be obtained from https://bitbucket.org/tietoarkisto/kuha_common

  • kuha_common (License: EUPL)

License

Kuha Client is available under the EUPL. See LICENSE.txt for the full license.

Configuration

The most common configuration options are described here. Use --help to print all available options.

paths

Required positional argument. Absolute path to file or directory. Repeat to process multiple paths.

-h, --help

Show help and exit.

--collection <collection>

Only for upsert and import runs. Limits the import to a specific document type. Valid values are studies, variables, questions and study_groups. Set to None to import all document types. Defaults to None.

--document-store-url <document_store_url>

Required. Full URL to Document Store, for example http://localhost:6001/v0. May also be controlled by setting environment variable: KUHA_DS_URL.
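The option and environment variable pairing described above can be illustrated with a minimal sketch. This is not Kuha Client's own parser: the function names are hypothetical, and the assumption that a command-line value takes precedence over KUHA_DS_URL is ours, since the text above does not state the precedence.

```python
import argparse
import os


def build_parser(env=None):
    """Build an illustrative parser (hypothetical, not Kuha Client's own).

    The default for --document-store-url is read from KUHA_DS_URL, so a
    value given on the command line overrides the environment (assumed).
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Illustrative option parser")
    parser.add_argument(
        "--document-store-url",
        default=env.get("KUHA_DS_URL"),
        help="Full URL to Document Store, for example http://localhost:6001/v0",
    )
    return parser


def resolve_document_store_url(argv, env=None):
    """Return the URL from argv or the environment, or exit if neither is set."""
    args = build_parser(env).parse_args(argv)
    if not args.document_store_url:
        raise SystemExit("--document-store-url or KUHA_DS_URL is required")
    return args.document_store_url
```

Passing the environment as a plain dictionary keeps the sketch easy to exercise without modifying the real process environment.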

--file-log-path <path>

Only for upsert and import runs. Stores processed files in a file log and compares modification times on subsequent runs. Files whose modification times have not changed are bypassed.
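The idea behind the file log can be sketched as follows. This is an illustration only: the JSON format, the function names and the "newer than recorded" comparison are assumptions, and the actual file-log format used by Kuha Client may differ.

```python
import json
import os


def load_file_log(log_path):
    """Load a {path: modification_time} mapping, or an empty one if absent."""
    if not os.path.exists(log_path):
        return {}
    with open(log_path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def needs_processing(file_log, path):
    """True when the file is unknown or modified after the recorded time."""
    recorded = file_log.get(path)
    return recorded is None or os.path.getmtime(path) > recorded


def mark_processed(file_log, path):
    """Record the file's current modification time in the in-memory log."""
    file_log[path] = os.path.getmtime(path)


def save_file_log(file_log, log_path):
    """Persist the mapping so the next run can compare against it."""
    with open(log_path, "w", encoding="utf-8") as handle:
        json.dump(file_log, handle, indent=2)
```

A run would load the log, skip every path for which needs_processing returns False, call mark_processed after each successful submission, and save the log at the end.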

--remove-absent

Only for upsert runs. Removes records from Document Store that are not present in the current batch.
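Conceptually, the removal is a set difference between the IDs held by Document Store and the IDs seen in the current batch. The helper below is a hypothetical sketch of that selection step; it does not talk to Document Store and is not taken from Kuha Client.

```python
def ids_to_remove(stored_ids, batch_ids):
    """Return the IDs that are stored but absent from the current batch.

    Both arguments are iterables of record identifiers (illustrative).
    The result is sorted so that the outcome is deterministic.
    """
    return sorted(set(stored_ids) - set(batch_ids))
```

For example, with stored IDs "a", "b" and "c" and a batch containing "b" and "d", only "a" and "c" would be selected for removal.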

Running the program

If installed to a Python virtual environment, the environment must be activated before running the program.

Import records to Document Store by scanning a directory tree for .xml files to submit and create a file-log to keep track of processed files:

python -m kuha_client.kuha_import --file-log-path=file_log /path/to/directory
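Scanning a directory tree for .xml files can be pictured with the sketch below. It is an assumption about the general approach, not Kuha Client's code: the function name is hypothetical, and the choices to match the lowercase ".xml" suffix only and to apply the same filter to directly given files are ours.

```python
import os


def find_xml_files(paths):
    """Yield .xml files from the given file paths and directory trees."""
    for path in paths:
        if os.path.isfile(path):
            # A path pointing directly at a file is yielded if it is XML.
            if path.endswith(".xml"):
                yield path
            continue
        # os.walk descends into every subdirectory of the given path.
        for dirpath, _dirnames, filenames in os.walk(path):
            for name in sorted(filenames):
                if name.endswith(".xml"):
                    yield os.path.join(dirpath, name)
```

Because the function is a generator, files can be submitted one at a time without first collecting the whole tree in memory.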

Upsert records (insert and update) to Document Store by scanning a directory tree for .xml files and comparing the found files to the ones stored in the file-log. If a file's modification time is newer than the one stored in the file-log, the file gets processed. When the --remove-absent flag is used, any ID found in Document Store but not in the current batch gets removed:

python -m kuha_client.kuha_upsert --file-log-path=file_log --remove-absent /path/to/directory

Delete a single record from a collection:

python -m kuha_client.kuha_delete studies 5af94ff06fb71d7646160bd4

Delete all records from a collection:

python -m kuha_client.kuha_delete studies ALL

Delete all records from all collections:

python -m kuha_client.kuha_delete ALL ALL