Kuha Client

Kuha Client is used to submit records to Kuha Document Store. Kuha Client is written in Python and uses HTTP to communicate with Document Store.

Features

  • Support for DDI 3.3, DDI 3.1, DDI 2.5 and DDI 1.2.2 metadata standards.

  • Synchronize records from filesystem to Document Store.

  • Delete records in Document Store. Supports both logical and physical deletions.

  • Batch process DDI files by recursing into directories:

    • Option to not remove records in Document Store that cannot be found in the current batch.

    • Option to keep track of previously processed files and bypass processing if modification times have not changed.

Dependencies & requirements

  • Python 3.8 or newer

The software is continuously tested against supported Python versions.

Python packages

Kuha Common is a library used with Kuha2 software. It can be obtained from https://bitbucket.org/tietoarkisto/kuha_common

  • kuha_common (License: EUPL)

License

Kuha Client is available under the EUPL. See LICENSE.txt for the full license.

Configuration

Some common configuration options are described here. Use --help to print all available options.

paths

Only for kuha_sync. Required positional argument. Absolute path to file or directory. Repeat to process multiple paths.

--document-store-url <document_store_url>

Required. Full URL to Document Store, for example http://localhost:6001/v0. May also be controlled by setting environment variable: KUHA_DS_URL.

--collection <collection>

Only for kuha_sync. Limits the import to a specific document type. Valid values are [studies,variables,questions,study_groups]. Set None to import all document types. Defaults to None.

--file-cache <path>

Only for kuha_sync. Path to a cache file. Will be created if not present. Leave unset (default) to not use file caching.

--no-remove

Only for kuha_sync. Do not remove records that were not found in this batch.

--delete-type <type>

Only for kuha_delete. Select delete type: soft or hard. Soft is for logical deletions, hard is for physical deletions. Defaults to soft.

-h, --help

Show help and exit.
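
When kuha_sync is launched from another Python program, the options above can be assembled into an argument list. The following is a minimal sketch, not part of Kuha Client: the helper name build_sync_command and its parameters are hypothetical, and placing options before the positional paths is an assumption about the argument parser.

```python
import os

# Collections listed as valid values for --collection.
VALID_COLLECTIONS = ("studies", "variables", "questions", "study_groups")


def build_sync_command(paths, document_store_url=None, collection=None,
                       file_cache=None, no_remove=False):
    """Return an argument list for kuha_sync (hypothetical helper)."""
    if not paths:
        raise ValueError("at least one path is required")
    if collection is not None and collection not in VALID_COLLECTIONS:
        raise ValueError(f"unknown collection: {collection}")
    # Fall back to the documented KUHA_DS_URL environment variable.
    url = document_store_url or os.environ.get("KUHA_DS_URL")
    if not url:
        raise ValueError("Document Store URL is required")
    command = ["kuha_sync", "--document-store-url", url]
    if collection is not None:
        command += ["--collection", collection]
    if file_cache is not None:
        command += ["--file-cache", file_cache]
    if no_remove:
        command.append("--no-remove")
    # The paths argument is documented as an absolute path.
    command += [os.path.abspath(path) for path in paths]
    return command
```

The returned list can be passed to subprocess.run(command, check=True) once the virtual environment that contains kuha_sync is active.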

Running the program

If installed to a Python virtual environment, the environment must be activated before running the program.

Synchronize records (insert, update & remove) to Document Store by scanning a directory tree for .xml files and comparing found files to the ones stored in file-cache. If a file’s modification time is newer than the one stored in file-cache, the file gets processed:

kuha_sync --file-cache=file_cache /path/to/directory
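
The modification-time comparison described above can be illustrated with a short Python sketch. This is not Kuha Client's implementation: the function name find_changed_files and the JSON layout of the cache file are assumptions made for illustration only.

```python
import json
import os


def find_changed_files(root, cache_path):
    """Return .xml files under root that are newer than their cached mtime.

    Hypothetical sketch; the cache is assumed to be a JSON object that
    maps file paths to modification times.
    """
    try:
        with open(cache_path, encoding="utf-8") as handle:
            cache = json.load(handle)
    except FileNotFoundError:
        cache = {}  # No cache yet: every file counts as changed.
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".xml"):
                continue
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime
            # Process the file only if it is newer than the cached entry.
            if mtime > cache.get(path, float("-inf")):
                changed.append(path)
                cache[path] = mtime
    # Store the updated modification times for the next run.
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(cache, handle)
    return changed
```

A second call with an unchanged directory tree returns an empty list, which corresponds to bypassing files whose modification times have not changed.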

Note

The file-cache is not invalidated automatically. It must be removed manually if you have removed records using kuha_delete, upgraded Kuha Client, or altered the records in Document Store using a mechanism other than kuha_sync.

Delete record 5af94ff06fb71d7646160bd4 from studies-collection:

kuha_delete studies 5af94ff06fb71d7646160bd4

Delete study by study_number:

kuha_delete studies study_number=study_3

Delete all records from studies-collection:

kuha_delete studies ALL

Delete all records from all collections:

kuha_delete ALL ALL
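
The delete invocations above can also be built programmatically. The sketch below is a hypothetical helper, not part of Kuha Client. It assumes the Document Store URL is supplied through KUHA_DS_URL and that --delete-type may precede the positional arguments; the confirm_all switch is an invented safety check and not a kuha_delete option.

```python
# Delete types documented for --delete-type.
DELETE_TYPES = ("soft", "hard")


def build_delete_command(collection, target, delete_type="soft",
                         confirm_all=False):
    """Return an argument list for kuha_delete (hypothetical helper)."""
    if delete_type not in DELETE_TYPES:
        raise ValueError(f"delete type must be one of {DELETE_TYPES}")
    # Guard against wiping every collection by accident.
    if collection == "ALL" and not confirm_all:
        raise ValueError("pass confirm_all=True to delete from all collections")
    return ["kuha_delete", "--delete-type", delete_type, collection, target]
```

For example, build_delete_command("studies", "study_number=study_3", delete_type="hard") describes a physical deletion of a single study selected by its study_number.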

Note that when deleting records with kuha_delete, the file-cache will become invalid and should be deleted. You can simply delete it:

rm file_cache
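
If the deletion is scripted in Python rather than in a shell, the same cleanup can be done with pathlib. This is a minimal sketch; the function name invalidate_file_cache is hypothetical, and missing_ok requires Python 3.8 or newer, which matches the stated requirement.

```python
from pathlib import Path


def invalidate_file_cache(cache_path):
    """Delete the cache file; do nothing if it is already gone."""
    Path(cache_path).unlink(missing_ok=True)
```

Calling invalidate_file_cache("file_cache") after kuha_delete makes the next kuha_sync run with --file-cache create a fresh cache.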