kuha_client

kuha_client.py

Kuha Client communicates with Document Store and provides a simple way of importing, updating and deleting records by reading a batch of XML files stored in the filesystem.

class kuha_client.kuha_client.SourceFile(path)

File used as a source for Document Store records.

Parameters: path – Path to source file.
class kuha_client.kuha_client.FileLog(path)

Keep track of processed files.

Parameters: path – Path to file log.
set_current(_file)

Set the current source file.

Parameters: _file (SourceFile) – Source file currently being processed.
add_pending_study_group(study_group_identifier)

Add a StudyGroup record to the queue of records waiting to be processed.

Parameters: study_group_identifier – Identifier of the pending StudyGroup.
pop_pending_study_group_files(study_group_identifier)

Return and remove the source files that contain references to study_group_identifier.

Parameters: study_group_identifier – StudyGroup identifier.
Returns: Source files referencing study_group_identifier.
Return type: list
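
The bookkeeping behind add_pending_study_group() and pop_pending_study_group_files() can be pictured as a mapping from StudyGroup identifiers to the files that referenced them. The sketch below is a hypothetical, self-contained model: the class PendingStudyGroups and its attributes are illustrative and not part of kuha_client, and files are assumed to be identified by their paths.

```python
from collections import defaultdict


class PendingStudyGroups:
    """Hypothetical model of FileLog's pending-StudyGroup bookkeeping.

    Maps each StudyGroup identifier to the source files that referenced it.
    """

    def __init__(self):
        self._pending = defaultdict(list)
        self.current = None  # path of the source file being processed

    def set_current(self, path):
        self.current = path

    def add_pending_study_group(self, study_group_identifier):
        # Remember that the current file references this StudyGroup (once).
        files = self._pending[study_group_identifier]
        if self.current not in files:
            files.append(self.current)

    def pop_pending_study_group_files(self, study_group_identifier):
        # Return and forget every file that referenced the identifier.
        return self._pending.pop(study_group_identifier, [])


log = PendingStudyGroups()
log.set_current("a.xml")
log.add_pending_study_group("sg-1")
log.set_current("b.xml")
log.add_pending_study_group("sg-1")
print(log.pop_pending_study_group_files("sg-1"))  # ['a.xml', 'b.xml']
print(log.pop_pending_study_group_files("sg-1"))  # []
```

Popping rather than reading keeps each StudyGroup's files from being processed twice in the same run.
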
add_id(collection, _id)

Add an ID to the current source file’s collection of Document Store record IDs.

Parameters:
  • collection (str) – Document Store collection the ID belongs to.
  • _id – ObjectId (ID in Document Store) of the record.
get_ids(collection)

Return the list of ObjectIds recorded for collection in the current file.

Parameters: collection (str) – Document Store collection.
Returns: ObjectIds
Return type: list
get_filepaths()

Get paths from self.files.

Iterate through each SourceFile in self.files, gather their paths and return them.

Returns: List of file paths.
Return type: list
load()

Load the FileLog from self.path. Populates self.timestamp and self.files.

save()

Save the FileLog to self.path.

has_match(path)

Check whether this file log contains a source file whose path and modification timestamp match the file found at path.

Parameters: path – Path to source file.
Returns: True if both path and timestamp match.
Return type: bool
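
has_match() compares both the path and the modification timestamp. A minimal sketch, assuming a file log can be modelled as a plain {path: mtime} dictionary; the helpers log_file() and has_match() below are hypothetical stand-ins, not the library's FileLog:

```python
import os
import tempfile


def log_file(logged_files, path):
    # Remember the file's current modification time (what a file log would persist).
    logged_files[path] = os.path.getmtime(path)


def has_match(logged_files, path):
    """Return True if path is logged with its current modification time.

    logged_files is a hypothetical {path: mtime} mapping standing in for the
    SourceFile entries kept by FileLog.
    """
    return path in logged_files and logged_files[path] == os.path.getmtime(path)


with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(b"<codeBook/>")
logged = {}
print(has_match(logged, tmp.name))  # False: never logged
log_file(logged, tmp.name)
print(has_match(logged, tmp.name))  # True: unchanged since it was logged
os.utime(tmp.name, (0, 0))          # simulate an edit by changing the mtime
print(has_match(logged, tmp.name))  # False: timestamp no longer matches
os.remove(tmp.name)
```

A file that matches can be skipped in the next run, which is what makes the file log useful for incremental processing.
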
remove_files_by_path_difference(paths)

Remove each SourceFile from self.files whose path is not in paths.

Compare the paths of the contained source files to paths; every source file whose path is not found in paths is removed from self.files.

Parameters: paths – List of file paths to compare.
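
The pruning done by remove_files_by_path_difference() amounts to keeping only the entries whose path is still present. A sketch assuming self.files is modelled as a list of (path, mtime) tuples; the function below is illustrative only:

```python
def remove_files_by_path_difference(files, paths):
    """Keep only the entries of files whose path is in paths.

    files is a hypothetical list of (path, mtime) tuples standing in for
    FileLog.files; it is modified in place, like self.files would be.
    """
    keep = set(paths)
    files[:] = [entry for entry in files if entry[0] in keep]
    return files


files = [("a.xml", 1.0), ("b.xml", 2.0), ("c.xml", 3.0)]
remove_files_by_path_difference(files, ["a.xml", "c.xml", "d.xml"])
print(files)  # [('a.xml', 1.0), ('c.xml', 3.0)]
```

Paths present in paths but not in the log (such as "d.xml") are ignored; only stale entries are dropped.
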
exception kuha_client.kuha_client.DocumentStoreHTTPError(error_response)

Raised if a Document Store response payload contains errors.

kuha_client.kuha_client.get_import_url(collection=None, importer=None)

Construct the URL of the Document Store import endpoint.

Parameters:
  • collection (str) – Optional parameter to limit the import to a certain collection.
  • importer (str) – Optional parameter to set the importer. Defaults to ‘ddi_c’.
Returns: Constructed URL.
Return type: str

kuha_client.kuha_client.query_record(record)

Query a single record by its unique fields.

Parameters: record (kuha_common.document_store.records.Study or kuha_common.document_store.records.Variable or kuha_common.document_store.records.Question or kuha_common.document_store.records.StudyGroup) – Record to query.
Returns: The found record, if any.
Return type: kuha_common.document_store.records.Study or kuha_common.document_store.records.Variable or kuha_common.document_store.records.Question or kuha_common.document_store.records.StudyGroup
kuha_client.kuha_client.query_distinct_ids(collection)

Query collection for distinct ObjectIds.

Parameters: collection (str) – The record’s collection.
Returns: Set of distinct IDs.
Return type: set
kuha_client.kuha_client.iterate_xml_directory(directory)

Recursively iterate over XML files in directory.

Parameters: directory (str) – Absolute path to directory.
Returns: Generator for iterating over XML files.
kuha_client.kuha_client.iterate_xml_files_recursively(*paths)

Helper for batch processing XML files.

Check each path. If a path points to a file, yield its absolute path. If it points to a directory, recursively iterate over the XML files found in it and yield the absolute path of each.

Parameters: *paths – One or more paths to files or directories.
Returns: Generator for iterating over absolute paths to XML files.
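
One way the recursive lookup described above could be written with os.walk() is shown below. This is a sketch, not the library's implementation; in particular, treating any name ending in ".xml" (case-insensitively) as an XML file is an assumption.

```python
import os
import tempfile


def iterate_xml_files_recursively(*paths):
    """Yield absolute paths of XML files found from files or directories.

    Assumption: files inside directories count as XML if their name ends
    with ".xml" in any letter case; the library's own filter may differ.
    """
    for path in paths:
        path = os.path.abspath(path)
        if os.path.isfile(path):
            # A file given explicitly is yielded as-is, without filtering.
            yield path
        elif os.path.isdir(path):
            for root, _dirs, filenames in os.walk(path):
                for name in sorted(filenames):
                    if name.lower().endswith(".xml"):
                        yield os.path.join(root, name)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    for name in ("a.xml", "notes.txt", os.path.join("sub", "b.XML")):
        open(os.path.join(top, name), "w").close()
    for found in iterate_xml_files_recursively(top):
        print(os.path.relpath(found, top))
```

Because it is a generator, large directory trees are walked lazily, one file at a time.
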
class kuha_client.kuha_client.BatchProcessor(collections=None, file_log=None, sourcefiletype=None)

Processor for operations performed in a single run.

Keep a record of what gets done. Collect StudyGroups from records and update them accordingly. Facilitate access to the operations needed to perform tasks against the Document Store API.

Parameters:
  • collections (list or None) – List of collections to process. Use None to process all collections.
  • file_log (FileLog) – Keeps track of processed source files and the record ObjectIds related to them.
  • sourcefiletype (str or None) – Controls how source files are mapped to Document Store records. None selects the default, SOURCEFILETYPE_DDIC.
classmethod get_supported_sourcefiletypes()

Get supported source file types.

Returns: Supported source file types.
Return type: list
classmethod with_file_log(file_log_path, collections=None, sourcefiletype=None)

Initialize a BatchProcessor with a FileLog.

Parameters:
  • file_log_path (str) – Path to file log.
  • collections (list or None) – Collections to process.
  • sourcefiletype (str or None) – File type of source files.
sourcefileparser(path)

Initialize the source file parser. The parser used depends on self.sourcefiletype.

Parameters: path – Path to the source file to be parsed.
Returns: Iterative parser.
create(record)

Create a new record.

Parameters: record (kuha_common.document_store.records.Study or kuha_common.document_store.records.Variable or kuha_common.document_store.records.Question or kuha_common.document_store.records.StudyGroup) – Populated record instance to create.
Returns: ObjectId of the new record.
Return type: str
upsert(record)

Upsert record.

If the record already exists, compare it with the existing one. If they differ, discard the existing record and store the new one in Document Store under the existing ObjectId. If the record does not exist, insert it into Document Store.

Parameters: record (kuha_common.document_store.records.Study or kuha_common.document_store.records.Variable or kuha_common.document_store.records.Question or kuha_common.document_store.records.StudyGroup) – Populated record instance to create or update.
Returns: ObjectId of the record.
Return type: str
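
The upsert logic above can be modelled with an in-memory dictionary standing in for a Document Store collection. In this hypothetical sketch, unique_key plays the part of the unique fields used by query_record(), and new IDs come from uuid4() only because there is no real Document Store to assign ObjectIds.

```python
import uuid


def upsert(store, unique_key, record):
    """Insert record, or replace an existing differing record in place.

    store is a hypothetical {object_id: record_dict} mapping standing in for a
    Document Store collection; unique_key lists the identifying field names.
    Returns the ID under which the record is stored.
    """
    key = tuple(record[field] for field in unique_key)
    for object_id, existing in store.items():
        if tuple(existing[field] for field in unique_key) == key:
            if existing != record:
                # Records differ: discard the old one but keep its ID.
                store[object_id] = dict(record)
            return object_id
    # No existing record: insert under a new, locally generated ID.
    object_id = uuid.uuid4().hex[:24]
    store[object_id] = dict(record)
    return object_id


store = {}
first = upsert(store, ["study_number"], {"study_number": "S1", "title": "Old"})
second = upsert(store, ["study_number"], {"study_number": "S1", "title": "New"})
print(first == second, store[first]["title"], len(store))  # True New 1
```

Keeping the existing ID means references held elsewhere (for example in a file log) stay valid after an update.
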
upsert_from_parser(parser)

Upsert records generated by parser into self.collections.

Parameters: parser – Parser that generates Document Store records.
upsert_paths(*paths)

Upsert records found recursively from paths.

Parameters: *paths – One or more paths to search recursively for files to parse.
upsert_study_groups()

Upsert collected StudyGroups.

add_study_group(study_group)

Add StudyGroup for later processing.

Look up the StudyGroup in case it has been stored before and update its contents. If it is not found, store it as a new one.

Parameters: study_group (kuha_common.document_store.records.StudyGroup) – StudyGroup to add for later processing.
import_source(source_data)

Import source data into Document Store.

import_source_files(*paths)

Import files from paths.

Parameters: *paths – One or more paths to look up source files from.
remove_absent(collection)

Remove records from collection that were not present in the current upsert run.

Parameters: collection (str) – Collection to process.
remove_absent_records()

Remove records that were not present in the current upsert run.
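
Removing absent records reduces to a set difference between the ObjectIds stored in a collection (compare query_distinct_ids()) and those recorded during the run. A sketch with a hypothetical delete callback in place of the real Document Store call:

```python
def remove_absent(store_ids, seen_ids, delete):
    """Delete every ID in store_ids that was not seen during the run.

    store_ids plays the role of query_distinct_ids(collection), seen_ids the
    ObjectIds recorded during the run; delete is a hypothetical callback that
    removes one record. Returns the removed IDs, sorted for stable output.
    """
    absent = sorted(set(store_ids) - set(seen_ids))
    for object_id in absent:
        delete(object_id)
    return absent


deleted = []
print(remove_absent({"a", "b", "c"}, {"a", "c"}, deleted.append))  # ['b']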

remove_record(record)

Remove a record or records.

If record is an instance of a Document Store record, remove it from Document Store. If record is a record class, remove all records from its collection.

Parameters: record (Document Store record instance or class) – Record to remove, or class whose records will be removed.
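
remove_record() accepts either a record instance or a record class. A sketch of that dispatch, assuming records expose a collection attribute and an object_id; StudyStub and the nested-dict store are hypothetical stand-ins for the kuha_common record classes and Document Store:

```python
class StudyStub:
    """Hypothetical record class; the real ones live in kuha_common."""

    collection = "studies"

    def __init__(self, object_id):
        self.object_id = object_id


def remove_record(store, record):
    """Remove one record, or every record of a record class's collection.

    store is a hypothetical {collection: {object_id: record}} mapping.
    """
    if isinstance(record, type):
        # A class was given: empty its whole collection.
        store[record.collection].clear()
    else:
        # An instance was given: remove only that record.
        del store[record.collection][record.object_id]


store = {"studies": {"1": StudyStub("1"), "2": StudyStub("2")}}
remove_record(store, StudyStub("1"))
print(sorted(store["studies"]))  # ['2']
remove_record(store, StudyStub)
print(store["studies"])  # {}
```

Passing the class is the destructive case: it clears the whole collection, so callers should be deliberate about which form they pass.
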
remove_study_and_relatives_by_studyid(study_id)

Remove a study and its related records.

For a single study the process should remove:

  • the Study itself,
  • Variables referenced from the Study,
  • Questions referenced from the Study,
  • references to the Study from StudyGroups.
Note: Does not remove a StudyGroup even if all its references to studies are removed.
Parameters: study_id (str) – ObjectId of the study to remove.
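
The removal steps listed above can be illustrated on a hypothetical in-memory data model in which variables and questions carry their study's study_number and each StudyGroup lists member study numbers under "studies". Neither the data model nor the function is the library's own; it only mirrors the listed steps and the note that emptied StudyGroups are kept.

```python
def remove_study_and_relatives(store, study_id):
    """Remove a study, its variables and questions, and StudyGroup references.

    Hypothetical data model: store maps collection names to {id: dict};
    variables and questions carry the study's "study_number", and study
    groups list member study numbers under "studies".
    """
    study = store["studies"].pop(study_id)
    number = study["study_number"]
    for collection in ("variables", "questions"):
        records = store[collection]
        for object_id in [i for i, r in records.items() if r["study_number"] == number]:
            del records[object_id]
    for group in store["study_groups"].values():
        # Drop the reference but keep the group, even if it becomes empty.
        group["studies"] = [n for n in group["studies"] if n != number]


store = {
    "studies": {"s1": {"study_number": "S1"}, "s2": {"study_number": "S2"}},
    "variables": {"v1": {"study_number": "S1"}, "v2": {"study_number": "S2"}},
    "questions": {"q1": {"study_number": "S1"}},
    "study_groups": {"g1": {"studies": ["S1"]}},
}
remove_study_and_relatives(store, "s1")
print(store["study_groups"])  # {'g1': {'studies': []}}
```
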
import_run(lookup_paths)

Main entry point for an import run.

Parameters: lookup_paths (list) – List of paths to look up source files from.
upsert_run(lookup_paths, remove_absent=False)

Main entry point for an upsert run.

Upsert records found from lookup_paths. Remove absent records if remove_absent is True.

Parameters:
  • lookup_paths (sequence) – List of paths to look up source files from.
  • remove_absent (bool) – If True, remove all records not found from lookup_paths.
remove_run(record_or_class=None)

Main entry point for a remove run.

Parameters: record_or_class – Record or record class to remove. If None, every record in every collection is removed.

kuha_import.py

Callable module that serves as an entry point for importing records into Document Store.

Example run from command line. Import records from /some/path:

python -m kuha_client.kuha_import --document-store-url=http://localhost:6001/v0 /some/path

Print help:

python -m kuha_client.kuha_import -h
kuha_client.kuha_import.import_run(paths, file_log_path=None, **kwargs)

Import run with arguments.

Parameters:
  • paths – Look up source files from paths.
  • file_log_path – Path to file log.
  • **kwargs – Additional keyword arguments are passed to BatchProcessor.
Returns: 0 on success.
Return type: int

kuha_client.kuha_import.cli()

Parse command line arguments and call import_run().

Returns: Return value of import_run().
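
cli() parses the command line and hands back import_run()'s return value. The sketch below mirrors that shape with argparse, using only the --document-store-url option and the positional paths that appear in the example above; import_run here is a hypothetical stub, and the real parser likely accepts further options.

```python
import argparse

CALLS = []  # records stub invocations so the sketch can be inspected


def import_run(paths, file_log_path=None, **kwargs):
    """Hypothetical stub standing in for kuha_client.kuha_import.import_run."""
    CALLS.append((list(paths), file_log_path))
    return 0


def cli(argv=None):
    """Parse arguments, call import_run() and return its value (sketch only)."""
    parser = argparse.ArgumentParser(prog="kuha_import")
    parser.add_argument("--document-store-url")
    parser.add_argument("paths", nargs="+")
    args = parser.parse_args(argv)
    # A real client would use args.document_store_url to reach Document Store;
    # this sketch does not contact any server.
    return import_run(args.paths)


print(cli(["--document-store-url=http://localhost:6001/v0", "/some/path"]))  # 0
```

Passing the result to SystemExit (for example `raise SystemExit(cli())`) turns the returned 0 into the process exit status.
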

kuha_upsert.py

Callable module that serves as an entry point for upserting (inserting or updating) records into Document Store.

Use Document Store’s Query API to check whether a document exists. If it does, fetch it, update it and submit it back to Document Store via the REST API.

Example run from command line. Upsert records from /some/path:

python -m kuha_client.kuha_upsert --document-store-url=http://localhost:6001/v0 /some/path

Print help:

python -m kuha_client.kuha_upsert -h
kuha_client.kuha_upsert.upsert_run(paths, collections=None, file_log_path=None, remove_absent=False, sourcefiletype=None)

Upsert run with arguments.

Parameters:
  • paths – Look up source files from paths.
  • collections – Limit the run to collections.
  • file_log_path – Path to file log.
  • remove_absent – If True, remove records that are found in Document Store but not in the source files of the current run.
  • sourcefiletype – File type of source files.
Returns: 0 on success.
Return type: int

kuha_client.kuha_upsert.cli()

Parse command line arguments and call upsert_run().

Returns: Return value of upsert_run().

kuha_delete.py

Callable module that serves as an entry point for deleting records from Document Store.

Example run from command line. Delete a study by its ID:

python -m kuha_client.kuha_delete --document-store-url=http://localhost:6001/v0 studies 5afa741d6fb71d7b2d333982

Print help:

python -m kuha_client.kuha_delete -h
kuha_client.kuha_delete.cli()

Parse command line arguments and call BatchProcessor.remove_run().

Returns: 0 on success.
Return type: int