kuha_client
kuha_client.py
Kuha Client communicates with Document Store and provides a simple way of importing, updating and deleting records by reading a batch of XML files stored in the filesystem.
-
class kuha_client.kuha_client.SourceFile(path)
File used as a source for Document Store records.
Parameters: path – Path to source file.
-
class kuha_client.kuha_client.FileLog(path)
Keep track of processed files.
Parameters: path – Path to filelog.
-
set_current(_file)
Set current source file.
Parameters: _file (SourceFile) – Source file currently being processed.
-
add_pending_study_group(study_group_identifier)
Add StudyGroup record to queue waiting to be processed.
Parameters: study_group_identifier – Id of pending StudyGroup.
-
pop_pending_study_group_files(study_group_identifier)
Return and remove source files containing references to study_group_identifier.
Parameters: study_group_identifier – StudyGroup identifier.
Returns: source files referencing study_group_identifier
Return type: list
-
add_id(collection, _id)
Add id to current source file’s collection of Document Store record IDs.
Parameters:
- collection (str) – Document Store collection the ID belongs to.
- _id – ObjectId (ID in Document Store) of the Record.
-
get_ids(collection)
Return list of ObjectIds for collection in current file.
Parameters: collection (str) – Document Store collection.
Returns: ObjectIds
Return type: list
-
get_filepaths()
Get paths from self.files.
Iterate through each SourceFile in self.files and gather their paths. Return the paths.
Returns: List of filepaths
Return type: list
-
has_match(path)
Check whether the source file found at path matches a path and modified timestamp recorded in this filelog.
Parameters: path – Path to source file.
Returns: True if path and timestamps match.
Return type: bool
-
remove_files_by_path_difference(paths)
Remove each SourceFile from self.files whose path is not in paths.
Compare the paths of the contained source files to paths. Every source file whose path is not found in paths gets removed from self.files.
Parameters: paths – list of filepaths to compare.
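The path-and-timestamp matching described for has_match and remove_files_by_path_difference can be illustrated with a small stand-in. This is a minimal sketch under assumptions, not the FileLog implementation: the class name MiniFileLog, its record method, and the internal dictionary are hypothetical and exist only for illustration.

```python
import os


class MiniFileLog:
    """Hypothetical stand-in for a file log; NOT the kuha_client FileLog class."""

    def __init__(self):
        # Map absolute path -> modification timestamp seen when the file was recorded.
        self._seen = {}

    def record(self, path):
        """Remember the path and its current modification timestamp."""
        self._seen[os.path.abspath(path)] = os.path.getmtime(path)

    def get_filepaths(self):
        """Return the recorded paths as a list."""
        return list(self._seen)

    def has_match(self, path):
        """Return True if path was recorded and its timestamp is unchanged."""
        key = os.path.abspath(path)
        return key in self._seen and self._seen[key] == os.path.getmtime(path)

    def remove_files_by_path_difference(self, paths):
        """Forget every recorded file whose path is not in paths."""
        keep = {os.path.abspath(p) for p in paths}
        for key in list(self._seen):
            if key not in keep:
                del self._seen[key]
```

In this sketch a file that has been modified since it was recorded no longer matches, so a caller could use has_match to skip unchanged files on a later run.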
-
-
exception kuha_client.kuha_client.DocumentStoreHTTPError(error_response)
Raise if DocumentStore response payload contains errors.
-
kuha_client.kuha_client.get_import_url(collection=None, importer=None)
Construct URL to Document Store import endpoint.
Parameters:
Returns: Constructed URL
Return type:
-
kuha_client.kuha_client.query_record(record)
Query single record by unique fields.
Parameters: record (kuha_common.document_store.records.Study or kuha_common.document_store.records.Variable or kuha_common.document_store.records.Question or kuha_common.document_store.records.StudyGroup) – Record to query.
Returns: found record if any.
Return type: kuha_common.document_store.records.Study or kuha_common.document_store.records.Variable or kuha_common.document_store.records.Question or kuha_common.document_store.records.StudyGroup
-
kuha_client.kuha_client.query_distinct_ids(collection)
Query collection for distinct ObjectIds.
Parameters: collection (str) – record’s collection.
Returns: set of distinct ids.
Return type: set
-
kuha_client.kuha_client.iterate_xml_directory(directory)
Recursively iterate over XML-files in directory.
Parameters: directory (str) – Absolute path to directory.
Returns: generator for iterating XML-files.
-
kuha_client.kuha_client.iterate_xml_files_recursively(*paths)
Helper for batch processing XML-files.
Check each path. If a path points to a file yield its absolute path. If it points to a directory, recursively iterate paths to each XML-file found and yield each file’s absolute path.
Parameters: paths (list) – Paths to file or directory.
Returns: generator for iterating absolute paths to xml-files
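The file-or-directory behaviour described above can be sketched with the standard library. This is an illustration under assumptions rather than the module's own code: the function name iter_xml_paths is hypothetical, and matching on a case-insensitive ".xml" suffix is an assumption about what counts as an XML-file.

```python
import os
from typing import Iterator


def iter_xml_paths(*paths: str) -> Iterator[str]:
    """Yield absolute paths to XML files found from files or directories."""
    for path in paths:
        if os.path.isdir(path):
            # Walk the directory tree and pick up every file ending in .xml.
            for dirpath, _dirnames, filenames in os.walk(path):
                for filename in sorted(filenames):
                    if filename.lower().endswith(".xml"):
                        yield os.path.abspath(os.path.join(dirpath, filename))
        elif os.path.isfile(path):
            # A path pointing directly to a file is yielded as an absolute path.
            yield os.path.abspath(path)
```

Because the function is a generator, a caller can start processing the first file before the whole directory tree has been scanned.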
-
class kuha_client.kuha_client.BatchProcessor(collections=None, file_log=None, sourcefiletype=None)
Processor for operations performed in a single run.
Keep record of what gets done. Collect StudyGroups from records and update accordingly. Facilitate access to operations needed to perform tasks against Document Store API.
Parameters:
- collections (list or None) – List of collections to process. Use None to process all collections.
- file_log (FileLog) – Keep track of processed source files and the record ObjectIds related to them.
- sourcefiletype (str or None) – Controls how the mapping from sourcefile to Document Store records is done. None sets the default SOURCEFILETYPE_DDIC.
-
classmethod get_supported_sourcefiletypes()
Get supported source file types.
Returns: supported source file types.
Return type: list
-
classmethod with_file_log(file_log_path, collections=None, sourcefiletype=None)
Initiate BatchProcessor with File Log.
Parameters:
-
sourcefileparser(path)
Initiate sourcefileparser, which depends on self.sourcefiletype.
Parameters: path – path to source file to be parsed.
Returns: iterative parser
-
create(record)
Create new record.
Parameters: record (kuha_common.document_store.records.Study or kuha_common.document_store.records.Variable or kuha_common.document_store.records.Question or kuha_common.document_store.records.StudyGroup) – populated record instance which gets created.
Returns: ObjectId of the new record.
Return type: str
-
upsert(record)
Upsert record.
If record exists, compare with existing. If records differ, discard the existing record and store the new one to DocumentStore with the existing ObjectId. If record does not exist, insert it to DocumentStore.
Parameters: record (kuha_common.document_store.records.Study or kuha_common.document_store.records.Variable or kuha_common.document_store.records.Question or kuha_common.document_store.records.StudyGroup) – populated record instance which gets created.
Returns: ObjectId of the record.
Return type: str
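The upsert rule (insert when missing, replace under the existing ObjectId when the records differ) can be shown with an in-memory stand-in. This is a minimal sketch under assumptions: InMemoryStore, find_id, and the study_number field used as the unique key are hypothetical and do not come from kuha_client or Document Store.

```python
import itertools


class InMemoryStore:
    """Hypothetical in-memory stand-in for one Document Store collection."""

    def __init__(self):
        self.records = {}                # object id -> record body
        self._ids = itertools.count(1)   # source of fresh ids for inserts

    def find_id(self, unique_key, value):
        """Return the id of the record whose unique_key equals value, or None."""
        for object_id, body in self.records.items():
            if body.get(unique_key) == value:
                return object_id
        return None

    def upsert(self, body, unique_key="study_number"):
        """Insert body, or replace the existing record while keeping its id."""
        object_id = self.find_id(unique_key, body[unique_key])
        if object_id is None:
            # No existing record: insert under a fresh id.
            object_id = str(next(self._ids))
            self.records[object_id] = dict(body)
        elif self.records[object_id] != body:
            # Existing record differs: discard it and store the new one
            # under the same id.
            self.records[object_id] = dict(body)
        return object_id
```

Keeping the id stable across updates means that references held by other records stay valid after the replaced record is stored.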
-
upsert_from_parser(parser)
Upsert records to self.collections from parser.
Parameters: parser – Parser generates Document Store records.
-
upsert_paths(*paths)
Upsert records found recursively from paths.
Parameters: *paths – one or more paths to recurse to look for files to parse.
-
add_study_group(study_group)
Add StudyGroup for later processing.
Lookup the StudyGroup if it has been stored before and update its contents. If it’s not found, store it as a new one.
Parameters: study_group (kuha_common.document_store.records.StudyGroup) – StudyGroup to add for later processing.
-
import_source_files(*paths)
Import files from paths.
Parameters: *paths – one or more paths to lookup for source files.
-
remove_absent(collection)
Remove records from collection which were not present in current upsert run.
Parameters: collection (str) – collection to process.
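Deciding which records are absent comes down to a set difference between the ids stored in a collection and the ids seen during the current run. The helper below is a sketch of that idea only; the name find_absent_ids is hypothetical and the real method may gather its ids differently.

```python
def find_absent_ids(stored_ids, seen_ids):
    """Return ids that are in the store but were not seen in the current run."""
    # Both arguments may be any iterable of hashable ids.
    return set(stored_ids) - set(seen_ids)
```

For example, with stored ids a, b, c and seen ids a, c, only b would be a candidate for removal.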
-
remove_record(record)
Remove record or records.
If record is an instance of DocumentStore record, remove it from DocumentStore. If record is a record class, remove all records from its collection.
Parameters: record (DocumentStore record instance or class) – Record to remove or class whose records will be removed.
-
remove_study_and_relatives_by_studyid(study_id)
Remove study and relative records.
For a single study the process should remove:
- Actual Study,
- Variables referenced from the Study,
- Questions referenced from the Study,
- References to the Study from StudyGroups.
Note: Does not remove StudyGroup even if all references to studies are removed.
Parameters: study_id (str) – ObjectId of the study to remove.
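The removal steps listed above can be sketched against plain dictionaries. Everything here is an assumption for illustration: the store layout (collection name mapped to records by id), the collection names other than studies, and the study and studies reference fields are hypothetical and say nothing about the actual Document Store schema.

```python
def remove_study_and_relatives(store, study_id):
    """Remove a study, its variables and questions, and StudyGroup references.

    store is a hypothetical dict: collection name -> {object id: record}.
    """
    # Remove the study itself, if present.
    store["studies"].pop(study_id, None)
    # Remove variables and questions that reference the study.
    for collection in ("variables", "questions"):
        store[collection] = {
            _id: rec
            for _id, rec in store[collection].items()
            if rec.get("study") != study_id
        }
    # Drop references to the study, but keep the StudyGroup records themselves.
    for group in store["study_groups"].values():
        group["studies"] = [s for s in group["studies"] if s != study_id]
    return store
```

As in the note above, a StudyGroup left with no study references is kept rather than deleted.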
-
import_run(lookup_paths)
Main entry point for import run.
Parameters: lookup_paths (list) – list of paths to lookup for source files.
-
upsert_run(lookup_paths, remove_absent=False)
Main entry point for upsert run.
Upsert records found from lookup_paths. Remove absent records if remove_absent is True.
Parameters:
- lookup_paths (sequence) – list of paths to lookup for source files.
- remove_absent (bool) – True will remove all records not found from lookup_paths.
kuha_import.py
Callable module that serves as the entry point for importing records to Document Store.
Example run from command line. Import records from /some/path:
python -m kuha_client.kuha_import --document-store-url=http://localhost:6001/v0 /some/path
Print help:
python -m kuha_client.kuha_import -h
-
kuha_client.kuha_import.import_run(paths, file_log_path=None, **kwargs)
Import run with arguments.
Parameters:
- paths – Lookup source files from paths.
- file_log_path – Path to file log.
- **kwargs – Additional keyword arguments get passed to BatchProcessor.
Returns: 0 on success.
Return type:
-
kuha_client.kuha_import.cli()
Parse command line arguments. Call import_run().
Returns: Return value of import_run()
kuha_upsert.py
Callable module that serves as the entry point for upserting (inserting or updating) records to Document Store.
Use Document Store’s Query API to see if document exists. If it exists, fetch it, update it, submit it back to Document Store via REST API.
Example run from command line. Upsert records from /some/path:
python -m kuha_client.kuha_upsert --document-store-url=http://localhost:6001/v0 /some/path
Print help:
python -m kuha_client.kuha_upsert -h
-
kuha_client.kuha_upsert.upsert_run(paths, collections=None, file_log_path=None, remove_absent=False, sourcefiletype=None)
Upsert run with arguments.
Parameters:
- paths – Lookup source files from paths.
- collections – Limit run to collections.
- file_log_path – Path to file log.
- remove_absent – Whether the upsert run should remove records that are found in Document Store but not in the source files of the current run.
- sourcefiletype – File type of source files.
Returns: 0 on success.
Return type:
-
kuha_client.kuha_upsert.cli()
Parse command line arguments. Call upsert_run().
Returns: Return value of upsert_run()
kuha_delete.py
Callable module that serves as the entry point for deleting records from Document Store.
Example run from command line. Delete study with ID:
python -m kuha_client.kuha_delete --document-store-url=http://localhost:6001/v0 studies 5afa741d6fb71d7b2d333982
Print help:
python -m kuha_client.kuha_delete -h
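When these entry points are driven from a script, the command lines shown in this document can be assembled as argument lists. The helper below is a small sketch: the name build_kuha_command is hypothetical, and it only reproduces the --document-store-url option and positional arguments that appear in the examples here.

```python
import sys


def build_kuha_command(module, document_store_url, *args):
    """Build an argv list for running `python -m kuha_client.<module> ...`."""
    return [
        sys.executable,                   # the current Python interpreter
        "-m",
        f"kuha_client.{module}",          # e.g. kuha_import, kuha_upsert, kuha_delete
        f"--document-store-url={document_store_url}",
        *args,                            # positional arguments such as paths or ids
    ]
```

Assuming kuha_client is installed, the resulting list could be passed to subprocess.run(cmd, check=True); building a list avoids shell quoting problems with paths that contain spaces.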