The kitools package provides utilities for data scientists working on Knowledge Integration (KI) projects and supporting workflows within these projects.
The kitools package is available as both a Python package and an R package. To tailor this documentation to your language of choice, use the R/Python toggle in the navigation bar. This introduction provides background and an overview of package functionality. To learn how to install and use the package, follow the links in the “Articles” menu in the navigation bar.
The primary workflow supported by kitools is working with data that is stored in repositories on content nodes.
In KI projects, there are three major classes of data:
KiData_MNCH_Controlled
). These repositories are managed by the data curation team, who have read/write access to the repositories. Data scientists pull data from these repositories to use in their KI analyses but do not have write access to them.
KI data scientists work on a local workstation, so they need a way to bring the data they analyze onto that workstation and to share the data and results their analysis produces.
The kitools package helps you register all data associated with your analysis and handles pushing that data to, and pulling it from, the content node. It does this through the notion of a “KI project”.
The KI project is the core concept of the package: a directory on the data scientist’s workstation, where all analysis data, code, and artifacts are stored, paired with a corresponding analysis repository on Synapse.
The package provides a function that initializes a KI project directory and Synapse space (or associates the project with an existing analysis Synapse space), along with functions that help you register the datasets associated with the analysis. All datasets are tracked in a KI project “manifest” file, which maps each piece of data in the local directory to its location on Synapse.
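To make the idea of a manifest concrete, the following is a minimal sketch of the kind of mapping it records. It is not the kitools manifest format or API: the `ManifestEntry` and `Manifest` classes, the `role` field, and the `syn0000000N` IDs are hypothetical placeholders chosen for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ManifestEntry:
    """One tracked dataset (hypothetical fields, not the kitools format)."""
    local_path: str   # path relative to the KI project directory
    synapse_id: str   # Synapse entity ID where the data lives (placeholder values below)
    role: str         # "core" (pulled from a core repository) or "analysis" (pushed)


class Manifest:
    """Hypothetical in-memory model of a KI project manifest."""

    def __init__(self):
        # Keyed by local path so that each local file is tracked exactly once.
        self.entries = {}

    def add(self, local_path, synapse_id, role):
        if role not in ("core", "analysis"):
            raise ValueError(f"unknown role: {role}")
        self.entries[local_path] = ManifestEntry(local_path, synapse_id, role)

    def synapse_location(self, local_path):
        """Look up where a local file is stored on Synapse."""
        return self.entries[local_path].synapse_id

    def to_json(self):
        """Serialize the manifest, e.g. for writing it to a file in the project."""
        return json.dumps([asdict(e) for e in self.entries.values()], indent=2)


m = Manifest()
m.add("data/core/births.csv", "syn00000001", "core")
m.add("results/model_fit.csv", "syn00000002", "analysis")
print(m.synapse_location("data/core/births.csv"))  # syn00000001
```

Keying entries by local path reflects the manifest's purpose as described above: given a file in the project directory, find where it lives on Synapse.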
A typical KI analysis begins by specifying the core datasets it will use; kitools functions add these datasets to the project manifest and pull them from their core data repositories on Synapse.
Then, throughout the analysis, as the data scientist adds auxiliary data or creates analysis artifacts and results, these can be added to the manifest and pushed to the KI project’s analysis repository.
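The direction of data flow in this workflow can be sketched as follows, assuming a manifest that tags each entry as either "core" (pull only, since data scientists cannot write to core repositories) or "analysis" (push). This is only an illustration: a temporary local folder stands in for Synapse, and the `pull` and `push` functions are hypothetical helpers, not kitools or Synapse client functions.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical manifest shape: local path -> (Synapse ID, role).
# A local folder stands in for Synapse; each entity is a file named by its ID.


def remote_path(remote_root, synapse_id):
    return Path(remote_root) / synapse_id


def pull(manifest, project_dir, remote_root):
    """Copy every core dataset from the mock remote into the project directory."""
    for local_path, (synapse_id, role) in manifest.items():
        if role == "core":
            dest = Path(project_dir) / local_path
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(remote_path(remote_root, synapse_id), dest)


def push(manifest, project_dir, remote_root):
    """Copy every analysis output from the project directory to the mock remote."""
    for local_path, (synapse_id, role) in manifest.items():
        if role == "analysis":
            shutil.copyfile(Path(project_dir) / local_path,
                            remote_path(remote_root, synapse_id))


remote = tempfile.mkdtemp()   # stand-in for Synapse
project = tempfile.mkdtemp()  # stand-in for the KI project directory
Path(remote, "syn00000001").write_text("id,weight\n1,3.2\n")  # a placeholder core dataset

manifest = {
    "data/core/births.csv": ("syn00000001", "core"),
    "results/summary.txt": ("syn00000002", "analysis"),
}

pull(manifest, project, remote)             # 1. bring core data to the workstation
out = Path(project, "results")
out.mkdir(parents=True, exist_ok=True)
(out / "summary.txt").write_text("mean weight: 3.2\n")  # 2. produce an analysis result
push(manifest, project, remote)             # 3. share the result
print(Path(remote, "syn00000002").read_text())  # mean weight: 3.2
```

Separating roles in the manifest keeps pulls and pushes one-directional, mirroring the access model described earlier: core repositories are read-only for data scientists, while the analysis repository receives their outputs.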
You can learn the specifics of doing this in the “Articles” documents accessible from the navigation bar.
There are many reasons for closely tracking the data used in KI analyses and for handling the push/pull of that data between the analyst’s workstation and the content node. These include:
To get started using kitools, visit the installation and setup guide.