Core Modules
This section provides detailed API documentation for the core modules of DeepChem Server, automatically generated from the source code docstrings.
Datastore Operations
The datastore module handles persistent storage of datasets and models.
Address Management
The address module provides URI-like addressing for resources.
- class deepchem_server.core.address.DeepchemAddress(address: str, kind: str | None = 'data')[source]
Bases:
object
A uniform representation for referring to DeepChem objects.
A DeepchemAddress gives access to the storage location of an object, which is inferred from the address string provided.
- Parameters:
Examples
>>> address = 'deepchem://profile/project/zinc.csv'
>>> deepchem_address = DeepchemAddress(address)
Initialize DeepchemAddress.
- Parameters:
- classmethod make_deepchem_address_from_filename(end: str) str[source]
Return a deepchem address string from a filename.
- Parameters:
end (str) – The filename whose DeepchemAddress we are creating.
- Returns:
The DeepchemAddress of the file in the format deepchem://<storage_loc>/<end>.
- Return type:
str
- Raises:
ValueError – If no datastore is configured.
Examples
>>> DeepchemAddress.make_deepchem_address_from_filename('temp.txt')
deepchem://test_company/test_user/working_dir/temp.txt
- classmethod get_key(address: str) str[source]
Return the key from an address.
A key refers to one of DeepChem’s datasets or models.
- Parameters:
address (str) – The address string whose key we are extracting.
- Returns:
The extracted key from the address.
- Return type:
str
Examples
The following examples show the same address written in different formats.
Example 1:
>>> dataset_address = 'deepchem://deepchem/data/delaney'
>>> key = DeepchemAddress.get_key(dataset_address)
>>> key
delaney
Example 2:
>>> dataset_address = 'deepchem/data/delaney'
>>> key = DeepchemAddress.get_key(dataset_address)
>>> key
deepchem/data/delaney
Example 3:
>>> dataset_address = 'delaney'
>>> key = DeepchemAddress.get_key(dataset_address)
>>> key
delaney
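The three documented cases can be reproduced with plain string handling. The following is a minimal sketch, not the library's implementation: `get_key_sketch` is a hypothetical name, and both the rule that a prefixed address drops its first two segments and the `ValueError` for a missing key are assumptions inferred from the examples above.

```python
PREFIX = "deepchem://"  # prefix as it appears in the documented examples


def get_key_sketch(address: str) -> str:
    """Hypothetical helper that mirrors the three documented get_key examples."""
    if address.startswith(PREFIX):
        # Assumed layout: deepchem://<profile>/<project>/<key>
        parts = address[len(PREFIX):].split("/", 2)
        if len(parts) < 3 or not parts[2]:
            raise ValueError(f"no key found in {address!r}")
        return parts[2]
    # Without the prefix, the string is returned unchanged (Examples 2 and 3).
    return address


print(get_key_sketch("deepchem://deepchem/data/delaney"))
print(get_key_sketch("deepchem/data/delaney"))
print(get_key_sketch("delaney"))
```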
- classmethod parse_address(address: str) dict[source]
Return different components of the address.
- Parameters:
address (str) – The deepchem address of the object.
- Returns:
Dictionary containing ‘profile’, ‘project’, and ‘key’ components.
- Return type:
dict
- Raises:
ValueError – If the address format is invalid.
Examples
>>> address = 'deepchem://user/test/file'
>>> parsed_address = DeepchemAddress.parse_address(address)
>>> parsed_address
{'profile': 'user', 'project': 'test', 'key': 'file'}
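To make the profile/project/key split concrete, here is a small standalone sketch. It is an illustration under assumptions, not the actual method: `parse_address_sketch` is a hypothetical name, and the choice to reject addresses without the `deepchem://` prefix or with fewer than three segments is a guess at what "invalid format" means.

```python
def parse_address_sketch(address: str) -> dict:
    """Hypothetical stand-in that splits an address into its three components."""
    prefix = "deepchem://"
    if not address.startswith(prefix):
        raise ValueError(f"address must start with {prefix!r}: {address!r}")
    # Split into at most three parts so that nested keys keep their slashes.
    parts = address[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected profile/project/key in {address!r}")
    profile, project, key = parts
    return {"profile": profile, "project": project, "key": key}


print(parse_address_sketch("deepchem://user/test/file"))
```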
- classmethod get_path(storage_loc: str, address: str, format: str | None = 's3', base_dir: str | None = None) str[source]
Return the path of the object in the storage from the address.
When the format is ‘local’, base_dir is used as the base directory and the returned path matches the OS path format.
- Parameters:
storage_loc (str) – The storage location of the object (used in case the address is not in default deepchem address format).
address (str) – The deepchem address of the object.
format ({'s3', 'local'}, optional) – The format of the path to be returned, by default ‘s3’.
base_dir (str, optional) – The base directory to be used in case of ‘local’ format.
- Returns:
The path of the object in the specified format.
- Return type:
str
- Raises:
ValueError – If the format is not ‘s3’ or ‘local’.
Examples
All of the following examples return the same path: profile/project/key
Example 1:
>>> address = 'deepchem://profile/project/key'
>>> storage_loc = 'profile/project'
>>> path = DeepchemAddress.get_path(storage_loc, address)
>>> path
profile/project/key
Example 2:
>>> address = 'profile/project/key'
>>> storage_loc = 'profile/project'
>>> path = DeepchemAddress.get_path(storage_loc, address)
>>> path
profile/project/key
Example 3:
>>> address = 'key'
>>> storage_loc = 'profile/project'
>>> path = DeepchemAddress.get_path(storage_loc, address)
>>> path
profile/project/key
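The resolution rules implied by these examples can be sketched as follows. This is an assumption-laden illustration rather than the real method: `get_path_sketch` is a hypothetical name, and the way a bare key is detected (by checking whether the address already starts with the storage location) is a guess that merely reproduces the three documented results.

```python
import os
from typing import Optional


def get_path_sketch(storage_loc: str, address: str,
                    format: str = "s3", base_dir: Optional[str] = None) -> str:
    """Hypothetical helper reproducing the documented get_path examples."""
    if format not in ("s3", "local"):
        raise ValueError(f"format must be 's3' or 'local', got {format!r}")
    prefix = "deepchem://"
    root = storage_loc.rstrip("/")
    if address.startswith(prefix):
        path = address[len(prefix):]          # full deepchem address
    elif address.startswith(root + "/"):
        path = address                        # already includes the storage location
    else:
        path = root + "/" + address           # bare key: prepend the storage location
    if format == "local":
        # Rebuild the path with the separator of the current OS.
        path = os.path.join(base_dir or "", *path.split("/"))
    return path


print(get_path_sketch("profile/project", "key"))
```

With `format="local"` the same key is joined onto `base_dir` using `os.path.join`, so the separator follows the host operating system.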
- classmethod get_parent_key(address: str) str[source]
Return the parent key of the object.
- Parameters:
address (str) – The deepchem address of the object or the key of the object.
- Returns:
The parent key path.
- Return type:
str
Examples
>>> address = 'deepchem://profile/project/parent1/parent2/key'
>>> parent_key = DeepchemAddress.get_parent_key(address)
>>> parent_key
parent1/parent2
>>> address = 'profile/project/parent1/parent2/key'
>>> parent_key = DeepchemAddress.get_parent_key(address)
>>> parent_key
parent1/parent2
- classmethod get_object_name(address: str) str[source]
Return the name of the object.
- Parameters:
address (str) – The deepchem address of the object or the key of the object.
- Returns:
The object name.
- Return type:
str
Examples
>>> address = 'deepchem://profile/project/parent1/parent2/key'
>>> object_name = DeepchemAddress.get_object_name(address)
>>> object_name
key
>>> address = 'profile/project/parent1/parent2/key'
>>> object_name = DeepchemAddress.get_object_name(address)
>>> object_name
key
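The parent key and object name in the examples above are two views of the same split. A rough sketch, assuming (as the examples suggest) that the first two path segments are always the profile and project; `split_key_sketch` is a hypothetical helper, not part of the module:

```python
from typing import Tuple


def split_key_sketch(address: str) -> Tuple[str, str]:
    """Return (parent_key, object_name) for an address, per the documented examples."""
    prefix = "deepchem://"
    path = address[len(prefix):] if address.startswith(prefix) else address
    segments = path.split("/")
    # Assumption: segments[0] is the profile and segments[1] is the project.
    key_segments = segments[2:]
    parent_key = "/".join(key_segments[:-1])
    object_name = segments[-1]
    return parent_key, object_name


print(split_key_sketch("deepchem://profile/project/parent1/parent2/key"))
```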
- __str__() str[source]
Return string representation of the address.
- Returns:
The address string.
- Return type:
str
- __annotations__ = {'address_prefix': <class 'str'>}
- __module__ = 'deepchem_server.core.address'
Cards and Metadata
The cards module defines metadata structures for datasets and models.
- class deepchem_server.core.cards.Card[source]
Bases:
object
Base class for cards.
Provides common functionality for data and model cards including serialization and timestamp tracking.
Initialize a Card with current timestamp.
- __bytes__() bytes[source]
Convert card to bytes representation.
- Returns:
The card as bytes using UTF-8 encoding.
- Return type:
bytes
- to_json() str[source]
Convert card to JSON string representation.
- Returns:
JSON string representation of the card.
- Return type:
str
- update_card(key: str, value) None[source]
Update a card attribute.
- Parameters:
key (str) – The attribute name to update.
value (Any) – The new value for the attribute.
- Return type:
None
- __module__ = 'deepchem_server.core.cards'
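The serialization behaviour described for Card can be pictured with a small standalone class. This is a sketch under assumptions, not the real Card: the class name `CardSketch`, the attribute name `created_at`, and the choice to serialize the instance dictionary as JSON are all hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Any


class CardSketch:
    """Hypothetical stand-in for a card with a timestamp and JSON serialization."""

    def __init__(self) -> None:
        # Assumed attribute name; the real timestamp field is not documented here.
        self.created_at = datetime.now(timezone.utc).isoformat()

    def to_json(self) -> str:
        # Serialize every instance attribute; values must be JSON-compatible.
        return json.dumps(vars(self))

    def __bytes__(self) -> bytes:
        return self.to_json().encode("utf-8")

    def update_card(self, key: str, value: Any) -> None:
        setattr(self, key, value)


card = CardSketch()
card.update_card("description", "solubility data")
print(bytes(card).decode("utf-8"))
```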
- class deepchem_server.core.cards.DataCard(address: str, file_type: str, data_type: str, shape=None, description: str | None = None, featurizer: str | None = None, intended_use: str | None = None, caveats: str | None = None, feat_kwargs: Dict | None = None, **kwargs)[source]
Bases:
Card
Class for storing data card attributes.
- Parameters:
address (str) – Address of the reference object in the datastore.
file_type (str) – The file extension, e.g. csv or json.
data_type (str) – The type of object stored at the location pointed to by filename, e.g. pd.DataFrame or dask.dataframe.DataFrame.
shape (tuple, optional) – Shape of the data object.
description (str, optional) – A description about the datastore.
featurizer (str, optional) – The featurizer used in the dataset.
intended_use (str, optional) – Notes on dataset - the intended use of the dataset.
caveats (str, optional) – Notes on dataset - the caveats in using the dataset.
feat_kwargs (dict, optional) – Keyword arguments for featurizer (used when featurizer is not None).
**kwargs – Additional attributes to set on the card.
Notes
Difference between data_type and file_type: a csv file (file_type) can be stored as a pandas.DataFrame, as a dask.dataframe.DataFrame, or as a plain csv file. The file_type holds the file extension (‘csv’), while data_type refers to the data object (pandas.DataFrame, dask.dataframe.DataFrame, etc.).
Initialize a DataCard.
- SUPPORTED_DATA_TYPES = ['pandas.DataFrame', 'dc.data.NumpyDataset', 'dc.data.DiskDataset', 'json', 'text/plain', 'png', 'binary']
- SUPPORTED_FILE_TYPES = ['csv', 'dir', 'json', 'pdb', 'fasta', 'fastq', 'png', 'sdf', 'dcd', 'txt', 'xml', 'py', 'pdbqt', 'zip', 'smi', 'smiles', 'bz2', 'cxsmiles', 'onnx', 'hdf5', 'log']
- __init__(address: str, file_type: str, data_type: str, shape=None, description: str | None = None, featurizer: str | None = None, intended_use: str | None = None, caveats: str | None = None, feat_kwargs: Dict | None = None, **kwargs) None[source]
Initialize a DataCard.
- validate_datatype(data_type: str) str[source]
Validate and normalize data type name.
- Parameters:
data_type (str) – The data type to validate.
- Returns:
The validated and normalized data type.
- Return type:
str
- Raises:
AssertionError – If the data type is not supported.
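A validation step of this kind could look like the sketch below. The list of supported types is copied from SUPPORTED_DATA_TYPES above; everything else is assumed. In particular, the normalization shown here (stripping surrounding whitespace) is only a placeholder, since the actual normalization rules are not documented in this section, and `validate_datatype_sketch` is a hypothetical name.

```python
SUPPORTED_DATA_TYPES = [
    "pandas.DataFrame", "dc.data.NumpyDataset", "dc.data.DiskDataset",
    "json", "text/plain", "png", "binary",
]


def validate_datatype_sketch(data_type: str) -> str:
    """Hypothetical check that a data type is one of the supported names."""
    normalized = data_type.strip()  # placeholder normalization (assumption)
    assert normalized in SUPPORTED_DATA_TYPES, f"unsupported data type: {data_type!r}"
    return normalized


print(validate_datatype_sketch(" pandas.DataFrame "))
```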
- get_n_samples() int[source]
Get the number of samples in the dataset.
- Returns:
Number of samples in the dataset.
- Return type:
int
- Raises:
ValueError – If the dataset does not have shape information.
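One plausible reading of this method, sketched as a free function: the sample count is taken from the first dimension of the shape, and a missing shape raises ValueError. The use of the first dimension is an assumption, and `get_n_samples_sketch` is a hypothetical name.

```python
from typing import Optional, Tuple


def get_n_samples_sketch(shape: Optional[Tuple[int, ...]]) -> int:
    """Hypothetical helper: read the sample count from a shape tuple."""
    if not shape:
        raise ValueError("dataset does not have shape information")
    # Assumption: rows (samples) are the first dimension.
    return int(shape[0])


print(get_n_samples_sketch((1128, 10)))
```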
- to_json() str[source]
Convert DataCard to JSON string.
- Returns:
JSON string representation of the DataCard.
- Return type:
str
- property shape
Get the shape of the data.
- Returns:
Shape of the data as a tuple.
- Return type:
- __module__ = 'deepchem_server.core.cards'
- class deepchem_server.core.cards.ModelCard(address: str, model_type: str, train_dataset_address: str, description: str | None = None, featurizer: str | None = None, intended_use: str | None = None, caveats: str | None = None, init_kwargs: Dict | None = {}, train_kwargs: Dict | None = {}, **kwargs)[source]
Bases:
Card
Class for storing model card attributes.
- Parameters:
address (str) – The address of model in the datastore.
model_type (str) – The type of model. Ex: dc.models.RandomForest.
train_dataset_address (str) – Training dataset used to train the model.
description (str, optional) – A description about the model.
featurizer (str, optional) – The featurizer used in the dataset.
intended_use (str, optional) – Notes on dataset - the intended use of the dataset.
caveats (str, optional) – Notes on dataset - the caveats in using the dataset.
init_kwargs (dict, optional) – Initialization kwargs for the model ex: n_layers.
train_kwargs (dict, optional) – Training kwargs for the model ex: n_epochs.
**kwargs – Additional attributes to set on the model card.
Initialize a ModelCard.
- SUPPORTED_MODEL_TYPES = ['linear_regression', 'random_forest_classifier', 'random_forest_regressor', 'gcn']
- __init__(address: str, model_type: str, train_dataset_address: str, description: str | None = None, featurizer: str | None = None, intended_use: str | None = None, caveats: str | None = None, init_kwargs: Dict | None = {}, train_kwargs: Dict | None = {}, **kwargs) None[source]
Initialize a ModelCard.
- __module__ = 'deepchem_server.core.cards'
Configuration
The config module manages server configuration settings.
- deepchem_server.core.config.set_datastore(datastore: DiskDataStore | None) None[source]
Set the global datastore instance.
- Parameters:
datastore (DiskDataStore or None) – The datastore instance to set as the global datastore, or None to reset.
- Return type:
None
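A global setter like this is commonly backed by a module-level variable. The sketch below shows that pattern only; the variable `_DATASTORE` and the getter `get_datastore_sketch` are hypothetical names introduced for illustration and are not documented in this section.

```python
from typing import Any, Optional

_DATASTORE: Optional[Any] = None  # hypothetical module-level holder


def set_datastore_sketch(datastore: Optional[Any]) -> None:
    """Store the given datastore globally; passing None resets it."""
    global _DATASTORE
    _DATASTORE = datastore


def get_datastore_sketch() -> Optional[Any]:
    """Hypothetical accessor returning whatever was set last."""
    return _DATASTORE


set_datastore_sketch({"root": "/tmp/store"})  # any object works in this sketch
print(get_datastore_sketch())
set_datastore_sketch(None)
```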
Progress Logger
The progress_logger module logs the progress of the computation.
Model Mappings
The model_mappings module maps model types to their corresponding DeepChem models.
- deepchem_server.core.model_mappings.sklearn_model(model: Callable) Callable[source]
Wrapper for sklearn models to integrate with DeepChem SklearnModel.
- Parameters:
model (Callable) – A sklearn model class to be wrapped.
- Returns:
A function that initializes a DeepChem SklearnModel with the given sklearn model.
- Return type:
Callable
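The wrapper pattern described here can be illustrated without DeepChem installed. In this sketch, `StandInSklearnModel` is a stand-in for DeepChem's SklearnModel and `TinyEstimator` is a stand-in for a scikit-learn estimator class; both are hypothetical, as is the exact signature of the returned initializer.

```python
from typing import Any, Callable, Optional


class StandInSklearnModel:
    """Stand-in for DeepChem's SklearnModel so that the sketch runs on its own."""

    def __init__(self, model: Any, model_dir: Optional[str] = None) -> None:
        self.model = model
        self.model_dir = model_dir


def sklearn_model_sketch(model: Callable) -> Callable:
    """Return a function that builds the estimator and wraps it."""
    def initializer(model_dir: Optional[str] = None, **init_kwargs: Any) -> StandInSklearnModel:
        # Instantiate the estimator lazily, only when the initializer is called.
        return StandInSklearnModel(model(**init_kwargs), model_dir=model_dir)
    return initializer


class TinyEstimator:
    """Stand-in for an estimator class such as a scikit-learn regressor."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha


make_model = sklearn_model_sketch(TinyEstimator)
wrapped = make_model(model_dir="/tmp/model", alpha=0.5)
print(type(wrapped).__name__, wrapped.model.alpha)
```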
- deepchem_server.core.model_mappings.update_logs(log_error: ImportError) None[source]
Update logs during import errors.
- Parameters:
log_error (ImportError) – Import error object to be logged.
- Return type:
None
Examples
>>> from deepchem_server.core import model_mappings
>>> model_mappings.LOGS == {}
True
>>> e = ImportError('cannot import DummyModule')
>>> model_mappings.update_logs(e)
>>> list(model_mappings.LOGS.values())[0]
ImportError('cannot import DummyModule')
Model Config Mapping
The model_config_mapper module maps model types to their corresponding DeepChem models.
- class deepchem_server.core.model_config_mapper.DeepChemModelConfigMapper(model_class: Any, model_class_name: str | None = None, required_init_params: List | None = None, optional_init_params: List | None = None, required_train_params: List | None = None, optional_train_params: List | None = None, tasks: Dict | None = None)[source]
Bases:
object
Mappings between models and their configuration in Deepchem.
This class contains mappings between the models and their configuration in Deepchem. It is used to generate the model cards while uploading models.
The main purpose of this class is to validate and parse the model parameters from the config.yaml file and generate the model cards.
The config.yaml file contains the following parameters:
model_class (required): The model class in Deepchem.
init_args (optional): The init arguments for the model.
train_args (optional): The train arguments for the model.
description (optional): The description of the model (will be stored in the model card).
featurizer (optional): The featurizer for the model (will be stored in the model card).
Sample config.yaml file:
model_class: GCNModel
init_args:
  n_tasks: 1
  mode: classification
  batch: 2
  learning_rate: 0.0003
train_args:
  nb_epoch: 1
description: Description of the model (will be stored in the model card)
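Once such a file is loaded into a dictionary, the required and optional entries listed above can be separated as in the sketch below. The dictionary mirrors the sample file; `split_config_sketch` is a hypothetical helper, and defaulting the optional sections to empty values is an assumption rather than documented behaviour.

```python
from typing import Any, Dict

# The sample config.yaml from above, written as the dictionary a YAML loader would produce.
config: Dict[str, Any] = {
    "model_class": "GCNModel",
    "init_args": {"n_tasks": 1, "mode": "classification", "batch": 2, "learning_rate": 0.0003},
    "train_args": {"nb_epoch": 1},
    "description": "Description of the model (will be stored in the model card)",
}


def split_config_sketch(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: enforce the required entry and default the optional ones."""
    if "model_class" not in cfg:
        raise ValueError("config is missing the required 'model_class' entry")
    return {
        "model_class": cfg["model_class"],
        "init_args": dict(cfg.get("init_args") or {}),
        "train_args": dict(cfg.get("train_args") or {}),
        "description": cfg.get("description"),
        "featurizer": cfg.get("featurizer"),
    }


parsed = split_config_sketch(config)
print(parsed["init_args"], parsed["train_args"])
```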
- Parameters:
model_class (Any) – The model class in Deepchem.
model_class_name (str, optional) – The name of the model class. If not provided, it is inferred.
required_init_params (list, optional) – A list of required init parameters.
optional_init_params (list, optional) – A list of optional init parameters.
required_train_params (list, optional) – A list of required train parameters.
optional_train_params (list, optional) – A list of optional train parameters.
tasks (dict, optional) – A dictionary of tasks mapped to their respective parameter name supported by the model.
Examples
>>> from deepchem_server.core.model_config_mapper import DeepChemModelConfigMapper
>>> from deepchem.models import GCNModel
>>> model = DeepChemModelConfigMapper(
...     model_class=GCNModel,
...     required_init_params=["init_param"],
...     optional_init_params=["init_param1", "init_param2"],
...     required_train_params=["train_param"],
...     optional_train_params=["train_param1", "train_param2"])
>>> model.get_model_class_name()
'gcn'
>>> model.get_model_class()
<class 'deepchem.models.torch_models.gcn.GCNModel'>
>>> model
<class 'deepchem.models.torch_models.gcn.GCNModel'>
>>> model.add_init_params(["test_required_init_param"])
>>> model.get_init_params("required")
['init_param', 'test_required_init_param']
>>> model.add_init_params(["test_optional_init_param"], "optional")
>>> model.get_init_params("optional")
['init_param1', 'init_param2', 'test_optional_init_param']
>>> model.get_init_params()
{'required': ['init_param', 'test_required_init_param'], 'optional': ['init_param1', 'init_param2', 'test_optional_init_param']}
>>> model.add_tasks({"task1": "task", "task2": "mode"})
>>> model.get_tasks()
{'task1': 'task', 'task2': 'mode'}
In the above example, the model tasks are mapped to the parameter names supported by the model. The task “task1” is mapped to the parameter “task”, so if “task1” is provided as a task during model initialization, the parameter “task” is used to initialize the model. Similarly, if “task2” is provided as a task, the parameter “mode” is used.
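The task-to-parameter lookup described above amounts to a dictionary translation. A minimal sketch, assuming the mapping is applied by turning a task name into the init keyword it points to; `resolve_task_kwargs_sketch` and the example value `"classification"` are hypothetical:

```python
from typing import Any, Dict


def resolve_task_kwargs_sketch(tasks: Dict[str, str], task: str, value: Any) -> Dict[str, Any]:
    """Hypothetical helper: translate a task name into the init keyword it maps to."""
    if task not in tasks:
        raise KeyError(f"task {task!r} is not supported; known tasks: {sorted(tasks)}")
    return {tasks[task]: value}


tasks = {"task1": "task", "task2": "mode"}
print(resolve_task_kwargs_sketch(tasks, "task2", "classification"))
```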
Initialize DeepChemModelConfigMapper.
- static parse_params(required_params: List | None, optional_params: List | None) Dict[source]
Parse the required and optional parameters of the model.
Returns a dictionary with the required and optional parameters.
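The shape of the returned dictionary can be inferred from the get_init_params() output in the class example. The sketch below reproduces that shape; treating None as an empty list, and the helper names `parse_params_sketch` and `select_params_sketch`, are assumptions.

```python
from typing import Dict, List, Optional, Union


def parse_params_sketch(required: Optional[List[str]],
                        optional: Optional[List[str]]) -> Dict[str, List[str]]:
    """Hypothetical helper: bundle the two lists under 'required' and 'optional'."""
    return {"required": list(required or []), "optional": list(optional or [])}


def select_params_sketch(params: Dict[str, List[str]],
                         kind: Optional[str] = None) -> Union[Dict[str, List[str]], List[str]]:
    """Return everything when kind is None, otherwise only the requested list."""
    if kind is None:
        return params
    if kind not in ("required", "optional"):
        raise ValueError(f"kind must be 'required', 'optional', or None, got {kind!r}")
    return params[kind]


params = parse_params_sketch(["init_param"], ["init_param1", "init_param2"])
print(select_params_sketch(params))
print(select_params_sketch(params, "optional"))
```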
- static get_class_name(model_class: Any) str[source]
Try to detect the model name for the model.
- Parameters:
model_class (Any) – The model class.
- Returns:
The model class name.
- Return type:
str
- __init__(model_class: Any, model_class_name: str | None = None, required_init_params: List | None = None, optional_init_params: List | None = None, required_train_params: List | None = None, optional_train_params: List | None = None, tasks: Dict | None = None) None[source]
Initialize DeepChemModelConfigMapper.
- add_init_params(init_params: List, kind: Literal['required', 'optional'] = 'required') None[source]
Add the init parameters to the model config mapping.
- Parameters:
init_params (list) – A list of init parameters.
kind ({'required', 'optional'}, optional) – Whether the init parameters are required or optional, by default ‘required’.
- Return type:
None
- add_train_params(train_params: List, kind: Literal['required', 'optional'] = 'required') None[source]
Add the train parameters to the model config mapping.
- Parameters:
train_params (list) – A list of train parameters.
kind ({'required', 'optional'}, optional) – Whether the train parameters are required or optional, by default ‘required’.
- Return type:
None
- add_tasks(tasks: Dict) None[source]
Add the tasks to the model config mapping.
- Parameters:
tasks (dict) – A dictionary of tasks mapped to their respective parameter name supported by the model.
- Return type:
None
- get_model_class() Any[source]
Return the model class for the model.
- Returns:
The model class for the model.
- Return type:
Any
- get_model_class_name() str[source]
Return the model class name for the model.
- Returns:
The model class name for the model.
- Return type:
str
- get_init_params(kind: Literal['required', 'optional', None] | None = None) Dict[source]
Return the initialization parameters for the model.
- Parameters:
kind ({'required', 'optional', None}, optional) – If kind is None, then the function returns all the init parameters for the model. If kind is “required”, then the function returns only the required init parameters. If kind is “optional”, then the function returns only the optional init parameters.
- Returns:
Returns a dictionary containing the init parameters for the model.
- Return type:
Dict
- get_train_params(kind: Literal['required', 'optional', None] | None = None) Dict[source]
Return the train parameters for the model.
- Parameters:
kind ({'required', 'optional', None}, optional) – If kind is None, then the function returns all the train parameters for the model. If kind is “required”, then the function returns only the required train parameters. If kind is “optional”, then the function returns only the optional train parameters.
- Returns:
Returns a dictionary containing the train parameters for the model.
- Return type:
Dict
- get_tasks() Dict[source]
Return the tasks for the model.
- Returns:
Returns a Dictionary containing the tasks mapped to their respective parameter name of the model.
- Return type:
Dict
- __getitem__(item: str) Any[source]
Return the mentioned item from the model config mapping.
- Parameters:
item (str) – The item to be returned from the model config mapping.
- Returns:
The item from the model config mapping.
- Return type:
Any
- __str__() str[source]
Return the model class name for the model.
- Returns:
The model class name.
- Return type:
str
Examples
>>> from deepchem_server.core.model_config_mapper import DeepChemModelConfigMapper
>>> from deepchem.models import GCNModel
>>> model = DeepChemModelConfigMapper(
...     model_class=GCNModel,
...     required_init_params=["init_param"],
...     optional_init_params=["init_param1", "init_param2"],
...     required_train_params=["train_param"],
...     optional_train_params=["train_param1", "train_param2"])
>>> str(model)
'GCNModel'
- __repr__() Any[source]
Return the model class for the model.
- Returns:
The model class.
- Return type:
Any
Examples
>>> from deepchem_server.core.model_config_mapper import DeepChemModelConfigMapper
>>> from deepchem.models import GCNModel
>>> model = DeepChemModelConfigMapper(
...     model_class=GCNModel,
...     required_init_params=["init_param"],
...     optional_init_params=["init_param1", "init_param2"],
...     required_train_params=["train_param"],
...     optional_train_params=["train_param1", "train_param2"])
>>> model
<class 'deepchem.models.torch_models.gcn.GCNModel'>
- __module__ = 'deepchem_server.core.model_config_mapper'
- class deepchem_server.core.model_config_mapper.ModelAddressWrapper(*args, **kwargs)[source]
Bases:
dict
Wrapper for deepchem-server model name and deepchem model config.
This class is used to wrap the deepchem-server model name and deepchem model config. It is used as a custom dictionary to map the deepchem-server model name to the deepchem model config.
Examples
>>> from deepchem_server.core.model_config_mapper import ModelAddressWrapper, DeepChemModelConfigMapper
>>> from deepchem.models import GCNModel
>>> model = DeepChemModelConfigMapper(
...     model_class=GCNModel,
...     required_init_params=["init_param"],
...     optional_init_params=["init_param1", "init_param2"],
...     required_train_params=["train_param"],
...     optional_train_params=["train_param1", "train_param2"])
>>> model_address_map = ModelAddressWrapper({"gcn": model})
>>> model_address_map
{'gcn': <class 'deepchem.models.torch_models.gcn.GCNModel'>}
>>> model_address_map['gcn']
<class 'deepchem.models.torch_models.gcn.GCNModel'>
>>> model_address_map.get_model_class_name('gcn')
'gcn'
>>> # using key value pairs
>>> from sklearn.linear_model import LinearRegression
>>> from deepchem.models import SklearnModel
>>> model = DeepChemModelConfigMapper(
...     model_class=SklearnModel,
...     required_init_params=None,
...     optional_init_params=["fit_intercept", "copy_X", "n_jobs", "positive"],
...     required_train_params=None,
...     optional_train_params=None)
>>> model_address_map['linear_regression'] = model
>>> model_address_map['linear_regression']
<class 'deepchem.models.sklearn_models.SklearnModel'>
Initialize ModelAddressWrapper.
- Parameters:
*args – Variable length argument list. Expected dict as first argument.
**kwargs – Arbitrary keyword arguments for model mappings.
- Raises:
TypeError – If more than 1 positional argument is provided or if the first argument is not a dict.
- __init__(*args, **kwargs) None[source]
Initialize ModelAddressWrapper.
- Parameters:
*args – Variable length argument list. Expected dict as first argument.
**kwargs – Arbitrary keyword arguments for model mappings.
- Raises:
TypeError – If more than 1 positional argument is provided or if the first argument is not a dict.
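The argument checks described for the constructor can be sketched with a plain dict subclass. This is an illustration, not the actual class: `WrapperSketch` is a hypothetical name, and the error messages are invented.

```python
from typing import Any


class WrapperSketch(dict):
    """Hypothetical dict subclass applying the documented constructor checks."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        if len(args) > 1:
            raise TypeError(f"expected at most 1 positional argument, got {len(args)}")
        if args and not isinstance(args[0], dict):
            raise TypeError("the first positional argument must be a dict")
        # Keyword arguments become additional model-name entries.
        super().__init__(*args, **kwargs)


mapping = WrapperSketch({"gcn": "GCNModel"}, linear_regression="SklearnModel")
print(sorted(mapping.keys()))
```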
- get_model_config(key: str, kind: Literal['model_name', 'class_name'] = 'model_name') DeepChemModelConfigMapper | None[source]
Return the model config map given the model key.
- Parameters:
key (str) – The name/key of the model.
kind ({'model_name', 'class_name'}, optional) – Whether the key is the model name or the model class, by default ‘model_name’.
- Returns:
The model config map for the model, or None if not found.
- Return type:
DeepChemModelConfigMapper or None
- get_model_name_from_class_name(model_class_name: str) str | None[source]
Return the model name for the model class name.
The class name is used when parsing the config.yaml file, since that file contains the model class but not the model name.
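A reverse lookup of this kind can be sketched over a plain mapping from model names to class names. The function name `model_name_from_class_name_sketch` and the sample mapping are hypothetical; the None result for an unknown class follows the documented return type.

```python
from typing import Dict, Optional


def model_name_from_class_name_sketch(name_to_class: Dict[str, str],
                                      model_class_name: str) -> Optional[str]:
    """Hypothetical helper: find the model name registered for a class name."""
    for model_name, class_name in name_to_class.items():
        if class_name == model_class_name:
            return model_name
    return None  # no registered model uses this class name


registry = {"gcn": "GCNModel", "linear_regression": "SklearnModel"}
print(model_name_from_class_name_sketch(registry, "GCNModel"))
```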
- get_model_class_name(key: str) str[source]
Return the model class name for the model key.
Since using a key to access the ModelAddressWrapper returns the model class, this function reduces the code complexity.
The below code snippets are equivalent:
>>> from deepchem.models import GCNModel
>>> model_address_map = ModelAddressWrapper({"gcn": DeepChemModelConfigMapper(model_class=GCNModel)})
>>> model_address_map.get_model_class_name("gcn")
'GCNModel'

>>> model_address_map.get_model_config("gcn").get_model_class_name()
'GCNModel'
- __setitem__(key: str, value: DeepChemModelConfigMapper) None[source]
Set item in the wrapper.
- Parameters:
key (str) – The model name key.
value (DeepChemModelConfigMapper) – The model config mapper to store.
- Return type:
None
- __getitem__(key: str) Any[source]
Get item from the wrapper.
- Parameters:
key (str) – The model name key.
- Returns:
The model config mapper.
- Return type:
Any
- keys() dict_keys[source]
Return the keys of the wrapper.
- Returns:
The keys of the wrapper.
- Return type:
dict_keys
- values() dict_values[source]
Return the values of the wrapper.
- Returns:
The values of the wrapper.
- Return type:
dict_values
- __module__ = 'deepchem_server.core.model_config_mapper'
Compute Operations
The compute module handles computational tasks and job execution.