Getting Started =============== The pyds library is a Python client package for interacting with the DeepChem Server API. It provides a clean, object-oriented interface for managing settings, uploading data, and submitting primitive jobs for molecular machine learning workflows. **What is pyds?** pyds simplifies the process of working with molecular data by providing: * **Unified API**: A consistent interface for all DeepChem Server operations * **Settings Management**: Centralized configuration for profiles, projects, and server connections * **Data Operations**: Easy upload and management of molecular datasets * **ML Primitives**: Ready-to-use components for featurization, training, evaluation, and inference * **Workflow Integration**: Seamless chaining of operations for complete ML pipelines Installation ------------ Install the pyds library from source: .. code-block:: bash cd pyds pip install -e . For development with testing dependencies: .. code-block:: bash pip install -e ".[dev]" Architecture ------------ The pyds library follows a clean inheritance structure designed for modularity and code reuse: .. code-block:: text BaseClient (base functionality) ├── Data (data operations) └── Primitive (abstract base for computation tasks) ├── Featurize (molecular featurization) ├── Train (model training) ├── Evaluate (model evaluation) ├── Infer (inference/predictions) └── TVTSplit (train-valid-test splitting) **Key Design Principles:** * **BaseClient**: Contains all common functionality like HTTP requests, configuration validation, and shared utilities * **Inheritance-based**: Specific clients inherit from BaseClient, eliminating code duplication * **Consistent Interface**: All clients provide the same base methods and configuration handling * **Settings Management**: Centralized configuration through a Settings class with persistent storage in a JSON file Quick Start ----------- Basic workflow for using the pyds library: 1. **Configure Settings**: Set up profile, project, and server URL 2. **Initialize Clients**: Create Data and Primitive client instances 3. **Upload Data**: Use Data client to upload datasets 4. **Run Primitives**: Use primitive classes for computation tasks .. code-block:: python from pyds import Settings, Data, Featurize, Train # Configure settings settings = Settings() settings.set_profile("my_profile") settings.set_project("my_project") # Initialize clients data_client = Data(settings) featurize_client = Featurize(settings) train_client = Train(settings) # Upload data response = data_client.upload_data("data.csv", description="My dataset") dataset_address = response['dataset_address'] # Featurize data response = featurize_client.run( dataset_address=dataset_address, featurizer="ECFP", output="featurized_data", dataset_column="smiles" ) featurized_address = response['featurized_file_address'] # Train model response = train_client.run( dataset_address=featurized_address, model_type="random_forest_classifier", model_name="my_model" )