Quick Start

This guide walks you through your first molecular machine learning workflow with DeepChem Server.

Before You Begin

Ensure you have:

A running DeepChem Server instance (see Installation)
A sample dataset (a CSV file with SMILES strings or other molecular data)
A web browser for accessing the interactive API documentation
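
If you do not have a dataset at hand, a tiny CSV is enough to try the workflow. The sketch below writes one using only the standard library. The SMILES strings are real molecules, but the label column and its values are made up purely for illustration, and the column names (smiles, label) are assumptions rather than names the server requires.

```python
import csv
from pathlib import Path

# Toy rows: real SMILES strings paired with made-up (hypothetical) binary labels.
ROWS = [
    {"smiles": "CCO", "label": 0},       # ethanol
    {"smiles": "c1ccccc1", "label": 1},  # benzene
    {"smiles": "CC(=O)O", "label": 0},   # acetic acid
    {"smiles": "CCN", "label": 1},       # ethylamine
]


def write_sample_dataset(path="dataset.csv"):
    """Write the toy rows to a CSV file with a header and return its path."""
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["smiles", "label"])
        writer.writeheader()
        writer.writerows(ROWS)
    return path
```

Calling write_sample_dataset() creates dataset.csv in the current directory, which matches the file name used in the Python client example later in this guide.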

Interactive API Documentation

The best way to explore and test the API is through the interactive documentation:

Swagger UI: http://localhost:8000/docs

This provides:

Complete endpoint documentation with request/response schemas
Interactive request testing with real-time responses
Parameter descriptions and validation rules
Example requests and responses
Schema definitions for all data models
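
The same schema information can often be read programmatically. The sketch below assumes the server also publishes a machine-readable OpenAPI schema at http://localhost:8000/openapi.json, which is a common companion to a Swagger UI at /docs but is not confirmed by this guide. The helper names list_operations and fetch_schema are hypothetical.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def list_operations(schema):
    """Return (METHOD, path) pairs from an OpenAPI schema dict, sorted by path."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items can also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations, key=lambda op: (op[1], op[0]))


def fetch_schema(url="http://localhost:8000/openapi.json", timeout=5.0):
    """Download and parse the schema. The URL is an assumption, not a documented endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the server running, list_operations(fetch_schema()) would return the available method and path pairs, provided the schema URL assumption holds.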

Basic Workflow

A typical molecular machine learning workflow runs through these steps in order:

Upload Data: Submit your molecular dataset to the server
Featurize: Transform molecules into machine learning features
TVTSplit: Split the dataset into training, validation, and test sets
Train: Build machine learning models on featurized data
Evaluate: Assess model performance
Infer: Make predictions on new data

Available Endpoints

Data Management

POST /data/uploaddata: Upload datasets to the datastore
GET /data/{dataset_id}/download: Download processed datasets

Primitive Operations

POST /primitive/featurize: Apply molecular featurization
POST /primitive/train: Train machine learning models
POST /primitive/evaluate: Evaluate model performance
POST /primitive/infer: Run inference on new data
POST /primitive/train-valid-test-split: Split datasets into training, validation, and test sets

System

GET /healthcheck: Check server health status
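
When calling the endpoints listed above from a script, it can help to keep the paths in one place. The sketch below maps each workflow step to its method and path and builds full URLs against the default address http://localhost:8000. The step keys (upload, featurize, and so on) and the helper names are hypothetical, and request payloads are deliberately left out because their schemas are documented in the Swagger UI rather than here.

```python
from urllib.parse import quote, urljoin

BASE_URL = "http://localhost:8000"  # default address used throughout this guide

# Workflow step -> (HTTP method, path), copied from the endpoint list.
# The step keys are illustrative names, not part of the API.
WORKFLOW_ENDPOINTS = {
    "upload": ("POST", "/data/uploaddata"),
    "featurize": ("POST", "/primitive/featurize"),
    "split": ("POST", "/primitive/train-valid-test-split"),
    "train": ("POST", "/primitive/train"),
    "evaluate": ("POST", "/primitive/evaluate"),
    "infer": ("POST", "/primitive/infer"),
}


def endpoint_url(step, base_url=BASE_URL):
    """Return (method, full URL) for a workflow step."""
    method, path = WORKFLOW_ENDPOINTS[step]
    return method, urljoin(base_url, path)


def download_url(dataset_id, base_url=BASE_URL):
    """Build the download URL, percent-encoding the dataset ID."""
    return urljoin(base_url, f"/data/{quote(str(dataset_id), safe='')}/download")
```

For example, endpoint_url("train") yields the method POST together with http://localhost:8000/primitive/train.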

Python Client Library

For programmatic access, use the pyds Python client library:

from pyds import Settings, Data, Featurize, Train

# Configure settings
settings = Settings()
settings.set_profile("my_profile")
settings.set_project("my_project")

# Initialize clients
data_client = Data(settings)
featurize_client = Featurize(settings)
train_client = Train(settings)

# Upload the dataset and keep its datastore address
upload_response = data_client.upload_data("dataset.csv")
dataset_address = upload_response['dataset_address']

# Featurize the SMILES column with ECFP fingerprints
featurize_response = featurize_client.run(
    dataset_address=dataset_address,
    featurizer="ECFP",
    output="featurized_data",
    dataset_column="smiles"
)

# Train a model on the featurized dataset
train_response = train_client.run(
    dataset_address=featurize_response['featurized_file_address'],
    model_type="random_forest_classifier",
    model_name="my_model"
)

For detailed Python client documentation, see the PyDS library documentation.

Troubleshooting

Server Not Responding

Check if the server is running:

curl http://localhost:8000/healthcheck
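
The same check can be scripted, for example to wait for the server before starting a workflow. This is a minimal sketch using only the standard library. It assumes that any HTTP 2xx status from /healthcheck means the server is up, since the response body format is not described in this guide. The function name is_server_healthy is hypothetical.

```python
import urllib.request


def is_server_healthy(url="http://localhost:8000/healthcheck", timeout=2.0):
    """Return True if the health endpoint answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses:
        # urllib.error.URLError and HTTPError both derive from OSError.
        return False
```

If this returns False, confirm that the server process is running and that it listens on the host and port you expect.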

Need More Information

Visit http://localhost:8000/docs for comprehensive API documentation and interactive testing.