Developer Guide

This guide provides detailed information for developers contributing to or extending LevSeq-Dash.

Getting Started 

For initial setup and contribution guidelines, please see the CONTRIBUTING.md file in the repository root.

Architecture Overview 

LevSeq-Dash is built using the Dash framework (Plotly) for creating interactive web applications in Python. The application follows a modular architecture with clear separation of concerns.

High-Level Architecture 

┌─────────────────────────────────────────────────────────┐
│              Dash Application (main_app.py)             │
│  - Routes pages                                         │
│  - Registers all callbacks                              │
│  - Initializes data manager via factory                 │
└─────────────────┬───────────────────────────────────────┘
                  │
     ┌────────────┴────────────┐
     │                         │
     ▼                         ▼
┌──────────┐             ┌──────────┐
│ Layouts  │             │Components│
│ (UI def) │             │(Widgets) │
└────┬─────┘             └─────┬────┘
     │                         │
     └─────────┬───────────────┘
               │
   ┌───────────┴────────────┐
   │                        │
   ▼                        ▼
┌─────────────────────┐    ┌──────────────┐
│ Data Manager        │    │Seq Aligner   │
│ ┌─────────────────┐ │    └──────┬───────┘
│ │ Factory         │ │           │
│ │ (manager.py)    │ │           ▼
│ └────────┬────────┘ │    ┌──────────────┐
│          │          │    │  BioPython   │
│          ▼          │    │   Aligner    │
│ ┌─────────────────┐ │    └──────────────┘
│ │ Base (Abstract) │ │
│ └────────┬────────┘ │
│          │          │
│     ┌────┴────┐     │
│     ▼         ▼     │
│ ┌──────┐ ┌────────┐ │
│ │ Disk │ │ S3/DB  │ │
│ │ Mgr  │ │(Future)│ │
│ └──┬───┘ └────────┘ │
└────┼────────────────┘
     │
     ▼
┌─────────────┐
│ Experiment  │
│   Model     │
└─────────────┘

Core Modules 

Main Application (main_app.py)
- Entry point for the Dash application
- Registers all callbacks
- Initializes data manager and configuration
- Sets up routing and navigation
Global Strings (global_strings.py)
- Centralized location for UI text, labels, and messages
- URL path constants for page routing
- Navigation labels and menu items
- Note: Most string constants in the application are defined here for consistency and easy localization
Data Manager (data_manager/)
- Abstract Base Class (base.py): Defines interface for data operations
- Factory (manager.py): Creates appropriate data manager based on configuration
- Disk Manager (disk_manager.py): Local file storage implementation
- Experiment Model (experiment.py): Data model for experiment objects
The data manager handles: - Experiment CRUD operations - Metadata management - File I/O operations - Caching with LRU cache
Sequence Aligner (sequence_aligner/)
- Wraps BioPython’s pairwise alignment functionality
- Provides BLASTP-style alignment for protein sequences
- Calculates alignment scores and statistics
- Formats alignment strings for visualization
Layouts (layouts/)

Each layout module defines a page in the application:
- layout_landing.py: Home page with navigation
- layout_upload.py: Data upload and validation
- layout_experiment.py: Single experiment view with variants
- layout_bars.py: All experiments table
- layout_matching_sequences.py: Sequence alignment results
- layout_explore.py: Sequence exploration and filtering
- layout_about.py: About page and documentation
Components (components/)

Reusable UI components:
- widgets.py: Tables, viewers, form elements, alerts
- graphs.py: Plotting functions (heatmaps, rank plots, Single-Site Mutagenesis plots)
- vis.py: Styling constants and cell coloring utilities
- column_definitions.py: AG Grid column configurations
Utils (utils/)

Helper functions for:
- Protein structure visualization
- Chemical reaction rendering
- Sequence alignment formatting
- General utilities (logging, data processing)

Data Flow 

Upload Workflow 

User Upload
    │
    ├─> Validate CSV format
    │   └─> Check required columns
    │       └─> Validate well format
    │           └─> Verify SMILES strings
    │               └─> Check for duplicates
    │
    ├─> Process Data
    │   └─> Calculate checksums
    │       └─> Extract parent sequence
    │           └─> Generate UUID
    │
    └─> Store Files
        ├─> Save metadata (JSON)
        ├─> Save experiment data (CSV)
        └─> Save geometry (CIF)

Experiment View Workflow 

Select Experiment
    │
    ├─> Load from Cache (if available)
    │   └─> Return cached Experiment object
    │
    └─> Load from Disk
        ├─> Read CSV (core columns only)
        ├─> Read CIF geometry
        ├─> Calculate unique SMILES
        ├─> Extract plates
        └─> Cache Experiment object

Sequence Alignment Workflow 

Input Sequence
    │
    ├─> Get All Lab Sequences
    │   └─> Extract parent from each experiment
    │
    ├─> Setup BLASTP Aligner
    │   └─> Configure scoring matrix
    │       └─> Set gap penalties
    │
    ├─> Perform Alignments
    │   └─> For each target sequence:
    │       ├─> Run pairwise alignment
    │       ├─> Calculate score & statistics
    │       └─> Format alignment string
    │
    └─> Filter & Sort Results
        └─> Return top matches

Configuration File and Settings 

The application uses a YAML-based configuration system located in levseq_dash/app/config/. The configuration determines deployment mode, storage backend, data paths, and logging behavior.

Configuration Files 

Key Files:

config.yaml: Main configuration file with all settings
settings.py: Python module that loads and validates configuration

Location: levseq_dash/app/config/

Configuration Structure 

The config.yaml file is organized into several sections:

# Deployment mode: "public-playground" or "local-instance"
deployment-mode: "local-instance"

# Storage backend: "disk" or "db" (database not yet implemented)
storage-mode: "disk"

# Disk storage settings
disk:
  five-letter-id-prefix: "MYLAB"
  local-data-path: "/path/to/data"
  enable-data-modification: true

# Database settings (not yet implemented)
db:
  host: ""
  port: ""

# Logging and profiling flags
logging:
  sequence-alignment-profiling: false
  data-manager: false
  pairwise-aligner: false

Deployment Modes 

The application supports two deployment modes that determine how data is accessed and whether modifications are allowed.

public-playground Mode

Purpose: Read-only demo environment for public websites
Data Location: Bundled inside container at levseq_dash/app/data/
Data Modification: Disabled (cannot upload/delete experiments)
Use Case: Public demos and deployment

deployment-mode: "public-playground"
# all other settings ignored or set to false

local-instance Mode

Purpose: Full-featured installation with persistent storage
Data Location: External mount via Docker volume or local path
Data Modification: User can enable in order to upload/delete experiments
ID Prefix: Required when data modification is enabled - a 5-letter lab identifier prepended to all experiment UUIDs
Use Case: Research labs, production deployments

deployment-mode: "local-instance"
disk:
  enable-data-modification: true # or false if that is not wanted
  local-data-path: "/path/to/data"
  five-letter-id-prefix: "MYLAB"  # Required when enable-data-modification: true

Storage Modes 

Disk Storage (Current Implementation)

Uses local filesystem for data persistence:

storage-mode: "disk"
disk:
  five-letter-id-prefix: "MYLAB"
  local-data-path: "/Users/username/data"
  enable-data-modification: true

Settings:

five-letter-id-prefix: 5-letter code prepended to experiment UUIDs
- Required when enable-data-modification: true
- Must be exactly 5 letters (no numbers or special characters)
- Automatically converted to uppercase
- Example: “MYLAB” → experiment IDs like “MYLAB-a1b2c3d4”
local-data-path: Path to data directory
- Can be absolute: "/Users/username/Desktop/MyData"
- Can be relative to app: "data" → levseq_dash/app/data/
- Overridden by DATA_PATH environment variable
enable-data-modification: Allow upload/delete operations
- true: Full read-write access (requires valid ID prefix)
- false: Read-only mode

Database Storage (Future)

Planned support for database backends:

storage-mode: "db"
db:
  host: "localhost"
  port: "5432"

Logging Settings 

Enable detailed logging for debugging and performance analysis:

logging:
  sequence-alignment-profiling: true   # Log alignment timing
  data-manager: true                    # Log data operations
  pairwise-aligner: true               # Log alignment details

Logging Flags:

sequence-alignment-profiling: Times alignment operations, useful for performance tuning
data-manager: Logs experiment CRUD operations, file I/O, cache hits/misses
pairwise-aligner: Logs BioPython alignment parameters and results

Accessing Logging Flags in Code:

from levseq_dash.app.config import settings
from levseq_dash.app.utils import utils

# Check if logging is enabled
if settings.is_data_manager_logging_enabled():
    utils.log_with_context("Loading experiment...", log_flag=True)

Environment Variables 

Environment variables override config.yaml settings and are the preferred method for Docker deployments.

Available Variables:

DATA_PATH: Override disk.local-data-path

docker run -e DATA_PATH=/data -v /host/data:/data levseq-dash

FIVE_LETTER_ID_PREFIX: Override disk.five-letter-id-prefix

docker run -e FIVE_LETTER_ID_PREFIX=PROD levseq-dash

Configuration Priority (highest to lowest):

Environment variables (DATA_PATH, FIVE_LETTER_ID_PREFIX)
config.yaml settings
Default values (if applicable)

Adding New Configuration Options 

To add new configuration settings:

Add to config.yaml:
```
my-new-section:
  my-setting: "value"
```

Add getter function to settings.py:

def get_my_new_section():
    config = load_config()
    return config.get("my-new-section", {})

def get_my_setting():
    section = get_my_new_section()
    return section.get("my-setting", "default-value")

Use in application code:

from levseq_dash.app.config import settings

value = settings.get_my_setting()

Add environment variable support (optional):

def get_my_setting():
    # Check environment variable first
    env_value = os.environ.get("MY_SETTING")
    if env_value:
        return env_value

    # Fall back to config file
    section = get_my_new_section()
    return section.get("my-setting", "default-value")

Best Practices:

Use descriptive setting names with hyphens (my-setting, not MySetting)
Provide sensible defaults in getter functions
Document new settings in config.yaml with comments
Use environment variables for secrets and deployment-specific values
Validate settings at application startup (raise clear errors for invalid values)

Adding New Features 

Adding a New Page 

Create layout in layouts/layout_my_page.py:

import dash_bootstrap_components as dbc
from dash import html

def get_layout() -> dbc.Container:
    """Create the page layout."""
    return dbc.Container([
        dbc.Row([
            dbc.Col([
                html.H1("Page Title"),
                # Your page content
            ])
        ])
    ])

Define the path in global_strings.py:

# Add navigation label
nav_my_page = "My Page"

# Add URL path (at the end of the file with other paths)
nav_my_page_path = "/my-page"

Register page route in main_app.py:

# Import the layout module at the top
from levseq_dash.app.layouts import layout_my_page

# Add route in the route_page callback (around line 109)
@app.callback(Output("id-page-content", "children"), Input("url", "pathname"))
def route_page(pathname):
    if pathname == "/":
        return layout_landing.get_layout()
    # ... existing routes ...
    elif pathname == gs.nav_my_page_path:
        return layout_my_page.get_layout()
    else:
        return html.Div([html.H2("Page not found!")])

Add navigation link in layouts/layout_bars.py sidebar or navbar

Add callbacks in main_app.py (not in the layout file):

@app.callback(
    Output("my-output", "children"),
    Input("my-button", "n_clicks"),
    State("my-input", "value"),
    prevent_initial_call=True,
)
def handle_my_page_interaction(n_clicks, input_value):
    """Handle user interaction on my page."""
    if not n_clicks:
        return dash.no_update
    # Process and return result
    return f"Processed: {input_value}"

Important Notes:

Layout files only define the UI structure via get_layout() function
All callbacks must be registered in main_app.py (not in layout files)
Page routing is handled by the route_page callback in main_app.py
Path constants are defined in global_strings.py for consistency

Adding a New Callback 

Callbacks should be added in the relevant layout module:

from dash import callback, Input, Output

@callback(
    Output("output-id", "property"),
    Input("input-id", "property"),
    prevent_initial_call=True,
)
def my_callback(input_value):
    """
    Callback description.

    Args:
        input_value: Description.

    Returns:
        Output value description.
    """
    # Process input
    result = process_data(input_value)
    return result

Best Practices:

Use descriptive callback names
Add docstrings
Use prevent_initial_call=True when appropriate
Handle errors gracefully
Log important operations

Adding a New Graph Type 

Add plotting functions to components/graphs.py:

def create_my_plot(df, x_col, y_col):
    """
    Create a custom plot visualization.

    Args:
        df: DataFrame containing data.
        x_col: Column name for X-axis.
        y_col: Column name for Y-axis.

    Returns:
        go.Figure: Plotly figure object.
    """
    fig = px.scatter(df, x=x_col, y=y_col)
    fig.update_layout(
        # Customize appearance
    )
    return fig

Extending Data Manager 

The data manager uses an abstract base class pattern with a factory for creating instances:

┌─────────────────────────────────────────────────────┐
│              main_app.py                            │
│                                                     │
│  singleton_data_mgr_instance = create_data_manager()│
└──────────────────┬──────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────┐
│           manager.py (Factory)                          │
│                                                         │
│  def create_data_manager():                             │
│      if is_disk_mode():                                 │
│          return DiskDataManager()                       │
│      elif is_database_mode():                           │
│          return DatabaseDataManager() ← Extend backends │
│      elif is_s3_mode():                                 │
│          return S3DataManager()       ← Extend backends │
└──────────────────┬──────────────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────────────┐
│         BaseDataManager (Abstract)                   │
│  - get_experiment()                                  │
│  - add_experiment_from_ui()                          │
│  - get_all_lab_experiments_with_meta_data()          │
│  - delete_experiment()                               │
│  - ... other abstract methods                        │
└──────────────────┬───────────────────────────────────┘
                   │
       ┌───────────┴───────────┬────────────┐
       │                       │            │
       ▼                       ▼            ▼
┌────────────────┐    ┌──────────────┐  ┌──────────────┐
│DiskDataManager │    │DatabaseData  │  │S3DataManager │
│                │    │Manager       │  │              │
│(Current Model) │    │(New)         │  │(New)         │
└────────────────┘    └──────────────┘  └──────────────┘

To add new functionality to all backends:

Add method to base class (data_manager/base.py):

class DataManager(ABC):
    @abstractmethod
    def my_new_method(self, param: str) -> dict[str, Any]:
        """
        New data operation.

        Args:
            param: Description.

        Returns:
            Result dictionary.
        """
        pass

Implement in disk manager (data_manager/disk_manager.py):

class DiskDataManager(DataManager):
    def my_new_method(self, param: str) -> dict[str, Any]:
        """Implement the new method."""
        # Access self._data_path for file operations
        file_path = self._data_path / f"{param}.json"
        # Read/write operations
        return result

Implement in other backends as needed (e.g., database, S3)

To add a new storage backend (e.g., S3 or database):

Create a new manager class inheriting from BaseDataManager
Implement all abstract methods for your backend
Update the factory in manager.py to return your new manager
Add configuration options to config.yaml

Testing 

Test Organization 

Tests are organized by functionality:

levseq_dash/app/tests/                     # Application tests
├── conftest.py                            # Shared fixtures and configuration
├── test_callbacks.py                      # Dash callback tests (routing, interactions)
├── test_dbmanager.py                      # Data manager operations (CRUD)
├── test_experiment.py                     # Experiment model and validation
├── test_components.py                     # UI widgets and tables
├── test_graphs.py                         # Plotting (heatmaps, rank plots, Single-Site Mutagenesis)
├── test_settings.py                       # Configuration and settings
├── test_alignment.py                      # Sequence alignment integration
├── test_utils.py                          # Utility functions
└── test_data/                             # Test fixtures and sample data

levseq_dash/app/sequence_aligner/tests/    # Sequence alignment tests
├── test_pairwise_aligner.py               # Alignment algorithm logic
└── test_alignment_time.py                 # Performance benchmarks

Shared Test Fixtures (conftest.py)

The conftest.py file provides reusable fixtures for all tests:

Path Fixtures:

test_data_path, app_data_path: Common directory paths for test and app data
path_exp_ep_data, path_exp_ssm_data: Paths to sample experiment files (CSV, CIF, JSON)

Data Fixtures:

experiment_ep_pcr_metadata, experiment_ssm_metadata: Pre-loaded experiment metadata from JSON

Mock Configuration Fixtures:

mock_load_config_from_test_data_path: Mock config pointing to test data directory
mock_get_deployment_mode, mock_is_data_modification_enabled: Mock settings functions

Data Manager Fixtures:

disk_manager_from_app_data: DiskDataManager using app data (for read-only tests)
disk_manager_from_temp_data: DiskDataManager using temporary directory (for write tests)

These fixtures avoid code duplication and ensure consistent test environments.

Debugging 

Logging 

Enable logging in config.yaml for debugging:

logging:
  sequence-alignment-profiling: true  # Alignment timing
  data-manager: true                   # Data operations
  pairwise-aligner: true              # Alignment details

Use logging in your code:

from levseq_dash.app.utils import utils
from levseq_dash.app.config import settings

utils.log_with_context(
    "Debug message",
    log_flag=settings.is_data_manager_logging_enabled()
)

Dash DevTools 

Enable Dash DevTools for debugging callbacks and hot reload:

if __name__ == "__main__":
    app.run(debug=True)