Developer Guide
This guide provides detailed information for developers contributing to or extending LevSeq-Dash.
Getting Started
For initial setup and contribution guidelines, please see the CONTRIBUTING.md file in the repository root.
Architecture Overview
LevSeq-Dash is built using the Dash framework (Plotly) for creating interactive web applications in Python. The application follows a modular architecture with clear separation of concerns.
High-Level Architecture
┌─────────────────────────────────────────────────────────┐
│             Dash Application (main_app.py)              │
│  - Routes pages                                         │
│  - Registers all callbacks                              │
│  - Initializes data manager via factory                 │
└─────────────────┬───────────────────────────────────────┘
                  │
     ┌────────────┴────────────┐
     │                         │
     ▼                         ▼
┌──────────┐             ┌──────────┐
│ Layouts  │             │Components│
│ (UI def) │             │(Widgets) │
└────┬─────┘             └─────┬────┘
     │                         │
     └─────────┬───────────────┘
               │
   ┌───────────┴────────────┐
   │                        │
   ▼                        ▼
┌─────────────────────┐  ┌──────────────┐
│    Data Manager     │  │Seq Aligner   │
│ ┌─────────────────┐ │  └──────┬───────┘
│ │     Factory     │ │         │
│ │  (manager.py)   │ │         ▼
│ └────────┬────────┘ │  ┌──────────────┐
│          │          │  │  BioPython   │
│          ▼          │  │   Aligner    │
│ ┌─────────────────┐ │  └──────────────┘
│ │ Base (Abstract) │ │
│ └────────┬────────┘ │
│          │          │
│     ┌────┴────┐     │
│     ▼         ▼     │
│ ┌──────┐ ┌────────┐ │
│ │ Disk │ │ S3/DB  │ │
│ │ Mgr  │ │(Future)│ │
│ └──┬───┘ └────────┘ │
└────┼────────────────┘
     │
     ▼
┌─────────────┐
│ Experiment  │
│   Model     │
└─────────────┘
Core Modules
Main Application (main_app.py)
Entry point for the Dash application
Registers all callbacks
Initializes data manager and configuration
Sets up routing and navigation
Global Strings (global_strings.py)
Centralized location for UI text, labels, and messages
URL path constants for page routing
Navigation labels and menu items
Note: Most string constants in the application are defined here for consistency and easy localization
Data Manager (data_manager/)
Abstract Base Class (base.py): Defines interface for data operations
Factory (manager.py): Creates appropriate data manager based on configuration
Disk Manager (disk_manager.py): Local file storage implementation
Experiment Model (experiment.py): Data model for experiment objects
The data manager handles:
Experiment CRUD operations
Metadata management
File I/O operations
Caching with LRU cache
Sequence Aligner (sequence_aligner/)
Wraps BioPython’s pairwise alignment functionality
Provides BLASTP-style alignment for protein sequences
Calculates alignment scores and statistics
Formats alignment strings for visualization
Layouts (layouts/)
Each layout module defines a page in the application:
layout_landing.py: Home page with navigation
layout_upload.py: Data upload and validation
layout_experiment.py: Single experiment view with variants
layout_bars.py: All experiments table
layout_matching_sequences.py: Sequence alignment results
layout_explore.py: Sequence exploration and filtering
layout_about.py: About page and documentation
Components (components/)
Reusable UI components:
widgets.py: Tables, viewers, form elements, alerts
graphs.py: Plotting functions (heatmaps, rank plots, Single-Site Mutagenesis plots)
vis.py: Styling constants and cell coloring utilities
column_definitions.py: AG Grid column configurations
Utils (utils/)
Helper functions for:
Protein structure visualization
Chemical reaction rendering
Sequence alignment formatting
General utilities (logging, data processing)
Data Flow
Upload Workflow
User Upload
    │
    ├─> Validate CSV format
    │   └─> Check required columns
    │   └─> Validate well format
    │   └─> Verify SMILES strings
    │   └─> Check for duplicates
    │
    ├─> Process Data
    │   └─> Calculate checksums
    │   └─> Extract parent sequence
    │   └─> Generate UUID
    │
    └─> Store Files
        ├─> Save metadata (JSON)
        ├─> Save experiment data (CSV)
        └─> Save geometry (CIF)
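The validation and processing steps of the upload workflow can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual LevSeq-Dash code: the column names (`well`, `smiles`), the 96-well pattern, and the function names are assumptions made for the example, and SMILES verification is left out because it needs a chemistry toolkit.

```python
import csv
import hashlib
import io
import re
import uuid

# Hypothetical required columns; the real LevSeq-Dash schema may differ.
REQUIRED_COLUMNS = {"well", "smiles"}
# Assumed 96-well naming: row A-H, column 1-12 (for example "A1", "H12").
WELL_PATTERN = re.compile(r"^[A-H](?:[1-9]|1[0-2])$")


def validate_csv(text: str) -> list[str]:
    """Return a list of validation problems; an empty list means the file passed."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems, seen = [], set()
    for line_no, row in enumerate(reader, start=2):  # header is line 1
        if not WELL_PATTERN.match(row["well"] or ""):
            problems.append(f"line {line_no}: bad well {row['well']!r}")
        key = (row["well"], row["smiles"])
        if key in seen:
            problems.append(f"line {line_no}: duplicate row")
        seen.add(key)
    return problems


def process_upload(text: str, id_prefix: str) -> dict[str, str]:
    """Compute a content checksum and a prefixed experiment ID for valid data."""
    problems = validate_csv(text)
    if problems:
        raise ValueError("; ".join(problems))
    checksum = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {"id": f"{id_prefix}-{uuid.uuid4()}", "checksum": checksum}
```

Collecting all problems before raising lets the upload page report every issue in one pass instead of stopping at the first bad row.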
Experiment View Workflow
Select Experiment
    │
    ├─> Load from Cache (if available)
    │   └─> Return cached Experiment object
    │
    └─> Load from Disk
        ├─> Read CSV (core columns only)
        ├─> Read CIF geometry
        ├─> Calculate unique SMILES
        ├─> Extract plates
        └─> Cache Experiment object
Sequence Alignment Workflow
Input Sequence
    │
    ├─> Get All Lab Sequences
    │   └─> Extract parent from each experiment
    │
    ├─> Setup BLASTP Aligner
    │   └─> Configure scoring matrix
    │   └─> Set gap penalties
    │
    ├─> Perform Alignments
    │   └─> For each target sequence:
    │       ├─> Run pairwise alignment
    │       ├─> Calculate score & statistics
    │       └─> Format alignment string
    │
    └─> Filter & Sort Results
        └─> Return top matches
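The score, filter, and sort loop at the end of this workflow might look like the sketch below. To keep the example dependency-free, the scorer is a toy similarity ratio from `difflib`, which is explicitly not BLASTP and not the BioPython aligner; in the application the scoring function would be the configured pairwise aligner. The function names, the threshold, and the `top_n` default are assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable


def toy_score(query: str, target: str) -> float:
    """Stand-in scorer (NOT BLASTP): similarity ratio in [0, 1] from difflib."""
    return SequenceMatcher(None, query, target, autojunk=False).ratio()


def rank_matches(
    query: str,
    lab_sequences: dict[str, str],
    score_fn: Callable[[str, str], float] = toy_score,
    min_score: float = 0.5,
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Score every parent sequence, drop weak hits, and return the best first."""
    scored = [(exp_id, score_fn(query, seq)) for exp_id, seq in lab_sequences.items()]
    kept = [hit for hit in scored if hit[1] >= min_score]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)[:top_n]
```

Passing the scorer as a parameter keeps the ranking logic testable without running a real alignment.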
Configuration File and Settings
The application uses a YAML-based configuration system located in levseq_dash/app/config/. The configuration determines deployment mode, storage backend, data paths, and logging behavior.
Configuration Files
Key Files:
config.yaml: Main configuration file with all settings
settings.py: Python module that loads and validates configuration
Location: levseq_dash/app/config/
Configuration Structure
The config.yaml file is organized into several sections:
# Deployment mode: "public-playground" or "local-instance"
deployment-mode: "local-instance"
# Storage backend: "disk" or "db" (database not yet implemented)
storage-mode: "disk"
# Disk storage settings
disk:
  five-letter-id-prefix: "MYLAB"
  local-data-path: "/path/to/data"
  enable-data-modification: true
# Database settings (not yet implemented)
db:
  host: ""
  port: ""
# Logging and profiling flags
logging:
  sequence-alignment-profiling: false
  data-manager: false
  pairwise-aligner: false
Deployment Modes
The application supports two deployment modes that determine how data is accessed and whether modifications are allowed.
public-playground Mode
Purpose: Read-only demo environment for public websites
Data Location: Bundled inside container at levseq_dash/app/data/
Data Modification: Disabled (cannot upload/delete experiments)
Use Case: Public demos and deployment
deployment-mode: "public-playground"
# all other settings ignored or set to false
local-instance Mode
Purpose: Full-featured installation with persistent storage
Data Location: External mount via Docker volume or local path
Data Modification: User can enable in order to upload/delete experiments
ID Prefix: Required when data modification is enabled - a 5-letter lab identifier prepended to all experiment UUIDs
Use Case: Research labs, production deployments
deployment-mode: "local-instance"
disk:
  enable-data-modification: true # or false if that is not wanted
  local-data-path: "/path/to/data"
  five-letter-id-prefix: "MYLAB" # Required when enable-data-modification: true
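The interaction between the two deployment modes and the data-modification flag can be expressed as a small check. This is a sketch under assumptions: it takes an already-parsed configuration dictionary, and the function name `can_modify_data` is hypothetical rather than part of settings.py.

```python
VALID_DEPLOYMENT_MODES = {"public-playground", "local-instance"}


def can_modify_data(config: dict) -> bool:
    """Decide whether upload/delete should be allowed for a parsed config."""
    mode = config.get("deployment-mode")
    if mode not in VALID_DEPLOYMENT_MODES:
        raise ValueError(f"unknown deployment-mode: {mode!r}")
    if mode == "public-playground":
        return False  # read-only demo, regardless of other settings
    disk = config.get("disk", {})
    return bool(disk.get("enable-data-modification", False))
```

Defaulting to `False` when the flag is absent keeps a misconfigured instance read-only.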
Storage Modes
Disk Storage (Current Implementation)
Uses local filesystem for data persistence:
storage-mode: "disk"
disk:
  five-letter-id-prefix: "MYLAB"
  local-data-path: "/Users/username/data"
  enable-data-modification: true
Settings:
five-letter-id-prefix: 5-letter code prepended to experiment UUIDs
  Required when enable-data-modification: true
  Must be exactly 5 letters (no numbers or special characters)
  Automatically converted to uppercase
  Example: “MYLAB” → experiment IDs like “MYLAB-a1b2c3d4”
local-data-path: Path to data directory
  Can be absolute: "/Users/username/Desktop/MyData"
  Can be relative to app: "data" → levseq_dash/app/data/
  Overridden by DATA_PATH environment variable
enable-data-modification: Allow upload/delete operations
  true: Full read-write access (requires valid ID prefix)
  false: Read-only mode
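The prefix and path rules above translate into two short helpers. The sketch below is illustrative only: the function names and the `APP_DIR` constant are assumptions, "letters" is interpreted as ASCII letters, and absolute-path detection follows the conventions of the host platform.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

APP_DIR = Path("levseq_dash/app")  # assumed base for relative paths


def normalize_prefix(prefix: str) -> str:
    """Validate a 5-letter lab prefix and return it in uppercase."""
    if len(prefix) != 5 or not (prefix.isascii() and prefix.isalpha()):
        raise ValueError(f"prefix must be exactly 5 letters, got {prefix!r}")
    return prefix.upper()


def resolve_data_path(configured: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Apply the DATA_PATH override, then resolve relative paths against the app."""
    env = os.environ if env is None else env
    raw = Path(env.get("DATA_PATH") or configured)
    return raw if raw.is_absolute() else APP_DIR / raw
```

For example, `normalize_prefix("mylab")` yields `"MYLAB"`, while a four-letter value or one containing digits raises `ValueError`.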
Database Storage (Future)
Planned support for database backends:
storage-mode: "db"
db:
  host: "localhost"
  port: "5432"
Logging Settings
Enable detailed logging for debugging and performance analysis:
logging:
  sequence-alignment-profiling: true  # Log alignment timing
  data-manager: true                  # Log data operations
  pairwise-aligner: true              # Log alignment details
Logging Flags:
sequence-alignment-profiling: Times alignment operations, useful for performance tuning
data-manager: Logs experiment CRUD operations, file I/O, cache hits/misses
pairwise-aligner: Logs BioPython alignment parameters and results
Accessing Logging Flags in Code:
from levseq_dash.app.config import settings
from levseq_dash.app.utils import utils
# Check if logging is enabled
if settings.is_data_manager_logging_enabled():
    utils.log_with_context("Loading experiment...", log_flag=True)
Environment Variables
Environment variables override config.yaml settings and are the preferred method for Docker deployments.
Available Variables:
DATA_PATH: Override disk.local-data-path
docker run -e DATA_PATH=/data -v /host/data:/data levseq-dash
FIVE_LETTER_ID_PREFIX: Override disk.five-letter-id-prefix (must be exactly 5 letters)
docker run -e FIVE_LETTER_ID_PREFIX=MYLAB levseq-dash
Configuration Priority (highest to lowest):
Environment variables (DATA_PATH, FIVE_LETTER_ID_PREFIX)
config.yaml settings
Default values (if applicable)
Adding New Configuration Options
To add new configuration settings:
Add to config.yaml:
my-new-section:
  my-setting: "value"
Add getter function to settings.py:
def get_my_new_section():
    config = load_config()
    return config.get("my-new-section", {})

def get_my_setting():
    section = get_my_new_section()
    return section.get("my-setting", "default-value")
Use in application code:
from levseq_dash.app.config import settings

value = settings.get_my_setting()
Add environment variable support (optional):
import os

def get_my_setting():
    # Check environment variable first
    env_value = os.environ.get("MY_SETTING")
    if env_value:
        return env_value
    # Fall back to config file
    section = get_my_new_section()
    return section.get("my-setting", "default-value")
Best Practices:
Use descriptive setting names with hyphens (my-setting, not MySetting)
Provide sensible defaults in getter functions
Document new settings in config.yaml with comments
Use environment variables for secrets and deployment-specific values
Validate settings at application startup (raise clear errors for invalid values)
Adding New Features
Adding a New Page
Create layout in layouts/layout_my_page.py:
import dash_bootstrap_components as dbc
from dash import html

def get_layout() -> dbc.Container:
    """Create the page layout."""
    return dbc.Container([
        dbc.Row([
            dbc.Col([
                html.H1("Page Title"),
                # Your page content
            ])
        ])
    ])
Define the path in global_strings.py:
# Add navigation label
nav_my_page = "My Page"
# Add URL path (at the end of the file with other paths)
nav_my_page_path = "/my-page"
Register page route in main_app.py:
# Import the layout module at the top
from levseq_dash.app.layouts import layout_my_page

# Add route in the route_page callback (around line 109)
@app.callback(Output("id-page-content", "children"), Input("url", "pathname"))
def route_page(pathname):
    if pathname == "/":
        return layout_landing.get_layout()
    # ... existing routes ...
    elif pathname == gs.nav_my_page_path:
        return layout_my_page.get_layout()
    else:
        return html.Div([html.H2("Page not found!")])
Add navigation link in layouts/layout_bars.py sidebar or navbar
Add callbacks in main_app.py (not in the layout file):
@app.callback(
    Output("my-output", "children"),
    Input("my-button", "n_clicks"),
    State("my-input", "value"),
    prevent_initial_call=True,
)
def handle_my_page_interaction(n_clicks, input_value):
    """Handle user interaction on my page."""
    if not n_clicks:
        return dash.no_update
    # Process and return result
    return f"Processed: {input_value}"
Important Notes:
Layout files only define the UI structure via get_layout() function
All callbacks must be registered in main_app.py (not in layout files)
Page routing is handled by the route_page callback in main_app.py
Path constants are defined in global_strings.py for consistency
Adding a New Callback
Register new callbacks in main_app.py rather than in the layout modules, in line with the rule stated under Adding a New Page:
from dash import callback, Input, Output

@callback(
    Output("output-id", "property"),
    Input("input-id", "property"),
    prevent_initial_call=True,
)
def my_callback(input_value):
    """
    Callback description.
    Args:
        input_value: Description.
    Returns:
        Output value description.
    """
    # Process input
    result = process_data(input_value)
    return result
Best Practices:
Use descriptive callback names
Add docstrings
Use prevent_initial_call=True when appropriate
Handle errors gracefully
Log important operations
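One way to apply the error-handling and logging practices is a small decorator placed between the Dash decorator and the function body. This is a sketch, not something the guide prescribes: `safe_callback` and the logger name are hypothetical, and the example function is shown without the Dash decorator so that it runs on its own.

```python
import functools
import logging

logger = logging.getLogger("levseq_dash.callbacks")  # assumed logger name


def safe_callback(fallback):
    """Wrap a callback so unexpected errors are logged and a fallback is returned."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # logger.exception records the traceback along with the message
                logger.exception("callback %s failed", func.__name__)
                return fallback

        return wrapper

    return decorator


@safe_callback(fallback="Could not process input.")
def my_callback(input_value):
    """Example body; in the app this would sit under the callback decorator."""
    return f"Processed: {input_value.strip()}"
```

In a real callback the fallback would more likely be `dash.no_update` or an alert component, so a failure leaves the page in its previous state.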
Adding a New Widget
Add reusable components to components/widgets.py:
from dash import html

def get_my_widget(widget_id, **kwargs):
    """
    Create a custom widget.
    Args:
        widget_id: Unique ID for the widget.
        **kwargs: Additional properties.
    Returns:
        Component with configured properties.
    """
    # html.Div is a placeholder; swap in the Dash or dbc component you need.
    return html.Div(
        id=widget_id,
        **kwargs,
    )
Adding a New Graph Type
Add plotting functions to components/graphs.py:
import plotly.express as px

def create_my_plot(df, x_col, y_col):
    """
    Create a custom plot visualization.
    Args:
        df: DataFrame containing data.
        x_col: Column name for X-axis.
        y_col: Column name for Y-axis.
    Returns:
        go.Figure: Plotly figure object.
    """
    fig = px.scatter(df, x=x_col, y=y_col)
    fig.update_layout(
        # Customize appearance
    )
    return fig
Extending Data Manager
The data manager uses an abstract base class pattern with a factory for creating instances:
┌─────────────────────────────────────────────────────┐
│  main_app.py                                        │
│                                                     │
│  singleton_data_mgr_instance = create_data_manager()│
└──────────────────┬──────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────┐
│  manager.py (Factory)                                   │
│                                                         │
│  def create_data_manager():                             │
│      if is_disk_mode():                                 │
│          return DiskDataManager()                       │
│      elif is_database_mode():                           │
│          return DatabaseDataManager() ← Extend backends │
│      elif is_s3_mode():                                 │
│          return S3DataManager()       ← Extend backends │
└──────────────────┬──────────────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────────────┐
│  BaseDataManager (Abstract)                          │
│  - get_experiment()                                  │
│  - add_experiment_from_ui()                          │
│  - get_all_lab_experiments_with_meta_data()          │
│  - delete_experiment()                               │
│  - ... other abstract methods                        │
└──────────────────┬───────────────────────────────────┘
                   │
       ┌───────────┴───────────┬────────────┐
       │                       │            │
       ▼                       ▼            ▼
┌────────────────┐  ┌──────────────┐  ┌──────────────┐
│DiskDataManager │  │DatabaseData  │  │S3DataManager │
│                │  │Manager       │  │              │
│(Current Model) │  │(New)         │  │(New)         │
└────────────────┘  └──────────────┘  └──────────────┘
To add new functionality to all backends:
Add method to base class (data_manager/base.py):
from abc import ABC, abstractmethod
from typing import Any

class BaseDataManager(ABC):
    @abstractmethod
    def my_new_method(self, param: str) -> dict[str, Any]:
        """
        New data operation.
        Args:
            param: Description.
        Returns:
            Result dictionary.
        """
        pass
Implement in disk manager (data_manager/disk_manager.py):
import json

class DiskDataManager(BaseDataManager):
    def my_new_method(self, param: str) -> dict[str, Any]:
        """Implement the new method."""
        # Access self._data_path for file operations
        file_path = self._data_path / f"{param}.json"
        # Read/write operations; reading the JSON file is shown as an example
        result = json.loads(file_path.read_text())
        return result
Implement in other backends as needed (e.g., database, S3)
To add a new storage backend (e.g., S3 or database):
Create a new manager class inheriting from BaseDataManager
Implement all abstract methods for your backend
Update the factory in manager.py to return your new manager
Add configuration options to config.yaml
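The backend steps can be sketched end to end with a toy in-memory backend. Everything here is a simplification for illustration: the base class is reduced to two methods, `InMemoryDataManager` and its `add` helper are hypothetical, and the factory takes the storage mode as an argument instead of reading it from settings as the real `create_data_manager()` does.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseDataManager(ABC):
    """Reduced interface; the real base class declares more abstract methods."""

    @abstractmethod
    def get_experiment(self, experiment_id: str) -> dict[str, Any]: ...

    @abstractmethod
    def delete_experiment(self, experiment_id: str) -> bool: ...


class InMemoryDataManager(BaseDataManager):
    """Hypothetical backend that keeps experiments in a dictionary."""

    def __init__(self) -> None:
        self._store: dict[str, dict[str, Any]] = {}

    def add(self, experiment_id: str, data: dict[str, Any]) -> None:
        self._store[experiment_id] = data

    def get_experiment(self, experiment_id: str) -> dict[str, Any]:
        return self._store[experiment_id]

    def delete_experiment(self, experiment_id: str) -> bool:
        # True if something was removed, False if the ID was unknown
        return self._store.pop(experiment_id, None) is not None


# Registry mapping a storage-mode string to a backend class (assumed design).
_BACKENDS: dict[str, type[BaseDataManager]] = {"memory": InMemoryDataManager}


def create_data_manager(storage_mode: str) -> BaseDataManager:
    """Instantiate the backend registered for the given storage mode."""
    try:
        return _BACKENDS[storage_mode]()
    except KeyError:
        raise ValueError(f"unsupported storage-mode: {storage_mode!r}") from None
```

Because the base class is abstract, a backend that forgets to implement one of the methods fails at instantiation time rather than at first use.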
Testing
Test Organization
Tests are organized by functionality:
levseq_dash/app/tests/ # Application tests
├── conftest.py # Shared fixtures and configuration
├── test_callbacks.py # Dash callback tests (routing, interactions)
├── test_dbmanager.py # Data manager operations (CRUD)
├── test_experiment.py # Experiment model and validation
├── test_components.py # UI widgets and tables
├── test_graphs.py # Plotting (heatmaps, rank plots, Single-Site Mutagenesis)
├── test_settings.py # Configuration and settings
├── test_alignment.py # Sequence alignment integration
├── test_utils.py # Utility functions
└── test_data/ # Test fixtures and sample data
levseq_dash/app/sequence_aligner/tests/ # Sequence alignment tests
├── test_pairwise_aligner.py # Alignment algorithm logic
└── test_alignment_time.py # Performance benchmarks
Debugging
Logging
Enable logging in config.yaml for debugging:
logging:
  sequence-alignment-profiling: true  # Alignment timing
  data-manager: true                  # Data operations
  pairwise-aligner: true              # Alignment details
Use logging in your code:
from levseq_dash.app.utils import utils
from levseq_dash.app.config import settings
utils.log_with_context(
    "Debug message",
    log_flag=settings.is_data_manager_logging_enabled()
)
Dash DevTools
Enable Dash DevTools for debugging callbacks and hot reload:
if __name__ == "__main__":
    app.run(debug=True)