Developer Guide
===============

This guide provides detailed information for developers contributing to or extending LevSeq-Dash.

.. contents:: Table of Contents
   :local:
   :depth: 2

Getting Started
---------------

For initial setup and contribution guidelines, please see the ``CONTRIBUTING.md`` file in the repository root.

Architecture Overview
---------------------

LevSeq-Dash is built using the Dash framework (Plotly) for creating interactive web applications in Python. The application follows a modular architecture with clear separation of concerns.

High-Level Architecture
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   ┌─────────────────────────────────────────────────────────┐
   │ Dash Application (main_app.py)                          │
   │ - Routes pages                                          │
   │ - Registers all callbacks                               │
   │ - Initializes data manager via factory                  │
   └─────────────────┬───────────────────────────────────────┘
                     │
        ┌────────────┴────────────┐
        │                         │
        ▼                         ▼
   ┌──────────┐              ┌──────────┐
   │ Layouts  │              │Components│
   │ (UI def) │              │(Widgets) │
   └────┬─────┘              └─────┬────┘
        │                          │
        └─────────┬────────────────┘
                  │
             ┌────┴─────────────────┐
             │                      │
             ▼                      ▼
   ┌─────────────────────┐   ┌──────────────┐
   │ Data Manager        │   │ Seq Aligner  │
   │ ┌─────────────────┐ │   └──────┬───────┘
   │ │ Factory         │ │          │
   │ │ (manager.py)    │ │          ▼
   │ └────────┬────────┘ │   ┌──────────────┐
   │          │          │   │  BioPython   │
   │          ▼          │   │   Aligner    │
   │ ┌─────────────────┐ │   └──────────────┘
   │ │ Base (Abstract) │ │
   │ └────────┬────────┘ │
   │          │          │
   │     ┌────┴────┐     │
   │     ▼         ▼     │
   │ ┌──────┐ ┌────────┐ │
   │ │ Disk │ │ S3/DB  │ │
   │ │ Mgr  │ │(Future)│ │
   │ └──┬───┘ └────────┘ │
   └────┼────────────────┘
        │
        ▼
   ┌─────────────┐
   │ Experiment  │
   │   Model     │
   └─────────────┘

Core Modules
~~~~~~~~~~~~

1. **Main Application** (``main_app.py``)

   - Entry point for the Dash application
   - Registers all callbacks
   - Initializes data manager and configuration
   - Sets up routing and navigation

2. **Global Strings** (``global_strings.py``)

   - Centralized location for UI text, labels, and messages
   - URL path constants for page routing
   - Navigation labels and menu items
   - **Note**: Most string constants in the application are defined here for consistency and easy localization

3. **Data Manager** (``data_manager/``)

   - **Abstract Base Class** (``base.py``): Defines the interface for data operations
   - **Factory** (``manager.py``): Creates the appropriate data manager based on configuration
   - **Disk Manager** (``disk_manager.py``): Local file storage implementation
   - **Experiment Model** (``experiment.py``): Data model for experiment objects

   The data manager handles:

   - Experiment CRUD operations
   - Metadata management
   - File I/O operations
   - Caching with an LRU cache

4. **Sequence Aligner** (``sequence_aligner/``)

   - Wraps BioPython's pairwise alignment functionality
   - Provides BLASTP-style alignment for protein sequences
   - Calculates alignment scores and statistics
   - Formats alignment strings for visualization

5. **Layouts** (``layouts/``)

   Each layout module defines a page in the application:

   - ``layout_landing.py``: Home page with navigation
   - ``layout_upload.py``: Data upload and validation
   - ``layout_experiment.py``: Single experiment view with variants
   - ``layout_bars.py``: All experiments table
   - ``layout_matching_sequences.py``: Sequence alignment results
   - ``layout_explore.py``: Sequence exploration and filtering
   - ``layout_about.py``: About page and documentation

6. **Components** (``components/``)

   Reusable UI components:

   - ``widgets.py``: Tables, viewers, form elements, alerts
   - ``graphs.py``: Plotting functions (heatmaps, rank plots, Single-Site Mutagenesis plots)
   - ``vis.py``: Styling constants and cell coloring utilities
   - ``column_definitions.py``: AG Grid column configurations

7. **Utils** (``utils/``)

   Helper functions for:

   - Protein structure visualization
   - Chemical reaction rendering
   - Sequence alignment formatting
   - General utilities (logging, data processing)
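
The sequence aligner (module 4 above) is a thin wrapper around BioPython. The snippet below is a minimal, self-contained sketch of BLASTP-style protein alignment with BioPython's ``PairwiseAligner``; the scoring values shown are common BLASTP defaults and may differ from what ``sequence_aligner/`` actually configures.

.. code-block:: python

   from Bio import Align
   from Bio.Align import substitution_matrices

   # Illustrative BLASTP-style setup; the exact parameters live in sequence_aligner/.
   aligner = Align.PairwiseAligner()
   aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
   aligner.mode = "local"         # BLASTP performs local alignment
   aligner.open_gap_score = -11   # common BLASTP gap-open penalty
   aligner.extend_gap_score = -1  # common BLASTP gap-extension penalty

   query = "MKLVINGKTLKGEITVEG"
   target = "MKLVINGETLKGEITIEG"  # e.g. a parent sequence from another experiment
   alignments = aligner.align(query, target)
   best = alignments[0]
   print(best.score)  # alignment score
   print(best)        # formatted alignment string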

Data Flow
---------

Upload Workflow
~~~~~~~~~~~~~~~

.. code-block:: text

   User Upload
    │
    ├─> Validate CSV format
    │    └─> Check required columns
    │    └─> Validate well format
    │    └─> Verify SMILES strings
    │    └─> Check for duplicates
    │
    ├─> Process Data
    │    └─> Calculate checksums
    │    └─> Extract parent sequence
    │    └─> Generate UUID
    │
    └─> Store Files
         ├─> Save metadata (JSON)
         ├─> Save experiment data (CSV)
         └─> Save geometry (CIF)

Experiment View Workflow
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Select Experiment
    │
    ├─> Load from Cache (if available)
    │    └─> Return cached Experiment object
    │
    └─> Load from Disk
         ├─> Read CSV (core columns only)
         ├─> Read CIF geometry
         ├─> Calculate unique SMILES
         ├─> Extract plates
         └─> Cache Experiment object

Sequence Alignment Workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Input Sequence
    │
    ├─> Get All Lab Sequences
    │    └─> Extract parent from each experiment
    │
    ├─> Setup BLASTP Aligner
    │    └─> Configure scoring matrix
    │    └─> Set gap penalties
    │
    ├─> Perform Alignments
    │    └─> For each target sequence:
    │         ├─> Run pairwise alignment
    │         ├─> Calculate score & statistics
    │         └─> Format alignment string
    │
    └─> Filter & Sort Results
         └─> Return top matches

Configuration File and Settings
--------------------------------

The application uses a YAML-based configuration system located in ``levseq_dash/app/config/``. The configuration determines deployment mode, storage backend, data paths, and logging behavior.

Configuration Files
~~~~~~~~~~~~~~~~~~~

**Key Files**:

- ``config.yaml``: Main configuration file with all settings
- ``settings.py``: Python module that loads and validates configuration

**Location**: ``levseq_dash/app/config/``

Configuration Structure
~~~~~~~~~~~~~~~~~~~~~~~

The ``config.yaml`` file is organized into several sections:

.. code-block:: yaml

   # Deployment mode: "public-playground" or "local-instance"
   deployment-mode: "local-instance"

   # Storage backend: "disk" or "db" (database not yet implemented)
   storage-mode: "disk"

   # Disk storage settings
   disk:
     five-letter-id-prefix: "MYLAB"
     local-data-path: "/path/to/data"
     enable-data-modification: true

   # Database settings (not yet implemented)
   db:
     host: ""
     port: ""

   # Logging and profiling flags
   logging:
     sequence-alignment-profiling: false
     data-manager: false
     pairwise-aligner: false
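
``settings.py`` turns this file into Python values for the rest of the application. The snippet below is a minimal sketch of such a loader; ``load_config()`` is the name used later in this guide, while the file location, the extra getter, and its default value are illustrative assumptions.

.. code-block:: python

   from pathlib import Path

   import yaml

   # Assumed location: config.yaml sits next to settings.py.
   _CONFIG_FILE = Path(__file__).parent / "config.yaml"


   def load_config() -> dict:
       """Read config.yaml into a plain dictionary."""
       with open(_CONFIG_FILE) as fh:
           return yaml.safe_load(fh) or {}


   def get_storage_mode() -> str:
       """Return the configured storage backend, defaulting to disk."""
       return load_config().get("storage-mode", "disk")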

Deployment Modes
~~~~~~~~~~~~~~~~

The application supports two deployment modes that determine how data is accessed and whether modifications are allowed.

**public-playground Mode**

- **Purpose**: Read-only demo environment for public websites
- **Data Location**: Bundled inside the container at ``levseq_dash/app/data/``
- **Data Modification**: Disabled (cannot upload/delete experiments)
- **Use Case**: Public demos and deployment

.. code-block:: yaml

   deployment-mode: "public-playground"
   # all other settings ignored or set to false

**local-instance Mode**

- **Purpose**: Full-featured installation with persistent storage
- **Data Location**: External mount via Docker volume or local path
- **Data Modification**: Can be enabled to allow uploading and deleting experiments
- **ID Prefix**: Required when data modification is enabled - a 5-letter lab identifier prepended to all experiment UUIDs
- **Use Case**: Research labs, production deployments

.. code-block:: yaml

   deployment-mode: "local-instance"
   disk:
     enable-data-modification: true  # or false if that is not wanted
     local-data-path: "/path/to/data"
     five-letter-id-prefix: "MYLAB"  # Required when enable-data-modification: true

Storage Modes
~~~~~~~~~~~~~

**Disk Storage** (Current Implementation)

Uses the local filesystem for data persistence:

.. code-block:: yaml

   storage-mode: "disk"
   disk:
     five-letter-id-prefix: "MYLAB"
     local-data-path: "/Users/username/data"
     enable-data-modification: true

**Settings**:

- ``five-letter-id-prefix``: 5-letter code prepended to experiment UUIDs

  - **Required** when ``enable-data-modification: true``
  - Must be exactly 5 letters (no numbers or special characters)
  - Automatically converted to uppercase
  - Example: "MYLAB" → experiment IDs like "MYLAB-a1b2c3d4"

- ``local-data-path``: Path to the data directory

  - Can be absolute: ``"/Users/username/Desktop/MyData"``
  - Can be relative to the app: ``"data"`` → ``levseq_dash/app/data/``
  - Overridden by the ``DATA_PATH`` environment variable

- ``enable-data-modification``: Allow upload/delete operations

  - ``true``: Full read-write access (requires a valid ID prefix)
  - ``false``: Read-only mode

**Database Storage** (Future)

Planned support for database backends:

.. code-block:: yaml

   storage-mode: "db"
   db:
     host: "localhost"
     port: "5432"
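
The prefix rules above can be enforced with a small startup check, in line with the best practice of validating settings early. This is only a sketch; the function name is hypothetical and the real validation in ``settings.py`` may differ.

.. code-block:: python

   def validate_five_letter_id_prefix(prefix: str) -> str:
       """Return the normalized prefix or raise a clear error at startup."""
       if len(prefix) != 5 or not prefix.isalpha():
           raise ValueError(
               "disk.five-letter-id-prefix must be exactly 5 letters "
               f"(no numbers or special characters), got: {prefix!r}"
           )
       return prefix.upper()  # e.g. "mylab" -> "MYLAB"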

Logging Settings
~~~~~~~~~~~~~~~~

Enable detailed logging for debugging and performance analysis:

.. code-block:: yaml

   logging:
     sequence-alignment-profiling: true  # Log alignment timing
     data-manager: true                  # Log data operations
     pairwise-aligner: true              # Log alignment details

**Logging Flags**:

- ``sequence-alignment-profiling``: Times alignment operations, useful for performance tuning
- ``data-manager``: Logs experiment CRUD operations, file I/O, cache hits/misses
- ``pairwise-aligner``: Logs BioPython alignment parameters and results

**Accessing Logging Flags in Code**:

.. code-block:: python

   from levseq_dash.app.config import settings
   from levseq_dash.app.utils import utils

   # Check if logging is enabled
   if settings.is_data_manager_logging_enabled():
       utils.log_with_context("Loading experiment...", log_flag=True)

Environment Variables
~~~~~~~~~~~~~~~~~~~~~

Environment variables override ``config.yaml`` settings and are the preferred method for Docker deployments.

**Available Variables**:

- ``DATA_PATH``: Override ``disk.local-data-path``

  .. code-block:: bash

     docker run -e DATA_PATH=/data -v /host/data:/data levseq-dash

- ``FIVE_LETTER_ID_PREFIX``: Override ``disk.five-letter-id-prefix``

  .. code-block:: bash

     docker run -e FIVE_LETTER_ID_PREFIX=PROD levseq-dash

**Configuration Priority** (highest to lowest):

1. Environment variables (``DATA_PATH``, ``FIVE_LETTER_ID_PREFIX``)
2. ``config.yaml`` settings
3. Default values (if applicable)

Adding New Configuration Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To add new configuration settings:

1. **Add to config.yaml**:

   .. code-block:: yaml

      my-new-section:
        my-setting: "value"

2. **Add getter function to settings.py**:

   .. code-block:: python

      def get_my_new_section():
          config = load_config()
          return config.get("my-new-section", {})


      def get_my_setting():
          section = get_my_new_section()
          return section.get("my-setting", "default-value")

3. **Use in application code**:

   .. code-block:: python

      from levseq_dash.app.config import settings

      value = settings.get_my_setting()

4. **Add environment variable support** (optional):

   .. code-block:: python

      def get_my_setting():
          # Check environment variable first
          env_value = os.environ.get("MY_SETTING")
          if env_value:
              return env_value
          # Fall back to config file
          section = get_my_new_section()
          return section.get("my-setting", "default-value")

**Best Practices**:

- Use descriptive setting names with hyphens (``my-setting``, not ``MySetting``)
- Provide sensible defaults in getter functions
- Document new settings in config.yaml with comments
- Use environment variables for secrets and deployment-specific values
- Validate settings at application startup (raise clear errors for invalid values)

Adding New Features
-------------------

Adding a New Page
~~~~~~~~~~~~~~~~~

1. **Create layout** in ``layouts/layout_my_page.py``:

   .. code-block:: python

      import dash_bootstrap_components as dbc
      from dash import html


      def get_layout() -> dbc.Container:
          """Create the page layout."""
          return dbc.Container([
              dbc.Row([
                  dbc.Col([
                      html.H1("Page Title"),
                      # Your page content
                  ])
              ])
          ])

2. **Define the path** in ``global_strings.py``:

   .. code-block:: python

      # Add navigation label
      nav_my_page = "My Page"

      # Add URL path (at the end of the file with other paths)
      nav_my_page_path = "/my-page"

3. **Register page route** in ``main_app.py``:

   .. code-block:: python

      # Import the layout module at the top
      from levseq_dash.app.layouts import layout_my_page

      # Add route in the route_page callback (around line 109)
      @app.callback(Output("id-page-content", "children"), Input("url", "pathname"))
      def route_page(pathname):
          if pathname == "/":
              return layout_landing.get_layout()
          # ... existing routes ...
          elif pathname == gs.nav_my_page_path:
              return layout_my_page.get_layout()
          else:
              return html.Div([html.H2("Page not found!")])

4. **Add navigation link** in the ``layouts/layout_bars.py`` sidebar or navbar (see the sketch below)

5. **Add callbacks** in ``main_app.py`` (not in the layout file):

   .. code-block:: python

      @app.callback(
          Output("my-output", "children"),
          Input("my-button", "n_clicks"),
          State("my-input", "value"),
          prevent_initial_call=True,
      )
      def handle_my_page_interaction(n_clicks, input_value):
          """Handle user interaction on my page."""
          if not n_clicks:
              return dash.no_update
          # Process and return result
          return f"Processed: {input_value}"

**Important Notes:**

- Layout files only define the UI structure via the ``get_layout()`` function
- All callbacks must be registered in ``main_app.py`` (not in layout files)
- Page routing is handled by the ``route_page`` callback in ``main_app.py``
- Path constants are defined in ``global_strings.py`` for consistency
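
For step 4, the navigation entry typically reuses the constants defined in ``global_strings.py``. The sketch below assumes a ``dbc.NavLink``-based sidebar and an import alias for ``global_strings``; the surrounding structure in ``layout_bars.py`` will differ.

.. code-block:: python

   import dash_bootstrap_components as dbc

   from levseq_dash.app import global_strings as gs  # assumed import path

   # Hypothetical sidebar entry; place it alongside the existing links.
   my_page_link = dbc.NavLink(
       gs.nav_my_page,            # label defined in step 2
       href=gs.nav_my_page_path,  # URL path constant defined in step 2
       active="exact",
   )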
""" # Process input result = process_data(input_value) return result Best Practices: - Use descriptive callback names - Add docstrings - Use ``prevent_initial_call=True`` when appropriate - Handle errors gracefully - Log important operations Adding a New Widget ~~~~~~~~~~~~~~~~~~~ Add reusable components to ``components/widgets.py``: .. code-block:: python def get_my_widget(widget_id, **kwargs): """ Create a custom widget. Args: widget_id: Unique ID for the widget. **kwargs: Additional properties. Returns: Component with configured properties. """ return dbc.Component( id=widget_id, # Add properties ) Adding a New Graph Type ~~~~~~~~~~~~~~~~~~~~~~~ Add plotting functions to ``components/graphs.py``: .. code-block:: python def create_my_plot(df, x_col, y_col): """ Create a custom plot visualization. Args: df: DataFrame containing data. x_col: Column name for X-axis. y_col: Column name for Y-axis. Returns: go.Figure: Plotly figure object. """ fig = px.scatter(df, x=x_col, y=y_col) fig.update_layout( # Customize appearance ) return fig Extending Data Manager ---------------------- The data manager uses an abstract base class pattern with a factory for creating instances: .. code-block:: text ┌─────────────────────────────────────────────────────┐ │ main_app.py │ │ │ │ singleton_data_mgr_instance = create_data_manager()│ └──────────────────┬──────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ manager.py (Factory) │ │ │ │ def create_data_manager(): │ │ if is_disk_mode(): │ │ return DiskDataManager() │ │ elif is_database_mode(): │ │ return DatabaseDataManager() ← Extend backends │ │ elif is_s3_mode(): │ │ return S3DataManager() ← Extend backends │ └──────────────────┬──────────────────────────────────────┘ │ ▼ ┌──────────────────────────────────────────────────────┐ │ BaseDataManager (Abstract) │ │ - get_experiment() │ │ - add_experiment_from_ui() │ │ - get_all_lab_experiments_with_meta_data() │ │ - delete_experiment() │ │ - ... other abstract methods │ └──────────────────┬───────────────────────────────────┘ │ ┌───────────┴───────────┬────────────┐ │ │ │ ▼ ▼ ▼ ┌────────────────┐ ┌──────────────┐ ┌──────────────┐ │DiskDataManager │ │DatabaseData │ │S3DataManager │ │ │ │Manager │ │ │ │(Current Model) │ │(New) │ │(New) │ └────────────────┘ └──────────────┘ └──────────────┘ To add new functionality to all backends: 1. **Add method to base class** (``data_manager/base.py``): .. code-block:: python class DataManager(ABC): @abstractmethod def my_new_method(self, param: str) -> dict[str, Any]: """ New data operation. Args: param: Description. Returns: Result dictionary. """ pass 2. **Implement in disk manager** (``data_manager/disk_manager.py``): .. code-block:: python class DiskDataManager(DataManager): def my_new_method(self, param: str) -> dict[str, Any]: """Implement the new method.""" # Access self._data_path for file operations file_path = self._data_path / f"{param}.json" # Read/write operations return result 3. **Implement in other backends** as needed (e.g., database, S3) To add a new storage backend (e.g., S3 or database): 1. Create a new manager class inheriting from ``BaseDataManager`` 2. Implement all abstract methods for your backend 3. Update the factory in ``manager.py`` to return your new manager 4. Add configuration options to ``config.yaml`` Testing ------- Test Organization ~~~~~~~~~~~~~~~~~ Tests are organized by functionality: .. 

Testing
-------

Test Organization
~~~~~~~~~~~~~~~~~

Tests are organized by functionality:

.. code-block:: text

   levseq_dash/app/tests/                  # Application tests
   ├── conftest.py                         # Shared fixtures and configuration
   ├── test_callbacks.py                   # Dash callback tests (routing, interactions)
   ├── test_dbmanager.py                   # Data manager operations (CRUD)
   ├── test_experiment.py                  # Experiment model and validation
   ├── test_components.py                  # UI widgets and tables
   ├── test_graphs.py                      # Plotting (heatmaps, rank plots, Single-Site Mutagenesis)
   ├── test_settings.py                    # Configuration and settings
   ├── test_alignment.py                   # Sequence alignment integration
   ├── test_utils.py                       # Utility functions
   └── test_data/                          # Test fixtures and sample data

   levseq_dash/app/sequence_aligner/tests/ # Sequence alignment tests
   ├── test_pairwise_aligner.py            # Alignment algorithm logic
   └── test_alignment_time.py              # Performance benchmarks

Shared Test Fixtures (conftest.py)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``conftest.py`` file provides reusable fixtures for all tests:

**Path Fixtures**:

- ``test_data_path``, ``app_data_path``: Common directory paths for test and app data
- ``path_exp_ep_data``, ``path_exp_ssm_data``: Paths to sample experiment files (CSV, CIF, JSON)

**Data Fixtures**:

- ``experiment_ep_pcr_metadata``, ``experiment_ssm_metadata``: Pre-loaded experiment metadata from JSON

**Mock Configuration Fixtures**:

- ``mock_load_config_from_test_data_path``: Mock config pointing to the test data directory
- ``mock_get_deployment_mode``, ``mock_is_data_modification_enabled``: Mock settings functions

**Data Manager Fixtures**:

- ``disk_manager_from_app_data``: DiskDataManager using app data (for read-only tests)
- ``disk_manager_from_temp_data``: DiskDataManager using a temporary directory (for write tests)

These fixtures avoid code duplication and ensure consistent test environments.

Debugging
---------

Logging
~~~~~~~

Enable logging in ``config.yaml`` for debugging:

.. code-block:: yaml

   logging:
     sequence-alignment-profiling: true  # Alignment timing
     data-manager: true                  # Data operations
     pairwise-aligner: true              # Alignment details

Use logging in your code:

.. code-block:: python

   from levseq_dash.app.utils import utils
   from levseq_dash.app.config import settings

   utils.log_with_context(
       "Debug message",
       log_flag=settings.is_data_manager_logging_enabled()
   )

Dash DevTools
~~~~~~~~~~~~~

Enable Dash DevTools for debugging callbacks and hot reload:

.. code-block:: python

   if __name__ == "__main__":
       app.run(debug=True)
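
The dev-tools features can also be toggled individually through standard Dash keyword arguments; a sketch (enable only what you need):

.. code-block:: python

   if __name__ == "__main__":
       app.run(
           debug=True,
           dev_tools_hot_reload=True,   # reload the app when code changes
           dev_tools_props_check=True,  # validate component properties
           dev_tools_ui=True,           # show the in-browser callback/error UI
       )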