Sequelizer Documentation

sequelizer

C toolkit for DNA sequence analysis and nanopore data processing


What is Sequelizer?

Sequelizer is a C-based toolkit for nanopore DNA/RNA sequencing analysis. It provides tools for Fast5 metadata extraction and validation, and for extracting raw signals from Fast5 files.

Key Features

Core Commands

sequelizer fast5 - Fast5 Analysis

Comprehensive Fast5 file metadata extraction and validation.

# Single file analysis
./sequelizer fast5 data.fast5
# Dataset analysis with full details
./sequelizer fast5 /path/to/dataset/ --recursive --verbose
# Debug problematic files
./sequelizer fast5 problematic.fast5 --debug

→ Complete Fast5 command guide

sequelizer convert - Signal Extraction

Extract raw signals from Fast5 files for downstream analysis.

# Convert single file to raw signals
./sequelizer convert data.fast5 --to raw
# Batch convert with all reads
./sequelizer convert /path/to/dataset/ --to raw --recursive --all --output signals/

→ Complete convert command guide

Documentation

Getting Started

User Guides

Technical Documentation

Architecture

Clean Subcommand Design

src/
├── core/
│   ├── fast5_io.c/h          # Shared Fast5 I/O (used by Ciren)
│   ├── fast5_utils.c/h       # File utilities and metadata
│   └── util.c/h              # Common utilities
├── sequelizer.c              # Minimal main() with routing
├── sequelizer_subcommands.c/h # Command routing and help
├── sequelizer_fast5.c/h      # Fast5 analysis implementation
└── sequelizer_convert.c/h    # Signal conversion implementation

Key APIs

// Fast5 file discovery
char **find_fast5_files(const char *path, bool recursive, int *count);
// Metadata extraction
fast5_metadata_t *read_fast5_metadata(const char *filename, int *count);
// Format detection (automatic and robust)
fast5_format_t detect_fast5_format(hid_t file_id);
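
To make the file discovery prototype more concrete, here is a minimal sketch of what a function with that signature might do. It is not Sequelizer's implementation: the names `find_fast5_files_sketch` and `free_fast5_file_list`, the `.fast5` suffix test, and the choice to silently skip unreadable entries are all assumptions made for illustration, and it relies on POSIX `opendir`/`readdir`/`stat`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* True when the name ends in ".fast5" and has at least one character before it. */
static bool has_fast5_suffix(const char *name) {
    size_t n = strlen(name);
    return n > 6 && strcmp(name + n - 6, ".fast5") == 0;
}

/* Appends a copy of path to the list; returns false on allocation failure. */
static bool push_path(char ***list, int *count, const char *path) {
    char **grown = realloc(*list, (size_t)(*count + 1) * sizeof(char *));
    if (!grown) return false;
    *list = grown;
    grown[*count] = strdup(path);
    if (!grown[*count]) return false;
    (*count)++;
    return true;
}

/* Walks path. Subdirectories are entered only when recursive is true.
 * Unreadable entries and allocation failures are skipped in this sketch. */
static void collect(const char *path, bool recursive, bool top_level,
                    char ***list, int *count) {
    struct stat st;
    if (stat(path, &st) != 0) return;
    if (S_ISREG(st.st_mode)) {
        if (has_fast5_suffix(path)) push_path(list, count, path);
        return;
    }
    if (!S_ISDIR(st.st_mode) || (!top_level && !recursive)) return;

    DIR *dir = opendir(path);
    if (!dir) return;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        char child[4096];
        int len = snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
        if (len < 0 || (size_t)len >= sizeof child) continue; /* path too long */
        collect(child, recursive, false, list, count);
    }
    closedir(dir);
}

/* Hypothetical stand-in for find_fast5_files(): returns a heap-allocated
 * array of heap-allocated paths and stores its length in *count. */
char **find_fast5_files_sketch(const char *path, bool recursive, int *count) {
    assert(path != NULL && count != NULL);
    char **list = NULL;
    *count = 0;
    collect(path, recursive, true, &list, count);
    return list;
}

/* Releases a list returned by find_fast5_files_sketch(). */
void free_fast5_file_list(char **list, int count) {
    for (int i = 0; i < count; i++) free(list[i]);
    free(list);
}
```

A caller would pass a file or directory path, iterate over the returned array up to `*count`, and release it with the matching free helper. How the real API expects its result to be freed is not stated here, so check the headers in `src/core/` before relying on this pattern.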

Extension Points

Adding new subcommands follows a simple pattern:

  1. Create sequelizer_<command>.c/h files
  2. Add enum entry and detection logic
  3. Update CMakeLists.txt
  4. Follow established error handling and help patterns
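
The steps above can be sketched as a small routing table. This is an illustrative outline only: the enum name, the `stats` subcommand, the handler stubs, and the exit code `2` for usage errors are hypothetical and not taken from `sequelizer_subcommands.c/h`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum: one entry per subcommand plus a sentinel for bad input. */
typedef enum {
    CMD_FAST5,
    CMD_CONVERT,
    CMD_STATS,   /* hypothetical new subcommand, added for illustration */
    CMD_UNKNOWN
} subcommand_t;

typedef int (*subcommand_fn)(int argc, char **argv);

/* Stub handlers; real ones would live in sequelizer_<command>.c files. */
static int run_fast5(int argc, char **argv)   { (void)argc; (void)argv; return 0; }
static int run_convert(int argc, char **argv) { (void)argc; (void)argv; return 0; }
static int run_stats(int argc, char **argv)   { (void)argc; (void)argv; return 0; }

typedef struct {
    const char   *name;
    subcommand_t  id;
    subcommand_fn run;
} subcommand_entry_t;

/* Adding a subcommand means adding one enum entry and one row here. */
static const subcommand_entry_t SUBCOMMANDS[] = {
    { "fast5",   CMD_FAST5,   run_fast5   },
    { "convert", CMD_CONVERT, run_convert },
    { "stats",   CMD_STATS,   run_stats   },
};
static const size_t N_SUBCOMMANDS = sizeof SUBCOMMANDS / sizeof SUBCOMMANDS[0];

/* Detection logic: maps the first CLI argument to an enum entry. */
subcommand_t detect_subcommand(const char *name) {
    if (name == NULL) return CMD_UNKNOWN;
    for (size_t i = 0; i < N_SUBCOMMANDS; i++) {
        if (strcmp(name, SUBCOMMANDS[i].name) == 0) return SUBCOMMANDS[i].id;
    }
    return CMD_UNKNOWN;
}

/* Routing: a minimal main() could simply return dispatch_subcommand(argc, argv).
 * Returns 2 for a missing or unknown subcommand (an assumed convention). */
int dispatch_subcommand(int argc, char **argv) {
    assert(argv != NULL);
    if (argc < 2) return 2;
    subcommand_t id = detect_subcommand(argv[1]);
    for (size_t i = 0; i < N_SUBCOMMANDS; i++) {
        if (SUBCOMMANDS[i].id == id) {
            /* Hand the handler its own name as argv[0]. */
            return SUBCOMMANDS[i].run(argc - 1, argv + 1);
        }
    }
    return 2;
}
```

Keeping the name, enum value, and handler in one table row means a new subcommand touches a single place in the router, in addition to its own source files and the `CMakeLists.txt` entry.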

Integration

Pipeline Integration

# Generate file lists for processing
./sequelizer fast5 dataset/ --recursive > file_list.txt
# Extract signals for analysis
./sequelizer convert dataset/ --to raw --recursive --all -o signals/
# Validate files before processing
./sequelizer fast5 dataset/ --recursive 2> validation.log

Ciren Integration

Sequelizer serves as the open-source foundation for Ciren:

# Use Sequelizer for initial analysis
./sequelizer fast5 dataset/ --recursive --verbose
# Use Ciren for advanced features
../ciren/build/ciren fast5 dataset/ --format json --enhanced-stats

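Shared components include the Fast5 I/O layer in src/core/fast5_io.c/h, which the source tree above marks as used by Ciren.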

Tested Datasets

Sequelizer has been validated against real-world nanopore datasets:

# SquiggleFilter project data
./sequelizer fast5 /path/to/SquiggleFilter/data/lambda/fast5/ --recursive
# slow5tools test data
./sequelizer fast5 /path/to/slow5tools/test/data --recursive
# Various Oxford Nanopore formats
# - Standard multi-read Fast5
# - Legacy single-read Fast5
# - Non-standard variants missing file_type attributes
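
Handling files that lack a `file_type` attribute implies some structural fallback. The sketch below shows one way that decision could be made once the attribute value and top-level group names have been read from the file. It is not Sequelizer's `detect_fast5_format()`: the function name, the enum, and the exact rules are assumptions, based on the commonly seen layouts in which multi-read files keep reads in top-level `read_...` groups and single-read files have `Raw` and `UniqueGlobalKey` groups. The HDF5 calls that would gather these inputs are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch only. */
typedef enum {
    SKETCH_FORMAT_SINGLE_READ,
    SKETCH_FORMAT_MULTI_READ,
    SKETCH_FORMAT_UNKNOWN
} sketch_format_t;

/* Classifies a Fast5 layout from already-extracted facts:
 *   file_type_attr  value of the root file_type attribute, or NULL if absent
 *   root_groups     names of the top-level groups in the file
 *   n_groups        number of entries in root_groups                      */
sketch_format_t classify_fast5_layout(const char *file_type_attr,
                                      const char *const *root_groups,
                                      size_t n_groups) {
    assert(root_groups != NULL || n_groups == 0);

    /* Preferred path: trust the attribute when the file carries one. */
    if (file_type_attr != NULL && strcmp(file_type_attr, "multi-read") == 0) {
        return SKETCH_FORMAT_MULTI_READ;
    }

    /* Fallback for variants without a usable attribute: infer from structure. */
    for (size_t i = 0; i < n_groups; i++) {
        const char *name = root_groups[i];
        if (strncmp(name, "read_", 5) == 0) {
            return SKETCH_FORMAT_MULTI_READ;
        }
        if (strcmp(name, "Raw") == 0 || strcmp(name, "UniqueGlobalKey") == 0) {
            return SKETCH_FORMAT_SINGLE_READ;
        }
    }
    return SKETCH_FORMAT_UNKNOWN;
}
```

Separating the classification rule from the HDF5 access keeps the rule easy to test against hand-written group lists, which is useful when a dataset turns up with a layout the rules do not yet cover.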

Performance

Characteristics

Benchmarks

Support and Development

Getting Help

Contributing

License

Sequelizer is open-source software. See LICENSE for details.


Quick Reference

Most Common Operations

# Analyze a dataset
./sequelizer fast5 /path/to/data/ --recursive --verbose
# Extract all signals from multi-read files
./sequelizer convert /path/to/data/ --to raw --all --recursive -o signals/
# Debug a problematic file
./sequelizer fast5 problem_file.fast5 --debug
# Get help for any command
./sequelizer <command> --help

Next Steps

  1. Try the commands - Start with Fast5 analysis
  2. Understand compatibility - Learn about format support
  3. Integrate with pipelines - Use in your analysis workflow
  4. Explore Ciren - Advanced features and performance enhancements

For the complete feature set and enhanced performance, consider Ciren, which builds on Sequelizer’s foundation.