Roadmap

Yew Search will evolve from a self-hosted homelab solution to an enterprise-grade SaaS platform, following the proven business model of GitLab, Mattermost, and n8n.

Strategy Overview

V1: Self-hosted homelab MVP - prove core value proposition
V2: Multi-tenancy foundation - architecture for scale
V3: Business SaaS launch - hosted offering for companies
V4: Enterprise tier - dedicated infrastructure and compliance

V1 - Self-Hosted MVP (2 weeks)

Target: Individual users, homelab enthusiasts, proof of concept

Goal: Ship a working self-hosted personal search engine that users can deploy on a Raspberry Pi or any Linux system with Docker.

Backend

User authentication system (cookie-based sessions, never JWT)
- Users are created via CLI command
- User login endpoints
- Session management (create, validate, delete)
- Argon2 password hashing
- Redis session store for auth lookups
- PostgreSQL user_session table for session management UI
User module
- User CRUD operations
- User profile endpoints
- Active sessions list (view/terminate sessions)
Integration architecture
- Base integration classes and interfaces
- Integration loader (dynamic plugin loading)
- OAuth endpoint (unified for all integrations)
- Task-based execution system
Polling system
- Bull task queue (Redis-backed)
- Priority-based task scheduling
- Idempotency checking (contentExists callback)
- Background worker for executing integration tasks
Gmail integration (OAuth)
- OAuth flow implementation
- Task types: start, getEmailList, downloadEmail
- Pagination support
- Early stop optimization
- Store email content in user_integration_content table
Search system
- PostgreSQL full-text search on JSONB content
- Basic query parsing and sanitization
- Search endpoints (scoped to authenticated user)
- Result ranking (simple relevance)
Database schema
- user table
- user_session table
- user_integration table (with encrypted credentials)
- user_integration_content table (JSONB content storage)
Observability
- Structured JSON logging
- Request context (requestId, userId, traceId)
- Basic error handling and logging
Docker setup
- Backend Dockerfile
- docker-compose.yml (backend, PostgreSQL, Redis)
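
The polling items above (the contentExists idempotency check and the Gmail early stop) could fit together roughly as follows. This is a minimal sketch, not the project's actual code: the `Task` shape, the `planPage` helper, the synchronous callback signature, and the newest-first ordering are all assumptions made for illustration. A real callback would most likely be asynchronous and backed by the user_integration_content table.

```typescript
// Hypothetical shapes: the roadmap names the task types and the contentExists
// callback, but not their signatures.
type Task = { type: "downloadEmail"; emailId: string; priority: number };

type ContentExists = (externalId: string) => boolean;

interface PagePlan {
  tasks: Task[]; // downloads to enqueue for this page
  continuePaging: boolean; // false once a page yields nothing new
}

// Plan the follow-up work for one page of getEmailList results.
// Assumes the provider lists messages newest first.
function planPage(emailIds: string[], contentExists: ContentExists): PagePlan {
  const tasks: Task[] = [];
  for (const emailId of emailIds) {
    // Idempotency: never schedule a download for content already stored.
    if (!contentExists(emailId)) {
      tasks.push({ type: "downloadEmail", emailId, priority: 1 });
    }
  }
  // Early stop: an empty page, or a page with nothing new, means older pages
  // were synced on a previous run, so pagination can end here.
  const continuePaging = emailIds.length > 0 && tasks.length > 0;
  return { tasks, continuePaging };
}
```

Stopping on the first page with nothing new keeps repeat polls cheap, at the cost of missing older messages that were skipped by an earlier failed run. Whether that trade-off is acceptable depends on how retries are handled.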

Frontend

Svelte app setup
- App router structure
- Basic layout and navigation
Authentication pages
- Login page
- Cookie-based auth
Search interface
- Search input and results display
- Basic result rendering (title, snippet, source)
- Loading and error states
Integration management
- "Connect Gmail" button
- OAuth authorization flow
- Integration status display
Settings page
- User profile
- Active sessions (list and terminate)
Frontend Dockerfile

Documentation

Getting Started guide (README update)
Architecture overview
Backend standards (service, controller, DTO, entity)
Integration development guide
OAuth integration guide
Authorization/session management guide
Coding style guide
Deployment instructions (Docker)

Website

Single-page landing site
- Project description
- Key features
- Link to docs
- GitHub link

V2 - Multi-Tenancy Foundation (4-6 weeks)

Target: Teams, small companies (still self-hosted), prepare for SaaS

Goal: Add organizational primitives and scale improvements. The architecture now supports multi-tenant SaaS but is still deployed as self-hosted.

Core Features

Database migration system
- TypeORM migrations
- Migration CLI commands
- Rollback support
Organizations/companies
- Organization entity and CRUD
- User-to-organization relationships
- Organization settings
Teams/groups within organizations
- Team entity and CRUD
- Team membership
- Team-level permissions
User permissions system
- Role-based access control (owner, admin, member, viewer)
- Permission checks in services
- Sharing integrations between users
- Data source access control
User invitations
- Email-based invitations
- Invitation acceptance flow
- Pending invitations management
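
The role-based access control item above lists four roles (owner, admin, member, viewer). A permission check in a service could be as small as the sketch below. The strict role ranking, the action names, and the minimum role per action are assumptions for illustration only; the roadmap does not define them.

```typescript
// Roles come from the roadmap; the ranking and the action table are assumptions.
type Role = "owner" | "admin" | "member" | "viewer";

const ROLE_RANK: Record<Role, number> = { viewer: 0, member: 1, admin: 2, owner: 3 };

// Hypothetical actions, each mapped to the lowest role allowed to perform it.
type Action = "search" | "connectIntegration" | "inviteUser" | "deleteOrganization";

const MIN_ROLE: Record<Action, Role> = {
  search: "viewer",
  connectIntegration: "member",
  inviteUser: "admin",
  deleteOrganization: "owner",
};

// True when the role ranks at or above the minimum role for the action.
function can(role: Role, action: Action): boolean {
  return ROLE_RANK[role] >= ROLE_RANK[MIN_ROLE[action]];
}

// Guard for service methods: throws instead of returning false.
function assertCan(role: Role, action: Action): void {
  if (!can(role, action)) {
    throw new Error(`role "${role}" may not perform "${action}"`);
  }
}
```

A ranked hierarchy is the simplest model. If team-level permissions need roles that do not nest cleanly, an explicit set of permissions per role would be the more flexible choice.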

Search Improvements

Elasticsearch integration
- Elasticsearch Docker container
- Sync PostgreSQL content to Elasticsearch
- Background sync worker
- Full-text search using Elasticsearch
- Improved relevance scoring
Search filters
- Filter by integration (Gmail, FTP, etc.)
- Filter by date range
- Filter by sender/source

Additional Integrations

FTP/sFTP integration
- Credentials-based auth (not OAuth)
- Directory traversal
- File metadata indexing
- Recursive directory support
Slack integration (OAuth)
- OAuth flow
- Channel message syncing
- Direct message syncing

Infrastructure

Raspberry Pi optimization
- Memory-optimized PostgreSQL config
- CPU throttling for background tasks
- Minimal Docker image sizes
- Performance testing on Pi 4
Environment configuration
- .env.example with all required vars
- Configuration validation on startup
- Better error messages for missing config

UI Improvements

Organization switcher
Team management UI
Permission management UI
Integration settings per user/team
Better search result display

V3 - Business SaaS Launch (3-6 months)

Target: Companies that want a hosted solution without the self-hosting burden

Goal: Launch Yew Search as a hosted SaaS product. Self-hosted version remains available with core features.

SaaS Infrastructure

Cloud deployment
- Production-grade docker-compose or Kubernetes
- Load balancer setup
- Database connection pooling
- Redis clustering
Multi-tenant architecture
- Data isolation per organization
- Tenant-aware queries (all services check organization)
- Database per tenant vs shared database decision
Billing and subscriptions
- Stripe integration
- Subscription plans (Business, Enterprise)
- Usage tracking (searches, storage, integrations)
- Resource quotas per plan
- Billing portal (Stripe Customer Portal)
Admin dashboard
- Organization list and search
- User activity monitoring
- System health metrics
- Feature flag management per customer
Onboarding flow
- Signup for Business tier
- Organization creation
- Team setup wizard
- Integration walkthrough
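
Usage tracking and per-plan resource quotas, listed under billing above, amount to comparing tracked usage against plan limits. The sketch below shows one way to express that. The plan names and tracked metrics come from the roadmap; every limit value is a placeholder, and `exceededQuotas` is a hypothetical helper.

```typescript
type Plan = "business" | "enterprise";
type Metric = "searches" | "storageBytes" | "integrations";
type Usage = Record<Metric, number>;

// Placeholder limits for illustration only; null means unlimited.
const QUOTAS: Record<Plan, Record<Metric, number | null>> = {
  business: {
    searches: 10000,
    storageBytes: 50 * 1024 * 1024 * 1024,
    integrations: 10,
  },
  enterprise: { searches: null, storageBytes: null, integrations: null },
};

// Return every metric whose usage is above the plan's limit.
function exceededQuotas(plan: Plan, usage: Usage): Metric[] {
  const limits = QUOTAS[plan];
  return (Object.keys(limits) as Metric[]).filter((metric) => {
    const limit = limits[metric];
    return limit !== null && usage[metric] > limit;
  });
}
```

Returning the list of exceeded metrics, rather than a boolean, lets the caller decide whether to block the request, warn the organization owner, or only record the overage for billing.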

Advanced Search

Search collections/groups
- Group multiple integrations into collections
- Search within specific collections
- Share collections with team members
Improved search algorithm
- BM25 or similar (better than TF-IDF)
- Boosting by recency, source, etc.
- Query expansion and synonyms
Saved searches
- Save frequently-used searches
- Search history per user
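
To make the "BM25 plus boosting" idea above concrete, here is a small self-contained scorer. It is a sketch under assumptions, not the planned implementation: the `Doc` shape, the k1 and b values (common defaults), and the half-life recency boost are all illustrative. Elasticsearch already uses BM25 as its default similarity, so in practice this work may turn out to be tuning and boosting rather than new scoring code.

```typescript
// Hypothetical document shape: pre-tokenized text plus an age for recency boosting.
type Doc = { id: string; tokens: string[]; ageDays: number };

const K1 = 1.2; // term-frequency saturation
const B = 0.75; // strength of document-length normalization

// Okapi BM25 score of one document for a list of query terms.
function bm25(queryTerms: string[], doc: Doc, corpus: Doc[]): number {
  if (corpus.length === 0) return 0;
  const avgLen = corpus.reduce((sum, d) => sum + d.tokens.length, 0) / corpus.length;
  let score = 0;
  for (const term of queryTerms) {
    const tf = doc.tokens.filter((t) => t === term).length;
    if (tf === 0) continue;
    const docsWithTerm = corpus.filter((d) => d.tokens.some((t) => t === term)).length;
    // Rare terms get a higher inverse document frequency.
    const idf = Math.log(1 + (corpus.length - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
    // Longer-than-average documents are penalized.
    const lengthNorm = 1 - B + B * (doc.tokens.length / avgLen);
    score += (idf * (tf * (K1 + 1))) / (tf + K1 * lengthNorm);
  }
  return score;
}

// Assumed recency boost: a bonus of up to 1x the base score that halves
// every `halfLifeDays`, so the multiplier falls from 2 toward 1 with age.
function boostByRecency(score: number, ageDays: number, halfLifeDays = 30): number {
  return score * (1 + Math.pow(0.5, ageDays / halfLifeDays));
}
```

Unlike plain TF-IDF, the score saturates as a term repeats and accounts for document length, which matters when short emails and long files share one index.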

LLM Integration

LLM configuration system
- Ollama integration (self-hosted LLM)
- LangChain/LangGraph architecture
- Support for OpenAI, Anthropic, etc.
Search result summarization
- Summarize top N results
- Extract key points
- Answer questions based on search results
Optional LLM features (gated by plan)

More Integrations

Google Drive (OAuth)
Dropbox (OAuth)
Microsoft 365 (OAuth)
At least 3 more integrations based on user demand

UI/UX Improvements

Polish all interfaces
- Professional design
- Consistent component library (Shadcn/UI)
- Mobile responsive
Dark mode
Keyboard shortcuts
Advanced search syntax
Result preview/quick view

Marketing Site

Full marketing website (separate from app)
- Feature pages
- Pricing page
- Documentation
- Blog
- Customer testimonials

V4 - Enterprise Tier (6-12 months)

Target: Large companies with custom needs, compliance requirements

Goal: Launch Enterprise tier with dedicated infrastructure, custom integrations, and compliance.

Enterprise Features

Dedicated infrastructure provisioning
- Per-customer infrastructure
- Custom resource allocation
- Dedicated database
- Isolated workers
Custom integrations
- Build integrations per customer request
- Private integrations (not available to other customers)
- Integration development as a service
SSO/SAML
- SAML 2.0 authentication
- Azure AD integration
- Okta integration
- Google Workspace SSO
Advanced admin features
- Audit logs (all actions logged)
- Data retention policies
- Export all data (GDPR compliance)
- Advanced user provisioning (SCIM)

Compliance & Security

SOC2 Type II certification
- Security audit preparation
- Compliance documentation
- Annual audits
GDPR compliance enhancements
- Right to be forgotten
- Data portability
- Consent management
HIPAA compliance (if needed)
- BAA agreements
- Encryption at rest and in transit
- Access controls
Security hardening
- Penetration testing
- Vulnerability scanning
- Incident response plan
- Security training

Support & Success

Dedicated support
- Slack channel per customer
- Response time SLAs
- Priority bug fixes
Customer success manager
- Regular check-ins
- Feature adoption tracking
- Custom training sessions
Professional services
- Integration development
- Custom feature development
- Migration assistance

Customer Launch

1 pilot Enterprise customer
- Small team (5-10 people)
- Gather feedback
- Refine Enterprise offering
Case study and testimonial
Enterprise sales process documentation

Optimizations

Database Query Performance Monitoring

Context: TypeORM query logging is disabled (logging: false in app.module.ts) because it's noisy and only captures queries from one app instance. Instead, we use PostgreSQL's built-in performance monitoring tools.

pg_stat_statements Extension (Recommended)

The industry standard for query performance analysis. Tracks execution statistics for all SQL statements across all connections.

Setup:

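-- Prerequisite: the module must be loaded at server start. Add
-- pg_stat_statements to shared_preload_libraries in postgresql.conf and restart.

-- Enable the extension (once per database)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- List the 20 slowest statements by mean execution time (times in ms).
-- Column names are for PostgreSQL 13+; older versions use total_time, mean_time, max_time.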
SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

Benefits:

Tracks all queries across all connections and app instances
Shows execution count, total time, mean time, max time
Persists across sessions
Low performance overhead
Essential for production query optimization

log_min_duration_statement

Automatically log queries that exceed a duration threshold to PostgreSQL logs.

Setup:

-- In postgresql.conf or via ALTER SYSTEM
ALTER SYSTEM SET log_min_duration_statement = 1000;  -- Log queries > 1 second
SELECT pg_reload_conf();

Benefits:

Simple to set up
Captures slow queries with execution time
Useful for catching extreme outliers
Logs include query parameters

auto_explain Module

Automatically logs execution plans for slow queries. Useful for understanding why specific queries are slow.

Setup:

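# In postgresql.conf (changing shared_preload_libraries requires a restart)
# List pg_stat_statements here as well so that loading auto_explain does not drop it
shared_preload_libraries = 'pg_stat_statements,auto_explain'
auto_explain.log_min_duration = 1000  # Explain queries > 1s
auto_explain.log_analyze = true       # Adds per-node timing overhead to every statement, logged or not
auto_explain.log_timing = true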

Benefits:

Shows EXPLAIN ANALYZE output for slow queries
Helps identify missing indexes or inefficient query plans
No code changes required

Recommended Configuration

V1 (Development/Self-Hosted):

Enable pg_stat_statements for query analysis
Set log_min_duration_statement = 2000 to catch very slow queries

V2-V3 (Multi-tenant/SaaS):

Enable pg_stat_statements (required)
Set log_min_duration_statement = 1000
Consider auto_explain for production debugging

V4 (Enterprise):

All of the above
Query performance monitoring dashboards
Automated slow query alerts
Per-customer query performance analysis

Future Optimizations

Add query performance metrics to observability stack
Create Grafana dashboard for pg_stat_statements data
Automated query optimization suggestions
Index recommendation system based on slow query patterns
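
As a starting point for automated slow query alerts, rows returned by the pg_stat_statements query shown earlier could be filtered in application code. The sketch below assumes the rows have already been fetched and mapped into a hypothetical `StatementStat` shape; the threshold and minimum call count are illustrative values, not recommendations.

```typescript
// Hypothetical row shape, mapped from pg_stat_statements columns
// (query, calls, mean_exec_time in milliseconds).
interface StatementStat {
  query: string;
  calls: number;
  meanExecTimeMs: number;
}

// Pick statements worth alerting on: slow on average and called often
// enough that a single outlier does not trigger an alert.
function slowStatements(
  stats: StatementStat[],
  thresholdMs: number,
  minCalls = 5,
): StatementStat[] {
  return stats
    .filter((s) => s.calls >= minCalls && s.meanExecTimeMs > thresholdMs)
    .sort((a, b) => b.meanExecTimeMs - a.meanExecTimeMs); // slowest first
}
```

The same thresholds as log_min_duration_statement (2000 ms for V1, 1000 ms for V2-V3) would be a natural first choice, so both signals flag the same queries.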

Test Optimizations

Currently the e2e tests stay isolated from one another by clearing the database before each test. Even for small datasets this takes time.

Although the clearDatabase function clears all tables in parallel, there will probably come a point when we can no longer afford to clear the database before every test.

Additionally, the current testing strategy does not allow multiple tests in the same suite to run in parallel since they read and write data to the same table.

Determining how to run tests in parallel while allowing reads and writes to a real database will be required at some point.
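
One option for the parallelism problem above, offered as a possibility rather than a decision, is to give each test worker its own PostgreSQL schema so that suites never touch the same tables. The sketch below only builds the schema name and the setup statements. Both helpers are hypothetical, and it assumes the test runner exposes a worker id (Jest, for example, sets JEST_WORKER_ID) and that migrations are then run inside the new schema.

```typescript
// Hypothetical helper: derive an isolated schema name from a test worker id.
// Anything other than letters, digits, and underscores is replaced so the
// result is always a safe identifier.
function schemaForWorker(workerId: string | number): string {
  const safe = String(workerId).replace(/[^a-zA-Z0-9_]/g, "_");
  return `test_worker_${safe}`.toLowerCase();
}

// SQL to run once per worker before its tests start.
function setupStatements(schema: string): string[] {
  return [
    `DROP SCHEMA IF EXISTS "${schema}" CASCADE`,
    `CREATE SCHEMA "${schema}"`,
    `SET search_path TO "${schema}"`,
  ];
}
```

This removes cross-suite interference but not interference between tests inside one suite. Those would still need clearDatabase or per-test transactions that roll back.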

Feature Differentiation

Self-Hosted (Always Free)

Core search functionality
Gmail integration
1-2 additional basic integrations (FTP, Slack)
Single user or family use (< 10 users)
PostgreSQL full-text search
Community support only
Docker deployment

Business Tier (SaaS - Paid)

Hosted infrastructure (no self-hosting)
Unlimited users per organization
Teams and permissions
All integrations
Elasticsearch search
LLM features (summarization)
Search collections
SSO (Google, Microsoft)
Email support
99.9% uptime SLA
Usage analytics
Admin dashboard

Enterprise Tier (High-Touch - Custom Pricing)

Everything in Business tier
Dedicated infrastructure
Custom integrations
SAML/SSO (any provider)
SOC2/HIPAA compliance
Dedicated support (Slack channel)
Customer success manager
Professional services
Custom SLAs
Data residency options
On-premise deployment option

Long-Term Vision

Business Model: Fair-code / Open-core

Self-hosted version remains genuinely useful forever
Core functionality always free
Advanced features for companies (not individuals)
Commercial license required for business use

Deployment Options:

Local/Homelab - Free for personal use, community supported
Business SaaS - Hosted multi-tenant, standard pricing
Enterprise - Dedicated infrastructure, custom pricing

Target Markets:

Phase 1 (V1-V2): Homelab enthusiasts, power users, families
Phase 2 (V3): Small-medium businesses (10-100 employees)
Phase 3 (V4): Large enterprises (100+ employees)

Success Metrics:

V1: 100 active self-hosted deployments
V2: 1,000 active self-hosted deployments
V3: 50 paying Business customers
V4: 5 Enterprise customers

Notes

All versions maintain backward compatibility with self-hosted deployments
Breaking changes communicated 90 days in advance
Community input welcome on feature prioritization
Roadmap updated quarterly based on feedback
