Roadmap

Yew Search will evolve from a self-hosted homelab solution to an enterprise-grade SaaS platform, following the proven business model of GitLab, Mattermost, and n8n.

Strategy Overview

  • V1: Self-hosted homelab MVP - prove core value proposition
  • V2: Multi-tenancy foundation - architecture for scale
  • V3: Business SaaS launch - hosted offering for companies
  • V4: Enterprise tier - dedicated infrastructure and compliance


V1 - Self-Hosted MVP (2 weeks)

Target: Individual users, homelab enthusiasts, proof of concept

Goal: Ship a working self-hosted personal search engine that users can deploy on a Raspberry Pi or any Linux system with Docker.

Backend

  • User authentication system (cookie-based sessions, never JWT)
    • Users are created via CLI command
    • User login endpoints
    • Session management (create, validate, delete)
    • Argon2 password hashing
    • Redis session store for auth lookups
    • PostgreSQL user_session table for session management UI
  • User module
    • User CRUD operations
    • User profile endpoints
    • Active sessions list (view/terminate sessions)
  • Integration architecture
    • Base integration classes and interfaces
    • Integration loader (dynamic plugin loading)
    • OAuth endpoint (unified for all integrations)
    • Task-based execution system
  • Polling system
    • Bull task queue (Redis-backed)
    • Priority-based task scheduling
    • Idempotency checking (contentExists callback)
    • Background worker for executing integration tasks
  • Gmail integration (OAuth)
    • OAuth flow implementation
    • Task types: start, getEmailList, downloadEmail
    • Pagination support
    • Early stop optimization
    • Store email content in user_integration_content table
  • Search system
    • PostgreSQL full-text search on JSONB content
    • Basic query parsing and sanitization
    • Search endpoints (scoped to authenticated user)
    • Result ranking (simple relevance)
  • Database schema
    • user table
    • user_session table
    • user_integration table (with encrypted credentials)
    • user_integration_content table (JSONB content storage)
  • Observability
    • Structured JSON logging
    • Request context (requestId, userId, traceId)
    • Basic error handling and logging
  • Docker setup
    • Backend Dockerfile
    • docker-compose.yml (backend, PostgreSQL, Redis)
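The polling items above (pagination, the contentExists idempotency check, and early stop) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: every name here (RemotePage, planSync, maxPages) is hypothetical, and the callbacks are synchronous for brevity, whereas a real contentExists would most likely be an async database lookup.

```typescript
// Hypothetical shapes: none of these names come from the Yew Search codebase.
interface RemoteItem {
  id: string; // provider-side identifier, e.g. a Gmail message id
}

interface RemotePage {
  items: RemoteItem[];
  nextPageToken?: string;
}

type FetchPage = (pageToken?: string) => RemotePage;
type ContentExists = (id: string) => boolean;

interface SyncPlan {
  toDownload: string[]; // ids that would become downloadEmail tasks
  pagesFetched: number;
  stoppedEarly: boolean;
}

// Walk pages newest-first. Skip items that are already stored (idempotency)
// and stop paginating once a whole page is already known (early stop).
function planSync(
  fetchPage: FetchPage,
  contentExists: ContentExists,
  maxPages = 50,
): SyncPlan {
  const toDownload: string[] = [];
  let pageToken: string | undefined = undefined;
  let pagesFetched = 0;

  while (pagesFetched < maxPages) {
    const page: RemotePage = fetchPage(pageToken);
    pagesFetched += 1;

    const unseen = page.items.filter((item) => !contentExists(item.id));
    for (const item of unseen) toDownload.push(item.id);

    // Early stop: a non-empty page with nothing new means we have caught up.
    if (page.items.length > 0 && unseen.length === 0) {
      return { toDownload, pagesFetched, stoppedEarly: true };
    }
    if (!page.nextPageToken) break;
    pageToken = page.nextPageToken;
  }
  return { toDownload, pagesFetched, stoppedEarly: false };
}
```

The early stop only makes sense if the provider returns items newest-first, which is an assumption of this sketch. The maxPages cap is a safety bound so a misbehaving API cannot keep a worker busy indefinitely.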

Frontend

  • Svelte app setup
    • App router structure
    • Basic layout and navigation
  • Authentication pages
    • Login page
    • Cookie-based auth
  • Search interface
    • Search input and results display
    • Basic result rendering (title, snippet, source)
    • Loading and error states
  • Integration management
    • "Connect Gmail" button
    • OAuth authorization flow
    • Integration status display
  • Settings page
    • User profile
    • Active sessions (list and terminate)
  • Frontend Dockerfile

Documentation

  • Getting Started guide (README update)
  • Architecture overview
  • Backend standards (service, controller, DTO, entity)
  • Integration development guide
  • OAuth integration guide
  • Authorization/session management guide
  • Coding style guide
  • Deployment instructions (Docker)

Website

  • Single-page landing site
    • Project description
    • Key features
    • Link to docs
    • GitHub link

V2 - Multi-Tenancy Foundation (4-6 weeks)

Target: Teams, small companies (still self-hosted), prepare for SaaS

Goal: Add organizational primitives and scale improvements. The architecture will support multi-tenant SaaS, but the product is still deployed as self-hosted.

Core Features

  • Database migration system
    • TypeORM migrations
    • Migration CLI commands
    • Rollback support
  • Organizations/companies
    • Organization entity and CRUD
    • User-to-organization relationships
    • Organization settings
  • Teams/groups within organizations
    • Team entity and CRUD
    • Team membership
    • Team-level permissions
  • User permissions system
    • Role-based access control (owner, admin, member, viewer)
    • Permission checks in services
    • Sharing integrations between users
    • Data source access control
  • User invitations
    • Email-based invitations
    • Invitation acceptance flow
    • Pending invitations management

Search Improvements

  • Elasticsearch integration
    • Elasticsearch Docker container
    • Sync PostgreSQL content to Elasticsearch
    • Background sync worker
    • Full-text search using Elasticsearch
    • Improved relevance scoring
  • Search filters
    • Filter by integration (Gmail, FTP, etc.)
    • Filter by date range
    • Filter by sender/source

Additional Integrations

  • FTP/SFTP integration
    • Credentials-based auth (not OAuth)
    • Directory traversal
    • File metadata indexing
    • Recursive directory support
  • Slack integration (OAuth)
    • OAuth flow
    • Channel message syncing
    • Direct message syncing

Infrastructure

  • Raspberry Pi optimization
    • Memory-optimized PostgreSQL config
    • CPU throttling for background tasks
    • Minimal Docker image sizes
    • Performance testing on Pi 4
  • Environment configuration
    • .env.example with all required vars
    • Configuration validation on startup
    • Better error messages for missing config

UI Improvements

  • Organization switcher
  • Team management UI
  • Permission management UI
  • Integration settings per user/team
  • Better search result display

V3 - Business SaaS Launch (3-6 months)

Target: Companies that want a hosted solution without the self-hosting burden

Goal: Launch Yew Search as a hosted SaaS product. The self-hosted version remains available with core features.

SaaS Infrastructure

  • Cloud deployment
    • Production-grade docker-compose or Kubernetes
    • Load balancer setup
    • Database connection pooling
    • Redis clustering
  • Multi-tenant architecture
    • Data isolation per organization
    • Tenant-aware queries (all services check organization)
    • Database per tenant vs shared database decision
  • Billing and subscriptions
    • Stripe integration
    • Subscription plans (Business, Enterprise)
    • Usage tracking (searches, storage, integrations)
    • Resource quotas per plan
    • Billing portal (Stripe Customer Portal)
  • Admin dashboard
    • Organization list and search
    • User activity monitoring
    • System health metrics
    • Feature flag management per customer
  • Onboarding flow
    • Signup for Business tier
    • Organization creation
    • Team setup wizard
    • Integration walkthrough
  • Search collections/groups
    • Group multiple integrations into collections
    • Search within specific collections
    • Share collections with team members
  • Improved search algorithm
    • BM25 or similar (better than TF-IDF)
    • Boosting by recency, source, etc.
    • Query expansion and synonyms
  • Saved searches
    • Save frequently-used searches
    • Search history per user

LLM Integration

  • LLM configuration system
    • Ollama integration (self-hosted LLM)
    • LangChain/LangGraph architecture
    • Support for OpenAI, Anthropic, etc.
  • Search result summarization
    • Summarize top N results
    • Extract key points
    • Answer questions based on search results
  • Optional LLM features (gated by plan)

More Integrations

  • Google Drive (OAuth)
  • Dropbox (OAuth)
  • Microsoft 365 (OAuth)
  • At least 3 more integrations based on user demand

UI/UX Improvements

  • Polish all interfaces
    • Professional design
    • Consistent component library (Shadcn/UI)
    • Mobile responsive
  • Dark mode
  • Keyboard shortcuts
  • Advanced search syntax
  • Result preview/quick view

Marketing Site

  • Full marketing website (separate from app)
    • Feature pages
    • Pricing page
    • Documentation
    • Blog
    • Customer testimonials

V4 - Enterprise Tier (6-12 months)

Target: Large companies with custom needs, compliance requirements

Goal: Launch Enterprise tier with dedicated infrastructure, custom integrations, and compliance.

Enterprise Features

  • Dedicated infrastructure provisioning
    • Per-customer infrastructure
    • Custom resource allocation
    • Dedicated database
    • Isolated workers
  • Custom integrations
    • Build integrations per customer request
    • Private integrations (not available to other customers)
    • Integration development as a service
  • SSO/SAML
    • SAML 2.0 authentication
    • Azure AD integration
    • Okta integration
    • Google Workspace SSO
  • Advanced admin features
    • Audit logs (all actions logged)
    • Data retention policies
    • Export all data (GDPR compliance)
    • Advanced user provisioning (SCIM)

Compliance & Security

  • SOC2 Type II certification
    • Security audit preparation
    • Compliance documentation
    • Annual audits
  • GDPR compliance enhancements
    • Right to be forgotten
    • Data portability
    • Consent management
  • HIPAA compliance (if needed)
    • Business Associate Agreements (BAAs)
    • Encryption at rest and in transit
    • Access controls
  • Security hardening
    • Penetration testing
    • Vulnerability scanning
    • Incident response plan
    • Security training

Support & Success

  • Dedicated support
    • Slack channel per customer
    • Response time SLAs
    • Priority bug fixes
  • Customer success manager
    • Regular check-ins
    • Feature adoption tracking
    • Custom training sessions
  • Professional services
    • Integration development
    • Custom feature development
    • Migration assistance

Customer Launch

  • 1 pilot Enterprise customer
    • Small team (5-10 people)
    • Gather feedback
    • Refine Enterprise offering
  • Case study and testimonial
  • Enterprise sales process documentation

Optimizations

Database Query Performance Monitoring

Context: TypeORM query logging is disabled (logging: false in app.module.ts) because it's noisy and only captures queries from one app instance. Instead, we use PostgreSQL's built-in performance monitoring tools.

pg_stat_statements

The industry-standard PostgreSQL extension for query performance analysis. It tracks execution statistics for all SQL statements across all connections.

Setup:

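-- Prerequisite: pg_stat_statements must be listed in shared_preload_libraries
-- in postgresql.conf (changing that setting requires a server restart)

-- Enable the extension (one time per database)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- List the slowest queries by mean execution time
-- (these column names apply to PostgreSQL 13 and later)
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;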

Benefits:

  • Tracks all queries across all connections and app instances
  • Shows execution count, total time, mean time, max time
  • Persists across sessions
  • Low performance overhead
  • Essential for production query optimization

log_min_duration_statement

Automatically log queries that exceed a duration threshold to PostgreSQL logs.

Setup:

-- In postgresql.conf or via ALTER SYSTEM
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log queries > 1 second
SELECT pg_reload_conf();

Benefits:

  • Simple to set up
  • Captures slow queries with execution time
  • Useful for catching extreme outliers
  • Logs include query parameters

auto_explain Module

Automatically logs execution plans for slow queries. Useful for understanding why specific queries are slow.

Setup:

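# In postgresql.conf (changing shared_preload_libraries requires a restart)
# If pg_stat_statements is also preloaded, list both: 'pg_stat_statements,auto_explain'
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 1000  # Explain queries slower than 1s (value in ms)
auto_explain.log_analyze = true       # Collects timing for every statement, which adds overhead
auto_explain.log_timing = true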

Benefits:

  • Shows EXPLAIN ANALYZE output for slow queries
  • Helps identify missing indexes or inefficient query plans
  • No code changes required

V1 (Development/Self-Hosted):

  • Enable pg_stat_statements for query analysis
  • Set log_min_duration_statement = 2000 to catch very slow queries

V2-V3 (Multi-tenant/SaaS):

  • Enable pg_stat_statements (required)
  • Set log_min_duration_statement = 1000
  • Consider auto_explain for production debugging

V4 (Enterprise):

  • All of the above
  • Query performance monitoring dashboards
  • Automated slow query alerts
  • Per-customer query performance analysis

Future Optimizations

  • Add query performance metrics to observability stack
  • Create Grafana dashboard for pg_stat_statements data
  • Automated query optimization suggestions
  • Index recommendation system based on slow query patterns
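As a starting point for the automated slow query alerts mentioned above, a periodic job could read pg_stat_statements and flag rows that cross a threshold. The sketch below covers only the filtering step and assumes the rows have already been fetched and converted to numbers. The threshold values and the Thresholds type are illustrative, not agreed settings.

```typescript
// Subset of pg_stat_statements columns; times are in milliseconds.
interface StatementStat {
  query: string;
  calls: number;
  mean_exec_time: number;
  max_exec_time: number;
}

interface Thresholds {
  meanMs: number; // alert when the average is above this
  maxMs: number; // alert when the worst case is above this
  minCalls: number; // ignore statements executed fewer times than this
}

// Hypothetical defaults, chosen only for the example.
const DEFAULT_THRESHOLDS: Thresholds = { meanMs: 1000, maxMs: 5000, minCalls: 5 };

// Returns one human-readable alert line per offending statement.
function findSlowQueries(
  rows: StatementStat[],
  t: Thresholds = DEFAULT_THRESHOLDS,
): string[] {
  const alerts: string[] = [];
  for (const row of rows) {
    if (row.calls < t.minCalls) continue; // skip one-off queries

    if (row.mean_exec_time > t.meanMs) {
      alerts.push(
        `mean ${row.mean_exec_time.toFixed(0)}ms over ${row.calls} calls: ${row.query}`,
      );
    } else if (row.max_exec_time > t.maxMs) {
      alerts.push(`max ${row.max_exec_time.toFixed(0)}ms: ${row.query}`);
    }
  }
  return alerts;
}
```

The minCalls cutoff keeps migrations and ad hoc admin queries from triggering alerts, since those are often slow but run only once.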

Test Optimizations

Currently, the e2e tests keep each test isolated by clearing the database before every test. Even for small datasets, this takes time.

Although the clearDatabase function clears all tables in parallel, at some point we will probably need to stop clearing the database for every test.

Additionally, the current testing strategy does not allow multiple tests in the same suite to run in parallel, since they read and write data to the same tables.

At some point, we will need to determine how to run tests in parallel while still reading from and writing to a real database.
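One possible direction, shown here only as a sketch, is to give each test worker its own PostgreSQL schema so that workers never touch the same tables. This assumes Jest is the test runner (Jest sets the JEST_WORKER_ID environment variable to "1", "2", and so on in each worker process). The helper names and the e2e_worker_ prefix are hypothetical.

```typescript
// Derives a schema name from the worker id. The id is validated because the
// result is interpolated into SQL identifiers below.
function schemaForWorker(workerId: string | undefined): string {
  const id = workerId === undefined ? "1" : workerId;
  if (!/^\d+$/.test(id)) {
    throw new Error(`Unexpected worker id: ${id}`);
  }
  return `e2e_worker_${id}`;
}

// Statements to run on a connection before the worker's tests start.
// SET search_path applies to the current session only, so every pooled
// connection would need it (or the ORM's schema option could be used instead).
function schemaSetupSql(schema: string): string[] {
  return [
    `CREATE SCHEMA IF NOT EXISTS "${schema}"`,
    `SET search_path TO "${schema}"`,
  ];
}
```

In a test setup file this would be called as schemaForWorker(process.env.JEST_WORKER_ID). Note the limits of the approach: it isolates workers from each other, but tests inside a single file still share one schema, so they would need either the existing clearDatabase step or data that is namespaced per test (for example, a fresh user per test).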


Feature Differentiation

Self-Hosted (Always Free)

  • Core search functionality
  • Gmail integration
  • 1-2 additional basic integrations (FTP, Slack)
  • Single user or family use (< 10 users)
  • PostgreSQL full-text search
  • Community support only
  • Docker deployment

Business Tier (SaaS - Paid)

  • Hosted infrastructure (no self-hosting)
  • Unlimited users per organization
  • Teams and permissions
  • All integrations
  • Elasticsearch search
  • LLM features (summarization)
  • Search collections
  • SSO (Google, Microsoft)
  • Email support
  • 99.9% uptime SLA
  • Usage analytics
  • Admin dashboard

Enterprise Tier (High-Touch - Custom Pricing)

  • Everything in Business tier
  • Dedicated infrastructure
  • Custom integrations
  • SAML/SSO (any provider)
  • SOC2/HIPAA compliance
  • Dedicated support (Slack channel)
  • Customer success manager
  • Professional services
  • Custom SLAs
  • Data residency options
  • On-premise deployment option

Long-Term Vision

Business Model: Fair-code / Open-core

  • Self-hosted version remains genuinely useful forever
  • Core functionality always free
  • Advanced features for companies (not individuals)
  • Commercial license required for business use

Deployment Options:

  1. Local/Homelab - Free for personal use, community supported
  2. Business SaaS - Hosted multi-tenant, standard pricing
  3. Enterprise - Dedicated infrastructure, custom pricing

Target Markets:

  • Phase 1 (V1-V2): Homelab enthusiasts, power users, families
  • Phase 2 (V3): Small-medium businesses (10-100 employees)
  • Phase 3 (V4): Large enterprises (100+ employees)

Success Metrics:

  • V1: 100 active self-hosted deployments
  • V2: 1,000 active self-hosted deployments
  • V3: 50 paying Business customers
  • V4: 5 Enterprise customers

Notes

  • All versions maintain backward compatibility with self-hosted deployments
  • Breaking changes communicated 90 days in advance
  • Community input welcome on feature prioritization
  • Roadmap updated quarterly based on feedback