Roadmap
Yew Search will evolve from a self-hosted homelab solution into an enterprise-grade SaaS platform, following the business model used by GitLab, Mattermost, and n8n.
Strategy Overview
- V1: Self-hosted homelab MVP - prove core value proposition
- V2: Multi-tenancy foundation - architecture for scale
- V3: Business SaaS launch - hosted offering for companies
- V4: Enterprise tier - dedicated infrastructure and compliance
V1 - Self-Hosted MVP (2 weeks)
Target: Individual users, homelab enthusiasts, proof of concept
Goal: Ship a working self-hosted personal search engine that users can deploy on Raspberry Pi or any Linux system with Docker.
Backend
- User authentication system (cookie-based sessions, never JWT)
- Users are created via CLI command
- User login endpoints
- Session management (create, validate, delete)
- Argon2 password hashing
- Redis session store for auth lookups
- PostgreSQL user_session table for session management UI
- User module
- User CRUD operations
- User profile endpoints
- Active sessions list (view/terminate sessions)
- Integration architecture
- Base integration classes and interfaces
- Integration loader (dynamic plugin loading)
- OAuth endpoint (unified for all integrations)
- Task-based execution system
- Polling system
- Bull task queue (Redis-backed)
- Priority-based task scheduling
- Idempotency checking (contentExists callback)
- Background worker for executing integration tasks
- Gmail integration (OAuth)
- OAuth flow implementation
- Task types: start, getEmailList, downloadEmail
- Pagination support
- Early stop optimization
- Store email content in user_integration_content table
- Search system
- PostgreSQL full-text search on JSONB content
- Basic query parsing and sanitization
- Search endpoints (scoped to authenticated user)
- Result ranking (simple relevance)
- Database schema
- user table
- user_session table
- user_integration table (with encrypted credentials)
- user_integration_content table (JSONB content storage)
- Observability
- Structured JSON logging
- Request context (requestId, userId, traceId)
- Basic error handling and logging
- Docker setup
- Backend Dockerfile
- docker-compose.yml (backend, PostgreSQL, Redis)
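The Gmail items in the backend list (pagination, early stop, and the contentExists idempotency check) could fit together roughly as in the sketch below. Every name here (`Page`, `FetchPage`, `collectNewIds`) is hypothetical, and the early-stop rule is an assumption: the roadmap does not define it. The sketch is synchronous to stay short; a real integration would await network calls.

```typescript
// Hypothetical shapes: none of these names come from the Yew Search codebase.
interface Page {
  ids: string[];
  nextPageToken?: string;
}

type FetchPage = (pageToken?: string) => Page;
type ContentExists = (externalId: string) => boolean;

interface SyncResult {
  toDownload: string[];
  pagesFetched: number;
  stoppedEarly: boolean;
}

// Walks pages newest-first and collects IDs that are not stored yet.
// Assumption: a page made up entirely of known IDs means everything older
// is already indexed, so the walk can stop there.
function collectNewIds(fetchPage: FetchPage, contentExists: ContentExists): SyncResult {
  const toDownload: string[] = [];
  let pagesFetched = 0;
  let pageToken: string | undefined = undefined;

  do {
    const page: Page = fetchPage(pageToken);
    pagesFetched += 1;

    // Idempotency: skip anything the content table already holds.
    const fresh = page.ids.filter((id) => !contentExists(id));
    for (const id of fresh) toDownload.push(id);

    // Early stop: nothing new on this page.
    if (page.ids.length > 0 && fresh.length === 0) {
      return { toDownload, pagesFetched, stoppedEarly: true };
    }
    pageToken = page.nextPageToken;
  } while (pageToken !== undefined);

  return { toDownload, pagesFetched, stoppedEarly: false };
}
```

Each ID in `toDownload` would then become its own downloadEmail task, so a re-run of the same sync queues no duplicate work.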
Frontend
- Svelte app setup
- App router structure
- Basic layout and navigation
- Authentication pages
- Login page
- Cookie-based auth
- Search interface
- Search input and results display
- Basic result rendering (title, snippet, source)
- Loading and error states
- Integration management
- "Connect Gmail" button
- OAuth authorization flow
- Integration status display
- Settings page
- User profile
- Active sessions (list and terminate)
- Frontend Dockerfile
Documentation
- Getting Started guide (README update)
- Architecture overview
- Backend standards (service, controller, DTO, entity)
- Integration development guide
- OAuth integration guide
- Authorization/session management guide
- Coding style guide
- Deployment instructions (Docker)
Website
- Single-page landing site
- Project description
- Key features
- Link to docs
- GitHub link
V2 - Multi-Tenancy Foundation (4-6 weeks)
Target: Teams, small companies (still self-hosted), prepare for SaaS
Goal: Add organizational primitives and scale improvements. After V2 the architecture supports multi-tenant SaaS, but the product is still deployed self-hosted.
Core Features
- Database migration system
- TypeORM migrations
- Migration CLI commands
- Rollback support
- Organizations/companies
- Organization entity and CRUD
- User-to-organization relationships
- Organization settings
- Teams/groups within organizations
- Team entity and CRUD
- Team membership
- Team-level permissions
- User permissions system
- Role-based access control (owner, admin, member, viewer)
- Permission checks in services
- Sharing integrations between users
- Data source access control
- User invitations
- Email-based invitations
- Invitation acceptance flow
- Pending invitations management
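As a rough illustration of the role-based permission checks listed above, the sketch below ranks the four roles and denies anything not explicitly listed. The ordering of roles and the action names are assumptions made for the example; the roadmap only names the roles.

```typescript
// Roles from the roadmap, ordered from least to most privileged.
// The ordering itself is an assumption; the roadmap only names the roles.
const ROLE_ORDER = ["viewer", "member", "admin", "owner"] as const;
type Role = (typeof ROLE_ORDER)[number];

// Hypothetical action-to-minimum-role table, for illustration only.
const MINIMUM_ROLE: Record<string, Role> = {
  "search:read": "viewer",
  "integration:connect": "member",
  "team:manage": "admin",
  "organization:delete": "owner",
};

function roleRank(role: Role): number {
  return ROLE_ORDER.indexOf(role);
}

// Returns true when the role meets the minimum required for the action.
// Unknown actions are denied so that a missing table entry fails closed.
function can(role: Role, action: string): boolean {
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  if (!Object.prototype.hasOwnProperty.call(MINIMUM_ROLE, action)) return false;
  return roleRank(role) >= roleRank(MINIMUM_ROLE[action]);
}
```

A service method would call something like `can(membership.role, "team:manage")` before touching data, which keeps the check in the service layer as the list above suggests.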
Search Improvements
- Elasticsearch integration
- Elasticsearch Docker container
- Sync PostgreSQL content to Elasticsearch
- Background sync worker
- Full-text search using Elasticsearch
- Improved relevance scoring
- Search filters
- Filter by integration (Gmail, FTP, etc.)
- Filter by date range
- Filter by sender/source
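The filters above map naturally onto an Elasticsearch `bool` query, with the free-text part in `must` and the exact-match filters in `filter`. The sketch below only builds the request body as a plain object; it does not call any client library. The field names (`content`, `integration`, `sender`, `receivedAt`, `userId`) are assumptions, since the roadmap does not define an index mapping.

```typescript
// Hypothetical filter shape and index field names; adjust to the real mapping.
interface SearchFilters {
  integration?: string;
  sender?: string;
  from?: string; // ISO 8601 date, inclusive
  to?: string; // ISO 8601 date, inclusive
}

// Builds a query body that always scopes results to the authenticated user.
function buildSearchBody(userId: string, text: string, filters: SearchFilters) {
  const filter: object[] = [{ term: { userId } }];

  if (filters.integration) filter.push({ term: { integration: filters.integration } });
  if (filters.sender) filter.push({ term: { sender: filters.sender } });

  if (filters.from || filters.to) {
    const range: { gte?: string; lte?: string } = {};
    if (filters.from) range.gte = filters.from;
    if (filters.to) range.lte = filters.to;
    filter.push({ range: { receivedAt: range } });
  }

  return {
    query: {
      bool: {
        must: [{ match: { content: text } }],
        filter,
      },
    },
  };
}
```

Clauses under `filter` do not contribute to the relevance score, so the filters narrow the result set without changing how the text match is ranked.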
Additional Integrations
- FTP/SFTP integration
- Credentials-based auth (not OAuth)
- Directory traversal
- File metadata indexing
- Recursive directory support
- Slack integration (OAuth)
- OAuth flow
- Channel message syncing
- Direct message syncing
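For the FTP/SFTP items above, recursive directory support and file metadata indexing could be sketched as a depth-first walk over a listing callback. `RemoteEntry`, `ListDirectory`, and `walk` are hypothetical names; a real FTP or SFTP client library would supply its own listing type and would be asynchronous.

```typescript
// Hypothetical listing entry; not taken from any specific client library.
interface RemoteEntry {
  name: string;
  isDirectory: boolean;
  size: number;
}

type ListDirectory = (path: string) => RemoteEntry[];

interface FileMetadata {
  path: string;
  size: number;
}

// Depth-first walk that returns metadata for every file below `root`.
// `maxDepth` guards against runaway recursion (for example symlink loops).
function walk(listDirectory: ListDirectory, root: string, maxDepth = 16): FileMetadata[] {
  const files: FileMetadata[] = [];

  const visit = (dir: string, depth: number): void => {
    if (depth > maxDepth) return;
    for (const entry of listDirectory(dir)) {
      const path = dir === "/" ? `/${entry.name}` : `${dir}/${entry.name}`;
      if (entry.isDirectory) visit(path, depth + 1);
      else files.push({ path, size: entry.size });
    }
  };

  visit(root, 0);
  return files;
}
```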
Infrastructure
- Raspberry Pi optimization
- Memory-optimized PostgreSQL config
- CPU throttling for background tasks
- Minimal Docker image sizes
- Performance testing on Pi 4
- Environment configuration
- .env.example with all required vars
- Configuration validation on startup
- Better error messages for missing config
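Configuration validation on startup can be as small as the sketch below, which reports every missing variable at once rather than failing on the first. The variable names in `REQUIRED_VARS` are placeholders; the roadmap does not list the required variables.

```typescript
// Hypothetical variable names; replace with the real list from .env.example.
const REQUIRED_VARS = ["DATABASE_URL", "REDIS_URL", "SESSION_SECRET"];

type Env = Record<string, string | undefined>;

// Returns one human-readable problem per missing or empty variable.
function findConfigProblems(env: Env, required: string[] = REQUIRED_VARS): string[] {
  const problems: string[] = [];
  for (const name of required) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      problems.push(`${name} is not set. Copy .env.example to .env and fill it in.`);
    }
  }
  return problems;
}

// Throws a single error listing every problem, so startup fails once with
// the full picture instead of one variable at a time.
function assertValidConfig(env: Env): void {
  const problems = findConfigProblems(env);
  if (problems.length > 0) {
    throw new Error(`Invalid configuration:\n- ${problems.join("\n- ")}`);
  }
}
```

In a Node process this would typically be called with `process.env` before the application module is created.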
UI Improvements
- Organization switcher
- Team management UI
- Permission management UI
- Integration settings per user/team
- Better search result display
V3 - Business SaaS Launch (3-6 months)
Target: Companies that want a hosted solution without the self-hosting burden
Goal: Launch Yew Search as a hosted SaaS product. Self-hosted version remains available with core features.
SaaS Infrastructure
- Cloud deployment
- Production-grade docker-compose or Kubernetes
- Load balancer setup
- Database connection pooling
- Redis clustering
- Multi-tenant architecture
- Data isolation per organization
- Tenant-aware queries (all services check organization)
- Database per tenant vs shared database decision
- Billing and subscriptions
- Stripe integration
- Subscription plans (Business, Enterprise)
- Usage tracking (searches, storage, integrations)
- Resource quotas per plan
- Billing portal (Stripe Customer Portal)
- Admin dashboard
- Organization list and search
- User activity monitoring
- System health metrics
- Feature flag management per customer
- Onboarding flow
- Signup for Business tier
- Organization creation
- Team setup wizard
- Integration walkthrough
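One way to make "all services check organization" hard to forget is to route every query filter through a small helper, sketched below. `TenantContext`, `Where`, and `scopeToTenant` are hypothetical names, and the sketch assumes a shared database with an `organizationId` column, which the list above leaves as an open decision.

```typescript
// Hypothetical request context carrying the caller's organization.
interface TenantContext {
  organizationId: string;
}

type Where = Record<string, string | number | boolean>;

// Adds the caller's organization to a filter. If the filter already names a
// different organization, the call is rejected instead of silently overridden.
function scopeToTenant(ctx: TenantContext, where: Where = {}): Where {
  if (!ctx.organizationId) {
    throw new Error("Missing organizationId in request context");
  }
  const requested = where["organizationId"];
  if (requested !== undefined && requested !== ctx.organizationId) {
    throw new Error("Cross-tenant access denied");
  }
  return { ...where, organizationId: ctx.organizationId };
}
```

The returned object can be passed as the filter of a repository lookup, so a query without tenant scope has to be written deliberately rather than by omission.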
Advanced Search
- Search collections/groups
- Group multiple integrations into collections
- Search within specific collections
- Share collections with team members
- Improved search algorithm
- BM25 or similar (better than TF-IDF)
- Boosting by recency, source, etc.
- Query expansion and synonyms
- Saved searches
- Save frequently-used searches
- Search history per user
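To make the BM25 item above concrete, here is a minimal scoring sketch over pre-tokenized documents. It uses the common formulation with `k1 = 1.2` and `b = 0.75` and the `ln(1 + (N - n + 0.5) / (n + 0.5))` form of IDF. It is an illustration of the formula, not a proposal to replace a search engine's built-in scoring.

```typescript
// Counts how often `term` occurs in `doc`.
function termFrequency(term: string, doc: string[]): number {
  let count = 0;
  for (const token of doc) if (token === term) count += 1;
  return count;
}

// Returns one BM25 score per document, in the same order as `docs`.
function bm25Scores(query: string[], docs: string[][], k1 = 1.2, b = 0.75): number[] {
  const n = docs.length;
  if (n === 0) return [];

  let totalLength = 0;
  for (const doc of docs) totalLength += doc.length;
  const avgLength = totalLength / n;

  // Inverse document frequency per query term, computed once.
  const idf = query.map((term) => {
    let df = 0;
    for (const doc of docs) if (doc.indexOf(term) !== -1) df += 1;
    return Math.log(1 + (n - df + 0.5) / (df + 0.5));
  });

  return docs.map((doc) => {
    let score = 0;
    for (let i = 0; i < query.length; i += 1) {
      const tf = termFrequency(query[i], doc);
      if (tf === 0) continue;
      // Length normalization: long documents need more matches to score as high.
      const norm = 1 - b + b * (doc.length / avgLength);
      score += (idf[i] * (tf * (k1 + 1))) / (tf + k1 * norm);
    }
    return score;
  });
}
```

Unlike raw TF-IDF, the term-frequency part saturates as `tf` grows and is normalized by document length, which is why repeated terms in long emails do not dominate the ranking.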
LLM Integration
- LLM configuration system
- Ollama integration (self-hosted LLM)
- LangChain/LangGraph architecture
- Support for OpenAI, Anthropic, etc.
- Search result summarization
- Summarize top N results
- Extract key points
- Answer questions based on search results
- Optional LLM features (gated by plan)
More Integrations
- Google Drive (OAuth)
- Dropbox (OAuth)
- Microsoft 365 (OAuth)
- At least 3 more integrations based on user demand
UI/UX Improvements
- Polish all interfaces
- Professional design
- Consistent component library (Shadcn/UI)
- Mobile responsive
- Dark mode
- Keyboard shortcuts
- Advanced search syntax
- Result preview/quick view
Marketing Site
- Full marketing website (separate from app)
- Feature pages
- Pricing page
- Documentation
- Blog
- Customer testimonials
V4 - Enterprise Tier (6-12 months)
Target: Large companies with custom needs, compliance requirements
Goal: Launch Enterprise tier with dedicated infrastructure, custom integrations, and compliance.
Enterprise Features
- Dedicated infrastructure provisioning
- Per-customer infrastructure
- Custom resource allocation
- Dedicated database
- Isolated workers
- Custom integrations
- Build integrations per customer request
- Private integrations (not available to other customers)
- Integration development as a service
- SSO/SAML
- SAML 2.0 authentication
- Azure AD integration
- Okta integration
- Google Workspace SSO
- Advanced admin features
- Audit logs (all actions logged)
- Data retention policies
- Export all data (GDPR compliance)
- Advanced user provisioning (SCIM)
Compliance & Security
- SOC2 Type II certification
- Security audit preparation
- Compliance documentation
- Annual audits
- GDPR compliance enhancements
- Right to be forgotten
- Data portability
- Consent management
- HIPAA compliance (if needed)
- Business Associate Agreements (BAAs)
- Encryption at rest and in transit
- Access controls
- Security hardening
- Penetration testing
- Vulnerability scanning
- Incident response plan
- Security training
Support & Success
- Dedicated support
- Slack channel per customer
- Response time SLAs
- Priority bug fixes
- Customer success manager
- Regular check-ins
- Feature adoption tracking
- Custom training sessions
- Professional services
- Integration development
- Custom feature development
- Migration assistance
Customer Launch
- 1 pilot Enterprise customer
- Small team (5-10 people)
- Gather feedback
- Refine Enterprise offering
- Case study and testimonial
- Enterprise sales process documentation
Optimizations
Database Query Performance Monitoring
Context: TypeORM query logging is disabled (logging: false in app.module.ts) because it's noisy and only captures queries from one app instance. Instead, we use PostgreSQL's built-in performance monitoring tools.
pg_stat_statements Extension (Recommended)
The industry standard for query performance analysis. Tracks execution statistics for all SQL statements across all connections.
Setup:
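-- Prerequisite: add pg_stat_statements to shared_preload_libraries in postgresql.conf and restart PostgreSQL
-- Enable the extension (one time per database)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find the slowest queries by mean execution time (column names are for PostgreSQL 13+)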
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
Benefits:
- Tracks all queries across all connections and app instances
- Shows execution count, total time, mean time, max time
- Persists across sessions
- Low performance overhead
- Essential for production query optimization
log_min_duration_statement
Automatically log queries that exceed a duration threshold to PostgreSQL logs.
Setup:
-- In postgresql.conf or via ALTER SYSTEM
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log queries > 1 second
SELECT pg_reload_conf();
Benefits:
- Simple to set up
- Captures slow queries with execution time
- Useful for catching extreme outliers
- Logs include query parameters
auto_explain Module
Automatically logs execution plans for slow queries. Useful for understanding why specific queries are slow.
Setup:
# In postgresql.conf (changing shared_preload_libraries requires a restart)
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 1000  # Explain queries > 1s
auto_explain.log_analyze = true       # Adds timing overhead to every statement, not only slow ones
auto_explain.log_timing = true
Benefits:
- Shows EXPLAIN ANALYZE output for slow queries
- Helps identify missing indexes or inefficient query plans
- No code changes required
Recommended Configuration
V1 (Development/Self-Hosted):
- Enable pg_stat_statements for query analysis
- Set log_min_duration_statement = 2000 to catch very slow queries
V2-V3 (Multi-tenant/SaaS):
- Enable pg_stat_statements (required)
- Set log_min_duration_statement = 1000
- Consider auto_explain for production debugging
V4 (Enterprise):
- All of the above
- Query performance monitoring dashboards
- Automated slow query alerts
- Per-customer query performance analysis
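The "automated slow query alerts" item could start as a simple filter over rows read from pg_stat_statements, as sketched below. The row shape mirrors the columns selected in the setup query earlier in this section; the function name and both thresholds are illustrative assumptions.

```typescript
// Row shape mirrors columns selected from pg_stat_statements.
interface StatementStats {
  query: string;
  calls: number;
  mean_exec_time: number; // milliseconds
}

// Returns statements whose mean time exceeds the threshold, worst first.
// `minCalls` filters out one-off queries; both defaults are illustrative.
function findSlowStatements(
  rows: StatementStats[],
  thresholdMs = 1000,
  minCalls = 5,
): StatementStats[] {
  return rows
    .filter((row) => row.calls >= minCalls && row.mean_exec_time > thresholdMs)
    .sort((a, b) => b.mean_exec_time - a.mean_exec_time);
}
```

A scheduled job could run the query, pass the rows through this filter, and send whatever comes back to the alerting channel.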
Future Optimizations
- Add query performance metrics to observability stack
- Create Grafana dashboard for pg_stat_statements data
- Automated query optimization suggestions
- Index recommendation system based on slow query patterns
Test Optimizations
Currently, the e2e tests keep each test isolated by clearing the database before each test. Even for small datasets, this takes time.
Although the clearDatabase function clears all tables in parallel, there will probably come a point where we need to avoid clearing the database for every test.
Additionally, the current testing strategy does not allow multiple tests in the same suite to run in parallel, since they read and write data in the same tables.
At some point we will need to determine how to run tests in parallel while still reading from and writing to a real database.
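One possible direction, offered as a sketch and not as the project's chosen strategy, is to give each test worker its own PostgreSQL schema so that workers never share tables. The helper below only produces the SQL statements. It assumes the test runner exposes a numeric worker id (Jest, for example, sets `JEST_WORKER_ID`), and the `test_worker_` prefix is made up for the example.

```typescript
// Builds a schema name from a numeric worker id. Rejecting anything else
// keeps the id safe to interpolate into the SQL statements below.
function schemaForWorker(workerId: string): string {
  if (!/^[0-9]+$/.test(workerId)) {
    throw new Error(`Unexpected worker id: ${workerId}`);
  }
  return `test_worker_${workerId}`;
}

// Statements to run on each connection before the worker's tests start.
function isolationStatements(workerId: string): string[] {
  const schema = schemaForWorker(workerId);
  return [
    `CREATE SCHEMA IF NOT EXISTS "${schema}"`,
    `SET search_path TO "${schema}"`,
  ];
}
```

`SET search_path` applies per session, so with a connection pool it has to run on every connection, and the table setup has to run once per schema. Whether that cost beats clearing one shared database would need measuring.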
Feature Differentiation
Self-Hosted (Always Free)
- Core search functionality
- Gmail integration
- 1-2 additional basic integrations (FTP, Slack)
- Single user or family use (< 10 users)
- PostgreSQL full-text search
- Community support only
- Docker deployment
Business Tier (SaaS - Paid)
- Hosted infrastructure (no self-hosting)
- Unlimited users per organization
- Teams and permissions
- All integrations
- Elasticsearch search
- LLM features (summarization)
- Search collections
- SSO (Google, Microsoft)
- Email support
- 99.9% uptime SLA
- Usage analytics
- Admin dashboard
Enterprise Tier (High-Touch - Custom Pricing)
- Everything in Business tier
- Dedicated infrastructure
- Custom integrations
- SAML/SSO (any provider)
- SOC2/HIPAA compliance
- Dedicated support (Slack channel)
- Customer success manager
- Professional services
- Custom SLAs
- Data residency options
- On-premise deployment option
Long-Term Vision
Business Model: Fair-code / Open-core
- Self-hosted version remains genuinely useful forever
- Core functionality always free
- Advanced features for companies (not individuals)
- Commercial license required for business use
Deployment Options:
- Local/Homelab - Free for personal use, community supported
- Business SaaS - Hosted multi-tenant, standard pricing
- Enterprise - Dedicated infrastructure, custom pricing
Target Markets:
- Phase 1 (V1-V2): Homelab enthusiasts, power users, families
- Phase 2 (V3): Small-medium businesses (10-100 employees)
- Phase 3 (V4): Large enterprises (100+ employees)
Success Metrics:
- V1: 100 active self-hosted deployments
- V2: 1,000 active self-hosted deployments
- V3: 50 paying Business customers
- V4: 5 Enterprise customers
Notes
- All versions maintain backward compatibility with self-hosted deployments
- Breaking changes communicated 90 days in advance
- Community input welcome on feature prioritization
- Roadmap updated quarterly based on feedback