One of the core features that most desktop users take for granted is almost completely missing on classic NAS systems: fast full-text search across file content. Anyone looking for an invoice from last quarter with only a contract number in mind usually clicks through hundreds of PDFs manually or falls back to local tools like Everything, Recoll or DocFetcher. With TrueNAS 26.04 that changes: the release ships an integrated Search Index with full-text search over file names and content — turning the NAS into a searchable knowledge base. iXsystems introduced the feature on the T3 Podcast E041; here is the DATAZONE field perspective.
What the Search Index does
The Search Index is an optional service that can be enabled per pool or per dataset. Once turned on, it indexes in the background:
- File names and paths — as before, but structured and queryable
- Metadata — creation and modification timestamps, owner, size, permissions
- Content of Office documents (docx, xlsx, pptx, odt, ods, odp), PDFs (text-based and via OCR), plain-text files (txt, md, csv, log, json, yaml, xml) and mailboxes (eml, mbox)
Queries run through the TrueNAS web UI or via REST/WebSocket API — the latter opens the door for integrations with DMS systems, RAG pipelines and helpdesk tools.
Architecture: Apache Tika plus Lucene
Under the hood, TrueNAS relies on well-known open-source components:
| Component | Role |
|---|---|
| Apache Tika | Content extraction for 1,000+ file formats |
| Apache Lucene | Inverted index for fast text search |
| Tesseract OCR (optional) | Text recognition in scanned PDFs and images |
| TrueNAS Indexer Service | Orchestration, incremental indexing, re-index after snapshot rollback |
Apache Tika has been the de-facto standard for content extraction in the Java world for years — Elasticsearch, Solr and many commercial DMS solutions build on it. Lucene is the underlying search engine from the same Apache family. Both are deliberately not reinvented but integrated into the TrueNAS stack. The reason is pragmatic: maintaining an in-house index stack would be neither sensible nor sustainable for iXsystems.
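To make the term "inverted index" concrete, here is a minimal toy sketch in Python. It is not Lucene and not TrueNAS code; the function names and the sample paths are made up for illustration. It only shows the basic idea: each token maps to the set of documents containing it, so a query becomes a set intersection instead of a scan over every file.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document paths that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path, text in docs.items():
        for token in tokenize(text):
            index[token].add(path)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every token of the query (AND)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents (already extracted to plain text).
docs = {
    "/mnt/tank/clients/a/invoice-2024-12.txt": "Invoice 2024-12 consulting services",
    "/mnt/tank/clients/b/contract.txt": "Consulting contract with non-compete clause",
}
index = build_index(docs)
hits = search(index, "consulting invoice")
```

In this sketch only the invoice file matches both query terms. A real engine such as Lucene adds ranking, phrase queries and on-disk segment storage on top of the same principle.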
How the index stays current
The indexer works incrementally and leverages ZFS’s own awareness of changed files:
- Initial indexing: Full scan across all indexed datasets — typically hours to days depending on data volume
- Incremental update: inotify watchers plus periodic ZFS-diff checks pick up changes immediately or in 5-minute intervals
- Snapshot awareness: After a snapshot rollback the index is consolidated automatically, with no inconsistent search results
- Replication awareness: On replication targets the index is optionally rebuilt as well, so the backup NAS is searchable too
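As an illustration of the incremental step, the following sketch turns the change lines that `zfs diff` prints between two snapshots into index actions. How the TrueNAS indexer actually consumes this information is not documented here, so the mapping and the function name `plan_updates` are assumptions, and the sample lines are hand-written rather than captured output.

```python
# Assumed mapping from zfs diff change types to index actions.
ACTIONS = {"+": "index", "M": "index", "-": "delete"}


def plan_updates(diff_output: str) -> list[tuple[str, str]]:
    """Translate zfs-diff style lines into (action, path) pairs."""
    plan: list[tuple[str, str]] = []
    for line in diff_output.splitlines():
        parts = line.split(None, 1)  # change type, then the rest of the line
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        change, path = parts
        if change == "R":
            # A rename is shown as "old -> new": drop the old entry, index the new one.
            old, sep, new = path.partition(" -> ")
            if sep:
                plan.append(("delete", old))
                plan.append(("index", new))
        elif change in ACTIONS:
            plan.append((ACTIONS[change], path))
    return plan


# Hand-written sample in the style of `zfs diff` output.
sample = "\n".join([
    "M\t/mnt/tank/docs/spec.md",
    "+\t/mnt/tank/docs/new-contract.pdf",
    "-\t/mnt/tank/docs/old-draft.docx",
    "R\t/mnt/tank/docs/a.txt -> /mnt/tank/docs/b.txt",
])
plan = plan_updates(sample)
```

The point of the design is that only changed paths are touched, so the cost of an update scales with the amount of change rather than with the size of the dataset.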
The index itself lives in a separate dataset (.truenas-search-index), is compressed automatically and is excluded from the regular snapshot schedule. Typical index size is 2–5% of the indexed payload data; for 100 TB of indexed Office documents that is roughly 2–5 TB of index. Anyone uncomfortable with that can place the index on a separate fast pool (an NVMe-SSD pool is a natural fit).
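The 2–5% rule of thumb is easy to apply to your own numbers. A small sketch, assuming the ratio holds linearly (the function name is made up for this example):

```python
def index_size_tb(payload_tb: float, low: float = 0.02, high: float = 0.05) -> tuple[float, float]:
    """Estimate the index size range in TB from the indexed payload in TB."""
    if payload_tb < 0:
        raise ValueError("payload_tb must not be negative")
    return (payload_tb * low, payload_tb * high)


# The example from the text: 100 TB of indexed Office documents.
low_tb, high_tb = index_size_tb(100)
print(f"Estimated index size: {low_tb:.1f} to {high_tb:.1f} TB")
```

Real ratios depend on the mix of file types, so treat the result as a planning range and not as a guarantee.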
Real-world use cases
1. Tax and accounting
The classic case: a client asks for an invoice from 2024-12 with a specific amount. Instead of clicking through client folders, the clerk types "Invoice 2024-12" "Consulting" 1,785 and finds the PDF in seconds, regardless of which client folder it lives in. One condition applies: client permissions are enforced, so the clerk only sees files they are allowed to read (see below).
2. Law firm
“Where’s that contract with client X from Q2 2023, the one with the non-compete clause?” — before: open the client file, sift through contracts, scroll manually. With the Search Index: full-text query for client name and keyword, hit list in seconds.
3. Engineering and design
CAD files themselves aren’t indexed (Tika only partially understands proprietary formats), but companion documents, bills-of-materials as XLSX, requirement PDFs and Markdown specs all are. Useful when a component appears in multiple projects.
4. Helpdesk and knowledge management
If your entire knowledge base today is a shared folder full of Word files (the reality in many SMBs!), the Search Index turns that folder into a searchable knowledge base — no migration to Confluence or a DMS needed.
5. RAG preparation
For teams experimenting with Retrieval Augmented Generation: the TrueNAS index serves text snippets plus metadata directly over the API — exactly what a RAG system needs as a source.
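A sketch of what the retrieval side could look like on the consumer end. The shape of a search hit (`path`, `snippet`, `modified`) is an assumption made for this example and not the documented TrueNAS API response; the code only shows how snippets plus metadata can be packed into a size-limited context block for a language model.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical search hit; field names are assumptions."""
    path: str
    snippet: str
    modified: str


def build_context(hits: list[Hit], max_chars: int = 2000) -> str:
    """Concatenate hits into one context string, stopping before max_chars is exceeded."""
    blocks: list[str] = []
    used = 0
    for hit in hits:
        block = f"[source: {hit.path} | modified: {hit.modified}]\n{hit.snippet.strip()}"
        if used + len(block) > max_chars:
            break  # keep the context within the budget
        blocks.append(block)
        used += len(block)
    return "\n\n".join(blocks)


hits = [
    Hit("/mnt/tank/contracts/client-x.pdf",
        "The non-compete clause applies for 24 months.", "2023-05-14"),
    Hit("/mnt/tank/specs/pump.md",
        "Component P-17 is rated for 6 bar.", "2024-02-03"),
]
context = build_context(hits)
```

Keeping the source path next to each snippet lets the downstream model cite where an answer came from.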
Resource demand, realistically
Full-text search costs CPU, RAM and IOPS. iXsystems gives the following ballpark figures on the T3 podcast:
| Aspect | Initial indexing | Steady state |
|---|---|---|
| CPU | Medium to high (4–8 cores recommended) | Low (<10%) |
| RAM | +8–16 GB additional ARC | +4–8 GB Lucene heap |
| IOPS | Sequential reads with high bandwidth | Low, write spikes |
| Index storage | 2–5% of payload | Grows linearly with data |
A TrueNAS Mini X+ with 64 GB RAM can run the index for a moderate dataset just fine. Beyond roughly 50 TB of indexed data with concurrent heavy search load we recommend at least an H-series or larger — not least because OCR (when enabled) is very CPU-hungry.
Privacy: permissions are honored
A fair question: “Will every user now find every file on the NAS?” Answer: no. The TrueNAS indexer works in two stages:
- Index is global: all indexed files sit in the Lucene index regardless of user permissions
- Per-query filter: every search request filters hits against the SMB/NFS permissions of the user before returning results
This is technically equivalent to how Windows Search works in an Active Directory environment. Important caveat: the index itself holds extracted plaintext — anyone with direct access to the index storage (admins) can read everything. For highly sensitive data (HR records, NDA-protected research data) the Search Index should deliberately not be enabled, or the relevant dataset excluded.
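The two-stage model (global index, per-query filter) can be sketched as follows. This is a deliberately simplified illustration: real SMB/NFS ACL evaluation involves deny entries, inheritance and directory traversal rights, and the class and function names here are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    """A hit from the global index plus the principals allowed to read the file."""
    path: str
    allowed: frozenset[str]  # user and group names with read access (simplified)


def filter_hits(hits: list[IndexedDoc], user: str, groups: set[str]) -> list[str]:
    """Return only the paths the requesting user may read."""
    principals = {user, *groups}
    return [doc.path for doc in hits if doc.allowed & principals]


# Raw hits as they would come out of the global index, before filtering.
raw_hits = [
    IndexedDoc("/mnt/tank/accounting/invoice.pdf", frozenset({"accounting"})),
    IndexedDoc("/mnt/tank/hr/salary.xlsx", frozenset({"hr"})),
    IndexedDoc("/mnt/tank/public/handbook.docx", frozenset({"staff"})),
]
visible = filter_hits(raw_hits, user="alice", groups={"accounting", "staff"})
```

In this sketch the HR file is present in the index but never reaches the user, which is also why direct access to the index storage has to be restricted.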
What the Search Index is not
Set expectations:
- Not a full DMS: no versioning, no workflows, no annotations. Anyone needing document management in the strict sense still needs a DMS like ELO, M-Files or ecoDMS.
- No semantic search: Lucene does keyword matching with stemming and synonyms. “Contract” finds “contracts” but not “cooperation agreement.” Vector-based semantic search is on the iXsystems roadmap but not in 26.04.
- No image search: OCR makes text in images readable, but image content itself (logos, people, scenes) is not searchable.
- No replacement for structured databases: anyone querying invoice line items with tax rates and amounts structurally still needs an accounting or ERP system.
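The limit of keyword matching mentioned in the list above can be shown with a toy stemmer. This is a crude suffix rule written for this example, not the analyzer Lucene uses; it only demonstrates why "contracts" is found and "cooperation agreement" is not.

```python
def stem(word: str) -> str:
    """Very naive stemmer: lowercase and strip a plural 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def matches(query: str, text: str) -> bool:
    """True if every stemmed query term occurs among the stemmed words of the text."""
    terms = {stem(w) for w in query.split()}
    words = {stem(w) for w in text.split()}
    return terms <= words


found_plural = matches("Contract", "Signed contracts 2023")
found_synonym = matches("Contract", "cooperation agreement")
```

Here `found_plural` is `True` and `found_synonym` is `False`: stemming bridges word forms, but bridging different words with the same meaning needs a synonym list or vector-based search.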
Turning it on in practice
In the TrueNAS web UI under Datasets → [Dataset] → Edit → Indexing:
- Enable Search Index
- Enable Index Content (otherwise only file names are indexed)
- OCR for scanned PDFs — only enable if CPU capacity is available
- Index Location: separate pool recommended for large data volumes
- Excluded Patterns: e.g. *.iso,*.vmdk,*/backups/*
Then Save — initial indexing runs in the background at low priority. Progress is visible under Reporting → Search Index.
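To check in advance which files a set of exclude patterns would skip, the patterns can be tested locally. The sketch below assumes shell-style glob semantics as implemented by Python's `fnmatch` module; whether the TrueNAS indexer interprets patterns in exactly this way is an assumption.

```python
from fnmatch import fnmatchcase

# The example patterns from the dialog, split at the commas.
EXCLUDED_PATTERNS = "*.iso,*.vmdk,*/backups/*".split(",")


def is_excluded(path: str, patterns: list[str] = EXCLUDED_PATTERNS) -> bool:
    """True if the path matches at least one exclude pattern (case-sensitive)."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# Hypothetical paths to test against the patterns.
candidates = [
    "/mnt/tank/isos/debian.iso",
    "/mnt/tank/backups/db.dump",
    "/mnt/tank/docs/report.pdf",
]
skipped = [p for p in candidates if is_excluded(p)]
```

With these sample paths the ISO image and the file under backups are skipped, while the report remains indexable. Note that in `fnmatch` a `*` also matches path separators, so `*/backups/*` catches files at any depth below a backups directory.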
DATAZONE recommendation
Full-text search on the NAS is one of those features users can’t imagine living without after a week. Our pragmatic guidance:
- Enable immediately for classic Office datasets (accounting, contracts, general filing)
- Don’t enable for backup datasets, ISO repositories and pure media stores — no real benefit, only cost
- Review separately for HR records, attorney-client-privileged datasets and NDA research data — deliberately skip or place on a dedicated pool
- Plan resources first: anyone running a Mini X+ near capacity should check pool and RAM headroom before turning on indexing
Related articles:
- TrueNAS Configurator: model selection with live capacity math
- TrueNAS 25.10 Goldeye release
- TrueNAS snapshots and replication
If you want to evaluate the Search Index for your use case, we are happy to provide a sizing review including a realistic index-size estimate based on your existing datasets.