One of the core features that most desktop users take for granted is almost completely missing on classic NAS systems: fast full-text search across file content. Anyone looking for an invoice from last quarter who remembers only a contract number usually clicks through hundreds of PDFs manually or falls back to local tools like Everything, Recoll or DocFetcher. With TrueNAS 26.04 that changes: the release ships an integrated Search Index with full-text search over file names and content, turning the NAS into a searchable knowledge base. iXsystems introduced the feature on the T3 Podcast E041; here is the DATAZONE field perspective.

What the Search Index does

The Search Index is an optional service that can be enabled per pool or per dataset. Once turned on, it indexes in the background:

File names and paths — as before, but structured and queryable
Metadata — creation and modification timestamps, owner, size, permissions
Content of Office documents (docx, xlsx, pptx, odt, ods, odp), PDFs (text-based and via OCR), plain-text files (txt, md, csv, log, json, yaml, xml) and mailboxes (eml, mbox)

Queries run through the TrueNAS web UI or via the REST/WebSocket API. The API opens the door for integrations with document management systems (DMS), RAG pipelines and helpdesk tools.
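As a rough illustration of what an API-side integration could look like, the following sketch builds (but does not send) an authenticated JSON search request. The endpoint path, the payload field names and the host name are placeholders chosen for this example, not values taken from TrueNAS documentation; the real API of the release may look different.

```python
import json
import urllib.request

# Hypothetical values: the endpoint path and the payload fields are
# placeholders for illustration, not documented TrueNAS API names.
SEARCH_ENDPOINT = "/api/v2.0/search/query"


def build_search_request(host: str, api_key: str, text: str,
                         limit: int = 20) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON search request."""
    payload = {"query": text, "limit": limit}
    return urllib.request.Request(
        url=f"https://{host}{SEARCH_ENDPOINT}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_search_request(
    "nas.example.internal", "API_KEY_PLACEHOLDER", '"Invoice 2024-12" Consulting'
)
print(req.get_method(), req.full_url)
```

Sending the request would be a single `urllib.request.urlopen(req)` call once the actual endpoint and response format are known.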

Architecture: Apache Tika plus Lucene

Under the hood, TrueNAS relies on well-known open-source components:

Component	Role
Apache Tika	Content extraction for 1,000+ file formats
Apache Lucene	Inverted index for fast text search
Tesseract OCR (optional)	Text recognition in scanned PDFs and images
TrueNAS Indexer Service	Orchestration, incremental indexing, re-index after snapshot rollback

Apache Tika has been the de facto standard for content extraction in the Java world for years: Elasticsearch, Solr and many commercial DMS solutions build on it. Lucene is the search library from the same Apache family that underlies both Elasticsearch and Solr. TrueNAS integrates both rather than reinventing them, for a pragmatic reason: maintaining an in-house index stack would be neither sensible nor sustainable for iXsystems.
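To make the term "inverted index" concrete, here is a deliberately tiny sketch of the idea: a map from each term to the set of files containing it, queried with AND semantics. It is a toy model for illustration only and not how Lucene is implemented internally; the file paths and contents are made up.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every term to the set of document paths that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path, text in docs.items():
        for term in tokenize(text):
            index[term].add(path)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the paths that contain all query terms (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up sample data standing in for text extracted from files.
docs = {
    "clients/a/invoice-2024-12.pdf": "Invoice 2024-12 Consulting services 1,785",
    "clients/b/contract.docx": "Contract with non-compete clause",
    "clients/b/invoice-2023-06.pdf": "Invoice 2023-06 hardware delivery",
}
index = build_index(docs)
print(sorted(search(index, "invoice consulting")))
```

The lookup cost depends on the number of query terms and matching files, not on the total amount of stored text, which is why an index of this kind answers in seconds where a manual scan takes hours.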

How the index stays current

The indexer works incrementally and leverages ZFS’s own awareness of changed files:

Initial indexing: Full scan across all indexed datasets — typically hours to days depending on data volume
Incremental update: inotify watchers pick up changes immediately, and periodic ZFS-diff checks follow at 5-minute intervals
Snapshot awareness: After a snapshot rollback the index is reconciled automatically with the rolled-back data, so search results stay consistent
Replication awareness: On replication targets the index is optionally rebuilt as well, so the backup NAS is searchable too
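The periodic diff step can be pictured with the following sketch, which turns the tab-separated output of `zfs diff -H` between two snapshots into a list of files to re-index and files to drop from the index. How the TrueNAS indexer actually consumes ZFS change information is not described in the source, so the `IndexPlan` structure and the function name are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class IndexPlan:
    """Hypothetical work list for one incremental indexing pass."""
    reindex: set[str] = field(default_factory=set)
    remove: set[str] = field(default_factory=set)


def plan_from_zfs_diff(output: str) -> IndexPlan:
    """Parse `zfs diff -H` lines: change type, tab, path (renames add a new path)."""
    plan = IndexPlan()
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        change, path = parts[0], parts[1]
        if change in ("+", "M"):  # created or modified
            plan.reindex.add(path)
            plan.remove.discard(path)
        elif change == "-":  # deleted
            plan.remove.add(path)
            plan.reindex.discard(path)
        elif change == "R" and len(parts) >= 3:  # renamed: old path, new path
            plan.remove.add(path)
            plan.reindex.add(parts[2])
    return plan


sample = (
    "M\t/tank/docs/a.pdf\n"
    "+\t/tank/docs/b.docx\n"
    "-\t/tank/docs/old.txt\n"
    "R\t/tank/docs/x.md\t/tank/docs/y.md\n"
)
plan = plan_from_zfs_diff(sample)
print(sorted(plan.reindex), sorted(plan.remove))
```

The diff output also lists directories, so a real indexer would need to filter those out before handing paths to content extraction.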

The index itself lives in a separate dataset (.truenas-search-index), is compressed automatically and is excluded from the regular snapshot schedule. Typical index size is 2–5% of the indexed payload data: for 100 TB of indexed Office documents that is roughly 2–5 TB of index. Anyone uncomfortable with that load on the data pool can place the index on a separate fast pool, for example an all-NVMe pool.
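The 2–5% rule of thumb is easy to turn into a quick estimate. The helper below is a back-of-the-envelope sketch that only applies the percentages quoted above; actual index size will depend on the mix of file types and on whether OCR text is included.

```python
def index_size_range_tb(payload_tb: float, low_pct: float = 2,
                        high_pct: float = 5) -> tuple[float, float]:
    """Estimate the index size range in TB from the indexed payload in TB."""
    if payload_tb < 0:
        raise ValueError("payload_tb must not be negative")
    return payload_tb * low_pct / 100, payload_tb * high_pct / 100


low, high = index_size_range_tb(100)
print(f"100 TB payload: {low:.1f} to {high:.1f} TB of index")
```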

Real-world use cases

1. Tax and accounting

The classic case: a client asks for an invoice from 2024-12 with a specific amount. Instead of clicking through client folders, the clerk types "Invoice 2024-12" "Consulting" 1,785 and finds the PDF in seconds, regardless of which client folder it lives in. The precondition: client permissions are enforced (see below).

2. Law firm

“Where’s that contract with client X from Q2 2023, the one with the non-compete clause?” — before: open the client file, sift through contracts, scroll manually. With the Search Index: full-text query for client name and keyword, hit list in seconds.

3. Engineering and design

CAD files themselves aren’t indexed (Tika only partially understands proprietary formats), but companion documents, bills of materials in XLSX, requirement PDFs and Markdown specs all are. This is useful when a component appears in multiple projects.

4. Helpdesk and knowledge management

If your entire knowledge base today is a shared folder full of Word files (the reality in many SMBs!), the Search Index turns that folder into a searchable knowledge base — no migration to Confluence or a DMS needed.

5. RAG preparation

For teams experimenting with Retrieval-Augmented Generation (RAG): the TrueNAS index serves text snippets plus metadata directly over the API, which is exactly what a RAG system needs as a source.
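A minimal sketch of the retrieval side of such a pipeline: search hits are concatenated into a context block, each snippet labeled with its source path, until a character budget is reached. The hit structure with `path` and `snippet` keys is an assumption made for this example; the actual response format of the TrueNAS API is not described in the source.

```python
def build_context(hits: list[dict], max_chars: int = 1500) -> str:
    """Join snippets, labeled with their source path, within a character budget."""
    parts: list[str] = []
    used = 0
    for hit in hits:
        block = f"[{hit['path']}]\n{hit['snippet']}"
        if used + len(block) > max_chars:
            break  # stop before exceeding the budget
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)


# Made-up hits in the assumed shape.
hits = [
    {"path": "kb/vpn.docx", "snippet": "Restart the VPN client."},
    {"path": "kb/printer.docx", "snippet": "Clear the print queue."},
]
print(build_context(hits))
```

Keeping the source path next to each snippet lets the language model cite where an answer came from, which matters when the knowledge base is a file share rather than a curated wiki.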

Resource demand, realistically

Full-text search costs CPU, RAM and IOPS. iXsystems gives the following ballpark figures on the T3 podcast:

Aspect	Initial indexing	Steady state
CPU	Medium to high (4–8 cores recommended)	Low (<10%)
RAM	+8–16 GB additional ARC	+4–8 GB Lucene heap
IOPS	Sequential reads with high bandwidth	Low, write spikes
Index storage	2–5% of payload	Grows linearly with data

A TrueNAS Mini X+ with 64 GB RAM can run the index for a moderate dataset just fine. Beyond roughly 50 TB of indexed data with concurrent heavy search load we recommend an H-series system or larger, not least because OCR (when enabled) is very CPU-hungry.

Privacy: permissions are honored

A fair question: “Will every user now find every file on the NAS?” Answer: no. The TrueNAS indexer works in two stages:

Index is global: all indexed files sit in the Lucene index regardless of user permissions
Per-query filter: every search request filters hits against the SMB/NFS permissions of the user before returning results

This is technically equivalent to how Windows Search works in an Active Directory environment. Important caveat: the index itself holds extracted plaintext, so anyone with direct access to the index storage (typically administrators) can read everything. For highly sensitive data (HR records, NDA-protected research data) the Search Index should be left disabled, or the relevant dataset should be excluded.
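The two-stage model can be sketched as follows: the index returns all matching paths, and a filter then keeps only those the requesting user may read. The `User` type and the simple path-to-principals ACL table are simplifications invented for this example; the real service evaluates the actual share and file system permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset[str]


def filter_hits(hits: list[str], acl: dict[str, set[str]], user: User) -> list[str]:
    """Keep only hits readable by the user or one of the user's groups."""
    principals = {user.name} | user.groups
    # Paths without an ACL entry are dropped (default deny).
    return [path for path in hits if acl.get(path, set()) & principals]


# Made-up ACL table: path -> users and groups with read access.
acl = {
    "clients/a/invoice.pdf": {"accounting"},
    "hr/salary.xlsx": {"hr"},
    "shared/handbook.docx": {"accounting", "hr", "staff"},
}
raw_hits = ["clients/a/invoice.pdf", "hr/salary.xlsx", "shared/handbook.docx"]
clerk = User("jdoe", frozenset({"accounting", "staff"}))
print(filter_hits(raw_hits, acl, clerk))
```

The default-deny choice means a file with unknown permissions never shows up in results, which is the safer failure mode for a search service.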

What the Search Index is not

Set expectations:

Not a full DMS: no versioning, no workflows, no annotations. Anyone needing document management in the strict sense still needs a DMS like ELO, M-Files or ecoDMS.
No semantic search: Lucene does keyword matching with stemming and synonyms. “Contract” finds “contracts” but not “cooperation agreement.” Vector-based semantic search is on the iXsystems roadmap but not in 26.04.
No image search: OCR makes text in images readable, but image content itself (logos, people, scenes) is not searchable.
No replacement for structured databases: anyone querying invoice line items with tax rates and amounts structurally still needs an accounting or ERP system.
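The difference between keyword matching with stemming and semantic search can be shown with a toy example. The stemmer below only strips a trailing "s" and is far cruder than the stemming algorithms Lucene's analyzers provide; it is meant purely to illustrate why "Contract" matches "contracts" but not "cooperation agreement".

```python
import re


def naive_stem(word: str) -> str:
    """Strip a plural 's' (toy rule, not a real stemming algorithm)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def stems(text: str) -> set[str]:
    """Tokenize on letters and reduce every token to its naive stem."""
    return {naive_stem(w) for w in re.findall(r"[a-z]+", text.lower())}


def matches(query: str, text: str) -> bool:
    """True if every stemmed query term occurs in the stemmed text."""
    return stems(query) <= stems(text)


print(matches("Contract", "Signed contracts 2023"))
print(matches("Contract", "Cooperation agreement 2023"))
```

The second document is about the same subject, but it shares no word stem with the query, so only a semantic (vector-based) search could relate the two.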

Turning it on in practice

In the TrueNAS web UI under Datasets → [Dataset] → Edit → Indexing:

Enable Search Index
Enable Index Content (otherwise only file names are indexed)
OCR for scanned PDFs — only enable if CPU capacity is available
Index Location: separate pool recommended for large data volumes
Excluded Patterns: e.g. *.iso, *.vmdk, */backups/*

Then click Save. Initial indexing runs in the background at low priority, and progress is visible under Reporting → Search Index.
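Before saving, it can help to check which paths a set of exclusion patterns would actually catch. The sketch below assumes shell-style glob semantics as implemented by Python's `fnmatch` module; whether the TrueNAS setting interprets patterns in exactly the same way is an assumption, and the sample paths are made up.

```python
from fnmatch import fnmatchcase

EXCLUDED_PATTERNS = ["*.iso", "*.vmdk", "*/backups/*"]


def is_excluded(path: str, patterns: list[str] = EXCLUDED_PATTERNS) -> bool:
    """True if the path matches any exclusion pattern (case-sensitive)."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


for path in [
    "tank/images/debian.iso",
    "tank/office/backups/2024/ledger.xlsx",
    "tank/office/contracts/nda.docx",
]:
    print(path, "excluded" if is_excluded(path) else "indexed")
```

Note that with these semantics `*/backups/*` only matches when a directory named `backups` sits below at least one parent path component, so a top-level `backups/` folder would need its own pattern.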

DATAZONE recommendation

Full-text search on the NAS is one of those features users can’t imagine living without after a week. Our pragmatic guidance:

Enable immediately for classic Office datasets (accounting, contracts, general filing)
Don’t enable for backup datasets, ISO repositories and pure media stores — no real benefit, only cost
Review separately for HR records, attorney-client-privileged datasets and NDA research data — deliberately skip or place on a dedicated pool
Plan resources first: anyone running a Mini X+ near capacity should check pool and RAM headroom before turning on indexing

If you want to evaluate the Search Index for your use case, we are happy to provide a sizing review including a realistic index-size estimate based on your existing datasets.

TrueNAS Spotlight: Full-Text Search and Search Index on the NAS
