Scaling SharePoint Search for Large Enterprise Farms: Index Distribution and Crawl Isolation

At a certain scale, SharePoint Search stops being a configuration problem and becomes an infrastructure problem. The topology decisions that work for a smaller farm — two app servers, a shared index directory, a handful of crawl databases — reach their limits when the item count grows into the tens or hundreds of millions. Crawl queue depth grows faster than the crawlers can process it. The index grows too large for a single server’s disk I/O and memory. Content processing saturates CPU during crawl windows.

This post covers the large farm reference architecture: what it looks like, why each hardware decision was made, and how to implement it using Optimize-SPSearchInfrastructure.ps1.


When You’ve Outgrown a Default Topology

Before describing the solution, it is worth identifying the symptoms clearly. These are the observable signs that a farm has grown beyond what a standard 2-server search topology can handle:

Crawl queue backup: The number of items waiting to be crawled grows faster than the crawl rate. Incremental crawl jobs pile up behind full crawls.

Index server CPU and disk saturation: The server hosting the Index component hits sustained high CPU during crawl windows — not because of query load, but because of index write activity from Content Processing.

Query latency spikes during crawl: Users report slow search results at the same time a full crawl is running. This is the clearest sign that Index I/O is shared with Content Processing I/O on the same server.

Content processing queue depth: The content processing component cannot keep up with documents arriving from the Crawl component, causing the pipeline to back up.

Inflection points by scale (per Microsoft capacity planning guidance):

Item Count         Typical Symptom              Action
─────────────────  ───────────────────────────  ─────────────────────────────────────────────────────────────
Up to 20M items    Small topology sufficient    2 app servers; index co-located with other components
20M–80M items      Medium topology needed       Dedicated nodes for index components
80M–200M items     Large topology required      Fully separated index, crawl, and content processing servers
200M–500M items    Extra-large topology         24+ index components; specialist planning recommended
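
As a quick way to apply the table, the tiers can be expressed as a small lookup. This is only a sketch: the function name Get-SearchTopologyTier is hypothetical, and treating each upper bound as inclusive is an assumption, since the table does not say which tier an exact boundary value belongs to.

```powershell
# Sketch: map an item count to the tier names used in the table above.
# Assumption: upper bounds are inclusive (exactly 20M counts as Small).
function Get-SearchTopologyTier {
    param(
        [Parameter(Mandatory)] [long] $ItemCount
    )
    if     ($ItemCount -le 20000000)  { 'Small' }
    elseif ($ItemCount -le 80000000)  { 'Medium' }
    elseif ($ItemCount -le 200000000) { 'Large' }
    elseif ($ItemCount -le 500000000) { 'Extra-large' }
    else                              { 'Beyond the ranges in the table' }
}

Get-SearchTopologyTier -ItemCount 120000000   # Large
```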

The Large Farm Reference Architecture (80M+ Items)

At 80 million items and above, co-locating the Index component with Content Processing creates I/O contention. Each index partition holds up to 20 million items when sized with 32 GB RAM and 500 GB SSD storage (per Microsoft’s capacity planning guidance). A farm with 80–200 million items therefore needs 4–10 index partitions. Distributing these partitions across dedicated index servers — separate from the crawl and content processing servers — is the defining characteristic of the large farm topology.
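
The partition arithmetic is simple enough to script when you are planning capacity. The sketch below assumes the 20 million items per partition figure quoted above; Get-RequiredPartitionCount is a hypothetical helper name, not part of either script in this series.

```powershell
# Sketch: estimate how many index partitions a given item count needs.
# ItemsPerPartition defaults to the 20 million figure quoted in the text.
function Get-RequiredPartitionCount {
    param(
        [Parameter(Mandatory)] [long] $ItemCount,
        [long] $ItemsPerPartition = 20000000
    )
    if ($ItemCount -le 0) { return 0 }
    # Round up: a partially filled partition still has to exist.
    return [int][math]::Ceiling([double]$ItemCount / $ItemsPerPartition)
}

Get-RequiredPartitionCount -ItemCount 80000000    # 4
Get-RequiredPartitionCount -ItemCount 200000000   # 10
Get-RequiredPartitionCount -ItemCount 90000000    # 5
```

The result is a floor on the partition count, not a target. Provisioning more partitions than the minimum leaves headroom for growth.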

The large farm reference architecture separates every distinct workload onto servers that are sized for that workload. It uses six servers in two groups:

Group 1: Dedicated Index Servers (3×)

  • Host Index components only.
  • Sized for SSD storage (index size is typically 15–25% of raw content volume after extraction).
  • High memory for index caching (64 GB minimum per server recommended).
  • No crawl I/O, no document parsing, no query result assembly — purely index read/write.
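
To turn the 15–25% rule of thumb into a disk budget, a rough estimate like the following can help. It is a sketch under stated assumptions: the function name is hypothetical, the index is assumed to be spread evenly across the index servers, and no allowance is made for replicas or free-space headroom, both of which you would need to add.

```powershell
# Sketch: estimate index storage from raw content volume using the
# 15-25% range quoted above. Assumes an even spread across servers
# and a single copy of each partition (no replicas, no headroom).
function Get-IndexStorageEstimate {
    param(
        [Parameter(Mandatory)] [double] $RawContentGB,
        [Parameter(Mandatory)] [int] $IndexServerCount,
        [double] $LowRatio = 0.15,
        [double] $HighRatio = 0.25
    )
    if ($IndexServerCount -lt 1) { throw 'IndexServerCount must be at least 1.' }
    [pscustomobject]@{
        TotalLowGB      = [math]::Round($RawContentGB * $LowRatio, 1)
        TotalHighGB     = [math]::Round($RawContentGB * $HighRatio, 1)
        PerServerLowGB  = [math]::Round($RawContentGB * $LowRatio / $IndexServerCount, 1)
        PerServerHighGB = [math]::Round($RawContentGB * $HighRatio / $IndexServerCount, 1)
    }
}

# Example: 12 TB of raw content spread over 3 index servers.
Get-IndexStorageEstimate -RawContentGB 12000 -IndexServerCount 3
```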

Group 2: Application Servers (3×)

  • Host Crawl, Content Processing, Query Processing, Analytics Processing, and Admin components.
  • Sized for high CPU (Content Processing is the most CPU-intensive component).
  • Normal spinning disk or SSD adequate; no large on-disk index.

INDEX-SERVER-1      INDEX-SERVER-2      INDEX-SERVER-3
──────────────      ──────────────      ──────────────
Index Part. 0       Index Part. 1       Index Part. 2
Index Part. 3       Index Part. 4       Index Part. 5
Index Part. 6       Index Part. 7       Index Part. 8
Index Part. 9       Index Part. 10      Index Part. 11

APP-SERVER-1        APP-SERVER-2        APP-SERVER-3
──────────────      ──────────────      ──────────────
Admin               Crawl               Content Processing
Query Processing    Analytics           Query Processing
Content Processing  Crawl               Analytics

This separation is the core insight: index I/O and crawl I/O must not compete for the same disk spindles or IOPS budget. On a fast SSD with adequate memory, each index server can host 4–6 index partitions comfortably, giving you 12 partitions across 3 servers — the default PartitionCount in Optimize-SPSearchInfrastructure.ps1.

What a Replica Is

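Each partition can have one or more replicas: copies of the same index data hosted on different servers. One replica in each partition acts as the primary, which receives index updates from Content Processing and propagates them to the other replicas. Queries are load-balanced across all healthy replicas, so if the server hosting one replica becomes unavailable, the remaining replicas continue to serve that partition.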

How the Script Distributes Partitions

Optimize-SPSearchInfrastructure.ps1 uses modular arithmetic to spread partitions across index servers:

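# $indexNodes: index server names; $partitions: total partition count;
# $clone: the cloned (inactive) search topology being modified.
for ($i = 0; $i -lt $partitions; $i++) {
    # Round-robin: partition $i lands on server ($i mod server count).
    $node = $indexNodes[$i % $indexNodes.Count]
    New-SPEnterpriseSearchIndexComponent `
        -SearchTopology $clone `
        -SearchServiceInstance (Get-SPEnterpriseSearchServiceInstance -Identity $node) `
        -IndexPartition $i
}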

With 3 index servers and 12 partitions, the round-robin assignment works out as follows: server 1 hosts partitions 0, 3, 6, and 9; server 2 hosts partitions 1, 4, 7, and 10; and server 3 hosts partitions 2, 5, 8, and 11.
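
If you want to preview the placement before touching the farm, the same modular arithmetic can be run as a dry run that creates nothing. This is a sketch, not code from the script: Get-PartitionPlacement is a hypothetical helper and the server names are placeholders.

```powershell
# Sketch: dry-run of the round-robin placement. No SharePoint cmdlets
# are called; the function only returns the planned assignment.
function Get-PartitionPlacement {
    param(
        [Parameter(Mandatory)] [string[]] $IndexNodes,
        [Parameter(Mandatory)] [int] $PartitionCount
    )
    for ($i = 0; $i -lt $PartitionCount; $i++) {
        [pscustomobject]@{
            Partition = $i
            Server    = $IndexNodes[$i % $IndexNodes.Count]
        }
    }
}

$plan = Get-PartitionPlacement `
    -IndexNodes 'INDEX-SERVER-1', 'INDEX-SERVER-2', 'INDEX-SERVER-3' `
    -PartitionCount 12

# Print one line per server listing the partitions it would host.
$plan | Group-Object Server | ForEach-Object {
    '{0}: {1}' -f $_.Name, ($_.Group.Partition -join ', ')
}
```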

Note: In the current implementation, replicas are not placed automatically. For full HA, run the script twice with the primary and replica index server lists swapped, or extend the script to call New-SPEnterpriseSearchIndexComponent a second time per partition on a different server.
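
One way to extend the script along the lines of this note is to offset the second copy of each partition by one server, so the two copies never share a host. The sketch below only computes the plan; Get-ReplicaPlacement is a hypothetical helper, and the offset-by-one rule is an assumption, not something the script does today.

```powershell
# Sketch: plan two copies of each partition on different index servers.
# The second copy is placed one server further along the list.
function Get-ReplicaPlacement {
    param(
        [Parameter(Mandatory)] [string[]] $IndexNodes,
        [Parameter(Mandatory)] [int] $PartitionCount
    )
    if ($IndexNodes.Count -lt 2) {
        throw 'Replicas need at least two index servers.'
    }
    for ($i = 0; $i -lt $PartitionCount; $i++) {
        $first  = $i % $IndexNodes.Count
        # Differs from $first whenever there are two or more servers.
        $second = ($i + 1) % $IndexNodes.Count
        [pscustomobject]@{
            Partition    = $i
            FirstServer  = $IndexNodes[$first]
            SecondServer = $IndexNodes[$second]
        }
    }
}

Get-ReplicaPlacement `
    -IndexNodes 'INDEX-SERVER-1', 'INDEX-SERVER-2', 'INDEX-SERVER-3' `
    -PartitionCount 12
```

Each row would translate into two New-SPEnterpriseSearchIndexComponent calls with the same -IndexPartition value, one per server. Keep in mind that 12 partitions with two copies each means 24 index components, or 8 per server on three servers, which is above the 4–6 per server figure given earlier, so a replicated layout likely needs more index servers or fewer partitions.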


Crawl Database Sizing and Isolation

Why Crawl Databases Get Overloaded

The crawl database records every URL the crawler has ever seen, its current state (queued, crawled, failed), its last-modified date, and the delta between the last crawl and the current one. During an active crawl, the crawl database receives thousands of write operations per minute. A single crawl database on a large farm creates a write hotspot on SQL.

The Sizing Rule of Thumb

The appropriate number of crawl databases depends on your daily crawl delta (new and modified content per day) and number of distinct content sources. Start with one crawl database per major content type and scale out based on observed queue depth.

A sensible baseline for a large enterprise farm:

  • Collaboration SSA: 5 crawl databases (-CrawlDBCount 5)
  • My Sites SSA: 2 crawl databases (-CrawlDBCount 2)

Adjust these counts upward if you observe sustained crawl queue backup during incremental crawl cycles.

Collaboration vs. My Sites: Keep Them Separate

Even if you use a single SSA, the crawl databases for collaboration and My Sites content should be separate. A deep-archive full crawl of SharePoint team sites should not fill the crawl queue in a way that delays incremental crawls of My Sites. Initialize-SPSearchArchitecture.ps1 with -Target Collaboration and -Target "My Sites" creates named databases in separate groups specifically to enforce this isolation.

# Collaboration crawl databases (run this first)
.\Initialize-SPSearchArchitecture.ps1 `
    -Target Collaboration `
    -SSAName "Enterprise Search Service" `
    -SQLServer "<YOUR-SQL-SERVER>\<INSTANCE>" `
    -CrawlDBCount 5 `
    -ContentSourceCount 5

# My Sites crawl databases
.\Initialize-SPSearchArchitecture.ps1 `
    -Target "My Sites" `
    -SSAName "Enterprise Search Service" `
    -SQLServer "<YOUR-SQL-SERVER>\<INSTANCE>" `
    -CrawlDBCount 2 `
    -ContentSourceCount 2

Content Processing Component Placement

Content Processing is the most CPU-intensive component in the search pipeline. It receives raw document bytes from the Crawl component, invokes protocol handlers and IFilters to extract text, runs entity extraction (people, locations, dates), applies managed property mappings, and writes enriched items to the Index component.

For a large enterprise farm, content processing must be on the App servers — not the Index servers. If you place Content Processing and Index on the same server:

  • Document parsing workloads compete with index write I/O.
  • Memory pressure from large document parsing can evict index pages from cache.
  • A runaway content processing job (for example, a malformed 2 GB PDF) can cause disk I/O spikes that degrade query latency.

Optimize-SPSearchInfrastructure.ps1 places Content Processing on every server listed in -AppServers. For 3 App servers, you get 3 Content Processing components running in parallel, distributing the parsing load.


Query Processing Component Placement

Query Processing is memory-bound and latency-sensitive. It receives query requests from web applications, applies query rules and scope transformations, issues index read requests to the Index components, assembles results, and applies the relevancy ranking model.

At this scale with multiple index partitions, Query Processing must fan out to all 12 partitions for every query. The latency budget for this fanout is measured in tens of milliseconds. Network distance between the Query Processing component and the Index components matters — keep them on the same network segment.

At minimum: 2 Query Processing components for HA. The script places one on each App server, giving you 3 for a 3-App-server deployment.
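
A small check can confirm that a component type really is spread over enough servers. This is a sketch: Test-ComponentRedundancy is a hypothetical helper, and matching on a Name such as QueryProcessingComponent1 assumes the default component naming, which you should confirm on your farm.

```powershell
# Sketch: does a component type run on at least N distinct servers?
# Works on any objects exposing Name and ServerName properties.
function Test-ComponentRedundancy {
    param(
        [Parameter(Mandatory)] [object[]] $Components,
        [Parameter(Mandatory)] [string] $NamePattern,
        [int] $MinimumServers = 2
    )
    $servers = @($Components |
        Where-Object { $_.Name -like $NamePattern } |
        Select-Object -ExpandProperty ServerName -Unique)
    return ($servers.Count -ge $MinimumServers)
}

# Usage on a farm server (assumes the SSA name used in this post):
# $ssa = Get-SPEnterpriseSearchServiceApplication -Identity "Enterprise Search Service"
# $components = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active |
#     Get-SPEnterpriseSearchComponent
# Test-ComponentRedundancy -Components $components -NamePattern 'QueryProcessing*'
```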


Running Optimize-SPSearchInfrastructure.ps1 for a Large Enterprise Farm

Prerequisites

  • Initialize-SPSearchArchitecture.ps1 has been run for both Collaboration and My Sites targets.
  • Index directories exist and are empty on all three index servers.
  • Search Service Instances are started on all 6 servers.
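
The index directory prerequisite is easy to verify up front. The helper below is a sketch with a hypothetical name, and the UNC paths in the usage comment are placeholders, since this post does not specify where your index directories live.

```powershell
# Sketch: returns $true only if the directory exists and is empty.
function Test-IndexDirectoryReady {
    param(
        [Parameter(Mandatory)] [string] $Path
    )
    if (-not (Test-Path -LiteralPath $Path -PathType Container)) {
        return $false
    }
    # -Force includes hidden and system entries, so leftovers are not missed.
    $entries = @(Get-ChildItem -LiteralPath $Path -Force)
    return ($entries.Count -eq 0)
}

# Usage (hypothetical paths; substitute your own index locations):
# '\\INDEX-SERVER-1\SearchIndex', '\\INDEX-SERVER-2\SearchIndex' |
#     ForEach-Object {
#         [pscustomobject]@{ Path = $_; Ready = (Test-IndexDirectoryReady -Path $_) }
#     }
```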

Parameter Walkthrough

.\Optimize-SPSearchInfrastructure.ps1 `
    -Target Collaboration `
    -IndexServers "<INDEX-SERVER-1>", "<INDEX-SERVER-2>", "<INDEX-SERVER-3>" `
    -AppServers "<APP-SERVER-1>", "<APP-SERVER-2>", "<APP-SERVER-3>" `
    -EnterpriseSSAName "Enterprise Search Service" `
    -SQLServer "<YOUR-SQL-SERVER>\<INSTANCE>" `
    -PartitionCount 12 `
    -LinkDBCount 3 `
    -PerformanceLevel Maximum

Parameter          Description
─────────────────  ──────────────────────────────────────────────────────────────────────
-Target            Collaboration, "My Sites", or Both; selects which SSA to optimize
-IndexServers      Dedicated index server names (Index components only)
-AppServers        App server names (Crawl, Query, Content Processing, Analytics)
-PartitionCount    Total number of index partitions across all index servers
-LinkDBCount       Number of Link databases (recommendation engine) to provision
-PerformanceLevel  Maximum for production farms; Reduced or PartlyReduced for development

What the Script Does Internally

  1. Clones the active topology.
  2. Distributes -PartitionCount index partitions across -IndexServers using round-robin.
  3. Places Crawl, Query, Content Processing, and Analytics components on every server in -AppServers.
  4. Places the Admin component on the first App server.
  5. Scales the Link database to -LinkDBCount instances.
  6. Activates the new topology.
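
Steps 3 and 4 can be sketched with the standard search topology cmdlets. This is not the script's actual code: Add-AppServerComponents is a hypothetical helper, and it does not remove components that the cloned topology already contains, which a real run would need to handle with Remove-SPEnterpriseSearchComponent.

```powershell
# Sketch of steps 3 and 4: add components to a cloned topology.
# Requires the SharePoint snap-in; nothing here activates the topology.
function Add-AppServerComponents {
    param(
        [Parameter(Mandatory)] $Topology,
        [Parameter(Mandatory)] [string[]] $AppServers
    )
    $isFirst = $true
    foreach ($server in $AppServers) {
        $instance = Get-SPEnterpriseSearchServiceInstance -Identity $server
        if ($isFirst) {
            # Step 4: Admin component on the first App server only.
            New-SPEnterpriseSearchAdminComponent `
                -SearchTopology $Topology -SearchServiceInstance $instance | Out-Null
            $isFirst = $false
        }
        # Step 3: the remaining component types on every App server.
        New-SPEnterpriseSearchCrawlComponent `
            -SearchTopology $Topology -SearchServiceInstance $instance | Out-Null
        New-SPEnterpriseSearchContentProcessingComponent `
            -SearchTopology $Topology -SearchServiceInstance $instance | Out-Null
        New-SPEnterpriseSearchAnalyticsProcessingComponent `
            -SearchTopology $Topology -SearchServiceInstance $instance | Out-Null
        New-SPEnterpriseSearchQueryProcessingComponent `
            -SearchTopology $Topology -SearchServiceInstance $instance | Out-Null
    }
}

# Usage on a farm server (commented out because it changes the topology):
# $ssa    = Get-SPEnterpriseSearchServiceApplication -Identity "Enterprise Search Service"
# $active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
# $clone  = New-SPEnterpriseSearchTopology -SearchApplication $ssa -Clone -SearchTopologyIdentity $active
# Add-AppServerComponents -Topology $clone -AppServers "<APP-SERVER-1>", "<APP-SERVER-2>", "<APP-SERVER-3>"
# Set-SPEnterpriseSearchTopology -Identity $clone   # step 6: activate
```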

Note: Large farms take significantly longer to activate than small farms. A 12-partition topology across 6 servers typically takes 10–25 minutes to fully activate. The script does not wait for activation to complete, so monitor progress via Central Admin → Search Topology.

Expected Timing

Farm Size                 Approximate Activation Time
────────────────────────  ───────────────────────────
2 servers, 1 partition    2–5 minutes
4 servers, 2 partitions   5–10 minutes
6 servers, 12 partitions  10–25 minutes

During activation, the existing topology remains active and continues serving queries. Activation is non-destructive to the running index.
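
Because the script returns before activation finishes, a polling loop is a reasonable alternative to watching Central Admin. The following is a sketch: Wait-ForCondition is a hypothetical generic helper, the default timeout and poll interval are arbitrary, and the usage comment assumes you still hold the $ssa and cloned topology objects from the activation step.

```powershell
# Sketch: poll a condition until it is true or a timeout expires.
# Returns $true if the condition was met, $false on timeout.
function Wait-ForCondition {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 1800,
        [int] $PollSeconds = 30
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ($true) {
        if (& $Condition) { return $true }
        if ((Get-Date) -ge $deadline) { return $false }
        Start-Sleep -Seconds $PollSeconds
    }
}

# Usage on a farm server (assumes $ssa and $clone are already set):
# $activated = Wait-ForCondition -Condition {
#     (Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Identity $clone).State -eq 'Active'
# }
# if (-not $activated) { Write-Warning 'Topology not active within the timeout.' }
```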


Post-Scale Validation

Verify All Index Partitions Are Active

$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "Enterprise Search Service"
Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active |
    Get-SPEnterpriseSearchComponent |
    Where-Object { $_.GetType().Name -like "*Index*" } |
    Select-Object Name, ServerName, IndexPartitionOrdinal, State

You should see 12 entries (one per partition) with State = Active, distributed across the three index servers.
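
Counting rows by eye gets error-prone at 12 partitions, so a small coverage check can help. This sketch relies only on the IndexPartitionOrdinal property selected in the query above; Get-MissingPartition is a hypothetical helper name.

```powershell
# Sketch: list partition ordinals that have no index component.
# An empty result means every expected partition is covered.
function Get-MissingPartition {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $IndexComponents,
        [Parameter(Mandatory)] [int] $ExpectedPartitionCount
    )
    if ($ExpectedPartitionCount -lt 1) {
        throw 'ExpectedPartitionCount must be at least 1.'
    }
    $present = @($IndexComponents | ForEach-Object { [int]$_.IndexPartitionOrdinal })
    0..($ExpectedPartitionCount - 1) | Where-Object { $present -notcontains $_ }
}

# Usage on a farm server (assumes $ssa from the query above):
# $index = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active |
#     Get-SPEnterpriseSearchComponent |
#     Where-Object { $_.GetType().Name -like "*Index*" }
# Get-MissingPartition -IndexComponents $index -ExpectedPartitionCount 12
```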

Monitor Crawl Database Queue Depth

Start a full crawl and monitor the crawl queue across databases:

$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "Enterprise Search Service"
$sources = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa
$sources | Select-Object Name, CrawlState, SuccessCount, ErrorCount, ItemCount

If one crawl database is growing while others are idle, the content source to crawl database mapping may not be balanced. Adjust using Set-SPEnterpriseSearchCrawlContentSource -CrawlDatabase.

Query Latency Baseline

After activating the new topology and completing a full crawl, use the Crawl Health Reports and Query Health Reports in Central Admin (Search Service Application → Diagnostics) to baseline crawl throughput and query latency. Compare metrics before and after the topology change to validate the improvement.


Summary

Scaling SharePoint Search beyond 80 million items requires separating the index workload from the crawl and content processing workloads onto dedicated hardware. The large farm reference architecture — 3 dedicated index servers hosting 12 partitions, 3 App servers hosting all other components — provides the I/O isolation and CPU headroom that prevents crawl activity from degrading query performance.

Optimize-SPSearchInfrastructure.ps1 in this post’s scripts/ folder implements this architecture with parameterised inputs. Run Initialize-SPSearchArchitecture.ps1 first to size the crawl databases correctly, then run the optimize script to redistribute the component topology.



Related Posts

  • Post #1: Designing the Right SharePoint Search Topology for Production SPSE Farms
  • Post #2: Deploying a Custom SharePoint Search Topology with PowerShell (End-to-End)
  • Post #4: Controlling SharePoint Crawl Performance: Impact Rules and Crawl Rules
  • Post #5: Federated Search in SPSE: Searching Across Multiple Search Service Applications
