How to Set Up Search Architecture in SharePoint Subscription Edition After Migration

Your databases are mounted. Sites are loading. Users are logging in. Then someone types a search query — and gets nothing.

This is not a failure. It is the expected state of every SharePoint migration the moment after cutover. The search index did not come with you. It cannot. It is not a database you can detach and reattach on the new farm — it is a set of binary corpus files built by a specific farm’s crawl component and tied entirely to that environment. Everything starts over.

This post covers what comes next: how to rebuild the Search Service Application correctly for SharePoint Subscription Edition, configure the right topology for your farm size, throttle the first crawl so it does not wreck your farm in the first 24 hours, remap your content sources after the database migration, and know how long to realistically expect the initial crawl to take.

All of this assumes you have completed the database migration and cutover work covered in Posts #7 and #8. The scripts referenced throughout this post are part of the Search Setup Bundle — seven production-ready PowerShell scripts that replace the manual, error-prone PowerShell work described here.


Why You Cannot Migrate the SharePoint Search Index

This needs to be stated plainly: the search index cannot be migrated. There is no supported path for copying or attaching a SharePoint search index from one farm to another, and no workaround that gets you close enough to be useful.

The index is stored as a set of binary files in the index partition directory on each server that hosts an index component, not in SQL. SharePoint does not expose those files as a transportable artifact. Even if you copied the index directory to your new farm, there is no supported way to attach it, and the component identifiers that tie the index to a specific topology would be invalid on the new farm.

The crawl database — stored in SQL — is technically portable, but it represents the crawl state of your old farm. The URLs, content source assignments, and crawl history in that database reference server names, web application URLs, and component IDs that no longer exist in the same form. The safe practice is to start with a clean crawl database for SPSE and use Get-SPSearchConfigExport.ps1 to export your SP2019 configuration as a reference before decommission. Use the export to compare, not to restore.

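Microsoft does document an upgrade path that restores the SP2019 Search Administration database into a new SPSE Search Service Application. That path carries over configuration such as content sources, crawl rules, and the search schema, but the index is still rebuilt by a full crawl either way. This post takes the clean-build route, which means recreating the following from scratch: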

  • The Search Service Application itself
  • Crawl databases — created fresh on the new SQL server
  • All content sources — URLs, crawl schedules, and authentication
  • Crawl rules — exclusions and inclusion overrides
  • Managed property mappings (if you had custom ones in SP2019)
  • Result sources and query transforms (if business units depended on scoped search)

This is a known reset point in every SharePoint migration. Frame it that way with your stakeholders before go-live — not as a gap in the migration, but as the expected starting position for post-migration search operations. Search quality will improve daily as the crawl progresses.


Understanding SPSE Search Architecture: Components You Need to Know

Before you configure topology, you need a working model of what you are configuring. SPSE search is built from six distinct components, all managed within a single Search Service Application. Each component has a specific role in the crawl-to-query pipeline.

Search Service Application (SSA)
The SSA is the container for the entire search system. It holds the topology definition, the crawl databases, the managed property schema, the content sources, and the crawl rules. You will have one SSA per farm in standard deployments. Everything else — components, databases, index partitions — hangs off this object.

Crawl Component
The crawl component connects to your content sources (SharePoint web applications, file shares, external systems), fetches content, and feeds it downstream. It is the most resource-intensive component during the initial full crawl — CPU, network, and SQL I/O all spike on the crawl server during an active full crawl. In multi-server topologies, the crawl component should be on a dedicated application server, not co-located with query processing or web front ends.

Content Processing Component
Once the crawl component retrieves content, the content processing component takes over: it extracts text, parses metadata, applies managed property mappings, and prepares items for indexing. This component is CPU-bound during heavy crawl operations. Scale it horizontally in medium and large farms.

Analytics Processing Component
The analytics processing component handles usage analytics and search relevance signals, such as click-through data, view frequency, and the usage data that feeds relevance and recommendations. In a fresh migration, this component starts with no historical data. Relevance will normalize over several weeks as usage patterns accumulate.

Index Component (Index Partition)
The index partition stores the full-text search index — the processed, queryable representation of all crawled content. In small farms, a single partition is standard. In medium and large farms, multiple partitions distribute the index load and allow for replication. If you want high availability for search, index partitions must be replicated — primary and replica on separate physical servers. One partition failure on an unreplicated topology means search returns no results until the partition recovers.

Query Processing Component
When a user submits a search query, the query processing component handles it. It performs linguistic processing such as word breaking and stemming, applies query rules and result source logic, sends the query to the index, and returns the results. In large topologies, place query processing close to the web front ends to reduce query latency.

Admin Component
The admin component manages the Search Service Application itself — it coordinates topology changes, component health monitoring, and SSA-level operations. There is always exactly one active admin component per SSA. It is lightweight and can safely share a server with other components in small farms.

These components are configured using the SPSE topology object model: New-SPEnterpriseSearchTopology creates the topology object, individual New-SPEnterpriseSearchCrawlComponent, New-SPEnterpriseSearchIndexComponent, and related cmdlets add components to it, and Set-SPEnterpriseSearchTopology activates it. The scripts in the Search Setup Bundle wrap this model so you do not have to manage the cmdlet chain manually.
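Here is a minimal sketch of that cmdlet chain. It is written in Python so the sequencing can be checked without a farm, and it only generates PowerShell text for a plan that maps server names to component roles. The plan format, server names, and index path are hypothetical, and this is not how the Search Setup Bundle is implemented. The emitted cmdlets are the standard SharePoint search topology cmdlets.

```python
"""Generate the PowerShell cmdlet chain for a search topology (text only)."""

# Component role -> SharePoint cmdlet that adds that component to a topology.
ROLE_CMDLETS = {
    "admin": "New-SPEnterpriseSearchAdminComponent",
    "crawl": "New-SPEnterpriseSearchCrawlComponent",
    "content_processing": "New-SPEnterpriseSearchContentProcessingComponent",
    "analytics": "New-SPEnterpriseSearchAnalyticsProcessingComponent",
    "query": "New-SPEnterpriseSearchQueryProcessingComponent",
}


def build_topology_script(plan: dict[str, list[str]], index_root: str) -> list[str]:
    """Return PowerShell lines that create, populate, and activate a new topology.

    plan maps server name -> roles; index roles are written "index:<partition>".
    Unknown roles raise KeyError. Assumes a single SSA in the farm, so
    Get-SPEnterpriseSearchServiceApplication returns exactly one object.
    """
    lines = [
        "$ssa = Get-SPEnterpriseSearchServiceApplication",
        "$topo = New-SPEnterpriseSearchTopology -SearchApplication $ssa",
    ]
    for n, (server, roles) in enumerate(plan.items()):
        si = f"$si{n}"
        lines.append(f'{si} = Get-SPEnterpriseSearchServiceInstance -Identity "{server}"')
        lines.append(f"Start-SPEnterpriseSearchServiceInstance -Identity {si}")
        for role in roles:
            if role.startswith("index:"):
                partition = int(role.split(":", 1)[1])
                lines.append(
                    f"New-SPEnterpriseSearchIndexComponent -SearchTopology $topo "
                    f"-SearchServiceInstance {si} -IndexPartition {partition} "
                    f'-RootDirectory "{index_root}"'
                )
            else:
                lines.append(f"{ROLE_CMDLETS[role]} -SearchTopology $topo -SearchServiceInstance {si}")
    # The new topology does nothing until it is activated, so this comes last.
    lines.append("Set-SPEnterpriseSearchTopology -Identity $topo")
    return lines


if __name__ == "__main__":
    single_server = {
        "SPSE-APP01": ["admin", "crawl", "content_processing", "analytics", "query", "index:0"],
    }
    print("\n".join(build_topology_script(single_server, r"E:\SearchIndex")))
```

Generating the script first lets you review it before anything touches the farm. In practice you would also wait for each search service instance to report Online before adding components, and this sketch does not emit that wait.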


Recommended SPSE Search Topology for Small, Medium, and Large Farms

The right topology depends on your content volume and the number of servers available. This table is the reference — use it in design documents, infrastructure reviews, and pre-migration planning.

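| Farm Size | Crawl | Content Processing | Analytics | Query | Admin | Index Partitions | Recommended for |
|---|---|---|---|---|---|---|---|
| Small (1–2 WFE) | 1 | 1 | 1 | 1 | 1 | 1 | Dev/test or <1M items |
| Medium (3–4 WFE) | 2 | 2 | 1 | 2 | 1 | 1–2 | Production up to 10M items |
| Large (5+ WFE) | 4 | 4 | 2 | 4 | 1 | 2–4 | Enterprise >10M items |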
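You can encode the table as data so a planning script picks a starting tier from item volume. This Python sketch transcribes the table's component counts. Its thresholds use only the item counts in the "Recommended for" column, so treat the result as a starting point that server count and HA requirements may override. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopologyTier:
    name: str
    crawl: int
    content_processing: int
    analytics: int
    query: int
    admin: int
    index_partitions: tuple[int, int]  # (minimum, maximum) from the table


# Component counts transcribed from the topology table.
SMALL = TopologyTier("Small", 1, 1, 1, 1, 1, (1, 1))
MEDIUM = TopologyTier("Medium", 2, 2, 1, 2, 1, (1, 2))
LARGE = TopologyTier("Large", 4, 4, 2, 4, 1, (2, 4))


def recommend_tier(item_count: int) -> TopologyTier:
    """Starting tier by content volume: <1M Small, up to 10M Medium, above that Large."""
    if item_count < 1_000_000:
        return SMALL
    if item_count <= 10_000_000:
        return MEDIUM
    return LARGE


if __name__ == "__main__":
    tier = recommend_tier(4_500_000)
    print(tier.name, tier.crawl, tier.index_partitions)
```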

A few architectural notes that matter in practice:

In medium and large topologies, the crawl component and index partitions should be on dedicated search servers — not on web front ends or SQL boxes. Co-locating the crawl component on a WFE means the first full crawl competes with user requests for CPU and network. That is the scenario that generates performance tickets on day two of go-live.

For high availability in production, each index partition needs a primary and a replica on physically separate servers. If a single partition host goes down on an unreplicated topology, search results go dark immediately. The replica keeps results available during maintenance or failure.
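A quick way to catch an unreplicated partition in a design document or topology export is to check that every partition number appears on at least two distinct servers. This Python sketch assumes you already have (partition, server) pairs for the index components. It compares server names only, so two VMs on the same physical host would still pass.

```python
from collections import defaultdict


def unreplicated_partitions(placements: list[tuple[int, str]]) -> list[int]:
    """Return partition numbers not hosted on at least two distinct servers.

    placements holds one (partition, server) pair per index component.
    Server names are compared case-insensitively.
    """
    servers: dict[int, set[str]] = defaultdict(set)
    for partition, server in placements:
        servers[partition].add(server.strip().upper())
    return sorted(p for p, hosts in servers.items() if len(hosts) < 2)


if __name__ == "__main__":
    design = [(0, "SPSE-SRCH01"), (0, "SPSE-SRCH02"), (1, "SPSE-SRCH01")]
    print(unreplicated_partitions(design))  # partition 1 has no replica
```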

For crawl database placement, isolate crawl databases on a SQL instance separate from your content databases where possible. An active full crawl generates continuous write I/O to the crawl database — placing it on the same SQL instance as your content databases introduces I/O contention that affects both search and general farm performance.

Initialize-SPSearchArchitecture.ps1 defaults to a small/single-server topology and accepts parameters to scale up. For medium and large farms, review the topology table before running the script and pass the appropriate component counts.


How Initialize-SPSearchArchitecture.ps1 Sets Up the Search Foundation

This script handles the initial build-out: creating the Search Service Application, provisioning crawl databases, and setting up the default content source. It is the first script you run when standing up search on the new farm.

What it does, in sequence:

  1. Creates the SSA with a specified name and database server
  2. Creates one or more crawl databases and associates them with the SSA
  3. Creates the default “Local SharePoint Sites” content source and adds the target web application URLs
  4. Configures the initial crawl schedule — full crawl on a weekly schedule, incremental every 15 minutes
  5. Initializes the index partition on the designated search server at the specified index path

The $IndexLocation parameter matters more than it looks. The index should always be on a dedicated, fast disk — separate from the OS volume and separate from SQL data volumes. Placing the index on the C: drive or on a shared data disk is a common misconfiguration that degrades query performance once the index reaches meaningful size. Specify a dedicated SSD volume when your infrastructure allows it.
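A pre-flight check for the index path takes only a few lines. This Python example is hypothetical and not part of Initialize-SPSearchArchitecture.ps1. It assumes you know the OS drive and the drive letters that hold SQL data files, and it compares drive letters only, so two volumes carved from the same physical disk would not be caught.

```python
from pathlib import PureWindowsPath


def index_location_problems(index_path: str, os_drive: str = "C:",
                            sql_data_drives: frozenset[str] = frozenset()) -> list[str]:
    """Return reasons the proposed index path is a poor choice (empty list = OK)."""
    path = PureWindowsPath(index_path)
    drive = path.drive.upper()
    problems: list[str] = []
    if not path.is_absolute():
        problems.append("index path must be absolute, including a drive letter")
    if drive and drive == os_drive.upper():
        problems.append(f"index is on the OS volume {drive}")
    if drive and drive in {d.upper() for d in sql_data_drives}:
        problems.append(f"index shares volume {drive} with SQL data files")
    return problems


if __name__ == "__main__":
    for candidate in (r"C:\SearchIndex", r"E:\SearchIndex"):
        print(candidate, index_location_problems(candidate, sql_data_drives=frozenset({"D:"})))
```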

The script is written to be re-run safely: it checks for an existing SSA before attempting creation and exits cleanly if one is already present. This makes it safe to run during phased setup or if the initial run is interrupted.

.\Initialize-SPSearchArchitecture.ps1 `
-SearchServerName "SPSE-APP01" `
-CrawlDatabaseServer "SQL-PROD" `
-ContentSourceName "SPSE-LocalSites" `
-CrawlSchedule "Daily"

Once the SSA and content sources exist, the next decision is when to start the crawl. The answer is: not yet. Apply the crawl impact settings first. An unthrottled crawl on a freshly built farm is one of the most reliable ways to generate a performance incident on go-live day.


Deploying Everything at Once with Deploy-SPSE-Search-Optimized.ps1

In a live migration window, running Initialize-SPSearchArchitecture.ps1 and then manually running Set-SPSearchCrawlImpact.ps1 with a separate parameter set introduces a window where the SSA is configured but crawl throttling is not yet applied. If someone triggers a crawl manually or a scheduled crawl fires, you are crawling without impact limits.

Deploy-SPSE-Search-Optimized.ps1 eliminates that gap. It chains initialization, crawl impact configuration, and crawl rules into a single idempotent deployment run.

The execution sequence:

  1. Accepts a unified configuration block — server names, SQL instance, crawl impact settings, and crawl rule definitions
  2. Initializes the SSA and crawl databases via Initialize-SPSearchArchitecture.ps1 logic
  3. Immediately applies crawl impact settings before any crawl can start
  4. Applies crawl exclusion rules in the same pass
  5. Validates the topology using Get-SPEnterpriseSearchTopology and outputs a summary: components created, crawl databases provisioned, impact settings applied, topology status

.\Deploy-SPSE-Search-Optimized.ps1 `
-SearchServer "SPSE-APP01" `
-SqlServer "SQL-PROD" `
-ApplyCrawlThrottle `
-ApplyCrawlRules `
-Verbose

This is the script to run during the migration window itself. One call, validated output, no manual sequencing. The -Verbose flag gives you a log of every step for post-migration audit and documentation.
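The property that matters can be modelled simply: a crawl may start only after the crawl impact step has completed, and re-running the deployment never repeats a step. In this Python sketch, stopping after the first step reproduces the gap you get when the scripts are run separately. The class and step names are hypothetical. The sketch illustrates the ordering guarantee and does not reflect the script's internals.

```python
from typing import Optional


class SearchDeployment:
    """Hypothetical model: crawling is allowed only after crawl impact settings exist."""

    STEPS = ("initialize", "apply_crawl_impact", "apply_crawl_rules", "validate")

    def __init__(self) -> None:
        self.completed: list[str] = []

    def run(self, stop_after: Optional[str] = None) -> list[str]:
        """Run steps in order; already-completed steps are skipped (idempotent)."""
        for step in self.STEPS:
            if step not in self.completed:
                self.completed.append(step)
            if step == stop_after:
                break
        return list(self.completed)

    def can_start_crawl(self) -> bool:
        # The whole point of the combined run: no crawl before throttling is in place.
        return "apply_crawl_impact" in self.completed


if __name__ == "__main__":
    d = SearchDeployment()
    d.run(stop_after="initialize")  # the separate-scripts gap
    print(d.can_start_crawl())
    d.run()                         # the single combined run
    print(d.can_start_crawl())
```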


Why You Must Throttle the First Crawl

If you have never watched an unthrottled first crawl on a large farm, the pattern goes like this: the crawl starts at 9 AM with default settings, SQL I/O climbs steadily, and by 11 AM the crawl database server is at 80–95% I/O utilization and performance alerts start arriving. The web front ends slow down as the crawl component floods the content databases with requests. Users notice. Tickets open.

This is a preventable problem. The crawl component, by default, requests content as fast as the farm responds — no natural rate limit. On a freshly migrated farm that is already under load from users logging in for the first time, that default behaviour is a denial of service against your own infrastructure.

Set-SPSearchCrawlImpact.ps1 configures four throttling levers:

Hit rate rules: how hard the crawler hits a given host. SharePoint's native crawler impact rules express this as the number of documents requested from a host at a time, or as one document at a time with a wait between requests. Keeping it to 1–4 simultaneous requests during go-live prevents the crawl from saturating web application response queues.

Download threshold: the maximum document size the crawler will fetch. If it is raised well above SharePoint's 64 MB default, the crawl component will attempt to download 500 MB video files from document libraries, stalling on large transfers and blocking the crawl queue. A 64 MB threshold filters out bulk binary content without excluding typical Office documents.

Socket timeout: how long the crawler waits for a response before marking a URL as inaccessible. Lowering it on go-live day stops stalled connections from blocking the crawl for minutes at a time.

Retry delay: how long the crawler waits before retrying a failed URL. A sensible retry delay stops the crawl from hammering an intermittently slow endpoint.

.\Set-SPSearchCrawlImpact.ps1 `
-MaxRequestsPerSecond 4 `
-DownloadTimeout 60 `
-SocketTimeout 30

The recommended two-phase approach:

Phase 1 — Go-live week: Throttled crawl at 1–4 simultaneous requests per host, background crawl priority. The crawl runs slowly, search results appear progressively, and farm performance stays stable. Tell stakeholders that search quality will improve over the first several days.

Phase 2 — Post-stabilization: Once the initial full crawl completes and the farm is stable, run Set-SPSearchCrawlImpact.ps1 again with relaxed parameters. Incremental crawls are much lighter than full crawls, so the daily operational impact drops significantly after Phase 1.
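You can capture the four levers and the two phases as named profiles, so the Phase 2 switch is a deliberate change rather than a hand-edited parameter list. In this Python sketch, the hit-rate lever is modelled as simultaneous requests per host, which is how SharePoint's crawler impact rules express it. The Phase 1 values follow the text and the Set-SPSearchCrawlImpact.ps1 example. The retry delay and every Phase 2 value are placeholders, not recommendations from this post.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class CrawlImpactProfile:
    """One set of values for the four throttling levers (illustrative model)."""

    simultaneous_requests: int  # hit rate: documents requested from a host at a time
    max_download_bytes: int     # download threshold: larger documents are skipped in this model
    socket_timeout_s: int       # wait for a response before marking a URL inaccessible
    retry_delay_s: int          # wait before retrying a failed URL

    def will_fetch(self, size_bytes: int) -> bool:
        return size_bytes <= self.max_download_bytes


# Phase 1: go-live values from the text; the retry delay is a placeholder.
GO_LIVE = CrawlImpactProfile(simultaneous_requests=4, max_download_bytes=64 * MB,
                             socket_timeout_s=30, retry_delay_s=300)

# Phase 2: placeholder "relaxed" values; tune them to your own farm.
POST_STABILIZATION = CrawlImpactProfile(simultaneous_requests=8, max_download_bytes=64 * MB,
                                        socket_timeout_s=60, retry_delay_s=300)


def profile_for(initial_full_crawl_done: bool, farm_stable: bool) -> CrawlImpactProfile:
    """Stay on the go-live profile until both Phase 2 conditions hold."""
    if initial_full_crawl_done and farm_stable:
        return POST_STABILIZATION
    return GO_LIVE
```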


Crawl Rules: What to Exclude and How Set-SPSearchCrawlRules.ps1 Manages It

The default “Local SharePoint Sites” content source attempts to crawl everything it can reach — including sites you do not want in end-user search results. Without explicit exclusion rules, Central Administration pages, transitional read-only sites from the migration, old host headers, and app catalog infrastructure all show up in the crawl queue.

Get crawl rules right before the first crawl starts. Fixing them mid-crawl means the crawler has already invested time fetching content you do not want, and those items stay in the index until the next full crawl recycles them.

Common exclusions after a SPSE migration:

| Rule type | Example path | Reason |
|---|---|---|
| Exclude | http://centraladmin:port/* | Central Admin pages add noise |
| Exclude | http://old-url.contoso.com/* | Old host headers are no longer valid |
| Exclude | */_layouts/15/* | System pages with no content value |
| Exclude | */SiteAssets/Forms/* | Library form and view pages, not searchable content |
| Include override | http://newurl.contoso.com/* | Force-include migrated host-named site collections |

Set-SPSearchCrawlRules.ps1 accepts a list of URL patterns and rule types and applies them to the SSA in bulk. It also accepts input from a CSV file for farms with many host-header site collections — useful when you have 30 or 40 exclusion patterns coming out of the migration inventory.
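For the CSV route, a loader that validates every row before anything is applied keeps a typo from turning into a half-applied rule set. The Path and Type column names in this Python sketch are assumptions, because the bundle script's actual CSV layout is not documented in this post. The loader preserves file order because crawl rules are evaluated in order.

```python
import csv
import io

VALID_TYPES = {"Exclude", "Include"}


def load_crawl_rules(csv_text: str) -> list[tuple[str, str]]:
    """Parse hypothetical 'Path,Type' rows into (path, type) pairs, keeping file order.

    Raises ValueError on the first bad row so nothing is applied from a broken file.
    """
    rules: list[tuple[str, str]] = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        path = (row.get("Path") or "").strip()
        rule_type = (row.get("Type") or "").strip().capitalize()
        if not path or rule_type not in VALID_TYPES:
            raise ValueError(f"line {line_no}: invalid crawl rule {row!r}")
        rules.append((path, rule_type))
    return rules


if __name__ == "__main__":
    sample = "Path,Type\nhttp://old-url.contoso.com/*,Exclude\n*/_layouts/15/*,exclude\n"
    for rule in load_crawl_rules(sample):
        print(rule)
```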

.\Set-SPSearchCrawlRules.ps1 `
-MappingFile ".\content-source-mapping.md" `
-Verbose

The patterns use standard SharePoint crawl rule wildcard syntax. If you are not sure which URLs to exclude, check the crawl log after the first partial crawl run — the noise quickly becomes visible as clusters of errors or irrelevant items from specific URL prefixes.
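Because SharePoint applies the first crawl rule that matches a URL, rule order matters as much as the patterns themselves. This Python sketch approximates wildcard matching with fnmatch so you can dry-run a rule list against sample URLs from the crawl log before applying it. It is not SharePoint's matcher. It lower-cases both sides, which assumes the rules are not set to match case, and it does not model regular-expression rules.

```python
from fnmatch import fnmatchcase


def evaluate_url(url: str, rules: list[tuple[str, str]]) -> str:
    """Return the type of the first rule whose pattern matches url.

    rules are (pattern, "Include" | "Exclude") in evaluation order. A URL that
    matches no rule returns "Default" (crawled if it is inside a content source).
    fnmatch's '*' also spans '/', which suits patterns like */_layouts/15/*.
    """
    candidate = url.lower()
    for pattern, rule_type in rules:
        if fnmatchcase(candidate, pattern.lower()):
            return rule_type
    return "Default"


if __name__ == "__main__":
    rules = [("*/_layouts/15/*", "Exclude"), ("http://newurl.contoso.com/*", "Include")]
    print(evaluate_url("http://newurl.contoso.com/_layouts/15/start.aspx", rules))
```

If the include override is listed first, it re-admits the _layouts pages on the new host. Listing the exclusion first keeps them out.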


Re-mapping Site Collections to Content Sources with Sync-SPSearchContentMapping.ps1

In SP2019, it was common to create separate content sources for different business units, departments, or site collection groups — HR on one content source with a daily crawl, Finance on another with a two-hour incremental, Archive on a background-only schedule. That structure provided scheduling isolation, crawl performance control, and the ability to scope result sources per content source.

After a database migration, those site collections may have moved. New web application URLs, reorganized content databases, renamed managed paths — any of these changes means the original content source URLs no longer match where those sites actually live on the new farm. The content sources need to be rebuilt to reflect post-migration reality.

Sync-SPSearchContentMapping.ps1 reads a Markdown mapping file that documents the post-migration site-to-content-source assignments and applies them programmatically. The mapping file format is a simple table:

| Site URL | Content Source |
|---|---|
| https://intranet.contoso.com | SPSE-Intranet |
| https://projects.contoso.com | SPSE-Projects |
| https://legacy.contoso.com | SPSE-Archive |

The script reads the table, validates that each content source exists (or creates it if it does not), and adds the site URL to the correct content source in the SSA. Sites that cannot be resolved generate warnings rather than errors, so partial runs do not abort on a single missing site collection.

The Markdown format is intentional. If you followed earlier posts in this series, your database migration mapping was already documented in a Markdown table — the same format used for Sync-SPDatabaseMigration.ps1 in Post #7. The content source mapping can be an extension of that same document, adding a “Content Source” column to the existing site/database inventory. No translation step. The same Git-committed document that drove the database migration drives the search configuration.
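Here is a sketch of the parsing side of that workflow in Python. It locates the Site URL and Content Source columns by header name, so the same logic works when those columns sit inside a wider migration inventory. It then groups sites by content source and collects warnings for rows that do not parse instead of aborting. This is an illustrative sketch, not the script's code, and it leaves out creating content sources in the SSA.

```python
from collections import defaultdict
from urllib.parse import urlparse


def parse_mapping(markdown: str) -> tuple[dict[str, list[str]], list[str]]:
    """Group site URLs by content source from a Markdown table.

    Columns are located by the header names "Site URL" and "Content Source",
    so extra inventory columns are ignored. Bad rows become warnings.
    """
    mapping: dict[str, list[str]] = defaultdict(list)
    warnings: list[str] = []
    url_col = source_col = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # prose around the table
        cells = [c.strip() for c in line.strip("|").split("|")]
        if url_col is None:
            if "Site URL" in cells and "Content Source" in cells:
                url_col, source_col = cells.index("Site URL"), cells.index("Content Source")
            continue  # header row (or a table without the expected headers)
        if all(c and set(c) <= set("-: ") for c in cells):
            continue  # separator row such as |---|---|
        if len(cells) <= max(url_col, source_col):
            warnings.append(f"short row skipped: {line}")
            continue
        url, source = cells[url_col], cells[source_col]
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc or not source:
            warnings.append(f"unresolvable row skipped: {line}")
            continue
        mapping[source].append(url)
    return dict(mapping), warnings
```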

.\Sync-SPSearchContentMapping.ps1 `
-MappingFile ".\content-source-mapping.md" `
-Verbose

A concrete example of why this matters: the HR intranet moved from http://sp2019.contoso.com/sites/hr (a managed path site on the old farm) to https://intranet.contoso.com (a dedicated web application on the new farm). The old content source pointed at the managed path. After migration, that URL returns a redirect or a 404. HR staff search for their own content and find nothing. Sync-SPSearchContentMapping.ps1 fixes this in a single run once the mapping file reflects the new URL structure.
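Building the mapping file often starts from a list of old URLs pulled from crawl logs or the SP2019 export. A longest-prefix rewrite turns them into new-farm URLs and leaves unmapped ones visible. This Python sketch uses the HR example above with a hypothetical prefix map as input. Matching stops at path boundaries, so /sites/hr does not capture /sites/hrarchive.

```python
from typing import Optional


def remap_url(url: str, prefix_map: dict[str, str]) -> Optional[str]:
    """Rewrite an old-farm URL using the longest matching old prefix.

    Returns None when no prefix matches, so unmapped URLs surface as gaps.
    Prefix comparison is case-insensitive; the URL's remaining path is kept.
    """
    lowered = url.lower()
    best: Optional[str] = None
    for old in prefix_map:
        old_norm = old.lower().rstrip("/")
        # Match the whole prefix or a prefix followed by "/", never a partial segment.
        if lowered == old_norm or lowered.startswith(old_norm + "/"):
            if best is None or len(old_norm) > len(best.rstrip("/")):
                best = old
    if best is None:
        return None
    suffix = url[len(best.rstrip("/")):]
    return prefix_map[best].rstrip("/") + suffix


if __name__ == "__main__":
    moves = {"http://sp2019.contoso.com/sites/hr": "https://intranet.contoso.com"}
    print(remap_url("http://sp2019.contoso.com/sites/hr/Pages/Benefits.aspx", moves))
```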


Export Your SP2019 Search Configuration Before Decommission

Before the SP2019 farm goes dark, export its search configuration. Once the farm is decommissioned, this reference is gone.

Get-SPSearchConfigExport.ps1 produces a structured export containing:

  • All content sources — names, start addresses, crawl schedules, and authentication methods
  • All crawl rules — inclusions, exclusions, and URL patterns
  • Managed property mappings — property names, types, aliases, and crawled property mappings
  • Result sources and their query transforms
  • Search topology summary — component placement and index partition count

The primary use case is validation after the SPSE setup is complete. Run the export on SP2019 before cutover, commit it to Git alongside the migration artifacts, then re-run on SPSE after search is configured. Diff the two outputs to identify anything that was not recreated — missing managed properties, dropped result sources, or content sources that were renamed inconsistently.

Managed property mappings are the easiest thing to lose silently. If a business-critical result source or a search-driven web part depended on a custom managed property that was not recreated in SPSE, it stops working without error. The search results just do not return that content type. The export makes the gap visible before users report it.

.\Get-SPSearchConfigExport.ps1 `
-OutputPath "C:\Migration\SearchConfig-$(Get-Date -Format yyyyMMdd).xml"

Run it on SP2019 before cutover. Run it again on SPSE after setup. The date-stamped filename makes the before/after comparison straightforward.
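Once both exports are normalized into names per category, the gap check is a set difference. The category names and the dict-of-sets shape in this Python sketch are assumptions, because the XML layout of the Get-SPSearchConfigExport.ps1 output is not documented here. In practice, a small parsing step would sit in front of this function. Names are compared case-insensitively, on the assumption that a change in capitalization alone is not a real gap.

```python
def missing_after_migration(before: dict[str, set[str]],
                            after: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return, per category, the names found in the SP2019 export but not in SPSE.

    Names are compared case-insensitively; the original SP2019 spelling is reported.
    """
    gaps: dict[str, list[str]] = {}
    for category, names in before.items():
        present = {n.lower() for n in after.get(category, set())}
        missing = sorted(n for n in names if n.lower() not in present)
        if missing:
            gaps[category] = missing
    return gaps


if __name__ == "__main__":
    sp2019 = {
        "ManagedProperties": {"ProjectCode", "Department"},
        "ResultSources": {"HR Documents"},
    }
    spse = {"ManagedProperties": {"department"}, "ResultSources": {"HR Documents"}}
    print(missing_after_migration(sp2019, spse))  # {'ManagedProperties': ['ProjectCode']}
```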


Crawl Timeline Estimates and How to Monitor Progress

Most administrators underestimate the initial crawl duration. The common assumption is “it should be done overnight.” For anything over 2 million items with appropriate throttling applied, that assumption is usually wrong by a factor of two or three.

Here are realistic estimates based on item volume and throttle settings:

| Content volume | Throttled crawl speed | Estimated first crawl duration | Notes |
|---|---|---|---|
| 500K items | ~2K items/min | 4–6 hours | Typical small intranet |
| 2M items | ~2K items/min | 16–20 hours | Medium farm, overnight recommended |
| 5M items | ~3K items/min | 24–36 hours | Start Friday evening |
| 10M items | ~3K items/min | 55–70 hours | Plan a long weekend |
| 25M items | ~4K items/min | 100+ hours | Staged crawl approach recommended |
| 50M+ items | ~4–5K items/min | 200+ hours | Multiple index partitions, staggered start |

These are estimates. Actual crawl duration depends on average document size (a farm heavy with large PDF and Office files crawls slower than one with small HTML pages), SQL I/O throughput on the crawl database server, SharePoint response time under concurrent user load, and network latency between the crawl component and content sources. Use these numbers to set stakeholder expectations — not to guarantee a completion time.
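The table rows are simple arithmetic: items divided by a steady crawl rate. This Python helper reproduces that calculation and adds an optional margin for the variability described above. You can plug in the rate you actually observe during the first hours of your own crawl instead of the table's rates. The function name is illustrative.

```python
def estimate_crawl_hours(items: int, items_per_minute: float, margin: float = 0.0) -> float:
    """Hours for a first full crawl at a steady rate, plus an optional margin (0.25 = +25%)."""
    if items_per_minute <= 0:
        raise ValueError("items_per_minute must be positive")
    return items / items_per_minute / 60 * (1 + margin)


if __name__ == "__main__":
    for items, rate in [(500_000, 2_000), (2_000_000, 2_000), (10_000_000, 3_000)]:
        print(f"{items:>11,} items at {rate:,}/min: {estimate_crawl_hours(items, rate):.1f} h")
```

At 2K items/min, 2M items works out to about 16.7 hours, and a 20% margin takes it to 20 hours, consistent with the 16–20 hour row.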

Set the expectation with business stakeholders before go-live: search results will appear progressively over the first several days. The initial crawl does not prioritize the content users care about most. If active sites matter more than archives, give them their own content source and start it first, and let archive content follow. This is the right framing for users who notice gaps in search results during the first week.

Monitoring the Crawl

Three monitoring approaches for different levels of detail:

Central Administration → Search Service Application → Crawl Log
The Crawl Log shows items crawled, items with warnings, and items with errors — broken down by content source. Check it at 24-hour intervals during the initial crawl. An error rate above 5% of total items early in the crawl usually indicates a content source authentication problem or an unreachable URL, not a crawl performance issue. Investigate before the crawl proceeds further.

PowerShell — real-time crawl status

Get-SPEnterpriseSearchCrawlContentSource -SearchApplication "Search Service Application" |
Select-Object Name, CrawlState, CrawlCompleted, SuccessCount, ErrorCount

This gives you the current crawl state (for example Idle, CrawlingFull, CrawlingIncremental, or Paused) and the success and error counts for the current or most recent crawl of each content source, without navigating to Central Admin.

SQL — direct crawl database query
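For large farms where Central Admin becomes slow to render the Crawl Log (a common problem above 10 million items), you can query the MSSCrawlURL table in the crawl database directly. Keep the query read-only and run it off-peak. Microsoft does not support direct queries against SharePoint databases, and the crawl database schema is undocumented, so confirm the column names against your own crawl database first: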

SELECT CrawlState, COUNT(*) AS ItemCount
FROM MSSCrawlURL
GROUP BY CrawlState
ORDER BY ItemCount DESC

This gives you a real-time item count per crawl state (pending, succeeded, failed, or excluded, depending on how your crawl database encodes the values) without depending on the Central Admin UI. At scale, the Crawl Log UI cannot match it.

Check crawl progress at the 24-hour and 48-hour marks after the initial full crawl starts. If the error count is growing faster than the success count, stop and diagnose before continuing.
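The two checks in this section can be expressed as a small function: the 5% error-rate threshold, and errors outgrowing successes between the 24-hour and 48-hour marks. In this Python sketch, the snapshot values would come from the SuccessCount and ErrorCount properties read with the PowerShell command earlier in this section. The dataclass and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlSnapshot:
    hours_since_start: int
    success_count: int
    error_count: int


def crawl_warnings(earlier: CrawlSnapshot, later: CrawlSnapshot,
                   max_error_rate: float = 0.05) -> list[str]:
    """Apply the 5% error-rate rule and the error-growth rule to two snapshots."""
    warnings: list[str] = []
    processed = later.success_count + later.error_count
    if processed:
        rate = later.error_count / processed
        if rate > max_error_rate:
            warnings.append(
                f"error rate {rate:.1%} exceeds {max_error_rate:.0%}: "
                "check content source authentication and unreachable URLs"
            )
    new_successes = later.success_count - earlier.success_count
    new_errors = later.error_count - earlier.error_count
    if new_errors > new_successes:
        warnings.append(
            f"errors grew by {new_errors} but successes by {new_successes} between "
            f"hour {earlier.hours_since_start} and hour {later.hours_since_start}: stop and diagnose"
        )
    return warnings


if __name__ == "__main__":
    day1 = CrawlSnapshot(24, 100_000, 1_000)
    day2 = CrawlSnapshot(48, 120_000, 30_000)
    for w in crawl_warnings(day1, day2):
        print(w)
```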


Set Up SPSE Search Right the First Time — No Trial and Error

The scripts in this post are packaged for production use, not as demos or templates. Each script includes parameter validation, idempotency checks, and verbose logging so every run is auditable. Even if you run this search setup only once, on a single farm, they will save you hours of manual PowerShell work.

Get the Search Setup Bundle (Initialize-SPSearchArchitecture.ps1, Deploy-SPSE-Search-Optimized.ps1, Set-SPSearchCrawlImpact.ps1, Set-SPSearchCrawlRules.ps1, Sync-SPSearchContentMapping.ps1, Optimize-SPSearchInfrastructure.ps1, Get-SPSearchConfigExport.ps1) — every script parameterized, commented, and ready for enterprise use.

Contact sudharsan_1985@live.in to get the scripts.


What Comes Next

Post #11 covers post-migration validation — confirming that content databases, service applications, user profiles, and search are all functioning correctly before the migration is formally closed out. The search crawl you started with this post will still be running during those first validation passes. That is expected. The validation scripts in Post #11 will tell you where it stands.


Part 10 of 12 — SharePoint 2019 to Subscription Edition Migration series.
