Designing Indexes in Azure AI Search: Comprehensive Guide

Introduction

Once you’ve completed the setup for Azure AI Search (creating the service, defining your data source, indexing, and running basic queries), the next step is designing the search index. The index is the schema layer that defines what you can search, filter, sort, facet, and retrieve. A well-designed index leads to better search relevance, faster performance, efficient storage, and happier users.

In this guide, you will learn:

  • What an index is in Azure AI Search
  • Field types, attributes, and how they affect behavior & performance
  • How to model complex data (nested objects, arrays)
  • How to define analyzers, normalizers, and synonyms
  • How to create or update indexes via REST API or portal
  • Best practices & common pitfalls

What Is an Index? Key Concepts

According to Microsoft, an index in Azure AI Search is a structure stored in your search service and populated by JSON documents.

  • The fields collection in the index defines the structure of those documents: each field has a name, a data type, and attributes that control how it’s used.
  • Documents are analogous to rows in a relational table, but with more flexibility: you can have nested or complex fields (objects/arrays).

Field Types & Attributes

Data Types (Edm.* and Collections)

Azure AI Search supports a number of primitive types and complex types:

| Type | Description |
|---|---|
| Edm.String | Text content; typical for titles, descriptions, tags. |
| Collection(Edm.String) | An array of strings (e.g., list of tags or keywords). |
| Numeric types (Edm.Int32, Edm.Int64, Edm.Double) | Used for numeric data that you might want to filter, sort, or facet by. |
| Edm.DateTimeOffset | Dates, times; useful for “created_at”, “modified_at”, etc. |
| Edm.Boolean | True/false fields; e.g. “isPublished”, “hasImage”. |
| Edm.GeographyPoint | Geospatial point; used for location-based search. |
| Complex Types (Edm.ComplexType & Collection(Edm.ComplexType)) | Nested objects: e.g. an Address object with subfields, or an array of objects (e.g. products with multiple offers). |
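To make the type list concrete, here is a minimal sketch of what a single JSON document carrying these types could look like. The field names and values are hypothetical and chosen only for illustration; the shapes shown (an ISO 8601 string for Edm.DateTimeOffset, a GeoJSON point with longitude first for Edm.GeographyPoint) are the parts that matter.

```python
import json
from datetime import datetime, timezone

# Hypothetical document; field names are illustrative, not from a real index.
doc = {
    "id": "hotel-1",                    # Edm.String (used as the key)
    "tags": ["pool", "wifi"],           # Collection(Edm.String)
    "rooms": 42,                        # Edm.Int32
    "rating": 4.5,                      # Edm.Double
    "isPublished": True,                # Edm.Boolean
    # Edm.DateTimeOffset: ISO 8601 string that carries an offset
    "modified_at": datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc).isoformat(),
    # Edm.GeographyPoint: GeoJSON point, longitude first, then latitude
    "location": {"type": "Point", "coordinates": [-122.12, 47.67]},
    # Edm.ComplexType: a nested object with its own subfields
    "address": {"city": "Redmond", "country": "USA"},
}

payload = json.dumps(doc, indent=2)
print(payload)
```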

Field Attributes

These are metadata flags on fields that determine how each field behaves in queries. Important ones:

| Attribute | What It Means |
|---|---|
| searchable | Whether the field is full-text searchable (i.e. tokenized, analyzed). Only fields of type Edm.String or Collection(Edm.String) can be “searchable”. |
| filterable | Allows use in $filter expressions. Note: for string fields, filterable implies exact matching (no tokenization) and is case sensitive unless a normalizer is applied. |
| sortable | Allows you to sort results on that field via $orderby. Only single-valued simple fields can be sortable (i.e. not arrays/collections). |
| facetable | Enables faceted navigation (counts by category, etc.). Arrays and numeric or string fields are often facetable, but not vector or geography fields. |
| retrievable | Whether the value of the field is returned in search results. Useful for hiding internal fields (e.g. IDs or flags) while still using them for filtering or scoring. The key field must be retrievable. |
| key | Unique identifier for each document in the index. Must be Edm.String type. Exactly one field must be marked key = true. |
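The rules in the table above can be checked before a schema is ever sent to the service. The following is a rough sketch of such a check; `validate_fields` is a hypothetical helper written for this article, not part of any Azure SDK, and it encodes only the rules listed in the table (it does not cover vector fields or other special cases).

```python
STRING_TYPES = {"Edm.String", "Collection(Edm.String)"}

def validate_fields(fields):
    """Return a list of rule violations for a list of field definitions."""
    problems = []
    keys = [f for f in fields if f.get("key")]
    if len(keys) != 1:
        problems.append(f"expected exactly one key field, found {len(keys)}")
    for f in fields:
        name, ftype = f["name"], f["type"]
        if f.get("key") and ftype != "Edm.String":
            problems.append(f"{name}: key must be Edm.String")
        if f.get("key") and f.get("retrievable") is False:
            problems.append(f"{name}: key must be retrievable")
        if f.get("searchable") and ftype not in STRING_TYPES:
            problems.append(f"{name}: searchable requires a string type")
        if f.get("sortable") and ftype.startswith("Collection("):
            problems.append(f"{name}: collections cannot be sortable")
    return problems

good = [
    {"name": "docId", "type": "Edm.String", "key": True, "retrievable": True},
    {"name": "tags", "type": "Collection(Edm.String)", "searchable": True, "facetable": True},
]
bad = [
    {"name": "id", "type": "Edm.Int32", "key": True},
    {"name": "tags", "type": "Collection(Edm.String)", "sortable": True},
    {"name": "ratings", "type": "Edm.Double", "searchable": True},
]

print(validate_fields(good))
for problem in validate_fields(bad):
    print(problem)
```

Running a check like this in a build step is one way to catch attribute mistakes early, since many of them cannot be corrected on an existing field later.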

Other Index Settings

  • Analyzers: how text is broken up (“tokenized”), lowercased, accents removed, etc. Default is standard.lucene. You can use language analyzers (English, French, etc.), or custom ones.
  • Synonym maps: rules for mapping terms (“color” ↔ “colour”) at query time. Only on searchable fields.
  • Normalizers: for fields used in filtering/faceting/sorting. Normalizers do light transformations like making lowercase, removing diacritics, mapping special characters. These apply to string/collection string fields that are filterable/sortable/facetable.

Modeling Complex Types

When your data isn’t just flat (one record = simple fields) but has nested or repeating structures, you use complex types:

  • A complex field (Edm.ComplexType) is an object with subfields (child fields).
  • A collection of complex types (Collection(Edm.ComplexType)) is an array of such objects. For example: a Book document may have a field Authors which is a collection of objects, each with Name, Nationality, etc.

Key considerations:

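  • Sub-fields of a complex type take the same attributes as top-level fields (searchable, filterable, etc.), subject to each sub-field’s data type. The attributes are set on the simple sub-fields, not on the complex field itself.
  • Nested complex types help model hierarchical data without flattening everything.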
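As an illustration of the Book example above, the sketch below builds an index definition in which Authors is a collection of complex objects. The index name and the bookId and title fields are assumptions added for the example; only Authors, Name, and Nationality come from the text.

```python
import json

# Hypothetical index definition for the Book example.
book_index = {
    "name": "books-index",
    "fields": [
        {"name": "bookId", "type": "Edm.String", "key": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {
            "name": "Authors",
            "type": "Collection(Edm.ComplexType)",
            # Attributes live on the simple sub-fields, not on Authors itself.
            "fields": [
                {"name": "Name", "type": "Edm.String", "searchable": True},
                {
                    "name": "Nationality",
                    "type": "Edm.String",
                    "filterable": True,
                    "facetable": True,
                },
            ],
        },
    ],
}

print(json.dumps(book_index, indent=2))
```

With a schema shaped like this, a filter over the nested values would use the OData lambda form, for example `Authors/any(a: a/Nationality eq 'French')`.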

Creating an Index: Portal, REST API & SDK

Via Azure Portal

  • Use the “Import data” wizard if your data is already somewhere Azure can connect to (e.g. Blob Storage, Azure SQL, Azure Cosmos DB). The wizard will try to infer the schema, including complex types.
  • In the portal, you can also manually define an index: add fields, set data types & attributes, analyzers, etc.

Via REST API

  • Use the Create Index endpoint (POST /indexes?api-version=X) or the Create or Update Index endpoint (PUT /indexes/{index-name}?api-version=X) with a JSON definition that includes:
    • name of the index
    • fields: list of field definitions with name, type, and attributes (searchable, filterable, sortable, facetable, analyzer / normalizer, etc.).
    • Optionally scoringProfiles, defaultScoringProfile, suggesters, etc. Synonym maps are created as separate resources and referenced from individual fields through the field-level synonymMaps property.
  • Once created, many field attributes cannot be changed. For example, making an existing field filterable, facetable, or sortable after initial creation generally isn’t supported; in such cases you need to rebuild the index. Adding new fields to an existing index is allowed.
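A minimal sketch of the REST call, using only the Python standard library. It builds the PUT request for Create or Update Index but does not send it. The service name, the admin key placeholder, and the api-version value 2024-07-01 are assumptions for illustration; substitute the values that apply to your service.

```python
import json
import urllib.request

def build_put_index_request(service, index, api_key, api_version="2024-07-01"):
    """Build (but do not send) a Create or Update Index request."""
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index['name']}?api-version={api_version}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(index).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",
    )

index = {
    "name": "documents-index",
    "fields": [{"name": "docId", "type": "Edm.String", "key": True}],
}
req = build_put_index_request("my-service", index, "<admin-key>")
print(req.get_method(), req.full_url)

# To actually send it (requires a real service and key):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```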

Examples: Index Schema Samples

Here is a simplified example index schema to illustrate how field definitions work:

{
  "name": "documents-index",
  "fields": [
    {
      "name": "docId",
      "type": "Edm.String",
      "key": true,
      "searchable": false,
      "filterable": true,
      "sortable": true,
      "facetable": false,
      "retrievable": true
    },
    {
      "name": "title",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "sortable": true,
      "facetable": false,
      "retrievable": true,
      "analyzer": "en.lucene"
    },
    {
      "name": "content",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "retrievable": true,
      "analyzer": "standard.lucene"
    },
    {
      "name": "tags",
      "type": "Collection(Edm.String)",
      "searchable": true,
      "filterable": true,
      "sortable": false,
      "facetable": true,
      "retrievable": true
    },
    {
      "name": "publishDate",
      "type": "Edm.DateTimeOffset",
      "searchable": false,
      "filterable": true,
      "sortable": true,
      "facetable": true,
      "retrievable": true
    },
    {
      "name": "ratings",
      "type": "Edm.Double",
      "searchable": false,
      "filterable": true,
      "sortable": true,
      "facetable": true,
      "retrievable": true
    }
  ]
}

How Full-Text Search vs Filtering vs Faceting vs Sorting Works

  • Full-Text / Searchable: Users can enter free text. Tokenization and analyzers are applied. E.g. “azure setup tutorial” matches fields marked searchable.
  • Filtering: Exact-match and range comparisons, e.g. $filter=publishDate ge 2025-01-01T00:00:00Z and ratings gt 4. Works only if fields are marked filterable. Strings in filters are case-sensitive unless a normalizer is used.
  • Faceting: Allows showing counts or grouping by values in a field. Requires facetable. Useful for UIs (e.g. “by tag”, “by category”, etc.).
  • Sorting: $orderby=publishDate desc etc. The field must be marked sortable and must be a single-valued simple field (not a collection).
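The four behaviors above usually appear together in one query. The sketch below assembles a request body for the Search Documents POST endpoint against the sample documents-index schema; `build_search_body` is a hypothetical helper, and the specific values (search text, date, rating threshold, page size) are placeholders.

```python
import json

def build_search_body(text, min_date, min_rating):
    """Combine full-text search, filtering, faceting, and sorting."""
    return {
        "search": text,                       # analyzed against searchable fields
        "filter": f"publishDate ge {min_date} and ratings gt {min_rating}",
        "facets": ["tags"],                   # requires facetable on tags
        "orderby": "publishDate desc",        # requires sortable on publishDate
        "select": "docId,title,publishDate,ratings",  # retrievable fields only
        "top": 10,
    }

body = build_search_body("azure setup tutorial", "2025-01-01T00:00:00Z", 4)
print(json.dumps(body, indent=2))
```

Each clause depends on a different attribute, which is why a missing attribute tends to surface as a query error rather than as an empty result.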

Analyzers, Normalizers, and Synonyms

  • Analyzers: Choose appropriate analyzers depending on language and content. Built-in analyzers include the default standard Lucene analyzer and language-specific ones. The analyzer assigned to an existing field cannot be changed without rebuilding the index.
  • Normalizers: Used for fields in filter/facet/sort contexts. They do light text transformations (lowercasing, removing diacritics, mapping special characters) to ensure consistency. Useful when you need case-insensitive filtering.
  • Synonym Maps: You can define synonym maps and assign them to searchable fields. Useful in free text scenarios to equate similar words.
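As a rough sketch of how these three settings show up in definitions: an analyzer and a synonym map are attached to a searchable field, while a normalizer is attached to a filterable/facetable field. The synonym map name, the field names, and the synonym rules are hypothetical, and the example assumes an API version that supports the normalizer property.

```python
import json

# A synonym map is its own resource, with rules in Solr format.
synonym_map = {
    "name": "color-synonyms",  # hypothetical name
    "format": "solr",
    "synonyms": "color, colour\nUSA, United States, United States of America",
}

# Searchable field: language analyzer plus a reference to the synonym map.
title_field = {
    "name": "title",
    "type": "Edm.String",
    "searchable": True,
    "analyzer": "en.lucene",
    "synonymMaps": [synonym_map["name"]],
}

# Filterable/facetable field: a normalizer for case-insensitive matching.
country_field = {
    "name": "country",
    "type": "Edm.String",
    "searchable": False,
    "filterable": True,
    "facetable": True,
    "normalizer": "lowercase",
}

print(json.dumps([title_field, country_field], indent=2))
```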

Index Lifecycle: Updates, Rebuilds & Aliases

  • Once an index is created, many field attribute settings are immutable (e.g. whether a field is filterable, sortable, facetable). To change those, often you need to rebuild or create a new index.
  • For schema changes in production, a recommended pattern is:
    1. Create a new index with the desired schema changes and load your documents into it.
    2. Use an index alias so that your app can switch from the old index to the new one with minimal downtime.
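The swap pattern can be sketched as follows. The helper names and the version-suffix naming scheme are assumptions for illustration; the alias body (a name plus a single-entry indexes list) follows the shape used by the alias REST resource, which may require a preview API version on your service.

```python
def versioned_index_name(base, version):
    """Hypothetical naming scheme: documents-index-v2, documents-index-v3, ..."""
    return f"{base}-v{version}"

def alias_definition(alias_name, index_name):
    """Body for creating or updating an alias that points at one index."""
    return {"name": alias_name, "indexes": [index_name]}

# Before the change, the alias points at v1; the app queries only the alias.
current = alias_definition("documents", versioned_index_name("documents-index", 1))

# After v2 is built and loaded, repoint the alias; the app needs no change.
updated = alias_definition("documents", versioned_index_name("documents-index", 2))

print(current)
print(updated)
```

The design choice here is that application code never references a concrete index name, so rolling back is just another alias update.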

Best Practices & Performance Considerations

  • Only mark fields as searchable/filterable/sortable/facetable if needed — unnecessary attributes increase index size and slow performance.
  • Use normalizers on filterable/sortable/facetable string fields to avoid casing or accent mismatches. Without them, filtering/faceting may treat “USA”, “usa”, “Usa” differently.
  • Keep values small in string fields that are filterable, facetable, or sortable: each value is limited to 32 kilobytes when those attributes are set. Larger text content should be searchable/retrievable only, without filterable, sortable, or facetable.
  • Use complex types to more accurately represent nested data rather than flattening everything. Helps maintain schema clarity.
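Two of these practices can be checked on source data before indexing. The sketch below flags string values that exceed the 32 KB guidance and values that differ only by casing. Both helpers are hypothetical, the sample documents are made up, and the byte limit is taken from the guidance above rather than from a measured service limit.

```python
from collections import defaultdict

MAX_VALUE_BYTES = 32 * 1024  # assumption: the 32 KB guidance stated above

def oversized_values(docs, field):
    """IDs of documents whose field value exceeds the size guidance."""
    return [
        d["id"]
        for d in docs
        if len((d.get(field) or "").encode("utf-8")) > MAX_VALUE_BYTES
    ]

def case_variants(docs, field):
    """Groups of values that differ only by casing (e.g. USA vs usa)."""
    groups = defaultdict(set)
    for d in docs:
        value = d.get(field)
        if value is not None:
            groups[value.lower()].add(value)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}

docs = [
    {"id": "1", "country": "USA"},
    {"id": "2", "country": "usa"},
    {"id": "3", "country": "France"},
    {"id": "4", "country": "x" * 40000},
]

print(oversized_values(docs, "country"))
print(case_variants(docs, "country"))
```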

Common Pitfalls & How to Avoid Them

| Pitfall | Why It Happens | How to Avoid |
|---|---|---|
| Field missing from queries | Field not marked searchable or filterable/etc. | Always plan ahead which fields will be used in what query types. |
| Can’t change field attributes later | Attributes like filterable/sortable often immutable post-creation. | Build schema carefully; use aliases & versioning to roll out schema changes. |
| Filters not matching expected text | Case sensitivity, special characters, tokenization issues. | Use normalizers; test filter values; standardize content. |
| Facets counting too many distinct values / ambiguous buckets | High cardinality in string fields; inconsistent values (typos, casing). | Use low-cardinality fields for facets; normalize values; clean data. |

Summary & What Comes Next

Designing your index carefully ensures you unlock the full power of Azure AI Search: good relevance, good performance, maintainability. Key takeaways:

  • Choose the right data types and set attributes that align with query patterns.
  • Use analyzers, normalizers, and synonyms wisely.
  • Model data with complex types when needed.
  • Plan for schema changes via versioning or aliases.

Next in the series: after index design, the logical next article is “Enriching Indexes with AI Skills: OCR, Entity Recognition, Semantic Search & Vector Search”, which builds on this schema to add search-improvement features.

https://sudharsank.github.io/azure-ai-search-handbook
