Introduction
Once you’ve completed the setup for Azure AI Search — creating the service, defining your data source, indexing, and running basic queries — the next step is designing the search index. The index is the schema layer that defines what you can search, filter, sort, facet, and retrieve. A well-designed index leads to better search relevance, faster performance, efficient storage, and happier users.
In this guide, you will learn:
- What an index is in Azure AI Search
- Field types, attributes, and how they affect behavior & performance
- How to model complex data (nested objects, arrays)
- How to define analyzers, normalizers, and synonyms
- How to create or update indexes via REST API or portal
- Best practices & common pitfalls
What Is an Index? Key Concepts
According to Microsoft, an index in Azure AI Search is a structure stored in your search service and populated by JSON documents.
- The fields collection in the index defines the structure of those documents: each field has a name, a data type, and attributes that control how it’s used.
- Documents are analogous to rows in a relational table, but with more flexibility: you can have nested or complex fields (objects/arrays).
Field Types & Attributes
Data Types (Edm.* and Collections)
Azure AI Search supports a number of primitive and complex types:
| Type | Description |
|---|---|
| Edm.String | Text content; typical for titles, descriptions, tags. |
| Collection(Edm.String) | An array of strings (e.g., list of tags or keywords). |
| Numeric types (Edm.Int32, Edm.Int64, Edm.Double) | Used for numeric data that you might want to filter, sort, or facet by. |
| Edm.DateTimeOffset | Dates, times; useful for “created_at”, “modified_at”, etc. |
| Edm.Boolean | True/false fields; e.g. “isPublished”, “hasImage”. |
| Edm.GeographyPoint | Geospatial point; used for location-based search. |
| Complex Types (Edm.ComplexType & Collection(Edm.ComplexType)) | Nested objects: e.g. an Address object with subfields, or an array of objects (e.g. products with multiple offers). |
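To make the mapping between your source data and these types concrete, here is a minimal Python sketch. The helper `infer_edm_type` is hypothetical (it is not part of any Azure SDK) and only covers the types listed in the table; treat it as a starting point for planning a schema, not as a replacement for choosing types deliberately.

```python
from datetime import datetime


def infer_edm_type(value):
    """Suggest a field type from the table above for a Python value.

    Hypothetical helper for illustration only; not an Azure SDK function.
    """
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "Edm.Boolean"
    if isinstance(value, int):
        # Edm.Int32 holds signed 32-bit values; larger ones need Edm.Int64.
        return "Edm.Int32" if -2**31 <= value < 2**31 else "Edm.Int64"
    if isinstance(value, float):
        return "Edm.Double"
    if isinstance(value, datetime):
        return "Edm.DateTimeOffset"
    if isinstance(value, str):
        return "Edm.String"
    if isinstance(value, dict):
        return "Edm.ComplexType"
    if isinstance(value, list) and value:
        # A collection takes its element type from the first item.
        return f"Collection({infer_edm_type(value[0])})"
    raise ValueError(f"cannot infer a type for {value!r}")


sample = {
    "title": "Index design",
    "tags": ["azure", "search"],
    "ratings": 4.5,
    "isPublished": True,
    "address": {"city": "Seattle"},
}
for name, value in sample.items():
    print(name, "->", infer_edm_type(value))
```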
Field Attributes
These are flags on a field that determine how it can be used in queries. The important ones:
| Attribute | What It Means |
|---|---|
| searchable | Whether the field is full-text searchable (i.e. tokenized, analyzed). Only fields of type Edm.String or Collection(Edm.String) can be “searchable”. |
| filterable | Allows use in $filter expressions. Note: for string fields, filterable implies exact matching (no tokenization) and is case sensitive unless a normalizer is applied. |
| sortable | Allows you to sort results on that field via $orderby. Only single-valued simple fields can be sortable (i.e. not arrays/collections). |
| facetable | Enables faceted navigation (counts by category, etc.). Arrays and numeric or string fields are often facetable, but not vector or geography fields. |
| retrievable | Whether the value of the field is returned in search results. Useful for hiding internal fields (e.g. IDs or flags) while still using them for filtering or scoring. The key field must be retrievable. |
| key | Unique identifier for each document in the index. Must be Edm.String type. Exactly one field must be marked key = true. |
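The rules in the table can be checked before you send a schema to the service. The following is a minimal sketch in Python; `validate_fields` is a hypothetical helper that encodes only the constraints stated above (it does not cover every rule the service enforces, and it ignores vector fields).

```python
SEARCHABLE_TYPES = {"Edm.String", "Collection(Edm.String)"}


def validate_fields(fields):
    """Return a list of rule violations for a list of field definitions.

    Hypothetical helper; it checks only the rules from the attribute table.
    """
    problems = []

    # Exactly one field must be the key.
    keys = [f for f in fields if f.get("key")]
    if len(keys) != 1:
        problems.append(f"expected exactly one key field, found {len(keys)}")

    for f in fields:
        name, ftype = f["name"], f["type"]
        if f.get("key") and ftype != "Edm.String":
            problems.append(f"{name}: key field must be Edm.String")
        if f.get("key") and f.get("retrievable") is False:
            problems.append(f"{name}: key field must be retrievable")
        if f.get("searchable") and ftype not in SEARCHABLE_TYPES:
            problems.append(f"{name}: only string fields can be searchable")
        if f.get("sortable") and ftype.startswith("Collection("):
            problems.append(f"{name}: collections cannot be sortable")
    return problems


fields = [
    {"name": "docId", "type": "Edm.String", "key": True},
    {"name": "tags", "type": "Collection(Edm.String)", "sortable": True},
]
for problem in validate_fields(fields):
    print(problem)
```

Running a check like this in a build step catches attribute mistakes early, which matters because many attributes cannot be changed after the index exists.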
Other Index Settings
- Analyzers: how text is broken up (“tokenized”), lowercased, accents removed, etc. Default is standard.lucene. You can use language analyzers (English, French, etc.), or custom ones.
- Synonym maps: rules for mapping terms (“color” ↔ “colour”) at query time. Only on searchable fields.
- Normalizers: for fields used in filtering/faceting/sorting. Normalizers do light transformations like making lowercase, removing diacritics, mapping special characters. These apply to string/collection string fields that are filterable/sortable/facetable.
Modeling Complex Types
When your data isn’t just flat (one record = simple fields) but has nested or repeating structures, you use complex types:
- A complex field (Edm.ComplexType) is an object with subfields (child fields).
- A collection of complex types (Collection(Edm.ComplexType)) is an array of such objects. For example: a Book document may have a field Authors which is a collection of objects, each with Name, Nationality, etc.
Key considerations:
- Simple sub-fields of a complex type can carry the same attributes (searchable, filterable, etc.) as top-level fields; which ones apply depends on the sub-field's data type.
- Nested complex types let you model hierarchical data without flattening everything.
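The Book and Authors example above might look like the following sketch, written as a Python dictionary that serializes to the JSON index definition. The index name and the `bookId` and `title` fields are assumptions added for illustration; the `leaf_paths` helper is hypothetical and simply lists the slash-separated paths used to reference sub-fields.

```python
import json

# Hypothetical index definition for the Book/Authors example.
book_index = {
    "name": "books-index",
    "fields": [
        {"name": "bookId", "type": "Edm.String", "key": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {
            # An array of author objects; attributes live on the sub-fields.
            "name": "Authors",
            "type": "Collection(Edm.ComplexType)",
            "fields": [
                {"name": "Name", "type": "Edm.String", "searchable": True},
                {
                    "name": "Nationality",
                    "type": "Edm.String",
                    "filterable": True,
                    "facetable": True,
                },
            ],
        },
    ],
}


def leaf_paths(fields, prefix=""):
    """Yield the path of every non-complex field, e.g. 'Authors/Name'."""
    for field in fields:
        path = prefix + field["name"]
        if "fields" in field:
            yield from leaf_paths(field["fields"], path + "/")
        else:
            yield path


print(list(leaf_paths(book_index["fields"])))
# ['bookId', 'title', 'Authors/Name', 'Authors/Nationality']
print(json.dumps(book_index, indent=2)[:40])
```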
Creating an Index: Portal, REST API & SDK
Via Azure Portal
- Use the “Import data wizard” if your data is already somewhere Azure can connect to (e.g. Blob Storage, SQL, Cosmos). The wizard will try to infer schema, including complex types.
- In the portal, you can also manually define an index: add fields, set data types & attributes, analyzers, etc.
Via REST API
- Use the Create or Update Index endpoint (/indexes?api-version=X) with a JSON definition that includes:
  - name of the index
  - fields: list of field definitions with name, type, and attributes (searchable, filterable, sortable, facetable, analyzer / normalizer, etc.).
  - Optionally scoringProfiles, defaultScoringProfile, synonymMaps, etc.
- Once the index is created, many field attributes cannot be changed. For example, making an existing field filterable, facetable, or sortable is generally not supported; in those cases you usually need to rebuild the index.
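As a rough illustration of the REST call, the sketch below builds (but does not send) a PUT request using only the Python standard library. The service name, API key, and the `api_version` default are placeholders and assumptions; check the REST reference for the version your service supports. The function `build_put_index_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_put_index_request(service, api_key, index, api_version="2024-07-01"):
    """Build a Create or Update Index request for the given index definition.

    Hypothetical helper: the service name, key, and api_version are
    assumptions supplied by the caller.
    """
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index['name']}?api-version={api_version}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(index).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",
    )


index = {
    "name": "documents-index",
    "fields": [{"name": "docId", "type": "Edm.String", "key": True}],
}
request = build_put_index_request("my-service", "<admin-api-key>", index)
print(request.get_method(), request.full_url)

# To actually send it (requires a real service and admin key):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping request construction separate from sending makes the payload easy to inspect and test before it touches a live service.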
Examples: Index Schema Samples
Here is a simplified example index schema that illustrates how field definitions work:
{
  "name": "documents-index",
  "fields": [
    {
      "name": "docId",
      "type": "Edm.String",
      "key": true,
      "searchable": false,
      "filterable": true,
      "sortable": true,
      "facetable": false,
      "retrievable": true
    },
    {
      "name": "title",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "sortable": true,
      "facetable": false,
      "retrievable": true,
      "analyzer": "en.lucene"
    },
    {
      "name": "content",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "retrievable": true,
      "analyzer": "standard.lucene"
    },
    {
      "name": "tags",
      "type": "Collection(Edm.String)",
      "searchable": true,
      "filterable": true,
      "sortable": false,
      "facetable": true,
      "retrievable": true
    },
    {
      "name": "publishDate",
      "type": "Edm.DateTimeOffset",
      "searchable": false,
      "filterable": true,
      "sortable": true,
      "facetable": true,
      "retrievable": true
    },
    {
      "name": "ratings",
      "type": "Edm.Double",
      "searchable": false,
      "filterable": true,
      "sortable": true,
      "facetable": true,
      "retrievable": true
    }
  ]
}
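A document that fits this schema could be uploaded with a batch payload like the one sketched below in Python. The field values are made up for illustration; the `@search.action` value `mergeOrUpload` tells the service to update the document if the key exists and insert it otherwise. The check at the end is a hypothetical sanity test, not something the service requires.

```python
import json

# Field names taken from the documents-index schema above.
schema_fields = {"docId", "title", "content", "tags", "publishDate", "ratings"}

# Illustrative batch for the document indexing endpoint; values are invented.
batch = {
    "value": [
        {
            "@search.action": "mergeOrUpload",
            "docId": "doc-001",
            "title": "Designing a search index",
            "content": "Notes on field types, attributes, and analyzers.",
            "tags": ["azure", "search"],
            "publishDate": "2025-03-15T00:00:00Z",  # ISO 8601 with offset
            "ratings": 4.5,
        }
    ]
}

# Flag any property that the index does not define.
for doc in batch["value"]:
    unknown = set(doc) - schema_fields - {"@search.action"}
    print(doc["docId"], "unknown fields:", sorted(unknown))

payload = json.dumps(batch)
```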
How Full-Text Search vs Filtering vs Faceting vs Sorting Works
- Full-Text / Searchable: Users can enter free text. Tokenization and analyzers are applied. E.g. “azure setup tutorial” matches fields marked searchable.
- Filtering: Exact-match and range comparisons, e.g. $filter=publishDate ge 2025-01-01T00:00:00Z and ratings gt 4. Works only if fields are marked filterable. Strings in filters are case-sensitive unless a normalizer is used.
- Faceting: Allows showing counts or grouping by values in a field. Requires facetable. Useful for UIs (e.g. “by tag”, “by category”, etc.).
- Sorting: e.g. $orderby=publishDate desc. The field must be marked sortable and be a supported, single-valued data type (not a collection).
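The four query behaviors above can be combined in a single search request. Here is a hedged Python sketch that builds the JSON body for a POST search request against the sample documents-index; `build_search_body` is a hypothetical helper, and the chosen `searchFields`, `select`, and `top` values are assumptions for illustration.

```python
import json


def build_search_body(text, min_date, min_rating):
    """Combine full-text search, filtering, faceting, and sorting.

    Hypothetical helper; field names come from the sample schema.
    """
    return {
        # Full-text: analyzed and matched against searchable fields.
        "search": text,
        "searchFields": "title,content",
        # Filter: evaluated against filterable fields without analysis.
        "filter": f"publishDate ge {min_date} and ratings gt {min_rating}",
        # Facets: counts per value of a facetable field.
        "facets": ["tags"],
        # Sort: requires a sortable, single-valued field.
        "orderby": "publishDate desc",
        "select": "docId,title,publishDate,ratings",
        "top": 10,
    }


body = build_search_body("azure setup tutorial", "2025-01-01T00:00:00Z", 4)
print(json.dumps(body, indent=2))
```

If any field named in `filter`, `facets`, or `orderby` lacks the matching attribute in the index, the service rejects the request, which is why query patterns should drive attribute choices.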
Analyzers, Normalizers, and Synonyms
- Analyzers: Choose appropriate analyzers depending on language and content. Built-in analyzers include standard Lucene ones and language-specific ones. Once set for a field, the analyzer cannot be changed later.
- Normalizers: Used for fields in filter/facet/sort contexts. They do light text transformations (lowercasing, removing diacritics, mapping special characters) to ensure consistency. Useful when you need case-insensitive filtering.
- Synonym Maps: You can define synonym maps and assign them to searchable fields. Useful in free text scenarios to equate similar words.
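As an illustration of how these three settings appear in definitions, the Python sketch below builds a synonym map payload and two field definitions. The map name `spelling-variants` and the `country` field are hypothetical; the rules use the Solr synonym format, and `lowercase` is assumed here to be one of the predefined normalizers available on your API version.

```python
import json

# Hypothetical synonym map; one rule per line, comma-separated equivalents.
synonym_map = {
    "name": "spelling-variants",
    "format": "solr",
    "synonyms": "color, colour\ncenter, centre",
}

# Searchable field: a language analyzer plus the synonym map by name.
title_field = {
    "name": "title",
    "type": "Edm.String",
    "searchable": True,
    "analyzer": "en.lucene",
    "synonymMaps": [synonym_map["name"]],
}

# Filterable/facetable field (hypothetical): a normalizer makes exact
# matching case-insensitive.
country_field = {
    "name": "country",
    "type": "Edm.String",
    "searchable": False,
    "filterable": True,
    "facetable": True,
    "normalizer": "lowercase",
}

print(json.dumps([title_field, country_field], indent=2))
print(synonym_map["synonyms"].splitlines())
```

Note the division of labor: the analyzer and synonym map affect full-text matching on `title`, while the normalizer affects filter, facet, and sort behavior on `country`.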
Index Lifecycle: Updates, Rebuilds & Aliases
- Once an index is created, many field attribute settings are immutable (e.g. whether a field is filterable, sortable, facetable). To change those, often you need to rebuild or create a new index.
- For schema changes in production, a recommended pattern is:
  - Build a new index with the desired schema changes and load your documents into it.
  - Use an index alias so that your app can switch from the old index to the new one with minimal downtime.
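The alias swap can be sketched as two payloads: one that points the alias at the old index and one that repoints it at the new index. This Python example is a sketch under assumptions: the alias and index names are hypothetical, `alias_body` is an illustrative helper, and index aliases may require a preview API version on your service, so the request itself is left out.

```python
import json


def alias_body(alias_name, target_index):
    """Build an alias definition that maps an alias to a single index.

    Hypothetical helper; names are supplied by the caller.
    """
    return {"name": alias_name, "indexes": [target_index]}


# The application always queries the alias name, never a concrete index.
before = alias_body("documents", "documents-index-v1")
after = alias_body("documents", "documents-index-v2")

print(json.dumps(before))
print(json.dumps(after))
```

Because the application only knows the alias, switching versions becomes a single update to the alias definition once the new index is fully populated.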
Best Practices & Performance Considerations
- Only mark fields as searchable/filterable/sortable/facetable if needed — unnecessary attributes increase index size and slow performance.
- Use normalizers on filterable/sortable/facetable string fields to avoid casing or accent mismatches. Without them, filtering/faceting may treat “USA”, “usa”, “Usa” differently.
- Keep values small in string fields that are filterable, facetable, or sortable: values in such fields are limited to 32 kilobytes. Larger text content should be kept as searchable/retrievable only, without filterable/sortable.
- Use complex types to represent nested data accurately rather than flattening everything. This helps keep the schema clear.
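To see why the normalizer advice above matters, the following Python sketch simulates what lowercasing and diacritic removal do to exact-match values. It is only a local approximation written for illustration (the `normalize` function is hypothetical and is not the service's implementation), but it shows how distinct facet buckets collapse once values are normalized.

```python
import unicodedata


def normalize(value):
    """Approximate a lowercase + accent-folding normalizer (illustration only)."""
    # Decompose characters so accents become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks, then lowercase what remains.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


values = ["USA", "usa", "Usa", "Café", "cafe"]

# Without normalization every spelling is its own filter/facet value.
print(len(set(values)))  # 5

# With normalization the variants collapse into two values.
print(sorted({normalize(v) for v in values}))  # ['cafe', 'usa']
```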
Common Pitfalls & How to Avoid Them
| Pitfall | Why It Happens | How to Avoid |
|---|---|---|
| Field missing from queries | Field not marked searchable or filterable/etc. | Always plan ahead which fields will be used in what query types. |
| Can’t change field attributes later | Attributes like filterable/sortable often immutable post-creation. | Build schema carefully; use aliases & versioning to roll out schema changes. |
| Filters not matching expected text | Case sensitivity, special characters, tokenization issues. | Use normalizers; test filter values; standardize content. |
| Facets counting too many distinct values / ambiguous buckets | High cardinality in string fields; inconsistent values (typos, casing). | Use low-cardinality fields for facets; normalize values; clean data. |
Summary & What Comes Next
Designing your index carefully ensures you unlock the full power of Azure AI Search: good relevance, good performance, maintainability. Key takeaways:
- Choose the right data types and set attributes that align with query patterns.
- Use analyzers, normalizers, and synonyms wisely.
- Model data with complex types when needed.
- Plan for schema changes via versioning or aliases.
What's next in the series: After index design, the logical next article is “Enriching Indexes with AI Skills: OCR, Entity Recognition, Semantic Search & Vector Search”. It will build on this schema to add features that improve search quality.
https://sudharsank.github.io/azure-ai-search-handbook
Happy Indexing…