- [index modules](#index-modules)
  - [index module introduce](#index-module-introduce)
    - [Index settings](#index-settings)
    - [static index settings](#static-index-settings)
      - [`index.number_of_shards`](#indexnumber_of_shards)
  - [Analysis](#analysis)
  - [index shard allocation](#index-shard-allocation)
  - [index blocks](#index-blocks)
    - [index block settings](#index-block-settings)
      - [`index.blocks.read_only`](#indexblocksread_only)
      - [`index.blocks.read_only_allow_delete`](#indexblocksread_only_allow_delete)
      - [`index.blocks.read`](#indexblocksread)
      - [`index.blocks.write`](#indexblockswrite)
      - [`index.blocks.metadata`](#indexblocksmetadata)
    - [Add index block example](#add-index-block-example)
      - [path param](#path-param)
      - [query param](#query-param)
      - [Example](#example)
  - [Similarity module](#similarity-module)
    - [Configuring similarity](#configuring-similarity)
  - [slow query](#slow-query)
    - [Search Slow log](#search-slow-log)
    - [identify search log origin](#identify-search-log-origin)
    - [index slow log](#index-slow-log)
  - [Store](#store)
    - [File System Storage types](#file-system-storage-types)
    - [preloading data into the file system cache](#preloading-data-into-the-file-system-cache)


# index modules
## index module introduce
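Index modules are created per index and control all aspects related to that index.

### Index settings
Index-level settings can be set per index. They fall into two categories:
- static: static settings can only be applied to an index in the following situations
  - when the index is created
  - while the index is closed
  - through the `update-index-settings` API with the `reopen=true` query parameter, which closes the affected indices, applies the update, and then reopens them
- dynamic: dynamic settings can be changed on a live index through the `update-index-settings` API

> #### closed index
> While an index is `closed`, all read/write operations against it are blocked, and none of the operations available on an open index are allowed.
>
> You can neither index documents into a closed index nor search documents in it.
>
> A closed index can be reopened with the `open index API`.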
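
The difference between static and dynamic settings can be sketched as a request plan. The helper `plan_settings_update` below is hypothetical (it is not an Elasticsearch API) and sends nothing; it only lists the REST calls one would issue. Using `index.codec` as the static setting is an assumption made for illustration.

```python
import json

def plan_settings_update(index, settings, static=False, use_reopen=False):
    """Return the (method, path, body) requests needed to apply `settings`.

    Hypothetical helper for illustration; it builds requests but sends nothing.
    """
    body = json.dumps(settings, sort_keys=True)
    if not static:
        # Dynamic settings can be changed while the index is open.
        return [("PUT", f"/{index}/_settings", body)]
    if use_reopen:
        # Let Elasticsearch close and reopen the affected index itself.
        return [("PUT", f"/{index}/_settings?reopen=true", body)]
    # Otherwise close the index, update it, then open it again.
    return [
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", body),
        ("POST", f"/{index}/_open", None),
    ]

# Assumed static setting, used only as an example value.
for request in plan_settings_update("my-index-000001", {"index.codec": "best_compression"}, static=True):
    print(request)
```

While the index is closed between the first and last request, reads and writes against it are rejected, which is the trade-off the `reopen=true` variant automates.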

### static index settings
#### `index.number_of_shards`
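The number of primary shards an index has. Defaults to 1. This setting can only be set when the index is created; it cannot be changed even while the index is closed.

> The maximum value of `index.number_of_shards` is 1024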

## Analysis
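The index analysis module acts as a configurable registry of analyzers. An analyzer converts a `string` field into individual terms, which is used in two places:
- at index time, the string fields of a document are converted into terms that are added to the inverted index, making the document searchable
- at search time, high-level queries such as the `match` query use an analyzer to split the user's query string into search terms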
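
To make the two roles concrete, here is a toy sketch in plain Python. It is not how Elasticsearch analyzers are implemented: the `analyze` function (lowercase, then split on non-alphanumeric ASCII characters) and the helper names are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase the text and split it on non-alphanumeric characters."""
    return [term for term in re.split(r"[^0-9a-z]+", text.lower()) if term]

def build_inverted_index(docs):
    """Map each term to the set of document ids whose field contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def match(index, query):
    """Analyze the query string the same way and return docs containing any search term."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits

docs = {1: "The Quick brown fox", 2: "Quick thinking saves time"}
inverted = build_inverted_index(docs)
print(sorted(match(inverted, "QUICK fox")))  # [1, 2]
```

Because the same analysis runs at index time and at search time, `QUICK` in the query still finds `Quick` in the documents.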

## index shard allocation
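Index shard allocation provides per-index settings that control how shards are allocated to nodes:
- shard allocation filtering: controls which nodes the shards are allocated to
- delayed allocation: delays the allocation of shards left unassigned because a node has left
- total shards per node: an upper limit on the number of shards of the same index on a single node
- data tier allocation: controls which data tiers the index is allocated to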
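
As a rough sketch, the four controls map onto one settings body. The function name `allocation_settings`, the custom node attribute `size`, and all values are assumptions for illustration; the body would be sent with an update index settings request.

```python
import json

def allocation_settings(node_attr, attr_value, delayed_timeout, max_shards_per_node, tiers):
    """Build a per-index settings body covering the four allocation controls."""
    return {
        # Shard allocation filtering: require nodes with a matching custom attribute.
        f"index.routing.allocation.require.{node_attr}": attr_value,
        # Delayed allocation: wait before reallocating shards of a departed node.
        "index.unassigned.node_left.delayed_timeout": delayed_timeout,
        # Total shards per node: cap on this index's shards on any single node.
        "index.routing.allocation.total_shards_per_node": max_shards_per_node,
        # Data tier allocation: preferred tiers, in order.
        "index.routing.allocation.include._tier_preference": ",".join(tiers),
    }

# Hypothetical attribute name and values.
body = allocation_settings("size", "big", "5m", 2, ["data_warm", "data_hot"])
print(json.dumps(body, indent=2))
```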

## index blocks
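Index blocks limit the kinds of operations available on a specific index. Blocks fall into three types:
- blocking read operations
- blocking write operations
- blocking metadata operations

Blocks can be added and removed through `dynamic index settings`, or through a dedicated API.

When a write block is added through the dedicated API, a successful response means that all in-flight writes to the index have completed.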

### index block settings
#### `index.blocks.read_only`
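If set to `true`, the index and its metadata are read-only. If set to `false`, writes and metadata changes are allowed.

#### `index.blocks.read_only_allow_delete`
Similar to `index.blocks.write`, except that the index itself can still be deleted. Do not set or remove `index.blocks.read_only_allow_delete` manually: the `disk-based shard allocator` adds and removes it automatically based on the available disk space.

Deleting documents from an index to free resources (rather than deleting the index itself) temporarily increases the index size, so it may not be possible when a node is low on disk space. When `index.blocks.read_only_allow_delete` is set to `true`, deleting documents from the index is not permitted. Deleting the index itself, however, needs very little extra disk space and frees the space used by the index almost immediately, so it remains permitted.

> Elasticsearch automatically adds the `read-only-allow-delete` block to an index when disk usage exceeds the `flood stage watermark`, and automatically releases the block when disk usage falls below the `high watermark`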
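
The add/release behavior has a gap between the two watermarks, which a small toy model makes visible. This is only a sketch: `next_block_state` is a hypothetical function, and the default thresholds of 90% and 95% disk usage are assumptions, not values taken from the text above.

```python
def next_block_state(blocked, disk_used_pct, high_watermark=90.0, flood_stage=95.0):
    """Return whether the read-only-allow-delete block is set after a disk usage check.

    Toy model: thresholds are percentages of disk used; the defaults are assumptions.
    """
    if disk_used_pct > flood_stage:
        return True   # above the flood stage watermark: the block is added
    if disk_used_pct < high_watermark:
        return False  # below the high watermark: the block is released
    return blocked    # in between: the current state is kept

state = False
for usage in (80.0, 96.0, 93.0, 89.0):
    state = next_block_state(state, usage)
    print(usage, state)
```

At 93% the block stays in place even though usage is back under the flood stage, because it is only released below the high watermark.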

#### `index.blocks.read`
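If set to `true`, read operations against the index are blocked.

#### `index.blocks.write`
If set to `true`, write operations against the index are blocked. Unlike `index.blocks.read_only`, this setting does not affect metadata.

For example, metadata can still be changed on an index with a write block, but not on an index with `index.blocks.read_only` set.

#### `index.blocks.metadata`
If set to `true`, both reading and changing the index metadata are disabled.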
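
The descriptions above can be condensed into a small lookup. This is a simplified model of the four block types that can be set directly, not Elasticsearch code; the operation names and `is_allowed` are hypothetical.

```python
# Operations that each block setting forbids, following the descriptions above.
BLOCKED_OPERATIONS = {
    "read_only": {"write", "metadata_write"},
    "read": {"read"},
    "write": {"write"},
    "metadata": {"metadata_read", "metadata_write"},
}

def is_allowed(active_blocks, operation):
    """Return True if no active block forbids the operation."""
    return all(operation not in BLOCKED_OPERATIONS[block] for block in active_blocks)

print(is_allowed({"write"}, "metadata_write"))      # True
print(is_allowed({"read_only"}, "metadata_write"))  # False
```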

### Add index block example
```bash
# PUT /<index>/_block/<block>
PUT /my-index-000001/_block/write
```
#### path param
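- `<index>`: a comma-separated list or wildcard expression of the index names the request targets
  - By default, `<index>` must name indices explicitly. To use wildcard expressions such as `_all` or `*`, set `action.destructive_requires_name` to `false`.
- `<block>`: the type of block to apply to the index
  - Valid values for `<block>` are `metadata`, `read`, `read_only`, and `write`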

#### query param
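- `allow_no_indices`:
  - If set to `false`, the request returns an error when any `wildcard expression`, `index alias`, or `_all` value matches no index or only closed indices.
  - For example, with `foo*,bar*`, an error is returned if `foo*` matches an index but `bar*` matches none.
  - Defaults to `true`
- `expand_wildcards`:
  - The type of index that wildcard patterns can match. If the request can target data streams, this parameter also determines whether wildcard patterns match hidden data streams
  - Accepts comma-separated values. Valid values are:
    - `all`: match any data stream or index, including hidden ones
    - `open`: match open, non-hidden indices and non-hidden data streams
    - `closed`: match closed, non-hidden indices and non-hidden data streams
    - `hidden`: match hidden indices and hidden data streams. Must be combined with `open`, `closed`, or both (`open,closed,hidden`)
    - `none`: wildcard patterns are not accepted
  - Defaults to `open`
- `ignore_unavailable`: if set to `false`, the request returns an error when it targets a missing or closed index
  - Defaults to `false`
- `master_timeout`: how long to wait for the master node. Defaults to `30s`. If the master node is still unavailable when the timeout expires, the request returns an error
- `timeout`: how long to wait, after the metadata update, for responses from all nodes in the cluster. Defaults to `30s`. If no response arrives before the timeout, the change to the cluster metadata is still applied, but the response indicates that not all acknowledgements were received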
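
A minimal sketch of how the path and query parameters combine into one request line. `add_block_request` is a hypothetical helper that only builds the method and path; it does not contact a cluster. Values are passed as strings, since Python booleans would be rendered as `True`/`False` rather than `true`/`false`.

```python
from urllib.parse import urlencode

VALID_BLOCKS = {"metadata", "read", "read_only", "write"}

def add_block_request(indices, block, **query_params):
    """Build the method and path for an add index block request (nothing is sent)."""
    if block not in VALID_BLOCKS:
        raise ValueError(f"unsupported block type: {block}")
    path = f"/{','.join(indices)}/_block/{block}"
    if query_params:
        # Keep commas literal so list values like "open,hidden" stay readable.
        path += "?" + urlencode(query_params, safe=",")
    return "PUT", path

print(add_block_request(["my-index-000001"], "write", expand_wildcards="open,hidden", timeout="30s"))
```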
  
#### Example
The following request adds a write block:
```
PUT /my-index-000001/_block/write
```
The response looks like this:
```
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "indices" : [ {
    "name" : "my-index-000001",
    "blocked" : true
  } ]
}
```
## Similarity module
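The similarity module (scoring/ranking model) defines how matching documents are scored. Similarity is per field, which means a different similarity can be defined for each field through the mapping.

Similarity only applies to text and keyword fields.

### Configuring similarity
Most similarities can be configured as follows: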
```
PUT /index
{
  "settings": {
    "index": {
      "similarity": {
        "my_similarity": {
          "type": "DFR",
          "basic_model": "g",
          "after_effect": "l",
          "normalization": "h2",
          "normalization.h2.c": "3.0"
        }
      }
    }
  }
}
```
The example above configures a DFR similarity, so the mapping can refer to it as `my_similarity`:
```
PUT /index/_mapping
{
  "properties" : {
    "title" : { "type" : "text", "similarity" : "my_similarity" }
  }
}
```
## slow query
### Search Slow log
The shard-level slow search log records slow queries into a dedicated log file.

Thresholds can be configured separately for the query phase and the fetch phase:
```
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
```
All of the settings above are `dynamic` and can be set per index:
```
PUT /my-index-000001/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.query.debug": "2s",
  "index.search.slowlog.threshold.query.trace": "500ms",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.search.slowlog.threshold.fetch.info": "800ms",
  "index.search.slowlog.threshold.fetch.debug": "500ms",
  "index.search.slowlog.threshold.fetch.trace": "200ms"
}
```

By default, every threshold is `-1`, which disables it.

The logging is scoped to the shard level.

The search slow log file is configured in the `log4j2.properties` file.
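
How a set of thresholds selects a log level can be sketched as follows. This is an illustrative model, not the Elasticsearch implementation: the helper names, the supported time units, and checking the most severe level first are assumptions.

```python
import re

UNITS_MS = {"micros": 0.001, "ms": 1, "s": 1000, "m": 60_000}

def to_millis(value):
    """Convert a time value such as "500ms" or "10s" to milliseconds; "-1" disables."""
    if value == "-1":
        return -1
    number, unit = re.fullmatch(r"(\d+)(micros|ms|s|m)", value).groups()
    return int(number) * UNITS_MS[unit]

def slowlog_level(took_ms, thresholds):
    """Return the most severe level whose threshold the elapsed time exceeds, or None."""
    for level in ("warn", "info", "debug", "trace"):
        limit = to_millis(thresholds.get(level, "-1"))
        if limit >= 0 and took_ms > limit:
            return level
    return None

query_thresholds = {"warn": "10s", "info": "5s", "debug": "2s", "trace": "500ms"}
print(slowlog_level(6000, query_thresholds))  # info
print(slowlog_level(300, query_thresholds))   # None
```

A level left out of the dictionary behaves like the default `-1` and never matches.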

### identify search log origin
Setting `index.search.slowlog.include.user` to `true` adds information about the user who triggered the slow query to the slow log:
```
PUT /my-index-000001/_settings
{
  "index.search.slowlog.include.user": true
}
```
With this setting, user information is included in the slow log:
```json
{
  "@timestamp": "2024-02-21T12:42:37.255Z",
  "log.level": "WARN",
  "auth.type": "REALM",
  "elasticsearch.slowlog.id": "tomcat-123",
  "elasticsearch.slowlog.message": "[index6][0]",
  "elasticsearch.slowlog.search_type": "QUERY_THEN_FETCH",
  "elasticsearch.slowlog.source": "{\"query\":{\"match_all\":{\"boost\":1.0}}}",
  "elasticsearch.slowlog.stats": "[]",
  "elasticsearch.slowlog.took": "747.3micros",
  "elasticsearch.slowlog.took_millis": 0,
  "elasticsearch.slowlog.total_hits": "1 hits",
  "elasticsearch.slowlog.total_shards": 1,
  "user.name": "elastic",
  "user.realm": "reserved",
  "ecs.version": "1.2.0",
  "service.name": "ES_ECS",
  "event.dataset": "elasticsearch.index_search_slowlog",
  "process.thread.name": "elasticsearch[runTask-0][search][T#5]",
  "log.logger": "index.search.slowlog.query",
  "elasticsearch.cluster.uuid": "Ui23kfF1SHKJwu_hI1iPPQ",
  "elasticsearch.node.id": "JK-jn-XpQ3OsDUsq5ZtfGg",
  "elasticsearch.node.name": "node-0",
  "elasticsearch.cluster.name": "distribution_run"
}
```
### index slow log
The index slow log is similar to the search slow log. Its log file name ends with `_index_indexing_slowlog.json`. It is configured as follows:
```
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms
index.indexing.slowlog.source: 1000
```
The index slow log settings are also dynamic and can be set as follows:
```
PUT /my-index-000001/_settings
{
  "index.indexing.slowlog.threshold.index.warn": "10s",
  "index.indexing.slowlog.threshold.index.info": "5s",
  "index.indexing.slowlog.threshold.index.debug": "2s",
  "index.indexing.slowlog.threshold.index.trace": "500ms",
  "index.indexing.slowlog.source": "1000"
}
```
To include the user who triggered the slow indexing request in the log, use the following request:
```
PUT /my-index-000001/_settings
{
  "index.indexing.slowlog.include.user": true
}
```
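By default, Elasticsearch logs the first 1000 characters of the source in the slow log. Use `index.indexing.slowlog.source` to change this.
- Setting `index.indexing.slowlog.source` to `false` or `0` skips logging the `source` entirely
- Setting `index.indexing.slowlog.source` to `true` logs the entire `source`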
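
The three cases can be written out as a small function. This is a sketch of the rule described above, not Elasticsearch code; `logged_source` is a hypothetical name.

```python
def logged_source(source, setting=1000):
    """Return the part of the document source that would be logged, or None if skipped.

    `setting` mirrors index.indexing.slowlog.source: a boolean or a character count.
    """
    if setting is True or setting == "true":
        return source                # log the entire source
    if setting is False or setting in ("false", 0, "0"):
        return None                  # skip logging the source
    return source[: int(setting)]    # log only the first N characters

document = "x" * 1500
print(len(logged_source(document)))       # 1000
print(logged_source(document, "false"))   # None
```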

## Store
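The store module controls how index data is stored and accessed on disk.
### File System Storage types
There are several storage type implementations. By default, Elasticsearch picks the best implementation for the operating system.

The storage type can also be set explicitly for all indices in `config/elasticsearch.yml`: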
```
index.store.type: hybridfs
```
`index.store.type` is a static setting that can also be set per index at index creation time:
```
PUT /my-index-000001
{
  "settings": {
    "index.store.type": "hybridfs"
  }
}
```
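The supported storage types are:
- `fs`: the default file system implementation. It picks the best file system type for the operating system, which is currently `hybridfs` on every system that supports it, but this may change in the future.
- `simplefs`: deprecated in `7.15`
- `niofs`: the `NIO FS` type stores the shard index on the file system using NIO. It allows multiple threads to read from the same file concurrently. It is not recommended on Windows.
- `mmapfs`: the `MMAPFS` type stores the shard index on the file system and maps the files into memory. Memory mapping uses up a portion of the virtual memory address space equal to the size of the mapped file, so make sure enough virtual memory address space is available.
- `hybridfs`: a hybrid of `niofs` and `mmapfs` that chooses the best file system type for each kind of read access.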

### preloading data into the file system cache
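By default, Elasticsearch relies entirely on the operating system's file system cache for caching I/O operations. Setting `index.store.preload` tells the operating system to load the contents of hot index files into memory when the index is opened.

`index.store.preload` accepts a comma-separated list of file extensions; every file whose extension is in the list is preloaded into memory. This can greatly improve performance after the operating system restarts, because the file system cache is lost at that point. However, it can slow down index opening, since the index only becomes accessible once the content named in `index.store.preload` has been loaded into memory.

This setting is best-effort and may have no effect at all, depending on the store type and the operating system.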
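The extension matching can be illustrated with a few lines of Python. This is only a sketch of the selection rule: `files_to_preload` is a hypothetical helper and the file names are made-up examples.

```python
from pathlib import PurePosixPath

def files_to_preload(file_names, preload_extensions):
    """Return the index files whose extension is listed in index.store.preload."""
    wanted = set(preload_extensions)
    return [name for name in file_names if PurePosixPath(name).suffix.lstrip(".") in wanted]

# Illustrative file names only.
segment_files = ["_0.nvd", "_0.dvd", "_0.fdt", "segments_1"]
print(files_to_preload(segment_files, ["nvd", "dvd"]))  # ['_0.nvd', '_0.dvd']
```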

`index.store.preload` is a static setting that can be set in `config/elasticsearch.yml`:
```
index.store.preload: ["nvd", "dvd"]
```
It can also be set at index creation time:
```
PUT /my-index-000001
{
  "settings": {
    "index.store.preload": ["nvd", "dvd"]
  }
}
```
The default value is `[]`, meaning nothing is preloaded. For actively searched