博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
Elasticsearch之批量操作bulk
阅读量:6695 次
发布时间:2019-06-25

本文共 4666 字,大约阅读时间需要 15 分钟。

1、bulk相当于数据库里的bash操作。

2、引入批量操作bulk,提高工作效率,你想啊,一批一批添加与一条一条添加,谁快?

3、bulk API可以帮助我们同时执行多个请求

4、bulk的格式:

action:index/create/update/delete

metadata:_index,_type,_id

request body:_source (删除操作不需要加request body)

                   { action: { metadata }}

                   { request body        }

5、bulk里为什么不支持get呢?

  答:批量操作,里面放get操作,没啥用!所以,官方也不支持。

6、create 和index的区别

  如果数据存在,使用create操作失败,会提示文档已经存在,使用index则可以成功执行。

 7、bulk一次最大处理多少数据量?

  bulk会把将要处理的数据载入内存中,所以数据量是有限制的,最佳的数据量不是一个确定的数值,它取决于你的硬件,你的文档大小以及复杂性,你的索引以及搜索的负载。

  一般建议是1000-5000个文档,如果你的文档很大,可以适当减少队列,大小建议是5-15MB,默认不能超过100M,可以在es的配置文件(即$ES_HOME下的config下的elasticsearch.yml)中。

 

 

 

  来修改这个值http.max_content_length: 100mb【不建议修改,太大的话bulk也会慢】,

 

 

 

 

 

 

 

 

 

 

 

 

 

批量操作bulk例子

 (1) 比如,我这里,在$ES_HOME里,新建一文件,命名为request。(这里为什么命名为request,去看官网就是)在Linux里,有无后缀没区别。

[hadoop@djt002 elasticsearch-2.4.3]$ pwd

/usr/local/elasticsearch/elasticsearch-2.4.3
[hadoop@djt002 elasticsearch-2.4.3]$ ll
total 56
drwxrwxr-x. 2 hadoop hadoop 4096 Feb 20 22:54 bin
drwxrwxr-x. 3 hadoop hadoop 4096 Feb 21 01:28 config
drwxrwxr-x. 3 hadoop hadoop 4096 Feb 20 22:59 data
drwxrwxr-x. 2 hadoop hadoop 4096 Feb 20 22:54 lib
-rw-rw-r--. 1 hadoop hadoop 11358 Aug 24 00:46 LICENSE.txt
drwxrwxr-x. 2 hadoop hadoop 4096 Feb 21 00:33 logs
drwxrwxr-x. 5 hadoop hadoop 4096 Dec 8 00:41 modules
-rw-rw-r--. 1 hadoop hadoop 150 Aug 24 00:46 NOTICE.txt
drwxrwxr-x. 2 hadoop hadoop 4096 Feb 20 22:59 plugins
-rw-rw-r--. 1 hadoop hadoop 8700 Aug 24 00:46 README.textile
[hadoop@djt002 elasticsearch-2.4.3]$ vim request
[hadoop@djt002 elasticsearch-2.4.3]$ more request 
{"index":{"_index":"zhouls","_type":"emp","_id":"10"}}
{ "name":"jack", "age" :18}
{"index":{"_index":"zhouls","_type":"emp","_id":"11"}}
{"name":"tom", "age":27}
{"update":{"_index":"zhouls","_type":"emp", "_id":"2"}}
{"doc":{"age" :22}}
{"delete":{"_index":"zhouls","_type":"emp","_id":"1"}}
[hadoop@djt002 elasticsearch-2.4.3]$

 

 

 

 

或者

{ "index" : {"_index":"zhouls","_type":"emp","_id":"21"}}

{ "name" : "test21"}

 

例子:

{ "index" : { "_index" : "zhouls", "_type" : "type1", "_id" : "1" } }

{ "field1" : "value1" }

{ "index" : { "_index" : "zhouls", "_type" : "type1", "_id" : "2" } }

{ "field1" : "value1" }

{ "delete" : { "_index" : "zhouls", "_type" : "type1", "_id" : "2" } }          (删除操作不需要加request body)      

{ "create" : { "_index" : "zhouls", "_type" : "type1", "_id" : "3" } }

{ "field1" : "value3" }

{ "update" : {"_index" : "zhouls", "_type" : "type1","_id" : "1" } }

{ "doc" : {"field2" : "value2"} }

 

 

 

 

 

(2)使用文件的方式

  vi requests 

写入批量操作语句。比如,下面

{"index":{"_index":"zhouls","_type":"emp","_id":"10"}}

{ "name":"jack", "age" :18}
{"index":{"_index":"zhouls","_type":"emp","_id":"11"}}
{"name":"tom", "age":27}
{"update":{"_index":"zhouls","_type":"emp", "_id":"2"}}
{"doc":{"age" :22}}
{"delete":{"_index":"zhouls","_type":"emp","_id":"1"}}

 

  在$ES_HOME目录下,执行下面命令

  curl  -PUT  '192.168.80.200:9200/_bulk'   --data-binary @request;

   curl  -XPOST  '192.168.80.200:9200/_bulk'   --data-binary @request;

[hadoop@djt002 elasticsearch-2.4.3]$ curl -PUT '192.168.80.200:9200/_bulk' --data-binary @request;

{"took":123,"errors":true,"items":[{"index":{"_index":"zhouls","_type":"emp","_id":"10","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"status":201}},{"index":{"_index":"zhouls","_type":"emp","_id":"11","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"status":201}},{"update":{"_index":"zhouls","_type":"emp","_id":"2","status":404,"error":{"type":"document_missing_exception","reason":"[emp][2]: document missing","index":"zhouls","shard":"-1"}}},{"delete":{"_index":"zhouls","_type":"emp","_id":"1","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"status":404,"found":false}}]}[hadoop@djt002 elasticsearch-2.4.3]$

 

 

 

之后,再查看下。

[hadoop@djt002 elasticsearch-2.4.3]$ curl -XGET 'http://192.168.80.200:9200/zhouls/emp/1?pretty'

{
"_index" : "zhouls",
"_type" : "emp",
"_id" : "1",
"found" : false
}
[hadoop@djt002 elasticsearch-2.4.3]$ curl -XGET 'http://192.168.80.200:9200/zhouls/emp/2?pretty'
{
"_index" : "zhouls",
"_type" : "emp",
"_id" : "2",
"found" : false
}
[hadoop@djt002 elasticsearch-2.4.3]$ curl -XGET 'http://192.168.80.200:9200/zhouls/emp/11?pretty'
{
"_index" : "zhouls",
"_type" : "emp",
"_id" : "11",
"_version" : 4,
"found" : true,
"_source" : {
"name" : "tom",
"age" : 27
}
}

[hadoop@djt002 elasticsearch-2.4.3]$ curl -XGET 'http://192.168.80.200:9200/zhouls/emp/10?pretty'

{
"_index" : "zhouls",
"_type" : "emp",
"_id" : "10",
"_version" : 4,
"found" : true,
"_source" : {
"name" : "jack",
"age" : 18
}
}

 

 

 

  (3) bulk请求可以在URL中声明/_index 或者/_index/_type

  这个,自行去测试!

 

 

 

 

 

 

 

 

 

官网

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

本文转自大数据躺过的坑博客园博客,原文链接:http://www.cnblogs.com/zlslch/p/6422055.html,如需转载请自行联系原作者

你可能感兴趣的文章
云开雾散,带你看看大数据行业
查看>>
Random 随机数
查看>>
java join sleep wait notify notifyAll
查看>>
Linux: grep
查看>>
Android View架构总结
查看>>
filebeat多版本for Windows7
查看>>
ReachMax上云路:支撑日50亿PV请求和TB级数据运算的云端架构
查看>>
Upgrade json-serde-xxx jar in Apache Hive-1.2.1
查看>>
Dart入门—开发环境
查看>>
PHP>5.3版本部分新功能
查看>>
irrlicht引擎:实现天龙八部的RPG换装
查看>>
转 【转载】 SYN cookies 机制下连接的建立
查看>>
ExtJS 4.2 教程-02:bootstrap.js 工作方式
查看>>
利用jquery制作闪动标题
查看>>
SQLServer · 最佳实践 · 数据库实现大容量插入的几种方式
查看>>
AWK文本处理工具(Linux)
查看>>
JavaEE 获取路径全攻略
查看>>
使用mina解析http协议的使用
查看>>
Code Review for Vue 2.0 Preview
查看>>
java8 去掉 perm 用 Metaspace 来替代
查看>>