To learn this plugin, the first step is simply to get it running.
First, create a text file with the following content:

[sqczm@sqczm first]$ pwd
/opt/logstash-6.7.1/demo/first
[sqczm@sqczm first]$ more users.txt
name: zhangsan, age: 21, addr: "中国 北京"
name: lisi, age:20,addr:"美国"
name:wangwu,age:19,addr:"beijing"

Next, let's set up the Logstash configuration file:

[sqczm@sqczm first]$ pwd
/opt/logstash-6.7.1/demo/first
[sqczm@sqczm first]$ more first.conf
input {
  file {
    path => ["/opt/logstash-6.7.1/demo/first/users.txt"]
  }
}
filter {

}
output {
  stdout {}
}

Finally, start Logstash:

[sqczm@sqczm logstash-6.7.1]$ pwd
/opt/logstash-6.7.1
[sqczm@sqczm logstash-6.7.1]$ bin/logstash -f /opt/logstash-6.7.1/demo/first/first.conf
Sending Logstash logs to /opt/logstash-6.7.1/logs which is now configured via log4j2.properties
[2019-04-20T16:18:32,057][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2019-04-20T16:18:32,083][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.7.1"}
[2019-04-20T16:18:40,628][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2019-04-20T16:18:41,060][INFO ][logstash.inputs.file ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/opt/logstash-6.7.1/data/plugins/inputs/file/.sincedb_ccdcb2b886f0094c5a7fa2ddbbd759e3", :path=>["/opt/logstash-6.7.1/demo/first/users.txt"]}
[2019-04-20T16:18:41,112][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x2ed54826 run>"}
[2019-04-20T16:18:41,202][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-04-20T16:18:41,248][INFO ][filewatch.observingtail ] START, creating Discoverer, Watch with file and sincedb collections
[2019-04-20T16:18:41,658][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}

Huh? Why didn't the console print anything from the file?
Let's consult the official documentation first. It describes a parameter named start_position:
Official logstash-input-file documentation

start_position

  • Value can be any of: beginning, end
  • Default value is “end”

Choose where Logstash starts initially reading files: at the beginning or at the end. The default behavior treats files like live streams and thus starts at the end. If you have old data you want to import, set this to beginning.

This option only modifies “first contact” situations where a file is new and not seen before, i.e. files that don’t have a current position recorded in a sincedb file read by Logstash. If a file has already been seen before, this option has no effect and the position recorded in the sincedb file will be used.

Now it makes sense: when this parameter is not set, reading starts at the end of the file by default, which is why nothing was printed after the previous start. Following the docs, let's set it to "beginning". The updated configuration looks like this:
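The default behaves much like `tail -f`: the reader first seeks to the end of the file, so only lines appended after startup would ever be emitted. A minimal sketch of that behavior (a simplification, not Logstash's actual implementation):

```python
import io

# Mimic the file input's default start_position => "end" behavior:
# seek to EOF first, so only lines appended after startup would be read.
def read_existing_lines(f, start_position="end"):
    if start_position == "end":
        f.seek(0, io.SEEK_END)  # skip everything already in the file
    return f.readlines()

buf = io.StringIO("name: zhangsan\nname: lisi\n")
print(read_existing_lines(buf, "end"))        # empty: reader starts at EOF
buf = io.StringIO("name: zhangsan\nname: lisi\n")
print(read_existing_lines(buf, "beginning"))  # all pre-existing lines
```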

[sqczm@sqczm logstash-6.7.1]$ pwd
/opt/logstash-6.7.1
[sqczm@sqczm logstash-6.7.1]$ more demo/first/first.conf
input {
  file {
    path => ["/opt/logstash-6.7.1/demo/first/users.txt"]
    start_position => "beginning"
  }
}
filter {

}
output {
  stdout {}
}

With that change in place, start Logstash again:

[sqczm@sqczm logstash-6.7.1]$ pwd
/opt/logstash-6.7.1
[sqczm@sqczm logstash-6.7.1]$ bin/logstash -f /opt/logstash-6.7.1/demo/first/first.conf
Sending Logstash logs to /opt/logstash-6.7.1/logs which is now configured via log4j2.properties
[2019-04-20T16:31:36,250][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2019-04-20T16:31:36,274][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.7.1"}
[2019-04-20T16:31:44,536][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2019-04-20T16:31:44,864][INFO ][logstash.inputs.file ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/opt/logstash-6.7.1/data/plugins/inputs/file/.sincedb_ccdcb2b886f0094c5a7fa2ddbbd759e3", :path=>["/opt/logstash-6.7.1/demo/first/users.txt"]}
[2019-04-20T16:31:44,915][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x5e479e9a run>"}
[2019-04-20T16:31:45,008][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-04-20T16:31:45,022][INFO ][filewatch.observingtail ] START, creating Discoverer, Watch with file and sincedb collections
[2019-04-20T16:31:45,443][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}

Huh? We waited forever and still not a single line of file content appeared. Is some parameter missing? Let's keep reading the official docs:

Tracking of current position in watched files

The plugin keeps track of the current position in each file by recording it in a separate file named sincedb. This makes it possible to stop and restart Logstash and have it pick up where it left off without missing the lines that were added to the file while Logstash was stopped.

By default, the sincedb file is placed in the data directory of Logstash with a filename based on the filename patterns being watched (i.e. the path option). Thus, changing the filename patterns will result in a new sincedb file being used and any existing current position state will be lost. If you change your patterns with any frequency it might make sense to explicitly choose a sincedb path with the sincedb_path option.

A different sincedb_path must be used for each input. Using the same path will cause issues. The read checkpoints for each input must be stored in a different path so the information does not override.

Files are tracked via an identifier. This identifier is made up of the inode, major device number and minor device number. In windows, a different identifier is taken from a kernel32 API call.

Sincedb records can now be expired meaning that read positions of older files will not be remembered after a certain time period. File systems may need to reuse inodes for new content. Ideally, we would not use the read position of old content, but we have no reliable way to detect that inode reuse has occurred. This is more relevant to Read mode where a great many files are tracked in the sincedb. Bear in mind though, if a record has expired, a previously seen file will be read again.

Sincedb files are text files with four (< v5.0.0), five or six columns:

  1. The inode number (or equivalent).
  2. The major device number of the file system (or equivalent).
  3. The minor device number of the file system (or equivalent).
  4. The current byte offset within the file.
  5. The last active timestamp (a floating point number)
  6. The last known path that this record was matched to (for old sincedb records converted to the new format, this is blank).

On non-Windows systems you can obtain the inode number of a file with e.g. ls -li.
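The column layout above can be sketched with a small parser; the inode, device, and offset values in the sample line below are made up for illustration:

```python
# Parse one line of a (v5+) Logstash sincedb file into named fields.
def parse_sincedb_line(line):
    parts = line.split()
    fields = {
        "inode": parts[0],             # inode number (or equivalent)
        "major_device": parts[1],      # major device number
        "minor_device": parts[2],      # minor device number
        "byte_offset": int(parts[3]),  # current read position in bytes
    }
    if len(parts) >= 5:
        fields["last_active"] = float(parts[4])  # last active timestamp
    if len(parts) >= 6:
        fields["path"] = parts[5]      # last known matched path
    return fields

# Hypothetical sample record, for illustration only.
sample = "34818045 0 64768 143 1555750668.123 /opt/logstash-6.7.1/demo/first/users.txt"
record = parse_sincedb_line(sample)
print(record["byte_offset"])  # the offset where reading will resume
```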

So the docs say the sincedb file records, among other things, the current position within each watched file, so that after a restart Logstash can pick up where it left off instead of re-reading files from the beginning.
The next step, then, is to delete that file. The docs say it lives under the data directory, but if you go looking for it you may be puzzled not to find it: it is a hidden file (its name starts with a dot), which is why a plain listing misses it. Delete it with the following commands:

[sqczm@sqczm logstash-6.7.1]$ pwd
/opt/logstash-6.7.1
[sqczm@sqczm logstash-6.7.1]$ ls data/plugins/inputs/file/
[sqczm@sqczm logstash-6.7.1]$ ls -a data/plugins/inputs/file/
. .. .sincedb_ccdcb2b886f0094c5a7fa2ddbbd759e3
[sqczm@sqczm logstash-6.7.1]$ rm -rf data/plugins/inputs/file/.sincedb_ccdcb2b886f0094c5a7fa2ddbbd759e3
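A side note: while experimenting, instead of hunting down and deleting the hidden sincedb file before every run, you can point the plugin at a throwaway path so no position is ever persisted. A sketch (Linux/macOS only; this is a common testing trick, not required by this walkthrough):

```
input {
  file {
    path => ["/opt/logstash-6.7.1/demo/first/users.txt"]
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
```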

With the sincedb file deleted, start Logstash again:

[sqczm@sqczm logstash-6.7.1]$ pwd
/opt/logstash-6.7.1
[sqczm@sqczm logstash-6.7.1]$ bin/logstash -f /opt/logstash-6.7.1/demo/first/first.conf

Sending Logstash logs to /opt/logstash-6.7.1/logs which is now configured via log4j2.properties
[2019-04-20T16:57:38,915][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2019-04-20T16:57:38,939][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.7.1"}
[2019-04-20T16:57:47,643][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2019-04-20T16:57:48,093][INFO ][logstash.inputs.file ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/opt/logstash-6.7.1/data/plugins/inputs/file/.sincedb_ccdcb2b886f0094c5a7fa2ddbbd759e3", :path=>["/opt/logstash-6.7.1/demo/first/users.txt"]}
[2019-04-20T16:57:48,145][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0xc6ca077 run>"}
[2019-04-20T16:57:48,233][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-04-20T16:57:48,251][INFO ][filewatch.observingtail ] START, creating Discoverer, Watch with file and sincedb collections
[2019-04-20T16:57:48,693][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
/opt/logstash-6.7.1/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
{
    "@timestamp" => 2019-04-20T08:57:48.917Z,
      "@version" => "1",
          "path" => "/opt/logstash-6.7.1/demo/first/users.txt",
          "host" => "sqczm",
       "message" => "name: lisi, age:20,addr:\"美国\""
}
{
    "@timestamp" => 2019-04-20T08:57:48.886Z,
      "@version" => "1",
          "path" => "/opt/logstash-6.7.1/demo/first/users.txt",
          "host" => "sqczm",
       "message" => "name: zhangsan, age: 21, addr: \"中国 北京\""
}
{
    "@timestamp" => 2019-04-20T08:57:48.918Z,
      "@version" => "1",
          "path" => "/opt/logstash-6.7.1/demo/first/users.txt",
          "host" => "sqczm",
       "message" => "name:wangwu,age:19,addr:\"beijing\""
}

At last, the file contents appear. Time for one more tweak: as you may have noticed, the sample data was meant to resemble JSON, so let's adjust the configuration accordingly.
Update the configuration to parse the input as JSON:

[sqczm@sqczm logstash-6.7.1]$ pwd
/opt/logstash-6.7.1
[sqczm@sqczm logstash-6.7.1]$ more demo/first/first.conf
input {
  file {
    path => ["/opt/logstash-6.7.1/demo/first/users.txt"]
    start_position => "beginning"
    codec => "json"
  }
}
filter {

}
output {
  stdout {}
}
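Conceptually, the json codec decodes each line read by the file input and merges the resulting keys into the event next to metadata fields such as @timestamp, @version, path, and host. A rough simulation (a simplification, not Logstash's actual implementation):

```python
import json
from datetime import datetime, timezone

# Simulate what the json codec does to one line from the file input:
# decode it as JSON and merge the keys into the event's metadata.
def decode_line(line, path, host):
    event = {
        "@version": "1",
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "path": path,
        "host": host,
    }
    try:
        event.update(json.loads(line))
    except json.JSONDecodeError:
        # On a parse failure, the raw line is kept in "message"
        # and the event is tagged with "_jsonparsefailure".
        event["message"] = line
        event["tags"] = ["_jsonparsefailure"]
    return event

ok = decode_line('{"name":"wangwu","age":19,"addr":"beijing"}',
                 "/opt/logstash-6.7.1/demo/first/users.txt", "sqczm")
bad = decode_line('name:wangwu,age:19,addr:"beijing"',
                  "/opt/logstash-6.7.1/demo/first/users.txt", "sqczm")
print(ok["name"], bad["tags"])
```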

After the change, remember to delete the sincedb file again:

[sqczm@sqczm logstash-6.7.1]$ pwd
/opt/logstash-6.7.1
[sqczm@sqczm logstash-6.7.1]$ rm -rf data/plugins/inputs/file/.sincedb_ccdcb2b886f0094c5a7fa2ddbbd759e3
[sqczm@sqczm logstash-6.7.1]$ bin/logstash -f /opt/logstash-6.7.1/demo/first/first.conf
...(some output omitted)...
{
          "tags" => [
        [0] "_jsonparsefailure"
    ],
      "@version" => "1",
    "@timestamp" => 2019-04-20T11:48:45.377Z,
          "path" => "/opt/logstash-6.7.1/demo/first/users.txt",
          "host" => "sqczm",
       "message" => "name: lisi, age:20,addr:\"美国\""
}
{
          "tags" => [
        [0] "_jsonparsefailure"
    ],
      "@version" => "1",
    "@timestamp" => 2019-04-20T11:48:45.332Z,
          "path" => "/opt/logstash-6.7.1/demo/first/users.txt",
          "host" => "sqczm",
       "message" => "name: zhangsan, age: 21, addr: \"中国 北京\""
}
{
          "tags" => [
        [0] "_jsonparsefailure"
    ],
      "@version" => "1",
    "@timestamp" => 2019-04-20T11:48:45.381Z,
          "path" => "/opt/logstash-6.7.1/demo/first/users.txt",
          "host" => "sqczm",
       "message" => "name:wangwu,age:19,addr:\"beijing\""
}

The result is disheartening: the tags field reports a JSON parse failure. One look back at the data makes it obvious why: what I fabricated isn't valid JSON at all. Let's fix it quickly:

[sqczm@sqczm logstash-6.7.1]$ pwd
/opt/logstash-6.7.1
[sqczm@sqczm logstash-6.7.1]$ more demo/first/users.txt
{"name": "zhangsan", "age": 21, "addr": "中国 北京"}
{"name": "lisi", "age":20,"addr":"美国"}
{"name":"wangwu","age":19,"addr":"beijing"}
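Since the codec is applied per line, each line must be a complete, standalone JSON object (newline-delimited JSON). A quick self-check sketch to run before restarting Logstash:

```python
import json

# Verify that every non-empty line of a data file is valid standalone JSON,
# which is what the per-line json codec expects.
def validate_ndjson(lines):
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)
    return bad  # line numbers that failed to parse

lines = [
    '{"name": "zhangsan", "age": 21, "addr": "中国 北京"}',
    '{"name": "lisi", "age":20,"addr":"美国"}',
    '{"name":"wangwu","age":19,"addr":"beijing"}',
]
print(validate_ndjson(lines))  # an empty list means all lines parse cleanly
```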

After fixing the file, delete the sincedb file once more and restart:

[sqczm@sqczm logstash-6.7.1]$ pwd
/opt/logstash-6.7.1
[sqczm@sqczm logstash-6.7.1]$ rm -rf data/plugins/inputs/file/.sincedb_ccdcb2b886f0094c5a7fa2ddbbd759e3
[sqczm@sqczm logstash-6.7.1]$ bin/logstash -f /opt/logstash-6.7.1/demo/first/first.conf
...(some output omitted)...
{
      "@version" => "1",
          "path" => "/opt/logstash-6.7.1/demo/first/users.txt",
          "name" => "zhangsan",
    "@timestamp" => 2019-04-20T11:54:55.419Z,
           "age" => 21,
          "addr" => "中国 北京",
          "host" => "sqczm"
}
{
      "@version" => "1",
          "path" => "/opt/logstash-6.7.1/demo/first/users.txt",
          "name" => "lisi",
    "@timestamp" => 2019-04-20T11:54:55.460Z,
           "age" => 20,
          "addr" => "美国",
          "host" => "sqczm"
}
{
      "@version" => "1",
          "path" => "/opt/logstash-6.7.1/demo/first/users.txt",
          "name" => "wangwu",
    "@timestamp" => 2019-04-20T11:54:55.462Z,
           "age" => 19,
          "addr" => "beijing",
          "host" => "sqczm"
}

That wraps up this logstash-input-file example. The plugin's other options can be explored through the official documentation.