kafka学习指南(总结版)


版本介绍

从使用上来看,以0.9为分界线,0.9开始不再区分高级/低级消费者API。

从兼容性上来看,以0.8.x为分界线,0.8.x不兼容以前的版本。

总体拓扑架构

从上可知:

1、生产者不需要访问zookeeper。

2、消费者fetch消息、生产者发布消息总是向leader节点发请求,不会发送给follower。

3、和rocketmq一样,为了线性提高性能,每个topic被分为partition(跟数据库的分库分表一样的道理,对业务而言透明,属于技术策略,不是业务策略),每个partition只能被相同消费组的任何一个成员消费,topic的分区数量(默认是1)可通过./kafka-topics.sh –zookeeper localhost:2181 -alter –partitions 5 –topic userService修改,其合理值的设置可以参考https://blog.csdn.net/kwengelie/article/details/51150114。

4、kafka 0.8.x使用zk存储每个consumer-group在每个topic的每个partition的点位。 

broker内部机制

 

消息消费过程

 

核心高级API(不允许用户控制消费者和broker的交互过程)

ConsumerConnector

KafkaStream

ConsumerConfig

低级API则允许控制各个交互过程,比如从哪里开始读以及在客户端维护点位,rocketmq实现其实采用的就是高层和底层结合的API,也就是kafka 0.9之后合并的api版本。

底层API的主要接口是SimpleConsumer。

 

kafka监控管理

主流的三种kafka监控程序分别为:

  • 1、Kafka Web Conslole
  • 2、Kafka Manager
  • 3、KafkaOffsetMonitor

以我们使用的KafkaOffsetMonitor为例,KafkaOffsetMonitor是Kafka的一款客户端消费监控工具,用来实时监控Kafka服务的Consumer以及它们所在的Partition中的Offset,我们可以浏览当前的消费者组,并且每个Topic的所有Partition的消费情况都可以一目了然。KafkaOffsetMonitor托管在Github上,可以通过Github下载。下载地址:https://github.com/quantifind/KafkaOffsetMonitor/releases,也可以从baidu网盘下载(内网的话,要使用这个,否则会缺少从cdn加载的js)。

可以通过java -cp KafkaOffsetMonitor-assembly-0.2.0.jar     com.quantifind.kafka.offsetapp.OffsetGetterWeb     --zk 10.20.30.10:2181     --port 8088     --refresh 10.seconds     --retain 2.days启动,各配置含义可以参考github。

 常见问题

如何通过java api获取所有topic?

消费和如何一次性订阅多个topic? 

如何查看所有的topic?

[root@hs-test-10-20-30-11 kafka]# bin/kafka-topics.sh --zookeeper 10.20.30.10:2181 --list
global
test

查看特定topic的配置?

[root@hs-test-10-20-30-11 kafka]# bin/kafka-topics.sh --zookeeper 10.20.30.10:2181 --topic global --describe
Topic:global PartitionCount:1 ReplicationFactor:1 Configs:
Topic: global Partition: 0 Leader: 0 Replicas: 0 Isr: 0

生产者连接的时候报了下列错误

WARN [Producer clientId=console-producer] Connection to node -1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

有两个原因:1、kafka没有启动;2、连接串使用了非conf/server.properties里面的LISTENERS参数的值。

如何查看所有的消费者?

新的方式,也就是不是使用基于zk的客户端(kafka.consumer.Consumer.createJavaConsumerConnector、内部是bootstrap)。

[root@hs-test-10-20-30-11 kafka]# bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server 10.20.30.11:9092 --list
Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).

老的方式:基于zk的客户端(kafka.javaapi.consumer.ZookeeperConsumerConnector,已经deprecated)。

[root@hs-test-10-20-30-11 kafka]# bin/kafka-consumer-groups.sh --zookeeper 10.20.30.10:2181 --list
Note: This will only show information about consumers that use ZooKeeper (not those using the Java consumer API).

AAA
TA50-Aggr-Logger-ConsumerGroup
console-consumer-23104
console-consumer-37858

log4j-kafka配置

增加jar包依赖: 

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.12</artifactId>
            <version>0.11.0.3</version>
        </dependency>

配置log4j2.xml,如下:

logger增加kafka appender。

        <Root level="INFO" additivity="false">
            <AppenderRef ref="Console"/>
            <AppenderRef ref="KAFKA"/>
            <AppenderRef ref="app_error" />
        </Root>

增加kafka appender。

    <Appenders>
        <!-- 输出错误日志到Kafka -->
        <Kafka name="KAFKA" topic="bomp">
            <ThresholdFilter level="error" onMatch="DENY" onMismatch="NEUTRAL"/>
            <ThresholdFilter level="trace" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss,SSS}:%4p %t (%F:%L) - %m%n" />
            <Property name="bootstrap.servers">10.20.30.11:9092</Property>
        </Kafka>
    </Appenders>

这样log4j配置kafka就完成了。对于c++,可以使用librdkafka库,https://docs.confluent.io/2.0.0/clients/librdkafka/index.html,后续会专门出文讲解。

相关问题

消费者报:

2018-09-17 14:10:07.768 WARN 130400 --- [r-finder-thread] kafka.client.ClientUtils$ : Fetching topic metadata with correlation id 0 for topics [Set(test)] from broker [BrokerEndPoint(0,10.20.30.11,9092)] failed

java.nio.channels.ClosedChannelException: null
at kafka.network.BlockingChannel.send(BlockingChannel.scala:112) ~[kafka_2.12-0.11.0.3.jar:na]
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:80) ~[kafka_2.12-0.11.0.3.jar:na]
at kafka.producer.SyncProducer.doSend(SyncProducer.scala:79) ~[kafka_2.12-0.11.0.3.jar:na]
at kafka.producer.SyncProducer.send(SyncProducer.scala:124) ~[kafka_2.12-0.11.0.3.jar:na]
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:61) [kafka_2.12-0.11.0.3.jar:na]
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:96) [kafka_2.12-0.11.0.3.jar:na]
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:72) [kafka_2.12-0.11.0.3.jar:na]
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64) [kafka_2.12-0.11.0.3.jar:na]

解决方法:在server.properties里面设置下advertised.host.name,重启试试看。参考https://stackoverflow.com/questions/30606447/kafka-consumer-fetching-metadata-for-topics-failed

zk日志中报:

2018-10-08 14:13:28,297 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100147743c10000 type:setData cxid:0xc8 zxid:0x53 txntype:-1 reqpath:n/a Error Path:/config/topics/uft_trade Error:KeeperErrorCode = NoNode for /config/topics/uft_trade
2018-10-08 14:13:28,302 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100147743c10000 type:create cxid:0xc9 zxid:0x54 txntype:-1 reqpath:n/a Error Path:/config/topics Error:KeeperErrorCode = NodeExists for /config/topics

解决方法:待排查。

spring boot kafka客户端在某虚拟机服务器(物理机一直运行未发生)上运行一段时间后,瞬间cpu system 80-90%,大量下列日志:

2018-10-09 13:54:57,713  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 2682ms for sessionid 0x100175687960002
2018-10-09 13:54:57,904  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 2672ms for sessionid 0x100175687960004
2018-10-09 13:54:58,621  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 2675ms for sessionid 0x100175687960003
2018-10-09 13:54:57,232  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 2700ms for sessionid 0x100175687960007
2018-10-09 13:55:09,812  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 2672ms for sessionid 0x100175687960004, closing socket connection and attempting reconn
ect
2018-10-09 13:55:02,942  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 2702ms for sessionid 0x100175687960008
2018-10-09 13:55:09,755  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 2675ms for sessionid 0x100175687960003, closing socket connection and attempting reconn
ect
2018-10-09 13:55:09,789  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 2682ms for sessionid 0x100175687960002, closing socket connection and attempting reconn
ect
2018-10-09 13:55:18,677  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 2675ms for sessionid 0x100175687960005
2018-10-09 13:55:11,752  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 20016ms for sessionid 0x100175687960001
2018-10-09 13:55:17,709  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 2678ms for sessionid 0x100175687960006
2018-10-09 13:55:12,779  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 2700ms for sessionid 0x100175687960007, closing socket connection and attempting reconn
ect
2018-10-09 13:55:20,634  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 2702ms for sessionid 0x100175687960008, closing socket connection and attempting reconn
ect
2018-10-09 13:55:22,178  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 20016ms for sessionid 0x100175687960001, closing socket connection and attempting recon
nect
2018-10-09 13:58:10,244  INFO ZkClient:713 - zookeeper state changed (Disconnected)
2018-10-09 13:58:10,240  INFO ZkClient:713 - zookeeper state changed (Disconnected)
2018-10-09 13:58:10,241  INFO ZkClient:713 - zookeeper state changed (Disconnected)
2018-10-09 13:58:10,240  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 2675ms for sessionid 0x100175687960005, closing socket connection and attempting reconn
ect
2018-10-09 13:58:10,243  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 2678ms for sessionid 0x100175687960006, closing socket connection and attempting reconn
ect
2018-10-09 13:58:11,107  INFO ZkClient:713 - zookeeper state changed (Disconnected)
2018-10-09 13:58:40,384  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 13:58:40,383  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 13:58:40,379  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 13:58:40,378  INFO ZkClient:713 - zookeeper state changed (Disconnected)
2018-10-09 13:58:40,378  INFO ZkClient:713 - zookeeper state changed (Disconnected)
2018-10-09 13:58:40,377  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 13:59:22,082  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 13:59:22,084  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 13:59:22,099  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 13:59:22,108  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 13:59:22,130  INFO ZkClient:713 - zookeeper state changed (Disconnected)
2018-10-09 13:59:23,382  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 13:59:23,412  INFO ZkClient:713 - zookeeper state changed (Expired)
2018-10-09 13:59:23,412  INFO ZkClient:713 - zookeeper state changed (Expired)
2018-10-09 13:59:23,443  INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=500 watcher=org.I0Itec.zkclient.ZkClient@8646db9
2018-10-09 13:59:23,411  WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960001 has expired
2018-10-09 13:59:32,474  INFO ZkClient:713 - zookeeper state changed (Disconnected)
2018-10-09 13:59:23,404  WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960007 has expired
2018-10-09 13:59:23,390  INFO ZkClient:713 - zookeeper state changed (Expired)
2018-10-09 13:59:32,477  INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=500 watcher=org.I0Itec.zkclient.ZkClient@4671e53b
2018-10-09 13:59:23,390  WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960008 has expired
2018-10-09 13:59:23,390  INFO ZkClient:713 - zookeeper state changed (Expired)
2018-10-09 13:59:32,477  INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=500 watcher=org.I0Itec.zkclient.ZkClient@6a1aab78
2018-10-09 13:59:23,389  WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960004 has expired
2018-10-09 13:59:32,417  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 13:59:23,380  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 13:59:23,446  INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=30000 watcher=org.I0Itec.zkclient.ZkClient@dc24521
2018-10-09 13:59:41,829  INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960004 has expired, closing socket connection
2018-10-09 13:59:41,832  INFO ZkClient:936 - Waiting for keeper state SyncConnected
2018-10-09 13:59:41,829  INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960008 has expired, closing socket connection
2018-10-09 13:59:41,831  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 13:59:41,830  INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960007 has expired, closing socket connection
2018-10-09 13:59:41,830  INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960001 has expired, closing socket connection
2018-10-09 13:59:41,860  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 13:59:42,585  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 13:59:42,810  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 13:59:42,835  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 14:00:31,813  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 48978ms for sessionid 0x100175687960002
2018-10-09 14:00:31,825  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 49644ms for sessionid 0x100175687960005
2018-10-09 14:00:31,825  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 49644ms for sessionid 0x100175687960005, closing socket connection and attempting recon
nect
2018-10-09 14:00:31,827  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 49968ms for sessionid 0x100175687960006
2018-10-09 14:00:31,827  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 49968ms for sessionid 0x100175687960006, closing socket connection and attempting recon
nect
2018-10-09 14:00:31,842  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 50011ms for sessionid 0x100175687960003
2018-10-09 14:00:31,868  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 50011ms for sessionid 0x100175687960003, closing socket connection and attempting recon
nect
2018-10-09 14:00:31,853  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 48978ms for sessionid 0x100175687960002, closing socket connection and attempting recon
nect
2018-10-09 14:00:31,885  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 14:00:31,886  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 14:00:31,887  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 14:00:31,887  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 14:00:31,907  INFO ClientCnxn:519 - EventThread shut down for session: 0x100175687960001
2018-10-09 14:00:31,907  INFO ClientCnxn:519 - EventThread shut down for session: 0x100175687960008
2018-10-09 14:00:31,908  INFO ClientCnxn:519 - EventThread shut down for session: 0x100175687960004
2018-10-09 14:00:31,944  INFO ClientCnxn:519 - EventThread shut down for session: 0x100175687960007
2018-10-09 14:00:33,391  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 14:00:33,396  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 14:00:33,424  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 1336ms for sessionid 0x0
2018-10-09 14:00:33,430  INFO ClientCnxn:1299 - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x10017568796000b, negotiated timeout = 30000
2018-10-09 14:00:33,517  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 14:00:33,516  INFO ZkClient:713 - zookeeper state changed (SyncConnected)
2018-10-09 14:00:34,399  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 14:00:34,354  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 1336ms for sessionid 0x0, closing socket connection and attempting reconnect
2018-10-09 14:00:34,433  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 14:00:34,475  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 14:00:34,476  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 14:00:34,485  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 968ms for sessionid 0x0
2018-10-09 14:00:34,488  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 968ms for sessionid 0x0, closing socket connection and attempting reconnect
2018-10-09 14:00:37,472  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 14:00:37,484  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 14:00:37,487  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 14:00:37,488  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 14:00:37,489  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 14:00:37,479  WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960006 has expired
2018-10-09 14:00:37,495  INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960006 has expired, closing socket connection
2018-10-09 14:00:37,447  INFO ZkClient:713 - zookeeper state changed (Expired)
2018-10-09 14:00:37,479  INFO ZkClient:713 - zookeeper state changed (Expired)
2018-10-09 14:00:37,519  INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=500 watcher=org.I0Itec.zkclient.ZkClient@69b0fd6f
2018-10-09 14:00:37,519  INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=500 watcher=org.I0Itec.zkclient.ZkClient@4a87761d
2018-10-09 14:00:37,446  WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960005 has expired
2018-10-09 14:00:37,519  INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960005 has expired, closing socket connection
2018-10-09 14:00:37,765  INFO ZkClient:713 - zookeeper state changed (Expired)
2018-10-09 14:00:37,780  INFO ZkClient:713 - zookeeper state changed (Expired)
2018-10-09 14:00:37,780  WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960003 has expired
2018-10-09 14:00:37,791  INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960003 has expired, closing socket connection
2018-10-09 14:00:38,194  INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=500 watcher=org.I0Itec.zkclient.ZkClient@3aeaafa6
2018-10-09 14:00:37,995  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 507ms for sessionid 0x0
2018-10-09 14:00:52,148  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 507ms for sessionid 0x0, closing socket connection and attempting reconnect
2018-10-09 14:00:38,198  INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=500 watcher=org.I0Itec.zkclient.ZkClient@491cc5c9
2018-10-09 14:00:52,141  INFO ClientCnxn:519 - EventThread shut down for session: 0x100175687960006
2018-10-09 14:00:52,128  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 14:00:52,154  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 14:00:52,126  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 14:00:52,179  INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-10-09 14:00:38,010  WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960002 has expired
2018-10-09 14:00:52,231  INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960002 has expired, closing socket connection
2018-10-09 14:00:52,683  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 504ms for sessionid 0x0
2018-10-09 14:05:12,238  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 14:05:12,176  INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-09 14:08:21,078  INFO ClientCnxn:519 - EventThread shut down for session: 0x100175687960002
2018-10-09 14:05:12,113  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 259911ms for sessionid 0x10017568796000b
2018-10-09 14:08:21,107  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 259911ms for sessionid 0x10017568796000b, closing socket connection and attempting reco
nnect
2018-10-09 14:05:12,098  INFO ClientCnxn:519 - EventThread shut down for session: 0x100175687960003
2018-10-09 14:00:52,677  WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 501ms for sessionid 0x0
2018-10-09 14:08:21,107  INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 501ms for sessionid 0x0, closing socket connection and attempting reconnect

 经大概看了下帖子https://blog.csdn.net/xjping0794/article/details/77784171的内容,查看该段时间系统io,确实很高,高达50%,如下:

14时00分28秒       sda   3062.38 922268.58    670.77    301.38      5.17      1.71      0.16     49.44
14时00分28秒   ol-root   3111.77 922266.41    495.79    296.54      5.29      1.70      0.16     49.43
14时00分28秒   ol-swap     22.04      2.09    174.24      8.00      0.13      5.80      0.15      0.33
14时11分16秒       sda   5432.75 1537105.34    768.61    283.07     19.06      3.53      0.17     91.53
14时11分16秒   ol-root   5513.26 1537106.56    731.82    278.93     19.55      3.54      0.17     91.52
14时11分16秒   ol-swap      5.07      4.68     35.87      8.00      0.01      2.27      0.19      0.10

14时11分16秒       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
14时20分01秒       sda   2784.00 795332.59    462.60    285.85     10.89      3.93      0.18     50.09
14时20分01秒   ol-root   2827.44 795311.85    414.30    281.43     11.18      3.95      0.18     50.07
14时20分01秒   ol-swap      6.96     12.98     42.72      8.00      0.05      7.80      0.18      0.12
14时30分01秒       sda      3.13     12.42     59.59     23.04      0.00      0.57      0.44      0.14

但是这段时间没有东西特别在运行,这就比较奇怪了,那会儿一下子也忘了用iotop看下是哪个进程所致。上述帖子提到的几点是:

关于ZK日志存放,官网给出如下建议:

Having a dedicated log devicehas a large impact on throughput and stable latencies. It is highly recommenedto dedicate a log device and set dataLogDir to point to a directory on thatdevice, and then make sure to point dataDir to a directory not residing on thatdevice.

在ZOO.CFG中增加:

      forceSync=no

默认是开启的,为避免同步延迟问题,ZK接收到数据后会立刻去讲当前状态信息同步到磁盘日志文件中,同步完成后才会应答。将此项关闭后,客户端连接可以得到快速响应(这一点在有BMU的服务器上问题不大)。

 再看下zk服务器的日志,差不多时间开始出现大量CancelledKeyException:

2018-10-09 13:56:36,712 [myid:] - INFO  [SyncThread:0:NIOServerCnxn@1040] - Closed socket connection for client /127.0.0.1:14926 which had sessionid 0x100175687960008
2018-10-09 13:56:43,857 [myid:] - INFO  [SyncThread:0:NIOServerCnxn@1040] - Closed socket connection for client /127.0.0.1:14924 which had sessionid 0x100175687960006
2018-10-09 13:56:49,783 [myid:] - INFO  [SyncThread:0:NIOServerCnxn@1040] - Closed socket connection for client /127.0.0.1:14919 which had sessionid 0x100175687960001
2018-10-09 13:56:49,816 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@236] - Ignoring unexpected runtime exception
java.nio.channels.CancelledKeyException
	at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
	at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:205)
	at java.lang.Thread.run(Thread.java:748)
2018-10-09 13:58:54,331 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23459
2018-10-09 13:58:54,377 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23459, probably expired
2018-10-09 13:58:54,401 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.223.137:23485
2018-10-09 13:58:54,441 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.223.137:23494
2018-10-09 13:58:56,314 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.223.137:23459 which had sessionid 0x10017
5687960000
2018-10-09 13:58:56,336 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23485
2018-10-09 13:58:56,392 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23485, probably expired
2018-10-09 13:58:57,890 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.223.137:23497
2018-10-09 13:58:59,480 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.223.137:23485 which had sessionid 0x10017
5687960000
2018-10-09 13:59:00,383 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23494
2018-10-09 13:59:00,910 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23494, probably expired
2018-10-09 13:59:02,140 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.223.137:23507
2018-10-09 13:59:03,286 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23497
2018-10-09 13:59:03,671 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.223.137:23494 which had sessionid 0x10017
5687960000
2018-10-09 13:59:03,905 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23497, probably expired
2018-10-09 13:59:05,341 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@236] - Ignoring unexpected runtime exception
java.nio.channels.CancelledKeyException
	at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
	at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:205)
	at java.lang.Thread.run(Thread.java:748)
2018-10-09 13:59:06,862 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.223.137:23511
2018-10-09 13:59:10,044 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23507
2018-10-09 13:59:10,267 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.223.137:23497 which had sessionid 0x10017
5687960000
2018-10-09 13:59:10,285 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23507, probably expired
2018-10-09 13:59:10,286 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@236] - Ignoring unexpected runtime exception
java.nio.channels.CancelledKeyException
	at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
	at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:205)
	at java.lang.Thread.run(Thread.java:748)
2018-10-09 13:59:10,287 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.223.137:23507 which had sessionid 0x10017
5687960000
2018-10-09 13:59:10,287 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23511
2018-10-09 13:59:10,287 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23511, probably expired
2018-10-09 13:59:10,313 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.223.137:23519
2018-10-09 13:59:10,313 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.223.137:23511 which had sessionid 0x10017
5687960000
2018-10-09 13:59:10,314 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.223.137:23524
2018-10-09 13:59:10,314 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23519
2018-10-09 13:59:10,314 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23519, probably expired
2018-10-09 13:59:10,315 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23524
2018-10-09 13:59:10,315 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.223.137:23519 which had sessionid 0x10017
5687960000
2018-10-09 13:59:10,316 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23524, probably expired
2018-10-09 13:59:10,321 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@236] - Ignoring unexpected runtime exception
java.nio.channels.CancelledKeyException
	at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
	at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:205)
	at java.lang.Thread.run(Thread.java:748)

上述帖子中提到在3.4.8中修复,我们用的3.4.12。进一步查找,有些提及写日志延迟很大,例如“fsync-ing the write ahead log in SyncThread:0 took 8001ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide ”但是日志中并没有看到该告警。决定加上forceSync=no试试看,参考https://www.jianshu.com/p/73eec030db86。

至于日志中的超时时间有长、有短,这是tickTime有关,可以解释,不做详细说明。

zk日志中大量下列错误信息:

id:0x9d zxid:0x42 txntype:-1 reqpath:n/a Error Path:/config/topics Error:KeeperErrorCode = NodeExists for /config/topics
2018-10-09 12:01:07,918 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100175687960000 type:create cx
id:0xa5 zxid:0x45 txntype:-1 reqpath:n/a Error Path:/brokers/topics/uft_individual/partitions/0 Error:KeeperErrorCode = NoNode for /brokers/topics/uft_individual/partitions/0
2018-10-09 12:01:07,921 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100175687960000 type:create cx
id:0xa6 zxid:0x46 txntype:-1 reqpath:n/a Error Path:/brokers/topics/uft_individual/partitions Error:KeeperErrorCode = NoNode for /brokers/topics/uft_individual/partitions
2018-10-09 12:01:17,740 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100175687960000 type:setData c
xid:0xaf zxid:0x4a txntype:-1 reqpath:n/a Error Path:/config/topics/uft_splitter Error:KeeperErrorCode = NoNode for /config/topics/uft_splitter
2018-10-09 12:01:17,741 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100175687960000 type:create cx
id:0xb0 zxid:0x4b txntype:-1 reqpath:n/a Error Path:/config/topics Error:KeeperErrorCode = NodeExists for /config/topics
2018-10-09 12:01:17,753 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100175687960000 type:create cx
id:0xb8 zxid:0x4e txntype:-1 reqpath:n/a Error Path:/brokers/topics/uft_splitter/partitions/0 Error:KeeperErrorCode = NoNode for /brokers/topics/uft_splitter/partitions/0
2018-10-09 12:01:17,754 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100175687960000 type:create cx
id:0xb9 zxid:0x4f txntype:-1 reqpath:n/a Error Path:/brokers/topics/uft_splitter/partitions Error:KeeperErrorCode = NoNode for /brokers/topics/uft_splitter/partitions
2018-10-09 12:01:35,671 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100175687960000 type:setData c
xid:0xc2 zxid:0x53 txntype:-1 reqpath:n/a Error Path:/config/topics/cres_global Error:KeeperErrorCode = NoNode for /config/topics/cres_global

参考https://github.com/mesos/kafka/issues/136,可是kafka服务一直正常启动着啊(对比启动日志也可以看出确实已经启动了)。https://stackoverflow.com/questions/34393837/zookeeper-kafka-error-keepererrorcode-nodeexists还有一个原因,是因为zk的data未删除的原因,可我们是全新安装过一会也有这个问题。最后查看https://stackoverflow.com/questions/43559328/got-user-level-keeperexception-when-processing,如下:

The message you see is not an error yet. It is a potential exception raised by Zookeeper that original object making a request has to handle.

When you start a fresh Kafka, it gets a bunch of NoNode messages. It's normal because some paths don't exist yet. At the same time, you get also NodeExists messages as the path exists already.

Example: Error:KeeperErrorCode = NoNode for /config/topics/test It's because Kafka sends a request to Zookeeper for this path. But it doesn't exist. That's OK, because you are trying to create it. So, you see "INFO" from Zookeeper but no error from Kafka. Once Kafka gets this message, it tries to create your topic. To do so, it needs to access a path in Zookeeper for topics. So, it sends a request and gets an error NodeExists for /config/topics. Again, it's normal and Kafka ignores the message.

Long story short, these are all non-issue messages and you should skip them. If it bothers you, change logging configuration of Zookeeper (it's not recommended though).

其实就是提示性信息,不用管它就好了,kafka会直接忽略该信息。

kafka:为什么说Kafka使用磁盘比内存快

Kafka最核心的思想是使用磁盘,而不是使用内存,可能所有人都会认为,内存的速度一定比磁盘快,我也不例外。

在看了Kafka的设计思想,查阅了相应资料再加上自己的测试后,发现磁盘的顺序读写速度和内存持平。      

 而且Linux对于磁盘的读写优化也比较多,包括read-ahead和write-behind,磁盘缓存等。

如果在内存做这些操作的时候,一个是JAVA对象的内存开销很大,另一个是随着堆内存数据的增多,JAVA的GC时间会变得很长,

使用磁盘操作有以下几个好处:

磁盘缓存由Linux系统维护,减少了程序员的不少工作。

磁盘顺序读写速度超过内存随机读写。

JVM的GC效率低,内存占用大。

使用磁盘可以避免这一问题。

系统冷启动后,磁盘缓存依然可用。

参考

https://blog.csdn.net/xbs1019/article/details/54949324

https://blog.csdn.net/Kim_Weir/article/details/79934095

 https://kafka.apache.org/documentation

https://www.cnblogs.com/justuntil/p/8033792.html

智能推荐

注意!

本站转载的文章为个人学习借鉴使用,本站对版权不负任何法律责任。如果侵犯了您的隐私权益,请联系我们删除。



 
© 2014-2019 ITdaan.com 粤ICP备14056181号  

赞助商广告