modern-cpp-kafka
Description
fork of https://github.com/morganstanley/modern-cpp-kafka.git
About the Modern C++ Kafka API
The modern-cpp-kafka API is a C++ wrapper layer built on librdkafka (the C part only). It keeps librdkafka's high quality while being friendlier to users.
- Currently, modern-cpp-kafka is compatible with librdkafka v2.4.0.
KAFKA is a registered trademark of The Apache Software Foundation and
has been licensed for use by modern-cpp-kafka. modern-cpp-kafka has no
affiliation with and is not endorsed by The Apache Software Foundation.
Why it's here
librdkafka is a robust, high-performance C/C++ library that is widely used and well maintained.
Unfortunately, to maintain C++98 compatibility, the C++ interface of librdkafka is not quite object-oriented or user-friendly.
Since C++ is evolving quickly, we want to take advantage of new C++ features to make life easier for developers. This led us to create a new C++ API for Kafka clients.
Eventually, we worked out modern-cpp-kafka, a header-only library that uses idiomatic C++ features to provide a safe, efficient, and easy-to-use way of producing and consuming Kafka messages.
Features
- Header-only
  - Easy to deploy, with no extra library to link
- Ease of Use
  - Interface and naming match the Java API
  - Object-oriented
  - RAII is used for lifetime management
  - librdkafka's polling and queue management are hidden
- Robust
  - Verified with a variety of test cases that cover many abnormal scenarios (edge cases)
    - Stability test with unstable brokers
    - Memory leak check for a failed client with in-flight messages
    - Client failure and taking over, etc.
- Efficient
  - No extra performance cost (no deep copy introduced internally)
  - Much better performance (2 to 4 times the throughput) than the native-language (Java/Scala) implementations in the most commonly used cases (message size: 256 B to 2 KB)
Installation / Requirements
- Just include the `include/kafka` directory in your project
- The compiler should support C++17
  - Or C++14, but with prerequisites
    - Boost headers are needed (for `boost::optional`)
    - With GCC, optimization options are needed (e.g. `-O2`)
- Dependencies
  - librdkafka headers and library (only the C part)
    - Also see the requirements from librdkafka
  - rapidjson headers: only required by `addons/KafkaMetrics.h`
User Manual
Properties
kafka::Properties Class Reference
- It is a map that contains all the configuration needed to initialize a Kafka client, and it is the only parameter a client constructor needs.
- The configuration items are key-value pairs. The key type is always `std::string`, while the value type is one of the following:
  - `std::string`
    - Most items are identical to the librdkafka configuration
    - But with exceptions
      - Default value changes

        | Key String | Default | Description |
        |---|---|---|
        | `log_level` | `5` | The default was `6` in librdkafka |
        | `client.id` | random string | No default from librdkafka |
        | `group.id` | random string (for `KafkaConsumer` only) | No default from librdkafka |

      - Additional options

        | Key String | Default | Description |
        |---|---|---|
        | `enable.manual.events.poll` | `false` | To poll the (offset-commit/message-delivery callback) events manually |
        | `max.poll.records` | `500` | (for `KafkaConsumer` only) The maximum number of records that a single call to `poll()` would return |

      - Ignored options

        | Key String | Explanation |
        |---|---|
        | `enable.auto.offset.store` | modern-cpp-kafka will save the offsets in its own way |
        | `auto.commit.interval.ms` | modern-cpp-kafka will only commit the offsets within each `poll()` operation |
  - `std::function<...>`
    - For the various callbacks

      | Key String | Value Type |
      |---|---|
      | `log_cb` | `LogCallback` (`std::function<void(int, const char*, int, const char* msg)>`) |
      | `error_cb` | `ErrorCallback` (`std::function<void(const Error&)>`) |
      | `stats_cb` | `StatsCallback` (`std::function<void(const std::string&)>`) |
      | `oauthbearer_token_refresh_cb` | `OauthbearerTokenRefreshCallback` (`std::function<SaslOauthbearerToken(const std::string&)>`) |

  - `Interceptors`
    - To intercept thread start/exit events, etc.

      | Key String | Value Type |
      |---|---|
      | `interceptors` | `Interceptors` |
- Examples
  - std::string brokers = "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092";
    kafka::Properties props({
        {"bootstrap.servers", {brokers}},
        {"enable.idempotence", {"true"}},
    });
  - kafka::Properties props;
    props.put("bootstrap.servers", brokers);
    props.put("enable.idempotence", "true");
- Note: `bootstrap.servers` is the only mandatory property for a Kafka client
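Since `bootstrap.servers` takes a comma-separated list of `host:port` entries, the value can be assembled from a container instead of a hand-written literal. The helper below is a minimal sketch; `joinBrokers` is a hypothetical name and not part of modern-cpp-kafka.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of modern-cpp-kafka): joins host:port
// entries into the comma-separated form used for "bootstrap.servers".
std::string joinBrokers(const std::vector<std::string>& brokers)
{
    std::string result;
    for (const auto& broker : brokers) {
        if (!result.empty()) result += ',';
        result += broker;
    }
    return result;
}
```

The result is a plain `std::string`, so it could be passed as the value in `props.put("bootstrap.servers", ...)` just like `brokers` in the examples above.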
KafkaProducer
kafka::clients::producer::KafkaProducer Class Reference
A Simple Example
Here's a very simple example showing how to send a message with a `KafkaProducer`.
#include <kafka/KafkaProducer.h>

#include <cstdlib>
#include <iostream>
#include <string>

int main()
{
    using namespace kafka;
    using namespace kafka::clients::producer;

    // E.g. KAFKA_BROKER_LIST: "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"
    const std::string brokers = getenv("KAFKA_BROKER_LIST"); // NOLINT
    const Topic topic = getenv("TOPIC_FOR_TEST");            // NOLINT

    // Prepare the configuration
    const Properties props({{"bootstrap.servers", brokers}});

    // Create a producer
    KafkaProducer producer(props);

    // Prepare a message
    std::cout << "Type message value and hit enter to produce message..." << std::endl;
    std::string line;
    std::getline(std::cin, line);

    ProducerRecord record(topic, NullKey, Value(line.c_str(), line.size()));

    // Prepare delivery callback
    auto deliveryCb = [](const RecordMetadata& metadata, const Error& error) {
        if (!error) {
            std::cout << "Message delivered: " << metadata.toString() << std::endl;
        } else {
            std::cerr << "Message failed to be delivered: " << error.message() << std::endl;
        }
    };

    // Send a message
    producer.send(record, deliveryCb);

    // Close the producer explicitly (or not, since RAII will take care of it)
    producer.close();
}
Notes
- `send()` is a non-blocking operation unless the message buffering queue is full.
- Make sure the memory block for the `ProducerRecord`'s `key` stays valid until the `send` call returns.
- Make sure the memory block for the `ProducerRecord`'s `value` stays valid until the message delivery callback is called (unless `send` is called with the option `KafkaProducer::SendOption::ToCopyRecordValue`).
- The message delivery callback is guaranteed to be triggered after `send`; a producer even waits for it before closing.
- At the end, we can close the Kafka client (i.e. `KafkaProducer` or `KafkaConsumer`) explicitly, or just leave it to the destructor.
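One detail of the example above deserves care: `getenv` returns a null pointer when the variable is not set, and constructing a `std::string` from a null pointer is undefined behavior. A minimal sketch of a safer lookup follows; `requireEnv` is a hypothetical helper name, not something provided by modern-cpp-kafka.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the value of an environment variable, or
// throws if it is not set, instead of handing a null pointer to std::string.
std::string requireEnv(const char* name)
{
    const char* value = std::getenv(name);
    if (value == nullptr) {
        throw std::runtime_error(std::string("Environment variable not set: ") + name);
    }
    return value;
}
```

With such a helper, the two lookups in `main()` could become `requireEnv("KAFKA_BROKER_LIST")` and `requireEnv("TOPIC_FOR_TEST")`, so a missing variable fails with a readable message.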
The Lifecycle of the Message
The message for the KafkaProducer is called a `ProducerRecord`. It contains a `Topic`, a `Partition` (optional), a `Key`, and a `Value`. Both `Key` and `Value` are non-owning buffers (a pointer plus a size), and since there's no deep copy for the `Value`, the user should make sure its memory block stays valid until the delivery callback has been executed.
In the previous example, we don't need to worry about the lifecycle of the `Value`, since the content of `line` stays available until the producer is closed, and all message delivery callbacks are triggered before the close finishes.
Example for shared_ptr
A trick is capturing the shared pointer (for the memory block of the `Value`) in the message delivery callback.
std::cout << "Type message value and hit enter to produce message... (empty line to quit)" << std::endl;

// Get input lines and forward them to Kafka
for (auto line = std::make_shared<std::string>();
     std::getline(std::cin, *line);
     line = std::make_shared<std::string>()) {
    // Empty line to quit
    if (line->empty()) break;

    // Prepare a message
    ProducerRecord record(topic, NullKey, Value(line->c_str(), line->size()));

    // Prepare delivery callback
    // Note: Here we capture the shared pointer of `line`, which holds the content for `record.value()`
    auto deliveryCb = [line](const RecordMetadata& metadata, const Error& error) {
        if (!error) {
            std::cout << "Message delivered: " << metadata.toString() << std::endl;
        } else {
            std::cerr << "Message failed to be delivered: " << error.message() << std::endl;
        }
    };

    // Send the message
    producer.send(record, deliveryCb);
}
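The ownership idea behind this trick can be shown without a broker. The sketch below uses a toy stand-in, `ToyProducer`, and a `BufferView` struct; both are hypothetical and only mimic the "queue now, call back later" behavior, they are not the library's types. Each callback captures the `shared_ptr`, so the bytes behind the non-owning view outlive the loop iteration that created them.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A non-owning view of a payload, standing in for the record value.
struct BufferView {
    const char* data;
    std::size_t size;
};

// Toy stand-in for an asynchronous producer (NOT the real KafkaProducer):
// send() only queues the record; deliverAll() later fires the callbacks.
class ToyProducer {
public:
    using Callback = std::function<void(const BufferView&)>;

    void send(BufferView value, Callback cb) { _inFlight.emplace_back(value, std::move(cb)); }

    void deliverAll()
    {
        for (auto& entry : _inFlight) entry.second(entry.first);
        _inFlight.clear();
    }

private:
    std::vector<std::pair<BufferView, Callback>> _inFlight;
};

// Sends each line; the shared_ptr captured by the callback keeps the bytes alive.
std::vector<std::string> sendLines(const std::vector<std::string>& lines)
{
    ToyProducer producer;
    std::vector<std::string> delivered;
    for (const auto& text : lines) {
        auto line = std::make_shared<std::string>(text);
        producer.send(BufferView{line->data(), line->size()},
                      [line, &delivered](const BufferView& v) { delivered.emplace_back(v.data, v.size); });
    } // every local `line` is gone here; only the callbacks still own the strings
    producer.deliverAll();
    return delivered;
}
```

If the callback did not capture `line`, each string would be destroyed at the end of its loop iteration and the queued views would dangle by the time `deliverAll()` runs.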
Example for deep-copy
The option `KafkaProducer::SendOption::ToCopyRecordValue` could be used for `producer.send(...)`, so that the memory block of `record.value()` is copied into the internal sending buffer.
std::cout << "Type message value and hit enter to produce message... (empty line to quit)" << std::endl;

// Get input lines and forward them to Kafka
for (std::string line; std::getline(std::cin, line); ) {
    // Empty line to quit
    if (line.empty()) break;

    // Prepare a message
    ProducerRecord record(topic, NullKey, Value(line.c_str(), line.size()));

    // Prepare delivery callback
    auto deliveryCb = [](const RecordMetadata& metadata, const Error& error) {
        if (!error) {
            std::cout << "Message delivered: " << metadata.toString() << std::endl;
        } else {
            std::cerr << "Message failed to be delivered: " << error.message() << std::endl;
        }
    };

    // Send the message (deep-copy the payload)
    producer.send(record, deliveryCb, KafkaProducer::SendOption::ToCopyRecordValue);
}
Embed More Info in a ProducerRecord
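Besides the payload (i.e. `value()`), a `ProducerRecord` could also carry extra info in its `key()` and `headers()`.
`Headers` is a vector of `Header`, each of which contains a `kafka::Header::Key` (i.e. `std::string`) and a `kafka::Header::Value` (i.e. a non-owning buffer, like `Value`).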
Example
const kafka::Topic     topic     = "someTopic";
const kafka::Partition partition = 0;

const std::string key       = "some key";
const std::string value     = "some payload";
const std::string category  = "categoryA";
const std::size_t sessionId = 1;

{
    kafka::clients::producer::ProducerRecord record(topic,
                                                    partition,
                                                    kafka::Key{key.c_str(), key.size()},
                                                    kafka::Value{value.c_str(), value.size()});

    record.headers() = {{
        kafka::Header{kafka::Header::Key{"Category"},  kafka::Header::Value{category.c_str(), category.size()}},
        kafka::Header{kafka::Header::Key{"SessionId"}, kafka::Header::Value{&sessionId, sizeof(sessionId)}}
    }};

    std::cout << "ProducerRecord: " << record.toString() << std::endl;
}
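The `SessionId` header above carries the raw bytes of a `std::size_t`, so the reader has to rebuild the integer from a pointer and a length. The round trip can be sketched as below; `encodeSessionId` and `decodeSessionId` are hypothetical helper names. Note the assumption that producer and consumer agree on the width and byte order of `std::size_t`, which is not guaranteed across platforms.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Copies the raw bytes of a std::size_t, mirroring what the example does by
// passing &sessionId and sizeof(sessionId) as the header value.
std::string encodeSessionId(std::size_t sessionId)
{
    return std::string(reinterpret_cast<const char*>(&sessionId), sizeof(sessionId));
}

// Rebuilds the std::size_t from received bytes (hypothetical helper).
// Assumes the sender used the same size and byte order for std::size_t.
std::size_t decodeSessionId(const void* data, std::size_t size)
{
    if (size != sizeof(std::size_t)) throw std::invalid_argument("unexpected header value size");
    std::size_t sessionId = 0;
    std::memcpy(&sessionId, data, sizeof(sessionId)); // memcpy avoids unaligned reads
    return sessionId;
}
```

For data exchanged between different platforms, a textual or fixed-width, fixed-endianness encoding would be the more portable choice.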
About enable.manual.events.poll
By default, a `KafkaProducer` is constructed with the `enable.manual.events.poll=false` configuration.
That means a background thread is created, which keeps polling the events (and thus calls the message delivery callbacks).
There is another choice: with `enable.manual.events.poll=true`, the message delivery callbacks are called within the member function `pollEvents()`.
- Note: in this case, `send()` never blocks, even if the message buffering queue is full. It throws an exception (or returns an error code through the input reference parameter) instead of blocking.
Example
// Prepare the configuration (with "enable.manual.events.poll=true")
const Properties props({{"bootstrap.servers",         {brokers}},
                        {"enable.manual.events.poll", {"true" }}});

// Create a producer
KafkaProducer producer(props);

std::cout << "Type message value and hit enter to produce message... (empty line to finish)" << std::endl;

// Get all input lines
// Note: each line gets its own string, so the queued payloads don't alias each other
std::list<std::shared_ptr<std::string>> messages;
for (auto line = std::make_shared<std::string>();
     std::getline(std::cin, *line) && !line->empty();
     line = std::make_shared<std::string>()) {
    messages.emplace_back(line);
}

while (!messages.empty()) {
    // Pop out a message to be sent
    auto payload = messages.front();
    messages.pop_front();

    // Prepare the message
    ProducerRecord record(topic, NullKey, Value(payload->c_str(), payload->size()));

    // Prepare the delivery callback
    // Note: if the delivery fails, the message is pushed back to the sending queue and retried later
    auto deliveryCb = [payload, &messages](const RecordMetadata& metadata, const Error& error) {
        if (!error) {
            std::cout << "Message delivered: " << metadata.toString() << std::endl;
        } else {
            std::cerr << "Message failed to be delivered: " << error.message() << ", will be retried later" << std::endl;
            messages.emplace_back(payload);
        }
    };

    // Send the message
    producer.send(record, deliveryCb);

    // Poll events (e.g. message delivery callback)
    producer.pollEvents(std::chrono::milliseconds(0));
}
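The re-queue-on-failure flow of this loop can be traced with a toy model. `ToyProducer` below is a hypothetical stand-in, not the real `KafkaProducer`: its first `failures` sends report an error, and callbacks only run inside `pollEvents()`, which is the property that manual polling provides.

```cpp
#include <deque>
#include <functional>
#include <list>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a producer in manual-poll mode (NOT the real KafkaProducer).
// The first `failures` sends report an error; callbacks only run in pollEvents().
class ToyProducer {
public:
    using Callback = std::function<void(bool delivered)>;

    explicit ToyProducer(int failures) : _failuresLeft(failures) {}

    void send(Callback cb)
    {
        const bool delivered = (_failuresLeft <= 0);
        if (!delivered) --_failuresLeft;
        _events.emplace_back([cb = std::move(cb), delivered] { cb(delivered); });
    }

    void pollEvents()
    {
        while (!_events.empty()) {
            auto event = std::move(_events.front());
            _events.pop_front();
            event();
        }
    }

private:
    int _failuresLeft;
    std::deque<std::function<void()>> _events;
};

// Drains the queue like the loop above; failed payloads are queued again.
std::vector<std::string> sendWithRetry(std::list<std::shared_ptr<std::string>> messages, int failures)
{
    ToyProducer producer(failures);
    std::vector<std::string> delivered;
    while (!messages.empty()) {
        auto payload = messages.front();
        messages.pop_front();
        producer.send([payload, &messages, &delivered](bool ok) {
            if (ok) delivered.push_back(*payload);
            else    messages.emplace_back(payload); // retry later
        });
        producer.pollEvents(); // the callback runs here, on the caller's thread
    }
    return delivered;
}
```

Because the callback runs on the same thread as the loop, it can touch `messages` without locking. In the toy, every callback fires on the next `pollEvents()` call; with a real broker the callback may only fire on a later call, so a retried message can be delivered after messages that were sent later.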
Error Handling
Errors might occur at different places while sending a message:
- A `kafka::KafkaException` is thrown if the `KafkaProducer` fails to perform the `send` operation.
- The delivery `kafka::Error` can be fetched via the delivery callback.
- The `kafka::Error::value()` for failures
  - Local errors
    - `RD_KAFKA_RESP_ERR__UNKNOWN_TOPIC`: The topic doesn't exist
    - `RD_KAFKA_RESP_ERR__UNKNOWN_PARTITION`: The partition doesn't exist
    - `RD_KAFKA_RESP_ERR__INVALID_ARG`: Invalid topic (the topic is null or its length is too long (>512))
    - `RD_KAFKA_RESP_ERR__MSG_TIMED_OUT`: No ack received within the time limit
    - `RD_KAFKA_RESP_ERR_INVALID_MSG_SIZE`: The message size conflicts with the local configuration `message.max.bytes`
  - Broker errors
    - Typical errors are
      - Invalid message: `RD_KAFKA_RESP_ERR_CORRUPT_MESSAGE`, `RD_KAFKA_RESP_ERR_MSG_SIZE_TOO_LARGE`, `RD_KAFKA_RESP_ERR_INVALID_REQUIRED_ACKS`, `RD_KAFKA_RESP_ERR_UNSUPPORTED_FOR_MESSAGE_FORMAT`, `RD_KAFKA_RESP_ERR_RECORD_LIST_TOO_LARGE`.
      - Topic/partition doesn't exist: `RD_KAFKA_RESP_ERR_UNKNOWN_TOPIC_OR_PART`. Automatic topic creation is disabled on the broker, or the application is specifying a partition that does not exist.
      - Authorization failure: `RD_KAFKA_RESP_ERR_TOPIC_AUTHORIZATION_FAILED`, `RD_KAFKA_RESP_ERR_CLUSTER_AUTHORIZATION_FAILED`
Idempotent Producer
The `enable.idempotence=true` configuration is highly RECOMMENDED.
Example
kafka::Properties props;
props.put("bootstrap.servers", brokers);
props.put("enable.idempotence", "true");
// Create an idempotent producer
kafka::clients::producer::KafkaProducer producer(props);
- Note: please refer to the document from librdkafka for more details.
Kafka Consumer
kafka::clients::consumer::KafkaConsumer Class Reference
A Simple Example
#include <kafka/KafkaConsumer.h>

#include <atomic>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <signal.h>
#include <string>

std::atomic_bool running = {true};

void stopRunning(int sig) {
    if (sig != SIGINT) return;

    if (running) {
        running = false;
    } else {
        // Already stopping: ignore further SIGINTs to avoid getting stuck in this handler
        signal(SIGINT, SIG_IGN); // NOLINT
    }
}

int main()
{
    using namespace kafka;
    using namespace kafka::clients::consumer;

    // Use Ctrl-C to terminate the program
    signal(SIGINT, stopRunning); // NOLINT

    // E.g. KAFKA_BROKER_LIST: "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"
    const std::string brokers = getenv("KAFKA_BROKER_LIST"); // NOLINT
    const Topic topic = getenv("TOPIC_FOR_TEST");            // NOLINT

    // Prepare the configuration
    const Properties props({{"bootstrap.servers", {brokers}}});

    // Create a consumer instance
    KafkaConsumer consumer(props);

    // Subscribe to topics
    consumer.subscribe({topic});

    while (running) {
        // Poll messages from Kafka brokers
        auto records = consumer.poll(std::chrono::milliseconds(100));

        for (const auto& record: records) {
            if (!record.error()) {
                std::cout << "Got a new message..." << std::endl;
                std::cout << "    Topic    : " << record.topic() << std::endl;
                std::cout << "    Partition: " << record.partition() << std::endl;
                std::cout << "    Offset   : " << record.offset() << std::endl;
                std::cout << "    Timestamp: " << record.timestamp().toString() << std::endl;
                std::cout << "    Headers  : " << toString(record.headers()) << std::endl;
                std::cout << "    Key   [" << record.key().toString() << "]" << std::endl;
                std::cout << "    Value [" << record.value().toString() << "]" << std::endl;
            } else {
                std::cerr << record.toString() << std::endl;
            }
        }
    }

    // Close the consumer explicitly (or not, since RAII will take care of it)
    consumer.close();
}
- By default, the `KafkaConsumer` is constructed with the property `enable.auto.commit=true`
  - It means it will automatically commit previously polled offsets on each `poll` (and the final `close`) operation.
    - Note: the internal offset commit is asynchronous and is not guaranteed to succeed. Since the commit is triggered again later (within each `poll`), an occasional failure doesn't matter.
- `subscribe` can take a topic list. It is a blocking operation that waits until the consumer gets partitions assigned.
- `poll` must be called periodically to trigger the various callbacks internally. In practice, it can be put in a `while` loop.
Rebalance events
The `KafkaConsumer` could specify a `RebalanceCallback` when it subscribes to the topics, and the callback will be triggered when partitions are assigned or revoked.
Example
// The consumer would read all messages from the topic and then quit.

// Prepare the configuration
const Properties props({{"bootstrap.servers",    {brokers}},
                        // Emit RD_KAFKA_RESP_ERR__PARTITION_EOF event
                        // whenever the consumer reaches the end of a partition.
                        {"enable.partition.eof", {"true"}},
                        // Action to take when there is no initial offset in the offset store:
                        // it means the consumer would read from the very beginning
                        {"auto.offset.reset",    {"earliest"}}});

// Create a consumer instance
KafkaConsumer consumer(props);

// Prepare the rebalance callbacks
std::atomic<std::size_t> assignedPartitions{};
auto rebalanceCb = [&assignedPartitions](kafka::clients::consumer::RebalanceEventType et, const kafka::TopicPartitions& tps) {
    if (et == kafka::clients::consumer::RebalanceEventType::PartitionsAssigned) {
        assignedPartitions += tps.size();
        std::cout << "Assigned partitions: " << kafka::toString(tps) << std::endl;
    } else {
        assignedPartitions -= tps.size();
        std::cout << "Revoked partitions: " << kafka::toString(tps) << std::endl;
    }
};

// Subscribe to topics with rebalance callback
consumer.subscribe({topic}, rebalanceCb);

TopicPartitions finishedPartitions;
while (finishedPartitions.size() != assignedPartitions.load()) {
    // Poll messages from Kafka brokers
    auto records = consumer.poll(std::chrono::milliseconds(100));

    for (const auto& record: records) {
        if (!record.error()) {
            std::cerr << record.toString() << std::endl;
        } else {
            if (record.error().value() == RD_KAFKA_RESP_ERR__PARTITION_EOF) {
                // Record the partition which has reached the end
                finishedPartitions.emplace(record.topic(), record.partition());
            } else {
                std::cerr << record.toString() << std::endl;
            }
        }
    }
}
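The loop above stops when the number of finished partitions equals the number of assigned ones, and a revoked partition is never removed from `finishedPartitions`. If rebalances can happen mid-run, the two counts can drift apart. The bookkeeping can be made explicit with a small tracker; `EofTracker` is a hypothetical class for illustration, using plain (topic, partition) pairs rather than the library's types.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <utility>

// Hypothetical bookkeeping helper (not part of modern-cpp-kafka).
class EofTracker {
public:
    using TopicPartition = std::pair<std::string, std::int32_t>;

    void onAssigned(const std::set<TopicPartition>& tps) { _assigned.insert(tps.begin(), tps.end()); }

    void onRevoked(const std::set<TopicPartition>& tps)
    {
        for (const auto& tp : tps) {
            _assigned.erase(tp);
            _finished.erase(tp); // a revoked partition no longer counts as finished
        }
    }

    void onPartitionEof(const TopicPartition& tp)
    {
        if (_assigned.count(tp) != 0) _finished.insert(tp);
    }

    // True once every currently assigned partition has reported EOF.
    bool allFinished() const { return !_assigned.empty() && _finished.size() == _assigned.size(); }

private:
    std::set<TopicPartition> _assigned;
    std::set<TopicPartition> _finished;
};
```

In this sketch `onAssigned`/`onRevoked` would be called from the rebalance callback and `onPartitionEof` from the poll loop. Since the rebalance callback runs on the thread that calls `consumer.poll(...)` (see Thread Model), the tracker would not need a lock under that assumption.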
To Commit Offset Manually
Once the KafkaConsumer is configured with `enable.auto.commit=false`, the user has to find the right places to call `commitSync(...)`/`commitAsync(...)`.
Example
// Prepare the configuration
Properties props({{"bootstrap.servers", {brokers}}});
props.put("enable.auto.commit", "false");

// Create a consumer instance
KafkaConsumer consumer(props);

// Subscribe to topics
consumer.subscribe({topic});

while (running) {
    auto records = consumer.poll(std::chrono::milliseconds(100));

    for (const auto& record: records) {
        std::cout << record.toString() << std::endl;
    }

    if (!records.empty()) {
        consumer.commitAsync();
    }
}

consumer.commitSync();

// No explicit close is needed, RAII will take care of it
// consumer.close();
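The example commits without arguments and leaves the offset bookkeeping to the client. Should an application track offsets itself, the Kafka convention is that the committed offset is the position of the next record to read, i.e. the last processed offset plus one. A minimal sketch of that arithmetic follows; `ProcessedRecord` and `offsetsToCommit` are hypothetical names, not library types.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the fields of a consumed record.
struct ProcessedRecord {
    std::string  topic;
    std::int32_t partition;
    std::int64_t offset;
};

using TopicPartition = std::pair<std::string, std::int32_t>;

// For each partition, the offset to commit is the highest processed offset + 1,
// i.e. the position of the next record to read.
std::map<TopicPartition, std::int64_t> offsetsToCommit(const std::vector<ProcessedRecord>& records)
{
    std::map<TopicPartition, std::int64_t> result;
    for (const auto& r : records) {
        auto& next = result[TopicPartition{r.topic, r.partition}];
        if (r.offset + 1 > next) next = r.offset + 1;
    }
    return result;
}
```

Committing only after the records of a batch have been processed gives at-least-once behavior: after a crash, records processed since the last commit are read again.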
Error Handling
- Normally, a `kafka::KafkaException` is thrown if an operation fails.
- But if the `poll` operation fails, the `kafka::Error` is embedded in the `kafka::clients::consumer::ConsumerRecord`.
- There are 2 cases for the `kafka::Error::value()`
  - Success
    - `0` (`RD_KAFKA_RESP_ERR_NO_ERROR`): got a message successfully
    - `-191` (`RD_KAFKA_RESP_ERR__PARTITION_EOF`): reached the end of a partition (no message received)
  - Failure
    - Any other error code
Callbacks for KafkaClient
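We're free to set callbacks in `Properties` with a `LogCallback`, `ErrorCallback`, or `StatsCallback` (keys `log_cb`, `error_cb`, and `stats_cb`).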
Example
// Prepare the configuration
Properties props({{"bootstrap.servers", {brokers}}});

// To print out the error
props.put("error_cb", [](const kafka::Error& error) {
    // https://en.wikipedia.org/wiki/ANSI_escape_code
    std::cerr << "\033[1;31m" << "[" << kafka::utility::getCurrentTime() << "] ==> Met Error: " << "\033[0m";
    std::cerr << "\033[4;35m" << error.toString() << "\033[0m" << std::endl;
});

// To enable the debug-level log
props.put("log_level", "7");
props.put("debug", "all");
props.put("log_cb", [](int /*level*/, const char* /*filename*/, int /*lineno*/, const char* msg) {
    std::cout << "[" << kafka::utility::getCurrentTime() << "]" << msg << std::endl;
});

// To enable the statistics dumping
props.put("statistics.interval.ms", "1000");
props.put("stats_cb", [](const std::string& jsonString) {
    std::cout << "Statistics: " << jsonString << std::endl;
});
Thread Model
- Number of Background Threads within a Kafka Client
  - N threads for the message transmission (towards N brokers).
  - 2 (for `KafkaProducer`) / 3 (for `KafkaConsumer`) threads to handle internal operations, timers, consumer group operations, etc.
  - 1 thread for polling the (message-delivery/offset-commit) callback events. This thread only exists while the client is configured with `enable.manual.events.poll=false` (the default config).
- Which Thread Handles the Callbacks
  - `consumer::RebalanceCallback`: the thread which calls `consumer.poll(...)`
  - `consumer::OffsetCommitCallback`
    - While `enable.manual.events.poll=false`: the background (events polling) thread
    - While `enable.manual.events.poll=true`: the thread which calls `consumer.pollEvents(...)`
  - `producer::Callback`
    - While `enable.manual.events.poll=false`: the background (events polling) thread
    - While `enable.manual.events.poll=true`: the thread which calls `producer.pollEvents(...)`
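A practical consequence of this thread model: with the default `enable.manual.events.poll=false`, delivery and offset-commit callbacks run on a background thread, so any state they share with the application thread needs synchronization. The sketch below imitates that with a plain `std::thread`; it is a toy stand-in under that assumption, not the library's implementation, and `runOnBackgroundThread` and `countDeliveries` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs every callback on a separate thread, mimicking a background
// events-polling thread (a toy stand-in, not the library's implementation).
void runOnBackgroundThread(const std::vector<std::function<void()>>& callbacks)
{
    std::thread poller([&callbacks] {
        for (const auto& cb : callbacks) cb();
    });
    poller.join();
}

// Counts "deliveries" with an atomic, since the callbacks run off the caller's thread.
std::size_t countDeliveries(std::size_t messages)
{
    std::atomic<std::size_t> delivered{0};
    std::vector<std::function<void()>> callbacks(messages, [&delivered] { ++delivered; });
    runOnBackgroundThread(callbacks);
    return delivered.load();
}
```

With `enable.manual.events.poll=true` the callbacks run on the thread that calls `pollEvents(...)`, so state touched only by that thread would not need such protection.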
For Developers
Build (for tests/tools/examples)
- Specify library locations with environment variables

  | Environment Variable | Description |
  |---|---|
  | `LIBRDKAFKA_INCLUDE_DIR` | librdkafka headers |
  | `LIBRDKAFKA_LIBRARY_DIR` | librdkafka libraries |
  | `GTEST_ROOT` | googletest headers and libraries |
  | `BOOST_ROOT` | boost headers and libraries |
  | `SASL_LIBRARYDIR`/`SASL_LIBRARY` | [optional] for SASL connection support |
  | `RAPIDJSON_INCLUDE_DIRS` | `addons/KafkaMetrics.h` requires rapidjson headers |

- Build commands
  - `cd empty-folder-for-build`
  - `cmake path-to-project-root` (the following options could be used with `-D`)

    | Build Option | Description |
    |---|---|
    | `BUILD_OPTION_USE_TSAN=ON` | Use Thread Sanitizer |
    | `BUILD_OPTION_USE_ASAN=ON` | Use Address Sanitizer |
    | `BUILD_OPTION_USE_UBSAN=ON` | Use Undefined Behavior Sanitizer |
    | `BUILD_OPTION_CLANG_TIDY=ON` | Enable clang-tidy checking |
    | `BUILD_OPTION_GEN_DOC=ON` | Generate documentation as well |
    | `BUILD_OPTION_DOC_ONLY=ON` | Only generate documentation |
    | `BUILD_OPTION_GEN_COVERAGE=ON` | Generate test coverage (currently only supported by clang) |

  - `make`
  - `make install` (to install `tools`)
Run Tests
- Kafka cluster setup
- To run the binary, the test runner requires the following environment variables

  | Environment Variable | Description | Example |
  |---|---|---|
  | `KAFKA_BROKER_LIST` | The broker list for the Kafka cluster | `export KAFKA_BROKER_LIST=127.0.0.1:29091,127.0.0.1:29092,127.0.0.1:29093` |
  | `KAFKA_BROKER_PIDS` | The broker PIDs for the test runner to manipulate | `export KAFKA_BROKER_PIDS=61567,61569,61571` |
  | `KAFKA_CLIENT_ADDITIONAL_SETTINGS` | Could be used for additional configuration for Kafka clients | `export KAFKA_CLIENT_ADDITIONAL_SETTINGS="security.protocol=SASL_PLAINTEXT;sasl.kerberos.service.name=...;sasl.kerberos.keytab=...;sasl.kerberos.principal=..."` |

  - The environment variable `KAFKA_BROKER_LIST` is mandatory for the integration/robustness tests, which require the Kafka cluster.
  - The environment variable `KAFKA_BROKER_PIDS` is mandatory for the robustness tests, which require the Kafka cluster and the privilege to stop/resume the brokers.

  | Test Type | `KAFKA_BROKER_LIST` | `KAFKA_BROKER_PIDS` |
  |---|---|---|
  | tests/unit | - | - |
  | tests/integration | Required | - |
  | tests/robustness | Required | Required |
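The `KAFKA_CLIENT_ADDITIONAL_SETTINGS` example above uses a `key=value;key=value` layout. As an illustration of that layout only, the sketch below splits such a string into key-value pairs; `parseAdditionalSettings` is a hypothetical function, and the test runner's actual parsing may differ.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical parser: splits "k1=v1;k2=v2" into key-value pairs.
// The value is everything after the first '=' of an entry.
std::map<std::string, std::string> parseAdditionalSettings(const std::string& settings)
{
    std::map<std::string, std::string> result;
    std::istringstream stream(settings);
    std::string item;
    while (std::getline(stream, item, ';')) {
        const auto pos = item.find('=');
        if (pos == std::string::npos || pos == 0) continue; // skip malformed or empty entries
        result[item.substr(0, pos)] = item.substr(pos + 1);
    }
    return result;
}
```

Each resulting pair could then be applied to a `Properties` object with `props.put(key, value)`, as in the Properties examples.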