A Few Notes On Topics, Partitions, Consumer Groups etc.
2020-03-20

They say a picture is worth a thousand words. I created a high-level diagram (aided by a few notes) to highlight the relationships between a few key components in Kafka.

Topic and Partitions

A topic can have one or more partitions. Any reference to message consumption implies consumption from a partition. In other words, although consumers can subscribe to a topic, there is no such thing as consuming directly from a topic; all messages are consumed from partitions (within a topic).

From the diagram above, you can see that:

  • There is one topic T1
  • There are two partitions, P0 and P1, within the topic T1
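To make the topic/partition relationship concrete, here is a minimal in-memory sketch in plain Python (no Kafka client involved). It models topic T1 as two partitions, each an append-only log in which every new message gets the next offset. The Topic and Partition classes and their methods are hypothetical names chosen for illustration, not Kafka's API.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An append-only log: each appended message gets the next offset."""
    messages: list[str] = field(default_factory=list)

    def append(self, message: str) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to the new message

    def read(self, offset: int, max_records: int = 10) -> list[str]:
        # Reads always target one partition and start at a given offset.
        return self.messages[offset:offset + max_records]


@dataclass
class Topic:
    """A named collection of partitions; it has no read method of its own."""
    name: str
    partitions: dict[str, Partition]

    @classmethod
    def create(cls, name: str, num_partitions: int) -> "Topic":
        return cls(name, {f"P{i}": Partition() for i in range(num_partitions)})


# The diagram's layout: topic T1 with partitions P0 and P1.
t1 = Topic.create("T1", num_partitions=2)
t1.partitions["P0"].append("m0")
t1.partitions["P0"].append("m1")
t1.partitions["P1"].append("m2")

print(t1.partitions["P0"].read(0))  # ['m0', 'm1']
print(t1.partitions["P1"].read(0))  # ['m2']
```

Note that offsets are per partition: P0 and P1 each start counting from 0 independently.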

Consumer Groups and Partition Assignment

Consumers in Kafka belong to a Consumer Group.

A Consumer can consume from one or more (topic) partitions, but when partitions are assigned automatically, Kafka guarantees that each partition is consumed by only one consumer within a Consumer Group at any given time. However, Kafka allows consumers from different consumer groups to consume from the same partition. This lets multiple independent applications read the same messages, each at its own pace.
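The per-group rule can be expressed as a small check. The sketch below is plain Python; find_conflicts is a hypothetical helper, not part of any Kafka client. It takes an assignment per consumer group and reports any partition owned by two consumers of the same group, while deliberately allowing the same partition to appear in different groups, as CG1 and CG2 do in the diagram.

```python
def find_conflicts(groups: dict[str, dict[str, set[str]]]) -> list[str]:
    """Return violations of 'one consumer per partition within a group'.

    groups maps group id -> {consumer id -> partitions it consumes}.
    The same partition appearing in different groups is not a conflict.
    """
    conflicts = []
    for group_id, members in groups.items():
        owner: dict[str, str] = {}  # partition -> consumer, scoped to this group
        for consumer, partitions in members.items():
            for partition in sorted(partitions):
                if partition in owner:
                    conflicts.append(
                        f"{group_id}: {partition} owned by {owner[partition]} and {consumer}"
                    )
                else:
                    owner[partition] = consumer
    return conflicts


# The diagram's assignments: both groups read P0 and P1, one owner per group.
diagram = {
    "CG1": {"C1": {"P0"}, "C2": {"P1"}},
    "CG2": {"C1": {"P0", "P1"}},
}
print(find_conflicts(diagram))  # []

# Two consumers in CG1 both reading P0 breaks the rule.
print(find_conflicts({"CG1": {"C1": {"P0"}, "C2": {"P0", "P1"}}}))
# ['CG1: P0 owned by C1 and C2']
```

The owner map is reset for every group, which is exactly why sharing a partition across groups never shows up as a conflict.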

There are two primary ways in which partitions are assigned to consumers:

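  • Consumers can explicitly specify the partitions they want to consume from (for example, with the Java client's assign() method). Manual assignment bypasses consumer group coordination, so Kafka does not prevent multiple consumers from reading the same partition; keeping the assignments disjoint is up to the application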

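  • Consumers subscribe to a topic, and Kafka automatically determines the partition assignments, redistributing partitions (a rebalance) whenever consumers join or leave the group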

A few noteworthy points about the consumer groups and (automatic) partition assignments shown in the diagram:

  • There are two Consumer Groups: CG1 and CG2

  • Consumer Group - CG1

    • CG1 has two consumers, C1 and C2
    • Kafka assigns each consumer a unique partition
      • P0 -> C1
      • P1 -> C2
  • Consumer Group - CG2

    • CG2 has one consumer, C1
    • Since there is only one consumer in CG2, Kafka assigns all the available partitions (P0 and P1) to the lone consumer.
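The assignments above can be reproduced with a simplified round-robin assignor that deals a group's partitions out to its consumers in turn. This is a sketch under stated assumptions: round_robin_assign is a hypothetical helper, and Kafka's real assignment logic is pluggable through the consumer setting partition.assignment.strategy, with built-in range, round-robin and sticky strategies whose details differ from this toy version.

```python
def round_robin_assign(partitions: list[str], consumers: list[str]) -> dict[str, list[str]]:
    """Give each partition to exactly one consumer of the group, in turn.

    Simplified sketch: consumers are sorted so every member would compute
    the same result from the same inputs.
    """
    if not consumers:
        return {}
    members = sorted(consumers)
    assignment: dict[str, list[str]] = {c: [] for c in members}
    for i, partition in enumerate(partitions):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = ["P0", "P1"]  # topic T1
print("CG1:", round_robin_assign(partitions, ["C1", "C2"]))  # CG1: {'C1': ['P0'], 'C2': ['P1']}
print("CG2:", round_robin_assign(partitions, ["C1"]))        # CG2: {'C1': ['P0', 'P1']}
```

Because each group is assigned independently, both groups end up covering every partition of T1. Re-running the function with a different member list is a rough stand-in for a rebalance when a consumer joins or leaves.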

Offset management

This leads to the final topic (no pun intended) in our discussion: Kafka offsets. Kafka allows consumers to track their position (offset) in each partition. The important thing to note is that, since consumers from different consumer groups can consume the same partition simultaneously, Kafka maintains a separate offset for each consumer group on each partition.

From the diagram above, you can observe the following about the offsets for partitions P0 and P1:

  • The offsets for P0
    • Offset 5 for CG1 and
    • Offset 2 for CG2
  • The offsets for P1
    • Offset 7 for CG1 and
    • Offset 4 for CG2
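Conceptually, these offsets form a lookup keyed by (consumer group, partition). The sketch below is a hypothetical in-memory OffsetStore loaded with the diagram's values. In a real cluster, Kafka keeps committed offsets on the brokers (in the internal __consumer_offsets topic), and where a group with no committed offset starts reading is governed by the consumer's auto.offset.reset setting rather than the default of 0 assumed here.

```python
class OffsetStore:
    """Committed offsets keyed by (consumer group, partition). Hypothetical sketch."""

    def __init__(self) -> None:
        self._offsets: dict[tuple[str, str], int] = {}

    def commit(self, group: str, partition: str, offset: int) -> None:
        self._offsets[(group, partition)] = offset

    def position(self, group: str, partition: str) -> int:
        # Assumption: a group with no committed offset starts at 0.
        return self._offsets.get((group, partition), 0)


store = OffsetStore()
diagram_offsets = {
    ("CG1", "P0"): 5, ("CG2", "P0"): 2,
    ("CG1", "P1"): 7, ("CG2", "P1"): 4,
}
for (group, partition), offset in diagram_offsets.items():
    store.commit(group, partition, offset)

# CG1 moving ahead on P0 leaves CG2's position on P0 untouched.
store.commit("CG1", "P0", 6)
print(store.position("CG1", "P0"), store.position("CG2", "P0"))  # 6 2
```

Keying on the group as well as the partition is what lets CG1 and CG2 read the same partition at different speeds without interfering with each other.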

Where to go from here