Apache Kafka 3.7.2 (Amazon Linux 2023) AMI Administrator Guide

1. Quick Start Information

Connection Methods:

  • Access the instance via SSH as ec2-user. Use sudo to run commands requiring root privileges; to switch to the root user, run sudo su - root.

Install Information:

  • OS: Amazon Linux 2023
  • Kafka version: 3.7.2
  • Scala version: 2.13
  • Java: Amazon Corretto 17 (AWS-optimized OpenJDK 17)
  • Java Home: /usr/lib/jvm/java-17-amazon-corretto
  • Mode: KRaft (no ZooKeeper required)
  • Install Directory: /opt/kafka (symlink → /opt/kafka_2.13-3.7.2)
  • Service User: kafka (system user, no login shell)
  • Default Port: 9092

Kafka Service Management:

  • Start Kafka service: sudo systemctl start kafka
  • Stop Kafka service: sudo systemctl stop kafka
  • Restart Kafka service: sudo systemctl restart kafka
  • Check Kafka status: sudo systemctl status kafka
  • Enable auto-start: sudo systemctl enable kafka

Quick Verification Commands:

  • Check Kafka version: /opt/kafka/bin/kafka-topics.sh --version
  • Check Java version: java -version
  • View Kafka logs: sudo journalctl -u kafka -f

Firewall Configuration:

  • Allow inbound SSH on port 22.
  • Allow Kafka port 9092 if external clients or applications need to connect.
  • For security, restrict access to trusted IP addresses only.

2. First Launch & Verification

Step 1: Connect to Your Instance

  1. Launch your instance in your cloud provider's console (e.g., AWS EC2)
  2. Ensure SSH port 22 is allowed in your security group
  3. Connect via SSH:
    ssh -i your-key.pem ec2-user@YOUR_PUBLIC_IP

Step 2: Verify Java Installation

Check Amazon Corretto 17:

java -version

Expected Output:

openjdk version "17.0.x" 2024-xx-xx LTS
OpenJDK Runtime Environment Corretto-17.x.x.x (build 17.0.x+x-LTS)
OpenJDK 64-Bit Server VM Corretto-17.x.x.x (build 17.0.x+x-LTS, mixed mode, sharing)

Confirm Corretto-17 is shown in the output.

Step 3: Verify Kafka Service Status

Check if Kafka daemon is running:

sudo systemctl status kafka --no-pager

Expected Output:

● kafka.service - Apache Kafka 3.7.2 Server (KRaft Mode)
Loaded: loaded (/etc/systemd/system/kafka.service; enabled; preset: disabled)
Active: active (running) since ...
Main PID: xxxx (java)

Step 4: Verify Kafka Functionality

List available topics:

/opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092

Expected Output:

aws-marketplace-test

Create a new test topic:

/opt/kafka/bin/kafka-topics.sh --create \
--topic my-test-topic \
--bootstrap-server localhost:9092 \
--partitions 1 \
--replication-factor 1

Expected Output:

Created topic my-test-topic.

Step 5: Test Produce and Consume Messages

Open a producer in one terminal:

/opt/kafka/bin/kafka-console-producer.sh \
--topic my-test-topic \
--bootstrap-server localhost:9092

Type a message and press Enter, then Ctrl+C to exit.

Open a consumer in another terminal:

/opt/kafka/bin/kafka-console-consumer.sh \
--topic my-test-topic \
--bootstrap-server localhost:9092 \
--from-beginning

Expected Output: The message you typed should appear.


3. Architecture & Detailed Configuration

This AMI runs Apache Kafka 3.7.2 in KRaft mode (Kafka Raft Metadata mode), which eliminates the dependency on Apache ZooKeeper. KRaft mode is the modern, recommended way to run Kafka and has been production-ready since Kafka 3.3.

Installation Architecture:

[Amazon Corretto 17]

[Kafka 3.7.2 (Scala 2.13)]
/opt/kafka_2.13-3.7.2/ ← actual directory
/opt/kafka/ ← symlink (used in all configs)

[KRaft Mode - No ZooKeeper]
/opt/kafka/config/kraft/server.properties → KRaft configuration

[Cluster ID + Formatted Storage]
kafka-storage.sh format → initializes data directory

[Systemd Service]
/etc/systemd/system/kafka.service → Auto-start on boot

[Service User: kafka]
No login shell → runs with minimal privileges

Key Design Decisions:

  1. KRaft Mode: Eliminates ZooKeeper dependency — simpler architecture, fewer components to manage
  2. Amazon Corretto 17: AWS-optimized JDK with long-term support, explicit path set in service file to prevent version drift
  3. Symlink Strategy: /opt/kafka → /opt/kafka_2.13-3.7.2 allows version upgrades without changing configs or service files
  4. Dedicated kafka User: Service runs as a restricted system user (no login shell) for security
  5. Heap Tuning: 1G heap (-Xmx1G -Xms1G) optimized for t3.small/t3.medium instances

Why KRaft Over ZooKeeper?

Feature            ZooKeeper Mode       KRaft Mode
Components         Kafka + ZooKeeper    Kafka only
Complexity         High                 Low
Metadata storage   ZooKeeper            Kafka itself
Production ready   Legacy               Recommended (3.3+)
Future support     Deprecated           Active development

3.1. Systemd Service File

File Location: /etc/systemd/system/kafka.service

Complete Contents:

[Unit]
Description=Apache Kafka 3.7.2 Server (KRaft Mode)
Documentation=http://kafka.apache.org/documentation.html
Requires=network.target
After=network.target

[Service]
Type=simple
User=kafka
Group=kafka
Environment="JAVA_HOME=/usr/lib/jvm/java-17-amazon-corretto"
Environment="KAFKA_HEAP_OPTS=-Xmx1G -Xms1G"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/kraft/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target

How This Works:

  • Environment="JAVA_HOME=...": Explicitly pins Java 17 Corretto path — prevents issues if multiple JDKs are installed or system defaults change
  • Environment="KAFKA_HEAP_OPTS=...": Sets JVM heap to 1G min/max — eliminates heap resizing overhead and suitable for t3.small/medium
  • User=kafka / Group=kafka: Runs as dedicated system user for security isolation
  • Restart=on-failure: Automatically restarts Kafka if it crashes unexpectedly
  • Type=simple: Systemd treats the first process as the main process (suitable for Kafka's foreground mode)
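As an alternative to editing kafka.service directly, any Environment= line can be overridden with a systemd drop-in, which survives later changes to the main unit file. A minimal sketch (the 2G heap value is only an example for larger instances):

```ini
# /etc/systemd/system/kafka.service.d/override.conf
# (create interactively with: sudo systemctl edit kafka)
[Service]
Environment="KAFKA_HEAP_OPTS=-Xmx2G -Xms2G"
```

Apply the override with sudo systemctl daemon-reload followed by sudo systemctl restart kafka.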

3.2. KRaft Configuration File

File Location: /opt/kafka/config/kraft/server.properties

Key Settings:

# The role of this server. KRaft mode uses 'broker,controller' for combined mode
process.roles=broker,controller

# The node id for this server
node.id=1

# The connect string for the KRaft controller quorum
controller.quorum.voters=1@localhost:9093

# Listeners
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
advertised.listeners=PLAINTEXT://localhost:9092

# Log directories
log.dirs=/tmp/kraft-combined-logs

How This Works:

  • process.roles=broker,controller: Combined mode — single node acts as both broker and controller
  • controller.quorum.voters: Defines the KRaft quorum (cluster membership)
  • listeners: Port 9092 for client connections, port 9093 for internal KRaft communication
  • log.dirs: Where Kafka stores message data (formatted during setup)
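Note that the defaults above only advertise localhost, so a remote client that connects to port 9092 is handed back an address it cannot reach. A sketch of the edit needed for external access, with 203.0.113.10 as a placeholder for the instance's reachable IP or DNS name:

```properties
# /opt/kafka/config/kraft/server.properties (illustrative edit)
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
advertised.listeners=PLAINTEXT://203.0.113.10:9092
```

Restart the service afterwards (sudo systemctl restart kafka) and make sure port 9092 is open in the security group, as noted in section 1.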

4. How-To-Create: Reproduce This Environment

This section explains how this AMI was built, allowing you to reproduce the installation on any Amazon Linux 2023 system.

Step 1: Update the System

Purpose: Ensure a clean, up-to-date base before installing software.

sudo dnf update -y

Step 2: Install Amazon Corretto 17

Purpose: Install AWS's production-grade, long-term-supported OpenJDK distribution.

sudo dnf install java-17-amazon-corretto-devel -y

How This Works:

  • java-17-amazon-corretto-devel: Installs the full JDK (compiler + runtime), not just the JRE
  • Amazon Corretto 17 receives security patches from AWS through 2029+
  • Installs to /usr/lib/jvm/java-17-amazon-corretto/

Verify:

java -version

Expected Output (example):

openjdk version "17.0.13" 2024-10-15 LTS
OpenJDK Runtime Environment Corretto-17.0.13.11.1 (build 17.0.13+11-LTS)
OpenJDK 64-Bit Server VM Corretto-17.0.13.11.1 (build 17.0.13+11-LTS, mixed mode, sharing)

Step 3: Download and Extract Kafka

Purpose: Obtain the official Kafka 3.7.2 binary distribution.

wget https://downloads.apache.org/kafka/3.7.2/kafka_2.13-3.7.2.tgz
sudo tar -xzf kafka_2.13-3.7.2.tgz -C /opt

How This Works:

  • kafka_2.13-3.7.2.tgz: Kafka built against Scala 2.13 (the Scala version recommended for Kafka 3.7)
  • Extracted to /opt/kafka_2.13-3.7.2/ containing all binaries, configs, and scripts
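Apache publishes a .sha512 file alongside each release, and checking it before extraction guards against a corrupted download. The mechanics can be sketched with sha512sum on a scratch file standing in for the tarball (for the real release, compare the printed hash against the published .sha512 on the Apache mirror):

```shell
# Generate and verify a SHA-512 checksum; a scratch file stands in
# for the real kafka_2.13-3.7.2.tgz here.
f=$(mktemp)
echo "sample payload" > "$f"
sha512sum "$f" > "$f.sha512"   # record the hash
sha512sum -c "$f.sha512"       # re-check; prints "<file>: OK" on a match
```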

Step 4: Create the /opt/kafka Symlink

Purpose: Create a stable /opt/kafka path that remains constant across version upgrades.

sudo ln -s /opt/kafka_2.13-3.7.2 /opt/kafka

How This Works:

When upgrading Kafka in the future:

  1. Extract new version: sudo tar -xzf kafka_2.13-3.8.0.tgz -C /opt
  2. Update symlink: sudo ln -sfn /opt/kafka_2.13-3.8.0 /opt/kafka
  3. Restart service: sudo systemctl restart kafka

No changes needed to the service file or any scripts.
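The upgrade flow can be rehearsed safely with throwaway directories before touching the live install (the paths below are scratch stand-ins, not the real /opt layout):

```shell
# Rehearse the symlink flip in a scratch directory.
base=$(mktemp -d)
mkdir "$base/kafka_2.13-3.7.2" "$base/kafka_2.13-3.8.0"

# Initial install: the stable path points at the current version
ln -s "$base/kafka_2.13-3.7.2" "$base/kafka"

# Upgrade: -f replaces the existing link, -n stops ln from
# descending into the old target directory
ln -sfn "$base/kafka_2.13-3.8.0" "$base/kafka"
readlink "$base/kafka"   # now ends in kafka_2.13-3.8.0
```

The -n flag matters: without it, ln would follow the existing symlink and create the new link inside the old version directory instead of replacing the symlink itself.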

Step 5: Create the kafka System User

Purpose: Run Kafka as a dedicated, unprivileged system user for security.

sudo useradd -r -s /bin/false kafka

How This Works:

  • -r: Creates a system account (lower UID range, no home directory by default)
  • -s /bin/false: No login shell — the user cannot log in interactively
  • This limits the blast radius if Kafka is compromised
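After creating the account you can confirm the no-login shell with getent passwd kafka. The check can be sketched against a sample passwd line (the UID/GID values are illustrative, not the AMI's actual ones):

```shell
# The 7th colon-separated field of a passwd entry is the login shell;
# this sample line mimics what `getent passwd kafka` would print.
line="kafka:x:992:992::/home/kafka:/bin/false"
echo "$line" | awk -F: '{print $7}'   # prints /bin/false
```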

Step 6: Set File Ownership

Purpose: Give the kafka user ownership of all Kafka files.

sudo chown -R kafka:kafka /opt/kafka_2.13-3.7.2
sudo chown -h kafka:kafka /opt/kafka

How This Works:

  • -R: Recursively sets ownership on all files in the Kafka directory
  • -h: Sets ownership on the symlink itself (not the target)

Step 7: Initialize KRaft Storage

Purpose: Generate a unique Cluster ID and format the storage directory for KRaft mode.

sudo -u kafka bash << 'EOF'
KAFKA_CLUSTER_ID=$(/opt/kafka/bin/kafka-storage.sh random-uuid)
echo "Cluster ID: $KAFKA_CLUSTER_ID"
/opt/kafka/bin/kafka-storage.sh format \
-t $KAFKA_CLUSTER_ID \
-c /opt/kafka/config/kraft/server.properties
EOF

How This Works:

  • sudo -u kafka bash: Executes the block as the kafka user — ensures storage is owned correctly
  • kafka-storage.sh random-uuid: Generates a globally unique cluster identifier
  • kafka-storage.sh format: Writes the cluster metadata to the log directory defined in server.properties

Expected Output:

Cluster ID: <uuid>
Formatting /tmp/kraft-combined-logs with metadata.version 3.7-IV4.

Step 8: Create Systemd Service

Purpose: Register Kafka as a system service that starts automatically on boot.

sudo tee /etc/systemd/system/kafka.service > /dev/null << 'EOF'
[Unit]
Description=Apache Kafka 3.7.2 Server (KRaft Mode)
Documentation=http://kafka.apache.org/documentation.html
Requires=network.target
After=network.target

[Service]
Type=simple
User=kafka
Group=kafka
Environment="JAVA_HOME=/usr/lib/jvm/java-17-amazon-corretto"
Environment="KAFKA_HEAP_OPTS=-Xmx1G -Xms1G"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/kraft/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now kafka

How This Works:

  • daemon-reload: Tells systemd to re-read all service unit files
  • enable --now: Enables the service for auto-start AND starts it immediately in one command

Step 9: Verify Installation

# Check service status
sudo systemctl status kafka --no-pager

# Create test topic
/opt/kafka/bin/kafka-topics.sh --create \
--topic aws-marketplace-test \
--bootstrap-server localhost:9092 \
--partitions 1 \
--replication-factor 1

# List topics
/opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092

Expected Output:

Active: active (running) ...
Created topic aws-marketplace-test.
aws-marketplace-test

5. Using the Kafka Environment

5.1. Topic Management

# Create a topic
/opt/kafka/bin/kafka-topics.sh --create \
--topic my-topic \
--bootstrap-server localhost:9092 \
--partitions 3 \
--replication-factor 1

# Describe a topic
/opt/kafka/bin/kafka-topics.sh --describe \
--topic my-topic \
--bootstrap-server localhost:9092

# Delete a topic
/opt/kafka/bin/kafka-topics.sh --delete \
--topic my-topic \
--bootstrap-server localhost:9092

# List all topics
/opt/kafka/bin/kafka-topics.sh --list \
--bootstrap-server localhost:9092

5.2. Producing and Consuming Messages

Produce messages:

/opt/kafka/bin/kafka-console-producer.sh \
--topic my-topic \
--bootstrap-server localhost:9092

Type messages and press Enter for each. Use Ctrl+C to exit.

Consume messages:

# Consume from the beginning
/opt/kafka/bin/kafka-console-consumer.sh \
--topic my-topic \
--bootstrap-server localhost:9092 \
--from-beginning

# Consume only new messages
/opt/kafka/bin/kafka-console-consumer.sh \
--topic my-topic \
--bootstrap-server localhost:9092

5.3. Consumer Groups

# List consumer groups
/opt/kafka/bin/kafka-consumer-groups.sh \
--list \
--bootstrap-server localhost:9092

# Describe a consumer group (check lag)
/opt/kafka/bin/kafka-consumer-groups.sh \
--describe \
--group my-group \
--bootstrap-server localhost:9092

5.4. Monitoring

# View broker metadata
/opt/kafka/bin/kafka-metadata-quorum.sh \
--bootstrap-server localhost:9092 describe --status

# Check cluster information
/opt/kafka/bin/kafka-broker-api-versions.sh \
--bootstrap-server localhost:9092

6. Important File Locations

File Path                                    Purpose
/opt/kafka                                   Kafka installation symlink
/opt/kafka_2.13-3.7.2/                       Actual Kafka installation directory
/opt/kafka/bin/                              Kafka shell scripts (kafka-topics.sh, etc.)
/opt/kafka/config/kraft/server.properties    KRaft mode configuration
/opt/kafka/config/kraft/                     KRaft configuration directory
/tmp/kraft-combined-logs/                    Kafka data directory (messages + metadata)
/etc/systemd/system/kafka.service            Systemd service file
/usr/lib/jvm/java-17-amazon-corretto/        Amazon Corretto 17 Java home

7. Troubleshooting

Issue 1: Kafka Service Fails to Start

Symptoms:

$ sudo systemctl status kafka
Active: failed (Result: exit-code)

Diagnosis:

View detailed logs:

sudo journalctl -u kafka -n 50 --no-pager

Common Causes:

  1. Storage not formatted (KRaft mode requires initialization):

sudo -u kafka bash << 'EOF'
KAFKA_CLUSTER_ID=$(/opt/kafka/bin/kafka-storage.sh random-uuid)
/opt/kafka/bin/kafka-storage.sh format \
-t $KAFKA_CLUSTER_ID \
-c /opt/kafka/config/kraft/server.properties
EOF
sudo systemctl start kafka

  2. Wrong JAVA_HOME path:

ls /usr/lib/jvm/

Update the Environment="JAVA_HOME=..." line in /etc/systemd/system/kafka.service to match the actual path, then reload:

sudo systemctl daemon-reload
sudo systemctl start kafka

  3. Port 9092 already in use:

sudo lsof -i :9092

Issue 2: Cannot Connect to Kafka

Symptoms:

$ /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
[ERROR] ... Connection refused

Diagnosis:

Check if Kafka is running:

sudo systemctl status kafka

Check if port 9092 is listening:

sudo ss -tlnp | grep 9092

Solution:

Start the service if not running:

sudo systemctl start kafka

Issue 3: Out of Memory Error

Symptoms:

java.lang.OutOfMemoryError: Java heap space

Diagnosis:

Check current heap setting:

grep KAFKA_HEAP_OPTS /etc/systemd/system/kafka.service

Solution:

Increase heap size in the service file (for larger instances):

sudo nano /etc/systemd/system/kafka.service

Change:

Environment="KAFKA_HEAP_OPTS=-Xmx1G -Xms1G"

To (for t3.large or larger):

Environment="KAFKA_HEAP_OPTS=-Xmx2G -Xms2G"

Reload and restart:

sudo systemctl daemon-reload
sudo systemctl restart kafka
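The same edit can be scripted with sed. The sketch below runs against a scratch copy of the relevant line so you can preview the substitution before applying it to /etc/systemd/system/kafka.service:

```shell
# Preview the heap change on a scratch copy of the unit file line.
unit=$(mktemp)
printf 'Environment="KAFKA_HEAP_OPTS=-Xmx1G -Xms1G"\n' > "$unit"

# Swap 1G for 2G in both -Xmx and -Xms
sed -i 's/-Xmx1G -Xms1G/-Xmx2G -Xms2G/' "$unit"
grep KAFKA_HEAP_OPTS "$unit"   # Environment="KAFKA_HEAP_OPTS=-Xmx2G -Xms2G"
```

When run against the real unit file, remember to follow with sudo systemctl daemon-reload and a service restart.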

Issue 4: Topics Not Persisting After Restart

Symptoms:

Topics created before a restart disappear.

Diagnosis:

Check the data directory:

ls /tmp/kraft-combined-logs/

Cause:

The default log.dirs=/tmp/kraft-combined-logs in server.properties uses the /tmp directory, which may be cleared on reboot.

Solution:

Change the log directory to a persistent path:

sudo mkdir -p /var/lib/kafka/data
sudo chown -R kafka:kafka /var/lib/kafka
sudo nano /opt/kafka/config/kraft/server.properties

Update:

log.dirs=/var/lib/kafka/data

Re-format storage with the new path:

sudo systemctl stop kafka
sudo -u kafka bash << 'EOF'
KAFKA_CLUSTER_ID=$(/opt/kafka/bin/kafka-storage.sh random-uuid)
/opt/kafka/bin/kafka-storage.sh format \
-t $KAFKA_CLUSTER_ID \
-c /opt/kafka/config/kraft/server.properties
EOF
sudo systemctl start kafka

8. Final Notes

Key Takeaways

  1. Kafka 3.7.2 running in KRaft mode — no ZooKeeper required
  2. Amazon Corretto 17 as the Java runtime — AWS-optimized with long-term support
  3. Symlink strategy (/opt/kafka) enables easy version upgrades
  4. Dedicated kafka user for security isolation
  5. The installation is production-ready and AMI-optimized with auto-start enabled

Kafka Use Cases

  • Event Streaming: Real-time data pipelines between services
  • Log Aggregation: Collect and centralize logs from distributed systems
  • Message Queuing: Decouple producers and consumers in microservices
  • Stream Processing: Integrate with Kafka Streams or Apache Flink
  • Activity Tracking: User behavior events for analytics

Recommended Instance Types

Workload                Instance      Reason
Development / Testing   t3.small      Low cost, 1G heap fits
Small production        t3.medium     2 vCPU, good throughput
Medium production       t3.large      2G heap, higher throughput
High throughput         m5.xlarge+    Dedicated compute

For support or questions, please contact the Easycloud team.