Moving data between systems sounds simple until you hit production scale.
The common pattern: poll a database every few minutes, query rows where updated_at > last_run_time, copy downstream, repeat. Works fine at small scale. Breaks under load.
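Missed updates when timestamps collide or a transaction commits after the poll has already advanced the watermark. Duplicates on retry. Hard deletes stay invisible, because a deleted row has no updated_at left to query. Repeated scans put load on production databases. Changes wait up to a full polling interval before anyone sees them. The approach fails exactly when it matters most.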
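Two of these failure modes are easy to reproduce. Below is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical items table whose updated_at is an integer timestamp. The poll_changes helper is illustrative and does not come from any library.

```python
import sqlite3


def poll_changes(conn, last_run_time):
    """Timestamp polling: fetch rows modified strictly after the watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM items WHERE updated_at > ? ORDER BY updated_at",
        (last_run_time,),
    ).fetchall()
    # Advance the watermark to the newest timestamp seen, or keep the old one.
    new_watermark = max((row[2] for row in rows), default=last_run_time)
    return rows, new_watermark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)")

# Run 1: one row stamped t=100 is visible.
conn.execute("INSERT INTO items VALUES (1, 'a', 100)")
first, watermark = poll_changes(conn, 0)

# Simulates a slow transaction that stamped t=100 but committed after run 1.
conn.execute("INSERT INTO items VALUES (2, 'b', 100)")
# Hard delete: the row is gone, so no updated_at remains to match.
conn.execute("DELETE FROM items WHERE id = 1")

# Run 2: neither the late insert nor the delete shows up.
second, watermark = poll_changes(conn, watermark)
print(first)   # [(1, 'a', 100)]
print(second)  # []
```

Switching to `>=` would catch row 2 but re-deliver row 1 on every run, which trades missed updates for duplicates.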
What log-based CDC actually does
Change Data Capture reads database transaction logs directly: the PostgreSQL WAL, the MySQL binlog, the SQL Server transaction log. These are logs the database already maintains for recovery and replication.
Debezium, an open-source CDC platform, taps these logs. When an application writes data, the database records it in the transaction log. Debezium reads that entry, converts it to an event carrying the before/after state plus the operation type, and publishes it to Kafka. No polling. No guessing. No missed changes, as long as the connector keeps up with the database's log retention.
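Debezium's change events share a common envelope with before, after, source, op, and ts_ms fields. Here is a minimal consumer-side sketch in Python. It assumes Kafka Connect's JSON converter and uses a hand-written sample event for a hypothetical customers table; real events carry more source metadata. The names describe_change and OP_NAMES are illustrative and are not part of Debezium.

```python
import json

# Debezium op codes: c = create, u = update, d = delete, r = read (snapshot row).
OP_NAMES = {"c": "create", "u": "update", "d": "delete", "r": "snapshot read"}


def describe_change(raw):
    """Summarize one Debezium change-event value given as a JSON string."""
    if raw is None:
        # Tombstone: a null-valued record that follows a delete so that
        # Kafka log compaction can drop the key. Nothing to describe.
        return None
    event = json.loads(raw)
    # With the JSON converter's schemas enabled, the envelope sits under "payload";
    # without them, the envelope is the top-level object.
    payload = event.get("payload", event)
    return {
        "op": OP_NAMES[payload["op"]],
        "table": payload["source"]["table"],
        "before": payload["before"],
        "after": payload["after"],
    }


# Hand-written sample for a hypothetical customers table (fields trimmed).
sample = json.dumps({
    "payload": {
        "before": {"id": 42, "email": "old@example.com"},
        "after": {"id": 42, "email": "new@example.com"},
        "source": {"connector": "postgresql", "table": "customers"},
        "op": "u",
        "ts_ms": 1700000000000,
    }
})
print(describe_change(sample))
```

A delete arrives with before set and after null. On PostgreSQL, a complete before image generally requires the table's REPLICA IDENTITY to be set to FULL. Otherwise you typically get only the key columns, or nothing.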
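Changes typically reach consumers within milliseconds to seconds, with far less load on the source than repeated queries. Each change becomes its own event, so a row's full lifecycle (insert, every update, delete) is preserved. That matters for event-driven architectures, for real-time analytics that sync to Elasticsearch or a warehouse, and for auditing.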
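Keeping a downstream copy in sync reduces to applying each event in order. Here is a minimal Python sketch. It assumes Debezium-style op/before/after fields and a hypothetical primary-key column named id. A plain dict stands in for a cache or search index.

```python
def apply_change(replica, event):
    """Apply one change event to a downstream key-value view keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Creates, snapshot reads, and updates all upsert the new row image.
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":
        # Deletes carry the old row image in "before"; "after" is null.
        replica.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

replica = {}  # stand-in for a cache or search index
for event in events:
    apply_change(replica, event)

print(replica)  # {1: {'id': 1, 'status': 'paid'}}
```

The replica holds only current state, but the event list still records every step, which is what an audit trail needs. Upserts are idempotent here, so replaying an event after a retry leaves the replica unchanged.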
The trade-offs that matter
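Kafka infrastructure required. Schema changes need planning. Initial snapshots capture only the current state of existing rows: changes made before the connector started are never replayed as individual events, and snapshotting large tables takes time. Scaling in production adds complexity that a simple batch sync avoids.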
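A toy model helps show what an initial snapshot does and does not cover; it is a simplification, not Debezium's actual implementation. The connector records a log position, emits the current state of each row as a read (`r`) event, and then streams every later change. The change_log list and start_connector function below are hypothetical.

```python
def start_connector(change_log, start_pos):
    """Toy model: snapshot state as of start_pos, then stream later log entries."""
    # Rebuild the table as it stood at start_pos (what the snapshot reads).
    state = {}
    for op, key, value in change_log[:start_pos]:
        if op == "d":
            state.pop(key, None)
        else:
            state[key] = value
    # One read event per existing row: current state only, no history.
    events = [("r", key, value) for key, value in sorted(state.items())]
    # Everything after the recorded position streams as individual events.
    events += list(change_log[start_pos:])
    return events


change_log = [
    ("c", 1, "draft"),
    ("u", 1, "review"),
    ("u", 1, "published"),  # connector starts after this entry
    ("u", 1, "archived"),
]
events = start_connector(change_log, start_pos=3)
print(events)  # [('r', 1, 'published'), ('u', 1, 'archived')]
```

The draft and review states never become events because they happened before startup. Nothing after the recorded position is lost.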
But the decoupling pays off. Multiple consumers read the same change stream independently: one updates a cache, another populates analytics, another writes to a data lake. Each system does one job well.
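Kafka makes this independence work by letting each consumer group track its own offset in the same log. The in-memory stand-in below illustrates the idea. It does not use the Kafka client API; Topic, publish, poll, and the group names are all hypothetical.

```python
from collections import defaultdict


class Topic:
    """Toy append-only log: each consumer group keeps its own read offset."""

    def __init__(self):
        self.events = []
        self.offsets = defaultdict(int)  # group name -> next index to read

    def publish(self, event):
        self.events.append(event)

    def poll(self, group):
        # Return everything this group has not seen yet, then advance its offset.
        start = self.offsets[group]
        batch = self.events[start:]
        self.offsets[group] = len(self.events)
        return batch


topic = Topic()
topic.publish({"op": "c", "id": 1})
topic.publish({"op": "u", "id": 1})
cache_first = topic.poll("cache-updater")   # both events so far
topic.publish({"op": "d", "id": 1})
# A new group starts at offset 0 here, like a Kafka group with auto.offset.reset=earliest.
analytics = topic.poll("analytics")         # all three events
cache_second = topic.poll("cache-updater")  # only the delete
```

A slow analytics job can fall behind without holding up the cache updater. It catches up later from its own offset.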
Debezium powers real-time integration, including Alibaba Cloud's ApsaraMQ Kafka deployments. Recent guides (Jan 2026) show setup patterns for PostgreSQL and MySQL with Kafka Connect. August 2025 saw Quarkus integration for embedded CDC in applications.
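Here is what those setup patterns involve at the most basic level. Kafka Connect exposes a REST API, listening on port 8083 by default, and you register a connector by POSTing a JSON body with a name and a config map to /connectors. The minimal sketch below targets the Debezium PostgreSQL connector and assumes Debezium 2.x property names. The connector name, hostname, credentials, database, and tables are placeholders.

```python
import json
import urllib.request

# Placeholder endpoint: Kafka Connect's REST API on its default port.
CONNECT_URL = "http://localhost:8083/connectors"

connector = {
    "name": "inventory-postgres",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",  # PostgreSQL's built-in logical decoding plugin
        "database.hostname": "db.internal",  # placeholder host
        "database.port": "5432",
        "database.user": "debezium",  # placeholder credentials
        "database.password": "change-me",
        "database.dbname": "inventory",
        "topic.prefix": "inventory",  # Debezium 2.x; 1.x used database.server.name
        "table.include.list": "public.customers,public.orders",
        "snapshot.mode": "initial",  # snapshot existing rows, then stream the WAL
    },
}

request = urllib.request.Request(
    CONNECT_URL,
    data=json.dumps(connector).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would submit it; omitted so the sketch runs offline.
print(request.get_method(), request.full_url)
```

The MySQL connector follows the same registration flow but needs its own properties, including a schema history topic, so check the connector documentation for your Debezium version.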
When it makes sense
Log-based CDC fits when you need reliable change tracking at scale, an event-driven architecture, or real-time downstream updates. It is a system that demands ongoing operational investment. In practice, the teams that succeed treat it as infrastructure, not a script.
The real question: can your team maintain it? CDC aligns with how databases work internally. That's why it works. That's also why it's not simple.