KEFCore: how it works

The Entity Framework Core provider for Apache Kafka can be used under certain operating conditions.

It is important to start with a simple description of how it works. The following chapters sometimes use the term back-end and sometimes Apache Kafka cluster: in the context of the Entity Framework Core provider for Apache Kafka they mean the same thing.

Basic concepts

The image below, from Wikipedia, illustrates the basic concepts:

[Image from Wikipedia: overview of the basic Apache Kafka concepts]

Put simply, there are three active elements:

Topics: store the records (the data); they are hosted in the Apache Kafka cluster and can be partitioned
Producers: entities that produce records to be stored in one or more topics
Consumers: entities that receive records from the topics

When a producer sends a record to the Apache Kafka cluster, the record is delivered to the consumers subscribed to the topics the producer is producing on: this is a classic pub-sub pattern. The Apache Kafka cluster also stores the record within the topic it was produced on, and this feature guarantees that:

an application consuming from the Apache Kafka cluster can listen only for the latest changes, or seek to a specific position in the past and start receiving data from that point
the standard way to consume from the Apache Kafka cluster is to start from the end (latest available record) or from the beginning (first available record)
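
The three starting positions described above can be illustrated with a toy model. This is only a sketch in Python, used as a neutral notation: `Topic`, `Consumer`, `produce` and `poll` are hypothetical names for an in-memory, single-partition log, not part of KEFCore or of any Apache Kafka client API.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Topic:
    """Toy stand-in for a single-partition topic: an append-only list of records."""
    records: List[Tuple[Any, Any]] = field(default_factory=list)

    def produce(self, key: Any, value: Any) -> int:
        """Append a record and return its offset (its index in the log)."""
        self.records.append((key, value))
        return len(self.records) - 1


class Consumer:
    """Toy consumer that tracks its own read position in the topic."""

    def __init__(self, topic: Topic, start: str = "latest", offset: Optional[int] = None):
        self.topic = topic
        if offset is not None:
            self.position = offset              # seek to a specific position in the past
        elif start == "earliest":
            self.position = 0                   # first available record
        else:
            self.position = len(topic.records)  # only records produced from now on

    def poll(self) -> List[Tuple[Any, Any]]:
        """Return every record from the current position and advance to the end."""
        batch = self.topic.records[self.position:]
        self.position = len(self.topic.records)
        return batch


topic = Topic()
topic.produce("k1", "a")
topic.produce("k2", "b")

from_start = Consumer(topic, start="earliest")
from_end = Consumer(topic, start="latest")
from_offset = Consumer(topic, offset=1)

topic.produce("k3", "c")

earliest_batch = from_start.poll()   # all three records
latest_batch = from_end.poll()       # only the record produced after subscribing
offset_batch = from_offset.poll()    # records from offset 1 onwards
```

Because the records stay in the topic after delivery, each consumer sees a different slice of the same log depending only on where it chose to start.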

How Entity Framework Core provider for Apache Kafka works

An application based on the Entity Framework Core provider for Apache Kafka is both a producer and a consumer at the same time:

when an entity is created, updated or deleted (e.g. by calling SaveChanges), the provider invokes the right producer to store a new record in the right topic of the Apache Kafka cluster
the subscribed consumer is then informed about this new record and stores it back: the usefulness of this round trip becomes clear later

The Apache Kafka cluster becomes:

a central router for data changes in Entity Framework Core based applications.
a reliable storage because, when the application restarts, the consumers read back the data stored in the topics, so the state is aligned to the latest available.

Apache Kafka comes with a topic compaction feature, which optimizes the second point. The Entity Framework Core provider for Apache Kafka is interested in storing only the latest state of each entity, not the history of changes. Using topic compaction, the combination of producer, consumer and Apache Kafka cluster can apply the CRUD operations on data:

Create: a producer stores a new record with a unique key
Read: a consumer retrieves records from the topic
Update: a producer stores a new record with a previously stored unique key; the older records with that key are discarded
Delete: a producer stores a new record with a previously stored unique key and a null value; all records with that unique key are deleted

Behind the scenes, all CRUD operations are supported by KNetCompactedReplicator or KNetProducer/Apache Kafka Streams.
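
The CRUD mapping above can be sketched as a replay of keyed records. This is a conceptual model in Python, not the provider's implementation: `compact` is a hypothetical helper, and it assumes a record is a `(key, value)` pair where a `None` value plays the role of the null (delete) marker.

```python
from typing import Dict, Iterable, Optional, Tuple

# (key, value); a value of None marks the key as deleted
Record = Tuple[str, Optional[str]]


def compact(log: Iterable[Record]) -> Dict[str, str]:
    """Replay a log in order and keep only the latest value for each key."""
    state: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)   # Delete: a null value removes the key
        else:
            state[key] = value     # Create or Update: the latest value wins
    return state


log = [
    ("blog-1", "title=Hello"),        # Create
    ("blog-2", "title=World"),        # Create
    ("blog-1", "title=Hello again"),  # Update: same key, new value
    ("blog-2", None),                 # Delete: same key, null value
]
state = compact(log)                  # Read: the latest state per key
```

Replaying the four records leaves a single entry for `blog-1` with its most recent value, which is the "latest state only" view the provider is interested in.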

First-level cache

KNetCompactedReplicator or Apache Kafka Streams acts as the first-level cache of Entity Framework Core: data coming from the Apache Kafka cluster updates its content while the system is running. This behavior is intrinsic and does not need any extra call to the back-end.

Data storage

Apache Kafka stores information as records, so entities must be converted into something Apache Kafka can use. The conversion is done by serializers that convert the entities (data in the model) into Apache Kafka records and vice versa: see the serialization chapter for more information.
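
As a rough illustration of that round trip, the sketch below turns an entity into a key/value pair of bytes and back. It assumes JSON encoding and a hypothetical `Blog` entity purely for the example; the serializers actually shipped with the provider are described in the serialization chapter and may work differently.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass
class Blog:
    """Hypothetical entity used only for this example."""
    blog_id: int
    url: str


def to_record(entity: Blog) -> Tuple[bytes, bytes]:
    """Serialize an entity: the primary key becomes the record key."""
    key = json.dumps(entity.blog_id).encode("utf-8")
    value = json.dumps(asdict(entity), sort_keys=True).encode("utf-8")
    return key, value


def from_record(key: bytes, value: Optional[bytes]) -> Optional[Blog]:
    """Deserialize a record; a missing value means the entity was deleted."""
    if value is None:
        return None
    return Blog(**json.loads(value.decode("utf-8")))


original = Blog(blog_id=1, url="https://example.com")
key, value = to_record(original)
restored = from_record(key, value)
```

Using the entity's key as the record key is what lets later records for the same entity replace earlier ones in the topic.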

Entity Framework Core provider for Apache Kafka compared to other providers

The previous chapter described how the Entity Framework Core provider for Apache Kafka reproduces the CRUD operations. Starting from the model defined in the code, the data is stored in the topics, and each topic can be seen as a database table filled with the same data. From the point of view of an application, using the Entity Framework Core provider for Apache Kafka is similar to using the InMemory provider.

A note on migrations

The current version of the Entity Framework Core provider for Apache Kafka does not explicitly support migrations.

Entity Framework Core provider for Apache Kafka features not available in other providers

Here is a list of features that the Entity Framework Core provider for Apache Kafka offers to its users and that are useful in some use cases. The features below are closely tied to the consumers receiving records back from the Apache Kafka cluster, as described above.

Distributed cache

The previous chapter stated that consumers align the application data to the latest information in the topics. The alignment is managed by KNetCompactedReplicator and/or Apache Kafka Streams, and everything is driven by the Apache Kafka back-end. Two or more applications sharing the same model and configuration always align to the latest state of the topics involved. This implies that, virtually, there is a distributed cache between the applications and the Apache Kafka back-end:

Apache Kafka physically stores the cache (shared state) within the topics and routes changes to the subscribed applications
Applications use the latest cache version (local state) received from the Apache Kafka back-end

If an application restarts, it retrieves the latest data (latest cache) and aligns to the shared state.
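
The shared state / local state relationship can be modelled in a few lines. The sketch below is an in-process simplification in Python: `SharedTopic` and `App` are hypothetical names, delivery is synchronous, and there is no network or partitioning, so it only illustrates how replay plus subscription makes every application converge on the same state.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (key, value); None marks a delete


class SharedTopic:
    """Toy back-end: keeps the full log and routes each change to subscribers."""

    def __init__(self) -> None:
        self.log: List[Record] = []
        self.subscribers: List["App"] = []

    def produce(self, key: str, value: Optional[str]) -> None:
        self.log.append((key, value))
        for app in self.subscribers:
            app.apply(key, value)


class App:
    """Toy application holding a local state aligned to the shared topic."""

    def __init__(self, topic: SharedTopic) -> None:
        self.topic = topic
        self.local_state: Dict[str, str] = {}
        for key, value in topic.log:      # on (re)start, replay what is stored
            self.apply(key, value)
        topic.subscribers.append(self)    # then keep receiving new changes

    def apply(self, key: str, value: Optional[str]) -> None:
        if value is None:
            self.local_state.pop(key, None)
        else:
            self.local_state[key] = value

    def save(self, key: str, value: Optional[str]) -> None:
        # The local state is not written directly: it is updated when the
        # record comes back from the topic, exactly like for other apps.
        self.topic.produce(key, value)


topic = SharedTopic()
app_a = App(topic)
app_b = App(topic)

app_a.save("blog-1", "v1")
app_b.save("blog-1", "v2")
app_a.save("blog-2", "x")
app_a.save("blog-2", None)

restarted = App(topic)  # a new or restarted instance replays the stored log
```

All three instances end up with the same local state, including the one created after the changes were produced.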

Events

Generally, an application based on Entity Framework Core executes queries against the back-end to store or retrieve information on demand. The alignment (a record consumed) can be considered a change event: any change in the back-end produces an event, which is used in different ways:

These change events are mainly used by KNetCompactedReplicator and/or Apache Kafka Streams to align the local state;
In addition, the Entity Framework Core provider for Apache Kafka can inform the registered application about these events, using callbacks and at zero cost.

The application can then use the reported events in many ways:

execute a query
write something to disk
execute a REST call
and so on

IMPORTANT NOTE: the events are raised from external threads, and this can lead to concurrency exceptions if the KafkaDbContext is used to retrieve information.
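
One common way to avoid touching a context from the callback thread is to have the callback only enqueue the event and let the thread that owns the context process it later. The sketch below shows that general pattern in Python; it is an assumption about how an application might react, not a KEFCore API, and `on_change`, `background_source` and `events` are hypothetical names.

```python
import queue
import threading

# Thread-safe hand-off between the callback thread and the owning thread
events = queue.Queue()


def on_change(key, value):
    """Callback running on an external thread: enqueue only, never query here."""
    events.put((key, value))


def background_source():
    """Stand-in for the external thread that raises change events."""
    for i in range(3):
        on_change(f"blog-{i}", f"v{i}")


worker = threading.Thread(target=background_source)
worker.start()
worker.join()

# Back on the thread that owns the context: drain the queue and react there.
handled = []
while not events.empty():
    handled.append(events.get())
```

The design choice is that the callback does as little as possible, so any query triggered by an event runs on a single, known thread.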

Applications not based on Entity Framework Core

So far the discussion has covered applications based on Entity Framework Core; however, this provider can also feed applications that are not based on Entity Framework Core. The Entity Framework Core provider for Apache Kafka comes with ready-made helper classes that subscribe to any topic of the Apache Kafka cluster and retrieve the data stored by an application based on Entity Framework Core. Any application can use this feature to:

read the latest data stored in the topics by the application based on Entity Framework Core
attach to the topics used by the application based on Entity Framework Core and receive change events whenever something is produced

When a record is received, the ready-made helper classes deserialize it and return the filled Entity.
