Kafka Stream-GlobalKTable join on a specific field

So I have a KStream whose values get deserialized into a POJO like so:

public class FinancialMessage {

public String user_id;
public String stock_symbol;
public String exchange_id;

}

And here’s what the GlobalKTable record looks like:

public class CompanySectors {

public String company_id;
public String company_name;
public String tckr;
public String sector_cd;
}

I want to join the KStream’s stock_symbol field with the GlobalKTable’s tckr field. Is this possible? I want to create a new EnrichedMessage object before I stream it into another topic. I have the code shown further down, but I am getting a NullPointerException:

Exception in thread "trade-enrichment-stream-0c7e7782-4217-4450-8086-21871b4ebc45-StreamThread-1" java.lang.NullPointerException
    at com.domain.EnrichedMessage.<init>(EnrichedMessage.java:51)
    at com.domain.TradeEnrichmentTopology.lambda$3(TradeEnrichmentTopology.java:73)
    at org.apache.kafka.streams.kstream.internals.KStreamKTableJoinProcessor.process(KStreamKTableJoinProcessor.java:79)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.lambda$process$2(ProcessorNode.java:142)
    at org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:806)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:142)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:201)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:180)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:133)
    at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:101)
    at org.apache.kafka.streams.processor.internals.StreamTask.lambda$process$3(StreamTask.java:383)
    at org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:801)
    at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:383)
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:475)
    at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:550)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:802)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:697)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:670)

Here’s what the code snippet looks like.

KStream<String, FinancialMessage> financialMessageStream =
    builder.stream(
        INCOMING_TOPIC,
        Consumed.with(Serdes.String(), financialMessageSerde)
    );

GlobalKTable<String, CompanySectors> companySectorsStore =
    builder.globalTable(
        KTABLE_TOPIC,
        Consumed.with(Serdes.String(), companySectorsSerde)
    );

KStream<String, EnrichedMessage> enrichedStream = financialMessageStream.leftJoin(
    companySectorsStore,
    (financialMessageKey, financialMessageValue) -> financialMessageValue.stock_symbol,
    (financialMessageValue, companySectorsValue) -> new EnrichedMessage(financialMessageValue, companySectorsValue)
);

enrichedStream.to(
    OUTGOING_TOPIC,
    Produced.with(Serdes.String(), enrichedMessageSerde));

I imagine that there might be some error in my leftJoin logic.

Answer

When doing a left join, you can assume that the left stream’s record is not null. You cannot assume that the GlobalKTable on the right has a record for the given key, so the second argument passed to your joiner can be null. In your case, when you call new EnrichedMessage(financialMessageValue, companySectorsValue), are you sure that companySectorsValue isn’t null? If it is null, are you handling it properly? The stack trace shows the NPE being thrown inside the EnrichedMessage constructor (EnrichedMessage.java:51), called from the join lambda, so make sure the constructor can cope with a null companySectorsValue.
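
To illustrate the null handling described above, here is a minimal plain-Java sketch with no Kafka dependency. FinancialMessage and CompanySectors are copied from the question. The fields and constructor body of EnrichedMessage are assumptions, since the real class is not shown in the question, and JoinerSketch is a hypothetical demo class.

```java
// POJOs as given in the question.
class FinancialMessage {
    public String user_id;
    public String stock_symbol;
    public String exchange_id;
}

class CompanySectors {
    public String company_id;
    public String company_name;
    public String tckr;
    public String sector_cd;
}

// Hypothetical shape: the real EnrichedMessage is not shown in the question.
class EnrichedMessage {
    public final String user_id;
    public final String stock_symbol;
    public final String exchange_id;
    public final String company_name; // stays null when the table has no match
    public final String sector_cd;    // stays null when the table has no match

    EnrichedMessage(FinancialMessage message, CompanySectors sectors) {
        this.user_id = message.user_id;
        this.stock_symbol = message.stock_symbol;
        this.exchange_id = message.exchange_id;
        // In a leftJoin, sectors is null whenever the lookup key
        // (here stock_symbol) has no entry in the GlobalKTable.
        this.company_name = (sectors == null) ? null : sectors.company_name;
        this.sector_cd = (sectors == null) ? null : sectors.sector_cd;
    }
}

// Hypothetical demo: simulates the joiner being called without a match.
class JoinerSketch {
    public static void main(String[] args) {
        FinancialMessage message = new FinancialMessage();
        message.user_id = "u1";
        message.stock_symbol = "ACME";
        message.exchange_id = "X1";

        // Passing null mimics a left-join miss; this must not throw.
        EnrichedMessage enriched = new EnrichedMessage(message, null);
        System.out.println(enriched.stock_symbol + " -> " + enriched.sector_cd);
    }
}
```

With a constructor along these lines, the leftJoin lambda in the question can stay as it is. If unmatched records should be dropped rather than forwarded with empty company fields, an inner join (join instead of leftJoin) only invokes the joiner for keys that exist in the table.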

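Also, ensure your GlobalKTable’s topic is populated before the stream records arrive, and that its records are keyed by tckr: the value your key selector returns (stock_symbol) is looked up against the table’s key, not against a field of its value.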



Source: stackoverflow