Snowplow Data Processing from PubSub to Java API

I am using Snowplow for behavioral data tracking. I can already load the data from Pub/Sub into BigQuery using the open-source Snowplow loader (and mutator) (https://docs.snowplowanalytics.com/docs/getting-started-on-snowplow-open-source/setup-snowplow-on-gcp/setup-bigquery-destination/), but I would like to consume the data from Pub/Sub in a Java API directly.

However, the data from Pub/Sub arrives as an unstructured string without a schema. It uses “\t” (tab) as the delimiter and contains “{}” blocks that hold some schemas, so formatting it seems to require custom string processing.

Is there a better way to decode the data from Pub/Sub in a Java API than writing complex string processing? Thank you!

Answer

Snowplow maintains a number of so-called ‘analytics SDKs’ that let you transform the enriched hybrid TSV + JSON format into plain JSON that can then be used in downstream applications.

For Java, your best bet would probably be the Scala Analytics SDK: https://github.com/snowplow/snowplow-scala-analytics-sdk.

There are also SDKs for .NET, Go, JavaScript and Python: https://github.com/snowplow/snowplow/tree/master/5-data-modeling/analytics-sdk.
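
If you only need a handful of fields and want to see what the hybrid TSV + JSON format looks like before pulling in an SDK, here is a minimal plain-Java sketch. It is not how the SDKs are implemented. The class name, the four-entry field list and the sample line are all made up for illustration; a real enriched event has many more columns in a fixed order, which you would need to take from the Snowplow documentation or the SDK source. The sketch also assumes that no column contains a raw tab character.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any Snowplow library.
final class EnrichedTsvSketch {

    // Assumption: a shortened, illustrative field list. The real enriched
    // event has far more columns; use the official field order instead.
    static final List<String> FIELD_NAMES =
            List.of("app_id", "platform", "event", "contexts");

    // Splits one tab-separated line and pairs each non-empty value with its field name.
    static Map<String, String> toMap(String line, List<String> fieldNames) {
        // A limit of -1 keeps trailing empty columns, which split() drops by default.
        String[] values = line.split("\t", -1);
        if (values.length != fieldNames.size()) {
            throw new IllegalArgumentException(
                    "Expected " + fieldNames.size() + " columns but got " + values.length);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            if (!values[i].isEmpty()) {
                // JSON columns are kept as raw strings here; parse them with
                // a JSON library of your choice if you need their contents.
                result.put(fieldNames.get(i), values[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample line: the third column (event) is empty.
        String line = "shop\tweb\t\t{\"data\":[]}";
        System.out.println(toMap(line, FIELD_NAMES));
    }
}
```

For anything beyond a quick experiment, the SDK route is still the safer choice, since it already knows the full field list and the types of each column.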