I am using Snowplow for behavioral data tracking. I can load the data from Pub/Sub into BigQuery using the open-source Snowplow loader (and mutator) (https://docs.snowplowanalytics.com/docs/getting-started-on-snowplow-open-source/setup-snowplow-on-gcp/setup-bigquery-destination/), but I would like to consume the data from Pub/Sub directly in a Java API instead.
However, the data arrives from Pub/Sub as an unstructured String with no schema. It uses a tab ("\t") as the delimiter and "{}" blocks to store some schemas, so formatting it seems to require string processing.
Is there a better way to decode the Pub/Sub data in a Java API than writing complex string processing? Thank you!
Answer
Snowplow maintains a number of so-called "analytics SDKs" that transform the enriched event format (a hybrid of TSV columns with embedded JSON) into plain JSON that can then be used in downstream applications.
For Java, your best bet would probably be the Scala Analytics SDK, since Scala libraries run on the JVM and can be added as a dependency of a Java project: https://github.com/snowplow/snowplow-scala-analytics-sdk.
There are also SDKs for .NET, Go, JavaScript and Python: https://github.com/snowplow/snowplow/tree/master/5-data-modeling/analytics-sdk.
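To make the hybrid format concrete, here is a minimal plain-Java sketch of what such a transformation does at its simplest: split the line on tabs and pair each column with a name. It is not the SDK's implementation. The class name, the four-entry field list and the sample line are all made up for illustration (the real enriched event has many more columns in a fixed order), and it assumes the embedded JSON contains no raw tab characters.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EnrichedEventSplitter {

    // Hypothetical, shortened field list for illustration only; the real
    // enriched event defines many more columns in a fixed order.
    static final List<String> FIELD_NAMES =
            List.of("app_id", "platform", "event", "contexts");

    // Splits one tab-delimited line and pairs each column with its name.
    // JSON columns are kept as raw strings for a JSON library to parse later.
    static Map<String, String> toMap(String line) {
        // A limit of -1 keeps trailing empty columns instead of dropping them.
        String[] values = line.split("\t", -1);
        if (values.length != FIELD_NAMES.size()) {
            throw new IllegalArgumentException(
                    "Expected " + FIELD_NAMES.size() + " columns but got " + values.length);
        }
        Map<String, String> event = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            event.put(FIELD_NAMES.get(i), values[i]);
        }
        return event;
    }

    public static void main(String[] args) {
        // Made-up sample line: three plain columns followed by one JSON column.
        String line = "shop\tweb\tpage_view\t{\"schema\":\"iglu:com.example/ctx/jsonschema/1-0-0\",\"data\":{}}";
        Map<String, String> event = toMap(line);
        System.out.println(event.get("event"));
        System.out.println(event.get("contexts"));
    }
}
```

An SDK is still the safer choice in practice, because it tracks the full column list and handles the typed and JSON fields for you.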