
Add document to Firestore from Beam with auto generated ID

I would like to use Apache Beam Java with the recently published Firestore connector to add new documents to a Firestore collection. I thought this would be a relatively easy task, but the need to create com.google.firestore.v1.Document objects seems to make things a bit more difficult. I was using this blog post on Using Firestore and Apache Beam for data processing as a starting point.

All I actually want to write is a simple transformation that maps MyClass objects to Firestore documents, which are then added to a Firestore collection.

What I ended up with is a Beam SimpleFunction that maps MyClass objects to Documents:


and a DoFn that transforms these Documents into Write objects with the update operation set (this could probably also be simplified to a SimpleFunction, but it was copied from the blog post):


I’m using these two functions in my pipeline as follows:


The major disadvantages here are:

  • I have to specify a document ID and cannot use an auto-generated one as with the “plain” Java SDK
  • I have to specify the project ID and the database name, although they should already be available. With the Java SDK, at least, I don’t have to set them.
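For context, all three values are needed because the name of a com.google.firestore.v1.Document is a fully qualified resource path of the form projects/{project}/databases/{database}/documents/{collection}/{documentId}. The following is a minimal sketch of that format using only the JDK. The class FirestoreDocumentName and its method are hypothetical helpers for illustration; they are not part of Beam or the Firestore SDK.

```java
import java.util.Objects;

/** Hypothetical helper (not a Beam or Firestore API) that assembles a document resource name. */
final class FirestoreDocumentName {
  /** Database ID that Firestore uses when no named database is configured. */
  static final String DEFAULT_DATABASE = "(default)";

  private FirestoreDocumentName() {}

  /**
   * Builds projects/{project}/databases/{database}/documents/{collection}/{documentId}.
   * The collection may itself be a nested path, so only the other segments are
   * checked for slashes.
   */
  static String of(String projectId, String databaseId, String collection, String documentId) {
    requireSingleSegment(projectId, "projectId");
    requireSingleSegment(databaseId, "databaseId");
    requireSingleSegment(documentId, "documentId");
    Objects.requireNonNull(collection, "collection");
    if (collection.isEmpty()) {
      throw new IllegalArgumentException("collection must not be empty");
    }
    return "projects/" + projectId
        + "/databases/" + databaseId
        + "/documents/" + collection
        + "/" + documentId;
  }

  /** Rejects null, empty, or slash-containing values, which would change the path structure. */
  private static void requireSingleSegment(String value, String label) {
    Objects.requireNonNull(value, label);
    if (value.isEmpty() || value.contains("/")) {
      throw new IllegalArgumentException(label + " must be a single non-empty path segment");
    }
  }
}
```

With this helper, FirestoreDocumentName.of("my-project", FirestoreDocumentName.DEFAULT_DATABASE, "my-collection", "doc-1") yields projects/my-project/databases/(default)/documents/my-collection/doc-1, which is the kind of string passed to Document.newBuilder().setName(...). The project and collection names here are placeholders.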

Is there any way to add documents using the Firestore connector without explicitly setting document ID, project ID and database?


Answer

I agree, this is not the most convenient API (and I don’t see a better one at the moment). It seems to be designed for modifying existing documents, not creating new ones.

I think it would make sense to have a higher-level transform; I filed https://issues.apache.org/jira/browse/BEAM-13994 . In the meantime, you could do something like


which would be generally re-usable and likely worth contributing to Beam.
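One piece such a transform would need is a document ID generated on the client side. Below is a minimal sketch of that part only, using just the JDK and assuming a random ID is an acceptable substitute for a server-assigned one. The class ClientSideDocumentIds and its methods are hypothetical names for illustration, not Beam or Firestore APIs; the 20-character alphanumeric form is loosely modelled on the IDs the Firestore client SDKs produce, and a UUID is shown as an alternative.

```java
import java.security.SecureRandom;
import java.util.UUID;

/** Hypothetical helper for client-generated document IDs; not part of Beam or the Firestore SDK. */
final class ClientSideDocumentIds {
  private static final String ALPHABET =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
  private static final int ID_LENGTH = 20;
  private static final SecureRandom RANDOM = new SecureRandom();

  private ClientSideDocumentIds() {}

  /** Returns a random 20-character ID drawn from [A-Za-z0-9]. */
  static String randomAlphanumericId() {
    StringBuilder id = new StringBuilder(ID_LENGTH);
    for (int i = 0; i < ID_LENGTH; i++) {
      id.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
    }
    return id.toString();
  }

  /** Alternative: a random UUID in its canonical 36-character string form. */
  static String randomUuidId() {
    return UUID.randomUUID().toString();
  }

  /** Builds a full document name with a freshly generated ID as the last segment. */
  static String newDocumentName(String projectId, String databaseId, String collection) {
    return String.format(
        "projects/%s/databases/%s/documents/%s/%s",
        projectId, databaseId, collection, randomAlphanumericId());
  }
}
```

Inside a DoFn, the result of ClientSideDocumentIds.newDocumentName(projectId, "(default)", collection) would be passed to Document.newBuilder().setName(...), and the built document to Write.newBuilder().setUpdate(...). One caveat: because Beam may retry a bundle, a random ID can differ between attempts, so a retried element could be written twice under different IDs. Deriving the ID from the record's contents is one way to avoid that.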

User contributions licensed under: CC BY-SA