Direct logging to Elasticsearch vs. using Logstash and Filebeat

I’m using a Spring Boot back-end to provide a RESTful API and need to log all of my request-response logs into Elasticsearch.

Which of the following two methods has better performance?

  1. Using Spring Boot ResponseBodyAdvice to log every request and response that is sent to the client directly to ElasticSearch.

  2. Log every request and response into a log file and use filebeat and/or logstash to send them to Elasticsearch.

Answer

First off, I assume that you have a distributed application; otherwise just write your stuff to a log file and that’s it.

I also assume that you have quite a lot of logs to manage; if you’re planning to log only a couple of messages an hour, then it doesn’t really matter which way you go, since both will do the job.

Technically both ways can be implemented, although for the first path I would suggest a different approach. At least, I did something similar ~5 years ago in one of my projects:

Create a custom log appender that throws everything into some queue (for async processing), and from that queue have Apache Flume write the data to the DB of your choice in a transactional manner, with batch support, “all-or-nothing” semantics, etc.
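
To make the queue idea concrete, here is a minimal sketch in plain Java. It is not the appender from that project and not a real Logback/Log4j class: `QueueingLogSink` and its `batchWriter` callback are hypothetical names, and the callback stands in for whatever ships a batch onward (Flume, a DB client, etc.). The request thread only does a non-blocking `offer`, and a background thread drains the queue in batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch of the queue-and-batch idea; names are made up for illustration.
final class QueueingLogSink implements AutoCloseable {
    private final BlockingQueue<String> queue;
    private final int batchSize; // assumed to be >= 1
    private final Consumer<List<String>> batchWriter; // assumption: ships one batch downstream
    private final Thread worker;
    private volatile boolean running = true;

    QueueingLogSink(int capacity, int batchSize, Consumer<List<String>> batchWriter) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.batchSize = batchSize;
        this.batchWriter = batchWriter;
        this.worker = new Thread(this::drainLoop, "log-shipper");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    // Called on the request thread: never blocks, returns false if the queue is full.
    boolean append(String message) {
        return queue.offer(message);
    }

    // Background loop: wait briefly for a message, then grab up to batchSize in one go.
    private void drainLoop() {
        List<String> batch = new ArrayList<>(batchSize);
        while (running || !queue.isEmpty()) {
            try {
                String first = queue.poll(100, TimeUnit.MILLISECONDS);
                if (first == null) {
                    continue; // nothing arrived, re-check the running flag
                }
                batch.add(first);
                queue.drainTo(batch, batchSize - 1);
                batchWriter.accept(new ArrayList<>(batch));
                batch.clear();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Stops accepting the "running" state and waits until queued messages are flushed.
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Note the trade-off in this sketch: when the queue is full, `append` returns false and the message is dropped rather than slowing down the request, and anything still sitting in the queue is lost if the JVM is killed without `close()` being called.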

This approach solves issues that might appear in the “first” option that you’ve presented, while some other issues will be left unsolved.

If I compare the first and the second option that you’ve presented, I think you’re better off with filebeat / logstash, or even both, to write to ES. Here is why:

When you log in the advice, you will “eat” the resources of your JVM: memory and CPU to maintain an ES connection pool, plus a thread pool for doing the actual logging (otherwise the business flow might slow down because of logging the requests to ES).

In addition, you won’t be able to write “in batch” to Elasticsearch without custom code, and will instead have to issue an “insert” per log message, which might be wasteful.
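
Just to show what that custom batching code involves: Elasticsearch’s `_bulk` endpoint takes a newline-delimited JSON body, with an action line followed by a document line for each entry and a newline after the last line. A minimal sketch of building such a body is below. `BulkBody` is a hypothetical helper, it assumes every document is already a single line of valid JSON and that the index name needs no escaping, and actually sending the request (with content type `application/x-ndjson`), retrying, and checking per-item errors are all left out.

```java
import java.util.List;

// Hypothetical helper: builds the NDJSON payload for a POST to /_bulk.
final class BulkBody {
    static String build(String index, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            // Action line: tells ES to index the next line into the given index.
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            // Document line: assumed to be one line of JSON, terminated by a newline.
            body.append(doc).append('\n');
        }
        return body.toString();
    }
}
```

So 500 log messages become one HTTP request instead of 500, but you now own the buffering, flushing and error handling yourself.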

One more “technicality”: what happens if the application gets restarted for some reason? Will you be able to write all the logs produced prior to the restart if everything gets logged in the advice?

Yet another issue: what happens if you want to “rotate” the indexes in ES, namely create an index with a TTL and produce a new index every day?
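
For the daily rotation, your own code would have to compute the target index name for each message, roughly like the sketch below. `DailyIndex`, the `logs` prefix and the `yyyy.MM.dd` suffix are just assumptions for illustration (a dotted date suffix is a common convention), and deleting the old indexes would still be a separate job.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: maps a log timestamp to a per-day index name.
final class DailyIndex {
    // UTC is an assumption here, so that all nodes agree on where the day boundary is.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy.MM.dd").withZone(ZoneOffset.UTC);

    static String nameFor(String prefix, Instant timestamp) {
        return prefix + "-" + DAY.format(timestamp);
    }
}
```

For example, `DailyIndex.nameFor("logs", Instant.parse("2024-03-05T23:59:59Z"))` gives `logs-2024.03.05`, and one second later the name switches to `logs-2024.03.06`.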

filebeat/logstash can potentially solve all these issues; however, they might require a more complicated setup. Besides, you’ll obviously have more services to deploy and maintain:

  • logstash is way heavier than filebeat from the resource consumption standpoint, and it is usually where you parse the log message (typically with a grok filter).
  • filebeat is much more “humble” when it comes to resource consumption. If you have many instances to log from (really distributed logging, which I’ve assumed you have anyway), consider putting a filebeat service (a DaemonSet if you have k8s) on each node from which you’ll gather the logs, so that a single filebeat process can handle different instances. Then deploy a cluster of logstash instances on separate machines, so that they do the heavy log-crunching all the time and stream the data to ES.

How does logstash/filebeat help? Off the top of my head:

  • It runs at its own pace, so even if the application process goes down, the messages produced by that process will still be written to ES
  • It can even survive short outages of ES itself, I think (should check that)
  • It can handle different processes written in different technologies. What if tomorrow you want to gather logs from the database server, for example, which doesn’t use Spring or isn’t written in Java at all?
  • It can handle index rotation and batch writing internally, so you’ll end up with effective ES management that you would otherwise have to write yourself.

What are the drawbacks of the logstash/filebeat approach? Again, off the top of my head, not a full list or anything:

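  • Well, much more data will go through the network overall
  • If you log directly from the application, you already have a structured “LogEvent” and don’t need to parse a string. With the file-based approach the event is first rendered to text and then parsed back, so this conversion is redundant work.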
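
One way to soften that last drawback, as a sketch only: write each log entry to the file as a single line of JSON, so the shipper can decode it instead of grok-parsing free text (filebeat can decode JSON lines). `JsonLogLine` and the field names `@timestamp`, `level` and `message` are assumptions for illustration; in a real project you would more likely use a JSON encoder or layout for your logging framework than hand-rolled escaping.

```java
// Hypothetical helper: renders one log entry as a single-line JSON object.
final class JsonLogLine {
    static String format(String timestamp, String level, String message) {
        return "{\"@timestamp\":\"" + escape(timestamp)
                + "\",\"level\":\"" + escape(level)
                + "\",\"message\":\"" + escape(message) + "\"}";
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters become a four-digit hex escape.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

Because newlines inside the message are escaped, every entry stays on one line, which is what a line-oriented shipper expects.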

As for performance implications, it basically depends on what you measure, what exactly your application looks like, and what hardware you have, so I’m afraid I won’t be able to give you a clear answer on that. You should measure in your concrete case and come up with the way that works better for you.