Direct buffer memory error when connecting to MQTT

We are running an Apache Beam application on a Flink cluster.

For the past few days the application has been failing with the following error:

    Caused by: javax.net.ssl.SSLException: failure when writing TLS control frames
        at io.netty.handler.ssl.SslHandler.setHandshakeFailureTransportFailure(SslHandler.java:1870)
        at io.netty.handler.ssl.SslHandler.access$600(SslHandler.java:167)
        at io.netty.handler.ssl.SslHandler$2.operationComplete(SslHandler.java:985)
        at io.netty.handler.ssl.SslHandler$2.operationComplete(SslHandler.java:980)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
        at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:183)
        at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
        at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
        at io.netty.handler.ssl.SslHandler.wrapNonAppData(SslHandler.java:980)
        at io.netty.handler.ssl.SslHandler.handshake(SslHandler.java:2046)
        at io.netty.handler.ssl.SslHandler.startHandshakeProcessing(SslHandler.java:1966)
        at io.netty.handler.ssl.SslHandler.channelActive(SslHandler.java:2101)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:209)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1398)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
        at io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:895)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:305)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:335)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: java.lang.OutOfMemoryError: Direct buffer memory. The direct out-of-memory error has occurred. This can mean two things: either job(s) require(s) a larger size of JVM direct memory or there is a direct memory leak. The direct memory can be allocated by user code or some of its dependencies. In this case 'taskmanager.memory.task.off-heap.size' configuration option should be increased. Flink framework and its dependencies also consume the direct memory, mostly for network communication. The most of network memory is managed by Flink and should not result in out-of-memory error. In certain special cases, in particular for jobs with high parallelism, the framework may require more direct memory which is not managed by Flink. In this case 'taskmanager.memory.framework.off-heap.size' configuration option should be increased. If the error persists then there is probably a direct memory leak in user code or some of its dependencies which has to be investigated and fixed. The task executor has to be shutdown...
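
Since the error message points at either too little direct memory or a leak, it may help to log how much direct buffer memory the JVM reports on the cluster and on the local machine. The following is only a minimal sketch using the standard `BufferPoolMXBean`; the class name `DirectMemoryReport` is made up for illustration and is not part of the application above. Netty can allocate direct memory in ways these counters do not see, so the numbers should be read as a rough lower bound rather than an exact figure.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of the original application.
public class DirectMemoryReport {

    /** Builds one line per JVM buffer pool (on HotSpot typically "direct" and "mapped"). */
    static String report() {
        StringBuilder sb = new StringBuilder();
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            sb.append(String.format("%s: count=%d used=%d bytes capacity=%d bytes%n",
                    pool.getName(),
                    pool.getCount(),          // number of buffers in the pool
                    pool.getMemoryUsed(),     // memory the JVM reports as used
                    pool.getTotalCapacity())); // total capacity of all buffers
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Calling `DirectMemoryReport.report()` right before the MQTT client is built, and logging the result, would show whether the direct pool is already close to its limit before the TLS handshake starts.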

The connection is set up with the following method:

  private Mqtt5AsyncClient setupClient() {
    Mqtt5ClientBuilder mqttClientBuilder = Mqtt5Client.builder().identifier("beam-"+UUID.randomUUID().toString())
        .serverHost(properties.getServerHost()).serverPort(properties.getServerPort());
    if (properties.getUsername() != null && properties.getPassword() != null) {
      mqttClientBuilder = mqttClientBuilder.simpleAuth().username(properties.getUsername())
          .password(properties.getPassword().getBytes()).applySimpleAuth();
    } else if (properties.getUsername() != null || properties.getPassword() != null) {
      LoggerFactory.getLogger(getClass()).error("Both username and password must be provided!");
    }

    if (properties.isSslEnabled()) { // Add ssl config if ssl is enabled
      try {
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(new ByteArrayInputStream(truststore), properties.getTrustStorePassword().toCharArray());

        tmf.init(ks);

        if (properties.skipHostnameVerification()) { // Disable host name verification if required
          mqttClientBuilder = mqttClientBuilder.sslConfig().trustManagerFactory(tmf)
              .hostnameVerifier(new NoopHostnameVerifier()).applySslConfig();
        } else {
          mqttClientBuilder = mqttClientBuilder.sslConfig().trustManagerFactory(tmf).applySslConfig();
        }
      } catch (NoSuchAlgorithmException | KeyStoreException | CertificateException | IOException e) {
        LoggerFactory.getLogger(getClass()).error("Error while setting up ssl", e);
      }
    }

    Mqtt5BlockingClient newClient = mqttClientBuilder.buildBlocking();
    newClient.connect();
    
    mqttClientCount.inc();
    return newClient.toAsync();
  }

But it runs on my machine: when I start the application from my Eclipse project, everything works fine. The error only happens on the Flink cluster, which makes debugging a bit difficult.

The only thing that has changed on the server was an update to openjdk-11.0.14. Updating Java on my PC to the same version does not reproduce the error.

I’m running out of ideas about what could cause the error.

I checked the following things:

  • Is the SSL certificate valid –> Yes
  • Are the username and password correct –> Yes
  • Is there any logging on the MQTT side –> No

It seems the Java application fails before it can connect to the MQTT broker, because there aren’t any login attempts on the broker.

We are using spring-boot-2.1.18 and hivemq-mqtt-client:1.2.2. The broker is VerneMQ.

Any suggestions are welcome.

Thanks in advance

Answer

I got it fixed.

I don’t know exactly where the problem was.

I restored a backup of my VM from when everything was working. Then I updated openjdk-11 to the latest version, and the error appeared again.

So it really is a problem with the update to openjdk-11-jdk 11.0.14.

After removing OpenJDK and doing a clean reinstall, everything works fine again.
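
To confirm which Java installation a process actually runs on after such a reinstall, one option is to log the JVM's own system properties on the server and compare them with the local machine. This is only a sketch; the class name `JvmInfo` is hypothetical, and it only reads standard system properties.

```java
// Hypothetical helper for comparing JVM installations between machines.
public class JvmInfo {

    /** Returns the JVM version, vendor, VM name and install path, one per line. */
    static String describe() {
        String[] keys = {"java.version", "java.vendor", "java.vm.name", "java.home"};
        StringBuilder sb = new StringBuilder();
        for (String key : keys) {
            sb.append(key).append('=')
              .append(System.getProperty(key))
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

If the output differs between the failing and the working environment in anything other than the expected version string (for example a different `java.home`), that would hint at a mixed or partially updated installation.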

Maybe this will help someone.