How to optimize wazuh-indexer memory usage and SSL/TLS configuration after updating Wazuh server

Question:

How to fix wazuh-indexer errors after updating Wazuh server?
> I updated Wazuh server last week and since then I have been getting errors when I try to run the wazuh-indexer service. The errors are related to high heap usage and G1GC triggering. I have already set the heap size to 8G in the jvm.options file and followed the tuning guide from the documentation, but the problem persists. I also noticed an SSL handshake exception that might be caused by a JDK update and TLS versions mismatch. I solved the issue by moving the backup directory out of /etc/wazuh-indexer, but I don’t understand why that worked. Can anyone explain the root cause and how to prevent this from happening again?

Answer:

How to fix wazuh-indexer errors after updating Wazuh server?

Wazuh is a security platform that provides threat detection, compliance monitoring, and incident response capabilities. Wazuh-indexer is a component of Wazuh that stores and indexes the data collected by Wazuh agents and managers. Wazuh-indexer is based on OpenSearch, an open source fork of Elasticsearch.

Recently, some users have reported errors when running the wazuh-indexer service after updating the Wazuh server. The errors are related to high heap usage and garbage collection (G1GC) triggering. The users have also noticed an SSL handshake exception that might be caused by a JDK update and TLS versions mismatch. In this article, we will explain the root cause of these errors and how to fix them.

The first error that the users encountered was the following:

```
[2024-02-05T19:34:36,844][INFO ][o.o.i.b.HierarchyCircuitBreakerService] [node-1] attempting to trigger G1GC due to high heap usage [1044762216]
[2024-02-05T19:34:36,858][INFO ][o.o.i.b.HierarchyCircuitBreakerService] [node-1] GC did bring memory usage down, before [1044762216], after [1022933632], allocations [1], duration [13]
```

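Although users report it as an error, these messages are logged at INFO level: they signal memory pressure rather than a failure in themselves. They indicate that heap usage of the wazuh-indexer service climbed high enough for the circuit breaker service to request a G1 garbage collection (GC), and that the collection reclaimed only a small amount of memory (from 1,044,762,216 to 1,022,933,632 bytes). GC is a mechanism that automatically reclaims memory from objects that are no longer in use by the application. However, frequent GC can also cause performance issues, as it can pause the application while the memory cleanup runs.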
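To make the numbers in these log lines easier to read, here is a minimal Python sketch that extracts the before and after values and converts them to MiB. The regular expression is written against the two sample lines above only; the helper names (`parse_gc_result`, `to_mib`) are illustrative and not part of Wazuh or OpenSearch.

```python
import re

# The two sample lines quoted above.
LOG_LINES = [
    "[2024-02-05T19:34:36,844][INFO ][o.o.i.b.HierarchyCircuitBreakerService] [node-1] "
    "attempting to trigger G1GC due to high heap usage [1044762216]",
    "[2024-02-05T19:34:36,858][INFO ][o.o.i.b.HierarchyCircuitBreakerService] [node-1] "
    "GC did bring memory usage down, before [1044762216], after [1022933632], "
    "allocations [1], duration [13]",
]

# Matches only the "GC did bring memory usage down" result line.
GC_RESULT = re.compile(
    r"before \[(\d+)\], after \[(\d+)\], allocations \[(\d+)\], duration \[(\d+)\]"
)


def parse_gc_result(line):
    """Return the heap figures from a GC result line, or None if it is another line."""
    match = GC_RESULT.search(line)
    if match is None:
        return None
    before, after, allocations, duration = map(int, match.groups())
    return {
        "before": before,
        "after": after,
        "freed": before - after,
        "allocations": allocations,
        "duration": duration,
    }


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    for line in LOG_LINES:
        result = parse_gc_result(line)
        if result:
            print(f"before: {to_mib(result['before']):.1f} MiB, "
                  f"freed: {to_mib(result['freed']):.1f} MiB")
```

For the sample lines, usage before the collection is roughly 996 MiB. If that figure is far below the heap size configured in jvm.options, it may be worth confirming that the configured value is actually in effect before drawing conclusions about load.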

The users tried to solve this error by increasing the heap size to 8G in the jvm.options file, which is the configuration file for the Java Virtual Machine (JVM) that runs the wazuh-indexer service. However, this did not solve the problem, as the service still consumed the maximum heap size and triggered GC frequently.

The root cause of this error is that the wazuh-indexer service was receiving more data than it could handle. The data was coming from the Wazuh agents and managers, which collect and send various types of security events, such as file integrity monitoring, vulnerability detection, log analysis, and more. The wazuh-indexer service had to index and store all this data in its database, which required a lot of memory.

The solution to this error is to scale up or scale out the wazuh-indexer service. Scaling up means increasing the resources of the existing node, such as CPU, RAM, disk, and network. Scaling out means adding more nodes to the cluster, which can distribute the workload and increase the availability and fault tolerance. Both scaling methods can improve the performance and stability of the wazuh-indexer service.

To scale up the wazuh-indexer service, the users can follow these steps:

  • Edit the jvm.options file and set the heap size to a value that suits the node’s RAM, using the same value for the initial heap (-Xms) and the maximum heap (-Xmx). The usual recommendation is 50% of the available RAM, but not more than 32G. For example, a node with 16G of RAM can use an 8G heap, and a node with 64G of RAM can use a 32G heap.
  • Review the number of shards and replicas for the indices. Shards are the basic units of data distribution and parallelism in OpenSearch. Replicas are copies of shards that provide redundancy and high availability. These are index-level settings, normally applied through index templates or the index settings API rather than the node configuration file, and the number of primary shards is fixed when an index is created. More shards is not automatically better: every shard carries its own heap overhead, so a large number of small shards can make memory pressure worse. The right numbers depend on the size and type of the data, but a commonly cited rule of thumb is to keep each shard in the range of tens of gigabytes (often quoted as 10 to 50 GB) rather than creating one shard per GB. A replica is never allocated on the same node as its primary, so replicas only help once the cluster has more than one node.
  • Restart the wazuh-indexer service and monitor the heap usage and GC frequency. If the service still consumes too much memory and triggers GC often, consider scaling out the service.
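The heap sizing rule from the first step can be sketched in a few lines of Python. This is only an illustration of the "50% of RAM, capped at 32G" rule quoted above; the function names are hypothetical, and the generated `-Xms`/`-Xmx` lines are the standard JVM heap flags that would go into jvm.options.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=32):
    """Half of the node's RAM, but never more than cap_gb (rule of thumb from the text)."""
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return min(total_ram_gb // 2, cap_gb)


def jvm_heap_options(heap_gb):
    """JVM flags for jvm.options; initial and maximum heap are set to the same value."""
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


if __name__ == "__main__":
    for ram in (16, 64, 128):
        heap = recommended_heap_gb(ram)
        print(ram, "GB RAM ->", " ".join(jvm_heap_options(heap)))
```

Setting both flags to the same value avoids heap resizing at runtime, which is why the sketch always emits them as a pair.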
To scale out the wazuh-indexer service, the users can follow these steps:

  • Add more nodes to the wazuh-indexer cluster. The nodes can be physical or virtual machines, as long as they have enough resources to run the service. The recommended minimum resources for a node are 4 CPU cores, 8G of RAM, and 50G of disk space.
  • Configure the new nodes to join the cluster. This can be done by editing the indexer configuration file (/etc/wazuh-indexer/opensearch.yml in a default installation) and setting the cluster.name and discovery.seed_hosts parameters. The cluster.name parameter defines the name of the cluster, which must be the same for all the nodes. The discovery.seed_hosts parameter defines the list of initial nodes that the new node will contact to join the cluster. For example, if the cluster name is wazuh-cluster and the initial nodes are node-1 and node-2, the configuration for the new node can look like this:

    ```
    cluster.name: wazuh-cluster
    discovery.seed_hosts: ["node-1", "node-2"]
    ```

  • Restart the wazuh-indexer service and verify that the new node has joined the cluster. This can be done by using the OpenSearch API or the dashboard. The API is served by the wazuh-indexer service on port 9200; in a default Wazuh installation it requires HTTPS and credentials. The _cat/nodes endpoint lists the nodes that have joined, and the _cat/health endpoint reports the cluster status and health. The dashboard is a separate service from the indexer: OpenSearch Dashboards listens on port 5601 by default, while the Wazuh dashboard package typically serves it on port 443. The same requests can be run from its Dev Tools console, and the cluster state can also be checked in the Cluster Overview dashboard where it is available.
  • Let the cluster rebalance so that the data and workload are distributed across the nodes. OpenSearch relocates shards to a newly joined node automatically, so no request is normally needed to start this. The _cluster/reroute endpoint (HTTP POST) is intended for manual shard moves; with the explain parameter set to true, it also returns an explanation of why the commands can or cannot be executed. It can be sent through the API or from the Dev Tools console. The cluster rebalance can take some time, depending on the size and number of the indices. The progress and result of the rebalance can be monitored by using the _cat/recovery endpoint or the Index Management dashboard.
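The verification and monitoring steps above can be sketched in Python. This example uses the `_cluster/health` endpoint, which returns the same information as `_cat/health` in JSON form. The base URL, credentials, CA file path, and the sample response values are all placeholders chosen for illustration, and `fetch_health` is defined but not called here because it needs a reachable cluster.

```python
import base64
import json
import ssl
import urllib.request


def fetch_health(base_url, user, password, ca_file):
    """GET <base_url>/_cluster/health with basic auth over HTTPS (all arguments are placeholders)."""
    context = ssl.create_default_context(cafile=ca_file)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url}/_cluster/health",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)


def summarize_health(health, expected_nodes):
    """Reduce a _cluster/health response to the facts relevant after adding a node."""
    return {
        "status": health["status"],
        "all_nodes_joined": health["number_of_nodes"] >= expected_nodes,
        "shards_moving": health["relocating_shards"] + health["initializing_shards"],
        "unassigned_shards": health["unassigned_shards"],
    }


# Hypothetical response, trimmed to the fields used above.
SAMPLE_HEALTH = {
    "cluster_name": "wazuh-cluster",
    "status": "green",
    "number_of_nodes": 3,
    "relocating_shards": 2,
    "initializing_shards": 0,
    "unassigned_shards": 0,
}

if __name__ == "__main__":
    print(summarize_health(SAMPLE_HEALTH, expected_nodes=3))
```

A non-zero `shards_moving` value after a node joins suggests that rebalancing is still in progress; polling until it reaches zero is one simple way to wait for it to finish.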
SSL handshake exception

The second error that the users encountered was the following:

```
Exception during establishing a SSL connection:
javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
    at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:?]
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:378) ~[?:?]
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:321) ~[?:?]
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:316) ~[?:?]
    at sun.security.ssl.SSLTransport.decode(SSLTransport.java:134) ~[?:?]
    at sun.security.ssl.SSLEngineImpl.decode(SSLEngineImpl.java:736) ~[?:?]
    at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:691) ~[?:?]
    at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:506) ~[?:?]
    at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:482) ~[?:?]
    at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:679) ~[?:?]
```

This error indicates that the wazuh-indexer service failed to establish a secure connection with another node or client using the SSL/TLS protocol. SSL/TLS is a protocol that provides encryption, authentication, and integrity for the communication between two parties. SSL/TLS uses a handshake process to negotiate the parameters and keys for the encryption and authentication.
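One way to see what a handshake with the indexer actually negotiates is to connect from a client and print the protocol version and cipher suite. The following is a minimal sketch using Python's standard `ssl` module; the host name, port 9200, and CA file are assumptions about the environment, and `probe` is defined but not called here because it needs a reachable server.

```python
import socket
import ssl


def make_context(min_version=ssl.TLSVersion.TLSv1_2, ca_file=None):
    """Client-side TLS context that verifies certificates and refuses older protocol versions."""
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = min_version
    return context


def probe(host, port=9200, ca_file=None):
    """Perform a TLS handshake and return (protocol version, cipher suite name)."""
    context = make_context(ca_file=ca_file)
    with socket.create_connection((host, port), timeout=5) as raw_socket:
        with context.wrap_socket(raw_socket, server_hostname=host) as tls_socket:
            return tls_socket.version(), tls_socket.cipher()[0]


# Example call (hypothetical host and CA path):
# print(probe("node-1", 9200, ca_file="/path/to/root-ca.pem"))
```

If the handshake fails, `wrap_socket` raises an `ssl.SSLError`, whose message can be compared with the exception logged on the indexer side.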

The likely root cause of this error is that the wazuh-indexer service and the other party could not agree on an SSL/TLS version or cipher suite, for example after a JDK update changed the enabled protocols. A cipher suite is a combination of algorithms that define how the encryption and authentication are performed. A TLS 1.2 cipher suite name identifies four components: a key exchange algorithm, an authentication (signature) algorithm, a bulk encryption algorithm, and a hash algorithm, which serves as the message authentication code (MAC) in older suites and as the basis of the pseudo-random function (PRF) in AEAD suites. For example, a cipher suite can look like this:

```
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
```

In this cipher suite name, TLS is the protocol, ECDHE (Elliptic Curve Diffie-Hellman Ephemeral) is the key exchange algorithm, RSA is the authentication algorithm used to sign the handshake, and AES (Advanced Encryption Standard) with a 256-bit key in GCM (Galois/Counter Mode) is the bulk encryption algorithm. SHA384 (Secure Hash Algorithm with 384-bit output) is the hash used by the PRF; because GCM is an AEAD mode that provides integrity itself, SHA384 does not act as a separate MAC here.
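The naming scheme can be illustrated with a small Python sketch that splits a cipher suite name into its parts. It is a simplification that only handles TLS 1.2 style names of the form `TLS_<key exchange>[_<authentication>]_WITH_<cipher>_<hash>`; the function name is hypothetical, and the AEAD check is a plain substring test rather than a complete registry lookup.

```python
def parse_tls12_suite(name):
    """Split a TLS 1.2 style cipher suite name into its components."""
    prefix, separator, suffix = name.partition("_WITH_")
    if not separator or not prefix.startswith("TLS_"):
        raise ValueError(f"not a TLS_*_WITH_* cipher suite name: {name}")

    # "TLS_ECDHE_RSA" -> ["ECDHE", "RSA"]; "TLS_RSA" -> ["RSA"]
    handshake = prefix.split("_")[1:]
    # "AES_256_GCM_SHA384" -> ("AES_256_GCM", "SHA384")
    cipher, _, hash_name = suffix.rpartition("_")

    return {
        "key_exchange": handshake[0],
        # When only one algorithm is named, it serves both roles.
        "authentication": handshake[-1],
        "cipher": cipher,
        "hash": hash_name,
        # GCM and ChaCha20-Poly1305 are AEAD constructions.
        "aead": "GCM" in cipher or "POLY1305" in cipher,
    }


if __name__ == "__main__":
    print(parse_tls12_suite("TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"))
```

TLS 1.3 suite names such as `TLS_AES_256_GCM_SHA384` have no `_WITH_` part because key exchange and authentication are negotiated separately, so the sketch rejects them on purpose.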

The error message shows that the cipher suite used by the wazuh-indexer service and the other party was an AEAD (Authenticated Encryption with Associated Data) cipher suite. AEAD cipher suites are a type of cipher suites that combine the encryption and authentication into one operation, using a single algorithm. AEAD cipher suites are more efficient and secure than the traditional cipher suites that use separate algorithms for encryption and authentication. However, AEAD cipher suites also have some limitations, such as the
