  1. #1 Power Member (Join Date: Nov 2012, Posts: 182)

    LS server running out of memory

    We have a customer with a production LS server that is running out of memory very quickly (see the log below). From what I can see, as soon as it starts, the number of concurrent connections reported by the monitor starts to climb at an alarming rate. It gets to over 14000 in less than 60 seconds, and continues until the server dies.

    It points to some sort of DoS attack, or to one or more clients going into some kind of connection loop.

    Is there anything in the logs that can help me understand what is happening? And I assume there is some way (via configuration) to prevent the server from accepting rogue (or too many) connections before it dies.



    Code:
    JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2015/02/22 13:04:15 - please wait. 
    JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2015/02/22 13:04:15 - please wait. 
    JVMDUMP032I JVM requested System dump using '/RNS/lightstreamer/bin/ibmi/core.20150222.130415.10060.0001.dmp' in response to an ev 
    JVMDUMP010I System dump written to /RNS/lightstreamer/bin/ibmi/core.20150222.130415.10060.0001.dmp 
    JVMDUMP032I JVM requested Heap dump using '/RNS/lightstreamer/bin/ibmi/heapdump.20150222.130415.10060.0002.phd' in response to an 
    JVMDUMP010I Heap dump written to /RNS/lightstreamer/bin/ibmi/heapdump.20150222.130415.10060.0002.phd 
    JVMDUMP032I JVM requested Heap dump using '/RNS/lightstreamer/bin/ibmi/heapdump.20150222.130415.10060.0003.phd' in response to an 
    JVMDUMP010I Heap dump written to /RNS/lightstreamer/bin/ibmi/heapdump.20150222.130415.10060.0003.phd 
    JVMDUMP032I JVM requested Java dump using '/RNS/lightstreamer/bin/ibmi/javacore.20150222.130415.10060.0004.txt' in response to an 
    JVMDUMP010I Java dump written to /RNS/lightstreamer/bin/ibmi/javacore.20150222.130415.10060.0004.txt 
    JVMDUMP032I JVM requested Snap dump using '/RNS/lightstreamer/bin/ibmi/Snap.20150222.130415.10060.0006.trc' in response to an even 
    JVMDUMP010I Snap dump written to /RNS/lightstreamer/bin/ibmi/Snap.20150222.130415.10060.0006.trc 
    JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError". 
    JVMDUMP032I JVM requested Java dump using '/RNS/lightstreamer/bin/ibmi/javacore.20150222.130415.10060.0005.txt' in response to an 
    JVMDUMP010I Java dump written to /RNS/lightstreamer/bin/ibmi/javacore.20150222.130415.10060.0005.txt 
    JVMDUMP032I JVM requested Snap dump using '/RNS/lightstreamer/bin/ibmi/Snap.20150222.130415.10060.0007.trc' in response to an even 
    JVMDUMP010I Snap dump written to /RNS/lightstreamer/bin/ibmi/Snap.20150222.130415.10060.0007.trc

  2. #2 Power Member (Join Date: Nov 2012, Posts: 182)
    Keeping an eye on it now to see what is happening. The total number of sessions (bearing in mind it is 20:11 on a Sunday) keeps switching between 11 and 12, so pretty trivial. The number of concurrent connections is floating between 1800 and 2000+ (maximum 3892), so it is not going up at an alarming rate at the moment. I was wondering if this number of connections is normal for so few sessions though? Why is the number of connections so much higher than the number of sessions? I obviously don't know the difference.

    There are a lot of warnings like these:
    Code:
    21:06:03 Current delay for tasks scheduled on thread pool TLS/SSL HANDSHAKE is 18 seconds. Timer-1
    21:06:03 If the delay issue persists, taking a JVM full thread dump may be the best way to investigate the issue. Timer-1
    21:06:05 Current delay for tasks scheduled on thread pool TLS/SSL HANDSHAKE is 19 seconds. Timer-1
    21:06:05 If the delay issue persists, taking a JVM full thread dump may be the best way to investigate the issue. Timer-1
    21:06:07 Current delay for tasks scheduled on thread pool TLS/SSL HANDSHAKE is 19 seconds. Timer-1
    21:06:07 If the delay issue persists, taking a JVM full thread dump may be the best way to investigate the issue. Timer-1
    21:06:09 Current delay for tasks scheduled on thread pool TLS/SSL HANDSHAKE is 19 seconds. Timer-1

  3. #3 Power Member (Join Date: Nov 2012, Posts: 182)
    The other thing I notice is that a very large number of errors is being generated, all like this (coming in by the second):


    Code:
    21:09:31 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:5133. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:31 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:7323. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:31 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:28603. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:31 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:61880. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:31 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:56732. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:32 Handshake error on Lightstreamer HTTPS Server: Broken pipe on 217.118.110.123:39397. NIO TLS/SSL HANDSHAKE SELECTOR 1
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:38992. NIO TLS/SSL HANDSHAKE SELECTOR 2
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:26672. NIO TLS/SSL HANDSHAKE SELECTOR 2
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:32844. NIO TLS/SSL HANDSHAKE SELECTOR 2
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:21056. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:61511. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:59988. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:4500. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:46009. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:14995. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:34612. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:60219. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:50447. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:60299. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:29540. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:61398. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:16881. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:23304. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:18071. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:10471. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:23499. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:52529. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:28532. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:55404. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:16333. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:52733. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:26191. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:11962. NIO TLS/SSL HANDSHAKE SELECTOR 4
    21:09:41 Handshake error on Lightstreamer HTTPS Server: task timed out on 217.118.110.123:40724. NIO TLS/SSL HANDSHAKE SELECTOR 4



  4. #4 Power Member (Join Date: Nov 2012, Posts: 182)
    The number of connections compared to the number of sessions is concerning me though. At a different customer, the current number of connections and the number of sessions are roughly the same.

  5. #5 Administrator (Join Date: Feb 2012, Location: Milano, Posts: 716)
    Hi Kevin,

    Generally, the number of connections is greater than or equal to the number of sessions: for each session there is a permanent connection dedicated to streaming, plus temporary control connections used to send the control commands that manage the contents of the stream channel.
    In normal situations you should expect a number of connections equal to, or slightly higher than, the number of sessions. However, in some cases connections may become much more numerous, for example in phases in which many clients are trying to connect, or if many clients are connected in polling mode.

    In particular, in the case of a session in polling mode that does not leverage connection reuse, combined with frequent updates, the number of open connections to the server can grow significantly. However, numbers like yours are quite unusual.
    50 connections with only one client using the Monitor is already something unexpected. In order to investigate the issue further, it would be necessary to set these logger levels:
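    Presumably these are the levels that were later applied in post #10, set in the Lightstreamer log configuration file (the file name below is an assumption, e.g. lightstreamer_log_conf.xml):

    Code:
    <!-- Assumed placement: the Lightstreamer log configuration file (e.g. lightstreamer_log_conf.xml).
         The logger names and levels are taken from the settings reported in post #10. -->
    <logger name="LightstreamerLogger.requests" level="INFO"/>
    <logger name="LightstreamerLogger.connections" level="DEBUG"/>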



    Since, with hundreds of connections per second, the size of the log file can grow quickly, you can keep these settings for just a few seconds (up to a minute) and then collect the log.
    With these levels we should have some more information on the connections and on why they cannot create a session.

    In order to prevent the out-of-memory error, and also to avoid saturating the handshake thread pool, you can use this parameter:
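    Presumably this is the parameter that was later applied in post #10 (the value 100 below is the one reported there, not a recommendation):

    Code:
    <!-- Assumed placement: lightstreamer_conf.xml.
         Presumably caps the queue of pending TLS/SSL handshake tasks, so that excess
         connection attempts are refused instead of piling up until memory is exhausted. -->
    <handshake_pool_max_queue>100</handshake_pool_max_queue>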

  6. #6 Power Member (Join Date: Nov 2012, Posts: 182)
    OK thanks. My main problem (other than the cause) is that in order to get more information I need them to start the server again. The last time they started it, their firewall became saturated and prevented other essential network activity from taking place. I need to discuss with them the best approach.

    Question: If I artificially set the maximum sessions to 1 (for example) would that explain the fact that the number of connections exceeds the number of sessions? Would the excess connections just be other clients being unable to create a session?

    Is there a way to restrict the number of connections (as opposed to sessions)?
    Is there a way to restrict the connections to certain IP ranges? This is exposed on the internet, but we may need to restrict it to known clients to get closer to the truth.

  7. #7 Administrator (Join Date: Feb 2012, Location: Milano, Posts: 716)
    Hi Kevin,

    Please note that the log settings do not require a restart of the server.

    Yes, if you set max_sessions to 1, connection attempts from other clients can lead to an increase in the number of connections.
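    As a minimal sketch (assuming the <max_sessions> element of lightstreamer_conf.xml; the value is only for illustration):

    Code:
    <!-- Assumed placement: lightstreamer_conf.xml (possibly inside the <load> block,
         depending on the server version). Caps the number of concurrent sessions;
         further session-creation attempts are refused, but the underlying TCP/TLS
         connections still have to be accepted and handled. Value 1 is illustrative only. -->
    <max_sessions>1</max_sessions>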

    As for restricting the connections to certain IP ranges, the Lightstreamer server provides this:

    http://www.lightstreamer.com/docs/se...a.lang.String)

    but please note that this is an option of the JMX feature and I'm not sure it suits your needs.

    Anyway, could you please confirm the exact versions of the Lightstreamer server and of the JavaScript client library in use?

    Thank you,
    Giuseppe

  8. #8 Power Member (Join Date: Nov 2012, Posts: 182)
    Noted about the log settings. They have the server switched off at the moment anyway, because their current understanding is that with it switched on their firewall rapidly becomes overloaded.

    So, with maximum sessions set low, a client can make a physical connection to the server but cannot actually do anything? I am just wondering how "maximum sessions" can really protect the server if it still has to manage all the connections that occur.

    If the monitor reports over 14000 connections to me, what does that mean exactly? Does it mean that my client code is somehow creating more and more connections (via Lightstreamer.js) even though a connection is already established? I have to consider the fact that I may have a bug in the way I am using the client code.

    I think you are right about the IP restrictions. A JMX solution is not what I am after - I guess they will have to do the IP restrictions on their firewall.

    This customer is currently on server version 5.1.2 and client version 6.1.4 build 1640.14

    They are due to move to version 6 in a couple of weeks.

  9. #9 Administrator (Join Date: Feb 2012, Location: Milano, Posts: 716)
    Hi Kevin,

    Please note that a new connection does not always mean a new client. Sometimes, especially in the non-WebSocket case, clients that are already connected may require a new connection (for a rebind, a polling request or a new subscription).
    Thus, limiting the number of connections can also affect the proper functioning of sessions that are already established.

    The parameters that protect the server from your critical situation are:

    and in your specific case it seems that the second might work.
    However, for a typical DoS attack scenario, we would expect the firewall to apply the necessary countermeasures.
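    A hedged guess at the parameters in question, based on the rest of the thread: "the second" is presumably the handshake queue limit later applied in post #10, alongside a session cap such as <max_sessions>:

    Code:
    <!-- Assumption only: the two protective parameters referred to above.
         Values are illustrative; 100 is the figure later used in post #10. -->
    <max_sessions>5000</max_sessions>
    <handshake_pool_max_queue>100</handshake_pool_max_queue>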


    It seems strange to me that it would be a bug on the client side, given the huge difference between the number of sessions and the number of connections, and also because the browser itself typically limits the number of connections to the same URL.

    If the server log of the Saturday episode is available, please send it to us at support@lightstreamer.com. We will check it in search of clues.

  10. #10 Power Member (Join Date: Nov 2012, Posts: 182)
    I will see if I can get the log from yesterday, as the problem also occurred in a small window while I was checking it.

    Today the situation is this:
    All access is blocked on the firewall except for our staff.
    The server is running with:

    Code:
    <handshake_pool_max_queue>100</handshake_pool_max_queue>

    Logging has also been changed:

    Code:
    <logger name="LightstreamerLogger.requests" level="INFO"/>
    <logger name="LightstreamerLogger.connections" level="DEBUG"/>

    Currently it all looks completely normal.

 

 
