
Solving fatal errors in AntMedia server (v2.4.3) running in Azure Kubernetes


Dalibor

Question

Hi everyone,

Lately we've been dealing with strange "fatal errors detected by the Java Runtime Environment" in our AntMedia server (v2.4.3) deployment on Azure Kubernetes. Each occurrence restarts the Kubernetes pod, which disrupts our streaming app that uses AntMedia in the background (publishing and playing stop until a new pod is up and running).

All the logs from AntMedia server are available here:

AntMedia-2.4.3-fatal-error-logs.txt

The fatal error occurs at 8:56:11, 8:57:01, 9:01:31, and 9:03:46.

Does anyone know why this happens? Is there any way of stopping this from happening?

Thanks in advance!


9 answers to this question


Hi @Burak,

Thanks for checking the logs.

Unfortunately, we can't provide the mentioned /usr/local/antmedia/hs_err_pid1.log file: every time this fatal error occurs, our K8s pod restarts automatically and we lose all the temporary files in it, including this one. We also tried setting up persistent storage on K8s with the help of your colleagues, but we ran into difficulties that we never managed to solve.

Is there any other way we can tackle this issue? Perhaps you could try to reproduce it on your end by following the same steps that appear in the logs we sent? Also, I've seen that a new version of AntMedia server (2.5.1) has been available since last week. Does that perhaps solve this issue?

If there is anything we can do to help you diagnose and solve this, please do tell. We're facing many problems in production because this error keeps happening, so any progress would be helpful.


It is possible to write a small program in something like Python that periodically monitors the Ant Media Server system process. When the status changes to failed, deactivating, or inactive, it moves the log file and the crash file to a separate location so that the restart does not overwrite them.
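For anyone who wants to try this, here is a minimal sketch of such a watcher in Python. It is only an illustration under several assumptions that are not confirmed in this thread: that the server runs as a systemd unit named `antmedia` and that `systemctl` is available inside the container, that the server log lives at the path shown, and that `/mnt/crash-archive` (a made-up path) sits on storage that survives the pod restart. It copies the files rather than moving them, so the originals stay in place.

```python
import shutil
import subprocess
import time
from pathlib import Path

# Assumptions (not confirmed in this thread): the systemd unit name, the
# server log path, and the archive directory on persistent storage.
SERVICE = "antmedia"
WATCHED = [
    Path("/usr/local/antmedia/hs_err_pid1.log"),           # crash file named in the thread
    Path("/usr/local/antmedia/log/ant-media-server.log"),  # assumed log location
]
DEST = Path("/mnt/crash-archive")  # hypothetical mount point
BAD_STATES = {"failed", "deactivating", "inactive"}


def service_state(service: str) -> str:
    """Return the state printed by `systemctl is-active`, e.g. 'active'."""
    result = subprocess.run(
        ["systemctl", "is-active", service],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


def preserve(files, dest: Path, stamp: str) -> list:
    """Copy every existing file into dest, prefixing its name with stamp."""
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for src in files:
        src = Path(src)
        if src.is_file():  # skip files that were never written
            target = dest / f"{stamp}-{src.name}"
            shutil.copy2(src, target)
            saved.append(target)
    return saved


def watch(interval: float = 1.0) -> None:
    """Poll the service and archive the files once when it leaves 'active'."""
    previous = "active"
    while True:
        state = service_state(SERVICE)
        if state in BAD_STATES and previous not in BAD_STATES:
            preserve(WATCHED, DEST, time.strftime("%Y%m%d-%H%M%S"))
        previous = state
        time.sleep(interval)
```

Calling `watch()` starts the polling loop. The timestamp prefix keeps files from successive crashes from overwriting each other in the archive directory.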


For the normal (antmedia and error) logs, one can create a monitoring tool to collect them as described here. The crash log, however, is only generated when the server crashes, so it may be too late to retrieve it, as you said. I am not sure the solution you suggested would work, because there may still be a timing issue: the pod may already have shut down before your script polls.

Mr. @Murat Eminoglu, what do you think? Can we get the hs crash report file before the pod closes when the server crashes?


Hi all!

Just confirming that we've received the snapshot version of AntMedia server (v2.6.0-snapshot) with the fix for this issue, and it seems to be solved (we haven't reproduced it since).
