I'm running an Express.js application using Socket.io for a chat webapp and I get the following error randomly around 5 times during 24h. The node process is wrapped in forever and it restarts itself immediately.
The problem is that restarting Express kicks my users out of their rooms and nobody wants that.
The web server is proxied by HAProxy. There are no socket stability issues, just using websockets and flashsockets transports. I cannot reproduce this on purpose.
This is the error with Node v0.10.11
:
events.js:72
throw er; // Unhandled 'error' event
^
Error: read ECONNRESET //alternatively it s a 'write'
at errnoException (net.js:900:11)
at TCP.onread (net.js:555:19)
error: Forever detected script exited with code: 8
error: Forever restarting script for 2 time
EDIT (2013-07-22)
Added both socket.io client error handler and the uncaught exception handler. Seems that this one catches the error:
process.on('uncaughtException', function (err) {
console.error(err.stack);
console.log("Node NOT Exiting...");
});
So I suspect it's not a Socket.io issue but an HTTP request to another server that I do or a MySQL/Redis connection. The problem is that the error stack doesn't help me identify my code issue. Here is the log output:
Error: read ECONNRESET
at errnoException (net.js:900:11)
at TCP.onread (net.js:555:19)
How do I know what causes this? How do I get more out of the error?
Ok, not very verbose but here's the stacktrace with Longjohn:
Exception caught: Error ECONNRESET
{ [Error: read ECONNRESET]
code: 'ECONNRESET',
errno: 'ECONNRESET',
syscall: 'read',
__cached_trace__:
[ { receiver: [Object],
fun: [Function: errnoException],
pos: 22930 },
{ receiver: [Object], fun: [Function: onread], pos: 14545 },
{},
{ receiver: [Object],
fun: [Function: fireErrorCallbacks],
pos: 11672 },
{ receiver: [Object], fun: [Function], pos: 12329 },
{ receiver: [Object], fun: [Function: onread], pos: 14536 } ],
__previous__:
{ [Error]
id: 1061835,
location: 'fireErrorCallbacks (net.js:439)',
__location__: 'process.nextTick',
__previous__: null,
__trace_count__: 1,
__cached_trace__: [ [Object], [Object], [Object] ] } }
Here I serve the flash socket policy file:
net = require("net")
net.createServer( (socket) =>
socket.write("<?xml version=\"1.0\"?>\n")
socket.write("<!DOCTYPE cross-domain-policy SYSTEM \"http://www.macromedia.com/xml/dtds/cross-domain-policy.dtd\">\n")
socket.write("<cross-domain-policy>\n")
socket.write("<allow-access-from domain=\"*\" to-ports=\"*\"/>\n")
socket.write("</cross-domain-policy>\n")
socket.end()
).listen(843)
Can this be the cause?
ベストアンサー1
You might have guessed it already: it's a connection error.
"ECONNRESET" means the other side of the TCP conversation abruptly closed its end of the connection. This is most probably due to one or more application protocol errors. You could look at the API server logs to see if it complains about something.
But since you are also looking for a way to check the error and potentially debug the problem, you should take a look at "How to debug a socket hang up error in NodeJS?" which was posted at stackoverflow in relation to an alike question.
Quick and dirty solution for development:
Use longjohn, you get long stack traces that will contain the async operations.
Clean and correct solution: Technically, in node, whenever you emit an 'error'
event and no one listens to it, it will throw. To make it not throw, put a listener on it and handle it yourself. That way you can log the error with more information.
To have one listener for a group of calls you can use domains and also catch other errors on runtime. Make sure each async operation related to http(Server/Client) is in different domain context comparing to the other parts of the code, the domain will automatically listen to the
error
events and will propagate it to its own handler. So you only listen to that handler and get the error data. You also get more information for free.
EDIT (2013-07-22)
As I wrote above:
"ECONNRESET" means the other side of the TCP conversation abruptly closed its end of the connection. This is most probably due to one or more application protocol errors. You could look at the API server logs to see if it complains about something.
What could also be the case: at random times, the other side is overloaded and simply kills the connection as a result. If that's the case, depends on what you're connecting to exactly…
しかし、確かなことが 1 つあります。TCP 接続で読み取りエラーが発生し、例外が発生しているということです。編集で投稿したエラー コードを見ると、それが確認できます。