Yesterday a single node was unable to start due to errors in '''mem3_rep_manager'''; after a few hours, all 6 nodes were unable to start.
Each node's run, right now, looks like this:
[Tue, 22 May 2012 19:04:07 GMT] [info] [<0.87.0>] [--------] Apache CouchDB has started on http://undefined:5986/
[Tue, 22 May 2012 19:04:08 GMT] [error] [emulator] [--------] Error in process <0.171.0> on node 'bigcouch@couchdb1' with exit value: {{badmatch,nil},[{fabric_view,remove_down_shards,2},{rexi_utils,process_mailbox,6},{fabric_view_changes,receive_results,5},{fabric_view_changes,send_changes,6},{fabric_view_changes,go,5}]}
[Tue, 22 May 2012 19:04:08 GMT] [error] [<0.164.0>] [--------] ** Generic server mem3_rep_manager terminating
** Last message in was {'EXIT',<0.171.0>,
{{badmatch,nil},
[{fabric_view,remove_down_shards,2},
{rexi_utils,process_mailbox,6},
{fabric_view_changes,receive_results,5},
{fabric_view_changes,send_changes,6},
{fabric_view_changes,go,5}]}}
** When Server state == {state,<0.165.0>,10,nil,[<0.171.0>]}
** Reason for termination ==
** {unexpected_msg,{'EXIT',<0.171.0>,
{{badmatch,nil},
[{fabric_view,remove_down_shards,2},
{rexi_utils,process_mailbox,6},
{fabric_view_changes,receive_results,5},
{fabric_view_changes,send_changes,6},
{fabric_view_changes,go,5}]}}}
The last HTTP request was a view retrieval; then a stack trace just like the one above was logged and the node went down. Since the load balancer redirected each request to a still-working node, the failure cascaded until the entire cluster was down.