Hi,
We have a 3-node Arakoon cluster that was recently upgraded to version 1.6.10.
Remote collapsing is not working on it.
This is the output of --dump-store on head.db:
root@CCAUMAN1:~# arakoon --dump-store /opt/tlogs/Marketplace/head.db
i: Some ("99279767")
master: Some(Marketplace_2,0)
routing : --
interval: {(_,_),(_,_)}
I have all tlogs starting from 992.tlf:
root@CCAUMAN1:~# ls -ltrh /opt/tlogs/Marketplace/
total 3.7G
-rw-r--r-- 1 root root 7.4M 2014-05-13 03:20 992.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-13 12:31 993.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-13 21:37 994.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-14 06:48 995.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-14 15:54 996.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-15 01:04 997.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-15 10:11 998.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-15 19:17 999.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-16 04:27 1000.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-16 13:34 1001.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-16 22:40 1002.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-17 07:51 1003.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-17 16:56 1004.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-18 02:07 1005.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-18 11:17 1006.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-18 20:23 1007.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-19 05:31 1008.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-19 14:36 1009.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-19 23:46 1010.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-20 08:56 1011.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-20 18:02 1012.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-21 03:10 1013.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-21 12:20 1014.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-21 21:26 1015.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-22 06:36 1016.tlf
-rw-r--r-- 1 root root 1.8G 2014-05-22 12:33 head.db
-rw-r--r-- 1 root root 37M 2014-05-22 14:00 1017.tlog
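To double-check that there are no gaps in that sequence, a small script along these lines can be used. This is only a sketch: the function name `missing_tlogs` is my own, and it assumes tlog files are named `<number>.tlf` or `<number>.tlog` as in the listing above.

```python
import re

def missing_tlogs(names):
    """Return tlog numbers missing between the lowest and highest number found.

    Assumes file names of the form <number>.tlf or <number>.tlog;
    anything else (for example head.db) is ignored.
    """
    pattern = re.compile(r"^(\d+)\.(?:tlf|tlog)$")
    numbers = sorted(int(m.group(1)) for m in map(pattern.match, names) if m)
    if not numbers:
        return []
    present = set(numbers)
    return [n for n in range(numbers[0], numbers[-1] + 1) if n not in present]

# Example with file names shaped like the listing above
listing = ["%d.tlf" % n for n in range(992, 1017)] + ["head.db", "1017.tlog"]
print(missing_tlogs(listing))
```

In practice the list of names would come from `os.listdir` on the tlog directory; an empty result means the numbering is contiguous.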
Still, I get an error when trying a collapse-remote call:
root@CCAUMAN1:~# arakoon --collapse-remote marketplace 10.74.8.130 7081 3
arakoon: main: remote_collapsing_failed: Failure("unknown failure (EC: 255)")
This is the relevant info in the arakoon server log during the collapse-remote client call:
May 22 09:53:39 8335: (client_protocol|info): connection=10.74.8.130:client_service_23036 COLLAPSE_TLOGS: n=3
May 22 09:53:39 8335: (client_protocol|info): ... Start collapsing ... (n=3)
May 22 09:53:39 8336: (main|info): Starting collapse
May 22 09:53:39 8345: (main|info): Creating local store at /opt/tlogs/Marketplace/head.db
May 22 09:53:39 8345: (main|info): returning assuming no I "/opt/tlogs/Marketplace/head.db": Failure("success")
May 22 09:53:39 8346: (main|info): Going to collapse 1014 tlogs
May 22 09:53:39 8356: (main|info): copy_file /opt/tlogs/Marketplace/head.db /opt/tlogs/Marketplace/head.db.clone
May 22 09:53:43 0370: (main|info): 10.74.8.130:client_service:session=2 connection=10.74.8.130:client_service_23037 socket_address=ADDR_INET 10.74.8.132,42279 file_descriptor_inode=1159833830
May 22 09:53:43 0396: (main|info): exiting session (2) connection=10.74.8.130:client_service_23037: End_of_file
May 22 09:53:43 4503: (main|info): Marketplace_2 is master
May 22 09:53:50 0873: (main|info): done: copy_file
May 22 09:53:50 0875: (main|info): rename /opt/tlogs/Marketplace/head.db.clone.tmp -> /opt/tlogs/Marketplace/head.db.clone
May 22 09:53:57 4566: (main|info): Creating local store at /opt/tlogs/Marketplace/head.db.clone
May 22 09:53:57 4568: (client_protocol|error): Exception during client request (Failure("success")) => rc:ff msg:unknown failure
May 22 09:53:57 4570: (main|info): exiting session (1) connection=10.74.8.130:client_service_23036: Lwt_io.Channel_closed("output")
May 22 09:53:57 4570: (main|info): Exception on closing of socket (connection=10.74.8.130:client_service_23036): Unix.Unix_error(Unix.EBADF, "check_descriptor", "")
When I check the files in my tlog directory, I see that head.db has been copied to head.db.clone, but head.db itself is untouched.
Thanks for looking into this.
Michael Van Wesenbeeck