arakoon --collapse-remote failing on arakoon version 1.6.10 #428

@michael-arion

Hi,

We have a 3-node arakoon cluster that was recently upgraded to version 1.6.10.
Remote collapsing does not work on it.

Output of arakoon --dump-store on head.db:

root@CCAUMAN1:~# arakoon --dump-store /opt/tlogs/Marketplace/head.db
i: Some ("99279767")
master: Some(Marketplace_2,0)
routing : --
interval: {(_,_),(_,_)}

I have all tlogs starting from 992.tlf:

root@CCAUMAN1:~# ls -ltrh /opt/tlogs/Marketplace/
total 3.7G
-rw-r--r-- 1 root root 7.4M 2014-05-13 03:20 992.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-13 12:31 993.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-13 21:37 994.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-14 06:48 995.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-14 15:54 996.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-15 01:04 997.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-15 10:11 998.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-15 19:17 999.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-16 04:27 1000.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-16 13:34 1001.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-16 22:40 1002.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-17 07:51 1003.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-17 16:56 1004.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-18 02:07 1005.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-18 11:17 1006.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-18 20:23 1007.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-19 05:31 1008.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-19 14:36 1009.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-19 23:46 1010.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-20 08:56 1011.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-20 18:02 1012.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-21 03:10 1013.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-21 12:20 1014.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-21 21:26 1015.tlf
-rw-r--r-- 1 root root 7.4M 2014-05-22 06:36 1016.tlf
-rw-r--r-- 1 root root 1.8G 2014-05-22 12:33 head.db
-rw-r--r-- 1 root root  37M 2014-05-22 14:00 1017.tlog
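
As a cross-check of the listing above, here is a minimal Python sketch that looks for gaps in the tlog sequence. It assumes 100,000 entries per tlog file, which this report does not state, so that the head counter i = 99279767 would fall in tlog 992. The helper names tlog_numbers and missing_tlogs are hypothetical and not part of arakoon.

```python
import re

# Assumption: number of entries per tlog file (not stated in this report).
ENTRIES_PER_TLOG = 100_000


def tlog_numbers(filenames):
    """Return the sorted tlog numbers found in names like 992.tlf or 1017.tlog."""
    numbers = []
    for name in filenames:
        match = re.fullmatch(r"(\d+)\.(tlf|tlog)", name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


def missing_tlogs(filenames, head_i, entries_per_tlog=ENTRIES_PER_TLOG):
    """List tlog numbers absent between the one holding head_i and the newest one."""
    present = tlog_numbers(filenames)
    first_needed = head_i // entries_per_tlog
    if not present:
        return [first_needed]
    present_set = set(present)
    return [n for n in range(first_needed, present[-1] + 1) if n not in present_set]


# File names taken from the directory listing above.
names = [f"{n}.tlf" for n in range(992, 1017)] + ["1017.tlog", "head.db"]
print(missing_tlogs(names, 99279767))  # prints []
```

Under that assumption, an empty result means that every tlog from the one containing the head counter up to the current 1017.tlog is present.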

Still, I get an error when trying a collapse-remote call:

root@CCAUMAN1:~# arakoon --collapse-remote marketplace 10.74.8.130 7081 3
arakoon: main: remote_collapsing_failed: Failure("unknown failure (EC: 255)")

This is the relevant info in the arakoon server log during the collapse-remote client call:

May 22 09:53:39 8335: (client_protocol|info): connection=10.74.8.130:client_service_23036 COLLAPSE_TLOGS: n=3
May 22 09:53:39 8335: (client_protocol|info): ... Start collapsing ... (n=3)
May 22 09:53:39 8336: (main|info): Starting collapse
May 22 09:53:39 8345: (main|info): Creating local store at /opt/tlogs/Marketplace/head.db
May 22 09:53:39 8345: (main|info): returning assuming no I "/opt/tlogs/Marketplace/head.db": Failure("success")
May 22 09:53:39 8346: (main|info): Going to collapse 1014 tlogs
May 22 09:53:39 8356: (main|info): copy_file /opt/tlogs/Marketplace/head.db /opt/tlogs/Marketplace/head.db.clone
May 22 09:53:43 0370: (main|info): 10.74.8.130:client_service:session=2 connection=10.74.8.130:client_service_23037 socket_address=ADDR_INET 10.74.8.132,42279 file_descriptor_inode=1159833830
May 22 09:53:43 0396: (main|info): exiting session (2) connection=10.74.8.130:client_service_23037: End_of_file
May 22 09:53:43 4503: (main|info): Marketplace_2 is master
May 22 09:53:50 0873: (main|info): done: copy_file
May 22 09:53:50 0875: (main|info): rename /opt/tlogs/Marketplace/head.db.clone.tmp -> /opt/tlogs/Marketplace/head.db.clone
May 22 09:53:57 4566: (main|info): Creating local store at /opt/tlogs/Marketplace/head.db.clone
May 22 09:53:57 4568: (client_protocol|error): Exception during client request (Failure("success")) => rc:ff msg:unknown failure
May 22 09:53:57 4570: (main|info): exiting session (1) connection=10.74.8.130:client_service_23036: Lwt_io.Channel_closed("output")
May 22 09:53:57 4570: (main|info): Exception on closing of socket (connection=10.74.8.130:client_service_23036): Unix.Unix_error(Unix.EBADF, "check_descriptor", "")
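
To pull the failing request out of a longer server log, a rough Python sketch along these lines may help. The line pattern is inferred only from the excerpt above, the names parse_line, entries_at_level and return_code are hypothetical, and reading rc:ff as hexadecimal is an assumption (it would give 255, which matches the EC: 255 reported by the client).

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the log excerpt: "<date time> <fraction>: (<section>|<level>): <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<frac>\d+): "
    r"\((?P<section>\w+)\|(?P<level>\w+)\): (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    stamp: str
    frac: str
    section: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line, or return None if it does not match the pattern."""
    match = LINE_RE.match(line.strip())
    return LogEntry(**match.groupdict()) if match else None


def entries_at_level(lines, level):
    """Keep only the parsed entries logged at the given level (e.g. "error")."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry.level == level]


def return_code(message: str) -> Optional[int]:
    """Extract the rc value from a message, assuming it is printed in hexadecimal."""
    match = re.search(r"rc:([0-9a-f]+)", message)
    return int(match.group(1), 16) if match else None


# Two lines copied from the server log above.
sample = [
    "May 22 09:53:57 4566: (main|info): Creating local store at /opt/tlogs/Marketplace/head.db.clone",
    'May 22 09:53:57 4568: (client_protocol|error): Exception during client request (Failure("success")) => rc:ff msg:unknown failure',
]
for entry in entries_at_level(sample, "error"):
    print(entry.section, return_code(entry.message))  # prints: client_protocol 255
```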

Checking the files in my tlog dir, I see that head.db has been copied to head.db.clone, but head.db itself is untouched.

Thanks for looking into this.
Michael Van Wesenbeeck
