QFJ HA
You have to take the QFJ-specific limitations into account when designing your HA solution.
You should NOT set up two identical FIX sessions (same SenderCompID(49)/TargetCompID(56)) in parallel.

- Example: you are a client connecting to a platform (e.g. an FX platform or an exchange). The server side will normally provide a listening address (+ port). You connect to that address with the specified CompIDs from your local QFJ application. You should NOT connect to the server from a second local QFJ application at the same time. The server may provide a secondary address for failover purposes; check with the server support team when you are allowed to use it (e.g. any time the primary connection is broken, or only after formal notification, e.g. via email, from the server side).
- Example: you are a server accepting connection requests from multiple clients. You have to decide which QFJ server instance (if you run several active ones) should handle a given client Logon request. You need third-party software, e.g. F5 or a similar load balancer (see below), to do this dispatching.
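For illustration, here is a minimal QFJ initiator configuration sketch for a single session with a primary and a secondary (failover) address. The host names, ports, and CompIDs are placeholders; the exact set of settings depends on your counterparty, so check the QFJ configuration reference for your version.

```ini
[DEFAULT]
ConnectionType=initiator
HeartBtInt=30
ReconnectInterval=5
FileStorePath=./fileStore/MyInitiator

[SESSION]
BeginString=FIX.4.4
SenderCompID=MYCLIENT
TargetCompID=PLATFORM
# Primary address provided by the server side
SocketConnectHost=fix-primary.example.com
SocketConnectPort=9876
# Secondary address for failover; confirm with the server support team
# when you are allowed to use it
SocketConnectHost1=fix-secondary.example.com
SocketConnectPort1=9876
StartTime=00:00:00
EndTime=23:50:00
```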
| Initiator or Acceptor | HA Model | Note |
|---|---|---|
| Initiator | Live/Standby | Only a single instance can keep the TCP connection to the remote FIX acceptor. The standby instance should NOT attempt the TCP connection until you detect that the live one is down, e.g. via Linux Heartbeat or Veritas Cluster. |
| Initiator | Live/Live/Live/... | NOT SUPPORTED, since two live TCP connections cannot be set up on the same FIX session. |
| Acceptor | Live/Standby | This is straightforward: set up two nodes and fail over when required, e.g. via Linux Heartbeat or Veritas Cluster. |
| Acceptor | Live/Live/Live/... | You need a dispatcher for the TCP requests from the FIX initiators, e.g. F5 (which can be expensive) or similar. |
note: for any HA model, you have to consider whether the FIX messages must be shared among the instances. Sharing is required if the sequence numbers are expected to continue across an intra-day reconnection, so that gap messages can be recovered. See the section below on whether you need such sharing.
Message sharing across instances? (i.e. should the sequence numbers continue across an intra-day reconnection, so that gap messages can be recovered?)
A FIX session normally has a daily session close window, e.g. 10 minutes. When the session starts, the sequence numbers start at 1. Any reconnection outside the session close window is an intra-day reconnection. In that case you can either continue the sequence numbers or reset them, depending on the business requirement. More about sequence numbers and FIX sessions: https://www.fixtrading.org/standards/fixt https://www.onixs.biz/fix-dictionary/FIXT1.1/section_session_protocol.html
- No, if message loss is acceptable, or if application-level guaranteed delivery (via acks) is defined. In this case you should let your initiator always reset the sequence numbers (ResetSeqNumFlag 141=Y in the Logon) on intra-day reconnection; otherwise the session may run into problems when the Acceptor side fails over or the client reconnects (see the configuration sketch after these notes).
- Yes, if you want to recover any gapped application messages through the FIX session transport itself.

note: what is a FIX sequence GAP? See https://www.fixtrading.org/standards/fixt and https://www.onixs.biz/fix-dictionary/FIXT1.1/section_session_protocol.html

note: you should check this with your FIX acceptor.

note: a session reset simplifies the handshake; it is always recommended if possible.

note: why can the session run into problems on client reconnection without a session reset? Because the client may connect to another Acceptor instance that has NO sequence information. The sequence numbers then cannot continue, and the session may end up in a wrong state because of the sequence number mismatch. You have to define the sequence number reset strategy based on the business requirement.
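As an illustration of the always-reset option, here is a hedged configuration sketch. The CompIDs and times are placeholders; ResetOnLogon is a standard QFJ session setting, but its exact behaviour (and whether 141=Y is sent by your side) should be verified against the QFJ configuration reference for your version.

```ini
[SESSION]
BeginString=FIX.4.4
SenderCompID=MYCLIENT
TargetCompID=PLATFORM
# Daily session window; sequence numbers start at 1 at each new session
StartTime=07:00:00
EndTime=06:50:00
TimeZone=UTC
# Reset sequence numbers at every logon, so an intra-day reconnection
# never depends on sequence state stored on a particular instance
ResetOnLogon=Y
```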
If you do need to share the message store among instances, a shared disk (e.g. NAS, SAN, DAS) or a database (via JDBC) can be used. In addition to setting up the related storage (as described in the sections below), you have to set RefreshMessageStoreAtLogon=Y, which ensures that the session state is refreshed from the storage when a client logs on. For this parameter, see http://www.quickfixj.org/quickfixj/usermanual/1.6.4/usage/acceptor_failover.html
A shared disk introduces complexity, too, and it is likely more expensive than an open-source database server such as MySQL or PostgreSQL. During failover:
- stop the failed node,
- mount the disk on the standby node,
- start the standby node as the live one.
Here is the QFJ configuration for the file store path: FileStorePath=./fileStore/FirstQFJServer
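For illustration, a sketch of the acceptor settings for the shared-disk option. The path and the refresh setting are the ones mentioned on this page; the port is a placeholder, and the setting names should be double-checked against the QFJ documentation for your version.

```ini
[DEFAULT]
ConnectionType=acceptor
SocketAcceptPort=9876
# File store located on the shared disk (NAS/SAN/DAS) mounted by the live node
FileStorePath=./fileStore/FirstQFJServer
# Refresh the session state (sequence numbers etc.) from the store at logon,
# so a newly promoted standby picks up where the failed node left off
RefreshMessageStoreAtLogon=Y
```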
Follow [QFJ Storage - JDBC as message store] to set up the JDBC store factory in your source code, and the related database.
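A minimal Java sketch of wiring a JdbcStoreFactory into an acceptor, assuming the JDBC settings (JdbcDriver, JdbcURL, JdbcUser, JdbcPassword) are present in the settings file and the QFJ message tables already exist in the database. The file name acceptor.cfg is a placeholder, and ApplicationAdapter is only used as a no-op stand-in for your own Application implementation.

```java
import quickfix.Application;
import quickfix.ApplicationAdapter;
import quickfix.DefaultMessageFactory;
import quickfix.JdbcLogFactory;
import quickfix.JdbcStoreFactory;
import quickfix.LogFactory;
import quickfix.MessageFactory;
import quickfix.MessageStoreFactory;
import quickfix.SessionSettings;
import quickfix.SocketAcceptor;

public class JdbcStoreAcceptor {
    public static void main(String[] args) throws Exception {
        // Load the acceptor settings, including the Jdbc* entries
        // (and RefreshMessageStoreAtLogon=Y as described above)
        SessionSettings settings = new SessionSettings("acceptor.cfg");

        Application application = new ApplicationAdapter();                // replace with your own Application
        MessageStoreFactory storeFactory = new JdbcStoreFactory(settings); // messages and sequence numbers go to the database
        LogFactory logFactory = new JdbcLogFactory(settings);              // optional: session/event logs in the database too
        MessageFactory messageFactory = new DefaultMessageFactory();

        SocketAcceptor acceptor =
                new SocketAcceptor(application, storeFactory, settings, logFactory, messageFactory);
        acceptor.start();
    }
}
```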
- Linux Heartbeat (http://www.linux-ha.org/wiki/Heartbeat): "Heartbeat is a daemon that provides cluster infrastructure (communication and membership) services to its clients. This allows clients to know about the presence (or disappearance!) of peer processes on other machines and to easily exchange messages with them." It can be used for both the Initiator and the Acceptor, but it appears to require a fair amount of configuration; you will need a Linux operations expert for it.
- For a FIX Acceptor: F5 or a similar load balancer to distribute the TCP connection requests.
- Develop your own HA solution, e.g. refer to how Hadoop NameNode HA is done via ZooKeeper: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hadoop-ha/content/ch_HA-NameNode.html (a leader-election sketch follows below)
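For example, a live/standby pair of QFJ initiators could use ZooKeeper leader election to decide which instance is allowed to connect. This is a minimal sketch (not from the original page) using Apache Curator's LeaderLatch; the ZooKeeper connect string, the latch path, and the startInitiator()/stopInitiator() hooks are placeholders for your own QFJ start/stop logic.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class QfjLeaderElection {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble and election path
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        LeaderLatch latch = new LeaderLatch(client, "/qfj/initiator-leader");
        latch.addListener(new LeaderLatchListener() {
            @Override
            public void isLeader() {
                // This node won the election: it may start the QFJ initiator
                // and open the TCP connection to the FIX acceptor.
                startInitiator(); // placeholder for your QFJ start logic
            }

            @Override
            public void notLeader() {
                // Leadership lost (e.g. ZooKeeper session expired):
                // stop the initiator so only one live connection ever exists.
                stopInitiator(); // placeholder for your QFJ stop logic
            }
        });
        latch.start();

        Thread.currentThread().join(); // keep the process alive
    }

    private static void startInitiator() { /* start your SocketInitiator here */ }
    private static void stopInitiator()  { /* stop your SocketInitiator here */ }
}
```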