APP29 Recovery (APP01, APP02, APP03)

               

Purpose:

 

The APP29 system comprises three separate application service types: APP01, APP02 and APP03.

Together, these three service types receive and process MESSAGING, MESSAGING and MESSAGING messages from the APP50. 

The matching engine (ME) produces one message for all three of these service types to process, each taking its own part from this message.

The data path for these messages is as follows:

1) ME to APP01

2) APP01 to APP02

3) APP02 to APP03

 

Because these service types work together to process all outbound market data, their operations and recoveries must be considered together.
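The dependency implied by the data path can be sketched in a few lines of Python. This is only an illustration: the stage names come from the list above, while `DATA_PATH` and `downstream_of` are hypothetical names and not part of any APP29 tooling.

```python
# Stage order taken from the data path listed above (ME -> APP01 -> APP02 -> APP03).
DATA_PATH = ["ME", "APP01", "APP02", "APP03"]


def downstream_of(stage: str) -> list[str]:
    """Return the stages that stop receiving data when `stage` is unavailable."""
    index = DATA_PATH.index(stage)  # raises ValueError for an unknown stage
    return DATA_PATH[index + 1:]


if __name__ == "__main__":
    for stage in DATA_PATH:
        affected = downstream_of(stage)
        print(f"{stage} unavailable -> affected downstream: {affected or 'none'}")
```

Because each stage feeds the next, an outage early in the path affects everything after it, which is why the three service types are recovered together.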

 

APP01 Services include the following services:

APP10 Processes read combined data from APP50 and send MESSAGING data to SITE 1 and CMA, as well as MESSAGING/MESSAGING data to APP02.

APP12 Processes read combined data from APP50 and send MESSAGING data to SITE 2 and CMA, as well as MESSAGING/MESSAGING data to APP02.

 

APP02 Services include the following services:

APP11 Processes read combined data from APP01 and MESSAGING data from APP20 and Brokerplex, and send MESSAGING data to SITE 1 and RTC.

APP13 Processes read combined data from APP01 and MESSAGING data from APP20 and Brokerplex, and send MESSAGING data to SITE 2 and RTC.

Both processes send MESSAGING data to APP03.

 

APP03 Services:

APP03 Services read MESSAGING data from APP02 and send MESSAGING data to MESSAGING Subscribers via Multicast.

 

Use the following hyperlinks to jump to the desired section of APP29 documentation:

APP29_Recovery_Considerations

 

APP29_NTM_Control_Commands

 

APP29_APP01_Troubleshooting_Table

APP29_APP02_Troubleshooting_Table

APP29_APP03_Troubleshooting_Table

 

APP29_APP01_APP02_Monitoring_Considerations

APP29_APP03_Monitoring_Considerations

 


 

APP29 Recovery Considerations:

Stopping/Restart Processes:

 

-          Use NTM Control Utility - Service Control - Process Controller to stop/restart processes.

-          When moving between nodes, use APP29 nodes only.  (APP29 NAT addresses must be configured/used by SIPs.)

 

Because APP01, APP02 and APP03 work together to process all outbound market data, their operational impacts and recoveries must be considered together.

                               

If moving these systems to other nodes, follow the procedure in APP29_Combined_APP01_APP02_APP03_Move_Procedure.

 

 

-          When stopping/restarting APP01 instances:

1)      There should be no special dependencies or considerations.  A simple restart should suffice.

2)      Upon reconnection to SITE 1 or SITE 2, APP01 first requests any queued combined APP29 data messages from the UME data store and sends the delayed MESSAGINGs to the database loaders only (not to the SIP). It then requests the most recent stock MESSAGINGs from all connected ME instances and sends these to the SIPs.

 

-          When stopping/restarting APP02 instances:

1)      For APP13 (SITE 2) processes, it is imperative that the database is up to date before restart.

a)       If the APP13 involved is restarted before the database is up to date with all of the most recent transactions, the stock-specific sequence numbers sent to SITE 2 may be off and MESSAGING messages may not be processed as expected.

2)      For APP11 (SITE 1), there should be no special dependencies or considerations.  A simple restart should suffice.

3)      Upon reconnection to SITE 1 or SITE 2, APP02 will request any queued combined APP29 data messages from the UME data store, send the queued MESSAGINGs (marked “sold”) to the SIP as well as to clearing, and then forward the MESSAGING portion of the messages to APP03 services.
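The APP02 restart rules above amount to a simple gate, sketched here in Python. The `LoaderStatus` fields and the `restart_allowed` function are hypothetical names used for illustration only; this document does not describe how database loader state is actually read, so treat the inputs as values an operator has already confirmed.

```python
from dataclasses import dataclass


@dataclass
class LoaderStatus:
    """Operator-confirmed database loader state (hypothetical fields)."""
    outstanding_rejects: int  # rejects still waiting to be replayed
    caught_up: bool           # True once the most recent transactions are loaded


def restart_allowed(process: str, loader: LoaderStatus) -> bool:
    """Apply the APP02 restart rules: APP11 has no precondition, APP13 needs a current database."""
    if process == "APP11":
        return True  # SITE 1: a simple restart should suffice
    if process == "APP13":
        return loader.caught_up and loader.outstanding_rejects == 0
    raise ValueError(f"not an APP02 process: {process}")


if __name__ == "__main__":
    status = LoaderStatus(outstanding_rejects=2, caught_up=False)
    print("APP11 restart allowed:", restart_allowed("APP11", status))
    print("APP13 restart allowed:", restart_allowed("APP13", status))
```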

 

-          When stopping/restarting APP03 instances:

1)      APP03 files must be moved before the APP03 restarts on the new node.

a)       If these files are not moved, two general impacts will be seen:

·         MESSAGING retransmission requests made after recovery may not be able to find the requested messages.

·         MESSAGING subscribers will likely see MESSAGING Sequence Number Resets, which may compromise their functionality.

2)      Upon startup, APP03 services will request any queued MESSAGING messages from the UME data store and resend these (emulating a MESSAGING retransmission) before sending any subsequent messages from the APP29 data path.
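The APP03 file move in item 1 can be sketched as a small Python helper. This is a sketch under stated assumptions, not the supported method: the `APP03*.log` and `APP03*.inx` patterns are the ones named in the move procedure in this document, while the function names and directories are hypothetical, and "created that day" is approximated by the file modification date because creation time is not portable across platforms.

```python
import shutil
from datetime import date
from pathlib import Path


def todays_files(source_dir: Path, patterns: list[str]) -> list[Path]:
    """Return files in source_dir that match any pattern and were modified today."""
    today = date.today()
    matches = []
    for pattern in patterns:
        for path in sorted(source_dir.glob(pattern)):
            # Modification date stands in for "created that day" (assumption).
            if path.is_file() and date.fromtimestamp(path.stat().st_mtime) == today:
                matches.append(path)
    return matches


def copy_files(files: list[Path], target_dir: Path) -> list[Path]:
    """Copy files into target_dir, preserving timestamps, and return the new paths."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in files:
        destination = target_dir / source.name
        shutil.copy2(source, destination)
        copied.append(destination)
    return copied


# Example call with placeholder directories (replace with the real data paths):
# copy_files(todays_files(Path("source_data"), ["APP03*.log", "APP03*.inx"]), Path("target_data"))
```

Run the copy only after the APP03 service on the current node has been stopped, so that the files are no longer being written.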

 

 


 

APP29 Combined APP01, APP02, APP03 Move Procedure

 

-          By design, there are four APP29 servers across the two data centers:

·         One DC01 server supporting DC01 SITE 1 MESSAGINGd stocks, sending APP01 and APP02 to SITE 1 and handling the same APP03 data.

·         One DC01 server supporting DC01 SITE 2 MESSAGINGd stocks, sending APP01 and APP02 to SITE 2 and handling the same APP03 data.

·         One DC02 server supporting DC02 SITE 1 MESSAGINGd stocks, sending APP01 and APP02 to SITE 1 and handling the same APP03 data.

·         One DC02 server supporting DC02 SITE 2 MESSAGINGd stocks, sending APP01 and APP02 to SITE 2 and handling the same APP03 data.

 

-          If these services should move between servers at any time, all services should move together using the following procedure:

 

1)      Stop APP01 on the current node

2)      Stop APP02 on the current node

3)      Stop APP03 on the current node

4)      If APP60 is also moving, then stop APP60 at this point as well (which would be the case if we moved between data centers).

5)      Confirm that the associated APP13 Database loader is up to date IN THE DATA CENTER that the APP13 will be restarted in. 

a)       Use database loader reject procedures to replay this data.

·         If there are database loader rejects outstanding, they must be replayed before APP13 restarts. 

b)      It may be necessary to move the Database Loader to another node to complete loading this data. 

c)       If NOT moving the Database Loader files to a new node, skip to step 6.

·         Database Loader files should only be moved if “pre-move” database loading cannot be completed without doing so.

·         If moving Database Loader files:

o   Stop the affected APP13 APP81 process to unlock the database loader files.

o   Copy the :\\data\DL*.log, DL*.pos and DL*rejects.log files (created that day) for each Database Loader process that is moving to the alternate node.

o   Complete replaying the data, using Replay Procedures as necessary.

6)      If APP60 was moved, Restart APP60 on the new node.

7)      Restart APP02 on the new node.

8)      Restart APP01 on the new node. 

a)       If APP01 connects to the SIPs on restart, it will resend the most recent MESSAGINGs to the SIP.

b)      If APP02 connects to the SIPs on restart, and APP01 is also connected, it will send the MESSAGINGs queued in the UME data store, marked “sold”.

c)       If there is any question whether MESSAGINGs are up to date, resend them using NTM ME commands.

d)      If there is any question whether MESSAGINGs have been sent, resend them using APP20 MESSAGING queries.

e)      If there is any question whether clearing records have been sent, resend clearing using APP20.

 

Continue recovery of APP03 portion of APP29 system (lower priority than APP01 and APP02):

9)      Copy APP03 log and inx files to the alternate node

a)       Copy D:\\data\APP03*.log and APP03*.inx files (created that day) for each APP03 service moving to its alternate node.

10)   Restart APP03 on new node.

11)   Confirm MESSAGING Reader Clients reflect reconnect to moved APP03 services.
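The ordering of the move procedure above can be summarized as a small checklist generator in Python. This is a sketch only: `build_move_plan` and its two flags are hypothetical names, and the function merely lists the steps in order. Stopping and restarting processes is still done through the NTM Control Utility.

```python
def build_move_plan(moving_app60: bool, moving_loader_files: bool) -> list[str]:
    """Return the move steps in the order given by the procedure above."""
    plan = [
        "Stop APP01 on the current node",
        "Stop APP02 on the current node",
        "Stop APP03 on the current node",
    ]
    if moving_app60:
        plan.append("Stop APP60 on the current node")
    plan.append(
        "Confirm the APP13 database loader is up to date in the target data center "
        "and replay any outstanding rejects"
    )
    if moving_loader_files:
        # Only needed when pre-move database loading cannot complete otherwise.
        plan += [
            "Stop the affected APP13 APP81 process to unlock the database loader files",
            "Copy that day's DL*.log, DL*.pos and DL*rejects.log files to the alternate node",
            "Complete replaying the data",
        ]
    if moving_app60:
        plan.append("Restart APP60 on the new node")
    plan += [
        "Restart APP02 on the new node",
        "Restart APP01 on the new node",
        "Copy that day's APP03*.log and APP03*.inx files to the alternate node",
        "Restart APP03 on the new node",
        "Confirm MESSAGING Reader Clients reflect reconnect to the moved APP03 services",
    ]
    return plan


if __name__ == "__main__":
    for number, step in enumerate(build_move_plan(True, False), start=1):
        print(f"{number}) {step}")
```

Note that APP02 is restarted before APP01, and that the APP03 files are copied before APP03 is restarted, exactly as in the numbered steps.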

 

 

APP70 (MESSAGING Retransmission) Restart Procedure

 

If APP03 processes have moved nodes, we must inform MESSAGING Retransmission processes of moved MESSAGING Services using the following procedure:

1)      Stop all APP70 processes.

2)      Modify \\appcfg\APP701\APP70config.xml to reflect the new nodes for the APP03 log files.

3)      Restart all APP70 processes.

 

 

APP29 Database Loader Move Procedure

 

If APP01, APP02, APP03 processes have moved nodes, we must also move Database Loader processes afterward using the following procedure:

1)      Stop/Restart APP80 (after confirming all database loaders are up to date)

2)      Stop/Restart APP81 (after confirming all database loaders are up to date)

3)      Stop/Restart APP82 (after confirming all database loaders are up to date)

 

 

 


 

APP29 NTM Control Commands:

Use the following hyperlinks to get to APP01, APP02 or APP03 NTM Commands:

APP29_APP01_NTM_Control_Commands

APP29_APP02_NTM_Control_Commands

APP29_APP03_NTM_Control_Commands

 

 

APP01 NTM Control Commands:

 

Control connections to SIPs:

-          Use NTM Control Utility – Service Control – APP01 (Options by APP01) - Connect/Disconnect To/From Sip/SITE 1 or Switch Connection.

-          Select (and highlight) desired processes and right click to see and select desired options.

-          If user desires to connect to or disconnect from the SIP's production primary site, select either the Connect To or the Disconnect From Sip/SITE 1 option.

-          If user desires to connect to any SIP site other than the production primary site, select Switch Connection and choose the desired site:

1)      PRI_SITE_PRI_ADDR  (primary site, primary server)

2)      PRI_SITE_ALT_ADDR (primary site, alternate server)

3)      DR_SITE_PRI_ADDR  (DR site/remote data center, primary server)

4)      DR_SITE_ALT_ADDR (DR site/remote data center, alternate server)

 

Send Sequence Inquiry to SIPs:

-          Use NTM Control Utility – Service Control – APP01 (Options by APP01) - Send Sequence Inquiry.

-          Select (and highlight) desired processes and right click to see and select desired options.

 

APP01 Bypass:

-          Use NTM Control Utility – Service Control – APP01 (Options by APP01) – APP01 Bypass.

-          Select (and highlight) desired processes and right click to see and select desired options.

 

Abort Waiting Download Reply:

-          Use NTM Control Utility – Service Control – APP01 (Options by APP01) – Abort Waiting Download Reply.

-          Select (and highlight) desired processes and right click to see and select desired options.

 

Enable/Disable Processing MESSAGING Stat:

-          Use NTM Control Utility – Service Control – APP01 (Options by APP01) – options to Enable/Disable Processing MESSAGING Stat.

-          Select (and highlight) desired processes and right click to see and select desired options.

 

 

 

Set MESSAGING Conditions or Zero MESSAGINGs by ME:

-          Use NTM Control Utility – Service Control – APP01 (Options by ME) – to:

1)      Zero MESSAGING by ME,

2)      Set MESSAGING Condition Auto, or

3)      Set MESSAGING Condition Manual.

-          Select (and highlight) desired APP50 and right click to see and select desired options.

 

APP02 NTM Control Commands:

 

Control connections to SIPs:

-          Use NTM Control Utility – Service Control – APP02 - Connect/Disconnect To/From Sip/SITE 1 or Switch Connection.

-          Select (and highlight) desired processes and right click to see and select desired options.

-          If user desires to connect to or disconnect from the SIP's production primary site, select either the Connect To or the Disconnect From Sip/SITE 1 option.

-          If user desires to connect to any SIP site other than the production primary site, select Switch Connection and choose the desired site:

1)      PRI_SITE_PRI_ADDR  (primary site, primary server)

2)      PRI_SITE_ALT_ADDR (primary site, alternate server)

3)      DR_SITE_PRI_ADDR  (DR site/remote data center, primary server)

4)      DR_SITE_ALT_ADDR (DR site/remote data center, alternate server)

 

 

Send Sequence Inquiry to SIPs:

-          Use NTM Control Utility – Service Control – APP02 - Send Sequence Inquiry.

-          Select (and highlight) desired processes and right click to see and select desired options.

 

Send MESSAGINGId Inquiry to SIPs:

-          Use NTM Control Utility – Service Control – APP02 - Send MESSAGINGId Inquiry.

-          Select (and highlight) desired processes and right click to see and select desired options.

 

Set Outbound Sequence Number to SIPs:

-          Use NTM Control Utility – Service Control – APP02 – Set Outbound Sequence Number.

-          Select (and highlight) desired processes and right click to see and select desired options.

-          User will have to enter desired sequence number.

 

Set Outbound MESSAGINGId Per Instrument to SIPs:

-          Use NTM Control Utility – Service Control – APP02 – Set Outbound MESSAGINGId Per Instrument.

-          Select (and highlight) desired processes and right click to see and select desired options.

-          User will have to enter Instrument and MESSAGINGId desired.

APP03 NTM Control Commands:

 

APP83 Start Of Day:

-          Use NTM Control Utility – Service Control – Book Feed Options – APP83 Start Of Day.

-          Select (and highlight) desired processes and right click to see and select desired options.

 

APP83 End Of Day:

-          Use NTM Control Utility – Service Control – Book Feed Options – APP83 End Of Day.

-          Select (and highlight) desired processes and right click to see and select desired options.

 

APP83 Set Outbound Sequence Number:

-          Use NTM Control Utility – Service Control – Book Feed Options – APP83 Set Outbound Sequence Number.

-          Select (and highlight) desired processes and right click to see and select desired options.

-          User will have to enter desired sequence number.

 

APP83 Send  System Problem Message:

-          Use NTM Control Utility – Service Control – Book Feed Options – APP83 Send  System Problem Message.

-          Select (and highlight) desired processes and right click to see and select desired options.

 

APP83 Send  System Problem Clear Message:

-          Use NTM Control Utility – Service Control – Book Feed Options – APP83 Send  System Problem Clear Message.

-          Select (and highlight) desired processes and right click to see and select desired options.

 

 

 

 


 

APP01 Troubleshooting Table:

APP01 Symptom: APP01 SITE 1 connectivity issues

Evidenced by:

-          In APP01_APP02_to_SITE 1 stats, APP10 processes are not connected.

-          If SITE 1 moves to DR site, CQS processes will report messages with text “disaster” in them.

NOTE: APP10 processes will try to auto-reconnect continuously until connections can be made.

Impacts:

Since APP01 is the first process in the APP29 system path:

- will not be reporting MESSAGING related market data to industry.

- will not be reporting MESSAGING related market data to industry; this includes clearing.

- will not be reporting MESSAGING related market data to subscribers.

- Trading must be halted in affected issues if the problem goes on too long.

Response:

1)      Refer to:

 Generalized Event -  Cannot Send MESSAGINGs To SIP.docx

Generalized Recovery Scenario.

2)      Work with SITE 1 and Tech Services to identify and resolve issues.

3)      SITE 1 may ask that APP10 connections be moved to Primary Alternate Servers or to their DR site.  Use NTM Control APP01 options to switch APP01 connections.

4)      APP10 Services may need to be stopped/restarted.  See APP29_Recovery_Considerations

 

APP01 Symptom: APP01 SITE 2 connectivity issues

Evidenced by:

-          In APP01_APP02_to_SITE 2 stats, APP12 processes are not connected.

-          If SITE 2 moves to DR site, UQDF processes will report messages with text “disaster” in them.

NOTE: APP12 processes will try to auto-reconnect continuously until connections can be made.

Impacts:

Since APP01 is the first process in the APP29 system path:

- will not be reporting MESSAGING related market data to industry.

- will not be reporting MESSAGING related market data to industry; this includes clearing.

- will not be reporting MESSAGING related market data to subscribers.

- Trading must be halted in affected issues if the problem goes on too long.

Response:

1)      Refer to:

 Generalized Event -  Cannot Send MESSAGINGs To SIP.docx

Generalized Recovery Scenario.

2)      Work with SITE 2 and Tech Services to identify and resolve issues.

3)      SITE 2 may ask that APP12 connections be moved to Primary Alternate Servers or to their DR site.  Use NTM Control APP01 options to switch APP01 connections.

4)      NOTE: If SITE 2 moves to DR site, MDP recovery procedures will also need to be used.   See Application Recovery - MDP.docx  Application Specific Recoveries.

5)      APP12 Services may need to be stopped/restarted.  See APP29_Recovery_Considerations

 


 

APP02 Troubleshooting Table:

APP02 Symptom: APP02 SITE 1 connectivity issues

Evidenced by:

-          In APP01_APP02_to_SITE 1 stats, APP11 processes are not connected.

-          If SITE 1 moves to DR site, CTS processes will report messages with text “disaster” in them.

NOTE: APP11 processes will try to auto-reconnect continuously until connections can be made.

Impacts:

Since APP02 is the second process in the APP29 system path:

- will not be reporting MESSAGING related market data to industry; this includes clearing.

- will not be reporting MESSAGING related market data to subscribers.

- Trading must be halted in affected issues if the problem goes on too long.

Response:

1)      Refer to:

 Generalized Event -  Cannot Send MESSAGINGs To SIP.docx

Generalized Recovery Scenario.

2)      Work with SITE 1 and Tech Services to identify and resolve issues.

3)      SITE 1 may ask that APP11 connections be moved to Primary Alternate Servers or to their DR site.  Use NTM Control APP02 options to switch APP02 connections.

4)      APP11 Services may need to be stopped/restarted.  See APP29_Recovery_Considerations

 

APP02 Symptom: APP02 SITE 2 connectivity issues

Evidenced by:

-          In APP01_APP02_to_SITE 2 stats, APP13 processes are not connected.

-          If SITE 2 moves to DR site, UTDF processes will report messages with text “disaster” in them.

NOTE: APP13 processes will try to auto-reconnect continuously until connections can be made.

Impacts:

Since APP02 is the second process in the APP29 system path:

- will not be reporting MESSAGING related market data to industry; this includes clearing.

- will not be reporting MESSAGING related market data to subscribers.

- Trading must be halted in affected issues if the problem goes on too long.

Response:

1)      Refer to:

 Generalized Event -  Cannot Send MESSAGINGs To SIP.docx

Generalized Recovery Scenario.

2)      Work with SITE 2 and Tech Services to identify and resolve issues.

3)      SITE 2 may ask that APP13 connections be moved to Primary Alternate Servers or to their DR site.  Use NTM Control APP02 options to switch APP02 connections.

4)      NOTE: If SITE 2 moves to DR site, MDP recovery procedures will also need to be used.   See Application Recovery - MDP.docx  Application Specific Recoveries.

5)      APP13 Services may need to be stopped/restarted.  See APP29_Recovery_Considerations

 

 

APP03 Troubleshooting Table:

APP03 Symptom: Users report missing data in MESSAGING data.

May or may not be evidenced by:

-          Sequence gaps reported by APP70 processes in both EMT and MESSAGING Reader Client.

Impacts:

- MESSAGING Subscribers are missing data that they may or may not use in trading decisions.

Response:

1)      Use MESSAGING Reader to confirm whether or not the data lost by the user was also reported by the APP70 Client.

See Application Recovery - BFRD.docx monitoring section for more details.

2)      Report sequence gap information to Tech Services and work with Tech Services to determine cause/resolution.

3)      Users may utilize MESSAGING Retrans processes to try to gap fill the lost messages. See

 

               

APP01_APP02 Monitoring Considerations:

Stats Monitors:
APP01 / APP02 to SITE 1 App Queues Stats

Processes facilitate MESSAGING and last sale delivery from  to SITE 1 SIPs.

Monitor shows connection status between  and SITE 1 SIP as well as processing statistics.

To Start:
PROD MENU:
Market Data Monitoring Menu

To Exit:
Close Window

Key Indicators to Monitor:
- Color of data in columns
- ConnStat,
- MESSAGINGQue,
- OutMsgRate,
- OutMsgs

Symptom:
Data is RED.

Response:
Process is either down or multicast data is not being received by monitor.
1) Check status of process
2) If process is up, call Technical Services.

Symptom:
ConnStat is not Connected.

Response:
Not connected to SIP.
1) Use NTM Control Utility APP01 / APP02 Service Controls to control connections.
2) Call Production Control if needed.

Symptom:
MESSAGINGQue is a non-zero value and not decreasing as expected, or OutMsgRate or OutMsgs values are not reflecting changes as expected.

Response:
We may not be sending MESSAGINGs and/or lastsales as expected.
1) Check process log and/or data files to confirm inbound messages match outbound messages.
2) Work with SIP and Technical Services if necessary.
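The "non-zero and not decreasing" queue symptom can be made concrete with a short Python check over successive readings of a queue depth such as MESSAGINGQue. This is a minimal sketch: `queue_is_stalled` and the three-sample window are assumptions for illustration, and the readings are taken to be values an operator has noted from the monitor, since this document does not describe a programmatic interface to it.

```python
def queue_is_stalled(samples: list[int], min_samples: int = 3) -> bool:
    """Return True when the latest readings are all non-zero and never decrease."""
    if len(samples) < min_samples:
        return False  # not enough readings to judge a trend
    recent = samples[-min_samples:]
    if any(depth == 0 for depth in recent):
        return False  # the queue drained at some point in the window
    return all(later >= earlier for earlier, later in zip(recent, recent[1:]))


if __name__ == "__main__":
    print(queue_is_stalled([12, 12, 15]))  # backlog holding or growing
    print(queue_is_stalled([40, 25, 9]))   # backlog draining normally
```

A stalled result is only a prompt to follow the response steps above; a queue that is briefly flat during a burst of traffic is not necessarily a problem.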

Stats Monitors:
APP01 / APP02 to SITE 2 App Queues Stats

Processes facilitate MESSAGING and last sale delivery from  to SITE 1 and SITE 2 SIPs.

Monitor shows connection status between  and SITE 1 / SITE 2 SIPs as well as processing statistics.

To Start:
PROD MENU:
Market Data Monitoring Menu

To Exit:
Close Window

Key Indicators to Monitor:
- Color of data in columns
- soup status,
- status,
- is ready to send,
- out rate,
- total sent

Symptom:
Data is RED.

Response:
Process is either down or multicast data is not being received by monitor.
1) Check status of process
2) If process is up, call Technical Services.

Symptom:
Soup status is not Connected, Status is not ready, or Is ready to send value is not Y.

Response:
Not connected to SIP.
1) Use NTM Control Utility APP01 / APP02 Service Controls to control connections.
2) Call Production Control if needed.

Symptom:
Out Rate or Total Sent values are not reflecting changes as expected.

Response:
We may not be sending MESSAGINGs and/or lastsales as expected.
1) Check process log and/or data files to confirm inbound messages match outbound messages.
2) Work with SIP and Technical Services if necessary.

 

 


 

Stats Monitors:
APP01 / APP02 To  App Queues Stats

Processes facilitate MESSAGING and last sale delivery from  to SITE 1 and SITE 2 SIPs.

Monitor shows IPC connection status between APP01 / APP02 process and other  applications, or 29West connection status by topic.

To Start:
PROD MENU:
Market Data Monitoring Menu

To Exit:
Close Window

Key Indicators to Monitor:
- Color of data in columns
- Status,
- Write Queue

Symptom:
Data is RED.

Response:
Process is either down or multicast data is not being received by monitor.
1) Check status of process
2) If process is up, call Tech Services.

Symptom:
Not all APP01, APP02 or IPC connected processes are displayed as expected.

Response:
Not all IPC connected services have been started, or they have not processed any messages since the monitor was started.
1) Check status of services.
2) Check relevant service log files.

Symptom:
IPC Connected status is not CONNECTED.

Response:
Messages cannot be sent from source to destination if the IPC channel is disconnected.
1) Stop/Restart the destination process if other processes connecting to the same destination are showing similar issues; otherwise, stop/restart the source process.
2) Notify Production Support if issues persist.

Symptom:
29West connected status is Inactive.

Response:
29West communication has been disabled between the services involved.
1) Check status of process
2) If process is up, call Prod Support.

Symptom:
IPC Connected Queue size is a non-zero value and not decreasing as expected.

Response:
Messages cannot be sent from source to destination unless the IPC channel is connected.
1) Stop/Restart the destination process so as not to accidentally delete queued messages; do not stop/restart the source process.
2) Notify Production Support if issues persist.

Symptom:
29West connected Write Queue is a non-zero value and not decreasing as expected.

Response:
29West communication has been disabled between the services involved.
1) Check status of process
2) If process is up, call Production Support.

 


 

Stats Monitors:
APP01 / APP02 IPC Instance Stats

Processes facilitate MESSAGING and last sale delivery from  to SITE 1 and SITE 2 SIPs.

Monitor shows IPC channel processing statistics.

To Start:
PROD MENU:
Market Data Monitoring Menu

To Exit:
Close Window

Key Indicators to Monitor:
- Color of data in columns
- Service,
- Hostname,
- Msgs In,
- Msgs Out

Symptom:
Data is RED.

Response:
Process is either down or multicast data is not being received by monitor.
1) Check status of process
2) If process is up, call Technical Services.

Symptom:
Not all APP01 / APP02 processes are displayed as expected.

Response:
Service has not been started.
1) Check status of service.

Symptom:
Msgs In and/or Msgs Out are zero.

Response:
No messages have been sent/received since the monitor was started.
1) Check APP01 / APP02 log files.
2) Call Production Support if necessary.

 


 

APP03 Monitoring Considerations:

Stats Monitors:
APP03 Stats

Processes receive MESSAGING data via ME->APP01->APP02->APP03 path, and send multicast to MESSAGING Subscribers.

Monitor shows statistics of data received, as well as instances of slow message delivery times between processes in the path and rule_603a violations.

To Start:
PROD MENU:
Market Data Monitoring Menu

To Exit:
Close Window

Key Indicators to Monitor:
- Color of data in columns
- rule_603a_violation_cnt
- me_APP01_over_limit
- APP01_APP02_over_limit
- APP02_APP03_over_limit

Symptom:
Data is RED.

Response:
Process is either down or multicast data is not being received by monitor.
1) Check status of process
2) If process is up, call Technical Services.

Symptom:
Rule 603a violation count is > 0.

Response:
Rule 603a violation has been reported.
1) Notify Production Support and  management.

Symptom:
Any one, or any combination, of the process “over limit” columns is greater than 0.

Response:
We may not be processing as expected.
1) Production Control gets hourly reports of the data.  They are aware of reporting thresholds and will investigate if the data indicates problems beyond those already identified.

 

Also see MESSAGING Reader.  It will report any issues specific to MESSAGING multicast delivery, at least to our MESSAGING Reader servers.

Use the following hyperlink to see this documentation: Application Recovery - BFRD.docx