Friday, December 25, 2020

Agent vs Integration Server in Sterling Order Management System

Sunday, September 9, 2012

Agent framework scalability and Tuning considerations for high volumes

The core of the Sterling solution for many implementations lies in the Sterling agent framework and the APIs provided for monitoring and order fulfillment. OMS implementations typically use the Schedule Order, Release Order, ConsolidateToShipment and Real Time Availability Monitor agents, to name a few. Although every agent works differently and there is no run_faster parameter to scale up Sterling agents, a few underlying elements largely determine how far they can scale. Very little is documented in the public domain about how exactly the agent framework works and to what extent it can scale, so here goes my attempt to demystify agent operations and scalability. This post assumes that you are familiar with the OMS nomenclature; if not, you may want to read my earlier post first.

How a Generic Agent works -
A generic Sterling agent is a background batch processing job that does the following -

1. Checks whether there are messages to process in the configured JMS queue. If the queue is empty, it posts a getJobs message and goes to Step 2; otherwise it goes to Step 4.
2. Reads the getJobs message and fetches the first set of jobs (first batch) from the database using the getJobs method, up to the defined buffer size (the "Number of records to buffer" configuration, which defaults to 5000).
3. Writes these records back to the configured JMS queue as executeJobs messages, followed by the next getJobs message containing the last fetched record key, such as a TaskQKey.
4. Retrieves the executeJobs messages from the queue and does the necessary processing using the executeJobs method.
5. After finishing the first batch, gets the next set of jobs (second batch), up to the buffer size, using the last fetched record key in the getJobs message.
6. Works on the second set of jobs.
7. Continues the above process until all the present jobs are worked upon.
8. Once all the present jobs are worked upon, waits for a signal, i.e. the agent trigger, to start working again.
9. Upon getting the signal to start, the agent starts working again, i.e. it follows Step 2 to Step 7.
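To make the cycle above concrete, here is a minimal single-threaded simulation of it. This is only a sketch of the flow as described in this post, not Sterling code: the in-memory deque stands in for the JMS queue, the sorted list of keys stands in for the database, and run_agent and its parameters are names I made up for illustration.

```python
from collections import deque


def run_agent(pending_keys, buffer_size=5000):
    """Simulate one trigger-to-idle cycle of a generic agent.

    pending_keys is a sorted list of record keys (stand-in for the database).
    Returns (processed_keys, number_of_getJobs_executions).
    """
    queue = deque()  # stand-in for the agent's JMS queue
    processed = []
    get_jobs_calls = 0

    # The queue is empty at startup, so the agent triggers itself.
    queue.append(("getJobs", None))

    while queue:
        kind, payload = queue.popleft()
        if kind == "executeJobs":
            processed.append(payload)  # the "necessary processing" for one job
            continue

        # getJobs: fetch the next batch after the last fetched record key.
        get_jobs_calls += 1
        last_key = payload
        batch = [k for k in pending_keys
                 if last_key is None or k > last_key][:buffer_size]
        if not batch:
            break  # nothing left: wait for the next trigger

        # Post one executeJobs message per record, then the next getJobs
        # message carrying the last fetched record key.
        queue.extend(("executeJobs", k) for k in batch)
        queue.append(("getJobs", batch[-1]))

    return processed, get_jobs_calls


processed, calls = run_agent(list(range(1, 12)), buffer_size=5)
print(len(processed), calls)
```

With 11 records and a buffer size of 5, the simulation processes all 11 jobs in 4 getJobs executions: three that return records and a final one that finds nothing and goes idle.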

More details on default agent behavior -

Triggering an agent is the act of posting a getJobs message to the JMS queue. Triggering may be manual or automatic, i.e. self-triggered. If there are no messages in the queue when an agent starts up, the agent triggers itself automatically.

Within the getJobs method, the agent tries to acquire a lock on the YFS_OBJECT_LOCK table for the agent Criteria ID.

If the lock is not available, the getJobs method exits and does nothing. This ensures that duplicate sets of records are not retrieved for processing.

If the lock is available, the getJobs method fetches the records that need to be processed.

These records are posted as execute messages to the JMS queue. For each message, depending on the JMS session pooling setting, a new MQ session is created or borrowed to post the message, and the session is then closed or returned to the pool. This default behavior could change in an upcoming version as a result of the testing we undertook for one of our customers.

After the execute messages, one getJobs message is also posted with the last record key to facilitate retrieval of the next batch of messages.

Each thread of the agent picks up execute messages one by one and processes them. Multiple threads of the execute method can run concurrently.

After all the execute messages are consumed, only the getJobs message is left in the queue; the same agent thread then uses the getJobs() method to process it and continue the processing cycle.
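The "exit and do nothing if the lock is taken" behavior of getJobs can be sketched with a non-blocking lock. This is an analogy only: in the product the lock is a database lock on YFS_OBJECT_LOCK, whereas here a threading.Lock stands in for it, and get_jobs and fetch_batch are hypothetical names.

```python
import threading

# Stand-in for the YFS_OBJECT_LOCK entry of one agent Criteria ID.
object_lock = threading.Lock()


def get_jobs(fetch_batch):
    """Return a batch of records, or an empty list if the lock is taken."""
    if not object_lock.acquire(blocking=False):
        return []  # another getJobs is running: exit and do nothing
    try:
        return fetch_batch()  # only one caller at a time reaches the fetch
    finally:
        object_lock.release()
```

Because the acquire is non-blocking, a second getJobs that arrives while the first is still fetching returns immediately with no records, which is what keeps two callers from retrieving the same set.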

Scalability concerns and Scaling the Availability Monitor agent -
Are Sterling agents multi-threaded?
Not entirely. The getJobs component of the agent is deliberately made single-threaded via database locking on YFS_OBJECT_LOCK, to ensure that the same set of records is not retrieved and processed multiple times. However, the bulk of the workload is in the executeJobs component, which is multi-threaded and can run in multiple JVMs.

Will my agent scale to meet the peak throughput?
Depends on your volumes. Scaling an agent involves tuning both the getJobs and the executeJobs components. Scaling and tuning the executeJobs component is a different exercise that varies with the use case, so it is not covered in this post. At low to medium volumes, under 100K jobs/hour, scalability issues lie largely with the executeJobs component, and the default settings that govern agent behavior should work well. If you are using the agent framework to process over 150K "jobs" per hour, there may be challenges with the default implementation. I use the term jobs to denote the message entity: for example, jobs in the case of ScheduleOrder are distinct orders, and for Availability Monitor they are distinct inventory items.

What are the elements that affect scaling beyond 100K jobs/hour?

Performance of the getJobs query - The slower the query, the more time is spent retrieving messages.
Time taken to write all of the retrieved executeJobs messages to the queue - The default behavior of creating and closing an MQ session to write each individual message is a significant overhead. Using the product HF to enable bulk loading of messages significantly improves the write time per message. Other aspects, such as the persistence setting used for the queue and the network latency between the agent servers and the MQ server, can also affect message write times.
Buffer size of messages to get - The default of 5K may not suffice at very high loads, as it would mean 40 or more executions of the getJobs component to achieve just 200K throughput. Since getJobs is single-threaded, the number of times it has to execute needs to be kept to an optimum.
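The buffer size arithmetic is easy to check for other targets. The helper names below are mine, and the calculation assumes every getJobs execution returns a full buffer, which is the best case.

```python
import math


def get_jobs_runs(jobs_per_hour, buffer_size):
    """Number of getJobs executions needed per hour at a given buffer size."""
    return math.ceil(jobs_per_hour / buffer_size)


def seconds_per_batch(jobs_per_hour, buffer_size):
    """Average time available to fetch, post and process one batch."""
    return 3600 / get_jobs_runs(jobs_per_hour, buffer_size)


for size in (5000, 10000, 25000):
    print(size, get_jobs_runs(200_000, size), seconds_per_batch(200_000, size))
```

For 200K jobs/hour this gives 40 executions (90 seconds per batch) at a buffer of 5000, 20 executions (180 seconds) at 10000, and 8 executions (450 seconds) at 25000.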

Scaling the Real Time Availability Monitor (RTAM) Agent - A case study
At a customer site, one of the challenges was to scale the Real Time Availability Monitor agent to do the partial sync of inventory at over 250K records/hour. The customer was running Sterling 8.5 HF 25, WAS 7 and MQ 7. The following actions were taken to scale the agent from about 150K/hr to around 300K/hr -
1. Tuning the getJobs query - Front-loading the YFS_INVENTORY_ACTIVITY table would heavily skew the test results because of the excessive time spent querying it as part of the getJobs query. Hence, trimming the inventory activity table or keeping it in check significantly alters the time taken by getJobs and also represents the production workload more realistically. We also ensured that the correct index was used and that statistics were up to date.
2. Setting the agent queue to non-persistent - We defined the internal JMS queue as non-persistent on MQ and then set the PER(QDEF) option in this queue's entry in the scp file when generating the bindings. Writing each message to the persistent queue takes 11-20 ms, whereas on the non-persistent queue it takes under 5 ms.
3. Enabling JMS session pooling for this agent via the following property in customer_overrides.properties -
yfs.yfs.jms.session.disable.pooling=N
This allows sessions to be borrowed and returned to the pool instead of new ones getting created and closed for each message.

4. Enabling the bulk loader property for the agent framework, to avoid creating and closing a session for each message posted to the queue. We worked with IBM Sterling support to accomplish this via 8.5 HF48. The following two properties were set in the customer_overrides.properties file -

yfs.agent.bulk.sender.enabled=Y
yfs.agent.bulk.sender.batch.size=50000
The batch size setting of 50000 should be equal to or greater than the maximum buffer size you plan to use across all agents.

5. Running the agents in the same data center as the MQ server - This further reduces the latency between the two tiers and therefore improves overall performance. It may not always be possible if you are running agents in multiple data centers.

6. Increasing the buffer size of records retrieved from the default of 5000 to 10000 - We tested various settings between 5K and 25K and found that overall performance was best at 10000 for our setup. The optimal buffer size may vary with your environment and workload, so run performance tests to determine what works best for your needs.
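To see why the queue write time matters so much, it helps to turn the per-message numbers from step 2 into per-batch numbers. The sketch below assumes the messages of one batch are written one after another by the single getJobs thread and ignores the bulk sender and session pooling effects; batch_write_seconds is a name I made up.

```python
def batch_write_seconds(buffer_size, ms_per_message):
    """Time to post one batch if messages are written one after another."""
    return buffer_size * ms_per_message / 1000


cases = (
    ("persistent, 11 ms", 11),
    ("persistent, 20 ms", 20),
    ("non-persistent, 5 ms upper bound", 5),
)
for label, ms in cases:
    for buffer_size in (5000, 10000):
        secs = batch_write_seconds(buffer_size, ms)
        print(f"{label}: {buffer_size} messages -> {secs:.0f} s")
```

At 300K jobs/hour with a buffer of 10000 there are 30 batches an hour, i.e. 120 seconds per batch. Under these assumptions a persistent queue would spend 110 to 200 seconds just posting each batch, while a non-persistent queue stays under 50 seconds.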

Now that you have a better understanding of how the Sterling agent works you should be in a better position to troubleshoot and scale the agents. Happy testing and tuning!

Agent vs Integration Server in Sterling Order Management System

Agent Vs Integration Server : Processes that run in the background to perform various tasks.

Although both servers perform background activities, they differ as follows:

NAME	DESCRIPTION
Time Triggered Transaction	Program running at specific time intervals. Example RELEASE.0001 transaction
Agent Server	The process that runs the time-triggered transactions is known as an agent server. Example: DefaultAgent
Integration Server	The process that manages asynchronous services, such as messages to and from external systems. Example: Order Fulfilment  Order Fulfilment Services  Edge Deployment  ProcessCreateOrder

AGENT SERVER	INTEGRATION SERVER
The process that runs the time-triggered transactions is known as an agent server.	An Integration Server is a process that manages asynchronous services, such as messages to and from external systems.
Multi-thread : Supported	Multi-thread : Supported
Multiple instances : Supported	Multiple instances : Supported
Can run multiple time triggered transaction: Yes	Configured via Service Definition Framework

How to create an Agent Server in Sterling Order Management System?

Open any time-triggered transaction
Click the green + symbol under the “Agent Criteria Definition” section
Enter the server name and click Save


How to create an Integration Server in Sterling Order Management System?

Go to “Service Definitions” and create a new service
Drag and drop a JMS or file component
Connect the component to any other component (for example, email or export_db)
Click on the connector between the two components
Go to the Server tab
Click the green + symbol
Enter the server name and click Save


How to find out whether a server is an Agent Server or an Integration Server?

This information is available in the yfs_server table. You can find it using the query below.

select server_type from yfs_server where servername='?';

Server type :

00 – Integration Server
01 – Agent Server
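If you run this lookup often, a small helper that decodes the server type can save a trip to the notes. The sketch below runs the same query with a bound parameter against an in-memory SQLite table that only imitates yfs_server: the two-column layout, the MyIntegrationServer row and the server_kind function are assumptions for illustration, and a real OMS database needs its own driver and connection.

```python
import sqlite3

SERVER_TYPES = {"00": "Integration Server", "01": "Agent Server"}


def server_kind(conn, server_name):
    """Return the decoded server type, or None if the server is not found."""
    row = conn.execute(
        "select server_type from yfs_server where servername = ?",
        (server_name,),
    ).fetchone()
    if row is None:
        return None
    return SERVER_TYPES.get(row[0].strip(), "Unknown")


# In-memory stand-in for the real table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("create table yfs_server (servername text, server_type text)")
conn.executemany(
    "insert into yfs_server values (?, ?)",
    [("DefaultAgent", "01"), ("MyIntegrationServer", "00")],
)

print(server_kind(conn, "DefaultAgent"))
print(server_kind(conn, "MyIntegrationServer"))
```

Binding the server name as a parameter rather than pasting it into the SQL string avoids quoting problems of the kind that smart quotes cause.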

Time-triggered transactions

A time-triggered transaction is a program that performs a variety of individual functions, automatically and at specific time intervals. It is not triggered by conditions, events, or user input.

Automatically: a trigger message is scheduled every N minutes (configured under the time-triggered transaction)
Manually: Trigger from $U process; triggeragent.sh <Criteria Name>

Three types of time-triggered transactions

Business process transactions – responsible for processing day-to-day transactions. Example RELEASE.0001
Monitors – watch and send alerts for processing delays and exceptions. Example ORDER_MONITOR_EX
Purges – clear out data that may be discarded after having been processed. Example SUPPLYTEMPPRG

Besides the types of time-triggered transactions listed here, there are task-based and non-task-based agents. Do you know how task-based agents get their task records (when they are not created manually using the manageTaskQueue API)?

Good question. Whether a task_q entry is made is decided by the pipeline transaction configuration, which is available under the transaction's Others tab. Look for the "This transaction is task based" checkbox. If this checkbox is selected, an entry is made in the yfs_task_q table. For example:

After Create Order, the next transaction is Schedule Order, which has “This transaction is task based” enabled. So during order creation, when the order moves to Created status, the system finds the next transaction and checks whether it is enabled for task_q. If it is, an entry is made in the yfs_task_q table.
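The decision described above can be sketched in a few lines. This is a toy model rather than the product's logic: the NEXT_TRANSACTION mapping, the "Shipped" entry and the on_status_change function are hypothetical, and a plain list stands in for the yfs_task_q table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    name: str
    task_based: bool  # the "This transaction is task based" checkbox


# Hypothetical pipeline configuration: order status -> next transaction.
NEXT_TRANSACTION = {
    "Created": Transaction("Schedule Order", task_based=True),
    "Shipped": Transaction("Hypothetical non-task-based step", task_based=False),
}


def on_status_change(order_no, new_status, task_q):
    """Add a task row when the next transaction in the pipeline is task based."""
    nxt = NEXT_TRANSACTION.get(new_status)
    if nxt is not None and nxt.task_based:
        task_q.append({"order_no": order_no, "transaction": nxt.name})
    return task_q


task_q = []  # stand-in for the yfs_task_q table
on_status_change("ORD-1", "Created", task_q)
on_status_change("ORD-2", "Shipped", task_q)
print(task_q)
```

Only ORD-1 ends up with a task row, because only the transaction that follows Created status is marked as task based.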

Hope this helps. If anything is unclear, please write to us. Thanks

OMS Log File Locations

Development server log files all reside on vldvoms01.lifetouch.net, but the paths vary by environment.

Foundation/logs/

OMS3 MC log files can be found on the following servers in the Foundation/logs/ directory.
OMS3 QA log files can be found on the following servers in the Foundation/logs/ directory.
OMS3 Prod log files can be found on the following servers in the /Foundation/logs/ directory.

Agent Trigger

Yantra agents trigger themselves on a regular basis. However, if you need to trigger an agent immediately, the following web page lets you select an agent and trigger it.

yantra/yfsadmin/agenttrigger.jsp

OMS Agent Server Descriptions
"Agent Servers" are Java processes that contain one or more "Agents" that move work through the Sterling framework. Note that "Agent Servers" are commonly referred to as "Agents" for short. They are a core part of Spectrum OMS.

OMS Agent Server 'HOW TO' Instructions for Starting / Monitoring

Agents are multi-threaded, stand-alone processes that process work within the Sterling Commerce framework. Integration Servers are a special class of Agents that are used to send and receive messages from other applications.

Use the Sterling System Management Console to view the list of currently running agents. To open the management console, login to OMS / Spectrum, select the "System" menu, and then select the "Sterling System Management Console" sub-menu.

Once the management console is open, select the "Tools" menu and then the "View Live Agents / Integration Servers" sub-menu. In addition to viewing the live Integration Servers, you can stop or suspend the servers from the console. Use "./stopagents.sh stop" to stop the agents.

If you need to start an Agent / Integration Server in TEST or SIT, cat the contents of the /bin/startServer.sh file, find the line for the agent you need, and copy/paste that line into your ssh session.

Listing all running agents

./stopagents.sh show|showall

Stopping all agents

./stopagents.sh stop|kill

Starting all agents

./startServers.sh

Restart all running agents

./restartagents.sh

PHYSICS PHD BLOG
