Investigating slow proposed shipment generation.
Incident Report for ShipHawk
Postmortem

Incident summary

Some ShipHawk NetSuite users experienced slow syncing of item fulfillments between NetSuite and ShipHawk.

The slowness was detected by the monitoring system at 9:28 AM Pacific Time on Monday, August 8, and continued until 12:51 PM Pacific Time.

Impact

Because of internal configuration changes, proposed shipment generation for large orders with incomplete product information ran incorrectly and produced a very large number of packages.

Processing those proposed shipments consumed too much memory on the background workers serving that queue. That, in turn, made the workers unstable and delayed all other item fulfillments processed in the same queue.

As a result, NetSuite Item Fulfillments synchronized to ShipHawk with delays of 3 to 52 minutes.

Detection and Recovery

The incident was detected by the ShipHawk monitoring system when the synchronization delay reached 3 minutes.
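
As an illustration of the detection rule described above, here is a minimal sketch of a threshold check. Only the 3-minute threshold comes from this report. The function names, and the idea of measuring delay as the age of the oldest item fulfillment still waiting in the queue, are assumptions made for the example and may not match how the monitoring system actually works.

```python
from datetime import datetime, timedelta, timezone

# The 3-minute threshold is from the report; everything else is hypothetical.
ALERT_THRESHOLD = timedelta(minutes=3)


def sync_delay(oldest_enqueued_at: datetime, now: datetime) -> timedelta:
    """Age of the oldest item fulfillment still waiting in the queue."""
    return now - oldest_enqueued_at


def should_alert(oldest_enqueued_at: datetime, now: datetime,
                 threshold: timedelta = ALERT_THRESHOLD) -> bool:
    """Return True once the measured delay reaches the threshold."""
    return sync_delay(oldest_enqueued_at, now) >= threshold


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(should_alert(now - timedelta(minutes=5), now))
```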

The initial response was to scale processing power. Adding resources did not help, because the new background job processors quickly became stuck for the same reason. The delay continued to grow and reached 52 minutes at its peak.

At 12:30 PM we corrected the product data that was causing the issue and removed the incorrectly generated proposed shipments. That unblocked the system, and all jobs waiting in the queue were processed within 21 minutes. The system returned to its normal state at 12:51 PM Pacific Time.

Corrective actions

To prevent this type of issue in the future, we plan to do the following:

  1. Develop a time-limiting system for background job processors, so a few slow jobs don’t block the entire queue.
  2. Improve the UX to eliminate the ability to create product configurations that could cause unexpected behavior.
  3. Add hard limits to specific system actions to reduce the risk of resource-abusive processes.
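
To make items 1 and 3 concrete, here is a minimal sketch of a job that enforces both a time budget and a hard cap on generated packages. It is not ShipHawk's implementation. The limit values, the exception names, the input shape, and the one-package-per-unit rule are all assumptions chosen for illustration. The time limit is cooperative: the job checks a deadline between units of work, so it would not interrupt a job stuck inside a single long call.

```python
import time


class JobTimeout(Exception):
    """Raised when a job exceeds its time budget."""


class PackageLimitExceeded(Exception):
    """Raised when a proposed shipment would exceed the package cap."""


# Hypothetical limits, chosen only for illustration.
MAX_PACKAGES_PER_SHIPMENT = 500
JOB_TIME_BUDGET_SECONDS = 30.0


def generate_packages(item_quantities,
                      max_packages=MAX_PACKAGES_PER_SHIPMENT,
                      time_budget=JOB_TIME_BUDGET_SECONDS):
    """Build one package per unit, failing fast if a limit is hit.

    item_quantities: dict mapping SKU -> quantity (hypothetical input shape).
    """
    deadline = time.monotonic() + time_budget
    packages = []
    for sku, quantity in item_quantities.items():
        for _ in range(quantity):
            # Cooperative check: give the worker back once over budget.
            if time.monotonic() > deadline:
                raise JobTimeout(f"exceeded {time_budget}s budget")
            # Hard cap: refuse to grow the shipment past the limit.
            if len(packages) >= max_packages:
                raise PackageLimitExceeded(
                    f"more than {max_packages} packages requested")
            packages.append({"sku": sku, "quantity": 1})
    return packages
```

Failing fast with an explicit exception lets a queue mark the job as failed and move on to the next one, rather than letting a single oversized order hold a worker and its memory.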
Posted Aug 16, 2022 - 08:22 PDT

Resolved
This issue was resolved at 12:51 PM Pacific Time.

Customer impact: Some customers have reported a delay when syncing orders from their ERP.

Start time: 9:28 AM Pacific Time
End time: 12:51 PM Pacific Time
Posted Aug 09, 2022 - 09:11 PDT
Monitoring
A fix is in place and being rolled out. Processing times will improve over the next 10-15 minutes.

Customer impact: Some customers have reported a delay when syncing orders from their ERP.
Posted Aug 08, 2022 - 12:48 PDT
Identified
The issue has been identified and we are working to resolve it. We estimate this issue will be solved within the next hour.

Customer impact: Some customers have reported a short delay when syncing orders from their ERP.
Posted Aug 08, 2022 - 12:06 PDT
Investigating
Our monitoring system has identified some slowness when generating proposed shipments. Some customers may see a minor delay in the time it takes for a proposed shipment to generate when an order syncs to ShipHawk from their ERP.
We are actively investigating this issue.
Posted Aug 08, 2022 - 10:10 PDT
This incident affected: ShipHawk APIs (Shipping APIs).