To address the growing system complexity and expanding user base of Unicom Wo Music, Bonree provides a unified intelligent operations monitoring platform. It enables end-to-end monitoring across the entire stack, covering the client side (app, H5, mini programs), the network (latency, traffic), the cloud (cloud infrastructure, physical servers, databases), and the platform (APM for business systems), to ensure a real-time, stable, and high-quality user experience.




Background Analysis



With the continuous development of Wanglaoji Health Company (hereinafter referred to as “Wanglaoji”), its IT system architecture has become increasingly complex. Both system traffic and the number of functional modules continue to grow, leading to rising application complexity.

As IT failures and risk points increase, traditional infrastructure monitoring methods are no longer sufficient to meet current operational requirements. At the same time, the company faces multiple challenges, including rapidly changing business demands, continuously increasing user expectations, and pressure for cost reduction and efficiency improvement. As a result, the likelihood of performance degradation or service anomalies in IT applications has significantly increased, impacting overall business continuity.

Therefore, establishing an effective application management mechanism and ensuring stable IT system operations have become urgent requirements for business development.

Before the project was implemented, Wanglaoji had no RUM (Real User Monitoring) or APM (Application Performance Monitoring) alerting mechanisms. System status was learned entirely from user complaints, so the first people to detect an issue were often customer service or business staff rather than operations engineers.

This reactive model significantly delayed incident detection and left operations teams without visibility into system health.





Application Scenario




   



In daily operations, the Bonree ONE platform collects RUM (Real User Monitoring) and APM (Application Performance Monitoring) data from Wanglaoji’s SSO and TPM systems in one place, establishing an end-to-end observability baseline.

When a system anomaly occurs, intelligent alerting strategies notify operations engineers within seconds. A closed-loop troubleshooting process then proceeds as follows:


APM Call Chain Analysis: Quickly Define Incident Boundaries

Operations engineers first use the APM module’s full call-chain tracing to accurately locate faulty services and abnormal nodes, quickly determining whether the issue is caused by downstream dependency latency or code-level performance bottlenecks.
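The distinction between downstream latency and a code-level bottleneck can be illustrated with a small sketch. It is not Bonree ONE's implementation: the `Span` structure, the service names, and the rule "the span with the largest self time is the culprit" are all assumptions made for illustration, and the self-time calculation assumes child calls run sequentially.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str          # hypothetical service name
    duration_ms: float


def self_times(spans):
    """Return {span_id: self time}: duration minus time spent in direct children.

    Assumes children run one after another (no parallel calls).
    """
    child_total = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0.0) + s.duration_ms
    return {
        s.span_id: max(s.duration_ms - child_total.get(s.span_id, 0.0), 0.0)
        for s in spans
    }


def classify(spans, entry_service):
    """Name the service with the largest self time and label the incident type."""
    st = self_times(spans)
    culprit = max(spans, key=lambda s: st[s.span_id])
    if culprit.service == entry_service:
        kind = "code-level bottleneck"
    else:
        kind = "downstream dependency latency"
    return culprit.service, kind


# Hypothetical trace: a login request passing through three services.
trace = [
    Span("a", None, "sso-gateway", 1200),
    Span("b", "a", "sso-auth", 1100),
    Span("c", "b", "user-db", 950),
]

print(classify(trace, "sso-gateway"))
# ('user-db', 'downstream dependency latency')
```

Here the gateway spends only 100 ms in its own code, so most of the 1200 ms is attributed to the database call two hops down the chain.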


RUM Session Replay: Reconstruct User Behavior

RUM session replay is used to reconstruct real user interaction paths during the incident period. Combined with client IP, device type, and geographic distribution, engineers can determine whether the issue is caused by specific environments or regional network conditions.

This helps eliminate irrelevant client-side factors and ensures optimization efforts focus on true root causes.
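As a rough sketch of this kind of segmentation, the following groups session records by one dimension and flags segments whose error rate is well above the overall rate. The record fields (`region`, `device`, `had_error`), the sample data, and the 2x threshold are assumptions for illustration, not the platform's data model.

```python
from collections import defaultdict


def error_rate_by(sessions, key):
    """Return {segment: share of sessions in that segment that had an error}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for s in sessions:
        totals[s[key]] += 1
        if s["had_error"]:
            errors[s[key]] += 1
    return {k: errors[k] / totals[k] for k in totals}


def concentrated_segments(sessions, key, ratio=2.0):
    """Segments whose error rate is at least `ratio` times the overall rate."""
    overall = sum(s["had_error"] for s in sessions) / len(sessions)
    if overall == 0:
        return []
    rates = error_rate_by(sessions, key)
    return sorted(k for k, r in rates.items() if r >= ratio * overall)


# Hypothetical sessions recorded during an incident window.
sessions = [
    {"region": "Guangdong", "device": "Android", "had_error": True},
    {"region": "Guangdong", "device": "iOS", "had_error": True},
    {"region": "Beijing", "device": "Android", "had_error": False},
    {"region": "Beijing", "device": "iOS", "had_error": False},
    {"region": "Shanghai", "device": "Android", "had_error": False},
    {"region": "Shanghai", "device": "iOS", "had_error": False},
]

print(concentrated_segments(sessions, "region"))  # ['Guangdong']
print(concentrated_segments(sessions, "device"))  # []
```

In this sample the errors cluster in one region but are spread evenly across device types, which would point toward a regional network condition rather than a device-specific defect.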


Deep Correlation of Middleware Metrics: Identify Hidden Bottlenecks

For complex incidents, detailed monitoring data from databases, caches, and message queues is analyzed. Key metrics such as connection count, response latency, and queue backlog are compared across dimensions.

Through multi-dimensional correlation analysis, hidden issues such as slow SQL queries, connection pool exhaustion, cache breakdowns, or message queue congestion can be quickly identified, significantly reducing trial-and-error troubleshooting costs.
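One simple way to picture such correlation analysis is to rank middleware metrics by how closely they track response latency over the incident window. The sketch below uses a plain Pearson correlation; the metric names and sample values are hypothetical, and a high correlation only suggests where to look first, it does not prove a cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(vx * vy)


def rank_suspects(latency, metrics):
    """Sort metrics by absolute correlation with latency, strongest first."""
    scores = {name: pearson(latency, series) for name, series in metrics.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical samples taken at the same five timestamps.
latency_ms = [120, 130, 400, 900, 1500]
metrics = {
    "db_active_connections": [20, 22, 60, 95, 100],
    "cache_hit_ratio": [0.95, 0.94, 0.95, 0.96, 0.95],
    "mq_backlog": [10, 12, 9, 11, 10],
}

for name, score in rank_suspects(latency_ms, metrics):
    print(f"{name}: {score:+.2f}")
```

With these sample values the database connection count rises in step with latency and ranks first, which would direct attention to connection pool usage before the cache or the message queue.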




Playbook Accumulation and Knowledge Loop: Continuous Stability Improvement

Based on the above analysis, targeted remediation measures are implemented. After incident resolution, full-chain data, root cause conclusions, and handling processes are standardized and stored in a knowledge base.

When similar incidents occur in the future, historical cases can be automatically referenced, shortening response time and forming a closed-loop process of:

Detection → Diagnosis → Recovery → Knowledge Retention

This continuously strengthens the resilience of Wanglaoji’s business systems.
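The idea of referencing historical cases can be sketched as a similarity lookup over symptom tags. This is only an illustration under assumed names: the case records, the tag vocabulary, and the use of Jaccard similarity with a 0.5 cutoff are hypothetical and do not describe how Bonree ONE stores or matches cases.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar(symptoms, knowledge_base, min_score=0.5):
    """Return stored cases whose symptoms overlap enough, best match first."""
    scored = [(jaccard(symptoms, case["symptoms"]), case) for case in knowledge_base]
    matches = [(score, case) for score, case in scored if score >= min_score]
    matches.sort(key=lambda sc: sc[0], reverse=True)
    return [case for _, case in matches]


# Hypothetical knowledge base entries written after earlier incidents.
knowledge_base = [
    {
        "id": "case-001",
        "symptoms": {"login_timeout", "db_pool_exhausted"},
        "root_cause": "connection pool too small for peak traffic",
        "fix": "raise pool size and add a slow-query alert",
    },
    {
        "id": "case-002",
        "symptoms": {"mq_backlog", "slow_consumer"},
        "root_cause": "consumer blocked on a remote call",
        "fix": "add a timeout and scale out consumers",
    },
]

new_incident = {"login_timeout", "db_pool_exhausted", "slow_sql"}
print([case["id"] for case in find_similar(new_incident, knowledge_base)])
# ['case-001']
```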



Application Result


   


  • Average incident detection time reduced from hours to under 10 minutes

  • Alert accuracy improved to ≥95%, significantly reducing noise and operational interference

  • Incident resolution time reduced from hours of manual investigation to minutes with intelligent fault localization

  • Successfully implemented RUM, APM, and core middleware monitoring capabilities, enabling full end-to-end observability from user side to service side

  • Built a unified monitoring and alerting system that makes system status visible, measurable, and traceable

  • Transitioned operations from a reactive complaint-driven model to proactive governance


Looking forward, Bonree will continue to collaborate with Wanglaoji, focusing on advancing AI capabilities, including:

  • Intelligent root cause analysis

  • AI-assisted diagnostics

These efforts aim to further improve operational efficiency and evolve the system from “observable and measurable” to “intelligent and autonomous.”




Why Bonree



Global Leader in Intelligent Observability

Bonree is an AI-driven global leader in intelligent observability.

Full-Stack End-to-End Observability Capabilities

Bonree ONE provides full-stack observability spanning user experience, application services, middleware, databases, and underlying infrastructure.

