Kunlun Bank’s Integrated Intelligent Observability Platform Ensures Business Stability

Kunlun Bank achieves full-chain monitoring and tracing of applications, from program entry to execution, through code-level call tracking, setting a new benchmark in observability-driven operations. Using the Bonree ONE integrated intelligent observability platform, the bank pinpoints root causes through layer-by-layer decomposition analysis, significantly improving fault diagnosis efficiency.

Amid the wave of digitalization, domestic banking institutions have launched digital transformation strategies to drive and empower business development, internal control management, and risk prevention, aiming to stand out in fierce market competition. During this transformation, banks often face challenges such as insufficient organizational agility, unstable network environments, difficult data governance, and slow operational troubleshooting, all of which make business processing inefficient. At the same time, banking business scenarios are complex and diverse, so operations teams need new ways to understand the relationships between systems and the status of individual modules.

With the Application Performance Monitoring platform it built in 2019, Kunlun Bank has completed performance monitoring for more than twenty business systems, two apps, and one web page, and runs active probing tasks across 16 systems, 39 cities, and 3 telecom operators. The production environment probes collect up to 2 TB of data, with a daily processing volume of 100 GB. Data sources include internally monitored application systems, mobile apps, and web pages. Data types cover metrics, sessions, call chains, small files, snapshots, topology relationships, configuration information, and more. Each data type has its own retention period.

1. Low Troubleshooting Efficiency During Critical Events and an Urgent Need for Real-Time Monitoring
In IT operations, Kunlun Bank sometimes faces complex fault localization scenarios. For example, during an annual party membership fee collection campaign, a large number of systems triggered high-level alerts almost simultaneously. These systems are interconnected through various networks, support and depend on one another, and each rests on a complex architecture of its own. In such situations, pinpointing faults and restoring business operations within a limited time frame is a rare but high-risk challenge for operations personnel.

2. Diverse Business Scenarios That Challenge Operational Approaches to System Call Relationships and Module Status
Business requirements change frequently, for example when new products launch or service models are adjusted. Kunlun Bank needs a flexible IT architecture and flexible operation processes so that it can promptly adjust system configurations and functionality to meet business needs and support rapid business growth.

3. Challenges of Data Integration
Kunlun Bank’s business involves large volumes of data, including customer information and transaction records. However, this data is often scattered across different systems and databases in inconsistent formats, which creates data silos and makes comprehensive data analysis difficult.

4. Complex System Architecture and Challenges in Ensuring Business Stability
Kunlun Bank has a large and complex system architecture, including core banking systems, payment systems, risk management systems, and more. The intricate dependencies and integration requirements among these systems make operations and maintenance more challenging. The reliability and stability of the bank’s systems are crucial for business continuity.
Therefore, Kunlun Bank needs to establish a comprehensive technical architecture and operations framework that keeps its systems running stably while allowing a quick response to emergencies.

1. Monitoring of Application Systems, Mobile Apps, and Web Pages
Kunlun Bank uses Bonree’s product, Bonree ONE, to monitor application systems, mobile apps, and web pages. Each of the three platforms consists of a client side and a server side. On the client side, probes are injected to collect client data, which is then reported to the server side. The server analyzes and processes the data and presents the results. The overall design of Bonree products adopts a layered architecture.

2. Production Environment Monitoring
Production monitoring helps the bank’s internal development teams track application performance in the production environment in real time, including metrics such as response time, throughput, and error rates. Potential performance issues can then be detected and resolved promptly, keeping applications stable and reliable under high load and high concurrency.

3. Fault Troubleshooting and Issue Localization
When applications encounter faults or performance issues, performance monitoring provides detailed metrics and reports that help development teams quickly identify the root cause. By analyzing monitoring data, teams can locate bottlenecks and take appropriate measures to fix issues, optimize application performance, and improve system stability and responsiveness.

4. Analyzing User Behavior to Improve User Experience
Performance monitoring can track user actions and experiences within the application, analyze user behavior and feedback, and reveal user satisfaction and pain points. By combining performance data with user feedback, development teams can optimize the application interface and user interactions, improving user experience and increasing user loyalty and satisfaction.

1. Emergency Fault Investigation and Second-Level Performance Diagnosis
This capability keeps systems running stably during critical events. For example, during one event, Kunlun Bank’s system experienced a sudden failure. Faced with a large volume of alert data, the existing tooling struggled to aggregate and correlate alerts, analyze root causes, and diagnose the fault. By implementing an application performance monitoring system that provides end-to-end performance visibility, the bank was able to locate performance issues quickly and isolate and analyze problems step by step, diagnosing code-level performance within seconds. Ultimately, the time to detect and resolve issues was compressed from hours to minutes, improving team operational efficiency by 80%.

2. Code-Level Call Tracking for Full Business Chain Observability
Kunlun Bank’s online loan platform requires high performance. Before adopting Bonree’s products, the bank could not accurately determine whether the systems related to online loans were running slowly. The Bonree ONE platform gave the online loan platform full-chain monitoring and tracing of the application from program entry to execution. It quickly identified latency issues in downstream systems and enabled timely responses, ensuring the stable and secure operation of the platform and providing users with a high-quality service experience.

3. Standardizing Operation and Maintenance Data to Provide High-Quality Data Support for the Stable Operation of Core Business
In the project implementation, the Controller serves as the probe access and data processing component, receiving and processing the metric data uploaded by probes. Through the Config protocol, the Controller issues data collection policies that control which data the probes collect. Via the Upload protocol, the Controller receives raw data, performs validity checks, classification, and normalization, and finally stores the data. This process standardizes operation and maintenance data, ensuring its timeliness, completeness, relevance, and validity. Through data modeling and governance, the Controller provides high-quality data support for scenarios such as application monitoring and intelligent analysis, improving operation and maintenance efficiency, helping the team locate and resolve issues quickly, and ensuring the stable operation of the bank’s core business systems.

1. Building a Digital Operation and Maintenance System
Kunlun Bank has broken with the traditional operation and maintenance model by actively applying performance monitoring systems.

2. Actively Empowering Business Innovation and Development
Leveraging digital transformation methods, the bank accurately assesses its technological innovation capabilities, further advancing precision and intelligence in fintech services and providing customers with more valuable financial services.

3. Building an Integrated Intelligent Observability and Operation Environment
Kunlun Bank has comprehensively upgraded and reformed its operation and maintenance monitoring management system, connecting cross-departmental and cross-system processes at low cost. As part of this full-process governance, partial data governance was completed, enhancing visualization capabilities, improving work efficiency, and reducing operation and maintenance costs. This has made the bank’s operation environment more intuitive, secure, and observable.

Going forward, Kunlun Bank will adhere to technology-driven development and internal-external collaboration to empower scenario-based ecosystem development. It will continue to explore a full-process service system centered on user experience and high-quality products. Kunlun Bank will also continue to collaborate with Bonree to advance innovation in operation and maintenance management for financial enterprises. Based on its own characteristics and the demands of financial business scenarios, it will strengthen the digital management foundation of network operations, raise the level of fintech empowerment, and comprehensively improve the enterprise’s competitive advantage.
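
For readers who want a concrete picture of the Upload-side processing described for the Controller (validity check, classification, normalization, then storage), the following is a minimal sketch in Python. It is not Bonree's implementation: the record fields (`app`, `type`, `timestamp_ms`, `value`, `unit`), the type labels, and the `Controller` class itself are assumptions made purely for illustration, and the Config-side policy distribution is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Hypothetical type labels, loosely based on the data types the article lists.
KNOWN_TYPES = {"metric", "session", "call_chain", "snapshot", "topology"}

# Hypothetical unit table used to normalize durations to milliseconds.
UNIT_TO_MS = {"ms": 1.0, "s": 1000.0}


@dataclass
class Controller:
    """Illustrative stand-in for the Controller's Upload-side processing."""

    store: dict[str, list[dict[str, Any]]] = field(default_factory=dict)
    rejected: list[dict[str, Any]] = field(default_factory=list)

    @staticmethod
    def is_valid(record: dict[str, Any]) -> bool:
        """Validity check: required fields are present and well-typed."""
        app = record.get("app")
        kind = record.get("type")
        ts = record.get("timestamp_ms")
        unit = record.get("unit", "ms")
        value = record.get("value", 0)
        return (
            isinstance(app, str) and app.strip() != ""
            and isinstance(kind, str) and kind in KNOWN_TYPES
            and isinstance(ts, int) and ts > 0
            and isinstance(unit, str) and unit in UNIT_TO_MS
            and isinstance(value, (int, float))
        )

    @staticmethod
    def normalize(record: dict[str, Any]) -> dict[str, Any]:
        """Normalization: canonical app name, durations expressed in ms."""
        out = {k: v for k, v in record.items() if k not in ("value", "unit")}
        out["app"] = record["app"].strip().lower()
        if "value" in record:
            out["value_ms"] = float(record["value"]) * UNIT_TO_MS[record.get("unit", "ms")]
        return out

    def upload(self, batch: list[dict[str, Any]]) -> int:
        """Validate, normalize, classify by type, and store; return count stored."""
        stored = 0
        for record in batch:
            if not self.is_valid(record):
                self.rejected.append(record)  # kept aside for later inspection
                continue
            clean = self.normalize(record)
            # Classification: records are grouped by their data type.
            self.store.setdefault(clean["type"], []).append(clean)
            stored += 1
        return stored


if __name__ == "__main__":
    controller = Controller()
    batch = [
        {"app": " Online-Loan ", "type": "metric", "timestamp_ms": 1000,
         "name": "response_time", "value": 0.25, "unit": "s"},
        {"app": "", "type": "metric", "timestamp_ms": 1000},  # fails validation
    ]
    print(controller.upload(batch), "stored,", len(controller.rejected), "rejected")
```

Grouping stored records by type mirrors the article's point that different data types carry different retention periods; a real system would attach a retention policy to each group rather than keep everything in memory.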