Driving Business Capability Enhancement, Bonree Data Adds Powerful Tools for Multi-Cloud Resource Management


Introduction


   




Through Bonree ONE, Bonree Data helps China National Offshore Oil Corporation (CNOOC) establish a resource management metric system for its multi-cloud platform. The system dynamically monitors resources such as hosts, virtual machines, networks, storage, and containerized services, which makes the allocation and reclamation of cloud infrastructure resources quantifiable. Bonree Data also supports CNOOC in building a business-centered SLO monitoring mechanism for its application systems. This mechanism provides quantifiable data on resource consumption, application availability, and service quality for critical cloud applications, improving business stability and operational efficiency.




Background Analysis


   

1. The cloud platform resources are vast and complex, making it difficult to manage from a global perspective  

After years of development, CNOOC's cloud platform has grown into a multi-cloud architecture spanning five domestic centers and three overseas centers. The total resource volume is large and complex, and resource usage across the individual cloud platforms is not displayed or analyzed centrally, which makes global management difficult.


2. Business resource consumption is unclear and lacks centralized statistics

Resource consumption by the business systems running on CNOOC's cloud platform is not tracked centrally, so resources cannot be adjusted and allocated sensibly, and business ROI analysis is difficult.


3. Idle resource usage lacks quantitative basis, indirectly causing resource waste  

CNOOC's business systems lack historical data on their resource consumption, so capacity requests are made without resource benchmarks or a quantitative basis for using idle resources.


4. Monitoring metric systems are not unified, and monitoring is not comprehensive

CNOOC's business systems use inconsistent monitoring metric frameworks and lack systematic, comprehensive monitoring, making it difficult to establish standardized fault classification and resource evaluation systems.


5. Long fault localization time and difficulty in cross-department fault tracking  

CNOOC's systems cannot backtrack or trace faults. Data from intermittent faults is not retained, so complex faults take a long time to diagnose and localize, which affects MTTD (Mean Time to Detect). During cross-departmental diagnosis, the metrics and data produced by different troubleshooting tools are difficult to correlate, making faults hard to track.




Application Scenario


   


1. Establish a unified resource monitoring system standard to achieve standardized resource stratification

Bonree helped CNOOC establish a unified cloud platform resource monitoring system standard, achieving standardized stratification of various cloud platform resources. By collecting metric data from each platform, a unified monitoring view and analysis interface were created.

The IaaS layer mainly includes seven key entity types: hosts, virtual hosts, network devices, network interfaces, storage, file systems, and system processes.

The PaaS layer mainly includes nine key entity types: container clusters, nodes, workloads, jobs, services, pods, routes, images, and cloud services.

The SaaS layer mainly includes six key entity types: cloud services, instances, applications, message queues (MQ), databases (DB), and APIs.
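The layered entity model above can be pictured as a small lookup table. The following is a minimal sketch only: the identifiers (`Layer`, `ENTITY_TYPES`, `layers_of`) are hypothetical names chosen for illustration and do not reflect Bonree ONE's actual data model.

```python
from __future__ import annotations

from enum import Enum


class Layer(Enum):
    IAAS = "IaaS"
    PAAS = "PaaS"
    SAAS = "SaaS"


# Entity types per layer, transcribed from the lists in the text.
# The snake_case identifiers are illustrative, not product names.
ENTITY_TYPES: dict[Layer, list[str]] = {
    Layer.IAAS: [
        "host", "virtual_host", "network_device", "network_interface",
        "storage", "file_system", "system_process",
    ],
    Layer.PAAS: [
        "container_cluster", "node", "workload", "job", "service",
        "pod", "route", "image", "cloud_service",
    ],
    Layer.SAAS: [
        "cloud_service", "instance", "application",
        "message_queue", "database", "api",
    ],
}


def layers_of(entity_type: str) -> list[Layer]:
    """Return every layer in which the given entity type is listed."""
    return [layer for layer, types in ENTITY_TYPES.items() if entity_type in types]
```

Note that "cloud services" is listed under both the PaaS and SaaS layers in the text, so a lookup by entity type can return more than one layer.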


2. Collect data on system resource usage and regularly evaluate resource utilization efficiency

With the support of Bonree Data, CNOOC now collects resource usage data across its business systems. By associating the entity relationship data in the resource metric system with the resource consumption of each business system, the platform provides dynamic monitoring and analytical reporting of each system's resources, so that resource utilization efficiency can be evaluated regularly. Business attribute tags are also established for the major resource types to support dynamic monitoring and allocation of cloud resource usage across business systems.

IaaS layer: hosts, virtual machines, storage, network links;

PaaS layer: containers (Pods), workloads, services, request volume of cloud services;

SaaS layer: process resource usage, remote API call volume, database call volume.
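As an illustration of how business attribute tags allow usage to be rolled up per business system, here is a minimal sketch. It assumes each resource sample carries a `business_system` tag plus used and allocated CPU cores; the field names and the choice of CPU as the metric are assumptions for the example, not details from the CNOOC deployment.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceSample:
    entity_id: str
    layer: str                 # "IaaS", "PaaS" or "SaaS"
    business_system: str       # business attribute tag (hypothetical field)
    cpu_cores_used: float
    cpu_cores_allocated: float


def utilization_by_business(samples):
    """Return used/allocated CPU ratio per business system tag."""
    used = defaultdict(float)
    allocated = defaultdict(float)
    for s in samples:
        used[s.business_system] += s.cpu_cores_used
        allocated[s.business_system] += s.cpu_cores_allocated
    # Skip systems with no allocation to avoid dividing by zero.
    return {
        name: used[name] / total
        for name, total in allocated.items()
        if total > 0
    }
```

A persistently low ratio for one business system would be a candidate signal for reclaiming idle resources, which is the kind of regular evaluation the text describes.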


3. Establish capacity planning reports to improve resource utilization

Bonree quantifies various types of capacity across CNOOC's cloud platforms using historical metric data, as well as the per-unit business resource consumption of business systems. This enables the creation of cloud platform capacity planning reports and capacity expansion evaluation standards for business systems, thereby improving the efficiency of cloud resource utilization.

Based on periodic capacity metrics for each cloud platform (such as the number of cores, memory size, storage capacity, network bandwidth, and cloud service request volume), linear and non-linear forecasts are run to provide recommendations for the next cycle.

Standardized evaluation criteria are established for business system capacity requests. When business systems apply for resources, the platform can immediately provide monthly, quarterly, and semi-annual usage trends and conduct capacity assessments based on per-unit business resource consumption.
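The linear case of the forecasting step, and a capacity check based on per-unit business consumption, can be sketched as follows. This is an illustrative assumption of how such a calculation might look, using an ordinary least-squares trend line; the function names, the `headroom` parameter, and its default value are hypothetical, and the source does not say which forecasting methods Bonree actually uses.

```python
def linear_forecast(history, periods_ahead=1):
    """Fit y = intercept + slope * x by least squares over equally spaced
    periods (x = 0, 1, 2, ...) and extrapolate `periods_ahead` periods."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two periods of history")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + periods_ahead)


def assess_request(requested, per_unit_consumption, expected_units, headroom=0.2):
    """Return True when a capacity request fits within the estimated need
    (per-unit consumption times expected business volume) plus headroom."""
    estimated_need = per_unit_consumption * expected_units
    return requested <= estimated_need * (1 + headroom)
```

For example, a platform whose monthly core usage grew 100, 110, 120, 130 would be projected at roughly 140 cores for the next month, and a request far above the per-unit estimate would be flagged for review.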


4. Establish an SLO monitoring system to achieve comprehensive observability of all business systems


Bonree uses the VALET model (Volume, Availability, Latency, Errors, Tickets) as the unified framework for SLO (Service Level Objective) monitoring across business systems. Application probes collect golden metrics from each system to serve as SLIs (Service Level Indicators), which helps CNOOC establish a robust SLO monitoring system.

Critical User Journey SLOs are set with the VALET model, based on departmental evaluation goals;

SLO alerts are configured using error budget thresholds and pushed to platform operations personnel or business users.
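To make the error budget idea concrete, the sketch below computes how much of an availability SLO's error budget a window of requests has consumed and compares it to an alert threshold. The class and function names, and the 80% default threshold, are assumptions for illustration; the source does not describe how Bonree ONE configures these thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    name: str
    target: float  # e.g. 0.99 means 99% of requests must succeed

    def __post_init__(self):
        # A target of exactly 1.0 would leave no error budget at all.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")


def error_budget_consumed(slo, total_requests, failed_requests):
    """Fraction of the error budget used in the window (1.0 = fully spent)."""
    if total_requests == 0:
        return 0.0
    allowed_failures = (1.0 - slo.target) * total_requests
    return failed_requests / allowed_failures


def should_alert(slo, total_requests, failed_requests, threshold=0.8):
    """Alert once consumption reaches the (hypothetical) budget threshold."""
    return error_budget_consumed(slo, total_requests, failed_requests) >= threshold
```

With a 99% target and 10,000 requests, 100 failures are allowed; 50 failures consume about half the budget, while 90 failures would cross an 80% threshold and trigger an alert.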


5. Enhance fault diagnosis capability by adding application component metric collection

Use the Bonree ONE platform’s application probe as the data collection agent for ADDP. On top of collecting tracing data (call chains), also collect application component metrics to improve fault diagnosis capability.

Grant business departments the authority to trace and analyze call chains of their own application systems, thereby enhancing their ability to diagnose faults.

Achieve real-time collection and retention of application component traces (Trace), metrics (Metric), and stack information (Log);

For abnormal requests, metric data from every component involved within the application system can be retrieved in real time, and stack information supports code-level error analysis.
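One way to picture the retrieval described above is to join trace spans with component metrics: find the components a request touched, then select the metric points those components reported during the request's time window. This is a simplified sketch under assumed data shapes (`Span`, `MetricPoint`, and `metrics_for_trace` are hypothetical names); it is not the ADDP or Bonree ONE storage model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    trace_id: str
    component: str
    start: float      # seconds since some epoch
    end: float
    error: bool = False


@dataclass(frozen=True)
class MetricPoint:
    component: str
    name: str
    timestamp: float
    value: float


def metrics_for_trace(trace_id, spans, metrics):
    """Return metric points from every component on the trace, limited to
    the time window covered by that trace's spans."""
    involved = [s for s in spans if s.trace_id == trace_id]
    if not involved:
        return []
    components = {s.component for s in involved}
    window_start = min(s.start for s in involved)
    window_end = max(s.end for s in involved)
    return [
        m for m in metrics
        if m.component in components and window_start <= m.timestamp <= window_end
    ]
```

The same join key (component plus time window) could be applied to retained stack information to reach the code-level analysis the text mentions.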


6. Break down departmental data silos and improve collaborative fault diagnosis efficiency

By centrally collecting data from Bonree's ITIM and APM probes, the solution enables correlation analysis between applications and the underlying infrastructure. Through a unified metrics system, it establishes layered associations of SLIs (Service Level Indicators) across applications, systems, and other tiers, consolidating diagnostic entry points and integrating data, which improves the efficiency of collaborative fault diagnosis.

Unify the collaborative fault analysis interface, enabling dependency correlation across applications, services, APIs, methods, instances, processes, containers, hosts, and databases.

Leverage distributed tracing capabilities to establish entity impact dependencies based on applications, services, and databases.

Through a standardized monitoring metrics system and clearly defined entity types and relationships, a unified alert event language is formed, and alerts are aggregated in multiple ways to reduce alert redundancy.
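One possible aggregation strategy, shown purely as an illustration, groups alerts by the deepest alerting entity in each dependency chain, so that an application, service, and host alert caused by the same host failure collapse into one group. The simplification that each entity depends on a single upstream entity, and the function name `aggregate_alerts`, are assumptions of this sketch; the source does not specify which aggregation methods are used.

```python
def aggregate_alerts(alerting, depends_on):
    """Group alerting entities under the deepest alerting entity they rely on.

    alerting   -- iterable of entity names that currently have an alert
    depends_on -- dict mapping an entity to the single entity it depends on
                  (e.g. application -> service -> container -> host)
    """
    alerting = set(alerting)
    groups = {}
    for entity in sorted(alerting):
        root = entity
        current = entity
        seen = {entity}
        # Walk down the dependency chain, remembering the last alerting entity.
        while current in depends_on:
            current = depends_on[current]
            if current in seen:   # guard against cyclic dependency data
                break
            seen.add(current)
            if current in alerting:
                root = current
        groups.setdefault(root, []).append(entity)
    return groups
```

Operators would then receive one grouped event per probable root entity rather than one alert per affected layer.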





Why Bonree


   

1. Leading market share:


Ranked No. 1 in China's APM market, with more than 14 years of continuous service to customers.


       

2. Globally competitive product:  

   

Bonree ONE, an integrated intelligent observability platform, enables full-stack observability of business applications.


       


   

Application Results


   

1. Achieve standardized resource layering:


Classify the various cloud platform resources into IaaS, PaaS, and SaaS layers to facilitate management from a global perspective.



2. Establish an SLO monitoring system:


Use the VALET model as the unified model for SLO monitoring across business systems to achieve comprehensive observability.


3. Shorten MTTD (Mean Time To Detect):


Implement fault traceability to reduce the average fault detection time.


4. Enhance cross-departmental coordinated diagnosis:


Enable correlation analysis between applications and underlying resources to break down data silos between departments.





