US20100124165A1 - Silent Failure Identification and Trouble Diagnosis - Google Patents
- Publication number
- US20100124165A1 US12/274,737
- Authority
- US
- United States
- Prior art keywords
- network element
- silent failure
- performance data
- determining
- silent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0663—Performing the actions predefined by failover planning, e.g. switching to standby network elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0681—Configuration of triggering conditions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/16—Threshold monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/349—Performance evaluation by tracing or monitoring for interfaces, buses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/81—Threshold
Definitions
- Exemplary embodiments relate generally to the field of telecommunications networks, and more specifically, to identifying silent failures in telecommunications networks and diagnosing troubles that caused the silent failures.
- a telecommunications network generally includes multiple network elements, such as switches and routers, functionally coupled via a suitable communications network.
- the network elements are typically manufactured with alarms to indicate that a portion of the network element has failed.
- routers commonly include alarms for detecting port failures and card failures. These alarms enable maintenance personnel and/or automated maintenance systems to easily determine the source of a failure and to efficiently resolve the failure.
- Alarms are generally limited to identifying those failures that the manufacturer chooses. In many cases, alarms are only included for fatal errors that result in the complete failure of a network element. Any failures at the network elements that do not result in an alarm are commonly referred to as “silent failures.” Silent failures can result in a number of problems that adversely affect customer traffic, such as packet loss or a reduction of two-way traffic into one-way traffic. Since silent failures by definition do not generate alarms, they are conventionally detected by customers who manually monitor their own network performance. This is especially problematic during off-hours when the customer may not be actively monitoring network performance. For example, a silent failure may occur at a business late on a Friday afternoon and not be discovered by the customer until Monday morning, thereby allowing the network problems to persist through the entire weekend to the business's detriment.
- When a customer detects a decrease in network performance (e.g., a reduction in data transmission rates), the customer typically contacts its corresponding service provider. The service provider may then manually deploy personnel to perform a variety of diagnostic tests in order to discover the cause of the decrease in network performance. In many cases, until these tests are completed, the service provider does not know whether the decrease in network performance is caused by a silent failure (i.e., a failure at the service provider's network elements) or by actions on the customer's side. Performing these tests is generally time-consuming and can lead to significant downtime for the customer.
- Embodiments of the disclosure presented herein include methods, systems, and computer-readable media for identifying and resolving a silent failure in a telecommunications network.
- a method for identifying and resolving a silent failure in a telecommunications network is provided.
- performance data associated with data traffic passing through a network element in the telecommunications network is collected.
- a determination is made whether the performance data has fallen below a threshold to identify the silent failure at the network element.
- the silent failure fails to trigger an alarm included on the network element.
- troubleshooting rules may be retrieved.
- the silent failure is resolved based on the performance data and the troubleshooting rules.
- a system for identifying and resolving a silent failure in a telecommunications network includes a memory and a processor functionally coupled to the memory.
- the memory stores a program containing code for identifying and resolving the silent failure in the telecommunications network.
- The processor is responsive to computer-executable instructions contained in the program and operative to collect performance data associated with data traffic passing through a network element in the telecommunications network, determine whether the performance data has fallen below a threshold to identify the silent failure at the network element, responsive to determining that the performance data is below the threshold and thereby identifying the silent failure at the network element, retrieve troubleshooting rules, and resolve the silent failure based on the performance data and the troubleshooting rules.
- the silent failure fails to trigger an alarm included on the network element.
- a computer-readable medium having instructions stored thereon for execution by a processor to perform a method for identifying and resolving a silent failure in a telecommunications network.
- performance data associated with data traffic passing through a network element in the telecommunications network is collected.
- a determination is made whether the performance data has fallen below a threshold to identify the silent failure at the network element.
- the silent failure fails to trigger an alarm included on the network element.
- troubleshooting rules may be retrieved. The silent failure is resolved based on the performance data and the troubleshooting rules.
- FIG. 1 is a diagram illustrating a network architecture operative to identify and resolve a silent failure in a telecommunications network, in accordance with exemplary embodiments.
- FIG. 2 is a flow diagram illustrating a method for identifying and resolving a silent failure in a telecommunications network, in accordance with exemplary embodiments.
- FIG. 3 is a computer architecture diagram showing aspects of an illustrative computer hardware architecture for a computing system capable of implementing aspects of the embodiments presented herein.
- FIG. 1 shows an illustrative telecommunications network architecture 100 according to exemplary embodiments.
- the architecture 100 includes an Internet backbone 102 , a core network 104 , an access network 106 , and a customer premises 108 .
- the customer premises 108 may include a variety of customer devices (that may be used by users other than the customers), such as telephones and computers.
- these customer devices are functionally coupled to the access network 106 via a router (not shown).
- The access network 106, which may be operated by a service provider, may include a Digital Subscriber Line Access Multiplexer (“DSLAM”) (not shown) functionally coupled to the router via a local loop.
- the DSLAM functionally couples the access network 106 to the core network 104 , which provides a variety of services to customers connected to the access network 106 .
- the core network 104 provides access to the Internet backbone 102 to enable communications with other private networks and Internet Service Providers (“ISPs”).
- the Internet backbone 102 , the core network 104 , the access network 106 , and the customer premises 108 are well known to those skilled in the art as common components in telecommunications infrastructures, and as such, are not described in greater detail herein.
- the architecture 100 further includes a network health monitoring module 110 , a rule management module 112 , a rule store 114 , a trouble diagnostics module 116 , and a ticketing module 118 .
- the network health monitoring module 110 monitors incoming and outgoing data traffic at the core network 104 and the access network 106 in order to collect current performance data, such as the number of packets being transmitted and/or received within a given interval.
- the performance data may include any suitable data that indicates the relative performance of the data core network 104 and the access network 106 .
- the performance data is retrieved from the network elements present in the data core network 104 and the access network 106 .
- the network health monitoring module 110 compares the current performance data with a given threshold in order to determine whether a change in the performance data has occurred.
- a significant and detrimental change in the performance data may be an indication of a silent failure. If the current performance data exceeds or falls below the threshold, the network health monitoring module 110 informs the rule management module 112 that a possible silent failure has been detected.
- An example threshold may be the number of packets being transmitted within a given time period. Other suitable indicators of network performance may be similarly utilized as contemplated by those skilled in the art.
- the rule management module 112 retrieves troubleshooting rules from the rule store 114 and provides the retrieved troubleshooting rules to the trouble diagnostics module 116 , which executes automated diagnostics and recovery procedures in accordance with the troubleshooting rules and the performance data acquired by the network health monitoring module 110 .
- the rule management module 112 may inform the ticketing module 118 to generate a trouble ticket, which is then provided to a service provider (not shown). Responsive to receiving the trouble ticket, the service provider may dispatch maintenance personnel or perform other procedures in order to resolve the silent failure as contemplated by those skilled in the art.
- the rule management module 112 may further determine whether the silent failure is due to a switchover by performing a root cause analysis.
- a switchover refers to a “switch over” from a primary device to a standby device when the primary device fails. The switchover may cause a silent failure if the primary device is deactivated while the standby device is not properly activated, resulting in neither the primary device nor the standby device being operative. If the rule management module 112 determines that the silent failure is due to the switchover, the rule management module 112 may inform the ticketing module 118 to generate a trouble ticket.
- the network health monitoring module 110 , the rule management module 112 , and the trouble diagnostics module 116 enable service providers to take a proactive approach towards identifying and resolving silent failures.
- the network health monitoring module 110 , the rule management module 112 , and the trouble diagnostics module 116 can identify and resolve silent failures prior to the customer complaining. This is particularly useful if the silent failure occurs, for example, on a late Friday afternoon before the weekend or before an extended holiday break. In these cases, the service provider can begin resolving the silent failure immediately, instead of waiting until the customer complains on the next work day.
- FIG. 2 is a flow diagram illustrating a method 200 for identifying and resolving silent failures in a telecommunications network.
- the network health monitoring module 110 collects (at 202 ) current performance data associated with incoming and/or outgoing traffic from the core network 104 and the access network 106 .
- the network health monitoring module 110 may collect the current performance data for a given time interval.
- An example of performance data is the number of packets being transmitted and/or received within the given time interval.
- the performance data is obtained directly from the network elements by querying the network elements for the performance data.
- the network health monitoring module 110 compares (at 204 ) the current performance data with a given threshold in order to determine whether the data traffic being transmitted and/or received through the core network 104 has degraded to a level that indicates a silent failure.
- A difference is determined between the current performance data and previous performance data. The difference is then compared with a minimum value indicating a silent failure. If the difference is greater than the minimum value, then the current performance data has fallen below the threshold. If the difference is less than the minimum value, then the current performance data has not fallen below the threshold.
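- The difference-based comparison described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Sample` type, the function name, and the packet counts are hypothetical, and treating a difference exactly equal to the minimum value as "not fallen" is an assumption, since the text leaves that case open.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Packet count observed at one network element during one interval (hypothetical type)."""
    element_id: str
    packets: int


def has_fallen_below_threshold(previous: Sample, current: Sample, min_drop: int) -> bool:
    """Return True when the drop in packet count is greater than the minimum value.

    A drop exactly equal to min_drop is treated as "not fallen" (an assumption).
    """
    difference = previous.packets - current.packets
    return difference > min_drop


# Illustrative values only: a drop of 8,800 packets against a minimum value of 5,000.
prev = Sample("switch-1", packets=10_000)
curr = Sample("switch-1", packets=1_200)
print(has_fallen_below_threshold(prev, curr, min_drop=5_000))
```

In this sketch an increase in traffic yields a negative difference, which never counts as a possible silent failure.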
- Network elements may include single-port network elements and dual-port network elements.
- In a single-port network element, data traffic flows through the single port.
- The current performance data obtained from the network element is therefore necessarily associated with the single port.
- In a dual-port (or multi-port) network element, one port typically serves as a primary port while another port serves as a standby port. If the primary port fails, then the standby port can become active.
- the network health monitoring module 110 may retrieve performance data associated with both ports in order to determine which port is active. Responsive to determining which port is active, the network health monitoring module 110 can then compare the performance data associated with the active port against the threshold.
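- One way to picture the active-port check for dual-port elements is the sketch below. The text does not say how the active port is determined, so choosing the port that carried the most packets is an assumption, as are the function names, the port labels, and the use of a minimum packet count as the threshold.

```python
from typing import Dict, Optional


def active_port(port_packets: Dict[str, int]) -> Optional[str]:
    """Guess the active port as the one carrying the most packets (assumed heuristic).

    Returns None when no port carried any traffic.
    """
    if not port_packets:
        return None
    name, packets = max(port_packets.items(), key=lambda item: item[1])
    return name if packets > 0 else None


def active_port_below_threshold(port_packets: Dict[str, int], min_packets: int) -> bool:
    """Compare only the active port's packet count against the threshold."""
    port = active_port(port_packets)
    if port is None:
        # Neither port is carrying traffic, which is itself a possible silent failure.
        return True
    return port_packets[port] < min_packets


# Illustrative values only: the standby port has taken over and is carrying traffic.
counts = {"primary": 0, "standby": 7_500}
print(active_port(counts), active_port_below_threshold(counts, min_packets=1_000))
```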
- If it is determined (at 206) that the current performance data has not fallen below the threshold, then the network health monitoring module 110 continues to monitor the core network 104 and the access network 106 by collecting (at 202) performance data from the core network 104 and the access network 106. If it is determined (at 206) that the current performance data falls below the threshold, then the network health monitoring module 110 retrieves (at 208) troubleshooting rules from the rule store 114 and provides the troubleshooting rules to the trouble diagnostics module 116.
- Responsive to receiving the troubleshooting rules and the performance data from the rule management module 112, the trouble diagnostics module 116 performs (at 210) various actions in order to isolate and resolve (or attempt to resolve) the silent failure in accordance with the troubleshooting rules and the performance data obtained by the network health monitoring module 110. If the trouble diagnostics module 116 determines (at 212) that the silent failure is successfully resolved, then the trouble diagnostics module 116 resets (at 214) the network element where the silent failure was identified. If it is determined (at 212) that the silent failure was not successfully resolved, then the trouble diagnostics module 116 may not reset the identified network element and may inform the ticketing module 118, which generates (at 216) a trouble ticket for the silent failure.
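- The branch between resetting the network element and generating a trouble ticket (operations 210 through 216) might be sketched as below. The representation of a troubleshooting rule as a callable that reports whether it resolved the failure is purely an assumption for illustration; the text does not define the rule format, and the names used here are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical rule shape: takes the performance data, returns True if the failure was resolved.
Rule = Callable[[Dict[str, int]], bool]


def handle_silent_failure(performance_data: Dict[str, int], rules: List[Rule]) -> str:
    """Apply each troubleshooting rule in turn (210) and report the follow-up action.

    Returns "reset_element" when a rule resolves the failure (212 -> 214),
    otherwise "open_ticket" (212 -> 216).
    """
    for rule in rules:
        if rule(performance_data):
            return "reset_element"
    return "open_ticket"


def reroute_rule(data: Dict[str, int]) -> bool:
    """Illustrative rule: pretend a re-route fixes the case where traffic is one-way."""
    return data.get("rx_packets", 0) == 0 and data.get("tx_packets", 0) > 0


print(handle_silent_failure({"tx_packets": 900, "rx_packets": 0}, [reroute_rule]))
print(handle_silent_failure({"tx_packets": 0, "rx_packets": 0}, [reroute_rule]))
```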
- the trouble diagnostics module 116 examines User-to-Network Interface (“UNI”) ports on a given network element, which in this case is an Asynchronous Transfer Mode (“ATM”) switch, and at the customer premises 108 through the access network 106 .
- the trouble diagnostics module 116 examines the UNI ports of virtual paths (“VPs”) or virtual channels (“VCs”) associated with the network element.
- The trouble diagnostics module 116 may examine traffic at the port level (i.e., at the network element) as well as the endpoint level (i.e., at the customer premises 108) with respect to the UNI ports. For example, if one-way traffic is found at the port level of a given UNI port, then the trouble diagnostics module 116 may isolate the silent failure to a processor card containing the UNI port.
- the trouble diagnostics module 116 may examine other endpoints, if available, in order to determine whether the other endpoints are experiencing the same issue. Responsive to finding a second endpoint that is experiencing two-way traffic, the trouble diagnostics module 116 may return to the original endpoint experiencing the one-way traffic and optimize (or attempt to optimize) the original endpoint's associated path in accordance with the second endpoint's associated path. That is, the trouble diagnostics module 116 may trigger a re-route attempt. If multiple VPs or VCs experience problems, then the trouble diagnostics module 116 may examine multiple path points along the VPs or VCs in order to find common points, which may indicate the source of the silent failure.
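- Finding common points along several troubled VPs or VCs amounts to intersecting their path points. The sketch below assumes each path is available as a list of path-point identifiers; that data shape, the function name, and the identifiers are hypothetical.

```python
from typing import Dict, List, Set


def common_path_points(paths: Dict[str, List[str]]) -> Set[str]:
    """Return the path points shared by every troubled VP/VC.

    An empty result means the troubled paths have no single point in common.
    """
    if not paths:
        return set()
    point_sets = [set(points) for points in paths.values()]
    return set.intersection(*point_sets)


# Illustrative values only: both troubled paths traverse "sw-b".
troubled = {
    "vp-1": ["sw-a", "sw-b", "sw-c"],
    "vp-2": ["sw-d", "sw-b", "sw-e"],
}
print(common_path_points(troubled))
```

A shared point is only a candidate source of the silent failure; it still has to be confirmed by the diagnostics described above.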
- The rule management module 112 may further perform (at 218) a root cause analysis to determine whether the silent failure is due to a switchover.
- An APS scheme includes an active line and a protection line, each of which is associated with a separate APS-enabled processor card.
- When the active line fails, data traffic is switched from the active line to the protection line.
- The APS-enabled processor card associated with the active line switches from an UP state into a DOWN state, and the APS-enabled processor card associated with the protection line switches from a DOWN state into an UP state.
- The rule management module 112 may determine a switchover failure if both of the APS-enabled processor cards are in a DOWN state and neither the active line nor the protection line is handling data traffic. Responsive to performing the root cause analysis, if it is determined (at 220) that the silent failure is due to a switchover, then the rule management module 112 informs the ticketing module 118, which generates (at 222) a trouble ticket for the silent failure. If it is determined (at 220) that the silent failure is not due to a switchover, then the method 200 ends.
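- The switchover test described above (both APS-enabled processor cards DOWN and neither line handling data traffic) can be written as a small predicate. The `CardState` and `ApsLine` types and their field names are hypothetical; only the condition itself comes from the text.

```python
from dataclasses import dataclass
from enum import Enum


class CardState(Enum):
    UP = "UP"
    DOWN = "DOWN"


@dataclass
class ApsLine:
    """State of one APS line and its processor card (hypothetical type)."""
    card_state: CardState
    carrying_traffic: bool


def is_switchover_failure(active: ApsLine, protection: ApsLine) -> bool:
    """True when both cards are DOWN and neither line is handling data traffic."""
    both_down = (
        active.card_state is CardState.DOWN
        and protection.card_state is CardState.DOWN
    )
    no_traffic = not active.carrying_traffic and not protection.carrying_traffic
    return both_down and no_traffic


# Failed switchover: the active card went DOWN but the protection card never came UP.
failed = is_switchover_failure(
    ApsLine(CardState.DOWN, carrying_traffic=False),
    ApsLine(CardState.DOWN, carrying_traffic=False),
)
print(failed)
```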
- FIG. 3 and the following discussion are intended to provide a brief, general description of a suitable computing environment in which embodiments may be implemented. While embodiments will be described in the general context of program modules that execute in conjunction with an application program that runs on an operating system on a computer system, those skilled in the art will recognize that the embodiments may also be implemented in combination with other program modules.
- program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types.
- embodiments may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
- the embodiments may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
- program modules may be located in both local and remote memory storage devices.
- FIG. 3 is a block diagram illustrating a system 300 operative to identify and resolve a silent failure, in accordance with exemplary embodiments.
- the system 300 includes a processing unit 302 , a memory 304 , one or more user interface devices 306 , one or more input/output (“I/O”) devices 308 , and one or more network devices 310 , each of which is operatively connected to a system bus 312 .
- the bus 312 enables bi-directional communication between the processing unit 302 , the memory 304 , the user interface devices 306 , the I/O devices 308 , and the network devices 310 .
- Examples of the system 300 include, but are not limited to, computers, servers, personal digital assistants, cellular phones, or any suitable computing devices.
- the processing unit 302 may be a standard central processor that performs arithmetic and logical operations, a more specific purpose programmable logic controller (“PLC”), a programmable gate array, or other type of processor known to those skilled in the art and suitable for controlling the operation of the server computer. Processing units are well-known in the art, and therefore not described in further detail herein.
- the memory 304 communicates with the processing unit 302 via the system bus 312 .
- the memory 304 is operatively connected to a memory controller (not shown) that enables communication with the processing unit 302 via the system bus 312 .
- the memory 304 includes an operating system 314 , one or more databases 315 , and one or more program modules 316 , according to exemplary embodiments.
- the program modules 316 may include the network health monitoring module 110 , the rule management module 112 , the trouble diagnostics module 116 , and the ticketing module 118 .
- the method 200 as described above with respect to FIG. 2 is embodied as a program module in the memory 304 .
- An example of the databases 315 is the rule store 114 .
- Examples of operating systems, such as the operating system 314, include, but are not limited to, WINDOWS and WINDOWS MOBILE operating systems from MICROSOFT CORPORATION, MAC OS operating system from APPLE CORPORATION, LINUX operating system, SYMBIAN OS from SYMBIAN SOFTWARE LIMITED, BREW from QUALCOMM INCORPORATED, and FREEBSD operating system.
- Computer-readable media may comprise computer storage media and communication media.
- Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
- Computer storage media includes, but is not limited to, RAM, ROM, Erasable Programmable ROM (“EPROM”), Electrically Erasable Programmable ROM (“EEPROM”), flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the system 300 .
- the user interface devices 306 may include one or more devices with which a user accesses the system 300 .
- the user interface devices 306 may include, but are not limited to, computers, servers, personal digital assistants, cellular phones, or any suitable computing devices.
- the I/O devices 308 are operatively connected to an I/O controller (not shown) that enables communication with the processing unit 302 via the system bus 312 .
- the I/O devices 308 may include one or more input devices, such as, but not limited to, a keyboard, a mouse, or an electronic stylus. Further, the I/O devices 308 may include one or more output devices, such as, but not limited to, a display screen or a printer.
- the network devices 310 enable the system 300 to communicate with other networks or remote systems via a network 318 .
- the network 318 is an ATM-based network.
- Examples of network devices 310 may include, but are not limited to, a modem, a radio frequency (“RF”) or infrared (“IR”) transceiver, a telephonic interface, a bridge, a router, or a network card.
- The network 318 may include a wireless network such as, but not limited to, a Wireless Local Area Network (“WLAN”) such as a WI-FI network, a Wireless Wide Area Network (“WWAN”), a Wireless Personal Area Network (“WPAN”) such as BLUETOOTH, a Wireless Metropolitan Area Network (“WMAN”) such as a WiMAX network, or a cellular network.
- The network 318 may be a wired network such as, but not limited to, a Wide Area Network (“WAN”) such as the Internet, a Local Area Network (“LAN”) such as an Ethernet network, a wired Personal Area Network (“PAN”), or a wired Metropolitan Area Network (“MAN”).
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Description
- Exemplary embodiments relate generally to the field of telecommunications networks, and more specifically, to identifying silent failures in telecommunications networks and diagnosing troubles that caused the silent failures.
- A telecommunications network generally includes multiple network elements, such as switches and routers, functionally coupled via a suitable communications network. The network elements are typically manufactured with alarms to indicate that a portion of the network element has failed. For example, routers commonly include alarms for detecting port failures and card failures. These alarms enable maintenance personnel and/or automated maintenance systems to easily determine the source of a failure and to efficiently resolve the failure.
- Alarms are generally limited to identifying those failures that the manufacturer chooses. In many cases, alarms are only included for fatal errors that result in the complete failure of a network element. Any failures at the network elements that do not result in an alarm are commonly referred to as “silent failures.” Silent failures can result in a number of problems that adversely affect customer traffic, such as packet loss or a reduction of two-way traffic into one-way traffic. Since silent failures by definition do not generate alarms, they are conventionally detected by customers who manually monitor their own network performance. This is especially problematic during off-hours when the customer may not be actively monitoring network performance. For example, a silent failure may occur at a business late on a Friday afternoon and not be discovered by the customer until Monday morning, thereby allowing the network problems to persist through the entire weekend to the business's detriment.
- When a customer detects a decrease in network performance (e.g., a reduction in data transmission rates), the customer typically contacts its corresponding service provider. The service provider may then manually deploy personnel to perform a variety of diagnostic tests in order to discover the cause of the decrease in network performance. In many cases, until these tests are completed, the service provider does not know whether the decrease in network performance is caused by a silent failure (i.e., a failure at the service provider's network elements) or by actions on the customer's side. Performing these tests is generally time-consuming and can lead to significant downtime for the customer.
- It should be appreciated that this Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
- Embodiments of the disclosure presented herein include methods, systems, and computer-readable media for identifying and resolving a silent failure in a telecommunications network. According to one aspect, a method for identifying and resolving a silent failure in a telecommunications network is provided. According to the method, performance data associated with data traffic passing through a network element in the telecommunications network is collected. A determination is made whether the performance data has fallen below a threshold to identify the silent failure at the network element. The silent failure fails to trigger an alarm included on the network element. Responsive to determining that the performance data is below the threshold and thereby identifying the silent failure at the network element, troubleshooting rules may be retrieved. The silent failure is resolved based on the performance data and the troubleshooting rules.
- According to another aspect, a system for identifying and resolving a silent failure in a telecommunications network is provided. The system includes a memory and a processor functionally coupled to the memory. The memory stores a program containing code for identifying and resolving the silent failure in the telecommunications network. The processor is responsive to computer-executable instructions contained in the program and operative to collect performance data associated with data traffic passing through a network element in the telecommunications network; determine whether the performance data has fallen below a threshold to identify the silent failure at the network element; responsive to determining that the performance data is below the threshold and thereby identifying the silent failure at the network element, retrieve troubleshooting rules; and resolve the silent failure based on the performance data and the troubleshooting rules. The silent failure fails to trigger an alarm included on the network element.
- According to yet another aspect, a computer-readable medium having instructions stored thereon for execution by a processor to perform a method for identifying and resolving a silent failure in a telecommunications network is provided. According to the method, performance data associated with data traffic passing through a network element in the telecommunications network is collected. A determination is made whether the performance data has fallen below a threshold to identify the silent failure at the network element. The silent failure fails to trigger an alarm included on the network element. Responsive to determining that the performance data is below the threshold and thereby identifying the silent failure at the network element, troubleshooting rules may be retrieved. The silent failure is resolved based on the performance data and the troubleshooting rules.
- Other systems, methods, and/or computer program products according to embodiments will be or become apparent to one with skill in the art upon review of the following drawings and detailed description. It is intended that all such additional systems, methods, and/or computer program products be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.
-
FIG. 1 is a diagram illustrating a network architecture operative to identify and resolve a silent failure in a telecommunications network, in accordance with exemplary embodiments. -
FIG. 2 is a flow diagram illustrating a method for identifying and resolving a silent failure in a telecommunications network, in accordance with exemplary embodiments. -
FIG. 3 is a computer architecture diagram showing aspects of an illustrative computer hardware architecture for a computing system capable of implementing aspects of the embodiments presented herein. - The following detailed description is directed to identifying and resolving silent failures in a telecommunications network. While the subject matter described herein is presented in the general context of program modules that execute in conjunction with the execution of an operating system and application programs on a computer system, those skilled in the art will recognize that other implementations may be performed in combination with other types of program modules. Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the subject matter described herein may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
- In the following detailed description, references are made to the accompanying drawings that form a part hereof, and which are shown by way of illustration, using specific embodiments or examples. Referring now to the drawings, in which like numerals represent like elements through the several figures, aspects of a computing system and methodology for detecting silent failures in a telecommunications network will be described.
FIG. 1 shows an illustrative telecommunications network architecture 100 according to exemplary embodiments. The architecture 100 includes an Internet backbone 102, a core network 104, an access network 106, and a customer premises 108. The customer premises 108 may include a variety of customer devices (that may be used by users other than the customers), such as telephones and computers. In one embodiment, these customer devices are functionally coupled to the access network 106 via a router (not shown). In particular, the access network 106, which may be operated by a service provider, may include a Digital Subscriber Line Access Multiplexer (“DSLAM”) (not shown) functionally coupled to the router via a local loop. The DSLAM functionally couples the access network 106 to the core network 104, which provides a variety of services to customers connected to the access network 106. Further, the core network 104 provides access to the Internet backbone 102 to enable communications with other private networks and Internet Service Providers (“ISPs”). The Internet backbone 102, the core network 104, the access network 106, and the customer premises 108 are well known to those skilled in the art as common components in telecommunications infrastructures, and as such, are not described in greater detail herein. - The
architecture 100 further includes a network health monitoring module 110, a rule management module 112, a rule store 114, a trouble diagnostics module 116, and a ticketing module 118. According to embodiments, the network health monitoring module 110 monitors incoming and outgoing data traffic at the core network 104 and the access network 106 in order to collect current performance data, such as the number of packets being transmitted and/or received within a given interval. The performance data may include any suitable data that indicates the relative performance of the core network 104 and the access network 106. In one embodiment, the performance data is retrieved from the network elements present in the core network 104 and the access network 106. The network health monitoring module 110 then compares the current performance data with a given threshold in order to determine whether a change in the performance data has occurred. In particular, a significant and detrimental change in the performance data may be an indication of a silent failure. If the current performance data exceeds or falls below the threshold, the network health monitoring module 110 informs the rule management module 112 that a possible silent failure has been detected. An example threshold may be a number of packets transmitted within a given time period. Other suitable indicators of network performance may be similarly utilized as contemplated by those skilled in the art. - According to embodiments, the
rule management module 112 retrieves troubleshooting rules from the rule store 114 and provides the retrieved troubleshooting rules to the trouble diagnostics module 116, which executes automated diagnostics and recovery procedures in accordance with the troubleshooting rules and the performance data acquired by the network health monitoring module 110. For a silent failure that the trouble diagnostics module 116 cannot resolve under automated procedures, the rule management module 112 may inform the ticketing module 118 to generate a trouble ticket, which is then provided to a service provider (not shown). Responsive to receiving the trouble ticket, the service provider may dispatch maintenance personnel or perform other procedures in order to resolve the silent failure as contemplated by those skilled in the art. - The
rule management module 112 may further determine whether the silent failure is due to a switchover by performing a root cause analysis. As used herein, a switchover refers to a “switch over” from a primary device to a standby device when the primary device fails. The switchover may cause a silent failure if the primary device is deactivated while the standby device is not properly activated, resulting in neither the primary device nor the standby device being operative. If the rule management module 112 determines that the silent failure is due to the switchover, the rule management module 112 may inform the ticketing module 118 to generate a trouble ticket. - In conventional practice, service providers typically do not take proactive steps to identify silent failures, relying primarily on alarms that are built into network elements by their manufacturers. In many cases, the service providers do not discover the presence of a silent failure until a customer complains about degradation in their data traffic. This often leads to unhappy and dissatisfied customers. The network
health monitoring module 110, the rule management module 112, and the trouble diagnostics module 116 enable service providers to take a proactive approach towards identifying and resolving silent failures. In particular, the network health monitoring module 110, the rule management module 112, and the trouble diagnostics module 116 can identify and resolve silent failures prior to the customer complaining. This is particularly useful if the silent failure occurs, for example, on a late Friday afternoon before the weekend or before an extended holiday break. In these cases, the service provider can begin resolving the silent failure immediately, instead of waiting until the customer complains on the next work day. -
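The division of labor among the modules described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of the disclosure: the function names, the representation of a troubleshooting rule as a callable, and the use of a packet count as the only performance indicator are all hypothetical. The sketch also checks only the "falls below" case.

```python
from typing import Callable, List

# Hypothetical rule type: a recovery procedure that receives the observed
# packet count and returns True if it resolved the failure.
Rule = Callable[[int], bool]


def is_possible_silent_failure(packets_in_interval: int, threshold: int) -> bool:
    """Monitoring step: flag the element when traffic falls below the threshold."""
    return packets_in_interval < threshold


def handle_element(packets_in_interval: int, threshold: int,
                   rules: List[Rule]) -> str:
    """Detect, attempt automated recovery, and fall back to a trouble ticket."""
    if not is_possible_silent_failure(packets_in_interval, threshold):
        return "healthy"
    # Diagnostics step: apply each retrieved rule until one succeeds.
    for rule in rules:
        if rule(packets_in_interval):
            return "resolved"
    # Ticketing step: automated procedures could not resolve the failure.
    return "ticket"


if __name__ == "__main__":
    never_fixes: Rule = lambda packets: False
    always_fixes: Rule = lambda packets: True
    print(handle_element(900, 500, [never_fixes]))
    print(handle_element(100, 500, [never_fixes, always_fixes]))
    print(handle_element(100, 500, [never_fixes]))
```

Returning a status string keeps the sketch small; a fuller design would presumably pass the performance data and the element identity on to the ticketing step.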
FIG. 2 is a flow diagram illustrating a method 200 for identifying and resolving silent failures in a telecommunications network. According to the method 200, the network health monitoring module 110 collects (at 202) current performance data associated with incoming and/or outgoing traffic from the core network 104 and the access network 106. In particular, the network health monitoring module 110 may collect the current performance data for a given time interval. An example of performance data is the number of packets being transmitted and/or received within the given time interval. In one embodiment, the performance data is obtained directly from the network elements by querying the network elements for the performance data. - Responsive to collecting the current performance data, the network
health monitoring module 110 compares (at 204) the current performance data with a given threshold in order to determine whether the data traffic being transmitted and/or received through the core network 104 has degraded to a level that indicates a silent failure. In one embodiment, a difference is determined between the current performance data and previous performance data. The difference is then compared with a minimum value indicating a silent failure. Thus, if the difference is greater than the minimum value, then the current performance data has fallen below the threshold. If the difference is less than the minimum value, then the current performance data has not fallen below the threshold. - Network elements may include single-port network elements and dual-port network elements. In a single-port network element, data traffic flows through the single port. In this case, the current performance data obtained from the network element is necessarily associated with the single port. In a dual-port (or multi-port) network element, one port typically serves as a primary port while another port serves as a standby port. If the primary port fails, then the standby port can become active. In this case, the network
health monitoring module 110 may retrieve performance data associated with both ports in order to determine which port is active. Responsive to determining which port is active, the network health monitoring module 110 can then compare the performance data associated with the active port against the threshold. - If it is determined (at 206) that the current performance data does not fall below the threshold, then the network
health monitoring module 110 continues to monitor the core network 104 and the access network 106 by collecting (at 202) performance data from the core network 104 and the access network 106. If it is determined (at 206) that the current performance data falls below the threshold, then the network health monitoring module 110 retrieves (at 208) troubleshooting rules from the rule store 114 and provides the troubleshooting rules to the trouble diagnostics module 116. - Responsive to receiving the troubleshooting rules and the performance data from the
rule management module 112, the trouble diagnostics module 116 performs (at 210) various actions in order to isolate and resolve (or attempt to resolve) the silent failure in accordance with the troubleshooting rules and the performance data obtained by the network health monitoring module 110. If the trouble diagnostics module 116 determines (at 212) that the silent failure is successfully resolved, then the trouble diagnostics module 116 resets (at 214) the network element where the silent failure was identified. If it is determined (at 212) that the silent failure was not successfully resolved, then the trouble diagnostics module 116 may not reset the identified network element and may inform the ticketing module 118, which generates (at 216) a trouble ticket for the silent failure. - In an illustrative implementation of a troubleshooting process in which two-way traffic has degraded into one-way traffic, the
trouble diagnostics module 116 examines User-to-Network Interface (“UNI”) ports on a given network element, which in this case is an Asynchronous Transfer Mode (“ATM”) switch, and at the customer premises 108 through the access network 106. In particular, the trouble diagnostics module 116 examines the UNI ports of virtual paths (“VPs”) or virtual channels (“VCs”) associated with the network element. The trouble diagnostics module 116 may examine traffic at the port level (i.e., at the network element) as well as the endpoint level (i.e., at the customer premises 108) with respect to the UNI ports. For example, if one-way traffic is found at the port level of a given UNI port, then the trouble diagnostics module 116 may isolate the silent failure to a processor card containing the UNI port. - However, if the traffic at the port level appears to be normal but one-way traffic is found at the endpoint level, then the
trouble diagnostics module 116 may examine other endpoints, if available, in order to determine whether the other endpoints are experiencing the same issue. Responsive to finding a second endpoint that is experiencing two-way traffic, the trouble diagnostics module 116 may return to the original endpoint experiencing the one-way traffic and optimize (or attempt to optimize) the original endpoint's associated path in accordance with the second endpoint's associated path. That is, the trouble diagnostics module 116 may trigger a re-route attempt. If multiple VPs or VCs experience problems, then the trouble diagnostics module 116 may examine multiple path points along the VPs or VCs in order to find common points, which may indicate the source of the silent failure. - The
rule management module 112 may further perform (at 218) a root cause analysis to determine whether the silent failure is due to a switchover. An illustrative implementation of the root cause analysis involving an Automatic Protection Switching (“APS”) scheme will now be described. According to exemplary embodiments, an APS scheme includes an active line and a protection line, each of which is associated with a separate APS-enabled processor card. When the active line fails, data traffic is switched from the active line to the protection line. In order to represent this transition, the APS-enabled processor card associated with the active line switches from an UP state into a DOWN state, and the APS-enabled processor card associated with the protection line switches from a DOWN state into an UP state. Even after the active line is recovered, the data traffic may remain on the protection line. In this example, the rule management module 112 may determine a switchover failure if both of the APS-enabled processor cards are in a DOWN state and neither the active line nor the protection line is handling data traffic. Responsive to performing the root cause analysis, if it is determined (at 220) that the silent failure is due to a switchover, then the rule management module 112 informs the ticketing module 118, which generates (at 222) a trouble ticket for the silent failure. If it is determined (at 220) that the silent failure is not due to a switchover, then the method 200 ends. -
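Three of the checks in the method 200 lend themselves to a short Python sketch: selecting the active port of a dual-port element, the difference test used at 204, and the APS switchover test used at 218. The sketch is illustrative only. The `Port` class, its `active` flag, and all function and parameter names are assumptions made for this example; the description does not specify how the active port or the card states are reported.

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    active: bool   # assumed to be reported by the element (hypothetical field)
    packets: int   # packets counted in the current interval


def active_port(primary: Port, standby: Port) -> Port:
    """Pick the port whose counters should be compared against the threshold."""
    return primary if primary.active else standby


def has_fallen_below_threshold(previous: int, current: int,
                               minimum_drop: int) -> bool:
    """Difference test: a drop greater than minimum_drop indicates a silent failure."""
    return (previous - current) > minimum_drop


def is_switchover_failure(active_card_up: bool, protection_card_up: bool,
                          active_line_packets: int,
                          protection_line_packets: int) -> bool:
    """Root cause test: both APS cards DOWN and no traffic on either line."""
    return (not active_card_up and not protection_card_up
            and active_line_packets == 0 and protection_line_packets == 0)


if __name__ == "__main__":
    port = active_port(Port("primary", False, 0), Port("standby", True, 40))
    print(port.name)
    print(has_fallen_below_threshold(previous=1000, current=port.packets,
                                     minimum_drop=500))
    print(is_switchover_failure(False, False, 0, 0))
```

The description leaves the case where the difference equals the minimum value unspecified; the sketch treats it as not having fallen below the threshold.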
FIG. 3 and the following discussion are intended to provide a brief, general description of a suitable computing environment in which embodiments may be implemented. While embodiments will be described in the general context of program modules that execute in conjunction with an application program that runs on an operating system on a computer system, those skilled in the art will recognize that the embodiments may also be implemented in combination with other program modules. - Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that embodiments may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. The embodiments may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
-
FIG. 3 is a block diagram illustrating a system 300 operative to identify and resolve a silent failure, in accordance with exemplary embodiments. The system 300 includes a processing unit 302, a memory 304, one or more user interface devices 306, one or more input/output (“I/O”) devices 308, and one or more network devices 310, each of which is operatively connected to a system bus 312. The bus 312 enables bi-directional communication between the processing unit 302, the memory 304, the user interface devices 306, the I/O devices 308, and the network devices 310. Examples of the system 300 include, but are not limited to, computers, servers, personal digital assistants, cellular phones, or any suitable computing devices. - The
processing unit 302 may be a standard central processor that performs arithmetic and logical operations, a more specific purpose programmable logic controller (“PLC”), a programmable gate array, or other type of processor known to those skilled in the art and suitable for controlling the operation of the server computer. Processing units are well-known in the art, and therefore not described in further detail herein. - The
memory 304 communicates with the processing unit 302 via the system bus 312. In one embodiment, the memory 304 is operatively connected to a memory controller (not shown) that enables communication with the processing unit 302 via the system bus 312. The memory 304 includes an operating system 314, one or more databases 315, and one or more program modules 316, according to exemplary embodiments. The program modules 316 may include the network health monitoring module 110, the rule management module 112, the trouble diagnostics module 116, and the ticketing module 118. In one embodiment, the method 200 as described above with respect to FIG. 2 is embodied as a program module in the memory 304. An example of the databases 315 is the rule store 114. Examples of operating systems, such as the operating system 314, include, but are not limited to, WINDOWS and WINDOWS MOBILE operating systems from MICROSOFT CORPORATION, MAC OS operating system from APPLE CORPORATION, LINUX operating system, SYMBIAN OS from SYMBIAN SOFTWARE LIMITED, BREW from QUALCOMM INCORPORATED, and FREEBSD operating system. - By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, Erasable Programmable ROM (“EPROM”), Electrically Erasable Programmable ROM (“EEPROM”), flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the
system 300. - The user interface devices 306 may include one or more devices with which a user accesses the
system 300. The user interface devices 306 may include, but are not limited to, computers, servers, personal digital assistants, cellular phones, or any suitable computing devices. In one embodiment, the I/O devices 308 are operatively connected to an I/O controller (not shown) that enables communication with the processing unit 302 via the system bus 312. The I/O devices 308 may include one or more input devices, such as, but not limited to, a keyboard, a mouse, or an electronic stylus. Further, the I/O devices 308 may include one or more output devices, such as, but not limited to, a display screen or a printer. - The
network devices 310 enable the system 300 to communicate with other networks or remote systems via a network 318. In one embodiment, the network 318 is an ATM-based network. Examples of network devices 310 may include, but are not limited to, a modem, a radio frequency (“RF”) or infrared (“IR”) transceiver, a telephonic interface, a bridge, a router, or a network card. The network 318 may include a wireless network such as, but not limited to, a Wireless Local Area Network (“WLAN”) such as a WI-FI network, a Wireless Wide Area Network (“WWAN”), a Wireless Personal Area Network (“WPAN”) such as BLUETOOTH, a Wireless Metropolitan Area Network (“WMAN”) such as a WiMAX network, or a cellular network. Alternatively, the network 318 may be a wired network such as, but not limited to, a Wide Area Network (“WAN”) such as the Internet, a Local Area Network (“LAN”) such as an Ethernet network, a wired Personal Area Network (“PAN”), or a wired Metropolitan Area Network (“MAN”). - Although the subject matter presented herein has been described in conjunction with one or more particular embodiments and implementations, it is to be understood that the embodiments defined in the appended claims are not necessarily limited to the specific structure, configuration, or functionality described herein. Rather, the specific structure, configuration, and functionality are disclosed as example forms of implementing the claims.
- The subject matter described above is provided by way of illustration only and should not be construed as limiting. Various modifications and changes may be made to the subject matter described herein without following the example embodiments and applications illustrated and described, and without departing from the true spirit and scope of the embodiments, which is set forth in the following claims.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/274,737 US7855952B2 (en) | 2008-11-20 | 2008-11-20 | Silent failure identification and trouble diagnosis |
Publications (2)
Publication Number | Publication Date |
---|---|
US20100124165A1 | 2010-05-20 |
US7855952B2 | 2010-12-21 |
Family
ID=42172004
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/274,737 (Expired - Fee Related), US7855952B2 (en) | Silent failure identification and trouble diagnosis | 2008-11-20 | 2008-11-20 |
Country Status (1)
Country | Link |
---|---|
US (1) | US7855952B2 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8667123B2 (en) * | 2008-09-29 | 2014-03-04 | Woodhead Industries, Inc. | Microcontroller network diagnostic system |
US8838745B2 (en) * | 2009-12-14 | 2014-09-16 | At&T Intellectual Property I, L.P. | Systems, methods and machine-readable mediums for integrated quality assurance brokering services |
CN103209093B (en) * | 2013-03-05 | 2015-11-25 | 青岛海信传媒网络技术有限公司 | Collecting method when network element is abnormal and system |
CN106385273B (en) * | 2016-09-13 | 2018-08-07 | 北京叮叮关爱科技有限公司 | Bluetooth communication based on response type and Bluetooth communication method |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020143494A1 (en) * | 2001-03-28 | 2002-10-03 | Jeff Conrad | System for time-bucketing of baselined data collector data |
US20040098223A1 (en) * | 2001-03-28 | 2004-05-20 | Jeff Conrad | Computing performance thresholds based on variations in network traffic patterns |
US20060168271A1 (en) * | 2001-05-07 | 2006-07-27 | Pabari Vipul J | Method and apparatus for measurement, analysis, and optimization of content delivery |
US20060277299A1 (en) * | 2002-04-12 | 2006-12-07 | John Baekelmans | Arrangement for automated fault detection and fault resolution of a network device |
US20060047807A1 (en) * | 2004-08-25 | 2006-03-02 | Fujitsu Limited | Method and system for detecting a network anomaly in a network |
US20060159017A1 (en) * | 2005-01-17 | 2006-07-20 | Seung-Cheol Mun | Dynamic quality of service (QoS) management |
US20070036544A1 (en) * | 2005-07-25 | 2007-02-15 | Yasuyuki Fukashiro | Optical network, node apparatus and method for relieving path fault |
US20070168505A1 (en) * | 2006-01-19 | 2007-07-19 | Hewlett-Packard Development Company, L.P. | Performance monitoring in a network |
US20080069002A1 (en) * | 2006-09-15 | 2008-03-20 | Sbc Knowledge Ventures, L.P. | In-band media performance monitoring |
US20080181100A1 (en) * | 2007-01-31 | 2008-07-31 | Charlie Chen-Yui Yang | Methods and apparatus to manage network correction procedures |
US20080281607A1 (en) * | 2007-05-13 | 2008-11-13 | System Services, Inc. | System, Method and Apparatus for Managing a Technology Infrastructure |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130230151A1 (en) * | 2008-12-12 | 2013-09-05 | At&T Intellectual Property I, L.P. | Methods and apparatus to trigger maintenance and upgrades of access networks |
US8654930B2 (en) * | 2008-12-12 | 2014-02-18 | At&T Intellectual Property I, L.P. | Methods and apparatus to trigger maintenance and upgrades of access networks |
US8769088B2 (en) * | 2011-09-30 | 2014-07-01 | International Business Machines Corporation | Managing stability of a link coupling an adapter of a computing system to a port of a networking device for in-band data communications |
US9229800B2 (en) | 2012-06-28 | 2016-01-05 | Microsoft Technology Licensing, Llc | Problem inference from support tickets |
US9262253B2 (en) | 2012-06-28 | 2016-02-16 | Microsoft Technology Licensing, Llc | Middlebox reliability |
US9213727B1 (en) * | 2012-09-28 | 2015-12-15 | Emc Corporation | Methods and apparatus for obtaining database performance snapshots |
US10075347B2 (en) | 2012-11-15 | 2018-09-11 | Microsoft Technology Licensing, Llc | Network configuration in view of service level considerations |
US9325748B2 (en) | 2012-11-15 | 2016-04-26 | Microsoft Technology Licensing, Llc | Characterizing service levels on an electronic network |
US9565080B2 (en) | 2012-11-15 | 2017-02-07 | Microsoft Technology Licensing, Llc | Evaluating electronic network devices in view of cost and service level considerations |
US9350601B2 (en) | 2013-06-21 | 2016-05-24 | Microsoft Technology Licensing, Llc | Network event processing and prioritization |
CN106664217A (en) * | 2014-06-20 | 2017-05-10 | 微软技术许可有限责任公司 | Identification of candidate problem network entities |
US20150372893A1 (en) * | 2014-06-20 | 2015-12-24 | Microsoft Corporation | Identification of candidate problem network entities |
US10135704B2 (en) * | 2014-06-20 | 2018-11-20 | Microsoft Technology Licensing, Llc | Identification of candidate problem network entities |
US20190081875A1 (en) * | 2014-06-20 | 2019-03-14 | Microsoft Technology Licensing, Llc | Identification of candidate problem network entities |
US10721145B2 (en) * | 2014-06-20 | 2020-07-21 | Microsoft Technology Licensing, Llc | Identification of candidate problem network entities |
US10728085B1 (en) * | 2015-09-15 | 2020-07-28 | Amazon Technologies, Inc. | Model-based network management |
US10951462B1 (en) * | 2017-04-27 | 2021-03-16 | 8X8, Inc. | Fault isolation in data communications centers |
US11876666B1 (en) | 2017-04-27 | 2024-01-16 | 8X8, Inc. | Fault isolation in data communications centers |
JP2020088786A (en) * | 2018-11-30 | 2020-06-04 | 富士通株式会社 | Switch device and fault detection program |
JP7119957B2 (en) | 2018-11-30 | 2022-08-17 | 富士通株式会社 | Switch device and failure detection program |
US20250088409A1 (en) * | 2023-03-16 | 2025-03-13 | Rakuten Mobile, Inc. | Estimation of router that is cause of silent failures |
US20250106095A1 (en) * | 2023-03-31 | 2025-03-27 | Rakuten Mobile, Inc. | Estimation of router that is cause of silent failures |
Also Published As
Publication number | Publication date |
---|---|
US7855952B2 (en) | 2010-12-21 |
Similar Documents
Publication | Title |
---|---|
US7855952B2 (en) | Silent failure identification and trouble diagnosis |
EP1603272B1 (en) | Communication network event logging systems and methods |
US6978302B1 (en) | Network management apparatus and method for identifying causal events on a network |
US7280486B2 (en) | Detection of forwarding problems for external prefixes |
US10153950B2 (en) | Data communications performance monitoring |
KR20080055744A (en) | Telecommunication-based Link Monitoring System |
US8976681B2 (en) | Network system, network management server, and OAM test method |
US20100265832A1 (en) | Method and apparatus for managing a slow response on a network |
US8332690B1 (en) | Method and apparatus for managing failures in a datacenter |
US20050204214A1 (en) | Distributed montoring in a telecommunications system |
US20060168263A1 (en) | Monitoring telecommunication network elements |
US10708155B2 (en) | Systems and methods for managing network operations |
US8205116B2 (en) | Common chronics resolution management |
US20040006619A1 (en) | Structure for event reporting in SNMP systems |
EP3232620B1 (en) | Data center based fault analysis method and device |
EP1703671B1 (en) | Device and method for network monitoring |
US20050015683A1 (en) | Method, system and computer program product for improving system reliability |
GB2452025A (en) | Alarm event management for a network with alarm event storm detection and management mode |
CN105281927A (en) | Method and device for multilink protection switching |
KR101078461B1 (en) | System for survellance of network trouble using voice of customer information and method thereof |
US7673035B2 (en) | Apparatus and method for processing data relating to events on a network |
GB2372674A (en) | Network management |
JP4364879B2 (en) | Failure notification system, failure notification method and failure notification program |
KR100887874B1 (en) | Disability Management System and Method in Internet Network |
US20230216771A1 (en) | Algorithm for building in-context report dashboards |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: AT&T INTELLECTUAL PROPERTY I, L.P., NEVADA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: YANG, CHEN-YUI; HUNT, MARK; LAMBERT, MICHAEL; AND OTHERS; SIGNING DATES FROM 20081115 TO 20081119; REEL/FRAME: 021868/0252 |
FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
FPAY | Fee payment | Year of fee payment: 4 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); Year of fee payment: 8 |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
FP | Lapsed due to failure to pay maintenance fee | Effective date: 20221221 |