US11086738B2 - System and method to automate solution level contextual support - Google Patents

System and method to automate solution level contextual support

Info

Publication number
US11086738B2
Authority
US
United States
Prior art keywords
component
computing
support
state information
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US15/961,237
Other versions
US20190324873A1 (en)
Inventor
Dharmesh M. Patel
Ravikanth Chaganti
Rizwan Ali
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
EMC Corp
Original Assignee
EMC IP Holding Co LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US15/961,237
Application filed by EMC IP Holding Co LLC
Assigned to EMC IP Holding Company LLC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CHAGANTI, RAVIKANTH; ALI, RIZWAN; PATEL, DHARMESH M.
Assigned to CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, AS COLLATERAL AGENT. PATENT SECURITY AGREEMENT (CREDIT). Assignors: DELL PRODUCTS L.P.; EMC CORPORATION; EMC IP Holding Company LLC.
Assigned to THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT. PATENT SECURITY AGREEMENT (NOTES). Assignors: DELL PRODUCTS L.P.; EMC CORPORATION; EMC IP Holding Company LLC.
Assigned to THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A. SECURITY AGREEMENT. Assignors: CREDANT TECHNOLOGIES, INC.; DELL INTERNATIONAL L.L.C.; DELL MARKETING L.P.; DELL PRODUCTS L.P.; DELL USA L.P.; EMC CORPORATION; EMC IP Holding Company LLC; FORCE10 NETWORKS, INC.; WYSE TECHNOLOGY L.L.C.
Publication of US20190324873A1
Assigned to THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A. SECURITY AGREEMENT. Assignors: CREDANT TECHNOLOGIES INC.; DELL INTERNATIONAL L.L.C.; DELL MARKETING L.P.; DELL PRODUCTS L.P.; DELL USA L.P.; EMC CORPORATION; EMC IP Holding Company LLC; FORCE10 NETWORKS, INC.; WYSE TECHNOLOGY L.L.C.
Publication of US11086738B2
Application granted
Assigned to EMC IP Holding Company LLC; DELL PRODUCTS L.P.; EMC CORPORATION. RELEASE OF SECURITY INTEREST AT REEL 046286 FRAME 0653. Assignors: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH.
Assigned to DELL PRODUCTS L.P.; EMC IP Holding Company LLC; EMC CORPORATION. RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (046366/0014). Assignors: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT.
Legal status: Active (expiration adjusted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/2023: Failover techniques
    • G06F 11/3006: Monitoring arrangements specially adapted to the computing system or computing system component being monitored, where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F 11/0793: Remedial or corrective actions
    • G06F 11/3055: Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G06F 11/3409: Recording or statistical evaluation of computer activity for performance assessment
    • G06F 11/3495: Performance evaluation by tracing or monitoring for systems
    • G06F 11/20: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F 2201/82: Indexing scheme relating to error detection, to error correction, and to monitoring; Solving problems relating to consistency

Definitions

  • Computing devices may include any number of internal components such as processors, memory, and persistent storage. Each of the internal components may operate with other components of the computing devices. For example, some processors store generated data in a persistent storage and may utilize capacity of the memory to perform computations.
  • multiple computing devices may cooperate to accomplish a task.
  • multiple computing devices may perform different computations that may be used, in turn, to generate a final result.
  • a support engine for managing computing clusters in accordance with one or more embodiments of the invention includes a persistent storage and a processor.
  • the persistent storage includes monitoring policies.
  • the processor monitors a computing cluster of the computing clusters and identifies a potential component failure of the computing cluster based on the monitoring and the monitoring policies.
  • the processor identifies an error state of the computing cluster; obtains solution level state information from the computing cluster based on the identified error state; generates a support package comprising the solution level state information; and initiates a support session by sending the generated support package to a support manager.
  • a method for managing computing clusters in accordance with one or more embodiments of the invention includes monitoring a computing cluster of the computing clusters and identifying a potential component failure of the computing cluster based on the monitoring and monitoring policies. The method further includes, in response to identifying the potential component failure, identifying an error state of the computing cluster; obtaining solution level state information from the computing cluster based on the identified error state; generating a support package comprising the solution level state information; and initiating a support session by sending the generated support package to a support manager to correct the potential component failure.
  • a non-transitory computer readable medium in accordance with one or more embodiments of the invention includes computer readable program code, which when executed by a computer processor enables the computer processor to perform a method for managing computing clusters. The method includes monitoring a computing cluster of the computing clusters and identifying a potential component failure of the computing cluster based on the monitoring and monitoring policies. The method further includes, in response to identifying the potential component failure, identifying an error state of the computing cluster; obtaining solution level state information from the computing cluster based on the identified error state; generating a support package comprising the solution level state information; and initiating a support session by sending the generated support package to a support manager to correct the potential component failure.
  • FIG. 1 shows a diagram of a system in accordance with one or more embodiments of the invention.
  • FIG. 2 shows a diagram of an example support manager in accordance with one or more embodiments of the invention.
  • FIG. 3 shows a diagram of an example solution manager in accordance with one or more embodiments of the invention.
  • FIG. 4 shows a diagram of an example support engine in accordance with one or more embodiments of the invention.
  • FIG. 5A shows a diagram of example monitoring policies in accordance with one or more embodiments of the invention.
  • FIG. 5B shows a diagram of example actions of monitoring policies in accordance with one or more embodiments of the invention.
  • FIG. 6 shows a diagram of a flowchart of a method of correcting a solution in accordance with one or more embodiments of the invention.
  • FIG. 7 shows a diagram of a flowchart of a method of managing computing clusters of a solution in accordance with one or more embodiments of the invention.
  • FIG. 8 shows a diagram of a flowchart of a method of selecting replacement components in accordance with one or more embodiments of the invention.
  • FIG. 9A shows a diagram of a malfunctioning computing cluster.
  • FIG. 9B shows a diagram of the computing cluster of FIG. 9A after corrective actions have been performed.
  • FIG. 10 shows a diagram of a computing device in accordance with one or more embodiments of the invention.
  • any component described with regard to a figure, in various embodiments of the invention, may be equivalent to one or more like-named components described with regard to any other figure.
  • descriptions of these components will not be repeated with regard to each figure.
  • each and every embodiment of the components of each figure is incorporated by reference and assumed to be optionally present within every other figure having one or more like-named components.
  • any description of the components of a figure is to be interpreted as an optional embodiment, which may be implemented in addition to, in conjunction with, or in place of the embodiments described with regard to a corresponding like-named component in any other figure.
  • a solution may include one or more computing clusters. Each of the computing clusters may include computing devices. Each of the computing devices may include individual components.
  • the solution may orchestrate the computing devices to perform predetermined actions to accomplish a goal.
  • failure of any component of the solution may impact the performance of any portion of the solution.
  • failure of a data storage device of a computing device may manifest itself as a decrease in the transaction rate of a database application hosted by a second computing device.
  • a system, in one or more embodiments of the invention, includes a support engine that monitors the components of a solution.
  • the support engine may detect potential component failures based upon the decrease in performance of a portion of the solution.
  • the detected potential component failures may be real or may be the manifestations of failures of other components.
  • the support engine may automatically obtain solution level state information.
  • the solution level state information may include various performance metrics regarding the overall functionality of the solution.
  • the solution level state information may be used to determine a corrective action to be performed. In this manner, embodiments of the invention may quickly and automatically obtain all of the information that may be necessary to identify a component that has truly failed. Doing so reduces the amount of time required to identify a corrective action that actually remediates the failure of the component in the solution.
  • FIG. 1 shows a system in accordance with one or more embodiments of the invention.
  • the system may include a support manager ( 100 ) that manages a solution ( 115 ) by diagnosing problems with the supported solution ( 115 ) and performing corrective actions.
  • the system may include a support engine ( 120 ) that obtains information regarding the state of computing clusters ( 130 ).
  • the support engine ( 120 ) may provide the obtained information to the support manager ( 100 ) to enable it to perform its management functions.
  • the support manager ( 100 ) may identify replacement hardware via a solution manager ( 110 ).
  • Each of the aforementioned components may be operably connected by any combination of wired and wireless networks. Each component is discussed below.
  • the support manager ( 100 ) diagnoses and remediates problems of the computing clusters ( 130 ) that may otherwise disrupt the functionality of the computing clusters ( 130 ). For example, computing cluster A ( 130 A) may report a decrease in the ability of one computing device of the cluster to communicate with other computing devices of the cluster. In such a scenario, the support manager ( 100 ) may identify a problem with computing cluster A ( 130 A), identify a solution, and initiate corrective actions in computing cluster A ( 130 A) to address the identified problem.
  • the support manager ( 100 ) may obtain state information of the computing clusters ( 130 ) from the support engine ( 120 ), or another entity. Similarly, when a hardware problem with a computing cluster is identified, the support manager ( 100 ) may identify a replacement component via the solution manager ( 110 ). For additional details regarding the support manager ( 100 ), see FIG. 2 .
  • the support manager ( 100 ) is a computing device.
  • the computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, or a cloud resource.
  • the computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.).
  • the persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions of the support manager ( 100 ) described in this application and/or perform all or a portion of the methods illustrated in FIGS. 6-8 .
  • For additional details regarding computing devices, see FIG. 10 .
  • the support manager ( 100 ) is a logical device.
  • a logical device may be a virtual device that utilizes the computing resources of any number of computing devices to perform its functions.
  • the logical device may be implemented as computer instructions, e.g., computer code, that when executed by the processor(s) of one or more computing devices cause the computing devices to perform the functions of the support manager ( 100 ) described in this application and/or perform all or a portion of the methods illustrated in FIGS. 6-8 .
  • the solution manager ( 110 ) identifies replacement hardware for components of the computing clusters ( 130 ). Due to underlying hardware or software requirements, each of the components may only be compatible with specific hardware components. Further, due to finite limitations, not all types of hardware may be available at all points in time. The solution manager ( 110 ) may identify replacement hardware for a computing cluster by identifying hardware that is both compatible with the computing cluster and available for deployment. For additional details regarding the solution manager, see FIG. 3 .
  • the solution manager ( 110 ) is a computing device.
  • the computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, or a cloud resource.
  • the computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.).
  • the persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions of the solution manager ( 110 ) described in this application and/or perform all or a portion of the methods illustrated in FIGS. 6-8 .
  • For additional details regarding computing devices, see FIG. 10 .
  • the solution manager ( 110 ) is a logical device.
  • a logical device may be a virtual device that utilizes the computing resources of any number of computing devices to perform its functions.
  • the logical device may be implemented as computer instructions, e.g., computer code, that when executed by the processor(s) of one or more computing devices cause the computing devices to perform the functions of the solution manager ( 110 ) described in this application and/or perform all or a portion of the methods illustrated in FIGS. 6-8 .
  • the support engine ( 120 ) monitors the computing clusters ( 130 ). Based on the monitoring, the support engine ( 120 ) identifies an error state of the computing clusters ( 130 ). The support engine ( 120 ) may perform monitoring based on instructions or configurations received from the support manager ( 100 ). For example, the support manager ( 100 ) may provide thresholds that when exceeded trigger the support engine ( 120 ) to obtain detailed state information from the computing clusters ( 130 ). The support engine ( 120 ) may provide the obtained information to the support manager ( 100 ). For additional details regarding the support engine, see FIG. 4 .
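  • By way of a hypothetical illustration (the patent does not specify an implementation), the threshold-based monitoring described above may be sketched as follows; the metric names, threshold values, and function signature are assumptions chosen for clarity:

```python
# Hypothetical sketch of threshold-triggered monitoring: the support
# manager supplies thresholds, and the support engine periodically
# compares observed performance statistics against them.

# Thresholds as configured by the support manager: metric -> minimum value.
thresholds = {
    "database_transactions_per_sec": 500.0,
    "interconnect_bandwidth_mbps": 1000.0,
}

def check_cluster(metrics: dict) -> list:
    """Return (metric, observed, minimum) for every threshold breach."""
    breaches = []
    for name, minimum in thresholds.items():
        observed = metrics.get(name)
        if observed is not None and observed < minimum:
            breaches.append((name, observed, minimum))
    return breaches

# A sampled set of performance statistics from one computing cluster.
sample = {"database_transactions_per_sec": 310.0,
          "interconnect_bandwidth_mbps": 1450.0}
for name, observed, minimum in check_cluster(sample):
    print(f"potential component failure: {name}={observed} < {minimum}")
```

A breach would then trigger the support engine to obtain detailed state information, as described above.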
  • the support engine ( 120 ) is a computing device.
  • the computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, or a cloud resource.
  • the computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.).
  • the persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions of the support engine ( 120 ) described in this application and/or perform all or a portion of the methods illustrated in FIGS. 6-8 .
  • For additional details regarding computing devices, see FIG. 10 .
  • the support engine ( 120 ) is a logical device.
  • a logical device may be a virtual device that utilizes the computing resources of any number of computing devices to perform its functions.
  • the logical device may be implemented as computer instructions, e.g., computer code, that when executed by the processor(s) of one or more computing devices cause the computing devices to perform the functions of the support engine ( 120 ) described in this application and/or perform all or a portion of the methods illustrated in FIGS. 6-8 .
  • the computing clusters are physical devices that perform computations.
  • Each of the computing clusters may operate independently to perform different functions or may be orchestrated to cooperatively perform a predetermined function. In such a scenario, each of the computing clusters may perform similar or different functions cooperatively to accomplish a shared goal.
  • a first computing cluster may provide computing functionality while a second computing cluster may provide storage functionality.
  • Any number of computing clusters may perform similar or different functions without departing from the invention.
  • each computing cluster includes a number of computing devices.
  • Each computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, or a cloud resource.
  • the computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.).
  • the persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions of the computing clusters ( 130 A, 130 N) described in this application.
  • For additional details regarding computing devices, see FIG. 10 .
  • FIG. 2 shows a diagram of an example support manager ( 200 ) in accordance with one or more embodiments of the invention.
  • the example support manager ( 200 ) may: (i) monitor the computing clusters, (ii) identify potential component failures, (iii) obtain solution level state information in response to identifying potential component failures, and (iv) initiate the performance of corrective actions to reduce the likelihood that the identified potential component failures will impact the functionality of the computing clusters.
  • the example support manager ( 200 ) may include a session manager ( 210 ) and a persistent storage ( 220 ). Each component of the example support manager ( 200 ) is discussed below.
  • the session manager ( 210 ) configures support engines to monitor computing clusters, obtains alerts from the support engines when a cluster is identified as having a potential component failure, obtains solution level state information in response to an identified potential component failure, and initiates performance of a corrective action.
  • the corrective action may be a solution level corrective action, i.e., based on solution level state information, rather than a component level corrective action, i.e., based on component level state information.
  • the session manager ( 210 ) may initiate a support session with the support engine.
  • the support session may be associated with a computing cluster, e.g., a triggering computing cluster, that exceeded a threshold or other trigger criteria.
  • the session may be placed in a queue for analysis by the system. After being placed in the queue, the session manager ( 210 ) may obtain solution level state information from the triggering computing cluster.
  • the solution level state information may include a solution type of the computing cluster, e.g., a predetermined function of the computing cluster.
  • the solution level state information may include a hardware state of the computing device hosting the component that triggered the potential component failure, i.e., the host computing device.
  • the hardware state may be a listing of the physical components of the aforementioned computing device.
  • the solution level state information may include the hardware state of other computing devices of the triggering cluster that are operably connected to the host computing device.
  • the solution level state information may include a software state of the host computing device, e.g., a listing of the applications, firmware, and associated settings for the applications and firmware of the host computing device.
  • the solution level state information may include the software state of other computing devices of the triggering cluster that are operably connected to the host computing device.
  • the solution level state information may include a transaction log of the host computing device, e.g., a listing of the actions performed over a predetermined period of time.
  • the solution level state information may include transaction logs of other computing devices of the triggering cluster that are operably connected to the host computing device.
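  • Taken together, the solution level state information enumerated above might be represented by a container such as the following hypothetical sketch; the field names and types are illustrative assumptions, not structures specified by the patent:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SolutionLevelState:
    """Illustrative container for solution level state information."""
    solution_type: str                 # e.g., the cluster's predetermined function
    host_hardware: List[str]           # physical components of the host device
    host_software: Dict[str, str]      # application/firmware name -> version
    host_transaction_log: List[str]    # actions over a predetermined period
    # Corresponding state of operably connected computing devices of the
    # triggering cluster, keyed by a device identifier.
    peer_hardware: Dict[str, List[str]] = field(default_factory=dict)
    peer_software: Dict[str, Dict[str, str]] = field(default_factory=dict)
    peer_transaction_logs: Dict[str, List[str]] = field(default_factory=dict)
```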
  • the session manager ( 210 ) may determine that a solution level corrective action involves replacing a physical component of a computing cluster. In such a scenario, the session manager ( 210 ) may send a request to a solution manager to identify replacement hardware. The request may specify the component to be replaced. The solution manager may respond by providing an identity of replacement hardware that is available to facilitate the replacement of the component.
  • the session manager ( 210 ) is a hardware device including circuitry.
  • the session manager ( 210 ) may be, for example, a digital signal processor, a field programmable gate array, or an application specific integrated circuit.
  • the session manager ( 210 ) may be other types of hardware devices without departing from the invention.
  • the session manager ( 210 ) is implemented as computing code stored on a persistent storage that when executed by a processor performs the functionality of the session manager ( 210 ).
  • the processor may be a hardware processor including circuitry such as, for example, a central processing unit or a microcontroller.
  • the processor may be other types of hardware devices for processing digital information without departing from the invention.
  • the session manager ( 210 ) may perform all or a portion of the methods illustrated in FIGS. 6-8 .
  • the session manager ( 210 ) may utilize data structures stored in the persistent storage ( 220 ).
  • the persistent storage ( 220 ) is a storage device that stores data structures.
  • the persistent storage may be a physical or virtual device.
  • the persistent storage ( 220 ) may include hard disk drives, solid state drives, tape drives, and other components to provide data storage functionality.
  • the persistent storage ( 220 ) may be a virtual device that utilizes the physical computing resources of other components to provide data storage functionality.
  • the persistent storage ( 220 ) stores sessions ( 220 A), support engine settings ( 220 B), a software repository ( 220 C), and a knowledge base ( 220 D).
  • the persistent storage ( 220 ) may store additional data structures without departing from the invention.
  • the sessions ( 220 A) may be data structures that include information regarding support sessions; the support sessions may be ongoing or previously completed.
  • the sessions ( 220 A) may include information obtained from a support engine used by the session manager ( 210 ) to identify a corrective action.
  • the sessions ( 220 A) may include component state information and/or solution state information used by the session manager ( 210 ) to identify a solution level corrective action.
  • Each session may be associated with a place in a support queue and a potential component failure that caused the session to be initiated.
  • Each session may be generated by the session manager ( 210 ) when a support engine notifies the example support manager ( 200 ) that a component of a computing cluster has triggered a potential component failure.
  • the support engine settings ( 220 B) may be a data structure that includes settings used by the session manager ( 210 ) to configure support engines. Specifically, the support engine settings ( 220 B) may be used to configure component policies of the support engines that are used to determine whether a component is potentially in a failure state. For additional details regarding component policies, see FIGS. 5A and 5B .
  • the software repository ( 220 C) may be a data structure that includes copies of software used by the session manager ( 210 ) to initiate the performance of solution level corrective actions. For example, in a scenario in which the session manager ( 210 ) makes a determination that a solution level corrective action involves a software replacement, the session manager ( 210 ) may obtain a copy of the replacement software and/or software settings and send the copy to a computing device for implementation, i.e., replacing existing software and/or software settings.
  • the software may be applications and/or firmware.
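  • As a hypothetical sketch of this mechanism, the software repository might be keyed by software name and version, with the session manager fetching a stored copy and handing it to a transport; the repository layout and the send callable are assumptions for illustration:

```python
# Illustrative software repository: (name, version) -> stored copy.
software_repository = {
    ("disk-firmware", "2.1"): b"...firmware image bytes...",
}

def push_replacement(name: str, version: str, send) -> None:
    """Fetch a stored copy of software and hand it to a transport callable."""
    image = software_repository[(name, version)]
    send(name=name, version=version, payload=image)

# Usage: send() might write the copy to a computing device for
# implementation, i.e., replacing existing software and/or settings.
push_replacement("disk-firmware", "2.1",
                 send=lambda **kw: print(f"installing {kw['name']} {kw['version']}"))
```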
  • the knowledge base ( 220 D) may be a data structure used by the session manager ( 210 ) to identify a solution level corrective action.
  • the knowledge base ( 220 D) may map component level and/or solution level state information to solution level corrective actions.
  • the mappings may be specified at any level of granularity.
  • the knowledge base ( 220 D) may be generated based on previous sessions. In other words, the contents of the knowledge base ( 220 D) may be generated heuristically.
  • the knowledge base ( 220 D) may be automatically updated by the session manager ( 210 ) upon completion of a support session. In other words, the session manager ( 210 ) may generate a new mapping between component level and/or solution level state information to a solution level corrective action that resulted in the elimination of the potential component failure, i.e., restored the performance of the computing cluster.
  • mappings of the knowledge base ( 220 D) are unconventional because the mappings assume that a potential component failure is not necessarily caused by the component itself. Rather, the aforementioned mappings make an assumption that the identification of a potential component failure is merely a symptom of a solution level defect. Thus, the knowledge base ( 220 D) mappings are between a solution state and a corrective action, not necessarily a component state and a corrective action.
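  • A minimal sketch of such a mapping, assuming solution level state is reduced to a (solution type, symptom) signature, follows; the signature scheme and the entries are illustrative assumptions:

```python
# Hypothetical knowledge base: solution state signature -> corrective action.
knowledge_base = {
    ("data storage", "interconnect degraded"): "replace cache SSD",
}

def lookup_action(solution_type: str, symptom: str):
    """Map a solution level state, not a component state, to an action."""
    return knowledge_base.get((solution_type, symptom))

def record_outcome(solution_type: str, symptom: str, action: str, fixed: bool):
    """Heuristically grow the knowledge base from completed sessions."""
    if fixed:
        knowledge_base[(solution_type, symptom)] = action

record_outcome("data storage", "transaction rate low", "replace cache SSD", True)
print(lookup_action("data storage", "transaction rate low"))  # replace cache SSD
```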
  • while the data structures of the persistent storage ( 220 ) are illustrated as separate data structures, the aforementioned data structures may be combined with each other and/or other data without departing from the invention. Additionally, while the aforementioned data structures are illustrated as being stored on the example support manager ( 200 ), the data structures may be stored on persistent storage of other devices without departing from the invention.
  • FIG. 3 shows a diagram of an example solution manager ( 300 ) in accordance with one or more embodiments of the invention.
  • the example solution manager ( 300 ) may identify replacement components for computing clusters. In other words, the solution manager may identify whether a replacement component both exists and is available, i.e., in stock or otherwise obtainable.
  • the example solution manager may include a replacement component identifier ( 310 ) that identifies replacement components in response to requests for replacements and a persistent storage ( 320 ) that stores data structures used by the replacement component identifier ( 310 ) to identify replacement components.
  • Each component of the example solution manager is discussed below.
  • the replacement component identifier ( 310 ) obtains requests from the support manager to identify a replacement component for a component.
  • the replacement component identifier ( 310 ) may use the data structures stored in the persistent storage ( 320 ) to identify a replacement component.
  • the data structures may enable a replacement component that both exists and is available to be identified.
  • the replacement component identifier ( 310 ) may notify the support manager of the identified replacement component in response to the request.
  • the replacement component identifier ( 310 ) may perform all or a portion of the method illustrated in FIG. 7 .
  • the replacement component identifier ( 310 ) is a hardware device including circuitry.
  • the replacement component identifier ( 310 ) may be, for example, a digital signal processor, a field programmable gate array, or an application specific integrated circuit.
  • the replacement component identifier ( 310 ) may be other types of hardware devices without departing from the invention.
  • the replacement component identifier ( 310 ) is implemented as computing code stored on a persistent storage that when executed by a processor performs the functionality of the replacement component identifier ( 310 ).
  • the processor may be a hardware processor including circuitry such as, for example, a central processing unit or a microcontroller.
  • the processor may be other types of hardware devices for processing digital information without departing from the invention.
  • the persistent storage ( 320 ) is a storage device that stores data structures.
  • the persistent storage ( 320 ) may be a physical or virtual device.
  • the persistent storage ( 320 ) may include hard disk drives, solid state drives, tape drives, and other components to provide data storage functionality.
  • the persistent storage ( 320 ) may be a virtual device that utilizes the physical computing resources of other components to provide data storage functionality.
  • the persistent storage ( 320 ) stores component to replacement mappings ( 320 A) and a replacement availability repository ( 320 B).
  • the persistent storage ( 320 ) may store additional data structures without departing from the invention.
  • the component to replacement mappings ( 320 A) may be data structures that map components to replacement components. Over time, a particular component may no longer be produced and, consequently, become unavailable.
  • the component to replacement mappings ( 320 A) may specify mappings between a component and all other types of components that are suitable replacements.
  • the replacement availability repository ( 320 B) may specify the availability for each replacement component specified by the component to replacement mappings ( 320 A).
  • the component to replacement mappings ( 320 A) may be used to identify all possible replacement components for a given component and the replacement availability repository ( 320 B) may specify the availability for each of the possible replacement components.
  • the aforementioned data structures may specify the availability for each possible replacement component.
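  • The two data structures might be sketched as follows (contents are illustrative assumptions): one maps a component to every suitable replacement, and the other records how many units of each replacement are available:

```python
# Hypothetical component-to-replacement mappings and availability repository.
component_to_replacements = {
    "SSD-MODEL-A": ["SSD-MODEL-B", "SSD-MODEL-C"],
}
replacement_availability = {"SSD-MODEL-B": 0, "SSD-MODEL-C": 12}

# All possible replacements for a component that are currently available.
available = [r for r in component_to_replacements["SSD-MODEL-A"]
             if replacement_availability.get(r, 0) > 0]
print(available)  # ['SSD-MODEL-C']
```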
  • while the data structures of the persistent storage ( 320 ) are illustrated as separate data structures, the aforementioned data structures may be combined with each other and/or other data without departing from the invention. Additionally, while the aforementioned data structures are illustrated as being stored on the example solution manager ( 300 ), the data structures may be stored on persistent storage of other devices without departing from the invention.
  • FIG. 4 shows a diagram of an example support engine ( 400 ) in accordance with one or more embodiments of the invention.
  • the example support engine ( 400 ) may: (i) monitor computing clusters, (ii) identify potential component failures, (iii) notify a support manager of the potential component failures to start support sessions, (iv) collect solution level state information, (v) generate a support package including the solution level state information, and (vi) send the generated support package to the support manager to initiate the support session.
  • the example support engine ( 400 ) may perform solution level corrective actions to address the potential component failures.
  • the solution level corrective actions may be specified by the support manager.
  • the example support engine ( 400 ) may include a cluster monitor ( 410 ) that performs both component level and solution level monitoring of computing clusters via cluster interfaces ( 420 ).
  • the example support engine ( 400 ) may also include a persistent storage ( 430 ) that stores data structures used by the cluster monitor ( 410 ) when monitoring the computing clusters.
  • the cluster monitor ( 410 ) monitors the computing clusters based on monitoring policies ( 430 A) stored in the persistent storage ( 430 ).
  • the cluster monitor ( 410 ) may monitor the operations of the computing clusters.
  • the monitoring may be performed by, for example, periodically obtaining performance statistics of the respective components of the computing clusters.
  • the performance statistics may be, for example, the compression ratio of data stored in the computing clusters, the transaction rate of applications executing on the clusters, the bandwidth between computing devices of the computing clusters, etc. Other performance statistics may be used without departing from the invention.
  • the cluster monitor may initiate a support session with a support manager, perform solution level monitoring to obtain solution level state information, and perform solution level corrective actions.
  • the cluster monitor ( 410 ) may perform all or a portion of the method illustrated in FIG. 6 .
  • the cluster monitor ( 410 ) is a hardware device including circuitry.
  • the cluster monitor ( 410 ) may be, for example, a digital signal processor, a field programmable gate array, or an application specific integrated circuit.
  • the cluster monitor ( 410 ) may be other types of hardware devices without departing from the invention.
  • the cluster monitor ( 410 ) is implemented as computing code stored on a persistent storage that when executed by a processor performs the functionality of the cluster monitor ( 410 ).
  • the processor may be a hardware processor including circuitry such as, for example, a central processing unit or a microcontroller.
  • the processor may be other types of hardware devices for processing digital information without departing from the invention.
  • the cluster interfaces ( 420 ) may be operable connections between the example support engine and one or more computing clusters.
  • the operable connections may be supported by any combination of wired and/or wireless networking.
  • if the example support engine ( 400 ) only supports a single computing cluster, only a single cluster interface may be present.
  • the example support engine ( 400 ) may include any number of cluster interfaces without departing from the invention.
  • the persistent storage ( 430 ) is a storage device that stores data structures.
  • the persistent storage ( 430 ) may be a physical or virtual device.
  • the persistent storage ( 430 ) may include hard disk drives, solid state drives, tape drives, and other components to provide data storage functionality.
  • the persistent storage ( 430 ) may be a virtual device that utilizes the physical computing resources of other components to provide data storage functionality.
  • the persistent storage ( 430 ) stores monitoring policies ( 430 A).
  • the persistent storage ( 430 ) may store additional data structures without departing from the invention.
  • the monitoring policies ( 430 A) may be data structures that specify how the monitoring is to be performed and actions that are to be taken in response to the occurrence of a predetermined event identified by the monitoring.
  • the monitoring policies ( 430 A) may specify that the monitoring is to be performed by monitoring a transaction rate of a database executing on a cluster and that, if the transaction rate falls below a predetermined threshold, a solid state disk storing a cache used by the database is to be identified as having potentially failed.
  • For additional details regarding monitoring policies, see FIGS. 5A and 5B .
  • while the data structure of the persistent storage ( 430 ) is illustrated as a solitary data structure, the aforementioned data structure may be combined with other data without departing from the invention. Additionally, while the aforementioned data structure is illustrated as being stored on the example support engine ( 400 ), the data structure may be stored on persistent storage of other devices without departing from the invention.
  • FIG. 5A shows a diagram of example monitoring policies ( 500 ) in accordance with one or more embodiments of the invention.
  • the example monitoring policies ( 500 ) may specify monitoring to be performed by a support engine and actions to be taken when a predetermined condition, identified based on the monitoring, occurs.
  • the example monitoring policies ( 500 ) may specify the policies at any level of granularity without departing from the invention.
  • the example monitoring policies ( 500 ) include a number of entries ( 502 , 504 ). Each of the entries may specify a monitoring policy. Each entry may include an entity identifier ( 502 A, 504 A), a state ( 502 B, 504 B), and actions ( 502 C, 504 C).
  • the entity identifier ( 502 A, 504 A) may be an identifier of an entity to be monitored.
  • the entity identifier ( 502 A, 504 A) may identify a component, a computing device, a computing cluster, or a solution, i.e., a group of computing devices or a group of computing clusters.
  • the state ( 502 B, 504 B) may specify a condition of the component identified by the entity identifier ( 502 A, 504 A) that, when met, triggers the performance of the actions ( 502 C, 504 C) of the respective entry.
  • the state ( 502 B, 504 B) may be a condition such as, for example, a threshold or a range.
  • the state ( 502 B, 504 B) may be other types of conditions without departing from the invention.
  • when the condition specified by the state is met, the policy associated with the entry may invoke the performance of the actions associated with the respective policy.
  • the actions ( 502 C, 504 C) may be actions to be performed by the support engine in response to the conditions specified by the state being met.
  • the actions may be, for example, the performance of solution level monitoring, obtaining of predetermined information from a computing cluster, modifying the operation of a computing cluster, or any other type of action.
  • For additional details regarding actions, see FIG. 5B .
  • while the example monitoring policies ( 500 ) are illustrated as a list of entries, the information of the example monitoring policies ( 500 ) may be stored in other formats, may include additional, less, and/or different information, and/or may be broken down into multiple data structures without departing from the invention.
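  • An entry of FIG. 5A might be sketched as follows, assuming the state is a threshold condition over obtained metrics; the representation of conditions and actions is an illustrative assumption:

```python
# Hypothetical monitoring policy entries: an entity identifier, a state
# (a condition over metrics), and actions to perform when it is met.
policies = [
    {"entity": "cluster-1/database",
     "state": lambda metrics: metrics.get("transactions_per_sec", 0) < 500,
     "actions": ["obtain_hardware_state", "initiate_support_session"]},
]

def evaluate(entity: str, metrics: dict) -> list:
    """Return the actions of every policy whose condition is met."""
    triggered = []
    for policy in policies:
        if policy["entity"] == entity and policy["state"](metrics):
            triggered.extend(policy["actions"])
    return triggered

print(evaluate("cluster-1/database", {"transactions_per_sec": 420}))
# ['obtain_hardware_state', 'initiate_support_session']
```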
  • FIG. 5B shows a diagram of example actions ( 520 ) in accordance with one or more embodiments of the invention.
  • the example actions ( 520 ) may specify one or more actions to be performed.
  • the example actions ( 520 ) include obtaining of a hardware state of a component host ( 522 ), obtaining of a software state of the component host ( 522 ), obtaining of host settings ( 524 ), validation of the component ( 526 ), initiation of a new support session ( 528 ), and generation of a log ( 530 ).
  • the component host may be the computing device that hosts the component that was identified as potentially failed.
  • the validation may be to compare the functionality of the component to a baseline, e.g., performance test the component.
  • the generated log may include all of the actions performed by a support engine and the response from a computing cluster.
  • while the example actions ( 520 ) are shown as including a limited number and selection of actions that may be performed, the example actions ( 520 ) may include any quantity and type of actions without departing from the invention.
  • the example actions ( 520 ) may include a triggering of another policy.
  • the example actions ( 520 ) may cause a condition specified by a state of another policy to be met and, thus, trigger the performance of another set of actions.
  • monitoring policies may be nested and/or recursively trigger any number of monitoring policies upon the triggering of any one monitoring policy.
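  • One hypothetical way to realize such nesting is a recursive action interpreter in which triggering another policy is itself an action; the encoding below is an assumption for illustration:

```python
# Hypothetical policy chaining: an action may name another policy to
# trigger, so evaluation recurses until no further policies fire.
chained = {
    "policy-a": {"actions": ["collect_logs", "trigger:policy-b"]},
    "policy-b": {"actions": ["initiate_support_session"]},
}

def run(policy_name: str, performed=None) -> list:
    performed = [] if performed is None else performed
    for action in chained[policy_name]["actions"]:
        if action.startswith("trigger:"):
            run(action.split(":", 1)[1], performed)  # recurse into nested policy
        else:
            performed.append(action)
    return performed

print(run("policy-a"))  # ['collect_logs', 'initiate_support_session']
```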
  • The system of FIG. 1 may perform the methods of FIGS. 6-8 , discussed below.
  • FIG. 6 shows a flowchart of a method in accordance with one or more embodiments of the invention.
  • the method depicted in FIG. 6 may be used to manage a computing cluster in accordance with one or more embodiments of the invention.
  • the method shown in FIG. 6 may be performed by, for example, a support engine ( 120 , FIG. 1 ).
  • Other components of the system illustrated in FIG. 1 may perform the method of FIG. 6 without departing from the invention.
  • In Step 600, computing clusters are monitored using monitoring policies.
  • the monitoring may be accomplished by obtaining state information from the computing clusters and comparing the state information to the monitoring policies.
  • the monitoring may be performed periodically, in accordance with a schedule, or via other regimes.
  • the monitoring may obtain component level state information from components of computing devices of the computing clusters.
  • the component level state information may be, for example, performance characteristics of the components such as, for example, a processing rate, bandwidth, or storage rate.
  • the performance characteristics may be other types of characteristics without departing from the invention.
  • In Step 602, a potential component failure is identified based on the monitoring.
  • the potential component failure is identified by comparing the obtained component level state information to the monitoring policies. If the component level state information exceeds a threshold specified by the monitoring policies, the component associated with the component level state information may be identified as a potential component failure.
  • In Step 604, an error state of the computing cluster is identified in response to the identified potential component failure.
  • the error state of the computing cluster is identified based on the potential component failure. For example, each type of component failure may be associated with an error state of the computing cluster. In contrast to traditional approaches that may focus on component level failures, embodiments of the invention may use the presence of a potential component failure as an indication of a cluster level error state.
  • In Step 606, solution level state information is obtained from the computing cluster based on the identified error state of the cluster.
  • the solution level state information is obtained by characterizing the performance of each component of each computing device of a cluster associated with the cluster level error state.
  • embodiments of the invention may automatically obtain solution level state information in response to the identification of a potential component level failure.
  • In Step 608, a support package is generated that includes the solution level state information.
  • the support package may include an identifier of the solution, an identifier of the computing cluster, identifiers of the computing devices of the computing cluster, identifiers of the components of the computing devices, the potential component failure, an identifier of the potentially failed component, and/or other information regarding the solution.
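  • Assembling such a package might be sketched as follows; the key names mirror the enumeration above but are illustrative assumptions, as is the use of JSON:

```python
import json

def build_support_package(state: dict) -> bytes:
    """Hypothetical assembly of the Step 608 support package."""
    package = {
        "solution_id": state["solution_id"],
        "cluster_id": state["cluster_id"],
        "computing_devices": state["device_ids"],
        "components": state["component_ids"],
        "potential_failure": state["failure"],
        "suspect_component": state["suspect"],
        "solution_level_state": state["solution_state"],
    }
    return json.dumps(package).encode()  # sent to the support manager in Step 610

pkg = build_support_package({
    "solution_id": "sol-1", "cluster_id": "cluster-1",
    "device_ids": ["dev-1", "dev-2"], "component_ids": ["ssd-0", "nic-0"],
    "failure": "interconnect degraded", "suspect": "nic-0",
    "solution_state": {"iops": 120},
})
```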
  • In Step 610, a support session is initiated by sending the generated support package to a support manager.
  • In Step 612, a solution level corrective action is obtained.
  • the solution level corrective action may be obtained from the support manager or another entity.
  • In Step 614, the solution level corrective action is performed.
  • the solution level corrective action is replacing the component with a replacement component selected based on the solution level state information.
  • the replacement component may be identified by identifying a solution type of the computing cluster, e.g., data storage, processing, etc.
  • a certified component e.g., a component that is known to be compatible with the solution type, may be selected.
  • the selected certified component may be used as the replacement component.
  • the solution level corrective action is modifying firmware associated with the potentially failed component.
  • the modification may be a replacement of the firmware or a modification of the settings of the firmware.
  • the modification may be obtained from a support manager.
  • the solution level corrective action is modifying the firmware of a second component that is not the potentially failed component.
  • the second component and the potentially failed component may be hosted by the same computing device.
  • the solution level corrective action is modifying the firmware of a component of a computing device that does not host the potentially failed component.
  • the method may end following Step 614 .
  • FIG. 7 shows a flowchart of a method in accordance with one or more embodiments of the invention.
  • the method depicted in FIG. 7 may be used to manage a computing cluster in accordance with one or more embodiments of the invention.
  • the method shown in FIG. 7 may be performed by, for example, a support manager ( 100 , FIG. 1 ).
  • Other components of the system illustrated in FIG. 1 may perform the method of FIG. 7 without departing from the invention.
  • In Step 700, a support package associated with a support session is obtained.
  • the support package includes solution level state information.
  • In Step 702, the support package is analyzed using a knowledge base to identify a corrective action.
  • the support package is analyzed by matching the solution level state information to similar information in the knowledge base. A corrective action associated with the matched solution level state information may be identified.
  • the corrective action is replacement of the potentially failed component with a replacement component.
  • the replacement component may be selected by a solution manager.
  • the corrective action is a replacement of an application or firmware.
  • the replacement may change a version of the application or firmware.
  • the replacement may be reinstallation of an already installed version.
  • the corrective action is a change of a setting of an application or firmware.
  • In Step 704, the corrective action is sent to a support engine associated with the support session.
  • In Step 706, an outcome associated with the corrective action is obtained from the support engine.
  • In Step 708, the knowledge base is updated based on the outcome.
  • the knowledge base is updated by storing an association between the solution level state information and the outcome in the knowledge base.
  • the method may end following Step 708 .
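  • Steps 700-708 might be sketched end to end as follows; the helper names, the signature scheme, and the support engine interface are assumptions, not the patent's specified implementation:

```python
def handle_support_session(package: dict, knowledge_base: dict, support_engine):
    # Step 700: the obtained package carries solution level state
    # information; reduce it to a hashable signature for matching.
    signature = tuple(sorted(package["symptoms"]))
    # Step 702: match against the knowledge base to pick a corrective action.
    action = knowledge_base.get(signature, "open manual support session")
    # Steps 704-706: send the action to the associated support engine
    # and obtain the outcome.
    outcome = support_engine.perform(action)
    # Step 708: store the association between solution state and outcome.
    if outcome == "resolved":
        knowledge_base[signature] = action
    return outcome
```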
  • FIG. 8 shows a flowchart of a method in accordance with one or more embodiments of the invention.
  • the method depicted in FIG. 8 may be used to select a replacement component in accordance with one or more embodiments of the invention.
  • the method shown in FIG. 8 may be performed by, for example, a solution manager ( 110 , FIG. 1 ).
  • Other components of the system illustrated in FIG. 1 may perform the method of FIG. 8 without departing from the invention.
  • In Step 800, a replacement hardware identification request is obtained.
  • the request is obtained from a support manager.
  • the request may include an identifier of a potentially failed component.
  • In Step 802, replacement hardware associated with a component specified by the request of Step 800 is selected.
  • the selected replacement hardware is any hardware that is certified to work with the solution which hosts the potentially failed component.
  • In Step 804, it is determined whether the selected replacement hardware is available. If the replacement hardware is available, the method proceeds to Step 806. If the selected replacement hardware is not available, the method proceeds to Step 808.
  • In Step 806, the selected replacement hardware is sent.
  • In Step 808, an alternative replacement hardware that is both certified and available is sent.
  • the method may end following Step 806 or Step 808.
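  • The selection and fallback of Steps 802-808 might be sketched as follows; the certification and inventory tables are illustrative assumptions:

```python
# Hypothetical tables: certified replacements per component, and stock.
certified = {"SSD-MODEL-A": ["SSD-MODEL-A", "SSD-MODEL-B"]}
in_stock = {"SSD-MODEL-A": 0, "SSD-MODEL-B": 3}

def select_replacement(component: str):
    """Prefer the certified replacement; fall back to a certified,
    available alternative (Steps 804-808)."""
    candidates = certified.get(component, [])
    if candidates and in_stock.get(candidates[0], 0) > 0:
        return candidates[0]                      # Step 806
    for alternative in candidates[1:]:            # Step 808
        if in_stock.get(alternative, 0) > 0:
            return alternative
    return None

print(select_replacement("SSD-MODEL-A"))  # -> 'SSD-MODEL-B'
```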
  • FIG. 9A shows a diagram of a system including a computing cluster ( 900 ) that includes two computing devices ( 910 , 920 ).
  • a malfunctioning solid state drive ( 912 ) may start to impact the performance of the computing cluster ( 900 ) in a manner that manifests as a deterioration of the performance of an interconnect ( 930 ).
  • a support engine may identify that the interconnect ( 930 ) is potentially failing due to its reduced performance. In response to the identified potential component failure, the support engine may obtain solution level state information.
  • the solution level state information may include the IOPS of the malfunctioning solid state drive ( 912 ).
  • the solution level state information may be analyzed based on information in a knowledge base (not shown).
  • the analysis shows that the solid state drive ( 912 ), rather than the interconnect ( 930 ), is malfunctioning. Based on the analysis, it is determined that the malfunctioning solid state drive ( 912 ) should be replaced.
  • FIG. 9B shows a diagram of the system of FIG. 9A after the malfunctioning solid state drive is replaced with a replacement solid state drive ( 916 ).
  • By replacing the malfunctioning solid state drive, rather than the components that facilitated the interconnect ( 930 ) such as the network interface cards ( 914 , 924 ), the performance of the system was restored without replacing components that were operating properly.
  • FIG. 10 shows a diagram of a computing device ( 1000 ).
  • the computing device ( 1000 ) may include one or more computer processors ( 1002 ), non-persistent storage ( 1004 ) (e.g., volatile memory, such as random access memory (RAM), cache memory), persistent storage ( 1006 ) (e.g., a hard disk, an optical drive such as a compact disk (CD) drive or digital versatile disk (DVD) drive, a flash memory, etc.), a communication interface ( 1012 ) (e.g., Bluetooth interface, infrared interface, network interface, optical interface, etc.), input devices ( 1010 ), output devices ( 1008 ), and numerous other elements (not shown) and functionalities. Each of these components is described below.
  • the computer processor(s) ( 1002 ) may be an integrated circuit for processing instructions.
  • the computer processor(s) may be one or more cores or micro-cores of a processor.
  • the computing device ( 1000 ) may also include one or more input devices ( 1010 ), such as a touchscreen, keyboard, mouse, microphone, touchpad, electronic pen, or any other type of input device.
  • the communication interface ( 1012 ) may include an integrated circuit for connecting the computing device ( 1000 ) to a network (not shown) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, mobile network, or any other type of network) and/or to another device, such as another computing device.
  • the computing device ( 1000 ) may include one or more output devices ( 1008 ), such as a screen (e.g., a liquid crystal display (LCD), a plasma display, touchscreen, cathode ray tube (CRT) monitor, projector, or other display device), a printer, external storage, or any other output device.
  • One or more of the output devices may be the same or different from the input device(s).
  • the input and output device(s) may be locally or remotely connected to the computer processor(s) ( 1002 ), non-persistent storage ( 1004 ), and persistent storage ( 1006 ).
  • the computer processor(s) 1002
  • non-persistent storage 1004
  • persistent storage 1006
  • Embodiments of the invention may improve the performance of solutions that utilize distributed computations performed by computing clusters or other groups of computing devices. Embodiments may improve the performance of the solutions by improving the uptime of the solutions. Specifically, embodiments of the invention may provide a method of determining corrective actions based on solution level state information rather than state information of a component that is potentially failing. Doing so reduces the number of corrective actions that may need to be performed to repair the solution when a portion of the solution fails. For example, a failure of a component may manifest itself throughout a solution in unexpected ways and impact the performance of other components. Traditional approaches to repair solutions focus on relating decreases in the performance of individual components to corrective actions.
  • Embodiments of the invention may prevent this problem by performing corrective actions that are based on solution level state information rather than component level state information. In this manner, a solution that is impacted by a malfunctioning or otherwise failed component may be corrected without needlessly replacing components, application, or firmware. Thus, embodiments of the invention may improve an uptime of a solution and decrease a cost of supporting a solution.
  • one or more embodiments of the invention address the problem of detecting and correcting component failure in a distributed system. Since the failure of a component in a distributed system may cause unpredictable performance penalties, embodiments of the invention necessarily address problems that are rooted in computing technology. That is the identification of component failure and remediation of failed components in a distributed environment that might otherwise mask the true cause of a performance decrease of a distributed system.
  • a data structure may include a first element labeled as A and a second element labeled as N.
  • This labeling convention means that the data structure may include any number of the elements.
  • a second data structure also labeled as A to N, may also include any number of elements. The number of elements of the first data structure and the number of elements of the second data structure may be the same or different.
  • One or more embodiments of the invention may be implemented using instructions executed by one or more processors of the data management device. Further, such instructions may correspond to computer readable instructions that are stored on one or more non-transitory computer readable mediums.

Abstract

A support engine for managing computing clusters includes a persistent storage and a processor. The persistent storage includes monitoring policies. The processor monitors a computing cluster of the computing clusters and identifies a potential component failure of the computing cluster based on the monitoring and the monitoring policies. In response to identifying the potential component failure, the processor identifies an error state of the computing cluster; obtains solution level state information from the computing cluster based on the identified error state; generates a support package comprising the solution level state information; and initiates a support session by sending the generated support package to a support manager.

Description

BACKGROUND
Computing devices may include any number of internal components such as processors, memory, and persistent storage. Each of the internal components may operate with other components of the computing devices. For example, some processors store generated data in a persistent storage and may utilize capacity of the memory to perform computations.
In a network environment, multiple computing devices may cooperate to accomplish a task. For example, multiple computing devices may perform different computations that may be used, in turn, to generate a final result.
SUMMARY
In one aspect, a support engine for managing computing clusters in accordance with one or more embodiments of the invention includes a persistent storage and a processor. The persistent storage includes monitoring policies. The processor monitors a computing cluster of the computing clusters and identifies a potential component failure of the computing cluster based on the monitoring and the monitoring policies. In response to identifying the potential component failure, the processor identifies an error state of the computing cluster; obtains solution level state information from the computing cluster based on the identified error state; generates a support package comprising the solution level state information; and initiates a support session by sending the generated support package to a support manager.
In one aspect, a method for managing computing clusters in accordance with one or more embodiments of the invention includes monitoring a computing cluster of the computing clusters and identifying a potential component failure of the computing cluster based on the monitoring and monitoring policies. The method further includes, in response to identifying the potential component failure, identifying an error state of the computing cluster; obtaining solution level state information from the computing cluster based on the identified error state; generating a support package comprising the solution level state information; and initiating a support session by sending the generated support package to a support manager to correct the potential component failure.
In one aspect, a non-transitory computer readable medium in accordance with one or more embodiments of the invention includes computer readable program code, which when executed by a computer processor enables the computer processor to perform a method for managing computing clusters. The method includes monitoring a computing cluster of the computing clusters and identifying a potential component failure of the computing cluster based on the monitoring and monitoring policies. The method further includes, in response to identifying the potential component failure, identifying an error state of the computing cluster; obtaining solution level state information from the computing cluster based on the identified error state; generating a support package comprising the solution level state information; and initiating a support session by sending the generated support package to a support manager to correct the potential component failure.
BRIEF DESCRIPTION OF DRAWINGS
Certain embodiments of the invention will be described with reference to the accompanying drawings. However, the accompanying drawings illustrate only certain aspects or implementations of the invention by way of example and are not meant to limit the scope of the claims.
FIG. 1 shows a diagram of a system in accordance with one or more embodiments of the invention.
FIG. 2 shows a diagram of an example support manager in accordance with one or more embodiments of the invention.
FIG. 3 shows a diagram of an example solution manager in accordance with one or more embodiments of the invention.
FIG. 4 shows a diagram of an example support engine in accordance with one or more embodiments of the invention.
FIG. 5A shows a diagram of example monitoring policies in accordance with one or more embodiments of the invention.
FIG. 5B shows a diagram of example actions of monitoring policies in accordance with one or more embodiments of the invention.
FIG. 6 shows a diagram of a flowchart of a method of correcting a solution in accordance with one or more embodiments of the invention.
FIG. 7 shows a diagram of a flowchart of a method of managing computing clusters of a solution in accordance with one or more embodiments of the invention.
FIG. 8 shows a diagram of a flowchart of a method of selecting replacement components in accordance with one or more embodiments of the invention.
FIG. 9A shows a diagram of a malfunctioning computing cluster.
FIG. 9B shows a diagram of the computing cluster of FIG. 9A after corrective actions have been performed.
FIG. 10 shows a diagram of a computing device in accordance with one or more embodiments of the invention.
DETAILED DESCRIPTION
Specific embodiments will now be described with reference to the accompanying figures. In the following description, numerous details are set forth as examples of the invention. It will be understood by those skilled in the art that one or more embodiments of the present invention may be practiced without these specific details and that numerous variations or modifications may be possible without departing from the scope of the invention. Certain details known to those of ordinary skill in the art are omitted to avoid obscuring the description.
In the following description of the figures, any component described with regard to a figure, in various embodiments of the invention, may be equivalent to one or more like-named components described with regard to any other figure. For brevity, descriptions of these components will not be repeated with regard to each figure. Thus, each and every embodiment of the components of each figure is incorporated by reference and assumed to be optionally present within every other figure having one or more like-named components. Additionally, in accordance with various embodiments of the invention, any description of the components of a figure is to be interpreted as an optional embodiment, which may be implemented in addition to, in conjunction with, or in place of the embodiments described with regard to a corresponding like-named component in any other figure.
In general, embodiments of the invention relate to systems, devices, and methods for managing a solution. A solution may include one or more computing clusters. Each of the computing clusters may include computing devices. Each of the computing devices may include individual components. The solution may orchestrate the computing devices to perform predetermined actions to accomplish a goal.
Due to the complexity of a solution, the failure of any component of the solution may impact the performance of any portion of the solution. For example, failure of a data storage device of a computing device may manifest itself as a decrease in the transaction rate of a database application hosted by a second computing device.
In one or more embodiments of the invention, a system includes a support engine that monitors the components of a solution. The support engine may detect potential component failures based upon the decrease in performance of a portion of the solution. The detected potential component failures may be real or may be the manifestations of failures of other components.
In response to the detection of a potential component failure, the support engine may automatically obtain solution level state information. The solution level state information may include various performance metrics regarding the overall functionality of the solution. The solution level state information may be used to determine a corrective action to be performed. In this manner, embodiments of the invention may quickly and automatically obtain all of the information that may be necessary to identify a component that has truly failed. Doing so reduces the amount of time required to identify a corrective action that actually remediates the failure of the component in the solution.
FIG. 1 shows a system in accordance with one or more embodiments of the invention. The system may include a support manager (100) that manages a solution (115) by diagnosing problems with the supported solution (115) and performing corrective actions. The system may include a support engine (120) that obtains information regarding the state of computing clusters (130). The support engine (120) may provide the obtained information to the support manager (100) to enable it to perform its management functions. In the case of a hardware problem, rather than a software problem, the support manager (100) may identify replacement hardware via a solution manager (110). Each of the aforementioned components may be operably connected by any combination of wired and wireless networks. Each component is discussed below.
In one or more embodiments of the invention, the support manager (100) diagnoses and remediates problems of the computing clusters (130) that may otherwise disrupt the functionality of the computing clusters (130). For example, computing cluster A (130A) may report a decrease in the ability of one computing device of the cluster to communicate with other computing devices of the cluster. In such a scenario, the support manager (100) may identify a problem with computing cluster A (130A), identify a solution, and initiate corrective actions in computing cluster A (130A) to address the identified problem.
To provide the aforementioned functionality, the support manager (100) may obtain state information of the computing clusters (130) from the support engine (120), or another entity. Similarly, when a hardware problem with a computing cluster is identified, the support manager (100) may identify a replacement component via the solution manager (110). For additional details regarding the support manager (100), See FIG. 2.
In one or more embodiments of the invention, the support manager (100) is a computing device. The computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, or a cloud resource. The computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions of the support manager (100) described in this application and/or perform all or a portion of the methods illustrated in FIGS. 6-8. For additional details regarding a computing device, See FIG. 10.
In one or more embodiments of the invention, the support manager (100) is a logical device. A logical device may be a virtual device that utilizes the computing resources of any number of computing devices to perform its functions. The logical device may be implemented as computer instructions, e.g., computer code, that when executed by the processor(s) of one or more computing devices cause the computing devices to perform the functions of the support manager (100) described in this application and/or perform all or a portion of the methods illustrated in FIGS. 6-8.
In one or more embodiments of the invention, the solution manager (110) identifies replacement hardware for components of the computing clusters (130). Due to underlying hardware or software requirements, each of the components may only be compatible with specific hardware components. Further, due to finite limitations, not all types of hardware may be available at all points in time. The solution manager (110) may identify replacement hardware for a computing cluster by identifying hardware that is both compatible with the computing cluster and available for deployment. For additional details regarding the solution manager, See FIG. 3.
In one or more embodiments of the invention, the solution manager (110) is a computing device. The computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, or a cloud resource. The computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions of the solution manager (110) described in this application and/or perform all or a portion of the methods illustrated in FIGS. 6-8. For additional details regarding a computing device, See FIG. 10.
In one or more embodiments of the invention, the solution manager (110) is a logical device. A logical device may be a virtual device that utilizes the computing resources of any number of computing devices to perform its functions. The logical device may be implemented as computer instructions, e.g., computer code, that when executed by the processor(s) of one or more computing devices cause the computing devices to perform the functions of the solution manager (110) described in this application and/or perform all or a portion of the methods illustrated in FIGS. 6-8.
In one or more embodiments of the invention, the support engine (120) monitors the computing clusters (130). Based on the monitoring, the support engine (120) identifies an error state of the computing clusters (130). The support engine (120) may perform monitoring based on instructions or configurations received from the support manager (100). For example, the support manager (100) may provide thresholds that when exceeded trigger the support engine (120) to obtain detailed state information from the computing clusters (130). The support engine (120) may provide the obtained information to the support manager (100). For additional details regarding the support engine, See FIG. 4.
In one or more embodiments of the invention, the support engine (120) is a computing device. The computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, or a cloud resource. The computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions of the support engine (120) described in this application and/or perform all or a portion of the methods illustrated in FIGS. 6-8. For additional details regarding a computing device, See FIG. 10.
In one or more embodiments of the invention, the support engine (120) is a logical device. A logical device may be a virtual device that utilizes the computing resources of any number of computing devices to perform its functions. The logical device may be implemented as computer instructions, e.g., computer code, that when executed by the processor(s) of one or more computing devices cause the computing devices to perform the functions of the support engine (120) described in this application and/or perform all or a portion of the methods illustrated in FIGS. 6-8.
In one or more embodiments of the invention, the computing clusters (130A, 130N) are physical devices that perform computations. Each of the computing clusters may operate independently to perform different functions or may be orchestrated to cooperatively perform a predetermined function. In such a scenario, each of the computing clusters may perform similar or different functions cooperatively to accomplish a shared goal.
For example, a first computing cluster may provide computing functionality while a second computing cluster may provide storage functionality. Any number of computing clusters may perform similar or different functions without departing from the invention.
In one or more embodiments of the invention, each computing cluster includes a number of computing devices. Each computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, or a cloud resource. The computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions of the computing clusters (130A, 130N) described in this application. For additional details regarding a computing device, See FIG. 10.
As discussed above, the support manager (100) may manage the computing clusters (130). FIG. 2 shows a diagram of an example support manager (200) in accordance with one or more embodiments of the invention.
In one or more embodiments of the invention, the example support manager (200) may: (i) monitor the computing clusters, (ii) identify potential component failures, (iii) obtain solution level state information in response to identifying potential component failures, and (iv) initiate the performance of corrective actions to reduce the likelihood that the identified potential component failures will impact the functionality of the computing clusters. To provide the aforementioned functionality, the example support manager (200) may include a session manager (210) and a persistent storage (220). Each component of the example support manager (200) is discussed below.
In one or more embodiments of the invention, the session manager (210) configures support engines to monitor computing clusters, obtains alerts from the support engines when a cluster is identified as having a potential component failure, obtains solution level state information in response to an identified potential component failure, and initiates performance of a corrective action. The corrective action may be a solution level corrective action, i.e., based on solution level state information, rather than a component level corrective action, i.e., based on component level state information.
When an alert from a support engine is obtained, the session manager (210) may initiate a support session with the support engine. The support session may be associated with a computing cluster, e.g., a triggering computing cluster, that exceeded a threshold or other trigger criteria. When a support session is initiated, the session may be placed in a queue for analysis by the system. After being placed in the queue, the session manager (210) may obtain solution level state information from the triggering computing cluster.
In one or more embodiments of the invention, the solution level state information may include a solution type of the computing cluster, e.g., a predetermined function of the computing cluster. The solution level state information may include a hardware state of the computing device hosting the component that triggered the potential component failure, i.e., the host computing device. The hardware state may be a listing of the physical components of the aforementioned computing device. The solution level state information may include the hardware state of other computing devices of the triggering cluster that are operably connected to the host computing device. The solution level state information may include a software state of the host computing device, e.g., a listing of the applications, firmware, and associated settings for the applications and firmware of the host computing device. The solution level state information may include the software state of other computing devices of the triggering cluster that are operably connected to the host computing device. The solution level state information may include a transaction log of the host computing device, e.g., a listing of the actions performed over a predetermined period of time. The solution level state information may include transaction logs of other computing devices of the triggering cluster that are operably connected to the host computing device.
In some scenarios, the session manager (210) may determine that a solution level corrective action involves replacing a physical component of a computing cluster. In such a scenario, the session manager (210) may send a request to a solution manager to identify replacement hardware. The request may specify the component to be replaced. The solution manager may respond by providing an identifier of replacement hardware that is available to facilitate the replacement of the component.
In one or more embodiments of the invention, the session manager (210) is a hardware device including circuitry. The session manager (210) may be, for example, a digital signal processor, a field programmable gate array, or an application specific integrated circuit. The session manager (210) may be other types of hardware devices without departing from the invention.
In one or more embodiments of the invention, the session manager (210) is implemented as computing code stored on a persistent storage that when executed by a processor performs the functionality of the session manager (210). The processor may be a hardware processor including circuitry such as, for example, a central processing unit or a microcontroller. The processor may be other types of hardware devices for processing digital information without departing from the invention.
To provide the aforementioned functionality, the session manager (210) may perform all or a portion of the methods illustrated in FIGS. 6-8. When performing the aforementioned methods or other functionality, the session manager (210) may utilize data structures stored in the persistent storage (220).
In one or more embodiments of the invention, the persistent storage (220) is a storage device that stores data structures. The persistent storage may be a physical or virtual device. For example, the persistent storage (220) may include hard disk drives, solid state drives, tape drives, and other components to provide data storage functionality. Alternatively, the persistent storage (220) may be a virtual device that utilizes the physical computing resources of other components to provide data storage functionality.
In one or more embodiments of the invention, the persistent storage (220) stores sessions (220A), support engine settings (220B), a software repository (220C), and a knowledge base (220D). The persistent storage (220) may store additional data structures without departing from the invention.
The sessions (220A) may be data structures that include information regarding support sessions; the support sessions may be ongoing or previously completed. The sessions (220A) may include information obtained from a support engine used by the session manager (210) to identify a corrective action. For example, the sessions (220A) may include component state information and/or solution state information used by the session manager (210) to identify a solution level corrective action. Each session may be associated with a place in a support queue and a potential component failure that caused the session to be initiated. Each session may be generated by the session manager (210) when a support engine notifies the example support manager (200) that a component of a computing cluster has triggered a potential component failure.
The support engine settings (220B) may be a data structure that includes settings used by the session manager (210) to configure support engines. Specifically, the support engine settings (220B) may be used to configure component policies of the support engines that are used to determine whether a component is potentially in a failure state. For additional details regarding component policies, See FIGS. 5A and 5B.
The software repository (220C) may be a data structure that includes copies of software used by the session manager (210) to initiate the performance of solution level corrective actions. For example, in a scenario in which the session manager (210) makes a determination that a solution level corrective action involves a software replacement, the session manager (210) may obtain a copy of the replacement software and/or software settings and send the copy to a computing device for implementation, i.e., replacing existing software and/or software settings. The software may be applications and/or firmware.
The knowledge base (220D) may be a data structure used by the session manager (210) to identify a solution level corrective action. In other words, the knowledge base (220D) may map component level and/or solution level state information to solution level corrective actions. The mappings may be specified at any level of granularity.
The knowledge base (220D) may be generated based on previous sessions. In other words, the contents of the knowledge base (220D) may be generated heuristically. The knowledge base (220D) may be automatically updated by the session manager (210) upon completion of a support session. In other words, the session manager (210) may generate a new mapping between component level and/or solution level state information and a solution level corrective action that resulted in the elimination of the potential component failure, i.e., one that restored the performance of the computing cluster.
The aforementioned mappings of the knowledge base (220D) are unconventional because the mappings assume that a potential component failure is not necessarily based on the component. Rather, the aforementioned mappings make an assumption that the identification of a potential component failure is merely a symptom of a solution level defect. Thus, the knowledge base (220D) mappings are between a solution state and a corrective action, not necessarily a component state and a corrective action.
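For illustration only, such a mapping can be sketched in a few lines of Python. This is a minimal sketch, assuming a signature is distilled from the solution level state information; the signature scheme, names, and example entries are invented for this illustration and are not part of the disclosure.

    # Hypothetical knowledge base: keys describe a solution level symptom,
    # values name a solution level corrective action.
    knowledge_base = {
        ("storage", "interconnect_degraded_with_low_ssd_iops"):
            "replace_component:ssd",
        ("storage", "interconnect_degraded_with_nic_errors"):
            "modify_firmware:nic",
    }

    def identify_corrective_action(solution_type, symptom):
        """Map solution level state, rather than component level state,
        to a corrective action; returns None when no mapping exists."""
        return knowledge_base.get((solution_type, symptom))

Note that the keys intentionally describe the state of the solution as a whole, mirroring the assumption above that a potential component failure may be merely a symptom of a solution level defect.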
While the data structures of the persistent storage (220) are illustrated as separate data structures, the aforementioned data structures may be combined with each other and/or other data without departing from the invention. Additionally, while the aforementioned data structures are illustrated as being stored on the example support manager (200), the data structures may be stored on persistent storage of other devices without departing from the invention.
As discussed above, the example support manager (200) may interact with a solution manager when attempting to initiate the performance of a solution level corrective action. FIG. 3 shows a diagram of an example solution manager (300) in accordance with one or more embodiments of the invention. The example solution manager (300) may identify replacement components for computing clusters. In other words, the solution manager may identify whether a replacement component both exists and is available, i.e., in stock or otherwise obtainable. To provide the aforementioned functionality, the example solution manager may include a replacement component identifier (310) that identifies replacement components in response to requests for replacements and a persistent storage (320) that stores data structures used by the replacement component identifier (310) to identify replacement components. Each component of the example solution manager is discussed below.
In one or more embodiments of the invention, the replacement component identifier (310) obtains requests from the support manager to identify a replacement component for a component. In response to obtaining a request, the replacement component identifier (310) may use the data structures stored in the persistent storage (320) to identify a replacement component. The data structures may enable a replacement component that both exists and is available to be identified. The replacement component identifier (310) may notify the support manager of the identified replacement component in response to the request. To provide the aforementioned functionality, the replacement component identifier (310) may perform all or a portion of the method illustrated in FIG. 8.
In one or more embodiments of the invention, the replacement component identifier (310) is a hardware device including circuitry. The replacement component identifier (310) may be, for example, a digital signal processor, a field programmable gate array, or an application specific integrated circuit. The replacement component identifier (310) may be other types of hardware devices without departing from the invention.
In one or more embodiments of the invention, the replacement component identifier (310) is implemented as computing code stored on a persistent storage that when executed by a processor performs the functionality of the replacement component identifier (310). The processor may be a hardware processor including circuitry such as, for example, a central processing unit or a microcontroller. The processor may be other types of hardware devices for processing digital information without departing from the invention.
In one or more embodiments of the invention, the persistent storage (320) is a storage device that stores data structures. The persistent storage (320) may be a physical or virtual device. For example, the persistent storage (320) may include hard disk drives, solid state drives, tape drives, and other components to provide data storage functionality. Alternatively, the persistent storage (320) may be a virtual device that utilizes the physical computing resources of other components to provide data storage functionality.
In one or more embodiments of the invention, the persistent storage (320) stores component to replacement mappings (320A) and a replacement availability repository (320B). The persistent storage (320) may store additional data structures without departing from the invention.
The component to replacement mappings (320A) may be data structures that map components to replacement components. Over time, a particular component may no longer be produced and, consequently, become unavailable. The component to replacement mappings (320A) may specify mappings between a component and all other types of components that are suitable replacements.
The replacement availability repository (320B) may specify the availability for each replacement component specified by the component to replacement mappings (320A). Thus, the component to replacement mappings (320A) may be used to identify all possible replacement components for a given component and the replacement availability repository (320B) may specify the availability for each of the possible replacement components. In this manner, the aforementioned data structures may specify the availability for each possible replacement component.
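As an informal sketch only, the two data structures above might take shapes like the following Python fragment; the part identifiers and unit counts are invented for illustration.

    # Maps a component type to all suitable replacement component types.
    component_to_replacements = {
        "ssd-model-100": ["ssd-model-200", "ssd-model-210"],
    }

    # Specifies how many units of each replacement type are obtainable.
    replacement_availability = {
        "ssd-model-200": 0,    # no longer produced
        "ssd-model-210": 12,   # in stock
    }

    def available_replacements(component):
        """Intersect the suitability mapping with current availability."""
        return [r for r in component_to_replacements.get(component, [])
                if replacement_availability.get(r, 0) > 0]

    print(available_replacements("ssd-model-100"))  # ['ssd-model-210']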
While the data structures of the persistent storage (320) are illustrated as separate data structures, the aforementioned data structures may be combined with each other and/or other data without departing from the invention. Additionally, while the aforementioned data structures are illustrated as being stored on the example solution manager (300), the data structures may be stored on persistent storage of other devices without departing from the invention.
As discussed above, the example support manager (200) may interact with a support engine supporting computing clusters. FIG. 4 shows a diagram of an example support engine (400) in accordance with one or more embodiments of the invention. The example support engine (400) may: (i) monitor computing clusters, (ii) identify potential component failures, (iii) notify a support manager of the potential component failures to start support sessions, (iv) collect solution level state information, (v) generate a support package including the solution level state information, and (vi) send the generated support package to the support manager to initiate the support session. Once a support session is initiated, the example support engine (400) may perform solution level corrective actions to address the potential component failures. The solution level corrective actions may be specified by the support manager.
To provide the aforementioned functionality, the example support engine (400) may include a cluster monitor (410) that performs both component level and solution level monitoring of computing clusters via cluster interfaces (420). The example support engine (400) may also include a persistent storage (430) that stores data structures used by the cluster monitor (410) when monitoring the computing clusters. Each component of the example support engine is discussed below.
In one or more embodiments of the invention, the cluster monitor (410) monitors the computing clusters based on monitoring policies (430A) stored in the persistent storage (430). The cluster monitor (410) may monitor the operations of the computing clusters. The monitoring may be performed by, for example, periodically obtaining performance statistics of the respective components of the computing clusters. The performance statistics may be, for example, the compression ratio of data stored in the computing clusters, the transaction rate of applications executing on the clusters, the bandwidth between computing devices of the computing clusters, etc. Other performance statistics may be used without departing from the invention. When the monitoring indicates that a component has potentially failed, the cluster monitor may initiate a support session with a support manager, perform solution level monitoring to obtain solution level state information, and perform solution level corrective actions. To provide the aforementioned functionality, the cluster monitor (410) may perform all or a portion of the method illustrated in FIG. 6.
In one or more embodiments of the invention, the cluster monitor (410) is a hardware device including circuitry. The cluster monitor (410) may be, for example, a digital signal processor, a field programmable gate array, or an application specific integrated circuit. The cluster monitor (410) may be other types of hardware devices without departing from the invention.
In one or more embodiments of the invention, the cluster monitor (410) is implemented as computing code stored on a persistent storage that when executed by a processor performs the functionality of the cluster monitor (410). The processor may be a hardware processor including circuitry such as, for example, a central processing unit or a microcontroller. The processor may be other types of hardware devices for processing digital information without departing from the invention.
The cluster interfaces (420) may be operable connections between the example support engine and one or more computing clusters. The operable connections may be supported by any combination of wired and/or wireless networking. In a scenario in which the example support engine (400) only supports a single computing cluster, only a single cluster interface may be present. The example support engine (400) may include any number of cluster interfaces without departing from the invention.
In one or more embodiments of the invention, the persistent storage (430) is a storage device that stores data structures. The persistent storage (430) may be a physical or virtual device. For example, the persistent storage (430) may include hard disk drives, solid state drives, tape drives, and other components to provide data storage functionality. Alternatively, the persistent storage (430) may be a virtual device that utilizes the physical computing resources of other components to provide data storage functionality.
In one or more embodiments of the invention, the persistent storage (430) stores monitoring policies (430A). The persistent storage (430) may store additional data structures without departing from the invention.
The monitoring policies (430A) may be data structures that specify how the monitoring is to be performed and actions that are to be taken in response to the occurrence of a predetermined event identified by the monitoring. For example, the monitoring policies (430A) may specify that the monitoring is to be performed by monitoring a transaction rate of a database executing on a cluster and if the transaction rate falls below a predetermined threshold a solid state disk storing a cache used by the database is to be identified as having potentially failed. For additional details regarding monitoring policies (430A), See FIGS. 5A and 5B.
While the data structure of the persistent storage (430) is illustrated as a solitary data structure, the aforementioned data structure may be combined with other data without departing from the invention. Additionally, while the aforementioned data structure is illustrated as being stored on the example support engine (400), the data structure may be stored on persistent storage of other devices without departing from the invention.
FIG. 5A shows a diagram of example monitoring policies (500) in accordance with one or more embodiments of the invention. The example monitoring policies (500) may specify monitoring to be performed by a support engine and actions to be taken when a predetermined condition, identified based on the monitoring, occurs. The example monitoring policies (500) may specify the policies at any level of granularity without departing from the invention.
In one or more embodiments of the invention, the example monitoring policies (500) include a number of entries (502, 504). Each of the entries may specify a monitoring policy. Each entry may include an entity identifier (502A, 504A), a state (502B, 504B), and actions (502C, 504C).
The entity identifier (502A, 504A) may be an identifier of an entity to be monitored. The entity identifier (502A, 504A) may identify a component, a computing device, a computing cluster, or a solution, i.e., a group of computing devices or a group of computing clusters.
The state (502B, 504B) may specify a condition of the entity identified by the entity identifier (502A, 504A) that, when met, triggers the performance of the actions (502C, 504C) of the respective entry. The state (502B, 504B) may be a condition such as, for example, a threshold or a range. The state (502B, 504B) may be other types of conditions without departing from the invention. When the monitoring indicates that the monitored entity meets the condition specified by the state (502B, 504B), the policy associated with the entry may invoke the performance of the actions associated with the respective policy.
The actions (502C, 504C) may be actions to be performed by the support engine in response to the conditions specified by the state being met. The actions may be, for example, the performance of solution level monitoring, obtaining of predetermined information from a computing cluster, modifying the operation of a computing cluster, or any other type of action. For additional details regarding the actions (502C, 504C), See FIG. 5B.
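As a concrete, non-limiting sketch, an entry of the example monitoring policies might be encoded as follows in Python; the field names, threshold, and action strings are assumptions made for this illustration, not structures defined by the disclosure.

    from dataclasses import dataclass, field
    from typing import Callable, List

    @dataclass
    class MonitoringPolicyEntry:
        """One entry of the example monitoring policies of FIG. 5A."""
        entity_id: str                  # component, device, cluster, or solution
        state: Callable[[float], bool]  # condition that triggers the actions
        actions: List[str] = field(default_factory=list)

    # Hypothetical policy: if a database's transaction rate falls below
    # 1,000 transactions per second, treat its caching solid state drive
    # as potentially failed and collect further state.
    entry = MonitoringPolicyEntry(
        entity_id="cluster-a/device-1/database",
        state=lambda transactions_per_sec: transactions_per_sec < 1000.0,
        actions=["obtain_hardware_state", "obtain_software_state",
                 "validate_component", "initiate_support_session"],
    )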
While the example monitoring policies (500) are illustrated as a list of entries, the information of the example monitoring policies (500) may be stored in other formats, may include additional, less, and/or different information, and/or may be broken down into multiple data structures without departing from the invention.
FIG. 5B shows a diagram of example actions (520) in accordance with one or more embodiments of the invention. The example actions (520) may specify one or more actions to be performed.
In one or more embodiments of the invention, the example actions (520) include obtaining a hardware state of a component host (522), obtaining a software state of the component host (522), obtaining host settings (524), validation of the component (526), initiation of a new support session (528), and generation of a log (530). The component host may be the computing device that hosts the component that was identified as potentially failed. The validation may be to compare the functionality of the component to a baseline, e.g., performance test the component. The generated log may include all of the actions performed by a support engine and the response from a computing cluster.
While the example actions (520) are shown as including a limited number and selection of actions that may be performed, the example actions (520) may include any quantity and type of actions without departing from the invention.
In some embodiments of the invention, the example actions (520) may include a triggering of another policy. For example, the example actions (520) may cause a condition specified by the state of another policy to be met and, thus, trigger the performance of another set of actions. In this manner, monitoring policies may be nested and/or recursively trigger any number of monitoring policies upon the triggering of any one monitoring policy.
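The recursive triggering described above can be sketched as follows; this is a toy illustration in which an action string of the form "trigger_policy:<id>" invokes another policy, with a depth guard added as a safety assumption of this sketch.

    def evaluate(policy_id, metric, policies, depth=0):
        """Evaluate one policy; 'trigger_policy:<id>' actions may invoke
        further policies, so triggering can nest or recurse."""
        policy = policies[policy_id]
        if depth > 8 or not policy["state"](metric):   # depth guard (assumed)
            return []
        performed = []
        for action in policy["actions"]:
            if action.startswith("trigger_policy:"):
                performed += evaluate(action.split(":", 1)[1], metric,
                                      policies, depth + 1)
            else:
                performed.append(action)
        return performed

    policies = {
        "interconnect-slow": {
            "state": lambda gbps: gbps < 5.0,
            "actions": ["generate_log", "trigger_policy:ssd-suspect"],
        },
        "ssd-suspect": {
            "state": lambda _metric: True,   # runs whenever triggered
            "actions": ["validate_component", "initiate_support_session"],
        },
    }
    print(evaluate("interconnect-slow", 3.2, policies))
    # ['generate_log', 'validate_component', 'initiate_support_session']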
As discussed above, the system of FIG. 1 may perform the methods of FIGS. 6-8.
FIG. 6 shows a flowchart of a method in accordance with one or more embodiments of the invention. The method depicted in FIG. 6 may be used to manage a computing cluster in accordance with one or more embodiments of the invention. The method shown in FIG. 6 may be performed by, for example, a support engine (120, FIG. 1). Other components of the system illustrated in FIG. 1 may perform the method of FIG. 6 without departing from the invention.
In Step 600, computing clusters are monitored using monitoring policies.
In one or more embodiments of the invention, the monitoring may be accomplished by obtaining state information from the computing clusters and comparing the state information to the monitoring policies. The monitoring may be performed periodically, in accordance with a schedule, or via other regimes. The monitoring may obtain component level state information from components of computing devices of the computing clusters. The component level state information may be, for example, performance characteristics of the components such as, for example, a processing rate, bandwidth, or storage rate. The performance characteristics may be other types of characteristics without departing from the invention.
In Step 602, a potential component failure is identified based on the monitoring.
In one or more embodiments of the invention, the potential component failure is identified by comparing the obtained component level state information to the monitoring policies. If the component level state information exceeds a threshold specified by the monitoring policies, the component associated with the component level state information may be identified as a potential component failure.
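Steps 600 and 602 together can be sketched as a polling loop; this is an illustration under assumptions, in which get_component_metrics() and the cluster's name attribute are invented accessors rather than interfaces defined by the disclosure.

    import time

    def monitor(clusters, policies, poll_seconds=60):
        """Obtain component level state information on a schedule and yield
        a record for each potential component failure (Steps 600-602)."""
        while True:
            for cluster in clusters:
                # Assumed accessor returning {component_id: metric_value}.
                for component, metric in cluster.get_component_metrics().items():
                    policy = policies.get(component)
                    if policy is not None and policy["state"](metric):
                        yield {"cluster": cluster.name,
                               "component": component,
                               "metric": metric}
            time.sleep(poll_seconds)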
In Step 604, an error state of the computing cluster is identified in response to the identified potential component failure.
In one or more embodiments of the invention, the error state of the computing cluster is identified based on the potential component failure. For example, each type of component failure may be associated with an error state of the computing cluster. In contrast to traditional approaches that may focus on component level failures, embodiments of the invention may use the presence of a potential component failure as an indicator of a cluster level error state.
In Step 606, solution level state information is obtained from the computing cluster based on the identified error state of the cluster.
In one or more embodiments of the invention, the solution level state information is obtained by characterizing the performance of each component of each computing device of a cluster associated with the cluster level error state. Thus, embodiments of the invention may automatically obtain solution level state information in response to the identification of a potential component level failure.
In Step 608, a support package is generated that includes the solution level state information.
The support package may include an identifier of the solution, an identifier of the computing cluster, identifiers of the computing devices of the computing cluster, identifiers of the components of the computing devices, the potential component failure, an identifier of the potentially failed component, and/or other information regarding the solution.
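A hypothetical layout of such a support package is sketched below; the field names are assumptions drawn from the description above and from the solution level state information discussed with respect to FIG. 2.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class SupportPackage:
        """Support package generated in Step 608 (illustrative layout)."""
        solution_id: str
        cluster_id: str
        device_ids: List[str]
        component_ids: List[str]
        potential_failure: str        # e.g., "interconnect below threshold"
        suspect_component_id: str
        # May carry hardware/software states and transaction logs of the
        # host computing device and its operably connected peers.
        solution_level_state: Dict[str, object] = field(default_factory=dict)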
In Step 610, a support session is initiated by sending the generated support package to a support manager.
In Step 612, a solution level corrective action is obtained. The solution level corrective action may be obtained from the support manager or another entity.
In Step 614, the solution level corrective action is performed.
In one or more embodiments of the invention, the solution level corrective action is replacing the component with a replacement component selected based on the solution level state information.
In one or more embodiments of the invention, the replacement component may be identified by identifying a solution type of the computing cluster, e.g., data storage, processing, etc. A certified component, e.g., a component that is known to be compatible with the solution type, may be selected. The selected certified component may be used as the replacement component.
In one or more embodiments of the invention, the solution level corrective action is modifying firmware associated with the potentially failed component. The modification may be a replacement of the firmware or a modification of the settings of the firmware. The modification may be obtained from a support manager.
In one or more embodiments of the invention, the solution level corrective action is modifying the firmware of a second component that is not the potentially failed component. The second component and the potentially failed component may be hosted by the same computing device.
In one or more embodiments of the invention, the solution level corrective action is modifying the firmware of a component of a computing device that does not host the potentially failed component.
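Taken together, the corrective action variants above amount to a small dispatch. The following sketch assumes a "kind:target" action encoding and hypothetical engine methods, none of which are prescribed by the disclosure.

    def perform_corrective_action(action, engine):
        """Step 614: dispatch a solution level corrective action."""
        kind, _, target = action.partition(":")
        if kind == "replace_component":
            engine.replace_component(target)   # hardware replacement
        elif kind == "modify_firmware":
            engine.modify_firmware(target)     # replace firmware or settings
        elif kind == "modify_settings":
            engine.modify_settings(target)     # application settings change
        else:
            raise ValueError(f"unknown corrective action: {action!r}")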
The method may end following Step 614.
FIG. 7 shows a flowchart of a method in accordance with one or more embodiments of the invention. The method depicted in FIG. 7 may be used to manage a computing cluster in accordance with one or more embodiments of the invention. The method shown in FIG. 7 may be performed by, for example, a support manager (100, FIG. 1). Other components of the system illustrated in FIG. 1 may perform the method of FIG. 7 without departing from the invention.
In Step 700, a support package associated with a support session is obtained.
In one or more embodiments of the invention, the support package includes solution level state information.
In Step 702, the support package is analyzed using a knowledge base to identify a corrective action.
In one or more embodiments of the invention, the support package is analyzed by matching the solution level state information to similar information in the knowledge base. A corrective action associated with the matched solution level state information may be identified.
In one or more embodiments of the invention, the corrective action is replacement of the potentially failed component with a replacement component. The replacement component may be selected by a solution manager.
In one or more embodiments of the invention, the corrective action is a replacement of an application or firmware. The replacement may change a version of the application or firmware. The replacement may be reinstallation of an already installed version.
In one or more embodiments of the invention, the corrective action is a change to a setting of an application or firmware.
In Step 704, the corrective action is sent to a support engine associated with the support session.
In Step 706, an outcome associated with the corrective action is obtained from the support engine.
In Step 708, the knowledge base is updated based on the outcome.
In one or more embodiments of the invention, the knowledge base is updated by storing an association between the solution level state information and the outcome in the knowledge base.
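A minimal sketch of this update, assuming the outcome is reported as a simple "resolved"/"unresolved" flag (an assumption of this sketch, not part of the disclosure):

    def update_knowledge_base(kb, state_signature, corrective_action, outcome):
        """Step 708: record a new mapping only when the corrective action
        eliminated the potential component failure."""
        if outcome == "resolved":
            kb[state_signature] = corrective_action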
The method may end following Step 708.
FIG. 8 shows a flowchart of a method in accordance with one or more embodiments of the invention. The method depicted in FIG. 8 may be used to select a replacement component in accordance with one or more embodiments of the invention. The method shown in FIG. 8 may be performed by, for example, a solution manager (110, FIG. 1). Other components of the system illustrated in FIG. 1 may perform the method of FIG. 8 without departing from the invention.
In Step 800, a replacement hardware identification request is obtained.
In one or more embodiments of the invention, the request is obtained from a support manager. The request may include an identifier of a potentially failed component.
In Step 802, replacement hardware associated with a component specified by the request of Step 800 is selected.
In one or more embodiments of the invention, the selected replacement hardware is any hardware that is certified to work with the solution which hosts the potentially failed component.
In Step 804, it is determined whether the selected replacement hardware is available. If the replacement hardware is available, the method proceeds to Step 806. If the selected replacement hardware is not available, the method proceeds to Step 808.
In Step 806, the selected replacement hardware is sent.
In Step 808, an alternative replacement hardware that is both certified and available is sent.
The method may end following Step 806 or Step 808.
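The selection logic of Steps 802-808 can be sketched as a single function. It assumes mapping shapes like those illustrated for FIG. 3 above, and the parameter names are invented for this illustration.

    def select_replacement(component, certified, availability):
        """Prefer the first certified replacement if available (Steps 802-806);
        otherwise fall back to an alternative that is both certified and
        available (Step 808)."""
        candidates = certified.get(component, [])
        if not candidates:
            return None
        first_choice = candidates[0]                  # Step 802
        if availability.get(first_choice, 0) > 0:     # Step 804
            return first_choice                       # Step 806
        for alternative in candidates[1:]:
            if availability.get(alternative, 0) > 0:
                return alternative                    # Step 808
        return None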
To further clarify embodiments of the invention, a non-limiting example is provided below.
Example 1
Consider a deployment scenario as illustrated in FIG. 9A which shows a diagram of a system including a computing cluster (900) that includes two computing devices (910, 920). At a point in time, a malfunctioning solid state drive (912) may start to impact the performance of the computing cluster (900) that manifests in a deterioration of the performance of an interconnect (930).
Based on its monitoring of the computing cluster (900), a support engine (not shown) may identify that the interconnect (930) is potentially failing due to its reduced performance. In response to the identified potential component failure, the support engine may obtain solution level state information. The solution level state information may include the IOPS of the malfunctioning solid state drive (912).
The solution level state information may be analyzed based on information in a knowledge base (not shown). The analysis shows that the malfunctioning solid state drive (912) is malfunctioning. Based on the analysis, it is determined that the malfunctioning solid state drive (912) should be replaced.
FIG. 9B shows a diagram of the system of FIG. 9A after the malfunctioning solid state drive is replaced with a replacement solid state drive (916). As seen from FIG. 9B, since it was the malfunctioning solid state drive that was replaced, rather than the components that facilitated the interconnect (930), such as the network interface cards (914, 924), the performance of the system was restored without replacing components that were operating properly.
End of Example 1
As discussed above, embodiments of the invention may be implemented using computing devices. FIG. 10 shows a diagram of a computing device (1000). The computing device (1000) may include one or more computer processors (1002), non-persistent storage (1004) (e.g., volatile memory, such as random access memory (RAM), cache memory), persistent storage (1006) (e.g., a hard disk, an optical drive such as a compact disk (CD) drive or digital versatile disk (DVD) drive, a flash memory, etc.), a communication interface (1012) (e.g., Bluetooth interface, infrared interface, network interface, optical interface, etc.), input devices (1010), output devices (1008), and numerous other elements (not shown) and functionalities. Each of these components is described below.
In one embodiment of the invention, the computer processor(s) (1002) may be an integrated circuit for processing instructions. For example, the computer processor(s) may be one or more cores or micro-cores of a processor. The computing device (1000) may also include one or more input devices (1010), such as a touchscreen, keyboard, mouse, microphone, touchpad, electronic pen, or any other type of input device. Further, the communication interface (1012) may include an integrated circuit for connecting the computing device (1000) to a network (not shown) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, mobile network, or any other type of network) and/or to another device, such as another computing device.
In one embodiment of the invention, the computing device (1000) may include one or more output devices (1008), such as a screen (e.g., a liquid crystal display (LCD), a plasma display, touchscreen, cathode ray tube (CRT) monitor, projector, or other display device), a printer, external storage, or any other output device. One or more of the output devices may be the same or different from the input device(s). The input and output device(s) may be locally or remotely connected to the computer processor(s) (1002), non-persistent storage (1004), and persistent storage (1006). Many different types of computing devices exist, and the aforementioned input and output device(s) may take other forms.
Embodiments of the invention may improve the performance of solutions that utilize distributed computations performed by computing clusters or other groups of computing devices. Embodiments may improve the performance of the solutions by improving the uptime of the solutions. Specifically, embodiments of the invention may provide a method of determining corrective actions based on solution level state information rather than state information of only the component that is potentially failing. Doing so reduces the number of corrective actions that may need to be performed to repair the solution when a portion of the solution fails. For example, a failure of a component may manifest itself throughout a solution in unexpected ways and impact the performance of other components. Traditional approaches to repairing solutions focus on relating decreases in the performance of individual components to corrective actions. Consequently, the traditional approach may needlessly cause applications or hardware that are not, in fact, the cause of the decreased performance of the system to be replaced. Embodiments of the invention may prevent this problem by performing corrective actions that are based on solution level state information rather than component level state information. In this manner, a solution that is impacted by a malfunctioning or otherwise failed component may be corrected without needlessly replacing components, applications, or firmware. Thus, embodiments of the invention may improve the uptime of a solution and decrease the cost of supporting a solution.
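The contrast between component level triggers and solution level responses can also be illustrated with monitoring policies expressed as threshold/action pairs, similar in spirit to the policies recited in the claims below. The following Python sketch is hypothetical; the MonitoringPolicy structure, metric names, and threshold values are assumptions, not the disclosed data model.

# Hypothetical sketch of monitoring policies as thresholds paired with
# actions; field names and values are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class MonitoringPolicy:
    metric: str        # component level metric to monitor
    threshold: float   # limit that, when exceeded, invokes the policy
    action: str        # action associated with the threshold

# The action triggered by a component level symptom is to gather
# solution level state, not to replace the symptomatic component.
POLICIES = [
    MonitoringPolicy("interconnect_latency_ms", 2.0,
                     "obtain_solution_level_state"),
    MonitoringPolicy("ssd_media_errors", 10.0,
                     "obtain_solution_level_state"),
]

def invoked_policies(component_state):
    """Perform the component-policy comparisons and return any policies
    invoked by the component level state information."""
    return [policy for policy in POLICIES
            if component_state.get(policy.metric, 0.0) > policy.threshold]

# A latency spike on the interconnect invokes a policy whose action
# gathers cluster-wide state before any corrective action is selected.
print([p.action for p in invoked_policies({"interconnect_latency_ms": 7.5})])
# -> ['obtain_solution_level_state']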
Accordingly, one or more embodiments of the invention address the problem of detecting and correcting component failure in a distributed system. Since the failure of a component in a distributed system may cause unpredictable performance penalties, embodiments of the invention necessarily address problems that are rooted in computing technology: namely, the identification of component failure, and the remediation of failed components, in a distributed environment in which the failure might otherwise mask the true cause of a performance decrease of the distributed system.
While embodiments of the invention have been described as addressing one or more problems, embodiments of the invention are applicable to address other problems and the scope of the invention should not be limited to addressing the problems specifically discussed throughout this application.
Throughout this application, elements of figures may be labeled as A to N. As used herein, the aforementioned labeling means that the element may include any number of items and does not require that the element include the same number of elements as any other item labeled as A to N. For example, a data structure may include a first element labeled as A and a second element labeled as N. This labeling convention means that the data structure may include any number of the elements. A second data structure, also labeled as A to N, may also include any number of elements. The number of elements of the first data structure and the number of elements of the second data structure may be the same or different.
One or more embodiments of the invention may be implemented using instructions executed by one or more processors of the data management device. Further, such instructions may correspond to computer readable instructions that are stored on one or more non-transitory computer readable mediums.
While the invention has been described above with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.

Claims (20)

What is claimed is:
1. A support engine for managing computing clusters, comprising:
a persistent storage comprising monitoring policies; and
a processor programmed to:
monitor a computing cluster of the computing clusters;
identify a potential component failure of a component of the computing cluster based on the monitoring and the monitoring policies;
in response to identifying the potential component failure:
identify an error state of the computing cluster;
obtain solution level state information from the computing cluster based on the identified error state;
generate a support package comprising the solution level state information;
initiate a support session by sending the generated support package to a support manager;
after initiating the support session:
obtain a solution level corrective action comprising:
replacing the component with a second component selected based on the solution level state information, the replacing of the component comprising:
 identifying a solution type of the computing cluster,
 selecting a certified component associated with the solution type, and
 using the certified component as the second component, and
perform the solution level corrective action.
2. The support engine of claim 1, wherein the solution level state information comprises:
a hardware state of a computing device hosting the component.
3. The support engine of claim 1, wherein the solution level state information comprises:
a software state of a computing device hosting the component.
4. The support engine of claim 1, wherein the solution level state information comprises:
a transaction log of a computing device hosting the component.
5. The support engine of claim 1, wherein monitor the computing cluster of the computing clusters comprises:
obtaining component level state information from a plurality of components of the computing cluster;
performing a comparison of the component level state information to the monitoring policies to obtain component-policy comparisons; and
identifying any of the monitoring policies invoked by the component level state information based on the comparison.
6. The support engine of claim 5, wherein the monitoring policies comprise:
a plurality of thresholds associated with the component level state information; and
actions associated with each of the plurality of thresholds.
7. The support engine of claim 5, wherein identifying the potential component failure of the computing cluster based on the monitoring and the monitoring policies comprises:
making a determination that a threshold specified by the monitoring policies has been exceeded based on the component-policy comparisons; and
identifying a component of the components having the component level state information that exceeded the threshold.
8. The support engine of claim 1, wherein the support package further comprises:
an identifier of the component;
an identifier of the computing cluster; and
a solution type of the computing cluster.
9. The support engine of claim 1, wherein the support package further comprises:
a transaction log of a computing device that does not host the component.
10. The support engine of claim 9, wherein the computing device is part of the computing cluster.
11. The support engine of claim 1, wherein the support package further comprises:
a listing of network address identifiers of each computing device of the computing cluster.
12. A method for managing computing clusters, comprising:
monitoring a computing cluster of the computing clusters;
identifying a potential component failure of a component of the computing cluster based on the monitoring and monitoring policies;
in response to identifying the potential component failure:
identifying an error state of the computing cluster;
obtaining solution level state information from the computing cluster based on the identified error state;
generating a support package comprising the solution level state information;
initiating a support session by sending the generated support package to a support manager to correct the potential component failure;
after initiating the support session:
obtaining a solution level corrective action comprising:
replacing the component with a second component selected based on the solution level state information, the replacing of the component comprising:
identifying a solution type of the computing cluster,
selecting a certified component associated with the solution type, and
using the certified component as the second component, and
performing the solution level corrective action.
13. The method of claim 12, wherein the solution level state information comprises:
a hardware state of a computing device hosting the component.
14. The method of claim 12, wherein monitoring the computing cluster of the computing clusters comprises:
obtaining component level state information from a plurality of components of the computing cluster;
performing a comparison of the component level state information to the monitoring policies to obtain component-policy comparisons; and
identifying any of the monitoring policies invoked by the component level state information based on the comparison.
15. The method of claim 14, wherein the monitoring policies comprise:
a plurality of thresholds associated with the component level state information; and
actions associated with each of the plurality of thresholds.
16. The method of claim 14, wherein identifying the potential component failure of the computing cluster based on the monitoring and the monitoring policies comprises:
making a determination that a threshold specified by the monitoring policies has been exceeded based on the component-policy comparisons; and
identifying a component of the components having the component level state information that exceeded the threshold.
17. A non-transitory computer readable medium comprising computer readable program code, which when executed by a computer processor enables the computer processor to perform a method for managing computing clusters, the method comprising:
monitoring a computing cluster of the computing clusters;
identifying a potential component failure of a component of the computing cluster based on the monitoring and monitoring policies;
in response to identifying the potential component failure:
identifying an error state of the computing cluster;
obtaining solution level state information from the computing cluster based on the identified error state;
generating a support package comprising the solution level state information;
initiating a support session by sending the generated support package to a support manager to correct the potential component failure;
after initiating the support session:
obtaining a solution level corrective action comprising:
replacing the component with a second component selected based on the solution level state information, the replacing of the component comprising:
identifying a solution type of the computing cluster,
selecting a certified component associated with the solution type, and
using the certified component as the second component, and
performing the solution level corrective action.
18. The non-transitory computer readable medium of claim 17, wherein monitoring the computing cluster of the computing clusters comprises:
obtaining component level state information from a plurality of components of the computing cluster;
performing a comparison of the component level state information to the monitoring policies to obtain component-policy comparisons; and
identifying any of the monitoring policies invoked by the component level state information based on the comparison.
19. The non-transitory computer readable medium of claim 18, wherein the monitoring policies comprise:
a plurality of thresholds associated with the component level state information; and
actions associated with each of the plurality of thresholds.
20. The non-transitory computer readable medium of claim 18, wherein identifying the potential component failure of the computing cluster based on the monitoring and the monitoring policies comprises:
making a determination that a threshold specified by the monitoring policies has been exceeded based on the component-policy comparisons; and
identifying a component of the components having the component level state information that exceeded the threshold.
US15/961,237 2018-04-24 2018-04-24 System and method to automate solution level contextual support Active 2038-08-19 US11086738B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/961,237 US11086738B2 (en) 2018-04-24 2018-04-24 System and method to automate solution level contextual support

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15/961,237 US11086738B2 (en) 2018-04-24 2018-04-24 System and method to automate solution level contextual support

Publications (2)

Publication Number Publication Date
US20190324873A1 US20190324873A1 (en) 2019-10-24
US11086738B2 true US11086738B2 (en) 2021-08-10

Family

ID=68237867

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/961,237 Active 2038-08-19 US11086738B2 (en) 2018-04-24 2018-04-24 System and method to automate solution level contextual support

Country Status (1)

Country Link
US (1) US11086738B2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11586490B2 (en) * 2020-03-31 2023-02-21 Hewlett Packard Enterprise Development Lp System and method of identifying self-healing actions for computing systems using reinforcement learning
US11321197B2 (en) * 2020-04-27 2022-05-03 Vmware, Inc. File service auto-remediation in storage systems
US20240143473A1 (en) * 2022-10-31 2024-05-02 Bitdrift, Inc. Systems and methods for dynamically configuring a client application

Citations (163)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5483637A (en) * 1994-06-27 1996-01-09 International Business Machines Corporation Expert based system and method for managing error events in a local area network
US5867714A (en) 1996-10-31 1999-02-02 Ncr Corporation System and method for distributing configuration-dependent software revisions to a computer system
US6012152A (en) 1996-11-27 2000-01-04 Telefonaktiebolaget Lm Ericsson (Publ) Software fault management system
US6205409B1 (en) * 1998-06-26 2001-03-20 Advanced Micro Devices, Inc. Predictive failure monitoring system for a mass flow controller
US6317028B1 (en) 1998-07-24 2001-11-13 Electronic Security And Identification Llc Electronic identification, control, and security system and method for consumer electronics and the like
US6473794B1 (en) * 1999-05-27 2002-10-29 Accenture Llp System for establishing plan to test components of web based framework by displaying pictorial representation and conveying indicia coded components of existing network framework
US20030149919A1 (en) * 2000-05-05 2003-08-07 Joseph Greenwald Systems and methods for diagnosing faults in computer networks
US6606744B1 (en) * 1999-11-22 2003-08-12 Accenture, Llp Providing collaborative installation management in a network-based supply chain environment
US6633782B1 (en) 1999-02-22 2003-10-14 Fisher-Rosemount Systems, Inc. Diagnostic expert in a process control system
US20040078683A1 (en) 2000-05-05 2004-04-22 Buia Christhoper A. Systems and methods for managing and analyzing faults in computer networks
US20040088145A1 (en) 2002-11-06 2004-05-06 Rosenthal Richard Edwin Methods and apparatus for designing the racking and wiring configurations for pieces of hardware
US6742141B1 (en) * 1999-05-10 2004-05-25 Handsfree Networks, Inc. System for automated problem detection, diagnosis, and resolution in a software driven system
US20040177168A1 (en) 2003-03-03 2004-09-09 Microsoft Corporation Verbose hardware identification for binding a software package to a computer system having tolerance for hardware changes
US20040177354A1 (en) 2003-03-03 2004-09-09 Microsoft Corporation Compact hardware indentification for binding a software package to a computer system having tolerance for hardware changes
US6795935B1 (en) 1999-10-28 2004-09-21 General Electric Company Diagnosis of faults in a complex system
US20040225381A1 (en) * 2003-05-07 2004-11-11 Andrew Ritz Programmatic computer problem diagnosis and resolution and automated reporting and updating of the same
US20040250260A1 (en) 2003-06-09 2004-12-09 Pioso Bennett G. Middle-ware interface status tool and method for using same
US20050033770A1 (en) * 2003-08-07 2005-02-10 Oglesby Katheryn E. Dynamically evolving memory recall and idea generation tool
US6871224B1 (en) * 1999-01-04 2005-03-22 Cisco Technology, Inc. Facility to transmit network management data to an umbrella management system
US20050078656A1 (en) 2003-10-14 2005-04-14 Bryant Stewart Frederick Method and apparatus for generating routing information in a data communications network
US20050120112A1 (en) * 2000-11-15 2005-06-02 Robert Wing Intelligent knowledge management and content delivery system
US20050144188A1 (en) 2003-12-16 2005-06-30 International Business Machines Corporation Determining the impact of a component failure on one or more services
US20050144151A1 (en) 2003-04-02 2005-06-30 Fischman Reuben S. System and method for decision analysis and resolution
US20060117212A1 (en) * 2001-02-13 2006-06-01 Network Appliance, Inc. Failover processing in a storage system
US20060149408A1 (en) * 2003-10-10 2006-07-06 Speeter Thomas H Agent-less discovery of software components
US20060178864A1 (en) 2005-02-08 2006-08-10 Madhavi Khanijo Automated system and method for configuring a rack assembly
US7103874B2 (en) 2003-10-23 2006-09-05 Microsoft Corporation Model-based management of computer systems and distributed applications
US20060235962A1 (en) * 2005-04-15 2006-10-19 Microsoft Corporation Model-based system monitoring
US7222127B1 (en) 2003-11-14 2007-05-22 Google Inc. Large scale machine learning systems and methods
US20070143851A1 (en) * 2005-12-21 2007-06-21 Fiberlink Method and systems for controlling access to computing resources based on known security vulnerabilities
US20070202469A1 (en) * 2006-02-13 2007-08-30 The Boeing Company System for trouble shooting and controlling signals to and from an aircraft simulator
US20070283011A1 (en) 2006-06-02 2007-12-06 Google Inc. Synchronizing Configuration Information Among Multiple Clients
US20080037532A1 (en) 2005-08-20 2008-02-14 Sykes Edward A Managing service levels on a shared network
US7334222B2 (en) 2002-09-11 2008-02-19 International Business Machines Corporation Methods and apparatus for dependency-based impact simulation and vulnerability analysis
US20080065700A1 (en) 2005-12-29 2008-03-13 Blue Jungle Analyzing Usage Information of an Information Management System
US7370102B1 (en) * 1998-12-15 2008-05-06 Cisco Technology, Inc. Managing recovery of service components and notification of service errors and failures
US20080201470A1 (en) * 2005-11-11 2008-08-21 Fujitsu Limited Network monitor program executed in a computer of cluster system, information processing method and computer
US20080228755A1 (en) 2007-03-14 2008-09-18 Futoshi Haga Policy creation support method, policy creation support system, and program therefor
US20080262860A1 (en) 2007-04-20 2008-10-23 Sap Ag System and Method for Supporting Software
US20090012805A1 (en) 2007-07-06 2009-01-08 Microsoft Corporation Portable Digital Rights for Multiple Devices
US7490073B1 (en) 2004-12-21 2009-02-10 Zenprise, Inc. Systems and methods for encoding knowledge for automated management of software application deployments
US7500142B1 (en) * 2005-12-20 2009-03-03 International Business Machines Corporation Preliminary classification of events to facilitate cause-based analysis
US7516362B2 (en) 2004-03-19 2009-04-07 Hewlett-Packard Development Company, L.P. Method and apparatus for automating the root cause analysis of system failures
US20090113248A1 (en) 2007-10-26 2009-04-30 Megan Elena Bock Collaborative troubleshooting computer systems using fault tree analysis
US7536595B1 (en) 2005-10-19 2009-05-19 At&T Intellectual Property, Ii, L.P. Systems, devices, and methods for initiating recovery
US20090165099A1 (en) 2007-12-21 2009-06-25 Avigdor Eldar Provisioning active management technology (amt) in computer systems
US20090183010A1 (en) 2008-01-14 2009-07-16 Microsoft Corporation Cloud-Based Movable-Component Binding
US20090260071A1 (en) 2008-04-14 2009-10-15 Microsoft Corporation Smart module provisioning of local network devices
US20090282283A1 (en) * 2008-05-09 2009-11-12 Hitachi, Ltd. Management server in information processing system and cluster management method
US20090307333A1 (en) 2008-06-05 2009-12-10 Palm, Inc. Restoring of data to mobile computing device
US20100024001A1 (en) 2008-07-25 2010-01-28 International Business Machines Corporation Securing Blade Servers In A Data Center
US20100057677A1 (en) 2008-08-27 2010-03-04 Sap Ag Solution search for software support
US7757124B1 (en) 2007-07-16 2010-07-13 Oracle America, Inc. Method and system for automatic correlation of asynchronous errors and stimuli
US20100180221A1 (en) 2009-01-14 2010-07-15 Microsoft Corporation Configuration Creation for Deployment and Monitoring
US20100229022A1 (en) * 2009-03-03 2010-09-09 Microsoft Corporation Common troubleshooting framework
US7827136B1 (en) 2001-09-20 2010-11-02 Emc Corporation Management for replication of data stored in a data storage environment including a system and method for failover protection of software agents operating in the environment
US7831693B2 (en) 2003-08-18 2010-11-09 Oracle America, Inc. Structured methodology and design patterns for web services
US20100306489A1 (en) * 2009-05-29 2010-12-02 Cray Inc. Error management firewall in a multiprocessor computer
US20100312522A1 (en) 2009-06-04 2010-12-09 Honeywell International Inc. Method and system for identifying systemic failures and root causes of incidents
US20100318487A1 (en) 2006-09-27 2010-12-16 Marvasti Mazda A Self-learning integrity management system and related methods
US20100325493A1 (en) * 2008-09-30 2010-12-23 Hitachi, Ltd. Root cause analysis method, apparatus, and program for it apparatuses from which event information is not obtained
US7886031B1 (en) 2002-06-04 2011-02-08 Symantec Operating Corporation SAN configuration utility
US20110078428A1 (en) 2009-09-30 2011-03-31 Memory Experts International Inc. Portable desktop device and method of host computer system hardware recognition and configuration
US20110093703A1 (en) 2009-10-16 2011-04-21 Etchegoyen Craig S Authentication of Computing and Communications Hardware
US7987353B2 (en) 2008-01-09 2011-07-26 International Business Machines Corporation Remote BIOS for servers and blades
US20110270482A1 (en) 2008-12-17 2011-11-03 Airbus Operations Gmbh Adaptive central maintenance system and method for planning maintenance operations for systems
US20110289342A1 (en) * 2010-05-21 2011-11-24 Schaefer Diane E Method for the file system of figure 7 for the cluster
US20120041976A1 (en) 2010-10-26 2012-02-16 ParElastic Corporation Mechanism for co-located data placement in a parallel elastic database management system
US20120083917A1 (en) * 2009-06-10 2012-04-05 Fisher-Rosemount Systems, Inc. Predicted fault analysis
US20120096272A1 (en) 2010-10-15 2012-04-19 Rockwell Automation Technologies, Inc. Security model for industrial devices
US8166552B2 (en) 2008-09-12 2012-04-24 Hytrust, Inc. Adaptive configuration management system
US20120110142A1 (en) 2010-10-29 2012-05-03 Bank Of America Corporation Configuration management utility
US20120144244A1 (en) 2010-12-07 2012-06-07 Yie-Fong Dan Single-event-upset controller wrapper that facilitates fault injection
US20120150926A1 (en) 2010-12-08 2012-06-14 International Business Machines Corporation Distributed free block map for a clustered redirect-on-write file system
US20120166142A1 (en) * 2009-09-07 2012-06-28 Hitachi, Ltd. Anomaly Detection and Diagnosis/Prognosis Method, Anomaly Detection and Diagnosis/Prognosis System, and Anomaly Detection and Diagnosis/Prognosis Program
US20120182151A1 (en) 2011-01-19 2012-07-19 Hon Hai Precision Industry Co., Ltd. Server rack having payload weighing function
US20120233216A1 (en) * 2005-12-29 2012-09-13 Nextlabs, Inc. Intelligent Policy Deployment
US8290970B2 (en) 2004-06-29 2012-10-16 Hewlett-Packard Development Company, L.P. System and method for offering one or more drivers to run on the computer
US20120265872A1 (en) 2011-04-18 2012-10-18 Cox Communications, Inc. Systems and Methods of Automatically Remediating Fault Conditions
US20120271927A1 (en) * 2010-06-23 2012-10-25 Bulat Shakirzyanov System and method for managing a computing cluster
US20120331526A1 (en) 2011-06-22 2012-12-27 TerraWi, Inc. Multi-level, hash-based device integrity checks
US8386930B2 (en) 2009-06-05 2013-02-26 International Business Machines Corporation Contextual data center management utilizing a virtual environment
US8401982B1 (en) 2010-01-14 2013-03-19 Symantec Corporation Using sequencing and timing information of behavior events in machine learning to detect malware
US20130151975A1 (en) 2010-09-07 2013-06-13 Tomer Shadi System and method for automated deployment of multi-component computer environment
US20130185667A1 (en) 2012-01-18 2013-07-18 International Business Machines Corporation Open resilience framework for simplified and coordinated orchestration of multiple availability managers
US20130257627A1 (en) * 2012-03-29 2013-10-03 Yokogawa Electric Corporation Apparatus and method for determining operation compatibility between field devices
US8583769B1 (en) 2011-08-16 2013-11-12 Edgecast Networks, Inc. Configuration management repository for a distributed platform
US20130317870A1 (en) 2012-05-25 2013-11-28 Bank Of America Apparatus and methods for process change control
US20130326029A1 (en) * 2011-11-11 2013-12-05 Level 3 Communications, Llc System and methods for configuration management
US8639798B2 (en) 2008-01-21 2014-01-28 International Business Machines Corporation Managing configuration items
US20140069291A1 (en) 2012-09-11 2014-03-13 Shenzhen China Star Optoelectronics Technology Co., Ltd Transmission system of LCD panel and automatic crane thereof
US20140082417A1 (en) 2011-08-03 2014-03-20 Honeywell International Inc. Systems and methods for using a corrective action as diagnostic evidence
US20140115176A1 (en) * 2012-10-22 2014-04-24 Cassidian Communications, Inc. Clustered session management
US8774054B2 (en) 2011-08-01 2014-07-08 Huawei Technologies Co., Ltd. Network policy configuration method, management device, and network management center device
US20140245085A1 (en) * 2013-02-22 2014-08-28 International Business Machines Corporation Managing error logs in a distributed network fabric
US8826077B2 (en) 2007-12-28 2014-09-02 International Business Machines Corporation Defining a computer recovery process that matches the scope of outage including determining a root cause and performing escalated recovery operations
US20140281675A1 (en) 2000-03-16 2014-09-18 Sony Computer Entertainment America Llc Flexible failover policies in high availability computing systems
US20140304399A1 (en) * 2013-04-06 2014-10-09 Citrix Systems, Inc. Systems and methods for providing monitoring in a cluster system
US20140304402A1 (en) * 2013-04-06 2014-10-09 Citrix Systems, Inc. Systems and methods for cluster statistics aggregation
US8868987B2 (en) 2010-02-05 2014-10-21 Tripwire, Inc. Systems and methods for visual correlation of log events, configuration changes and conditions producing alerts in a virtual infrastructure
US8874892B1 (en) 2011-05-26 2014-10-28 Phoenix Technologies Ltd. Assessing BIOS information prior to reversion
US20140324276A1 (en) 2013-04-30 2014-10-30 Cummins Inc. Engine diagnostic system for high volume feedback processing
US20140337957A1 (en) 2013-05-07 2014-11-13 Dannie Gerrit Feekes Out-of-band authentication
US20140344101A1 (en) * 2013-05-14 2014-11-20 International Business Machines Corporation Automated guidance for selecting components of an it solution
US8938621B2 (en) 2011-11-18 2015-01-20 Qualcomm Incorporated Computing device integrity protection
US8973118B2 (en) 2011-12-14 2015-03-03 Cellco Partnership Token based security protocol for managing access to web services
US8995439B2 (en) 2010-05-13 2015-03-31 Comcast Cable Communications, Llc Control of multicast content distribution
US20150117174A1 (en) * 2013-10-31 2015-04-30 Oracle International Corporation Media and drive validation in tape libraries
US20150120359A1 (en) 2013-05-13 2015-04-30 Fulcrum Collaborations, Llc System and Method for Integrated Mission Critical Ecosystem Management
US20150149822A1 (en) * 2013-11-26 2015-05-28 International Business Machines Corporation Event handling in storage area networks
US9122739B1 (en) * 2011-01-28 2015-09-01 Netapp, Inc. Evaluating proposed storage solutions
US9122501B1 (en) 2014-09-08 2015-09-01 Quanta Computer Inc. System and method for managing multiple bios default configurations
US20150256394A1 (en) 2014-03-06 2015-09-10 Dell Products, Lp System and Method for Providing a Data Center Management Controller
US20150324255A1 (en) 2014-05-09 2015-11-12 Commvault Systems, Inc. Load balancing across multiple data paths
US9201751B1 (en) 2011-04-18 2015-12-01 American Megatrends, Inc. Data migration between multiple tiers in a storage system using policy based ILM for QOS
US9225625B1 (en) 2015-03-26 2015-12-29 Linkedin Corporation Detecting and alerting performance degradation during features ramp-up
US9229902B1 (en) 2013-02-14 2016-01-05 Amazon Technologies, Inc. Managing update deployment
US20160042288A1 (en) * 2014-08-11 2016-02-11 International Business Machines Corporation Mapping user actions to historical paths to determine a predicted endpoint
US20160050222A1 (en) * 2014-08-18 2016-02-18 Bank Of America Corporation Modification of Computing Resource Behavior Based on Aggregated Monitoring Information
US20160048611A1 (en) 2014-08-15 2016-02-18 Vce Company, Llc System, Method, Apparatus, and Computer Program Product for Generation of an Elevation Plan for a Computing System
US20160057009A1 (en) 2014-08-21 2016-02-25 Netapp, Inc. Configuration of peered cluster storage environment organized as disaster recovery group
US9278481B2 (en) 2010-10-26 2016-03-08 Rinco Ultrasononics USA, INC. Sonotrode and anvil energy director grids for narrow/complex ultrasonic welds of improved durability
US20160112504A1 (en) * 2011-01-28 2016-04-21 Netapp, Inc. Proposed storage system solution selection for service level objective management
US20160110240A1 (en) * 2014-10-17 2016-04-21 Netapp Inc. Forensics collection for failed storage controllers
US9323789B1 (en) 2012-03-14 2016-04-26 Emc Corporation Automated application protection and reuse using a workflow component
US9355036B2 (en) 2012-09-18 2016-05-31 Netapp, Inc. System and method for operating a system to cache a networked file system utilizing tiered storage and customizable eviction policies based on priority and tiers
US20160173690A1 (en) 2014-12-12 2016-06-16 Xerox Corporation Spectral diagnostic engine for customer support call center
US9384082B1 (en) * 2015-10-23 2016-07-05 Pure Storage, Inc. Proactively providing corrective measures for storage arrays
US20160294643A1 (en) 2015-04-03 2016-10-06 Electronics And Telecommunications Research Institute System and method for service orchestration in distributed cloud environment
US20160302323A1 (en) 2015-04-09 2016-10-13 Ortronics, Inc. Equipment Cabinet and Associated Methods
US9542177B1 (en) 2012-10-30 2017-01-10 Amazon Technologies, Inc. Peer configuration analysis and enforcement
US20170017881A1 (en) * 2014-03-28 2017-01-19 Casebank Technologies Inc. Methods and systems for troubleshooting problems in complex systems using multiple knowledgebases
US20170032091A1 (en) 2014-05-20 2017-02-02 Siemens Healthcare Diagnostics Inc. Intelligent Service Assistant Inference Engine
US9594620B2 (en) * 2011-04-04 2017-03-14 Microsoft Technology Licensing, Llc Proactive failure handling in data processing systems
US20170085644A1 (en) * 2015-09-22 2017-03-23 Netapp, Inc. Methods and systems for selecting compatible resources in networked storage environments
US20170094003A1 (en) 2015-09-30 2017-03-30 Symantec Corporation Preventing data corruption due to pre-existing split brain
US20170206128A1 (en) 2014-11-06 2017-07-20 International Business Machines Corporation Cognitive analysis for healing an it system
US9729615B2 (en) 2013-11-18 2017-08-08 Nuwafin Holdings Ltd System and method for collaborative designing, development, deployment, execution, monitoring and maintenance of enterprise applications
US20170242740A1 (en) * 2016-02-22 2017-08-24 International Business Machines Corporation User interface error prediction
US20170339005A1 (en) 2015-02-10 2017-11-23 Huawei Technologies Co., Ltd. Method and Device for Processing Failure in at Least One Distributed Cluster, and System
US9864634B2 (en) 2012-02-06 2018-01-09 International Business Machines Corporation Enhancing initial resource allocation management to provide robust reconfiguration
US20180025166A1 (en) 2015-02-11 2018-01-25 British Telecommunications Public Limited Company Validating computer resource usage
US20180034709A1 (en) 2015-01-08 2018-02-01 Zte Corporation Method and Device for Asset Information Management
US20180041388A1 (en) 2015-03-13 2018-02-08 Koninklijke Kpn N.V. Method and Control System for Controlling Provisioning of a Service in a Network
US9898224B1 (en) 2012-09-12 2018-02-20 EMC IP Holding Company LLC Automatic adjustment of capacity usage by data storage optimizer for data migration
US9999030B2 (en) 2013-08-15 2018-06-12 Huawei Technologies Co., Ltd. Resource provisioning method
US10048996B1 (en) 2015-09-29 2018-08-14 Amazon Technologies, Inc. Predicting infrastructure failures in a data center for hosted service mitigation actions
US10057184B1 (en) 2014-11-10 2018-08-21 Amazon Technologies, Inc. Resource configuration compliance service
US20180285009A1 (en) 2017-03-30 2018-10-04 Intel Corporation Dynamically composable computing system, a data center, and method for dynamically composing a computing system
US10097620B2 (en) 2014-07-11 2018-10-09 Vmware Inc. Methods and apparatus to provision a workload in a virtual server rack deployment
US20180302277A1 (en) 2017-04-12 2018-10-18 Cisco Technology, Inc. Virtualized network functions and service chaining in serverless computing infrastructure
US20180321934A1 (en) 2017-05-05 2018-11-08 Dell Products L.P. Infrastructure configuration and inventory manager
US20180322019A1 (en) 2017-05-05 2018-11-08 Pivotal Software, Inc. Backup and restore framework for distributed computing systems
US20180329579A1 (en) 2017-05-10 2018-11-15 Dell Products L.P. High Volume Configuration of Information Handling Systems
US20190123985A1 (en) * 2017-10-25 2019-04-25 Cisco Technology, Inc. Federated network and application data analytics platform
US20190149408A1 (en) 2016-08-05 2019-05-16 Huawei Technologies Co., Ltd. Method and apparatus for deploying service in virtualized network
US20190182105A1 (en) 2017-12-08 2019-06-13 At&T Mobility Ii Llc System facilitating prediction, detection and mitigation of network or device issues in communication systems
US20190306013A1 (en) 2018-03-28 2019-10-03 Dell Products L.P. Agentless method to bring solution and cluster awareness into infrastructure and support management portals
US20190303137A1 (en) 2018-03-28 2019-10-03 Dell Products L.P. System and method for out-of-the-box solution-level management via logical architecture awareness
US20190324841A1 (en) * 2018-04-24 2019-10-24 EMC IP Holding Company LLC System and method to predictively service and support the solution
US20200034069A1 (en) * 2015-02-03 2020-01-30 Netapp Inc. Monitoring storage cluster elements
US20200079403A1 (en) * 2017-12-21 2020-03-12 Hitachi, Ltd. Control arrangements for maintenance of a collection of physical devices and methods for controlling maintenance of a collection of physical devices
US10944561B1 (en) 2018-05-14 2021-03-09 Amazon Technologies Inc. Policy implementation using security tokens

Patent Citations (169)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5483637A (en) * 1994-06-27 1996-01-09 International Business Machines Corporation Expert based system and method for managing error events in a local area network
US5867714A (en) 1996-10-31 1999-02-02 Ncr Corporation System and method for distributing configuration-dependent software revisions to a computer system
US6012152A (en) 1996-11-27 2000-01-04 Telefonaktiebolaget Lm Ericsson (Publ) Software fault management system
US6205409B1 (en) * 1998-06-26 2001-03-20 Advanced Micro Devices, Inc. Predictive failure monitoring system for a mass flow controller
US6317028B1 (en) 1998-07-24 2001-11-13 Electronic Security And Identification Llc Electronic identification, control, and security system and method for consumer electronics and the like
US7370102B1 (en) * 1998-12-15 2008-05-06 Cisco Technology, Inc. Managing recovery of service components and notification of service errors and failures
US6871224B1 (en) * 1999-01-04 2005-03-22 Cisco Technology, Inc. Facility to transmit network management data to an umbrella management system
US6633782B1 (en) 1999-02-22 2003-10-14 Fisher-Rosemount Systems, Inc. Diagnostic expert in a process control system
US6742141B1 (en) * 1999-05-10 2004-05-25 Handsfree Networks, Inc. System for automated problem detection, diagnosis, and resolution in a software driven system
US6473794B1 (en) * 1999-05-27 2002-10-29 Accenture Llp System for establishing plan to test components of web based framework by displaying pictorial representation and conveying indicia coded components of existing network framework
US6795935B1 (en) 1999-10-28 2004-09-21 General Electric Company Diagnosis of faults in a complex system
US6606744B1 (en) * 1999-11-22 2003-08-12 Accenture, Llp Providing collaborative installation management in a network-based supply chain environment
US20140281675A1 (en) 2000-03-16 2014-09-18 Sony Computer Entertainment America Llc Flexible failover policies in high availability computing systems
US20040078683A1 (en) 2000-05-05 2004-04-22 Buia Christhoper A. Systems and methods for managing and analyzing faults in computer networks
US20030149919A1 (en) * 2000-05-05 2003-08-07 Joseph Greenwald Systems and methods for diagnosing faults in computer networks
US20050120112A1 (en) * 2000-11-15 2005-06-02 Robert Wing Intelligent knowledge management and content delivery system
US20060117212A1 (en) * 2001-02-13 2006-06-01 Network Appliance, Inc. Failover processing in a storage system
US7827136B1 (en) 2001-09-20 2010-11-02 Emc Corporation Management for replication of data stored in a data storage environment including a system and method for failover protection of software agents operating in the environment
US7886031B1 (en) 2002-06-04 2011-02-08 Symantec Operating Corporation SAN configuration utility
US7334222B2 (en) 2002-09-11 2008-02-19 International Business Machines Corporation Methods and apparatus for dependency-based impact simulation and vulnerability analysis
US20040088145A1 (en) 2002-11-06 2004-05-06 Rosenthal Richard Edwin Methods and apparatus for designing the racking and wiring configurations for pieces of hardware
US20040177168A1 (en) 2003-03-03 2004-09-09 Microsoft Corporation Verbose hardware identification for binding a software package to a computer system having tolerance for hardware changes
US20040177354A1 (en) 2003-03-03 2004-09-09 Microsoft Corporation Compact hardware indentification for binding a software package to a computer system having tolerance for hardware changes
US20050144151A1 (en) 2003-04-02 2005-06-30 Fischman Reuben S. System and method for decision analysis and resolution
US20040225381A1 (en) * 2003-05-07 2004-11-11 Andrew Ritz Programmatic computer problem diagnosis and resolution and automated reporting and updating of the same
US20040250260A1 (en) 2003-06-09 2004-12-09 Pioso Bennett G. Middle-ware interface status tool and method for using same
US20050033770A1 (en) * 2003-08-07 2005-02-10 Oglesby Katheryn E. Dynamically evolving memory recall and idea generation tool
US7831693B2 (en) 2003-08-18 2010-11-09 Oracle America, Inc. Structured methodology and design patterns for web services
US20060149408A1 (en) * 2003-10-10 2006-07-06 Speeter Thomas H Agent-less discovery of software components
US20060179116A1 (en) 2003-10-10 2006-08-10 Speeter Thomas H Configuration management system and method of discovering configuration data
US20050078656A1 (en) 2003-10-14 2005-04-14 Bryant Stewart Frederick Method and apparatus for generating routing information in a data communications network
US7103874B2 (en) 2003-10-23 2006-09-05 Microsoft Corporation Model-based management of computer systems and distributed applications
US7222127B1 (en) 2003-11-14 2007-05-22 Google Inc. Large scale machine learning systems and methods
US20050144188A1 (en) 2003-12-16 2005-06-30 International Business Machines Corporation Determining the impact of a component failure on one or more services
US7516362B2 (en) 2004-03-19 2009-04-07 Hewlett-Packard Development Company, L.P. Method and apparatus for automating the root cause analysis of system failures
US8290970B2 (en) 2004-06-29 2012-10-16 Hewlett-Packard Development Company, L.P. System and method for offering one or more drivers to run on the computer
US8001527B1 (en) 2004-12-21 2011-08-16 Zenprise, Inc. Automated root cause analysis of problems associated with software application deployments
US7490073B1 (en) 2004-12-21 2009-02-10 Zenprise, Inc. Systems and methods for encoding knowledge for automated management of software application deployments
US20060178864A1 (en) 2005-02-08 2006-08-10 Madhavi Khanijo Automated system and method for configuring a rack assembly
US20060235962A1 (en) * 2005-04-15 2006-10-19 Microsoft Corporation Model-based system monitoring
US20080037532A1 (en) 2005-08-20 2008-02-14 Sykes Edward A Managing service levels on a shared network
US7536595B1 (en) 2005-10-19 2009-05-19 At&T Intellectual Property, Ii, L.P. Systems, devices, and methods for initiating recovery
US20080201470A1 (en) * 2005-11-11 2008-08-21 Fujitsu Limited Network monitor program executed in a computer of cluster system, information processing method and computer
US7500142B1 (en) * 2005-12-20 2009-03-03 International Business Machines Corporation Preliminary classification of events to facilitate cause-based analysis
US20070143851A1 (en) * 2005-12-21 2007-06-21 Fiberlink Method and systems for controlling access to computing resources based on known security vulnerabilities
US20080065700A1 (en) 2005-12-29 2008-03-13 Blue Jungle Analyzing Usage Information of an Information Management System
US20120233216A1 (en) * 2005-12-29 2012-09-13 Nextlabs, Inc. Intelligent Policy Deployment
US20070202469A1 (en) * 2006-02-13 2007-08-30 The Boeing Company System for trouble shooting and controlling signals to and from an aircraft simulator
US20070283011A1 (en) 2006-06-02 2007-12-06 Google Inc. Synchronizing Configuration Information Among Multiple Clients
US20100318487A1 (en) 2006-09-27 2010-12-16 Marvasti Mazda A Self-learning integrity management system and related methods
US20080228755A1 (en) 2007-03-14 2008-09-18 Futoshi Haga Policy creation support method, policy creation support system, and program therefor
US20080262860A1 (en) 2007-04-20 2008-10-23 Sap Ag System and Method for Supporting Software
US20090012805A1 (en) 2007-07-06 2009-01-08 Microsoft Corporation Portable Digital Rights for Multiple Devices
US7757124B1 (en) 2007-07-16 2010-07-13 Oracle America, Inc. Method and system for automatic correlation of asynchronous errors and stimuli
US20090113248A1 (en) 2007-10-26 2009-04-30 Megan Elena Bock Collaborative troubleshooting computer systems using fault tree analysis
US20090165099A1 (en) 2007-12-21 2009-06-25 Avigdor Eldar Provisioning active management technology (amt) in computer systems
US8826077B2 (en) 2007-12-28 2014-09-02 International Business Machines Corporation Defining a computer recovery process that matches the scope of outage including determining a root cause and performing escalated recovery operations
US7987353B2 (en) 2008-01-09 2011-07-26 International Business Machines Corporation Remote BIOS for servers and blades
US20090183010A1 (en) 2008-01-14 2009-07-16 Microsoft Corporation Cloud-Based Movable-Component Binding
US8639798B2 (en) 2008-01-21 2014-01-28 International Business Machines Corporation Managing configuration items
US20090260071A1 (en) 2008-04-14 2009-10-15 Microsoft Corporation Smart module provisioning of local network devices
US20090282283A1 (en) * 2008-05-09 2009-11-12 Hitachi, Ltd. Management server in information processing system and cluster management method
US20090307333A1 (en) 2008-06-05 2009-12-10 Palm, Inc. Restoring of data to mobile computing device
US20100024001A1 (en) 2008-07-25 2010-01-28 International Business Machines Corporation Securing Blade Servers In A Data Center
US20100057677A1 (en) 2008-08-27 2010-03-04 Sap Ag Solution search for software support
US8166552B2 (en) 2008-09-12 2012-04-24 Hytrust, Inc. Adaptive configuration management system
US20100325493A1 (en) * 2008-09-30 2010-12-23 Hitachi, Ltd. Root cause analysis method, apparatus, and program for it apparatuses from which event information is not obtained
US20110302305A1 (en) * 2008-09-30 2011-12-08 Tomohiro Morimura Root cause analysis method, apparatus, and program for it apparatuses from which event information is not obtained
US20110270482A1 (en) 2008-12-17 2011-11-03 Airbus Operations Gmbh Adaptive central maintenance system and method for planning maintenance operations for systems
US20100180221A1 (en) 2009-01-14 2010-07-15 Microsoft Corporation Configuration Creation for Deployment and Monitoring
US20100229022A1 (en) * 2009-03-03 2010-09-09 Microsoft Corporation Common troubleshooting framework
US20100306489A1 (en) * 2009-05-29 2010-12-02 Cray Inc. Error management firewall in a multiprocessor computer
US20100312522A1 (en) 2009-06-04 2010-12-09 Honeywell International Inc. Method and system for identifying systemic failures and root causes of incidents
US8386930B2 (en) 2009-06-05 2013-02-26 International Business Machines Corporation Contextual data center management utilizing a virtual environment
US20120083917A1 (en) * 2009-06-10 2012-04-05 Fisher-Rosemount Systems, Inc. Predicted fault analysis
US20120166142A1 (en) * 2009-09-07 2012-06-28 Hitachi, Ltd. Anomaly Detection and Diagnosis/Prognosis Method, Anomaly Detection and Diagnosis/Prognosis System, and Anomaly Detection and Diagnosis/Prognosis Program
US20110078428A1 (en) 2009-09-30 2011-03-31 Memory Experts International Inc. Portable desktop device and method of host computer system hardware recognition and configuration
US20110093703A1 (en) 2009-10-16 2011-04-21 Etchegoyen Craig S Authentication of Computing and Communications Hardware
US8401982B1 (en) 2010-01-14 2013-03-19 Symantec Corporation Using sequencing and timing information of behavior events in machine learning to detect malware
US8868987B2 (en) 2010-02-05 2014-10-21 Tripwire, Inc. Systems and methods for visual correlation of log events, configuration changes and conditions producing alerts in a virtual infrastructure
US8995439B2 (en) 2010-05-13 2015-03-31 Comcast Cable Communications, Llc Control of multicast content distribution
US20110289343A1 (en) * 2010-05-21 2011-11-24 Schaefer Diane E Managing the Cluster
US20110289342A1 (en) * 2010-05-21 2011-11-24 Schaefer Diane E Method for the file system of figure 7 for the cluster
US20120271927A1 (en) * 2010-06-23 2012-10-25 Bulat Shakirzyanov System and method for managing a computing cluster
US9590849B2 (en) * 2010-06-23 2017-03-07 Twilio, Inc. System and method for managing a computing cluster
US20130151975A1 (en) 2010-09-07 2013-06-13 Tomer Shadi System and method for automated deployment of multi-component computer environment
US20120096272A1 (en) 2010-10-15 2012-04-19 Rockwell Automation Technologies, Inc. Security model for industrial devices
US9278481B2 (en) 2010-10-26 2016-03-08 Rinco Ultrasononics USA, INC. Sonotrode and anvil energy director grids for narrow/complex ultrasonic welds of improved durability
US20120041976A1 (en) 2010-10-26 2012-02-16 ParElastic Corporation Mechanism for co-located data placement in a parallel elastic database management system
US20120110142A1 (en) 2010-10-29 2012-05-03 Bank Of America Corporation Configuration management utility
US20120144244A1 (en) 2010-12-07 2012-06-07 Yie-Fong Dan Single-event-upset controller wrapper that facilitates fault injection
US20120150926A1 (en) 2010-12-08 2012-06-14 International Business Machines Corporation Distributed free block map for a clustered redirect-on-write file system
US20120182151A1 (en) 2011-01-19 2012-07-19 Hon Hai Precision Industry Co., Ltd. Server rack having payload weighing function
US20160112504A1 (en) * 2011-01-28 2016-04-21 Netapp, Inc. Proposed storage system solution selection for service level objective management
US9122739B1 (en) * 2011-01-28 2015-09-01 Netapp, Inc. Evaluating proposed storage solutions
US9594620B2 (en) * 2011-04-04 2017-03-14 Microsoft Technology Licensing, Llc Proactive failure handling in data processing systems
US20120265872A1 (en) 2011-04-18 2012-10-18 Cox Communications, Inc. Systems and Methods of Automatically Remediating Fault Conditions
US9201751B1 (en) 2011-04-18 2015-12-01 American Megatrends, Inc. Data migration between multiple tiers in a storage system using policy based ILM for QOS
US8874892B1 (en) 2011-05-26 2014-10-28 Phoenix Technologies Ltd. Assessing BIOS information prior to reversion
US20120331526A1 (en) 2011-06-22 2012-12-27 TerraWi, Inc. Multi-level, hash-based device integrity checks
US8774054B2 (en) 2011-08-01 2014-07-08 Huawei Technologies Co., Ltd. Network policy configuration method, management device, and network management center device
US20140082417A1 (en) 2011-08-03 2014-03-20 Honeywell International Inc. Systems and methods for using a corrective action as diagnostic evidence
US8583769B1 (en) 2011-08-16 2013-11-12 Edgecast Networks, Inc. Configuration management repository for a distributed platform
US20130326029A1 (en) * 2011-11-11 2013-12-05 Level 3 Communications, Llc System and methods for configuration management
US8938621B2 (en) 2011-11-18 2015-01-20 Qualcomm Incorporated Computing device integrity protection
US8973118B2 (en) 2011-12-14 2015-03-03 Cellco Partnership Token based security protocol for managing access to web services
US20130185667A1 (en) 2012-01-18 2013-07-18 International Business Machines Corporation Open resilience framework for simplified and coordinated orchestration of multiple availability managers
US9864634B2 (en) 2012-02-06 2018-01-09 International Business Machines Corporation Enhancing initial resource allocation management to provide robust reconfiguration
US9323789B1 (en) 2012-03-14 2016-04-26 Emc Corporation Automated application protection and reuse using a workflow component
US20130257627A1 (en) * 2012-03-29 2013-10-03 Yokogawa Electric Corporation Apparatus and method for determining operation compatibility between field devices
US20130317870A1 (en) 2012-05-25 2013-11-28 Bank Of America Apparatus and methods for process change control
US20140069291A1 (en) 2012-09-11 2014-03-13 Shenzhen China Star Optoelectronics Technology Co., Ltd Transmission system of LCD panel and automatic crane thereof
US9898224B1 (en) 2012-09-12 2018-02-20 EMC IP Holding Company LLC Automatic adjustment of capacity usage by data storage optimizer for data migration
US9355036B2 (en) 2012-09-18 2016-05-31 Netapp, Inc. System and method for operating a system to cache a networked file system utilizing tiered storage and customizable eviction policies based on priority and tiers
US20140115176A1 (en) * 2012-10-22 2014-04-24 Cassidian Communications, Inc. Clustered session management
US9542177B1 (en) 2012-10-30 2017-01-10 Amazon Technologies, Inc. Peer configuration analysis and enforcement
US9229902B1 (en) 2013-02-14 2016-01-05 Amazon Technologies, Inc. Managing update deployment
US20140245085A1 (en) * 2013-02-22 2014-08-28 International Business Machines Corporation Managing error logs in a distributed network fabric
US20140304402A1 (en) * 2013-04-06 2014-10-09 Citrix Systems, Inc. Systems and methods for cluster statistics aggregation
US20140304399A1 (en) * 2013-04-06 2014-10-09 Citrix Systems, Inc. Systems and methods for providing monitoring in a cluster system
US20140324276A1 (en) 2013-04-30 2014-10-30 Cummins Inc. Engine diagnostic system for high volume feedback processing
US20140337957A1 (en) 2013-05-07 2014-11-13 Dannie Gerrit Feekes Out-of-band authentication
US20150120359A1 (en) 2013-05-13 2015-04-30 Fulcrum Collaborations, Llc System and Method for Integrated Mission Critical Ecosystem Management
US20140344101A1 (en) * 2013-05-14 2014-11-20 International Business Machines Corporation Automated guidance for selecting components of an it solution
US9999030B2 (en) 2013-08-15 2018-06-12 Huawei Technologies Co., Ltd. Resource provisioning method
US20150117174A1 (en) * 2013-10-31 2015-04-30 Oracle International Corporation Media and drive validation in tape libraries
US9729615B2 (en) 2013-11-18 2017-08-08 Nuwafin Holdings Ltd System and method for collaborative designing, development, deployment, execution, monitoring and maintenance of enterprise applications
US20150149822A1 (en) * 2013-11-26 2015-05-28 International Business Machines Corporation Event handling in storage area networks
US20150256394A1 (en) 2014-03-06 2015-09-10 Dell Products, Lp System and Method for Providing a Data Center Management Controller
US20170017881A1 (en) * 2014-03-28 2017-01-19 Casebank Technologies Inc. Methods and systems for troubleshooting problems in complex systems using multiple knowledgebases
US20150324255A1 (en) 2014-05-09 2015-11-12 Commvault Systems, Inc. Load balancing across multiple data paths
US20170032091A1 (en) 2014-05-20 2017-02-02 Siemens Healthcare Diagnostics Inc. Intelligent Service Assistant Inference Engine
US10097620B2 (en) 2014-07-11 2018-10-09 Vmware Inc. Methods and apparatus to provision a workload in a virtual server rack deployment
US20160042288A1 (en) * 2014-08-11 2016-02-11 International Business Machines Corporation Mapping user actions to historical paths to determine a predicted endpoint
US20160048611A1 (en) 2014-08-15 2016-02-18 Vce Company, Llc System, Method, Apparatus, and Computer Program Product for Generation of an Elevation Plan for a Computing System
US20160050222A1 (en) * 2014-08-18 2016-02-18 Bank Of America Corporation Modification of Computing Resource Behavior Based on Aggregated Monitoring Information
US20160057009A1 (en) 2014-08-21 2016-02-25 Netapp, Inc. Configuration of peered cluster storage environment organized as disaster recovery group
US9122501B1 (en) 2014-09-08 2015-09-01 Quanta Computer Inc. System and method for managing multiple bios default configurations
US20160110240A1 (en) * 2014-10-17 2016-04-21 Netapp Inc. Forensics collection for failed storage controllers
US20170206128A1 (en) 2014-11-06 2017-07-20 International Business Machines Corporation Cognitive analysis for healing an it system
US10057184B1 (en) 2014-11-10 2018-08-21 Amazon Technologies, Inc. Resource configuration compliance service
US20160173690A1 (en) 2014-12-12 2016-06-16 Xerox Corporation Spectral diagnostic engine for customer support call center
US20180034709A1 (en) 2015-01-08 2018-02-01 Zte Corporation Method and Device for Asset Information Management
US20200034069A1 (en) * 2015-02-03 2020-01-30 Netapp Inc. Monitoring storage cluster elements
US20170339005A1 (en) 2015-02-10 2017-11-23 Huawei Technologies Co., Ltd. Method and Device for Processing Failure in at Least One Distributed Cluster, and System
US20180025166A1 (en) 2015-02-11 2018-01-25 British Telecommunications Public Limited Company Validating computer resource usage
US20180041388A1 (en) 2015-03-13 2018-02-08 Koninklijke Kpn N.V. Method and Control System for Controlling Provisioning of a Service in a Network
US9225625B1 (en) 2015-03-26 2015-12-29 Linkedin Corporation Detecting and alerting performance degradation during features ramp-up
US20160294643A1 (en) 2015-04-03 2016-10-06 Electronics And Telecommunications Research Institute System and method for service orchestration in distributed cloud environment
US20160302323A1 (en) 2015-04-09 2016-10-13 Ortronics, Inc. Equipment Cabinet and Associated Methods
US20170085644A1 (en) * 2015-09-22 2017-03-23 Netapp, Inc. Methods and systems for selecting compatible resources in networked storage environments
US10048996B1 (en) 2015-09-29 2018-08-14 Amazon Technologies, Inc. Predicting infrastructure failures in a data center for hosted service mitigation actions
US20170094003A1 (en) 2015-09-30 2017-03-30 Symantec Corporation Preventing data corruption due to pre-existing split brain
US9384082B1 (en) * 2015-10-23 2016-07-05 Pure Storage, Inc. Proactively providing corrective measures for storage arrays
US20170242740A1 (en) * 2016-02-22 2017-08-24 International Business Machines Corporation User interface error prediction
US20190149408A1 (en) 2016-08-05 2019-05-16 Huawei Technologies Co., Ltd. Method and apparatus for deploying service in virtualized network
US20180285009A1 (en) 2017-03-30 2018-10-04 Intel Corporation Dynamically composable computing system, a data center, and method for dynamically composing a computing system
US20180302277A1 (en) 2017-04-12 2018-10-18 Cisco Technology, Inc. Virtualized network functions and service chaining in serverless computing infrastructure
US20180321934A1 (en) 2017-05-05 2018-11-08 Dell Products L.P. Infrastructure configuration and inventory manager
US20180322019A1 (en) 2017-05-05 2018-11-08 Pivotal Software, Inc. Backup and restore framework for distributed computing systems
US20180329579A1 (en) 2017-05-10 2018-11-15 Dell Products L.P. High Volume Configuration of Information Handling Systems
US20190123985A1 (en) * 2017-10-25 2019-04-25 Cisco Technology, Inc. Federated network and application data analytics platform
US20190182105A1 (en) 2017-12-08 2019-06-13 At&T Mobility Ii Llc System facilitating prediction, detection and mitigation of network or device issues in communication systems
US20200079403A1 (en) * 2017-12-21 2020-03-12 Hitachi, Ltd. Control arrangements for maintenance of a collection of physical devices and methods for controlling maintenance of a collection of physical devices
US20190306013A1 (en) 2018-03-28 2019-10-03 Dell Products L.P. Agentless method to bring solution and cluster awareness into infrastructure and support management portals
US20190303137A1 (en) 2018-03-28 2019-10-03 Dell Products L.P. System and method for out-of-the-box solution-level management via logical architecture awareness
US10514907B2 (en) 2018-03-28 2019-12-24 EMC IP Holding Company LLC System and method for out-of-the-box solution-level management via logical architecture awareness
US20190324841A1 (en) * 2018-04-24 2019-10-24 EMC IP Holding Company LLC System and method to predictively service and support the solution
US10944561B1 (en) 2018-05-14 2021-03-09 Amazon Technologies Inc. Policy implementation using security tokens

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
"Dell DRAC—Wikipedia"; XP055602141; Mar. 23, 2018; https://en.wikipedia.org/w/index.php?title=Dell_DRAC&oldid=831957421.
"Dell EMC OpenManage Essentials Version 2.3: User's Guide"; XP055602720; Oct. 1, 2017; https://topics-cdn.dell.com/pdf/openmanage-essentials-v23 users-guide en-us.pdf.
"Integrated Dell Remote Access Controller 8 (iDRAC8)", Version 2.05.05.05 User's Guide, Dell Inc., Dec. 2014 (348 pages).
Coulouris et al.; "Distributed Systems: Concepts and Design, Fifth Edition"; Addison-Wesley; pp. 37-61; 2012 (27 pages).
Duncan Tweed; "Baseline configuration"; BMC Software, Inc.; Apr. 7, 2015; retrieved from https://bmc.com/.
Duncan Tweed; "BMC Atrium Discovery User Guide"; BMC Software, Inc.; Mar. 2014; retrieved from https://bmc.com/.
Extended European Search Report issued in corresponding European Application No. 18200661.9 dated Apr. 1, 2019. (9 pages).
Extended European Search Report issued in corresponding European Application No. 19151952.9, dated Jul. 1, 2019.
George Coulouris et al.; "Dell EMC Data Protection Advisor: Distributed Systems Concepts and Design"; Report Reference Guide 302-003-605 Rev 1 (2017); retrieved Nov. 29, 2020; pp. 1-25 (of 488); https://web.archive.org/web/20201129060023/https://www.delltechnologies.com/en-us/collaterals/unauth/technical-guides-support-information/products/networking-4/docu82478.pdf (27 pages).
Iler, Doug, et al., "Introducing iDRAC8 with Lifecycle Controller for Dell 13th Generation PowerEdge Servers", A Dell Deployment and Configuration Guide, Dell Inc., Sep. 2014 (16 pages).
Liang, "ClusterProbe: An Open, Flexible and Scalable Cluster Monitoring Tool", 1999, IEEE Computer Society International Workshop on Cluster Computing, pp. 1-8 (Year: 1999). *
Masoom Parvez; "AutomaticGroup Node"; BMC Software, Inc.; 2014; retrieved from https://bmc.com/.

Also Published As

Publication number Publication date
US20190324873A1 (en) 2019-10-24

Similar Documents

Publication Publication Date Title
US10795756B2 (en) System and method to predictively service and support the solution
US10684791B2 (en) System and method for environment aware backup and restoration
US9619311B2 (en) Error identification and handling in storage area networks
US10379838B1 (en) Update and rollback of code and API versions
US10725763B1 (en) Update and rollback of configurations in a cloud-based architecture
US11960873B2 (en) System and method for managing a model for solving issues using a set of actions performed on the client environment
US11086738B2 (en) System and method to automate solution level contextual support
US10977113B2 (en) System and method for fault identification, logging, and remediation
US11126504B2 (en) System and method for dynamic configuration of backup agents
US10754368B1 (en) Method and system for load balancing backup resources
US10936192B2 (en) System and method for event driven storage management
US10628170B1 (en) System and method for device deployment
US20240394149A1 (en) System and method for managing automatic service requests for workload management
US20240248751A1 (en) System and method for managing a migration of a production environment executing logical devices
US11494250B1 (en) Method and system for variable level of logging based on (long term steady state) system error equilibrium
US11934820B2 (en) System and method for managing a model for solving issues relating to application upgrades in a customer environment
US12141293B2 (en) Method and system for proactively detecting and filtering vulnerabilities of an application upgrade before performing the application upgrade
US20200012570A1 (en) System and method for resilient backup generation
US11422899B2 (en) System and method for an application container evaluation based on container events
US11321185B2 (en) Method to detect and exclude orphaned virtual machines from backup
US12253933B2 (en) Predictive load driven proactive pre-flight check for applications
US10942779B1 (en) Method and system for compliance map engine
US20240241767A1 (en) System and method for managing resource elasticity for a production environment with logical devices
US11068332B2 (en) Method and system for automatic recovery of a system based on distributed health monitoring
US11953992B2 (en) Device modification analysis framework

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: EMC IP HOLDING COMPANY LLC, MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PATEL, DHARMESH M.;CHAGANTI, RAVIKANTH;ALI, RIZWAN;SIGNING DATES FROM 20180417 TO 20180423;REEL/FRAME:045645/0157

AS Assignment

Owner name: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, AS COLLATERAL AGENT, NORTH CAROLINA

Free format text: PATENT SECURITY AGREEMENT (CREDIT);ASSIGNORS:DELL PRODUCTS L.P.;EMC CORPORATION;EMC IP HOLDING COMPANY LLC;REEL/FRAME:046286/0653

Effective date: 20180529

Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT, TEXAS

Free format text: PATENT SECURITY AGREEMENT (NOTES);ASSIGNORS:DELL PRODUCTS L.P.;EMC CORPORATION;EMC IP HOLDING COMPANY LLC;REEL/FRAME:046366/0014

Effective date: 20180529

AS Assignment

Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., TEXAS

Free format text: SECURITY AGREEMENT;ASSIGNORS:CREDANT TECHNOLOGIES, INC.;DELL INTERNATIONAL L.L.C.;DELL MARKETING L.P.;AND OTHERS;REEL/FRAME:049452/0223

Effective date: 20190320

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

AS Assignment

Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., TEXAS

Free format text: SECURITY AGREEMENT;ASSIGNORS:CREDANT TECHNOLOGIES INC.;DELL INTERNATIONAL L.L.C.;DELL MARKETING L.P.;AND OTHERS;REEL/FRAME:053546/0001

Effective date: 20200409

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: EMC IP HOLDING COMPANY LLC, TEXAS

Free format text: RELEASE OF SECURITY INTEREST AT REEL 046286 FRAME 0653;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058298/0093

Effective date: 20211101

Owner name: EMC CORPORATION, MASSACHUSETTS

Free format text: RELEASE OF SECURITY INTEREST AT REEL 046286 FRAME 0653;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058298/0093

Effective date: 20211101

Owner name: DELL PRODUCTS L.P., TEXAS

Free format text: RELEASE OF SECURITY INTEREST AT REEL 046286 FRAME 0653;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058298/0093

Effective date: 20211101

AS Assignment

Owner name: EMC IP HOLDING COMPANY LLC, TEXAS

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (046366/0014);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:060450/0306

Effective date: 20220329

Owner name: EMC CORPORATION, MASSACHUSETTS

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (046366/0014);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:060450/0306

Effective date: 20220329

Owner name: DELL PRODUCTS L.P., TEXAS

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (046366/0014);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:060450/0306

Effective date: 20220329

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4
