US11030009B2 - Systems and methods for automatically scaling compute resources based on demand - Google Patents
- Publication number: US11030009B2 (application US16/368,122)
- Authority: United States (US)
- Prior art keywords: compute, resources, threshold value, group, compute resources
- Legal status: Active, expires (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F9/00—Arrangements for program control; G06F9/46—Multiprogramming arrangements; G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5027—Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources to service a request, the resource being a machine, considering the load
- G06F9/5011—Allocation of resources to service a request, the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5022—Mechanisms to release resources
- G06F2209/5022—Workload threshold
Definitions
- aspects of the present disclosure are directed to systems and methods for auto-scaling compute resources based on demand.
- Container-based virtualization, or containerization, is an alternative technology to the more traditional hypervisor-based virtualization.
- software applications/programs are executed within ‘containers’.
- Each container includes not only the application that needs to be executed but everything needed to run the application including runtime, system libraries, system tools, and settings. Accordingly, each container can be considered a deployable unit of software that packages up code and all its dependencies so that an application can run quickly and reliably from one computing environment to another.
- in container-based virtualization, multiple containers share the hardware resources of a single operating system.
- to manage the creation, destruction, deployment and scaling of containers, a number of container orchestration systems have been introduced. These include, e.g., Kubernetes, Docker Swarm, and Nomad. Most of these container orchestration systems offer some sort of auto-scaling capability—i.e., they are configured to monitor demand and automatically increase/decrease the available compute resources (i.e., processor and/or memory) for containers based on the monitored demand. However, most auto-scaling capabilities offered by known container orchestration systems are configured to increase or decrease compute resources gradually and linearly.
- while such auto-scaling capabilities may be suitable for long-term applications or for situations where demand increases or decreases gradually over time, they are often unsuitable for short-running applications (e.g., deployments) and/or for cases where demand increases and decreases sharply and sporadically during the day. In such cases, more responsive auto-scaling is desirable.
- FIG. 1 is a block diagram of a networked environment according to some embodiments of the present disclosure.
- FIG. 2 is a block diagram of an example orchestration system in communication with the scaling manager.
- FIG. 3 is a flowchart illustrating an example method for scaling resources based on demand according to some embodiments of the present disclosure.
- FIG. 4A is a flowchart illustrating an example method for scaling resources based on demand according to some embodiments of the present disclosure.
- FIG. 4B is a flowchart illustrating an example method for scaling resources based on demand according to some embodiments of the present disclosure.
- FIG. 4C is a flowchart illustrating an example method for scaling resources based on demand according to some embodiments of the present disclosure.
- FIG. 5 is a block diagram of an example computer system on which some aspects of the present disclosure can be implemented.
- embodiments of the present disclosure introduce a new auto-scaling method and system to optimize end-user experience and minimize compute resource costs.
- the presently disclosed auto-scaling systems and methods achieve this by providing a buffer capacity when calculating resource requirements, thereby allowing the compute resources allocated to a particular project/organization (also referred to as a compute group herein) to scale up before the compute resources reach capacity.
- the auto-scaling systems and methods calculate the capacity required to perform scheduled tasks (e.g., processor and memory requirements) and the actual capacity available (e.g., the available processor and memory) to determine the utilization of the assigned resources.
- if the utilization is determined to be above a first threshold (which can be set to include the buffer capacity), the resources are scaled up—i.e., more computing resources are activated.
- if the utilization is determined to be below a second threshold, the resources are scaled down—i.e., underutilized or unused computing resources are released. If the utilization is calculated to be between the first and second thresholds, no scale-up or scale-down action is taken.
- first and/or second thresholds may be programmed or predetermined based on the amount of buffer required in the event of a spike in scheduled jobs.
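The threshold logic described above can be sketched as follows. This is a minimal illustration, not the patented implementation; the function name and the 70%/40% threshold values are assumptions chosen to match the example values given elsewhere in the disclosure.

```python
def scaling_decision(requested_capacity, available_capacity,
                     upper_threshold=0.70, lower_threshold=0.40):
    """Return 'scale_up', 'scale_down', or 'no_action' based on utilization.

    Setting upper_threshold below 1.0 leaves a buffer, so the group
    scales up before the assigned resources actually reach capacity.
    """
    utilization = requested_capacity / available_capacity
    if utilization > upper_threshold:
        return "scale_up"
    if utilization < lower_threshold:
        return "scale_down"
    return "no_action"
```

With an upper threshold of 70%, the compute group scales up while 30% of its capacity is still free, which is the buffer that absorbs a spike in scheduled jobs while new machines start.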
- unused physical or virtual machines may be marked such that no new jobs are assigned to these machines. Instead, new jobs may be assigned to other active but underutilized machines.
- if the calculated utilization exceeds the first threshold within a certain time period, one or more of the marked machines (depending on requirements) may be unmarked such that they can once again accept new jobs.
- if the calculated utilization does not exceed the first threshold for a certain time period, one or more of the marked machines may be released. This way, some compute resources may be maintained in standby mode for a certain period of time so that they can quickly be utilized if demand ramps up suddenly.
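The mark/unmark/release behaviour described above can be sketched as a standby pool. The class and method names are illustrative assumptions, and timestamps are passed in explicitly to keep the sketch deterministic; a real implementation would read the clock.

```python
import time

class StandbyPool:
    """Sketch of the 'mark before terminate' behaviour: unused machines
    are marked so no new jobs land on them; a demand spike unmarks them,
    otherwise they are released once their standby period lapses."""

    def __init__(self, standby_seconds=120):
        self.standby_seconds = standby_seconds
        self.marked = {}  # machine id -> time at which it was marked

    def mark(self, machine_id, now=None):
        self.marked[machine_id] = now if now is not None else time.time()

    def unmark(self, count, now=None):
        # Demand spiked: return up to `count` machines to active duty.
        ids = sorted(self.marked)[:count]
        for machine_id in ids:
            del self.marked[machine_id]
        return ids

    def expired(self, now=None):
        # Machines whose standby period lapsed; these can be terminated.
        now = now if now is not None else time.time()
        return [m for m, t in self.marked.items()
                if now - t >= self.standby_seconds]
```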
- the auto-scaling system can operate in different scale-down modes—fast or slow—depending on the rate at which the job load reduces.
- in fast mode, the auto-scaling system may mark or terminate a larger number of resources in one cycle to allow the resources to scale down quickly if demand reduces drastically.
- in slow mode, the auto-scaling system marks or terminates fewer resources in one cycle to allow resources to scale down more gradually.
- FIG. 1 illustrates an environment 100 in which one or more aspects of the present disclosure are implemented. Specifically, FIG. 1 illustrates the systems involved in automatically scaling compute resources based on real time demand.
- compute resources refer to physical or virtual machines that are allocated predetermined units of CPU and memory.
- the systems in environment 100 include a resource provider 102 , an orchestration system 104 , and a scaling manager 106 .
- the resource provider 102 , orchestration system 104 and scaling manager 106 communicate with each other over one or more communication networks 108 .
- the environment 100 further includes one or more resource requesting systems 110 . The following section describes each of these systems and then proceeds to describe how they interact with each other.
- the resource provider 102 provides infrastructure (i.e., the compute resources) required to execute scheduled jobs.
- the infrastructure may be provided via one or more on-premises data centers or one or more remote data centers hosted by a cloud service provider such as Amazon Web Services.
- the resource provider 102 may assign infrastructure in the form of physical machines or virtual machines.
- a resource requesting system 110 may communicate with the resource provider 102 and request the resource provider to assign certain resources (e.g., CPU and memory) to the resource requesting system 110 .
- the resource provider 102 in turn may then determine the number of physical and/or virtual machines that would be required to fulfil the desired CPU and memory requirements and assign these physical or virtual machines to the resource requesting system 110 .
- the collection of compute resources assigned to the resource requesting system 110 at any given time is called a compute group.
- the resource provider 102 is also configured to increase or decrease the compute resources assigned in a compute group. In certain cases, the resource provider 102 may be configured to automatically scale the compute resources in the compute group based on monitored demand. In other cases, the resource provider 102 may be configured to scale up or scale down the number of assigned physical/virtual machines based on external instructions.
- the orchestration system 104 is configured to automate the assignment and management of scheduled jobs. In particular, it is configured to assign jobs to the physical/virtual machines provided by the resource provider 102 . To this end, the orchestration system 104 determines the virtual/physical machines assigned to a particular resource requesting system 110 and automatically assigns a scheduled job from the resource requesting system 110 to a virtual/physical machine assigned to that resource requesting system 110 or compute group. In addition, the orchestration system 104 is configured to manage job deployments and scale the underlying compute group based on demand.
- the orchestration system 104 is configured to receive job descriptors from the resource requesting system 110 , create containers based on the received job descriptors and launch these containers on the physical/virtual machines in a compute group. Typically, the orchestration system 104 launches containers on the underlying machines in a manner that distributes the load evenly among the active machines. Examples of orchestration systems include Kubernetes, Docker Swarm, Titus, Nomad, etc.
- the scaling manager 106 is configured to determine real time resource requirements and scale-up or scale-down the resources to meet the resource requirements and prevent under-utilization of resources.
- the scaling manager 106 is configured to determine the available resources in a compute group and the required compute capacity, and to calculate a utilization of the underlying resources. If the resource utilization exceeds a predetermined upper threshold, the scaling manager 106 instructs the resource provider 102 to assign more resources to the compute group. Alternatively, if the utilization is below a predetermined lower threshold, the scaling manager 106 may instruct the resource provider to terminate certain resources in the compute group.
- in some embodiments, instead of requesting the resource provider 102 to terminate resources immediately, the scaling manager 106 marks one or more unused physical/virtual machines such that the orchestration system 104 cannot assign any new containers to the marked physical/virtual machines. These machines remain marked for a certain period of time. If resource demand suddenly increases during this time period (i.e., the calculated utilization exceeds the upper threshold), the scaling manager 106 unmarks one or more of these physical/virtual machines, thereby allowing the orchestration system 104 to once again assign containers to these machines. Alternatively, if the resource demand does not increase beyond the upper threshold during this time period, the scaling manager 106 requests the resource provider to terminate the marked physical/virtual machines.
- the scaling manager 106 communicates with the orchestration system 104 to collect information about active compute resources and resource requirements and communicates with the resource provider 102 to instruct the resource provider to scale-up or scale-down the underlying resources.
- the resource requesting system 110 can be any system that creates and/or manages jobs (e.g., synthetic tests, builds, deployments, etc.).
- the resource requesting system 110 communicates with the resource provider 102 to provision infrastructure and communicates with the orchestration system 104 to provision one or more containers for executing the jobs on the provisioned infrastructure.
- the resource requesting system 110 may be a continuous integration/continuous deployment (CI/CD) tool such as Bitbucket Pipelines (offered by Atlassian, Inc.) that is configured to manage builds.
- the CI/CD tool detects whether source code in a repository that is registered for continuous integration is updated, retrieves a build description associated with that source code from the repository, and creates a job description for initializing one or more containers to test and/or build the source code based on the build description.
- the job description typically specifies an allocation of resources to complete the job. In certain embodiments, if the allocation of resources is not specified, a default amount of memory and CPU may be allocated to the job request.
- the orchestration system 104 utilizes this specified resource allocation to determine which underlying machine to allocate the job to.
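A job descriptor carrying its resource allocation, with a default applied when none is specified, might look like the following sketch. The key names and default values are hypothetical, not taken from the patent.

```python
# Hypothetical job-descriptor shape; the keys and defaults are
# illustrative assumptions, not the patent's actual format.
DEFAULT_CPU_MILLICORES = 500
DEFAULT_MEMORY_MB = 100

def resource_allocation(job_descriptor):
    """Return the (cpu, memory) requested by a job, falling back to
    default values when the descriptor specifies no allocation."""
    resources = job_descriptor.get("resources", {})
    return (resources.get("cpu_millicores", DEFAULT_CPU_MILLICORES),
            resources.get("memory_mb", DEFAULT_MEMORY_MB))
```

The orchestration system can then use the returned pair to pick an underlying machine with enough spare CPU and memory for the job.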
- the resource requesting system 110 may be a test management system that manages synthetic tests (e.g., Pollinator offered by Atlassian, Inc.).
- the test management system is typically responsible for receiving test requests from client devices, scheduling synthetic tests based on test parameters included in the requests, and communicating descriptors of scheduled tests to the orchestration system 104 .
- the test descriptors specify an allocation of resources to complete the test.
- the orchestration system 104 can then utilize the specified resource allocation to determine which underlying machine to allocate the tests to.
- the types of jobs that are suitable for the presently disclosed scaling manager 106 and method are typically short-lived—i.e., they can typically be completed in minutes or hours. However, in some embodiments, the typical jobs managed by the orchestration system 104 may take longer to complete, and the scaling manager 106 of the present disclosure would still be capable of scaling the underlying resources in a responsive and efficient manner.
- the communication network 108 is depicted as a single network in FIG. 1 for ease of depiction.
- the various systems illustrated in FIG. 1 may communicate with each other over different communication networks.
- for instance, the orchestration system 104 and the resource provider 102 may communicate through one communication network, whereas the scaling manager 106 and the orchestration system 104 may communicate over a different communication network.
- the resource requesting systems 110 may communicate with the orchestration system 104 via a local network and with the resource provider 102 via a public network without departing from the scope of the present disclosure.
- the systems may communicate with each other over open web protocols such as HTTPS, REST, and JWT.
- FIG. 2 illustrates a typical Kubernetes architecture 200 .
- in the Kubernetes architecture, each underlying compute resource (i.e., a physical or virtual machine) is referred to as a node 202.
- a cluster of such worker machines that are all assigned to the same compute group is referred to as a node group 204.
- in essence, a node group is an abstracted version of a compute group.
- Different resource requesting systems 110 may be assigned different node groups.
- Each node 202 in a particular node group 204 directly correlates with a corresponding compute resource assigned to the resource requesting system 110 by the resource provider 102 and in this disclosure the terms node and compute resource may be interchangeably used.
- each node 202 in the node group 204 contains the services necessary to run containers and is managed by a common node controller 206 .
- the node controller 206 typically manages a list of the nodes 202 in the node group 204 and synchronizes this list with the resource provider's list of machines assigned to that particular resource requesting system 110 .
- the node controller 206 may also be configured to communicate with the resource provider 102 from time to time to determine if an underlying machine is still available or not. If an underlying machine is not available, the controller 206 is configured to delete the corresponding node 202 from its list of nodes. In this manner, the node controller 206 is always aware of the infrastructure assigned to the node group by the resource provider 102 .
- Each node includes an agent 208 that is configured to ensure that containers are running within the node and a runtime 210 that is responsible for running the containers. With the help of the agent 208 and runtime 210 , one or more pods 212 may be launched on the active nodes 202 in a node group 204 .
- a pod 212 is the basic building block of Kubernetes.
- a pod 212 encapsulates one or more containers 214 , storage resources (not shown), and options that govern how the containers 214 should run.
- the node controller 206 can query the agent 208 running on each node 202 in the node group 204 to retrieve information about the nodes including the available resources on the node: the CPU, memory, and the maximum number of pods 212 that can be scheduled onto the node 202 at any given time. Further, the agent 208 can inform the controller 206 of all active pods on the node and the job requests scheduled for execution on the pods 212 .
- the scaling manager 106 may be executed within a container inside the node group 204. In other implementations, the scaling manager 106 may be executed in a container outside the node group 204. In any event, the scaling manager 106 can communicate with the node controller 206 to obtain information about the nodes and the pods from time to time. For instance, the scaling manager 106 can request the controller 206 to provide a list of all nodes and active pods in the node group 204. Similarly, the scaling manager 106 may set up a “watch” on all the nodes and pods in the node group to receive a stream of updates for the nodes 202 and active pods 212 in the node group.
- the scaling manager 106 is configurable, i.e., the resource requesting system 110 can program the scaling manager 106 to be as responsive to variations in demand as required. This can be done via a number of programmable thresholds and rates. This section describes these programmable thresholds and rates.
- the requesting system 110 can decide when the scaling manager 106 scales up or scales down the resources by setting upper and lower threshold values.
- the upper threshold value corresponds to a utilization value for the entire system above which the scaling manager 106 increases the number of available resources (e.g., increases the size of the node group 204).
- the lower threshold value corresponds to the utilization value for the entire system below which the scaling manager 106 decreases the number of available resources (e.g., decreases the size of the node group 204).
- the upper and lower threshold values can be set as required.
- the requesting system 110 can introduce a buffer capacity in the scaling manager 106 by setting the upper threshold value to a value lower than 100%, e.g., 70%.
- the requesting system 110 may choose to set a higher upper threshold (e.g., 85%) to maintain a smaller buffer capacity or completely eliminate the buffer capacity by setting the upper threshold value to 100%.
- the lower threshold value can be set taking into consideration the costs associated with underutilized resources and, at the same time, the time and effort required to constantly terminate and start resources.
- the scaling manager 106 may mark the excess resources/nodes as ‘unschedulable’. To this end, the scaling manager 106 requests the node controller 206 to update the properties of the selected nodes to indicate that these nodes are unschedulable. This prevents the orchestration system 104 from assigning any new jobs to the marked compute resources and can also allow any jobs that are active on the marked nodes to terminate before the nodes are permanently deleted.
- the scaling manager 106 can scale down (or mark) the resources at a fast or slow rate. For instance, if the utilization decreases gradually between two checks, the scaling manager 106 can adopt a slow scale down rate where a predetermined number of resources/nodes are terminated or marked. Alternatively, if the utilization decreases significantly between two checks, e.g., because a large number of active jobs have completed and no other jobs are scheduled for execution, the scaling manager can adopt a fast scale down rate where a predetermined number of resources/nodes are terminated or marked. This predetermined number is of course higher than the predetermined number corresponding to the slow rate.
- slow and fast rates are also configurable.
- the requesting system 110 can configure two lower thresholds for marking nodes corresponding to the fast and slow rates. For instance, a requesting system 110 can configure a slow scaling threshold value, which corresponds to a utilization value for the entire system below which the scaling manager 106 marks a number of excess resources unschedulable at a slow rate. Similarly, the requesting system 110 can configure a fast scaling threshold value, which corresponds to a utilization value for the entire system below which the scaling manager 106 marks a number of resources unschedulable at a faster rate. For example, the slow scaling threshold value may be set as 40% utilization whereas the fast scaling threshold value may be set as 10%.
- the requesting system 110 can configure the predetermined number of resources/nodes to be marked or terminated in the fast and slow scale down modes.
- the rate for removing/marking nodes in the slow scale down mode can be set as 2 nodes/underlying compute resources, whereas the rate for removing/marking nodes in fast scale-down mode can be set as 5 nodes/underlying compute resources.
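Putting the configurable thresholds and rates together, the scale-down rate selection might look like this sketch, using the example values given above (40%/10% thresholds, 2/5 nodes per cycle). The class and function names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class ScaleDownConfig:
    # Example values from the disclosure: 40% slow / 10% fast thresholds,
    # 2 nodes per cycle in slow mode vs 5 nodes per cycle in fast mode.
    slow_threshold: float = 0.40
    fast_threshold: float = 0.10
    slow_rate: int = 2
    fast_rate: int = 5

def nodes_to_mark(utilization, config=ScaleDownConfig()):
    """Number of nodes to mark unschedulable in this check cycle."""
    if utilization < config.fast_threshold:
        return config.fast_rate   # demand dropped drastically
    if utilization < config.slow_threshold:
        return config.slow_rate   # demand is tapering off gradually
    return 0                      # no scale-down needed
```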
- the scaling manager 106 maintains a number of timers. These timers are started when corresponding actions are initiated and expire at the end of a predefined time period. These timers include a node marking timer and a scale lock timer.
- the marked nodes can be considered ‘standby nodes’ for a particular period of time set by the node marking timer. If utilization increases above the upper threshold value during this period, one or more marked nodes may be unmarked such that new pods can be assigned to those nodes. Alternatively, if utilization does not increase over the upper threshold value during the standby period and any active pods on the nodes have terminated, the scaling manager 106 can instruct the resource provider to terminate the corresponding computing infrastructure once the node marking timer expires.
- the node marking timer can be initiated when a corresponding node is marked.
- the node marking timer can be set for 2 minutes in one example.
- a timeout timer may also be initiated when the corresponding node is marked. This timer is utilized for cases where, e.g., a job is executing on the marked node and has not completed executing even after the end of the node marking period and the node has not been unmarked. In such cases, the scaling manager 106 can instruct the resource provider to force delete the corresponding compute resource and any jobs or pods active on the compute resource after completion of the timeout timer. In one example, the timeout timer may be set to expire 10 minutes after a node is marked as unschedulable.
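Both per-node timers can be implemented by storing an expiry time and comparing it against the current time, as the disclosure suggests for its timers generally. The sketch below assumes the example periods of 2 and 10 minutes; the field names are hypothetical.

```python
import time

NODE_MARKING_SECONDS = 120   # example: 2-minute standby period
TIMEOUT_SECONDS = 600        # example: force delete 10 minutes after marking

def start_timers(node, now=None):
    """Record both expiry times in the node's metadata when it is marked."""
    now = now if now is not None else time.time()
    node["marking_expires_at"] = now + NODE_MARKING_SECONDS
    node["timeout_expires_at"] = now + TIMEOUT_SECONDS
    return node

def marking_expired(node, now):
    # Standby period over: the node may be terminated normally.
    return now >= node["marking_expires_at"]

def timed_out(node, now):
    # Hard deadline: force delete the node and any pods still on it.
    return now >= node["timeout_expires_at"]
```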
- a scale lock is a mechanism that is used to ensure that any previously attempted scale up or scale down requests from the scaling manager 106 have been successfully completed by the resource provider 102 before requesting additional scaling-up or scaling-down of resources. This helps prevent an “infinite” scale-up or scale-down due to the delay it takes for the orchestration system 104 to know that additional resources have been added and to add corresponding nodes in the node group 204 or conversely that nodes have been terminated by the resource provider and to delete corresponding nodes from the node group 204 .
- the scaling manager 106 instructs the resource provider 102 to add 6 additional physical/virtual machines to increase the overall node group size by 6.
- the scaling manager 106 may execute another cycle of method 400 .
- without a scale lock, the scaling manager 106 may request the resource provider 102 to add six additional resources in this cycle, causing the resource provider to add 12 resources when in reality only 6 additional resources were required.
- the scale lock can prevent this situation.
- the scale lock mechanism also prevents the scaling manager 106 from issuing any scale-down commands whilst the resource provider 102 is mid-way through assigning/launching new resources. As such, the scale lock mechanism allows the scaling activity to safely finish before performing any additional actions that affect the node group 204.
- the scale lock can be applied when a scaling instruction is issued to the resource provider 102 .
- the scaling manager 106 can set a scale lock timer which is internally maintained by the scaling manager 106 .
- the scale lock timer may be configured to timeout after a predetermined period (e.g., 2 minutes). Typically, this period is set taking into consideration the amount of time required for the resource provider 102 to perform the corresponding action, for the resource provider to inform the orchestration system 104 and/or the scaling manager 106 that the action has been performed, and for the orchestration system 104 to update the list of resources (e.g., nodes) it maintains based on the corresponding action.
- the scaling manager 106 may be configured to retrieve a list of active resources maintained by the resource provider 102 and the orchestration system 104 . If the numbers of resources match, the scaling manager 106 determines that the scaling operation has been successfully completed and removes the scale lock.
- the scaling manager 106 may be configured to maintain the scale lock for an additional period of time.
- the scaling manager 106 can maintain a maximum timeout period timer (e.g., 10 minutes) as well. If for some reason (e.g., because the scaling manager 106 cannot reach the resource provider 102 to determine the number of resources maintained by the resource provider) the scale lock is not unlocked before the maximum timeout timer expires, the scaling manager 106 assumes that the scaling operation has failed and forcefully unlocks the scale lock.
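A minimal sketch of the scale lock follows, assuming the example periods above (a 2-minute lock timer and a 10-minute maximum timeout): the lock releases normally when the provider's and the orchestration system's resource counts agree after the lock timer has run, and is force-released at the maximum timeout. The class shape is an assumption, not the patented implementation.

```python
class ScaleLock:
    """Taken when a scaling instruction is issued; released once the
    resource provider and the orchestration system report the same
    resource count, or forcefully after a maximum timeout."""

    def __init__(self, lock_seconds=120, max_timeout_seconds=600):
        self.lock_seconds = lock_seconds
        self.max_timeout_seconds = max_timeout_seconds
        self.locked_at = None

    def acquire(self, now):
        self.locked_at = now

    def try_release(self, provider_count, orchestrator_count, now):
        """Return True if the lock is (or becomes) released."""
        if self.locked_at is None:
            return True
        # Force unlock: assume the scaling operation failed.
        if now - self.locked_at >= self.max_timeout_seconds:
            self.locked_at = None
            return True
        # Normal unlock: lock timer has run and both systems agree
        # on the number of active resources.
        if (now - self.locked_at >= self.lock_seconds
                and provider_count == orchestrator_count):
            self.locked_at = None
            return True
        return False
```

While `try_release` returns False, the scaling manager issues no further scale-up or scale-down instructions, which is what prevents the duplicated-request scenario above.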
- the scale lock timer and the maximum timeout timer are configurable—i.e., a resource requesting system 110 can set its own timeout periods.
- the timers may be set and configured using any known computational techniques.
- the timers may be maintained by storing an expiry/timeout time.
- the expiry time may be stored along with the properties/metadata associated with the node in the node controller 206 or agent 208 . The current time can be compared with the expiry time to determine whether the timer has expired or not.
- FIG. 3 describes an auto-scaling process according to some embodiments.
- FIGS. 4A-4C describe an auto-scaling process according to other embodiments.
- some non-limiting implementation details of the methods will be described with reference to Kubernetes as the orchestration system 104 . Further, the methods of FIGS. 3 and 4 are repeated periodically (e.g., every 30 or 60 seconds).
- the method 300 begins at step 302 , where the scaling manager 106 determines the required compute capacity for a particular compute group in real time—i.e., it determines the CPU and memory required to complete the currently allocated job requests created by the resource requesting system 110 . To this end, the scaling manager 106 may request the orchestration system 104 to provide a list of the current job requests.
- the scaling manager 106 may fetch the status of all the nodes 202 and pods 212 in a node group 204 . For instance, in the Kubernetes environment, this may be done by using a standard Kubernetes API call (via Kubernetes REST interface) such as ‘get nodes’ and ‘get pods’. In another example, this information may be fetched continuously/periodically via the ‘watch’ function and may be stored in a cache in the scaling manager 106 . This status typically includes the job requests assigned to each of the active pods. As noted previously, each job request may specify the resource allocation required to complete that job. The scaling manager 106 can add the resource allocation requirements of all the job requests to determine the requested compute resources at that specified time.
- a particular node group has 10 pods (each with a single container) and each container is requesting 500 m CPU and 100 mb memory allocation.
- the calculated total resource allocation requirements in this case would be 5000 m CPU and 1000 mb memory.
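The summation step above can be sketched as follows. The dict-based pod schema (`cpu_m` millicores, `mem_mb` megabytes) is an assumption for illustration only, not the orchestration system's actual API.

```python
def total_requested(pods):
    """Add up the resource requests of all job requests/pods in a node
    group to obtain the requested compute capacity at this instant."""
    cpu = sum(p["cpu_m"] for p in pods)   # total requested CPU, millicores
    mem = sum(p["mem_mb"] for p in pods)  # total requested memory, mb
    return cpu, mem
```

For the example above, 10 pods each requesting 500 m CPU and 100 mb memory yield totals of 5000 m CPU and 1000 mb memory.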
- the scaling manager 106 determines the allocable compute capacity of the compute group. In one example, the scaling manager 106 determines the available CPU and memory at each of the nodes 202 . In order to launch containers on the underlying machines, the orchestration system 104 is aware at all times of the available capacity at each of the underlying physical/virtual machines. Accordingly, at step 304 , the scaling manager 106 may obtain the allocable capacity at each of the underlying machines from the orchestration system 104 .
- the capacity of each node 202 may be provided to the node controller 206 by the agent 208 and the scaling manager 106 may fetch this capacity information from the node controller 206 .
- this capacity information may be retrieved from the orchestration system 104 at the same time when status information for the active nodes and pods is collected. Further, the same or similar commands may be utilized to retrieve this information.
- the scaling manager 106 combines the capacity of each node to determine the total allocable resources of the underlying infrastructure. For example, consider that a particular node group has 2 nodes, each with allocable resources of 1000 m CPU and 4000 mb memory. The calculated total capacity of the node group in this case would be 2000 m CPU and 8000 mb memory.
- the scaling manager 106 determines the utilization of the compute group.
- the utilization is determined as the total required capacity divided by the total allocable capacity, expressed as a percentage. The utilization may be calculated separately for CPU and for memory.
- the higher of the two computed utilizations can then be utilized for scaling decisions. In the example above, the CPU utilization (5000 m/2000 m×100=250%) exceeds the memory utilization (1000 mb/8000 mb×100=12.5%), so the CPU utilization of 250% is used.
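As a minimal sketch, the utilization computation and the selection of the higher value might look like this (the function name and signature are illustrative, not from the patent):

```python
def utilization(requested_cpu_m, requested_mem_mb, alloc_cpu_m, alloc_mem_mb):
    """Compute CPU and memory utilization as percentages of allocable
    capacity and return the higher of the two, which drives the
    subsequent scaling decision."""
    cpu_pct = requested_cpu_m / alloc_cpu_m * 100
    mem_pct = requested_mem_mb / alloc_mem_mb * 100
    return max(cpu_pct, mem_pct)
```

With the running example (5000 m CPU and 1000 mb memory requested against 2000 m CPU and 8000 mb memory allocable), this returns 250.0.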
- the upper threshold value is the utilization value for the compute group above which the scaling manager 106 increases the number of available resources (e.g., increases the size of the node group 204 ).
- the lower threshold value is the utilization value for the compute group below which the scaling manager 106 decreases the number of available resources (e.g., decreases the size of the node group 204 ).
- the scaling manager 106 can compare the utilization value with the upper and lower threshold values to determine whether it lies between these values.
- at step 308 , if a determination is made that the calculated utilization is between the upper and lower threshold values, the scaling manager 106 does nothing and the method 300 ends.
- if, at step 308 , it is determined that the calculated utilization is not between the upper and lower threshold values, the method proceeds to step 310 , where a determination is made whether the utilization is higher than the upper threshold. If it is determined that the utilization is above the upper threshold value, the method 300 proceeds to step 312 , where the scaling manager 106 calculates the additional resources required to decrease the utilization of the compute group to below the upper threshold. In one embodiment, the additional resources can be calculated by using a percent decrease formula.
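One way to realize a percent decrease formula of this kind, assuming homogeneous node capacity and an illustrative upper threshold of 70% (the patent does not fix these values), is to compute the allocable capacity needed to bring utilization under the threshold and round up to whole nodes:

```python
import math

def additional_nodes(requested_cpu_m, current_nodes, per_node_cpu_m, upper_pct):
    """Number of nodes to add so that utilization falls below the upper
    threshold. Assumes every node contributes the same allocable
    capacity, which is an illustrative simplification."""
    # Allocable capacity needed so that requested/allocable*100 < upper_pct.
    needed_alloc = requested_cpu_m / (upper_pct / 100)
    nodes_needed = math.ceil(needed_alloc / per_node_cpu_m)
    return max(0, nodes_needed - current_nodes)
```

With the running example (5000 m CPU requested, 2 nodes of 1000 m each) and an assumed 70% upper threshold, this yields 6 additional nodes; at 8 nodes the utilization drops to CPU=5000 m/8000 m×100=62.5% and Memory=1000 mb/32000 mb×100=3.125%.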
- the scaling manager 106 causes additional resources to be added to the compute group based on the calculated number of compute resources identified in the previous step.
- the scaling manager 106 generates and sends a request to the resource provider 102 to assign additional resources (e.g., physical/virtual machines) to the resource requesting system's active infrastructure.
- the number of additional resources requested corresponds to the number of additional resources calculated at step 312 , e.g., 6 nodes in the above example.
- once the resource provider 102 assigns the additional resources, it informs the orchestration system 104 that additional compute resources have been added, and the orchestration system 104 updates its list of nodes 202 for that node group 204 to include the newly added nodes 202 .
- if, at step 310 , a determination is made that the calculated utilization is not above the upper threshold, the method 300 proceeds to step 316 , where the scaling manager 106 calculates the number of compute resources (e.g., nodes) that need to be released/terminated to increase the overall utilization to be above the lower threshold value. In one embodiment, this number of nodes can be calculated in a similar fashion to the calculation performed at step 312 .
- the scaling manager 106 causes the number of active compute resources to be reduced by the number calculated in the previous step. In certain embodiments, this may be done by generating and sending a command to the resource provider 102 to terminate/release the calculated number of physical/virtual machines from the compute group assigned to the resource requesting system 110 . In certain embodiments, the scaling manager 106 may specify which resources/nodes should be terminated, whereas in other cases this may not be specified and the resource provider 102 may make this determination itself.
- the scaling manager 106 may do this based on a suitable criterion. For example, in some cases, the scaling manager 106 may determine the oldest nodes in the node group 204 (i.e., the nodes that were created earliest) and request that one or more of the oldest nodes be terminated (based on the number of nodes calculated at step 316 ). To this end, the scaling manager 106 reviews the creation time of each of the nodes in the node list for the node group and prioritizes the nodes that were created earliest for termination. By terminating the oldest nodes first, the scaling manager 106 ensures that there are always newer nodes in the node group 204 . Additionally, terminating older nodes first allows the resource requesting system 110 to slowly roll out configuration changes, as newer nodes can be initialized and configured with new/updated configuration settings while the older nodes that are already executing based on older configuration settings are terminated.
- the scaling manager 106 may identify the newest nodes in the node group 204 and request that the newest nodes be terminated first. It will be appreciated that any other criterion can also be employed—e.g., the scaling manager 106 can determine that the nodes with the most allocable capacity be terminated first or that the nodes with the least allocable capacity be terminated first.
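The selection criteria described above amount to a sort over node creation times. A sketch, assuming a dict-based node schema with ISO-format `created` timestamps (an illustrative assumption):

```python
from datetime import datetime

def termination_candidates(nodes, count, newest_first=False):
    """Order nodes by creation time and pick `count` candidates for
    termination. Oldest-first is the default criterion described in
    the text; newest-first is the alternative criterion."""
    ordered = sorted(
        nodes,
        key=lambda n: datetime.fromisoformat(n["created"]),
        reverse=newest_first,  # True selects the newest nodes first
    )
    return [n["name"] for n in ordered[:count]]
```

Other criteria (most or least allocable capacity) would simply substitute a different sort key.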
- once the resource provider 102 deletes the requested number of resources, it informs the orchestration system 104 .
- the orchestration system 104 can then update its list of nodes 202 for that node group 204 to remove the deleted nodes.
- FIGS. 4A-4C illustrate an end-to-end auto-scaling process for scaling resources in a compute group based on demand that utilizes marking and different scale down rates.
- the method begins at step 402 , where the scaling manager 106 retrieves a list of active compute resources allocated to the compute group.
- the scaling manager 106 may be configured to retrieve a list of all the nodes and pods in a given node group from the orchestration system 104 .
- the scaling manager 106 sets a watch on the node group 204 to receive a stream of updates of all the nodes 202 and pods 212 in the given node group 204 from the orchestration system 104 .
- the updates may be stored in an internal cache maintained by the scaling manager 106 .
- the scaling manager 106 may communicate with the orchestration system 104 to retrieve this information when the method step 402 executes.
- the information retrieved either from the internal cache or directly from the orchestration system 104 at this step may include, e.g., a list of all the pods scheduled in the node group; a list of all the nodes in the node group; names of all the nodes, their allocable capacity, the number of pods currently active on the nodes, the status of the nodes (e.g., marked or unmarked), their creation time; the names of the pods, their creation time, etc.
- the scaling manager 106 filters the list of compute resources into marked and unmarked compute resources (e.g., marked and unmarked nodes). To this end, the scaling manager 106 may filter the list of fetched nodes based on the properties field of the nodes that indicates whether the node is marked unschedulable or not.
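The filtering step above partitions the fetched node list on the property that indicates whether a node is marked unschedulable. A minimal sketch, where the boolean field name `unschedulable` is an assumed schema:

```python
def split_marked(nodes):
    """Partition nodes into marked (unschedulable) and unmarked lists,
    mirroring the filtering performed at this step."""
    marked = [n for n in nodes if n.get("unschedulable")]
    unmarked = [n for n in nodes if not n.get("unschedulable")]
    return marked, unmarked
```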
- the scaling manager 106 calculates the requested capacity for the compute group—i.e., it determines the CPU and memory required to complete all the currently allocated job requests created by the resource requesting system 110 . To this end, the scaling manager 106 may retrieve the requests queued for execution or currently executing at each of the pods 212 in the node group 204 . This step is similar to step 302 of FIG. 3 and therefore is not described in any further detail here.
- the scaling manager 106 ignores the marked compute resources in its calculation of allocable capacity and determines the allocable capacity in the compute group based on the unmarked compute resources in the compute group. In particular, the scaling manager 106 determines the total CPU and memory that can be utilized at the unmarked compute resources or nodes. As noted previously, in Kubernetes, the capacity of each node 202 may be provided to the node controller 206 by the agent 208 and the scaling manager 106 may fetch this capacity information from the node controller 206 . It then adds the capacity of each node to determine the total allocable resources of the underlying infrastructure.
- the utilization of the unmarked compute resources in the compute group is calculated.
- the utilization is determined as a percentage of the total requested capacity divided by the total allocable capacity. Further, as described with reference to FIG. 3 , the utilization may be calculated separately for CPU usage and memory usage. The scaling manager 106 can determine which of the calculated utilizations at step 410 is greater (the CPU or the memory) and then utilize the higher utilization value for further calculations.
- the scaling manager 106 determines whether there is any scale lock in place at step 412 . If a determination is made at this step that the scale lock is not released (e.g., because a previously requested scaling operation is not yet completed), the method 400 ends. The method 400 may then re-run after a predetermined period of time.
- at step 414 , a determination is made whether the higher of the calculated utilizations falls between the slow scaling threshold value and the upper threshold value, or whether the calculated utilization is higher than the upper threshold or lower than the slow scaling threshold.
- the scaling manager 106 can perform one of three actions—do nothing, scale up, or scale down. If at step 414 a determination is made that the calculated utilization is between the upper and slow scaling threshold values, the scaling manager 106 does nothing. However, it may perform some administrative tasks on the marked nodes at step 416 .
- the scaling manager 106 can mark excess nodes as unschedulable for a certain period of time. To this end, whenever a node is marked as unschedulable, a node marking timer that is set to expire after a predetermined period of time is started. At step 416 , the scaling manager 106 checks if the node marking timer for any of the marked nodes has expired. In certain embodiments, when a node is marked as unschedulable, the orchestration system 104 updates the properties/metadata of the node to indicate that the node is marked. The orchestration system 104 can also set an expiry time for the marking within the properties/metadata of the node.
- the scaling manager 106 may be configured to determine whether the current time is equal to or exceeds the expiry time. It also checks if any pods or containers are still running on the marked nodes, e.g., by requesting the node controller to provide a list of all pods running on the identified node.
- if the scaling manager 106 identifies that the node marking timer for any of the marked nodes has expired and those nodes are empty, it instructs the resource provider 102 to terminate the corresponding underlying resources and applies the scale lock at step 416 .
- for marked nodes that still have active pods, the scaling manager 106 checks if the timeout timer for that node has expired. If this is the case for any of the nodes that still have active pods, an instruction to terminate the identified nodes is also provided to the resource provider 102 at step 416 .
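The two checks above — marking timer expired on an empty node, or the longer timeout timer expired on a node that still has active pods — can be combined into one predicate. All field names below are an assumed schema for illustration:

```python
def should_terminate(node, now):
    """Decide whether a marked node should be terminated: either its
    marking timer has expired and it is empty, or it still has active
    pods but its (longer) timeout timer has expired."""
    if not node.get("unschedulable"):
        return False  # only marked nodes are candidates
    if now >= node["mark_expiry"] and node["active_pods"] == 0:
        return True   # marking timer expired and node drained
    if node["active_pods"] > 0 and now >= node["timeout_expiry"]:
        return True   # hard timeout: terminate even with pods running
    return False
```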
- if, at step 414 , it is determined that the calculated utilization is higher than the upper threshold, the method 400 proceeds to step 418 , where the scaling manager 106 calculates the additional resources required to decrease the utilization of the compute group to below the upper threshold. This calculation is similar to the calculation performed at step 310 of FIG. 3 .
- at step 424 , the scaling manager 106 determines whether the number of unmarked nodes is sufficient to decrease the utilization of the compute group to below the upper threshold.
- at step 426 , the scaling manager 106 sends a request to the resource provider 102 to assign additional compute resources to the compute group (e.g., 4 more resources) and the scale lock is applied.
- once the resource provider 102 assigns the additional resources, the orchestration system 104 is informed and updates its list of nodes 202 for that node group 204 to include the newly added nodes 202 .
- the scale lock can then be released (if it hasn't already timed out).
- if, at step 424 , it is determined that the number of unmarked compute resources is sufficient to decrease the utilization of the entire system to below the upper threshold value, the method 400 ends.
- if, at step 420 , a determination is made that there are no marked compute resources in the node group 204 , the method proceeds straight to step 426 .
- if, at step 414 , a determination is made that the calculated utilization is lower than the slow scaling threshold value, the method 400 proceeds to scale down the resources by marking and/or terminating resources.
- at step 430 , a determination is made whether the calculated utilization is below the fast scaling threshold value. In one example, this is done by comparing the calculated utilization value with the slow and fast scaling threshold values preset for the scaling manager 106 .
- if the calculated utilization is not below the fast scaling threshold value, the scaling manager 106 adopts the slow scaling mode and marks the number of nodes corresponding to the predetermined number of compute resources configured for the slow scale down mode (e.g., 2 nodes).
- if the calculated utilization is below the fast scaling threshold value, the scaling manager 106 adopts the fast scaling mode and the method proceeds to step 432 , where the scaling manager 106 marks the number of nodes corresponding to the predetermined number of compute resources configured for the fast scale down mode (e.g., 5 nodes).
- the scaling manager 106 may select the compute resources for marking based on one or more criteria. For instance, it may attempt to mark the oldest nodes, the newest nodes, the nodes with the least number of active pods, or the nodes with the most number of active pods.
- at step 434 , the scaling manager 106 checks if the node marking timer for any of the marked nodes has expired. It also checks if any jobs are currently executing on the marked node. To this end, it may check if any pods or containers are still running on the marked nodes, e.g., by requesting the node controller 206 to provide a list of all pods running on the identified node(s). If the scaling manager 106 identifies that the node marking timer has expired for any of the marked nodes that are empty, it instructs the resource provider 102 to terminate the corresponding underlying compute resources and applies the scale lock.
- the scaling manager 106 checks if the timeout timer for that node has expired. If this is the case for any of the nodes that still have active pods, an instruction to terminate the identified nodes is provided to the resource provider 102 .
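Taken together, the threshold comparisons above amount to a four-way decision on the calculated utilization. A sketch, with threshold values that are illustrative defaults only (the patent does not fix them):

```python
def scaling_decision(utilization_pct, upper=70.0, slow=40.0, fast=20.0):
    """Map the calculated (higher of CPU/memory) utilization onto one
    of four outcomes: scale up, do nothing, slow scale down, or fast
    scale down."""
    if utilization_pct > upper:
        return "scale_up"
    if utilization_pct >= slow:
        return "no_op"            # between the slow and upper thresholds
    if utilization_pct >= fast:
        return "slow_scale_down"  # mark a small batch, e.g., 2 nodes
    return "fast_scale_down"      # mark a larger batch, e.g., 5 nodes
```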
- the scaling manager 106 is configured to vary its scale down rate based on the rate at which the load on the compute group decreases.
- the scaling manager 106 may simply have a lower threshold value (as described with respect to FIG. 3 ). If the utilization falls below this lower utilization value, the scaling manager 106 may calculate the number of compute resources required to bring the utilization value to a value higher than the lower threshold. This calculation can be similar to the calculation performed for determining the number of compute resources required to bring the utilization value to a value lower than the upper threshold at steps 312 and 418 .
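Under this single-lower-threshold variant, the number of releasable nodes can be computed analogously to the scale-up calculation. A sketch assuming homogeneous node capacity (an illustrative simplification the patent does not require):

```python
def nodes_to_release(requested_cpu_m, current_nodes, per_node_cpu_m, lower_pct):
    """Number of nodes that can be released while keeping utilization
    above the lower threshold."""
    # Largest allocable capacity at which utilization stays at the threshold.
    max_alloc = requested_cpu_m * 100.0 / lower_pct
    n_keep = int(max_alloc // per_node_cpu_m)  # most nodes we may keep
    if n_keep * per_node_cpu_m == max_alloc:
        n_keep -= 1  # keep utilization strictly above the lower threshold
    n_keep = max(n_keep, 1)                    # never release every node
    return max(0, current_nodes - n_keep)
```

For example, with 1000 m CPU requested across 8 nodes of 1000 m each and a 25% lower threshold, 5 nodes can be released, leaving utilization at 1000 m/3000 m≈33%.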
- the resources provided by the resource provider may be one or more computer systems; the orchestration system 104 may be provided by one or more computer systems; the resource requesting systems 110 may be provided by one or more computer systems; and the scaling manager 106 may be executed on a computer system.
- the computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination.
- Such special-purpose computing devices may also combine custom hardwired logic, ASICs, or FPGAs with custom programming to accomplish the techniques.
- the computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement relevant operations.
- FIG. 5 is a block diagram that illustrates a computer system 500 upon which an embodiment of the invention may be implemented.
- Computer system 500 includes a bus 502 or other communication mechanism for communicating information, and a hardware processor 504 coupled with bus 502 for processing information.
- Hardware processor 504 may be, for example, a general-purpose microprocessor.
- Computer system 500 also includes a main memory 506 , such as a random access memory (RAM) or other dynamic storage device, coupled to bus 502 for storing information and instructions to be executed by processor 504 .
- Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504 .
- Such instructions when stored in non-transitory storage media accessible to processor 504 , render computer system 500 into a special-purpose machine that is customized to perform the operations specified in the instructions.
- Computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504 .
- a storage device 510 , such as a magnetic disk or optical disk, is provided and coupled to bus 502 for storing information and instructions.
- the methods disclosed herein are performed by computer system 500 in response to processor 504 executing one or more sequences of one or more instructions contained in main memory 506 .
- Such instructions may be read into main memory 506 from another storage medium, such as storage device 510 .
- Execution of the sequences of instructions contained in main memory 506 causes processor 504 to perform the process steps described herein.
- hardwired circuitry may be used in place of or in combination with software instructions.
- Non-volatile media includes, for example, optical or magnetic disks, such as storage device 510 .
- Volatile media includes dynamic memory, such as main memory 506 .
- Common forms of storage media include, for example, a hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.
- Storage media is distinct from but may be used in conjunction with transmission media.
- Transmission media participates in transferring information between storage media.
- transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 502 .
- transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
- Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 504 for execution.
- the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer.
- the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
- a modem local to computer system 500 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
- An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 502 .
- Bus 502 carries the data to main memory 506 , from which processor 504 retrieves and executes the instructions.
- the instructions received by main memory 506 may optionally be stored on storage device 510 either before or after execution by processor 504 .
- Computer system 500 also includes a communication interface 518 coupled to bus 502 .
- Communication interface 518 provides a two-way data communication coupling to a network link 520 that is connected to network 108 .
- communication interface 518 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line.
- communication interface 518 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
- Wireless links may also be implemented.
- communication interface 518 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
- Network link 520 typically provides data communication through one or more networks 108 to other computing systems. For example, if the computing system 500 is part of the physical machines assigned to a resource requesting system 110 , the network link 520 may provide a connection through network 108 to the orchestration system 104 or the resource requesting system 110 .
- Computer system 500 can send messages and receive data, including program code, through the network(s), network link 520 and communication interface 518 .
- a computer system 500 may receive requests for launching containers from the orchestration system 104 through the network 108 and communication interface 518 .
CPU=5000 m/8000 m×100=62.5%
Memory=1000 mb/32000 mb×100=3.125%
US20180287898A1 (en) * | 2017-03-31 | 2018-10-04 | Connectwise, Inc. | Systems and methods for managing resource utilization in cloud infrastructure |
US20180343169A1 (en) * | 2017-05-24 | 2018-11-29 | At&T Intellectual Property I, L.P. | De-allocation elasticity application system |
US10182033B1 (en) * | 2016-09-19 | 2019-01-15 | Amazon Technologies, Inc. | Integration of service scaling and service discovery systems |
US10191778B1 (en) * | 2015-11-16 | 2019-01-29 | Turbonomic, Inc. | Systems, apparatus and methods for management of software containers |
US20190042322A1 (en) * | 2017-08-04 | 2019-02-07 | Espressive, Inc. | Elastic multi-tenant container architecture |
US10235625B1 (en) * | 2018-02-09 | 2019-03-19 | Capital One Services, Llc | Automatically scaling neural networks based on load |
US20190109908A1 (en) * | 2017-10-10 | 2019-04-11 | Inference Communications Pty. Ltd. | Automatic scaling for communications event access through a stateful interface |
US20190121675A1 (en) * | 2016-02-16 | 2019-04-25 | Red Hat, Inc. | Automatically scaling up physical resources in a computing infrastructure |
US20190138338A1 (en) * | 2017-11-06 | 2019-05-09 | Fujitsu Limited | Management apparatus and information processing system |
US20190146847A1 (en) * | 2017-11-10 | 2019-05-16 | Mentor Graphics Corporation | Dynamic distributed resource management |
US10324763B1 (en) * | 2018-12-11 | 2019-06-18 | Palantir Technologies Inc. | Systems and methods for terminating instances and autoscaling instance groups of computing platforms |
US10476742B1 (en) * | 2015-09-24 | 2019-11-12 | Amazon Technologies, Inc. | Classification of auto scaling events impacting computing resources |
US10496432B1 (en) * | 2019-01-22 | 2019-12-03 | Capital One Services, Llc | Methods, mediums, and systems for provisioning application services |
US20200192708A1 (en) * | 2018-12-18 | 2020-06-18 | EMC IP Holding Company LLC | Scaling distributed computing system resources based on load and trend |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9183031B2 (en) * | 2012-06-19 | 2015-11-10 | Bank Of America Corporation | Provisioning of a virtual machine by using a secured zone of a cloud environment |
US9164808B2 (en) * | 2012-07-20 | 2015-10-20 | Verizon Patent And Licensing Inc. | Virtual container for network systems |
US9189260B2 (en) * | 2012-09-27 | 2015-11-17 | International Business Machines Corporation | Resource allocation for virtual machines and logical partitions |
US9262220B2 (en) * | 2013-11-15 | 2016-02-16 | International Business Machines Corporation | Scheduling workloads and making provision decisions of computer resources in a computing environment |
JP6277827B2 (en) * | 2014-03-31 | 2018-02-14 | Fujitsu Limited | Information processing apparatus, scale management method, and program |
WO2016018401A1 (en) * | 2014-07-31 | 2016-02-04 | Hewlett-Packard Development Company, L.P. | Dynamic adjustment of thresholds |
EP2988214A1 (en) * | 2014-08-20 | 2016-02-24 | Alcatel Lucent | Method for balancing a load, a system, an elasticity manager and a computer program product |
US9438529B1 (en) * | 2014-09-25 | 2016-09-06 | Amazon Technologies, Inc. | Computing process analysis by metrics profiling |
US10432734B2 (en) * | 2014-12-12 | 2019-10-01 | Hewlett Packard Enterprise Development Lp | Cloud service tuning |
US9645847B1 (en) * | 2015-06-08 | 2017-05-09 | Amazon Technologies, Inc. | Efficient suspend and resume of instances |
US20180278725A1 (en) * | 2017-03-24 | 2018-09-27 | Ca, Inc. | Converting a single-tenant application for multi-tenant use |
US10911367B2 (en) * | 2018-06-27 | 2021-02-02 | Oracle International Corporation | Computerized methods and systems for managing cloud computer services |
- 2019
  - 2019-03-28 US US16/368,122 patent/US11030009B2/en active Active
- 2021
  - 2021-06-02 US US17/337,336 patent/US11734073B2/en active Active
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11972193B1 (en) * | 2020-10-01 | 2024-04-30 | Synopsys, Inc. | Automatic elastic CPU for physical verification |
Also Published As
Publication number | Publication date |
---|---|
US20210294658A1 (en) | 2021-09-23 |
US11734073B2 (en) | 2023-08-22 |
US20200310881A1 (en) | 2020-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11734073B2 (en) | 2023-08-22 | Systems and methods for automatically scaling compute resources based on demand |
JP6892729B2 (en) | Code execution request routing | |
US10282229B2 (en) | Asynchronous task management in an on-demand network code execution environment | |
US9952896B2 (en) | Asynchronous task management in an on-demand network code execution environment | |
US20220391238A1 (en) | Low latency computational capacity provisioning | |
US11354169B2 (en) | Adjusting variable limit on concurrent code executions | |
US10725826B1 (en) | Serializing duration-limited task executions in an on demand code execution system | |
US10445140B1 (en) | Serializing duration-limited task executions in an on demand code execution system | |
CN114930295B (en) | Serverless call allocation method and system utilizing reserved capacity without inhibiting scaling | |
CN109478134B (en) | Executing on-demand network code with cross-account aliases | |
JP6352535B2 (en) | Programmatic event detection and message generation for requests to execute program code | |
US9977691B2 (en) | Adjusting variable limit on concurrent code executions based on communication between frontends | |
JP6363796B2 (en) | Dynamic code deployment and versioning | |
US11966768B2 (en) | Apparatus and method for multi-cloud service platform | |
CN107209682B (en) | Automatic management of resource adjustments | |
CN109564525B (en) | Asynchronous task management in an on-demand network code execution environment | |
US20160357589A1 (en) | Methods and apparatus to scale application deployments in cloud computing environments using virtual machine pools | |
EP4002113A1 (en) | Method and system to process requests to execute user code on one or more virtual machine instances identified from a plurality of warmed unassigned virtual machine | |
US20170031622A1 (en) | Methods for allocating storage cluster hardware resources and devices thereof | |
KR20180059528A (en) | Management of periodic requests for compute capacity | |
WO2018108001A1 (en) | System and method to handle events using historical data in serverless systems | |
JP2005056391A (en) | Method and system for balancing workload of computing environment | |
US11449350B2 (en) | Systems and methods for automatically updating compute resources | |
JP2018527668A (en) | Method and system for limiting data traffic | |
CN115756822A (en) | Method and system for optimizing performance of high-performance computing application |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
AS | Assignment |
Owner name: ATLASSIAN PTY LTD, AUSTRALIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GONZALEZ, JACOB CHRISTOPHER JOSEPH;PRICE, ALEXANDER WILLIAM;ANGOT, DAVID;AND OTHERS;SIGNING DATES FROM 20190325 TO 20190327;REEL/FRAME:048760/0149
Owner name: ATLASSIAN, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GONZALEZ, JACOB CHRISTOPHER JOSEPH;PRICE, ALEXANDER WILLIAM;ANGOT, DAVID;AND OTHERS;SIGNING DATES FROM 20190325 TO 20190327;REEL/FRAME:048760/0149
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: ATLASSIAN US, INC., CALIFORNIA
Free format text: CHANGE OF NAME;ASSIGNOR:ATLASSIAN, INC.;REEL/FRAME:061085/0690
Effective date: 20220701
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |