EP3561671A1 - Allocating workload - Google Patents
- Publication number
- EP3561671A1 (Application EP18169702.0A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- containers
- function
- container
- requests
- container group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
Definitions
- Examples relate, in general, to methods for allocating workload and to serverless platforms.
- A serverless platform provides a form of utility computing in which an application can be executed as a set of small, stateless functions with access to a datastore. These functions can be triggered by external and/or internal events or other functions, thereby forming function chains. A serverless platform can therefore provide a function as a service.
- A method is provided, in a function-as-a-service platform comprising multiple containers configured to execute a function, for allocating workload in response to incoming requests, the method comprising: determining a current number of requests to be executed by respective ones of the multiple containers; logically isolating one or more of the multiple containers in which the current number of requests to be executed exceeds a predetermined threshold value related to a service level objective for the function; forming a container group composed of non-isolated containers; and allocating the incoming requests between respective ones of the containers in the container group.
- The number of containers in the container group can be supplemented by instantiating one or more containers for the function.
- The number of containers in the container group can be supplemented by reassigning a container configured to execute a second function in the platform to the container group.
- A reassigned container can be re-tasked to execute the function.
- An optimal number of the multiple containers to logically isolate for the function can be determined.
- An isolated container can be unified into the container group.
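The method above can be sketched as follows. The greedy least-loaded filling policy, and the dictionary-based container and threshold representation, are assumptions for illustration only; the claims do not prescribe an allocation policy or data model:

```python
from math import ceil

def allocate(loads, capacity, incoming, threshold):
    """Allocate `incoming` requests given per-container `loads`.

    loads: dict of container id -> current number of in-flight requests
    capacity: max concurrent requests per container (SLO-derived)
    threshold: load at or above which a container is logically isolated
    Returns (assignments, isolated, new_containers).
    """
    # Logically isolate containers at or over the SLO-related threshold.
    isolated = {c for c, load in loads.items() if load >= threshold}
    # The container group is formed from the non-isolated containers.
    group = {c: loads[c] for c in loads if c not in isolated}

    assignments = {}
    remaining = incoming
    # Fill the least-loaded group members first so that no container
    # exceeds its capacity (and hence its SLO).
    for c in sorted(group, key=group.get):
        take = min(remaining, capacity - group[c])
        if take > 0:
            assignments[c] = take
            remaining -= take
    # Any residual demand is met by newly instantiated containers.
    new_containers = ceil(remaining / capacity) if remaining > 0 else 0
    return assignments, isolated, new_containers
```

With hypothetical loads of 8, 7 and 6 on c1-c3, a capacity of 10 and 12 incoming requests, a threshold of 8 isolates c1, fills c3 and c2 to capacity, and leaves one new container to instantiate.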
- A virtualised computing platform is provided, comprising a workload controller to logically detach one or more of multiple containers configured to execute a first function for the platform, whereby to form a container group comprising a set of available containers, and allocate respective ones of incoming requests for the function to one or more containers in the container group.
- The workload controller can supplement the number of containers in the container group by instantiating one or more containers for the function.
- The workload controller can supplement the number of containers in the container group by reassigning an existing container configured to execute a second function in the platform to the container group.
- The workload controller can determine a first period of time remaining before the existing container finishes executing any pending requests relating to the second function, determine a second period of time to instantiate a new container for the container group, and re-task the existing container to execute the first function when the pending requests are executed and the first period of time is less than the second period of time.
- A workload controller is provided in a serverless function-as-a-service platform, the workload controller to logically isolate one or more of multiple containers configured to execute a first function for the platform, whereby to form a container group comprising a set of available containers, and allocate respective ones of incoming requests for the function to one or more containers in the container group.
- The workload controller can augment the container group with one or more additional containers configured to execute the first function, wherein at least one of the additional containers is reassigned from a second function.
- The workload controller can receive data representing an expected load for the function.
- The workload controller can remove a reassigned container from a set of available containers.
- The workload controller can re-attach a logically isolated container to the platform.
- In a serverless platform, server management and capacity planning decisions are hidden from developers or operators, such that they do not need to specify and configure cloud instances (i.e. virtual machines (VMs) and/or containers) within which functions execute.
- Configuration and dynamic management of virtual resources, as well as the runtime environment, are the responsibility of the cloud operator. That is, the operator provides and manages the virtual environment where developers can deploy applications and services without the burden of configuring cores and memory of virtual machines, specifying scaling thresholds, populating templates and so on.
- From a billing perspective, instead of acquiring cloud instances on a time basis (i.e. hour, day, month or year), developers pay for the number of requests accessing their applications.
- A serverless platform can comprise a proxy (or API gateway), a set of containers, and a datastore.
- Applications running on top of serverless platforms may have many stateless functions whose state information can be saved in the datastore and fetched during the execution of the function.
- The execution of functions can be triggered by such events as: a user request accessing the application; termination of a function that triggers another function in the application chain; a change in a database; etc.
- The serverless proxy routes every request to the proper container, and each container has a proper runtime that executes the functions.
- FIG. 1 is a schematic representation of a serverless platform according to an example.
- In the example of Figure 1, a serverless application executes one function.
- User devices access serverless applications using client function requests, e.g. HTTP (marked as I in Figure 1 ), that include the API of the application and input data to the functions.
- In turn, requests are routed through the serverless proxy 101, which selects a container 103 to handle each request.
- Container 103 hosts a function that can serve the request. If the function is not already available in the container 103, it can be fetched from the datastore 105. When the function is available, it consumes the data from the request I, executes the application's code and returns the appropriate response to the client (5).
- Since serverless platform providers manage the cloud instances used to serve application requests, platform management is transparent to the application developer or service owner. This means that developers do not need to acquire virtual machines or container instances to develop, test and run applications - as is the case in IaaS or PaaS cloud models. Instead, the cloud provider will manage the infrastructure and users are billed according to request demand.
- A method to optimize resource management and allocation in serverless platforms, in order to seamlessly provide capacity and reduce delay uncertainty when scaling, can be configured with the service level objective (SLO) that each serverless application should adhere to in mind.
- The correct number of instances (VMs or containers) can be instantiated in advance, based on, for example, short-term forecasts of traffic demand and without affecting the SLOs.
- The number of containers to be created can be reduced by leveraging the fact that it is inexpensive to send function code to an existing container instead of creating a new one, improving performance particularly in platforms hosting multiple applications and serving varying demands.
- Resource scaling can be optimised by computing the minimal (or optimal) number of instances (containers) to serve incoming demand whilst also considering a user's SLOs.
- A serverless proxy can then be used to distribute the demand fairly among relatively less congested instances.
- Instances can be created by an orchestrator, or, in an example, some (or all) instances can be reused. That is, information relating to the number of instances that can be reused can be leveraged in order to reassign containers between functions.
- The scaling process can thus create fewer instances from scratch, thereby reducing instantiation delays.
- Figure 2 is a schematic representation of a serverless function-as-a-service platform according to an example.
- The platform comprises 3 containers (c1, c2, c3) that execute a function f1, and 3 other containers (c4, c5, c6) that execute a function f2.
- The numbers inside each container represent the requests currently being processed by that container for the function in question.
- Each container is assigned a static amount of physical resources - this can be done, for example, using control groups on Linux containers.
- Each function can accommodate or concurrently serve a maximum number of requests (where a request can interchangeably be considered as load, workload and traffic).
- The number of requests that a function can serve depends on target SLOs as well as the statically allocated physical resources of the underlying container.
- Each of the functions f1 and f2 in Figure 2 can handle 10 requests per container. Accordingly, any number of requests beyond this value means that an SLO associated with the function in question will be violated.
- In the example, the application with f1 will receive, via the serverless proxy 201, 12 new requests, and the application with f2 will receive 8 new requests.
- A baseline solution could attempt to accommodate the new requests based on the SLO targets.
- Since the cloud instance capacity is 10, containers c2 and c3 will each receive 2 requests, as the serverless proxy 201 will typically implement a load-balancing strategy that equally assigns requests to the function pool.
- An orchestrator would thus create 3 more containers 203 (see Figure 3) to accommodate the new requests, thereby resulting in an over-dimensioned number of containers, which performs worse under traffic surges (i.e. rapid increases/decreases of workloads).
- Figure 3 is a schematic representation of a serverless function-as-a-service platform according to an example in which a baseline approach is used.
- Resource usage can be computed as the number of requests that need to be processed, divided by the total capacity of the containers.
- Highly congested containers can be detached or logically isolated from the load balancer (serverless proxy). Containers that are logically isolated continue to function and process their existing traffic, but do not receive new requests until they are re-attached to the load balancer.
- Figures 4a-c are schematic representations of a serverless function-as-a-service platform according to an example.
- In Figure 4a, container c1 is detached for a period of time as it is the container serving the largest number of requests in a given time slot (described in more detail below).
- The next most congested container is c2 (with 7 requests), so no more than 3 requests can be sent to it (again assuming that, in the example of Figure 4, each container executing the function f1 can accommodate 10 concurrent requests before an SLO violation and/or a physical hardware overload). In this case, therefore, 2 new containers would be needed to serve the workload - a total of 4 active containers if we consider that there are 2 containers already running.
- Figure 4b shows another option in which 2 containers, c1 and c2, are detached. In this case, only one new container would be added.
- Figure 4c shows an example in which containers c1, c2 and c3 are detached. In this case, two new containers would be added.
- The corresponding resource usage is depicted for each of the examples shown in Figures 4a to 4c, calculated as described above.
- The permutation of Figure 4b results in a requirement for only one additional container, and a performance improvement as a result of 78% of resources being utilised (31 requests to be served / 40 requests as maximum capacity). This compares to the examples of Figures 4a and 4c, in which 62% of resources are utilised, which is less efficient in the context of a serverless platform. Since a container can take, for example, at least 500ms to start, the solution of Figure 4b reduces the delay (compared to the other examples of Figure 4 and the baseline of Figure 3) to 500ms (only 1 additional container instantiated).
- Various permutations of the containers available to accommodate additional load are compared in order to determine a minimum number of containers to be instantiated in order to meet demand and maximise resource usage.
- The various configurations of containers are generated by logically isolating different combinations of one or more containers, particularly those with an already relatively higher workload compared to the others, or a workload that is equal to or greater than that specified in an SLO for the function. In this manner, a minimum number of additional containers to accommodate the workload can be determined.
- A permutation can be generated by logically isolating any number of containers, from none to all.
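The permutation search described above can be sketched as follows. The per-container loads used in the example call are hypothetical (the exact figures shown inside the containers of Figures 4a-c are not recoverable from the text), and modelling the proxy as filling spare capacity greedily is an assumption:

```python
from math import ceil

def best_isolation(loads, capacity, incoming):
    """Enumerate how many of the most-loaded containers to isolate.

    Returns (num_isolated, new_containers, usage), minimising the number
    of new containers and, among ties, maximising resource usage
    (total requests / total capacity, as defined in the description).
    """
    loads = sorted(loads, reverse=True)       # most congested first
    total_requests = sum(loads) + incoming
    best = None
    for k in range(len(loads) + 1):           # isolate the k most loaded
        spare = sum(capacity - l for l in loads[k:])
        new = ceil(max(0, incoming - spare) / capacity)
        usage = total_requests / ((len(loads) + new) * capacity)
        candidate = (new, -usage, k)          # fewer new, higher usage
        if best is None or candidate < best:
            best = candidate
    new, neg_usage, k = best
    return k, new, -neg_usage
```

For loads of 8, 7 and 6, a capacity of 10 and 12 incoming requests, every permutation that keeps at least one non-isolated container needs a single new container, so the search settles on the configuration with the highest resulting usage.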
- Existing containers being utilised for a second function, different from the first function, can be reassigned to serve the workload.
- One or more new containers need not be instantiated, since existing containers can be reused.
- A decision to reuse a container can be made considering the current throughput of the second function, the average completion time of the requests for such a function (in order to estimate the draining or request-depletion time for a container), and the creation plus provisioning times of new containers. Accordingly, existing resources serving a workload that is low enough to be consolidated into a lower number of containers can be reused.
- A new container can be instantiated; otherwise, to reduce delays, the existing container can be reassigned when available, i.e. once it has finished serving its current load.
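The reuse-versus-instantiate comparison can be sketched as below. Estimating the drain time as waves of concurrently completing requests is an assumption; the description only states that throughput, average completion time and container creation time feed the decision:

```python
from math import ceil

def drain_time(pending, avg_completion_s, concurrency):
    """Estimate the request-depletion time of a container, assuming its
    pending requests complete in waves of `concurrency` at a time."""
    return ceil(pending / concurrency) * avg_completion_s

def reuse_or_scale(pending, avg_completion_s, concurrency, instantiation_s):
    """Reassign the existing container if it would drain before a fresh
    container could be created and provisioned; otherwise instantiate."""
    if drain_time(pending, avg_completion_s, concurrency) < instantiation_s:
        return "reassign"
    return "instantiate"
```

With 3 pending requests completing in 100ms each at a concurrency of 10, the container drains in well under a 500ms instantiation time, so reassignment wins.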
- Figure 5 is a schematic representation of a serverless function-as-a-service platform according to an example.
- The requests served by the containers for function f1 are the same as in Figure 4b, as is the determination that an additional container can be used to efficiently service incoming workload, which as before is 12 requests for the function f1.
- Other containers in the platform are inspected in order to identify potential targets for reassignment before deciding to scale.
- In Figure 5, another function (f2) is deployed on containers c4, c5 and c6, and the workload for f2 is predicted to decrease.
- All requests for f2 are gracefully drained, and then the code of f1 can be retrieved from the datastore (e.g. 105) and deployed in c6, thereby eliminating the need to create a new container.
- Container c6 is processing 3 requests, and there are 8 incoming requests (501) for function f2. If the maximum number of requests allowed per container for function f2 is 10, the 8 new requests can be accommodated using the existing containers c4 and c5 by distributing 4 requests to each. The 3 requests being processed in container c6 will therefore be dealt with, leaving container c6 'empty'.
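The consolidation check implied above can be written as a one-line feasibility test. The peer loads in the example call are hypothetical, since the actual loads of c4 and c5 in Figure 5 are not given in the text:

```python
def can_drain(peer_loads, incoming, capacity):
    """True if a container can be emptied (and later reassigned) by
    routing all of its function's incoming requests to the remaining
    (peer) containers alone."""
    spare = sum(capacity - load for load in peer_loads)
    return incoming <= spare
```

If c4 and c5 each carried 6 requests with a capacity of 10, the 8 incoming requests for f2 fit in their spare capacity and c6 can be drained.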
- Container c6 is then reassigned to function f1.
- The cost of reassigning a container to another function can be computed so that a determination can be made as to whether it is more effective to reassign a container or to instantiate a new one.
- The optimal number of instances to serve incoming requests without violating SLOs can be determined by preventing congested instances from receiving new (incoming) workload (requests).
- The need to instantiate new instances can be reduced by reassigning functions amongst extant containers.
- Figure 6 is a schematic representation of a virtualised computing platform according to an example.
- Function requests 601 from users are distributed across instantiated containers within a container pool 603 using a load balancer (serverless proxy) 602.
- The container pool 603 can comprise multiple containers executing over physical hardware, and multiple functions can be served by multiple sets of containers.
- Functions run over a pool of containers, which is managed by an underlying orchestration system.
- Function instructions and execution states can be stored in the datastore 605 accessible from the resource pool 603.
- The workload controller 607 periodically receives information from the Monitor Entity 609 available in the platform.
- The information comprises: i) the current collocation of functions across containers; and ii) the current number of requests each function is processing.
- The workload controller 607 can either be coupled to a Characterization Module 611 to obtain information about the maximum number of requests that a function can process simultaneously in each container, or calculate this information on demand and compare it to SLO targets 612 of the application.
- A Forecast Engine 613 can be provided to enable provisioning for functions ahead of time, thereby enabling more accurate decisions to reassign or scale containers while reducing performance degradation caused by workload surges.
- The Forecast Engine 613 can receive information representing incoming requests for functions before such requests are distributed by the serverless proxy 602.
- The workload controller 607 can create and terminate containers and control the pool of resources using the Container Manager 615, which is configured to enable instantiation or termination of container instances within the pool 603.
- The workload controller 607 can decide on a level of scaling and reassignment periodically, i.e. every time slot T, where T can be of the order of seconds or minutes.
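The periodic, time-slotted decision cycle can be sketched as follows. The `get_state`, `decide` and `apply` interfaces are illustrative assumptions, not names taken from the patent; `max_slots` exists only to bound the loop:

```python
import time

def control_loop(monitor, controller, slot_seconds, max_slots=None):
    """Run the scaling/reassignment decision once per time slot T.

    monitor: supplies per-container loads and function placement
    controller: decides on isolation, scaling and reassignment, then
        applies the resulting plan via the orchestration layer
    max_slots: optional bound on iterations (None runs indefinitely)
    """
    slots = 0
    while max_slots is None or slots < max_slots:
        state = monitor.get_state()       # loads and function placement
        plan = controller.decide(state)   # isolate / scale / reassign
        controller.apply(plan)
        slots += 1
        time.sleep(slot_seconds)          # wait for the next slot T
    return slots
```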
- The virtualised computing platform with workload controller 607 can logically detach one or more of multiple containers configured to execute a first function for the platform (the containers being within the container pool 603), form a container group comprising a set of available containers, and allocate respective ones of incoming requests 601 for the function to one or more containers in the container group.
- A container in the group may comprise an existing container for the function and/or a reassigned container that was previously used to serve requests relating to a different function. That is, isolation of high-workload containers enables a determination to be made as to an optimal number of additional containers to be provided to service an incoming workload.
- One or more of the additional containers can be provided by instantiation or reassignment of existing containers.
- Figure 7 is a flowchart of a method according to an example.
- Expected request arrivals are treated in time slots as "atomic" arrivals.
- The workload controller 607 receives (block 703) a current number of requests in process (workload) at each container, and information representing a set containing the containers processing each function, from the Monitor Entity 609.
- The workload controller 607 receives a maximum number of requests that can be processed per function in each container from the Characterization Module 611.
- The workload controller 607 receives a short-term prediction of the number of requests expected for each function in the time slot from the Forecast Engine 613.
- A check for SLO violations is performed. That is, containers in which the current number of requests in process is larger than a corresponding maximum specified SLO load, or which are at a predetermined threshold level that corresponds to a proportion of the maximum specified SLO load (e.g. 80%), are not considered. That is, such containers are detached or logically isolated from the load balancer so that they will not receive additional requests in the following time slot.
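The detachment check can be expressed as a small predicate. The dictionary data shape is an assumption; the 0.8 default mirrors the 80% example threshold in the text:

```python
def to_detach(loads, slo_max, threshold_fraction=0.8):
    """Select containers to logically isolate for the next time slot:
    those whose in-process requests meet or exceed a fraction of the
    maximum specified SLO load."""
    cutoff = threshold_fraction * slo_max
    return {c for c, load in loads.items() if load >= cutoff}
```

For loads of 8, 7 and 6 against an SLO maximum of 10, only the container at 8 requests reaches the 80% cutoff and is detached.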
- $OC_f = \operatorname{arg\,min}_{index\_skipped} \left\lceil \frac{L + l_f - l_{index\_skipped}}{\lambda_f} \right\rceil - |C_f| + index\_skipped + 1$ where: $L$ is the load (number of requests) expected for the function $f$ in the next time slot; $l_f$ is the load currently in process at the containers executing $f$; $l_{index\_skipped}$ is the load held by the $index\_skipped$ most-loaded (isolated) containers; $\lambda_f$ is the maximum number of requests a container executing $f$ can serve concurrently; and $|C_f|$ is the number of containers currently executing $f$.
- Functions in F+ are ranked in decreasing order of the highest-loaded container executing that function.
- Each container in C- (the containers that can be reassigned) may have a load to be depleted before it may be reassigned.
- The containers in C- are ranked in decreasing order of such a delay.
- The functions in F+ are iterated over in that order.
- For each function f in F+, OC_f containers are desired.
- Containers that may be desired to augment an existing set of containers serving a function may be found from C- in the event that the load of a container to be reassigned is less than the maximum load of the containers assigned to f.
- Containers successfully assigned to the function under consideration are removed from C-. If more containers are needed than are available in C-, they can be instantiated for the function. Spare containers in C-, if any, are stopped if they are idle (zero load). Otherwise they can be reassigned to their initial functions.
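The matching procedure above can be sketched as a greedy assignment. The dictionary shapes (`needed`, `max_load`, per-container residual loads) are illustrative assumptions; the patent describes the rankings and eligibility rule, not a data model:

```python
def plan_reassignment(f_plus, c_minus):
    """Match donor containers (C-) to functions needing capacity (F+).

    f_plus: dict of fn -> {"needed": OC_f, "max_load": highest load
            among the containers already serving fn}
    c_minus: dict of container -> residual load still to be drained
    Returns (assignments, to_instantiate, to_stop).
    """
    # Rank functions by their most-loaded container (descending) and
    # donor containers by their drain delay (descending).
    fns = sorted(f_plus, key=lambda f: f_plus[f]["max_load"], reverse=True)
    donors = sorted(c_minus, key=c_minus.get, reverse=True)

    assignments, to_instantiate = {}, {}
    for f in fns:
        taken = []
        for c in list(donors):
            if len(taken) == f_plus[f]["needed"]:
                break
            # A donor is eligible only if its residual load is below the
            # maximum load of the containers already assigned to f.
            if c_minus[c] < f_plus[f]["max_load"]:
                taken.append(c)
                donors.remove(c)
        assignments[f] = taken
        # Shortfalls are met by instantiating new containers.
        to_instantiate[f] = f_plus[f]["needed"] - len(taken)
    # Idle leftover donors are stopped; loaded ones stay where they are.
    to_stop = [c for c in donors if c_minus[c] == 0]
    return assignments, to_instantiate, to_stop
```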
Description
- Embodiments will now be described, by way of example only, with reference to the accompanying drawings, in which:
- Figure 1 is a schematic representation of a serverless platform according to an example;
- Figure 2 is a schematic representation of a serverless function-as-a-service platform according to an example;
- Figure 3 is a schematic representation of a serverless function-as-a-service platform according to an example;
- Figures 4a-c are schematic representations of a serverless function-as-a-service platform according to an example;
- Figure 5 is a schematic representation of a serverless function-as-a-service platform according to an example;
- Figure 6 is a schematic representation of a virtualised computing platform according to an example; and
- Figure 7 is a flowchart of a method according to an example.
- Example embodiments are described below in sufficient detail to enable those of ordinary skill in the art to embody and implement the systems and processes herein described. It is important to understand that embodiments can be provided in many alternate forms and should not be construed as limited to the examples set forth herein.
- Accordingly, while embodiments can be modified in various ways and take on various alternative forms, specific embodiments thereof are shown in the drawings and described in detail below as examples. There is no intent to limit to the particular forms disclosed. On the contrary, all modifications, equivalents, and alternatives falling within the scope of the appended claims should be included. Elements of the example embodiments are consistently denoted by the same reference numerals throughout the drawings and detailed description where appropriate.
- The terminology used herein to describe embodiments is not intended to limit the scope. The articles "a," "an," and "the" are singular in that they have a single referent, however the use of the singular form in the present document should not preclude the presence of more than one referent. In other words, elements referred to in the singular can number one or more, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used herein, specify the presence of stated features, items, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, items, steps, operations, elements, components, and/or groups thereof.
- Unless otherwise defined, all terms (including technical and scientific terms) used herein are to be interpreted as is customary in the art. It will be further understood that terms in common usage should also be interpreted as is customary in the relevant art and not in an idealized or overly formal sense unless expressly so defined herein.
- In such an environment, over-provisioning does not generate any revenue - i.e., developers are not charged for unused instances as happens in other billing models. Therefore, keeping instances of applications up and running for a long time is costly and inefficient for cloud providers.
- Currently, orchestration of serverless platforms is neither sophisticated nor efficient. In fact:
- They do not consider service level objectives (SLOs), e.g. maximum latency, meaning that the delays of provisioning functions can be uncertain. This can jeopardize quality and impact user expectations;
- Although serverless is a lightweight technology compared to VMs, for example, the creation and start-up time of containers and functions is not negligible and can affect application performance, particularly when demand requires scaling the number of containers;
- They do not share the instances among functions to improve persistence of the execution environments and reduce instantiation delays. That is, containers cannot be reused for different functions and are terminated - even when they could be reused for other functions.
- According to an example, there is provided a method to optimise resource management and allocation in serverless platforms in order to seamlessly provide capacity and reduce delay uncertainty when scaling. The optimisation method can be configured with the service level objective (SLO) that each serverless application should adhere to. Furthermore, the correct number of instances (VMs or containers) can be instantiated in advance based on, for example, short-term forecasts of traffic demand, and without affecting the SLOs. In addition, the number of containers to be created can be reduced by leveraging the fact that it is inexpensive to send function code to an existing container instead of creating a new one, improving performance particularly in platforms hosting multiple applications and serving varying demands.
- According to an example, resource scaling can be optimised by computing the minimal (or optimal) number of instances (containers) to serve incoming demand whilst also considering a user's SLOs. A serverless proxy can then be used to distribute the demand fairly among relatively less congested instances. Instances can be created by an orchestrator, or, in an example, some (or all) instances can be reused. That is, information relating to the number of instances that can be reused can be leveraged in order to reassign containers between functions. Thus, the scaling process can create fewer instances from scratch, thereby reducing instantiation delays.
-
Figure 2 is a schematic representation of a serverless function-as-a-service platform according to an example. The platform comprises 3 containers (c1, c2, c3) that execute a function f1, and 3 other containers (c4, c5, c6) that execute a function f2. The numbers inside each container represent the requests currently being processed by that container for the function in question. In the example of figure 2, each container is assigned a static amount of physical resources - this can be done, for example, using control groups on Linux containers. - In the example of
figure 2, each function can accommodate or concurrently serve a maximum number of requests (where a request can interchangeably be considered as load, workload or traffic). The number of requests that a function can serve depends on the target SLOs as well as the statically allocated physical resources of the underlying container. Each function f1 and f2 in figure 2 can handle 10 requests. Accordingly, any number of requests beyond this value means that an SLO associated with the function in question will be violated. In the example of figure 2, in the short term, the application with f1 will receive, via the serverless proxy 201, 12 new requests. - In such a scenario, a baseline solution could attempt to accommodate the new requests based on the SLO targets. In this case, since the cloud instance capacity is 10, only 2 requests would be forwarded to c1 (8+2=10) for function f1. Similarly, containers c2 and c3 will receive 2 requests each, since the
serverless proxy 201 will typically implement a load balancing strategy that equally assigns requests to the function pool. Then, since there are 12 new requests, an orchestrator would thus create 3 more containers 203 (see figure 3) to accommodate the new requests, resulting in an over-dimensioned number of containers - a situation that is worse under traffic surges (i.e. rapid increases/decreases of workload). This is depicted in figure 3, which is a schematic representation of a serverless function-as-a-service platform according to an example in which a baseline approach is used. - In an example, resource usage can be computed as the number of requests that need to be processed, divided by the total capacity of the containers. Thus, in the example of
figure 3 there are 31 requests (19 already in the containers plus 12 expected) with a total capacity of 60 (6 containers with 10 requests per container): 31/60 ≈ 0.52 (52%). - According to an example, in order to accommodate new requests, highly congested containers can be detached, or logically isolated, from the load balancer (serverless proxy). Containers that are logically isolated continue to function and process their existing traffic, but do not receive new requests until they are re-attached to the load balancer.
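The resource usage measure just described can be expressed directly. This is a minimal sketch; the function name and parameters are illustrative only.

```python
# Resource usage as described above: requests in flight plus expected
# requests, divided by the total capacity of the containers.

def utilization(current_requests, expected_requests, n_containers, per_container_cap):
    total = current_requests + expected_requests
    return total / (n_containers * per_container_cap)

# Figure 3 example: 19 requests already in the containers, 12 expected,
# 6 containers of capacity 10 each -> 31/60, roughly 52%.
print(round(utilization(19, 12, 6, 10), 2))   # -> 0.52
```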
-
Figures 4a-c are schematic representations of a serverless function-as-a-service platform according to an example. In figure 4a, container c1 is detached for a period of time as it is the container serving the largest number of requests in a given time slot (described in more detail below). The next most congested container is c2 (with 7 requests), so no more than 3 requests can be sent to it (again assuming that in the example of figure 4, each container executing the function f1 can accommodate 10 concurrent requests before an SLO violation and/or a physical hardware overload). In this case therefore, 2 new containers would be needed to serve the workload - a total of 4 active containers if we consider that there are 2 containers already running. Figure 4b shows another option in which 2 containers, c1 and c2, are detached. In this case, only one new container would be added. Finally, figure 4c shows an example in which containers c1, c2 and c3 are detached. In this case, two new containers would be added. The corresponding resource usage is depicted for each of the examples shown in figures 4a to 4c, calculated as described above. - According to an example, from these permutations, in which various different containers for the function are logically isolated from the others, there is a determination of how to reassign a function to (non-isolated) containers to minimize the number of containers that need to be created. In the example of
figure 4, the permutation of figure 4b results in a requirement for only one additional container, and a performance improvement as a result of 78% of resources being utilised (31 requests to be served / 40 requests maximum capacity). This compares to the examples of figures 4a and 4c, in which 62% of resources are utilised, which is less efficient in the context of a serverless platform. Since a container can take, for example, at least 500ms to start, the solution of figure 4b reduces the delay (compared to the other examples of figure 4 and the baseline of figure 3) to 500ms (only 1 additional container being instantiated). - Accordingly, for a given number of containers configured to execute a function in a serverless platform, various permutations of the containers available to accommodate additional load are compared in order to determine a minimum number of containers to be instantiated in order to meet demand and maximise resource usage. In an example, the various configurations of containers are generated by logically isolating different combinations of one or more containers, particularly those with an already relatively higher workload compared to the others, or a workload that is equal to or greater than that specified in an SLO for the function. In this manner, a minimum number of additional containers to accommodate the workload can be determined. In an example, a permutation can be generated by logically isolating any number of containers, from none to all.
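The permutation search described above can be sketched as follows. This is an illustrative reconstruction, not the patented algorithm: it assumes the equal-split load-balancing policy of figure 2 (each attached or new container receives the same share of the incoming requests) and isolates the k busiest containers for every k, picking the option that needs the fewest new containers, with utilisation as tie-breaker. All names are assumptions.

```python
# Sketch of the figure 4 permutation search under an equal-split load
# balancer. `loads` are current per-container request counts for one
# function, `capacity` the per-container SLO limit, `incoming` the
# forecast new requests.

def plan(loads, capacity, incoming):
    loads = sorted(loads, reverse=True)          # most congested first
    total = sum(loads) + incoming
    best = None
    for k in range(len(loads) + 1):              # isolate the k busiest
        attached = loads[k:]
        # Smallest m (new containers) such that an equal share of the
        # incoming requests fits into the most congested attached container.
        m = 0
        while True:
            pool = len(attached) + m
            if pool > 0:
                share = incoming / pool
                headroom = capacity - (max(attached) if attached else 0)
                if share <= headroom:
                    break
            m += 1
        usage = total / ((len(loads) + m) * capacity)
        cand = (m, -usage, k)                    # fewest new, then max usage
        if best is None or cand < best:
            best = cand
    m, neg_usage, k = best
    return {"isolate": k, "new_containers": m, "utilization": -neg_usage}

# Figure 4: f1 containers carry 8, 7 and 4 requests, capacity 10, 12
# incoming. Isolating the two busiest (figure 4b) needs only one new
# container at 31/40 = 77.5% utilisation.
print(plan([8, 7, 4], 10, 12))
```

Checking the figure 4 cases by hand: isolating only c1 leaves c2 (7) attached, so each of the 2+m containers may take at most 3 requests, forcing m=2; isolating c1 and c2 leaves c3 (4) with headroom 6, so m=1 suffices, matching the text.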
- According to an example, having determined an optimum number of additional containers to service an incoming workload for a first function, existing containers being utilised for a second function, different from the first function, can be reassigned to serve the workload. As such, one or more new containers need not be instantiated, since existing containers can be reused. A decision to reuse a container can be made considering the current throughput of the second function, the average completion time of the requests for that function (in order to estimate the draining, or request depletion, time for a container), and the creation plus provisioning times of new containers. Accordingly, existing resources serving a workload that is low enough to be consolidated into a lower number of containers can be reused. If the time it would take for a container to become available for reassignment is larger than the time it would take to instantiate a new container for a function, a new container can be instantiated; otherwise, to reduce delays, the existing container can be reassigned when available, i.e. once it has finished serving its current load.
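The reuse-versus-instantiate decision described above can be sketched as a simple time comparison. All timing figures and parameter names here are illustrative assumptions; in particular the drain-time estimate (in-flight requests times average completion time) is deliberately pessimistic.

```python
# Sketch of the reuse decision: drain the existing container and redeploy
# the new function's code only if that is faster than starting a fresh
# container from scratch.

def should_reuse(in_flight, avg_completion_s, deploy_code_s, new_container_s):
    drain_s = in_flight * avg_completion_s   # pessimistic serial estimate
    return drain_s + deploy_code_s < new_container_s

# e.g. 3 pending requests at 50 ms each, 100 ms to fetch and deploy the
# function code, versus 500 ms to start a fresh container:
print(should_reuse(3, 0.05, 0.1, 0.5))   # -> True
```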
-
Figure 5 is a schematic representation of a serverless function-as-a-service platform according to an example. The requests served by the containers for function f1 are the same as in figure 4b, as is the determination that an additional container can be used to efficiently service incoming workload, which as before is 12 requests for the function f1. In the example of figure 5 however, other containers in the platform are inspected in order to identify potential targets for reassignment before deciding to scale. - In the example of
figure 5, another function (f2) is deployed on containers c4, c5 and c6, and the workload for f2 is predicted to decrease. In this case, all requests for f2 are gracefully drained and then the code of f1 can be retrieved from the datastore (e.g. 105) and deployed in c6, thereby eliminating the need to create a new container. Container c6 is processing 3 requests, and there are 8 incoming requests (501) for function f2. If the maximum number of requests allowed per container for function f2 is 10, the 8 new requests can be accommodated using the existing containers c4 and c5 by distributing 4 requests to each. The 3 requests being processed in container c6 will therefore be dealt with, leaving container c6 'empty'. If the time taken to complete the three requests in c6 plus the time taken to retrieve and deploy the function f1 in container c6 is less than the time it would take to instantiate a new container for the function f1, in an example, container c6 is reassigned to function f1. - Draining all requests of f2 gracefully may take time and is application-dependent. Therefore, in an example, the cost of reassigning a container to another function can be computed so that a determination can be made as to whether it is more effective to reassign a container or to scale a new one.
- Therefore, the optimal number of instances to serve incoming requests without violating SLOs by preventing congested instances from receiving new (incoming) workload (requests) can be determined. In addition, the need to instantiate new instances can be reduced by reassigning functions amongst extant containers. The combination of these two techniques enables automation of a resource management process in a serverless platform, thereby making it more efficient in terms of resources usage and achieving SLO targets of applications.
-
Figure 6 is a schematic representation of a virtualised computing platform according to an example. Function requests 601 from users (not shown) are distributed across instantiated containers within a container pool 603 using a load balancer (serverless proxy) 602. The container pool 603 can comprise multiple containers executing over physical hardware, and multiple functions can be served by multiple sets of containers. As depicted in figure 6 for example, there are 6 groups of containers in the container pool 603, each of which may serve requests relating to different functions. - Accordingly, functions run over a pool of containers, which are managed by an underlying orchestration system. Function instructions and execution states can be stored in the
datastore 605, accessible from the resource pool 603. - According to an example,
workload controller 607 periodically receives information from the Monitor Entity 609 available in the platform. The information comprises: i) the current collocation of functions across containers; and ii) the current number of requests each function is processing. The workload controller 607 can either be coupled to a characterization module 611 to obtain information about the maximum number of requests that a function can process simultaneously in each container, or calculate this information on demand and compare it to SLO targets 612 of the application. - A
forecast engine 613 can be provided to enable provisioning for functions ahead of time, thereby enabling more accurate decisions to reassign or scale containers while reducing performance degradation caused by workload surges. The forecast engine 613 can receive information representing incoming requests for functions before such requests are distributed by the serverless proxy 602. - In an example,
workload controller 607 can create and terminate containers and control the pool of resources using the Container Manager 615, which is configured to enable instantiation or termination of container instances within the pool 603. The workload controller 607 can decide on a level of scaling and reassignment periodically, i.e. every time slot T, where T can be of the order of seconds or minutes. - Accordingly, the virtualised computing platform with
workload controller 607 can logically detach one or more of multiple containers configured to execute a first function for the platform (the containers being within the container pool 603), form a container group comprising a set of available containers, and allocate respective ones of incoming requests 601 for the function to one or more containers in the container group. A container in the group may comprise an existing container for the function and/or a reassigned container that was previously used to serve requests relating to a different function. That is, isolation of high-workload containers enables a determination to be made as to an optimal number of additional containers to be provided to service an incoming workload. One or more of the additional containers can be provided by instantiation or by reassignment of existing containers. -
Figure 7 is a flowchart of a method according to an example. Expected request arrivals are treated in time slots as "atomic" arrivals. Thus, at each timeslot 701 the workload controller 607 receives (block 703) the current number of requests in process (workload) at each container, and information representing a set containing the containers processing each function, from the Monitor 609. In block 705 the workload controller 607 receives the maximum number of requests that can be processed per function in each container from the Characterization Module 611, and in block 707 the workload controller 607 receives a short-term prediction of the number of requests expected for each function in the time slot from the Forecast Engine 613. - In block 709 a check for SLO violations is performed. That is, containers in which the current number of requests in process is larger than a corresponding maximum specified SLO load, or which are at a predetermined threshold level that corresponds to a proportion of the maximum specified SLO load (e.g. 80%), are not considered. That is, such containers are detached or logically isolated from the load balancer so that they will not receive additional requests in the following timeslot.
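The SLO check of block 709 can be sketched as a simple threshold filter. This is an illustrative sketch; the container identifiers, the data shape and the 80% default are assumptions based on the example threshold mentioned above.

```python
# Sketch of block 709: containers whose current load meets or exceeds a
# threshold fraction of the per-container SLO limit are detached for the
# coming timeslot; the rest remain attached to the load balancer.

def slo_check(loads, slo_max, threshold=0.8):
    """Split container ids into detached (congested) and attached lists."""
    detached = [c for c, load in loads.items() if load >= threshold * slo_max]
    attached = [c for c in loads if c not in detached]
    return detached, attached

# Figure 2's f1 containers: c1 at 8 requests crosses 80% of an SLO limit
# of 10 requests, so it is isolated for the next timeslot.
print(slo_check({"c1": 8, "c2": 7, "c3": 4}, slo_max=10))
```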
- In block 711, for each function f, a number OCf of containers to add (or remove) is computed, where:
- Lf is the forecasted future requests for f;
- lf is the maximum number of requests that can be processed by each container according to an SLO;
- index_skipped is a variable that controls the number of highly congested containers that can be skipped;
- l_index_skipped is the number of requests currently in process for the analysed container (if this number is less than the number in process at instantiation of the container, that number is used instead); and
- Cf is a set containing the containers that process f, excluding the containers that are isolated.
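The expression for OCf did not survive in the text above; the following is only a plausible reconstruction from the listed symbols, under the assumption that OCf is the capacity shortfall of the non-isolated containers in Cf against the forecast demand Lf. Treat every name and the formula itself as an assumption, not the patented expression.

```python
import math

# Assumed reconstruction of block 711: additional containers needed for a
# function = ceiling of (forecast demand minus spare capacity of the
# non-isolated containers) over the per-container SLO limit lf.

def additional_containers(forecast_Lf, per_container_lf, current_loads_Cf):
    spare = sum(per_container_lf - load for load in current_loads_Cf)
    return math.ceil((forecast_Lf - spare) / per_container_lf)

# Figure 4b: c3 (load 4) is the only non-isolated container, lf = 10 and
# 12 requests are forecast, so one additional container is needed.
print(additional_containers(12, 10, [4]))   # -> 1
```

A negative result corresponds to the OCf < 0 case checked below, where containers can be drained and returned to the available pool.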
- According to an example, for each function, the corresponding value of OCf is checked:
- OCf > 0: additional containers are needed, and the function in question is added to a set F+;
- OCf < 0: fewer containers are needed; empty or emptying containers are detached from the function pool and added to an 'available' pool, C-.
- In
block 713, functions in F+ are ranked in decreasing order of the highest-loaded container executing that function. Each container in C- (containers that can be reassigned) may have a load to be depleted before it can be reassigned. In an example, the containers in C- are ranked in decreasing order of this delay. The functions in F+ are iterated over in their order. For a function f in F+, OCf containers are desired. According to an example, containers desired to augment an existing set of containers serving a function may be taken from C- in the event that the load of a container to be reassigned is less than the maximum load of the containers assigned to f. - In
block 715, containers successfully assigned to the function under consideration are removed from C-. If more containers are needed than are available in C-, they can be instantiated for the function. Spare containers in C-, if any, are stopped if they are idle (zero load). Otherwise they can be reassigned to their initial functions. - The present inventions can be embodied in other specific apparatus and/or methods. The described embodiments are to be considered in all respects as illustrative and not restrictive. In particular, the scope of the invention is indicated by the appended claims rather than by the description and figures herein. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.
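The ranking and assignment of blocks 713 and 715 described above can be sketched as follows. This is an illustrative sketch only: the data shapes (f_plus mapping each function to its container deficit and busiest-container load, c_minus mapping reassignable containers to their remaining load to drain) are assumptions.

```python
# Sketch of blocks 713 and 715: functions needing capacity (F+) take
# reassignable containers from C- in rank order; any remaining deficit is
# met by instantiating new containers, and leftover idle containers in C-
# are stopped.

def reassign(f_plus, c_minus):
    """f_plus: {function: (containers_needed, max_assigned_load)};
    c_minus: {container: load_to_drain} for reassignable containers."""
    # Rank functions by the load of their busiest container, descending.
    order = sorted(f_plus, key=lambda f: f_plus[f][1], reverse=True)
    # Rank reassignable containers by remaining drain load, descending.
    pool = sorted(c_minus, key=lambda c: c_minus[c], reverse=True)
    plan = {}
    for f in order:
        needed, max_load = f_plus[f]
        # A container qualifies only if its residual load is below the
        # maximum load of the containers already assigned to f.
        take = [c for c in pool if c_minus[c] < max_load][:needed]
        pool = [c for c in pool if c not in take]
        plan[f] = {"reassigned": take, "new": needed - len(take)}
    plan["stopped"] = [c for c in pool if c_minus[c] == 0]  # idle spares
    return plan

# f1 needs 2 containers (busiest at load 8); c6 is idle, c7 still drains
# 3 requests - both qualify, so no new container is instantiated.
print(reassign({"f1": (2, 8)}, {"c6": 0, "c7": 3}))
```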
Claims (15)
- A method, in a function as a service platform comprising multiple containers configured to execute a function, for allocating workload in response to incoming requests, the method comprising: determining a current number of requests to be executed by respective ones of the multiple containers; logically isolating one or more of the multiple containers in which the current number of requests to be executed exceeds a predetermined threshold value related to a service level objective for the function; forming a container group composed of non-isolated containers; and allocating the incoming requests between respective ones of the containers in the container group.
- The method as claimed in claim 1, further comprising:
supplementing the number of containers in the container group by instantiating one or more containers for the function. - The method as claimed in claim 1 or 2, further comprising:
supplementing the number of containers in the container group by reassigning a container configured to execute a second function in the platform to the container group. - The method as claimed in claim 3, further comprising:
re-tasking the reassigned container to execute the function. - The method as claimed in any preceding claim, further comprising:
for the function, determining an optimal number of the multiple containers to logically isolate. - The method as claimed in any preceding claim further comprising:
unifying an isolated container into the container group. - A virtualised computing platform, comprising a workload controller to:logically detach one or more of multiple containers configured to execute a first function for the platform, whereby to form a container group comprising a set of available containers; andallocate respective ones of incoming requests for the function to one or more containers in the container group.
- The virtualised computing platform as claimed in claim 7, the workload controller further to:
supplement the number of containers in the container group by instantiating one or more containers for the function. - The virtualised computing platform as claimed in claim 7 or 8, the workload controller further to:
supplement the number of containers in the container group by reassigning an existing container configured to execute a second function in the platform to the container group. - The virtualised computing platform as claimed in claim 9, the workload controller further to:determine a first period of time remaining before the existing container finishes executing any pending requests relating to the second function;determine a second period of time to instantiate a new container for the container group;andre-task the existing container to execute the first function when the pending requests are executed and the first period of time is less than the second period of time.
- A workload controller in a serverless function-as-a-service platform, the workload controller to: logically isolate one or more of multiple containers configured to execute a first function for the platform, whereby to form a container group comprising a set of available containers; and allocate respective ones of incoming requests for the function to one or more containers in the container group.
- The workload controller as claimed in claim 11, further to:
augment the container group with one or more additional containers configured to execute the first function, wherein at least one of the additional containers is reassigned from a second function. - The workload controller as claimed in claim 11 or 12, further to:
receive data representing an expected load for the function. - The workload controller as claimed in claim 12, further to:
remove a reassigned container from a set of available containers. - The workload controller as claimed in any of claims 11 to 14, further to:
re-attach a logically isolated container to the platform.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18169702.0A EP3561671A1 (en) | 2018-04-27 | 2018-04-27 | Allocating workload |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3561671A1 true EP3561671A1 (en) | 2019-10-30 |
Family
ID=62089572
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP18169702.0A Withdrawn EP3561671A1 (en) | 2018-04-27 | 2018-04-27 | Allocating workload |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP3561671A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120060171A1 (en) * | 2010-09-02 | 2012-03-08 | International Business Machines Corporation | Scheduling a Parallel Job in a System of Virtual Containers |
WO2013003031A2 (en) * | 2011-06-27 | 2013-01-03 | Microsoft Corporation | Resource management for cloud computing platforms |
US20130346969A1 (en) * | 2012-06-21 | 2013-12-26 | Vmware, Inc. | Opportunistically Proactive Resource Management Using Spare Capacity |
US20150234670A1 (en) * | 2014-02-19 | 2015-08-20 | Fujitsu Limited | Management apparatus and workload distribution management method |
WO2016090292A1 (en) * | 2014-12-05 | 2016-06-09 | Amazon Technologies, Inc. | Automatic management of resource sizing |
US9405593B2 (en) * | 2012-09-06 | 2016-08-02 | Red Hat, Inc. | Scaling of application resources in a multi-tenant platform-as-a-service environment in a cloud computing system |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11044173B1 (en) | 2020-01-13 | 2021-06-22 | Cisco Technology, Inc. | Management of serverless function deployments in computing networks |
WO2021146055A1 (en) * | 2020-01-13 | 2021-07-22 | Cisco Technology, Inc. | Management of serverless function deployments in computing networks |
US11301301B2 (en) | 2020-07-22 | 2022-04-12 | International Business Machines Corporation | Workload offloading between computing environments |
US20220092480A1 (en) * | 2020-09-24 | 2022-03-24 | Adobe Inc. | Dynamically adjusting a serverless execution container pool for training and utilizing online machine-learning models |
US12045701B2 (en) * | 2020-09-24 | 2024-07-23 | Adobe Inc. | Dynamically adjusting a serverless execution container pool for training and utilizing online machine-learning models |
WO2022208175A1 (en) * | 2021-03-30 | 2022-10-06 | Teso LT, UAB | Proxy selection by monitoring quality and available capacity |
US11463537B1 (en) | 2021-03-30 | 2022-10-04 | Teso LT, UAB | Proxy selection by monitoring quality and available capacity |
US11606438B2 (en) | 2021-03-30 | 2023-03-14 | Oxylabs, Uab | Proxy selection by monitoring quality and available capacity |
US11817946B2 (en) | 2021-03-30 | 2023-11-14 | Oxylabs, Uab | Proxy selection by monitoring quality and available capacity |
US11388253B1 (en) | 2021-03-30 | 2022-07-12 | Teso LT, UAB | Proxy selection by monitoring quality and available capacity |
US11652890B1 (en) | 2022-07-13 | 2023-05-16 | Oxylabs, Uab | Methods and systems to maintain multiple persistent channels between proxy servers |
US11936742B2 (en) | 2022-07-13 | 2024-03-19 | Oxylabs, Uab | Methods and systems to maintain multiple persistent channels between proxy servers |
US12294628B2 (en) | 2022-07-13 | 2025-05-06 | Oxylabs, Uab | Methods and systems to maintain multiple persistent channels between proxy servers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20200430 |
|
RBV | Designated contracting states (corrected) |
Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
17Q | First examination report despatched |
Effective date: 20210430 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
|
18W | Application withdrawn |
Effective date: 20210824 |