US20060168274A1 - Method and system for high availability when utilizing a multi-stream tunneled marker-based protocol data unit aligned protocol - Google Patents
- Publication number
- US20060168274A1 (application US11/269,062)
- Authority
- US
- United States
- Prior art keywords
- rdma
- local
- different network
- rnic
- network interfaces
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/14—Multichannel or multilink protocols
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/16—Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/16—Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
- H04L69/169—Special adaptations of TCP, UDP or IP for interworking of IP based networks with other networks
Definitions
- Certain embodiments of the invention relate to data communications. More specifically, certain embodiments of the invention relate to a method and system for high availability when utilizing a multi-stream tunneled marker-based protocol data unit (PDU) aligned (MST-MPA) protocol.
- PDU: protocol data unit
- A single computer system is often utilized to perform operations on data.
- The operations may be performed by a single processor, or central processing unit (CPU), within the computer.
- The operations performed on the data may include numerical calculations or database access, for example.
- The CPU may perform the operations under the control of a stored program containing executable code.
- The code may include a series of instructions that may be executed by the CPU and that cause the computer to perform specified operations on the data.
- The capability of a computer in performing operations may variously be measured in units of millions of instructions per second (MIPS) or millions of operations per second (MOPS).
- Moore's law postulates that the speed of integrated circuit devices may increase at a predictable, and approximately constant, rate over time.
- However, technology limitations may begin to limit the ability to maintain predictable speed improvements in integrated circuit devices.
- Parallel processing may be utilized to address this limitation.
- In parallel processing, computer systems may utilize a plurality of CPUs within a computer system that may work together to perform operations on data.
- Parallel processing computers may offer computing performance that may increase as the number of parallel processing CPUs is increased.
- However, the size and expense of parallel processing computer systems result in special purpose computer systems. This may limit the range of applications in which the systems may be feasibly or economically utilized.
- An alternative to large parallel processing computer systems is cluster computing.
- In cluster computing, a plurality of smaller computers, connected via a network, may work together to perform operations on data.
- Cluster computing systems may be implemented, for example, utilizing relatively low cost, general purpose personal computers or servers.
- Computers in the cluster may exchange information across a network similar to the way that parallel processing CPUs exchange information across an internal bus.
- Cluster computing systems may also scale to include networked supercomputers.
- The collaborative arrangement of computers working cooperatively to perform operations on data may be referred to as high performance computing (HPC).
- HPC: high performance computing
- RDMA: remote direct memory access
- LAN: local area network
- RDMA, when utilized in wide area network (WAN) and Internet environments, is referred to as RDMA over TCP, RDMA over IP, or RDMA over TCP/IP.
- One of the problems attendant with some distributed cluster computing systems is that the frequent communications between distributed processors may impose a processing burden on the processors.
- The increase in processor utilization associated with the increasing processing burden may reduce the efficiency of the computing cluster for solving computing problems.
- The performance of cluster computing systems may be further compromised by bandwidth bottlenecks that may occur when sending and/or receiving data from processors distributed across the network.
- Once a TCP connection is established, it may be bound to a source network address and a destination network address. If either address becomes inaccessible, the corresponding TCP connection may fail. A network address may become inaccessible due to a failure at a single point in the path of the TCP connection between the source and destination.
- A system and/or method is provided for high availability when utilizing a multi-stream tunneled marker-based protocol data unit (PDU) aligned (MST-MPA) protocol, substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims.
- FIG. 1a illustrates an exemplary distributed database processing environment, in connection with an embodiment of the invention.
- FIG. 1b illustrates an exemplary system for multihoming, in connection with an embodiment of the invention.
- FIG. 2 is an illustration of an exemplary conventional write operation from a local node to a remote node, in connection with an embodiment of the invention.
- FIG. 3 is an illustration of an exemplary conventional write operation from a local node to a remote node, in connection with an embodiment of the invention.
- FIG. 4 is an illustration of an exemplary conventional RDMA over TCP protocol stack, in connection with an embodiment of the invention.
- FIG. 5 is an illustration of an exemplary RDMA over TCP protocol stack utilizing SCTP, in connection with an embodiment of the invention.
- FIG. 6 is a block diagram of an exemplary system for an MST-MPA protocol, in accordance with an embodiment of the invention.
- FIG. 7 is a block diagram of an exemplary system for high availability when utilizing an MST-MPA with a single RNIC, in accordance with an embodiment of the invention.
- FIG. 8 is a block diagram of fault recovery in an exemplary system for high availability when utilizing an MST-MPA with a single RNIC, in accordance with an embodiment of the invention.
- FIG. 9 is a block diagram illustrating data striping in an exemplary system for high availability when utilizing an MST-MPA with a single RNIC, in accordance with an embodiment of the invention.
- FIG. 10 is a block diagram of an exemplary system for high availability when utilizing an MST-MPA with a duplex RNIC configuration, in accordance with an embodiment of the invention.
- FIG. 11 is a block diagram of an exemplary system for high availability when utilizing an MST-MPA with a duplex RNIC configuration, in accordance with an embodiment of the invention.
- FIG. 12 is a flowchart illustrating an exemplary process for high availability when utilizing a MST-MPA protocol, in accordance with an embodiment of the invention.
- Certain embodiments of the invention may be found in a method and system for high availability when utilizing a multi-stream tunneled marker-based PDU aligned (MST-MPA) protocol.
- The invention may comprise a method and a system that may enable reliable communications between cooperating processors in a cluster computing environment while reducing the processing burden in comparison to some conventional approaches to inter-processor communication among processors in the cluster.
- Various embodiments of the invention may provide high availability that enables fault tolerant, reliable communications.
- Various aspects of the invention may provide an exemplary system for transporting information, which may comprise a processor that enables establishment of TCP connections, or communication channels, between a local remote direct memory access (RDMA) enabled network interface card (RNIC) and at least one remote RNIC via at least one network.
- The processor may enable establishment of at least one RDMA connection between one of a plurality of local RDMA endpoints and at least one remote RDMA endpoint utilizing one or more of the communication channels.
- The processor may further enable communication of messages via the established RDMA connections between one of the plurality of local RDMA endpoints and at least one remote RDMA endpoint, independent of whether the messages are in-sequence or out-of-sequence.
- An RDMA connection may be transported, between a local RDMA endpoint and a remote RDMA endpoint, across a network via a TCP tunnel.
- The TCP tunnel may comprise a plurality of TCP connections that may be logically associated with a single TCP tunnel.
- The TCP tunnel may also be associated with a plurality of different network interfaces and/or network routes. At least a portion of the plurality of different network interfaces may be associated with at least one RNIC. At least a portion of the plurality of TCP connections may be associated with each of the plurality of different network interfaces.
- At least a current portion of a plurality of messages communicated via an RDMA connection may be transported by a current TCP connection associated with a current network interface located at a current RNIC.
- A subsequent portion of the plurality of messages may be communicated via a subsequent TCP connection associated with a different network interface.
- The subsequent TCP connection may be associated with the same TCP tunnel as the current TCP connection.
- The different network interface may be located at the current RNIC or at a subsequent RNIC.
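A small data model can illustrate how a TCP tunnel, its TCP connections, and the network interfaces that carry them relate to one another. In the following Python sketch, TCP connections are plain records rather than sockets, and the names `TcpTunnel`, `TunnelConnection`, and `fail_interface` are hypothetical. The sketch prefers another interface on the current RNIC before moving to a subsequent RNIC. That policy is an assumption: the text only states that either choice is possible.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TunnelConnection:
    """One TCP connection belonging to a TCP tunnel (hypothetical model)."""
    conn_id: int
    rnic: str        # RNIC hosting the network interface
    interface: str   # network interface the TCP connection is bound to
    up: bool = True


@dataclass
class TcpTunnel:
    """Several TCP connections logically associated with one TCP tunnel."""
    connections: List[TunnelConnection] = field(default_factory=list)
    current: Optional[TunnelConnection] = None

    def add(self, conn: TunnelConnection) -> None:
        self.connections.append(conn)
        if self.current is None:
            self.current = conn

    def fail_interface(self, interface: str) -> None:
        """Mark every connection on a failed interface down, then fail over."""
        for conn in self.connections:
            if conn.interface == interface:
                conn.up = False
        if self.current is not None and not self.current.up:
            self.current = self._pick_subsequent()

    def _pick_subsequent(self) -> Optional[TunnelConnection]:
        # Assumed policy: another interface on the current RNIC first,
        # otherwise any surviving connection (e.g. on a subsequent RNIC).
        alive = [c for c in self.connections if c.up]
        same_rnic = [c for c in alive if c.rnic == self.current.rnic]
        candidates = same_rnic or alive
        return candidates[0] if candidates else None

    def send(self, message: bytes) -> int:
        """Return the id of the TCP connection that would carry `message`."""
        if self.current is None:
            raise ConnectionError("no surviving TCP connection in the tunnel")
        return self.current.conn_id
```

Consider a tunnel with connections on `eth0` and `eth1` of one RNIC and on `eth2` of a second RNIC. When `eth0` fails, subsequent messages move to the `eth1` connection. If `eth1` also fails, they move to the second RNIC. The messages stay within the same tunnel throughout.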
- When utilizing a single interface, TCP may provide mechanisms by which each of a plurality of messages may be delivered to a destination node once, and in the order in which a source node transmitted the messages.
- Various embodiments of the invention may provide mechanisms by which each of the plurality of messages may be delivered to the destination node once, and in the order in which the source node sent the messages, when utilizing a plurality of interfaces.
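Messages that arrive over several TCP connections can be delivered once and in order by a receive-side reorder buffer keyed by sequence number. The Python sketch below assumes, hypothetically, that the sender stamps every message with a tunnel-wide sequence number starting at 0. The class and method names are illustrative only and do not describe the MST-MPA wire format.

```python
from typing import Dict, List


class InOrderReceiver:
    """Reassembles messages received over several TCP connections.

    Assumes (hypothetically) a tunnel-wide sequence number per message,
    assigned by the sender and starting at 0.
    """

    def __init__(self) -> None:
        self.next_seq = 0                     # next sequence number to deliver
        self.pending: Dict[int, bytes] = {}   # out-of-sequence messages held back
        self.delivered: List[bytes] = []      # everything delivered so far, in order

    def receive(self, seq: int, payload: bytes) -> List[bytes]:
        """Accept one message and return the messages newly delivered in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate, e.g. a message resent on another connection after failover
        self.pending[seq] = payload
        released: List[bytes] = []
        while self.next_seq in self.pending:  # release the contiguous run
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        self.delivered.extend(released)
        return released
```

Suppose message 1 arrives on one connection before message 0 arrives on another. The receiver holds message 1 until message 0 arrives and then releases both. A later copy of message 1 is dropped.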
- FIG. 1a illustrates an exemplary distributed database processing environment, in connection with an embodiment of the invention.
- Referring to FIG. 1a, there is shown a network 102, a plurality of computer systems 104a, 106a, 108a, 110a, and 112a, and a corresponding plurality of database applications 104b, 106b, 108b, 110b, and 112b.
- The computer systems 104a, 106a, 108a, 110a, and 112a may be coupled to the network 102.
- One or more of the computer systems 104a, 106a, 108a, 110a, and 112a may execute a corresponding database application 104b, 106b, 108b, 110b, and 112b, respectively, for example.
- A plurality of software processes, for example database applications, may execute concurrently at a computer system.
- A database application, for example 104b, may communicate with one or more peer database applications, for example 106b, 108b, 110b, or 112b, via a network, for example, 102.
- In this regard, the operation of the database application 104b may be considered to be coupled to the operation of one or more of the peer database applications 106b, 108b, 110b, or 112b.
- A plurality of applications, for example database applications, which execute cooperatively, may form a cluster environment.
- A cluster environment may also be referred to as a cluster.
- The applications that execute cooperatively in the cluster environment may be referred to as cluster applications.
- A cluster application may communicate with a peer cluster application via a network by establishing a network connection between the cluster application and the peer application, exchanging information via the network connection, and subsequently terminating the connection at the end of the information exchange.
- An exemplary communications protocol that may be utilized to establish a network connection is the Transmission Control Protocol (TCP).
- TCP: Transmission Control Protocol
- RFC 793 discloses communication via TCP and is hereby incorporated herein by reference.
- An exemplary protocol that may be utilized to route information transported in a network connection across a network is the Internet Protocol (IP).
- IP: Internet Protocol
- RFC 791 discloses communication via IP and is hereby incorporated herein by reference.
- An exemplary medium for transporting and routing information across a network is Ethernet, which is defined by Institute of Electrical and Electronics Engineers (IEEE) standard 802.3. IEEE 802.3 is hereby incorporated herein by reference.
- For example, the database application 104b may establish a TCP connection to the database application 110b.
- The database application 104b may initiate establishment of the TCP connection by sending a connection establishment request to the peer database application 110b.
- The connection establishment request may be routed via IP from the computer system 104a, across the network 102, to the computer system 110a.
- The peer database application 110b may respond to the received connection establishment request by sending a connection establishment confirmation to the database application 104b.
- The connection establishment confirmation may be routed via IP from the computer system 110a, across the network 102, to the computer system 104a.
- The database application 104b may issue a query to the database application 110b via the established TCP connection.
- In response to the query, the database application 110b may access data stored at the computer system 110a.
- The database application 110b may subsequently send the accessed information to the database application 104b via the established TCP connection.
- The database application 104b may send an acknowledgement of receipt of the accessed data to the database application 110b via the established TCP connection.
- The database application 104b may terminate the established TCP connection by sending a connection terminate indication to the database application 110b.
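Ordinary TCP sockets are enough to reproduce this exchange. The minimal Python sketch below uses only the standard library and runs both roles on the loopback interface. A thread plays the database application 110b, and the caller plays 104b. The newline-delimited framing, the `acct-42` key, and the `ACK` token are hypothetical stand-ins for the query, the accessed data, and the acknowledgement. The operating system performs the TCP connection establishment and termination when the sockets connect and close.

```python
import socket
import threading
from typing import Dict, List, Tuple


def peer_database(server: socket.socket, store: Dict[str, str], acks: List[bytes]) -> None:
    """Plays database application 110b for a single TCP connection."""
    conn, _ = server.accept()                               # connection established
    with conn, conn.makefile("rwb") as stream:
        key = stream.readline().strip().decode()            # query from 104b
        stream.write((store.get(key, "") + "\n").encode())  # accessed data
        stream.flush()
        acks.append(stream.readline().strip())              # acknowledgement of receipt
    # Leaving the with-block closes this side of the connection.


def query_peer(port: int, key: str) -> str:
    """Plays database application 104b: connect, query, acknowledge, terminate."""
    with socket.create_connection(("127.0.0.1", port)) as sock, sock.makefile("rwb") as stream:
        stream.write((key + "\n").encode())
        stream.flush()
        value = stream.readline().strip().decode()
        stream.write(b"ACK\n")                              # hypothetical acknowledgement token
        stream.flush()
    return value                                            # socket closed: connection terminated


def demo() -> Tuple[str, List[bytes]]:
    store = {"acct-42": "balance=100"}                      # hypothetical data at computer system 110a
    acks: List[bytes] = []
    with socket.create_server(("127.0.0.1", 0)) as server:  # port 0: let the OS choose a free port
        port = server.getsockname()[1]
        worker = threading.Thread(target=peer_database, args=(server, store, acks))
        worker.start()
        value = query_peer(port, "acct-42")
        worker.join()
    return value, acks


if __name__ == "__main__":
    print(demo())
```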
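- The number of connections, NC, that may be established in such a cluster environment may be expressed as $NC = \frac{P^{2} \times N \times (N-1)}{2}$ (equation [1]), where N may represent the number of computer systems in the cluster and P may represent the number of cluster applications executing at each computer system.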
- An exemplary cluster environment may comprise 8 computer systems, for example 104a, wherein 8 cluster applications, for example 104b, are executing at each of the 8 computer systems.
- According to equation [1], $\frac{8^{2} \times 8 \times (8-1)}{2} = 1{,}792$ connections may be established across a network, for example 102, at a given time instant.
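Equation [1] can be checked numerically. The Python sketch below evaluates the closed form. As a cross-check, it also counts by brute force the pairs of cluster applications located on different computer systems. The sketch assumes that each cluster application connects once to every cluster application on every other computer system, an interpretation consistent with equation [1].

```python
from itertools import combinations


def cluster_connections(n_systems: int, apps_per_system: int) -> int:
    """Equation [1]: NC = P^2 * N * (N - 1) / 2."""
    n, p = n_systems, apps_per_system
    # N*(N-1)/2 unordered pairs of computer systems, P*P application pairs per system pair.
    # N*(N-1) is always even, so the integer division is exact.
    return p * p * n * (n - 1) // 2


def cluster_connections_brute_force(n_systems: int, apps_per_system: int) -> int:
    """Count application pairs located on different computer systems (assumed model)."""
    apps = [(system, app) for system in range(n_systems) for app in range(apps_per_system)]
    return sum(1 for a, b in combinations(apps, 2) if a[0] != b[0])


print(cluster_connections(8, 8))  # 1792
```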
- Connections established in some conventional cluster environments may be transient in nature. This may be true, for example, in transaction oriented cluster environments, in which a cluster application may establish a connection when it needs to communicate with a peer cluster application across a network. At the completion of the communication, or transaction, the connection may be terminated. At a subsequent time instant, when the cluster application and the peer cluster application need to communicate again, the process of connection establishment, transaction, and connection termination may be repeated.
- The processing overhead required for maintaining large numbers of connections and/or for frequent connection establishment and termination may significantly decrease the processing efficiency of the cluster.
- FIG. 1b illustrates an exemplary system for multihoming, in connection with an embodiment of the invention.
- Referring to FIG. 1b, a local node 122 may comprise interfaces 132a and 132b.
- The remote node 124 may comprise interfaces 134a and 134b.
- The local subnet 142 may communicatively couple the local interface 132a and router 152.
- The local subnet 142 may also communicatively couple the local interface 132a and router 154.
- The local subnet 142 may communicatively couple the local interface 132b and router 152.
- The local subnet 142 may also communicatively couple the local interface 132b and router 154.
- The remote subnet 144 may communicatively couple the remote interface 134a and router 152.
- The remote subnet 144 may also communicatively couple the remote interface 134a and router 154.
- The remote subnet 144 may communicatively couple the remote interface 134b and router 152.
- The remote subnet 144 may also communicatively couple the remote interface 134b and router 154.
- Each of the interfaces and routers may be associated with at least one network address.
- The interface 132a may be associated with network addresses 192.168.1.17 and 192.168.1.19.
- The interface 132b may be associated with network addresses 192.168.3.17 and 192.168.3.19.
- The interface 134a may be associated with network addresses 192.168.2.18 and 192.168.2.20.
- The interface 134b may be associated with network addresses 192.168.4.18 and 192.168.4.20.
- The router 152 may be associated with network address 192.168.1.1 at the local subnet 142.
- The router 152 may be associated with network address 192.168.2.1 at the remote subnet 144.
- The router 154 may be associated with network address 192.168.3.1 at the local subnet 142.
- The router 154 may be associated with network address 192.168.4.1 at the remote subnet 144.
- The local subnet 142, the remote subnet 144, and the routers 152 and 154 may be utilized to establish at least one route between the interface 132a and the interface 134a.
- The local subnet 142, the remote subnet 144, and the routers 152 and 154 may be utilized to establish at least one route between the interface 132a and the interface 134b.
- The local subnet 142, the remote subnet 144, and the routers 152 and 154 may be utilized to establish at least one route between the interface 132b and the interface 134a.
- The local subnet 142, the remote subnet 144, and the routers 152 and 154 may be utilized to establish at least one route between the interface 132b and the interface 134b.
- The routes may be utilized to send an IP frame from a source address, for example 192.168.1.17 in the local node 122, to a destination address, for example 192.168.2.18 in the remote node 124.
- Multihoming may comprise utilizing a plurality of different routes to send information between the local node 122 and the remote node 124.
- Information may be sent between the local node 122 and the remote node 124 via IP frames, for example.
- An IP frame may comprise a source address indicating the sender and a destination address indicating the recipient. The source and destination addresses may be utilized when routing the IP frame between the local node 122 and the remote node 124.
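RFC 791, cited above, defines where the source and destination addresses sit in an IPv4 header. The minimal Python sketch below builds a 20-byte header with no options, fills in the header checksum, and reads the two addresses back. The TTL of 64, protocol number 6 (TCP), and zero identification field are arbitrary illustrative choices, and no payload is attached.

```python
import ipaddress
import struct
from typing import Tuple


def checksum16(data: bytes) -> int:
    """RFC 791 header checksum: one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in (end-around carry)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int, proto: int = 6, ttl: int = 64) -> bytes:
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,                        # version 4, header length 5 x 32-bit words
        0,                                   # type of service
        20 + payload_len,                    # total length in bytes
        0,                                   # identification (illustrative)
        0,                                   # flags and fragment offset
        ttl,                                 # time to live (illustrative)
        proto,                               # 6 = TCP
        0,                                   # checksum placeholder
        ipaddress.IPv4Address(src).packed,   # source address
        ipaddress.IPv4Address(dst).packed,   # destination address
    )
    csum = checksum16(header)
    return header[:10] + struct.pack("!H", csum) + header[12:]


def addresses(header: bytes) -> Tuple[str, str]:
    """Extract (source, destination) from bytes 12..19 of the header."""
    src, dst = struct.unpack("!4s4s", header[12:20])
    return str(ipaddress.IPv4Address(src)), str(ipaddress.IPv4Address(dst))
```

A router can confirm that a received header is intact by recomputing `checksum16` over the whole header, including the checksum field, and checking that the result is 0.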
- A first exemplary route may comprise sending an IP frame from network address 192.168.1.17, via the local subnet 142, to the router 152 at network address 192.168.1.1, and from the router 152 at network address 192.168.2.1, via the remote subnet 144, to the destination address 192.168.2.18.
- A second exemplary route may comprise sending an IP frame from network address 192.168.3.17, via the local subnet 142, to the router 154 at network address 192.168.3.1, and from the router 154 at network address 192.168.4.1, via the remote subnet 144, to the destination address 192.168.4.18.
- A third exemplary route may comprise sending an IP frame from network address 192.168.1.19, via the local subnet 142, to the router 152 at network address 192.168.1.1, and from the router 152 at network address 192.168.2.1, via the remote subnet 144, to the destination address 192.168.2.20.
- A fourth exemplary route may comprise sending an IP frame from network address 192.168.3.19, via the local subnet 142, to the router 154 at network address 192.168.3.1, and from the router 154 at network address 192.168.4.1, via the remote subnet 144, to the destination address 192.168.4.20.
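The four exemplary routes can be written down as data. The Python sketch below makes two assumptions. First, it uses /24 subnets, since the prefix length is not stated above. Second, it applies a simple "first usable route" policy that is not taken from the text. With these assumptions, it checks that each route's first and last hops stay within one subnet. It also picks an alternate route when one of the addresses on the preferred route becomes unreachable, which is the core idea of multihoming.

```python
import ipaddress
from typing import Iterable, NamedTuple, Optional, Set


class Route(NamedTuple):
    src: str          # local interface address
    gateway_in: str   # router address on the local subnet 142
    gateway_out: str  # router address on the remote subnet 144
    dst: str          # remote interface address


# The four exemplary routes of FIG. 1b, in the order listed above.
ROUTES = [
    Route("192.168.1.17", "192.168.1.1", "192.168.2.1", "192.168.2.18"),
    Route("192.168.3.17", "192.168.3.1", "192.168.4.1", "192.168.4.18"),
    Route("192.168.1.19", "192.168.1.1", "192.168.2.1", "192.168.2.20"),
    Route("192.168.3.19", "192.168.3.1", "192.168.4.1", "192.168.4.20"),
]


def same_subnet(a: str, b: str, prefix: int = 24) -> bool:
    """True if b lies in a's subnet; the /24 prefix is an assumption."""
    net = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    return ipaddress.ip_address(b) in net


def route_is_consistent(r: Route) -> bool:
    return same_subnet(r.src, r.gateway_in) and same_subnet(r.gateway_out, r.dst)


def pick_route(routes: Iterable[Route], unreachable: Set[str]) -> Optional[Route]:
    """Return the first route that avoids every unreachable address (assumed policy)."""
    for r in routes:
        if not {r.src, r.gateway_in, r.gateway_out, r.dst} & unreachable:
            return r
    return None
```

For example, if the router 152 becomes unreachable at 192.168.1.1, `pick_route` skips the first exemplary route and returns the second, which goes through the router 154.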
- FIG. 2 is an illustration of an exemplary conventional write operation from a local node to a remote node, in connection with an embodiment of the invention.
- Referring to FIG. 2, the local node 202 may comprise a system memory 220, a network interface card (NIC) 212, and a processor 214.
- NIC: network interface card
- A local computer system may be referred to as a local node, while a remote computer system may be referred to as a remote node.
- The system memory 220 may comprise memory that may store an application user space 222 and a kernel space 224.
- The processor 214 may execute an application 210.
- The NIC 212 may comprise a memory 234.
- The remote node 206 may comprise a system memory 250, an NIC 242, and a processor 244.
- The system memory 250 may comprise an application user space 252 and/or a kernel space 254.
- The processor 244 may execute an application 240.
- The NIC 242 may comprise a memory 264.
- The system memory 220 may comprise suitable logic, circuitry, and/or code that may be utilized to store (write) and/or retrieve (read) information, data, and/or executable code.
- The system memory 220 may comprise a plurality of memory technologies, such as random access memory (RAM).
- RAM: random access memory
- The system memory 220 may be utilized to store and/or retrieve data that may be processed by the processor 214.
- The memory 220 may comprise a computer program or code, which may be executed by the processor 214.
- The application user space 222 may comprise a portion of information and/or data that may be utilized by the application 210.
- The kernel space 224 may comprise a portion of information, data, and/or code associated with an operating system or other execution environment that provides services that may be utilized by the application 210.
- The processor 214 may comprise suitable logic, circuitry, and/or code that may be utilized to transmit, receive, and/or process data.
- The processor 214 may execute an application 210, for example a database application.
- The application 210 may comprise at least one code section that may be executed by the processor 214.
- The network interface chip/card (NIC) 212 may comprise suitable circuitry, logic, and/or code that may transmit data to and/or receive data from a network, for example, an Ethernet network.
- The NIC 212 may be coupled to the network 204.
- The NIC 212 may process data received and/or transmitted via the network 204.
- The system memory 250 may comprise suitable logic, circuitry, and/or code that may be utilized to store (write) and/or retrieve (read) information, data, and/or executable code.
- The system memory 250 may comprise different types of random access memory (RAM), such as DRAM and/or SRAM.
- The system memory 250 may be utilized to store and/or retrieve data that may be processed by the processor 244.
- The memory 250 may store a computer program or code that may be executed by the processor 244.
- The application user space 252 may comprise a portion of information and/or data that may be utilized by the application 240.
- The kernel space 254 may comprise a portion of information, data, and/or code associated with an operating system or other execution environment that provides services that may be utilized by the application 240.
- The processor 244 may comprise suitable logic, circuitry, and/or code that may be utilized to transmit, receive, and/or process data.
- The processor 244 may execute an application 240 or code, such as, for example, a database application.
- The application 240 may comprise at least one code section that may be executed by the processor 244.
- The NIC 242 may comprise suitable circuitry, logic, and/or code that may enable transmission of data to and/or reception of data from a network, for example, an Ethernet network.
- The NIC 242 may be coupled to the network 204.
- The NIC 242 may process data received and/or transmitted via the network 204.
- In operation, the local node 202 may transfer data to the remote node 206 via the network 204.
- The data may comprise information that may be transferred from the application user space 222 in the local node 202 to the application user space 252 in the remote node 206.
- The application 210 may cause the processor 214 to issue instructions to the system memory 220, as illustrated in segment 1 of FIG. 2.
- The instruction illustrated in segment 1 may cause information stored in the application user space 222 to be transferred to the kernel space 224, as illustrated in segment 2.
- The information may subsequently be transferred from the kernel space 224 to the NIC memory 234, as illustrated in segment 3.
- The NIC 212 may cause the information to be transferred from the memory 234 in the local node 202, via the network 204, to the memory 264 within the NIC 242 in the remote node 206, as illustrated in segment 4.
- The information may be transferred from the NIC memory 264 to the kernel space 254 within the system memory 250 in the remote node 206, as illustrated in segment 5.
- The information in the kernel space 254 may be transferred to the application user space 252, as illustrated in segment 6.
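The six segments above describe a chain of buffer-to-buffer transfers. The minimal Python sketch below treats segment 1 as an instruction rather than a data movement, which is an assumption. It records segments 2 to 6 as hops, checks that each hop starts where the previous one ended, and counts the intermediate buffers between the two application user spaces. The `Hop` tuple layout and the function names are hypothetical.

```python
from typing import List, Tuple

# (segment number in FIG. 2, source buffer, destination buffer); layout is illustrative
Hop = Tuple[int, str, str]

CONVENTIONAL_WRITE: List[Hop] = [
    (2, "user space 222", "kernel space 224"),
    (3, "kernel space 224", "NIC memory 234"),
    (4, "NIC memory 234", "NIC memory 264"),   # crosses the network 204
    (5, "NIC memory 264", "kernel space 254"),
    (6, "kernel space 254", "user space 252"),
]


def trace(hops: List[Hop]) -> List[str]:
    """Return the ordered buffers the data passes through, checking continuity."""
    path = [hops[0][1]]
    for segment, src, dst in hops:
        if src != path[-1]:
            raise ValueError(f"segment {segment} starts at {src}, expected {path[-1]}")
        path.append(dst)
    return path


def intermediate_buffers(hops: List[Hop]) -> int:
    """Buffers strictly between the source and destination application buffers."""
    return len(trace(hops)) - 2
```

Between the two application user spaces, the data passes through four intermediate buffers: two kernel buffers and two NIC buffers.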
- The remote direct memory access (RDMA) protocol may provide a more efficient method by which an application, for example a database application executing at a local computer system, may exchange information with a remote computer system across a network, for example the network 102.
- An RDMA-based transfer of information may be accomplished without requiring the intervening step, illustrated in FIG. 2, of transferring the information from application user space to kernel space.
- The RDMA protocol may include two basic operations: an RDMA write operation and an RDMA read operation.
- A third operation may be a send/receive operation.
- The RDMA write operation may be utilized to transfer data from a local computer system to a remote computer system.
- The RDMA read operation may be utilized to retrieve data from a remote computer system; the retrieved data may subsequently be stored at the local computer system.
- For example, the database application 104b executing at a local computer system 104a may attempt to retrieve information stored at a remote computer system 110a.
- The database application 104b may issue an RDMA read instruction that may be sent across the network 102 and received by the remote computer system 110a.
- The requested information may subsequently be retrieved from the remote computer system 110a, transported across the network 102, and stored at the local computer system 104a.
- Similarly, the database application 104b executing at the local computer system 104a may attempt to transfer information to the remote computer system 110a by issuing an RDMA write instruction that may be sent from the local computer system 104a, across the network 102, and received by the remote computer system 110a.
- The database application 104b may subsequently cause the local computer system 104a to send information across the network 102, which may then be stored at the remote computer system 110a.
- FIG. 3 is an illustration of an exemplary conventional write operation from a local node to a remote node, in connection with an embodiment of the invention.
- the local node 302 may comprise a system memory 220 , an RDMA-enabled network interface card (RNIC) 312 , and a processor 214 .
- the system memory 220 may comprise an application user space 222 and/or a kernel space 224 .
- the processor 214 may execute an application 210 .
- the RNIC 312 may comprise an RDMA engine 314 , and a memory 234 .
- the remote node 306 may comprise a system memory 250 , an RNIC 342 , and a processor 244 .
- the RNIC 342 may comprise an RDMA engine 344 and a memory 264 .
- the RNIC 312 may comprise suitable circuitry, logic and/or code that may enable transmission and reception of data from a network, for example, an Ethernet network.
- the RNIC 312 may be coupled to the network 204 .
- the RNIC 312 may process data received and/or transmitted via the network 204 .
- the RDMA engine 314 may comprise suitable logic, circuitry, and/or code that may be utilized to send instructions to system memory 220 and/or memory 234 that may result in the transfer of information from the local node 302 to the remote node 306 via the network 204 .
- the RDMA engine 314 may be programmed with a local memory address, a local node address, a remote memory address, a remote node address, and a length.
- the RDMA engine 314 may then cause a block of information, whose size is given by the length and which starts at the local memory address within the system memory 220 of the local node 302 (identified by the local node address), to be transferred via the network 204 to a location starting at the remote memory address within the system memory 250 of the remote node 306 (identified by the remote node address).
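The five values the RDMA engine 314 is programmed with can be pictured as a small descriptor. The Python sketch below is illustrative only: the class and function names are hypothetical, node addresses are plain strings, and each node's system memory is modeled as a bytearray. It shows the logical effect of the transfer (length bytes copied from the local memory address to the remote memory address), not how an RNIC actually carries it out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RdmaWriteDescriptor:
    """Hypothetical container for the five values the RDMA engine is programmed with."""
    local_memory_address: int
    local_node_address: str
    remote_memory_address: int
    remote_node_address: str
    length: int


def rdma_write(desc: RdmaWriteDescriptor, memories: dict) -> None:
    """Copy desc.length bytes from local memory to remote memory.

    `memories` maps a node address to a bytearray standing in for that node's system memory.
    """
    src = memories[desc.local_node_address]
    dst = memories[desc.remote_node_address]
    lo, ro, n = desc.local_memory_address, desc.remote_memory_address, desc.length
    # Reject transfers that would run past either buffer instead of silently truncating.
    if min(lo, ro, n) < 0 or lo + n > len(src) or ro + n > len(dst):
        raise ValueError("transfer exceeds memory bounds")
    dst[ro:ro + n] = src[lo:lo + n]


memories = {"node302": bytearray(b"hello, remote node"), "node306": bytearray(32)}
rdma_write(RdmaWriteDescriptor(0, "node302", 4, "node306", 5), memories)
print(bytes(memories["node306"][4:9]))  # b'hello'
```

The bounds check stands in for the memory protection a real RNIC would enforce; the text does not describe that mechanism.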
- the RNIC 342 may comprise suitable circuitry, logic and/or code that may transmit and receive data from a network, for example, an Ethernet network.
- the RNIC 342 may be coupled to the network 204 .
- the RNIC 342 may process data received and/or transmitted via the network 204 .
- the RDMA engine 344 may comprise suitable logic, circuitry, and/or code that may be utilized to send instructions to system memory 250 and/or memory 264 that may result in the transfer of information from the remote node 306 to the local node 302 via the network 204 as described for the RDMA engine 314 .
- the local node 302 may transfer data to the remote node 306 via the network 204 .
- the data may comprise information that may be transferred from the application user space 222 in the local node 302 to the application user space 252 in the remote node 306 .
- the application 210 may cause the processor 214 to issue instructions to the RDMA engine 314 as illustrated in segment 1 of FIG. 3 .
- the instructions may comprise a local memory address, local node address, remote memory address, remote node address, and length.
- the instruction illustrated in segment 1 may cause the RDMA engine 314 to issue instructions to the system memory 220 as illustrated in segment 2 .
- the instructions as illustrated in segment 2 may cause information stored in the application user space 222 to be transferred to the RNIC memory 234 as illustrated in segment 3 .
- the RNIC 312 may cause the information to be transferred from the memory 234 in the local node 302 , via the network 204 , to the memory 264 within the RNIC 342 in the remote node 306 as illustrated in segment 4 .
- the information may be transferred from the memory 264 to the application user space 252 as illustrated in segment 5 .
- FIG. 4 is an illustration of an exemplary conventional RDMA over TCP protocol stack, in connection with an embodiment of the invention.
- a conventional RDMA over TCP protocol stack 402 may comprise an upper layer protocol 404 , an RDMA protocol 406 , a direct data placement protocol (DDP) 408 , a marker-based PDU aligned protocol (MPA) 410 , a TCP 412 , an IP 414 , and an Ethernet protocol 416 .
- An RNIC may comprise functionality associated with the RDMA protocol 406 , DDP 408 , MPA protocol 410 , TCP 412 , IP 414 , and Ethernet protocol 416 .
- the RDMA protocol specifies various methods that may enable a local computer system to exchange information with a remote computer system via a network 204 .
- the methods may comprise an RDMA read operation and/or an RDMA write operation.
- the RDMA protocol may also comprise the establishment of an RDMA connection between the local computer system and the remote computer system prior to the exchange of information.
- An RDMA connection may be established, for example, by the local computer system sending an RDMA connection request message to the remote computer system and the remote computer system sending an RDMA response message to the local computer system in reply.
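The request/response exchange can be illustrated with a small per-side state machine in Python. The message strings and state names are hypothetical; a real RNIC would also negotiate connection parameters and handle timeouts, which the text does not describe.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    REQUEST_SENT = auto()
    ESTABLISHED = auto()


class RdmaEndpoint:
    """Hypothetical per-side state for RDMA connection establishment."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def connect(self) -> str:
        # Local side: send the RDMA connection request message.
        if self.state is not State.IDLE:
            raise RuntimeError("connect() called twice")
        self.state = State.REQUEST_SENT
        return "RDMA_CONNECT_REQUEST"

    def on_message(self, msg: str):
        if msg == "RDMA_CONNECT_REQUEST" and self.state is State.IDLE:
            # Remote side: reply with the RDMA response message; the connection is now usable.
            self.state = State.ESTABLISHED
            return "RDMA_CONNECT_RESPONSE"
        if msg == "RDMA_CONNECT_RESPONSE" and self.state is State.REQUEST_SENT:
            # Local side: the response completes establishment; nothing further to send.
            self.state = State.ESTABLISHED
            return None
        raise ValueError(f"unexpected {msg} in state {self.state.name}")


local, remote = RdmaEndpoint(), RdmaEndpoint()
local.on_message(remote.on_message(local.connect()))
print(local.state.name, remote.state.name)  # ESTABLISHED ESTABLISHED
```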
- the local computer system and remote computer system may subsequently utilize the established RDMA connection to exchange information via the network 204 .
- the exchange of information may comprise the local computer system sending one or more sequence numbered frames to the remote computer system.
- the exchange of information may also comprise the remote computer system sending one or more sequence numbered frames to the local computer system.
- the sequence numbers may indicate a relative ordering among frames. For example, the sequence number in a current frame may indicate, to the receiver of the frame, a relationship between the current frame and a preceding frame and/or subsequent frame.
- the DDP 408 may enable information to be copied from an application user space in a local computer system to an application user space in a remote computer system without an intermediate copy of the information to kernel space. This may be referred to as a “zero copy” model.
- the DDP 408 may embed information in each transmitted sequence numbered frame that enables information contained in the frame to be copied to the application user space in the remote computer system. This copy may be done regardless of whether a current sequence numbered frame is received in-sequence, or out-of-sequence, relative to a preceding sequence numbered frame, or subsequent sequence numbered frame, that is sent via the established RDMA connection.
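The combination of sequence numbers and direct placement can be sketched in Python as follows. Each frame here carries a hypothetical offset field telling the receiver where its payload belongs in the receive buffer; the actual DDP header carries richer placement information (for example, a steering tag and tagged offset), which this sketch does not model. Frames are written to their final location as they arrive, and gaps in the sequence numbers show which frames are still outstanding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    seq: int        # sequence number: position relative to other frames on the connection
    offset: int     # hypothetical placement field: byte offset in the receive buffer
    payload: bytes


def place(frames, buffer_size: int, first_seq: int = 0):
    """Write each payload directly to its final offset, in whatever order frames arrive.

    Returns the buffer and the sequence numbers still missing below the highest one seen.
    """
    buf = bytearray(buffer_size)
    seen = set()
    for f in frames:
        end = f.offset + len(f.payload)
        if f.offset < 0 or end > buffer_size:
            raise ValueError(f"frame {f.seq} falls outside the receive buffer")
        buf[f.offset:end] = f.payload
        seen.add(f.seq)
    missing = sorted(set(range(first_seq, max(seen) + 1)) - seen) if seen else []
    return bytes(buf), missing


arrivals = [Frame(2, 6, b"ccc"), Frame(0, 0, b"aaa"), Frame(1, 3, b"bbb")]
print(place(arrivals, 9))  # (b'aaabbbccc', [])
```

Because placement depends only on each frame's own header, an out-of-sequence frame never needs to be buffered and copied again once its predecessors arrive.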
- the MPA protocol 410 may comprise methods that enable frames transmitted in an RDMA connection to be transported across the network 204 via a TCP connection.
- the MPA protocol 410 may enable a single TCP connection to carry frames associated with a corresponding single RDMA connection.
- the MPA protocol 410 may receive a sequence numbered frame associated with an RDMA connection.
- the MPA protocol 410 may derive information from the received RDMA frame to identify the corresponding RDMA connection.
- the MPA protocol 410 may determine the corresponding TCP connection associated with the RDMA connection.
- the MPA protocol 410 may utilize the sequence numbered frame from the RDMA connection, or RDMA sequence numbered frame, to form a TCP packet.
- the formation of a TCP packet from the RDMA sequence numbered frame may be referred to as encapsulation, for example.
- the TCP packet may be transmitted, via the network 204 , utilizing the corresponding TCP connection.
- the MPA protocol 410 may receive a TCP packet associated with a TCP connection from the network 204 .
- the MPA protocol 410 may derive information from the received TCP packet to determine the corresponding RDMA connection associated with the TCP connection.
- the MPA protocol 410 may extract an RDMA sequence numbered frame from the TCP packet.
- the extraction of an RDMA sequence numbered frame from the TCP packet may be referred to as decapsulation, for example.
- At least a portion of the information contained within the received RDMA sequence numbered frame, referred to as a payload, may be copied to the application user space.
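The one-to-one mapping and the encapsulation and decapsulation steps above can be sketched in Python. The connection identifiers are hypothetical strings, and the 2-byte length prefix is a simplification of MPA framing, which also defines optional markers, padding, and a CRC.

```python
import struct


class OneToOneMpa:
    """Stack 402 style: exactly one TCP connection per RDMA connection.

    The 2-byte big-endian length prefix simplifies MPA framing, which also defines
    optional markers, padding, and a CRC.
    """

    def __init__(self) -> None:
        self.rdma_to_tcp = {}
        self.tcp_to_rdma = {}

    def bind(self, rdma_id: str, tcp_id: str) -> None:
        if rdma_id in self.rdma_to_tcp or tcp_id in self.tcp_to_rdma:
            raise ValueError("mapping must stay one-to-one")
        self.rdma_to_tcp[rdma_id] = tcp_id
        self.tcp_to_rdma[tcp_id] = rdma_id

    def encapsulate(self, rdma_id: str, frame: bytes):
        # Transmit: identify the TCP connection for this RDMA connection, then frame the data.
        return self.rdma_to_tcp[rdma_id], struct.pack("!H", len(frame)) + frame

    def decapsulate(self, tcp_id: str, segment: bytes):
        # Receive: the TCP connection identifies the RDMA connection; strip the framing.
        (length,) = struct.unpack_from("!H", segment)
        return self.tcp_to_rdma[tcp_id], segment[2:2 + length]


mpa = OneToOneMpa()
mpa.bind("rdma-1", "tcp-A")
tcp_id, segment = mpa.encapsulate("rdma-1", b"frame")
print(mpa.decapsulate(tcp_id, segment))  # ('rdma-1', b'frame')
```

The `bind` check makes the limitation explicit: every new RDMA connection in this model forces a new TCP connection.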
- the TCP 412 , and IP 414 may comprise methods that enable information to be exchanged via a network according to applicable standards as defined by the Internet Engineering Task Force (IETF).
- the Ethernet 416 may comprise methods that enable information to be exchanged via a network according to applicable standards as defined by the IEEE.
- the local node 302 may transfer data to the remote node 306 via the network 204 .
- An upper layer protocol 404 may comprise an application 210 that issues an RDMA write request to write information from the application user space 222 to the application user space 252 .
- the RDMA write request may cause the RDMA protocol 406 to establish an RDMA connection between the local node 302 , and the remote node 306 .
- the RDMA protocol 406 may send a connection request message to the remote node 306 .
- the MPA protocol 410 may request that the TCP 412 establish a TCP connection between the local node 302 and the remote node 306 .
- the MPA protocol 410 may encapsulate at least a portion of the RDMA connection request message in a TCP packet that may be sent to the remote node 306 via the established TCP connection.
- the MPA protocol 410 may subsequently receive a TCP packet containing the corresponding RDMA response message.
- the MPA protocol 410 may decapsulate the TCP packet and send at least a portion of the RDMA response message to the RDMA protocol 406 .
- a TCP connection may be established between the local node 302 and the remote node 306 .
- the TCP connection may be utilized by a corresponding RDMA connection to exchange information via the network 204 .
- An upper layer protocol 404 may be utilized to transfer information from the local node 302 in an RDMA sequence numbered frame to the remote node 306 via the established RDMA connection.
- the RDMA connection may be terminated.
- the TCP connection utilized in connection with the RDMA connection may also be terminated.
- the number of RDMA connections may be equal to the number of TCP connections. Consequently, in a cluster environment, the total number of TCP and RDMA connections may be equal to twice the number of connections indicated in equation [1].
- the total number of connections may be reduced if a single TCP connection is utilized to transport information corresponding to a plurality of RDMA connections between the local node 302 and the remote node 306 .
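To make the saving concrete, the totals can be computed for a given number of RDMA connections. The Python sketch below takes that number as an input rather than deriving it from equation [1], which is not reproduced in this section; the figure of 100 RDMA connections carried in a single tunnel is an arbitrary example.

```python
def total_one_to_one(rdma_connections: int) -> int:
    """Stack 402: each RDMA connection needs its own TCP connection."""
    return rdma_connections + rdma_connections


def total_tunneled(rdma_connections: int, tunnel_tcp_connections: int) -> int:
    """RDMA connections share a smaller set of TCP connections used as tunnels."""
    return rdma_connections + tunnel_tcp_connections


# Example: 100 RDMA connections between one pair of nodes, carried in a single TCP tunnel.
print(total_one_to_one(100), total_tunneled(100, 1))  # 200 101
```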
- the TCP connection may be utilized as a tunnel.
- One approach to tunneling a plurality of RDMA connections within a single transport connection may utilize the Stream Control Transmission Protocol (SCTP).
- FIG. 5 is an illustration of an exemplary RDMA over TCP protocol stack utilizing SCTP, in connection with an embodiment of the invention.
- an RDMA protocol stack 502 utilizing SCTP may comprise an upper layer protocol 404 , an RDMA protocol 406 , a direct data placement protocol 408 , an SCTP 510 , an IP 414 , and an Ethernet protocol 416 .
- An RNIC may comprise functionality associated with the RDMA protocol 406 , DDP 408 , SCTP 510 , IP 414 , and Ethernet protocol 416 .
- aspects of the SCTP 510 may comprise functionality equivalent to the MPA protocol 410 and TCP 412 .
- the SCTP 510 may allow a single SCTP association to correspond to a plurality of RDMA connections.
- the SCTP 510 may comprise methods that enable frames transmitted in an RDMA connection to be transported, via the network, through an SCTP association.
- An SCTP association may comprise functionality comparable to a TCP connection.
- an SCTP association may also be referred to as an SCTP connection.
- An SCTP connection may incorporate additional functionality beyond a TCP connection that may enable the SCTP connection to be utilized as a tunnel.
- the SCTP 510 may enable a single SCTP connection to carry frames associated with a corresponding plurality of RDMA connections.
- SCTP 510 may be utilized in the exemplary protocol stack 502 to reduce the total number of connections in a cluster environment in comparison to the exemplary protocol stack 402 .
- an RNIC may be required to store executable code that may comprise overlapping functionality.
- a TCP 412 stack may typically be stored in an RNIC.
- the RNIC may be required to store executable code for SCTP 510 , including code that comprises functionality that substantially overlaps that of TCP 412 .
- some intermediate nodes within the network 204 may be unable to process packets in an SCTP connection. For example, firewalls and/or port network address translation (PNAT) nodes may be unable to process packets transported in an SCTP connection.
- Various embodiments of the invention may provide a method and a system for tunneling a plurality of RDMA connections within a TCP connection. In one aspect, this may enable greater reuse of existing protocol stacks stored in the RNIC while achieving the benefits of tunneling.
- Various embodiments of the invention may be utilized with existing network infrastructures that comprise firewall nodes, PNAT nodes, and/or devices that implement various security methods within the network 204 .
- FIG. 6 is a block diagram of an exemplary system for an MST-MPA protocol, in accordance with an embodiment of the invention.
- the local computer system 602 may comprise an RDMA-enabled network interface card (RNIC) 612 , a plurality of processors 614 a , 616 a and 618 a , a plurality of local applications 614 b , 616 b , and 618 b , a system memory 620 , and a bus 622 .
- the RNIC 612 may comprise a TCP offload engine (TOE) 641 , a memory 634 , a plurality of network interfaces 632 and 633 , and a bus 636 .
- the TOE 641 may comprise a processor 643 , a local connection point 645 , and a local RDMA access point 647 .
- the remote computer system 606 may comprise a RNIC 642 , a plurality of processors 644 a , 646 a , and 648 a , a plurality of remote applications 644 b , 646 b , and 648 b , a system memory 650 , and a bus 652 .
- the RNIC 642 may comprise a TOE 672 , a memory 664 , a network interface 662 , and a bus 666 .
- the TOE 672 may comprise a processor 674 , a remote connection point 676 , and a remote RDMA access point 677 .
- the processor 614 a may comprise suitable logic, circuitry, and/or code that may be utilized to transmit, receive and/or process data.
- the processor 614 a may execute application code, for example a database application.
- the processor 614 a may be coupled to a bus 622 .
- the processor 614 a may perform protocol processing when transmitting and/or receiving data via the bus 622 .
- the protocol processing performed by the processor 614 a may comprise receiving data and/or instructions from an application 614 b , for example.
- the data may comprise one or more upper layer protocol (ULP) protocol data units (PDU).
- the instructions may comprise instructions that cause the processor 614 a to perform tasks related to the RDMA protocol.
- the instructions may result from function calls from an RDMA application programming interface (API).
- An instruction may cause the processor 614 a to perform steps to initiate one or more RDMA connections.
- the protocol processing performed by the processor 614 a may comprise receiving, via the bus 622 , ULP PDUs that were received via the RNIC 612 .
- the processor 614 a may perform protocol processing on at least a portion of the ULP PDU received from the RNIC 612 via the bus 622 . At least a portion of the ULP PDU may subsequently be utilized by an application 614 b , for example.
- the local application 614 b may comprise a computer program that comprises at least one code section that may be executable by the processor 614 a for causing the processor 614 a to perform steps comprising protocol processing, in accordance with an embodiment of the invention.
- the processor 616 a may be substantially as described for the processor 614 a .
- the local application 616 b may be substantially as described for the local application 614 b .
- the processor 618 a may be substantially as described for the processor 614 a .
- the local application 618 b may be substantially as described for the local application 614 b.
- the system memory 620 may comprise suitable logic, circuitry, and/or code that may be utilized to store, or write, and/or retrieve, or read, information, data, and/or executable code.
- the system memory 620 may comprise one or more random access memory (RAM) technologies such as, for example, DRAM.
- the system memory 620 may be utilized to store and/or retrieve data and/or PDUs that may be processed by one or more of the processors 614 a , 616 a , or 618 a .
- the memory 620 may comprise code that may be executed by the one or more of the processors 614 a , 616 a , or 618 a.
- the RNIC 612 may comprise suitable circuitry, logic and/or code that may transmit and/or receive data from a network, for example, an Ethernet network.
- the RNIC 612 may be coupled to the network 604 .
- the RNIC 612 may enable the local computer system 602 to utilize RDMA to exchange information with a peer computer system in a cluster environment.
- the RNIC 612 may process data received and/or transmitted via the network 204 .
- the RNIC 612 may be coupled to the bus 622 .
- the RNIC 612 may process data received and/or transmitted via the bus 622 .
- In the transmitting direction the RNIC 612 may receive data via the bus 622 .
- the RNIC 612 may process the data received via the bus 622 and transmit the processed data via the network 204 .
- the RNIC 612 may receive data via the network 204 .
- the RNIC 612 may process the data received via the network 204 and transmit the processed data via the bus 622 .
- the TOE 641 may comprise suitable logic, circuitry, and/or code to receive data via the bus 622 from one or more processors 614 a , 616 a , or 618 a , to perform protocol processing, and to construct one or more packets and/or one or more frames. In the transmitting direction the TOE 641 may receive data via the bus 622 .
- the TOE 641 may perform protocol processing that encapsulates at least a portion of the received data in a protocol data unit (PDU) that may be constructed in accordance with a protocol specification, for example, RDMA.
- the RDMA PDU may be referred to as an RDMA frame, or frame.
- the TOE 641 may also perform protocol processing that encapsulates at least a portion of the RDMA frame in a PDU that may be constructed in accordance with a protocol specification, for example, TCP.
- the TCP PDU may be referred to as a TCP packet, or packet.
- the portion of the RDMA frame may in turn be contained in one or more MST-MPA protocol messages.
- the MST-MPA protocol message may contain a frame length, source endpoint identifier, destination endpoint identifier, source sequence number, and/or error check fields.
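The fields listed for an MST-MPA protocol message can be illustrated with a pack/unpack sketch in Python. The text names the fields but not their sizes or order, so the 16-bit length and endpoint identifiers, the 32-bit sequence number, and the trailing 32-bit check used here are assumptions, and zlib's CRC-32 stands in for whatever error check an implementation would use.

```python
import struct
import zlib

# Hypothetical layout: the fields follow the text; widths, order, and byte order are assumptions.
HEADER = struct.Struct("!HHHI")  # frame length, source endpoint id, destination endpoint id, source sequence number
TRAILER = struct.Struct("!I")    # error check


def pack_message(src_ep: int, dst_ep: int, seq: int, frame: bytes) -> bytes:
    body = HEADER.pack(len(frame), src_ep, dst_ep, seq) + frame
    # zlib.crc32 (IEEE CRC-32) stands in for the error check; MPA itself uses CRC32c.
    return body + TRAILER.pack(zlib.crc32(body))


def unpack_message(msg: bytes):
    if len(msg) < HEADER.size + TRAILER.size:
        raise ValueError("message too short")
    body = msg[:-TRAILER.size]
    (check,) = TRAILER.unpack(msg[-TRAILER.size:])
    if zlib.crc32(body) != check:
        raise ValueError("error check failed")
    length, src_ep, dst_ep, seq = HEADER.unpack_from(body)
    frame = body[HEADER.size:]
    if len(frame) != length:
        raise ValueError("frame length mismatch")
    return src_ep, dst_ep, seq, frame


msg = pack_message(src_ep=7, dst_ep=9, seq=42, frame=b"rdma frame")
print(unpack_message(msg))  # (7, 9, 42, b'rdma frame')
```

The receive-side checks mirror the verification steps described for MST-MPA protocol processing: error check first, then the length and endpoint fields.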
- At least a portion of the MST-MPA protocol message may then be contained in a TCP packet.
- the TCP protocol processing may comprise constructing one or more PDU header fields comprising source and/or destination network addresses, source and/or destination port identifiers, and/or computation of error check fields.
- the packet may be transmitted via the bus 636 for subsequent transmission via the network 204 .
- the TOE 641 may associate a plurality of RDMA connections with a TCP connection.
- the TCP connection may be utilized as a tunnel that transports encapsulated MST-MPA protocol messages, or portions thereof, in TCP packets across a network 204 via the TCP connection.
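Because TCP delivers a byte stream rather than discrete messages, a tunnel that carries several RDMA connections must delimit messages and route each one to the right endpoint on receipt. The Python sketch below illustrates that multiplexing and demultiplexing; the 6-byte header (payload length, source endpoint, destination endpoint) is a hypothetical simplification of an MST-MPA message, and TCP reads are simulated by slicing the stream into 4-byte chunks.

```python
import struct
from collections import defaultdict

HDR = struct.Struct("!HHH")  # hypothetical: payload length, source endpoint, destination endpoint


def mux(messages) -> bytes:
    """Serialize (src_ep, dst_ep, payload) messages from several RDMA connections into one byte stream."""
    return b"".join(HDR.pack(len(p), s, d) + p for s, d, p in messages)


class Demux:
    """Rebuild messages across arbitrary TCP read boundaries and queue them per destination endpoint."""

    def __init__(self) -> None:
        self.pending = b""
        self.queues = defaultdict(list)

    def feed(self, data: bytes) -> None:
        self.pending += data
        while len(self.pending) >= HDR.size:
            length, src, dst = HDR.unpack_from(self.pending)
            end = HDR.size + length
            if len(self.pending) < end:
                break  # the rest of this message has not arrived yet
            self.queues[dst].append((src, self.pending[HDR.size:end]))
            self.pending = self.pending[end:]


stream = mux([(1, 10, b"alpha"), (2, 20, b"beta"), (1, 10, b"gamma")])
rx = Demux()
for i in range(0, len(stream), 4):  # simulate small TCP reads
    rx.feed(stream[i:i + 4])
print(dict(rx.queues))  # {10: [(1, b'alpha'), (1, b'gamma')], 20: [(2, b'beta')]}
```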
- the TOE 641 may receive PDUs via the bus 636 that were previously received via the network 204 .
- the TOE 641 may perform TCP protocol processing that decapsulates, in accordance with a protocol specification, at least a portion of the PDU received from the network 204 via the bus 636 , to extract one or more MST-MPA protocol messages.
- the TCP protocol processing may comprise verifying one or more PDU header fields comprising source and/or destination network addresses, source and/or destination port identifiers, and/or computations to detect and/or correct bit errors in the received PDU.
- the MST-MPA protocol processing may comprise verifying source and/or destination endpoint identifiers, source sequence numbers, and/or computations to detect and/or correct bit errors in the received MST-MPA protocol message.
- the RDMA frame may be derived from one or more lower layer protocol PDUs, for example, one or more MST-MPA protocol messages.
- the TOE 641 may perform RDMA protocol processing that decapsulates at least a portion of the RDMA frame to extract data.
- the RDMA protocol processing may comprise verifying one or more frame header fields comprising frame length, source endpoint identifier, destination endpoint identifier, source sequence number and/or error check fields.
- the data may be subsequently processed by the TOE 641 and transmitted via the bus 622 .
- the TOE 641 may cause at least a portion of a PDU that was received via the bus 636 that was previously received via the network 204 to be stored in the memory 634 .
- the TOE 641 may cause at least a portion of a PDU, which is to be subsequently transmitted via the network 204 , to be stored in the memory 634 .
- the TOE 641 may cause an intermediate result, comprising a PDU or data, which is processed at least in part by the TOE 641 , to be stored in the memory 634 .
- the memory 634 may comprise suitable logic, circuitry, and/or code that may be utilized to store, or write, and/or retrieve, or read, information, data, and/or executable code.
- the memory 634 may comprise a random access memory (RAM) such as DRAM and/or SRAM.
- the memory 634 may be utilized to store and/or retrieve data and/or PDUs that may be processed by the TOE 641 .
- the memory 634 may store code that may be executed by the TOE 641 .
- the network interface 632 may comprise suitable logic, circuitry, and/or code that may be utilized to transmit and/or receive PDUs via a network 204 .
- the network interface 632 may be coupled to the network 204 .
- the network interface 632 may be coupled to the bus 636 .
- the network interface 632 may receive bits via the bus 636 .
- the network interface 632 may subsequently transmit the bits, which may be contained in a representation of a PDU, via the network 204 by converting the bits into electrical and/or optical signals, with timing parameters and with signal amplitude, energy and/or power levels as specified by an appropriate specification for a network medium, for example, Ethernet.
- the network interface 632 may also transmit framing information that identifies the start and/or end of a transmitted PDU.
- the network interface 632 may receive bits that may be contained in a PDU received via the network 204 by detecting framing bits indicating the start and/or end of the PDU. Between the indication of the start of the PDU and the end of the PDU, the network interface 632 may receive subsequent bits based on detected electrical and/or optical signals, with timing parameters, and with signal amplitude, energy and/or power levels as specified by an appropriate specification for a network medium, for example, Ethernet. The network interface 632 may subsequently transmit the bits via the bus 636 .
- the network interface 633 may be substantially as described for network interface 632 .
- the processor 643 may comprise suitable logic, circuitry, and/or code that may be utilized to perform at least a portion of the protocol processing tasks within the TOE 641 .
- the local connection point 645 may comprise a computer program and/or code that may be executable by the processor 643 to perform RDMA and/or TCP protocol processing.
- Exemplary protocol processing may comprise establishment of TCP tunnels, in accordance with an embodiment of the invention.
- the local RDMA access point 647 may comprise a computer program that comprises at least one code section that may be executable by the processor 643 for causing the processor 643 to perform steps comprising protocol processing, for example protocol processing related to the establishment of RDMA connection and/or the association of a plurality of RDMA connections with a corresponding one or more TCP tunnels, in accordance with an embodiment of the invention.
- the processor 644 a may be substantially as described for the processor 614 a .
- the processor 644 a may be coupled to the bus 652 .
- the remote application 644 b may be substantially as described for the local application 614 b .
- the processor 646 a may be substantially as described for the processor 614 a .
- the processor 646 a may be coupled to the bus 652 .
- the remote application 646 b may be substantially as described for the local application 614 b .
- the processor 648 a may be substantially as described for the processor 614 a .
- the processor 648 a may be coupled to the bus 652 .
- the remote application 648 b may be substantially as described for the local application 614 b .
- the system memory 650 may be substantially as described for the system memory 620 .
- the system memory 650 may be coupled to the bus 652 .
- the RNIC 642 may be substantially as described for the RNIC 612 .
- the RNIC 642 may be coupled to the bus 652 .
- the TOE 672 may be substantially as described for the TOE 641 .
- the TOE 672 may be coupled to the bus 652 .
- the TOE 672 may be coupled to the bus 666 .
- the network interface 662 may be substantially as described for the network interface 632 .
- the network interface 662 may be coupled to the bus 666 .
- the memory 664 may be substantially as described for the memory 634 .
- the memory 664 may be coupled to the bus 666 .
- the processor 674 may be substantially as described for the processor 643 .
- the remote connection point 676 may be substantially as described for the local connection point 645 .
- the remote RDMA access point 677 may be substantially as described for the local RDMA access point 647 .
- one or more local applications 614 b , 616 b , and/or 618 b may attempt to establish a plurality of RDMA connections with one or more remote applications 644 b , 646 b , and/or 648 b .
- a corresponding plurality of TCP connections may be established between the local computer system 602 and the remote computer system 606 .
- the TCP connections may be referred to as communication channels.
- the plurality of TCP connections may be associated with a TCP tunnel.
- the TCP tunnel may be associated with a plurality of network interfaces, for example network interfaces 632 and 633 located in the RNIC 612 .
- any of the plurality of TCP connections associated with the TCP tunnel may be utilized by at least a portion of the plurality of RDMA connections.
- An individual RDMA connection may utilize at least a portion of the plurality of TCP connections.
- An individual TCP connection among the plurality of TCP connections may be associated with a single network interface among the plurality of network interfaces. For example, in a TCP tunnel comprising two individual TCP connections, a first TCP connection may be associated with a first network interface 633 , while a second TCP connection may be associated with a second network interface 632 .
- a TCP connection may be associated with a network interface if information transported across a network 204 via the TCP connection utilizes the network interface.
- An RDMA connection may utilize the first TCP connection to transport a current portion of a plurality of messages, and the second TCP connection to transport a subsequent portion of the plurality of messages.
- the RDMA connection may utilize the first TCP connection to transport at least a portion of the plurality of messages. If a failure occurs in the first TCP connection such that the local computer system 602 is unable to continue sending messages to the remote computer system 606 , subsequent messages may utilize the second TCP connection.
- the first TCP connection may be referred to as the active TCP connection with respect to the RDMA connection
- the second TCP connection may be referred to as the standby TCP connection.
- the active or standby status of a TCP connection may be with respect to a single RDMA connection.
- a second RDMA connection that utilizes the tunnel may utilize the second TCP connection as the active TCP connection, while utilizing the first TCP connection as the standby TCP connection.
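The per-RDMA-connection active/standby roles can be modeled as an ordered preference list for each RDMA connection, where the first working TCP connection carries its messages. This Python sketch is illustrative only: failure detection is reduced to an explicit mark_failed() call, and the connection names are hypothetical.

```python
class TunnelPaths:
    """Per-RDMA-connection active/standby selection among one tunnel's TCP connections.

    Names and the preference-list representation are hypothetical.
    """

    def __init__(self, tcp_connections) -> None:
        self.up = {c: True for c in tcp_connections}
        self.preference = {}  # RDMA connection -> TCP connections, active first, then standbys

    def add_rdma(self, rdma_id: str, order) -> None:
        unknown = set(order) - set(self.up)
        if unknown:
            raise ValueError(f"not part of this tunnel: {sorted(unknown)}")
        self.preference[rdma_id] = list(order)

    def mark_failed(self, tcp_id: str) -> None:
        self.up[tcp_id] = False

    def route(self, rdma_id: str) -> str:
        # The first still-working TCP connection in this RDMA connection's order carries its messages.
        for tcp_id in self.preference[rdma_id]:
            if self.up[tcp_id]:
                return tcp_id
        raise ConnectionError(f"no usable TCP connection for {rdma_id}")


paths = TunnelPaths(["tcp1", "tcp2"])
paths.add_rdma("rdma-A", ["tcp1", "tcp2"])  # tcp1 active, tcp2 standby
paths.add_rdma("rdma-B", ["tcp2", "tcp1"])  # roles reversed for a second RDMA connection
paths.mark_failed("tcp1")
print(paths.route("rdma-A"), paths.route("rdma-B"))  # tcp2 tcp2
```

Giving different RDMA connections different active connections spreads normal traffic across both paths while each still has a standby.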
- the routing of the first TCP connection within the network 204 may differ from the routing of the second TCP connection.
- a first network interface 633 may be coupled to a first access router or switch within the network 204
- a second network interface 632 may be coupled to a second access router or switch within the network 204 .
- consequently, failure of a single component within the network (a single point of failure) may not result in a failure of both the first and second TCP connections.
- the utilization of a plurality of network interfaces at the RNIC 612 may enable the TCP tunnel to transport messages associated with the RDMA connection in the event of a failure of a single network interface 632 or 633 .
- each of the TCP connections within a TCP tunnel should follow a different route, within the network, between the local computer system and the remote computer system.
- the routes may be evaluated by, for example, estimating a distance between a local network address and a remote network address within the network.
- the TCP tunnel may comprise a plurality of TCP connections associated with interfaces located at each RNIC.
- a first TCP connection may be associated with a first network interface located at the first RNIC
- a second TCP connection may be associated with a second network interface located at the first RNIC.
- a third TCP connection may be associated with a first network interface located at the second RNIC
- a fourth TCP connection may be associated with a second network interface located at the second RNIC.
- An RDMA connection may utilize the first TCP connection to transport at least a portion of the plurality of messages. If a failure occurs in the first TCP connection such that the local computer system 602 is unable to continue sending messages to the remote computer system 606 , subsequent messages may utilize the third TCP connection.
- An RDMA connection may comprise state information about the connection. For example, MST-MPA protocol messages sent via the RDMA connection may be sequence numbered.
- the RNICs may exchange information about the state of individual RDMA connections that utilize the respective RNICs. For instance, in the above example, when the RDMA connection utilized the first TCP connection, the first RNIC may maintain state information related to the RDMA connection. The first RNIC may be referred to as the active RNIC with respect to the RDMA connection. The second RNIC, which was utilized when the first TCP connection failed, may be referred to as the standby RNIC with respect to the RDMA connection. The active RNIC may update the standby RNIC with state information related to the RDMA connection. This process of active RNIC to standby RNIC updating of information may be referred to as checkpointing.
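Checkpointing can be illustrated with a Python sketch in which the only per-connection state is the next MST-MPA sequence number. The class and function names are hypothetical, and the text does not say how often checkpoints occur; the example shows why that matters, since anything sent after the last checkpoint is unknown to the standby RNIC.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class RdmaConnState:
    next_send_seq: int = 0  # sequence number for the next MST-MPA message on this RDMA connection


@dataclass
class Rnic:
    name: str
    conns: dict = field(default_factory=dict)  # RDMA connection id -> RdmaConnState


def send(active: Rnic, conn_id: str) -> int:
    """Consume and return the next sequence number on the active RNIC."""
    state = active.conns.setdefault(conn_id, RdmaConnState())
    seq = state.next_send_seq
    state.next_send_seq += 1
    return seq


def checkpoint(active: Rnic, standby: Rnic, conn_id: str) -> None:
    """Push a snapshot (a copy, not a shared reference) of one connection's state to the standby RNIC."""
    standby.conns[conn_id] = copy.deepcopy(active.conns[conn_id])


active, standby = Rnic("first RNIC"), Rnic("second RNIC")
for _ in range(3):
    send(active, "rdma-1")
checkpoint(active, standby, "rdma-1")
send(active, "rdma-1")  # sent after the checkpoint, so the standby has not seen it
print(active.conns["rdma-1"].next_send_seq, standby.conns["rdma-1"].next_send_seq)  # 4 3
```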
- the RDMA connection utilized the first TCP connection, which was associated with the first interface located at the first RNIC, as the active TCP connection. Consequently, the first RNIC was the active RNIC.
- the active or standby status of an RNIC may be with respect to a single RDMA connection.
- a second RDMA connection that utilizes the tunnel may utilize the second RNIC as the active RNIC, while utilizing the first RNIC as the standby RNIC.
- the second RDMA connection may utilize the third TCP connection, which was associated with the first interface located at the second RNIC, as the active TCP connection. In the event of a failure of the third TCP connection, the second RDMA connection may utilize the first TCP connection, for example.
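The four-connection, two-RNIC arrangement and the failover choices in this example can be sketched in Python. The preference for a connection on the other RNIC is an assumption that happens to reproduce the text's example (the first TCP connection fails over to the third, and the third to the first); the text does not state a general selection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TcpConn:
    name: str
    rnic: int       # which local RNIC carries the connection
    interface: int  # which network interface on that RNIC


# The four TCP connections of the example, in the order the text numbers them.
TUNNEL = (
    TcpConn("tcp1", rnic=1, interface=1),
    TcpConn("tcp2", rnic=1, interface=2),
    TcpConn("tcp3", rnic=2, interface=1),
    TcpConn("tcp4", rnic=2, interface=2),
)


def failover_target(failed: TcpConn, conns, down) -> TcpConn:
    """Choose a replacement connection, preferring one on a different RNIC than the failed one."""
    candidates = [c for c in conns if c != failed and c.name not in down]
    other_rnic = [c for c in candidates if c.rnic != failed.rnic]
    pool = other_rnic or candidates
    if not pool:
        raise ConnectionError("no usable TCP connection left in the tunnel")
    return pool[0]


print(failover_target(TUNNEL[0], TUNNEL, down={"tcp1"}).name)  # tcp3
print(failover_target(TUNNEL[2], TUNNEL, down={"tcp3"}).name)  # tcp1
```

Moving to the other RNIC also moves the RDMA connection to the RNIC that has been receiving its checkpointed state.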
- the network interfaces 632 and 633 may be utilized to provide an aggregate increase in the data transfer rate across the network 204 .
- an RDMA connection may utilize the first TCP connection to transport a current portion of a plurality of messages while concurrently utilizing the second TCP connection to transport a subsequent portion of the plurality of messages.
- an n th message, sent via the RDMA connection may utilize the first network interface 633
- an (n+1) th message, also sent via the RDMA connection, may concurrently utilize the second network interface 632 .
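Round-robin assignment is one simple way to send the n-th message on the first interface and the (n+1)-th on the second; the text does not mandate a particular scheduling policy. The Python sketch below assumes each message carries a sequence number so the receiver can restore order, and the interface names are hypothetical.

```python
from itertools import cycle


def stripe(messages, interfaces):
    """Round-robin: message n goes out on interfaces[n % len(interfaces)]."""
    plan = {iface: [] for iface in interfaces}
    for msg, iface in zip(messages, cycle(interfaces)):
        plan[iface].append(msg)
    return plan


def reassemble(plan):
    """Receiver side: merge per-interface arrivals back into sequence order."""
    merged = [m for queue in plan.values() for m in queue]
    return [payload for _, payload in sorted(merged)]


msgs = [(n, f"m{n}") for n in range(5)]  # (sequence number, payload)
plan = stripe(msgs, ["first-interface", "second-interface"])
print(plan["first-interface"])  # [(0, 'm0'), (2, 'm2'), (4, 'm4')]
print(reassemble(plan))         # ['m0', 'm1', 'm2', 'm3', 'm4']
```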
- Probe messages may comprise one or more echo messages as specified by the Internet Control Message Protocol (ICMP), for example.
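The text does not say how probe messages are scheduled or what triggers failover. As an illustration of the ICMP option only, the Python sketch below builds an echo request (type 8, code 0) with the Internet checksum from RFC 1071; sending it would need a raw socket and usually elevated privileges, so that step is left out. The identifier, sequence, and payload values are arbitrary.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071: one's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """ICMP echo request: type 8, code 0, checksum, identifier, sequence number, data."""
    unsummed = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence) + payload
    checksum = internet_checksum(unsummed)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


probe = echo_request(identifier=0x1234, sequence=1, payload=b"probe")
print(internet_checksum(probe))  # 0, so the checksum verifies
```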
- a local TOE 641 may establish a high availability TCP tunnel to a remote TOE 672 .
- the high availability tunnel may comprise a plurality of TCP connections. With respect to an individual RDMA connection that may utilize the TCP tunnel, one of the plurality of TCP connections may be an active TCP connection, while other TCP connections associated with the TCP tunnel may be standby connections.
- the local TOE 641 may send a connection request message to the remote TOE 672 .
- the connection request message may comprise a plurality of elements. Exemplary elements may comprise a tunnel cookie, a maximum number of tunnel connections, and a list of one or more endpoint addresses. Optionally, a maximum endpoint identifier may be specified.
- the maximum endpoint identifier may identify one or more local endpoints 614 b that may utilize the RDMA tunnel.
- the maximum endpoint identifier may correspond to a maximum local port value associated with an application associated with the corresponding local endpoint 614 b .
- the local port value may identify a specific local endpoint 614 b.
- the tunnel cookie may represent an identifier of the TCP tunnel. This value may be useful when subsequently modifying the TCP tunnel. For example, when a subsequent connection request message is issued to add TCP connections to the TCP tunnel, or to remove existing TCP connections from it, the tunnel cookie may be utilized to authenticate the request.
- the maximum number of tunnel connections may represent an indication of the maximum number of TCP connections that may be contained within the established TCP tunnel. The number of TCP connections may be associated with a single RNIC or a plurality of RNICs.
- the list of one or more endpoint identifiers may represent a plurality of local addresses.
- the local addresses may represent local network addresses that may be associated with a network interface located at an RNIC.
- the RNIC may be located at the local computer system 602 .
- each of the one or more endpoint identifiers may be associated with a different network interface and/or different access router or switch corresponding to a different route through the network 204 .
- a first endpoint identifier may be associated with the network interface 633
- a second endpoint identifier may be associated with the network interface 634 .
- the network address may enable the network 204 to properly route TCP connections, and the messages carried within RDMA connections that utilize those TCP connections, between an interface located at the local computer system 602 and a remote computer system 606.
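The elements of the connection request message can be pictured as a record with a serializer. The source lists the elements but not their encoding. The `TunnelConnectRequest` name, the field widths, the flag byte, and the IPv4-only address list below are all hypothetical.

```python
import ipaddress
import struct
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TunnelConnectRequest:
    tunnel_cookie: int                     # identifies the TCP tunnel (64-bit here, assumed)
    max_tunnel_connections: int            # maximum TCP connections in the tunnel
    endpoint_addresses: list[str] = field(default_factory=list)  # one per local interface
    max_endpoint_id: Optional[int] = None  # optional, e.g. a maximum local port value

    def encode(self) -> bytes:
        """Hypothetical layout: cookie(8) max_conns(2) flags(1) n_addrs(1) [max_ep(2)] addrs(4 each)."""
        flags = 1 if self.max_endpoint_id is not None else 0
        out = struct.pack("!QHBB", self.tunnel_cookie, self.max_tunnel_connections,
                          flags, len(self.endpoint_addresses))
        if self.max_endpoint_id is not None:
            out += struct.pack("!H", self.max_endpoint_id)
        for addr in self.endpoint_addresses:
            out += ipaddress.IPv4Address(addr).packed  # 4 bytes per IPv4 address
        return out


req = TunnelConnectRequest(
    tunnel_cookie=0x0123456789ABCDEF,
    max_tunnel_connections=4,
    endpoint_addresses=["192.168.1.17", "192.168.3.17"],
    max_endpoint_id=5000,
)
wire = req.encode()
print(len(wire))  # 22
```

Listing one address per interface gives the remote side a set of distinct routes over which it can accept additional TCP connections for the same tunnel.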
- FIG. 7 is a block diagram of an exemplary system for high availability when utilizing an MST-MPA with a single RNIC, in accordance with an embodiment of the invention.
- Referring to FIG. 7, there is shown a network 204, a local computer system 602, and a TCP tunnel 702.
- the local computer system 602 may comprise an RNIC 612 , a processor 643 , a memory 634 , and network interfaces 633 and 634 .
- the TCP tunnel 702 may comprise a plurality of TCP connections indicated by the reference numbers 1 and 2 .
- the TCP tunnel 702 may comprise a plurality of TCP connections between the local computer system 602 and a remote computer system 606 via the network 204 as illustrated in FIG. 6 .
- the TCP connection 1 may represent an active TCP connection
- the TCP connection 2 may represent a standby TCP connection.
- the active TCP connection may be associated with the network interface 634
- the standby TCP connection may be associated with the network interface 633.
- RDMA frames transported via an RDMA connection may utilize the TCP connection 1 .
- the RDMA connection may be transported across the network 204 via the network interface 634 .
- Various embodiments of the invention may not be limited to utilizing an established TCP connection 2 .
- a new TCP connection may be established within the tunnel.
- the new TCP connection may be established by sending a connection request message that comprises a tunnel cookie that identifies the TCP tunnel 702 , for example.
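Because the tunnel cookie identifies the tunnel, a later request to add a TCP connection can be checked against it before the connection is admitted. This is a minimal sketch under stated assumptions. The `TcpTunnel` class, the 8-byte random cookie, and the limit check are illustrative and are not the procedure defined by MST-MPA.

```python
import hmac
import secrets
from dataclasses import dataclass, field


@dataclass
class TcpTunnel:
    cookie: bytes
    max_connections: int
    # (local address, remote address) pairs of the TCP connections in the tunnel
    connections: list[tuple[str, str]] = field(default_factory=list)

    def add_connection(self, presented_cookie: bytes, local: str, remote: str) -> bool:
        """Admit a new TCP connection only if the request carries this tunnel's cookie."""
        if not hmac.compare_digest(presented_cookie, self.cookie):
            return False  # constant-time comparison; request does not match this tunnel
        if len(self.connections) >= self.max_connections:
            return False  # would exceed the maximum number of tunnel connections
        self.connections.append((local, remote))
        return True


tunnel = TcpTunnel(cookie=secrets.token_bytes(8), max_connections=2)
ok1 = tunnel.add_connection(tunnel.cookie, "192.168.1.17", "192.168.2.18")
bad = tunnel.add_connection(b"\x00" * 8, "192.168.3.17", "192.168.4.18")
ok2 = tunnel.add_connection(tunnel.cookie, "192.168.3.17", "192.168.4.18")
print(ok1, bad, ok2)  # True False True
```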
- FIG. 8 is a block diagram of fault recovery in an exemplary system for high availability when utilizing an MST-MPA with a single RNIC, in accordance with an embodiment of the invention.
- the local computer system 602 may comprise an RNIC 612 , a processor 643 , a memory 634 , and network interfaces 633 and 634 .
- FIG. 8 represents an annotation of FIG. 7 to illustrate a fault recovery response to a failure of an active TCP connection.
- the TCP connection 1 may fail for various reasons, for example, a cable may inadvertently be removed from the network interface 634 , a hardware, software, or firmware failure may occur causing a failure at the network interface 634 , or a failure may occur within the network 204 .
- a failure of the TCP connection 1 may be determined if failures are detected in other TCP connections that utilize the same network interface.
- the failure of the TCP connection 1 may be detected at the RNIC 612 by TCP procedures as specified in applicable TCP specifications.
- the processor 643 within the RNIC 612 may cause the active TCP connection 1 to enter an out-of-service state with respect to the RDMA connection.
- the standby TCP connection 2 may subsequently enter an active state with respect to the RDMA connection.
- Subsequent RDMA frames associated with the RDMA connection may be transported across the network 204 via the network interface 633 .
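The recovery in FIG. 8 is a state transition on the TCP connections of the tunnel, taken with respect to one RDMA connection. The minimal sketch below uses a hypothetical `ConnState` enum and `fail_over` helper. Detection of the failure is assumed to have already happened, for example through TCP retransmission timeouts.

```python
from enum import Enum


class ConnState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    OUT_OF_SERVICE = "out-of-service"


def fail_over(states: dict[int, ConnState], failed: int) -> int:
    """Take the failed TCP connection out of service and promote a standby one.

    `states` maps TCP connection ids in the tunnel to their state with respect
    to one RDMA connection. Returns the id of the newly active TCP connection.
    """
    states[failed] = ConnState.OUT_OF_SERVICE
    for conn_id, state in states.items():
        if state is ConnState.STANDBY:
            states[conn_id] = ConnState.ACTIVE  # updating an existing key is safe here
            return conn_id
    raise RuntimeError("no standby TCP connection; a new one would have to be added to the tunnel")


# FIG. 8: TCP connection 1 (via interface 634) fails; connection 2 (via 633) takes over.
states = {1: ConnState.ACTIVE, 2: ConnState.STANDBY}
print(fail_over(states, failed=1))  # 2
```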
- FIG. 9 is a block diagram illustrating data striping in an exemplary system for high availability when utilizing an MST-MPA with a single RNIC, in accordance with an embodiment of the invention.
- the local computer system 602 may comprise an RNIC 612 , a processor 643 , a memory 634 , and network interfaces 633 and 634 .
- FIG. 9 represents an annotation of FIG. 7 to illustrate data striping.
- Data striping may utilize a plurality of network interfaces to enable information to be transported in an RDMA connection at a data rate that exceeds the data rate of a single network interface.
- the TCP connection 1 may represent an active TCP connection
- the TCP connection 2 may also represent an active TCP connection.
- a portion of RDMA frames from an RDMA connection may be transported via the TCP connection 1
- a subsequent portion of the RDMA frames from the RDMA connection may be concurrently transported via the TCP connection 2 .
- FIG. 10 is a block diagram of an exemplary system for high availability when utilizing an MST-MPA with a duplex RNIC configuration, in accordance with an embodiment of the invention.
- the local computer system 602 may comprise an RNIC 612 a , and an RNIC 612 b .
- the RNIC 612 a may comprise a processor 643 a, a memory 634 a, and network interfaces 633 a and 634 a.
- the RNIC 612 b may comprise a processor 643 b , a memory 634 b , and network interfaces 633 b and 634 b .
- the RNIC 612 b may be referred to as a mate RNIC to the RNIC 612 a .
- the RNIC 612 a may be referred to as a mate RNIC to the RNIC 612 b.
- the TCP tunnel 1002 may comprise a plurality of TCP connections indicated by the reference numbers 1 , 2 , 3 , and 4 .
- the TCP tunnel 1002 may comprise a plurality of TCP connections between the local computer system 602 and a remote computer system 606 via the network 204 as illustrated in FIG. 6 .
- the TCP connection 1 may represent an active TCP connection
- the TCP connection 2 may represent a standby TCP connection.
- the active TCP connection may be associated with the network interface 634 a
- the standby TCP connection may be associated with the network interface 634 b.
- the TCP connection 3 may be associated with the network interface 633 a .
- the TCP connection 4 may be associated with the network interface 633 b .
- the network interfaces 633 a and 634 a may be located at the RNIC 612 a
- the network interfaces 633 b and 634 b may be located at the RNIC 612 b.
- the RNIC 612 a may represent the active RNIC
- the RNIC 612 b may represent the standby RNIC
- RDMA frames transported via an RDMA connection may utilize the TCP connection 1 .
- the RDMA connection may be transported across the network 204 via the network interface 634 a.
- the TCP connections 3 and 4 may be utilized by other RDMA connections.
- TCP connections 1 and 2 may also be utilized by other RDMA connections.
- the processor 643 a located in the RNIC 612 a may checkpoint to the processor 643 b located in the mate RNIC 612 b .
- the checkpointing between the processors, indicated by the reference number 5, may comprise updates on the state of active RDMA connections carried via the respective RNICs.
- the RNIC 612 a may maintain state information related to RDMA connections that utilize active TCP connections associated with network interfaces 633 a and 634 a
- the RNIC 612 b may maintain state information related to RDMA connections that utilize active TCP connections associated with network interfaces 633 b and 634 b .
- the processor 643 a may checkpoint the processor 643 b with state information related to active TCP connections associated with network interfaces 633 a and 634 a .
- the processor 643 b may checkpoint the processor 643 a with state information related to active TCP connections associated with network interfaces 633 b and 634 b.
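The two-way checkpointing between mate RNICs can be sketched as each RNIC mirroring every state change to its mate. The `Rnic` class, its `send_seq`/`recv_seq` fields, and the synchronous in-process copy are hypothetical stand-ins for the checkpointing link 5. A real implementation would carry considerably more TCP and RDMA state and would have to cope with a lost checkpoint.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rnic:
    name: str
    # State of RDMA connections for which this RNIC is active (hypothetical fields).
    own_state: dict[int, dict[str, int]] = field(default_factory=dict)
    # Checkpointed copy of the state held by the mate RNIC.
    mate_shadow: dict[int, dict[str, int]] = field(default_factory=dict)
    mate: Optional["Rnic"] = field(default=None, repr=False, compare=False)

    def update(self, conn_id: int, **fields: int) -> None:
        """Apply a local state change, then checkpoint it to the mate RNIC."""
        self.own_state.setdefault(conn_id, {}).update(fields)
        if self.mate is not None:
            self.mate.mate_shadow[conn_id] = dict(self.own_state[conn_id])

    def take_over(self, conn_id: int) -> None:
        """After the mate fails, promote the checkpointed copy to live state."""
        self.own_state[conn_id] = self.mate_shadow.pop(conn_id)


a, b = Rnic("612a"), Rnic("612b")
a.mate, b.mate = b, a
a.update(7, send_seq=100, recv_seq=40)  # connection 7 is active on 612a
b.update(9, send_seq=5)                 # connection 9 is active on 612b
a.update(7, send_seq=101)
print(b.mate_shadow[7])  # {'send_seq': 101, 'recv_seq': 40}
```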
- FIG. 11 is a block diagram of an exemplary system for high availability when utilizing an MST-MPA with a duplex RNIC configuration, in accordance with an embodiment of the invention.
- the local computer system 602 may comprise an RNIC 612 a , and an RNIC 612 b .
- the RNIC 612 a may comprise a processor 643 a, a memory 634 a, and network interfaces 633 a and 634 a.
- the RNIC 612 b may comprise a processor 643 b , a memory 634 b , and network interfaces 633 b and 634 b .
- the RNIC 612 b may be referred to as a mate RNIC to the RNIC 612 a .
- the RNIC 612 a may be referred to as a mate RNIC to the RNIC 612 b.
- FIG. 11 represents an annotation of FIG. 10 to illustrate a fault recovery response to a failure of an active TCP connection.
- the failure of the TCP connection 1 may be detected at the RNIC 612 a by TCP procedures as specified in applicable TCP specifications.
- the processor 643 a within the RNIC 612 a may cause the active TCP connection 1 to enter an out-of-service state with respect to the RDMA connection.
- the processor 643 a may checkpoint the processor 643 b in the mate RNIC 612 b to indicate the failure of the TCP connection 1 via the checkpointing link 5 .
- the standby TCP connection 2 may subsequently enter an active state with respect to the RDMA connection.
- Subsequent RDMA frames associated with the RDMA connection may be transported across the network 204 via the network interface 634 b .
- Various embodiments of the invention may not be limited to utilizing an established TCP connection 2 .
- a new TCP connection may be established within the tunnel.
- the new TCP connection may be established by sending a connection request message that comprises a tunnel cookie that identifies the TCP tunnel 1002 , for example.
- FIG. 12 is a flowchart illustrating an exemplary process for high availability when utilizing an MST-MPA protocol, in accordance with an embodiment of the invention.
- a local connection point 645 may establish a TCP tunnel 1002 to a remote connection point 676 via a network 204 .
- the local RDMA access point 647 may establish an RDMA connection via an active TCP connection over the TCP tunnel 1002 .
- the local connection point 645 may send RDMA frames via the active TCP connection over the TCP tunnel 1002 .
- Step 1206 may determine whether the local computer system 602 comprises a single RNIC 612 a , or a plurality of RNICs, for example, a duplex configuration comprising a mate RNIC 612 b . If there is no mate RNIC, in step 1208 , the local connection point 645 may detect a failure in the active TCP connection. The local connection point 645 may receive notification of the failure of the active TCP connection from the network interface 633 and/or 634 . In step 1210 , the local connection point 645 may switch the RDMA connection from a current network interface 634 such that subsequent RDMA frames may be transported via a TCP connection associated with a subsequent network interface 633 .
- the RNIC 612 a may checkpoint the mate RNIC 612 b .
- the local connection point 645 may detect a failure in the active TCP connection.
- the local connection point 645 may receive notification of the failure of the active TCP connection from the network interface 633 a and/or 634 a .
- the local connection point 645 may switch the RDMA connection from a current network interface 634 a such that subsequent RDMA frames may be transported via a TCP connection associated with a subsequent network interface 634 b located at the mate RNIC 612 b.
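The branch at step 1206 and the two switching paths can be condensed into one selection function. `Iface`, `pick_failover_iface`, and the preference-ordered candidate list are hypothetical. Failure detection is assumed to have already happened, either through TCP procedures or through a notification from the interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Iface:
    name: str
    rnic: str
    up: bool = True


def pick_failover_iface(current: Iface, candidates: list[Iface],
                        mate_rnic: Optional[str]) -> Iface:
    """Choose the interface for subsequent RDMA frames after `current` fails.

    Duplex configuration (mate_rnic given): use a working interface on the mate.
    Single RNIC (mate_rnic is None): use another working interface on the same RNIC.
    `candidates` is assumed to be listed in configured preference order.
    """
    target_rnic = mate_rnic if mate_rnic is not None else current.rnic
    for iface in candidates:
        if iface is not current and iface.up and iface.rnic == target_rnic:
            return iface
    raise RuntimeError("no usable interface for failover")


# Single RNIC (FIG. 8): 634 fails, traffic moves to 633 on the same RNIC 612.
i634, i633 = Iface("634", "612"), Iface("633", "612")
print(pick_failover_iface(i634, [i634, i633], mate_rnic=None).name)  # 633

# Duplex (FIG. 11): 634 a fails, traffic moves to 634 b on the mate RNIC 612 b.
a634, b634, a633, b633 = (Iface("634a", "612a"), Iface("634b", "612b"),
                          Iface("633a", "612a"), Iface("633b", "612b"))
print(pick_failover_iface(a634, [a634, b634, a633, b633], mate_rnic="612b").name)  # 634b
```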
- aspects of a system for transporting information via a communications system may include a processor 643 that may enable establishing a plurality of TCP communication channels between a local RDMA enabled NIC (RNIC) 612 and at least one of a plurality of remote RNICs 642 .
- Each of the plurality of TCP communication channels may be communicatively coupled to a plurality of different network interfaces at the local RNIC 612 .
- the processor 643 may enable establishing of RDMA connections between one of a plurality of local RDMA endpoints and at least one remote RDMA endpoint utilizing the established plurality of TCP communication channels.
- the processor 643 may enable communicating a portion of a plurality of messages from one of a plurality of local RDMA endpoints communicatively coupled to a first of the plurality of different network interfaces at the local RNIC.
- the portion of the plurality of messages may be communicated to at least one remote RDMA endpoint communicatively coupled to one of the plurality of remote RNICs via a first of the established plurality of TCP communication channels.
- the processor 643 may also enable communicating a remaining portion of the plurality of messages from one of the plurality of local RDMA endpoints communicatively coupled to a second of the plurality of different network interfaces at the local RNIC.
- the remaining portion of the messages may be communicated to at least one remote endpoint via a second of the established plurality of TCP communication channels.
- Each of the plurality of different network interfaces may utilize a different network address.
- the processor 643 may enable placing the first of the plurality of different network interfaces in an out-of-service state prior to communication of the remaining portion of the plurality of messages.
- the first of the plurality of different network interfaces and the second of the plurality of different network interfaces may each be in either an active state or a standby state.
- the processor 643 may enable communicating a message subsequent to the remaining portion of the plurality of messages via said first of the plurality of different network interfaces.
- the first of the plurality of different network interfaces and the second of said plurality of different network interfaces may be associated with said local RNIC.
- the first of the plurality of different network interfaces may be associated with a first local RNIC and the second of said plurality of different network interfaces may be associated with a different local RNIC.
- the present invention may be realized in hardware, software, or a combination of hardware and software.
- the present invention may be realized in a centralized fashion in at least one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited.
- a typical combination of hardware and software may be a general-purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.
- the present invention may also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods.
- Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.
Description
- This application makes reference to, claims priority to, and claims the benefit of U.S. Provisional Application Ser. No. 60/626,283 filed Nov. 8, 2004.
- This application also makes reference to:
- U.S. application Ser. No. ______ (Attorney Docket No. 17036US02) filed on even date herewith; and
- U.S. application Ser. No. ______ (Attorney Docket No. 17097US02) filed on even date herewith.
- Each of the above stated applications is hereby incorporated herein by reference in its entirety.
- Certain embodiments of the invention relate to data communications. More specifically, certain embodiments of the invention relate to a method and system for high availability when utilizing a multi-stream tunneled marker-based protocol data unit (PDU) aligned (MST-MPA) protocol.
- In conventional computing, a single computer system is often utilized to perform operations on data. The operations may be performed by a single processor, or central processing unit (CPU) within the computer. The operations performed on the data may include numerical calculations, or database access, for example. The CPU may perform the operations under the control of a stored program containing executable code. The code may include a series of instructions that may be executed by the CPU that cause the computer to perform specified operations on the data. The capability of a computer in performing operations may variously be measured in units of millions of instructions per second (MIPS), or millions of operations per second (MOPS).
- Historically, increases in computer performance have depended on improvements in integrated circuit technology, a trend often described by "Moore's law". Moore's law postulates that the density, and with it the speed, of integrated circuit devices may increase at a predictable and approximately constant rate over time. However, technology constraints may make such predictable improvements in integrated circuit devices increasingly difficult to sustain.
- Another approach to increasing computer performance implements changes in computer architecture, for example, the introduction of parallel processing. In a parallel processing approach, a computer system may utilize a plurality of CPUs that work together to perform operations on data. Parallel processing computers may offer computing performance that increases as the number of parallel processing CPUs is increased. However, the size and expense of parallel processing computer systems tend to confine them to special-purpose use, which may limit the range of applications in which they may be feasibly or economically utilized.
- An alternative to large parallel processing computer systems is cluster computing. In cluster computing, a plurality of smaller computers, connected via a network, may work together to perform operations on data. Cluster computing systems may be implemented, for example, utilizing relatively low cost, general purpose, personal computers or servers. In a cluster computing environment, computers in the cluster may exchange information across a network similar to the way that parallel processing CPUs exchange information across an internal bus. Cluster computing systems may also scale to include networked supercomputers. The collaborative arrangement of computers working cooperatively to perform operations on data may be referred to as high performance computing (HPC).
- Cluster computing offers the promise of systems with greatly increased computing performance relative to single processor computers by enabling a plurality of processors distributed across a network to work cooperatively to solve computationally intensive computing problems. One aspect of cooperation between computers may include the sharing of information among computers. Remote direct memory access (RDMA) is a method that enables a processor in a local computer to gain direct access to memory in a remote computer across the network. RDMA may provide improved information transfer performance when compared to traditional communications protocols. RDMA has been deployed in local area network (LAN) environments such as InfiniBand, Myrinet, and Quadrics. RDMA, when utilized in wide area network (WAN) and Internet environments, is referred to as RDMA over TCP, RDMA over IP, or RDMA over TCP/IP.
- One of the problems attendant with some distributed cluster computing systems is that the frequent communications between distributed processors may impose a processing burden on the processors. The increase in processor utilization associated with the increasing processing burden may reduce the efficiency of the computing cluster for solving computing problems. The performance of cluster computing systems may be further compromised by bandwidth bottlenecks that may occur when sending and/or receiving data from processors distributed across the network.
- Once a TCP connection is established, it may be bound to a source network address and a destination network address. If either address becomes inaccessible, the corresponding TCP connection may fail. A network address may become inaccessible due to a failure at a single point in the path of the TCP connection between the source and destination.
- Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.
- A system and/or method is provided for high availability when utilizing a multi-stream tunneled marker-based protocol data unit (PDU) aligned (MST-MPA) protocol, substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims.
- These and other advantages, aspects and novel features of the present invention, as well as details of an illustrated embodiment thereof, will be more fully understood from the following description and drawings.
- FIG. 1 a illustrates an exemplary distributed database processing environment, in connection with an embodiment of the invention.
- FIG. 1 b illustrates an exemplary system for multihoming, in connection with an embodiment of the invention.
- FIG. 2 is an illustration of an exemplary conventional write operation from a local node to a remote node, in connection with an embodiment of the invention.
- FIG. 3 is an illustration of an exemplary conventional write operation from a local node to a remote node, in connection with an embodiment of the invention.
- FIG. 4 is an illustration of an exemplary conventional RDMA over TCP protocol stack, in connection with an embodiment of the invention.
- FIG. 5 is an illustration of an exemplary RDMA over TCP protocol stack utilizing SCTP, in connection with an embodiment of the invention.
- FIG. 6 is a block diagram of an exemplary system for an MST-MPA protocol, in accordance with an embodiment of the invention.
- FIG. 7 is a block diagram of an exemplary system for high availability when utilizing an MST-MPA with a single RNIC, in accordance with an embodiment of the invention.
- FIG. 8 is a block diagram of fault recovery in an exemplary system for high availability when utilizing an MST-MPA with a single RNIC, in accordance with an embodiment of the invention.
- FIG. 9 is a block diagram illustrating data striping in an exemplary system for high availability when utilizing an MST-MPA with a single RNIC, in accordance with an embodiment of the invention.
- FIG. 10 is a block diagram of an exemplary system for high availability when utilizing an MST-MPA with a duplex RNIC configuration, in accordance with an embodiment of the invention.
- FIG. 11 is a block diagram of an exemplary system for high availability when utilizing an MST-MPA with a duplex RNIC configuration, in accordance with an embodiment of the invention.
- FIG. 12 is a flowchart illustrating an exemplary process for high availability when utilizing an MST-MPA protocol, in accordance with an embodiment of the invention.
- Certain embodiments of the invention may be found in a method and system for high availability when utilizing a multi-stream tunneled marker-based PDU aligned (MST-MPA) protocol. The invention may comprise a method and a system that may enable reliable communications between cooperating processors in a cluster computing environment while reducing the processing burden in comparison to some conventional approaches to inter-processor communication among processors in the cluster. Various embodiments of the invention may provide high availability that enables fault tolerant reliable communications.
- Various aspects of the invention may provide an exemplary system for transporting information and may comprise a processor that enables establishment of TCP connections or communication channels between a local remote direct memory access (RDMA) enabled network interface card (RNIC) and at least one remote RNIC via at least one network. The processor may enable establishment of at least one RDMA connection between one of a plurality of local RDMA endpoints and at least one remote RDMA endpoint utilizing one or more of the communication channels. The processor may further enable communication of messages via the established RDMA connections between one of the plurality of local RDMA endpoints and at least one remote RDMA endpoint independent of whether the messages are in-sequence or out-of-sequence.
- In various embodiments of the invention, an RDMA connection may be transported, between a local RDMA endpoint and a remote RDMA endpoint, across a network via a TCP tunnel. The TCP tunnel may comprise a plurality of TCP connections that may be logically associated with a single TCP tunnel. The TCP tunnel may also be associated with a plurality of different network interfaces and/or network routes. At least a portion of the plurality of different network interfaces may be associated with at least one RNIC. At least a portion of the plurality of TCP connections may be associated with each of the plurality of different network interfaces. In a fault tolerant system, at least a current portion of a plurality of messages communicated via an RDMA connection may be transported by a current TCP connection associated with a current network interface located at a current RNIC. In the event of a subsequent failure in the current TCP connection, a subsequent portion of the plurality of messages may be communicated via a subsequent TCP connection associated with a different network interface. The subsequent TCP connection may be associated with the same TCP tunnel as the current TCP connection. The different network interface may be located at the current RNIC or at a subsequent RNIC.
- The ability to send a current portion of a plurality of messages via a current interface, and a subsequent portion of the plurality of messages via a subsequent interface may be referred to as multi-homing. Various embodiments of the invention may enable multi-homing to be utilized with RDMA over TCP. TCP may provide mechanisms by which each of a plurality of messages may be delivered to a destination node once, and in the order in which a source node transmitted the messages, when utilizing a single interface. Various embodiments of the invention may provide mechanisms by which each of the plurality of messages may be delivered to the destination node once, and in the order in which the source node sent the messages, when utilizing a plurality of interfaces.
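When messages travel over several interfaces, delivering each message exactly once and in order requires the receiver to hold back early arrivals and discard duplicates. The sketch below assumes a hypothetical per-RDMA-connection message sequence number starting at 0. It illustrates the general reordering idea and is not necessarily how MST-MPA carries ordering information.

```python
import heapq


class Reorderer:
    """Release messages in source order, exactly once, whichever path each took."""

    def __init__(self) -> None:
        self.next_seq = 0                           # next sequence number to deliver
        self.pending: list[tuple[int, bytes]] = []  # min-heap of early arrivals
        self.held: set[int] = set()                 # sequence numbers currently buffered

    def receive(self, seq: int, payload: bytes) -> list[bytes]:
        """Accept one message; return the messages that are now deliverable."""
        if seq < self.next_seq or seq in self.held:
            return []  # duplicate, e.g. a copy resent over another interface
        self.held.add(seq)
        heapq.heappush(self.pending, (seq, payload))
        ready = []
        while self.pending and self.pending[0][0] == self.next_seq:
            _, data = heapq.heappop(self.pending)
            self.held.discard(self.next_seq)
            ready.append(data)
            self.next_seq += 1
        return ready


r = Reorderer()
out = []
# Message 1 arrives before 0 (different paths), and 0 arrives twice.
for seq, data in [(1, b"b"), (0, b"a"), (0, b"a"), (3, b"d"), (2, b"c")]:
    out += r.receive(seq, data)
print(out)  # [b'a', b'b', b'c', b'd']
```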
- FIG. 1 a illustrates an exemplary distributed database processing environment, in connection with an embodiment of the invention. Referring to FIG. 1 a, there is shown a network 102, a plurality of computer systems 104 a, 106 a, 108 a, 110 a, and 112 a, and a plurality of database applications 104 b, 106 b, 108 b, 110 b, and 112 b. The computer systems 104 a, 106 a, 108 a, 110 a, and 112 a may be communicatively coupled via the network 102. One or more of the computer systems 104 a, 106 a, 108 a, 110 a, and 112 a may execute a corresponding database application 104 b, 106 b, 108 b, 110 b, or 112 b.
- In a distributed processing environment, such as distributed database processing, a database application, for example 104 b, may communicate with one or more peer database applications, for example 106 b, 108 b, 110 b, or 112 b, via a network such as the network 102. The operation of the database application 104 b may be considered to be coupled to the operation of one or more of the peer database applications 106 b, 108 b, 110 b, or 112 b.
- In some conventional cluster environments, a cluster application may communicate with a peer cluster application via a network by establishing a network connection between the cluster application and the peer application, exchanging information via the network connection, and subsequently terminating the connection at the end of the information exchange. An exemplary communications protocol that may be utilized to establish a network connection is the Transmission Control Protocol (TCP). RFC 793 discloses communication via TCP and is hereby incorporated herein by reference. An exemplary protocol that may be utilized to route information transported in a network connection across a network is the Internet Protocol (IP). RFC 791 discloses communication via IP and is hereby incorporated herein by reference. An exemplary medium for transporting and routing information across a network is Ethernet, which is defined by Institute of Electrical and Electronics Engineers (IEEE) standard 802.3, which is hereby incorporated herein by reference.
- For example, the database application 104 b may establish a TCP connection to the database application 110 b. The database application 104 b may initiate establishment of the TCP connection by sending a connection establishment request to the peer database application 110 b. The connection establishment request may be routed from the computer system 104 a, across the network 102, to the computer system 110 a, via IP. The peer database application 110 b may respond to the received connection establishment request by sending a connection establishment confirmation to the database application 104 b. The connection establishment confirmation may be routed from the computer system 110 a, across the network 102, to the computer system 104 a, via IP.
- After establishing the TCP connection, the database application 104 b may issue a query to the database application 110 b via the established TCP connection. In response to the query, the database application 110 b may access data stored at the computer system 110 a. The database application 110 b may subsequently send the accessed information to the database application 104 b via the established TCP connection. The database application 104 b may send an acknowledgement of receipt of the accessed data to the database application 110 b via the established TCP connection. The database application 104 b may terminate the established TCP connection by sending a connection terminate indication to the database application 110 b.
- In a cluster environment comprising N computer systems wherein P cluster applications, or software processes, are concurrently executing at each of the computer systems, the number of connections, NC, that may be established across a network at a given time instant may be:
An exemplary cluster environment may comprise 8 computer systems, for example 104 a, wherein 8 cluster applications, for example 104 b, are executing at each of the 8 computer systems. In this exemplary regard, 1,712 connections may be established across a network, for example 102, at a given time instant.
- Many of the connections established in some conventional cluster environments may be transient in nature. This may be true, for example, in transaction oriented cluster environments in which a cluster application may establish a connection when it needs to communicate with a peer cluster application across a network. At the completion of the communication, or transaction, the connection may be terminated. At a subsequent time instant, when the cluster application and the peer cluster application need to communicate, the process of connection establishment, transaction, and connection termination may be repeated. The processing overhead required for maintaining large numbers of connections and/or frequent connection establishment and connection terminations may significantly decrease the processing efficiency of the cluster.
-
FIG. 1 b illustrates an exemplary system for multihoming, in connection with an embodiment of the invention. Referring to FIG. 1 b, there is shown a local node 122, a remote node 124, a local subnet 142, a remote subnet 144, a router 152, and a router 154. The local node 122 may comprise interfaces 132 a and 132 b. The remote node 124 may comprise interfaces 134 a and 134 b. - The local subnet 142 may communicatively couple the
local interface 132 a to the router 152 and to the router 154. Similarly, the local subnet 142 may communicatively couple the local interface 132 b to the router 152 and to the router 154. - The
remote subnet 144 may communicatively couple the remote interface 134 a to the router 152 and to the router 154. Similarly, the remote subnet 144 may communicatively couple the remote interface 134 b to the router 152 and to the router 154. - Each of the interfaces and routers may be associated with at least one network address. For example, the
interface 132 a may be associated with network addresses 192.168.1.17 and 192.168.1.19. The interface 132 b may be associated with network addresses 192.168.3.17 and 192.168.3.19. The interface 134 a may be associated with network addresses 192.168.2.18 and 192.168.2.20. The interface 134 b may be associated with network addresses 192.168.4.18 and 192.168.4.20. The router 152 may be associated with network address 192.168.1.1 at the local subnet 142 and with network address 192.168.2.1 at the remote subnet 144. The router 154 may be associated with network address 192.168.3.1 at the local subnet 142 and with network address 192.168.4.1 at the remote subnet 144. - The
local subnet 142, the remote subnet 144, and the routers 152 and 154 may provide routes between interface 132 a and interface 134 a, between interface 132 a and interface 134 b, between interface 132 b and interface 134 a, and between interface 132 b and interface 134 b. The routes may be utilized, for example, to send an IP frame from a source address 192.168.1.17 located in the local node 122 to a destination address 192.168.2.18 in the remote node 124. - Multihoming may comprise utilizing a plurality of different routes to send information between the
local node 122 and the remote node 124. Information may be sent between the local node 122 and the remote node 124 via IP frames, for example. An IP frame may comprise a source address indicating the sender and a destination address indicating the recipient. The source and destination addresses may be utilized when routing the IP frame between the local node 122 and the remote node 124. A first exemplary route may comprise sending an IP frame from network address 192.168.1.17, via the local subnet 142, to the router 152 at network address 192.168.1.1, and from the router 152 at network address 192.168.2.1, via the remote subnet 144, to the destination address 192.168.2.18. A second exemplary route may comprise sending an IP frame from network address 192.168.3.17, via the local subnet 142, to the router 154 at network address 192.168.3.1, and from the router 154 at network address 192.168.4.1, via the remote subnet 144, to the destination address 192.168.4.18. A third exemplary route may comprise sending an IP frame from network address 192.168.1.19, via the local subnet 142, to the router 152 at network address 192.168.1.1, and from the router 152 at network address 192.168.2.1, via the remote subnet 144, to the destination address 192.168.2.20. A fourth exemplary route may comprise sending an IP frame from network address 192.168.3.19, via the local subnet 142, to the router 154 at network address 192.168.3.1, and from the router 154 at network address 192.168.4.1, via the remote subnet 144, to the destination address 192.168.4.20. -
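The text lists the four routes but does not say how one is chosen when part of the path fails. As a hypothetical illustration of how multihoming can support high availability, the Python sketch below records the four exemplary routes of FIG. 1 b and picks the first route that avoids every failed address. The `Route` type, the `select_route` function, and the first-available failover policy are assumptions made for this example, not part of the described system.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Route:
    src: str      # source address on a local interface (132 a or 132 b)
    ingress: str  # router address on the local subnet 142
    egress: str   # router address on the remote subnet 144
    dst: str      # destination address on a remote interface (134 a or 134 b)

    def addresses(self) -> set:
        return {self.src, self.ingress, self.egress, self.dst}

# The four exemplary routes of FIG. 1 b, in the order listed in the text.
ROUTES = [
    Route("192.168.1.17", "192.168.1.1", "192.168.2.1", "192.168.2.18"),
    Route("192.168.3.17", "192.168.3.1", "192.168.4.1", "192.168.4.18"),
    Route("192.168.1.19", "192.168.1.1", "192.168.2.1", "192.168.2.20"),
    Route("192.168.3.19", "192.168.3.1", "192.168.4.1", "192.168.4.20"),
]

def select_route(routes, failed: set) -> Optional[Route]:
    """Return the first route that uses no failed address, or None if none remains."""
    for route in routes:
        if not route.addresses() & failed:
            return route
    return None
```

For example, if the router 152 fails (so 192.168.1.1 and 192.168.2.1 are unreachable), `select_route` falls back to the second route through the router 154; only when both routers fail does it return `None`.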
FIG. 2 is an illustration of an exemplary conventional write operation from a local node to a remote node, in connection with an embodiment of the invention. Referring to FIG. 2, there is shown a local node 202, a remote node 206, and a network 204. The local node 202 may comprise a system memory 220, a network interface card (NIC) 212, and a processor 214. Within the context of a cluster environment, a local computer system may be referred to as a local node, while a remote computer system may be referred to as a remote node. The system memory 220 may store an application user space 222 and a kernel space 224. The processor 214 may execute an application 210. The NIC 212 may comprise a memory 234. - The
remote node 206 may comprise a system memory 250, an NIC 242, and a processor 244. The system memory 250 may comprise an application user space 252 and/or a kernel space 254. The processor 244 may execute an application 240. The NIC 242 may comprise a memory 264. - The
system memory 220 may comprise suitable logic, circuitry, and/or code that may be utilized to store, or write, and/or retrieve, or read, information, data, and/or executable code. The system memory 220 may comprise a plurality of memory technologies such as random access memory (RAM). The system memory 220 may be utilized to store and/or retrieve data that may be processed by the processor 214. The memory 220 may comprise a computer program or code, which may be executed by the processor 214. - The application user space 222 may comprise a portion of information and/or data that may be utilized by the
application 210. The kernel space 224 may comprise a portion of information, data, and/or code associated with an operating system or other execution environment that provides services that may be utilized by the application 210. The processor 214 may comprise suitable logic, circuitry, and/or code that may be utilized to transmit, receive, and/or process data. The processor 214 may execute an application 210, for example a database application. The application 210 may comprise at least one code section that may be executed by the processor 214. - The network interface chip/card (NIC) 212 may comprise suitable circuitry, logic, and/or code that may transmit and/or receive data from a network, for example, an Ethernet network. The
NIC 212 may be coupled to the network 204. The NIC 212 may process data received and/or transmitted via the network 204. - The
system memory 250 may comprise suitable logic, circuitry, and/or code that may be utilized to store, or write, and/or retrieve, or read, information, data, and/or executable code. The system memory 250 may comprise different types of random access memory (RAM), for example DRAM and/or SRAM. The system memory 250 may be utilized to store and/or retrieve data that may be processed by the processor 244. The memory 250 may store a computer program or code that may be executed by the processor 244. - The
application user space 252 may comprise a portion of information and/or data that may be utilized by the application 240. The kernel space 254 may comprise a portion of information, data, and/or code associated with an operating system or other execution environment that provides services that may be utilized by the application 240. The processor 244 may comprise suitable logic, circuitry, and/or code that may be utilized to transmit, receive, and/or process data. The processor 244 may execute an application 240 or code, such as, for example, a database application. The application 240 may comprise at least one code section that may be executed by the processor 244. The NIC 242 may comprise suitable circuitry, logic, and/or code that may enable transmission and/or reception of data from a network, for example, an Ethernet network. The NIC 242 may be coupled to the network 204. The NIC 242 may process data received and/or transmitted via the network 204. - In operation, the
local node 202 may transfer data to the remote node 206 via the network 204. The data may comprise information that may be transferred from the application user space 222 in the local node 202 to the application user space 252 in the remote node 206. The application 210 may cause the processor 214 to issue instructions to the system memory 220 as illustrated in segment 1 of FIG. 2. The instruction illustrated in segment 1 may cause information stored in the application user space 222 to be transferred to the kernel space 224 as illustrated in segment 2. The information may subsequently be transferred from the kernel space 224 to the NIC memory 234 as illustrated in segment 3. The NIC 212 may cause the information to be transferred from the memory 234 in the local node 202, via the network 204, to the memory 264 within the NIC 242 in the remote node 206 as illustrated in segment 4. The information may be transferred from the NIC memory 264 to the kernel space 254 within the system memory 250 in the remote node 206 as illustrated in segment 5. The information in the kernel space 254 may be transferred to the application user space 252 as illustrated in segment 6. - The remote direct memory access (RDMA) protocol may provide a more efficient method by which a database application, for example, executing at a local computer system may exchange information with a remote computer system across the
network 102. For example, an RDMA-based transfer of information may be accomplished without the intervening step, illustrated in FIG. 2, of transferring the information from application user space to kernel space. - The RDMA protocol may include two basic operations: an RDMA write operation and an RDMA read operation. A third operation is a send/receive operation. The RDMA write operation may be utilized to transfer data from a local computer system to a remote computer system. The RDMA read operation may be utilized to retrieve data from a remote computer system, which may subsequently be stored at the local computer system. For example, the
database application 104 b executing at a local computer system 104 a may attempt to retrieve information stored at a remote computer system 110 a. The database application 104 b may issue an RDMA read instruction that may be sent across the network 102 and received by the remote computer system 110 a. The requested information may subsequently be retrieved from the remote computer system 110 a, transported across the network 102, and stored at the local computer system 104 a. - The
database application 104 b executing at the local computer system 104 a may attempt to transfer information to the remote computer system 110 a by issuing an RDMA write instruction that may be sent from the local computer system 104 a, across the network 102, and received by the remote computer system 110 a. The database application 104 b may subsequently cause the local computer system 104 a to send, across the network 102, information that is then stored at the remote computer system 110 a. -
FIG. 3 is an illustration of an exemplary conventional RDMA write operation from a local node to a remote node, in connection with an embodiment of the invention. Referring to FIG. 3, there is shown a local node 302, a remote node 306, and a network 204. The local node 302 may comprise a system memory 220, an RDMA-enabled network interface card (RNIC) 312, and a processor 214. The system memory 220 may comprise an application user space 222 and/or a kernel space 224. The processor 214 may execute an application 210. The RNIC 312 may comprise an RDMA engine 314 and a memory 234. - The
remote node 306 may comprise a system memory 250, an RNIC 342, and a processor 244. The RNIC 342 may comprise an RDMA engine 344 and a memory 264. The RNIC 312 may comprise suitable circuitry, logic, and/or code that may enable transmission and reception of data from a network, for example, an Ethernet network. The RNIC 312 may be coupled to the network 204. The RNIC 312 may process data received and/or transmitted via the network 204. - The
RDMA engine 314 may comprise suitable logic, circuitry, and/or code that may be utilized to send instructions to the system memory 220 and/or the memory 234 that may result in the transfer of information from the local node 302 to the remote node 306 via the network 204. The RDMA engine 314 may be programmed with a local memory address, a local node address, a remote memory address, a remote node address, and a length. The RDMA engine 314 may then cause a block of information of size length, starting at the local memory address within the system memory 220 of the local node 302 (identified by the local node address), to be transferred via the network 204 to a location starting at the remote memory address within the system memory 250 of the remote node 306 (identified by the remote node address). - The
RNIC 342 may comprise suitable circuitry, logic, and/or code that may transmit and receive data from a network, for example, an Ethernet network. The RNIC 342 may be coupled to the network 204. The RNIC 342 may process data received and/or transmitted via the network 204. - The
RDMA engine 344 may comprise suitable logic, circuitry, and/or code that may be utilized to send instructions to the system memory 250 and/or the memory 264 that may result in the transfer of information from the remote node 306 to the local node 302 via the network 204, as described for the RDMA engine 314. - In operation, the
local node 302 may transfer data to the remote node 306 via the network 204. The data may comprise information that may be transferred from the application user space 222 in the local node 302 to the application user space 252 in the remote node 306. The application 210 may cause the processor 214 to issue instructions to the RDMA engine 314 as illustrated in segment 1 of FIG. 3. The instructions may comprise a local memory address, local node address, remote memory address, remote node address, and length. The instruction illustrated in segment 1 may cause the RDMA engine 314 to issue instructions to the system memory 220 as illustrated in segment 2. The instructions illustrated in segment 2 may cause information stored in the application user space 222 to be transferred to the RNIC memory 234 as illustrated in segment 3. The RNIC 312 may cause the information to be transferred from the memory 234 in the local node 302, via the network 204, to the memory 264 within the RNIC 342 in the remote node 306 as illustrated in segment 4. The information may be transferred from the RNIC memory 264 to the application user space 252 as illustrated in segment 5. -
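The five values with which the RDMA engine 314 may be programmed can be pictured as a single transfer descriptor. The Python sketch below is a hypothetical in-memory model, not an RNIC interface or a real RDMA API: node addresses are dictionary keys, memory addresses are byte offsets into a `bytearray`, and `RdmaWriteDescriptor` and `rdma_write` are names invented for this illustration. It shows the one property the text emphasizes, namely that the block moves from local application memory to remote application memory without any intermediate kernel-space copy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RdmaWriteDescriptor:
    local_node: str   # local node address
    local_addr: int   # local memory address (byte offset into local memory)
    remote_node: str  # remote node address
    remote_addr: int  # remote memory address (byte offset into remote memory)
    length: int       # number of bytes to transfer

def rdma_write(desc: RdmaWriteDescriptor, memories: dict) -> None:
    """Copy desc.length bytes from local memory straight into remote memory.

    `memories` maps a node address to that node's memory (a bytearray).
    The copy is a single slice assignment; no kernel buffer is modeled.
    """
    src = memories[desc.local_node]
    dst = memories[desc.remote_node]
    if desc.length < 0:
        raise ValueError("negative length")
    for addr, mem in ((desc.local_addr, src), (desc.remote_addr, dst)):
        if addr < 0 or addr + desc.length > len(mem):
            raise ValueError("transfer falls outside a memory region")
    dst[desc.remote_addr:desc.remote_addr + desc.length] = (
        src[desc.local_addr:desc.local_addr + desc.length]
    )
```

Bundling the five values into one immutable record mirrors the way the text describes the engine being programmed once per transfer and then carrying out the copy on its own.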
FIG. 4 is an illustration of an exemplary conventional RDMA over TCP protocol stack, in connection with an embodiment of the invention. Referring to FIG. 4, there is shown a conventional RDMA over TCP protocol stack 402. The RDMA over TCP protocol stack 402 may comprise an upper layer protocol 404, an RDMA protocol 406, a direct data placement protocol (DDP) 408, a marker-based PDU aligned protocol (MPA) 410, a TCP 412, an IP 414, and an Ethernet protocol 416. An RNIC may comprise functionality associated with the RDMA protocol 406, DDP 408, MPA protocol 410, TCP 412, IP 414, and Ethernet protocol 416. - The RDMA protocol specifies various methods that may enable a local computer system to exchange information with a remote computer system via a
network 204. The methods may comprise an RDMA read operation and/or an RDMA write operation. The RDMA protocol may also comprise the establishment of an RDMA connection between the local computer system and the remote computer system prior to the exchange of information. An RDMA connection may be established, for example, when the local computer system sends an RDMA connection request message to the remote computer system and, in response, the remote computer system sends an RDMA response message to the local computer system. The local computer system and the remote computer system may subsequently utilize the established RDMA connection to exchange information via the network 204. The exchange of information may comprise the local computer system sending one or more sequence numbered frames to the remote computer system, and/or the remote computer system sending one or more sequence numbered frames to the local computer system. The sequence numbers may indicate a relative ordering among frames. For example, the sequence number in a current frame may indicate, to the receiver of the frame, a relationship between the current frame and a preceding frame and/or a subsequent frame. - The
DDP 408 may enable information to be copied from an application user space in a local computer system to an application user space in a remote computer system without an intermediate copy of the information to kernel space. This may be referred to as a “zero copy” model. The DDP 408 may embed information in each transmitted sequence numbered frame that enables the information contained in the frame to be copied to the application user space in the remote computer system. This copy may be done regardless of whether the current sequence numbered frame is received in sequence or out of sequence, relative to a preceding or subsequent sequence numbered frame sent via the established RDMA connection. - The MPA protocol 410 may comprise methods that enable frames transmitted in an RDMA connection to be transported, via the
network 204, via a TCP connection. The MPA protocol 410 may enable a single TCP connection to carry frames associated with a corresponding single RDMA connection. In the transmitting direction, the MPA protocol 410 may receive a sequence numbered frame associated with an RDMA connection. The MPA protocol 410 may derive information from the received RDMA frame to identify the corresponding RDMA connection, and may then determine the TCP connection associated with that RDMA connection. The MPA protocol 410 may utilize the sequence numbered frame from the RDMA connection, or RDMA sequence numbered frame, to form a TCP packet. The formation of a TCP packet from the RDMA sequence numbered frame may be referred to as encapsulation, for example. The TCP packet may be transmitted, via the network 204, utilizing the corresponding TCP connection. - In the receiving direction, the MPA protocol 410 may receive a TCP packet associated with a TCP connection from the
network 204. The MPA protocol 410 may derive information from the received TCP packet to determine the corresponding RDMA connection associated with the TCP connection. The MPA protocol 410 may extract an RDMA sequence numbered frame from the TCP packet. The extraction of an RDMA sequence numbered frame from the TCP packet may be referred to as decapsulation, for example. At least a portion of the information contained within the received RDMA sequence numbered frame, referred to as a payload, may be copied to the application user space. - The
TCP 412 and IP 414 may comprise methods that enable information to be exchanged via a network according to applicable standards as defined by the Internet Engineering Task Force (IETF). The Ethernet protocol 416 may comprise methods that enable information to be exchanged via a network according to applicable standards as defined by the IEEE. - In operation, the
local node 302 may transfer data to the remote node 306 via the network 204. An upper layer protocol 404 may comprise an application 210 that issues an RDMA write request to write information from the application user space 222 to the application user space 252. The RDMA write request may cause the RDMA protocol 406 to establish an RDMA connection between the local node 302 and the remote node 306. The RDMA protocol 406 may send a connection request message to the remote node 306. In response, the MPA protocol 410 may request that the TCP 412 establish a TCP connection between the local node 302 and the remote node 306. Upon establishment of the TCP connection, the MPA protocol 410 may encapsulate at least a portion of the RDMA connection request message in a TCP packet that may be sent to the remote node 306 via the established TCP connection. The MPA protocol 410 may subsequently receive a TCP packet containing the corresponding RDMA response message. The MPA protocol 410 may decapsulate the TCP packet and send at least a portion of the RDMA response message to the RDMA protocol 406. Accordingly, a TCP connection may be established between the local node 302 and the remote node 306, and may be utilized by the corresponding RDMA connection to exchange information via the network 204. - An
upper layer protocol 404 may be utilized to transfer information from the local node 302 in an RDMA sequence numbered frame to the remote node 306 via the established RDMA connection. At the completion of the information transfer from the local node 302 to the remote node 306, the RDMA connection may be terminated. Correspondingly, the TCP connection utilized in connection with the RDMA connection may also be terminated. - In a conventional RDMA over TCP implementation, the number of RDMA connections may be equal to the number of TCP connections. Consequently, in a cluster environment, the total number of TCP and RDMA connections may be equal to twice the number of connections given by equation [1].
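Returning to the DDP 408 of FIG. 4: what lets a frame be copied into application user space even when it arrives out of sequence is that each frame states where its payload belongs. The Python sketch below illustrates only that idea, using hypothetical `segment` and `place` helpers. It is loosely modeled on the DDP tagged buffer model (RFC 5041 carries a steering tag and a tagged offset in each segment) and is not the DDP wire format.

```python
import random

def segment(message: bytes, max_payload: int):
    """Split a message into (offset, payload) segments.

    Each segment carries the offset at which its payload belongs in the
    receiver's buffer, loosely like a DDP tagged offset.
    """
    return [(off, message[off:off + max_payload])
            for off in range(0, len(message), max_payload)]

def place(buffer: bytearray, segments) -> bytearray:
    """Write each payload directly at its offset, in whatever order it arrives."""
    for offset, payload in segments:
        buffer[offset:offset + len(payload)] = payload
    return buffer

message = b"direct data placement demo"
segments = segment(message, 4)
random.Random(7).shuffle(segments)   # simulate out-of-order arrival
received = place(bytearray(len(message)), segments)
```

Because every payload is written straight to its final position, no reassembly buffer is needed and arrival order does not affect the result.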
- The total number of connections may be reduced if a single TCP connection is utilized to transport information corresponding to a plurality of RDMA connections between the
local node 302 and the remote node 306. In this case, the TCP connection may be utilized as a tunnel. One approach to such tunneling may utilize the Stream Control Transmission Protocol (SCTP). -
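The reduction can be made concrete by counting. The Python sketch below is a hypothetical illustration, not a formula taken from the text: it assumes that, conventionally, each RDMA connection needs its own TCP connection, while with tunneling all RDMA connections between the same pair of nodes share exactly one TCP connection. The function name `connection_totals` and the (node, application) tuples are invented for this example.

```python
from itertools import product

def connection_totals(rdma_connections):
    """Count RDMA plus TCP connections with and without tunneling.

    rdma_connections: iterable of ((node_a, app_a), (node_b, app_b)) pairs.
    Returns (conventional_total, tunneled_total), assuming
      conventional: one TCP connection per RDMA connection;
      tunneled:     one TCP connection per unordered pair of nodes.
    """
    conns = list(rdma_connections)
    node_pairs = {frozenset((a[0], b[0])) for a, b in conns}
    conventional = 2 * len(conns)
    tunneled = len(conns) + len(node_pairs)
    return conventional, tunneled

# Hypothetical example: each of 8 applications on node "A" connected to
# each of 8 applications on node "B" (64 RDMA connections).
example = [(("A", i), ("B", j)) for i, j in product(range(8), range(8))]
```

For that example, `connection_totals(example)` gives 128 connections conventionally but 65 with a single tunnel between the two nodes.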
FIG. 5 is an illustration of an exemplary RDMA protocol stack utilizing SCTP, in connection with an embodiment of the invention. Referring to FIG. 5, there is shown an RDMA protocol stack 502. The protocol stack 502 may comprise an upper layer protocol 404, an RDMA protocol 406, a direct data placement protocol 408, an SCTP 510, an IP 414, and an Ethernet protocol 416. An RNIC may comprise functionality associated with the RDMA protocol 406, DDP 408, SCTP 510, IP 414, and Ethernet protocol 416. - Aspects of the
SCTP 510 may comprise functionality equivalent to the MPA protocol 410 and TCP 412. In addition, the SCTP 510 may allow a single connection to correspond to a plurality of RDMA connections. The SCTP 510 may comprise methods that enable frames transmitted in an RDMA connection to be transported, via the network, through an SCTP association. An SCTP association may comprise functionality comparable to that of a TCP connection. For the purposes of this application, an SCTP association may also be referred to as an SCTP connection. An SCTP connection, however, may incorporate functionality beyond that of a TCP connection that may enable the SCTP connection to be utilized as a tunnel. The SCTP 510 may enable a single SCTP connection to carry frames associated with a corresponding plurality of RDMA connections. -
SCTP 510 may be utilized in the exemplary protocol stack 502 to reduce the total number of connections in a cluster environment in comparison to the exemplary protocol stack 402. One disadvantage of utilizing SCTP 510 is that an RNIC may be required to store executable code with overlapping functionality. For example, a TCP 412 stack may typically be stored in an RNIC. To take advantage of the tunneling capability of SCTP 510, the RNIC may also be required to store executable code for SCTP 510, including code whose functionality substantially overlaps that of TCP 412. In addition, some intermediate nodes within the network 204 may be unable to process packets in an SCTP connection. For example, firewalls and/or port network address translation (PNAT) nodes may be unable to process packets transported in an SCTP connection. - Various embodiments of the invention may provide a method and a system for tunneling a plurality of RDMA connections within a TCP connection. In one aspect, this may enable greater reuse of existing protocol stacks stored in the RNIC while achieving the benefits of tunneling. Various embodiments of the invention may be utilized with existing network infrastructures that comprise firewall nodes, PNAT nodes, and/or devices that implement various security methods within the
network 204. -
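The tunneling idea can be sketched in a few lines of Python. The example is hypothetical and is not the MST-MPA wire format: the header layout, the field widths, the per-(source, destination) sequence check, and the use of CRC-32 from `zlib` as the error check are all assumptions (MPA as specified in RFC 5044 uses CRC32c, which `zlib` does not provide). It borrows the kinds of fields the MST-MPA protocol message is described as carrying for the TOE 641 (frame length, source and destination endpoint identifiers, source sequence number, and an error check), multiplexes frames from several RDMA connections into one byte stream standing in for the TCP connection, and demultiplexes them by destination endpoint on receipt.

```python
import struct
import zlib

# Hypothetical header: frame length, source endpoint, destination endpoint,
# source sequence number. Widths are assumptions; "!" means network byte order.
HEADER = struct.Struct("!HHHI")   # 2 + 2 + 2 + 4 = 10 bytes
CRC = struct.Struct("!I")         # 4-byte error check trailer

def encode(src_ep: int, dst_ep: int, seq: int, frame: bytes) -> bytes:
    """Wrap one RDMA frame in a tunnel message: header + frame + CRC-32."""
    body = HEADER.pack(len(frame), src_ep, dst_ep, seq) + frame
    return body + CRC.pack(zlib.crc32(body))

def decode_stream(stream: bytes) -> dict:
    """Split a tunnel byte stream into frames, demultiplexed by destination.

    Returns {dst_ep: [frame, ...]}. Raises ValueError on an error-check
    mismatch or an unexpected sequence number for a (src, dst) pair.
    """
    frames, next_seq, pos = {}, {}, 0
    while pos < len(stream):
        length, src_ep, dst_ep, seq = HEADER.unpack_from(stream, pos)
        end = pos + HEADER.size + length
        body = stream[pos:end]
        (crc,) = CRC.unpack_from(stream, end)
        if zlib.crc32(body) != crc:
            raise ValueError("error check failed")
        expected = next_seq.get((src_ep, dst_ep), 0)
        if seq != expected:
            raise ValueError(f"sequence number {seq}, expected {expected}")
        next_seq[(src_ep, dst_ep)] = seq + 1
        frames.setdefault(dst_ep, []).append(body[HEADER.size:])
        pos = end + CRC.size
    return frames
```

Here frames for endpoints 7 and 9 can share one stream, and the receiver recovers each endpoint's frames in order; a flipped payload byte or a skipped sequence number is rejected rather than delivered.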
FIG. 6 is a block diagram of an exemplary system for an MST-MPA protocol, in accordance with an embodiment of the invention. Referring to FIG. 6, there is shown a network 204, a local computer system 602, and a remote computer system 606. The local computer system 602 may comprise an RDMA-enabled network interface card (RNIC) 612, a plurality of processors 614 a, 616 a, and 618 a, a plurality of local applications 614 b, 616 b, and 618 b, a system memory 620, and a bus 622. The RNIC 612 may comprise a TCP offload engine (TOE) 641, a memory 634, a plurality of network interfaces 632 and 633, and a bus 636. The TOE 641 may comprise a processor 643, a local connection point 645, and a local RDMA access point 647. The remote computer system 606 may comprise an RNIC 642, a plurality of processors, a plurality of remote applications, a system memory 650, and a bus 652. The RNIC 642 may comprise a TOE 672, a memory 664, a network interface 662, and a bus 666. The TOE 672 may comprise a processor 674, a remote connection point 676, and a remote RDMA access point. - The
processor 614 a may comprise suitable logic, circuitry, and/or code that may be utilized to transmit, receive, and/or process data. The processor 614 a may execute application code, for example a database application. The processor 614 a may be coupled to a bus 622. The processor 614 a may perform protocol processing when transmitting and/or receiving data via the bus 622. - In the transmitting direction, the protocol processing performed by the
processor 614 a may comprise receiving data and/or instructions from an application 614 b, for example. The data may comprise one or more upper layer protocol (ULP) protocol data units (PDUs). The instructions may comprise instructions that cause the processor 614 a to perform tasks related to the RDMA protocol. The instructions may result from function calls from an RDMA application programming interface (API). An instruction may cause the processor 614 a to perform steps to initiate one or more RDMA connections. - In the receiving direction, the protocol processing performed by the
processor 614 a may comprise receiving ULP PDUs via the bus 622 that were received via the RNIC 612. The processor 614 a may perform protocol processing on at least a portion of the ULP PDU received from the RNIC 612 via the bus 622. At least a portion of the ULP PDU may subsequently be utilized by an application 614 b, for example. - The
local application 614 b may comprise a computer program that comprises at least one code section that may be executable by the processor 614 a for causing the processor 614 a to perform steps comprising protocol processing, in accordance with an embodiment of the invention. The processor 616 a may be substantially as described for the processor 614 a. The local application 616 b may be substantially as described for the local application 614 b. The processor 618 a may be substantially as described for the processor 614 a. The local application 618 b may be substantially as described for the local application 614 b. - The
system memory 620 may comprise suitable logic, circuitry, and/or code that may be utilized to store, or write, and/or retrieve, or read, information, data, and/or executable code. The system memory 620 may comprise a plurality of random access memory (RAM) technologies such as, for example, DRAM. The system memory 620 may be utilized to store and/or retrieve data and/or PDUs that may be processed by one or more of the processors 614 a, 616 a, and 618 a. The memory 620 may comprise code that may be executed by one or more of the processors 614 a, 616 a, and 618 a. - The
RNIC 612 may comprise suitable circuitry, logic, and/or code that may transmit and/or receive data from a network, for example, an Ethernet network. The RNIC 612 may be coupled to the network 204. The RNIC 612 may enable the local computer system 602 to utilize RDMA to exchange information with a peer computer system in a cluster environment. The RNIC 612 may process data received and/or transmitted via the network 204. The RNIC 612 may be coupled to the bus 622. The RNIC 612 may process data received and/or transmitted via the bus 622. In the transmitting direction, the RNIC 612 may receive data via the bus 622. The RNIC 612 may process the data received via the bus 622 and transmit the processed data via the network 204. In the receiving direction, the RNIC 612 may receive data via the network 204. The RNIC 612 may process the data received via the network 204 and transmit the processed data via the bus 622. - The
TOE 641 may comprise suitable logic, circuitry, and/or code to receive data via the bus 622 from one or more processors 614 a, 616 a, and/or 618 a. In the transmitting direction, the TOE 641 may receive data via the bus 622. The TOE 641 may perform protocol processing that encapsulates at least a portion of the received data in a protocol data unit (PDU) that may be constructed in accordance with a protocol specification, for example, RDMA. The RDMA PDU may be referred to as an RDMA frame, or frame. The TOE 641 may also perform protocol processing that encapsulates at least a portion of the RDMA frame in a PDU that may be constructed in accordance with a protocol specification, for example, TCP. - The TCP PDU may be referred to as a TCP packet, or packet. The portion of the RDMA frame may in turn be contained in one or more MST-MPA protocol messages. In addition to at least a portion of an RDMA frame, an MST-MPA protocol message may contain frame length, source endpoint identifier, destination endpoint identifier, source sequence number, and/or error check fields. At least a portion of the MST-MPA protocol message may then be contained in a TCP packet. The TCP protocol processing may comprise constructing one or more PDU header fields comprising source and/or destination network addresses and/or source and/or destination port identifiers, and/or computing error check fields. The packet may be transmitted via the bus 636 for subsequent transmission via the
network 204. In various embodiments of the invention, the TOE 641 may associate a plurality of RDMA connections with a TCP connection. The TCP connection may be utilized as a tunnel that transports encapsulated MST-MPA protocol messages, or portions thereof, in TCP packets across the network 204. - In the receiving direction, the
TOE 641 may receive PDUs via the bus 636 that were previously received via the network 204. The TOE 641 may perform TCP protocol processing that decapsulates at least a portion of the PDU received from the network 204 via the bus 636, in accordance with a protocol specification, to extract one or more MST-MPA protocol messages. The TCP protocol processing may comprise verifying one or more PDU header fields comprising source and/or destination network addresses and/or source and/or destination port identifiers, and/or computations to detect and/or correct bit errors in the received PDU. The MST-MPA protocol processing may comprise verifying source and/or destination endpoint identifiers and/or source sequence numbers, and/or computations to detect and/or correct bit errors in the received MST-MPA protocol message. The RDMA frame may be derived from one or more lower layer protocol PDUs, for example, one or more MST-MPA protocol messages. The TOE 641 may perform RDMA protocol processing that decapsulates at least a portion of the RDMA frame to extract data. The RDMA protocol processing may comprise verifying one or more frame header fields comprising frame length, source endpoint identifier, destination endpoint identifier, source sequence number, and/or error check fields. The data may subsequently be processed by the TOE 641 and transmitted via the bus 622. - The
TOE 641 may cause at least a portion of a PDU, received via the bus 636 after arriving via the network 204, to be stored in the memory 634. The TOE 641 may cause at least a portion of a PDU that is to be subsequently transmitted via the network 204 to be stored in the memory 634. The TOE 641 may cause an intermediate result, comprising a PDU or data that has been processed at least in part by the TOE 641, to be stored in the memory 634. - The
memory 634 may comprise suitable logic, circuitry, and/or code that may be utilized to store, or write, and/or retrieve, or read, information, data, and/or executable code. Thememory 634 may comprise a random access memory (RAM) such as DRAM and/or SRAM. Thememory 634 may be utilized to store and/or retrieve data and/or PDUs that may be processed by theTOE 641. Thememory 634 may store code that may be executed by theTOE 641. - The
network interface 632 may comprise suitable logic, circuitry, and/or code that may be utilized to transmit and/or receive PDUs via anetwork 204. The network interface may be coupled to thenetwork 204. Thenetwork interface 632 may be coupled to thebus 636. Thenetwork interface 632 may receive bits via thebus 636. Thenetwork interface 632 may subsequently transmit the bits via thenetwork 204 that may be contained in a representation of a PDU by converting the bits into electrical and/or optical signals, with timing parameters, and with signal amplitude, energy and/or power levels as specified by an appropriate specification for a network medium, for example, Ethernet. Thenetwork interface 632 may also transmit framing information that identifies the start and/or end of a transmitted PDU. - The
network interface 632 may receive bits that may be contained in a PDU received via thenetwork 204 by detecting framing bits indicating the start and/or end of the PDU. Between the indication of the start of the PDU and the end of the PDU, thenetwork interface 632 may receive subsequent bits based on detected electrical and/or optical signals, with timing parameters, and with signal amplitude, energy and/or power levels as specified by an appropriate specification for a network medium, for example, Ethernet. Thenetwork interface 632 may subsequently transmit the bits via thebus 636. Thenetwork interface 633 may be substantially as described fornetwork interface 632. - The
processor 643 may comprise suitable logic, circuitry, and/or code that may be utilized to perform at least a portion of the protocol processing tasks within theTOE 641. - The
local connection point 645 may comprise a computer program and/or code may be executable by theprocessor 643, which may perform RDMA and/or TCP protocol processing. Exemplary protocol processing may comprise establishment of TCP tunnels, in accordance with an embodiment of the invention. - The local
RDMA access point 647 may comprise a computer program that comprises at least one code section that may be executable by theprocessor 643 for causing theprocessor 643 to perform steps comprising protocol processing, for example protocol processing related to the establishment of RDMA connection and/or the association of a plurality of RDMA connections with a corresponding one or more TCP tunnels, in accordance with an embodiment of the invention. - The
processor 644 a may be substantially as described for theprocessor 614 a. Theprocessor 644 a may be coupled to thebus 652. Thelocal application 644 b may be substantially as described for thelocal application 614 b. Theprocessor 646 a may be substantially as described for theprocessor 614 a. Theprocessor 646 a may be coupled to thebus 652. Thelocal application 646 b may be substantially as described for thelocal application 614 b. Theprocessor 648 a may be substantially as described for theprocessor 614 a. Theprocessor 648 a may be coupled to thebus 652. - The
local application 648 b may be substantially as described for thelocal application 614 b. Thesystem memory 650 may be substantially as described for thesystem memory 620. Thesystem memory 650 may be coupled to thebus 652. TheRNIC 642 may be substantially as described for theRNIC 612. TheRNIC 642 may be coupled to thebus 652. The TOE 672 may be substantially as described for theTOE 641. The TOE 672 may be coupled to thebus 652. The TOE 672 may be coupled to thebus 666. Thenetwork interface 662 may be substantially as described for thenetwork interface 632. Thenetwork interface 662 may be coupled to thebus 666. Thememory 664 may be substantially as described for thememory 634. Thememory 664 may be coupled to thebus 666. Theprocessor 674 may be substantially as described for theprocessor 643. Theremote connection point 676 may be substantially as described for thelocal connection point 645. The remoteRDMA access point 677 may be substantially as described for the localRDMA access point 647. - In operation, one or more
local applications remote applications local computer system 602, and theremote computer system 606. The TCP connections may be referred to as communication channels. The plurality of TCP connections may be associated with a TCP tunnel. The TCP tunnel may be associated with a plurality of network interfaces, for example network interfaces 633 and 634 located in theRNIC 612. Any of the plurality of TCP connections associated with the TCP tunnel may be utilized by at least a portion of the plurality of RDMA connections. An individual RDMA connection may utilize at least a portion of the plurality of TCP connections. An individual TCP connection among the plurality of TCP connections may be associated with a single network interface among the plurality of network interfaces. For example, in a TCP tunnel comprising two individual TCP connections, a first TCP connection may be associated with afirst network interface 633, while a second TCP connection may be associated with asecond network interface 634. A TCP connection may be associated with a network interface if information transported across anetwork 204 via the TCP connection utilizes the network interface. An RDMA connection may utilize the first TCP to transport a current portion of a plurality messages, and the second TCP connection to transport a subsequent portion of the plurality of messages. - In a fault tolerant embodiment of the invention that utilizes a
single RNIC 612, the RDMA connection may utilize the first TCP connection to transport at least a portion of the plurality of messages. If a failure occurs in the first TCP connection such that thelocal computer system 602 is unable to continue sending messages to theremote computer system 606, subsequent messages may utilize the second TCP connection. - In the above example, the first TCP connection may be referred to as the active TCP connection with respect to the RDMA connection, while the second TCP connection may be referred to as the standby TCP connection. The active or standby status of a TCP connection may be with respect to a single RDMA connection. For example, a second RDMA connection that utilizes the tunnel may utilize the second TCP connection as the active TCP connection, while utilizing the first TCP connection as the standby TCP connection.
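To make the per-RDMA-connection roles concrete, the following Python sketch models a TCP tunnel whose TCP connections are each bound to one network interface, and records active/standby roles separately for each RDMA connection. All class names, fields, and interface labels are hypothetical illustrations, not structures defined in this document; failure detection itself is assumed to happen elsewhere and is represented only by a `failed` flag.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TcpConnection:
    """One TCP connection of the tunnel, bound to exactly one network interface."""
    conn_id: int
    interface: str          # hypothetical label, e.g. "633" or "634"
    failed: bool = False


@dataclass
class RdmaBinding:
    """Roles of two TCP connections as seen by one RDMA connection."""
    active: int
    standby: int


@dataclass
class TcpTunnel:
    connections: Dict[int, TcpConnection] = field(default_factory=dict)
    bindings: Dict[str, RdmaBinding] = field(default_factory=dict)

    def add_connection(self, conn: TcpConnection) -> None:
        self.connections[conn.conn_id] = conn

    def bind(self, rdma_conn: str, active: int, standby: int) -> None:
        # Roles are recorded per RDMA connection, so two RDMA connections can
        # share the same pair of TCP connections in opposite roles.
        self.bindings[rdma_conn] = RdmaBinding(active, standby)

    def route(self, rdma_conn: str) -> TcpConnection:
        """Return the TCP connection that should carry the next message."""
        binding = self.bindings[rdma_conn]
        if self.connections[binding.active].failed:
            # Fail over for this RDMA connection only: swap active and standby.
            binding.active, binding.standby = binding.standby, binding.active
        current = self.connections[binding.active]
        if current.failed:
            raise ConnectionError(f"no usable TCP connection for {rdma_conn}")
        return current
```

For example, binding `rdma-A` as (active=1, standby=2) and `rdma-B` as (active=2, standby=1), then marking connection 1 as failed, moves only `rdma-A` onto connection 2; `rdma-B` keeps using connection 2 unchanged.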
- The routing of the first TCP connection within the network 204 may differ from the routing of the second TCP connection. In one aspect, a first network interface 633 may be coupled to a first access router or switch within the network 204, while a second network interface 634 may be coupled to a second access router or switch within the network 204. In this regard, the failure of a single component within the network, or a single point of failure, may not result in a failure of both the first and second TCP connections. Similarly, the utilization of a plurality of network interfaces at the RNIC 612 may enable the TCP tunnel to continue transporting messages associated with the RDMA connection in the event of a failure of a single network interface.
- In a fault tolerant embodiment of the invention that utilizes a plurality of RNICs, the TCP tunnel may comprise a plurality of TCP connections associated with interfaces located at each RNIC. For example, in a TCP tunnel comprising four individual TCP connections, a first TCP connection may be associated with a first network interface located at the first RNIC, while a second TCP connection may be associated with a second network interface located at the first RNIC. Furthermore, a third TCP connection may be associated with a first network interface located at the second RNIC, while a fourth TCP connection may be associated with a second network interface located at the second RNIC. An RDMA connection may utilize the first TCP connection to transport at least a portion of the plurality of messages. If a failure occurs in the first TCP connection such that the local computer system 602 is unable to continue sending messages to the remote computer system 606, subsequent messages may utilize the third TCP connection.
- An RDMA connection may comprise state information about the connection. For example, MST-MPA protocol messages sent via the RDMA connection may be sequence numbered. In embodiments of the invention that utilize a plurality of RNICs, the RNICs may exchange information about the state of individual RDMA connections that utilize the respective RNICs. In the above example, when the RDMA connection utilized the first TCP connection, the first RNIC may maintain state information related to the RDMA connection. The first RNIC may be referred to as the active RNIC with respect to the RDMA connection. The second RNIC, which was utilized when the first TCP connection failed, may be referred to as the standby RNIC with respect to the RDMA connection. The active RNIC may update the standby RNIC with state information related to the RDMA connection. This process of updating the standby RNIC with information from the active RNIC may be referred to as checkpointing.
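The checkpointing described above can be sketched as follows, assuming, for illustration only, that the checkpointed state is just the next MST-MPA sequence number of each RDMA connection and that the active RNIC checkpoints after every message. The `Rnic` class and its methods are hypothetical; an actual RNIC would carry more state and might checkpoint less often, which this sketch does not model.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class RdmaConnState:
    """Assumed checkpointed state: the next MST-MPA sequence number to send."""
    rdma_conn: str
    next_seq: int


class Rnic:
    def __init__(self, name: str) -> None:
        self.name = name
        self.active_state: Dict[str, RdmaConnState] = {}  # connections this RNIC is active for
        self.checkpoints: Dict[str, RdmaConnState] = {}   # state mirrored from the mate RNIC
        self.mate: Optional["Rnic"] = None

    def open(self, rdma_conn: str) -> None:
        self.active_state[rdma_conn] = RdmaConnState(rdma_conn, next_seq=0)
        self._checkpoint(rdma_conn)

    def send(self, rdma_conn: str) -> int:
        """Assign a sequence number to the next message, then checkpoint the new state."""
        state = self.active_state[rdma_conn]
        self.active_state[rdma_conn] = replace(state, next_seq=state.next_seq + 1)
        self._checkpoint(rdma_conn)
        return state.next_seq

    def _checkpoint(self, rdma_conn: str) -> None:
        # Active RNIC -> standby RNIC update ("checkpointing").
        if self.mate is not None:
            self.mate.checkpoints[rdma_conn] = self.active_state[rdma_conn]

    def take_over(self, rdma_conn: str) -> None:
        """Become the active RNIC for rdma_conn, resuming from the last checkpoint."""
        self.active_state[rdma_conn] = self.checkpoints.pop(rdma_conn)
```

Because the state is checkpointed after every message in this sketch, the standby RNIC resumes numbering exactly where the active RNIC stopped; with less frequent checkpoints, messages sent after the last checkpoint would need some recovery procedure that this document does not describe.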
- In the above example, the RDMA connection utilized the first TCP connection, which was associated with the first interface located at the first RNIC, as the active TCP connection. Consequently, the first RNIC was the active RNIC. The active or standby status of an RNIC may be with respect to a single RDMA connection. For example, a second RDMA connection that utilizes the tunnel may utilize the second RNIC as the active RNIC, while utilizing the first RNIC as the standby RNIC. The second RDMA connection may utilize the third TCP connection, which was associated with the first interface located at the second RNIC, as the active TCP connection. In the event of a failure of the third TCP connection, the second RDMA connection may utilize the first TCP connection, for example.
- In a data striping embodiment of the invention, the network interfaces 633 and 634 may be utilized to provide an aggregate increase in the data transfer rate across the network 204. For example, an RDMA connection may utilize the first TCP connection to transport a current portion of a plurality of messages while concurrently utilizing the second TCP connection to transport a subsequent portion of the plurality of messages. Specifically, an nth message sent via the RDMA connection may utilize the first network interface 633, while an (n+1)th message, also sent via the RDMA connection, may concurrently utilize the second network interface 634.
- Once the failure of a TCP connection within the TCP tunnel is detected, a new TCP connection may be established within the tunnel as a replacement for the failed TCP connection. Furthermore, the RNIC associated with the failed TCP connection may send probe messages into the network 204 to derive an indication of when the TCP connection failure may have ended. Probe messages may comprise, for example, one or more echo messages as specified by the Internet Control Message Protocol (ICMP).
- U.S. application Ser. No. ______ (Attorney Docket No. 17036US02) filed on an even date herewith, provides a detailed description of procedures for establishment of a communication channel, utilizing a TCP connection that may be utilized as a tunnel, and is hereby incorporated by reference in its entirety.
- U.S. application Ser. No. ______ (Attorney Docket No. 17097US02) filed on an even date herewith, provides a detailed description of procedures for establishment of an RDMA connection that utilizes a TCP tunnel, and is hereby incorporated by reference in its entirety.
- In various embodiments of the invention, a local TOE 641 may establish a high availability TCP tunnel to a remote TOE 672. The high availability tunnel may comprise a plurality of TCP connections. With respect to an individual RDMA connection that may utilize the TCP tunnel, one of the plurality of TCP connections may be an active TCP connection, while the other TCP connections associated with the TCP tunnel may be standby connections. The local TOE 641 may send a connection request message to the remote TOE 672. The connection request message may comprise a plurality of elements, for example, a tunnel cookie, a maximum number of tunnel connections, and a list of one or more endpoint addresses. Optionally, a maximum endpoint identifier may be specified. The maximum endpoint identifier may identify one or more local endpoints 614 b that may utilize the TCP tunnel. The maximum endpoint identifier may correspond to a maximum local port value associated with an application associated with the corresponding local endpoint 614 b. The local port value may identify a specific local endpoint 614 b.
- The tunnel cookie may represent an identifier of the TCP tunnel. This value may be useful when subsequently modifying the TCP tunnel. For example, when a subsequent connection request message is issued to add TCP connections to, or remove existing TCP connections from, the TCP tunnel, the tunnel cookie may be utilized to authenticate the request. The maximum number of tunnel connections may indicate the maximum number of TCP connections that may be contained within the established TCP tunnel. These TCP connections may be associated with a single RNIC or with a plurality of RNICs.
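A minimal sketch of the connection request message and of cookie-based authentication of later add/remove requests follows. The field names, the 8-byte cookie generated with Python's `secrets` module, and the counting policy in `TunnelRegistry` are assumptions made for illustration; the document names the message elements but does not define an encoding or a receiver-side data structure.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ConnectionRequest:
    """Elements listed in the text; names, types and encoding are assumptions."""
    tunnel_cookie: bytes
    max_tunnel_connections: int
    endpoint_addresses: Tuple[str, ...]
    max_endpoint_id: Optional[int] = None  # optional maximum local port value


def new_cookie() -> bytes:
    """Hypothetical 8-byte random tunnel identifier."""
    return secrets.token_bytes(8)


class TunnelRegistry:
    """Receiver-side record used to authenticate later add/remove requests by cookie."""

    def __init__(self) -> None:
        self._requests: Dict[bytes, ConnectionRequest] = {}
        self._counts: Dict[bytes, int] = {}

    def accept(self, req: ConnectionRequest) -> None:
        # Assumes the initial request arrives on the tunnel's first TCP connection.
        self._requests[req.tunnel_cookie] = req
        self._counts[req.tunnel_cookie] = 1

    def _lookup(self, cookie: bytes) -> ConnectionRequest:
        req = self._requests.get(cookie)
        if req is None:
            raise PermissionError("unknown tunnel cookie")
        return req

    def add_connection(self, cookie: bytes) -> int:
        """Add a TCP connection to an existing tunnel; returns the new count."""
        req = self._lookup(cookie)
        if self._counts[cookie] >= req.max_tunnel_connections:
            raise ValueError("tunnel already holds its maximum number of TCP connections")
        self._counts[cookie] += 1
        return self._counts[cookie]

    def remove_connection(self, cookie: bytes) -> int:
        """Remove a TCP connection from an existing tunnel; returns the new count."""
        self._lookup(cookie)
        if self._counts[cookie] == 0:
            raise ValueError("tunnel has no TCP connections to remove")
        self._counts[cookie] -= 1
        return self._counts[cookie]
```

A request carrying an unknown cookie is rejected before any change is made, and the maximum number of tunnel connections from the original request bounds later additions.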
- The list of one or more endpoint identifiers may represent a plurality of local addresses. The local addresses may represent local network addresses that may be associated with a network interface located at an RNIC. The RNIC may be located at the local computer system 602. In various embodiments of the invention, each of the one or more endpoint identifiers may be associated with a different network interface and/or a different access router or switch, corresponding to a different route through the network 204. For example, in a connection request message comprising two endpoint identifiers, a first endpoint identifier may be associated with the network interface 633, while a second endpoint identifier may be associated with the network interface 634. The network address may enable the network 204 to properly route the TCP connections, and the messages carried within RDMA connections that utilize the TCP connections, between an interface located at the local computer system 602 and the remote computer system 606.
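The idea that each endpoint address should map to a different network interface and access router can be sketched as a greedy selection over candidate local endpoints. The `LocalEndpoint` record, its labels, and the greedy policy are hypothetical; the sketch only checks address syntax with the standard `ipaddress` module and does not consult real routing state.

```python
import ipaddress
from typing import List, NamedTuple, Set


class LocalEndpoint(NamedTuple):
    address: str        # local network address to list in the connection request
    interface: str      # hypothetical label of the RNIC network interface
    access_router: str  # hypothetical label of the first-hop router or switch


def diverse_endpoints(candidates: List[LocalEndpoint]) -> List[str]:
    """Greedily pick addresses so that no two share an interface or an access router."""
    chosen: List[str] = []
    used_ifaces: Set[str] = set()
    used_routers: Set[str] = set()
    for ep in candidates:
        ipaddress.ip_address(ep.address)  # raises ValueError for a malformed address
        if ep.interface in used_ifaces or ep.access_router in used_routers:
            continue  # would share a single point of failure with an earlier pick
        chosen.append(ep.address)
        used_ifaces.add(ep.interface)
        used_routers.add(ep.access_router)
    return chosen
```

A greedy pass is enough to illustrate the single-point-of-failure argument; it does not search for the largest possible diverse set.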
- FIG. 7 is a block diagram of an exemplary system for high availability when utilizing an MST-MPA with a single RNIC, in accordance with an embodiment of the invention. Referring to FIG. 7, there is shown a network 204, a local computer system 602, and a TCP tunnel 702. The local computer system 602 may comprise an RNIC 612, a processor 643, a memory 634, and network interfaces 633 and 634.
- The TCP tunnel 702 may comprise a plurality of TCP connections, indicated by the reference numbers 1 and 2, between the local computer system 602 and a remote computer system 606 via the network 204, as illustrated in FIG. 6. With reference to an RDMA connection that may utilize the TCP tunnel 702, the TCP connection 1 may represent an active TCP connection, while the TCP connection 2 may represent a standby TCP connection. The active TCP connection may be associated with the network interface 634, while the standby TCP connection may be associated with the network interface 633. RDMA frames transported via an RDMA connection may utilize the TCP connection 1, and the RDMA connection may be transported across the network 204 via the network interface 634. Various embodiments of the invention are not limited to utilizing an already established TCP connection 2. For example, upon failure of the TCP connection 1, a new TCP connection may be established within the tunnel. The new TCP connection may be established by sending a connection request message that comprises a tunnel cookie that identifies the TCP tunnel 702, for example.
- FIG. 8 is a block diagram of fault recovery in an exemplary system for high availability when utilizing an MST-MPA with a single RNIC, in accordance with an embodiment of the invention. Referring to FIG. 8, there is shown a network 204, a local computer system 602, and a TCP tunnel 702. The local computer system 602 may comprise an RNIC 612, a processor 643, a memory 634, and network interfaces 633 and 634.
- FIG. 8 represents an annotation of FIG. 7 to illustrate a fault recovery response to a failure of an active TCP connection. The TCP connection 1 may fail for various reasons; for example, a cable may inadvertently be removed from the network interface 634, a hardware, software, or firmware failure may occur at the network interface 634, or a failure may occur within the network 204. A failure of the TCP connection 1 may also be inferred if failures are detected in other TCP connections that utilize the same network interface. The failure of the TCP connection 1 may be detected at the RNIC 612 by TCP procedures as specified in applicable TCP specifications. Upon detection of the failure of the TCP connection at the network interface 634, the processor 643 within the RNIC 612 may cause the active TCP connection 1 to enter an out-of-service state with respect to the RDMA connection. The standby TCP connection 2 may subsequently enter an active state with respect to the RDMA connection, and subsequent RDMA frames associated with the RDMA connection may be transported across the network 204 via the network interface 633.
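The state transitions of FIG. 8 can be sketched as a small state machine kept per RDMA connection. One assumption is made explicit here: a TCP failure is treated as a failure of its network interface, so every TCP connection on that interface is taken out of service, which is one reading of the inference described above. The class and state names are hypothetical.

```python
from enum import Enum
from typing import Dict


class State(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    OUT_OF_SERVICE = "out-of-service"


class RdmaPath:
    """States of a tunnel's TCP connections with respect to one RDMA connection."""

    def __init__(self, iface_of: Dict[int, str], active: int) -> None:
        self.iface_of = iface_of  # TCP connection id -> network interface label
        self.state = {c: (State.ACTIVE if c == active else State.STANDBY) for c in iface_of}

    def active(self) -> int:
        return next(c for c, s in self.state.items() if s is State.ACTIVE)

    def on_failure(self, failed_conn: int) -> None:
        """Treat a TCP failure as an interface failure and promote a standby if needed."""
        bad_iface = self.iface_of[failed_conn]
        on_bad = [c for c, i in self.iface_of.items() if i == bad_iface]
        lost_active = any(self.state[c] is State.ACTIVE for c in on_bad)
        for c in on_bad:
            self.state[c] = State.OUT_OF_SERVICE
        if lost_active:
            standby = next((c for c, s in self.state.items() if s is State.STANDBY), None)
            if standby is None:
                raise ConnectionError("no standby TCP connection remains")
            self.state[standby] = State.ACTIVE
```

With connections 1 and 3 on interface 634 and connection 2 on interface 633, a failure reported on connection 3 takes connection 1 out of service as well and promotes connection 2, matching the switch to the network interface 633 described for FIG. 8.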
- FIG. 9 is a block diagram illustrating data striping in an exemplary system for high availability when utilizing an MST-MPA with a single RNIC, in accordance with an embodiment of the invention. Referring to FIG. 9, there is shown a network 204, a local computer system 602, and a TCP tunnel 702. The local computer system 602 may comprise an RNIC 612, a processor 643, a memory 634, and network interfaces 633 and 634.
- FIG. 9 represents an annotation of FIG. 7 to illustrate data striping. Data striping may utilize a plurality of network interfaces to enable information to be transported in an RDMA connection at a data rate that exceeds the data rate of a single network interface. In a data striping configuration, with reference to an RDMA connection that may utilize the TCP tunnel 702, both the TCP connection 1 and the TCP connection 2 may represent active TCP connections. A portion of the RDMA frames from the RDMA connection may be transported via the TCP connection 1, while a subsequent portion of the RDMA frames from the same RDMA connection may be concurrently transported via the TCP connection 2.
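Data striping, as in the nth/(n+1)th message example given earlier for the network interfaces 633 and 634, can be sketched as round-robin assignment of one RDMA connection's messages to the tunnel's active TCP connections. Because separate TCP connections deliver independently, the sketch tags each message with a sequence number and reorders on receipt; that reordering step is an assumption based on the sequence-numbered MST-MPA messages mentioned earlier, not a procedure this document specifies. Connection labels are hypothetical.

```python
from itertools import cycle
from typing import Iterable, Iterator, List, Sequence, Tuple


def stripe(messages: Iterable[bytes],
           connections: Sequence[str]) -> Iterator[Tuple[str, int, bytes]]:
    """Assign consecutive messages of one RDMA connection to TCP connections in turn.

    Each message is tagged with a sequence number so the receiver can restore
    the original order across independently delivered TCP connections.
    """
    if not connections:
        raise ValueError("striping needs at least one TCP connection")
    for seq, (conn, msg) in enumerate(zip(cycle(connections), messages)):
        yield conn, seq, msg


def reassemble(received: Iterable[Tuple[int, bytes]]) -> List[bytes]:
    """Restore message order from (sequence number, message) pairs in arrival order."""
    return [msg for _, msg in sorted(received, key=lambda item: item[0])]
```

With two connections, message n goes to the first and message n+1 to the second, so the aggregate rate can approach the sum of both interfaces while the sequence numbers keep the RDMA connection's message order intact.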
- FIG. 10 is a block diagram of an exemplary system for high availability when utilizing an MST-MPA with a duplex RNIC configuration, in accordance with an embodiment of the invention. Referring to FIG. 10, there is shown a network 204, a local computer system 602, and a TCP tunnel 1002. The local computer system 602 may comprise an RNIC 612 a and an RNIC 612 b. The RNIC 612 a may comprise a processor 643 a, a memory 634 a, and network interfaces 633 a and 634 a. The RNIC 612 b may comprise a processor 643 b, a memory 634 b, and network interfaces 633 b and 634 b. The RNIC 612 b may be referred to as a mate RNIC to the RNIC 612 a, and the RNIC 612 a may be referred to as a mate RNIC to the RNIC 612 b.
- The TCP tunnel 1002 may comprise a plurality of TCP connections, indicated by the reference numbers 1, 2, 3, and 4, between the local computer system 602 and a remote computer system 606 via the network 204, as illustrated in FIG. 6. With reference to an RDMA connection that may utilize the TCP tunnel 1002, the TCP connection 1 may represent an active TCP connection, while the TCP connection 2 may represent a standby TCP connection. The active TCP connection may be associated with the network interface 634 a, while the standby TCP connection may be associated with the network interface 634 b. The TCP connection 3 may be associated with the network interface 633 a, and the TCP connection 4 may be associated with the network interface 633 b. The network interfaces 633 a and 634 a may be located at the RNIC 612 a, while the network interfaces 633 b and 634 b may be located at the RNIC 612 b.
- With respect to the RDMA connection, the RNIC 612 a may represent an active RNIC, while the RNIC 612 b may represent a standby RNIC. RDMA frames transported via the RDMA connection may utilize the TCP connection 1, and the RDMA connection may be transported across the network 204 via the network interface 634 a. The TCP connections 3 and 4 may be utilized by other RDMA connections. TCP connections
- The processor 643 a located in the RNIC 612 a may checkpoint to the processor 643 b located in the mate RNIC 612 b. The checkpointing between the processors, indicated by the reference number 5, may comprise updates on the state of active RDMA connections carried via the respective RNICs. For example, the RNIC 612 a may maintain state information related to RDMA connections that utilize active TCP connections associated with the network interfaces 633 a and 634 a, and the RNIC 612 b may maintain state information related to RDMA connections that utilize active TCP connections associated with the network interfaces 633 b and 634 b. The processor 643 a may checkpoint the processor 643 b with state information related to active TCP connections associated with the network interfaces 633 a and 634 a. Similarly, the processor 643 b may checkpoint the processor 643 a with state information related to active TCP connections associated with the network interfaces 633 b and 634 b.
- FIG. 11 is a block diagram of an exemplary system for high availability when utilizing an MST-MPA with a duplex RNIC configuration, in accordance with an embodiment of the invention. Referring to FIG. 11, there is shown a network 204, a local computer system 602, and a TCP tunnel 1002. The local computer system 602 may comprise an RNIC 612 a and an RNIC 612 b. The RNIC 612 a may comprise a processor 643 a, a memory 634 a, and network interfaces 633 a and 634 a. The RNIC 612 b may comprise a processor 643 b, a memory 634 b, and network interfaces 633 b and 634 b. The RNIC 612 b may be referred to as a mate RNIC to the RNIC 612 a, and the RNIC 612 a may be referred to as a mate RNIC to the RNIC 612 b.
- FIG. 11 represents an annotation of FIG. 10 to illustrate a fault recovery response to a failure of an active TCP connection. The failure of the TCP connection 1 may be detected at the RNIC 612 a by TCP procedures as specified in applicable TCP specifications. Upon detection of the failure of the TCP connection at the network interface 634 a, the processor 643 a within the RNIC 612 a may cause the active TCP connection 1 to enter an out-of-service state with respect to the RDMA connection. The processor 643 a may checkpoint the processor 643 b in the mate RNIC 612 b via the checkpointing link 5 to indicate the failure of the TCP connection 1. The standby TCP connection 2 may subsequently enter an active state with respect to the RDMA connection, and subsequent RDMA frames associated with the RDMA connection may be transported across the network 204 via the network interface 634 b. Various embodiments of the invention are not limited to utilizing an already established TCP connection 2. For example, upon failure of the TCP connection 1, a new TCP connection may be established within the tunnel. The new TCP connection may be established by sending a connection request message that comprises a tunnel cookie that identifies the TCP tunnel 1002, for example.
- FIG. 12 is a flowchart illustrating an exemplary process for high availability when utilizing an MST-MPA protocol, in accordance with an embodiment of the invention. Referring to FIG. 12, in step 1202, a local connection point 645 may establish a TCP tunnel 1002 to a remote connection point 676 via a network 204. In step 1204, the local RDMA access point 647 may establish an RDMA connection via an active TCP connection over the TCP tunnel 1002. In step 1205, the local connection point 645 may send RDMA frames via the active TCP connection over the TCP tunnel 1002. In step 1206, it may be determined whether the local computer system 602 comprises a single RNIC 612 a or a plurality of RNICs, for example, a duplex configuration comprising a mate RNIC 612 b. If there is no mate RNIC, in step 1208, the local connection point 645 may detect a failure in the active TCP connection. The local connection point 645 may receive notification of the failure of the active TCP connection from the network interface 633 and/or 634. In step 1210, the local connection point 645 may switch the RDMA connection from a current network interface 634 such that subsequent RDMA frames may be transported via a TCP connection associated with a subsequent network interface 633.
- If there is a mate RNIC, in step 1212, the RNIC 612 a may checkpoint the mate RNIC 612 b. In step 1214, the local connection point 645 may detect a failure in the active TCP connection. The local connection point 645 may receive notification of the failure of the active TCP connection from the network interface 633 a and/or 634 a. In step 1216, the local connection point 645 may switch the RDMA connection from a current network interface 634 a such that subsequent RDMA frames may be transported via a TCP connection associated with a subsequent network interface 634 b located at the mate RNIC 612 b.
- Aspects of a system for transporting information via a communications system may include a processor 643 that may enable establishing a plurality of TCP communication channels between a local RDMA enabled NIC (RNIC) 612 and at least one of a plurality of remote RNICs 642. Each of the plurality of TCP communication channels may be communicatively coupled to a plurality of different network interfaces at the local RNIC 612. The processor 643 may enable establishing RDMA connections between one of a plurality of local RDMA endpoints and at least one remote RDMA endpoint utilizing the established plurality of TCP communication channels. The processor 643 may enable communicating a portion of a plurality of messages from one of the plurality of local RDMA endpoints communicatively coupled to a first of the plurality of different network interfaces at the local RNIC. The portion of the plurality of messages may be communicated to at least one remote RDMA endpoint communicatively coupled to one of the plurality of remote RNICs via a first of the established plurality of TCP communication channels. The processor 643 may also enable communicating a remaining portion of the plurality of messages from one of the plurality of local RDMA endpoints communicatively coupled to a second of the plurality of different network interfaces at the local RNIC. The remaining portion of the messages may be communicated to at least one remote endpoint via a second of the established plurality of TCP communication channels.
- Each of the plurality of different network interfaces may utilize a different network address. The processor 643 may enable placing the first of the plurality of different network interfaces in an out-of-service state prior to communication of the remaining portion of the plurality of messages. The first and the second of the plurality of different network interfaces may each be in either an active state or a standby state. The processor 643 may enable communicating a message subsequent to the remaining portion of the plurality of messages via the first of the plurality of different network interfaces. The first and the second of the plurality of different network interfaces may both be associated with the local RNIC. Alternatively, the first of the plurality of different network interfaces may be associated with a first local RNIC, and the second of the plurality of different network interfaces may be associated with a different local RNIC.
- Accordingly, the present invention may be realized in hardware, software, or a combination of hardware and software. The present invention may be realized in a centralized fashion in at least one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software may be a general-purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.
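The FIG. 12 flow can be traced with a small simulation. It is a sketch under stated assumptions: interface and RNIC labels are hypothetical, `interfaces[0]` carries the initially active TCP connection, a failure is injected before a chosen frame, and checkpointing (step 1212) is modeled as a log entry per frame rather than as a real state transfer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interface:
    name: str    # e.g. "634a"; labels are illustrative
    rnic: str    # RNIC that hosts the interface, e.g. "612a"
    up: bool = True


def run_fig12(interfaces: List[Interface], has_mate: bool,
              frames: int, fail_at: Optional[int] = None) -> List[str]:
    """Simulate the FIG. 12 flow; interfaces[0] carries the initially active TCP connection."""
    log = ["1202: TCP tunnel established",
           "1204: RDMA connection set up on the active TCP connection",
           f"1206: mate RNIC present = {has_mate}"]
    current = interfaces[0]
    for n in range(frames):
        if n == fail_at:
            current.up = False  # injected failure of the active TCP connection
        if not current.up:
            log.append(f"{1214 if has_mate else 1208}: failure detected on {current.name}")
            # With a mate (1216), move to an interface on the other RNIC;
            # without one (1210), move to another interface on the same RNIC.
            survivors = [i for i in interfaces
                         if i.up and (i.rnic != current.rnic) == has_mate]
            if not survivors:
                raise ConnectionError("no surviving network interface")
            current = survivors[0]
            log.append(f"{1216 if has_mate else 1210}: switched to {current.name}")
        if has_mate:
            log.append(f"1212: checkpoint state for frame {n} to mate RNIC")
        log.append(f"1205: frame {n} sent via {current.name}")
    return log
```

For a duplex configuration with interfaces 634a and 633a on RNIC 612a and 634b on RNIC 612b, a failure injected before frame 1 switches the RDMA connection to 634b, mirroring step 1216; with a single RNIC the switch stays on that RNIC, as in step 1210.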
- The present invention may also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.
- While the present invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present invention without departing from its scope. Therefore, it is intended that the present invention not be limited to the particular embodiment disclosed, but that the present invention will include all embodiments falling within the scope of the appended claims.
Claims (21)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/269,062 US20060168274A1 (en) | 2004-11-08 | 2005-11-08 | Method and system for high availability when utilizing a multi-stream tunneled marker-based protocol data unit aligned protocol |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US62628304P | 2004-11-08 | 2004-11-08 | |
US11/269,062 US20060168274A1 (en) | 2004-11-08 | 2005-11-08 | Method and system for high availability when utilizing a multi-stream tunneled marker-based protocol data unit aligned protocol |
Publications (1)
Publication Number | Publication Date |
---|---|
US20060168274A1 true US20060168274A1 (en) | 2006-07-27 |
Family
ID=36698363
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/269,062 Abandoned US20060168274A1 (en) | 2004-11-08 | 2005-11-08 | Method and system for high availability when utilizing a multi-stream tunneled marker-based protocol data unit aligned protocol |
Country Status (1)
Country | Link |
---|---|
US (1) | US20060168274A1 (en) |
US20160094608A1 (en) * | 2014-09-30 | 2016-03-31 | Qualcomm Incorporated | Proactive TCP Connection Stall Recovery for HTTP Streaming Content Requests |
US20160112318A1 (en) * | 2014-10-21 | 2016-04-21 | Fujitsu Limited | Information processing system, method, and information processing apparatus |
US9396101B2 (en) | 2012-06-12 | 2016-07-19 | International Business Machines Corporation | Shared physical memory protocol |
US9485149B1 (en) | 2004-01-06 | 2016-11-01 | Juniper Networks, Inc. | Routing device having multiple logical routers |
US20180241809A1 (en) * | 2017-02-21 | 2018-08-23 | Microsoft Technology Licensing, Llc | Load balancing in distributed computing systems |
US20180278539A1 (en) * | 2015-12-29 | 2018-09-27 | Amazon Technologies, Inc. | Relaxed reliable datagram |
US20180278540A1 (en) * | 2015-12-29 | 2018-09-27 | Amazon Technologies, Inc. | Connectionless transport service |
US10860511B1 (en) * | 2015-12-28 | 2020-12-08 | Western Digital Technologies, Inc. | Integrated network-attachable controller that interconnects a solid-state drive with a remote server computer |
US10917344B2 (en) | 2015-12-29 | 2021-02-09 | Amazon Technologies, Inc. | Connectionless reliable transport |
US20220131768A1 (en) * | 2018-03-30 | 2022-04-28 | Intel Corporation | Communication of a message using a network interface controller on a subnet |
US11451476B2 (en) | 2015-12-28 | 2022-09-20 | Amazon Technologies, Inc. | Multi-path transport design |
US12218841B1 (en) | 2019-12-12 | 2025-02-04 | Amazon Technologies, Inc. | Ethernet traffic over scalable reliable datagram protocol |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5822531A (en) * | 1996-07-22 | 1998-10-13 | International Business Machines Corporation | Method and system for dynamically reconfiguring a cluster of computer systems |
US6192483B1 (en) * | 1997-10-21 | 2001-02-20 | Sun Microsystems, Inc. | Data integrity and availability in a distributed computer system |
US20020059451A1 (en) * | 2000-08-24 | 2002-05-16 | Yaron Haviv | System and method for highly scalable high-speed content-based filtering and load balancing in interconnected fabrics |
US6438705B1 (en) * | 1999-01-29 | 2002-08-20 | International Business Machines Corporation | Method and apparatus for building and managing multi-clustered computer systems |
US20030110276A1 (en) * | 2001-12-10 | 2003-06-12 | Guy Riddle | Dynamic tunnel probing in a communications network |
US20040010612A1 (en) * | 2002-06-11 | 2004-01-15 | Pandya Ashish A. | High performance IP processor using RDMA |
US20040049774A1 (en) * | 2002-09-05 | 2004-03-11 | International Business Machines Corporation | Remote direct memory access enabled network interface controller switchover and switchback support |
US6718392B1 (en) * | 2000-10-24 | 2004-04-06 | Hewlett-Packard Development Company, L.P. | Queue pair partitioning in distributed computer system |
US20050060442A1 (en) * | 2003-09-15 | 2005-03-17 | Intel Corporation | Method, system, and program for managing data transmission through a network |
US7055085B2 (en) * | 2002-03-07 | 2006-05-30 | Broadcom Corporation | System and method for protecting header information using dedicated CRC |
US7142539B2 (en) * | 2001-05-31 | 2006-11-28 | Broadcom Corporation | TCP receiver acceleration |
US7171452B1 (en) * | 2002-10-31 | 2007-01-30 | Network Appliance, Inc. | System and method for monitoring cluster partner boot status over a cluster interconnect |
US7295555B2 (en) * | 2002-03-08 | 2007-11-13 | Broadcom Corporation | System and method for identifying upper layer protocol message boundaries |
US7328144B1 (en) * | 2004-04-28 | 2008-02-05 | Network Appliance, Inc. | System and method for simulating a software protocol stack using an emulated protocol over an emulated network |
- 2005-11-08: US application US11/269,062 filed (published as US20060168274A1); status: not active, abandoned
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5822531A (en) * | 1996-07-22 | 1998-10-13 | International Business Machines Corporation | Method and system for dynamically reconfiguring a cluster of computer systems |
US6192483B1 (en) * | 1997-10-21 | 2001-02-20 | Sun Microsystems, Inc. | Data integrity and availability in a distributed computer system |
US6438705B1 (en) * | 1999-01-29 | 2002-08-20 | International Business Machines Corporation | Method and apparatus for building and managing multi-clustered computer systems |
US20020059451A1 (en) * | 2000-08-24 | 2002-05-16 | Yaron Haviv | System and method for highly scalable high-speed content-based filtering and load balancing in interconnected fabrics |
US7346702B2 (en) * | 2000-08-24 | 2008-03-18 | Voltaire Ltd. | System and method for highly scalable high-speed content-based filtering and load balancing in interconnected fabrics |
US6718392B1 (en) * | 2000-10-24 | 2004-04-06 | Hewlett-Packard Development Company, L.P. | Queue pair partitioning in distributed computer system |
US7142539B2 (en) * | 2001-05-31 | 2006-11-28 | Broadcom Corporation | TCP receiver acceleration |
US20030110276A1 (en) * | 2001-12-10 | 2003-06-12 | Guy Riddle | Dynamic tunnel probing in a communications network |
US7055085B2 (en) * | 2002-03-07 | 2006-05-30 | Broadcom Corporation | System and method for protecting header information using dedicated CRC |
US7295555B2 (en) * | 2002-03-08 | 2007-11-13 | Broadcom Corporation | System and method for identifying upper layer protocol message boundaries |
US20040010612A1 (en) * | 2002-06-11 | 2004-01-15 | Pandya Ashish A. | High performance IP processor using RDMA |
US20040049774A1 (en) * | 2002-09-05 | 2004-03-11 | International Business Machines Corporation | Remote direct memory access enabled network interface controller switchover and switchback support |
US7171452B1 (en) * | 2002-10-31 | 2007-01-30 | Network Appliance, Inc. | System and method for monitoring cluster partner boot status over a cluster interconnect |
US20050060442A1 (en) * | 2003-09-15 | 2005-03-17 | Intel Corporation | Method, system, and program for managing data transmission through a network |
US7328144B1 (en) * | 2004-04-28 | 2008-02-05 | Network Appliance, Inc. | System and method for simulating a software protocol stack using an emulated protocol over an emulated network |
Cited By (77)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9832099B1 (en) | 2004-01-06 | 2017-11-28 | Juniper Networks, Inc. | Routing device having multiple logical routers |
US9485149B1 (en) | 2004-01-06 | 2016-11-01 | Juniper Networks, Inc. | Routing device having multiple logical routers |
US20060164999A1 (en) * | 2005-01-27 | 2006-07-27 | Fujitsu Limited | Network monitoring program, network system and network monitoring method |
US7623465B2 (en) * | 2005-01-27 | 2009-11-24 | Fujitsu Limited | Network monitoring program, network system and network monitoring method |
US20130185441A1 (en) * | 2005-09-26 | 2013-07-18 | Nec Corporation | Mobile radio communication device and method of managing connectivity status for the same |
US20090191917A1 (en) * | 2005-11-21 | 2009-07-30 | Nec Corporation | Method of communication between a (u)sim card in a server mode and a client |
US8566471B1 (en) * | 2006-01-09 | 2013-10-22 | Avaya Inc. | Method of providing network link bonding and management |
US20070280228A1 (en) * | 2006-06-06 | 2007-12-06 | Murata Kikai Kabushiki Kaisha | Communication system and remote diagnosis system |
US7778184B2 (en) * | 2006-06-06 | 2010-08-17 | Murata Kikai Kabushiki Kaisha | Communication system and remote diagnosis system |
US8856354B1 (en) * | 2006-12-29 | 2014-10-07 | F5 Networks, Inc. | TCP-over-TCP using multiple TCP streams |
US9491201B2 (en) | 2007-08-28 | 2016-11-08 | Cisco Technology, Inc. | Highly scalable architecture for application network appliances |
US7921686B2 (en) | 2007-08-28 | 2011-04-12 | Cisco Technology, Inc. | Highly scalable architecture for application network appliances |
US20090064287A1 (en) * | 2007-08-28 | 2009-03-05 | Rohati Systems, Inc. | Application protection architecture with triangulated authorization |
US20090063688A1 (en) * | 2007-08-28 | 2009-03-05 | Rohati Systems, Inc. | Centralized tcp termination with multi-service chaining |
US20090059957A1 (en) * | 2007-08-28 | 2009-03-05 | Rohati Systems, Inc. | Layer-4 transparent secure transport protocol for end-to-end application protection |
US20090063625A1 (en) * | 2007-08-28 | 2009-03-05 | Rohati Systems, Inc. | Highly scalable application layer service appliances |
US20090063665A1 (en) * | 2007-08-28 | 2009-03-05 | Rohati Systems, Inc. | Highly scalable architecture for application network appliances |
US20090063701A1 (en) * | 2007-08-28 | 2009-03-05 | Rohati Systems, Inc. | Layers 4-7 service gateway for converged datacenter fabric |
US20090064288A1 (en) * | 2007-08-28 | 2009-03-05 | Rohati Systems, Inc. | Highly scalable application network appliances with virtualized services |
US8443069B2 (en) | 2007-08-28 | 2013-05-14 | Cisco Technology, Inc. | Highly scalable architecture for application network appliances |
US20090063747A1 (en) * | 2007-08-28 | 2009-03-05 | Rohati Systems, Inc. | Application network appliances with inter-module communications using a universal serial bus |
US7895463B2 (en) | 2007-08-28 | 2011-02-22 | Cisco Technology, Inc. | Redundant application network appliances using a low latency lossless interconnect link |
US7913529B2 (en) | 2007-08-28 | 2011-03-29 | Cisco Technology, Inc. | Centralized TCP termination with multi-service chaining |
US8621573B2 (en) | 2007-08-28 | 2013-12-31 | Cisco Technology, Inc. | Highly scalable application network appliances with virtualized services |
US8295306B2 (en) | 2007-08-28 | 2012-10-23 | Cisco Technologies, Inc. | Layer-4 transparent secure transport protocol for end-to-end application protection |
US20110173441A1 (en) * | 2007-08-28 | 2011-07-14 | Cisco Technology, Inc. | Highly scalable architecture for application network appliances |
US9100371B2 (en) | 2007-08-28 | 2015-08-04 | Cisco Technology, Inc. | Highly scalable architecture for application network appliances |
US20090063893A1 (en) * | 2007-08-28 | 2009-03-05 | Rohati Systems, Inc. | Redundant application network appliances using a low latency lossless interconnect link |
US8161167B2 (en) | 2007-08-28 | 2012-04-17 | Cisco Technology, Inc. | Highly scalable application layer service appliances |
US8180901B2 (en) | 2007-08-28 | 2012-05-15 | Cisco Technology, Inc. | Layers 4-7 service gateway for converged datacenter fabric |
US8346912B2 (en) * | 2007-10-15 | 2013-01-01 | Dell Products, Lp | System and method of emulating a network controller within an information handling system |
US20090100194A1 (en) * | 2007-10-15 | 2009-04-16 | Dell Products, Lp | System and method of emulating a network controller within an information handling system |
US20130086262A1 (en) * | 2007-10-15 | 2013-04-04 | Dell Products, Lp | System and Method of Emulating a Network Controller within an Information Handling System |
US8521873B2 (en) * | 2007-10-15 | 2013-08-27 | Dell Products, Lp | System and method of emulating a network controller within an information handling system |
US10148742B2 (en) * | 2007-11-28 | 2018-12-04 | Alcatel Lucent | System and method for an improved high availability component implementation |
US20090138615A1 (en) * | 2007-11-28 | 2009-05-28 | Alcatel-Lucent | System and method for an improved high availability component implementation |
US20110170553A1 (en) * | 2008-05-01 | 2011-07-14 | Jon Beecroft | Method of data delivery across a network fabric in a router or ethernet bridge |
US9401876B2 (en) * | 2008-05-01 | 2016-07-26 | Cray Uk Limited | Method of data delivery across a network fabric in a router or Ethernet bridge |
US8677453B2 (en) | 2008-05-19 | 2014-03-18 | Cisco Technology, Inc. | Highly parallel evaluation of XACML policies |
US20090288104A1 (en) * | 2008-05-19 | 2009-11-19 | Rohati Systems, Inc. | Extensibility framework of a network element |
US8667556B2 (en) | 2008-05-19 | 2014-03-04 | Cisco Technology, Inc. | Method and apparatus for building and managing policies |
US8094560B2 (en) | 2008-05-19 | 2012-01-10 | Cisco Technology, Inc. | Multi-stage multi-core processing of network packets |
US20090288135A1 (en) * | 2008-05-19 | 2009-11-19 | Rohati Systems, Inc. | Method and apparatus for building and managing policies |
US20090285228A1 (en) * | 2008-05-19 | 2009-11-19 | Rohati Systems, Inc. | Multi-stage multi-core processing of network packets |
US20090288136A1 (en) * | 2008-05-19 | 2009-11-19 | Rohati Systems, Inc. | Highly parallel evaluation of xacml policies |
US20100070471A1 (en) * | 2008-09-17 | 2010-03-18 | Rohati Systems, Inc. | Transactional application events |
US8369345B1 (en) * | 2009-11-13 | 2013-02-05 | Juniper Networks, Inc. | Multi-router system having shared network interfaces |
US9444768B1 (en) | 2009-11-13 | 2016-09-13 | Juniper Networks, Inc. | Multi-router system having shared network interfaces |
US20110225308A1 (en) * | 2010-03-09 | 2011-09-15 | Kabushiki Kaisha Toshiba | Data communication apparatus and method |
US9130957B2 (en) * | 2010-03-09 | 2015-09-08 | Kabushiki Kaisha Toshiba | Data communication apparatus and method |
US9178966B2 (en) | 2011-09-27 | 2015-11-03 | International Business Machines Corporation | Using transmission control protocol/internet protocol (TCP/IP) to setup high speed out of band data communication connections |
US9473596B2 (en) | 2011-09-27 | 2016-10-18 | International Business Machines Corporation | Using transmission control protocol/internet protocol (TCP/IP) to setup high speed out of band data communication connections |
US8930507B2 (en) | 2012-06-12 | 2015-01-06 | International Business Machines Corporation | Physical memory shared among logical partitions in a VLAN |
US9417996B2 (en) | 2012-06-12 | 2016-08-16 | International Business Machines Corporation | Shared physical memory protocol |
US20130332767A1 (en) * | 2012-06-12 | 2013-12-12 | International Business Machines Corporation | Redundancy and load balancing in remote direct memory access communications |
US8954785B2 (en) * | 2012-06-12 | 2015-02-10 | International Business Machines Corporation | Redundancy and load balancing in remote direct memory access communications |
US20130332557A1 (en) * | 2012-06-12 | 2013-12-12 | International Business Machines Corporation | Redundancy and load balancing in remote direct memory access communications |
US9396101B2 (en) | 2012-06-12 | 2016-07-19 | International Business Machines Corporation | Shared physical memory protocol |
US20160094608A1 (en) * | 2014-09-30 | 2016-03-31 | Qualcomm Incorporated | Proactive TCP Connection Stall Recovery for HTTP Streaming Content Requests |
CN106716966A (en) * | 2014-09-30 | 2017-05-24 | 高通股份有限公司 (Qualcomm Incorporated) | Proactive tcp connection stall recovery for http streaming content requests |
EP3202104A1 (en) * | 2014-09-30 | 2017-08-09 | Qualcomm Incorporated | Proactive tcp connection stall recovery for http streaming content requests |
US20160112318A1 (en) * | 2014-10-21 | 2016-04-21 | Fujitsu Limited | Information processing system, method, and information processing apparatus |
US11451476B2 (en) | 2015-12-28 | 2022-09-20 | Amazon Technologies, Inc. | Multi-path transport design |
US10860511B1 (en) * | 2015-12-28 | 2020-12-08 | Western Digital Technologies, Inc. | Integrated network-attachable controller that interconnects a solid-state drive with a remote server computer |
US20180278540A1 (en) * | 2015-12-29 | 2018-09-27 | Amazon Technologies, Inc. | Connectionless transport service |
US10645019B2 (en) * | 2015-12-29 | 2020-05-05 | Amazon Technologies, Inc. | Relaxed reliable datagram |
US10673772B2 (en) * | 2015-12-29 | 2020-06-02 | Amazon Technologies, Inc. | Connectionless transport service |
US20180278539A1 (en) * | 2015-12-29 | 2018-09-27 | Amazon Technologies, Inc. | Relaxed reliable datagram |
US10917344B2 (en) | 2015-12-29 | 2021-02-09 | Amazon Technologies, Inc. | Connectionless reliable transport |
US11343198B2 (en) | 2015-12-29 | 2022-05-24 | Amazon Technologies, Inc. | Reliable, out-of-order transmission of packets |
US11770344B2 (en) | 2015-12-29 | 2023-09-26 | Amazon Technologies, Inc. | Reliable, out-of-order transmission of packets |
US10652320B2 (en) * | 2017-02-21 | 2020-05-12 | Microsoft Technology Licensing, Llc | Load balancing in distributed computing systems |
US11218537B2 (en) * | 2017-02-21 | 2022-01-04 | Microsoft Technology Licensing, Llc | Load balancing in distributed computing systems |
US20180241809A1 (en) * | 2017-02-21 | 2018-08-23 | Microsoft Technology Licensing, Llc | Load balancing in distributed computing systems |
US20220131768A1 (en) * | 2018-03-30 | 2022-04-28 | Intel Corporation | Communication of a message using a network interface controller on a subnet |
US11799738B2 (en) * | 2018-03-30 | 2023-10-24 | Intel Corporation | Communication of a message using a network interface controller on a subnet |
US12218841B1 (en) | 2019-12-12 | 2025-02-04 | Amazon Technologies, Inc. | Ethernet traffic over scalable reliable datagram protocol |
Similar Documents
Publication | Title |
---|---|
US20060168274A1 (en) | Method and system for high availability when utilizing a multi-stream tunneled marker-based protocol data unit aligned protocol | |
CN110771118B (en) | Seamless mobility and session continuity with TCP mobility options | |
US20060101225A1 (en) | Method and system for a multi-stream tunneled marker-based protocol data unit aligned protocol | |
US8250643B2 (en) | Communication device, communication system, communication method, and program | |
US7526577B2 (en) | Multiple offload of network state objects with support for failover events | |
CN102238230B (en) | Method and system for offloading tunnel packet processing in cloud computing | |
US7801135B2 (en) | Transport protocol connection synchronization | |
US20030140124A1 (en) | TCP offload device that load balances and fails-over between aggregated ports having different MAC addresses | |
US8447802B2 (en) | Address manipulation to provide for the use of network tools even when transaction acceleration is in use over a network | |
US20060159011A1 (en) | Detecting unavailable network connections | |
US7672223B2 (en) | Method and apparatus for replicating a transport layer protocol stream | |
US11888818B2 (en) | Multi-access interface for internet protocol security | |
US7269661B2 (en) | Method using receive and transmit protocol aware logic modules for confirming checksum values stored in network packet | |
CN110086689B (en) | Double-stack BFD detection method and system | |
US20070266174A1 (en) | Method and system for reliable multicast datagrams and barriers | |
US20150373135A1 (en) | Wide area network optimization | |
US20060101090A1 (en) | Method and system for reliable datagram tunnels for clusters | |
US20060209830A1 (en) | Packet processing system including control device and packet forwarding device | |
CN101361325B (en) | Packet packaging and redirecting method for data packet | |
US7672239B1 (en) | System and method for conducting fast offloading of a connection onto a network interface card | |
EP2124397A1 (en) | A method for transfering the ip transmission session and the equipment whereto | |
US7420991B2 (en) | TCP time stamp processing in hardware based TCP offload | |
CN117201510A (en) | File synchronization method, device, equipment and storage medium | |
CN116032689A (en) | Message transmission method based on tunnel and client gateway equipment | |
KR20020052066A (en) | Upgrading transmitting rate method using hop count in mobile computing environment |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: BROADCOM CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ALONI, ELIEZER;OREN, AMIT;BESTLER, CAITLIN;REEL/FRAME:019861/0056;SIGNING DATES FROM 20060105 TO 20070817 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
AS | Assignment | Owner name: BANK OF AMERICA, N.A., AS COLLATERAL AGENT, NORTH CAROLINA. Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:BROADCOM CORPORATION;REEL/FRAME:037806/0001. Effective date: 20160201 |
AS | Assignment | Owner name: AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD., SINGAPORE. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BROADCOM CORPORATION;REEL/FRAME:041706/0001. Effective date: 20170120 |
AS | Assignment | Owner name: BROADCOM CORPORATION, CALIFORNIA. Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:041712/0001. Effective date: 20170119 |