Multiple TCP Connections for Live Migrations #7669

Open

amphi wants to merge 14 commits into cloud-hypervisor:main from amphi:upstream-live-migration-multiple-tcp-connections

Conversation


@amphi amphi commented Feb 5, 2026

Summary

This PR implements VM live migrations using multiple TCP connections. Most of this work comes from @blitz, so kudos to him!

The `send-migration` HTTP command now accepts a `connections` parameter (default: 1) that specifies how many TCP connections to use for the live migration.

Benchmarks

We ran a quick test on two of our servers connected via a 100 Gbit/s link, transferring a VM with 50 GB of RAM. Here are our results:

| Connections | Throughput (MiB/s) |
|-------------|--------------------|
| 1           | 1617.29            |
| 2           | 3752.15            |
| 4           | 7238.08            |
| 8           | 11502.83            |
| 16          | 10683.72            |

We also ran iperf between the two machines and measured a throughput of 11.5 GiB/s, so I'd say the feature works pretty well.

Note: MiB/s means mebibytes per second, GiB/s means gibibytes per second.

Design

If `connections` is larger than 1, the sender will attempt to establish additional TCP connections to the same migration destination. The main (initial) connection handles most of the migration protocol. The additional connections handle only `Memory` commands for transferring chunks of VM memory.

For each additional connection, a thread is created that receives chunks of memory from the main thread, and sends those chunks to the receiver.

For each iteration of sending memory, the `MemoryTable` that describes dirty memory is split into chunks of a fixed size (`CHUNK_SIZE`). These chunks are then distributed among the available threads via an MPSC channel wrapped in a `Mutex`. The channel has a configurable backlog of outstanding chunks per thread (`BUFFERED_REQUESTS_PER_THREAD`), currently 64, which keeps memory consumption at a sensible level (otherwise, for VMs with a huge amount of memory, the backlog itself could take up a lot of additional memory). A sketch of the scheme is shown below.
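
To make the distribution scheme concrete, here is a minimal, self-contained sketch of the idea, not the actual code from this PR: `CHUNK_SIZE` and `BUFFERED_REQUESTS_PER_THREAD` are the constants mentioned above, while `MemoryChunk`, the worker setup, and the fake dirty range are purely illustrative.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

const CHUNK_SIZE: u64 = 4 << 20; // hypothetical chunk size: 4 MiB
const BUFFERED_REQUESTS_PER_THREAD: usize = 64;

/// A contiguous range of guest memory to transmit (illustrative).
#[derive(Debug)]
struct MemoryChunk {
    gpa: u64,
    length: u64,
}

fn main() {
    let num_threads = 4;

    // Bounded channel: at most BUFFERED_REQUESTS_PER_THREAD outstanding chunks per
    // thread, so the backlog cannot grow without limit for very large VMs.
    let (tx, rx) = mpsc::sync_channel::<MemoryChunk>(num_threads * BUFFERED_REQUESTS_PER_THREAD);
    // std::sync::mpsc receivers are single-consumer, so share one behind a Mutex.
    let rx = Arc::new(Mutex::new(rx));

    let workers: Vec<_> = (0..num_threads)
        .map(|id| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                // Take the next chunk; the lock guard is dropped at the end of this statement,
                // so the (slow) send below happens outside the lock.
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok(chunk) => {
                        // Placeholder for sending a Memory command over this thread's
                        // TCP connection and waiting for its ACK.
                        println!("worker {id}: sending chunk {chunk:?}");
                    }
                    // Channel closed: this iteration of memory sending is done.
                    Err(_) => break,
                }
            })
        })
        .collect();

    // The main thread splits the dirty-memory table into fixed-size chunks and
    // feeds them to the workers; `send` blocks once the backlog is full.
    let (dirty_start, dirty_end) = (0u64, 64 << 20); // pretend 64 MiB of dirty memory
    let mut gpa = dirty_start;
    while gpa < dirty_end {
        let length = CHUNK_SIZE.min(dirty_end - gpa);
        tx.send(MemoryChunk { gpa, length }).unwrap();
        gpa += length;
    }
    drop(tx); // close the channel so the workers exit

    for worker in workers {
        worker.join().unwrap();
    }
}
```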

We still use the original request-response scheme. Since we don't pipeline requests but always wait for the other side to acknowledge them, there is a fundamental upper bound on the throughput we can reach. The original code expected only one ACK for the whole dirty memory table; we now have one ACK per chunk.

@blitz came up with the following formula for the upper bound of throughput per connection:

```
effective_throughput = chunk_size / (chunk_size / throughput_per_connection + round_trip_time)
```

This formula is also in the code. We've played around with it, and with large enough chunks the impact seems negligible, especially since we can scale up the number of connections. Feel free to plug in your favorite numbers.
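
As a quick sanity check, here is a small sketch that plugs some numbers into the formula. The 1.6 GiB/s per-connection throughput (roughly the single-connection benchmark result above) and the 0.5 ms round-trip time are assumptions for illustration, not measurements.

```rust
// Evaluate the upper-bound formula for a few chunk sizes.
fn effective_throughput(chunk_size: f64, throughput_per_connection: f64, round_trip_time: f64) -> f64 {
    chunk_size / (chunk_size / throughput_per_connection + round_trip_time)
}

fn main() {
    const MIB: f64 = 1024.0 * 1024.0;
    let throughput = 1.6 * 1024.0 * MIB; // bytes per second (assumed)
    let rtt = 0.0005; // seconds (assumed)

    for chunk_mib in [1.0, 4.0, 16.0, 64.0] {
        let eff = effective_throughput(chunk_mib * MIB, throughput, rtt);
        // Larger chunks amortize the per-chunk ACK round trip.
        println!("chunk = {chunk_mib:>4} MiB -> effective throughput ~ {:.0} MiB/s", eff / MIB);
    }
}
```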

@amphi amphi requested a review from a team as a code owner February 5, 2026 14:49
@amphi amphi force-pushed the upstream-live-migration-multiple-tcp-connections branch from a39f7b7 to b8a8caf on February 5, 2026 15:30
Member

@phip1611 phip1611 left a comment


Awesome! Left a few remarks. I'll refrain from approving this because I'm biased (Sebastian and Julian are my colleagues).

Please, as discussed, add specific bandwidth numbers to the PR description and the commit series, showing how awesome this work is!

@amphi amphi force-pushed the upstream-live-migration-multiple-tcp-connections branch from b8a8caf to 0460ed6 on February 6, 2026 10:08
blitz and others added 14 commits February 10, 2026 10:03
This is not wired up to anywhere yet. We will use this to establish
multiple connections for live migration.

On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
This has no functional change, but it is a prerequisite for removing the lock that was used to obtain the MemoryManager instance.

On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
... to avoid having to grab a lock when we receive a chunk of memory
over the migration socket. This will come in handy when we have
multiple threads for receiving memory.

On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
To allow for accepting more connections in the migration receive code
paths, we need to keep track of the listener. This commit adds a thin
abstraction to be able to hold on to it regardless of whether it is a
UNIX domain or TCP socket.

On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
We keep the listening socket around and accept as many connections as
the sender wants to open.

There are still some problems: We never tear these threads down
again. We will handle this in subsequent commits.

On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
In anticipation of using multiple threads for sending memory, refactor
the sending code to be in a single place.

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
... to be able to re-use it when establishing multiple send
connections. I moved the receive socket creation out for symmetry.

On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
... to simplify sending memory from multiple connections in future
commits.

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
For sending memory over multiple connections, we need a way to split
up the work. With these changes, we can take a memory table and chop
it into same-sized chunks for transmit.

On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
This does not actually use the additional connections yet, but we are
getting closer!

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
This will stop us from listening for more connections on the TCP socket
when migration has finished.

Tearing down the individual connections will come in a subsequent
commit.

Co-authored-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP julian.stecklina@sap.com
On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
... after the VM migration finishes.

On-behalf-of: SAP julian.stecklina@sap.com
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
This solves the race condition in the following scenario: Thread A is
done working and waits at the barrier. Thread B encounters an error and
sends it to the main thread. Thread A is still waiting at the barrier,
and the main thread cannot abort that. With the custom gate, the main
thread can simply open the gate, and all waiting threads will continue.

Even if Thread A now receives the gate message that was intended for Thread B, the gate is already open and Thread A will not block.

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
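
For illustration, here is a minimal Condvar-based sketch of such a gate. The `Gate` type, its method names, and the worker setup are assumptions made for this example, not the implementation added in this PR.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// A gate that threads can wait at; once opened, all current and future waiters pass.
#[derive(Default)]
struct Gate {
    open: Mutex<bool>,
    cond: Condvar,
}

impl Gate {
    /// Block until the gate has been opened; return immediately if it already is.
    fn wait(&self) {
        let mut open = self.open.lock().unwrap();
        while !*open {
            open = self.cond.wait(open).unwrap();
        }
    }

    /// Open the gate and wake every waiting thread. Unlike a Barrier, the main
    /// thread can do this at any time, e.g. after another worker reported an error.
    fn open(&self) {
        *self.open.lock().unwrap() = true;
        self.cond.notify_all();
    }
}

fn main() {
    let gate = Arc::new(Gate::default());

    let workers: Vec<_> = (0..4)
        .map(|id| {
            let gate = Arc::clone(&gate);
            thread::spawn(move || {
                // ... per-connection work would happen here, then wait for the go-ahead ...
                gate.wait();
                println!("worker {id} released");
            })
        })
        .collect();

    // The main thread decides when the workers may proceed (or must abort).
    thread::sleep(Duration::from_millis(100));
    gate.open();

    for worker in workers {
        worker.join().unwrap();
    }
}
```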
@amphi amphi force-pushed the upstream-live-migration-multiple-tcp-connections branch from 0460ed6 to 001ad37 on February 10, 2026 09:04
@phip1611 phip1611 self-requested a review February 11, 2026 06:20
@likebreath likebreath self-requested a review February 16, 2026 22:08