From fa5de0c5cd4061b3b70d5e9eb2d67a4a7c594a63 Mon Sep 17 00:00:00 2001 From: Alexander Motin Date: Wed, 20 Mar 2024 20:22:36 -0400 Subject: [PATCH] Update resume token at object receive. Before this change resume token was updated only on data receive. Usually it is enough to resume replication without much overlap. But we've got a report of a curios case, where replication source was traversed with recursive grep, which through enabled atime modified every object without modifying any data. It produced several gigabytes of replication traffic without a single data write and so without a single resume point. While the resume token was not designed to resume from an object, I've found that the send implementation always sends object before any data. So by requesting resume from offset 0 we are effectively resuming from the object, followed (or not) by the data at offset 0, just as we need it. Reviewed-by: Allan Jude Reviewed-by: Paul Dagnelie Signed-off-by: Alexander Motin Sponsored by: iXsystems, Inc. Closes #15927 --- module/zfs/dmu_recv.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/module/zfs/dmu_recv.c b/module/zfs/dmu_recv.c index 54aa60259ea1..2cf10909738b 100644 --- a/module/zfs/dmu_recv.c +++ b/module/zfs/dmu_recv.c @@ -2110,6 +2110,16 @@ receive_object(struct receive_writer_arg *rwa, struct drr_object *drro, dmu_buf_rele(db, FTAG); dnode_rele(dn, FTAG); } + + /* + * If the receive fails, we want the resume stream to start with the + * same record that we last successfully received. There is no way to + * request resume from the object record, but we can benefit from the + * fact that sender always sends object record before anything else, + * after which it will "resend" data at offset 0 and resume normally. + */ + save_resume_state(rwa, drro->drr_object, 0, tx); + dmu_tx_commit(tx); return (0);