Vladimir Homutov via nginx-devel
November 09, 2023 10:16AM
> On Thu, Oct 26, 2023 at 03:08:55AM +0400, Sergey Kandaurov wrote:
> > # HG changeset patch
> > # User Vladimir Khomutov <vl@wbsrv.ru>
> > # Date 1697031803 -10800
> > # Wed Oct 11 16:43:23 2023 +0300
> > # Node ID 9ba2840e88f62343b3bd794e43900781dab43686
> > # Parent 1f188102fbd944df797e8710f70cccee76164add
> > QUIC: fixed handling of PTO counter.
> >
> > The RFC 9002 clearly says in "6.2. Probe Timeout":
> > ...
> > As with loss detection, the PTO is per packet number space.
> > That is, a PTO value is computed per packet number space.
> >
> > Despite that, current code is using per-connection PTO counter.
> > For example, this may lead to situation when packet loss at handshake
> > level will affect PTO calculation for initial packets, preventing
> > send of new probes.
>
> Although PTO value is per packet number space, PTO backoff is not,
> see "6.2.1 Computing PTO":
>
> : When ack-eliciting packets in multiple packet number spaces are in flight, the
> : timer MUST be set to the earlier value of the Initial and Handshake packet
> : number spaces.

And I read this fragment as:
- there are multiple timer values (i.e. per packet number space)
(and value is pto * backoff)
- we have to choose the earliest value

The ngx_quic_pto() has nothing that depends on packet number space
(with the minor exception that we add max_ack_delay at application level
after the handshake)
So pto_count is the only thing that can make timer values to be
different in different packet number spaces.

>
> But:
>
> : When a PTO timer expires, the PTO backoff MUST be increased <..>
>
> : This exponential reduction in the sender's rate is important because consecutive
> : PTOs might be caused by loss of packets or acknowledgments due to severe
> : congestion. Even when there are ack-eliciting packets in flight in multiple
> : packet number spaces, the exponential increase in PTO occurs across all spaces
> : to prevent excess load on the network. For example, a timeout in the Initial
> : packet number space doubles the length of the timeout in the Handshake packet
> : number space.

yes, this really looks like contradiction.
At least I don't understand how it is possible to have PTO value
different by packet number space given the way we calculate it.

> Even if that would be proven otherwise, I don't think the description
> provides detailed explanation. It describes a pretty specific use case,
> when both Initial and Handshake packet number spaces have in-flight packets
> with different PTO timeout (i.e. different f->last). Typically they are
> sent coalesced (e.g. CRYPTO frames for ServerHello and (at least)
> EncryptedExtensions TLS messages).
> In interop tests, though, it might be different: such packets may be
> sent separately, with Handshake packet thus having a later PTO timeout.
> If such, PTO timer will first fire for the Initial packet, then for Handshake,
> which will result in PTO backoff accumulated for each packet:
>
> t1: <- Initial (lost)
> t2: <- Handshake (lost)
> t1': pto(t1) timeout
> <- Initial (pto_count=1)
> t2': pto(t2) timeout
> <- Handshake (pto_count=2)
> t1'': pto(t1') timeout
> <- Initial (pto_count=3)
>
> So, I would supplement the description with the phrase that that's
> fair typically with uncoalesced packets seen in interop tests, and
> that the same is true vice verse with packet loss at initial packet
> number space affecting PTO backoff in handshake packet number space.
>
> But see above about PTO backoff increase across all spaces.

I tend to think that it is better to leave things as is.
maybe RFC needs some better wording in this case.

I've checked ngtcp2 and msquic and it it looks like both
handle pto counter per-connection too;
(see pto_count in ngtcp2 and QUIC_LOSS_DETECTION.ProbeCount in msquic)


> > Additionally, one case of successful ACK receiving was missing:
> > PING frames are not stored in the ctx->sent queue, thus PTO was not
> > reset when corresponding packets were acknowledged.
>
> See below.
>
> >
> > diff --git a/src/event/quic/ngx_event_quic.c b/src/event/quic/ngx_event_quic.c
> > --- a/src/event/quic/ngx_event_quic.c
> > +++ b/src/event/quic/ngx_event_quic.c
> > @@ -1088,8 +1088,6 @@ ngx_quic_discard_ctx(ngx_connection_t *c
> >
> > ngx_quic_keys_discard(qc->keys, level);
> >
> > - qc->pto_count = 0;
> > -
> > ctx = ngx_quic_get_send_ctx(qc, level);
> >
> > ngx_quic_free_buffer(c, &ctx->crypto);
> > @@ -1120,6 +1118,7 @@ ngx_quic_discard_ctx(ngx_connection_t *c
> > }
> >
> > ctx->send_ack = 0;
> > + ctx->pto_count = 0;
> >
> > ngx_quic_set_lost_timer(c);
> > }
> > diff --git a/src/event/quic/ngx_event_quic_ack.c b/src/event/quic/ngx_event_quic_ack.c
> > --- a/src/event/quic/ngx_event_quic_ack.c
> > +++ b/src/event/quic/ngx_event_quic_ack.c
> > @@ -286,8 +286,12 @@ ngx_quic_handle_ack_frame_range(ngx_conn
> > if (!found) {
> >
> > if (max < ctx->pnum) {
> > - /* duplicate ACK or ACK for non-ack-eliciting frame */
> > - return NGX_OK;
> > + /*
> > + * - ACK for frames not in sent queue (i.e. PING)
> > + * - duplicate ACK
> > + * - ACK for non-ack-eliciting frame
> > + */
> > + goto done;
> > }
> >
> > ngx_log_error(NGX_LOG_INFO, c->log, 0,
> > @@ -300,11 +304,13 @@ ngx_quic_handle_ack_frame_range(ngx_conn
> > return NGX_ERROR;
> > }
> >
> > +done:
> > +
> > if (!qc->push.timer_set) {
> > ngx_post_event(&qc->push, &ngx_posted_events);
> > }
> >
> > - qc->pto_count = 0;
> > + ctx->pto_count = 0;
>
> This part of the change to reset pto_count
> for duplicate ACK or ACK for non-ack-eliciting frame
> contradicts the OnAckReceived example in RFC 9002,
> although I didn't found a format text in the RFC itself:
>
> OnAckReceived(ack, pn_space):
> ...
> // DetectAndRemoveAckedPackets finds packets that are newly
> // acknowledged and removes them from sent_packets.
> newly_acked_packets =
> DetectAndRemoveAckedPackets(ack, pn_space)
> // Nothing to do if there are no newly acked packets.
> if (newly_acked_packets.empty()):
> return
>
> // Update the RTT if the largest acknowledged is newly acked
> // and at least one ack-eliciting was newly acked.
> ...
>
> // Reset pto_count ...
>
> From which it follows that pto_count is reset
> (and RTT updated) for newly ack'ed packets only.

yes, agreed.
the main issue is tracking of in-flight PING frames.
>
> I think the better fix would be to properly track in-flight PING frames.
> Moreover, the current behaviour of not tracking PING frames in ctx->sent
> prevents from a properly calculated PTO timeout: each time it is calculated
> against the original packet (with increasingly receding time to the past)
> that triggered the first PTO timeout, which doesn't result in exponentially
> increased PTO period as expected, but rather some bogus value.

Agree. I've created a patch that fixes this.

_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
https://mailman.nginx.org/mailman/listinfo/nginx-devel
Subject Author Views Posted

[patch] quic PTO counter fixes

Vladimir Homutov via nginx-devel 381 October 11, 2023 10:00AM

Re: [patch] quic PTO counter fixes

Thorvaldur Thorvaldsson 82 October 15, 2023 03:50PM

Re: [patch] quic PTO counter fixes

Sergey Kandaurov 63 October 25, 2023 07:10PM

Re: [patch] quic PTO counter fixes

Vladimir Homutov via nginx-devel 60 October 26, 2023 10:30AM

Re: [patch] quic PTO counter fixes

Sergey Kandaurov 58 October 26, 2023 04:28PM

Re: [patch] quic PTO counter fixes

Vladimir Homutov via nginx-devel 51 October 26, 2023 04:38PM

Re: [patch] quic PTO counter fixes

Sergey Kandaurov 77 October 26, 2023 05:34PM

Re: [patch] quic PTO counter fixes

Vladimir Homutov via nginx-devel 58 November 09, 2023 10:16AM

Re: [patch] quic PTO counter fixes

Sergey Kandaurov 66 November 14, 2023 08:12AM



Sorry, you do not have permission to post/reply in this forum.

Online Users

Guests: 64
Record Number of Users: 8 on April 13, 2023
Record Number of Guests: 421 on December 02, 2018
Powered by nginx      Powered by FreeBSD      PHP Powered      Powered by MariaDB      ipv6 ready