Among gRPC’s many imperfections, none is worse than its use of HTTP trailers. Perversely, gRPC fans seem eager to defend this mistake. As an example, a former gRPC maintainer recently published “Why Does gRPC Insist on Trailers.” He argues that trailers protect clients from abruptly dropped TCP connections and compensate for oddities in the binary encoding of Protocol Buffers. Bluntly, both of these arguments are nonsense.
gRPC doesn’t need HTTP trailers. In this post, I’ll explain:
- What trailers are and how gRPC uses them,
- Why they’re unnecessary,
- How they impede gRPC adoption,
- How Google could fix gRPC in a minor release, and
- How we can do even better without Google.
What are trailers?
Trailers have been lurking in the HTTP specification since at least HTTP/1.1, released in 1997 (when they were called “footers”). Simply put, they’re headers that come after the request or response body. The recent HTTP Semantics RFC suggests that they “can be useful for supplying message integrity checks, digital signatures, delivery metrics, or post-processing status information.” If you’ve been slinging HTTP for decades and haven’t heard of trailers, you’re not alone — most HTTP/1.1 implementations don’t support them, so they’re rarely used.
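To make "headers that come after the body" concrete, here is a minimal sketch of how an HTTP/1.1 response with chunked transfer encoding carries trailers on the wire. The function name and the Checksum trailer used in the usage note are made up for illustration; only the framing (hex chunk sizes, a zero-length last chunk, then the trailer fields) comes from the HTTP/1.1 specification.

```python
def chunked_response_with_trailers(chunks, trailers):
    """Serialize an HTTP/1.1 chunked response whose trailers follow the body.

    `chunks` is an iterable of bytes and `trailers` is a dict of str -> str.
    Illustrative only: a real server would also validate field names and values.
    """
    head = b"\r\n".join([
        b"HTTP/1.1 200 OK",
        b"Transfer-Encoding: chunked",
        # The Trailer header announces which fields will arrive after the body.
        b"Trailer: " + ", ".join(trailers).encode("ascii"),
        b"",
        b"",
    ])
    # Each chunk is its size in hex, CRLF, the data, CRLF. Empty chunks are
    # skipped because a zero size marks the end of the body.
    body = b"".join(
        b"%x\r\n" % len(chunk) + chunk + b"\r\n" for chunk in chunks if chunk
    )
    last_chunk = b"0\r\n"
    # Trailer fields look exactly like headers; they just come last.
    trailer_block = b"".join(
        f"{name}: {value}\r\n".encode("ascii") for name, value in trailers.items()
    )
    return head + body + last_chunk + trailer_block + b"\r\n"
```

Calling `chunked_response_with_trailers([b"hello"], {"Checksum": "abc"})` produces the status line and headers, one five-byte chunk, the terminating zero-length chunk, and only then the `Checksum` field.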
Why, then, would gRPC rely on trailers? Because gRPC supports streaming
responses, in which the server writes multiple records to the response body.
Imagine that a gRPC server is preparing to stream a large collection to a
client. The server connects to the database, executes a query, and begins
inspecting the results. Everything is going well, so the server sends a 200 OK
status code and some headers. One by one, the server begins reading records
from the database and writing them to the response body. Then the database
crashes. How should the server tell the client that something has gone wrong?
The client has already received a 200 OK HTTP status code, so it’s too late
to send a 500 Internal Server Error. Because the server has already started
sending the response body, it’s also too late to send more headers. The
server’s only options are to send the error as the last portion of the response
body or to send it in trailers.
gRPC chooses to use trailers. Responses include a gRPC-specific status code in
the grpc-status trailer and a description of the error (if any) in the
grpc-message trailer. Even successful responses must explicitly set
grpc-status to 0.
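The database scenario above can be sketched as a generator that emits the parts of a streaming response in order. This is a toy model rather than a real gRPC server: `stream_records` and `flaky_database` are hypothetical names, and the events are plain tuples instead of HTTP/2 frames. The grpc-status values (0 for success, 13 for an internal error) are gRPC's own.

```python
GRPC_STATUS_OK = "0"
GRPC_STATUS_INTERNAL = "13"  # gRPC's INTERNAL status code


def stream_records(records):
    """Yield (kind, payload) events for one streaming response.

    Headers go out first, so by the time a record fails to load the only
    places left to report the error are the body or the trailers.
    """
    yield ("headers", {":status": "200", "content-type": "application/grpc"})
    try:
        for record in records:
            yield ("data", record)
    except Exception as err:
        # Too late for a 500: the status lives in trailers instead.
        # (Real gRPC also percent-encodes some bytes of grpc-message.)
        yield ("trailers", {
            "grpc-status": GRPC_STATUS_INTERNAL,
            "grpc-message": str(err),
        })
        return
    # Even success must be stated explicitly.
    yield ("trailers", {"grpc-status": GRPC_STATUS_OK})


def flaky_database():
    """Hypothetical data source that fails partway through the result set."""
    yield b"record-1"
    yield b"record-2"
    raise ConnectionError("database crashed")
```

Running `list(stream_records(flaky_database()))` yields the headers, two data events, and a final trailers event carrying status 13 and the error description.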
Are trailers necessary?
Of course not! But let’s at least address the two arguments above.
- Clients don’t need trailers to detect dropped TCP connections. In this
argument, gRPC apologists claim that trailers help clients detect incomplete
responses. “What if,” they say, “the server (or some proxy) crashes
partway through a streaming response and drops the TCP connection? By
insisting on an explicit status code in trailers, clients can detect a
prematurely-terminated response body.” This is plausible-sounding,
especially when accompanied by an HTTP/1.1 example, but it’s nonsense. gRPC
requires at least HTTP/2, and both HTTP/2 and HTTP/3 handle this explicitly:
every HTTP/2 frame includes a byte of bitwise flags, and the frame
types used for headers, trailers, and body data all include an explicit
END_STREAM flag used to cleanly terminate the response. If the client sees a
TCP connection drop before it receives an HTTP/2 frame with END_STREAM set,
it knows that the response is incomplete. No trailers needed.
- Nothing about Protocol Buffers requires trailers. In this variant of the
first argument, gRPC apologists argue that detecting dropped TCP connections
is especially important when using Protocol Buffers. “If gRPC only supported
JSON,” they say, “clients would detect many incomplete responses by noticing
unbalanced curly braces. But Protocol Buffer messages don’t have explicit
delimiters, so we really need to rely on trailers to detect dropped
connections.” But not only does HTTP/2 provide an unambiguous way to detect
dropped connections, the gRPC protocol doesn’t rely on encoding-specific
delimiters to find message boundaries. Instead, it prefixes each message in a
stream with its length. Clients can easily detect response bodies that end
before delivering the promised quantity of data. Again, trailers don’t add
any safety.
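Both safeguards are easy to see in code. The sketch below decodes the 9-byte HTTP/2 frame header to check END_STREAM, and splits a gRPC response body into its length-prefixed messages (a 1-byte compression flag followed by a 4-byte big-endian length). The function names are my own, and a real client would get this from its HTTP/2 library rather than parsing bytes by hand.

```python
import struct

END_STREAM = 0x1  # flag bit defined for both DATA and HEADERS frames


def parse_frame_header(header: bytes):
    """Decode an HTTP/2 frame header into (length, type, flags, stream_id)."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    length = int.from_bytes(header[0:3], "big")   # 24-bit payload length
    frame_type = header[3]                        # 0x0 = DATA, 0x1 = HEADERS
    flags = header[4]                             # the byte of bitwise flags
    # The top bit of the last four bytes is reserved; mask it off.
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    return length, frame_type, flags, stream_id


def split_grpc_messages(body: bytes):
    """Split a gRPC body into messages, raising EOFError if it is truncated."""
    messages = []
    offset = 0
    while offset < len(body):
        if len(body) - offset < 5:
            raise EOFError("body ended inside a message prefix")
        _compressed, size = struct.unpack_from(">BI", body, offset)
        offset += 5
        if len(body) - offset < size:
            raise EOFError("body ended before the promised message length")
        messages.append(body[offset:offset + size])
        offset += size
    return messages
```

A client that never sees a frame with `flags & END_STREAM` set, or that hits the `EOFError` above, already knows the response was cut short.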
Why are trailers bad?
If trailers were unnecessary but harmless, I wouldn’t be ranting to internet
strangers. But trailers aren’t harmless: they make it difficult to add gRPC
APIs to existing applications. Is your Python application built with Django,
Flask, or FastAPI? Too bad — WSGI and ASGI don’t support trailers, so your
application can’t handle gRPC-flavored HTTP. Trying to call your gRPC server
from an iPhone? Sorry, URLSession doesn’t support trailers either. Rather
than adding a few new routes to your existing server and client, you’re stuck
building an entirely new application for RPC.
To support trailers, your new application uses a gRPC-specific HTTP stack. But
apart from supporting trailers, your new stack is less capable than your old
one: usually, gRPC’s HTTP implementation can only serve RPCs over HTTP/2. If
you also want to serve an HTML page, receive a file upload, support HTTP/1.1 or
HTTP/3, or just handle an HTTP GET, you’re out of luck. In practice, adopting
gRPC requires a multi-service backend architecture.
These pains are most acute on the web. Like many other clients, fetch doesn’t
support trailers. Unlike mobile or backend applications, though, web
applications can’t bundle an alternate, gRPC-specific HTTP client. Instead,
they’re forced to proxy requests through Envoy, which translates them on the
fly from a trailer-free protocol to standard gRPC. Envoy is a perfectly fine
proxy, but it’s a lot to configure and manage in production if you’re only
using it to work around gRPC’s quirks. And of course, no web developer enjoys
running a C++ proxy during local development.
In short, relying on trailers abandons one of HTTP’s key strengths: the ready availability of interoperable servers and clients.
Could Google fix gRPC?
When Google designed gRPC, trailer support had just been added to the fetch
specification. If the Chrome, Firefox, Safari, and Edge teams had followed
through and implemented the proposed APIs, other HTTP implementations might
have followed their lead. Instead, browser makers withdrew their support for
the new APIs, and they were formally removed from the specification in late
2019.
It’s now 2023. Trailers aren’t coming to browsers — or to most other HTTP implementations — for years, if ever. Even Cloudflare, a multi-billion dollar internet infrastructure company, doesn’t have end-to-end support for trailers. The gRPC team should confront this reality and add support for a second, trailer-free protocol to their servers and clients.
gRPC-Web is the pragmatic choice for a second protocol. It’s nearly identical to standard gRPC, except that it encodes status metadata at the end of the response body rather than in trailers. It uses a different Content-Type, so servers could automatically handle the new protocol alongside the old. Clients could opt into the new protocol with a configuration toggle. Implementations wouldn’t need any other user-visible API changes, so these improvements could ship in a backward-compatible minor release. And because gRPC-Web is already under the gRPC umbrella, we wouldn’t need to convince Google to adopt any outside ideas. (gRPC-Web also drops gRPC’s strict HTTP/2 requirement, which is nice but unnecessary to mitigate the trailers fiasco.)
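For a sense of how small the difference is, here is a rough sketch of reading a gRPC-Web response body. As I read the gRPC-Web protocol, it reuses gRPC's 5-byte message prefix and marks the final frame by setting the most significant bit of the first byte; that frame's payload is the status metadata, formatted like an HTTP/1 header block. The function name is hypothetical, and compression and base64 text mode are ignored.

```python
TRAILER_FLAG = 0x80  # most significant bit of the first prefix byte


def parse_grpc_web_body(body: bytes):
    """Return (messages, trailers) from a binary gRPC-Web response body."""
    messages, trailers = [], {}
    offset = 0
    while offset < len(body):
        if len(body) - offset < 5:
            raise EOFError("body ended inside a frame prefix")
        flags = body[offset]
        size = int.from_bytes(body[offset + 1:offset + 5], "big")
        payload = body[offset + 5:offset + 5 + size]
        if len(payload) < size:
            raise EOFError("body ended before the promised frame length")
        offset += 5 + size
        if flags & TRAILER_FLAG:
            # The would-be trailers, carried in the body as "name: value" lines.
            for line in payload.decode("ascii").split("\r\n"):
                if line:
                    name, _, value = line.partition(":")
                    trailers[name.strip().lower()] = value.strip()
        else:
            messages.append(payload)
    return messages, trailers
```

Everything here is ordinary body parsing, which is why it works with any HTTP client that can read a response body.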
If today’s gRPC implementations embraced the gRPC-Web protocol, new
implementations could support only gRPC-Web. All of a sudden, grpc-rails
and similar framework integrations would be feasible. Browsers could call gRPC
backends directly. iOS applications could drop their multi-megabyte dependency
on SwiftNIO. Without trailers, gRPC could meet developers where they are.
Microsoft seems to agree with this assessment: they’ve built support for the
gRPC-Web protocol into grpc-dotnet. If you’d like Google to do the same,
upvote issue 29818 in the main gRPC repository.
Could we do better without Google?
gRPC-Web might be the pragmatic choice, but it still leaves a lot to be desired. What if we were bolder? To really improve upon gRPC, we’d use different protocols for streaming and request-response RPCs. The streaming protocol would be similar to gRPC-Web, but we’d bring the request-response protocol closer to familiar, resource-oriented HTTP:
- We’d support HTTP/1.1 and HTTP/2.
- We’d use meaningful HTTP status codes.
- We’d dispense with trailers and end-of-body metadata and just rely on headers.
- We wouldn’t need to length-prefix messages, so the body could be plain JSON
or binary Protocol Buffer. That lets us use recognizable Content-Types,
like application/json.
- We’d use the standard Accept-Encoding header, so web applications benefit
from compressed responses.
- We’d support GET requests for cacheable RPCs. With some care, we could avoid
having these GET requests trigger CORS preflight from browsers.
- For servers using Protocol Buffer schemas, we’d encourage implementations to
support both binary and JSON payloads by default (using the canonical JSON
mapping).
None of these changes affect the protocol’s efficiency, but they eliminate most
of gRPC’s fussiness. Creating a User becomes a cURL one-liner:
curl --json '{"name": "Akshay"}' https://api.acme.com/user.v1/Create
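The server side of that request can be just as boring. Below is a framework-agnostic sketch of a handler for the /user.v1/Create path from the cURL example: plain JSON in, plain JSON out, and ordinary HTTP status codes for errors. The `create_user` logic, the `ROUTES` table, and the shape of the error body are all invented for illustration; they are not part of any published protocol.

```python
import json


def create_user(request: dict) -> dict:
    """Hypothetical business logic for the Create RPC."""
    if not request.get("name"):
        raise ValueError("name is required")
    return {"name": request["name"]}


ROUTES = {"/user.v1/Create": create_user}


def handle(method: str, path: str, headers: dict, body: bytes):
    """Return (status, headers, body) for one request-response RPC."""
    json_headers = {"Content-Type": "application/json"}

    def error(status, message, extra=None):
        payload = json.dumps({"error": message}).encode("utf-8")
        return status, {**json_headers, **(extra or {})}, payload

    procedure = ROUTES.get(path)
    if procedure is None:
        return error(404, "unknown procedure")
    if method != "POST":
        return error(405, "use POST", {"Allow": "POST"})
    if headers.get("Content-Type") != "application/json":
        return error(415, "expected application/json")
    try:
        request = json.loads(body)  # JSONDecodeError is a ValueError
        if not isinstance(request, dict):
            raise ValueError("request must be a JSON object")
        response = procedure(request)
    except ValueError as err:
        return error(400, str(err))
    # Success is a plain 200 with a plain JSON body: no prefix, no trailers.
    return 200, json_headers, json.dumps(response).encode("utf-8")
```

Because `handle` only needs a method, a path, headers, and a body, it could sit behind any existing web framework's routing.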
This protocol just works because it’s boring. It works with human-readable
JSON and optimized binary encodings. It works with cURL and requests. It
works with fetch and browsers’ built-in debuggers. It works with URLSession
and Charles Proxy. It works with Rails, Django, FastAPI, Laravel, and Express.
It works with CDNs and browser caches. It works with standard penetration
testing toolkits.
I can’t imagine Google embracing a protocol that’s so different from today’s gRPC, especially if it requires HTTP/1.1 support, but you can try it today: use Connect. Connect servers and clients support the full gRPC protocol, gRPC-Web, and the simpler protocol we just outlined. Implementations are available in Go, TypeScript, Swift, and Kotlin.