The Reconnect Bug and a Proposal to Rewrite %eyre

~dachus-tiprel - 2.21.23

Urbit is broken. Tlon has been building groups on Urbit for years now, and they still have not reached parity with Discord's performance. I would argue this is not due to engineering incompetence, but because %eyre itself is broken. No matter how much time we put into trying to fix http-api and other frontend js code, we will never reach adequate performance without massive changes to %eyre.

The Case Against SSE

For those not aware, %eyre uses a technology called Server-Sent Events (SSE) to give web applications a "real time responsive" feel. This is highly nonstandard: for the kinds of apps that we want to build on Urbit, WebSockets (WS) are the de facto standard, used by Discord, GitHub, Twitter, Signal, Facebook, and more. The main difference is that a WebSocket is a two-way connection, while SSE is one-way (server to client). All the major projects on Urbit require a two-way connection. If SSE had no other problems, that would be a compelling enough reason to switch to WebSockets - but unfortunately, SSE has other big problems.

The most infamous bug in all of Urbit is the reconnect bug. This is when users are required to refresh their browser window to reestablish a connection. Sometimes you think your window is still connected, but when you try to send messages, nothing actually goes through. I've gotten to the point where I refresh any time I switch back to an Urbit tab. This is unacceptable UX. Smart engineers have tried to fix this bug numerous times, and even they have not succeeded. This is probably the nastiest bug on Urbit because users get the impression that Urbit is broken, and in a sense, it is.

The reconnect bug was actually solved on iOS by ~hosryc-matbel from Assembly Capital. In his own words:

[The] networking stack has a request interceptor, one of the callbacks, is a retry handler. you get to inspect the request that failed and tell the networking stack if you want to retry it right away. on GET channel request (SSE channel) we make sure to retry. this works, because it’s basically just re-establishing the TCP connection, nothing more
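
The decision that retry handler makes can be sketched roughly as follows. This is an illustration in TypeScript, not ~hosryc-matbel's actual iOS code: the `FailedRequest` shape, the `decideRetry` function, the `/~/channel/` path prefix, and the `maxAttempts` cap are all assumptions made for the example. The quote describes retrying right away; the cap is added here only so the sketch cannot loop forever.

```typescript
// Hypothetical shape of the failed request a retry handler gets to inspect.
interface FailedRequest {
  method: string;  // HTTP method of the request that failed
  path: string;    // request path, e.g. "/~/channel/abc123" (assumed layout)
  attempt: number; // how many times this request has already been retried
}

type RetryDecision = "retry" | "give-up";

// Assumption: the SSE channel is the GET request under this path prefix.
const CHANNEL_PREFIX = "/~/channel/";

// Retry only the long-lived GET that carries the event stream.
// Everything else is left to fail normally.
function decideRetry(req: FailedRequest, maxAttempts: number = 5): RetryDecision {
  const isChannelStream =
    req.method.toUpperCase() === "GET" && req.path.startsWith(CHANNEL_PREFIX);
  if (!isChannelStream) {
    return "give-up";
  }
  // Assumed safety cap, not part of the quoted description.
  return req.attempt < maxAttempts ? "retry" : "give-up";
}
```

The point of the sketch is how little is involved: the handler only needs to recognise the channel request and ask the networking stack to open it again.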

What is worth noting here is that iOS gives apps greater control of lower level networking primitives, while the browser does not. In fact, the http-api uses the browser's AbortController to determine when to abort a connection. This is the lowest level API that we get to control SSE connections. My only remaining idea for how to fix the reconnect bug is to remove the ability for the client to disconnect at all, leaving %eyre with sole authority to terminate all connections. However, this doesn't solve everything. There are browser level bugs that have been marked as "won't fix" given our current implementation. The reconnect bug will trigger when more than six Urbit tabs are open, because browsers cap the number of simultaneous HTTP/1.1 connections to a single domain and every tab holds one open for its SSE stream. The only recourse for this is to switch to HTTP/2 (which is quite controversial), or write our own browser. Rather than implementing that workaround, I want to stop wasting time and start rewriting %eyre.
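
The tab limit comes down to simple arithmetic. Here is a minimal sketch, assuming browsers allow roughly six concurrent HTTP/1.1 connections per origin and that each Urbit tab holds exactly one of them open for its SSE stream. Both are simplifying assumptions, and `connectionBudget` is a hypothetical helper for illustration, not part of http-api.

```typescript
// Assumption: typical browser cap on concurrent HTTP/1.1 connections per origin.
const HTTP1_CONNECTIONS_PER_ORIGIN = 6;

interface ConnectionBudget {
  streaming: number; // tabs that managed to open their SSE stream
  free: number;      // connection slots left for ordinary requests
  starved: number;   // tabs that cannot open a stream at all
}

// Model each open tab as one permanently occupied connection slot.
function connectionBudget(
  openTabs: number,
  limit: number = HTTP1_CONNECTIONS_PER_ORIGIN
): ConnectionBudget {
  if (!Number.isInteger(openTabs) || openTabs < 0) {
    throw new RangeError("openTabs must be a non-negative integer");
  }
  const streaming = Math.min(openTabs, limit);
  return {
    streaming,
    free: limit - streaming,
    starved: openTabs - streaming,
  };
}
```

Under these assumptions a seventh tab has no slot left for its stream, which matches the behaviour described above. HTTP/2 avoids the cap because it multiplexes many streams over a single connection.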

Before I make the case for WebSockets, it's worth examining why we decided to use SSE in the first place, and whether we would lose any of its advantages by moving to WebSockets. The only compelling reason I have heard is that SSE provides "automatic reconnection" - which is comical, because the absence of this feature is what ignited this entire discussion. If there are any other points in favor of SSE, I am willing to consider them if the advocates of SSE come forward to state their case. Perhaps surprisingly, attempts to canvass Urbit core for these defenders have revealed that there are no advocates of SSE still working on Urbit.

The Case for WebSockets

I'll begin with a list of everything that WebSockets are good for: real-time chat, real-time feeds, real-time multiplayer gaming, audio/video chat via WebRTC, real-time location apps, and more. This list should look familiar: it is a list of everything that we want to build with Urbit. Tlon is building a real-time chat/feed app; Holium is building multiplayer spaces and leveraging WebRTC; Uqbar is building multiplayer gaming applications. Because WebSockets are well supported, integrations with other services will get easier. For example, WS will enable a direct connection to a BTC full node, making the BTC wallet much easier to use. They are also very well supported, which will allow for many different clients to interact with Urbit over time (compare this to the same MDN page for SSE).

All of this points towards one thing: we should stop wasting time and move %eyre over to WebSockets. I plan to specify what I mean by "WebSockets" in an upcoming post.

~dachus-tiprel
powered by %blog, download at ~hanrut-sillet-dachus-tiprel