How I reverse-engineered Google Docs to play back any document's keystrokes

If you’ve ever typed anything into a Google Doc, you can now play it back as if it were a movie — like traveling through time to look over your own shoulder as you write.

This is possible because every document written in Google Docs since about May 2010 has a revision history that tracks every change, by every user, with timestamps accurate to the microsecond; these histories are available to anyone with “Edit” permissions; and I have written a piece of software that can find, decode, and rebuild the history for any given document.

See that little gizmo above? It’s like a video player, but made especially for writing. This one’s from an Atlantic article I began work on nearly four years ago, on the day after Christmas in 2010. The article was about the first (and only) time I got to fly a small airplane. At the time, I didn’t give the slightest thought to the idea that one day I’d be able to watch the draft unfold. But since I happened to write this one in Google Docs, I can recover every keystroke. Above, you can see the first uncertain stirrings of the first paragraph.

What’s neat about this is that I didn’t have to use any special software while I was writing to make this “video” possible. I was working in plain old vanilla Google Docs. And to show you this one paragraph I liked, I didn’t have to present you with the whole document (all 39,154 revisions of it) — I could extract bits and pieces that I thought were interesting, and interleave them in a blog post. Imagine what a high school English teacher could do with that. Imagine what you could do with that if instead of a minor effort by ol’ Somers here you had, say, a piece by Ta-Nehisi Coates. (I’ve always wanted to watch how TNC writes. If he’s ever used Google Docs, it’s now possible.)

A screenshot showing what it’s like to work with a document in Draftback.

To produce the embed, I used a Chrome extension I made called Draftback, which I suppose I’m launching right now. With Draftback, you can play back and analyze any of your own Google Docs, or, for that matter, any Google Doc you have permission to edit.

(Everyone I’ve talked to about this has been surprised, and maybe a little unnerved, to discover that whenever they share a Google Doc with someone, they’re also sharing an extremely detailed record of them typing the thing.)

A map of changes to a document over time.

Here’s a graph that Draftback automatically produced for an article I was working on a few weeks ago. It shows the timeline of my changes, and below it, a “map” that tells me where in the document each of those revisions happened: the further down the graph, the further down the page. At the start, I added many thousands of words of notes — that’s why the doc gets so long so fast, and why the edits look sparse. Then you can see that I made three distinct passes, the first one focused on the top of the article, and slow; and the later ones faster and further down. A visual fingerprint of a document, and of a writer.

The data that Google stores is, as you might expect, kind of incredible. What we actually have is not just a coarse “video” of a document — we have the complete history of every single character. Draftback is aware of this history, and assigns each character a persistent unique ID, which makes it possible to do stuff that I don’t think folks have really done to a piece of writing before.

This animation shows how knowing every character’s history can help you trace the origins of the text you highlight.

Here, for instance, you can see me typing a short document. Focus on the first paragraph: you’ll see that it wasn’t written in one contiguous swoop, but rather was cobbled together over time via a bunch of discontinuous edits: I edit the paragraph, then do other stuff, then I come back to the paragraph, and so on. I even cut and paste a phrase from one paragraph to another.

Since Draftback has the full history for every character, and since that history is maintained even as characters are cut and pasted, it’s possible to select some text and see exactly where it came from. It’s like having a four-dimensional view of a document.

To what end?

I’ve long been obsessed by what you might call the “archaeology” of writing: how something like John McPhee’s profile of Bill Bradley (A Sense of Where You Are), or T. S. Eliot’s The Waste Land, comes to be.

I’ll read stuff about it: Eliot Among the Typists is a fascinating paper; the introduction to The John McPhee Reader is good, as are McPhee’s own essays on writing, Structure and Draft No. 4. I liked McP’s interview in The Paris Review, whose long-running series is legendary, especially this one with Hemingway, which is one of the best things I’ve read.

But what if you could actually see these guys at work? Isn’t it a shame you can’t?

I worry that most people aren’t as good writers as they should be. One thing is that they just don’t write enough. Another is that they don’t realize it’s supposed to be hard; they think that good writers are talented, when the truth is that good writers get good the way good programmers get good, the way good anythings get good: by running into the spike. Maybe folks would understand that better if they had vivid evidence that a good writer actually spends most of his time fighting himself.

That’s why I wanted something like Draftback. I had this image I just couldn’t shake: you’d get someone whose writing is accessible, concise, uncontroversial, well-styled, and, above all, quintessentially writing: i.e., someone who’s writing in a form where the writing is what there is, where the job isn’t to report but rather to put into words what we would think if only we had their critical equipment and verbal range… someone like A.O. Scott, who reviews movies for the New York Times and does such a good job of it that sometimes I’ll watch a movie just so I can read his review.

So you get A.O. Scott to write in Google Docs, and you publish the full playback and excerpted bits and pieces of it, the greatest hits — annotated, of course, director’s-commentary style — for every fan, every aspiring writer, and every high school English teacher in the country.

Whaddya say, Mr. Scott?

The Technical Origin Story: From Etherpad to Jimbopad to Google Docs

It all started 5 years ago on Hacker News with this oddly exuberant post by pg himself: The most surprising thing I’ve seen in 2009, courtesy of Etherpad. pg got famous because of his essays, and here you could watch him write one, backspaces and all. It was a sensation. At the time, it was one of the biggest Hacker News stories ever.

Here’s what it looked like. (This is actually a later, slightly more advanced version; the original, at etherpad.com, was taken down when Etherpad was bought by Google. More on that later.) All it was was a document with a slider at the top and a big play button, showing every revision. You could play the whole history start to finish. Pretty simple.

I remember seeing this playback and thinking that it could be better. I wanted more information: when did pg pause, and for how long? How much, exactly, did he delete? How did that compare against other writers? What if I saw a sentence I really liked — could I trace it to its source?

So I decided to build a thing I called Jimbopad. I was surprised at how simple Jimbopad turned out to be. You don’t actually need that much code to play back a record of someone writing. All you need is a textarea and some way of tracking diffs. Here’s what the playback UI was like, and here’s the JavaScript that made it possible (click on the highlighted bits of code for annotations):

Read “Code sample for embed” by James Somers on Genius

Simple as it is, this was actually better for my purposes than Etherpad. The problem with Etherpad is that in order to power its playback feature, it actually stored a full snapshot of the document at every tick. So if you had a 1MB text file — say, you’re working on a 7,500-word article — every keystroke would dump another meg on disk. Jimbopad, which was purpose-built for playback — I didn’t have to worry about real-time collaboration, which was Etherpad’s raison d’être and big value proposition — just stored “deltas” between each revision, which led to about a 1,000x decrease in required storage.
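To make the snapshot-versus-delta difference concrete, here is a minimal sketch of storing deltas and replaying them. It is not the Jimbopad code and it does not use diff-match-patch; the function names `makeDelta` and `applyDelta` and the shape of the delta object are made up for illustration, and the diff is the crudest one that works (trim the common prefix and suffix, keep what is left).

```javascript
// Compute a single-span delta between two versions of a text:
// skip the common prefix and suffix, and record only what changed in between.
function makeDelta(before, after) {
  let start = 0;
  const maxStart = Math.min(before.length, after.length);
  while (start < maxStart && before[start] === after[start]) start++;

  let endBefore = before.length;
  let endAfter = after.length;
  while (endBefore > start && endAfter > start &&
         before[endBefore - 1] === after[endAfter - 1]) {
    endBefore--;
    endAfter--;
  }
  return {
    at: start,                              // where the change begins
    removed: endBefore - start,             // how many characters were deleted
    inserted: after.slice(start, endAfter), // the text typed in their place
  };
}

// Apply a delta to one version to get the next one.
function applyDelta(text, delta) {
  return text.slice(0, delta.at) + delta.inserted + text.slice(delta.at + delta.removed);
}

// Record only the deltas, then replay them from an empty document.
const versions = ["", "H", "He", "Hel", "Help", "Hel", "Hell", "Hello"];
const deltas = [];
for (let i = 1; i < versions.length; i++) {
  deltas.push(makeDelta(versions[i - 1], versions[i]));
}
let doc = "";
for (const delta of deltas) doc = applyDelta(doc, delta);
console.log(doc); // "Hello"
```

Each stored delta is a few bytes no matter how long the document is, which is where the storage savings come from.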

This is why if you were to do “version control” for writing, you would have to record everything. You would have to make it trivial for the writer to “branch” off from some articulation, fail, and fall back to what they had before. Their every half-overture would have to be saved—because every half-overture, like every “commit,” might have words they would want to get back to.
— jsomers.net/blog/jimbopad

As soon as I made Jimbopad, which was the simplest this program could possibly be, I wanted something better. That’s when I set out to build Draftback 1.0. You can see what it looked like here.

As far as I can tell this was the state of the art in writing playback. You’ve got your slider, of course. But you’ve also got these nifty green and red colors that show you exactly what changed in each revision. You’re automatically scrolled to the part of the document that changed (HUGE innovation). And you could drop in to “actual-speed” playback mode, which somehow I thought was far more intimate, and interesting, than watching a ceaseless robotic clack. (It had a feature where if the delay between revisions was long enough, a thing would come up and say “the writer stared into space for 30 minutes.”) You could even search phrases and filter to just the revisions including that phrase.
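For “actual-speed” playback, the only extra ingredient is a timestamp on each revision. A rough sketch of how the delays and the “stared into space” notes could be derived follows; the revision shape (`time` in milliseconds, `text`), the function name `buildSchedule`, and the five-minute threshold are all assumptions, not details from Draftback 1.0.

```javascript
// Assumption: any gap of five minutes or more counts as a long pause.
const PAUSE_THRESHOLD_MS = 5 * 60 * 1000;

// Turn timestamped revisions into playback steps: how long to wait before
// showing each one, plus a note when the writer paused for a long time.
function buildSchedule(revisions) {
  const schedule = [];
  for (let i = 0; i < revisions.length; i++) {
    const delayMs = i === 0 ? 0 : revisions[i].time - revisions[i - 1].time;
    const step = { text: revisions[i].text, delayMs, note: null };
    if (delayMs >= PAUSE_THRESHOLD_MS) {
      const minutes = Math.round(delayMs / 60000);
      step.note = `the writer stared into space for ${minutes} minutes`;
    }
    schedule.push(step);
  }
  return schedule;
}

const draft = [
  { time: 0, text: "a" },
  { time: 1000, text: "ab" },
  { time: 1000 + 30 * 60 * 1000, text: "abc" },
];
const schedule = buildSchedule(draft);
console.log(schedule[2].note); // "the writer stared into space for 30 minutes"
```

A player would then wait `delayMs` (or some capped fraction of it) before showing each step.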

But there were still a bunch of problems. The “search” filter was really naive: all it did was look for revisions whose full rendered text included the phrase, and it filtered out everything else. That’s useful, but what I was really looking for was the “genealogy” of a phrase or sentence; I wanted to know where the parts of the sentence, before it was the atomic unit I’m seeing now, came from. That just wasn’t even possible using the diff-match-patch approach.

Maybe the bigger problem was that no good writer was going to use this program. Up to this point, my “editor” had been a simple textarea, and it required that you write in Markdown. And eventually I got this mantra in my head: “A.O. Scott is never gonna use Markdown”, “A.O. Scott is never gonna use Markdown.”

I was convinced you needed a beautiful clean WYSIWYG editor to get people to use your writing software.

I looked at a lot of options, and ultimately I paid for a thing called Redactor. That’s right: in my desperation I actually bought my rich-text editing technology. I paid like $200 for a JavaScript file.

Redactor was actually a good editor, it had this great big API, it was really easy to hack on, but still it ultimately used contentEditable, and contentEditable ends up breaking a lot. Here are a couple of TODOs and notes from my time working on that editor:

The WYSIWYG control buttons sometimes don’t reflect state. Toggles don’t toggle properly.
Why does hitting “I” italicize so much text?
Does un-blockquoting something not return you to normal formatting?

So that was a problem.

The § That Actually Finally Delivers What the Title Promised: An explanation of how to reverse-engineer Google Docs’s diff data structures and renderer, a system which was actually probably developed for real-time collaboration, a.k.a. “Operational Transformation,” a.k.a. nothing to do with “the archaeology of writing”

The slam dunk in my face was this blog post by Google in which they explained why they scrapped the contentEditable approach for Docs, and in its stead built a brand new rendering engine from scratch.

When you’re using Google Docs, you’re not actually typing into where you think you’re typing. You’re typing into a textarea in an iFrame off-screen, and through the postMessage API, those events are being sent to the “edit surface” that you see, which does stuff like draw your cursor. (Your cursor on Docs isn’t actually a cursor, it’s a 2px-wide div!)

I took this as proof not just that contentEditable was doomed, but that Google were the only ones who had the gall, and technical wherewithal, to do the insane gymnastics required to build something that felt like Word in the browser. I figured if I couldn’t beat them, I’d join them.

I started by trying to build an actual plugin for Docs. I played with their sample code, and I looked through the documentation. I was trying to see if there was a hook I could get that would tell me when a user changed the document. Recall that all I really need is that one hook, a diff-match-patch library, and a place to store the deltas.

It turns out that they don’t expose this kind of event for their docs. (“The onEdit trigger runs automatically when a user changes the value of any cell in a... spreadsheet.”) But that’s when things started getting pretty interesting.

I decide I’m just going to write a Chrome extension on top of Google Docs, and I’m gonna capture the rendered HTML every time I make a change. Sure, the user has to install a Chrome extension, but that’s pretty simple, and when they’re using Docs they’ll hardly notice that my extension is there. It’ll feel like a seamless transparent experience.

So what I did was I looked in the web inspector and found the DOM I cared about. I found out that all the actual content has these classes like kix-page and kix-lineview and kix-wordhtmlgenerator-word-node. (Google’s codename for their Docs edit surface and rendering engine is “Kix.”) I figured that I could do something like this in a Chrome extension:

Read “Chrome extension to capture Google Docs revisions” by James Somers on Genius

I thought I was pretty clever, but while testing this code, I discovered that sometimes it would miss big chunks of my document. I found out that Google renders pages on demand: if you load a 99-page document, although it might look like you can scroll all the way down right away, the actual text on those later pages won’t be generated until you scroll it into view.

At this point I did something kinda dumb. I tried to reverse-engineer the obfuscated, minified client-side editor code so that I could find whatever the render function was. I figured if I could find some hook, I could trick the editor into thinking I’d scrolled through the whole document. That way, my diff-match-patch tool would be working with the full document at each revision.

My thought was that if the Docs editor/rendering code was all JavaScript, I should be able to figure out how it works, even if it was 80,000 lines of code that looked like this:

Read “Obfuscated Kix renderer code” by James Somers on Genius

I tried to do this by throwing breakpoints all over the place. I’d search for phrases in the code that weren’t obfuscated, like innerHTML, and throw a breakpoint beside them. Then I’d do stuff in the UI, and see if I hit my breakpoint. Then I’d inspect the call stack and see what values were lying around. I found out stuff like this: if you type something like P.j.zb.rx() in the console and run it, you’ll “redo” whatever your last action was. I spent days doing this. In fact, on one weekend I spent so much time staring at minified Docs JavaScript that I literally developed an eye ulcer.

Have you ever heard the story of how while NASA spent years and tens of millions of dollars developing a pen that would write in space, underwater, and upside-down, the Russians just brought a pencil? It’s apparently apocryphal (the space pen was much safer than a pencil, and the Russians wanted one too) but it illustrates a point. Here’s the “Russians bring a pencil” solution to my rendering problem. Again, click the highlighted lines to see an annotation that explains what’s going on:

Read “The "Russians Brought a Pencil" Solution” by James Somers on Genius

Needless to say, I wasn’t really happy with this solution. And I had seen something curious while getting my eye ulcer. At one point I’d clicked away from the “Sources” tab in the Chrome inspector and started looking at the “Network” tab. And I noticed these /save calls every time I typed something:

The payload looked pretty juicy. Here, for instance, I’m typing a period at the end of a sentence early in the document:

That seems parseable enough: a “command” of type (ty) insert (is) where the “insert begin index” (ibi) is 24 and the string (s) is “.”. Now we’re cooking with gas.
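As a small illustration of how little it takes to act on a command like that, here is a sketch that applies the insert described above to a plain string. The field names `ty`, `is`, `ibi`, and `s` come from the payload; the sample sentence, the function name `applyInsert`, and the idea that `ibi` counts from 1 are assumptions.

```javascript
// The command described above: insert "." at "insert begin index" 24.
const command = { ty: "is", ibi: 24, s: "." };

// Apply an insert command to a plain string.
function applyInsert(text, cmd) {
  if (cmd.ty !== "is") throw new Error(`not an insert command: ${cmd.ty}`);
  const index = cmd.ibi - 1; // assumption: ibi is 1-based
  return text.slice(0, index) + cmd.s + text.slice(index);
}

// A made-up 23-character sentence, so index 24 is the very end.
const sentence = "I flew a small airplane";
console.log(applyInsert(sentence, command)); // "I flew a small airplane."
```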

At this point, I figured my Chrome extension could be pretty dumb. All I had to do was intercept these “save” requests and store them somewhere. Later, I could figure out how to use them to rebuild the document. As long as someone had my extension installed from the very start of their editing, and never made any change in a browser without the extension, I should have enough to do everything Docs could do. (I reasoned that the Docs server gets no more data about a document than what is sent to it via these save calls; so those must be enough to render everything.)

Here’s what I cooked up:

Read “Chrome Extension Redux, This Time With Request Captures” by James Somers on Genius

This gave me a bunch of commands that looked like this:

Read “Google Docs Data Structure” by James Somers on Genius

These didn’t seem so hard to figure out. You have what looks like a “multi” or bundle operation, and then inside of it, a list of other operations: some inserts and some deletes. For inserts, you have the string you’re adding; for deletes, the indexes that tell you what to remove. I built myself a debugging tool that would let me step through a list of these revisions, to see both a rendered document and a dump of the critical characters array I was using to represent it under the hood:

The data is so simple that it almost suggests the implementation of the builder and renderer. You have a characters array, and you insert and remove characters from it. When you format text, you’re just passing a hash of options to a range of characters. The whole of my document builder looks like this, in outline. The main thing it’s doing, really, is giving intelligible names to a bunch of variables:

Read “Basic Renderer” by James Somers on Genius
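For readers who can’t open the embed, here is a rough stand-alone sketch of that kind of builder: a characters array that inserts, deletes, and style commands operate on, with a persistent id stamped on every character. Only the insert fields (`ty: "is"`, `ibi`, `s`) appear in the text above. The delete (`ds`, `si`, `ei`), style (`as`, `sm`), and bundle (`mlti`, `mts`) names, the 1-based inclusive indexes, and the function name `buildDocument` are assumptions made for illustration.

```javascript
// Rebuild a document from a list of revisions, keeping one object per character.
function buildDocument(revisions) {
  const chars = []; // the document: one { id, ch, revision, styles } per character
  let nextId = 1;   // ids are never reused, so a character keeps its id for life

  function apply(cmd, revisionNumber) {
    switch (cmd.ty) {
      case "is": { // insert string `s` before 1-based index `ibi`
        const fresh = cmd.s.split("").map((ch) => ({
          id: nextId++, ch, revision: revisionNumber, styles: {},
        }));
        chars.splice(cmd.ibi - 1, 0, ...fresh);
        break;
      }
      case "ds": // delete the inclusive 1-based range si..ei (assumed names)
        chars.splice(cmd.si - 1, cmd.ei - cmd.si + 1);
        break;
      case "as": // merge the style hash `sm` into each character in si..ei (assumed names)
        for (let i = cmd.si - 1; i < cmd.ei; i++) Object.assign(chars[i].styles, cmd.sm);
        break;
      case "mlti": // a bundle: apply each sub-command in order (assumed names)
        for (const sub of cmd.mts) apply(sub, revisionNumber);
        break;
      default:
        throw new Error(`unknown command type: ${cmd.ty}`);
    }
  }

  revisions.forEach((cmd, i) => apply(cmd, i + 1));
  return chars;
}

const history = [
  { ty: "is", ibi: 1, s: "Hello wrld" },
  { ty: "mlti", mts: [
    { ty: "ds", si: 7, ei: 10 },
    { ty: "is", ibi: 7, s: "world" },
  ] },
  { ty: "as", si: 1, ei: 5, sm: { bold: true } },
];
const built = buildDocument(history);
console.log(built.map((c) => c.ch).join("")); // "Hello world"
```

Because each character remembers the revision that created it, selecting a range and reading off the `revision` fields already gives a crude answer to “where did this text come from?” Following a character through cut and paste would take more than this sketch does.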

The renderer is also pretty simple. (For bigger documents, for now, I don’t render styles, because it’s a lot of extra work for not that much better of a user experience.) It works like this. We have two levels: paragraphs and spans. To figure out what to wrap in styles, we look at each character and say “what are your calculated styles?” based on its hash of properties. Then we say “are those styles equal to the styles of the character before you?” If they are, we continue the span. If not, we create a new span.

Read “Basic Renderer Code” by James Somers on Genius
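The span logic described above can be sketched in a few lines. This is an illustration rather than the Draftback renderer: the character shape (`ch`, `styles`), the two style keys (`bold`, `italic`), and the use of a newline character to end a paragraph are assumptions.

```javascript
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// "What are your calculated styles?" Reduce a style hash to a CSS string.
function cssFor(styles) {
  const rules = [];
  if (styles.bold) rules.push("font-weight:bold");
  if (styles.italic) rules.push("font-style:italic");
  return rules.join(";");
}

// Two levels: paragraphs, and spans of consecutive characters with equal styles.
function render(chars) {
  const paragraphs = [[]];
  let current = null; // the span being built, or null at the start of a paragraph
  for (const { ch, styles } of chars) {
    if (ch === "\n") { paragraphs.push([]); current = null; continue; }
    const css = cssFor(styles);
    if (current === null || current.css !== css) {
      // Styles differ from the previous character: start a new span.
      current = { css, text: "" };
      paragraphs[paragraphs.length - 1].push(current);
    }
    current.text += ch;
  }
  return paragraphs.map((spans) => {
    const inner = spans.map((span) => {
      const attr = span.css ? ` style="${span.css}"` : "";
      return `<span${attr}>${escapeHtml(span.text)}</span>`;
    }).join("");
    return `<p>${inner}</p>`;
  }).join("\n");
}

const styled = (text, styles = {}) => text.split("").map((ch) => ({ ch, styles }));
const sample = [...styled("Hi "), ...styled("there", { bold: true }), ...styled("\nBye")];
console.log(render(sample));
```

Comparing the computed CSS rather than the raw hashes means two characters with different property objects but the same appearance still share a span.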

And that’s essentially all you need to make something like Draftback.

Except, of course, for the one big missing piece: wouldn’t it be nice if you didn’t have to install a Chrome extension to capture these /save requests?

I was talking to my boss at Genius about this, and he suggested I look at the standard “Revision History” menu in Docs — maybe they had all the diffs somewhere in there?

I thought he must be wrong, since I remembered that Google only ever rendered a fairly coarse set of changes: maybe dozens or at most a hundred revisions for a document that had probably been changed tens of thousands of times. But I indulged him, and kept my Network tab open while poking through the Revision History menu. It’s then that I chanced upon the /load call. It has a URL that looks like this:

https://docs.google.com/document/d/#{docid}/revisions/load?id=#{docid}&start=1330&end=1341

And it returns something that looks like this:

Hmm, I wonder what happens when you change the start and end parameters to cover a wider range? Will you, by chance, get the entire revision history for the document?

I think yes.

Hack the planet from James Somers on Vimeo.

There are a couple of complications. One is that you can’t just say “load me revisions 1 to infinity” (or -1): you have to specify the actual upper bound. My first cut at this was to do a binary search: if you get a 500 response, you know you’ve gone too high, so you reduce your upper bound; if you get a 200, you’re in range, so you increase your lower bound; you stop once the lower bound exceeds the upper bound.
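Here is one way that search could look, as a sketch under some assumptions. The probe is passed in as a function so the logic can run without a network; in an extension it would request the load URL and report whether the response was a 200, and the loop would await it. The doubling phase that finds an initial upper bound is an addition of this sketch, and the names `revisionsUrl` and `findLastRevision` are hypothetical.

```javascript
// Build the load URL shown above for a given document and revision range.
function revisionsUrl(docId, start, end) {
  const id = encodeURIComponent(docId);
  return `https://docs.google.com/document/d/${id}/revisions/load?id=${id}&start=${start}&end=${end}`;
}

// Find the highest revision number for which isInRange(end) is true.
// isInRange stands in for "a request with this end value came back 200".
function findLastRevision(isInRange) {
  if (!isInRange(1)) return 0; // no revisions at all
  let low = 1;                 // highest end value known to succeed
  let high = 2;
  while (isInRange(high)) {    // double until a request fails
    low = high;
    high *= 2;
  }
  while (high - low > 1) {     // ordinary bisection between success and failure
    const mid = Math.floor((low + high) / 2);
    if (isInRange(mid)) low = mid;
    else high = mid;
  }
  return low;
}

// Simulate a document with 39,154 revisions and count the probes used.
let probes = 0;
const last = findLastRevision((end) => { probes++; return end <= 39154; });
console.log(last, probes);
```

The number of requests grows with the logarithm of the revision count, so even a very long history needs only a few dozen probes.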

And, of course, there’s the matter of building a renderer that works at scale, including for documents that have many tens of thousands of revisions, where each revision is hundreds of pages long. (For that, the main trick is in calculating a “window” around the locus of each revision, and only doing your heavy-duty rendering within that window.) And making a UI that people want to use. And finding a way to hit these undocumented APIs on the behalf of other Google users without having them give you their credentials.
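The windowing trick can be sketched very simply. The text above does not say how the window is chosen, so the fixed radius measured in characters, the half-open range, and the function name `windowAround` are all assumptions.

```javascript
// Pick the slice of the characters array worth rendering in full for one
// revision: a fixed number of characters on either side of where the edit
// happened, clamped to the bounds of the document.
function windowAround(docLength, locus, radius) {
  const start = Math.max(0, locus - radius);
  const end = Math.min(docLength, locus + radius);
  return { start, end }; // render chars[start] up to, but not including, chars[end]
}

console.log(windowAround(100000, 50, 2000));    // { start: 0, end: 2050 }
console.log(windowAround(100000, 99990, 2000)); // { start: 97990, end: 100000 }
```

Everything outside the window can be skipped or shown from a cached render, so the cost per revision depends on the window size rather than on the length of the document.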

A historical note

It’s worth noting for a second that Google probably wasn’t thinking of playback when they built this system for storing documents as a series of minute changes. They probably did it for the same reason that Etherpad did it, which is to power real-time collaboration. The only way you can do that quickly and reliably is by shooting small changes back and forth across the network; if two changes conflict, you adjust one of them to account for the other (or, in the simplest scheme, just reject it), thereby ensuring that everyone ends up with the same version of the document. This is a technique called operational transformation, and it’s a whole science unto itself.
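To give a flavor of what operational transformation involves, here is a toy sketch for the easiest case, two concurrent inserts. In operational transformation proper, a conflicting change is usually shifted to account for the other one rather than thrown away. This is not how Google Docs or Etherpad implement it; the operation shape (0-based `index`, `text`) and the function names are made up for illustration.

```javascript
function insertAt(text, op) {
  return text.slice(0, op.index) + op.text + text.slice(op.index);
}

// Rewrite `op` so that it still means the same thing after `other` has been applied.
// `otherGoesFirst` breaks ties when both inserts target the same index.
function transformInsert(op, other, otherGoesFirst) {
  const shifted = other.index < op.index || (other.index === op.index && otherGoesFirst);
  return shifted ? { index: op.index + other.text.length, text: op.text } : op;
}

const base = "Hello world";
const opA = { index: 5, text: "," };  // one user adds a comma
const opB = { index: 11, text: "!" }; // another adds an exclamation mark at the same time

// Each side applies its own change first, then the other side's transformed change.
const sideA = insertAt(insertAt(base, opA), transformInsert(opB, opA, true));
const sideB = insertAt(insertAt(base, opB), transformInsert(opA, opB, false));
console.log(sideA === sideB, sideA); // true "Hello, world!"
```

Both sides end up with the same text even though they applied the two changes in different orders. Deletes, formatting, and more than two users are where the “whole science” comes in.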

So it’s not likely that Google is going to change the way they save documents just because it enables this playback stuff. The playback is an epiphenomenon of real-time collaboration, as it was with Etherpad. Etherpad made their playback demo at Paul Graham’s request; it was a hack on top of data they were already storing for other purposes. In fact, I think it’s possible that the very same engineers who built Etherpad found their way to the Docs team. (When they were acquired, they started at Wave, but then, of course, Wave was discontinued.)

A few notes about Draftback

In the spirit of “worse is better,” the software for the Chrome extension is about as simple as I could bear releasing. I hope people find it useful. You probably could use it to look at the revision history of documents where you really have no business doing so — documents, for instance, shared with you by folks who didn’t know you’d be able to see their revision history. Don’t do that, obviously.

Aside from that, well, I’m just excited that this thing finally exists.

Who are you?

James Somers is a writer and programmer based in New York, NY. At the time this post was written, he worked on the engineering team at Genius.com. Contact me directly at

If you’d like to read more, check out jsomers.net. You can also subscribe for email updates on that page.

Acknowledgements

All errors, omissions, and lapses in judgement in the above are my own. The following people might not have known exactly how they were helping me or what with; they were just being their generous selves.

I’d like to thank Jim for talking through diffs and nodes with me at length, and for explaining exactly the approach that it turned out Google uses, long before I discovered it. And Tom, for looking at what I thought was a pretty good demo and saying “well wouldn’t it be better if you could do this for existing documents,” and then coming up with a practical suggestion for where to look. And finally to John, for letting me battle-test my code against his beautifully written book chapter.
