HTTP + CVS

Michael Sheldon (mike at mikeasoft dot com)

About

This is an idea for an extension to HTTP that would make it possible to reduce bandwidth usage whilst still retaining backward compatibility with HTTP/1.1.

Implementation

The HTTP/1.1 specification already defines a 'conditional GET', in which a request carries a header such as 'If-Modified-Since'. What I propose is that browsers make an 'If-Modified-Since' request and also accept the 'text/html-diff' MIME type. The server would then return just the differences between the two versions rather than the whole file, tracking changes in much the same way that present CVS software does for source code.

The server should compare the size of the generated differences with the size of the full file; if the full file is smaller, it should be sent instead. This means that bandwidth usage will always be as good as or better than current HTTP/1.1 with the 'text/html' MIME type.
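As a rough sketch of that comparison, the Python below builds a classic diff script and sends it only when it is smaller than the full page. The names normal_diff and choose_body are my own for illustration, and using Python's difflib to find the changed lines is an assumption; the proposal does not prescribe a particular diff algorithm.

```python
import difflib


def normal_diff(old: str, new: str) -> str:
    """Build a classic diff-style script (e.g. "4a5") turning old into new."""
    a = old.splitlines()
    b = new.splitlines()

    def span(start: int, end: int) -> str:
        # A one-line range is written "5", a longer one "5,7".
        return str(start) if start == end else f"{start},{end}"

    out = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert":
            out.append(f"{i1}a{span(j1 + 1, j2)}")
        elif tag == "delete":
            out.append(f"{span(i1 + 1, i2)}d{j1}")
        else:  # "replace": old lines are changed into new lines
            out.append(f"{span(i1 + 1, i2)}c{span(j1 + 1, j2)}")
        out.extend("< " + line for line in a[i1:i2])
        if tag == "replace":
            out.append("---")
        out.extend("> " + line for line in b[j1:j2])
    return "".join(line + "\n" for line in out)


def choose_body(old: str, new: str):
    """Return (content type, body), sending the diff only when it is smaller."""
    full = new.encode("utf-8")
    delta = normal_diff(old, new).encode("utf-8")
    if len(delta) < len(full):
        return "text/html-diff", delta
    return "text/html", full
```

For a one-line page that changes completely, the diff script is longer than the page itself, so choose_body falls back to 'text/html' and the client is never worse off.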

Example Communication

Client's initial GET request:

GET /ideas/test.html HTTP/1.1
From: spam@example.com
User-Agent: Mike's Wonderful Fake Browser/1.1
Accept: text/html
			

Server's response:

HTTP/1.1 200 OK
Date: Sat, 10 Jul 2004 12:34:56 GMT
Content-Type: text/html
Content-Length: 104

<html>
	<body>
		Wow, isn't HTTP/1.1 + CVS great?
		Just think of the bandwidth savings!
	</body>
</html>
			

The browser now has a whole copy of the file, so the next time it wishes to retrieve the page it can use 'text/html-diff'.

Client's second GET request:

GET /ideas/test.html HTTP/1.1
If-Modified-Since: Sat, 10 Jul 2004 12:34:56 GMT
From: spam@example.com
User-Agent: Mike's Wonderful Fake Browser/1.1
Accept: text/html, text/html-diff
			

If the server supports text/html-diff and the file has been modified since that date, the server will send back just the differences.
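The check the server makes here could look something like the following Python sketch. The function names and the plain dictionary of headers are assumptions made for illustration; a real server would also need to honour quality values and case-insensitive header names, which this ignores.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

DIFF_TYPE = "text/html-diff"


def accepts_diff(accept_header: str) -> bool:
    """True if the Accept header lists text/html-diff among its media types."""
    # Media types are comma-separated; anything after ";" is a parameter.
    types = [item.split(";")[0].strip().lower()
             for item in accept_header.split(",")]
    return DIFF_TYPE in types


def should_send_diff(headers: dict, last_modified: datetime) -> bool:
    """Decide whether a diff response is appropriate for this request."""
    since = headers.get("If-Modified-Since")
    if since is None or not accepts_diff(headers.get("Accept", "")):
        return False
    # Only worth diffing if the page changed after the client's copy.
    return last_modified > parsedate_to_datetime(since)
```

Here last_modified is expected to be a timezone-aware datetime, since the HTTP date in the request parses to one.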

Server's second response:

HTTP/1.1 200 OK
Date: Fri, 30 Jul 2004 03:12:17 GMT
Content-Type: text/html-diff
Content-Length: 44

4a5
> 		If only someone would implement it.
			

The client merges these changes with the original file and arrives at the latest version.

File now held by client:

<html>
	<body>
		Wow, isn't HTTP/1.1 + CVS great?
		Just think of the bandwidth savings!
		If only someone would implement it.
	</body>
</html>
			

Sending the 44-byte diff instead of the full updated file offers a saving of about 100 bytes, which is a saving of nearly 70%.
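The merge step on the client can be sketched in Python as below. This is only an illustration under the assumption that 'text/html-diff' carries a classic diff script with 'a', 'd' and 'c' commands, as in the example response; the name apply_normal_diff is hypothetical.

```python
import re

# Matches diff commands such as "4a5", "2,3d1" or "5c5,6".
COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def apply_normal_diff(old: str, diff: str) -> str:
    """Apply a classic diff-style script to the cached copy of a page."""
    lines = old.splitlines()
    script = diff.splitlines()
    offset = 0  # how far earlier hunks have shifted later line numbers
    pos = 0
    while pos < len(script):
        match = COMMAND.match(script[pos])
        if match is None:
            raise ValueError(f"unexpected line in diff: {script[pos]!r}")
        pos += 1
        start = int(match.group(1))
        end = int(match.group(2) or start)
        kind = match.group(3)
        removed = 0 if kind == "a" else end - start + 1
        added = []
        # Skip "<" lines and the "---" separator; collect the ">" lines.
        while pos < len(script) and not COMMAND.match(script[pos]):
            text = script[pos]
            if text.startswith(">"):
                # Tolerate both "> text" and ">text".
                added.append(text[2:] if text.startswith("> ") else text[1:])
            pos += 1
        # "a" inserts after line `start`; "c" and "d" act on start..end.
        index = (start if kind == "a" else start - 1) + offset
        lines[index:index + removed] = added
        offset += len(added) - removed
    return "".join(line + "\n" for line in lines)
```

Applied to the cached page and the two-line response from the example, this yields the seven-line file shown above.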

Application

The area in which I can see this offering the largest savings in bandwidth is online forum software, where new messages are added on to the end of old ones and occasionally edited. News sites and online RSS aggregators would also find a large benefit from this.

Binary files such as images are unlikely to see any real improvement, but they are also much less likely to change frequently.

Drawbacks

While this would offer fairly large savings in bandwidth for many purposes, it would also require the server to store the past state of every page served for a set period of time. For sites with frequently changing dynamic content this could lead to storage issues.

Versioning for dynamic content

This is a method to reduce the storage problem mentioned above.

Each time the server receives a request for dynamic content, it compares the content currently being served with the most recently stored version; if they differ, the current content is stored as well. When the server receives a request from a browser that accepts 'text/html-diff', it generates the differences against the newest stored version dated at or before the date in the request.
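A minimal in-memory sketch of this scheme in Python might look as follows. The class and method names are hypothetical, and it assumes snapshots are recorded in chronological order; a real server would keep them on disk and expire old ones after the set period.

```python
import bisect
from datetime import datetime
from typing import List, Optional


class SnapshotStore:
    """Keeps a dated snapshot of a page each time its content changes."""

    def __init__(self) -> None:
        self._dates: List[datetime] = []
        self._bodies: List[str] = []

    def record(self, body: str, now: datetime) -> None:
        """Store the body only if it differs from the latest snapshot."""
        if not self._bodies or self._bodies[-1] != body:
            self._dates.append(now)
            self._bodies.append(body)

    def version_at(self, when: datetime) -> Optional[str]:
        """Return the newest snapshot dated at or before `when`, if any."""
        index = bisect.bisect_right(self._dates, when)
        return self._bodies[index - 1] if index else None
```

The server would call record() on every request for the page and version_at() with the 'If-Modified-Since' date to find the version to diff against; when it returns None, no suitable snapshot exists and the full page has to be sent.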

Thanks

Thanks to Tom Walker of Dunorix web design for various suggestions on improvements.