This is an idea for an extension to HTTP that would make it possible to reduce bandwidth usage whilst still retaining backward compatibility with HTTP/1.1.
The HTTP/1.1 specification already defines the 'conditional GET': a GET request carrying a header such as 'If-Modified-Since', which lets the server avoid resending a file that has not changed. What I propose is that browsers make an 'If-Modified-Since' request and also accept the 'text/html-diff' MIME type. The server would then return just the differences between the two versions rather than the whole file, tracking changes in much the same way that CVS does for source code.
The server should compare the size of the generated differences with the size of the full file; if the full file is smaller, it should be sent instead. This means that the response is never larger than it would be under current HTTP/1.1 with the 'text/html' MIME type.
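The size check described above can be sketched as follows. This is only an illustration, not part of any existing server: the function names `normal_diff` and `choose_response` are hypothetical, the diff is written in the classic `diff` listing style (commands such as `4a5`), and Python's standard `difflib` module is assumed as the comparison engine.

```python
import difflib


def normal_diff(old_lines, new_lines):
    """Return a classic diff listing (commands such as '4a5') as one string."""
    def span(start, end):
        # Render a 1-based inclusive range: '5' for one line, '5,7' for several.
        return str(start) if start == end else f"{start},{end}"

    out = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert":
            out.append(f"{i1}a{span(j1 + 1, j2)}")
        elif tag == "delete":
            out.append(f"{span(i1 + 1, i2)}d{j1}")
        else:  # "replace"
            out.append(f"{span(i1 + 1, i2)}c{span(j1 + 1, j2)}")
        out.extend(f"< {line}" for line in old_lines[i1:i2])
        if tag == "replace":
            out.append("---")
        out.extend(f"> {line}" for line in new_lines[j1:j2])
    return "".join(line + "\n" for line in out)


def choose_response(stored_body, current_body, accepts_diff):
    """Return (content_type, body), sending the diff only when it is smaller."""
    if accepts_diff:
        diff_body = normal_diff(stored_body.splitlines(), current_body.splitlines())
        if len(diff_body.encode("utf-8")) < len(current_body.encode("utf-8")):
            return "text/html-diff", diff_body
    # Fall back to the whole file: the client is never worse off than today.
    return "text/html", current_body
```

Because the Content-Type of the reply says which of the two bodies was sent, the client needs no extra signalling to know whether to merge or to replace its copy.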
Client's initial GET request:
GET /ideas/test.html HTTP/1.1
From: spam@example.com
User-Agent: Mike's Wonderful Fake Browser/1.1
Accept: text/html
Server's response:
HTTP/1.1 200 OK
Date: Sat, 10 Jul 2004 12:34:56 GMT
Content-Type: text/html
Content-Length: 104

<html>
<body>
Wow, isn't HTTP/1.1 + CVS great?
Just think of the bandwidth savings!
</body>
</html>
The browser now has a whole copy of the file, so the next time it wishes to retrieve the page it can use 'text/html-diff'.
Client's second GET request:
GET /ideas/test.html HTTP/1.1
If-Modified-Since: Sat, 10 Jul 2004 12:34:56 GMT
From: spam@example.com
User-Agent: Mike's Wonderful Fake Browser/1.1
Accept: text/html, text/html-diff
If the server supports 'text/html-diff' and the file has been modified since that date, the server will send back just the differences.
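A rough sketch of how a server might make that decision from the request headers is given here. The function names and the plain dictionary of headers are assumptions made for illustration; only `email.utils.parsedate_to_datetime` from the Python standard library is relied on to read the HTTP date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_conditional_request(headers):
    """Return (since, accepts_diff) for a dict of request headers."""
    raw_date = headers.get("If-Modified-Since")
    since = parsedate_to_datetime(raw_date) if raw_date else None

    # Split the Accept list on commas and drop parameters such as ';q=0.8'.
    accepted = {
        item.split(";")[0].strip().lower()
        for item in headers.get("Accept", "").split(",")
    }
    return since, "text/html-diff" in accepted


def should_send_diff(headers, last_modified):
    """True when the client accepts diffs and the page changed after its copy."""
    since, accepts_diff = parse_conditional_request(headers)
    return accepts_diff and since is not None and last_modified > since


# Example using the second request from this page (hypothetical modification time).
headers = {
    "If-Modified-Since": "Sat, 10 Jul 2004 12:34:56 GMT",
    "Accept": "text/html, text/html-diff",
}
modified = datetime(2004, 7, 30, 3, 12, 17, tzinfo=timezone.utc)
print(should_send_diff(headers, modified))
```

A browser that does not list 'text/html-diff' in its Accept header simply gets the ordinary HTTP/1.1 behaviour, which is what keeps the idea backward compatible.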
Server's second response:
HTTP/1.1 200 OK
Date: Fri, 30 Jul 2004 03:12:17 GMT
Content-Type: text/html-diff
Content-Length: 44

4a5
> If only someone would implement it.
The client merges these changes with the original file and arrives at the latest version.
File now held by client:
<html>
<body>
Wow, isn't HTTP/1.1 + CVS great?
Just think of the bandwidth savings!
If only someone would implement it.
</body>
</html>
The diff body is 44 bytes, compared with about 140 bytes for the full updated page: a saving of roughly 100 bytes, or nearly 70%.
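The merge step on the client side can be sketched along these lines. It is a minimal illustration that assumes the diff arrives in the classic `diff` listing style used in the example response (add, delete and change commands); `apply_normal_diff` is a hypothetical helper name, not an existing browser or library function.

```python
import re

# Matches commands such as '4a5', '2d1', '3,4c3,6'.
_COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def apply_normal_diff(old_lines, diff_text):
    """Apply a classic diff listing to old_lines and return the new lines."""
    result = []
    pos = 0  # how many lines of old_lines have been consumed so far
    lines = diff_text.splitlines()
    i = 0
    while i < len(lines):
        match = _COMMAND.match(lines[i])
        if match is None:
            raise ValueError(f"unexpected diff line: {lines[i]!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        op = match.group(3)
        i += 1
        if op == "a":
            # 'Na...': keep old lines up to and including line N.
            result.extend(old_lines[pos:start])
            pos = start
        else:
            # 'N[,M]d...' or 'N[,M]c...': keep lines before N, drop N..M.
            result.extend(old_lines[pos:start - 1])
            pos = end
            while i < len(lines) and lines[i].startswith("< "):
                i += 1  # removed lines are listed only for reference
            if i < len(lines) and lines[i] == "---":
                i += 1  # separator between old and new text in a change
        while i < len(lines) and lines[i].startswith("> "):
            result.append(lines[i][2:])
            i += 1
    result.extend(old_lines[pos:])
    return result


cached = [
    "<html>",
    "<body>",
    "Wow, isn't HTTP/1.1 + CVS great?",
    "Just think of the bandwidth savings!",
    "</body>",
    "</html>",
]
updated = apply_normal_diff(cached, "4a5\n> If only someone would implement it.\n")
print("\n".join(updated))
```

The client only needs the copy it already holds in its cache plus the diff body, so no extra round trip is involved.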
The area in which I can see this offering the largest savings in bandwidth is online forum software, where new messages are added on to the end of old ones and occasionally edited. News sites and online RSS aggregators would also find a large benefit from this.
Binary files such as images would be likely to find no real improvements, but are also much less likely to feature frequent changes.
While this would offer fairly large savings in bandwidth for many purposes, it would also require the server to store the past state of every page served for a set period of time. For sites with frequently changing dynamic content this could lead to possible storage issues.
One way to reduce the storage problem mentioned above is to store a new copy only when the content has actually changed.
Each time the server receives a request for dynamic content, it compares the content currently being served with the most recently stored copy; if there are differences, the current content is stored as well. When the server receives a request from a browser that accepts 'text/html-diff', it generates the differences against the most recent stored copy dated at or before the date given in the request.
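The storage scheme just described might look something like the sketch below. The `SnapshotStore` class and its method names are invented for illustration, and it assumes snapshots are recorded in chronological order and kept in memory; a real server would also need to expire old copies after the set period.

```python
import bisect
from datetime import datetime, timezone


class SnapshotStore:
    """Keeps past versions of one page, storing a copy only when it changes."""

    def __init__(self):
        self._times = []   # timestamps, in ascending order
        self._bodies = []  # page content stored at the matching timestamp

    def record(self, body, now):
        """Store body unless it equals the latest copy; return True if stored."""
        if self._bodies and self._bodies[-1] == body:
            return False
        self._times.append(now)
        self._bodies.append(body)
        return True

    def lookup(self, since):
        """Return the latest copy stored at or before `since`, or None."""
        index = bisect.bisect_right(self._times, since)
        if index == 0:
            return None  # nothing that old is held: send the whole file
        return self._bodies[index - 1]


store = SnapshotStore()
t1 = datetime(2004, 7, 10, 12, 34, 56, tzinfo=timezone.utc)
t2 = datetime(2004, 7, 20, 9, 0, 0, tzinfo=timezone.utc)
t3 = datetime(2004, 7, 30, 3, 12, 17, tzinfo=timezone.utc)
store.record("version one", t1)
store.record("version one", t2)  # unchanged, so nothing new is stored
store.record("version two", t3)
base = store.lookup(t2)          # the copy a client dated t2 would hold
```

When `lookup` returns None the server has no suitable base version, so it would fall back to sending the full 'text/html' response.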
Thanks to Tom Walker of Dunorix web design for various suggestions on improvements.
