Saturday 27 June 2009

Cloud Computing

Cloud computing is about running a service or a server on a pool of computers. The computers could be your own, or you could lease time on a commercial cloud.

Who's Who


Some of the big names in this space are Amazon, Google, GoGrid and ElasticHosts. Apart from Google, they let you run a complete operating system and whatever other software you like on their infrastructure, which is why it is called Infrastructure as a Service (IaaS). Google's offering is a bit different: it is a Platform as a Service (PaaS). Google also offers applications like Google Docs (word processor, spreadsheet, etc.), which is known as Software as a Service (SaaS).

Infrastructure as a Service

Infrastructure as a Service interests me at the moment. It promises to disrupt current practice. A company no longer needs to buy servers for its business. It can lease time from one or more providers without any capital outlay, and without having to maintain or upgrade any hardware.

How does it work?


An example may help.
Your IT department wants to upgrade one third of your servers - a normal request that you might get each year. Instead of investing in new machines, they suggest that the company lease time on the ElasticHosts (EH) server cloud, with Amazon's EC2 as a backup site.

The IT people would set up accounts, request a certain number of virtual servers and copy the disks of the current servers to EH and EC2. Each virtual server would be configured with the necessary number of CPUs and the necessary RAM and network bandwidth. The IT department would then administer the servers from your offices just as if they were real servers: they can start them, pause them and stop them just like physical machines.
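Provisioning on these services typically amounts to a couple of authenticated HTTP calls. Purely as an illustrative sketch - the endpoint and parameters below are hypothetical, not the real ElasticHosts or EC2 API:

# Hypothetical IaaS provisioning call (illustrative only - not a
# real endpoint). Requests a server with a 1GHz CPU slice and 1GB RAM.
curl -u account:apikey https://api.example-cloud.test/servers/create \
     -d name=web1 -d cpu=1000 -d mem=1024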

But, they can also upgrade them in an instant. And, they tell you, it costs 25c per hour for a 1GHz CPU, 1GB RAM, 1000GB disk and 100GB of network traffic - $180/month. They also say that at night they can turn half the servers off, bringing the average down to about $120/month. They can do the same on weekends and on public holidays too.
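The quoted figures are easy to check with a back-of-the-envelope shell calculation, assuming a 30-day month and (my assumption) that 'night' means roughly 12 hours:

# 25c/hour, 24 hours a day, 30 days: the full-time cost per server.
echo $((25 * 24 * 30))                   # 18000 cents = $180/month
# Half the fleet off for 12 hours a day: fleet-average cost per server.
echo $(( (18000 + 25 * 12 * 30) / 2 ))   # 13500 cents = $135/month

Weekend and public-holiday shutdowns account for the rest of the drop towards $120.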

And if the company had a busy period, they could order more servers or upgrade the existing ones almost instantly, then downgrade them again afterwards.

Private Clouds

It sounds like magic. But these clouds can also be established on your own existing infrastructure, and if you need additional capacity you can lease it by the hour.

Making a Virtual Server


I wondered how hard it might be to make a disk image to run on a cloud. It turns out to be rather easy.

I have an Ubuntu Linux virtual machine running in VMware Fusion. It allows me to run Linux on my Mac and it works well. (I should mention Sun's VirtualBox here as well, which does the same job and is free and Open Source.)

Ubuntu provides a program that makes the creation of virtual machines an almost trivial task.

sudo ubuntu-vm-builder kvm jaunty

'sudo' allows the program to run as an administrator.
'kvm' specifies that I want a virtual machine that will run on a KVM-based cloud such as ElasticHosts. I could have used 'ec2' to make an image that would run on Amazon's EC2.
'jaunty' specifies the version of Ubuntu Server that I wanted it to be.


(I used some additional command line options that are not needed.)
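For the curious, options along these lines are available - the flags below are from memory, so check 'ubuntu-vm-builder --help' for the authoritative list on your release:

# Give the guest 512MB RAM, a 4GB root disk, a hostname, and
# pre-install an SSH server (flags from memory - verify with --help).
sudo ubuntu-vm-builder kvm jaunty --mem 512 --rootsize 4096 \
     --hostname cloudtest --addpkg openssh-server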

A little while later, I had a directory containing a disk image and the command necessary to run it. With some other configuration changes it is possible to create a VM in a few minutes.
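The generated run script boils down to a kvm invocation along these lines (the image file name and flags here are from memory - inspect the generated script for the exact command):

# Boot the generated image with 128MB RAM and user-mode networking.
kvm -m 128 -hda disk0.qcow2 -net nic -net user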

Since my Ubuntu machine is itself a virtual machine running on a Mac, I did not think that I could test the new VM. But I thought that I would try to run it to see what it might do. It began by complaining that it could not find KVM support and then ran the VM in an emulator (QEMU).

The machine booted like a real PC and eventually gave me a login prompt.


Appliances


Some companies are now offering their applications or operating systems as a VM to download - ready to go. VMware has a large selection of pre-built VMs.

The Future


It looks interesting. Some other work in this area focuses on standardising the management interface for a cloud of VMs, standardising the VMs themselves so that any VM can run on any public or private cloud, and schedulers so that an administrator can prioritise VMs and schedule the start-up and shutdown of any VM. Read more about OpenNebula and Haizea.

Sunday 21 June 2009

Browser Benchmarks - too many variables

Laptop Battery Benchmark

Slashdot has a link to an article which presents a claim by AMD that the MobileMark 2007 battery benchmarking specification does not represent typical laptop use - in fact, AMD claims that the test basically runs the laptop at idle with the screen dimmed and with Wi-Fi turned off.

I know I don't get anywhere near the battery life Apple claimed for my MacBook Pro 2008 - and I run with a dimmed screen, Bluetooth off and the under-volted processor tweaked down to 600MHz at idle. My laptop typically runs between 15 and 20 degrees C above ambient. For example, it is 18 degrees inside right now and the CPU is running at 34 degrees C.

Whether the claims are true or not, I wonder how the current browser benchmark tests relate to typical use. Is there such a thing as typical use? And how do a laptop's or desktop's energy settings affect the result?

Browser Benchmark Tests

There are a number of browser JavaScript benchmark tests online: V8, SunSpider and Dromaeo. Dromaeo takes too long to run so I have only used V8 and SunSpider.

The V8 benchmark is published by Google. There are currently 4 versions of the test and it runs very quickly. SunSpider is built for testing WebKit. WebKit is a fork of KHTML, the engine Konqueror was built on. Apple bases Safari on WebKit. Dromaeo is built by Mozilla.

I like to experiment. I mostly use Shiretoko (the Firefox beta) for Mac. I also have Firefox 3, Safari 4, and a suite of development or experimental browsers: WebKit, Stainless, Chrome and Chromium. All but the Firefoxes are based on WebKit, though Chrome and Chromium use Google's own V8 JavaScript engine. I have most of these running on an old PowerBook as well.

I've been interested in how their JavaScript engines are performing, so I occasionally download the latest nightly build and run a quick test. It occurred to me that the results are affected by whatever else the laptop is doing and by how the operating system has throttled the CPU. So I started to fix my CPU speed and shut down most applications before running the tests. But all sorts of updating and backup utilities run in the background, and if they decide to start up, the performance test results will be poorer.

I have not read about anyone else fixing their CPU speed before running the tests. Perhaps it is not important - perhaps the CPU throttling techniques know not to adjust frequencies while benchmark tests are running - but somehow I doubt it.
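Not that fixing the frequency is hard. On a Linux box, for instance, it can be pinned through the cpufreq sysfs interface before a run - a sketch, where the 1.6GHz value is just an example (your CPU's options are listed in scaling_available_frequencies):

# Pin every core to a fixed frequency via the cpufreq sysfs interface.
for c in /sys/devices/system/cpu/cpu*/cpufreq; do
    echo userspace | sudo tee $c/scaling_governor
    echo 1600000   | sudo tee $c/scaling_setspeed    # frequency in kHz
done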

I think we need benchmark tests that are immune to other tasks and garbage collection, and that somehow ensure CPU frequency and caching do not affect the result - ideally, every time the test is run, the result should be the same. Otherwise there are too many uncontrolled variables to permit any useful comparison.
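Short of that, the least we can do is make the variation visible. A minimal sketch, assuming a command-line JavaScript shell ('js' here) and a hypothetical bench.js test file:

# Run the same benchmark five times; any spread in the 'real' times
# is noise from the rest of the system, not the JavaScript engine.
for i in 1 2 3 4 5; do
    /usr/bin/time -p js bench.js 2>&1 | grep real
done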

How about we put some scientific method back into Computer Science and Software Engineering?

My Results


For those that are interested, here are my rounded results and some graphs to visualize the data. Given the lack of accuracy in the measurements, I suggest allowing for an error of around 20%.

You can see that, for me, Firefox is not performing as well on the tests as the other browsers. This doesn't mean that Firefox is not usable - I use it more than the others combined. It does mean that Mozilla can do better. All the WebKit-based browsers do well. They are more than 10x faster than Firefox 3 on the V8 tests and take a fifth of the time on the SunSpider tests.

The dual-core MacBook Pro 2.4GHz is about 10x faster than the PowerBook G4 1.67GHz. This is probably due to the work being done to optimize the JavaScript compilers for the Intel instruction set - it seems that the PowerPC is not a high priority.

All browsers (except Chrome) run well on the PowerBook, which is our main machine.


Postscript

On reflection, if we are to be more scientific then we should make some predictions as well.

Perhaps we can predict the optimum performance of an algorithm or test case running on a particular CPU. We would then have a target for our JavaScript compilers to aspire to. Of course, we would need to take language overheads, if any, into account.

Friday 12 June 2009

Google Wave: Killer App

I read an article a few days ago. It said: blah blah Google Wave blah ...

'What is Google Wave?' I thought.
I watched this video and noted my thoughts, which I edited later:
Different.
OK, nice.
Hmmm. Slick email. Bit scary (the idea of having the message on a server).

OK. I see, it is email and instant messaging (IM).
No wait. It is email, IM and blogging.... and blog feedback as well...

Hang on, it is now a document editor... but others can edit at the same time... and they can discuss points in the document... and it's got version control...

Wow! A context spell checker too.
OK, you can publish docs and update them later.

With some sort of meeting acceptance thingy.
And it has multi-party games.

Spreadsheets and other content in the future.

I'm not surprised now when they add maps and video.

Why forms?
Nice, it can link to other social networking services.

Where are the ads?

I wonder what back-end XML database server they are using?

Stunning! Dynamic translation! They just got the whole world interested.
Ray Ozzie from Microsoft even had some things to say (which I re-state):

Ray starts out by praising 'those' that took it on... it's nice. I don't think Ray used the word Google at all.

He thinks it is anti-web: that complexity is the enemy of the web. If something is complex - many roles, many interconnections - then you need Open Source to get many instances, since no one else will be able to do an independent implementation.

Fundamental to the web is decomposing things to be simple so you don't need Open Source.

Ray says that the web is about open protocols, open data formats, no opaque packages and payloads being tunnelled. It is simple and out-there.

Later he says that Google Wave and Microsoft Groove are basically the same thing; that Mesh is based on Groove; and that Mesh will not do all the things that Wave or Groove does, but it will be sustainable.

Ray Ozzie built Lotus Notes. In its early days, it was beautifully simple. I liked it and I still do. Notes was way ahead of its time. It was ahead of the web. It was strongly security-minded and client-server based, and Notes supported every relevant open standard that came along. At work we built an Operations Support System with it (1995 and onwards) and it is still supporting the job and fault management systems today.

Notes was great. Ray and his team did a great job. I think Google Wave is what Ray would have liked Notes/Domino to be today.

Notes had a back-end database that seems XML-like in structure and separated the presentation from the data. Wave has the advantage of virtually real-time synchronisation, whereas Notes, due to bandwidth limitations, was 'document' based and asynchronous. This is why Notes had to deal with replication conflicts - something it worked around nicely - but Google, it would appear, doesn't have this problem since it is synchronising at a very low level.

Wave seems to have the following attributes:
  • Hierarchical database in XML (a sketch follows this list).
  • Fine-grained time stamps.
  • Nothing deleted (partially solves replication conflicts and allows playback).
  • Remove edit history by publishing. Published docs retain links to the source Wave and have their own edit/update history.
  • Version control within document (allows playback).
  • Allows extensions (an XML data structure instead of a Blip - the content part of a Wave).
  • Each Wavelet has an Access Control List (ACL) of people and robots that can read the Blips within it.
  • I suspect that the Blips are signed with the author's private key and that other people in the ACL can read the Blip with the author's public key.
  • For security, during transmission, each fraction of a Blip would/could be encrypted with the reader's public key and decrypted with their private key - using TLS or SSL.
  • I think it would be possible to have no ACL so that the document would become a public document, but I would hope that Wave uses a white-list.
  • It seems that a reader is also an editor - no distinction. It may be simple enough to have separate reader and editor roles.
  • The GUI takes the Wave and formats it for display.
  • Spelling and translation seem to be in the GUI.
  • The Back-end manages replication and updates.
  • Conflicts are all but eliminated by date-stamped single-character transactions and no actual deletes.
  • Front-end extensions can display other content.
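To make the first point concrete, here is how I imagine the nesting might look. This is purely illustrative - every element name is my guess, not Google's actual format:

<!-- Illustrative sketch of a Wave as a hierarchical XML database:
     a wave holds wavelets, each with an ACL and a set of blips. -->
<wave id="example.com!w+abc123">
  <wavelet id="conv+root">
    <acl>
      <participant>alice@example.com</participant>
      <participant>spelly-robot@example.com</participant>
    </acl>
    <blip author="alice@example.com" timestamp="2009-06-12T10:15:00Z">
      Hello, world.
    </blip>
  </wavelet>
</wave>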

I am wondering.

Has Google hit upon a partial solution for internationalisation (I18N)? Could Wave, web and native applications use the Google translation service for window titles, menu items and help pages, eliminating the need to bundle languages with an application?

Has Google enabled global collaboration on source code where, say, English comments and strings are translated into Chinese based on the browser's user-agent language setting?

Is Google Wave the beginning of the end of spam?