Tuesday, 29 June 2010

Sunday, 20 June 2010

Google Captured Data and Passwords in Australia - an Estimate

Most people will have heard by now that Google's StreetView cars have been taking pictures of our streets and at the same time collecting WiFi SSIDs. But they have also been collecting other data, which has got Google into trouble with governments and privacy groups.

http://googleblog.blogspot.com/search/label/privacy

Google-sponsored report

http://www.google.com/googleblogs/pdfs/friedberg_sourcecode_analysis_060910.pdf

In Australia it is no different. Senator Conroy has been very vocal about this.

http://www.abc.net.au/news/stories/2010/05/25/2908415.htm

I was chatting to a friend and a quick calculation seemed to indicate that the amount of data and passwords captured must be small.

So, with lots of assumptions, I have looked at two cases: the 'best' case and the 'worst' case.

To do any estimation I needed numbers. Fortunately our Bureau of Statistics (ABS) provided what I needed.

Here are my assumptions:

1. From the ABS, December 2009: 5.2M ADSL and cable subscribers. http://abs.gov.au/ausstats/abs@.nsf/mf/8153.0/

2. 114,400 TB downloaded data per year (ABS)

3. Uploaded data is 5 - 20% of downloaded data.

4. Between 50 and 80% of all subscribers use encrypted WiFi.

5. WiFi range from an indoor household access point is approximately 100 m to 250 m.

6. The StreetView car samples 5 channels per second (See report).

7. Data in overlapping channels can be received on channels 1, 6 and 11.

8. The StreetView car travels at 30 - 50 km/h between 8:00 and 16:00. The car needs bright light to take photos, and for safety (fatigue) reasons the drivers would only do 8 hour shifts - probably with breaks every 2 hours or so.

9. Households upload data evenly throughout the day, so that between 8:00 and 16:00 the upload data rates are average.

10. Households send 1 - 10 non-secured passwords per day.

11. Households use the internet between 16 and 20 hours per day.

WiFi Reception Performance

One important variable that I did not model is the error rate of received frames, which is related to distance: the further away from a WiFi access point, the lower the chance of receiving a frame without error. The assumption that you can receive all frames within 100m (worst case) or within 250m (best case) is, frankly, silly and unrealistic. This means that any result is going to establish an upper limit for both cases.

WiFi Channels

So basically I assume the StreetView car samples 3 channels at the rate of 5 per second. I conservatively assume that data on all the other WiFi channels can be read using just these 3 channels. I doubt this is correct, so again it will establish an upper limit for both cases.

Reception Period

I calculate that the car is in range of a WiFi access point for between 14 and 60 seconds and can sample between 5 and 20 seconds' worth of data. This assumes the car travels at a speed between 30 and 50 km/h and can collect WiFi frames from any given access point for one third of the time.
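The in-range and sampling times can be reproduced with a minimal Python sketch. It assumes the car drives straight through the centre of the coverage circle (so it covers twice the range), pairs the short range with the fast speed and the long range with the slow speed, and uses the one-third sampling fraction. The names `seconds_in_range` and `SAMPLED_FRACTION` are made up for this illustration.

```python
# Inputs taken from the assumptions listed earlier in this post.
RANGE_M = (100, 250)        # WiFi range in metres (short, long)
SPEED_KMH = (50, 30)        # car speed in km/h (fast, slow)
SAMPLED_FRACTION = 1 / 3    # share of time spent on a channel that can hear the access point


def seconds_in_range(range_m: float, speed_kmh: float) -> float:
    """Time to drive the full diameter of the coverage circle.

    Assumption: the car passes directly by the access point, so the
    distance covered while in range is twice the WiFi range.
    """
    speed_ms = speed_kmh * 1000 / 3600
    return 2 * range_m / speed_ms


shortest = seconds_in_range(RANGE_M[0], SPEED_KMH[0])   # about 14 s
longest = seconds_in_range(RANGE_M[1], SPEED_KMH[1])    # about 60 s

print(f"in range: {shortest:.1f} s to {longest:.1f} s")
print(f"sampled:  {shortest * SAMPLED_FRACTION:.1f} s to {longest * SAMPLED_FRACTION:.1f} s")
```

Under these assumptions the sampled time comes out at roughly 5 to 20 seconds per access point.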

Data Collected

By assuming that households transmit data continuously, and knowing the average amount of data sent each second I can estimate the amount of data collected on average per WiFi access point.

Australians downloaded about 114,000 TB of data in 2009 (the ABS figure is 114,400 TB). Most web browsing is downloading, especially when we are talking about reading bank accounts and email.

Assuming uploaded data is between 5 and 20% of the amount we download, I arrived at an upload data amount of between 5,700 and 23,000 TB per year, or roughly 400 - 1400 bps per subscriber.
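The conversion from terabytes per year to bits per second is not spelled out above, so the following Python sketch is only one plausible reading. It assumes decimal terabytes (1 TB = 1e12 bytes), spreads the upload volume evenly over the 5.2M subscribers, and concentrates it into the 16 to 20 hours per day that households are assumed to be online. The function name `upload_bps` is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
SUBSCRIBERS = 5.2e6          # ADSL and cable subscribers (ABS, December 2009)


def upload_bps(upload_tb_per_year: float, active_hours_per_day: float) -> float:
    """Average upload rate per subscriber, in bits per second, while online.

    Assumptions: 1 TB = 1e12 bytes, and all traffic is spread evenly
    over the active hours of each day.
    """
    bits_per_year = upload_tb_per_year * 1e12 * 8
    average_bps = bits_per_year / (SUBSCRIBERS * SECONDS_PER_YEAR)
    return average_bps * 24 / active_hours_per_day


low = upload_bps(5_700, 16)     # 5% of downloads, 16 active hours per day
high = upload_bps(23_000, 20)   # 20% of downloads, 20 active hours per day
print(f"{low:.0f} bps to {high:.0f} bps")
```

With these inputs the rates land near 420 and 1350 bps, in the same ballpark as the 400 - 1400 bps quoted. Other pairings of upload share and active hours give somewhat different bounds.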

This means that a StreetView car can record between 250 and 3400 bytes of data per WiFi access point (SSID).
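As a rough check, the bytes captured per access point are the upload rate multiplied by the sampled time. The sketch below uses the rounded figures of 400 - 1400 bps and 5 - 20 seconds; `bytes_captured` is a name invented for this example.

```python
def bytes_captured(rate_bps: float, sampled_seconds: float) -> float:
    """Bytes seen from one access point: bits per second / 8 * seconds listening."""
    return rate_bps / 8 * sampled_seconds


low = bytes_captured(400, 5)      # slow uploader, short sampling window
high = bytes_captured(1400, 20)   # busy uploader, long sampling window
print(f"{low:.0f} to {high:.0f} bytes per access point")
```

The lower bound matches the 250 bytes quoted. The upper bound comes out at 3500 bytes with these rounded inputs, slightly above the 3400 quoted, presumably because the original calculation used unrounded intermediate values.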

Password Capture

I have assumed that passwords are sent in the clear (unencrypted) between 1 and 10 times per day. Most sites (such as banks) use HTTPS, and email hosting services generally support encrypted SMTP/POP/IMAP to send account and password information. But some services may still allow access to mail and discussion sites using unencrypted passwords. It is these passwords that Google could have captured on unencrypted WiFi access points.

The probability of capturing a password is between 0.002% and 0.14%, which means that between 17 and 3600 passwords would be captured Australia-wide as the StreetView cars drove by.
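The step from a capture probability to a password count is not shown, so here is a hedged Python sketch of one reading that lands close to the quoted totals. It assumes every one of the 5.2M subscribers runs a WiFi access point, that the probability applies per unencrypted access point, and that the unencrypted share is 20% in the low case and 50% in the high case (from the 50 - 80% encrypted assumption). `expected_passwords` is a hypothetical helper.

```python
SUBSCRIBERS = 5.2e6   # assumed to equal the number of household access points


def expected_passwords(unencrypted_share: float, capture_probability: float) -> float:
    """Expected number of passwords captured Australia-wide.

    Assumption: one drive-by per access point, with the given chance of a
    clear-text password being sent while the car is listening.
    """
    return SUBSCRIBERS * unencrypted_share * capture_probability


low = expected_passwords(0.20, 0.002 / 100)   # 80% encrypted, 0.002% chance
high = expected_passwords(0.50, 0.14 / 100)   # 50% encrypted, 0.14% chance
print(f"{low:.0f} to {high:.0f} passwords")
```

This gives roughly 21 and 3640 passwords, close to the 17 and 3600 quoted. The small gap is consistent with the probabilities having been rounded before being quoted.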

Resulting Estimate

Data collected: 250 - 3400 Bytes per WiFi access point
Passwords collected: 17 - 3600 Australia-wide

The difference between my best and worst case is just over 1 order of magnitude for bytes collected and just over 2 orders of magnitude for passwords collected. This reflects the level of uncertainty in my estimates.
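The spread between the two cases can be checked by taking the base-10 logarithm of the ratio of the bounds. This is just the arithmetic behind the "orders of magnitude" statement; the variable names are illustrative.

```python
import math

# Ratio of upper to lower bound, expressed in orders of magnitude (powers of ten).
bytes_spread = math.log10(3400 / 250)
password_spread = math.log10(3600 / 17)

print(f"bytes: {bytes_spread:.2f} orders, passwords: {password_spread:.2f} orders")
```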

I think the real values will be much lower for these reasons:

1. During the day is not the peak time for downloads from households. On weekdays, on average, many family members will be at school or at work so it stands to reason that less internet activity will take place.

2. The share of encrypted WiFi access points seems to be closer to 80% than 50%. People are more aware of security and ISPs have done a lot to encourage the security of wireless networks.

3. Few services use unencrypted passwords - I cannot think of any except for POP based email. All banks use some form of encryption - to do otherwise would be incompetent. Unfortunately it may be the small businesses that allow staff to access their email via unencrypted POP that are letting their employees down. GMail and Yahoo only seem to allow encrypted authentication, so your account name and password are safe.

4. WiFi range is probably not even 100m, let alone 250m, and the ability to pick up a transmission from a laptop/mobile at these distances is low. The further away from a WiFi access point, the higher the probability of receiving a frame with errors that would contain no reliable data.

5. Uploaded data is probably less than 5% of downloaded data. A packet containing an account name and password is small compared to the resulting page that gets downloaded.

6. Even when a WiFi access point is unencrypted, traffic to sites that require privacy is typically encrypted. So the actual unencrypted data is publicly available web pages, images, video and JavaScript code.

My Guess

The above best and worst cases set an upper limit. But the worst case is too optimistic regarding the amount of uploaded traffic, WiFi range and the number of unencrypted access points. So I would suggest the number of passwords collected to be around 17 - say 10 to 100 - Australia-wide and the amount of unencrypted personal data to be much less - say 10 to 100 bytes per household.

iPhoto Script to Remove Missing Photos

For some reason, when you try to open certain photos in iPhoto you get a big exclamation mark! This seems to happen when the actual photo file is missing. Perhaps it has been deleted through the file system, or perhaps iPhoto is confused, or perhaps iPhoto crashed during some operation. Who knows.

I wrote this script to find these 'photos' and to move them to the trash.


-- Move iPhoto entries whose image file is missing on disk to the iPhoto trash.
tell application "iPhoto"
  set curPhotos to selection
  if (count of curPhotos) = 0 then
    display alert "You need to select the photos you want me to process."
  else
    set countPhotos to count of items in curPhotos
    repeat with i from 1 to countPhotos
      set thisPhoto to item i of curPhotos
      try
        -- "info for" raises an error when the file behind the photo does not exist
        set t to info for (image path of thisPhoto as POSIX file)
      on error eStr number eNum partial result rList from badObj to expectedType
        -- Log the error, then move the orphaned entry to the iPhoto trash
        log eStr
        select thisPhoto
        remove thisPhoto
      end try
    end repeat
  end if
end tell

Open Script Editor, copy and paste the above script into the editor, compile it to check for errors, and save it to a file - perhaps on your desktop, but anywhere is fine.

To run it, open iPhoto, select the Photos library, select all the photos you want to process (Edit - Select All is what I usually do), and then switch back to Script Editor and press Run.

When it finishes you may have some missing photos in your trash. You can decide what to do with them at this point - I just empty the trash.

If you use it, write a short comment about whether it was helpful or not or whether it worked or not. If you improve the script, let me know as well.