
Saturday, October 1, 2016

Latacora Riddle and Browser Fingerprinting

Today on Twitter I stumbled over a tweet by Patrick Thomas (@coffeetocode) where he wrote:

"Hah! I like this. Comment in Latacora source about a cheat is true, but the cheat is way more clever."

with this screenshot:

Screenshot from @coffeetocode's tweet

This sparked my interest. Looking at the page from the screenshot (latacora.com), I saw that there are three names displayed and this code seems to put the three names (Erin, Thomas and Jeremy) in a random order. But the comment also says that there seems to be some bias. I then saw that the tweet by @coffeetocode also had a second screenshot:

statistics by @coffeetocode

I did expect some bias, because the calculation involves 100%3 and 100 values cannot be divided equally into three groups, but 60% versus 10% is far more than I expected, so I had to dig deeper. Here is my analysis.

In the meantime Patrick has posted his own solution, but here is my analysis anyway.

The code seems to be very simple. It first declares the names array, then it applies some sort function on it and finally it puts the three names into the HTML page.

I had to look up the documentation about the sort function and I found the w3schools description of it. The function that can optionally be specified is used to sort the array and decides if the two values being passed are greater, smaller or equal. Instead of comparing the values, the function returns a random number.

Let's look at the random number generation first. Everything starts with Math.random(), which is not seeded in any way and if we assume a good browser implementation, this should return some good random values. Maybe not cryptographically secure, but random enough for this purpose.

Math.random() actually returns a number between 0 and 1 (including 0, but not including 1). Multiplying this by 100 should return numbers between 0 and 99.999 (just less than 100). Math.ceil() then rounds this number up to the next integer.

This is actually a bug, because it results in numbers between 0 and 100, which is 101 different possible values. The value 0.001 is rounded up to 1, for example, but 0 remains 0. The value 0 (exactly 0) is probably very rare though. The chance of getting exactly 0 from this code is about the same as the chance that Math.random() returns exactly 0.239183777281903; it depends on the resolution of the floating-point implementation. I tried this in the browser and exactly 0 was never returned, so I'll ignore it for the rest of this investigation, but a correct implementation would use Math.floor()+1 instead of Math.ceil(). If we ignore the 0, which practically never occurs, we have integer values between 1 and 100.
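A minimal sketch of the off-by-one, under my own assumptions: the helper names and the injectable `rng` parameter are hypothetical and only there so the edge case can be triggered on purpose. The page itself calls Math.random() directly.

```javascript
// Variant described above: Math.ceil yields 0 when rng() returns exactly 0,
// so there are 101 possible results (0..100).
function ceilVariant(rng = Math.random) {
  return Math.ceil(rng() * 100);
}

// Suggested variant: always an integer from 1 to 100.
function floorVariant(rng = Math.random) {
  return Math.floor(rng() * 100) + 1;
}

console.log(ceilVariant(() => 0));     // 0
console.log(floorVariant(() => 0));    // 1
console.log(ceilVariant(() => 0.001)); // 1
```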

I then followed a wrong trail by not reading the code correctly. I thought that the next calculation was (1 - calc) % 3, with calc being our random number between 1 and 100. That would yield negative numbers, and JavaScript's % operator has a well-known quirk there: it is a remainder that keeps the sign of the dividend, so the result can be negative. Feel free to look that up too, but it does not apply here: there are no brackets, which means the modulo is calculated first. So we are calculating 1 - (calc % 3), which is much simpler. calc % 3 returns values between 0 and 2, and 1 minus this gives us the values -1, 0 or 1; exactly what the sort function expects.

There is actually already a little bias there. The expression 1 - calc % 3 returns the numbers -1, 0 or 1 with the following probabilities:
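The difference between the two readings can be checked with a short sketch. The function names `asWritten` and `misread` are my own labels, not names from the page.

```javascript
// % binds tighter than -, so this is parsed as 1 - (calc % 3).
function asWritten(calc) {
  return 1 - calc % 3;
}

// The misreading: subtract first, then take the remainder.
function misread(calc) {
  return (1 - calc) % 3;
}

console.log(asWritten(3)); // 1
console.log(misread(3));   // -2, outside the -1..1 range
```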

-1: 33/100
0: 34/100
+1: 33/100

As you can see, 0 is slightly more frequent than the other two values. This is because calc ranges from 1 to 100 and both ends of that range give 0: 1 - 1 % 3 is 0 and 1 - 100 % 3 is also 0. For an equal distribution the code should multiply by 99 (or simply by 3) instead of 100. This slight bias does not contribute much to the end result.
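The counts can be verified by simply enumerating all values of calc. This is a small sketch with a hypothetical helper name, `resultCounts`; it ignores the practically nonexistent calc = 0 case, as the text does.

```javascript
// Count how often each comparator result occurs for calc = 1..maxCalc.
function resultCounts(maxCalc) {
  const counts = { "-1": 0, "0": 0, "1": 0 };
  for (let calc = 1; calc <= maxCalc; calc++) {
    counts[1 - calc % 3]++;
  }
  return counts;
}

console.log(resultCounts(100)); // 33 / 34 / 33 for -1 / 0 / 1
console.log(resultCounts(99));  // 33 each
```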

Now the question is what does the sort function do when the compared values are purely random? In order to have consistent results, we would need to know in which order the sort function is called. Let's look at the real documentation:

"If comparefn is not a consistent comparison function for the elements of this array, the sort order is implementation-defined."

There you have it: the sorting is highly implementation-dependent. I tried different browsers and indeed got different results. But in all cases Erin was first most often. Let's have a closer look at how the sorting works.

I tried with seven different scripting engines:

Windows scripting host (cscript)
Internet Explorer 11 and Edge (both sort identical)
Chrome, Firefox, Brave (all three sort identical)
Mobile Safari

It's interesting that cscript always makes three calls to the sort function, regardless of the results of the first two calls. In some cases we even get the same call twice (for the same two names). That seems inefficient. IE and Edge need between two and five calls to the sort function, while all others call it either two or three times. I do not understand what the advantage of sorting this array with five calls would be.

Anyway, I simulated the sorting in the different engines and created some Excel lists of which results of the sort function yield which sort order. For Chrome, Firefox and Brave we have the following list:

-1, -1: ETJ (1. Erin, 2. Thomas, 3. Jeremy)
-1, 0: ETJ
-1, 1, -1: EJT
-1, 1, 0: EJT
-1, 1, 1: JET
0, -1: ETJ
0, 0: ETJ
0, 1, -1: EJT
0, 1, 0: EJT
0, 1, 1: JET
1, -1: TEJ
1, 0: TEJ
1, 1, -1: TJE
1, 1, 0: TJE
1, 1, 1: JTE

From this overview you can already see that Erin is quite often first, sometimes Thomas, but rarely Jeremy. Calculating the percentages (taking into account the different probabilities for the three different values) we get:

Thomas first: 29.406%
Thomas second: 48.484%
Thomas third: 22.110%
Jeremy first: 10.890%
Jeremy second: 22.110%
Jeremy third: 67.000%
Erin first: 59.704%
Erin second: 29.406%
Erin third: 10.890%

This matches the screenshot from the beginning. I also wrote a script that ran it 10 million times (which takes just a few seconds) and got almost identical results.

For completeness, here are the calculated results for the other engines.
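The Chrome/Firefox/Brave percentages can also be recomputed by replaying every possible sequence of comparator results. The sketch below uses a plain insertion sort because it reproduces the table above for three elements; that these engines really used an insertion sort internally is an assumption, and `insertionSort`, `orderProbabilities` and `P` are hypothetical names.

```javascript
// Insertion sort that asks compare(left, candidate) and shifts while the result is > 0.
function insertionSort(items, compare) {
  const a = items.slice();
  for (let i = 1; i < a.length; i++) {
    const element = a[i];
    let j = i - 1;
    while (j >= 0 && compare(a[j], element) > 0) {
      a[j + 1] = a[j];
      j--;
    }
    a[j + 1] = element;
  }
  return a;
}

// Probabilities of the comparator results, taken from the list further up.
const P = { "-1": 0.33, "0": 0.34, "1": 0.33 };

// Replay all 27 scripted result triples; results the sort never asks for
// simply sum out to 1, so the weights stay correct.
function orderProbabilities() {
  const values = [-1, 0, 1];
  const totals = {};
  for (const r1 of values) {
    for (const r2 of values) {
      for (const r3 of values) {
        const script = [r1, r2, r3];
        let next = 0;
        const order = insertionSort(["E", "T", "J"], () => script[next++]).join("");
        totals[order] = (totals[order] || 0) + P[r1] * P[r2] * P[r3];
      }
    }
  }
  return totals;
}

const totals = orderProbabilities();
console.log(((totals.ETJ + totals.EJT) * 100).toFixed(3) + "%"); // Erin first: 59.704%
```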

cscript:

Thomas first: 36.924%
Thomas second: 40.966%
Thomas third: 22.110%
Jeremy first: 18.186%
Jeremy second: 36.924%
Jeremy third: 44.890%
Erin first: 44.890%
Erin second: 22.110%
Erin third: 33.000%

IE and Edge:

Thomas first: 9.704%
Thomas second: 74.090%
Thomas third: 16.206%
Jeremy first: 10.890%
Jeremy second: 7.296%
Jeremy third: 81.814%
Erin first: 79.406%
Erin second: 18.614%
Erin third: 1.980%

Mobile Safari:

Thomas first: 22.110%
Thomas second: 40.966%
Thomas third: 36.924%
Jeremy first: 33.000%
Jeremy second: 22.110%
Jeremy third: 44.890%
Erin first: 44.890%
Erin second: 36.924%
Erin third: 18.186%

In short, the chance that Erin is first depends on the browser, but in every case she is first more often than the others and her probability is well above 33%:

Chrome/FF/Brave: 60%
Mobile Safari: 45%
IE/Edge: 79%

Here you can see the distributions in a bar chart for all tested browsers:

distribution of results

But more interesting is who is first:

Who is first or second?

In this bar chart you can see (separated by browser) the percentages of who is first or at least second. For example on Internet Explorer, Erin is first in 79% of cases and first or second in 98%. I would say that was good cheating!

So yes, this code is causing highly biased results. Feel free to test it with more browsers.

It's also interesting that we can easily test which of the three browser categories yours is by using the sort function for fingerprinting:

Your browser:

This detection was done with a very simple piece of JavaScript, testing just one specific case where all four engine groups behave differently:

simple fingerprinting source
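Since the original source is only available as a screenshot, here is a hedged sketch of the idea rather than the actual script: feed the sort a fixed, scripted sequence of comparator results and use the final order plus the compared pairs as the fingerprint. The name `sortFingerprint` and the chosen sequence are assumptions; which sequence separates the engines best would have to be found by experiment.

```javascript
// Returns e.g. "<order>|<pair>,<pair>,..." for the engine this runs in.
function sortFingerprint(scriptedResults) {
  const pairs = [];
  let next = 0;
  const order = ["E", "T", "J"].sort((a, b) => {
    pairs.push(a + b);
    // Fall back to 0 if the engine asks for more results than were scripted.
    return next < scriptedResults.length ? scriptedResults[next++] : 0;
  });
  return order.join("") + "|" + pairs.join(",");
}

// The output is engine-specific, so no expected value is shown here.
console.log(sortFingerprint([-1, 1, 1, 0, -1]));
```

Because the comparator ignores the actual values, the result depends only on which pairs the engine chooses to compare and in which order.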

Additionally I felt like writing some javascript that shows the entire table and creates a full fingerprint for all rows. The following table shows the sorting results for your current browser:

The fingerprint is the concatenation of the ids of all rows. The id is calculated like this: the five input values are treated as a five-digit base-3 number, which gives 243 possible values (0 to 242 decimal). That number in hex forms the first two digits of the id. Then we have the length of the input (usually 2..5, but theoretically it can also be 1 or 6) and one of six possible different outputs. These two values are combined into one more character.

Here's what the code looks like:

actual calculation function

loop over all 3^5 combinations and call calculation

simplify data (shorter sequences don't need all rows)

output table

calculate fingerprint

print the fingerprint data

If you need this, you can copy the text from view source, or ask me on Twitter if you want me to add it to GitHub.

Update from 2 October 2016: fixed some typos, added some charts, added javascript code for fingerprint string and dynamic table creation.

Tuesday, September 9, 2014

Screenshots from Apple Keynote

My experience from today's live streaming of the Apple keynote:

Nice presentation!

Saturday, May 18, 2013

De-obfuscating some JavaScript malware

Is Antivirus (AV) snake oil? Well, I won't go deep into this right now, but I need to provide some quick background information. For about ten years I used Avira Free. It worked quite well and blocked some threats on family members' computers, but never anything on mine. About a year ago I read somewhere "If your AV didn't block anything in the last year, you probably don't need it." That convinced me, because in the roughly ten years I had it installed it only ever flagged false positives on my machine. Actually I'm not afraid of getting malware through social engineering or by opening something bad, but I am afraid of getting malware through some 0-day or a targeted attack. So about a year ago, I didn't install any AV when re-installing my computer. I installed EMET though, and Windows also has the built-in Windows Defender. That's sufficient for me. I'm not talking about normal consumers.

About three weeks ago, when looking for a hotel, I was surprised to get Windows Defender taking action, just by opening the start page of the hotel:

Windows Defender in action

We all know the drive-by exploits and stuff, but with me running Internet Explorer 10, with Enhanced Protected Mode enabled (it's off by default) and the system fully patched, why would there be a dangerous threat that Defender should block? That's interesting and I started to dig into this. So where do we start? View the HTML page source of course.

First disappointment. What I saw was this:

View Source - lots of includes

So it's full of includes. Should I look through each of them and examine everything? Well, scrolling a little down, I quickly saw something that caught my attention:

suspicious code

IE's source code viewer made this into a nice multi-line view, but actually this is just one long line of code.

Ok, so we have some obfuscated JavaScript here. While there are automated de-obfuscators on the Internet readily available (like this one), let's do this manually. It's more fun anyway. I'm not afraid of any JavaScript running in my well sandboxed browser, even if it's IE, so why should this be something to block? I want to know the details. So let's start.

In order to do something with this, we have to disable Windows Defender. Even if we save this in a Notepad text file, Defender kicks in and deletes the file. In Windows 8, Defender is no longer in Control Panel Category View, it's hidden. You have to switch to the old Icon View in order to see it. Ok, let's disable Defender and let's start.

The first obvious thing here is that there's a lot of white space after the initial script tag. That's probably to hide the code, because on systems that show this source in one line, the rest is invisible, scrolled out to the right. Ok, let's remove these spaces. Then there is a big string with some strange codes in it. In order to understand what this does, I removed the content of the string so that I could see the rest of the code. It looks like this:

obfuscated code

So finally we have something to de-obfuscate. First of all, let's add some line breaks and indentation. Then it looks like this:

nicely formatted obfuscated code

So let's remove the obfuscation. We'll do the following:

zz=3; ... if(zz)... → remove this, because it's always true
dbshre=53; if(dbshre)... → remove this, because it's always true
ss=(123)?String.fromCharCode:0; → replace this with ss=String.fromCharCode;
asgq variable → rename it to code_1
p=parseInt; ... p(...) → use directly parseInt(...)
ss=String.fromCharCode; ... ss(...) → use directly String.fromCharCode(...)
gdsgsdg variable → rename it to exc1, as it is an exception
agdsg variable → rename it to exc2, as it is an exception
vfvwe variable → rename it to flag, as it is set to 0 or 1

After all this, it already looks a lot cleaner:

after first step of de-obfuscation

But there's still some cleanup left. Let's continue:

try{document.body} → document.body is an HTMLBodyElement object. Applying the bitwise AND operator to it with a number causes an invalid argument exception, which is caught by the catch(exc1){...} part. So we can drop the entire first part. I assume this was written to confuse automated de-obfuscation tools.
if(window.document) → remove this, because it's always true (twice in the code)
try{document;}catch(exc2){flag=1;} → This is not doing anything. Not even throwing an exception. Remove the entire code part.
flag=0; ... if(!flag)... → remove this, because it is always true
e=eval; ... e(s); → use directly eval(s);

Ok, so we have the code de-obfuscated now:

de-obfuscated JavaScript code

Actually we could drop the { } of the for-loop too. Anyway, it now seems clear what this code does. First, it replaces the at-character in the string with the digit 9 and then splits the string at the exclamation marks. The hex values between the exclamation marks are converted to numbers with parseInt and then to characters. So these are all ASCII codes. Finally they are concatenated into a string, which is then executed by the final eval command. So the question is, what is in this string that gets executed?
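The decoding steps just described can also be scripted without ever calling eval. This is a minimal sketch: `decodeStage` is a hypothetical name, the sample string is made up for illustration (it is not taken from the malware), and skipping empty tokens is my own addition in case the string starts or ends with an exclamation mark.

```javascript
// Decode the first-stage string into text WITHOUT executing it.
function decodeStage(encoded) {
  return encoded
    .replace(/@/g, "9")                 // "@" stands for the digit 9
    .split("!")                         // hex codes are separated by "!"
    .filter((token) => token.length > 0)
    .map((token) => String.fromCharCode(parseInt(token, 16)))
    .join("");
}

console.log(decodeStage("68!6@")); // hi
```

Printing the result instead of evaluating it gives the second stage as plain text, ready for formatting and further examination.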

We could simply replace the final eval(s); with an alert(s);, but that wouldn't be nicely formatted and not ready for copy and further examination. So let's use the string and do it manually.

I opened Notepad and replaced

@ → 9
!a! → !0a!
!d! → !0d!
! → space character

So we get a list of hex codes:

hex data for second stage code

So to continue, we could convert each character manually with an ASCII table or write a program to do it. I used one of the online converters for that. After that conversion to text we get this:

second stage code

So this second stage code is not obfuscated, but badly formatted. So after formatting and renaming the main function from zzzfff to mainfunc we get this:

second stage code, formatted

So finally we can start analyzing it. So we have three functions and a small code block. This code block runs right away and executes GetCookie. From the nice name they left in there we can assume it reads a cookie (which it does if you look at the code of GetCookie) and if the cookie is found, nothing else is done. If the cookie is not there, then the other function with the nice name SetCookie will be called. This simply stores a cookie. We don't need to go into the details of these two functions, only that SetCookie sets an expiration time of today.getTime()+1day (the parameter to setTime is in milliseconds since a fixed date). This is to execute the mainfunc() only once per day. After SetCookie, we get to the main function, the core of this "malware".
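The once-per-day gate boils down to a cookie lookup plus a one-day expiry. The sketch below is not the code from the page: `getCookie`, `expiryOneDayLater`, `shouldRun` and the cookie name `visited` are hypothetical, and the cookie string is passed in as a parameter so the logic can run outside a browser.

```javascript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Look up a cookie in a document.cookie-style string; null if it is absent.
function getCookie(cookieString, name) {
  for (const part of cookieString.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) return rest.join("=");
  }
  return null;
}

// Expiry one day after `now`; setTime takes milliseconds since 1 January 1970.
function expiryOneDayLater(now) {
  const expires = new Date();
  expires.setTime(now.getTime() + ONE_DAY_MS);
  return expires;
}

// The gate: continue only when the marker cookie is missing.
function shouldRun(cookieString, name) {
  return getCookie(cookieString, name) === null;
}

console.log(shouldRun("a=1; visited=55", "visited")); // false
console.log(shouldRun("a=1", "visited"));             // true
```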

So what does mainfunc() do? First it creates an object jn, which is an iframe with 1 pixel size and without a border and pointing to some external URL. Then the code checks if the page already has an HTML element with the id 'jn'. If not, it writes (document.write) at the current position a div tag with the id 'jn' and adds the iframe object into it.

This means that all this malware does is inject a div tag with an iframe object that loads the content of another site, presumably with malicious code in there exploiting some unpatched browser or plugin vulnerability. So per se this code is not malicious at all. It would be the same as having an iframe tag on the page itself. It was just hidden in some obfuscated code.

The mentioned URL doesn't work anymore; it returns a 404 (not found). At the time of testing it still worked, but I couldn't get it to serve me anything. It just returned an "ok" text. I thought it might do some browser fingerprinting by checking the User-Agent, so I tried various strings there, even those of old browsers, but I never received anything. So maybe it serves the malware only to certain IP ranges (country specific). Anyway, the PHP script serving the malware and anything it returns would be part of a further blog post. This write-up was just about the injection JavaScript.

Some general thoughts: I thought the variable names in the obfuscated code were there to distract AV detectors and were always different. It seems that this is not the case and they are always the same in this variant of the "malware"; only the URL in the contained second stage code varies, along with the length values (in the for loop) and some values assigned to the useless variables. The second stage code seems to be neither minified nor obfuscated in any way. It even contains useful function and variable names. Formatting could've been better, but for unknown reasons the programmers didn't care about shortening the code. To me all this looks like they were not very professional, more like some script kiddies or someone using tools.

Additionally the same injected code seems to appear on websites all over the world. The funny thing is that obviously this was an automated attack on these sites, probably exploiting some common bug, as for some, the injected code doesn't even work and the JavaScript is at a place where it gets displayed instead of executed, like here:

injection at the wrong place

If you look at the source here:

source code of wrong place injection

You can see here that our JavaScript was added in the middle of the meta tag. That doesn't work of course. Why would some automated tool put it there? For me this looks more like someone being directed to do that and being told "put it right after the line with head and meta tags", not even understanding HTML. We can also see that in this example WordPress 3.5.1 was used.

Searching Google for some code parts of the initial obfuscated code results in "about 170,000 results", including a few discussions about this code. On only a few of these pages I got this nice warning from Google:

Google warning

This specific page had our JavaScript code 13 times on the same page at different positions. Plus at least one other one, so I'm not 100% sure which one Google refers to. Anyway, nice to get a warning - and there's no link to the page; you cannot simply click "continue anyway."

One interesting aspect of this is that Google or your AV might block this initial iframe injection, but the underlying iframe source is already down. This also counts in their statistics of "having successfully blocked malware from innocent users", which is wrong of course. Blocking an iframe prevents nothing by itself, but it's a good mitigation.

I looked through the first three pages of Google results (first 30 results). Some of these have slightly different code, probably some variations, and in many cases the GetCookie/SaveCookie part is missing (so it's served always). The one we examined is probably newer. Looking at the second stage code on these pages results in 18 unique links for the malicious iframe. These are the links I found there, including the ones mentioned above (http replaced with hxxp to avoid hotlinking). First the URLs that are down or not reachable:

hxxp://brandemotion.kei.pl/csv/clik.php - 404 down
hxxp://209.238.172.66/rel.php - 404 down
hxxp://bffunsc.com/esd.php - 404 down
hxxp://cemugurel.tk/SpryAssets/relay.php - 404 down
hxxp://clutte[p..z]... (URL was cut off there)
hxxp://cvct.ie/Engineers/traf.php - 404 down
hxxp://devinedesignswy.com/counter.php - 404 down
hxxp://faktyinfo.pl/dtd.php - 404 down
hxxp://ftp.elhermeneuta.org/cgi-bin/clk.php - 403 forbidden
hxxp://losilla.com/dtd.php - 404 down
hxxp://thekidsclinic.us/counter.php - 404 down
hxxp://wineloverguide.com/_vti_bin/counter.php - DNS error
hxxp://wl29www42.webland.ch/admin/cnt.php - 404 down
hxxp://www.betterbailbonds.net/VLNSec01/cnt.php - 404 down
hxxp://www.onestepbuildingsystem.com/Documents/esd.php - 404 down

And these four are still working at the time of this writing:

hxxp://198.63.54.175/_mm/esd.php - up, served by IIS6
hxxp://dv8fitness.com/includes/clicker.php - 301 redirect to www site, still up, served by Apache
hxxp://ops.skylease.aero/dtd.php - up, served by Apache/2.2
hxxp://www.fansoftaylorswift.com/wp-includes/dtd.php - up, served by Apache/2.2

So these sites are probably all hacked and serve this malicious iframe. If you don't know what to do with your time, you could write some script to query Google for some indicators of this initial JavaScript code, get the results into some database, query all these pages, extract this JavaScript, automatically de-obfuscate it to get the URL from the second stage code into a second database table, then query all these URLs and list those that are still up and running. As an AV vendor I would do that.

For the four that are still running, the result looks like this when querying:

HTTP stream connecting to the malicious URLs

Here I'm connecting with IE10-64 directly, but I tried other User-Agents as well. I always get one of the two following results (depending on URL):

2_ok_0 (with "_" standing for a line break)
ok

Having the PHP source code of that page would help a lot of course, but I won't hack into those servers just for this reason. I might get associated with someone exploiting their server if I tried that.

When googling for this, I found some Intrusion Detection System (IDS) log file saying that it "detected malicious iframe injection" and also "detected BlackHole v2.0 exploit kit URL pattern", so they might be related.

Although we didn't get any deeper yet, I still hope you liked this write-up.

Saturday, October 6, 2012

iOS photos EXIF data

I ran into an issue: a photo taken with my iPhone had GPS (geolocation) information stored in its Exif data, and I wanted to post it online without revealing any personal details. What I usually do then is click Properties, Details in Windows and then "Remove Properties and Personal Information", as you can see in these two screenshots.

Image Properties

Remove Properties and Personal Information

I don't know why I can't select F-stop, Exposure time and all that, but maybe that's not considered personal enough. Anyway, the problem now is that this removal process fails, although I remember this worked well at some time in the past. This is what happens:

Apply Property Error
An error occurred when writing the property 'Altitude' to the file 'IMG_0734.JPG'.

Not all personal properties were cleared
Windows was unable to remove properties from the
selected files. Before sharing these files, you should
review them for unwanted personal information.

After some googling, I found that other people had similar problems. Someone said that this is because iOS now writes some information about the fact that the photo was taken from the lockscreen as property into the image itself and Windows cannot handle that information. That sounded reasonable, but turned out to be wrong, as we will see later. But this got me interested in finding out what information is exactly stored in the Exif data of my photos. Another thing that I wanted to find out was how iOS knows if it is an HDR photo or not. It has to know it from somewhere, because HDR photos show this special symbol.

HDR symbol

So this got me interested enough to start a full examination of the Exif headers. We will examine the following photo, so if you want to look at it as well, download it from Picasa and verify the checksum first, to make sure we're looking at the same file.

Apple Store Zurich
MD5: ddbe1f1ea166c7793328db78896f18dd
SHA1: ba2e522cdd8ffa1644cb4f09e38a8bee68359956

Ok, to start, I opened this jpg file in a hex editor. To find the Exif area, you first have to understand the jpg image format. Every jpg image starts with the "marker" ff d8. Then there follow blocks of data. Each block begins with a two-byte marker and a two-byte length. So for our file we have the following blocks (simplified):
000000: ff d8                   ; marker SOI, Start Of Image

000002: ff e1                   ; marker APP1, Application-specific
000004: 3f fe                   ; length of this block (next block at 004002)
000006: 45 78 69 66 00 00 4d... ; content of this block

004002: ff db                   ; marker DQT, Define Quantization Table(s)
004004: 00 84                   ; length of this block (next block at 004088)
004006: 00 01 01 01 01 01 01... ; content of this block

004088: ff c0                   ; marker SOF0, Start Of Frame (Baseline DCT)
00408a: 00 11                   ; length of this block (next block at 00409b)
00408c: 08 09 90 0c c0 03 01... ; content of this block

00409b: ff c4                   ; marker DHT, Huffman table
00409d: 01 a2                   ; length of this block (next block at 00423f)
00409f: 00 00 01 05 01 01 01... ; content of this block

00423f: ff da                   ; marker SOS, Start Of Scan
004241: 00 0c 03 01 00 02 11... ; 12 bytes general data, then the image data

377f39: ff d9                   ; marker EOI, End Of Image

One difficulty here is that the image data (in the 'ff da' block) cannot easily be read manually, so there is no easy way to find the next block; if there were a block after that one, we would probably miss it. If you want to know more details and want to investigate deeper, there are some links at the end of this post. But actually this is not important for now, as we only need to look at the APP1 block, which contains the Exif data. We will examine this APP1 block now.

The APP1 block starts with an Exif marker that identifies the Exif information.
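Walking the blocks by hand can be sketched in a few lines. This is an illustration under my own assumptions: `listJpegSegments` is a hypothetical helper, the sample bytes are synthetic rather than the real photo, and the walk stops at Start Of Scan because the image data that follows has no length field.

```javascript
// List marker, offset and length of each block up to and including SOS.
function listJpegSegments(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint16(0) !== 0xffd8) throw new Error("missing SOI marker");
  const segments = [];
  let offset = 2;
  while (offset + 4 <= view.byteLength) {
    const marker = view.getUint16(offset);     // big-endian by default
    const length = view.getUint16(offset + 2); // includes the two length bytes
    segments.push({ offset, marker, length });
    if (marker === 0xffda) break;              // image data follows
    offset += 2 + length;                      // next block starts here
  }
  return segments;
}

// Synthetic example: SOI, an APP1 block with four payload bytes, then SOS.
const sample = Uint8Array.from([
  0xff, 0xd8,
  0xff, 0xe1, 0x00, 0x06, 0x45, 0x78, 0x69, 0x66,
  0xff, 0xda, 0x00, 0x02,
]);
console.log(listJpegSegments(sample).map((s) => s.marker.toString(16))); // [ 'ffe1', 'ffda' ]
```

The same arithmetic explains the dump above: the APP1 block at 000002 with length 3ffe puts the next marker at 2 + 2 + 0x3ffe = 0x4002.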
0000: 45 78 69 66 00 ; "Exif\0"
0005: 00             ; padding
0006:
After this 6-byte introduction the structure of the rest of the Exif-block is similar to a TIFF version 6 file. I will start showing the addresses now starting with zero again, because all further pointers are starting here as well. If you want to calculate this back to the offset in the entire file, you have to add 6 for the Exif introduction and 6 more bytes for the jpg start (SOI+APP1 markers and length). So just add 0xC or 12d to the offset to get the real position within the file.

Ok, after the Exif introduction above, the TIFF header starts, now as mentioned, starting with offset 0 here.
0000: 4d 4d       ; big-endian (Motorola)
0002: 00 2a       ; 42d magic number
0004: 00 00 00 08 ; Offset to first IFD (just following)

First we have an identification that all the following values are stored in big-endian (Motorola-style), meaning you can read the bytes just from left to right. I like this a lot more than little-endian, where you have to read from right to left (even worse if you store nibbles in bytes in words and such stuff; it gets almost impossible to read manually). So this identification says that the rest of the Exif block has to be read in big-endian style. Please note that there is no alignment, so a SHORT value (two bytes) can start either at an even or at an odd address.

What the rest of this Exif information contains, is actually just a list of IFDs (IFD stands for TIFF image file directory). The first IFD (IFD 0) usually just follows the TIFF header, so that's why the above offset is set to 00000008, meaning we can read it at the next address.
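Reading the eight header bytes can be sketched as follows. `parseTiffHeader` is a hypothetical helper name; the "II" (little-endian, Intel) case comes from the TIFF format in general and does not occur in this photo.

```javascript
// Parse the 8-byte TIFF header: byte order, magic number 42, offset of IFD 0.
function parseTiffHeader(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const order = String.fromCharCode(bytes[0], bytes[1]);
  if (order !== "MM" && order !== "II") throw new Error("unknown byte order");
  const littleEndian = order === "II";
  if (view.getUint16(2, littleEndian) !== 42) throw new Error("not a TIFF header");
  return { littleEndian, firstIfdOffset: view.getUint32(4, littleEndian) };
}

// The header bytes from the example photo.
const header = Uint8Array.from([0x4d, 0x4d, 0x00, 0x2a, 0x00, 0x00, 0x00, 0x08]);
console.log(parseTiffHeader(header)); // { littleEndian: false, firstIfdOffset: 8 }
```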

How is an IFD constructed? First we have a length (number of entries in the directory), then the entries themselves (each 0xc/12d bytes long), then a LONG value with a pointer to the next IFD (zero if no other follows) and then usually the data that is referenced by the entries.

Each entry looks like this:

two bytes tag identification
two bytes data type
four bytes number of values (not bytes!)
four bytes pointer to the data, or, if the data fits into four bytes, the data itself

The data types can be:

1=BYTE
2=ASCII (bytes, no unicode)
3=SHORT
4=LONG
5=RATIONAL (2 LONGs, first is numerator, then denominator)
7=UNDEFINED
9=SIGNED LONG
0xa=SIGNED RATIONAL
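A 12-byte entry can be decoded as sketched below. The helper name `parseIfdEntry` and the `TYPE_SIZES` table are my own; the sketch assumes big-endian data as in this photo and only decodes a single inline SHORT or LONG, leaving everything else as an offset to follow. The sample bytes are the Make entry of the example image's IFD 0.

```javascript
// Bytes per value for the data types listed above.
const TYPE_SIZES = { 1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 7: 1, 9: 4, 10: 8 };

// Parse one 12-byte IFD entry (big-endian).
function parseIfdEntry(bytes, offset = 0) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = view.getUint16(offset);
  const type = view.getUint16(offset + 2);
  const count = view.getUint32(offset + 4);   // number of values, not bytes
  const byteSize = (TYPE_SIZES[type] || 0) * count;
  const inline = byteSize > 0 && byteSize <= 4; // data fits into the last four bytes
  const entry = { tag, type, count, inline };
  if (!inline) {
    entry.dataOffset = view.getUint32(offset + 8);
  } else if (type === 3 && count === 1) {
    entry.value = view.getUint16(offset + 8);
  } else if (type === 4 && count === 1) {
    entry.value = view.getUint32(offset + 8);
  }
  return entry;
}

// 010f Make, type 2 (ASCII), 6 values, data at offset 0x92.
const makeEntry = Uint8Array.from([0x01, 0x0f, 0x00, 0x02, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x92]);
console.log(parseIfdEntry(makeEntry));
```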

So in our example image we can find the following in IFD 0:
0008: 00 0b       ; 11d fields follow, each 0xc/12d bytes size

000a: 01 0f 00 02 00 00 00 06 00 00 00 92 ; 010f Make             06 ASC:92->
0016: 01 10 00 02 00 00 00 0a 00 00 00 98 ; 0110 Model            0a ASC:98->
0022: 01 12 00 03 00 00 00 01 00 01 00 00 ; 0112 Orientation      01 SRT:0001
002e: 01 1a 00 05 00 00 00 01 00 00 00 a2 ; 011a XResolution      01 RAT:a2->
003a: 01 1b 00 05 00 00 00 01 00 00 00 aa ; 011b YResolution      01 RAT:aa->
0046: 01 28 00 03 00 00 00 01 00 02 00 00 ; 0128 ResolutionUnit   01 SRT:0002
0052: 01 31 00 02 00 00 00 06 00 00 00 b2 ; 0131 Software         06 ASC:b2->
005e: 01 32 00 02 00 00 00 14 00 00 00 b8 ; 0132 DateTime         14 ASC:b8->
006a: 02 13 00 03 00 00 00 01 00 01 00 00 ; 0213 YCbCrPositioning 01 SRT:0001
0076: 87 69 00 04 00 00 00 01 00 00 00 cc ; 8769 ExifIFD          01 LNG:000000cc
0082: 88 25 00 04 00 00 00 01 00 00 02 4a ; 8825 GPS IFD          01 LNG:0000024a

008e: 00 00 03 14 ; pointer to next IFD (314->)

0092: 41 70 70 6c 65 00             ; "Apple\0"
0098: 69 50 68 6f 6e 65 20 34 53 00 ; "iPhone 4S\0"
00a2: 00 00 00 48 00 00 00 01       ; 72d/1=72d
00aa: 00 00 00 48 00 00 00 01       ; 72d/1=72d
00b2: 35 2e 31 2e 31 00             ; "5.1.1\0"
00b8: 32 30 31 32 3a 31 30 3a 30 34 ; "2012:10:04 13:39:02\0"
00c2: 20 31 33 3a 33 39 3a 30 32 00
So this IFD 0 gives us these properties:

Make: Apple
Model: iPhone 4S
Orientation: 1 (normal, the image doesn't need to be turned, it's horizontal already)
XResolution: 72d
YResolution: 72d
ResolutionUnit: 2 (Inch)
Software: 5.1.1 (iOS software version I used)
DateTime: 2012:10:04 13:39:02
YCbCrPositioning: 1 (centered)
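The referenced ASCII and RATIONAL values can be read like this. It is a sketch with hypothetical helper names (`readAscii`, `readRational`), again assuming big-endian data; the bytes are copied from offsets 0092 (Make) and 00a2 (XResolution) of the dump above.

```javascript
// Read an ASCII value of `count` bytes; the count includes the trailing NUL.
function readAscii(bytes, offset, count) {
  let text = "";
  for (let i = 0; i < count && bytes[offset + i] !== 0; i++) {
    text += String.fromCharCode(bytes[offset + i]);
  }
  return text;
}

// Read a RATIONAL: a numerator LONG followed by a denominator LONG.
function readRational(bytes, offset) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { numerator: view.getUint32(offset), denominator: view.getUint32(offset + 4) };
}

const make = Uint8Array.from([0x41, 0x70, 0x70, 0x6c, 0x65, 0x00]);
const xres = Uint8Array.from([0x00, 0x00, 0x00, 0x48, 0x00, 0x00, 0x00, 0x01]);
console.log(readAscii(make, 0, 6));                  // Apple
const ratio = readRational(xres, 0);
console.log(ratio.numerator / ratio.denominator);    // 72
```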

But there are three more things in there. We have the properties ExifIFD and GPS IFD, both point to a Sub-IFD with more information, actually another IFD block with the same structure. Also, at the end there was a pointer to the next IFD (IFD 1). So we have three more IFDs to read; at addresses 00cc (Exif Sub-IFD), 024a (GPS Sub-IFD) and 0314 (IFD 1). Here they are. You can skip reading this block if you're not interested in these details.
First is the Exif Sub-IFD:
00cc: 00 18       ; 24d fields follow, each 12d bytes size, same as above

00ce: 82 9a 00 05 00 00 00 01 00 00 01 f2 ; 829a ExposureTime            | 5=RATNL | 01 | 1f2->
00da: 82 9d 00 05 00 00 00 01 00 00 01 fa ; 829d FNumber                 | 5=RATNL | 01 | 1fa->
00e6: 88 22 00 03 00 00 00 01 00 02 00 00 ; 8822 ExposureProgram         | 3=SHORT | 01 | 0002 normal prog
00f2: 88 27 00 03 00 00 00 01 00 50 00 00 ; 8827 ISOSpeedRatings         | 3=SHORT | 01 | 0050 80d, ISO 80
00fe: 90 00 00 07 00 00 00 04 30 32 32 31 ; 9000 ExifVersion             | 7=UNDEF | 04 | "0221" V2.21
010a: 90 03 00 02 00 00 00 14 00 00 02 02 ; 9003 DateTimeOriginal        | 2=ASCII | 14 | 202->
0116: 90 04 00 02 00 00 00 14 00 00 02 16 ; 9004 DateTimeDigitized       | 2=ASCII | 14 | 216->
0122: 91 01 00 07 00 00 00 04 01 02 03 00 ; 9101 ComponentsConfiguration | 7=UNDEF | 04 | 1230 -> Y,Cb,Cr
012e: 92 01 00 0a 00 00 00 01 00 00 02 2a ; 9201 ShutterSpeedValue       | a=S-RAT | 01 | 22a->
013a: 92 02 00 05 00 00 00 01 00 00 02 32 ; 9202 ApertureValue           | 5=RATNL | 01 | 232->
0146: 92 03 00 0a 00 00 00 01 00 00 02 3a ; 9203 BrightnessValue         | a=S-RAT | 01 | 23a->
0152: 92 07 00 03 00 00 00 01 00 05 00 00 ; 9207 MeteringMode            | 3=SHORT | 01 | 0005 (Pattern)
015e: 92 09 00 03 00 00 00 01 00 00 00 00 ; 9209 Flash                   | 3=SHORT | 01 | 0000 (not fired)
016a: 92 0a 00 05 00 00 00 01 00 00 02 42 ; 920a FocalLength             | 5=RATNL | 01 | 242->
0176: a0 00 00 07 00 00 00 04 30 31 30 30 ; a000 FlashpixVersion         | 7=UNDEF | 04 | "0100" V1.0
0182: a0 01 00 03 00 00 00 01 00 01 00 00 ; a001 ColorSpace              | 3=SHORT | 01 | 0001 (sRGB)
018e: a0 02 00 04 00 00 00 01 00 00 0c c0 ; a002 PixelXDimension         | 4=LONG | 01 | 00000cc0 (3264d)
019a: a0 03 00 04 00 00 00 01 00 00 09 90 ; a003 PixelYDimension         | 4=LONG | 01 | 00000990 (2448d)
01a6: a2 17 00 03 00 00 00 01 00 02 00 00 ; a217 SensingMethod           | 3=SHORT | 01 | 0002 colarea sens
01b2: a4 01 00 03 00 00 00 01 00 03 00 00 ; a401 CustomRendered          | 3=SHORT | 01 | 0003 0=Norm,1=Cst
01be: a4 02 00 03 00 00 00 01 00 00 00 00 ; a402 ExposureMode            | 3=SHORT | 01 | 0000 auto exp
01ca: a4 03 00 03 00 00 00 01 00 00 00 00 ; a403 WhiteBalance            | 3=SHORT | 01 | 0000 autowhitebal
01d6: a4 05 00 03 00 00 00 01 00 23 00 00 ; a405 FocalLengthIn35mmFilm   | 3=SHORT | 01 | 0023 35d
01e2: a4 06 00 03 00 00 00 01 00 00 00 00 ; a406 SceneCaptureType        | 3=SHORT | 01 | 0000 Standard

01ee: 00 00 00 00 ; offset to next IFD (none)

01f2: 00 00 00 01 00 00 00 78       ; Exposure time in seconds: 1/120d
01fa: 00 00 00 0c 00 00 00 05       ; F Number: 12d/5d=2.4d
0202: 32 30 31 32 3a 31 30 3a 30 34 ; "2012:10:04 13:39:02\0"
020c: 20 31 33 3a 33 39 3a 30 32 00
0216: 32 30 31 32 3a 31 30 3a 30 34 ; "2012:10:04 13:39:02\0"
0220: 20 31 33 3a 33 39 3a 30 32 00
022a: 00 00 15 bf 00 00 03 26       ; Shutter speed, APEX setting 15bf/326=5567d/806d=~6.91->(1/2^x)->~1/120d
0232: 00 00 12 ed 00 00 07 7e       ; lens aperture, APEX unit 12ed/77e=4845d/1918d=~2.53->(sqrt(2)^x)->~2.4
023a: 00 00 22 7e 00 00 06 1b       ; value of brightness: 227e/61b=8830d/1563d=~5.6d EV
0242: 00 00 00 6b 00 00 00 19       ; actual focal length of the lens in mm: 6b/19=107d/25d=4.28d
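
The APEX values can be cross-checked against the ExposureTime and FNumber fields. This is a small Python sketch, assuming big-endian rationals as in the dump; the helper name rational is made up for illustration:

```python
from fractions import Fraction
import math

def rational(raw_hex: str, signed: bool = False) -> Fraction:
    """Read an 8-byte big-endian (S)RATIONAL: numerator, then denominator."""
    data = bytes.fromhex(raw_hex)
    num = int.from_bytes(data[:4], "big", signed=signed)
    den = int.from_bytes(data[4:8], "big", signed=signed)
    return Fraction(num, den)

# Raw bytes copied from addresses 022a and 0232 of the dump.
shutter_apex = rational("00 00 15 bf 00 00 03 26", signed=True)  # SRATIONAL
aperture_apex = rational("00 00 12 ed 00 00 07 7e")              # RATIONAL

# APEX definitions: exposure time = 1 / 2^Tv, F number = sqrt(2)^Av
exposure_time = 2.0 ** -float(shutter_apex)
f_number = math.sqrt(2) ** float(aperture_apex)

print(f"Tv={float(shutter_apex):.2f} -> 1/{round(1 / exposure_time)} s")
print(f"Av={float(aperture_apex):.2f} -> f/{f_number:.1f}")
```

Both results agree with the plain ExposureTime (1/120) and FNumber (12/5 = 2.4) fields stored at 01f2 and 01fa.
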
This is the GPS Sub-IFD:
024a: 00 09       ; 9 fields follow, each 12d bytes size, same as above

024c: 00 01 00 02 00 00 00 02 4e 00 00 00 ; 0001 GPSLatitudeRef     | 2=ASCII | 02 | "N\0" (north latitude)
0258: 00 02 00 05 00 00 00 03 00 00 02 bc ; 0002 GPSLatitude        | 5=RATNL | 03 | 2bc->
0264: 00 03 00 02 00 00 00 02 45 00 00 00 ; 0003 GPSLongitudeRef    | 2=ASCII | 02 | "E\0" (east longitude)
0270: 00 04 00 05 00 00 00 03 00 00 02 d4 ; 0004 GPSLongitude       | 5=RATNL | 03 | 2d4->
027c: 00 05 00 01 00 00 00 01 00 00 00 00 ; 0005 GPSAltitudeRef     | 1=BYTE | 01 | 0 (above sea level)
0288: 00 06 00 05 00 00 00 01 00 00 02 ec ; 0006 GPSAltitude        | 5=RATNL | 01 | 2ec->
0294: 00 07 00 05 00 00 00 03 00 00 02 f4 ; 0007 GPSTimeStamp       | 5=RATNL | 03 | 2f4->
02a0: 00 10 00 02 00 00 00 02 54 00 00 00 ; 0010 GPSImgDirectionRef | 2=ASCII | 02 | "T\0" (True North dir)
02ac: 00 11 00 05 00 00 00 01 00 00 03 0c ; 0011 GPSImgDirection    | 5=RATNL | 01 | 30c->

02b8: 00 00 00 00 ; offset to next IFD (none)

02bc: 00 00 00 2f 00 00 00 01 ; latitude: 47d/1, 2248d/100d, 0/1 -> 47° 22.48'
02c4: 00 00 08 c8 00 00 00 64
02cc: 00 00 00 00 00 00 00 01
02d4: 00 00 00 08 00 00 00 01 ; longitude: 8/1, 3233d/100d, 0/1 -> 8° 32.33'
02dc: 00 00 0c a1 00 00 00 64
02e4: 00 00 00 00 00 00 00 01
02ec: 00 00 01 b6 00 00 00 01 ; Altitude in meters: 1b6/1=438d/1
02f4: 00 00 00 0b 00 00 00 01 ; timestamp: 11d/1,39/1,206/100 -> "11:39:2.06 GMT"
02fc: 00 00 00 27 00 00 00 01
0304: 00 00 00 ce 00 00 00 64
030c: 00 00 b1 76 00 00 00 97 ; direction of image: b176/97=45430/151=~300.86°
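
To put these coordinates on a map you usually need decimal degrees. Here is a minimal Python sketch of that conversion, again assuming big-endian rationals; the helper names read_rationals and to_decimal_degrees are made up for illustration:

```python
from fractions import Fraction

def read_rationals(raw_hex: str) -> list:
    """Read consecutive 8-byte big-endian RATIONALs (numerator, denominator)."""
    data = bytes.fromhex(raw_hex)
    return [
        Fraction(int.from_bytes(data[i:i + 4], "big"),
                 int.from_bytes(data[i + 4:i + 8], "big"))
        for i in range(0, len(data), 8)
    ]

def to_decimal_degrees(dms, ref: str) -> Fraction:
    """Combine degrees/minutes/seconds; south and west become negative."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value

# Raw bytes copied from addresses 02bc..02cb and 02d4..02e3 of the dump.
latitude = to_decimal_degrees(read_rationals(
    "0000002f00000001 000008c800000064 0000000000000001"), "N")
longitude = to_decimal_degrees(read_rationals(
    "0000000800000001 00000ca100000064 0000000000000001"), "E")

print(f"{float(latitude):.4f}, {float(longitude):.4f}")
```

iOS stores the fractional part in the minutes field (22.48') and leaves seconds at 0, which the formula handles without special casing.
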
And this is IFD 1:
0314: 00 06       ; 6 fields follow, each 12d bytes size, same as above

0316: 01 03 00 03 00 00 00 01 00 06 00 00 ; 0103 Compression                 | 01 SRT:0006 (JPEG old-style)
0322: 01 1a 00 05 00 00 00 01 00 00 03 62 ; 011a XResolution                 | 01 RAT:362->
032e: 01 1b 00 05 00 00 00 01 00 00 03 6a ; 011b YResolution                 | 01 RAT:36a->
0346: 02 01 00 04 00 00 00 01 00 00 03 72 ; 0201 JPEGInterchangeFormat       | 01 LNG:372->
0352: 02 02 00 04 00 00 00 01 00 00 30 ed ; 0202 JPEGInterchangeFormatLength | 01 LNG:000030ed

035e: 00 00 00 00 ; offset to next IFD (none)

0362: 00 00 00 48 00 00 00 01 ; number of pixels per resolution unit in image width direction: 48/1=72d
036a: 00 00 00 48 00 00 00 01 ; number of pixels per resolution unit in image length direction: 48/1=72d

Here starts the thumbnail image, also part of this IFD 1:

0372: ff d8 ff db 00 43 00 02 ; JPEG image (thumbnail 160x120)
...
3457: 8c f4 a2 8a 82 d1 ff d9 ; end of JPEG image thumbnail, JPEG length is 000030ed
345f:

00346b: 00 00 00 00 00 00 00 00 ; null bytes at address (file offset now) 0372+000030ed+C=00346b
...
003ffa: 00 00 00 00 00 00 00 00 ; number of null bytes:
                                ; APP1_size-length_size-Exif_start-Exif_content_size=3ffe-2-6-345f=b97,
                                ; also 004002-00346b=b97 (2967d)

004002: ff db                   ; marker DQT, Define Quantization Table(s)
004004: 00 84                   ; length of this block (next block at 004004+0084=004088)
004006: 00 01 01 01 01 01 01... ; content of this block
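
The "next block at marker+2+length" rule used throughout this dump is enough to walk the top-level JPEG segments. Here is a minimal Python sketch; the function name walk_segments and the tiny demo buffer are made up for illustration and are not taken from the photo:

```python
import struct

def walk_segments(data: bytes):
    """Yield (offset, marker, length) for each segment up to and including SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        marker, length = struct.unpack(">HH", data[pos:pos + 4])
        yield pos, marker, length
        if marker == 0xFFDA:  # SOS: compressed image data follows, stop here
            break
        # The length field counts itself (2 bytes) but not the marker.
        pos += 2 + length

# Synthetic buffer: SOI, APP1 (2 payload bytes), DQT (1 payload byte), SOS.
demo = bytes.fromhex("ffd8 ffe1 0004 aabb ffdb 0003 cc ffda 0002")
segments = list(walk_segments(demo))
for offset, marker, length in segments:
    print(f"{offset:06x}: marker {marker:04x}, length {length:04x}")
```

Run against the real file, the same loop should report the APP1 block with length 3ffe and then the DQT block at 004002.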

So where is the HDR marker (this is an HDR image)?

The HDR marker is the a401 CustomRendered=3 tag in the Exif IFD. In non-HDR photos this tag is missing entirely. Instead, non-HDR photos have two other tags, both in the Exif Sub-IFD:

9214 SubjectArea, four SHORTs, X-Y-W-H for the area you clicked to be the subject, located between tags 920a and a000
a40a Sharpness, one SHORT, value 0 (=Normal), located at the end

In the non-HDR version of this image, I had the SubjectArea coordinates X=065F, Y=04C7, W=H=0371.
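
Expressed as code, the check is a one-liner. This Python sketch assumes the Exif Sub-IFD has already been parsed into a dictionary keyed by tag number. The dictionaries and the function name looks_like_ios_hdr are made up for illustration, and the rule is only what I observed on iOS 5.1.1, not something the Exif standard defines:

```python
# Tag numbers from the Exif Sub-IFD dump above.
CUSTOM_RENDERED = 0xA401
SUBJECT_AREA = 0x9214
SHARPNESS = 0xA40A

def looks_like_ios_hdr(exif_tags: dict) -> bool:
    """True if CustomRendered is present with the non-standard value 3."""
    return exif_tags.get(CUSTOM_RENDERED) == 3

# Hypothetical parsed tag sets mirroring the two variants described above.
hdr_photo = {CUSTOM_RENDERED: 3}
normal_photo = {SUBJECT_AREA: (0x065F, 0x04C7, 0x0371, 0x0371), SHARPNESS: 0}

print(looks_like_ios_hdr(hdr_photo), looks_like_ios_hdr(normal_photo))
```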

Ok, nothing special so far. But the bugs? Did you spot anything that could've caused Windows to react in such a way? Well, yes, there are many things that are bad here:

In IFD 1 the counter says there are 6 fields, but there are actually only 5.
In the Exif IFD, a405 FocalLengthIn35mmFilm has a value of 35. I assume this is wrong and not just coincidentally exactly 35. Probably they don't care and just wanted to fill in this field with "something". The actual focal length is 4.28mm, so what would that be when converted to 35mm film?
Old-style thumbnail: the use of tags 0201 and 0202 has been disallowed since 17-Mar-1995, because of problems and incompatibilities. See TIFF TechNote 2 in the links for details.
After the thumbnail image there are a lot of zeroes. While not disallowed, this is still a waste of space in the file.
The Exif IFD tag a401 CustomRendered, used as the HDR marker, allows only the values 0 (Normal) and 1 (Custom). The value of 3 (HDR) is not defined in the standard. The standard says that custom tags can be obtained, and I don't see why Apple doesn't request a new standard tag for that.
In the GPS IFD the mandatory version tag is missing.
The GPS IFD has a GPSTimeStamp tag, but not a GPSDateStamp.

Of all these problems in the image, I think the ones most likely to cause the bad Windows behaviour are the wrong field count and the missing version tag in the GPS IFD. Maybe also the old-style thumbnail. I fixed all these problems in the file and still had the same problem in Windows.

Finally I found out that simply having a GPS IFD is what causes this. It is not iOS related. I found several images with geolocation information on the Internet (from four different Nikon cameras: Coolpix P6000, D90, D300, D200) and the problem existed there as well. So Windows cannot remove GPS location properties. I think geolocation data is the most personally identifying information that a photo can have, so if Windows cannot remove that, then this feature is totally broken. I tried only with Windows 7, 64-bit, Ultimate, English. Maybe it's fixed in Windows 8. And of the iOS image problems above, maybe some are fixed with the release of iOS 6, but I haven't tested that yet.

Here are some links with more information about the file structure that I found helpful, in case you want to dig into this yourself:

Wikipedia EN: JPEG A general introduction to JPEG and the basic block markers
xbdev.net: jpeg_file_layout Some more information about the JPEG file structure
opennet.ru: jpeg.txt A thorough description of the JPEG format including image decoding
media.mit.edu: exif Description of Exif file format
exif.org: specifications Official EXIF format specification (see EXIF 2.2 PDF there)
awaresystems.be: tifftags/baseline A nice description of each TIFF tag, see also Extension Tags, Private Tags, etc. in the menu
awaresystems.be: tag jpeginterchangeformat Short description of JPEGInterchangeFormat and why it shouldn't be used
remotesensing.org: TIFFTechNote2 TIFF Technical Note #2
adobe.com: TIFF6 TIFF specification Revision 6.0
picasaweb.google.com Photo of Zurich Apple Store

Saturday, February 12, 2011

Privacy Security bug in Facebook

Under certain circumstances Facebook shows you email addresses of your friends, even if they are marked with the security property "Only Me", which should mean that nobody can see it.

In this screenshot you can see such a setting. Actually I would recommend not storing any information on Facebook that nobody should see. But maybe you want an email address to log in to Facebook that is separate from all other emails you use, or you need to add your company's email to add yourself to a Facebook network, or you just want to hide this temporarily or whatever.

When a new person joins Facebook, Facebook wants them to update their profile and find new friends as soon as possible, so you get notices that the person is new on Facebook, inviting you to suggest friends that he or she might also know. This is very useful to connect people and to get them quickly up to speed.

Unfortunately there is a slight glitch in this mechanism. When you suggest that the new person connects with somebody else and they become friends, you get a notification email telling you that the two people connected. This email also has a link for suggesting more contacts to the new person. It also contains your login email, so that you only have to enter your password. The problem is that it doesn't contain your email, but erroneously that of the person new to Facebook. That way you can see the email of the person new to Facebook, regardless of the security setting for this email.

So if you want to exploit this, you can suggest new friends to a person. If that person accepts any suggestion to join and you get a confirmation email, you can see the main Facebook email that the person uses. People often just accept friend requests if the people are from the same company, or other group, even if they don't really know them, so the chances are high that this way you could get new emails.

Here's an example of such an email. Black are my personal details (email, my name, user id or other information), red is the person that added the new yellow user. The first yellow mark is the full name, the second yellow mark is the first name only, and the right yellow mark is the secret email of the new user that should be my email actually.

Mitigation factors:

Exploiting this only works for friends, as you can only suggest new friends for your own friends.
The person has to accept a friend suggestion.
The only secret you can see is the person's main login email. In many cases this email is already known.
The bug has been fixed by Facebook in the meantime, but it was open for several months, if not years.
Sometime between September 2009 and July 2010 Facebook changed the format of the link in this email. In the old version no email was visible in the link. It is unknown what email was displayed when the link was clicked, so possibly the bug was introduced at that time.

Timeline:

31-Dec-2010 After finding the bug, I immediately tweeted about it. Looking back now, I know I shouldn't have done this, but nobody noticed anyway.
31-Dec-2010 I thought about this again and decided to contact Facebook. Because I didn't find a security contact, I notified the abuse support department and only told them that I found a security problem, without any details.
14-Jan-2011 Lillie from User Operations asked me for the details. A day later I replied to her with the details.
19-Jan-2011 Lillie replied "Thanks for bringing this to our attention. We are investigating this matter and are working to get this issue resolved as soon as possible. We appreciate your report."
06-Feb-2011 I received a suggestion-confirmation email with the corrected link. Facebook never replied that it's solved.
12-Feb-2011 This blog was created and the problem made public.

In case you need to contact me for any questions, you can reach me on Twitter (@SwissHttp).