Thursday, 5 March 2009

Ghost cookie?

  1. Strange behaviour

    We recently ran into some strange behaviour on one of our sites: the marketing department wanted a tracking cookie to be written whenever a customer arrived through a sponsored link displayed alongside Google search results.

    The sponsored link therefore carried a request parameter holding the marketing code to store in the tracking cookie.
    But that marketing code never ended up in the cookie. Why?

    Our mechanism for writing the cookie first checks whether it has already been set. If it has, we do not overwrite its value. And indeed, even before the user arrived on our site, a cookie had already been set on our domain! This only happened when the browser was Firefox. There were no issues with IE, Opera, Safari or other browsers, in any version.
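
    The "write only if absent" rule described above can be sketched as follows. This is a minimal illustration, not our production code: the function names, the cookie name 'tracking' and the 30-day lifetime are all assumptions made for the example.

```php
<?php
// Hypothetical helper: decide what to write, without touching the response.
// Returns the code to store, or null when an existing cookie must be kept.
function trackingValueToWrite(array $cookies, $cookieName, $newCode)
{
    // An existing cookie always wins, whatever it contains.
    if (isset($cookies[$cookieName])) {
        return null;
    }
    return $newCode;
}

// Hypothetical wrapper for a real request; 'tracking' and the
// 30-day lifetime are placeholders.
function writeTrackingCookie($newCode)
{
    $value = trackingValueToWrite($_COOKIE, 'tracking', $newCode);
    if ($value !== null) {
        setcookie('tracking', $value, time() + 30 * 24 * 3600, '/');
    }
}
```

    With such a rule, whichever request reaches the site first decides the cookie's value, and any later marketing code is silently dropped.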

  2. What happens?

    It took us a while to find out that Google uses link prefetching, which is only implemented in Mozilla browsers.
    When Google detects you're using a Mozilla browser, it puts this HTML code in the results page:

    <link rel="prefetch" href="http://url.to.fetch/">

    The link given in the prefetch tag is the result Google considers most likely to be clicked, given the keyword(s) you entered in the search. It will never be a sponsored link, though!

    Mozilla browsers parse that and will actually preload the page in the background.

    Now, we also have a tracking cookie for "spontaneous" visits from search engines. When our site was the prefetched result, the background request set that cookie, so it was already in place by the time the visitor clicked the sponsored link. And we don't overwrite it, remember?

  3. The solution

    Luckily, Mozilla browsers add a specific HTTP header to the request they send when following a prefetch link:

    X-Moz: prefetch

    So all we had to do was ignore such requests when handling the tracking cookie:

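    // PHP exposes the X-Moz request header as $_SERVER['HTTP_X_MOZ']
    $isPrefetch = isset($_SERVER['HTTP_X_MOZ']) && $_SERVER['HTTP_X_MOZ'] === 'prefetch';
    if (!$isPrefetch) {
        // Set tracking cookie
    }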
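
    To make the check easier to reuse and to test, it can be pulled into a small function that takes the server array as a parameter. This is only a sketch: the function name is made up for the example, and the trimming and lower-casing are a defensive assumption, not something the header requires.

```php
<?php
// Hypothetical helper: true when the request carries "X-Moz: prefetch".
function isPrefetchRequest(array $server)
{
    // PHP maps the "X-Moz" request header to the HTTP_X_MOZ key.
    if (!isset($server['HTTP_X_MOZ'])) {
        return false;
    }
    // Tolerate stray whitespace and differences in case.
    return strtolower(trim($server['HTTP_X_MOZ'])) === 'prefetch';
}
```

    In a page, this would be called as isPrefetchRequest($_SERVER) before deciding whether to write the tracking cookie.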

The funny thing is that this has been around since 2005, and I only discovered it just now...

Other people blogged about this long ago and suggest other solutions, such as responding with an HTTP 404 or 503 status, or using URL rewrites.
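
For illustration, the "refuse the prefetch" approach might look like the sketch below. It is not taken from any of those posts: the function name and the default of 503 are assumptions, and the commented-out part shows where it would be called in a real page.

```php
<?php
// Hypothetical helper: pick the HTTP status for the incoming request.
// $rejectStatus would be 404 or 503, the two codes mentioned above.
function statusForRequest(array $server, $rejectStatus = 503)
{
    $xMoz = isset($server['HTTP_X_MOZ'])
        ? strtolower(trim($server['HTTP_X_MOZ']))
        : '';
    return $xMoz === 'prefetch' ? $rejectStatus : 200;
}

// Hypothetical use at the very top of a page, before any output:
// $status = statusForRequest($_SERVER);
// if ($status !== 200) {
//     http_response_code($status); // available since PHP 5.4
//     exit;
// }
```

The trade-off is that a refused prefetch means the page is not preloaded at all, so the visitor loses the speed benefit, whereas skipping only the cookie keeps the prefetch working.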