Why I don’t have a lot of personal info on the internet: OLAP, Big Data and the future of privacy

I was asked this question while having a chat with a few friends. Their main complaint was that I don’t have much of a social media presence. The only real social media I follow is LinkedIn, and that’s because it is mostly a professional network. But more importantly, the reason I don’t have a massive internet presence is that I have been part of some significant BI projects and appreciate how much information can be extracted by a person who knows how.

While most of today’s discussions revolve around Big Data, I would like to share an example from my days at Thomson Reuters. We used SSAS to query data from the FAST Search engine during the subprime mortgage crisis. What most people didn’t realize about the subprime mortgage crisis was that a few months before the banks started collapsing, Citibank, HSBC etc. announced a quarterly loss in excess of 10.8 billion dollars, mainly due to bad debt or subprime mortgages. Within 24 hours of this news hitting the wire, the most frequently searched term was Lehman Brothers. About 6 to 8 months later Lehman Brothers filed for bankruptcy. Sitting in IT, we weren’t able to connect the dots. Those in the industry were able to forecast which banks were in trouble and planned mergers and acquisitions in anticipation.

Cut to the present.

Today big news doesn’t just belong to big multinational corporations. The things that shape public perception could be anything from Gangnam Style to the presidential elections. But it is a mistake to assume that big analytics is done only for big things. A lot of companies use social media to keep track of the buzz around their products, including when people complain about poor service. While this is a good thing, it is still a double-edged sword. I recently posted about performing BI on your phone bill: take your phone bill for a few months, plug it into SQL Server and identify patterns.

For example:

  • If you answer a call in the middle of the night, I know the person you are talking to is important to you.
  • If I track the average times of the first and last calls of each day, I will know roughly when you wake up and when you go to sleep.
  • If you get calls on the weekend, I can assume those people are personal friends rather than professional contacts.
  • If you call someone in the middle of the night, I can assume they are very close to you and are the people you depend on.
  • If certain numbers only call you and certain other numbers only receive calls from you, I can make certain assumptions about the dynamics of those relationships.
  • If you are on a call regularly at a particular time each week, I can assume you have a recurring meeting at that time.

These and many more.
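A few of the checks above can be sketched in a handful of lines. This is only an illustration in Python rather than SQL Server, and everything in it is an assumption: the record layout (timestamp, other party's number, direction), the made-up numbers, the 00:00 to 05:00 definition of "night" and the function names are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical call records: (timestamp, other party's number, "in" or "out").
CALLS = [
    ("2015-03-02 10:00", "555-0101", "out"),
    ("2015-03-09 10:05", "555-0101", "out"),
    ("2015-03-16 10:02", "555-0101", "out"),
    ("2015-03-04 02:30", "555-0102", "in"),
    ("2015-03-07 15:00", "555-0103", "in"),
    ("2015-03-02 22:45", "555-0103", "out"),
    ("2015-03-04 07:15", "555-0104", "in"),
]


def parse(ts):
    """Turn a 'YYYY-MM-DD HH:MM' string into a datetime."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def night_contacts(calls, start_hour=0, end_hour=5):
    """Numbers on calls that started between start_hour and end_hour."""
    return {n for ts, n, _ in calls if start_hour <= parse(ts).hour < end_hour}


def weekend_contacts(calls):
    """Numbers on calls made or received on a Saturday or Sunday."""
    return {n for ts, n, _ in calls if parse(ts).weekday() >= 5}


def daily_first_last(calls):
    """Map each date to (time of first call, time of last call)."""
    by_day = defaultdict(list)
    for ts, _, _ in calls:
        t = parse(ts)
        by_day[t.date()].append(t.time())
    return {day: (min(times), max(times)) for day, times in by_day.items()}


def one_way_contacts(calls):
    """Return (numbers that only call you, numbers that only you call)."""
    directions = defaultdict(set)
    for _, number, direction in calls:
        directions[number].add(direction)
    only_in = {n for n, d in directions.items() if d == {"in"}}
    only_out = {n for n, d in directions.items() if d == {"out"}}
    return only_in, only_out


def recurring_slots(calls, min_weeks=3):
    """(weekday, hour) slots with a call in at least min_weeks distinct weeks."""
    weeks = defaultdict(set)
    for ts, _, _ in calls:
        t = parse(ts)
        iso = t.isocalendar()
        weeks[(t.weekday(), t.hour)].add((iso[0], iso[1]))
    return {slot for slot, seen in weeks.items() if len(seen) >= min_weeks}
```

Note that a 2 a.m. call counts as the "first call of the day" in this sketch, so a real analysis would have to treat night calls separately before guessing at sleep times.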

Now you’re probably thinking that all this is based on the assumption that I have your phone bill with me. The thing is, I could probably have something better, like a mobile app installed on your phone (especially the free kind, or the poorly built one that is easy to hack, or the one where I tell you outright but you are too busy to read the terms). Now imagine the same thing happening to your online conversations on Facebook, Twitter or LinkedIn (they have apps too). Say I take it one step further and correlate this with your bank statement: things get even more interesting. Tie it to your medical records as well and it gets more interesting still, and so on.
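To make the correlation step concrete, here is a rough sketch, again in Python and again built entirely on assumptions: the two record layouts, the sample rows, the one-hour window and the `correlate` function are hypothetical, and matching on time alone is just the simplest possible way to link two data sets.

```python
from datetime import datetime, timedelta

# Hypothetical data: outgoing calls and card payments for the same person.
CALLS = [
    ("2015-03-04 18:10", "555-0199"),
    ("2015-03-05 09:00", "555-0101"),
]
PAYMENTS = [
    ("2015-03-04 18:40", "City Pharmacy"),
    ("2015-03-06 13:00", "Corner Cafe"),
]


def parse(ts):
    """Turn a 'YYYY-MM-DD HH:MM' string into a datetime."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def correlate(calls, payments, window=timedelta(hours=1)):
    """Pair each call with any payment made within `window` after it."""
    pairs = []
    for call_ts, number in calls:
        for pay_ts, merchant in payments:
            delta = parse(pay_ts) - parse(call_ts)
            if timedelta(0) <= delta <= window:
                pairs.append((number, merchant))
    return pairs
```

With the sample rows above, the only pair produced links the call to 555-0199 with the pharmacy payment half an hour later. One such match means nothing; the same pair showing up week after week is where the assumptions start.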

How did things get this far? It’s actually very simple. When Facebook launched, a lot of people claimed that, being an online platform, it could never replace real face-to-face interactions. Today we see that it is pervasive and has affected almost all of a person’s social connections. That false sense of security allowed companies to mine the data freely, without end users putting any restrictions on what they shared on the internet.

So, am I winning my fight against the internet?

    No, because it’s not a fight; it’s a choice I made about how much of my life I want a clustered index on.

    Even if it were a fight, I wouldn’t hope to win, because when all is said and done, the internet still makes my life a lot easier more often than it makes it harder.

    Would I adopt social media if my privacy were guaranteed? No, because I still lack the social skills to really make friends on the internet.

Should you be bothered by this post? No, not really, until maybe I start up a Hadoop cluster in Azure HDInsight and decide to mine Twitter for all your tweets ;-). I am just kidding, I would never do that to you. You can trust** me.

 

https://azure.microsoft.com/en-in/documentation/articles/hdinsight-analyze-twitter-data/