Sunday, May 29, 2016

The Complete Privacy & Security Desk Reference Volume I

For most of my life, privacy hasn't really been something I've thought about too much. I've happily given out my name, address, phone numbers, email addresses and other information. In 1993 I proudly made my first personal website, and in the last decade I have reveled in the human digital connections enabled by social media. However, as data scientists we know that our deep learning algorithms and cloud platforms are enabling a new era, in which machines can gain unprecedented insights into our everyday lives by mining millions of data points about us. Much of this can be for good, but it can also work against us - for instance when your health insurance doubles in price because the insurance company's algorithms predict that your health is going to go downhill soon, perhaps based on your grocery shopping habits, cellphone trail and recent hypochondriac web searches; or when your credit card information gets leaked in the latest hack.

Since data scientists and data engineers are the people enabling these activities, I believe we as a community need to put as much effort into understanding and mitigating the human and social implications of data science as we put into our coding and analytics. This has many dimensions, but one of them is understanding the choices we have as individuals about what we do and do not share with the rest of the world, and what access we give to sensitive information such as our credit card numbers.

The Complete Privacy & Security Desk Reference Volume I: Digital is by far the most comprehensive guide I have seen to understanding the privacy and security choices we make in the digital world, and to taking back some control over what gets shared about us. The book covers a multitude of techniques, from basic ones that we should all adopt, such as configuring browser privacy settings and using VPNs, to highly advanced methods such as masking credit card numbers, setting up aliases and keeping your home address completely private, which are probably only realistic if you are a public figure or are unfortunate enough to be threatened by someone. The chapters are helpfully organized into "basic", "intermediate", "advanced" and "expert". Several chapters lead you through a process to find out exactly what information about you is publicly accessible on the internet, and how to have some of it removed if you wish.

The book goes into a lot of detail about each of the topics it covers - for instance which browser you should use (Firefox), and exactly which settings to choose to prevent third-party cookies from tracking you. I have spent the last couple of weeks experimenting with a variety of the book's methods, including using VOIP phones, VPNs, searching for myself on the internet, and closing a few security and privacy loopholes. One thing is certain - and the book is clear about this: there is a tradeoff between security, privacy and convenience. If I have any criticism of the book, it is that once you start implementing its suggestions it is not clear where to stop, since everything is connected to everything else. Unless you want to live like a secret agent in a foreign country, you're going to have to draw the line somewhere. I am not sure how many of my experiments will stick, but by going through the process I have learned a lot about the digital trail I am leaving, and what choices I have to do something about it.

Overall I would highly recommend the book, as it shows that you have much more control over your digital data than you probably realize, and it gives you tools to help you find the right place for you on the privacy-convenience continuum.

Wednesday, March 2, 2016

Who owns the future? A must-read book

I'm not normally one for posting book reviews - in fact, if I am quite honest, I'm not normally one for reading books. I can just about get through a journal article or a magazine, but my attention span is too short to stick with something the size of a book. However, occasionally a book grabs my attention on the first page and goes on to have a real impact on my thinking. Jaron Lanier's Who Owns the Future? is such a book. I think it is a must-read for anyone working with technology or data in the 21st century (i.e. all of us). Unlike most books, which have one idea that is repeated over and over again, this one has new ideas on every page.

Who Owns the Future? is probably at heart an economics book. It is about big data infrastructure, and specifically what Lanier calls siren servers: hugely powerful cloud computing infrastructures like Amazon and Google. These siren servers become monopolies that everything else - people and things - revolves around. They can do this because data is becoming more important than things, and perhaps even people. The answer to the title of the book then becomes apparent: those who own the future are those who have access to the most powerful computation to extract the most value from data.

How can this be? A good example is healthcare. If you've been to the doctor recently, you'll have noticed that the nurses, and usually the doctors, spend more time talking to their laptops than to you. Doctors are arguably becoming data entry clerks - or at best a small part of a computational process that converts patients' symptoms into diagnosis codes and treatment plans. The real value goes to those who can aggregate all of the decisions made by all of the doctors and optimize accordingly - for example, by learning which treatment plans work for certain kinds of patients. Perhaps we can even replace the doctor with a machine learning model that learns from tens of thousands of real doctors. The doctors give up their value to the "server", and then whoever owns the server (a large provider network, a health insurance company, or maybe ultimately even Google) reaps the value.

We see many other examples of this around us. A recent one is Amazon opening a bricks-and-mortar bookstore in Seattle. The bookstore can probably beat Barnes and Noble, because Amazon knows exactly which books people in that square mile of Seattle want to buy, and it uses customer reviews and ratings (given online for free by all of us!) to guide customers and add value for them. Lanier goes on a fascinating journey, questioning whether this is desirable, what its economic impacts are, how it affects the value of people, and how philosophy has addressed the issues it raises. Lanier ultimately recommends a micropayments system, in which the value of data is shared among all of us.

If you read just one book this year, I think this should be the one. There are many ways you can use it to shape your thinking. For instance, you could ask: "what would it look like if my company stopped being a [fill in the blank] company and became a data company?"; you could use it to inform your ethics and the decisions you make in your data science career; or you could use it to position your career for the world ten years from now. But make sure you read it sooner rather than later.