Dr. Timothy Smith

Who Cares about Dark Data?


Photo Source: Wikimedia Commons

The term “Dark Data,” according to Gartner Inc., a research and analytics company (NYSE: IT), refers to information collected and stored by companies and institutions that is not currently being used for anything productive. In other words, dark data is information that remains “in the dark” and is therefore unable to add value to an organization. I like to call it sleeping data because in many cases it only needs to be woken up through analysis to bring value and insights to an organization. Often, companies keep dark data as a matter of compliance or simply just in case. Dark data appears in many forms and takes up massive amounts of storage space. Take, for example, the text messages, emails, videos, and voice recordings that accumulate every day in companies around the world. Every time someone calls the help line at a company or institution, a voice recording is made and stored after the automated message tells the caller that the call may be recorded for quality control. Many companies keep these recordings in storage but never refer to them again. The same goes for collected emails, text exchanges, and tweets. In a blog on OdinText.com, Tom H.C. Anderson lists a variety of dark data types such as images, reviews, social media comments, and web information. Often, dark data remains in the dark because companies hoard data in the belief that it must hold value, yet they often lack the tools to make sense of it. However, with the rise of artificial intelligence and more powerful computers, analysts are increasingly drawing valuable insights from dark or sleeping data.

Dark data can reveal valuable information for organizations. Steven Dong wrote in “Why 'Dark Data' is the Key to Better Serving Customers” that Target India, a subsidiary of Target Inc., analyzes customer registrations with artificial intelligence to better personalize customer outreach. In this case, the registration data remained dark data until analysts found a way to make it work for the marketing department. Another way companies have put this kind of data to use is called “sentiment analysis.” Sentiment analysis means analyzing what people say about a subject on the web, for example on social media sites like Facebook, to understand how they feel about it. Much of what is said about a company on the web may be dark data because it is difficult to analyze thousands of comments and tweets by hand. That’s where artificial intelligence comes in. Artificial intelligence works especially well with very large amounts of data, and it helps to find which way the crowd is leaning by analyzing the sentiment across social media. Such analysis proves very valuable when launching a new product or tracking a new ad campaign. Expedia, the massive online travel company, gathers millions of customer reviews about hotels. Earlier this year, Expedia launched a new tool called “Guest Review Insights,” which analyzes all the guest reviews for each hotel served by its site. The tool provides insights into how guests feel about their hotel stays. Customers can easily compare hotel favorability on a number of amenities such as the gym, pool, cleanliness, and more. Additionally, hotels often lack the capability to analyze their own customer sentiment, so Expedia’s Guest Review Insights lets them see which categories of service to improve.
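To make the idea of sentiment analysis more concrete, here is a minimal sketch in Python. It is not how Expedia or any other company actually does it; production tools typically rely on trained machine learning models. This toy version assumes a tiny hand-made list of positive and negative words and a hypothetical set of amenity keywords, and it simply averages a score for each amenity mentioned in a handful of made-up reviews.

```python
import re
from collections import defaultdict

# Hypothetical, tiny word lists for illustration only; real systems use
# trained models or much larger lexicons.
POSITIVE = {"clean", "great", "friendly", "spacious", "excellent", "loved"}
NEGATIVE = {"dirty", "broken", "rude", "crowded", "terrible", "noisy"}

# Hypothetical amenity keywords used to group sentences by topic.
AMENITIES = {"pool", "gym", "room", "staff"}


def score_sentence(sentence):
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_by_amenity(reviews):
    """Average sentence score for each amenity mentioned in the reviews."""
    scores = defaultdict(list)
    for review in reviews:
        # Split each review into rough sentences on ., ! and ?
        for sentence in re.split(r"[.!?]+", review):
            words = set(re.findall(r"[a-z']+", sentence.lower()))
            # Credit the sentence's score to every amenity it mentions.
            for amenity in AMENITIES & words:
                scores[amenity].append(score_sentence(sentence))
    return {amenity: sum(s) / len(s) for amenity, s in scores.items()}


# Made-up reviews, not real customer data.
reviews = [
    "The pool was clean and the staff were friendly.",
    "Great gym! The room was noisy.",
    "The pool was crowded. Staff were rude.",
]

if __name__ == "__main__":
    for amenity, score in sorted(sentiment_by_amenity(reviews).items()):
        print(f"{amenity}: {score:+.2f}")
```

A positive average suggests guests lean favorable on that amenity, and a negative one suggests the opposite. The sketch has obvious limits: it ignores negation ("not clean"), sarcasm, and any word outside its small lists, which is exactly why larger systems turn to artificial intelligence.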

The individual, as a customer and as a private citizen, makes up the other side of the dark data coin. Much value has been derived from previously dark data about customers and social media traffic. It is often difficult to know what information a company or institution gathers about individuals in the course of business or through social media. It may even seem a bit unnerving that your data is being collected at a mass scale and stored for the sake of enterprise. According to a 2015 Pew Research study, 93% of people think it is important to have control over who has their data, but only 9% feel that they have a lot of control over it. This concern is well founded: according to Naked Security by Sophos, it is relatively easy to work out who a user is from dark data, even when no name is attached to the data being collected. While one can take solace in the fact that companies do aim to improve their interactions with customers using the data they collect (as with the Guest Review Insights tool), it is also important to know that there are ways to limit the collection of dark data. One method is to browse privately as much as possible, using incognito mode or an anonymous browser such as Tor. While this does not completely prevent your data from being collected, it does keep your data safer and harder to track. Another is to always log out of your social networks after you use them. As convenient as it is to stay logged in to Facebook or Twitter, doing so makes it easier to release data by mistake. You can find more tips about keeping safe from dark data here. While data collection is a necessary evil when it comes to freely browsing the internet, there are ways to keep yourself safer and to keep your data from being used without your knowledge.

Dark data refers to information stored by organizations that often just sits in storage and does nothing but take up space. Improvements in artificial intelligence over the past few years offer the ability to analyze large amounts of data and have opened up the field of dark data analysis. Now, companies and institutions can wake up sleeping data to gain better insights into their operations and better serve their customers. Any business or institution should do some dark data mining, starting by figuring out what information it is regularly collecting, storing, and forgetting. Once you know what dark data you have as a business, you can begin to ask what you could learn from it. Cases such as Expedia and Target India have already shown that valuable information can be gained from data that was simply sleeping in storage. On the other side of the coin, individuals should be aware that their data is being sold and analyzed to support many types of efforts, from marketing campaigns to election predictions. To retain personal privacy, everyone should carefully consider what they put on social media, read the privacy policies of their providers, and take precautions when browsing the internet.
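For readers who want a starting point for that first inventory step, the following is a rough sketch in Python, not a complete audit tool. It assumes that a file nobody has modified in over a year is a reasonable stand-in for "stored and forgotten," which will not hold for every organization. The function names and the one-year threshold are illustrative choices, not an established standard.

```python
import os
import time

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_age_days=365, now=None):
    """Return (path, size_in_bytes) for files under root whose last
    modification is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            if info.st_mtime < cutoff:
                stale.append((path, info.st_size))
    return stale


def summarize_by_extension(stale):
    """Total bytes of stale files, grouped by lowercase file extension."""
    totals = {}
    for path, size in stale:
        ext = os.path.splitext(path)[1].lower() or "(none)"
        totals[ext] = totals.get(ext, 0) + size
    return totals
```

Calling `summarize_by_extension(find_stale_files("/path/to/shared/drive"))`, with the placeholder path swapped for a real folder, gives a quick picture of which kinds of files (recordings, spreadsheets, logs) are sitting untouched and how much space they occupy.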

Dr. Smith’s career in scientific and information research spans the areas of bioinformatics, artificial intelligence, toxicology, and chemistry. He has published a number of peer-reviewed scientific papers. He has worked over the past seventeen years developing advanced analytics, machine learning, and knowledge management tools to enable research and support high-level decision making. Tim completed his Ph.D. in Toxicology at Cornell University and a Bachelor of Science in chemistry from the University of Washington.

You can buy his book on Amazon in paperback and in Kindle format here.