Fly Fishing in the Big Data Lake


It’s thought that the term ‘Big Data’ was first used at the end of the 1980s, but the last few years have seen the rise of an additional phrase: ‘Data Lakes’, large-scale repositories of data held in their raw or source form. I suspect that data lakes have become so popular because the cost of storage, and of the tools necessary to manipulate Big Data, has decreased dramatically. That doesn’t mean they haven’t introduced a whole new set of challenges and pitfalls as well.

Data without a design strategy is just data

I don’t dispute the potential value of Big Data (after all, I have a data science background), but I do take exception to some of the sloppy design thinking it tends to engender, along the lines of ‘don’t worry about what we store or how, just store it all and we’ll sort it out later’ (at the point of extraction and use). Here’s the problem: many algorithms, and analytics especially, require the data to be in a structured form (think tables), and it can be extraordinarily difficult to apply structure after the fact. Worse, you may not even be capturing the right kind of data, but you won’t know until you try to use what you’ve got. Sadly, there is still no getting around the need to do some planning about your data needs.
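To make the ‘structure after the fact’ problem concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the raw records, the field names and the three-column table are hypothetical and do not come from any real system. It simply tries to fit records that were stored ‘as is’ into the table an analyst might want later.

```python
import json

# Hypothetical raw records, as they might land in a lake with no agreed schema.
RAW_LINES = [
    '{"customer_id": 1, "email": "a@example.com", "spend": 120.5}',
    '{"custId": "2", "Email": "b@example.com", "spend": "80"}',
    '{"customer_id": 3, "notes": "called support"}',
    'not json at all',
]

# The table we wish had been designed up front (column name -> type).
# This schema is an assumption for the example.
SCHEMA = {"customer_id": int, "email": str, "spend": float}


def to_row(line):
    """Return a row matching SCHEMA, or None if the record does not fit."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # not even parseable
    if not isinstance(record, dict):
        return None
    row = {}
    for column, cast in SCHEMA.items():
        if column not in record:
            return None  # field missing or stored under another name
        try:
            row[column] = cast(record[column])
        except (TypeError, ValueError):
            return None  # value cannot be converted to the expected type
    return row


def structure(lines):
    """Split raw lines into usable rows and a count of rejected records."""
    rows = [row for row in map(to_row, lines) if row is not None]
    return rows, len(lines) - len(rows)


rows, rejected = structure(RAW_LINES)
print(f"{len(rows)} usable rows, {rejected} rejected")
```

Only the first of the four sample records fits the table. The other three fail for different reasons (renamed fields, a missing field, and text that is not JSON at all), and each would need its own clean-up rule, which is exactly the work that a little planning at the point of capture avoids.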


So, why the title of this blog? It came to me after a discussion with a large organisation that said it was going to implement a Data Lake strategy and use it to find insights about customers that Marketing could then use for segmentation and campaign targeting. As we talked more, I became aware that there wasn’t really a strategy about what kinds of insights they needed, how they would be generated, what sort of data would be required or even where it would come from in the first place. They were just going to store everything and hope the insights would appear after some kind of critical mass had been achieved.


Preparing for your catch

To stretch the lake metaphor further, that’s like turning up at the lake, baiting your hook with whatever you have at the time, casting into an area you hope holds fish and waiting to see what happens. You don’t really know what’s lurking beneath the surface, whether your bait is attractive or even whether there are any fish there in the first place. Nor do you know if what you catch will be particularly appetising. Fly fishing is different: it’s about preparing the fly before you leave for the lake, looking for an individual fish (one you want to catch), carefully getting to a good casting location, casting to just the right spot at just the right moment and being able to eat whatever you land.

So, back to the data lake: do you have clear objectives in mind about what you hope to extract from your lake? Are you adding the right kind of data, and in a way that ensures you can use it again afterwards? Do you have the right tackle and the skills to use it? At this point it’s also worth pointing out that some of the tastiest fish come from small rivers and streams.


If you are unsure of how to fish in your data lake, speak to a ghillie (a data expert).


Finally, I will leave you with another metaphor to ponder – have you ever heard of a case where adding more hay to the haystack made finding the needle easier?


Peter Dorrington is Director of Customer Insights at TeleTech Consulting and is an expert in combining data science with behavioural science.


ViewPoint comment: “Big data has the potential to transform businesses, we know that, but as Peter describes, data for data’s sake can be a distraction. ViewPoint helps to simplify the customer data process by focusing on what matters to the customer, giving operational feedback to managers alongside satisfaction scores for reports. Our suggested question, ‘Is there anything else you’d like to tell us?’, helps us to delve into exactly what the customer wants to tell you. Surveys don’t have to be long and laborious; in fact, our customers have found that some of the richest pickings come from the shortest surveys!”
