In light of your experience, what trends and challenges have you witnessed in the Big Data space?
One of the major trends we've seen is that, as companies migrate their operational applications to the public cloud, they increasingly opt for their cloud vendor's native data services for Big Data solutions. The major public clouds, such as AWS, GCP and Azure, all have their own data storage and data processing offerings, and these have elbowed out the proprietary enterprise solutions that once dominated. Taken as a whole, the family of cloud service offerings presents a very compelling case and obviates the need for external proprietary solutions.
Another trend that has seemingly gained steam is the emergence of the data lake as a necessary component of any Big Data strategy. The central tenet of a data lake is being able to dump data into a central repository, regardless of format or source, for further processing. Increasingly, companies are realizing the benefits of a searchable catalog of the data in the lake, which enables data discovery and exploration.
As companies invest more in Big Data, there is a growing need to harness that data through data science and machine learning. Business intelligence, where we model data in the data warehouse, is still hugely important, but applying data science and machine learning to data is becoming more appreciated and more mainstream in data departments. Instead of humans predicting outcomes and proposing optimizations, the prospect of letting ML do the heavy lifting is enticing a lot of companies to join the fray.
As for challenges, one of the main ones in Big Data is deriving value from it to drive company decisions. As we collect more and more data in all forms, the question remains how much value it is really providing. There can be a missing bridge between the data available in the data lake and how it can actually be leveraged to benefit the business, especially when there is no inherent data-driven culture and no easy self-service tools.
It is also still quite challenging to find people with the right expertise in the areas I mentioned. Data engineers with expertise in both data and the cloud are not plentiful. Another skill set at a premium is Python, as it has become something of a de facto language for data processing. While there are a lot of Python scripters, it's relatively hard to find data engineers who have worked with large Python code bases. Experienced data scientists with more than just academic knowledge, and ML engineers who can productionize ML models, are also very hard to find.
Could you talk about your approach to identifying the right partners among the many providers available?
In assessing potential partners in the data space, our framework is to first scrutinize whether there is a clear value proposition that solves an existing pain point, and then to determine the likelihood of successful adoption within the company. The last thing we want is to spend time and money rolling out something that ends up being a solution looking for a problem. Next, if the solution replaces an existing technology, we evaluate the switching costs and the complexity and duration of the migration path. Lastly, we look at ongoing maintenance costs, which include the need for in-house personnel who can support the technology going forward. In general, we prefer SaaS-based or managed solutions, where the overhead of infrastructure support is eliminated. As we operate exclusively on the public cloud, we have a bias towards solutions that are cloud-native and tightly integrated with the cloud provider's offerings.
What are some of the discussion points in your leadership panel? What strategic principles do you rely on to steer the company forward?
Discussions around our data strategy frequently start with questioning whether we are investing in the areas that are most impactful to moving the business forward. Dedicating man-hours and resources is a zero-sum game, so it's important that we focus on the areas that provide the highest leverage. Getting accurate insights from data is crucial to our business, and to this end we need to make sure we are sourcing the right data and deriving the most value from it. Another major discussion point is whether we have the right metrics or KPIs to accurately assess how effective our chosen strategies are. A constant theme as well is how to shorten the feedback loop between making a decision and seeing the data that properly gauges its effectiveness.
How do you see the Big Data arena evolving over the next few years, in terms of potential disruptions and transformations?
The explosion of Big Data was predicated on capturing as much data as possible, regardless of format, to power the business from both an operational and an analytical perspective. The central thought was that with more data, we'd have better insights. For the most part, we've solved the problems of storing and processing Big Data, but the derivation of insights still leaves a lot to be desired. The promise of Big Data transforming companies and entire industries is still playing out.
One potential transformation is a world where data becomes increasingly commoditized and companies rely on a marketplace of verticalized data vendors selling insights powered by Big Data under the covers. We are already seeing this trend in verticals such as finance, healthcare, hospitality and e-commerce. The trend is clear: every vertical will have an AI industry sprouting up around it.
Data privacy and security will remain a big talking point, and more emphasis will be placed on compliance with regulations. It is incumbent on those handling data to look for solutions that mature their data operations and make compliance painless and seamless. Long term, I think blockchain, with its auditable open ledger, will eventually play a part in enabling data privacy and security. Personal health or financial data could conceivably be placed on the blockchain and protected by personal keys.
What single piece of advice would you give to a fellow or aspiring professional in your field who is looking to embark on a similar professional journey?
My single piece of advice is to be determined to learn from those who've done it before. It is now quite common in this industry for companies to maintain an engineering blog where they describe their data journey, the pitfalls they hit and the lessons they learned. There is a lot of commonality in the paths everyone has taken, and you don't want to waste time making mistakes somebody else has already made.