Data Science is much like fine cuisine. Anyone can make a meal, but a good chef consistently creates culinary masterpieces. A good chef requires a kitchen capable of producing an entre of enduring value. It is the same with data science. A data platform is where a data scientist goes to work.
Data science software is undergoing a transformation, prompted by the accelerating development of underlying cloud technology. Each of the largest cloud providers now contain native database platforms. These platforms boast high-speed storage and compute capabilities that are more than sufficient to provide scalable and economically practical capacity for large-scale advanced analytics and data science practices. In addition, multi-cloud database platforms are emerging with these same capabilities, increasing the available vendor options. These data platform engines now provide critical native support for XML and JSON data storage, must-haves for any modern integration platform. Since data science requires bringing together data from many different disparate systems and platforms, it is critically important to have a strong database foundation at the center of the technology platform. Utilizing these native cloud platforms, data engineers can focus on data definition, integration, and transformation. Even small data shops can create full stack advanced analytics practices with minimal investment in people resources, while still reaping the benefits of laughably cheap cloud storage. Large shops can cut the cost of their advanced analytic platform to a fraction of their current investment. All data teams can bring faster value to their organizations. Those who invest deeply in these modern platforms will reap the largest benefits.
Database technology is not the only cloud platform that has dramatically increased in scale and function over the past few years. Native cloud queuing and messaging platforms have also matured significantly over the past few years and are now capable of handling the volume and speed requirements of top data programs. Coupled with the improved platform database capabilities, each major cloud provider now boasts a comprehensive technology platform for efficiently managing high volumes of data. IOT
For many industries, the internet of things (IOT) is a relatively new phenomenon. IOT is not new to transportation, but data collection speeds are accelerating at an unprecedented pace. Transportation companies continue to collect truck sensor data, but the span and volume of the available information continues to increase exponentially. Truck telemetrics provide a strong data foundation for deep analytical research in domains such as safety analytics and predictive maintenance. Ingesting, storing and processing the overwhelming amount of data requires modern data practices that are capable of handling these extreme data situations. Cloud platform technologies are a natural match to these increasingly challenging data sets.
In the world of high cuisine, the best chefs insist on the very best ingredients. The world’s finest sushi chefs engage in bidding wars to ensure the finest bluefin tuna for their premier quality sushi. French chefs make deals with local growers to ensure the first pick of the best produce. It is the same with data science. To ensure the highest quality results, it is important to ensure that data inputs are of the highest quality. In the world of IOT, this is a unique challenge. Sensor data is susceptible to failures in many forms. A fuel sensor could give a false reading due to a tractor unit parked on an incline. Sensors also fail when transmission devices or antennas fail or lose satellite connectivity due to obstruction. In addition, the sensors themselves, which are small electronic devices, can fail. ECM units, acting as electronic hubs, can also have issues. With all of these points of failure, it is incredibly important to have tolerance checks on data feeds to detect missing or incomplete data. These small failures might seem inconsequential in a large sea of data, and sometimes that is true. Context is important. To quote Benjamin Martin in the movie The Patriot, “Aim small, miss small”. If your data science initiative involves optimization with very small tolerances, it is critically important to ensure the highest quality data. If your data science initiative only requires directionally accurate results, then small data quality considerations are potentially not worth the time and effort. Make your choices carefully and intentionally.
Due to technical advances in cloud data platforms, a sustainable data science workbench can be created much more efficiently now than at any other time in the history of data analytics. Data science practitioners would be wise to become early adopters of these platform technologies to reduce the total cost of ownership of their data science portfolio and to reduce the amount of time that technical staff needs to spend simply maintaining a platform to support the work. Invest aggressively and reap the rewards.