The Transition to Data Science Architecture

In my previous post, I described my point of view on “Data Science Architecture.” In this post, I will discuss the transition from traditional Data Architecture to Data Science Architecture, focusing on two aspects:

  • Transition of individual talent to Data Science
  • Transition of enterprises to Data Science

Transition of Individual Talent

Data Architects, Data Analysts and Data Engineers often ask me, “How do I become a Data Scientist?” My answer is simple. Anyone can become a Data Scientist if he or she is:

  • Data savvy, smart and a data artist
  • Committed
  • Ready to invest time in learning new programming languages for data analysis, such as Python and R

Let’s look at some of the transitional elements below.

The transition to Data Science Architecture is faster for data engineers, DB developers, and data analysts who are good at statistics and data mining algorithms. Business analysts with good data mining skills can also play an important role in this journey into the Data Science world. However, the transition is not easy. One has to learn new programming languages like Python, R or Java along with data analytics skills. A background in machine learning, familiarity with languages and tools such as R and MATLAB, and knowledge of SAS can speed up the transition.

Data Architects, data designers, database developers and ETL developers who come from the traditional world of data modeling and application development need to approach problems differently. Data Science Architecture involves a transition from the relational world of Data Architecture to the NoSQL database world (document-oriented databases, column-oriented databases and key-value stores).
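To make the shift in data shapes concrete, here is a minimal sketch in plain Python. It is not tied to any particular database product; the customer and order records, the names to_document, kv_store and order_columns, and the "customer:1" key format are all made up for illustration. It shows the same small dataset as relational rows, as a nested document, as a key-value entry and as columns.

```python
import json

# Relational shape: normalized rows in two "tables" linked by a foreign key.
# (Hypothetical sample data, for illustration only.)
customers = [(1, "Ada")]                          # (customer_id, name)
orders = [(101, 1, "laptop"), (102, 1, "mouse")]  # (order_id, customer_id, item)


def to_document(customer_id):
    """Join the two tables into one nested, document-style record."""
    name = next(n for (cid, n) in customers if cid == customer_id)
    return {
        "_id": customer_id,
        "name": name,
        "orders": [
            {"order_id": oid, "item": item}
            for (oid, cid, item) in orders
            if cid == customer_id
        ],
    }


# Document-oriented shape: the customer and its orders live in one record,
# so reading a customer needs no join.
customer_doc = to_document(1)

# Key-value shape: an opaque serialized value looked up by a single key.
kv_store = {"customer:1": json.dumps(customer_doc)}

# Column-oriented shape: one list per column instead of one tuple per row,
# which suits scans and aggregates over a single attribute.
order_columns = {
    "order_id": [oid for (oid, _cid, _item) in orders],
    "customer_id": [cid for (_oid, cid, _item) in orders],
    "item": [item for (_oid, _cid, item) in orders],
}
```

The point of the sketch is the modeling mindset: in the relational form the join happens at query time, while in the document and key-value forms the data is stored the way it will be read.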

Transition of Enterprises to Data Science

The second and most important aspect that I want to focus on is the transition of enterprises to Data Science.

Enterprises have already entered the era of Big Data, and Data Science can take advantage of scientific, statistical and data mining techniques, and of the analysis of large volumes of data, to improve business profit and potential revenue generation. However, there are many technical challenges, including:

  • Scale
  • Lack of infrastructure
  • Error handling
  • Privacy
  • Timeliness
  • Provenance
  • Visualization

These challenges must be addressed effectively before real business benefits can be realized.

Unstructured data is just another source of data. However, the biggest question is: Do stakeholders know what to get out of this source of data, considering it is a new data source? There is a need to analyze what this unstructured, digital and document-related data means to the organization. We need to determine whether the organization is ready for the amount of information that will be added to its corporate assets, and whether stakeholders have analyzed what it takes to handle the new information flow in terms of machines, software and analytic skills.

Traditional Data Architecture approaches such as OLTP and OLAP solutions are going to remain; Data Science Architecture cannot replace them. The major difference is in how the two scale. A traditional RDBMS typically gains capacity by adding more data storage or more I/O, or by moving to a bigger server (scale-up). The new “Data Science” approach to Data Architecture relies on virtual machines and additional commodity servers, such as cloud instances on Amazon, which is why it is called a scale-out approach rather than scale-up.
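As a rough illustration of the scale-out idea, the sketch below spreads record keys across a set of nodes by hashing each key. It is a simplified sketch, not how any specific product works; the node names, the "order:N" key format and the functions node_for and distribute are hypothetical.

```python
import hashlib
from typing import Dict, List, Sequence


def node_for(key: str, nodes: Sequence[str]) -> str:
    """Pick a node for a key using a stable hash (simple modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def distribute(keys: Sequence[str], nodes: Sequence[str]) -> Dict[str, List[str]]:
    """Group keys by the node that would store them."""
    placement: Dict[str, List[str]] = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


# Hypothetical workload: 1,000 order keys.
keys = [f"order:{i}" for i in range(1000)]

# Scale-out: capacity grows by adding another commodity node to the list,
# not by buying a bigger server.
three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])
four_nodes = distribute(keys, ["node-a", "node-b", "node-c", "node-d"])

for name, placement in (("3 nodes", three_nodes), ("4 nodes", four_nodes)):
    print(name, {node: len(stored) for node, stored in placement.items()})
```

One caveat on the design: with plain modulo sharding, adding a node changes where many existing keys belong, so distributed stores commonly use techniques such as consistent hashing to limit how much data has to move.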

Data Science Transition

Small and mid-sized businesses may not want to invest the time and effort to build the necessary infrastructure, so they will look to utilize ‘Big Data as-a-service’ in the Data Science world; larger organizations, too, can benefit from pre-built services that they can readily deploy and use. In the Data Science world, Big Data and mobile integration play a key role alongside data visualization and analytics: Big Data mobile apps complement Data Science because mobiles and tablets are gaining market share from laptops and PCs.

In the Data Science world, Big Data and emerging database technologies like NoSQL are not just hype; they are the talk of CIO meetings, international summits, webinars, and conferences. There is a need to switch gears to the new world of Data Science Architecture. Enterprises are getting ready to invest more in Big Data application development and in a scale-out approach for the data model, though there are implications and this is an expensive affair.

The bottom line is that we need to bracket the problem and the potential results in order to mitigate the risk with Data Science and to make the most of the business benefits.

The transition from a relational/traditional world to the Data Science world is quite challenging. But it’s important to understand this journey and how Data Scientists play a critical role in developing an intelligent enterprise.

There is already a buzz about “small data” in the IoT world. We will talk about this in upcoming posts.

More on Data Science and the Data Scientist will be covered in my next post.

Post Date: 8/20/2015

Prakash Mishra, NTT DATA

About the author

Prakash Mishra leads NTT DATA’s Data Architecture and Management Practice. A solutions-driven, results-oriented, self-motivated leader, Prakash has a proven record of extensive data architecture leadership in complex environments. Prakash has been involved in developing and leading the implementation of traditional and innovative big data strategies and solutions, data modernization and master data management solutions for small to large organizations. Prakash excels at building and motivating high-performance teams, cultivating a positive work environment and promoting a spirit of teamwork and idea-sharing to maximize individual contributions. Prakash holds a master’s degree in computer science and has two decades of experience specializing in enterprise data architecture and management.
