TEAM DATA SCIENCE PROCESS (TDSP)
Wherever you go, there is talk of cloud computing. So what is cloud computing, and why is there so much room for it?
Cloud computing is a network-based computing service in which applications are hosted and data is processed over the internet, billed on a pay-as-you-go basis.
Cloud computing is the current trend. The major cloud providers are Amazon Web Services, Google Cloud and Microsoft Azure. Cloud computing is too vast a topic to describe in detail here, so this article takes up one underrated but important topic: the Team Data Science Process (TDSP) on Microsoft Azure.
The Team Data Science Process (TDSP) is a data science methodology published by Microsoft. It is an agile, iterative approach for delivering predictive analytics solutions and intelligent applications efficiently. Microsoft offers it to help companies run their analytics programs effectively.
There are four key concepts in TDSP: the data science lifecycle, a standardized project structure, infrastructure and resources, and tools and utilities.
DATA SCIENCE LIFECYCLE:
TDSP provides a lifecycle to structure the development of data science projects. If your organisation already uses another lifecycle, such as CRISP-DM or KDD, you can still apply the task-based TDSP within that lifecycle. At a high level, these methodologies have much in common.
The major stages that data science projects typically execute, often iteratively, are:
- Business Understanding
- Data Acquisition and Understanding
- Modeling
- Deployment
- Customer Acceptance
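The iterative nature of the lifecycle can be pictured with a small Python sketch. The five stage names are those of the TDSP lifecycle as documented by Microsoft; the `next_stage` function and its "step back one stage on rework" rule are hypothetical, chosen only to illustrate that a project can move backwards as well as forwards.

```python
from enum import Enum


class Stage(Enum):
    """Stages of the TDSP lifecycle, in their usual order."""
    BUSINESS_UNDERSTANDING = "Business Understanding"
    DATA_ACQUISITION = "Data Acquisition and Understanding"
    MODELING = "Modeling"
    DEPLOYMENT = "Deployment"
    CUSTOMER_ACCEPTANCE = "Customer Acceptance"


def next_stage(current: Stage, needs_rework: bool = False) -> Stage:
    """Return the stage to work on next.

    Hypothetical rule for illustration: if the current stage uncovers a
    problem, step back one stage; otherwise move forward, stopping at
    customer acceptance.
    """
    stages = list(Stage)
    index = stages.index(current)
    if needs_rework:
        return stages[max(index - 1, 0)]
    return stages[min(index + 1, len(stages) - 1)]
```

In a real project the team decides which stage to revisit; the sketch only shows that the path through the stages is not strictly linear.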
STANDARDIZED PROJECT STRUCTURE:
Every project shares the same directory structure and uses templates for project documents, which makes it easy for team members to find information about each other's projects. All code and documents are stored in a version control system (VCS) such as Git, Subversion or TFS, so the team can collaborate. Tracking tasks and features in an agile project tracking system such as Jira, Rally or Azure DevOps allows closer tracking of the code for individual features. Such tracking also helps teams produce better cost estimates.
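As a rough illustration of a standardized structure, the Python sketch below creates the same set of folders for every new project. The folder names in `TEMPLATE` and the `scaffold_project` helper are assumptions made for this example, loosely modeled on a TDSP-style template; they are not taken from Microsoft's tooling, so adjust them to your team's own template.

```python
from pathlib import Path

# Hypothetical layout for illustration; rename the folders to match
# the template your team actually uses.
TEMPLATE = {
    "Code": ["Data_Acquisition_and_Understanding", "Modeling", "Deployment"],
    "Docs": ["Project", "Data_Report", "Model"],
    "Sample_Data": [],
}


def scaffold_project(root: Path) -> list[Path]:
    """Create the template directories under root and return them."""
    created = []
    for top, children in TEMPLATE.items():
        for rel in [Path(top)] + [Path(top) / child for child in children]:
            target = root / rel
            target.mkdir(parents=True, exist_ok=True)
            # A placeholder README keeps otherwise empty folders in Git.
            readme = target / "README.md"
            if not readme.exists():
                readme.write_text(f"# {rel.as_posix()}\n", encoding="utf-8")
            created.append(target)
    return created
```

Calling `scaffold_project(Path("my_project"))` would give every project the same starting layout, which is the point of a shared template: anyone on the team knows where to look for code, documents and sample data.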
RESOURCES AND INFRASTRUCTURE IN DATA SCIENCE PROJECTS:
TDSP provides recommendations for managing shared analytics and storage infrastructure such as cloud file systems for datasets, databases, big data (Hadoop or Spark) clusters, and machine learning services.
The analytics and storage infrastructure is where raw and processed datasets are stored, and it enables reproducible analysis. Sharing it also avoids duplicating datasets, which can lead to inconsistencies and unnecessary infrastructure costs. Tools are provided to provision the shared resources, track them, and let each team member connect to them. It is also good practice for the team to work in a consistent compute environment.
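One way to picture how shared storage can avoid duplicate datasets is to key each dataset by a hash of its content. The Python sketch below is a minimal in-memory illustration; `DatasetRegistry` is a hypothetical helper invented for this example and is not part of TDSP or Azure.

```python
import hashlib


class DatasetRegistry:
    """In-memory registry keyed by content hash (hypothetical helper)."""

    def __init__(self) -> None:
        # Maps a SHA-256 digest of the content to the first name registered.
        self._by_hash: dict[str, str] = {}

    def register(self, name: str, content: bytes) -> str:
        """Register a dataset and return the name its content is stored under.

        If identical content was registered before, the existing name is
        returned instead of recording a second copy.
        """
        digest = hashlib.sha256(content).hexdigest()
        return self._by_hash.setdefault(digest, name)
```

If two team members upload the same file under different names, the second call returns the first name, so everyone ends up working from one copy.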
TOOLS AND UTILITIES IN EXECUTION OF PROJECT:
Introducing a new process is challenging in most organizations. Tools that implement the data science process and lifecycle lower the barriers to adoption and make it more consistent. TDSP provides an initial set of tools and scripts to jump-start its adoption within a team. It also helps automate some common tasks in the data science lifecycle, such as data exploration and baseline modeling. There is a well-defined structure for individuals to contribute shared tools and utilities to their team's shared code repository. These resources can then be leveraged by other projects within the team or the organization, and TDSP also enables contributions of tools to the wider community. The TDSP utilities and tools can be cloned from GitHub.
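To give a feel for what automating data exploration and baseline modeling means, here is a minimal Python sketch using only the standard library. It is not the TDSP utilities themselves; `summarize` and `baseline_mae` are hypothetical helpers, and the baseline shown (always predicting the training mean) is just one simple choice.

```python
from statistics import mean, median, pstdev


def summarize(values: list[float]) -> dict[str, float]:
    """Basic exploration: summary statistics for one numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),  # population standard deviation
    }


def baseline_mae(train: list[float], test: list[float]) -> float:
    """Baseline model: always predict the training mean.

    Returns the mean absolute error of that constant prediction on the
    test values, a reference score that any real model should beat.
    """
    prediction = mean(train)
    return mean([abs(y - prediction) for y in test])
```

Scripts like these are worth sharing in a team repository because every project needs them and nobody should have to rewrite them.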
ROLES AND TASKS IN MICROSOFT AZURE:
The Team Data Science Process is a structured methodology that acts as a framework for team projects. The tutorials for carrying out the TDSP tasks use Azure DevOps. They can also be followed on an analytics machine such as the DSVM (Data Science Virtual Machine). These machines come with many of the tools used in data science projects, including pre-installed and integrated libraries.
TDSP defines three lead roles, which work alongside the individual contributors on each project:
- Project Lead
- Team Lead
- Group Manager
Tasks carried out by each role:
1. Project Lead -> project repository, Azure file storage, DSVM
2. Team Lead -> DSVM, security control, utility repository
3. Group Manager -> project template, utility repository, security control policies, Azure DevOps organizations
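The mapping above can be written down as a small lookup table, which makes it easy to answer questions such as "who is responsible for the DSVM?". This Python sketch only restates the list above; the `roles_responsible_for` helper is hypothetical.

```python
# Role-to-task mapping, copied from the list above.
ROLE_TASKS = {
    "Project Lead": ["project repository", "Azure file storage", "DSVM"],
    "Team Lead": ["DSVM", "security control", "utility repository"],
    "Group Manager": [
        "project template",
        "utility repository",
        "security control policies",
        "Azure DevOps organizations",
    ],
}


def roles_responsible_for(task: str) -> list[str]:
    """Return every role whose task list contains the given task."""
    return [role for role, tasks in ROLE_TASKS.items() if task in tasks]
```

For example, `roles_responsible_for("DSVM")` returns both the Project Lead and the Team Lead, which shows where responsibilities overlap.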
These roles carry the main responsibilities in a TDSP project on Microsoft Azure. Cloud computing is the buzzword of the moment.