Data Mining and Software Applications

Diwakar Dhungana
Published in Analytics Vidhya, Oct 10, 2019

We are entering a new world in which data and software products are closely connected. Today, the volume of data keeps growing because of the heavy use of well-built, distinctive software products, both established and still in development. Data and software applications are each important to the other in their own ways.

Data Mining:
Data mining is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns in the data.

Valid: The patterns hold in general
Novel: We did not know the pattern beforehand
Useful: We can devise actions from the patterns
Understandable: We can interpret and comprehend the patterns.

Put another way, data mining is the analysis of data and the use of software techniques to find patterns and regularities in sets of data.

Data Mining Processes

Data → Information → Decisions

  1. Data:
    1.1 On-line Updates
    1.2 Batch Feeds
    1.3 Operational Data Store
  2. Information:
    2.1 Data Warehouse
    2.2 Data Mart
    2.3 Data Transformation
    2.4 Data Synchronization
  3. Decisions:
    3.1 Query and Reporting
    3.2 Data Mining
    3.3 On-line Analytical Processing
    3.4 Summary and details
    3.5 Drill capability

Knowledge Discovery:
It is the process of identifying valid, potentially useful, understandable patterns and relationships in data.

Data Mining Steps:
1. Data Preprocessing:
1.1 Data Selection:
Identify target datasets and relevant fields

1.2 Data Cleaning:
Remove noise and outliers

1.3 Data Transformation:
Create common units
Generate new fields

2. Data Mining Model Construction
3. Model Evaluation
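The preprocessing steps above can be sketched in a few lines of Python. This is only a minimal illustration: the records, field names, plausible-height range, and the derived `bmi` field are all hypothetical and not taken from any real dataset. Unit conversion runs before the range check here so that a single threshold applies to every record.

```python
# Hypothetical raw records: heights arrive in mixed units, one value is
# missing, and one is an obvious outlier.
raw = [
    {"id": 1, "height": 170.0, "unit": "cm", "weight_kg": 65.0, "note": "a"},
    {"id": 2, "height": 1.80, "unit": "m", "weight_kg": 81.0, "note": "b"},
    {"id": 3, "height": None, "unit": "cm", "weight_kg": 70.0, "note": "c"},
    {"id": 4, "height": 900.0, "unit": "cm", "weight_kg": 72.0, "note": "d"},
    {"id": 5, "height": 160.0, "unit": "cm", "weight_kg": 51.2, "note": "e"},
]

FIELDS = ["id", "height", "unit", "weight_kg"]  # assumed "relevant" fields


def select(records, fields):
    """Data selection: keep only the relevant fields."""
    return [{f: r[f] for f in fields} for r in records]


def to_common_units(records):
    """Transformation: express every height in centimetres."""
    out = []
    for r in records:
        h = r["height"]
        if h is not None and r["unit"] == "m":
            h = h * 100.0
        out.append({"id": r["id"], "height_cm": h, "weight_kg": r["weight_kg"]})
    return out


def clean(records, low=50.0, high=250.0):
    """Cleaning: drop missing heights and heights outside an assumed range."""
    return [
        r for r in records
        if r["height_cm"] is not None and low <= r["height_cm"] <= high
    ]


def add_bmi(records):
    """Transformation: generate a new field from existing ones."""
    return [
        {**r, "bmi": round(r["weight_kg"] / (r["height_cm"] / 100.0) ** 2, 1)}
        for r in records
    ]


prepared = add_bmi(clean(to_common_units(select(raw, FIELDS))))
for row in prepared:
    print(row)
```

Records 3 and 4 are dropped by the cleaning step, and the remaining rows are ready for model construction.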

Why Do We Use Data Mining Today?
Human analysis skills alone are inadequate because of:
— Volume and dimensionality of data
— High data growth rate
— Competitive Pressure

What Must Be Available for Data Mining?
Data mining is a complex and difficult process, so it depends on several things being available:
1. Data
2. Storage
3. Computational Power
4. Off-the-shelf software
5. Expertise

On What Kinds of Data Is Data Mining Used?
1. Database-oriented data sets and applications
-Relational Database
-Data Warehouse
-Transactional Database

2. Advanced data sets and advanced applications
-Data streams and sensor data
-Time-series data, temporal data, sequence data
-Structured data, graphs, social networks and multi-linked data
-Object-relational databases
-Spatial data and spatiotemporal data
-Multimedia database, Text database, The WWW (World-Wide Web)

Are Software Applications Useful for Data Mining?
Many kinds of applications have been developed, each for its own purpose. We use applications to produce the output of a task more easily, quickly, and efficiently. Accordingly, many tools exist for mining useful information from voluminous data.

Software application:
An application is any program, or group of programs, that is designed for the end user.

Software used for data mining includes:
1. Oracle Data Mining
2. RapidMiner
3. KNIME
4. Orange
5. IBM Cognos

For trend and pattern analysis, these tools are generally faster than writing queries by hand.

What Are the Data Mining Techniques?
1. Descriptive:
Descriptive techniques characterize the general properties of the data in the database and find important patterns or information in it.

They are mostly used during data exploration.

Descriptive techniques include:
1.1 Clustering
1.2 Association
1.3 Sequential Analysis
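As a small illustration of the association technique, the sketch below computes support and confidence, two standard measures for association rules that the list above does not name. The shopping-basket transactions are made up for the example.

```python
# Hypothetical shopping baskets; each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent`."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

Here the rule "bread implies milk" holds in two of the three baskets that contain bread, which is the kind of pattern an association miner would surface.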

2. Predictive:
Predictive techniques are used to predict outcomes for cases where the inputs are known but the output values have not been realized yet.

X → MODEL → Y = f(X)
where:
X: Vector of independent variables
Y: Dependent variable
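One way to picture Y = f(X) is to learn f from examples where both X and Y are known, then apply it to a new X. The sketch below assumes a single independent variable and a straight-line model fitted by ordinary least squares; the helper name `fit_line` and the data are hypothetical, and a real model would usually take a vector of inputs.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares and
    return the fitted function f."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Known (X, Y) pairs used to build the model.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

predict = fit_line(xs, ys)

# A new input whose output has not been observed yet.
print(predict(5))
```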

Predictive techniques include:
2.1 Classification:
2.1.1 Decision Tree
2.1.2 Rule Induction
2.1.3 Neural Networks
2.1.4 Nearest Neighbor Classification

2.2 Regression
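Of the classification methods listed, nearest neighbor classification is the simplest to sketch: a new point receives the label of the closest labelled example. The training points and labels below are invented for illustration, and Euclidean distance is an assumed choice of metric.

```python
import math


def nearest_neighbor(train, query):
    """Return the label of the training example closest to `query`.

    `train` is a list of (features, label) pairs, where `features` is a
    tuple of numbers with the same length as `query`.
    """
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label


# Hypothetical labelled examples: (features, class label).
train = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((5.0, 5.2), "high"),
    ((4.8, 5.1), "high"),
]

print(nearest_neighbor(train, (1.1, 0.9)))
print(nearest_neighbor(train, (5.0, 5.0)))
```

`math.dist` requires Python 3.8 or later.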

Where Can We Store Data Extracted from Different Sources?
Data Warehouse:
It is a single, complete, and consistent store of data obtained from a variety of different sources and made available to end users in a way they can understand and use in a business context.

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.

Subject-oriented: Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.

Integrated: Constructed by integrating multiple, heterogeneous data sources. Data cleaning and data integration techniques are applied.

Time-variant: Information in the warehouse is connected to the time when it was entered.

Non-volatile: A physically separate store of data transformed from the operational environment.
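Three of these properties can be illustrated with a toy in-memory store. This is a rough sketch, not how a real warehouse is built: the two source systems, their field names, and the `Warehouse` class are all hypothetical. Rows from differently shaped sources are mapped to one schema (integrated), stamped with their load time (time-variant), and only ever appended (non-volatile).

```python
from datetime import datetime, timezone

# Two hypothetical operational sources with different schemas and units.
crm_rows = [{"CustomerID": 7, "Amount_USD": "19.99"}]
shop_rows = [{"cust": 7, "amount_cents": 500}]


def from_crm(row):
    """Map a CRM row to the common schema (amounts in cents)."""
    return {
        "customer_id": int(row["CustomerID"]),
        "amount_cents": round(float(row["Amount_USD"]) * 100),
    }


def from_shop(row):
    """Map a shop row to the common schema."""
    return {"customer_id": row["cust"], "amount_cents": row["amount_cents"]}


class Warehouse:
    """Append-only store: rows are loaded with a timestamp, never edited."""

    def __init__(self):
        self._rows = []

    def load(self, rows, loaded_at=None):
        stamp = loaded_at or datetime.now(timezone.utc)
        for r in rows:
            self._rows.append({**r, "loaded_at": stamp})

    def rows(self):
        # Return copies so callers cannot modify stored history.
        return tuple(dict(r) for r in self._rows)


warehouse = Warehouse()
warehouse.load([from_crm(r) for r in crm_rows] + [from_shop(r) for r in shop_rows])
for row in warehouse.rows():
    print(row)
```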

Figure: Data Warehouse

Why a Separate Data Warehouse?
1. High performance for both systems
-DBMS → tuned for OLTP (Online Transaction Processing)
-Warehouse → tuned for OLAP (Online Analytical Processing)

2. Different functions and different data
-missing data
-data consolidation
-data quality

OLAP:
OLAP applications and tools are designed to ask ad hoc, complex queries of large multidimensional collections of data. OLAP is often mentioned in the context of data warehouses.
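A multidimensional query can be pictured as summing a measure over any chosen combination of dimensions. The sketch below is a simplified stand-in for what OLAP tools do at scale; the sales facts and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows with three dimensions and one measure.
sales = [
    {"year": 2018, "region": "East", "product": "A", "amount": 100},
    {"year": 2018, "region": "West", "product": "A", "amount": 150},
    {"year": 2019, "region": "East", "product": "B", "amount": 200},
    {"year": 2019, "region": "East", "product": "A", "amount": 50},
]


def roll_up(facts, dimensions, measure="amount"):
    """Sum `measure` for each distinct combination of `dimensions`."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)


# Summary view: totals per year.
by_year = roll_up(sales, ["year"])

# Drill down: the same totals broken out by region.
by_year_region = roll_up(sales, ["year", "region"])

print(by_year)
print(by_year_region)
```

Changing the list of dimensions is the ad hoc part: the same facts answer a different question without any new data preparation.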

Data Mart:
It is a subset of the information content of a data warehouse that is stored in its own database.

A data mart can improve query performance simply by reducing the volume of data that needs to be scanned to satisfy the query.
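The effect can be seen with Python's built-in `sqlite3` module. In this sketch the table names and rows are made up, and the mart is a separate table in the same in-memory database for brevity, whereas a real data mart would normally live in its own database. The same question is answered from both tables, but the mart holds fewer rows to scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table covering every region.
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, year INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("East", 2018, 100),
        ("West", 2018, 150),
        ("East", 2019, 200),
        ("North", 2019, 75),
        ("West", 2019, 120),
    ],
)

# The data mart: only the subset one department cares about.
conn.execute(
    "CREATE TABLE east_sales_mart AS "
    "SELECT * FROM warehouse_sales WHERE region = 'East'"
)

warehouse_rows = conn.execute("SELECT COUNT(*) FROM warehouse_sales").fetchone()[0]
mart_rows = conn.execute("SELECT COUNT(*) FROM east_sales_mart").fetchone()[0]

total_from_warehouse = conn.execute(
    "SELECT SUM(amount) FROM warehouse_sales WHERE region = 'East'"
).fetchone()[0]
total_from_mart = conn.execute(
    "SELECT SUM(amount) FROM east_sales_mart"
).fetchone()[0]

print(warehouse_rows, mart_rows)
print(total_from_warehouse, total_from_mart)
```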
