Data warehousing: -
The term data warehouse was first coined by Bill Inmon in 1990. According to Inmon, a data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data. This data helps analysts make informed decisions in an organization.
A data warehouse provides generalized and consolidated data in a multidimensional view. It is a database kept separate from the organization's operational database. A data warehouse helps executives organize, understand and use their data to make strategic decisions, and it helps integrate a diversity of application systems. An operational database maintains current data, whereas a data warehouse maintains historical data.
Data warehousing features
Subject oriented: -
A data warehouse is subject oriented because it provides information around a subject rather than the organization's ongoing operations. These subjects can be products, customers, suppliers, sales or revenue. A data warehouse does not focus on ongoing operations; rather, it focuses on modeling and analysis of data for decision making.
Integrated: -
A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases and flat files. This integration enhances the effective analysis of data.
Time variant: -
The data collected in a data warehouse provides information from a historical point of view.
Non-volatile: -
Non-volatile means that previous data is not erased when new data is added. Because a data warehouse is kept separate from the operational database, frequent changes in the operational database are not reflected in the warehouse.
Data warehouse applications
1. Financial service
2. Banking service
3. Consumer goods
4. Retail sectors
5. Controlled manufacturing
Types of data warehouse
1. Information processing
2. Analytical processing
3. Data mining
1. Information processing: -
A data warehouse allows the data stored in it to be processed. The data can be processed by means of querying, basic statistical analysis, and reporting using crosstabs, tables, charts or graphs.
2. Analytical processing: -
A data warehouse supports analytical processing of the information stored in it. The data can be analyzed by means of basic OLAP operations, including slice, dice, drill down, drill up and pivoting.
3. Data mining: -
Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, and performing classification and prediction. The mining results can be presented using visualization tools.
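The analytical processing operations named above can be illustrated on a tiny in-memory data set. This is only a minimal sketch: the `SALES` rows, the column order (year, region, product, amount) and the function names are assumptions made up for the example, not part of any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, region, product, amount)
SALES = [
    (2023, "North", "Laptop", 1200),
    (2023, "South", "Laptop", 800),
    (2023, "North", "Phone", 500),
    (2024, "North", "Laptop", 1500),
    (2024, "South", "Phone", 700),
]


def slice_cube(rows, year):
    """Slice: fix one dimension (here, year) to a single value."""
    return [r for r in rows if r[0] == year]


def dice_cube(rows, years, regions):
    """Dice: restrict two dimensions to chosen sets of values."""
    return [r for r in rows if r[0] in years and r[1] in regions]


def roll_up(rows, dim_index):
    """Drill up (roll up): total the amounts along one dimension."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[dim_index]] += r[3]
    return dict(totals)


def pivot(rows, row_dim, col_dim):
    """Pivot: cross-tabulate the amounts by two dimensions."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_dim]][r[col_dim]] += r[3]
    return {key: dict(cols) for key, cols in table.items()}
```

Calling `roll_up(SALES, 0)` gives totals per year, while `pivot(SALES, 1, 0)` gives a region-by-year table. Drill down is the reverse of drill up: moving from the yearly totals back to the more detailed region-by-year view.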
Multidimensional data model (MDDM)
The dimensional model was developed for implementing data warehouses and data marts. MDDM provides both a mechanism to store data and a way to perform business analysis. The two primary components of a dimensional model are dimensions and facts.
Dimension: - Textual attributes used to analyze data.
Facts: - Numeric measures of the business being analyzed.
Types of MDDM:
1. Data cube model
2. Star schema model
3. Snowflake model
4. Fact constellations model
MDDM involves two types of tables:
1. Dimension table:
· It consists of tuples of attributes of the dimension.
· It has a simple primary key.
2. Fact table:
· It has one tuple per recorded fact.
· It has a compound primary key made up of the keys of the related dimension tables.
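The two table types can be sketched as a small star schema using Python's built-in `sqlite3` module. The table names (`dim_product`, `dim_date`, `fact_sales`), the columns and the sample rows are all hypothetical and chosen only to show a simple primary key on each dimension table and a compound primary key on the fact table.

```python
import sqlite3


def build_star_schema():
    """Create an in-memory star schema with two dimensions and one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables: textual attributes, simple primary key
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT,
            category   TEXT
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            year    INTEGER,
            month   INTEGER
        );
        -- Fact table: numeric measures, compound primary key
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id    INTEGER REFERENCES dim_date(date_id),
            units_sold INTEGER,
            revenue    REAL,
            PRIMARY KEY (product_id, date_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20240101, 2024, 1), (20240201, 2024, 2)],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 20240101, 3, 3000.0), (1, 20240201, 2, 2000.0), (2, 20240101, 5, 750.0)],
    )
    conn.commit()
    return conn


def revenue_by_category(conn):
    """Join the fact table to a dimension and total a measure per attribute."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()
```

The query in `revenue_by_category` shows the usual pattern: measures come from the fact table, and the attribute used for grouping comes from a dimension table.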
Data mart
A data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas a data warehouse has enterprise-wide depth, the information in a data mart pertains to a single department. The current trend in data warehousing is to develop a data warehouse with several smaller related data marts for specific kinds of queries and reports.
Needs:
· To partition data in order to apply access control strategies.
· To speed up queries by reducing the volume of data to be scanned.
· To segment data onto different hardware platforms.
· To structure data in a form suitable for a user access tool.
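The first two needs listed above, partitioning for access control and scanning less data per query, can be sketched in a few lines. The `WAREHOUSE` rows, the `department` key and both function names are hypothetical; a real data mart would be a separate database or schema rather than a Python dictionary.

```python
from collections import defaultdict

# Hypothetical warehouse rows spanning several departments
WAREHOUSE = [
    {"department": "sales", "region": "North", "amount": 1200},
    {"department": "sales", "region": "South", "amount": 800},
    {"department": "finance", "region": "North", "amount": 300},
]


def build_data_marts(rows, key="department"):
    """Partition warehouse rows into one mart per department."""
    marts = defaultdict(list)
    for row in rows:
        marts[row[key]].append(row)
    return dict(marts)


def query_mart(marts, user_department, predicate):
    """Scan only the caller's own mart; other departments' rows are never touched."""
    mart = marts.get(user_department, [])
    return [row for row in mart if predicate(row)]
```

A finance user calling `query_mart(marts, "finance", ...)` scans one row instead of three and cannot see the sales rows, which is the access control and query speed benefit in miniature.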
Cost of data marting
The cost measures for data marting are as follows:
1. Hardware and software cost
2. Network access
3. Time window constraints
1. Hardware and software cost:
Although data marts are created on the same hardware, they require some additional hardware and software. Handling user queries requires additional processing power and disk storage.
2. Network access:
A data mart may be at a different location from the data warehouse, so we should ensure that the LAN or WAN has the capacity to handle the data volumes transferred during the data mart load process.
3. Time window constraints:
The number of data marts that is feasible depends on network capacity, the time window available and the volume of data being transferred.
Meta data
Good meta data is essential to the effective operation of a data warehouse. It is used in data collection, data transformation and data access. Transformation meta data includes a history of data transformations and changes in names and other physical characteristics. Access meta data provides the navigation and graphical user interface (GUI) that allow non-technical business users to interact with the contents of the warehouse. These are the main parts of meta data, in addition to the basic operational meta data required in a data warehouse system.
Operational meta data covers the procedures for how a data warehouse is used and accessed, the procedures for monitoring the growth of the data warehouse relative to the available storage space, and the authorizations recording who is responsible for, and who has access to, the data in the data warehouse and the operational systems.
Data mining
The concept of data mining has attracted attention from users because of the wide availability of huge amounts of data and the need to convert such data into useful information and knowledge. Data mining can turn raw data into information that can then be used for applications such as fraud detection and market analysis.
Data mining is viewed as a consequence of the natural evolution of information technology. It essentially extracts, or mines, knowledge from large amounts of data. Data mining is sometimes also called knowledge mining from data, knowledge extraction, or data/pattern analysis. Data mining is an essential step in the process of knowledge discovery, which consists of seven sequential steps:
1. Data cleaning: to remove noise and inconsistent data.
2. Data integration: to combine multiple data sources.
3. Data selection: data relevant to the analysis task is retrieved from the database.
4. Data transformation: data is transformed into forms appropriate for mining by performing summary operations.
5. Data mining: the essential process in which intelligent techniques are applied in order to extract data patterns.
6. Pattern evaluation: to identify the truly interesting patterns representing knowledge.
7. Knowledge representation: visualization and knowledge representation techniques are used to present the mined knowledge to the user.
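The seven steps above can be sketched as a chain of small functions, one per step. Everything here is an assumption made for illustration: the two sources, the record layout (customer, item, quantity), the cleaning rule and the threshold used as the "pattern" are invented, and real mining techniques are far richer than a simple total.

```python
from collections import Counter

# Hypothetical purchase records: (customer, item, quantity)
SOURCE_A = [("ann", "milk", 2), ("bob", "bread", 1), ("ann", "soap", None)]
SOURCE_B = [("cara", "milk", 1), ("bob", "soap", -5), ("cara", "bread", 3), ("dan", "soap", 4)]


def clean(records):
    """Step 1: drop noisy or inconsistent rows (missing or non-positive quantity)."""
    return [r for r in records if r[2] is not None and r[2] > 0]


def integrate(*sources):
    """Step 2: combine multiple data sources into one list."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged


def select(records, items):
    """Step 3: keep only the records relevant to the analysis task."""
    return [r for r in records if r[1] in items]


def transform(records):
    """Step 4: summarise the records into a total quantity per item."""
    totals = Counter()
    for _customer, item, quantity in records:
        totals[item] += quantity
    return totals


def mine(totals, min_quantity):
    """Step 5: extract a very simple pattern, items sold at or above a threshold."""
    return {item: qty for item, qty in totals.items() if qty >= min_quantity}


def evaluate(patterns, top_n=1):
    """Step 6: keep only the most interesting patterns (largest totals first)."""
    return sorted(patterns.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def present(patterns):
    """Step 7: render the mined knowledge in a readable form."""
    return [f"{item}: {qty} units" for item, qty in patterns]


def run_pipeline():
    merged = integrate(clean(SOURCE_A), clean(SOURCE_B))
    relevant = select(merged, {"milk", "bread"})
    totals = transform(relevant)
    return present(evaluate(mine(totals, min_quantity=3), top_n=1))
```

The point of the sketch is the ordering: each function consumes the output of the previous step, so poor cleaning or selection early on directly limits what the mining step can find.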
Data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses and other information repositories. A data warehouse has many small parts or modules, and data mining is one of them. The components described here are based on the architecture of a typical data mining system.
Fig: Architecture of a typical data mining system.
1. Database, data warehouse, World Wide Web and other information repositories:
This is one or a set of databases, data warehouses or other information repositories. Data cleaning and data integration techniques can be applied to the data.
2. Database or data warehouse server:
It fetches the relevant data, based on the user's data mining request.
3. Knowledge base:
This is the domain knowledge that helps in searching for interesting patterns. It can include concept hierarchies, which organize data at different levels of abstraction.
4. Data mining engine:
It consists of a set of functional modules for tasks such as association and correlation analysis, classification, prediction, cluster analysis and evolution analysis.
5. Pattern evaluation module:
It interacts with the data mining modules so as to focus the search toward interesting patterns. The pattern evaluation module may be integrated with the mining module, depending on the implementation of the data mining method used.
6. User interface:
This component communicates between users and the data mining system, allowing the user to interact with the system by specifying data mining queries or tasks, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results.
Steps of data mining:
1. Deciding business objectives for data mining
2. Data preparation
3. Data mining
4. Analysis of result
5. Assimilation of knowledge
1. Deciding business objectives for data mining:
The business objectives are decided prior to the data mining process. These objectives may depend on the inferences that will be drawn from the results of the data mining process.
A precise definition of the business needs is the first step in this direction. It can be achieved only through the joint effort of a business analyst with domain knowledge and a data analyst who can translate the objectives identified by the business analyst into well-defined data mining problems.
2. Data preparation:
After the objectives are decided, the data is prepared for the mining process. This process comprises the following steps:
· Data selection - identification and extraction of data
· Data preprocessing - data sampling and quality assurance
· Data transformation - data conversion into an analytical model
3. Data mining:
This step comprises applying the selected data mining algorithm to the preprocessed data. It is often combined with the next step. It involves the actual mining process, carried out by the analyst, who is equipped with the preprocessed data from the earlier steps, the related meta data, and some insight into the content of the data based on the foregoing analysis.
4. Analysis of result:
This is the most significant step of the data mining process. Using visualization aids and tools, the analysis can be done by a highly skilled data analyst with the help of a business analyst. Developing association rules is an important aspect of data mining.
5. Assimilation of knowledge:
This is the last step of the data mining process. It comprises the steps taken to implement the knowledge gained by data mining. The business analyst identifies the stages of implementation for assimilating business knowledge into actual real-life business processes, so that the organization benefits from the data mining exercise performed. The final benefit is a focus on the importance of data as an asset of the organization.
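Step 4 above notes that developing association rules is an important aspect of data mining. Two standard measures for judging such a rule are support and confidence, and the sketch below computes both. The `BASKETS` transactions and the function names are made up for illustration; this is not a full rule-mining algorithm such as Apriori, only the scoring of a rule that has already been proposed.

```python
# Hypothetical market-basket transactions
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also
    contain the consequent: support(A and B) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base
```

For the rule "bread implies milk" on these four baskets, the support is 2 out of 4 and the confidence is 2 out of the 3 baskets that contain bread. An analyst would typically keep only rules whose support and confidence exceed chosen thresholds.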
Data mining issues:
Data mining is not an easy task, as the algorithms used can get very complex and the data is not always available in one place. It needs to be integrated from various heterogeneous data sources. The major issues in data mining are:
1. Mining methodology and user interaction
2. Performance issues
3. Diverse data type issues
1. Mining methodology and user interaction:
· Mining different kinds of knowledge in databases - different users may be interested in different kinds of knowledge. Therefore, it is necessary for data mining to cover a broad range of knowledge discovery tasks.
· Interactive mining of knowledge at multiple levels of abstraction.
· Data mining query languages.
· Presentation and visualization of data mining results - once the patterns are discovered, they need to be expressed in high-level languages and visual representations. These representations should be easily understandable.
· Handling noisy or incomplete data.
· Pattern evaluation.
2. Performance issues:
· Efficiency and scalability of data mining algorithms - in order to efficiently extract information from the huge amounts of data in databases, data mining algorithms must be efficient and scalable.
· Parallel, distributed and incremental mining algorithms - factors such as the huge size of databases, the wide distribution of data and the complexity of data mining methods motivate the development of parallel and distributed mining algorithms.
3. Diverse data type issues:
· Handling relational and complex types of data - a database may contain complex data objects, multimedia data objects, temporal data, spatial data, etc. It is not possible for one system to mine all these kinds of data.
· Mining information from heterogeneous databases and global information systems - the data is available at different data sources on a LAN or WAN. These data sources may be structured, semi-structured or unstructured. Therefore, mining knowledge from them adds challenges to data mining.