Data warehousing



Data warehousing: -
The term data warehouse was first coined by Bill Inmon in 1990. According to Inmon, a data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data. This data helps analysts make informed decisions in an organization.



A data warehouse provides generalized and consolidated data in a multidimensional view. It is a database kept separate from the organization's operational database. A data warehouse helps executives organize, understand and use their data to make strategic decisions. Data warehouse systems also help integrate a diversity of application systems. An operational database maintains current data; a data warehouse, on the other hand, maintains historical data.



Data warehousing features
Subject oriented: -
A data warehouse is subject oriented because it provides information around a subject rather than the organization's ongoing operations. These subjects can be products, customers, suppliers, sales, revenue, and so on. A data warehouse does not focus on ongoing operations; rather, it focuses on modeling and analysis of data for decision making.
Integrated: -
A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc. This integration enhances the effective analysis of data.
Time variant: -
The data collected in a data warehouse is identified with a particular time period, so it provides information from a historical point of view.
Non-volatile: -
Non-volatile means that previous data is not erased when new data is added. Because a data warehouse is kept separate from the operational database, frequent changes in the operational database are not reflected in the warehouse.
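The time-variant and non-volatile features can be illustrated with a small sketch. The class names `OperationalStore` and `Warehouse`, and the `pen_price` key, are hypothetical and used only for illustration; a real warehouse would use a database, not in-memory lists.

```python
from datetime import date

class OperationalStore:
    """Operational database stand-in: holds only the current value per key."""
    def __init__(self):
        self.current = {}

    def update(self, key, value):
        self.current[key] = value  # the old value is overwritten

class Warehouse:
    """Warehouse stand-in: append-only, every row carries its load date."""
    def __init__(self):
        self.rows = []

    def load(self, store, as_of):
        # Copy a dated snapshot; existing rows are never updated or deleted.
        for key, value in store.current.items():
            self.rows.append((as_of, key, value))

    def history(self, key):
        return [(d, v) for d, k, v in self.rows if k == key]

ops, dw = OperationalStore(), Warehouse()
ops.update("pen_price", 1.50)
dw.load(ops, date(2024, 1, 1))
ops.update("pen_price", 1.75)
dw.load(ops, date(2024, 2, 1))
```

After these calls the operational store knows only the latest price, while `dw.history("pen_price")` still returns both dated values.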
Data warehouse applications
1.       Financial service
2.       Banking service
3.       Consumer goods
4.       Retail sectors
5.       Controlled manufacturing
Types of data warehouse
1.       Information processing
2.       Analytical processing
3.       Data mining
1.     Information processing: -
A data warehouse allows processing of the data stored in it. The data can be processed by means of querying, basic statistical analysis, and reporting using crosstabs, tables, charts or graphs.
2.     Analytical processing: -
A data warehouse supports analytical processing of the information stored in it. The data can be analyzed by means of basic OLAP operations, including slice, dice, drill down, drill up, and pivoting.
3.     Data mining: -
Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, and performing classification and prediction. The mining results can be presented using visualization tools.
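The analytical processing operations named above can be sketched in plain Python. The `SALES` rows, the dimension names in `DIMS`, and the function names are all assumptions made for this example; real OLAP servers implement these operations over cubes, not lists.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, region, product, units_sold)
SALES = [
    (2023, "East", "Pen", 10),
    (2023, "West", "Pen", 5),
    (2023, "East", "Book", 7),
    (2024, "East", "Pen", 12),
    (2024, "West", "Book", 3),
]
DIMS = ("year", "region", "product")

def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]

def dice(rows, **allowed):
    """Dice: keep rows whose dimensions fall in the allowed value sets."""
    return [
        r for r in rows
        if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())
    ]

def roll_up(rows, *keep):
    """Drill up: sum units over every dimension that is not kept."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[i] for i in idx)] += r[3]
    return dict(totals)

def pivot(rows, row_dim, col_dim):
    """Pivot: cross-tab of units with row_dim down and col_dim across."""
    table = defaultdict(dict)
    for (rk, ck), units in roll_up(rows, row_dim, col_dim).items():
        table[rk][ck] = units
    return dict(table)
```

Drill down is the reverse of drill up here: calling `roll_up(SALES, "year", "region")` instead of `roll_up(SALES, "year")` shows the same totals at a finer level of detail.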
Multidimensional data model (MDDM)
The dimensional model was developed for implementing data warehouses and data marts. MDDM provides both a mechanism to store data and a way to analyze the business. The two primary components of a dimensional model are dimensions and facts.
Dimension: - Textual attributes used to analyze data.
Facts: - Numeric values (measures) used to analyze the business.
Types of MDDM:
1.       Data cube model
2.       Star schema model
3.       Snowflake schema model
4.       Fact constellation model
MDDM involves two types of tables
1.       Dimension table:
·         Consists of tuples of attributes of the dimension.
·         It has a simple primary key.
2.       Fact table:
·         Has one tuple per recorded fact.
·         It has a compound primary key.
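A minimal star schema sketch, using Python's built-in `sqlite3` module, shows the two table types together. The table and column names (`dim_product`, `dim_date`, `fact_sales`) and the sample rows are assumptions chosen for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,   -- simple (single-column) key
    name       TEXT,
    category   TEXT
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    year    INTEGER,
    month   INTEGER
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,
    revenue    REAL,
    PRIMARY KEY (product_id, date_id)  -- compound key built from dimension keys
);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Pen", "Stationery"), (2, "Novel", "Books")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240101, 2024, 1), (20240201, 2024, 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 20240101, 10, 15.0), (1, 20240201, 4, 6.0), (2, 20240101, 2, 30.0)])

# Facts are numeric; dimensions supply the text used to group them.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

The fact table holds one row per recorded fact, so inserting a second row with the same `(product_id, date_id)` pair is rejected by the compound primary key.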
Data mart
A data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas a data warehouse has enterprise-wide depth, the information in a data mart pertains to a single department. The current trend in data warehousing is to develop a data warehouse with several smaller related data marts for specific kinds of queries and reports.
Needs:
·         To partition data in order to impose access control strategies.
·         To speed up queries by reducing the volume of data to be scanned.
·         To segment data onto different hardware platforms.
·         To structure data in a form suitable for a user access tool.
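As a rough sketch of the first two needs, a data mart can be built by copying one department's rows out of the warehouse into its own table. The `warehouse_sales` table, its columns, and the `build_mart` helper are hypothetical names invented for this example.

```python
import sqlite3

def build_mart(conn, department):
    """Copy one department's rows from the warehouse into its own mart table."""
    if not department.isalnum():
        raise ValueError("department must be alphanumeric")  # it becomes part of a table name
    table = f"mart_{department.lower()}"
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (sale_date TEXT, product TEXT, amount REAL)")
    conn.execute(
        f"INSERT INTO {table} "
        "SELECT sale_date, product, amount FROM warehouse_sales WHERE department = ?",
        (department,),
    )
    return table

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, sale_date TEXT, product TEXT, amount REAL)"
)
conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)", [
    ("Retail", "2024-01-05", "Pen", 15.0),
    ("Retail", "2024-01-06", "Book", 30.0),
    ("Finance", "2024-01-05", "Audit", 500.0),
])
mart = build_mart(conn, "Retail")
```

Queries against `mart_retail` scan only the Retail rows, and access to that table can be granted without exposing the other departments' data.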
Cost of data marting
The cost measures for data marting are as follows:
1.       Hardware and software cost
2.       Network access
3.       Time window constraints

1.       Hardware and software cost:
Although data marts are created on the same hardware, they require some additional hardware and software. Handling user queries requires additional processing power and disk storage.
2.       Network access:
A data mart may be at a different location from the data warehouse, so we should ensure that the LAN or WAN has the capacity to handle the data volumes transferred during the data mart load process.
3.       Time window constraints:
How many data marts are possible depends on the network capacity, the time window available, and the volume of data being transferred.
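The time window constraint can be estimated with simple arithmetic. The sketch below assumes the marts are loaded one after another over a single shared link and that only a fraction of the nominal bandwidth (`efficiency`, an assumed value of 0.7) is usable; both function names are hypothetical.

```python
def transfer_hours(volume_gb, bandwidth_mbps, efficiency=0.7):
    """Hours needed to move volume_gb over a link of bandwidth_mbps.

    efficiency is an assumed fraction of the nominal bandwidth that is usable.
    """
    megabits = volume_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600

def max_marts(window_hours, volume_gb_per_mart, bandwidth_mbps, efficiency=0.7):
    """How many marts can be loaded sequentially within the time window."""
    per_mart = transfer_hours(volume_gb_per_mart, bandwidth_mbps, efficiency)
    return int(window_hours // per_mart)
```

For example, with 50 GB per mart on a 100 Mbps link, each load takes roughly 1.6 hours, so `max_marts(6, 50, 100)` gives 3 marts in a six-hour window.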
Meta data
Good meta data is essential to the effective operation of a data warehouse; it is used in data collection, data transformation, and data access. Transformation meta data includes a history of data transformations and changes in names and other physical characteristics. Access meta data provides navigation and graphical user interface (GUI) support that allows non-technical business users to interact with the contents of the warehouse. Besides these main parts, a data warehouse also requires basic operational meta data.
Operational meta data covers the procedures for how the data warehouse is used and accessed, the procedures for monitoring the growth of the data warehouse relative to the available storage space, and the authorizations stating who is responsible for, and who has access to, the data in the data warehouse and in the operational systems.
Data mining
The concept of data mining has attracted attention from users because of the availability of huge amounts of data and the need to convert such data into useful information and knowledge. Data mining turns raw data into information that can then be used for applications such as fraud detection, market analysis, etc.
Data mining is viewed as a result of the natural evolution of information technology. It basically extracts, or mines, knowledge from large amounts of data. Users sometimes also call data mining knowledge mining from data, knowledge extraction, or data/pattern analysis. Data mining is an essential step in the process of knowledge discovery, which consists of seven sequential steps:
1.       Data cleaning: to remove noise and inconsistent data.
2.       Data integration: to combine multiple data sources.
3.       Data selection: data relevant to the analysis task is retrieved from the database.
4.       Data transformation: data is transformed into forms appropriate for mining by performing summary or aggregation operations.
5.       Data mining: an essential process where intelligent techniques are applied in order to extract data patterns.
6.       Pattern evaluation: to identify the truly interesting patterns representing knowledge.
7.       Knowledge representation: visualization and knowledge representation techniques are used to present the mined knowledge to the user.
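The seven steps above can be sketched as a chain of small functions. This is a toy illustration under stated assumptions: the two sources, the record layout `(customer, city, amount)`, the "pattern" (cities ranked by mean spend) and the interest threshold are all invented for the example.

```python
from statistics import mean

# Two hypothetical sources of (customer, city, amount) records.
SOURCE_A = [("ann", "Pune", 120.0), ("bob", "Delhi", None), ("cy", "Pune", 80.0)]
SOURCE_B = [("dee", "Delhi", 200.0), ("ann", "Pune", 120.0), ("eli", "Pune", -5.0)]

def clean(rows):
    """1. Data cleaning: drop rows with missing or negative amounts."""
    return [r for r in rows if r[2] is not None and r[2] >= 0]

def integrate(*sources):
    """2. Data integration: merge sources and drop exact duplicates."""
    seen, merged = set(), []
    for row in (r for src in sources for r in src):
        if row not in seen:
            seen.add(row)
            merged.append(row)
    return merged

def select(rows):
    """3. Data selection: keep only the attributes relevant to the task."""
    return [(city, amount) for _, city, amount in rows]

def transform(rows):
    """4. Data transformation: summarize amounts per city."""
    by_city = {}
    for city, amount in rows:
        by_city.setdefault(city, []).append(amount)
    return {city: mean(vals) for city, vals in by_city.items()}

def mine(summary):
    """5. Data mining: a trivial pattern, cities ranked by mean amount."""
    return sorted(summary.items(), key=lambda kv: kv[1], reverse=True)

def evaluate(patterns, threshold=100.0):
    """6. Pattern evaluation: keep patterns at or above the threshold."""
    return [(c, m) for c, m in patterns if m >= threshold]

def present(patterns):
    """7. Knowledge representation: render the result for the user."""
    return [f"{city}: mean spend {m:.2f}" for city, m in patterns]

report = present(evaluate(mine(transform(select(
    integrate(clean(SOURCE_A), clean(SOURCE_B)))))))
```

Real systems replace step 5 with an actual mining algorithm; the point of the sketch is only the order in which the data flows through the steps.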

Data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses and other information repositories. A data warehouse environment has many parts or modules, and data mining is one of them. The components listed below are based on the architecture of a typical data mining system.

Fig: Architecture of a typical data mining system.
1.       Database, data warehouse, World Wide Web and other information repositories:
These are the sets of databases or other information repositories that are available. Data cleaning and data integration techniques can be applied to them.
2.       Database or data warehouse server:
It fetches the relevant data based on the user's data mining request.
3.       Knowledge base:
It helps in searching for interesting patterns. It uses techniques such as concept hierarchies to organize data at different levels of abstraction.
4.       Data mining engine:
It consists of a set of functional modules for tasks such as association and correlation analysis, classification, prediction, cluster analysis, and evolution analysis.
5.       Pattern evaluation module:
It interacts with the data mining modules so as to focus the search toward interesting patterns. The pattern evaluation module may be integrated with the mining module, depending on the implementation of the data mining method used.
6.       User interface:
This component communicates between users and the data mining system. It allows the user to interact with the system by specifying data mining queries or tasks, providing information to help focus the search, and performing exploratory data mining based on intermediate data mining results.


Steps of data mining:
1.       Deciding business objectives for data mining
2.       Data preparation
3.       Data mining
4.       Analysis of result
5.       Assimilation of knowledge
1.       Deciding business objectives for data mining:
These business objectives are decided prior to the data mining process. They may depend on the inferences that will be drawn from the results of the data mining process.
A precise definition of the business needs is the first step in this direction. This can be achieved only through the joint effort of a business analyst with domain knowledge and a data analyst who can translate the objectives identified by the business analyst into well-defined data mining problems.
2.       Data preparation:
After the objectives are decided, the data is prepared for the mining process. This preparation comprises the following steps:
·         Data selection - identification and extraction of data
·         Data preprocessing - data sampling and quality assurance
·         Data transformation - data conversion into an analysis model
3.       Data mining:
This step comprises applying the selected data mining algorithm to the data that has already been preprocessed. This step is combined with the next step. It involves the actual mining process by the analyst, who is equipped with the preprocessed data from the previous steps, the related meta data, and some insight into the content of the data based on the foregoing analysis.
4.       Analysis of result:
This is the most significant step of the data mining process. Using visualization aids and tools, the analysis of the data can be done by a highly skilled data analyst with the help of a business analyst. Developing association rules is an important aspect of data mining.
5.       Assimilation of knowledge:
This is the last step of the data mining process. It comprises the steps to be taken to implement the knowledge gained by data mining. The business analyst identifies the stages of implementation for assimilating the business knowledge into actual real-life business processes, so that the data mining exercise performed delivers its benefits. The final benefit is a focus on the importance of data as an asset of the organization.
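Step 4 above mentions association rules. A minimal sketch of how such rules are scored uses the two standard measures, support and confidence. The `TRANSACTIONS` baskets, the thresholds, and the `pair_rules` helper are assumptions for illustration; this brute-force pairing is not a full algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Rules x -> y between single items that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):
            s = support({x, y}, transactions)
            if s >= min_support:
                c = confidence({x}, {y}, transactions)
                if c >= min_confidence:
                    rules.append((x, y, s, c))
    return rules

rules = pair_rules(TRANSACTIONS)
```

In this sample, every basket containing butter also contains bread, so the rule butter -> bread has confidence 1.0, while butter and milk appear together too rarely to pass the support threshold.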

Data mining issues:
Data mining is not an easy task, as the algorithms used can get very complex and the data is not always available in one place. It needs to be integrated from various heterogeneous data sources. The major issues in data mining are:
1.       Mining methodology and user interaction
2.       Performance issues
3.       Diverse data type issues
1.       Mining methodology and user interaction:
Mining different kinds of knowledge in databases: - different users may be interested in different kinds of knowledge. Therefore, data mining needs to cover a broad range of knowledge discovery tasks.
·         Interactive mining of knowledge at multiple levels of abstraction.
·         Data mining query languages.
·         Presentation and visualization of data mining results: once the patterns are discovered, they need to be expressed in high-level languages and visual representations. These representations should be easily understandable.
·         Handling noisy or incomplete data.
·         Pattern evaluation.
2.       Performance issues:
Efficiency and scalability of data mining algorithms: - in order to efficiently extract information from the huge amounts of data in databases, data mining algorithms must be efficient and scalable.
Parallel, distributed and incremental mining algorithms: - factors such as the huge size of databases, the wide distribution of data, and the complexity of data mining methods motivate the development of parallel and distributed mining algorithms.
3.       Diverse data type issues:
Handling relational and complex types of data: - a database may contain complex data objects, multimedia data objects, temporal data, spatial data, etc. It is not possible for one system to mine all these kinds of data.
Mining information from heterogeneous databases and global information systems: - the data is available at different data sources on a LAN or WAN. These data sources may be structured, semi-structured or unstructured. Therefore, mining knowledge from them adds challenges to data mining.
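The idea behind parallel, distributed and incremental mining under the performance issues can be sketched with item counting: each partition is counted on its own, the partial results are merged, and new data is folded in without rescanning the old partitions. The function names are hypothetical, and the threads here only stand in for separate machines; they do not make this toy example faster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_items(partition):
    """Local mining step: item frequencies within one data partition."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))  # count each item once per transaction
    return counts

def distributed_counts(partitions, max_workers=4):
    """Count each partition independently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(count_items, partitions))
    total = Counter()
    for partial in partials:
        total += partial
    return total

def incremental_update(total, new_partition):
    """Incremental step: fold in new data without rescanning old partitions."""
    return total + count_items(new_partition)

partitions = [[["a", "b"], ["a"]], [["b", "c"]]]
total = distributed_counts(partitions)
updated = incremental_update(total, [["a", "c"]])
```

The merge works because item counts are additive across partitions; mining methods whose results are not additive need more careful combination steps.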
