Friday, August 19, 2011

DWH Material - II

ODSThe Operational Data Store (ODS) is a subject oriented, integrated, volatile & current data store containing detailed corporate data. ODS stores only a specific data (Ex. credit data) from different systems. Because it is always current, it requires very frequent data loads from source systems to the ODS.

Hierarchical Relationships
Any dimension's members may be organized based on parent-child relationships, typically where a parent member represents the consolidation of the members which are its children. The result is a hierarchy, and the parent/child relationships are hierarchical relationships. For example, the Time dimension would contain the following hierarchy
Day – Week – Month – Quarter – Year.Based on the business queries, such hierarchies would be formed in all dimensions.


ExtractionRefers to the process of extracting data from OLTP systems by means of scripts/tools for use in the data warehouse.

TransformationRefers to a set of procedures that are used to ensure the uniformity of data in the warehouse. Multiple OLTP systems may store the same data in different formats. For example, gender may be stores as Male/Female, 1/0, M/F. The process of standardizing this storage and applying the standard for all extracted data is called transformation.
CleaningRefers to the process of validating the extracted data. The business queries would specify some business rules that the extracted data has to undergo. The process of applying these rules is called cleansing.

ETLExtraction, Transformation & Loading process loads the data from heterogeneous source systems to the data warehouse. The data is also cleansed by eliminating duplicate rows, standardizing on acronyms (For ex, co., comp., company etc) in the staging area. The ETL process is triggered at a frequency based on the business needs.

Multi-dimensional/HyperCube/CubeA group of data cells arranged by the dimensions of the data. For example, a spreadsheet exemplifies a two-dimensional array with the data cells arranged in rows and columns, each being a dimension. A three-dimensional array can be visualized as a cube with each dimension forming a side of the cube, including any slice parallel with that side. Higher dimensional arrays have no physical metaphor, but they organize the data in the way users think of their enterprise. Typical enterprise dimensions are time, measures, products, geographical regions, sales channels, etc.

Member/AttributeA dimension member is a discrete name or identifier used to identify a data item's position and description within a dimension. For example, January 1989 or 1Qtr93 are typical examples of members of a Time dimension. Wholesale, Retail, etc., are typical examples of members of a Distribution Channel dimension.


Consolidation/AggregationMulti-dimensional databases generally have hierarchies or formula-based relationships of data within each dimension. Consolidation involves computing all of these data relationships for one or more dimensions, for example, adding up all Departments to get Total Division data. While such relationships are normally summations, any type of computational relationship or formula might be defined.

No comments:

Post a Comment