Fact
tables and dimension tables are the two types of objects commonly used in
dimensional data warehouse schemas.
Fact
tables are the large tables in your data warehouse schema that store business
measurements. Fact tables typically contain facts and foreign keys to the
dimension tables. Fact tables represent data, usually numeric and additive,
that can be analyzed and examined. Examples include Sales, Cost and Profit.
Dimension
tables, also known as lookup or reference tables, contain the relatively static
data in the data warehouse. Dimension tables store the information you normally
use to contain queries. Dimension tables are usually textual and descriptive
and you can use them as the row headers of the result set. Examples are
Customers or Products
Fact
Table:
A
fact table typically has two types of columns: those that contain numeric facts
(often called measurements), and those that are foreign keys to dimension
tables. A fact table contains either detail-level facts or facts that have been
aggregated. Fact tables that contain aggregated facts are often called summary
tables. A fact table usually contains facts with the same level of aggregation.
Fact
tables are summarized into three types:
1 .
Additive
2 .
Semi-Additive
3 .
Non-Additive
Additive
facts can be aggregated by simple arithmetical addition. A common example of
this is sales.Semi-additive facts can be aggregated along some of the
dimensions and not along others. An example of this is inventory levels, where
you cannot tell what a level means simply by looking at it. Non-additive facts
cannot be added at all. An example of this is averages.
Requirements
of Fact Table:
You
must define a fact table for each star schema. From a modeling standpoint, the
primary key of the fact table is usually a composite key that is made up of all
of its foreign keys.
Dimension
Table:
A
dimension is a structure, often composed of one or more hierarchies, that
categorizes data. Dimensional attributes help to describe the dimensional
value. They are normally descriptive, textual values. Several distinct
dimensions, combined with facts, enable you to answer business questions.
Commonly used dimensions are customers, products, and time.
Dimension
data is typically collected at the lowest level of detail and then aggregated
into higher level totals that are more useful for analysis. These natural
rollups or aggregations within a dimension table are called hierarchies.
Commonly
dimension tables are categorized into three types:
1
Conformed dimension
2
Degenerated dimension
3.
Junk dimension
If we are saying
in simple word, Conformed dimension table is a table which is having at least
one or multiple primary key or unique key to join with one or multiple fact
table. For example product table is related with the sales fact table
A conformed dimension
is a set of data attributes that have been physically implemented in multiple
database tables using the same structure, attributes, domain values,
definitions and concepts in each implementation. A conformed dimension cuts
across many facts.
Dimensions are conformed when they are
either exactly the same (including keys) or one is a perfect subset of the
other. Most important, the row headers produced in the answer sets from two
different conformed dimensions must be able to match perfectly.
A junk dimension is a convenient
grouping of typically low-cardinality flags and indicators. By creating an
abstract dimension, these flags and indicators are removed from the fact table
while placing them into a useful dimensional framework.
A Junk Dimension is a dimension table
consisting of attributes that do not belong in the fact table or in any of the
existing dimension tables. The nature of these attributes is usually text or
various flags, e.g. non-generic comments or just simple yes/no or true/false
indicators. These kinds of attributes are typically remaining when all the
obvious dimensions in the business process have been identified and thus the
designer is faced with the challenge of where to put these attributes that do
not belong in the other dimensions.
One solution is to create a new
dimension for each of the remaining attributes, but due to their nature, it
could be necessary to create a vast number of new dimensions resulting in a
fact table with a very large number of foreign keys. The designer could also
decide to leave the remaining attributes in the fact table but this could make
the row length of the table unnecessarily large if, for example, the attributes
is a long text string.
Refer Ralph Kimball’s explanation about
Junk
Dimension
Hierarchies
Hierarchies
are logical structures that use ordered levels as a means of organizing data. A
hierarchy can be used to define data aggregation. For example, in a time
dimension, a hierarchy might aggregate data from the month level to the quarter
level to the year level. A hierarchy can also be used to define a navigational
drill path and to establish a family structure.
Within
a hierarchy, each level is logically connected to the levels above and below
it. Data values at lower levels aggregate into the data values at higher
levels. A dimension can be composed of more than one hierarchy. For example, in
the product dimension, there might be two hierarchies—one for product
categories and one for product suppliers.
Dimension
hierarchies also group levels from general to granular. Query tools use
hierarchies to enable you to drill down into your data to view different levels
of granularity. This is one of the key benefits of a data warehouse.
When
designing hierarchies, you must consider the relationships in business
structures. For example, a divisional multilevel sales organization.Hierarchies
impose a family structure on dimension values. For a particular level value, a
value at the next higher level is its parent, and values at the next lower
level are its children. These familial relationships enable analysts to access
data quickly.
Levels
A
level represents a position in a hierarchy. For example, a time dimension might
have a hierarchy that represents data at the month, quarter, and year levels.
Levels range from general to specific, with the root level as the highest or
most general level. The levels in a dimension are organized into one or more
hierarchies.
Level
Relationships
Level
relationships specify top-to-bottom ordering of levels from most general (the
root) to most specific information. They define the parent-child relationship
between the levels in a hierarchy.
Hierarchies
are also essential components in enabling more complex rewrites. For example,
the database can aggregate an existing sales revenue on a quarterly base to a
yearly aggregation when the dimensional dependencies between quarter and year
are known.
No comments:
Post a Comment