Please keep in mind that the following 10 best practices when you Defining Repository Objects in DAC
1. Container Behavior and Best Practices
-
When changes are made to objects in the container that owns them, the change is instantaneous.
-
Changes made to parent objects in the owner container are automatically pushed to the parent referenced objects.
-
When you add child objects to a parent object, you must use the Push
to References right-click command (Design view) to push the changes to
the child referenced objects. For example, if you add a column to a
table that is registered in DAC, the new column is not automatically
added to the references in the other containers referencing the parent
object. You must use the Push to References command to effect the column
changes in the other referenced tables.
-
When you delete a referenced object, only the referenced object is deleted. The original object is not deleted.
-
If you delete an object from the owner container, the object is
deleted as well as referenced objects in other containers. This is
referred to as a deep delete. For example, if
you delete a table from the owner container, the table and columns are
deleted from the owner container and all the containers that reference
this object.
-
If you delete a column from the owner table, the column is deleted in all the referenced objects.
-
If you delete child objects from the owner object, the referenced child objects are automatically deleted.
2. Task Behavior and Best Practices
The following behavior and best practices apply to tasks:-
Start your work with tasks in Informatica. After you create a workflow, do the following in the DAC Task tab:
-
Create a new task and assign it a logical (readable) name.
-
Enter the command for a full load or incremental load.
The commands can be the same. If the Command for Incremental Load field is left blank, no action occurs for this task while in incremental mode. If the Command for Full Load field is left blank, no action occurs for this task while in full mode.
-
Ensure all the source and target tables are defined for the task.
You can use the task synchronize functionality to import data from Informatica. You can also manually assign the source or target tables.
-
Create a new task and assign it a logical (readable) name.
-
Select at least one primary table because the incremental and full
mode properties are determined based on the refresh dates of the primary
table.
-
Design tasks so that they load only one table at a time.
-
Define granular tasks rather than tasks that include bigger blocks of
processes. Granular tasks are more efficient and have better
restartability.
-
Do not truncate a table on the source system tables (for example, Oracle, Siebel or PeopleSoft sources).
-
Ensure the truncate property for the target tables is set properly.
-
For tables that need to get truncated regardless of the mode of the
run (Full or Incremental), set the Truncate Always property to True.
-
For tables that need to get incrementally loaded, set the Truncate for Full Load property to True.
-
Select the Analyze Table option if the task should analyze the table.
The default value for this option is True if either of the Truncate
options are selected.
-
Do not truncate a table more than once within the single life span of an ETL.
-
If a task that writes to a target table is contingent upon another
table being loaded, use conditional tables. This ensures that the task
qualifies only if the conditional table is part of the subject area
design.
-
Assign an appropriate phase to the task. An understanding of task phases is essential to understanding ETL processes.
-
If you want to force a relationship where none exists, consider using
phase dependencies. For example, if you have an ETL process in which
the extract facts and extract dimensions do not share any common source
or target tables, but the design requires that the extract facts should
run before extracting dimensions, then, for the task that extracts
facts, add extract dimension as the phase that waits.
-
Ensure you do not introduce conflicting phase dependencies. This can cause the DAC Server to hang.
-
If the source qualifier needs to use a data parameter, always use the
DAC date parameter that can be formatted to the database-specific
syntax.
3. Task Group Behavior and Best Practices
The following best practices apply to task groups:-
Do not create task groups unnecessarily. Doing so can adversely
impact the DAC's auto-dependency functionality, which automatically
orders tasks for an execution plan. Create task groups only to satisfy
the points listed below.
-
Avoid circular relationships among tasks if the tasks are of the same
phase. For example, avoid situations in which Task 1 reads from Table A
and writes to Table B, and Task 2 reads from Table B and writes to
Table A. You can use task groups to avoid these situations. Note: If
tasks belong to different phases, this situation is acceptable.
-
If you have many tasks belonging to the same phase that write to the
same table, you can use task groups to run the tasks in parallel. If the
target tables need to be truncated before the tasks are run, select the
properties Truncate Always and Truncate Full Load in the Task Group
tab.
-
Do not mix extracts and loads under a single table group.
-
Do not make Task Groups for obvious ordering needs. DAC handles ordering in such cases.
-
If a source system container uses a task group, make other containers that reference the task also include the task group.
4. Table Behavior and Best Practices
The following best practices apply to tables:-
Always use all upper case characters for table names.
-
Ensure you set the Table Type property correctly in the Tables tab of the Design view.
-
For Teradata databases, pay attention to the Set/Multiset property.
If you anticipate that the data will contain duplicate rows, choose
Multiset as the value for this property.
-
DAC automatically associates foreign key tables with the referenced
table. You can also define which other tables need to be loaded, such as
aggregate tables, by associating these tables with the referenced table
using the Related Tables subtab of the Tables tab.
5. Index Behavior and Best Practices
The following best practices apply to indexes:-
Index names must be unique. Oracle, DB2 and Teradata databases
enforce this rule by default. However, SQL Server databases do not
enforce this rule. If you are using a SQL Server database, you must
ensure all index names are unique.
-
Always use all upper case characters for column names.
-
If you have a foreign key column, associate the foreign key table and
the join column. DAC uses this information to identify all the related
tables to be loaded when a certain table needs to be loaded in the data
warehouse.
-
Do not register any columns for source system container tables.
-
Ensure you add all the appropriate system columns. For example, all tables should have the following:
-
ROW_WID in number format.
-
INTEGRATION_ID in varchar format.
-
DATASOURCE_NUM_ID in number format.
-
ROW_WID in number format.
-
For Teradata databases:
-
Pay attention to the Teradata Primary Index property.
-
Pay attention to which columns need to gather statistics. Note that column statistics are somewhat equivalent to indexes.
-
If you would have had indexes that span multiple columns for other
databases, consider defining multi-column statistics for Teradata.
-
Pay attention to the Teradata Primary Index property.
6. Column Behavior and Best Practices
The following best practices apply to columns:-
Always use all upper case characters for table names.
-
Ensure you set the Table Type property correctly in the Tables tab of the Design view.
-
Always use all upper case characters for column names.
-
If you have a foreign key column, associate the foreign key table
with the join column. DAC uses this information to identify all the
related tables to be loaded when a certain table needs to be loaded in
the data warehouse.
-
For Teradata databases:
-
Pay attention to which columns need to gather statistics. Note that column statistics are somewhat equivalent to indexes.
-
If you would have had indexes that span multiple columns for other
databases, consider defining multi-column statistics for Teradata.
-
Pay attention to the Teradata Primary Index property.
-
Pay attention to which columns need to gather statistics. Note that column statistics are somewhat equivalent to indexes.
-
Do not register any columns for source system container tables.
-
Ensure you add all the appropriate system columns. For example, all tables should have the following:
-
ROW_WID in number format.
-
INTEGRATION_ID in varchar format.
-
DATASOURCE_NUM_ID in number format.
-
ETL_PROC_WID in number format.
-
ROW_WID in number format.
7. Configuration Tag Behavior and Best Practices
The following best practices apply to configuration tags:-
Use configuration tags to tag tasks that you do not want to be part of all the defined subject areas.
-
A tagged task can be re-associated with a subject area by assigning the configuration tag to the subject area.
8. Source System Parameter Behavior and Best Practices
The following best practices apply to source system parameters:-
Use source system parameters when tasks in the source system container need a particular value.
9. Subject Area Behavior and Best Practices
The following best practices apply to subject areas:-
To define a subject area, associate only fact tables with it. DAC
automatically computes which additional aggregate tables and dimension
tables to associate with the subject area based on the related tables
you define and foreign key relationships.
-
If you delete a task from a subject area using the Delete button on
the Task tab, the next time you assemble the subject area the task may
be included. However, if you inactivate the task by selecting Inactive
in the Task tab, the task will remain inactive when you re-assemble the
subject area.
-
Avoid adding tasks or inactivating tasks manually.
10. Execution Plan Behavior and Best Practices
-
If many tasks with the same name across source system containers read
and write to the same data source, DAC will consider them to be the
same task.
-
If the logical source to physical mapping yields multiple records, DAC will produce as many runtime instances of the task.
-
DAC orders tasks in the most efficient manner possible based on the following:
-
Phase of task
-
Source and target tables
-
Truncate table properties
-
Data source priority
-
Phase of task
No comments:
Post a Comment