Search This Blog

Monday 3 September 2012

DAC Best Practices

Please keep in mind that the following 10 best practices when you Defining Repository Objects in DAC

 1. Container Behavior and Best Practices

The following behavior and best practices apply to containers:
  • When changes are made to objects in the container that owns them, the change is instantaneous.
  • Changes made to parent objects in the owner container are automatically pushed to the parent referenced objects.
  • When you add child objects to a parent object, you must use the Push to References right-click command (Design view) to push the changes to the child referenced objects. For example, if you add a column to a table that is registered in DAC, the new column is not automatically added to the references in the other containers referencing the parent object. You must use the Push to References command to effect the column changes in the other referenced tables.
  • When you delete a referenced object, only the referenced object is deleted. The original object is not deleted.
  • If you delete an object from the owner container, the object is deleted as well as referenced objects in other containers. This is referred to as a deep delete. For example, if you delete a table from the owner container, the table and columns are deleted from the owner container and all the containers that reference this object.
  • If you delete a column from the owner table, the column is deleted in all the referenced objects.
  • If you delete child objects from the owner object, the referenced child objects are automatically deleted.

2. Task Behavior and Best Practices

The following behavior and best practices apply to tasks:
  • Start your work with tasks in Informatica. After you create a workflow, do the following in the DAC Task tab:
    • Create a new task and assign it a logical (readable) name.
    • Enter the command for a full load or incremental load.
      The commands can be the same. If the Command for Incremental Load field is left blank, no action occurs for this task while in incremental mode. If the Command for Full Load field is left blank, no action occurs for this task while in full mode.
    • Ensure all the source and target tables are defined for the task.
      You can use the task synchronize functionality to import data from Informatica. You can also manually assign the source or target tables.
  • Select at least one primary table because the incremental and full mode properties are determined based on the refresh dates of the primary table.
  • Design tasks so that they load only one table at a time.
  • Define granular tasks rather than tasks that include bigger blocks of processes. Granular tasks are more efficient and have better restartability.
  • Do not truncate a table on the source system tables (for example, Oracle, Siebel or PeopleSoft sources).
  • Ensure the truncate property for the target tables is set properly.
  • For tables that need to get truncated regardless of the mode of the run (Full or Incremental), set the Truncate Always property to True.
  • For tables that need to get incrementally loaded, set the Truncate for Full Load property to True.
  • Select the Analyze Table option if the task should analyze the table. The default value for this option is True if either of the Truncate options are selected.
  • Do not truncate a table more than once within the single life span of an ETL.
  • If a task that writes to a target table is contingent upon another table being loaded, use conditional tables. This ensures that the task qualifies only if the conditional table is part of the subject area design.
  • Assign an appropriate phase to the task. An understanding of task phases is essential to understanding ETL processes.
  • If you want to force a relationship where none exists, consider using phase dependencies. For example, if you have an ETL process in which the extract facts and extract dimensions do not share any common source or target tables, but the design requires that the extract facts should run before extracting dimensions, then, for the task that extracts facts, add extract dimension as the phase that waits.

  • Ensure you do not introduce conflicting phase dependencies. This can cause the DAC Server to hang.
  • If the source qualifier needs to use a data parameter, always use the DAC date parameter that can be formatted to the database-specific syntax.

3. Task Group Behavior and Best Practices

The following best practices apply to task groups:
  • Do not create task groups unnecessarily. Doing so can adversely impact the DAC's auto-dependency functionality, which automatically orders tasks for an execution plan. Create task groups only to satisfy the points listed below.
  • Avoid circular relationships among tasks if the tasks are of the same phase. For example, avoid situations in which Task 1 reads from Table A and writes to Table B, and Task 2 reads from Table B and writes to Table A. You can use task groups to avoid these situations. Note: If tasks belong to different phases, this situation is acceptable.
  • If you have many tasks belonging to the same phase that write to the same table, you can use task groups to run the tasks in parallel. If the target tables need to be truncated before the tasks are run, select the properties Truncate Always and Truncate Full Load in the Task Group tab.
  • Do not mix extracts and loads under a single table group.
  • Do not make Task Groups for obvious ordering needs. DAC handles ordering in such cases.
  • If a source system container uses a task group, make other containers that reference the task also include the task group.

4. Table Behavior and Best Practices

The following best practices apply to tables:
  • Always use all upper case characters for table names.
  • Ensure you set the Table Type property correctly in the Tables tab of the Design view.
  • For Teradata databases, pay attention to the Set/Multiset property. If you anticipate that the data will contain duplicate rows, choose Multiset as the value for this property.
  • DAC automatically associates foreign key tables with the referenced table. You can also define which other tables need to be loaded, such as aggregate tables, by associating these tables with the referenced table using the Related Tables subtab of the Tables tab.

5. Index Behavior and Best Practices

The following best practices apply to indexes:
  • Index names must be unique. Oracle, DB2 and Teradata databases enforce this rule by default. However, SQL Server databases do not enforce this rule. If you are using a SQL Server database, you must ensure all index names are unique.
  • Always use all upper case characters for column names.
  • If you have a foreign key column, associate the foreign key table and the join column. DAC uses this information to identify all the related tables to be loaded when a certain table needs to be loaded in the data warehouse.
  • Do not register any columns for source system container tables.
  • Ensure you add all the appropriate system columns. For example, all tables should have the following:
    • ROW_WID in number format.
    • INTEGRATION_ID in varchar format.
    • DATASOURCE_NUM_ID in number format.
  • For Teradata databases:
    • Pay attention to the Teradata Primary Index property.
    • Pay attention to which columns need to gather statistics. Note that column statistics are somewhat equivalent to indexes.
    • If you would have had indexes that span multiple columns for other databases, consider defining multi-column statistics for Teradata.

6. Column Behavior and Best Practices

The following best practices apply to columns:
  • Always use all upper case characters for table names.
  • Ensure you set the Table Type property correctly in the Tables tab of the Design view.
  • Always use all upper case characters for column names.
  • If you have a foreign key column, associate the foreign key table with the join column. DAC uses this information to identify all the related tables to be loaded when a certain table needs to be loaded in the data warehouse.
  • For Teradata databases:
    • Pay attention to which columns need to gather statistics. Note that column statistics are somewhat equivalent to indexes.
    • If you would have had indexes that span multiple columns for other databases, consider defining multi-column statistics for Teradata.
    • Pay attention to the Teradata Primary Index property.
  • Do not register any columns for source system container tables.
  • Ensure you add all the appropriate system columns. For example, all tables should have the following:
    • ROW_WID in number format.
    • INTEGRATION_ID in varchar format.
    • DATASOURCE_NUM_ID in number format.
    • ETL_PROC_WID in number format.

7. Configuration Tag Behavior and Best Practices

The following best practices apply to configuration tags:
  • Use configuration tags to tag tasks that you do not want to be part of all the defined subject areas.
  • A tagged task can be re-associated with a subject area by assigning the configuration tag to the subject area.

8. Source System Parameter Behavior and Best Practices

The following best practices apply to source system parameters:
  • Use source system parameters when tasks in the source system container need a particular value.

9. Subject Area Behavior and Best Practices

The following best practices apply to subject areas:
  • To define a subject area, associate only fact tables with it. DAC automatically computes which additional aggregate tables and dimension tables to associate with the subject area based on the related tables you define and foreign key relationships.
  • If you delete a task from a subject area using the Delete button on the Task tab, the next time you assemble the subject area the task may be included. However, if you inactivate the task by selecting Inactive in the Task tab, the task will remain inactive when you re-assemble the subject area.
  • Avoid adding tasks or inactivating tasks manually.

10. Execution Plan Behavior and Best Practices

The following best practices apply to execution plans:
  • If many tasks with the same name across source system containers read and write to the same data source, DAC will consider them to be the same task.
  • If the logical source to physical mapping yields multiple records, DAC will produce as many runtime instances of the task.
  • DAC orders tasks in the most efficient manner possible based on the following:
    • Phase of task
    • Source and target tables
    • Truncate table properties
    • Data source priority

No comments:

Post a Comment

What is Oracle Database Link and How to create database link (DB Link)

 https://www.oracletutorial.com/oracle-administration/oracle-create-database-link/

Recent Posts