Advertisements

How to load data from multiple Excel files with similar but different names and varying number of sheets in QlikView

This post describes a scenario for loading data into QlikView from multiple Excel files with similar but different names and a different number of tabs.

Let’s say you need to load multiple Excel files containing information about orders into your QlikView application.  These files have different names, and each file may have a different amount of sheets.

For example, you may have several files with Order information from different sources for multiple dates such as:

CallCenter_Orders_20150312.xlsx
InStore_Orders_20150311.xlsx
SalesRep_Orders_20150312.xlsx
SalesRep_Orders_20150311.xlsx

Let’s say each file has one or more sheets representing regions/divisions – West, Mid-West, North East, and South.  Some files may have all 4 region/division sheets, while others may have just one region sheet.

This script is one possible way of loading this data in QlikView using a single script. With some adjustments, this script may also work for Qlik Sense, but I did not test exactly what changes would be needed.

//-----------------------------------------------
// set the errormode so that your script will not fail when one or more of
// the 4 sheets is not found in any particular file
 SET ErrorMode = 0;
OrdersFileData:
 LOAD [CustomerID]  as [Customer ID],
 [OrderID           as [Order Number],
 [OrderDate]        as [Order Date],
 [ShipDate]         as [Ship Date],
 [Notes]            as [Order Notes],
 [Turn around days] as [Turnaround Days],
 'WEST'             as [Division] //identify region/division on all records
 FROM [..\Data\Text Files\*Orders*.xlsx] //wildcard allows load from all
                                         //xlsx files with “Orders” in the name
 (ooxml, embedded labels, table is WEST);  //load from the West sheet
CONCATENATE (OrdersFileData) //append data from Midwest sheet from all files
 LOAD [CustomerID]  as [Customer ID],
 [OrderID]          as [Order Number],
 [OrderDate]        as [Order Date],
 [ShipDate]         as [Ship Date],
 [Notes]            as [Order Notes],
 [Turn around days] as [Turnaround Days],
 'MIDWEST'          as [Division]
 FROM [..\Data\Text Files\*Orders*.xlsx]
 (ooxml, embedded labels, table is MIDWEST);
CONCATENATE (OrdersFileData) //append data from Northeast sheet from all files
 LOAD [CustomerID] as [Customer ID],
 [OrderID] as [Order Number],
 [OrderDate] as [Order Date],
 [ShipDate] as [Ship Date],
 [Notes] as [Order Notes],
 [Turn around days] as [Turnaround Days],
 'NORTHEAST' as [Division]
 FROM [..\Data\Text Files\*Orders*.xlsx]
 (ooxml, embedded labels, table is NORTHEAST);
CONCATENATE (OrdersFileData) //append data from South sheet from all files
 LOAD [CustomerID] as [Customer ID],
 [OrderID] as [Order Number],
 [OrderDate] as [Order Date],
 [ShipDate] as [Ship Date],
 [Notes] as [Order Notes],
 [Turn around days] as [Turnaround Days],
 'SOUTH' as [Division]
 FROM [..\Data\Text Files\*Orders*.xlsx]
 (ooxml, embedded labels, table is SOUTH);

STORE OrdersFileData into ..\Data\QVDs\OrdersData.QVD; // if loading to QVD 
DROP Table OrdersFileData; //if loading to QVD and not needed in memory
//-----------------------------------------------
Advertisements

Oracle Business Intelligence Applications (OBIA) Fact Tables

Dimensionally modeled (star-schema designed) data warehouses are primarily made up of two types of tables – Fact and Dimension.  Fact tables store the measurements generated by business events (# of orders, amount of dollars, etc.); and Dimension tables store the descriptive attributes that provide context to the measurements (product [product name], customer [customer type], date, etc.).

This post describes the types of Fact tables found in Oracle Business Intelligence Applications (OBIA) data warehouse – Oracle Business Analytics Warehouse (OBAW).  There will be future posts that describe in detail the other table types in OBIA (Dimension, Internal, etc.).

The 5 types of Fact tables used in the OBAW are:

  1. Transactional
  2. Aggregate
  3. Cycle Lines
  4. Snapshot
  5. State Transition.

The Transactional Fact Table is the main type of fact table. It stores the lowest-level of information from transactional sources. An example of a Fact table in OBIA (Financial Analytics) is: W_GL_BALANCE_F
Note: Fact tables in OBIA end with “_F”.
This table stores the current balance for GL accounts by GL_ACCOUNT and other dimensions.

The Aggregate Fact Table is typically used for performance improvements.  It is a summarized or rolled-up version of the Transactional fact table.  Instead of querying the data at the transactional level – which is the most detailed level and the level with the most records, the Aggregate table allows you to query the data at a more rolled up level when appropriate.  One of the most frequent roll-ups is time – for example, a transactional table at a day level is rolled up to the month level.
Aggregate tables can be tens of times less (or even hundreds) than their transactional versions.  These types of tables are also very common in OBIA and in data warehousing in general.

An example of an Aggregate Fact Table in OBIA (Financial Analytics) is: W_GL_BALANCE_A
Note: Aggregate Fact tables in OBIA end with “_A”.
This table stores the GL account balances aggregated by GL Account Segment and other dimensions. Instead of having data at the GL_ACCOUNT level as in the Transactional fact table, the data is at the GL Account Segment level in the Aggregate table.  Aggregate Fact tables are derived from Transactional Fact  Tables or other Aggregate Fact tables. This table is derived from the transactional fact table mentioned above: W_GL_BALANCE_F.

The Snapshot Fact Table stores “snapshots” of measurements taken at well-defined, predetermined time intervals – such as daily, monthly, annually, etc.  Examples include Inventory and Account Balance snapshots, and AR/AP aging snapshots.  Common items such as financial reports or bank statements are examples of reports from Snapshot Fact tables.

An example of a Snapshot table in OBIA(Supply Chain Analytics) is: W_INVENTORY_DAILY_BAL_F
Oracle’s description of this table will help to clarify its makeup and purpose.
The W_INVENTORY_DAILY_BAL_F fact table is used to represent at a point in time information of all inventory balances and inventory values related to products whose inventory is maintained by the business organization, these would typically include all inbound (purchased from external entities) products as well as outbound (sold to external entities) products. The inventory balance information is trended by copying historical snapshot information from this table at periodic points in time into history table W_INVENTORY_MONTHLY_BAL_F.
The W_INVENTORY_MONTHLY_BAL_F table stores a snapshot of inventory balance.
There is one row for each product and product storage location whose point in time inventory quantity and value information is maintained. The storage location could represent a warehouse or further divisions within a warehouse. This aspect is configurable within the product. All the dimension key links to the other Oracle Business Analytics Warehouse dimension tables, such as W_DAY_D, W_BUSN_LOC_D, W_PRODUCT_D, W_INVENTORY_PRODUCT_D, and so on, represent information associations at that point in time for that product inventory information. The DATE_WID column represents the date on which the inventory balance information is valid.

These tables can also have Aggregate versions:
As mentioned in the description for the W_INVENTORY_DAILY_BAL_F table above, there is an aggregate version.  However, snapshot tables are not necessarily aggregated like transactional tables, because many times the measures are non-additive or semi-additive. For example, you would not take your account monthly balance in January and add it to your account monthly balance in February to determine how much money you have – that would be wrong.

The W_INVENTORY_MONTHLY_BAL_F fact table is used to represent the monthly information of all the inventory balances and the inventory values related to products whose inventory is maintained by the business organization. This information includes all inbound (purchased from external entities) products and outbound (sold to external entities) products. The aggregation period is configurable, and has a preconfigured value of Monthly.
There is one row for each product and product storage location whose point in time (as of a month) inventory quantity and value information is maintained. All the dimension key links to the other Oracle Business Analytics Warehouse dimension tables such as W_DAY_D, W_BUSN_LOC_D, W_PRODUCT_D, W_INVENTORY_PRODUCT_D, and so on, and represents information and associations at that point in time for that product inventory information. The PERIOD_START_DT_WID and PERIOD_END_DT_WID column represents the aggregation bucket start and end dates. The column INV_BALANCE_DT_WID represents the date within this aggregation period on which the inventory balance information is valid.

The Cycle Lines Fact Table store measurements for multiple related business events and are therefore typically derived from multiple fact tables. They typically store process cycle times or provide the ability to easily determine process cycle times.  These tables are also called Accumulating Snapshot Fact tables because they are snapshots of different events accumulated on each other.  An example of a Cycle Lines Fact table is W_PURCH_CYCLE_LINE_F.

Here is Oracle’s description of the table which should help clarify its purpose: W_PURCH_CYCLE_LINE_F table tracks the time duration of all events pertaining to the purchase process commencing with a requisition. Information in this table enables analysis of the direct spend process within an organization beginning with a purchase requisition, its approval, the creation of an approved purchase order, its submission to a supplier, the creation of a purchase schedule and ending with its receipt of the products. It can be used to calculate the time taken to receive products that have been ordered, the time between the first receipt and last receipt of products that have scheduled for delivery. The W_PURCH_CYCLE_LINE_F table contains all the various dates associated with the processes such as submission, approval, ordering and receiving as well as quantities and amounts. While Other spend related fact tables capture individual process such as requesting, ordering, scheduling this table combines all the in one place for ease of analysis and reporting.

These Cycle Lines tables can also have aggregate versions. For example: W_PURCH_CYCLE_LINE_A This is an aggregate table of W_PURCH_CYCLE_LINE_F at a higher level of dimensionality. The Product dimension is replaced by a Product type dimension to give a high level analysis of the sourcing data. It stores Purchase Cycle Line records aggregated over a preconfigured Monthly time period and product types.

State Transition Fact Tables store state-transition metrics based on business events, such as customer state – new, top, dormant, lost, etc – based on the customer order activity.  These tables store or allow you to easily derived counts of the various states.  State Transition Fact tables are derived from Transactional or Snapshot fact tables.

Below are two examples of State Transition Fact tables in OBIA (Marketing Analytics):

The Customer Status History Fact: W_CUSTOMER_STATUS_HIST_F
This is a fact table that tracks the status of customers based on the frequency of orders they place with the organization. Possible statuses are NEW, RECENT, DORMANT and LOST. The time duration for each status bucket is configurable, out of the box being a calendar year.
The grain of this table is at a Customer, Customer Status and the Status Start Date level. Other important columns in this table include the Sold to and the Ship to location for the customer. These are derived based on the status bucket start date against the Customer Locations dimension table.

The Loyalty Member Status History Fact: W_LOY_MEMBER_STATUS_HIST_F
W_LOY_MEMBER_STATUS_HIST_F Fact table stores status changes of Loyalty members. Grain: One record for each member status changed.

That’s it for OBIA fact tables.  Understanding the types of fact tables and their purpose helps us to make better design choices when we set out to build new fact tables to represent business events, and it also helps us to quicker recognize and better analyze the data in these tables.
I hope you found this information useful. If you have information about other fact table types, please share.

Informatica Unconnected Transformations

There are 3 Informatica transformations (External Procedure, Lookup, and Stored Procedure) that can be unconnected in a valid mapping.  An unconnected transformation is one that is not connected to the pipeline – that is, it is not connected to any other transformation via a link.  Unconnected transformations, especially unconnected lookup transformations, are widely in OBIA mappings.

Unconnected transformations are defined in a mapping and are called / invoked from another transformation in the mapping.  They can be called from any transformation that supports the transformation expression language.

Below is a list of the 3 Informatica Unconnected Transformations and how they are called or invoked from another transformation:

 Transformation  How its called / invoked
 External Procedure  :EXT.external_procedure_transformation(argument1, argument2, …)
 Lookup  :LKP.lookup_transformation(argument1, argument2, …)
 Stored Procedure  :SP.stored_procedure_transformation( argument1, argument2, [, PROC_RESULT])

Any mapping with an unconnected transformation must also include another transformation that calls the unconnected transformation.

All of the 3 Unconnected Transformations can also be used in connected mode.  However, when you want to execute the transformation logic for a subset of the rows passing through the mapping (especially when it’s a small percentage of the rows), you will want to consider using the transformations in unconnected mode and with conditional logic.  The conditional logic will cause the unconnected transformation to execute only when necessary, and therefore, may improve perfiormance.

The example below shows a call to an unconnected lookup transformation (lkp_item_id) that is invoked only when the desired item_id value is NULL.  In a case where only a small percentage of the rows have an item_id that is NULL, then the lookup would only be invoked for a small number of rows, as opposed to all rows if the transformation was connected.

                       IIF  ( ISNULL(item_id),   :LKP.lkp_item_id ( item_name ),   item_id )

As a side note, the Sequence Generator transformation can also be invoked from a function call, but it is a connected transformation.

 Transformation  How its called / invoked
 Sequence Generator  :SEQ.sequence_generator_transformation.CURRVAL

Oracle positioned as a leader in the Gartner Magic Quadrant for Data Integration tools

Oracle, along with Informatica, IBM, SAP and SAS, are positioned as leaders in the Gartner Magic Quadrant for Data Integration tools.

Other data integration vendors that made it into the Magic Quadrant, but not in the leaders category are Microsoft, Information Builders, Syncsort, Talend, Pervasive Software.

GartnerMagicQuadrant_for_DataIntegrationTools

Source: Gartner (October 2012)

When performing its analysis, Gartner considers these areas of data integration:

  • Data acquisition for business intelligence (BI) and data warehousing
  • Consolidation and delivery of master data in support of master data management (MDM)
  • Data migrations/conversions
  • Synchronization of data between operational applications
  • Interenterprise data sharing
  • Delivery of data services in an SOA context

And analyzes these features and functionality that the data integration tools should provide:

  • Connectivity/adapter capabilities (data source and target support)
  • Support for different modes of interaction with a range of data structure types
  • Data delivery capabilities
  • Support for the delivery of data across a range of latency requirements
  • Data transformation capabilities
  • Provide facilities for developing custom transformations and extending packaged transformations
  • Metadata and data modeling capabilities
  • Design and development environment capabilities
  • Data governance support capabilities (via interoperation with data quality, profiling and mining capabilities)
  • Deployment options and runtime platform capabilities
  • Operations and administration capabilities
  • Architecture and integration capabilities
  • Service enablement capabilities

Per Gartner, Oracle’s strengths are:

  • Breadth of functionality,
  • Usability of core functionality across use cases
  • Addressing data challenges across a range of application- and data-oriented customer bases

And their cautions are:

  • Enabling product migration,
  • Complexity of integrated deployment across products,
  • Pricing perception and availability of skills

As Oracle-centric developers and solution providers, we are most concerned with the capabilities and future of Oracle Data Integrator (ODI), Oracle Warehouse Builder (OWB), Golden Gate and, in some cases, Informatica (a part of most OBIA installations). These products are doing well in the market, but are a bit behind the install bases of Microsoft, SAP and IBM.  The largest install base is with Microsoft (12K customers), followed by SAP (10K) and IBM (9.4K).  Oracle has 3.5K and Informatica has 5K data integration customers.  With a major push now behind the ODI tool, you can expect to see growth in Oracle’s numbers in the coming years.

Apparently, Oracle also has plans for providing a migration path (a migration wizard) from Oracle Warehouse Builder (OWB) to ODI.  This will be a very welcomed tool for OWB users seeking to migrate to ODI.  And this will further solidify ODI as the data integration tool for Oracle-centric IT organizations, and start to create more availability of skilled ODI resources.

You can read all the details at Gartner Magic Quadrant for Data Integration Tools.