Tuesday, March 24, 2015

Behind the scenes of the Oracle BI Cloud Service Data Loader #BICS #OBIEE

Oracle's Business Intelligence Cloud Service (BICS) includes a Data Loader application which can be used to import data into its associated Oracle Cloud Database instance using either cut-and-paste or uploading CSV, TXT, ZIP, or Microsoft Excel files.   Note: For ZIP files must contain CSV or TXT files and only the first non-hidden file will be processed by the Data Loader.

The process of loading data into BICS has been well covered by other OBIEE bloggers like:

Mark Rittman: Introduction to Oracle BI Cloud Service : Provisioning Data

and

Kevin McGinley: Oracle BICS Day-In-The-Life (DITL)


The purpose of this post is to look "behind the scenes" into the metadata behind the Data Loader application in BICS.    

Data Loader Background


The Data Loader Home Page

Within the BICS Data Loader Home page, a list will be displayed showing a summary of data load activities.   





  • Date:
    The date / timestamp of when the load was performed.
  • User:
    The application user who performed the data load
  • Destination Table:
    The name of the table in the associated Oracle Cloud database into which the load was performed
  • Inserted Records:
    The count of records which were successfully inserted into the target table. 
  • Updated Records:
    The count of records which were successfully updated in the target table.
  • Not Loaded Records:
    The count of records which had errors while performing the load. This includes records which had errors while inserting and updating.
  • Processed Records:
    The count number of records processed. This includes inserted, updated and not loaded records.

The Load Details Page

The summary load page for the Data Loader includes links on each load history row that allow for displaying the load details by clicking on either the InsertedUpdated,Not Loaded or Processed Records columns on any record in the display.  

The detail report shows all records loaded in the selected row. The report includes options for filtering the loaded records in the following ways:

All records: 
Shows all the records in the original source.  This includes both records that were successfully loaded and non-loaded records.
Inserted Records:
Shows all records which were inserted successfully into the load's destination table. 
    Updated Records:
    Shows all records which were successfully updated in the load's destination table. 
      Errors:
      Shows all records which had errors while inserting or updating.


      Digging Deeper


      Now we are left to consider three questions:

      1) Where is the information about the data loads stored?

      2) Do we really want to keep duplicate copies of every record loaded by the data load in our database?

      3) If the answer to #2 is "No", how do we go about cleaning up the metadata?


      Where is the information about the data loads stored?


      Examining the Oracle Cloud Database instance that is associated with the BICS instance, we find a set of tables that appear to be where the data loader stores the load summary and detail information.

      Using the Object Browser in Oracle Application Express (APEX), we see three tables named EBA_ADL_LOADS, EBA_ADL_LOAD_MAPPINGS, and EBA_ADL_LOAD_RECORDS.





      Clicking on EBA_ADL_LOADS and then selecting the Model tab from the view shows the relationship between the three tables.   The EBA_ADL_LOADS table is the master table while the other two tables contain the details about each load.  



      EBA_ADL_LOADS contains one row for each attempted data load.   It stores who did the load, when the load was performed, and what table was loaded.





      EBA_ADL_LOAD_MAPPINGS contains one row for every column loaded in each instance of a data load.  The table has a foreign key reference between EBA_DATA_LOAD_ID and the EBA_ADL_LOADS.ID column.






      Most of the columns are self-explanatory except for these two:

      UPDATE_ROW_ON_MATCH is a flag that determines whether to use the column as a key to identify rows that will be updated versus inserted during a load.

      DATA_TRANSFORMATION is a numeric code that relates to which optional basic transformation rule was selected for the column for the data load.   

      • 1 = Uppercase
      • 2 = Lowercase
      • 3 = Trim Spaces
      • 4 = Format as number



      EBA_ADL_LOAD_RECORDS contains one row per row inserted by each data load.   

      Let's repeat that statement just to be clear: EBA_ADL_LOAD_RECORDS contains one row per row inserted by each data load.

      This might be a cause for concern over time as data is loaded into the Oracle Database Cloud instance.

      The structure of the EBA_ADL_LOAD_RECORDS includes a primary key ID column, an ACTION column showing whether the row was inserted or updated, an ERROR column containing the error message that may have occurred when inserting the row, a SEQ_ID containing the sequential row number from the data loaded.  The table also has a foreign key reference between EBA_DATA_LOAD_ID and the EBA_ADL_LOADS.ID column.

      In addition, there are 50 columns prefixed with C00% that map to the individual columns in the EBA_ADL_LOAD_MAPPINGS columns.   This is why a data load using the BICS Data Loader tool is limited to 50 columns.   









      The combination of the three EBA_ADL_% tables forms the metadata that controls the BICS Data Load process.


      Do we really want to keep duplicate copies of every record loaded by the BICS Data Loader in our database?

      In most cases, the answer would presumably be "NO".  The Oracle Cloud Database instance that is provided with BICS is limited to 50 Gb of space.  Retaining duplicate copies of every row from every data load would reduce the actual available space for analysis by half or more.

      It is important to note that the EBA_ADL_% tables are only utilized when the data is loaded via the BICS loader.   Loading data via other methods like Oracle APEX Data Workshop, Oracle SQL Developer carts, or REST API services will not cause information to be loaded into those tables.   

      How do we go about cleaning up the metadata?

      There does not appear to be any purge utility for the Data Load tables included within the BICS environment.   Hopefully this will be functionality that Oracle will add at a later point.

      For now, the only way to purge the data appears to be via direct DELETE or TRUNCATE statements executed withing the Oracle Database Cloud APEX environment.   

      For a complete purge of the data load information, the following (unsupported) SQL commands can be executed:

      DELETE FROM EBA_ADL_LOAD_RECORDS;
      DELETE FROM EBA_ADL_LOAD_MAPPINGS;
      DELETE FROM EBA_ADL_LOADS;

      To delete only the data related to a specific load, first identify the ID column from the EBA_ADL_LOADS table of the load to be purged.

      SELECT * FROM EBA_ADL_LOADS;

      DELETE FROM EBA_ADL_LOAD_RECORDS WHERE EBA_ADL_LOAD_ID = value;
      DELETE FROM EBA_ADL_LOAD_MAPPINGS WHERE EBA_ADL_LOAD_ID = value;
      DELETE FROM EBA_ADL_LOADS WHERE ID = value;

      COMMIT; -- If necessary 


      Conclusion

      Hopefully this has shed some additional light onto what occurs behind the scenes during a BICS data load.  And, even more importantly, we can hope that Oracle will provide utilities to manage the data in the underlying load tables.