Benny Austin

Power Query: Import Data from Web Pages

Power Query can discover and import data from websites. Often data warehouses rely on external data which is readily available in public websites for e.g. public holidays, school holidays, SIC codes, SWIFT codes, post codes etc. Power Query is perfectly suitable for such situations. Power Query can discover, fetch and transform data from a HTML table in a web page into a format that can be easily imported into data warehouses using SSIS package. It’s like an ETL tool for the web page data source.

Featured · Leave a comment

Azure: Delete HDInsight Cluster, yet Retain Data

Storage cluster (HDFS) in Hadoop is also the Processing cluster (MapReduce). Azure provides two different options to store data:
Option 1: Use HDInsight cluster to store data as well as to process MapReduce requests. For e.g. a Hive database hosted in an HDInsight cluster which also executes HiveQL MapReduce queries. In this instance data is stored in the cluster’s HDFS.
Option 2: Use HDInsight cluster to only process MapReduce requests whereas data is stored in Azure blob storage. For e.g. the Hive data is stored in Azure storage while the HDInsight cluster executes HiveQL MapReduce queries. Here the metadata of Hive database is stored in the cluster whereas the actual data is stored in Azure storage. The HDInsight cluster is co-located in the same datacentre as the Azure storage and connected by high speed network.

There are several advantage of using Azure storage. Provisioning the HDInsight cluster on demand while retaining Azure storage is cost effective especially when there is spending limits on your Azure subscription.

Featured · Leave a comment

HDInsight: Analysing Twitter Hashtags using Hive

Hive implements MapReduce using HiveQL. The built-in capabilities of HiveQL abstracts the implementation of mappers and reducers with a simple yet powerful SQL like query language. To demonstrate the inbuilt capabilities of HiveQL, I will be analysing hashtags from a twitter feed on Azure HDInsight platform.

Featured · Leave a comment

SSIS: Creating Dynamic Data-driven SSAS Partitions

Partitions improve the query response of SSAS cube by narrowing the MDX query to a pre-defined subset of data. Usually cubes are developed with few basic partitions to begin with and new partitions are added over time. Partitions are created using XMLA commands. The rest of this post explains the steps to create partitions using metadata and SSIS package.

Featured · 4 Comments

SSIS: Dynamic Processing of SSAS Partitions

SSAS Partitions are primarily geared towards better query performance by narrowing the MDX query to a pre-defined subset of data. Each partition must be in processed state for the cube to be available for browsing. However in a delta ETL run, it is more than likely that only some partitions would be refreshed with data while the rest remain unchanged. For example in a hospital admissions cube partitioned by month, data movements are frequent for admissions made during the current/last month compared to earlier months. This means if the partitions updated by the ETL run can be identified, then we can tailor SSIS package to dynamically process only those partitions that require data refresh while leaving the remaining partitions untouched.

Featured · 1 Comment

The End Game is Not Just Canned Reports

Canned reports/dashboards aka custom reports or out of box reports are inevitable part of any BI ecosystem. It’s a good point of reference for conversations with business users especially at the start of a project when requirements are often a bit hazy. In many instances, business users see report layouts as an effective and convenient medium of communication with BI project teams. This however should not be considered as THE business requirement. Instead of dwelling on canned reports, the focus should be on building a well-designed DW/BI system that empowers users to perform analysis on a self-service basis. The quality and the range of analysis that a self-service BI platform provides far outweighs the benefits of just rolling out a set of canned reports. It empowers business users to do their own analysis while the BI team can work towards expanding and refining the analytical platform instead of operating as a “reporting team”. A well-designed DW/BI will satisfy most of the analytical requirements of business which includes but not restricted to canned reports. Canned reports are a given, not the end game to any BI strategy. Reporting tools available today are incredibly easy to use. A power user with a little bit of training can build reports and dashboards with reasonable ease while BI teams can focus on expanding and enriching the self-service BI Platform.

Featured · Leave a comment

SSIS Package to Process SSAS Cube

The execution plan of SSAS processing engine triggers a sequence of processing tasks based on the object’s current processed state and its dependencies with other objects. This is what happens when the SSAS cube is processed from SSMS. It is possible to replicate this execution plan close enough in an SSIS package so that the cube is always processed and refreshed with latest data.

Featured · 10 Comments

Benefits of Creating SSAS Cube from Database Views

While SSAS allows you to build a cube straight of the fact and dimension tables, however there are benefits in creating the cube from database views that sits on top of the underlying fact and dimension tables. Here’s why.

Featured · 5 Comments

Dimensional Modeling: Banding

It’s quite common to analyse a measure by categorizing it into different buckets or bands. For example to list the customers who are due for payment in the next week or fortnight, to decide which income group to target for a particular product campaign, which age group of students are most likely to buy a smartphone and so on. These bands are usually not supplied by source systems but derived from the underlying measure and they exist only for analytic purposes. While this banding could be achieved in reports or dashboards, there is significant value to model them as dimensions especially to promote self-service BI.

Featured · 1 Comment

Dimensional Modelling: Excluding Unused Dimension Attribute Members

In a self-service BI platform, every member of a dimension attribute is available for browsing irrespective of whether the member has an associated fact or not. Though the unused members disappear when browsed alongside facts, they are still undesirable when used as filters. This is especially true with role playing dimensions which cover a broad range of data to cover every scenario. No one wants a date filter starting at 1900-01-01 and going all the way to 9999-12-31. This post explains different options to exclude the unused dimension attribute members.

Featured · 2 Comments

SSRS: Parameter Expression for MDX Datasets Having Role Playing Dimension

In SSRS report, when a report parameter is role playing dimension then the parameter value passed to each dataset has to be customized depending on the context in which the dimension is being used. In an earlier post, I explained how this is done in PerformancePoint dashboard. In this post an equivalent solution for reporting services is provided.

Featured · 4 Comments

PerformancePoint: Using Connection Formula to Pass Filter Values to Report Having Role Playing Dimension

Date dimension is inevitably used in most dashboards. Due to its role playing nature, a specific date dimension used as slicer in one report is different from the date dimension used for another report on the same dashboard page. For example sales reports are based on order date whereas delivery reports are based on delivery dates. On a dashboard that displays both sales and delivery reports, it would not be good for navigation purposes to have both order date and deliver date filters on the same page. The choice of which date dimension to use as filter among the available dates is a tactical decision. Having chosen a particular date dimension, the challenge is to provide filter value to those reports that use a different date dimension. This is where PerformancePoint filter Connection Formula comes into play as explained in this post

Featured · 7 Comments

SharePoint: MDX Expressions for PerformancePoint Filters

Dashboard filters is one of the key components of BI Platform that drive reports and KPI’s in various dashboards. MDX expressions to a great extent enables to customize PerformancePoint filters and enhance the end user experience.

Featured · 7 Comments

SSAS: Using MDX to Calculate Snapshot Measures

In a data warehouse, Periodic Snapshot table is a type of fact table that “freezes” the measure value at recurring reporting periods. This enables reporting of measures AS AT a point in time. Periodic Snapshots are usually created from Transaction Fact tables that support reporting FOR the period. Using MDX it is possible to create snapshot values for measures from transaction fact without creating Periodic Snapshot tables.

Featured · 5 Comments

Dimension Modeling: The Different Types of Unknown SK

Dimension Tables usually have a member with Surrogate Key (SK) value of -1 to handle unknown members. Unknown SK is usually set when SSIS look-up on a dimension table fails to return a match based on a business/natural key. Given that several factors contribute to unknown SKs, it would be beneficial to customize the unknown SK into different types rather than clubbing into a generic -1 value

Featured · 3 Comments

Deploying PerformancePoint Content to another SharePoint Site

PerformancePoint Dashboard Designer is usually synchronized with the SharePoint Site from where it was first downloaded. The PerformancePoint Content – Data Connections, Filters, Reports, Scorecards, KPIs and Dashboards from the workspace gets deployed to this SharePoint Site. In most implementations often there is a need to deploy the PerformancePoint Content to a different Business Intelligence SharePoint Site. For instance, deploy dashboards from test to production environment.If you need to deploy your PerformancePoint content to a different SharePoint Site, follow the steps in this post.

Featured · Leave a comment

SSAS: MDX Calculated Measures that Require Date Comparison

Often there is requirement to calculate measures based on two different date dimensions. For e.g. how many orders that were delivered this month was actually ordered last month? How many orders were delivered in the same month as they were ordered? How many orders were carried forward to next month?. Using MDX set functions EXISTS(), FILTER() and Range operator these calculations can be easily achieved provided the Date Dimension conform to same key columns.

Featured · Leave a comment

SSAS: Membership in Multiple Roles Could Throw Unexpected Surprises

Analysis Service supports role based model to secure cube data. Data can be secured at object level, cell level (Cell Data Access) and at member level (Dimension Data Access). A user can be a member of more than one role and it is important to understand the behaviour of Analysis Services role permissions in such a scenario. Analysis Services role permissions are additive in nature. This means if you are a member of a role that has been granted access to an Analysis Service object; you have access to that object even if you belong to another role which denies access to the same object. In other word the least restrictive roles permissions override the most restrictive role permissions. This could throw some nasty surprises that you never expected.

Featured · 17 Comments

PowerPivot: DAX Expression for Type 2 SCD Lookup

The greatest strength of PowerPivot is the ability to create relationship with different sources of data and piggyback on this relationship for various kinds of analysis. PowerPivot relationship requires one-to-many relationship between the parent and child tables. But what happens if the relationship between parent and child tables is many-to-many instead? For e.g. you might want to establish a relationship between a fact table that has a Business Key to the dimension table and that dimension table maintains history. In this post I explain how to establish a relationship between two tables that do not exhibit one-to-many relationship on their Business Key using a technique that is similar to an ETL Type 2 SCD lookup using Data Analysis Expressions (DAX).

Featured · 2 Comments

Alternatives to SSIS SCD Wizard Component

SSIS comes with an out-of-box SCD Wizard to handle Type 1 and Type 2 Slowly Changing Dimensions (SCD) which is a fundamental ETL requirement. However the SCD wizard component has some serious drawbacks – both from operational and functional perspectives that make it unusable for practical purposes. Several workarounds have evolved over time and in this post I would like explore the different alternative options to handle Type 1 and Type 2 SCD without using the out-of-box SCD Wizard Component.

Featured · 26 Comments

SSRS: Why does SSAS Member UniqueName Return Nothing?

If your SSRS report is using SSAS MDX datasets, the member uniquename is particularly useful to pass parameter values to drill-through MDX reports or for sorting based on key values. If Fields!Attribute.UniqueName is returning nothing or blank or null value in your report , then most likely you are missing the DIMENSION PROPERTIES clause in your MDX dataset query

July 24, 2013 · 2 Comments

Copyright

© Benny Austin.
Follow

Get every new post delivered to your Inbox.

Join 198 other followers