
INSIGHTS

Mike Vogt

Mike Vogt is a Director on NVISIA's data management team.

Recent Posts

SQL Server 2016 Test Drive: Columnstore Indexes, In-Memory Tables and Indexed Views

on 4/5/17 11:43 AM | By Mike Vogt | 0 Comments | SQL Server
Part 1 of 3. Introduction: I’ve been getting re-acquainted with SQL Server 2016 after a very long hiatus (read: version 7.0). While I’ve used SQL Server on many projects throughout the years, I’ve tended toward Oracle, DB2, and Postgres due to either more advanced features or price (in the case of Postgres). Recently, I had the opportunity to take a much more in-depth look at SQL Server 2016. A few capabilities struck me as significant to our clients’ needs: columnstore indexes, in-memory tables, and indexed views (aka materialized views). In this first of three blog posts, I’ll cover columnstore indexes; I’ll cover in-memory tables in the second installment and indexed views in the last.
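To make the columnstore topic concrete ahead of the full post, here is a minimal T-SQL sketch of a clustered columnstore index; the dbo.sales_fact table and its columns are hypothetical examples of mine, not from any client schema.

-- Hypothetical fact table, purely for illustration
CREATE TABLE dbo.sales_fact (
    sale_id   BIGINT        NOT NULL,
    sale_date DATE          NOT NULL,
    store_id  INT           NOT NULL,
    amount    DECIMAL(12,2) NOT NULL
);

-- Store the entire table column-wise; analytic scans and aggregates
-- then read (and decompress) only the columns they touch.
CREATE CLUSTERED COLUMNSTORE INDEX ccix_sales_fact
    ON dbo.sales_fact;

When a table must keep its row-store layout, a nonclustered columnstore index over selected columns is the alternative.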
Read More

In-memory NoSQL?

on 5/18/16 1:25 PM | By Mike Vogt | 0 Comments | Data Management
A great post explaining what in-memory NoSQL stores are: http://hazelcast.org/use-cases/in-memory-nosql/. Although this information is published by Hazelcast, it conveys generally how in-memory data grids (IMDGs) work.
Read More

[DAMA Chicago] Ensuring your data lake doesn’t become a data swamp

on 2/18/16 5:02 PM | By Mike Vogt | 1 Comment | Big Data, data lake, Data Management
Here is my presentation from the 2/17/2016 DAMA Chicago meeting. 
Read More

DAMA Chicago Meeting—February 17th, 2016

on 2/16/16 7:56 PM | By Mike Vogt | 0 Comments | Data Quality, Big Data, Data Management
MEETING AGENDA
8:30 a.m. Continental Breakfast sponsored by NVISIA
9:00 a.m. DAMA Chicago Business Meeting
9:30 a.m. Dr. Sanjay Shirude, Transforming an Application-based IT Organization into a Data-driven Service Provider
11:30 a.m. Lunch
1:00 p.m. Michael Vogt, Ensuring Your Data Lake Doesn't Become a Data Swamp
2:30 p.m. NVISIA presentation
3:00 p.m. Raffle drawing & Adjournment
Location: Nielsen
Read More

DAMA Chicago Meeting—December 9th, 2015

on 11/24/15 2:45 PM | By Mike Vogt | 0 Comments | Data Quality, Big Data, Data Management
MEETING AGENDA
9:00 a.m. Business Meeting & Announcements
9:30 a.m. Scott Rudenstein, "WANdisco: Data Movement for Globally Deployed Big Data Hadoop Architectures"
11:15 a.m. Shaun Malott (Northern Trust), "Information Quality Analysis and Remediation: A Case Study"
12:15 p.m. Lunch
1:45 p.m. Michael G. Miller, "The Future of Data Governance"
2:45 p.m. Raffle Drawing
3:00 p.m. Adjournment
Location: Northern Trust, 50 South LaSalle Street, Chicago, IL 60603 (Global Conference Center, Miami Room)

Morning Presentations

Data Movement for Globally Deployed Big Data Hadoop Architectures
Speaker: Scott Rudenstein
Over the past few years, Hadoop has quickly moved to the production data center for storing and processing Big Data, and is now widely used to support mission-critical applications. One of the challenges for an organization is the adoption of multi-data-center Hadoop, where data needs to flow between environments in the same metropolitan area or thousands of miles apart. The problems related to operating Hadoop across the WAN can be broadly divided into data relevancy, continuous availability, risk reduction, and recovery. The challenge we'll focus on is keeping data flowing consistently in the face of network, hardware, and human failures. Eliminating downtime and data loss is critical for any application with stringent service-level agreements (SLAs) and regulatory compliance mandates, which demand the lowest possible recovery point objective (RPO) and recovery time objective (RTO).
Bio: Scott Rudenstein has worked in commercial software sales for 18 years and has an extensive background in Application Lifecycle Management and High Performance Computing. Throughout his career in the US and UK, he has specialized in replication, where data and environments need high availability, disaster recovery, and backup capabilities. As WANdisco's VP of Technical Services, Scott works with partners, prospects, and customers to help them understand and evolve the requirements for mission-critical, enterprise-ready Hadoop.

Northern Trust: Information Quality Analysis and Remediation: A Case Study
Speaker: Shaun Malott
In this session, Shaun Malott will discuss the strategy and process used to identify and remediate a cross-system information quality issue that was originally selected as low-hanging fruit to demonstrate the power of Northern Trust's information quality tools. This quickly went awry as it unearthed a trove of issues, providing valuable lessons learned and leveraging Embarcadero ER/Studio and IBM InfoSphere Information Analyzer.
Bio: Shaun Malott is a Vice President at The Northern Trust Company, Chicago. He serves as a Business Data Architect and a Data Steward for Wealth Management. He is responsible for the data foundation stream of the Partner Platform program, including future-state data requirements definition and working with technology to define and deliver the data architecture required to meet strategic partner, client, and investment platform data quality requirements. Shaun actively participates in DAMA Chicago, DAMA International, and TDWI. He is a member of the Embarcadero ER/Studio Product Advisory Committee.
Read More

Data Virtualization Tools

on 6/22/15 2:30 PM | By Mike Vogt | 0 Comments | Data Management
Are they right for you?
Read More

2015 Data Landscape

on 6/3/15 11:25 AM | By Mike Vogt | 0 Comments | Data Management
 
Read More

Apache Ignite Coding Examples Webinar by Dmitriy Setrakyan

on 4/15/15 1:59 PM | By Mike Vogt | 0 Comments | Data Management
 
Read More

If Everyone Owns Data, No One Owns It

on 2/25/15 3:59 PM | By Mike Vogt | 0 Comments | Data Management
When I start with a new client, I almost always ask, "Who owns the data?" The range of answers runs from corporate ownership to IT ownership to Data Management ownership to Sales ownership. When I clarify my question as who is accountable (i.e., whose head is on a stick) when there is a data-related disaster, like a financial misstatement, the sound of people retreating from ownership is deafening, and no one is left. In a lot of organizations, Sales owns the 'core' customer, with Finance owning some specific financial customer attributes.
Read More

Who are you calling a “junk” dimension?

on 1/28/15 11:24 AM | By Mike Vogt | 0 Comments | Data Management
Many times in data warehouse designs, one encounters a bunch of low-cardinality (think fewer than 4 values) attributes (e.g., transactional codes, flags, or text attributes) that are unrelated to any particular dimension. There are a few options for dealing with these:

1. Add them to the fact table (very inefficient for storage and performance [if it causes the row size to cross pages])
2. Create a dimension for each one (clutters up the model and offers no performance advantage)
3. Create a "junk" dimension to hold this odd assortment of unrelated attributes (keeps the model clean, reduces storage, and provides better performance)

We'll explore the third option a bit further; a SQL sketch of the resulting dimension appears at the end of this post. We add all the permutations (i.e., the Cartesian product) of the junk attributes to the junk dimension. It is worth noting that these attribute values are fairly static (they don't change very often). Some examples include statuses, yes/no flags, types, and categories. In addition to keeping the model clean, adding new low-cardinality attributes becomes much easier. Consider the following example:

Order_fact
==========
order_id
date_submitted_key
date_fulfilled_key
date_delivered_key
late_delivery_ind_key
partial_fulfillment_ind_key
customer_key
priority_delivery_ind_key
customer_loyalty_catg_key [gold, silver, bronze, none]
...

Here you have four dimensions (late_delivery, partial_fulfillment, priority_delivery, customer_loyalty_catg) with very low-cardinality values. I propose a better approach:

Order_fact
==========
order_id
date_submitted_key
date_fulfilled_key
date_delivered_key
customer_key
junk_key

Junk_dim
========
junk_key
late_delivery_ind       [yes/no] - 2 values
partial_fulfillment_ind [yes/no] - 2 values
priority_delivery_ind   [yes/no] - 2 values
customer_loyalty_catg   [gold, silver, bronze, none] - 4 values

The Cartesian product of the values is 32 rows (2 * 2 * 2 * 4) for the junk dimension.

The performance gains of a junk dimension relate to the inefficiency of low-cardinality dimension joins and the number of low-cardinality data sets. First, low-cardinality attributes are not well supported by a normal B-tree index; if bitmap indexes are available in your RDBMS, I would highly suggest you use one here. Second, each low-cardinality dimension join tends to be executed as a nested loop join, and a chain of nested loop joins is a performance disaster. Collapsing them into a single join against the precomputed Cartesian product in the junk dimension is far more efficient.

References:
Definition of a junk dimension: http://en.wikipedia.org/wiki/Dimension_(data_warehouse)#Junk_dimension
More detailed explanation of Cartesian product joins: https://analyticsreckoner.wordpress.com/2012/07/24/modelling-tip-how-junk-dimensions-helps-in-dw-performance/

For additional blog posts, please check out NVISIA's Enterprise Data Management page.
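As the promised sketch of option 3, here is the Junk_dim above in SQL, prepopulated with all 32 combinations via a Cartesian product; the table and column names come from the example, while the data types and the 'Y'/'N' encoding are assumptions of mine.

-- Junk dimension from the order example (types and Y/N encoding are
-- illustrative assumptions, not prescriptions).
CREATE TABLE Junk_dim (
    junk_key                INT         NOT NULL PRIMARY KEY,
    late_delivery_ind       CHAR(1)     NOT NULL,  -- 'Y' or 'N'
    partial_fulfillment_ind CHAR(1)     NOT NULL,  -- 'Y' or 'N'
    priority_delivery_ind   CHAR(1)     NOT NULL,  -- 'Y' or 'N'
    customer_loyalty_catg   VARCHAR(10) NOT NULL   -- gold/silver/bronze/none
);

-- Load all 32 (2 * 2 * 2 * 4) combinations up front with a Cartesian product,
-- so fact-table loads only ever look up an existing junk_key.
INSERT INTO Junk_dim
SELECT ROW_NUMBER() OVER (ORDER BY l.v, p.v, pr.v, c.v) AS junk_key,
       l.v, p.v, pr.v, c.v
FROM       (VALUES ('Y'), ('N'))                               AS l(v)
CROSS JOIN (VALUES ('Y'), ('N'))                               AS p(v)
CROSS JOIN (VALUES ('Y'), ('N'))                               AS pr(v)
CROSS JOIN (VALUES ('gold'), ('silver'), ('bronze'), ('none')) AS c(v);

Because the dimension is tiny and fully enumerated, the fact load becomes a simple four-attribute lookup for the matching junk_key rather than four separate dimension joins.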
Read More

[Video] Quick Targeted Analysis using Talend's Open Studio for Data Quality

on 11/21/14 3:51 PM | By Mike Vogt | 0 Comments | Data Quality, Data Management
Read More

Data Quality in an Agile Development Sprint

on 11/6/14 6:18 PM | By Mike Vogt | 0 Comments | Data Quality, Data Management
 
Read More

Avoiding the Data Iceberg with Quick Targeted Analysis

on 11/6/14 2:34 PM | By Mike Vogt | 0 Comments | Data Quality, Data Management
Have you run into a situation where an agile project flopped because bad data affected the successful delivery of your solution?
Read More