The Key to Big Data Modeling: Collaboration
Wednesday, August 26, 2015
11:00am Pacific / 1:00pm Central / 2:00pm Eastern
Some claim that, in the age of Big Data, data modeling is less important or even unnecessary. In fact, as the data landscape grows more complex, data modeling becomes more important, because it is how we understand the nature of the data and how they are interrelated. To do this effectively, the way we do data modeling needs to adapt to this complex environment.
One of the key data modeling issues is how to foster collaboration between new groups, such as data scientists, and traditional data management groups. There are often different paradigms, and yet it is critical to have a common understanding of data and semantics between different parts of an organization. In this presentation, Len Silverston will discuss:
- How Big Data has changed our landscape and affected data modeling
- How to conduct data modeling in a more ‘agile’ way for Big Data environments
- How we can collaborate effectively within an organization, even with differing perspectives
Don't miss out. Register now!
Below is the slide presentation from our recent webinar Breaking Down Business Barriers with Enterprise Architecture.
See the companion webinar at: http://forms.embarcadero.com/Breaking-Down-Business-Barriers
Learn more about ER/Studio at http://www.embarcadero.com/products/er-studio
As a data professional, you know how important it is to have good data models and to ensure that your extended team has access to them. Many companies suffer from silos that keep valuable data from being used effectively, because business users may lack the right access to, or understanding of, the data behind their decisions. When corporate data carries business context, supported by centralized communication and collaboration, the overall integrity and visibility of the data improve.
To create a business-driven data architecture, you need an enterprise data environment that enables both business stakeholders and IT users to access and collaborate on key models and metadata, at the right levels for their needs. Embarcadero offers the ER/Studio Enterprise Team Edition with the Team Server web portal to help you break down the barriers between business and IT users in your organization. Join this webinar to see how the ER/Studio Team Server works with the Enterprise Team Edition to extend the value of data in your organization, with capabilities including:
- Business glossaries to specify the terms and definitions for metadata
- Discussion and activity streams to track requests and actions for tasks
- Permission structures to give users and groups the right level of access
About the Presenter:
Josh Buckner is the ER/Studio Team Server Solutions Expert for Embarcadero. He helps customers understand the benefits of Team Server and works with them to implement it effectively in their organizations.
Top 6 Essential Data Architecture Insights eBook
Whether you’re working with relational data, schema-less (NoSQL) data, or model metadata, you face challenges in providing the right access and context for your enterprise data assets. How do you effectively integrate data from multiple sources and make it useful and usable across your organization?
You need a data architecture that can actively leverage information assets for business value. The most valuable data has high quality, business context, and visibility across the organization. Check out this must-read eBook for essential insights on important data architecture topics, including:
The Quest for the ‘Golden Record’
Thursday, May 14, 2015
11:00am Pacific / 1:00pm Central / 2:00pm Eastern
Although master data management (MDM) systems have been deployed in numerous industries and organizations, the vision of creating an overall “single source of truth” is beginning to yield to a more pragmatic perspective of providing visibility to shared information about uniquely identifiable entities within the enterprise. This more mature approach sheds light on some of the potential gaps associated with the typical out-of-the-box data models for customer or product.
In this webinar, David Loshin will address data modeling for MDM systems, and share insights about:
- Some of the complexities emerging from reliance on canned master data models
- Alternatives for revising how master data entities are viewed and consumed within the enterprise
- How a consumption-oriented engagement process will help the master data modeler devise thoughtful conceptual and logical representations of shared master data
He will also discuss how these different ways of looking at master data modeling will help reduce complexity for master data adoption, system interoperability, and legacy migration.
About the presenter:
David Loshin, president of Knowledge Integrity, Inc. (www.knowledge-integrity.com), is a recognized thought leader and expert consultant in the areas of analytics, big data, data governance, data quality, master data management, and business intelligence. Along with consulting on numerous data management projects over the past 15 years, David is also a prolific author on business intelligence best practices, with numerous books and papers on data management, including the second edition of “Business Intelligence – The Savvy Manager’s Guide.”
If you don’t get the data right, nothing else matters. However, the business focus on applications often overshadows the priority for a well-organized database design. Addressing some simple data modeling and design fundamentals can put you on the right path. Here are seven common database design “sins” that can be easily avoided and ways to correct them in future projects.
Sin #1: Poor or missing documentation for database(s) in production
Documentation for databases usually falls into three categories: incomplete, inaccurate, or missing altogether. This leaves developers, DBAs, architects, and business analysts scrambling to get on the same page, left to their own imaginations to interpret the meaning and usage of the data.
The best approach is to place the data models into a central repository and generate automated reports so that, with minimal effort, everyone benefits. Producing a central store of models is only half the battle, though. Once that is done, executing validation and quality metrics will enhance the quality of the models over time. As your level of data management maturity increases, you can extend what metadata is captured in the models.
Sin #2: Little or no normalization
There are times to denormalize a database structure to achieve optimized performance, but sacrificing flexibility can paint you into a corner. Despite a long-held belief among developers, a single table that stores everything is rarely optimal. Another common mistake is storing repeating values in a table, which greatly decreases flexibility and makes updates more difficult.
Understanding even the basics of normalization adds flexibility to a design while reducing redundant data. The first three levels of normalization are usually sufficient for most cases:
· First Normal Form: Eliminate repeating groups and duplicate columns; each column holds a single, atomic value
· Second Normal Form: Ensure every non-key column depends on the entire primary key, not just part of it
· Third Normal Form: Ensure every non-key column depends only on the primary key, with no transitive dependencies on other non-key columns
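To make the payoff of normalization concrete, here is a minimal sketch using SQLite from Python (the table and column names are invented for illustration). It contrasts a design that repeats customer attributes on every order row with a normalized design where those attributes live in one place:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized: the customer's city is repeated on every order row,
# so a change of address means updating many rows (and risking misses).
cur.execute("""CREATE TABLE order_flat (
    order_id INTEGER PRIMARY KEY,
    customer_name TEXT, customer_city TEXT,
    product TEXT)""")

# Normalized (3NF): customer attributes are stored once and referenced
# by key, so an address change touches exactly one row.
cur.execute("""CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL, city TEXT)""")
cur.execute("""CREATE TABLE order_normalized (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    product TEXT)""")

cur.execute("INSERT INTO customer VALUES (1, 'Acme Corp', 'Austin')")
cur.executemany("INSERT INTO order_normalized VALUES (?, 1, ?)",
                [(10, 'widget'), (11, 'gadget')])

# A single UPDATE corrects the city for every order via the join.
cur.execute("UPDATE customer SET city = 'Dallas' WHERE customer_id = 1")
rows = cur.execute("""SELECT o.order_id, c.city
                      FROM order_normalized o
                      JOIN customer c USING (customer_id)
                      ORDER BY o.order_id""").fetchall()
print(rows)
```

Both order rows reflect the new city after one update, which is exactly the redundancy reduction the normal forms above are driving at.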
Sin #3: Not treating the data model like a living, breathing organism
There are numerous examples of customers performing thorough modeling up front, only for all modeling to cease once the design is in production. To maintain flexibility and ensure consistency as the database changes, those modifications need to find their way back into the model.
Sin #4: Improper storage of reference data
There are two main problems with reference data. It is either stored in many places or, even worse, embedded in the application code.
Reference values provide valuable documentation, which should be communicated in an appropriate location; the model is often the best place for it. The key is to define reference data in one place and use it everywhere else.
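As a small sketch of the "define once, use everywhere" principle (using SQLite, with invented status codes for illustration), the reference values get their own table, and every consumer joins to it rather than hard-coding the meanings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Reference data defined once, in its own table, rather than being
# embedded in application code or copied into several tables.
cur.execute("""CREATE TABLE order_status (
    status_code TEXT PRIMARY KEY,
    description TEXT NOT NULL)""")
cur.executemany("INSERT INTO order_status VALUES (?, ?)", [
    ("NEW", "Order received"),
    ("SHP", "Order shipped"),
    ("CAN", "Order cancelled")])

# Consumers store only the code and reference the lookup table.
cur.execute("""CREATE TABLE customer_order (
    order_id INTEGER PRIMARY KEY,
    status_code TEXT NOT NULL REFERENCES order_status(status_code))""")
cur.execute("INSERT INTO customer_order VALUES (1, 'NEW')")

# Reports join to the reference table for the human-readable meaning.
row = cur.execute("""SELECT o.order_id, s.description
                     FROM customer_order o
                     JOIN order_status s USING (status_code)""").fetchone()
print(row)
```

Because the codes live in one table, adding or renaming a status is a data change, not an application release.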
Sin #5: Not using foreign keys or check constraints
When reverse engineering databases, customers complain all the time about the lack of referential integrity (RI) constraints or validation checks defined in the database. For older database systems, it was thought that foreign keys and check constraints slowed performance, so RI and validation were pushed into the application instead.
If it is possible to validate the data in the database, you should do it there. Error handling will be drastically simplified and data quality will increase as a result.
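Here is what that looks like in practice, sketched with SQLite from Python (table names are invented; note that SQLite only enforces foreign keys when the pragma is enabled). The database itself rejects bad rows, so application error handling collapses to catching one exception type:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
cur = conn.cursor()

cur.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id),
    salary INTEGER CHECK (salary > 0))""")

cur.execute("INSERT INTO department VALUES (1, 'Engineering')")
cur.execute("INSERT INTO employee VALUES (100, 1, 90000)")  # valid row

# A row pointing at a nonexistent department is rejected by the FK.
try:
    cur.execute("INSERT INTO employee VALUES (101, 99, 50000)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# A negative salary is rejected by the CHECK constraint.
try:
    cur.execute("INSERT INTO employee VALUES (102, 1, -5)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Only the valid row survives; the invalid data never reaches the table, which is precisely the data-quality gain the sin describes.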
Sin #6: Not using domains and naming standards
Domains and naming standards are probably two of the most important things you can incorporate into your modeling practices. Domains allow you to create reusable attributes so that the same attributes are not created in different places with different properties. Naming standards allow you to clearly identify those attributes consistently.
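Modeling tools implement domains natively, but the idea can be sketched in plain Python: capture the attribute definition once and reuse it, so every column built from the domain gets the same type, constraint, and naming pattern (the domain and column names here are invented for illustration):

```python
import sqlite3

# A "money" domain defined in exactly one place: same type and same
# non-negativity check for every column that reuses it.
MONEY_DOMAIN = "NUMERIC(12,2) NOT NULL CHECK ({col} >= 0)"

def money_column(name):
    """Render a column definition from the shared money domain."""
    return f"{name} {MONEY_DOMAIN.format(col=name)}"

ddl = f"""CREATE TABLE invoice (
    invoice_id INTEGER PRIMARY KEY,
    {money_column('total_amount')},
    {money_column('tax_amount')})"""
print(ddl)

# The generated DDL is valid: both amount columns share identical
# properties because they came from the same domain definition.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
```

Change the domain in one place and every column that uses it stays consistent, which is the same guarantee a modeling tool's domain feature provides.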
Sin #7: Not choosing primary keys properly
The simplest principle to remember when picking a primary key is SUM: Static, Unique, Minimal. It is not necessary to delve into the whole natural vs. surrogate key debate; however, it is important to know that although surrogate keys may uniquely identify the record, they do not always uniquely identify the data. There is a time and a place for both, and you can always create an alternate key for natural keys if a surrogate is used as the primary key.
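The surrogate-plus-alternate-key pattern above can be sketched as follows (SQLite from Python; the SKU column is an invented natural identifier). The surrogate key alone would accept duplicate business data; the alternate key on the natural identifier catches it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("""CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,  -- surrogate key: static, unique, minimal
    sku TEXT NOT NULL UNIQUE,        -- alternate key on the natural identifier
    name TEXT NOT NULL)""")

cur.execute("INSERT INTO product (sku, name) VALUES ('ABC-123', 'Widget')")

# The surrogate would happily assign a new product_id to this duplicate;
# the UNIQUE alternate key rejects it because the data is the same.
try:
    cur.execute("INSERT INTO product (sku, name) VALUES ('ABC-123', 'Widget')")
except sqlite3.IntegrityError as e:
    print("duplicate rejected:", e)
```

This keeps the join-friendly surrogate while still letting the database guarantee that the underlying data is unique.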
Hopefully this information has helped you to evaluate your data management habits and assess your current database structure. It may be a multi-step process to define your strategy to eliminate all of these issues, but it is important to have a plan to get there. Embarcadero can help you to address these concerns; contact us to learn more. You can also read the white paper on this topic: http://forms.embarcadero.com/7_deadlysins_datamodeling.