Archive | Best Practices

Interesting thought process to identify analytical approaches

29 Jan

Courtesy of a colleague in the medical data management world – check out this graphic. It is missing a few approaches, but lays out the thought process well.

Machine Learning - Cheatsheet

The Booz Allen Field Guide to Data Science has a similar linkage that is useful. That book can be downloaded here.

While I am at it, I found a good book on managing research data: Managing Research Data by Graham Pryor. I continue to be surprised at the approaches taken by “traditional” data management folks to feed analytical processes. The old-school way of dealing with analytics data did not work well, which has created some of the organizational workarounds that exist in companies. This only gets worse when dealing with large amounts of data, and with data that must work across systems and sources.

Interesting observations on the healthcare system implementation

9 Dec

Many thanks to http://www.bespacific.com/ for forwarding this post. I often get funny looks when I tell people that the expected outcome may not be the desired outcome. With one’s analyst hat on, it is easy to say this – one has a hypothesis, and one tests it. If the hypothesis proves false, then we have identified a place not to go, or a refinement in thinking. For an analyst, “failure” (as defined below) is an option. For program managers, it must be an option too, but one that is very hard to manage – generally, the inability to address this issue starts at the top and is framed within the culture of the organization.

As a project manager, one would think failure is an option – identified as a “risk” in PMP speak, and addressed and managed as such. We will not know the details of what happened for a while, but the article below sheds some light.

HealthCare.gov and the Gulf Between Planning and Reality
By Irving Wladawsky-Berger
Guest Contributor, WSJ

It’s way too early to know what really happened with the botched launch of HealthCare.gov. We don’t know how it will all play out in years to come and what its impact will be on the evolution of the Affordable Care Act, on election results over the next few years, or on President Obama’s legacy. Depending on how it all turns out over time, this will be just a chapter in future books on the history of the ACA and the Obama administration, or the subject of major books and investigative reports.

 Most everyone who’s been involved with the development of complex IT systems knows how wrong things can sometimes go.  So, when serious problems do happen, we are eager to learn the lessons that might help us avoid similar problems in the future. It’s quite possible that HealthCare.gov and the ACA’s overall IT system are such complex outliers–technically, organizationally and politically–that any lessons learned might apply to few other projects.  But, given the increasing complexity of private and public sector IT systems, the lessons are worth thinking about.

I like the way Clay Shirky, NYU faculty member as well as author and consultant, framed the problem in a very interesting blog post, Healthcare.gov and the Gulf Between Planning and Reality. He writes about the gulf between those charged with planning the overall rollout of the ACA and the health care exchanges, and the realities of trying to get such a complex system designed, built and launched in a short amount of time. It’s essentially a tale of “failure is not an option” versus the messy world of highly complex IT systems. While the post is focused on the launch of HealthCare.gov, it can also be read as a more general discussion of the kinds of problems often encountered in highly complex IT-based projects when a management decision to win a deal at all costs comes back to haunt the implementation of the project.

“For the first couple of weeks after the launch, I assumed any difficulties in the Federal insurance market were caused by unexpected early interest, and that once the initial crush ebbed, all would be well,” he writes. “The sinking feeling that all would not be well started with this disillusioning paragraph about what had happened when a staff member at the Centers for Medicare & Medicaid Services [(CMS)], the department responsible for Healthcare.gov, warned about difficulties with the site back in March.”

The paragraph responsible for Mr. Shirky’s sinking feeling was part of an October 12 NY Times article, From the Start, Signs of Trouble at Health Portal. According to the article, the warnings came from CMS deputy CIO Henry Chao, the chief digital architect for the new online insurance marketplace. In response, his superior told him:

 “. . . in effect, that failure was not an option, according to people who have spoken with him. Nor was rolling out the system in stages or on a smaller scale, as companies like Google typically do so that problems can more easily and quietly be fixed. Former government officials say the White House, which was calling the shots, feared that any backtracking would further embolden Republican critics who were trying to repeal the health care law.”

“The idea that failure is not an option is a fantasy version of how non-engineers should motivate engineers,” adds Mr. Shirky. “Failure is always an option. Engineers work as hard as they do because they understand the risk of failure.” In his opinion, neither technology, talent, budgets, nor the government’s bureaucratic processes are the main culprits here. Rather, this is a management and cultural problem. As a result of the huge political pressures they were under, top administration officials did not feel that they could seriously address the possibility that things might go wrong.

 Other articles paint a similar picture, such as this recent one in the WSJ’s CIO Journal:

 “It was on a cold, sunny day in Baltimore last January that Curt Kwak, chief information officer of the Washington Health Benefit Exchange, first realized that the signature feature of President Obama’s Affordable Care Act could be in trouble.  That day, at a status review meeting of CIOs of state health exchanges, he learned that many of his peers were far behind where they should have been.  According to Mr. Kwak, several of his peers hadn’t yet selected a systems integrator – tech vendors who play crucial roles in fitting together the multiple components of health insurance exchanges that allow consumers to select and enroll in health plans.”

Why did the administration, as well as several states, wait so long to start planning the ACA system, including the health care exchanges? Ezekiel Emanuel – oncologist, vice provost and professor at the University of Pennsylvania, and former White House advisor on health policy – said in a good article on the subject that the administration did not want to release detailed regulations and specifications on the exchange in the middle of the 2012 election campaign, in order to avoid political controversies. “This may have been a smart political move in the short term, but it left the administration scrambling to get the IT infrastructure together in time, robbing it of an opportunity to adequately consult with independent experts, test the site and fix any problems before it opened to the public.”

But then came the reality, which Mr. Shirky describes as the painful tradeoff between features, quality and time.

 “When a project cannot meet all three goals–a situation Healthcare.gov was clearly in by March–something will give.  If you want certain features at a certain level of quality, you’d better be able to move the deadline. If you want overall quality by a certain deadline, you’d better be able to simplify, delay, or drop features.  And if you have a fixed feature list and deadline, quality will suffer. . . You can slip deadlines, reduce features, or, as a last resort, just launch and see what breaks. . . That just happened to this administration’s signature policy goal.”

The inability of a troubled project to meet all three goals simultaneously almost feels like the complex-systems equivalent of the Heisenberg uncertainty principle: it is impossible to determine the exact position and velocity of a particle at the same time, no matter how good your measurement tools are. While this is clearly not a scientific principle but rather a set of guidelines based on decades of experience, there seem to be intrinsic limits to our ability to fix troubled IT projects no matter how hard we try.

In The Mythical Man-Month, noted computer scientist and software engineer Fred Brooks introduced one of the most important concepts in complex IT systems: adding manpower to a late software project makes it later. Brooks’ Law, as his concept became known, remains as true today as when it was first formulated almost 40 years ago.

 Over the years, we have learned that there are limits to our ability to pre-plan complex IT projects in advance. You need a good design, architecture and overall project plan, but you also need the flexibility to learn as you go and make trade-offs as appropriate. Most such projects are therefore released in stages, with alpha and beta phases that start testing the system with a select and relatively small number of users. Such early testing uncovers not only software bugs, but also design flaws that users have trouble with.

 Another important lesson is that all parties involved in a complex, high-risk project must have a good working relationship. All available information on the status of the project should be shared, so there are few last-minute surprises. Tradeoff decisions and project adjustments should involve all key members of the team. Behind most seriously troubled projects lies not only a gulf between planning and reality, but a lack of the close collaboration and overall good will necessary to make the project succeed.

 It’s hard to imagine a more politically contentious project than the ACA.  The administration was worried that any glitches uncovered while testing the system as part of the usual staged release cycle would give further ammunition to those trying to kill the ACA altogether. They may have felt that slipping deadlines and reducing features prior to the October 1 launch was not politically feasible, and that they therefore had no choice but to launch anyway and hope for the best.  Did they make the right decisions?  We’ll find out in the fullness of time.

 Irving Wladawsky-Berger is a former vice-president of technical strategy and innovation at IBM. He is a strategic advisor to Citigroup and is a regular contributor to CIO Journal.

 http://blogs.wsj.com/cio/2013/12/06/healthcare-gov-and-the-gulf-between-planning-and-reality/?mod=wsj_nview_latest

The different aspects of BI

5 Dec

http://www.martinsights.com/?p=774

I like the recognition that approaches need to be integrated in order to create useful insights. Valuable insights come from balancing the needs and capabilities of business strategy, business analysis, business intelligence, and advanced analytics.

Big Data and Marketing – Geoffrey Moore

15 Nov

Geoffrey Moore does a good job of explaining big data and marketing – always a cogent explanation of things.

Competitive Intelligence – A Selective Resource Guide – Completely Updated – September 2013

24 Sep

A compendium of links helpful for web-based research, from LLRX.com.

http://www.llrx.com/features/ciguide.htm

Databases & Analytics – what database approach works best?

1 Aug

Every once in a while the question comes up as to what is the “right” database for analytics. How do organizations move from their current data environments to environments that are able to support the needs of Big Data and analytics? It was not too long ago that the predominant answer was a relational database; moreover, these were often organized around a highly normalized structure that arranges the fields and tables of the database to minimize redundancy and dependency (see also).

These structures existed, to a large extent, to optimize database efficiencies – or sidestep inefficiencies – in a world that was memory and/or hardware constrained; think 20+ years ago. Many of these constraints no longer exist, which has created more choices for practitioners in how to store data. This is especially true of data repositories that are built to support analytics, since a highly normalized structure is often inefficient and cumbersome for analytical work. Matching the data design and management approach to the need improves performance and reduces operational complexity and, with it, costs.
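
To make the normalization tradeoff concrete, here is a minimal sketch using Python’s built-in sqlite3 module; the tables and column names are hypothetical, chosen only to illustrate the point:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized design: redundancy is minimized, but analytical queries
# must reassemble the data with joins.
# Denormalized "wide" design: region is repeated on every row, trading
# storage for join-free scans -- a common analytics-oriented layout.
cur.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    amount REAL
);
CREATE TABLE orders_wide (order_id INTEGER, region TEXT, amount REAL);
""")

# The same business question, two ways: the normalized form needs a join,
# the wide form is a single table scan.
normalized_q = """
SELECT c.region, SUM(o.amount)
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.region;
"""
wide_q = "SELECT region, SUM(amount) FROM orders_wide GROUP BY region;"

for q in (normalized_q, wide_q):
    print(cur.execute(q).fetchall())
```

Neither design is wrong; the point is that the “right” one depends on whether the workload is transactional or analytical.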

The breakdown below lays out some of the approaches and where they might apply. Note that these are not mutually exclusive: one can persist semantic data in a relational database, for example. Nor is the list exhaustive by any means. It is a starting point for those considering how their data environments should evolve as they move from a legacy environment to one that supports the new demands created by analytics.

Data design approach – and the analytical activity it suits:

Relational. In this context “relational” refers to data stored in rows. Relational structures are best used when the data structures are known and change infrequently. Relational designs often present challenges for analysts when queries and joins are executed that are incompatible with the design schema and/or indexing approach. This incompatibility creates processing bottlenecks and resource contention, resulting in delays for data management teams. In the context of analytics, the challenges associated with this form of data persistence are discussed in other posts, and in a favorite, Exploiting Big Data Strategies for Integrating with Hadoop by Wayne Eckerson (published June 1, 2012).
Columnar. Columnar data stores might also be considered relational; however, their orientation is around the column rather than the row. Columnar data designs lend themselves to analytical tasking involving large data sets, where rapid search and retrieval across large data tables is a priority. See the previous post on how columnar databases work. A columnar approach inherently creates vertical partitioning across the datasets stored this way. Columnar DBs allow retrieval of only a subset of the columns, and some also allow processing of data in compressed form. All of this minimizes I/O for large retrievals (a sketch follows this list).
Semantic. A semantic organization of data lends itself to analytical tasks where understanding complex and evolving relationships is key. This is especially the case where ontologies are required to organize entities and their relationships to one another: corporate hierarchies/networks or insider trading analysis, for example. This approach to organizing data is often discussed in the context of the “semantic web,” whose organizing constructs are RDF and OWL (a second sketch follows this list).
File Based. File-based approaches, such as those used in Hadoop and SAS systems, lend themselves to situations where data must be acquired and landed before the required organizational structure or analytical “context” has been defined. Data can be landed with minimal processing and made available for analysis in relatively raw form. Under certain circumstances, file-based approaches can improve performance, as they are more easily used in MPP (massively parallel processing, or distributed computing) environments. Performance improvements will exist where the data is very large, the functions performed are “embarrassingly parallel”, and the work runs on platforms designed around a “shared nothing architecture” – the architecture supporting Hadoop. The links referenced in the Relational entry above speak to why and when to use Hadoop: see recent posts, and Exploiting Big Data Strategies for Integrating with Hadoop.
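
To make the columnar point concrete, here is a minimal sketch assuming the pyarrow library (a recent version, for Table.group_by) is installed; the file name and columns are illustrative:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Write a small table to Parquet, a columnar, compressed file format.
table = pa.table({
    "symbol": ["AAA", "BBB", "AAA"],
    "price": [10.0, 20.5, 10.2],
    "note": ["long text", "long text", "long text"],  # wide, rarely queried
})
pq.write_table(table, "trades.parquet")

# Read back only the columns the analysis needs; the "note" column is
# never read from disk -- that column pruning is where the I/O savings
# for large retrievals come from.
subset = pq.read_table("trades.parquet", columns=["symbol", "price"])
print(subset.group_by("symbol").aggregate([("price", "mean")]))
```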

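And for the semantic entry, a minimal sketch assuming the rdflib library; the namespace and the corporate-hierarchy entities are hypothetical:

```python
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()

# Relationships are first-class: ownership edges in a corporate hierarchy.
g.add((EX.HoldCo, RDF.type, EX.Company))
g.add((EX.HoldCo, EX.owns, EX.SubCo))
g.add((EX.SubCo, EX.owns, EX.OpCo))

# SPARQL traverses the relationship graph without a fixed schema; the `+`
# property-path operator follows `owns` edges transitively, so indirect
# subsidiaries fall out of a single query.
q = """
SELECT ?descendant WHERE {
    <http://example.org/HoldCo> <http://example.org/owns>+ ?descendant .
}
"""
for row in g.query(q):
    print(row.descendant)
```
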
There are a few interesting papers on the topic – somewhat dated but still useful:

Curt Monash’s site has a presentation worth looking at, titled How to Select an Analytical Database. In general, Curt’s blog DBMS2 is well worth tracking.

This deck, presented by Mark Madsen at the 2011 Strata Conference, is both informative and amusing.

This bioinformatics deck is also interesting. It is undated, but it contains good information from a field that has driven developments in approaches to dealing with large, complex data problems.

Information Architecture – A Moving Target?

6 Jul

I am increasingly seeing articles that talk about the confusion in identifying and building out the right information architecture for the organization. The article here, with a clip below, speaks to that point. This is a good thing. People seek simplicity and look for the prescriptive approach: 1) build a data warehouse; 2) build some datamarts for the business folks; 3) get a BI tool and build reports. But this does not cut it: the structure is too rigid for analysts, or for other stakeholders who have to do more than pull reports. The industry has responded – I am speaking in buzzwords here – by adding “sandboxes”; by adding ODSs (Operational Data Stores); and by adding a whole new way of landing, staging, and persisting data and using it in analytical tasks (Hadoop). Sitting on top of this data level of the information architecture has been an explosion of tools that cater to (more buzzwords) data visualization, self-serve BI, and data mashups, to name a few.

Bottom line – how does this all get put together without creating an even bigger data mess than the one you started with? It is hard. What one sees so often is organizations putting off addressing the issue until they have a real problem. At that point, one sees a lot of sub-optimal management behavior. A consistent theme in the press is agility – organizations and their leaders need to embrace the agile manifesto. I am wholeheartedly behind this. HOWEVER, agility needs to be framed within a plan, a vision, or at least some articulated statement of an end point.

The article below is interesting in that it presents agility as a key “must have” management approach, yet it also discusses the fact that for an agile approach to be successful, it needs to adopt disciplines that are decidedly un-agile! This creates a dual personality for leaders within the data management functions of an organization (BI, analytics, ERP, …). On the one hand, one wants to unleash the power of the tools and the creative intellect resident within the organization; on the other, there is a desire to control, to reduce the noise around data, to simplify one’s life. The answer is to embrace both – build a framework that provides long-term guidance, and iteratively deliver capabilities within that framework towards a goal defined in terms of business capabilities – NOT technology or tightly defined tactical goals.

The framework – whichever approach one chooses – will articulate the information architecture of the organization: how data flows around the organization to feed core business activities and advance management’s goals. One important test: if it cannot be explained in a one-page graphic, it is probably too complicated!

Martin’s approach to tying things together is below…

“”So given that there is not a one size fits all approach anymore – how does a company ensure its Information Architecture is developed and deployed correctly? Well, you have to build it from the ground up, and you have to keep updating it as the business requirements and implemented systems change. However, to do this effectively, the organisation must be cognisant of separating related workloads and host data on relevant and appropriate platforms, which are then tied together by certain elements, including:

See also:

  1. Polyglot persistence (see the sketch after this list)
  2. Data Management Maturity Model as an example of a way to start thinking about governance
  3. Agile development – a good idea so often badly implemented!
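
On the first point, here is a minimal sketch of the routing idea behind polyglot persistence – one facade, several purpose-built stores. The class and the store names are hypothetical placeholders, not real engines:

```python
from dataclasses import dataclass, field

@dataclass
class DataPlatform:
    """A thin facade that routes each workload to a purpose-built store."""
    stores: dict = field(default_factory=dict)

    def register(self, workload: str, store: str) -> None:
        self.stores[workload] = store

    def store_for(self, workload: str) -> str:
        return self.stores[workload]

platform = DataPlatform()
platform.register("transactions", "relational-db")    # OLTP: rows and joins
platform.register("reporting", "columnar-warehouse")  # scans and aggregates
platform.register("relationships", "graph-db")        # ontologies, networks
platform.register("raw-landing", "file-store")        # Hadoop-style landing

print(platform.store_for("reporting"))  # -> columnar-warehouse
```

The design choice the sketch illustrates: the workloads stay separated, each on an appropriate platform, while a single integration layer ties them together.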

Seven Questions to Ask Your Data Geeks

13 Jun

This is a good article on some of the basic questions that are so often not asked.

I get disagreement on this, but for the most part, big mistakes in analytics projects stem from these questions going unasked or unanswered – they rarely stem from the statistician getting the math wrong.