Archive | Industry RSS feed for this section

Aggregate Persistence & Polyglot Persistence!

9 Jun

Gotta love the consultant speak!!

This short article provides an interesting perspective on how NoSQL differs from a data-storage perspective, and why that matters. The article also points out that storing data on large clusters is very efficient from a storage perspective, but NOT if the data is relational in nature. To look at data across clusters efficiently, one needs to reorganize the data – this is where MapReduce comes in. MapReduce is great at reorganizing data to feed particular tasks – from my perspective a critical need for the analytical communities.
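To make the "reorganize, then analyze" idea concrete, here is a minimal single-machine sketch of the map/shuffle/reduce pattern in plain Python. The data and the region/sales task are hypothetical, purely for illustration:

```python
from collections import defaultdict

# Toy MapReduce: reorganize flat records by key so each "reducer"
# sees everything for one key. Records are made-up example data.
records = [
    {"region": "east", "sales": 120},
    {"region": "west", "sales": 75},
    {"region": "east", "sales": 30},
]

def map_phase(record):
    # Emit (key, value) pairs -- here, region -> sales amount.
    yield (record["region"], record["sales"])

def shuffle(pairs):
    # Group values by key; on a real cluster this is the expensive
    # cross-node reorganization the article describes.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Aggregate each group -- the analytical task the shuffle feeds.
    return key, sum(values)

pairs = [p for r in records for p in map_phase(r)]
results = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# results == {"east": 150, "west": 75}
```

The same three-phase shape is what Hadoop distributes across a cluster; the point is that the shuffle step is where the data gets reorganized to suit the task.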

This links to the notion of “Polyglot Persistence,” which accepts that data will be stored in multiple mediums as new ways of persisting data evolve. I find this interesting, as it mirrors what we are seeing today. Customers have Operational Data Stores – usually relational – and yet seek to perform tasks that are complicated by: 1) the size of the data, and 2) the constraints placed on how the data can be evaluated or analyzed by the data model or architecture. This motivates an exploration of new approaches; hence the discussions industry is having about NoSQL (or, to use the buzzwords: Hadoop, MapReduce, Big Data).

I may have simplified this a bit – apologies. At the end of the day, we are seeing a sea change in how organizations deal with data to more effectively apply it to the diverse needs demanded by the business side of the house. Explaining how organizations must change – and how to do so in a controlled, risk-reduced manner – is the challenge.

See also:

Debate over NSA collecting information … can the media begin to report substantively?!!

7 Jun

So this business of the NSA collecting data should come as no surprise to anyone. The media is having a field day! The issue is whether or not the intel community is doing this legally. Has the FISA court done anything illegal? The court is guided by a set of rules that are meant to be transparent, known to the non-intel world, and approved by Congress. Did it follow these rules when allowing the NSA to collect what it collected? Did it restrict the use of that data appropriately? These are the questions – remember after 9/11, when everyone was asking (with outrage) why we had not connected the dots? But back to my area of concern…

CNN has pulled out their favorite privacy pundit – Jim Harper from the CATO Institute. Jim is well spoken and very learned in the field of privacy and policy. However, he makes a statement in this interview that I find incredible – he says that collecting all the data from every American’s phone calls “can’t possibly be useful for link-based investigation.” Really? I cannot think of a better way of using phone-call data than in link-based analysis. Methinks you need to stick with policy, Jim!! Anyone out there care to explain this comment?
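To illustrate why call metadata is so well suited to link-based analysis: two numbers that never call each other can still be connected through a shared contact. A minimal sketch, with made-up numbers:

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee) pairs.
calls = [
    ("555-0001", "555-0099"),
    ("555-0002", "555-0099"),
    ("555-0003", "555-0004"),
]

# Build an undirected "who talked to whom" graph.
graph = defaultdict(set)
for a, b in calls:
    graph[a].add(b)
    graph[b].add(a)

def shared_contacts(x, y):
    # Common neighbors link two numbers that never called each other.
    return graph[x] & graph[y]

# 555-0001 and 555-0002 never called each other, yet both are
# linked through 555-0099 -- exactly the pattern link analysis finds.
```

At scale this is graph traversal over billions of edges, but the principle is the same: the value of the data lies in the links, not in any single record.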

As a matter of policy, there are probably some questions to be answered.  The FISA courts have been criticized for approving everything without question. I would like the news agencies to focus on that, and whether or not the court is working as envisioned to protect our privacy.

Have a look at this post that is homeland security oriented – they are harvesting things differently here, but… same privacy concerns.

Analyst Desktop Binder – Interesting view of Social Media Exploitation

7 Jun

This is getting re-posted given the noise about the NSA collecting personal data.


Interesting reading – especially if you have done work in the fusion centers.

Much noise was made of the words that are searched within media. This is a pretty long list, and what it says to me is that there must be a significant amount of human intervention – and, I would think, an awful lot of “noise.”

Hard to believe that this is very effective without knowing more about the underlying capabilities, but my guess is that this is only a step above Googling those terms!
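A toy sketch shows why a long watch-word list generates so much noise: naive keyword matching flags any post containing a term, regardless of context. The watch words and posts below are illustrative stand-ins, not quoted from the binder:

```python
# Naive keyword monitoring: flag any post containing a watch word.
# Terms and posts are made-up examples for illustration only.
WATCH_WORDS = {"exercise", "cloud", "storm"}

posts = [
    "Big storm cloud rolling in tonight",
    "New gym exercise routine posted",
    "Nothing to see here",
]

# Two of three perfectly innocuous posts get flagged -- hence the
# need for heavy human review downstream of any such filter.
flagged = [p for p in posts if any(w in p.lower() for w in WATCH_WORDS)]
```

With common English words on the list, the false-positive rate swamps anything of interest, which is exactly the "only a step above Googling" problem.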


The Making of an Intelligence-Driven Organization

6 Jun

Interesting presentation – but I really liked the Prezi. If you have not seen one of these, have a look.

The discussions/handout covered many points including:

  • As a discipline, intelligence seeks to remain an independent, objective advisor to the decision maker.
  • The realm of intelligence is that of judgment and probability, not prescription.
  • The Intelligence product does NOT tell the decision maker what to do, but rather, identifies the factors at play, and how various actions may affect outcomes.
  • Intelligence analysts must actively review the accuracy of their mind-sets by applying structured analytic techniques coupled with divergent thinking.
  • Critical thinking clarifies goals, examines assumptions, discerns hidden values, evaluates evidence, accomplishes actions, and assesses inferences/conclusions.
  • Networking, coordinating, cooperating, collaboration, and multi-sector collaboration accomplish different goals and require different levels of human resources, trust, skills, time, and financial resources – but they are worth it to ensure coverage of issues.
  • Counterintelligence and security to protect your own position.
  • and more….

I liked the stages of Intelligence Driven Organizations in the Prezi.

Mary Meeker’s Latest Masterful Presentation On The State Of The Web

4 Jun

Thanks to BeSpacific for forwarding this article – lots of good interesting stats on the e-world…

Each year, Kleiner Perkins partner and former analyst Mary Meeker releases an in-depth look at the state of the web, and it’s always full of surprising stats.

Agile development – a good idea so often badly implemented!

20 May

I am reposting this, as I stand by my original assertion – that Agile requires real leadership skills.

I had a good giggle reading these two articles, here and here, and then finding this one referencing Flaccid Scrum – by Martin Fowler.

Original:

The other day I got something from Carahsoft about a seminar on agile development. The Federal government has been pushing this for some time, so it is curious why Carahsoft decided to have a seminar. Regardless, this happens to coincide with a number of other discussions regarding Agile approaches. It is interesting that there is still significant debate about what Agile is and what it means for projects.

I have the following observations and comments that might help shape the debate (should you find yourself in one):

1. It is an approach, not a religion! So many people get really wrapped up in a particular approach and then feel the need to make sure that everyone follows it to the letter of the law. I have rarely seen a successful agile implementation that was not in one form or another morphed to accommodate the particular needs of a project or the organization where it was being implemented. If we think of Agile as a management approach or framework, and less as a prescriptive remedy for development challenges, we are better off. We can be flexible and focus on outcomes, and less on the “rules” that a particular methodology espouses. This article is a little old, but it lays things out well and is a recommended read.

2. Agile can leave you vulnerable – it requires confidence and leadership! At some point, one has to accept that one adopts adaptive (i.e., Agile) approaches because the specific requirements are unknown. One has to have the confidence to say “we do not know,” and the leadership to convince people that by following a disciplined agile approach, we will reveal the true requirements. This business of not knowing is very unsettling for people. This is especially true in the government space, where there is a whole cadre of “business analysts” who exist to specify requirements so the government can contract to have things built. Over time, the role of these business analysts will need to change. This article, again by Martin Fowler, speaks to some of the criticism that Agile approaches lack documentation and appropriate controls.

Lastly, it is worth pointing out that adoption of Agile approaches often requires a cultural change for an organization. There are three ways that change can occur: from the top; from the bottom up – organically, at the grass-roots level; or externally imposed. In the government space this last one is perhaps more common than in the commercial space. Regardless of how change occurs, it always requires leadership to create the right environment for change. At the end of the day, this is often the largest hurdle.

The Agile Manifesto lays out the key tenets of Agile approaches.

Data Visualisation » Martin’s Insights

23 Apr

This is a good article on data visualization. The author notes in his considerations section that “real data can be very difficult to work with at times and so it must never be mistaken that data visualisation is easy to do purely because it is more graphical.” This is a good point. In fact, in some respects, determining the right visualization can be harder than simply working with the data directly – but without one, it is much harder to communicate key insights to a diverse audience.

What rarely gets enough attention is that in order to create interesting visualizations, the underlying data needs to be structured and enhanced to feed the visualizations appropriately. The recent Boston bombing, where one of the bombers slipped through the system due to a name misspelling, recalled a project years ago where we enhanced the underlying data to identify “similarities” between entities (people, cars, addresses, etc.). For each entity type, the notion of similarity was defined differently: for addresses it was geographic distance; for names it was semantic distance; for cars, it was matching on a number of different variables; and for text narratives in documents we used the same approach that the plagiarism tools use. In this particular project, a name misspelling – and the ability to tune the software to resolve names based on our willingness to accept false positives – allowed us to identify linkages that revealed networks. Once a link was established, we went back and validated it. In the above example, the amount of metadata generated to create a relatively simple link chart was significant – the bulk of the work. In terms of data generated, it is not unusual for the created data to dwarf the original data set – especially if text exploitation and other unstructured data mining approaches are used.
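A minimal sketch of per-entity-type similarity measures like those described above: a string-similarity ratio for names (tolerant of misspellings) and great-circle distance for addresses. The functions, thresholds, and names here are illustrative, not the actual project's implementation:

```python
import math
from difflib import SequenceMatcher

def name_similarity(a, b):
    # Ratio in [0, 1]; misspellings like "Tsarnaev"/"Tsarnayev"
    # still score high, which is the point of fuzzy resolution.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def geo_distance_km(lat1, lon1, lat2, lon2):
    # Haversine great-circle distance between two coordinates, in km.
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def same_entity(name_a, name_b, threshold=0.85):
    # The threshold is the tuning knob: lowering it accepts more
    # false positives in exchange for catching more misspellings.
    return name_similarity(name_a, name_b) >= threshold
```

Each comparison run over every candidate pair produces a similarity score that must be stored somewhere, which is how the derived metadata ends up dwarfing the original data set.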

So … Next time the sales guy shows you the nifty data visualization tool, ask about the data set used, and how much massaging it needed.

http://www.martinsights.com/?p=492&goback=%2Egde_4298680_member_232053156

This should come as no surprise… Using Excel for complex analysis on an ongoing basis is asking for trouble!

22 Apr

This report on how using Excel has caused some major miscalculations should come as no surprise… Excel exists because it is pervasive, easy to use, and can be applied to a range of decision-making activities. However, have you ever tried to create a repeatable, defensible, and transparent report using Excel without significant effort spent making sure you had done it correctly? The attached article talks about a number of mistakes. I have had a number of discussions over the years with companies that are struggling with whether or not to implement a BI system, and if so, to what extent it should provide structure and guidance to the process of using Excel.

The easy implementation of BI is to adopt a tool such as Tableau that, in essence, takes spreadsheets and allows you to pivot the data and visualize it more easily than one could in Excel. I realize that Tableau does more than that now, but that is how it started, and most people appear to use it that way still. This gives you great-looking dashboards and allows you to roll around in the data to bubble up insights. However, it does nothing to address the quality of the report and the issues raised by the article.

At the other end of the spectrum are enterprise-level tools that do a great job of locking down the source data and tracking everything that happens at the desktop to make the final report. These tools are focused on creating the SAME report with exactly the same inputs and calculations as all previous times. To the extent changes are made, they are tracked, and capabilities exist to flag and route changes for review and approval. The downside, of course, is that they often limit what the user can do with the data.

Somewhere in the middle is the happy spot. To the extent tools are not able to support the requirements for transparency, traceability, and defensibility, these requirements must be addressed through policy, process, and standards. Most of the enterprise tools are configurable to create a well-defined set of gates between which analysts and report creators can have complete flexibility.

In the cases mentioned in this article, the technology exists to create the safeguards required. However, the user communities were able to resist change, and management – for whatever reason – did not make the decision to invest in underlying data management, BI and analytical capabilities. In a data driven world, it is only a matter of time before that comes back to bite you.

TIBCO – Buys another company

1 Apr

TIBCO buys another company in the analytics space. I have always thought that with Spotfire, the Enterprise Service Bus business, and the acquisition of Insightful some years ago, TIBCO had the makings of a company putting together the Big Data analytical stack. With this purchase, they have added a geo capability. Will they ever get all these pieces integrated to create a solutions package – like SAS’s Fraud Framework? Not sure why they have not done that to date. It may just be that it is too hard to sell complete solutions, and it is easier to get in the door with a point solution. Anyway – I like Spotfire, and anything they do to build out the back end is good stuff. The price point still seems a little high for a point solution, but they seem to be making it work for them, so who am I to argue… It will be interesting to see how this plays out.

See also here – as they appear in the MDM Magic Quadrant as well.

Magic Quadrant for Data Integration Tools

23 Feb

Gartner Data Integration Survey

October 2012 – all the usual suspects. However, I was surprised (and pleased) to see Talend in the mix. Interesting to note that SAS is in the lead in number of installs (13k) – up there with Microsoft (12k).