Machine Learning is getting a lot more air time these days, but are we sure what it actually is?
The most common definition goes along the lines of:
“It gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959).
This is an old quote but it has stood the test of time. But how can computers “learn”? Have we really reached the age of Artificial Intelligence, where machines take over the world and humans become redundant? I suspect not…
Let’s explore the core of the definition – the ability to learn.
What this really means is that there is a set of algorithms that, rather than simply following a static set of program instructions, can make data-driven predictions or decisions by building a model.
There are three recognised categories of algorithms:
Supervised learning – The computer is presented with example inputs (training data) and their desired outputs, given by a “teacher”, and the goal is to learn a general rule that maps inputs to outputs. The “easiest” example of supervised learning is a decision tree – this uses a tree-like graph or model of decisions and their possible consequences, including chance-event outcomes, resource costs, and utility. From a business decision point of view, a decision tree is the minimum number of yes/no questions one has to ask to assess the probability of making a correct decision, most of the time. As a method, it allows you to approach the problem in a structured and systematic way and arrive at a logical conclusion.
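The idea of learning a rule from labelled examples can be sketched in a few lines of Python. Below is a toy, single-question decision “stump” – the simplest possible decision tree. The loan-style training data and the income threshold it learns are entirely made up for illustration; a real decision-tree learner (CART, for example) recursively applies this kind of best-split search.

```python
# A minimal sketch of supervised learning: from labelled (input, output)
# examples, learn the single yes/no question ("is income >= threshold?")
# that best reproduces the teacher's answers. All data is illustrative.

def learn_stump(examples):
    """examples: list of (income, approved) pairs. Returns the threshold
    whose yes/no split agrees with the most labels."""
    best_threshold, best_correct = None, -1
    for threshold, _ in examples:          # candidate splits come from the data
        correct = sum((income >= threshold) == approved
                      for income, approved in examples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

training_data = [(20, False), (25, False), (40, True), (55, True), (70, True)]
t = learn_stump(training_data)
print(t)                  # -> 40: the learned yes/no question
print(60 >= t, 22 >= t)   # -> True False: predictions for unseen inputs
```

The point is the general rule (“income at or above 40”) was never programmed in – it was induced from the examples, which is exactly Samuel’s definition in miniature.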
There are many other supervised learning algorithms, which I will only reference here: Naïve Bayes Classification, Ordinary Least Squares Regression, Logistic Regression, Support Vector Machines and Ensemble Methods.
Unsupervised learning – this is where the data is not labelled, so there are no error or reward signals, leaving the algorithm to find structure on its own. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end. Examples include Clustering Algorithms, Principal Component Analysis, Singular Value Decomposition and Independent Component Analysis.
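To make “finding structure without labels” concrete, here is a minimal 1-D k-means clustering sketch in Python. The data points are made up, and the naive initialisation (first k points as centres) is purely for brevity – nothing here tells the algorithm which group a point belongs to; it discovers the two groups itself.

```python
# A minimal sketch of unsupervised learning: 1-D k-means clustering.
# No labels or reward signals are given; the algorithm alternates between
# assigning points to their nearest centre and moving each centre to the
# mean of its assigned points.

def kmeans_1d(points, k=2, iters=10):
    centres = points[:k]                      # naive initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                      # assignment step
            nearest = min(range(k), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        centres = [sum(c) / len(c) if c else centres[i]   # update step
                   for i, c in enumerate(clusters)]
    return sorted(centres)

data = [1.0, 1.2, 0.8, 9.8, 10.1, 10.4]
print(kmeans_1d(data))   # two centres, one near 1 and one near 10
```

The same assign-then-update loop generalises to many dimensions and more clusters; the hidden pattern (two groups) emerges purely from the distances between points.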
Reinforcement learning – this has been inspired by behavioural psychology, and is concerned with how software agents ought to take actions in an environment so as to maximise a “reward”. There are many adjacent theories in this space – game theory, control theory, operational research, swarm intelligence and so on. Reinforcement learning is different because the correct inputs/outputs are never presented and suboptimal actions are never explicitly corrected; the focus is on on-line performance. A good and relevant example is the self-driving (autonomous) car, which operates without a teacher explicitly telling it how close it has come to its goal.
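The “learning from reward alone” idea can be sketched with tabular Q-learning on a toy corridor of five cells. The agent is never told the correct action; it only receives a reward of +1 when it stumbles into the rightmost cell, and from that alone learns to walk right. The environment and the parameters (alpha, gamma, epsilon) are illustrative choices, not anything from the article.

```python
# A minimal sketch of reinforcement learning: tabular Q-learning.
# States 0..4 in a corridor; actions are -1 (left) or +1 (right);
# the only reward is +1 for reaching state 4. No teacher, no labels.
import random

random.seed(0)
N = 5                                    # goal is state N - 1
Q = {(s, a): 0.0 for s in range(N) for a in (-1, +1)}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration

for _ in range(500):                     # episodes
    s = 0
    while s != N - 1:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        if random.random() < epsilon:
            a = random.choice((-1, +1))
        else:
            a = max((-1, +1), key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N - 1)   # walls at both ends
        r = 1.0 if s2 == N - 1 else 0.0
        # Q-learning update: nudge the estimate towards reward + discounted future
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in (-1, +1)) - Q[(s, a)])
        s = s2

policy = {s: max((-1, +1), key=lambda act: Q[(s, act)]) for s in range(N - 1)}
print(policy)   # every state should prefer +1 (move right)
```

Early on the agent wanders almost at random; once reward information propagates back through the Q-table, the greedy policy points right everywhere. This trial-and-error structure, scaled up enormously, is the same shape as the self-driving-car problem above.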
OK – so we now have a set of algorithms that clearly need people who understand the techniques deeply – hence the need for Data Scientists. An interesting summary is here: http://nirvacana.com/thoughts/becoming-a-data-scientist/. It uses the tube-map analogy and, as can be seen, Machine Learning is merely one stop on the full journey…
However, even with the best algorithms, we need ways of storing the data and visualising it. There are many analytics platforms – the top quartile of the Gartner Magic Quadrant is held by the likes of SAS, IBM, KNIME, RapidMiner and Dell. There is also MS Cortana (see the CSC announcement), hosted on Azure.
It’s important to put Machine Learning into context, best summarised, I believe, via the Gartner Hype Cycle for Data Science. This shows that ML is on the wave of Data Science, but there is a lot before it and a lot after – and it is a long cycle: 5–10 years before it becomes properly mainstream.
Don’t be put off by data science – this is a very new and exciting area. You don’t need to be a deep mathematician who understands the complexities of Clustering Algorithms vs Principal Component Analysis; like many of these phenomena, it’s about knowing people who do – but equally important is knowing how to apply it. Data science has its own challenges – a data scientist can look at the data, write some algorithms, find some patterns and start to tell a story; often, though, it doesn’t get the right level of stakeholder commitment because it isn’t close enough to the business problem itself. Using an agile technique – repeatedly forming and testing hypotheses – is clearly one option that will increase stakeholder engagement. One technique I am particularly enthused about is the OODA loop.
Software tooling (the platform and the applications) is going to become the catalyst for increased speed to market and, to a certain degree, for dumbing down the complexities so it can become more mainstream. It will be a multi-layered architecture – industry applications and solutions at the top, followed by visualisation/collaboration, analytics, storage and finally infrastructure. The danger, as with many of these stacks, is spending too much on the stack rather than on the actual business problem. A classic example would be to ask an organisation whether they have a “Big Data strategy” – with the likely retort of “yes, of course, we have deployed Hadoop” – clearly the wrong answer in this context.
As with many of these next-generation technologies, the best way to understand and learn is simply to try it with a use case that is relevant to where you are. Get as close to the business problem as possible and iterate on a hypothesis.