The Perils of Careless Modeling
A well-known method of stock market manipulation involves something called “spoofing” , a practice where a stockholder puts out a fake order to sell a large number of shares, but ends up buying the same stock when the market reacts by dropping its value. It’s not a foolproof strategy, but on May 6th, 2010, a single investor’s use of this method caused a flash crash that brought the DOW Jones down by nearly 9% in one day.
The reason this flash crash occurred wasn’t spoofing directly, but actually the reaction of an erroneous model used by mutual fund Waddell & Reed. After seeing the fake sell orders, their automated model single-handedly made a real one at $4.1 billion. Selling such a huge number of shares caused others in the market to sell as well and created minor panic in the whole stock market for the day.
Today’s data science tools are wonderful; their accessibility, ease-of-use, and overall horsepower continue to expand as our wealth of data does the same. Businesses and governments alike are frantically trying to keep pace with innovation by buying and implementing tools to hit buzzwords for being “data-driven”, having “decision automation”, and harnessing “real-time analytics”. Doing all of these things successfully can do wonders for an organization, however a critical lesson should be taken from Waddell & Reed’s example: if these analytical, predictive, and prescriptive models are not built correctly they can spell disaster when billions of dollars are on the line.
The skills and expertise needed to perform these tasks adequately have grown just as much as our wealth of data. With how inter-connected today’s world has become, just imagine the sheer number of factors that can affect performance indicators for your own organization. Everything from today’s weather to tomorrow’s price of a Big Mac can fundamentally change the decisions you make. It’s no longer enough to have business knowledge or programming knowledge alone; a deep understanding of the business problem, statistics, and the available technology is required to even begin. The ONLY barrier that has been broken by the latest and greatest tools is the time it takes to simply GET to the modeling work. It can be reasonably easy to learn how to play chess, but the amount of skill it takes to become a “grandmaster” requires genius-level ability and thousands of hours of practice.
Maybe it’s a little dramatic to use “genius” in describing the data scientists of today, but the value brought by dependable, accurate, and learning models cannot be overstated. The basic must-haves for predictive modeling generally fall into three categories:
- Knowing how to train a model.
- Knowing how to test a model.
- Knowing the right environment to deploy a model.
These can feel like mundane and common-sense ideas, but there is an incredible depth of complexity in each. When training a model, are you using the optimal combination of variables you have at your disposal? In testing, how do you avoid being guilty of over-fitting and still know how the model will perform with future data that could look completely different? And when leaving a model to its own devices, how do you assure that it can maintain integrity while also learning from today’s latest information? In the case of Waddell & Reed, clearly the model wasn’t trained OR tested to handle such dramatic shifts in the market, and the environment where it operated allowed it to make a $4.1 billion decision without validation.
With as much power as our machines and analytical tools have, a human mind is still required to take the wheel. An inter-connected world with unfathomable numbers of actors means an analytical model must be stoutly robust. Predictive models have the potential to predict the spread of viruses or faults in telecommunication networks, but only with care, expertise, and respect for the underlying statistical methods.
The Data Science Portfolio at PK echoes these ideas in all of the work that we do. With a central focus on integrity in the modeling process, it can be rest assured that results will have the utmost in accuracy and dependability for decision-making.