While breaches affecting companies like Equifax, Target, and Sony receive huge attention in the media, for many organizations security is still an afterthought, an area to focus on if and only if attacks occur. Very few organizations take sufficient proactive steps to secure their environments and ecosystems while equipping themselves with assimilative and dynamic frameworks that keep current as the “bad guys” get more sophisticated, capable, and demanding.
This has to change; we need real security strategies and not simply Security Theater. Far too much is at stake, for individual businesses, for consumers, and for the entire ecosystem of digital business.
As organizations embark on the digital transformation “journey,” security must stay top of mind. Security breaches, from within and without, are givens of the system. It is no longer a matter of “if” but of “when,” “how often,” and “how major.” All large changes cut both ways: they empower and they constrain. Digital transformation is no different.
Interestingly, security sits on both sides of that question, as empowerment and as constraint. Having to spend enormous time, money, and people-power on preventing attacks clearly constrains an organization's otherwise smooth workings, but such is the cost of doing digital business. Beyond managing constraints well, organizations should think of the “security of enablement”: good, well-managed, and well-governed security strategies actually empower employees to be productive, partners to collaborate, and ideas to be shared.
A strong security strategy includes elements of technology, people, and process, resting within a framework that is built on the understanding that change is inevitable. Great security strategies grow and morph as the needs of the business change, and as the attack vectors increase.
Great security strategies connect security with the needs of the business and acknowledge that each business has a different risk profile and as such, requires a different security profile. When business risk and security readiness are mismatched, problems ensue. When organizations take their eye off the security ball for even a moment, costly breaches occur.
Having a strong and assimilative security framework is not simply good business; the need for one is increasingly enshrined in law. Security, Governance, and Compliance are the three horsemen of IT. They ride together.
Security is serious business. The good news, as one might expect, is that this area has seen enormous innovation in the last decade. Security technologies exist today at levels of performance heretofore unseen and at the lowest cost in history. These are very powerful pieces of a security strategy, though they are not silver bullets. To build a truly sustainable and strategic security framework, organizations need to concentrate on people and process as well.
Putting the puzzle pieces together can be challenging, but it has to be done. To do so, one has to look at security holistically and plan for the “unknown unknowns.”
To help organizations think through, implement, and manage these holistic solutions, Akvelon announces an array of security services dedicated to helping organizations concentrate on their core businesses while mitigating the risks associated with security in a digital world.
Porto Seguro, one of Brazil’s largest auto and homeowner insurance companies, posted a challenge on Kaggle.com to build a model that predicts the probability that a driver will initiate an auto insurance claim within the next year. While Porto Seguro has used machine learning for the past 20 years, they’re looking to Kaggle’s machine learning community to explore new, more powerful methods. A more accurate prediction will allow them to further tailor their prices, and hopefully make auto insurance coverage more accessible to more drivers.
A team of two Akvelon machine learning engineers and a data scientist registered on Kaggle.com and decided to compete against more than 5,000 teams for the top positions on the leaderboard. Submissions are evaluated using the Normalized Gini Coefficient. The unnormalized Gini coefficient ranges from approximately 0 for random guessing to approximately 0.5 for a perfect score; the normalized version divides by the Gini of a perfect prediction, so its maximum is 1.
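For readers who have not met the metric before, here is a minimal sketch of a normalized Gini calculation in plain Python. It follows the formulation commonly shared among Kaggle participants and is not the competition's official scoring code; the function names `gini` and `normalized_gini` are this sketch's own.

```python
def gini(actual, pred):
    """Unnormalized Gini: how well `pred` orders the positive labels first."""
    n = len(actual)
    # Sort row indices by predicted score, highest first (ties keep input order).
    order = sorted(range(n), key=lambda i: (-pred[i], i))
    total = float(sum(actual))
    cumulative = 0.0
    area = 0.0
    for i in order:
        cumulative += actual[i]
        area += cumulative / total  # running share of positives captured
    area -= (n + 1) / 2.0           # subtract the random-ordering baseline
    return area / n


def normalized_gini(actual, pred):
    """Divide by the Gini of a perfect ordering so a perfect model scores 1."""
    return gini(actual, pred) / gini(actual, actual)


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(normalized_gini(labels, scores))
```

Because the calculation depends only on the ordering of the predictions, any monotonic rescaling of a model's scores leaves the result unchanged.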
The insurance group provided all participants with the same raw data set: 893,000 rows of 59 distinct data points, with little or no specification of what the data actually represents. The Akvelon team went through several iterations to identify the meaning of each data point, cleanse the data, and reduce the number of variables to a manageable subset. If a model includes too many data points that have little or no impact on the output (whether the driver files an insurance claim), it becomes needlessly complex, which hurts its ability to produce the proper output in a timely manner.
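The article does not say how the team decided which variables to drop. As one possible illustration only, the sketch below keeps the columns whose absolute Pearson correlation with the target clears a threshold. The names `pearson` and `select_features`, the `min_abs_corr` default, and the toy columns are all assumptions made for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, min_abs_corr=0.1):
    """Return column names with |correlation| >= threshold, strongest first."""
    strength = {name: abs(pearson(values, target))
                for name, values in columns.items()}
    kept = [name for name, s in strength.items() if s >= min_abs_corr]
    return sorted(kept, key=lambda name: strength[name], reverse=True)


# Toy, made-up data: one informative column, one constant, one unrelated.
toy_columns = {
    "signal": [0.1, 0.2, 0.8, 0.9],
    "constant": [5, 5, 5, 5],
    "unrelated": [1, 2, 1, 2],
}
toy_target = [0, 0, 1, 1]
print(select_features(toy_columns, toy_target))
```

A linear filter like this misses nonlinear and interaction effects, so in practice it would usually be combined with other signals, such as the feature importances reported by tree-based models.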
Crowdsourced competitions call for a lot of “try, fail, fail fast, then make some changes and try again” iteration, repeated over and over. For example, the team experimented with several prediction models: random forest, logistic regression, extra-trees classifier, KNN, naïve Bayes, AdaBoost, XGBoost, LightGBM, CatBoost, and neural networks.
One technique the team used was t-SNE (t-distributed stochastic neighbor embedding), a nonlinear dimensionality reduction technique that is particularly well suited to embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points. The image below shows the result of this analysis.
t-SNE result (yellow dots correspond to 1, purple dots to 0)
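For readers curious about what “similar objects are modeled by nearby points” means formally, the standard t-SNE objective can be written as below. The notation is introduced here and is not from the article: x_i are the original high-dimensional rows, y_i their low-dimensional images, N the number of rows, and σ_i a per-point bandwidth set by the perplexity parameter.

```latex
p_{j \mid i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                    {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2N}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The embedding is found by moving the points y_i to minimize C, so pairs that are close in the original space (large p_ij) are penalized heavily if they end up far apart in the plot.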
The most promising models were XGBoost and LightGBM, which placed the team in the top half of the competition leaderboard with a score of ~0.282. The top-ranked team had a score of 0.290, so the Akvelon team's score was 97.2% of the leader's. (Higher is better; a perfect normalized Gini score is 1.) The next step was to adjust model parameters such as tree depth, number of estimators, and the sizes of the training and validation sets. Those experiments raised the score to ~0.284. With a promising model in hand, it was the right time to revisit the groomed data set and choose more carefully which data points to include and which to replace.
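Parameter tuning of this kind is often organized as a search over a grid of candidate values. The sketch below shows the general shape of such a loop. It is not the team's actual procedure: `grid_search` is a helper written for this example, `toy_evaluate` is a made-up stand-in for “train a model and score it on validation data,” and the parameter names and values are illustrative.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every parameter combination; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # higher is better, e.g. a validation score
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    # Hypothetical scoring function that peaks at depth 5 and 400 estimators.
    # A real version would fit a model and score it on held-out data.
    return (-abs(params["max_depth"] - 5)
            - abs(params["n_estimators"] - 400) / 100)


grid = {"max_depth": [3, 5, 7], "n_estimators": [200, 400, 800]}
best_params, best_score = grid_search(grid, toy_evaluate)
print(best_params, best_score)
```

An exhaustive grid grows multiplicatively with each added parameter, which is one reason teams often tune a few parameters at a time or switch to random search as the space gets larger.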
Further research and investigation helped the Akvelon team improve the models. One improvement was adding a neural network, which raised the score to ~0.285 and placed the team in the top 2% of the entire competition.
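The article does not describe how the neural network's output was combined with the boosted-tree models. One common option, shown here purely as an assumption, is rank averaging: convert each model's predictions to ranks and average them. This suits a ranking metric such as Gini because models with very different score scales can still be blended. The helper names and the sample predictions are made up for this sketch.

```python
def to_ranks(scores):
    """Map scores to ranks scaled into [0, 1]; tied scores share a rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the group while the next score equals the current one.
        while end + 1 < n and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    scale = max(n - 1, 1)
    return [r / scale for r in ranks]


def rank_average(predictions, weights=None):
    """Weighted average of per-model ranks, one blended value per row."""
    if weights is None:
        weights = [1.0] * len(predictions)
    total = sum(weights)
    ranked = [to_ranks(p) for p in predictions]
    n_rows = len(predictions[0])
    return [sum(w * r[i] for w, r in zip(weights, ranked)) / total
            for i in range(n_rows)]


# Made-up predictions from two models that use very different scales.
boosted = [0.10, 0.40, 0.35, 0.80]
neural = [0.02, 0.03, 0.05, 0.04]
blended = rank_average([boosted, neural])
print(blended)
```

The weights let a stronger model count for more in the blend; they would normally be chosen on validation data rather than by hand.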
After some additional tweaks and adjustments, and after studying the approaches of past Kaggle winning teams, the Akvelon team ranked in the top 1% with a score of 0.287 against the leader's 0.291 (98.6% of the top-scoring team's score).
At the final stage of the competition, the organizers scored submissions against 70% of the complete data set, which none of the competitors had seen. Each team's result against this new data determined its final score on the leaderboard.
Porto Seguro Machine Learning Kaggle Competition Final Standings
The Akvelon team placed 22nd out of more than 5,000 teams in the final standings after the complete data set was applied, finishing the competition in the top 0.5%.
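The relative figures quoted in this write-up can be checked with a few lines of arithmetic. The helper names below are this sketch's own, and 5,000 is used as a lower bound on the number of teams, so the real placement percentage is slightly smaller than the value computed here.

```python
def percent_of(score, reference):
    """Score as a percentage of a reference score, to one decimal place."""
    return round(100.0 * score / reference, 1)


def top_percent(rank, teams):
    """Leaderboard position expressed as a percentage of all teams."""
    return 100.0 * rank / teams


early = percent_of(0.282, 0.290)  # first boosted-tree models vs. the leader
late = percent_of(0.287, 0.291)   # tuned models vs. the leader
final = top_percent(22, 5000)     # 22nd place among at least 5,000 teams
print(early, late, final)
```

With at least 5,000 teams, 22nd place works out to no more than 0.44% from the top, which is consistent with the “top 0.5%” figure.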
Participating in the Porto Seguro Safe Driver Prediction competition deepened our expertise in machine learning. The trial-and-error portion of this project taught us to refine data quickly while focusing on accuracy and detail in every model. The competition also provided a real-world data set for a real-world problem, so our models could be applied to different situations and everyday occurrences.
Finishing in the top 0.5% is a proud accomplishment when competing against more than 5,000 other teams.
If you would like to learn more about our Machine Learning capabilities and solutions, contact us today!