article 26 July 2018

BLOG | Follow TMC Data Science in their Kaggle competition

In this blog we keep you updated on the progress, challenges and wins of the TMC Data Science team in their first Kaggle competition.

Shoot for the top 10% | Romain Huet

July 26, 2018 | As my colleague Valentin explains in his article, we, the Employeneurs of TMC Data Science, are participating in a Kaggle competition, and I am helping him organize our participation. Since it is our first competition together, our objective is to submit a collaborative piece of work and to finish within the top 10%.

Teaching and learning

To do so, every week we gather cheerful and eager Employeneurs around pizzas to contribute to this project and learn how to apply machine learning to a real-world problem. Some of us are more experienced, with a strong background in machine learning. Helping the others keep up with what is happening is therefore a challenge in itself.

In these weekly meetings everyone shares what they have done during the previous week. That leads to open discussions and questions that help everyone, especially the curious and the beginners, learn more about the field of machine learning. It is time-consuming for those with more experience, who could otherwise focus solely on the competition and now also have to explain and teach. However, like any teacher, you are happy when people understand and improve in their work.

Learn from failure

The competition we are working on was launched by Santander Group, a Spanish bank, to help them identify the value of transactions for each potential customer. One nice property of the data is that no domain knowledge is required, so we can all focus on pre-processing the data and on the machine learning part. By working with Kaggle “Kernels”, code shared by other Kagglers, we were able to reach the top 14%, until a “leak” appeared. In this kind of competition anything can happen, and within a matter of hours you can find yourself at the bottom of the leaderboard. This pseudo-leak is actually a hack that gives a better understanding of the data. Now everyone, including us, is taking advantage of this data hack by working with “Kernels” shared on the forums.

Participating in such a competition lets you learn machine learning faster and, with the help of competent people, see how quickly the field is evolving. As for me, in addition to learning, I can pass my knowledge on to others, which in turn helps me realize that I have much more to learn in this amazing field.