BLOG | Follow TMC Data Science in their Kaggle competition
In this blog we keep you updated on the progress, challenges and wins of the TMC Data Science team in their first Kaggle competition.
KEEP IT SIMPLE. IT WILL BECOME COMPLICATED ANYWAY | Vasia Tsiftsoglou
September 3, 2018 | When I first sat down with the team, I was confronted with the question: what is your reason for doing the competition? The answer is simple: I want to get better at learning. As I sit down today to pen my thoughts, I realize that the Santander challenge has been the best trigger to learn, or in my case, to learn again. Another week has passed, and with it came new lessons. Some are worth sharing. Some are not. Here we go:
Just put in the effort, consistently, day in and day out. Some days it is: “You totally got this”, but more often than not it is: “More effort after 8 hours of work already? Come on now!” But a goal is a goal. So just keep pushing forward, because it pays off in the end.
The data and the chaos
There’s no way around it. Real-world data sets, even those curated by Kaggle, are messy. Mess can be discouraging unless you approach it with a beginner’s mindset. The Santander case has been the perfect test of this hypothesis. Any “domain knowledge” simply went down the drain with missing data and column names that meant literally nothing.
Start off simple. Learn. Repeat.
Neural networks, data leaks, ensemble models, stacked models. Terms that can easily scare away a novice with little knowledge but lots of curiosity. So how do you face this? Just start off with a simple model. Then study. Refine your model, apply what you have learned, and study again. Repeat, again and again. As time goes by, the model becomes more sophisticated and the novice ceases to be one.
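To make “start off simple” a bit more concrete, here is a minimal sketch in Python. It is not the team’s actual code. It fits a trivial baseline that always predicts the training mean, then a slightly richer one-feature linear model, and compares both on a held-out validation split. The data is synthetic and hypothetical, and the helper names (`fit_mean`, `fit_line`, `rmse`) are illustrative only.

```python
import math
import random
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def fit_mean(xs, ys):
    """Simplest possible model: ignore the input, predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """One refinement step: ordinary least squares on a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


# Hypothetical synthetic data: a noisy linear relationship.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [3.0 * x + 2.0 + random.gauss(0, 1) for x in xs]

# Hold out the last 50 points to judge each refinement honestly.
train_x, valid_x = xs[:150], xs[150:]
train_y, valid_y = ys[:150], ys[150:]

scores = {}
for name, fit in [("mean baseline", fit_mean), ("linear model", fit_line)]:
    model = fit(train_x, train_y)
    preds = [model(x) for x in valid_x]
    scores[name] = rmse(valid_y, preds)
    print(f"{name}: validation RMSE = {scores[name]:.3f}")
```

The idea is that each refinement only earns its place if it beats the previous validation score; the same compare-and-refine loop carries over when the simple models are later swapped for more sophisticated ones.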
KAGGLE BRINGS PEOPLE TOGETHER | Jeroen Henrard
August 8, 2018 | This week, we as TMC Data Science Employeneurs continued to fight for our place in the rankings of a Kaggle competition. After the pseudo-leak last week that Romain talked about in his article, we started off again using one of the kernels from the forums. This put us back on equal footing with the other contestants.
LEARNING AND IMPROVING
Since all of the ‘TMC Datathoners’ (the name we gave our group) are at different positions on the learning curve, the reasons everyone joined differ as well. While the more experienced among us do it to stay up to date with the best algorithms available, others joined to learn from their colleagues. This is a perfect example of one of the five pillars of TMC’s Employeneurship model: by working in business cells with their own technical expertise and niche market knowledge, like-minded people work together and valuable knowledge is easily shared.
Team building is a very important aspect as well. Because of the weekly meetings, we now see people coming together and spending time working towards a common goal. Even on weekends, two of the Employeneurs met up for ‘co+(ffee/ding)’ at a local coffee place. Personally, this is one of the things I like about TMC. Since many of the Employeneurs come from outside the Netherlands when they start at TMC, they’re often challenged with building a new social circle. Initiatives like the Kaggle competition help them meet new people and make the transition to a new environment a lot easier.
For me, this has already been a great opportunity to learn. When we first started, competing with data scientists from all over the world seemed very daunting. However, even after our first meeting as a group, I realised the potential our team has. The drive of the more experienced data scientists to help out the other members of the team is the main reason I think this is a great initiative, and I truly believe this is not the last competition we will take part in.
SHOOT FOR THE TOP 10% | Romain Huet
July 26, 2018 | As my colleague Valentin explains in his article, we as the Employeneurs of TMC Data Science are participating in a Kaggle competition. I am helping him organize our participation. Since it is our first competition together, our objective is to submit a collaborative entry and to finish within the top 10%.
Teaching and learning
To do so, every week we gather cheerful and eager Employeneurs around pizza to contribute to this project and learn how to apply machine learning to a real-world problem. Some of us are more experienced, with a strong background in machine learning. Helping the others keep up with what’s happening is therefore a challenge in itself.
In these weekly meetings, everyone shares what they have done during the previous week. That leads to open discussions and questions about the field of machine learning, especially from the curious and the beginners. It takes time for the more experienced members, who could otherwise focus solely on the competition, to explain and teach things to others. However, like any teacher, you are happy when people understand and improve their work.
Learn from failure
The competition we are working on was launched by Santander Group, a Spanish bank, to help them identify the value of transactions for each potential customer. One nice property of the data is that no domain knowledge is required, so we can all focus on pre-processing the data and on the machine learning itself. By working with Kaggle “Kernels” (code shared by other Kagglers), we were able to reach the top 14%, until a “leak” appeared. In this kind of competition anything can happen, and within a matter of hours you can find yourself at the bottom of the leaderboard. This pseudo-leak is actually a hack that gives a better understanding of the data. Now everyone, including us, is taking advantage of this data hack by working with “Kernels” shared on the forums.
Participating in such a competition lets you learn faster about machine learning and see how quickly the field is evolving, with the help of competent people. As for me, in addition to learning, I can teach what I know to others, which in turn helps me realize that I have much more to learn in this amazing field.