Why should all Data Scientists use Stack Exchange?

Nicolas MARTIN
Published in Nerd For Tech
6 min read · Jan 3, 2023

I applied the methods from “How to solve complex problems efficiently” on the Data Science Stack Exchange, and I ranked #3 for 2022!

Stack Exchange is a Question and Answer service for finding solutions to complex problems with the help of other Data Scientists, Researchers, and Engineers.

Here are several reasons why using Stack Exchange can greatly help you in any Data Science project.

Photo by John Cameron on Unsplash

1. It is a perfect point for your resume

Stack Exchange is a mix of altruism and problem-solving, which companies always appreciate. In addition to that, they can directly see how talented you are by clicking on your answers. This is a very transparent and honest way to evaluate anyone’s skills.

2. You improve your problem-solving skills

Most Data Science project failures are due to a lack of skills. Data Science is a vast field, and one problem can have many solutions.

Most of the top 20 Stack Exchange users are senior data scientists, which is no coincidence! They know they improve their problem-solving skills by solving new problems from different angles. I have learned a lot from them.

Photo by Crystal Kwok on Unsplash

3. You know the current challenges in production

Stack Exchange users apply Data Science algorithms across a wide range of businesses and industries. Helping them with their problems exposes you to new, concrete scenarios and improves your Data Science skills. You also learn the state of the art in Data Science, both its challenges and its solutions, because many of the Data Scientists who answer keep up to date with the latest algorithms and often propose innovative solutions.

Richard Feynman wrote, “Science is the belief in the ignorance of experts.” He explained that “science is in the making, belongs to the (unknown, yet to be discovered) future, while expertise is based on the past, with in-built obsolescence.”

We had better become “making” Data Scientists!

4. You help others

Most people put themselves forward on social networks to be competitive and show their skills. This is not bad at all, but rather than focusing on your ego, why not use your skills to help others? Many of my posts were not accepted or upvoted, and some were even downvoted, and that is normal! Giving without expecting a reward is always satisfying, even in scientific domains.

Photo by Milad Fakurian on Unsplash

5. You don’t have to spend a lot of time

I also wanted to participate in Data Science challenges like those on Kaggle, but I have many ongoing projects. Challenges require several days, and they are very competitive. By contrast, spending 30 minutes per day or even 1 hour per week on Stack Exchange is enough to score well and keep your Data Science knowledge current, which takes far less time than participating in challenges.

6. ChatGPT cannot solve many Data Science problems

Since ChatGPT was released, I have often used it to work on problems, including Stack Exchange ones. However, ChatGPT is more of a problem helper than a problem solver. Its answers are generally correct, but when I dive into them, a few things usually need correcting. Consequently, there are many Data Science problems that ChatGPT cannot solve, and there are still plenty of opportunities for Data Scientists to solve problems.

Photo by Tyler Lastovich on Unsplash

7. In asking a question clearly, you have almost solved it

This is more complex than it might seem. The Data Science Stack Exchange already provides several tips for asking a question clearly and increasing the chances of getting an answer. By following these guidelines, you reformulate the problem clearly and thoroughly. This is an excellent exercise in problem-solving because you set boundaries and focus on the real origin of the problem. While following those guidelines, I have often abandoned my question because I discovered new clues as I assessed the situation.

8. Fun fact: Yoshua Bengio’s answer was not validated

In August 2022, a complex question was posted: How does “A Neural Probabilistic Language Model” learn good word vectors?

“A Neural Probabilistic Language Model” is one of the essential publications in Natural Language Processing, as many of its fundamentals are applied in modern technologies like ChatGPT.

The user wanted to understand precisely how the model learns word vectors that relate words to each other correctly and lead to meaningful predictions.

I could not fully answer with my limited skills in Natural Language Processing, so I tried a tip from “How to solve complex problems efficiently”: Ask masters. I sent an email to the team who wrote the publication, including Yoshua Bengio (who had recently received the Princess of Asturias Award), and he answered:

The reason why similar words end up having similar word embeddings is because of the smoothness of the neural net that takes these word embeddings in input. If “The cat is walking in the — — “ can be completed by “room”, it is also true when we replace “cat” by “dog”, which puts pressure on both words to have similar word embeddings. Small change of the embeddings = small change in the output probabilities.

If you think about it, the architecture is very similar to a 1-D convolutional neural network:

  • the matrix C corresponds to a usual dot-product neural operation when the input is a one-hot vector for the word at each position (with a 1 at the position corresponding to the word symbol)
  • using the same matrix C at every position makes sense and proceeds of the same inductive bias as in 1-D convolutional neural networks: the meaning of a word is position invariant (i.e. if we only know that a word appeared at position 3 vs 4, the meaning does not change).

The idea of using such layers, with shared weights across different positions, is found not just in 1-D convnets and time-delay neural net (which pre-existed the NNLM, and which I worked on in my 1991 PhD thesis), but also in neural nets operating on symbols (which were explored among others by Geoff Hinton and his student Paccanaro a few years earlier, cited in the paper).
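
To make the two bullet points above concrete, here is a minimal sketch in plain Python. It is my own illustration, not part of the quoted answer or of the paper's code. The toy vocabulary, the embedding size of 3, the random values in `C`, and the helper names (`one_hot`, `vector_times_matrix`, `embed`) are all assumptions chosen for readability; in the real model, `C` is learned during training.

```python
import random

random.seed(0)

# Toy vocabulary (hypothetical, for illustration only).
vocab = ["the", "cat", "dog", "is", "walking", "in", "room"]
word_to_index = {word: i for i, word in enumerate(vocab)}

embedding_dim = 3  # hypothetical size, chosen only for readability
# C has one row per word; here the values are random, in the model they are learned.
C = [[random.uniform(-1.0, 1.0) for _ in range(embedding_dim)] for _ in vocab]


def one_hot(index, size):
    """Vector of zeros with a single 1 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vector_times_matrix(vector, matrix):
    """Plain dot product of a row vector with a matrix."""
    n_cols = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vector, matrix)) for j in range(n_cols)]


def embed(context):
    """Apply the same matrix C at every position and concatenate the results."""
    features = []
    for word in context:
        x = one_hot(word_to_index[word], len(vocab))
        features.extend(vector_times_matrix(x, C))
    return features


# Multiplying a one-hot vector by C simply selects the row of C for that word.
cat = word_to_index["cat"]
assert vector_times_matrix(one_hot(cat, len(vocab)), C) == C[cat]

# Because C is shared, "cat" gets the same vector at position 0 and at position 1.
first = embed(["cat", "is"])
second = embed(["the", "cat"])
assert first[:embedding_dim] == second[embedding_dim:]
```

The concatenated output of `embed` is what the rest of the network would take as input. Since that network is smooth, two words whose rows in `C` are close produce close output probabilities, which is the pressure toward similar embeddings described in the answer.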

I thanked him a lot for his time because, generally, the most outstanding scientists like Yoshua Bengio are very busy.

Surprisingly, the answer was not validated. Could the user have doubted the answer’s authenticity?
Anyone with good Data Science skills would recognize the quality of the answer. The answer is obviously from someone who excels in that topic; even ChatGPT would never answer like this!

Whether or not the answer was validated does not matter much, since I am not its author. What matters is that it remains helpful to the many Data Scientists who want to understand how word vectors work.

Conclusion

As an entrepreneur, I have to make progress on many projects involving many skills, mainly coding. I don’t want to lose my skills in Data Science or in problem-solving, and I don’t have much time. The best solution I have found is the Data Science Stack Exchange, and I will continue to answer questions there. I hope this guide will encourage many Data Scientists to participate in the Data Science Stack Exchange community: the more, the merrier :)

Main Stack Exchange services related to Data Science:

Data Science

Cross Validated

Artificial Intelligence

Mathematics

Nicolas MARTIN
Nerd For Tech

Full Stack Data Scientist. Topics: Deep learning, mathematics, manufacturing engineering, history. Creator of https://www.airoomstyles.com