Finding Data Engineers is like mining Gold…

Gold is flying high but it is getting harder to mine. Similarly, Data Engineers are in huge demand but everyone is struggling to find good ones.

Businesses are conducting interviews almost every week and still struggling to find talented and experienced people who can be called Data Engineers. Who are these Data Engineers? What makes them special? What skills should they have?

In an ideal Data Engineer, we look for the following experience:

  • SQL: For Data Engineering, SQL knowledge is a must. Strong SQL skills will land you a job pretty quickly.
  • Python: Python is becoming more and more important every day, thanks to some amazing open-source libraries like NumPy, PySpark, Pandas, scikit-learn, Keras, TensorFlow, and others. When you have a lot of data, you will start using distributed architectures, and either Python or Scala could be needed.
  • Cloud: Which of the three major clouds (AWS, Azure, and Google) should I learn? People who want to get into cloud technology ask this all the time. The answer is simple: it depends, and it does not matter. It depends on the organisation you’ll be working with and on your geography. Why does it not matter? Once you learn the cloud concepts and gain some experience with one provider, it is fairly easy to move to the next one.
  • Communication skills: You are not planning to be a political leader or an award-winning actor, but if you can explain a technical concept to a non-technical person, you’ll become a superhero pretty fast.
  • Data Warehouse and Data Engineering Concepts: What are the different steps to ingest data? What is data governance? What are fact and dimension tables? What is a data mart? When should I use a database or a data lake? What is a Slowly Changing Dimension? You should know the answers to all these questions.
  • Distributed architectures: To transform big data, you will have to use distributed processing like Apache Spark. Knowing your way around these architectures and understanding how they work will help you a lot. Hands-on experience with Spark, and familiarity with the vendors that offer it as a service (Databricks, AWS EMR, Azure HDInsight, Google Dataproc, etc.), will be beneficial.
  • Version Control: Working with version control and CI/CD tools is a must in today’s world. Organisations are moving away from long deployment cycles to rolling out new features almost every two weeks. Experience with Bitbucket, GitHub/GitLab, Jenkins, and DevOps is a major requirement.
  • Data Privacy: What should I consider when writing Personally Identifiable Information? What is GDPR? What is the right to be forgotten?
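
To make one of those questions concrete, here is a minimal sketch of a Type 2 Slowly Changing Dimension in plain Python. It is an illustration only, not how any particular warehouse does it: the `CustomerRow` class, the `apply_scd2` function, and the column names are all made up for this example. The idea is that when an attribute changes, you do not overwrite the old row. You close it with an end date and add a new current row, so the history is kept.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    """One version of a customer in a hypothetical dimension table."""
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "this is the current version"


def apply_scd2(history: List[CustomerRow], customer_id: int,
               new_city: str, change_date: date) -> List[CustomerRow]:
    """Return a new history with a Type 2 change applied (hypothetical helper)."""
    # Find the current (still open) version for this customer, if any.
    current = next(
        (r for r in history
         if r.customer_id == customer_id and r.valid_to is None),
        None,
    )

    # Nothing changed: keep the history as it is.
    if current is not None and current.city == new_city:
        return list(history)

    # Close the current version instead of overwriting it.
    updated = [
        replace(row, valid_to=change_date) if row is current else row
        for row in history
    ]

    # Add the new version as the current row.
    updated.append(CustomerRow(customer_id, new_city, change_date))
    return updated


history = [CustomerRow(1, "London", date(2020, 1, 1))]
history = apply_scd2(history, 1, "Berlin", date(2023, 6, 1))
for row in history:
    print(row)
```

After the call, the London row is closed on the change date and a new Berlin row is open, so a query "as of" any date can still find the right city. In a real warehouse this would usually be a SQL merge or a Spark job rather than a Python list, but the logic is the same.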

Looking at this list of just some of the qualities of a Data Engineer, I am sure you are thinking: no wonder we can’t find many!

Points for current Data Engineers and people who aspire to be Data Engineers:

  • Data Scientists need the right data in the right place to perform feature engineering and extract meaningful insights, recommendations, and analysis from it. We need good Data Engineers to provide this useful data. We need you in this gold rush!
  • Major companies are looking for Data Engineers. Given the shortage, you will find a good salary, nice benefits, and recognition.
  • To be a Data Scientist, you need to be good at Mathematics and Statistics. Starting out as a Data Engineer does not close that door: you can still become a Data Scientist later.

We are starting a program soon to hire and train people, especially recent graduates and master’s degree holders, and help them become successful and talented Data Engineers. We encourage women in tech and support gender equality and inclusivity.

Leave a comment if you would like to know more.