Over the last year, I taught myself data science. I learned from hundreds of online resources and studied 6–8 hours every day, all while working for minimum wage at a day-care.
My goal was to start a career I was passionate about, despite my lack of funds.
Because of this choice, I have accomplished a lot over the last few months. I published my own website, was published in a major online data science publication, and was given scholarships to a competitive computer science graduate program.
In the following article, I give guidelines and advice so you can make your own data science curriculum. I hope to give others the tools to begin their own educational journey, so they can work towards a more passionate career in data science.
A Quick Note
When I say “data science”, I am referring to the collection of tools that turn data into real-world actions. These include machine learning, database technologies, statistics, programming, and domain-specific technologies.
A few resources to start out your journey.
The internet is a chaotic mess. Learning from it can often feel like drinking from the fun end of a fire-hose.
There are simpler alternatives that offer to sort the mess for you.
Sites like Dataquest, DataCamp, and Udacity all offer to teach you data science skills. Each creates an education program that shepherds you from topic to topic, and each requires little course-planning on your part.
The problem? They cost too much, they don’t teach you how to apply concepts in a job setting, and they prevent you from exploring your own interests and passions.
There are free alternatives like edX and Coursera, which offer one-off courses diving into specific topics. If you learn well from videos or a classroom setting, these are excellent ways to learn data science.
Check out this website for a listing of available data science courses. There are also a few free course curricula you can use. Check out David Venturi’s post, or the Open Source DS Masters (a more traditional education plan).
If you learn well from reading, look at the Data Science From Scratch book. This textbook is a full learning plan that can be supplemented with online resources. You can find the full book online or get a physical copy from Amazon ($27).
These are just a few of the free resources that provide a detailed learning path for data science. There are many more.
To better understand the skills you need to acquire on your educational journey, in the next section I detail a broader curriculum guideline. This is intended to be high-level, and not just a list of courses to take or books to read.
A Curriculum Guideline
Python Programming
Programming is a fundamental skill for data scientists. Get comfortable with the syntax of Python. Understand how to run a Python program in several different ways (Jupyter notebook vs. command line vs. IDE).
Hint: Keep an ear out for common problem-solving techniques used by programmers. (pronounced “algorithms”)
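To make "algorithm" concrete, here is a minimal sketch of one classic example, binary search, which finds a value in a sorted list by repeatedly halving the range it looks at. The function name `binary_search`, the file name `search.py`, and the sample list are my own illustrative choices, not part of any particular course.

```python
def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2          # middle of the remaining range
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            low = mid + 1                # target can only be in the right half
        else:
            high = mid - 1               # target can only be in the left half
    return -1


if __name__ == "__main__":
    # Made-up sample data; the list must already be sorted.
    numbers = [2, 3, 5, 8, 13, 21]
    print(binary_search(numbers, 8))
    print(binary_search(numbers, 4))
```

If you save this as `search.py` (a hypothetical file name), you can run it with `python search.py` on the command line, paste it into a Jupyter notebook cell, or run it from an IDE. Trying all three is a good way to get comfortable with each environment.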
Statistics & Linear Algebra
These are prerequisites for machine learning and data analysis. If you already have a solid understanding, spend a week or two brushing up on key concepts.
Focus especially hard on descriptive statistics. Being able to understand a data set is a skill worth its weight in gold.
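As a rough illustration of what descriptive statistics look like in practice, the sketch below summarizes a small list of numbers using only Python's built-in `statistics` module. The helper name `describe` and the sample values are made up for this example.

```python
import statistics


def describe(values):
    """Summarize a list of numbers with a few common descriptive statistics."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),   # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Made-up sample data, for illustration only.
minutes_studied = [30, 45, 45, 60, 90]
summary = describe(minutes_studied)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Comparing the mean with the median is a quick first check for skew: here the single large value pulls the mean above the median.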
Numpy, Pandas, & Matplotlib
Learn how to load, manipulate, and visualize data. Mastery of these libraries will be crucial to your personal projects.
Quick hint: Don’t feel like you have to memorize every method or function name; that comes with practice. If you forget, Google it.
Remember, the only way you will learn these libraries is by using them!
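Here is a small sketch of the load, manipulate, visualize loop using all three libraries. The data is made up and built in memory so the example runs on its own; in a real project you would more likely start from a file with `pd.read_csv`. The column names and the output file name are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # draw to a file so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Load: made-up data standing in for a CSV file.
df = pd.DataFrame({
    "city": ["Austin", "Austin", "Dallas", "Dallas", "Houston"],
    "temp_f": [95.0, 99.0, 101.0, np.nan, 93.0],
})

# Manipulate: drop the missing reading, convert units, aggregate by city.
df = df.dropna(subset=["temp_f"])
df["temp_c"] = (df["temp_f"] - 32) * 5 / 9
mean_by_city = df.groupby("city")["temp_c"].mean()
print(mean_by_city)

# Visualize: a simple bar chart saved to disk.
ax = mean_by_city.plot(kind="bar", title="Mean temperature by city")
ax.set_ylabel("Temperature (°C)")
plt.tight_layout()
plt.savefig("mean_temp_by_city.png")
```

In a Jupyter notebook you can skip the `matplotlib.use("Agg")` line and the `savefig` call and let the chart render inline instead.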
Machine Learning
Learn the theory and application of machine learning algorithms. Then apply the concepts you learn to real-world data that you care about.
Most beginners start by working with toy data-sets from the UCI ML Repository. Play around with the data and go through guided ML tutorials.
The Scikit-learn documentation has excellent tutorials on the application of common algorithms. I also found this podcast to be a great (and free) educational resource behind the theory of ML. You can listen to it on your commute or while working out.
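To give a feel for what a first guided exercise might look like, here is a minimal sketch that trains a classifier on the Iris data set, a classic toy data set that ships with scikit-learn. The choice of a k-nearest-neighbors model, the 75/25 split, and `n_neighbors=5` are my assumptions for illustration; any beginner-friendly classifier would do.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the bundled toy data set: 150 flowers, 4 measurements each.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data so the model is scored on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a simple model and measure how often it predicts the right species.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

Once this pattern (split, fit, predict, score) feels familiar, swapping in your own data is mostly a matter of replacing the first step.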
Getting a job means being able to take real-world data and turn it into action.
To do this you will need to learn how to use a business’ computational resources to get, transform, and process data.
This is the most under-taught part of the data science curriculum, mainly because the specific tools you use depend on the industry you are going into.
However, database manipulation is a required skill set. You can learn how to manipulate databases with code on ModeAnalytics or Codecademy. You can also implement your own database (cheaply) on DigitalOcean.
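If you want to practice before setting up a hosted database, Python's built-in `sqlite3` module lets you run SQL against a throwaway in-memory database. The sketch below is one possible starting point; the `movies` table, its columns, and the rows are invented for illustration.

```python
import sqlite3

# An in-memory database disappears when the connection closes,
# which makes it convenient for practice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER, rating REAL)")

# Made-up rows; the ? placeholders let sqlite3 fill in the values safely.
rows = [
    ("Movie A", 2017, 7.5),
    ("Movie B", 2017, 6.5),
    ("Movie C", 2018, 8.0),
]
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", rows)

# Aggregate with SQL: number of movies and average rating per year.
query = """
    SELECT year, COUNT(*), AVG(rating)
    FROM movies
    GROUP BY year
    ORDER BY year
"""
result = conn.execute(query).fetchall()
for year, count, avg_rating in result:
    print(year, count, avg_rating)

conn.close()
```

The same `SELECT ... GROUP BY` pattern carries over to larger database systems, although connection details and some syntax will differ.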
When considering what other technologies to learn, it is important to think about your interests and passions. For example, if you are interested in web development, then look into the tools used by companies in that industry.
Advice for executing your curriculum.
1. Concepts will come at you faster than you can learn them.
There are literally thousands of web pages and forums explaining the use of common data science tools. Because of this, it is very easy to get side-tracked while learning online.
When you start researching a topic you need to hold your goal in mind. If you don’t, you risk getting caught up in whatever catchy link draws your eye.
The solution: get a good storage system to save interesting web resources. This way you can save material for later and focus on the topic that is relevant to you at the moment.
My current Chrome Bookmarks Bar
If you do this right, you can make an ordered learning path that shows you what you should be focused on. You will also learn faster and avoid being distracted.
Warning: your reading list will quickly grow into the hundreds as you explore new topics that interest you. Don’t worry, this leads us to my second piece of advice.
2. Don’t stress. It’s a marathon, not a sprint.
Having a self-driven education can often feel like trying to read a never-ending library of knowledge.
If you’re going to be successful in data science you need to think of your education as a lifelong process.
Just remember, the process of learning is its own reward.
Throughout your educational journey, you will explore your interests and discover more about what drives you. The more you learn about yourself, the more enjoyment you will get out of learning.
3. Learn -> Apply -> Repeat
Don’t settle for just learning a concept and then moving to the next thing. The process of learning doesn’t stop until you can apply a concept to the real world.
Not every concept needs to have a dedicated project in your portfolio. But it is important to stay grounded and remember that you are learning so you can make an impact in the world.
4. Build a portfolio. It shows others they can trust you.
When it comes down to it, skepticism is one of the biggest obstacles you will face when learning data science.
This may come from others, or it may come from yourself.
Your portfolio is your way of showing the world that you are capable and confident in your own skills.
Because of this, building a portfolio is the single most important thing you can do while studying data science. A good portfolio can land you a job and make you a more confident data scientist.
Fill your portfolio with projects that you are proud of.
Did you build your own web app from scratch? Did you make your own IMDB database? Have you written an interesting data analysis of healthcare data?
Put it in your portfolio.
Just make sure write-ups are readable, the code is well documented, and the portfolio itself looks good.
This is my portfolio. A simpler way to publish your portfolio is to create a GitHub repository that includes a great README (summary page) as well as relevant project files.
5. Data Science + _______ = A Passionate Career
Fill in the blank.
Data science is a set of tools intended to make a change in the world. Some data scientists build computer vision systems to diagnose medical images, others traverse billions of data entries to find patterns in website user preferences.
The applications of data science are endless. That’s why it is important to find the applications that excite you.
If you find topics that you are passionate about, you will be more willing to put in the work to make a great project. This leads to my favorite piece of advice in this article.
When you are learning, keep your eyes open for projects or ideas that excite you.
Once you have spent time learning, try to connect the dots. Find similarities between projects that fascinate you. Then spend some time researching industries that work on those types of projects.
Once you find an industry that you are passionate about, make it your goal to acquire the skills and technical expertise needed in that business.
If you can do this, you will be primed to turn your hard work and dedication to learning into a passionate and successful career.
If you love making discoveries about the world, if you are fascinated by artificial intelligence, then you can break into the data science industry no matter what your situation is.
It won’t be easy.
To motivate your own education you will need perseverance and discipline. But if you are the type of person who can push yourself to improve, you are more than capable of mastering these skills on your own.
After all, that’s what being a data scientist is all about: being curious, self-driven, and passionate about finding answers.
Original article written by Harrison Jansma on Medium, follow him at Towards Data Science. Thanks again for letting us post this, Harrison!