Image by storyset on Freepik
It’s a great time to break into data engineering. So where do you start?
Learning data engineering can sometimes feel overwhelming because of the number of tools you need to know, not to mention the super intimidating job descriptions!
So if you’re looking for a beginner-friendly introduction to data engineering, this free Data Engineering Course for Beginners, taught by Justin Chau, a developer advocate at Airbyte, is a good place to start.
In about three hours you’ll learn essential data engineering skills: Docker, SQL, analytics engineering, and more. So if you want to explore data engineering and see if it is for you, this course is a great introduction. Now let’s go over what the course covers.
Link to the course: Data Engineering Course for Beginners
The course starts with an intro on why you should consider becoming a data engineer in the first place, which I think is super helpful to understand before diving right into the technical topics.
The instructor, Justin Chau, talks about:
- The need for good-quality data and data infrastructure in ensuring the success of big data projects
- How data engineering roles are growing in demand and pay well
- The business value you can add to the organization as a data engineer facilitating the organization’s data infrastructure
When you’re learning data engineering, Docker is one of the first tools you can add to your toolbox. Docker is a popular containerization tool that lets you package an application, along with its dependencies and configuration, in a single artifact called an image. This way Docker lets you create a consistent and reproducible environment to run all of your applications inside containers.
The Docker module of this course starts with the basics like:
- Docker images
- Docker containers
The instructor then goes on to cover how to containerize an application with Docker, working through the creation of a Dockerfile and the commands to get your container up and running. This section also covers persistent volumes, Docker networking fundamentals, and using Docker Compose to manage multiple containers.
Overall, this module in itself is a good crash course on Docker if you’re new to containerization!
In the next module on SQL, you’ll learn how to run Postgres in Docker containers and then learn the basics of SQL by creating a sample Postgres database and performing the following operations:
- CRUD operations
- Aggregate functions
- Using aliases
- UNION and UNION ALL
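To make these operations concrete, here is a minimal sketch in Python. It is not taken from the course: it assumes an in-memory SQLite database as a stand-in for Postgres (so it runs without Docker), and the `users` and `offices` tables and their rows are hypothetical. The SQL statements themselves use syntax that Postgres also accepts.

```python
import sqlite3


def run_sql_demo():
    """Run CRUD, aggregate, alias, and UNION queries on hypothetical sample data."""
    # Assumption: in-memory SQLite stands in for the course's Postgres container.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Create: two small, made-up tables.
    cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    cur.execute("CREATE TABLE offices (city TEXT)")
    cur.executemany(
        "INSERT INTO users (name, city) VALUES (?, ?)",
        [("Ana", "Lima"), ("Ben", "Oslo"), ("Cy", "Lima"), ("Dee", "Pune")],
    )
    cur.executemany("INSERT INTO offices (city) VALUES (?)", [("Lima",), ("Oslo",)])

    # Update and delete.
    cur.execute("UPDATE users SET city = 'Pune' WHERE name = 'Ben'")
    cur.execute("DELETE FROM users WHERE name = 'Cy'")

    # Read, with an aggregate function (COUNT) and column aliases (AS).
    by_city = cur.execute(
        "SELECT city AS home_city, COUNT(*) AS user_count "
        "FROM users GROUP BY city ORDER BY city"
    ).fetchall()

    # UNION removes duplicate rows; UNION ALL keeps every row from both queries.
    union = cur.execute(
        "SELECT city FROM users UNION SELECT city FROM offices ORDER BY city"
    ).fetchall()
    union_all = cur.execute(
        "SELECT city FROM users UNION ALL SELECT city FROM offices ORDER BY city"
    ).fetchall()

    conn.close()
    return {"by_city": by_city, "union": union, "union_all": union_all}


if __name__ == "__main__":
    for label, rows in run_sql_demo().items():
        print(label, rows)
```

With this sample data, the UNION query returns three distinct cities, while UNION ALL returns all five rows, duplicates included.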
With Docker and SQL foundations, you can now learn to build a data pipeline from scratch. You’ll start by building a simple ELT pipeline that you’ll get to improve throughout the rest of the course.
Also, you’ll see how all the SQL, Docker networking, and Docker Compose concepts that you’ve learned so far come together in building this pipeline that runs Postgres in Docker for both the source and destination.
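As a rough illustration of the ELT pattern (extract, load, then transform inside the destination), here is a small Python sketch. It is not the course's script: it assumes two in-memory SQLite databases in place of the source and destination Postgres containers, and the `films` table, its rows, and the function names are hypothetical.

```python
import sqlite3


def extract_and_load(source, destination):
    """Copy rows from the source to the destination unchanged (the E and L steps)."""
    rows = source.execute("SELECT id, title, rating FROM films").fetchall()
    destination.execute(
        "CREATE TABLE IF NOT EXISTS films (id INTEGER PRIMARY KEY, title TEXT, rating REAL)"
    )
    destination.executemany("INSERT OR REPLACE INTO films VALUES (?, ?, ?)", rows)
    destination.commit()
    return len(rows)


def transform(destination):
    """Derive a new table inside the destination, after loading (the T step)."""
    destination.execute("DROP TABLE IF EXISTS film_ratings")
    destination.execute(
        """
        CREATE TABLE film_ratings AS
        SELECT title,
               CASE WHEN rating >= 4.0 THEN 'good' ELSE 'average' END AS category
        FROM films
        """
    )
    destination.commit()


def run_pipeline():
    # Assumption: in-memory SQLite databases stand in for the two Postgres containers.
    source = sqlite3.connect(":memory:")
    destination = sqlite3.connect(":memory:")

    # Hypothetical source data, for illustration only.
    source.execute("CREATE TABLE films (id INTEGER PRIMARY KEY, title TEXT, rating REAL)")
    source.executemany(
        "INSERT INTO films VALUES (?, ?, ?)", [(1, "Alpha", 4.5), (2, "Beta", 3.0)]
    )
    source.commit()

    loaded = extract_and_load(source, destination)
    transform(destination)
    result = destination.execute(
        "SELECT title, category FROM film_ratings ORDER BY title"
    ).fetchall()

    source.close()
    destination.close()
    return loaded, result


if __name__ == "__main__":
    print(run_pipeline())
```

The point of the ordering is that the raw data lands in the destination first, so the transformation can later be rewritten or rerun without touching the source.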
The course then proceeds to the analytics engineering part, where you’ll learn to use dbt (data build tool) to organize your SQL queries as custom data transformation models.
The instructor walks you through getting started with dbt: installing the required adapter and dbt-core, and setting up the project. This module specifically focuses on working with dbt models, macros, and Jinja. You’ll learn how to:
- Define custom dbt models and run them on top of the data in the destination database
- Organize SQL queries as dbt macros for reusability
- Use Jinja in dbt to add control structures to SQL queries
So far, you’ve built an ELT pipeline that runs only when you trigger it manually. But you certainly need some automation, and the simplest way to get it is to define a cron job that runs automatically at a specific time of the day.
So this super short section covers cron jobs. But data orchestration tools like Airflow (which you’ll learn in the next module) give you more granular control over the pipeline.
To orchestrate data pipelines, you’ll use open-source tools such as Airflow, Prefect, Dagster, and the like. In this section you’ll learn how to use the open-source orchestration tool Airflow.
This section is more extensive than the previous sections because it covers everything you need to know to get up to speed and write Airflow DAGs for the current project.
You’ll learn how to set up the Airflow webserver and the scheduler to schedule jobs. Then you’ll learn about Airflow operators: Python and Bash operators. Finally, you’ll define the tasks that go into the DAGs for the example at hand.
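If the idea of a DAG (directed acyclic graph) of tasks is new, the following toy sketch may help. It deliberately does not use Airflow's API: it relies only on Python's standard-library `graphlib` module (Python 3.9+) to show the core idea that each task runs only after its upstream tasks finish. The task names and the `run_dag` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, upstream):
    """Run each task once, only after all of its upstream tasks have run.

    tasks:    maps a task name to a zero-argument callable
    upstream: maps a task name to the set of task names it depends on
    """
    # static_order() yields every task after all of its dependencies.
    order = list(TopologicalSorter(upstream).static_order())
    for name in order:
        tasks[name]()
    return order


# Hypothetical pipeline: load the data, transform it, then send a notification.
log = []
tasks = {
    "extract_load": lambda: log.append("ran extract_load"),
    "transform": lambda: log.append("ran transform"),
    "notify": lambda: log.append("ran notify"),
}
upstream = {
    "extract_load": set(),
    "transform": {"extract_load"},
    "notify": {"transform"},
}

if __name__ == "__main__":
    print(run_dag(tasks, upstream))
    print(log)
```

In Airflow itself, tasks are typically declared with operators inside a DAG definition, and the scheduler decides when each run happens; this sketch only mirrors the dependency-ordering part.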
In the last module, you’ll learn about Airbyte, an open-source data integration/movement platform that lets you connect more data sources and destinations with ease.
You’ll learn how to set up your environment and see how you can simplify the ELT process using Airbyte. To do so, you’ll modify the existing project’s components (the ELT script and the DAGs) to integrate Airbyte into the workflow.
I hope you found this review of the free data engineering course helpful. I enjoyed the course, especially its hands-on approach of building and incrementally improving a data pipeline instead of focusing only on theory. The code is also available for you to follow along. So, happy data engineering!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more.