Near Alexandria, VA
Created 1mo ago
We are looking for an experienced Data Engineer, preferably with a Python background, who will be responsible for managing ETL/ELT and performant data storage and retrieval. Our projects ingest data from multiple sources, wrangle the disparate data into a unified schema, and then provide a final database or cloud storage layer for reporting efforts by other groups. We are looking for candidates proficient in the related AWS services and databases.
Your primary focus will be the development of server-side logic, data ingestion, data wrangling, and algorithm development. Major technologies include AWS, Python 3, Spark, Pandas, and MySQL.
Skills And Responsibilities
- Development of new RDBMS schemas to handle the addition of new datasets.
- Proficiency with the applicable AWS services.
- Comfort with containerization (Docker, Vagrant, etc.).
- Ability to write intermediate to advanced SQL for data wrangling and reporting efforts.
- Development of Python/Pandas code to wrangle multiple datasets covering a full spectrum of ETL tasks including entity resolution.
- Occasional Linux server management including the review or management of log files, crontab, security configuration, etc.
- Familiarity with machine learning topics to support supervised and unsupervised classification efforts.
- Data exploration, analysis, and reporting skills with an eye towards developing a narrative using Jupyter Notebook.
- Working understanding of REST APIs.
- Developing techniques to work with both tabular and hierarchical data.
The ideal candidate:
- Is motivated by a passion for creating highly fault-tolerant apps with excellent design practices
- Enjoys collaborating with other engineers on architecture and sharing designs with the team
- Has experience collaborating with team members and communicating code patterns
- Interacts with others using sound judgment, good humor, and consistent fairness in a fast-paced environment