Near Saint Paul, MN
About the Data Platform Team
The Data Validation team at Syapse will build the framework and quality measures that support data quality activities and initiatives company-wide. This team will provide the quality data that empowers life science companies and hospitals to make strategic decisions.
We’re making meaningful progress with precision medicine (and were awarded a 2018 Most Innovative Company award for our efforts) and welcome talented, mission-driven engineers to join our cause.
About the role
We are looking for a data validation engineer who enjoys solving complex data validation and quality problems. You will have a unique opportunity to help build new data validation infrastructure and processes that enable us to deliver insights that drive precision medicine forward. Your work will enable us to build and deliver innovative, high-quality analytics pipelines, data visualizations, and other services within the platform.
What does the Data Validation Engineer do?
- Implement the data validation framework and strategy for the Data Platform pipeline.
- Develop and maintain data pipeline automation frameworks that run repeatable regression validation tests on data quality.
- Work closely with engineering and data science on ongoing monitoring of data quality throughout the entire data pipeline.
- Work with data scientists, development, product management, data collection teams, and customer support teams to gather and verify requirements for the validation strategy.
- Write code to triage and evaluate any data quality degradation or unexpected behavior in the data pipeline.
- Document approaches for data validation, make recommendations, and maintain that documentation.
- Curate and manage extensive validation and training data sets.
What you bring to the table
- 2+ years of experience in building or working within a data validation ecosystem to identify and address data quality issues.
- Experience with ETL or data-platform-based testing strategies.
- Experience with various data validation tools and frameworks, such as Informatica.
- Good understanding of database concepts and data modeling.
- Hands-on experience with two or more of:
- NoSQL databases,
- Skills with Python or a similar language.
- Experience with AWS-based services and applications.
- You enjoy collaborating across teams and work well in a cross-functional environment.
- You thrive working in a fast-paced and dynamic environment.
Bonus points if you
- Understand clinical and/or molecular data, especially in the context of cancer.
- Have experience with Healthcare Interoperability Standards such as FHIR and other HL7 and EDI standards.
- Have experience validating, deploying, and productionizing machine learning models.
- Have experience working in a regulated industry (e.g., finance, healthcare).
- Have experience with deep learning, pattern recognition, or data mining.
- Have experience with CI tools such as Jenkins or CircleCI.
- Have experience with service-oriented architecture (e.g., microservices).
- Have experience with containerization technologies such as Docker and Kubernetes.