Data Engineer

TS/SCI with poly required

The tech stack on this team is broad and includes Python (Pandas, NumPy, SciPy, scikit-learn, the standard library, etc.), Python packages that wrap machine learning (packages for NLP, object detection, etc.), Linux, AWS/C2S, Apache NiFi, Spark, PySpark, Hadoop, Kafka, Elasticsearch, Solr, Kibana, Neo4j, MariaDB, Postgres, Docker, Puppet, and many others.

Work on this program takes place in Chantilly, VA; McLean, VA; and various field offices throughout Northern VA (remote work cannot be supported) and requires a TS/SCI + Polygraph clearance.

THE ROLE

  • The Data Engineer supports enterprise Extract, Transform, and Load (ETL) activities to deliver data to both humans and systems. Responsibilities include:
  • Move structured and unstructured data using approved methods.
  • Execute data ingestion activities to store data in a local or enterprise-level location.
  • Develop code to format data in support of exploration.
  • Analyze source data formats and work with Data Scientists and partners to determine the formats and transforms that best meet mission objectives.
  • Develop code and tools that provide one-time and ongoing data extraction from various repositories, with formatting and transformation into enterprise or standalone data models.
  • Develop new ETL code, and perform O&M and enhancements on existing ETL code, using best practices and standards.
  • Develop and deliver documentation for each project, including ETL mappings, a code use guide, and code location and access instructions.

Required Skills

  • Extract, Transform and Load (ETL) tools and processes
  • Python, PySpark, PyTorch
  • AWS
  • SQL
  • APIs
  • Linux
  • Geospatial tools/data

Desired Skills (Optional)

  • Knowledge of agile methodologies and experience delivering on agile teams (participation in scrum ceremonies and PI Planning)
  • Docker, Jenkins, Hadoop/Spark, Kibana, Kafka, NiFi, Elasticsearch

Apply now

  • Accepted file types: PDF, DOCX. Max. file size: 2 GB.