The Stanford Center for Genomics and Personalized Medicine (SCGPM) has an exciting opportunity available for a motivated Biomedical Cloud Engineer to create innovative data architectures that will automate the process of turning big genomic data into biomedical insights. The ideal person for this position is a keen listener who can interpret biological questions, assess the value and relevance of different technologies and methods, and deliver actionable technical solutions.
Background:
The Department of Veterans Affairs (VA) has commissioned the sequencing of hundreds of thousands of whole genomes from participants in the Million Veteran Program (MVP) [https://www.mvp.va.gov/]. This data is currently being delivered to the SCGPM’s cloud computing environment and constitutes one of the largest repositories of whole-genome sequencing data in the world. The scale and richness of this data make it an incredible resource for biomedical research. Our goal is to turn this data lake into a data commons: a dynamic computing environment where researchers bring questions and get answers, all without having to go through the ordeal of manually collecting, cleaning, massaging, scrubbing, sorting, transforming, and filtering data.
As an example of a publication from this group, see this reference describing the early design of our data processing system:
Ross, P.B., Song, J., Tsao, P.S. et al. Trellis for efficient data and task management in the VA Million Veteran Program. Scientific Reports 11, 23229 (2021). https://doi.org/10.1038/s41598-021-02569-5
Position:
In this position, you would be the system developer of Trellis, the cloud-based data management system that we have created for the MVP. Trellis stores the petabytes of sequence data contributed to the MVP by veterans and orchestrates its processing, keeping track of which programs were used and maintaining a detailed record of data provenance.
To manage the enormous volumes of biomedical research data that the MVP generates, we have built Trellis on the Google Cloud Platform. The Trellis architecture takes advantage of many serverless cloud services, such as Cloud Functions, Dataproc, Cloud SQL, and Pub/Sub, to build a workflow that responds to the arrival of new data by initiating pipeline processes automatically and at scale.
A production version of Trellis has already processed the whole-genome sequences of over 150,000 veterans, and we plan to process at least as many more in the coming year. You would take the lead in keeping this production system running and optimized, and you would interface with our SecOps team, which maintains that system in a FedRAMP-secure environment.
Our Team:
Our SCGPM bioinformatics team is a multi-disciplinary group composed of about a dozen scientists, engineers, and software developers with complementary backgrounds, each contributing their own expertise in managing and analyzing complex biomedical data [http://med.stanford.edu/gbsc/scgpm-team.html]. Other projects supported by this team include the NCI Human Tumor Atlas Network, Human BioMolecular Atlas Program, and the Stanford Metabolic Health Center.
This position can be on-site in Palo Alto, fully remote, or hybrid.
Duties include:
* - Other duties may also be assigned.
DESIRED QUALIFICATIONS:
Four-year degree in Genetics, Computer Science, Bioinformatics, Computational Physics, or a related field
Experience with biomedical data formats (FASTQ, FASTA, BAM, CRAM, Hail MatrixTable, etc.)
Comfortable programming in Python
Experience with cloud computing, especially Google Cloud
Experience with databases, especially graph databases
Experience with big data technologies (e.g., BigQuery, Spark, Hail, Terra)
Familiarity with issues in computer data security
Familiarity with FedRAMP cloud security
Familiarity with FAIR principles of data management
Excellent verbal and written communication skills
An ability to independently grasp the objectives of research projects and assemble solutions from a range of technologies, standards, and approaches
A desire to learn new methods and technologies and to adapt to the demands of fast-paced research
EDUCATION & EXPERIENCE (REQUIRED):
Bachelor's degree and five years of relevant experience, or a combination of education and relevant experience.
KNOWLEDGE, SKILLS AND ABILITIES (REQUIRED):
CERTIFICATIONS & LICENSES:
None
PHYSICAL REQUIREMENTS*:
Constantly perform desk-based computer tasks.
Frequently sit, grasp lightly/fine manipulation.
Occasionally stand/walk, writing by hand.
Rarely use a telephone, lift/carry/push/pull objects that weigh up to 10 pounds.
* - Consistent with its obligations under the law, the University will provide reasonable accommodation to any employee with a disability who requires accommodation to perform the essential functions of his or her job.
WORKING CONDITIONS:
May work extended hours, evenings and weekends.
WORK STANDARDS (from JDL):
The job duties listed are typical examples of work performed by positions in this job classification and are not designed to contain or be interpreted as a comprehensive inventory of all duties, tasks, and responsibilities. Specific duties and responsibilities may vary depending on department or program needs without changing the general nature and scope of the job or level of responsibility. Employees may also perform other duties as assigned.
Consistent with its obligations under the law, the University will provide reasonable accommodations to applicants and employees with disabilities. Applicants requiring a reasonable accommodation for any part of the application or hiring process should contact Stanford University Human Resources at stanfordelr@stanford.edu. For all other inquiries, please submit a contact form.
Stanford is an equal employment opportunity and affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected by law.