The Institute for Health Metrics and Evaluation (IHME) is an independent global health research center at the University of Washington that provides rigorous and comparable measurement of the world’s most important health problems and evaluates the strategies used to address them. IHME makes this information freely available so that policymakers have the evidence they need to make informed decisions about how to allocate resources to best improve population health. IHME’s system administration team is responsible for ongoing development, maintenance, and support of the infrastructure and applications which support and further IHME’s mission.
IHME has an outstanding opportunity for a System Administrator on the IT team. We are looking for someone with a customer service mindset, strong interpersonal skills, a collaborative and results-oriented attitude, commitment, adaptability, and flexibility. You’ll be working with a team of system administrators and engineers to build and maintain a growing infrastructure that supports an active research environment. These systems are vital to the success of the Institute and its mission: to improve the health of the world’s populations by providing the best information on population health. You’ll be working with professors, researchers, and other technical staff to develop what the Institute needs to build better metrics and ultimately improve global health.
This position’s primary role is to support ongoing operations of complex systems, including maintaining datacenter infrastructure, operating systems (primarily CentOS, Ubuntu, and Windows), high-performance computing clusters, and systems applications. This position is contingent upon project funding availability.
Build, Configure, and Maintain Systems
- Build physical and virtual systems using standard build processes.
- Assist in requirements gathering, development, and testing of new systems or system upgrades.
- Configure systems to adhere to enterprise, departmental, and team standards.
- Create new environments as assigned.
- Ensure high availability of systems by developing and using tools to identify problems and opportunities for improvement.
- Regularly collect and trend performance data. Analyze and configure systems for optimal performance.
- Keep systems up-to-date with appropriate patches, firmware, and other fixes. Schedule necessary maintenance for systems, plan for downtime, and notify affected users if applicable.
- Maintain appropriate security access and protection, in compliance with existing security policies and best practices.
- Refresh/recreate test, development, and sandbox environments as required.
- Schedule and plan system upgrades as needed. Communicate outages to affected users as required.
- Manage backup and disaster recovery systems.
- Ensure compliance of system security patches.
- Manage infrastructure software such as Active Directory, Group Policy, DNS, and VPN.
- Implement and manage automated client operating system deployments.
- Implement and manage configuration management.
- Assist cluster users with cluster usage best practices, troubleshooting, and diagnosis.
- Follow IT Services standards for making changes in production environments. This includes, but is not limited to, conducting a production readiness review, following change control processes, participating in a security review, and communicating with stakeholders and affected users.
Hardware Installation and Maintenance
- Rack, cable, inventory, and configure new datacenter equipment.
- Surplus legacy equipment.
- Work with vendors to diagnose and resolve hardware issues.
- Monitor and react to operational failures of datacenter hardware.
Develop and Maintain System Documentation
- Create and update system architecture diagram(s).
- Create and update standard operational procedures.
- Document system configuration.
- Create and manage backup policies.
- Update system inventory after any post-deployment changes. Maintain up-to-date configuration documents.
- Produce operational documentation for technical staff.
- Place documentation and code in appropriate repositories. Communicate updates to team.
- Update documented issues and incidents regularly.
- Keep team informed about pertinent operational processes.
- Assist with standards development and quality improvement.
- Assist with administrative duties such as attending conference calls, equipment tagging and surplus, and storage room cleanup, as needed. Assist helpdesk with escalations of issues as needed.
- Communicate clearly and effectively while contributing as a productive member of the IT team and the Institute as a whole. Work closely with other team members to help them with relevant tasks, show them how to learn new skills, and help resolve emerging problems on different projects.
- Attend relevant meetings, adhere to deadlines, and participate as a vital member to collectively advance team-level objectives.
- Complete all compliance requirements under university and state policies in a timely manner.
- Other duties as assigned
- Bachelor’s degree in Computer Science, Management Information Systems, or a related field, plus 2 years’ experience managing enterprise Linux, VMware, and Windows server and client platforms, or an equivalent combination of education and experience
- Experience in security system design and engineering
- Strong interpersonal and communication skills, including team ethic and relationship building
- Excellent written and verbal communication skills and exceptional decision-making skills
- Self-starter with a demonstrated ability to learn and adopt new methods to support this role
- Ability to deliver quality systems under constant deadline pressures
- Ability to program/script proficiently in a common operating system scripting language
- This position has a physical component. Candidates should be able to lift up to 50 lbs and may need to work in cramped spaces or elevated locations.
Equivalent education/experience will substitute for all minimum qualifications except when there are legal requirements, such as a license/certification/registration.
- Demonstrated track record of innovative solutions deploying and supporting IT initiatives
- Interest in the promotion of global health
- Familiarity with supporting .NET, Java, Bash, Perl, Python, and PHP, and with writing reusable code
- Understanding of security and systems best practices
- Ability to be well organized and detail-oriented
- Interest and aptitude in information technology
Conditions of Employment:
- Expected to participate in a 24x7 on-call rotation with other IHME-IT team members, responding to production system issues.