Platform Engineer
Information Technology | Hybrid in Dallas, TX | Contract
At Radiant Digital, we provide IT solutions and consulting services to help government agencies and businesses in the USA, Canada, the Middle East, and Southeast Asia. On the federal side, we support agencies like NASA, the Department of State (DOS), the IRS, ACL, ACF, USDA, and many others, along with numerous state and local government agencies.
We work with industries like telecom, healthcare, entertainment, and oil and gas, offering solutions designed to meet their specific needs. We focus on improving systems, making better use of data, and updating applications to keep up with changing markets.
Position: Compute Platform Engineer
Duration: 12+ months
Location: Hybrid - Dallas, TX
Job Description:
We are seeking a highly skilled and motivated Engineer to join our Compute Platform Management team. In this role, you will take ownership of the reliability and operational excellence of our high-performance computing infrastructure, which underpins our firm’s research and production workloads.
As a Compute Platform Engineer, you will be responsible for identifying and resolving hardware issues, coordinating with vendors, and ensuring that compute nodes (CPU and GPU) maintain peak performance. This contract role is ideal for someone who thrives in technically demanding environments and is eager to contribute to the continuous evolution of our compute platform.
The ideal candidate will have the following skills and experience:
· 3+ years of hands-on experience in a data center environment supporting large-scale compute platforms
· Proficiency with HPE server infrastructure, such as ProLiant and Apollo, and NVIDIA GPUs, including A100 and H200
· Solid understanding of server architecture, including UEFI/BIOS, PCIe devices and out-of-band management systems, such as iLO and BMC
· Proven ability to resolve complex hardware issues and manage vendor relationships
· Familiarity with automation tools such as Ansible, Terraform and CI/CD systems
· Working knowledge of Linux in high-performance or latency-sensitive environments
· Working knowledge of basic network concepts, such as DNS, DHCP, VLANs, switching and routing
· Basic working knowledge of Kubernetes and OpenStack technologies (preferred but not required)
· Experience with data center operations and process adherence
· Excellent communication and coordination skills with cross-functional teams and external partners