Lustre QA Architect

DataDirect Networks
United States
Oct 03, 2025

Job Locations: US-Remote
Job ID: 2025-5426
Country: United States
City: Remote
Worker Type: Regular Full-Time Employee

Overview

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." - IDC

"The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments" - Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.

Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management.

Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage.



Job Description

We are seeking a Lustre QA Architect with deep experience in Distributed Storage and High-Performance Computing (HPC) to lead our system validation and test architecture efforts. In this role, you will be responsible for defining QA strategies that ensure the correctness, performance, and resilience of storage solutions designed for petabyte-scale data and HPC environments. You'll work closely with engineering, system architects, and performance teams to validate functionality and optimize reliability in massively parallel workloads.

Key Responsibilities:

  • Lead the validation of distributed storage systems (e.g., Lustre, GPFS/Spectrum Scale, BeeGFS, GlusterFS).
  • Architect scalable test frameworks and automation pipelines to validate storage performance, throughput, IO behavior, and system reliability at scale.
  • Design test plans that cover key areas such as metadata operations, object lifecycle, parallel IO, file system consistency, and failure scenarios.
  • Lead performance benchmarking using industry-standard tools and custom workloads (e.g., IOR, MDTest, FIO, Vdbench).
  • Validate integration with HPC compute clusters, schedulers (e.g., Slurm), and storage tiers (e.g., NVMe, SSD, HDD).
  • Simulate large-scale distributed environments and execute fault-injection and resilience testing.
  • Collaborate with product managers, architects, and DevOps teams to ensure test coverage across CI/CD pipelines and production-like environments.
  • Mentor QA engineers in automation development, performance validation, and HPC-specific debugging techniques.
  • Analyze test data, identify trends, bottlenecks, or regressions, and communicate findings clearly to engineering stakeholders.
  • Design and implement automated test cases using BDD frameworks such as Cucumber (with Gherkin feature files) or similar.
  • Develop test automation scripts and test utilities in Rust.

Required Skills and Qualifications:

  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related field.
  • 8+ years of experience in software QA or systems testing, with 3+ years in a QA lead or technical lead role.
  • Experience with HPC workloads and environments, including MPI, high-throughput clusters, InfiniBand, and RDMA.
  • Strong understanding of POSIX file systems, object storage interfaces (e.g., S3), and parallel file systems.
  • Proficiency in automation and scripting (Python, Bash, Rust).
  • Hands-on experience with storage benchmarking and profiling tools: IOR, MDTest, FIO, Vdbench, Perf, iostat, collectl.
  • Familiarity with CI/CD tools and infrastructure-as-code (e.g., Jenkins, GitLab CI, Ansible, Terraform).
  • Solid understanding of system-level debugging and analysis tools.
  • Strong communication skills and ability to lead cross-functional quality initiatives.
  • Ability to collaborate with development teams to understand specifications.

Preferred Qualifications:

  • Experience working with large-scale HPC clusters or supercomputing environments.
  • Exposure to data-intensive applications like AI/ML pipelines, genomics, scientific simulations, or real-time analytics.
  • Familiarity with Kubernetes, container storage interfaces (CSI), and containerized HPC workflows.
  • Experience with hardware validation: NVMe, SSD, HDD tiering, network fabric performance tuning.
  • Certifications in storage (e.g., SNIA) or HPC systems.
  • Knowledge of cloud solutions (GCP, AWS).

This position requires participation in an on-call rotation to provide after-hours support as needed.



DDN

Join our dynamic and driven team, where engineering excellence is at the heart of everything we do. We seek individuals who love to challenge themselves and are fueled by curiosity. Here, you'll have the opportunity to work across various areas of the company, thanks to our flat organizational structure that encourages hands-on involvement and direct contributions to our mission. Leadership is earned by those who take initiative and consistently deliver outstanding results, both in their work ethic and deliverables, making strong prioritization skills essential. Additionally, we value strong communication skills in all our engineers and researchers, as they are crucial for the success of our teams and the company as a whole.

Interview Process: After submitting your application, one of our recruiters will review your resume. If your application passes this stage, you will be invited to a 30-minute interview during which a member of our team will ask some basic questions. If you clear the interview, you will enter the main process, which can consist of up to four interviews in total:

  • Coding assessment: Often in a language of your choice.
  • Systems design: Translate high-level requirements into a scalable, fault-tolerant service (depending on role).
  • Real-time problem-solving: Demonstrate practical skills in a live problem-solving session.
  • Meet and greet with the wider team.

Our goal is to finish the main process in 2-3 weeks at most.

DataDirect Networks (DDN) is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.
