Site Reliability Engineer


Recruiter:

Rosstone Consulting

Job Ref:

ROS00545

Date posted:

Thursday, August 19, 2021

Location:

South Africa

Salary:

Negotiable


POSITION INFO:
The Role:

Our innovative client is looking for a Site Reliability Engineer to join the team.

12-month contract, remote until further notice

Skills and Experience:

Qualification:

  • Relevant tertiary qualification (Bachelor's degree in IT or Engineering)

Skills and experience:

  • 5 or more years' experience in a Software Engineering, DevOps Engineering, SRE or Architecture role
  • Experience with APM and infrastructure monitoring tools (Prometheus, Dynatrace and CloudWatch beneficial)
  • Knowledge of architecture frameworks, tools and standards
  • Experience in application performance monitoring (APM), JVM profiling and Prometheus
  • Extensive experience managing complex, high-volume applications
  • Experience optimizing database, infrastructure and application configurations
  • Experience supporting microservices-based applications on a Kubernetes platform
  • Experience with AWS technologies and event-driven architecture
  • Experience with event-driven orchestration

Key Accountabilities:

As Site Reliability Engineer, you will be responsible for proactively driving initiatives that lead to high service and platform availability, improved performance and a better customer experience; for enhancing and optimizing monitoring coverage; and for working with cross-functional teams to proactively build and maintain more reliable services and platforms.

  • Design and implement an observability framework across deployed infrastructure, applications and services that can be centrally configuration-managed and deployed to all environments
  • Instrument specific Java methods or query values stored in Java objects for test validation or specific business metrics
  • Work with cross-functional teams to identify, evaluate and establish improvement initiatives for services or processes, with the aim of increasing availability, improving service levels, reducing costs and improving customer satisfaction by reducing the number of operational problems.
  • Participate in various cross-functional forums and lead work streams to contribute to the improvement and implementation of policies, frameworks and standards.
  • Drive initiatives on software automation and reliability.
  • Develop and optimize the monitoring framework based on industry best practices, defining metrics (SLIs), objectives (SLOs), monitoring and alerting to observe the health of the production system.
  • Maximize the value of tooling and leverage metrics for data-driven insights and problem management
  • Gather and analyse metrics from various monitoring tools, covering but not limited to operating systems, infrastructure and applications, to assist in performance tuning and fault finding.
  • Facilitate discussions (including technical discussions) to establish root causes and solutions to any infrastructure, application or process related issues.
  • Conduct research into more efficient ways of performing day-to-day activities using new technologies or frameworks, and identify opportunities for automation.
  • Conduct trend analysis of data, both systematically and manually, to identify common occurrences and recurring issues that feed into the Problem Management process.
  • Perform impact assessments to determine priority of a problem relative to other problems and business activities.
  • Communicate clearly, concisely and in a timely manner, with an emphasis on explaining technical issues in non-technical terms to clients and executives.
  • Drive infrastructure and application performance and availability initiatives.
  • Produce and present regular reports on availability, capacity and service performance.
  • Participate in and facilitate incident post-mortems
  • Build and integrate metric collectors such as Prometheus for required metrics
  • Drive the improvement of availability and performance using quality gates of the build pipeline
  • Liaise with application development teams to improve services and customer experience

Personality and Attributes:
  • Statistical analysis and reporting
  • Problem solving
  • Root Cause Analysis
  • Business writing (reports) and presentation
  • Tenacity
  • Stress Management
  • Persuasion
  • Coaching
  • Client orientation


 

NB! This job is now closed. You can apply for other jobs by uploading your CV.




Similar jobs you might be interested in:

Senior Site Reliability Engineer
Location: Blank
Salary: Negotiable
Join a dynamic engineering team driving innovation. As a Senior site reliability engineer, you'll work alongside cross-functional teams to build and scale a mission-critical internal data platform. Your expertise in cloud automation, infrastructure as code, and DevOps tooling will help enable robust, secure, and high-performing services within a...
6 days ago


AWS Site Reliability Engineer (SRE) – Cape Town
Location: Cape Town
Salary:
Build resilient systems. Automate. Scale with confidence. We’re looking for an AWS site reliability engineer who thrives at the intersection of cloud, code, and automation. You’ll design, build, and operate secure, high-performance AWS environments, ensuring our platforms stay fast, reliable, and cost-efficient, even under pressure.
9 days ago


Site Reliability Engineer (Datadog)
Location: Johannesburg
Salary:
Are you a site reliability engineer with solid Datadog experience? Our client in the Warehousing and Logistics sector is looking to employ an engineer to Support the design, implementation, and optimization of Datadog monitoring solutions across infrastructure, applications and services.
30 days ago


IT Infrastructure Engineer - Remote
Location: Remote
Salary: Negotiable depending on experience
A global telecom software company is seeking an experienced IT Infrastructure & Deployment engineer to take ownership of internal IT systems, customer deployments, and cybersecurity operations.
1 day ago


Data Engineer
Location: Sandton
Salary: Market Related
Data engineer
8 days ago


DevOps Engineer (Mid-Level)
Location: Cape Town
Salary: Market-related
Help shape scalable cloud infrastructure and innovation for social impact
9 days ago


Snr IT Networking Engineer
Location: Durban
Salary: Market Related and Negotiable
Snr engineer - Networking
13 days ago


Senior DevOps Engineer (remote)
Location: Johannesburg
Salary: Market related
Senior DevOps engineer (remote)
16 days ago


QA Engineer
Location: Cape Town
Salary:
Location: Cape Town / Northern Suburbs. Industry: Manufacturing / engineering. Type: Full-time | On-site. About the Company: Our client is a well-established manufacturing company known for delivering high-performance, precision-engineered products across Southern Africa. They are seeking a passionate QA engineer to join their team, someone driven by continuous improvement, product reliability, and tec...
17 days ago

