
Site Reliability Engineer

Bangalore, Karnataka, India


Job Title:

Site Reliability Engineer

Role Overview:

We are hiring a Site Reliability Engineer who will improve and maintain software development, test, and live infrastructure and services. You will be articulate and have experience with Linux and other *NIX derivatives. Your primary mission as an SRE is to work with the development, technical operations, quality assurance, and product management teams to ensure the uptime and performance of the McAfee Enterprise Cloud Security Solution.

This position is an integral part of the McAfee Enterprise business segment, which was acquired by Symphony Technology Group (STG) in July 2021. McAfee Enterprise and its team members remain committed to keeping governments and enterprises safe. This position is dedicated to and part of the McAfee Enterprise business.

About the role:

Perform Incident Management and Change Management to maintain the continuous availability of all Cloud Infrastructure services
Ensure all SRE and operating procedures are maintained and executed.
Maintain a 24×7 production environment with a high level of service availability, perform quality reviews, and manage operational issues.
Explore and innovate with new cloud technologies, features, and tools to improve the platform, and automate using Bash, Python, Perl, etc.
Implement automation and orchestration for the manual processes required to operate and deploy cloud services. Be at the heart of developing new ideas into internal tools by working closely with teams.
Analyze alarms and dashboards to identify problem areas, report incidents, troubleshoot, and escalate as required.
Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.
Perform ticket review and updates through the JIRA ticketing tool.
Manage and maintain runbooks and standard operating procedures.
Manage, coordinate, and document all types of maintenance and outage events.
Must take initiative and be proactive.
Must take on the responsibility to learn new products and procedures.
Implement proactive monitoring, alerting, trend analysis, and self-healing systems.
Understand the existing architecture and work with various Engineering teams to develop and execute strategies to provide a high-quality Global production service.
You are responsible for debugging and identifying the cause of problems and outages.
You will be flexible to work in a 24×7 environment (rotational shifts).

About you:

You will have 8+ years of experience supporting production applications and systems.
System admin experience on Linux environments.
Ability to understand networking and its components
Good experience with public cloud technology (AWS)
Experience identifying thresholds and setting up monitoring for infrastructure and applications
Experience with Grafana, ELK, CloudWatch, OpsGenie, PagerDuty, etc.
Strong communication and analytical/problem-solving skills.
Network knowledge (TCP/IP, UDP, DNS, load balancing) and prior network administration experience are a big plus
Experience in writing Root Cause Analysis documents
Experience with source control tools such as GitHub, SVN, or Perforce
A systematic approach and the ability to drive problems to resolution
Experience configuring and managing web servers (Apache, Tomcat, Nginx)
Experience with debugging production issues at the network level
Working experience with containers and Kubernetes is an added plus
Ability to script/program in one or more high-level languages, such as Python, Go, etc.
Experience with or knowledge of GCP or Azure is good to have.
Experience with deployment tools such as Jenkins, TeamCity, Harness, etc.
Experience with configuration management tools such as Salt, Puppet, Ansible, etc.
Experience in the security domain will be an added advantage
Experience with continuous integration and deployment automation tools such as Jenkins, Harness, AWS CloudFormation, Salt, Puppet, Chef, or Ansible
Experience with SQL (MySQL) and NoSQL databases (Redis, Couchbase, Cassandra, Crate)
Experience with open-source technologies (Kafka, Memcached, Redis, Hadoop, HBase, Zookeeper, Oozie)

Company Benefits and Perks:

We work hard to embrace diversity and inclusion and encourage everyone to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours and family-friendly benefits to all of our employees.

  • Pension and Retirement Plans
  • Medical, Dental and Vision Coverage
  • Paid Time Off
  • Paid Parental Leave
  • Support for Community Involvement

We're serious about our commitment to diversity, which is why we prohibit discrimination based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation, or any other legally protected status.

