Senior Site Reliability Engineer
Vancouver, BC, Canada
Tasktop is an energetic software company building tools to improve large-scale software development. Our culture is centered around a well-defined Agile process, with a focus on high-quality software engineering. We practice code review, refactoring, automation, and continuous delivery, and we are constantly striving to improve how we work. Tasktop is also committed to your career, hosting quarterly code jams, weekly technology discussions, one-on-ones with your manager, and a well-defined set of career tracks.
Tasktop is looking for an experienced software engineer with experience operating and developing cloud offerings to fill a Senior Site Reliability Engineer (SRE) position. If you are an engineer who is as comfortable developing automation and software as you are operating reliable services, then Tasktop is interested in hearing from you. In this role you will help us extend the operations and improve the reliability of our cloud offerings, work that will have a big impact in taking the team to the next level.
Things you’ll be working on:
- Maintaining security, reliability, and availability of Tasktop’s cloud services
- Developing deployment, scaling, HA, DR, and backup automation
- Creating and refining monitoring and alerting infrastructure
- Refining playbooks for operations
- Participating in blameless postmortems and learning from all of our incidents
- Working closely and cross-functionally with Product Development teams to design and develop new product features that meet our operational profile
What will the team be like?
You will be working on a small but highly capable team to develop and operate the operational toolset for Tasktop’s cloud products.
Great candidates have:
- Experience in systems automation and tooling:
  - Containerization technology such as Docker and Kubernetes
  - Continuous integration/deployment tooling such as Jenkins
  - DevOps automation tools such as Ansible, Chef, or Puppet
  - Monitoring tools such as Datadog, Zabbix, and Prometheus
- Experience deploying and operating applications on cloud architectures (we use AWS)
- Substantial experience administering Linux systems
- Experience with software engineering best practices (e.g. testing, code reviews, CI/CD)
- Experience with scripting languages such as Python, Ruby, Perl, or Bash
- Experience with database deployment and operations (Postgres and/or AWS Aurora)
- Interest in learning new skills outside of your current core competencies
If you know some of the following, that’s even better:
- Knowledge of best practices in software design and Agile development process
- Experience with creating and extending Java software systems and products
- Experience operating and managing Kafka
Interested in joining the team?
Please send your resume to firstname.lastname@example.org with the subject heading “Senior Site Reliability Engineer”. We thank all applicants for their interest; however, only those candidates selected for an interview will be contacted.