UMBC Mic'd Up
Behind the Scenes of Systems Engineering: Site Reliability & the Future of AI
Go behind the scenes of systems engineering with Saurabh Phaltane, Senior Site Reliability Engineer (Google), in this episode of UMBC Mic'd Up.
Learn how Site Reliability Engineers (SREs) ensure billions of people experience fast, secure, and uninterrupted search results every day. Saurabh shares what it takes to keep critical infrastructure running, how AI is transforming systems engineering, and what skills aspiring professionals should build to thrive in this fast-evolving field.
The views and opinions expressed by Saurabh Phaltane are his own and do not represent those of Google.
Learn more about UMBC’s graduate programs in systems engineering: https://www.umbc.edu/se
Dennise Cardona 0:00
Hey, thanks for tuning into this episode of UMBC Mic'd Up podcast. My name is Dennise Cardona from the Office of Professional Programs here at UMBC, and I can't wait to dive into today's conversation with someone from Google. He works at Google, and he knows about systems engineering, and I just can't wait to get into this conversation. Welcome to the show. Saurabh, so nice to have you here with us.
Saurabh Phaltane 0:26
Thank you, Dennise, it's nice to be here, and yeah, looking forward to this conversation.
Dennise Cardona 0:31
Yeah, well, before we dive into the nitty gritty of the conversation, can you tell us a little bit about yourself?
Saurabh Phaltane 0:40
I work for Google Search. I work in the SRE org within Google Search, and we manage and maintain the Google Search core infrastructure. I work on performance improvement, infrastructure optimization, and building new automation tools to ensure that Search is reliable for millions of users out there. They can access the website. They can basically get to those awesome new AI experiences which are getting rolled out. So yeah, being part of this story here at Google Search, that's exciting.
Dennise Cardona 1:11
Yeah, Google Search, definitely not a stranger to that. I use it daily, and I'm sure most people listening in do the same thing. It takes a lot of manpower to keep that up and running. And so I can't speak for everyone, but I am super grateful for the efforts that you all put into Google Search to make it something that is really usable, friendly, and just, yeah, super helpful. Well, for those who are unfamiliar with systems engineering, can you define what it is and why it's essential in today's tech landscape?
Saurabh Phaltane 1:44
Sure. Yeah. So the way I define this is it's more like a multidisciplinary approach to designing systems by keeping user needs in perspective. So that takes us all the way back through the stack, where you have to understand what your user or your application needs are and take your requirements back up to the design: How are you going to design the network? How are you going to maintain the network? How are you going to make it more reliable, scalable and, at the same time, secure? I often remind myself that I'm working on this tripod stand of security, reliability and then maybe scalability. And while we are doing that, you have to keep in mind that it's an ever-evolving field. The reason I say this is, if you look at the tech landscape that has evolved over the last 20 or so years: we were on mainframes, we designed for mainframes. There was a huge supercomputer, probably, sitting somewhere, and you ran a bunch of applications on there. There was a limited user base. But in the last 20 years, we have gone through things like the dot-com bust; thousands of websites have shown up in this century. After that, we went through a phase called SaaS, or cloud, which gave a totally different perspective, where you could lease or rent servers on the cloud and run your workloads there, which kind of shifted the trajectory all the way from running something in a physical data center to the cloud. So the challenges have shifted, and now we are entering this era, in this decade, where AI is taking over the world, where you have to really think of the user experience from a totally different angle, and it is getting challenging because new security issues might pop up. You have to keep up with the demand. With the advent of these new technologies, especially as we went from 2G to 3G, 3G to 4G, and then 5G, you are essentially sending more bits per second over the network.
And then, in order to tackle that, you have to basically design your systems in a way that you are basically well supported. You don't fall off while supporting the systems.
Dennise Cardona 4:11
Yeah, thank you so much for explaining that. It sounds very complicated to somebody who is not a systems engineer, but, yeah, there's so much to it, so many components to it, and so many different facets. And I think what that tells people who are listening in, who may not be systems engineers either, but maybe are interested in that field, is that it sounds like there's a lot of opportunity to make your mark in this world, this new world of AI technology and the technology that runs places like Google, and to be on the forefront of innovation. That just sounds like a very exciting career trajectory for many people who are interested in that.
Saurabh Phaltane 4:55
Yes, that's very true. As I said, this is an ever-evolving field. So keep yourself up to date with what's happening in the market, what's happening in the technology landscape, and there is a strong demand. If you are good at your work, you're going to get a good job, and you're going to build a strong career in the industry. So yeah, I mean, there's certainly a big demand for systems engineers out there who know the systems. And technology doesn't change in real terms. The Linux kernel, which was probably running there years ago, has gone through iterations, but it's still the Linux kernel down there, down the stack somewhere, running, right? You need to know how networks operate at the ground level. You need to know what security protocols were working in the past and what has changed, and get yourself up to date. The way I would say it is: it's not getting outdated. There's something new that's coming up, and you just keep up with the new things that are coming up. And I think that's how the demand for engineers is growing, because of the new set of technology landscapes coming in, and we just have to keep ourselves up to date with what's happening and learn and evolve.
Dennise Cardona 6:09
From your perspective at Google, do you see a strong demand for systems engineers? Like, what kinds of projects or challenges are driving the need for systems engineers today? I know you just spoke a little bit about that, but is there anything specific that folks who are interested in the field may want to be looking out for?
Saurabh Phaltane 6:30
Based on what I have seen here, what used to sell 10, 20 years ago was the product features. Making a product, or a feature on a product, is now very easy. You can just use AI to write the piece of code and get the feature out there. I won't say it's too easy; it's relatively easy. But on top of that, what you need now is: how can you ensure that your product is damn reliable, that it stays up and serving for those millions and billions of users, that it is resilient enough to handle those sophisticated DDoS attacks that come through, and how can we keep the systems reliably running? These are some of the things on which many products that had good features still failed, and at the same time, other companies were built by just providing these as some of the metrics that the sales folks can use to sell their companies. So this is very crucial in this ever-evolving systems engineering field, and I think that's the crux here.
Dennise Cardona 7:44
Can you walk us through what a typical day in the life of a systems engineer at Google would look like? What kinds of challenges and projects would be typically worked on in that kind of an environment?
Saurabh Phaltane 7:56
Oh, so I think the day-to-day life of an SRE, which is what I am at Google: there's no single definition of what the role of an SRE is. I think you have to wear different hats at different points in time. There's a bunch of monitoring and alerting to the entire story. You have to have the right set of tools in place to ensure that the site is up, running well, and serving the users, and by that, what we mean is keep your alerts configured, keep the monitoring going. That's probably taken care of by automation, but we have to ensure that the automation is robust enough to do that thing. So we are basically building those tools on a day-to-day basis to ensure that this large-scale infrastructure that we're running is monitored, and that there is a right way to ensure that the releases, like new code changes, new features, new experiences that we build, are reliably and securely delivered to our users. There's a predefined and evaluated release process, so we don't end up breaking something. There's a clear path to production. When there's a feature developed by a developer, it reaches production and it's securely serving the use case. You have to monitor those metrics of success. You have to define the SLOs and ensure that we are meeting those SLOs. I think that's very important. The fun part about Google, if you see, is people have been using Google for a while to check if their internet is up. They just keep pinging Google. [Dennise: That's me, yeah.] The fun part is, it's quite surprising, because it's a website, it's an application running up there, and we have gotten used to the fact that it's always up. It's analogous to the internet being up and running. But that's what, essentially, we do here. Our job is to ensure that experience stays as it is. And while we are onboarding new things, we are doing that securely, reliably, without any disruption to that user experience. That's what my day-to-day life here is.
And then again, incidents: you have to have a strong collaborative culture going on. You are working with product managers, you're working with developers, you're working with different security leaders in the team. You're communicating out what's happening down on the ground, and you are leading responses to incidents. If incidents happen, we have a very well defined process and procedure to handle things. It's not rosy and green every time. While the world is running, there are times when there's an election going on in some part of the world, or there is a cricket match going on on the other side of the world. So we do get a lot of spikes. We get things where we have to be careful about what the user experience is. So we have to plan well in advance to ensure that we have the right amount of capacity, that we are well set up to handle that spike, that we can scale up well in real time, and that we have the right caching strategies and all the technical things in place. And then, in the worst case, if things go bad, we have to have our incident hat on to go in down there and mitigate the issue, because that, I think, is very important here. In the life of a systems engineer, if you're not going to fix something right away, you should have a plan in place to mitigate it, because that's how you avoid the bad customer experience. And then you can work around things and fix things, probably after a few hours. It's fine, someone else can fix it for you while you are actually handling that incident, but the prime importance is the user experience. That's my day-to-day life, and that's what we keep on top of our minds.
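As a rough illustration of the point above about defining SLOs and checking that they are met, here is a minimal Python sketch of an availability SLO and its error budget. It is not Google's tooling: the `SloWindow` type, the function names, and the 99.9% target are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts for one evaluation window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def availability(window: SloWindow) -> float:
    """Fraction of requests that succeeded; an empty window counts as fully available."""
    if window.total_requests == 0:
        return 1.0
    return 1.0 - window.failed_requests / window.total_requests


def error_budget_remaining(window: SloWindow, slo_target: float) -> float:
    """Fraction of the error budget left: 1.0 is untouched, 0.0 or below is exhausted."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * window.total_requests
    if allowed_failures == 0:
        return 1.0
    return 1.0 - window.failed_requests / allowed_failures


if __name__ == "__main__":
    # Assumed numbers: one million requests, 250 failures, 99.9% target.
    window = SloWindow(total_requests=1_000_000, failed_requests=250)
    target = 0.999
    print(f"availability: {availability(window):.5f}")
    print(f"SLO met: {availability(window) >= target}")
    print(f"error budget remaining: {error_budget_remaining(window, target):.2%}")
```

In a setup like this, a team might alert when the remaining budget drops quickly, or slow down releases once it nears zero; those policies are a common pattern, not something described in the conversation.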
Dennise Cardona 11:48
Yeah, great skill set. And the takeaway from that, for people who are listening in, is that it sounds like there is a wealth of opportunity to break into this field. It's not just a narrow tunnel. There are so many different avenues you can take, building on skill sets like collaboration, communication, project management, critical decision making, leadership, and innovation. It sounds like a great opportunity and a very important career. Everybody in the world uses technology, and I'm going to say that we all rely on Google specifically, the majority of us anyway. So it's one of those things. I am one of those people who runs to do the test link. I have issues sometimes with my home internet, and I am checking it before I go online to make sure everything's looking good. In terms of artificial intelligence, how is AI transforming the field of systems engineering? Are there specific, say, AI tools or frameworks that have become successful?
Saurabh Phaltane 12:55
What I would say is AI is bringing a drastic change in the way we thought about systems, the way we designed our systems, and also the design of our applications. So certainly, there are a lot of tools out there which are coming into place. They get replaced every other day with something that's better out there. But at the core, what I would like to highlight is that it is evolving at a very high rate, and the way it is helping systems, and the way forward here, is basically more intelligent monitoring, more intelligent alerting, and more intelligent triaging of issues. That is what is happening to systems with AI. You can have AI agents do the low-level work for you, ensuring that your site is up and running and, at the same time, doing what it is intended to do. There is a lot of testing involved when you are delivering an experience to a customer, so ensuring that the right customer experience is delivered can be very well automated with AI. So metrics, tracing, I think these are some of the things which are widely improving with AI and impacting systems. On the path forward, I see this is also going to change the way we have been thinking about systems. This 20-year, 30-year landscape, at least from what I've seen, has changed a lot: mainframes were a part of it, then the SaaS architectures, where we thought, why don't we build these multi-tenant architectures where we can squeeze multiple clients into a single shard and serve them. But now it's changing totally. Things are very different. With AI, you have something called agentic AI, which can do a lot of things on the client side, so your reliance on the server to do something has drastically reduced. So you probably need to think of systems in a very different way. What that could be, who knows? It is probably evolving. It is probably going to find its own product-market fit eventually, but it's going to impact systems engineering a lot.
And I see that is already happening out there with infrastructure automation. I think there's a lot there. And I think the cost of running this infrastructure has varied a lot. Now, what companies are trying to do is squeeze out the max they can from the existing infrastructure by optimizing. They are very cost sensitive, especially with what's going on in the market. I think ensuring that you can reliably and safely do something at the least cost is paramount. And with AI, that is just doubling down, because you would want to squeeze more and more from your existing infrastructure, and even from the GPUs and whatever TPUs you have deployed, because that's the differentiator here, right? Pretty much everyone has access to free foundational models. You run your foundational models, you can train your model for a specific use case and get it out to the users, and they will start using it. But what differentiates this spectrum of different companies working on AI is how cost-efficiently you can deliver that experience to the user. And for that, you have to really work on your systems, because that's where the money is going to be made. So I think these are the two or three things that are bringing drastic change to the way we have been pursuing systems until today.
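To put a number on the idea of squeezing more out of existing infrastructure, here is a small back-of-the-envelope sketch in Python. The function name, the hourly price, and the throughput figures are all hypothetical; it only shows that, for a fixed fleet, higher utilization lowers the cost of each request served.

```python
def cost_per_thousand_requests(
    hourly_cost_usd: float,
    peak_requests_per_second: float,
    utilization: float,
) -> float:
    """Cost in USD to serve 1,000 requests on a fixed-price fleet.

    hourly_cost_usd: what the fleet costs per hour (assumed flat rate).
    peak_requests_per_second: throughput at 100% utilization.
    utilization: average fraction of peak actually served, in (0, 1].
    """
    if hourly_cost_usd < 0:
        raise ValueError("hourly_cost_usd must be non-negative")
    if peak_requests_per_second <= 0:
        raise ValueError("peak_requests_per_second must be positive")
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    requests_per_hour = peak_requests_per_second * utilization * 3600
    return hourly_cost_usd / requests_per_hour * 1000


if __name__ == "__main__":
    # Assumed numbers: a $36/hour accelerator fleet that peaks at 1,000 requests/s.
    for utilization in (0.5, 0.8):
        cost = cost_per_thousand_requests(36.0, 1000.0, utilization)
        print(f"utilization {utilization:.0%}: ${cost:.4f} per 1,000 requests")
```

With these made-up numbers, moving from 50% to 80% utilization cuts the cost per thousand requests from about $0.02 to about $0.0125 without buying any new hardware.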
Dennise Cardona 16:35
For prospective students considering UMBC's graduate program in systems engineering, what advice would you give about preparing and succeeding in this career path?
Saurabh Phaltane 16:46
The most important thing is keep doing what you're doing. That's what I generally say to folks. The core systems, core operating systems, and core computer science coursework is, I think, very well set up to train you to get to a baseline, to understand what these systems really mean. So, one: get used to that, do it well, and keep doing what you have been doing as part of your bachelor's or master's coursework. But at the same time, you have to keep yourself up to date on what's happening out in the market. And by that, what I mean is keep attending sessions. Keep attending conferences, if you can. With AI and a lot of other things, there's so much content out there that you can get for free. So the most important thing in this entire systems engineering landscape is keeping yourself up to date. And the way you do it is by attending meetups, attending sessions, and following the right leaders in the industry. Keep your focus very well defined, I think, on what you want to do. Because, as I said, this is a very wide field. There's a lot to be done. And personally, the way I have been looking at it is: figure out 10 different things, and then eventually you can drill down into one specific field, but it takes a lot of time. So I think keep learning, keep yourself up to date. There are a lot of resources out there. To name a few, there are things like SREcon, which is a nice conference, and most of these things are available on YouTube and other sources. There's QCon, DevOpsDays, and a bunch of other things which are happening out there. So keeping yourself up to date is my suggestion to the students of UMBC, and I think they can eventually figure it out. It's not that you start here and you end here. In my personal journey, I figured out many things on the way, and that is probably one of the things I would suggest: don't expect that everything is just going to happen.
It's going to take time, and you will figure out a lot of things on the way. So keep learning.
Dennise Cardona 19:03
Great advice, great insights. Loved this conversation. It was jam packed with some really good gems, great takeaways for folks who are interested in this field, and it's just really cool to hear, from a company like Google, what you do in a day in the life. Just very fascinating. So thank you so much for sharing your insights with us today. Thank you. Thanks a lot. Thanks for having me here. And thank you so much to everyone for tuning into this episode of UMBC Mic'd Up podcast. If you'd like to learn more about our offerings, click the link in the description. Thank you.