Wednesday, December 14, 2022
As we head into 2023, we want to introduce another tool for the public to understand the most current status of systems which impact Search—crawling, indexing, and serving. While system disruptions are extremely rare, we want to be transparent when they do happen. In the past, we’ve worked with our Site Reliability Engineers (SRE) to externalize these disruptions on our Google Search Central Twitter account. Today, we’re introducing the Google Search Status Dashboard to communicate the status of Search going forward.
Over the past couple years we’ve been working with our SREs on improved ways to make information about major incidents generally accessible and useful. The goal was to make reporting issues quick, accurate, and easy. As a result, we have launched a new status dashboard and simplified the process of communicating during incidents.
This dashboard reports widespread issues occurring in the last 7 days, with some details and the current status of the incident. A widespread issue means there’s a systemic problem with a Search system affecting a large number of sites or Search users. Typically these kinds of issues are very visible externally, and internally the SREs’ monitoring and alerting mechanisms are working behind the scenes to flag the issues.
The dashboard has a number of features that you might already be aware of from other Google status dashboards, such as an RSS feed you can subscribe to and view of historical data.
How we communicate incidents and updates
Once we confirm with SREs that there’s an ongoing, widespread issue in Search, we aim to post an incident on the dashboard within an hour, and consecutive updates to the incident within 12 hours. Unlike with a traditional automated dashboard, our global staff reports these updates. The start time of the incident is generally when we managed to confirm the issue.
Outside of the traditional status update you might see, we will also try to provide more information that might resolve the solution. For example, in the hypothetical scenario that the nameserver handling domain name resolution for millions of sites refuses Googlebot’s connection requests, we may post an update saying that changing nameservers may mitigate the issue sites are experiencing. Of course with any issue, we will keep posting updates to the incident—with mitigation possibilities when available—until the incident is resolved.
We consider an incident resolved when our engineers have made changes that will end the impact on the system. While this means that the system itself is now healthy, sites may experience effects for some time until they’re reprocessed, depending on the type of incident.
If you want to learn more about the dashboard, we have a dedicated page for the Search Status dashboard on Google Search Central. If you want to leave some feedback about the dashboard, tweet at us @googlesearchc.
Posted by Gary, Search Team