Linux 6.1 introduces new features that make it easier to identify faulty CPUs
For Linux production environments that run many CPUs at once (such as large servers or server fleets), Linux 6.1 adds a very useful feature: when a program crashes with a segmentation fault, the kernel's error message tells you which CPU it was probably running on.
The feature comes from a patch in the x86/CPU branch for the Linux 6.1 merge window: when a segfault occurs, the failure message also prints the "suspected" CPU number.
Patch author Rik van Riel explains the motivation and how the feature works:
In a large enough fleet of machines, there are usually a few bad CPUs. They can often be identified by noticing that kernel code which runs fine everywhere else keeps crashing on the same faulty CPU core.
However, the failure modes of CPUs that have gone bad are often very specific: the only symptom may be segmentation faults in bash, python, or various system daemons, and the failure message does not tell you which CPU was involved. The patch therefore adds a printk() to show_signal_msg() that prints the corresponding CPU, core, and socket when a segmentation fault occurs.
At present the feature is not perfect and can produce false positives: between the moment the fault occurs and the moment the error message is printed, the task may be rescheduled onto another CPU, so the wrong CPU number can be reported.
But it’s good enough to help people identify most of the faulty CPU cores.
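One way to see why an occasionally wrong CPU number is still useful: a single report proves little, but a core that keeps appearing across many segfault reports stands out. The following is a minimal sketch of that idea, not part of the patch. The function name `suspect_cpus`, its thresholds, and the sample numbers are all hypothetical choices made for illustration.

```python
from collections import Counter


def suspect_cpus(reported_cpus, min_reports=3, min_share=0.5):
    """Return CPU numbers that dominate a list of reported segfault CPUs.

    reported_cpus: CPU numbers taken from segfault messages on one machine.
    min_reports and min_share are illustrative thresholds (assumptions),
    meant to ignore stray reports caused by task migration.
    """
    counts = Counter(reported_cpus)
    total = sum(counts.values())
    return [
        cpu
        for cpu, n in counts.most_common()
        if n >= min_reports and n / total >= min_share
    ]


# Hypothetical data: CPU 3 shows up repeatedly, CPUs 7 and 12 once each
# (for example because the task was rescheduled before the message printed).
reports = [3, 3, 3, 7, 3, 3, 12, 3]
print(suspect_cpus(reports))  # prints [3]
```

If the reports are spread evenly across cores, the function returns an empty list, which suggests a software bug rather than a bad core.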
Here is an example of the resulting message:
segfault: segfault at 0 ip 000000000040113a sp 00007ffc6d32e360 error 4 in segfault[401000+1000] likely on CPU 0 (core 0, socket 0)
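To collect these reports from logs, the fields of such a message can be extracted with a regular expression. The sketch below assumes the message layout shown in the example line and treats the address, ip, sp, and error fields as hexadecimal. The names `SEGFAULT_RE` and `parse_segfault` are hypothetical helpers, not kernel or library APIs.

```python
import re

# Pattern modelled on the example message; whitespace around the comma and
# closing parenthesis is tolerated because log copies vary in spacing.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>[0-9a-f]+)"
    r".*?likely on CPU (?P<cpu>\d+)\s*"
    r"\(core (?P<core>\d+)\s*,\s*socket (?P<socket>\d+)\s*\)"
)


def parse_segfault(line):
    """Return a dict of fields from a segfault log line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "sp": int(m.group("sp"), 16),
        "error": int(m.group("error"), 16),
        "cpu": int(m.group("cpu")),
        "core": int(m.group("core")),
        "socket": int(m.group("socket")),
    }


sample = (
    "segfault: segfault at 0 ip 000000000040113a sp 00007ffc6d32e360 "
    "error 4 in segfault[401000+1000] likely on CPU 0 (core 0, socket 0)"
)
info = parse_segfault(sample)
print(info["cpu"], info["core"], info["socket"])  # prints 0 0 0
```

Lines from older kernels, which lack the "likely on CPU" suffix, do not match and yield None, so they can be skipped safely.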
This printk can be controlled by.
According to Phoronix, the feature will be officially enabled in the Linux 6.1 stable release in October.
Segfault: a segmentation fault is a bug that is often encountered in software development, and one of the most common failures seen on Linux systems.
The error is caused by illegal memory accesses such as dereferencing a null pointer, writing to a read-only memory region, or accessing a protected memory region.
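The "error" number in a segfault message hints at which kind of illegal access happened. The article does not explain this field; the sketch below assumes the standard x86 page-fault error code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode, bit 4: instruction fetch). The function name `decode_page_fault_error` is hypothetical.

```python
def decode_page_fault_error(code):
    """Decode the low bits of an x86 page-fault error code (assumed layout)."""
    return {
        # Bit 0: set means the page was present but access was not permitted.
        "cause": "protection violation" if code & 0x1 else "page not present",
        # Bit 1: set means the faulting access was a write.
        "access": "write" if code & 0x2 else "read",
        # Bit 2: set means the fault happened in user mode.
        "mode": "user" if code & 0x4 else "kernel",
        # Bit 4: set means the fault was caused by an instruction fetch.
        "instruction_fetch": bool(code & 0x10),
    }


# "error 4" with "segfault at 0": a user-mode read of an unmapped page,
# which is what a null pointer read typically looks like.
print(decode_page_fault_error(4))
```

Under the same assumption, a write to read-only memory from user space would show up as error 7 (protection violation, write, user mode).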