
SupportSages

Copyright © 2008 – 2026 SupportSages Pvt Ltd. All Rights Reserved.

How to find DIMM errors and replace the faulty RAM?

Smith Nevil

  • 9 min read

As part of the remote data center management we do, one of the (rare) requests we get from clients who have purchased managed services is to replace faulty RAM. One indicator of faulty RAM is the server freezing at random during normal operation. Check for Machine Check Exception (MCE) messages in kern.log or messages, depending on your OS.
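A minimal sketch of that log search, assuming the two common syslog locations (/var/log/kern.log on Debian/Ubuntu, /var/log/messages on RHEL/CentOS). The function name find_mce_messages is hypothetical; on systems that keep kernel messages only in the journal, piping `dmesg` or `journalctl -k` through the same grep is the equivalent.

```shell
# Search syslog files for Machine Check Exception (MCE) messages.
# Assumption: default log paths are /var/log/kern.log and /var/log/messages;
# pass other files as arguments to override them.
find_mce_messages() {
    # Fall back to the two common syslog locations when no files are given.
    if [ "$#" -eq 0 ]; then
        set -- /var/log/kern.log /var/log/messages
    fi
    for f in "$@"; do
        # Skip files that do not exist on this distribution.
        [ -r "$f" ] || continue
        # -H: print file name, -i: ignore case, -w: whole words only,
        # so "mce:" matches but longer words containing "mce" do not.
        grep -HiwE 'mce|machine check' "$f"
    done
}
```

Run it as root, since the log files are usually not world-readable: `find_mce_messages`, or `find_mce_messages /var/log/syslog` for a different file.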

A completely failed module would bring the entire server to a standstill until the culprit is removed. A module that only produces corrected (correctable) errors, however, is much harder to find.

Error sample:

EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
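When the log already names the DIMM, as in the sample above, tallying the lines per DIMM shows which module is producing the errors. The sketch below assumes this exact "MCn: <count> CE error on <dimm>" wording; other EDAC drivers and kernel versions may phrase the message differently, so adjust the match for your system. count_ce_per_dimm is a hypothetical helper name.

```shell
# Count corrected (CE) errors per DIMM from EDAC kernel log lines on stdin.
# Assumption: messages follow the "EDAC MCn: <count> CE error on <dimm> (...)"
# format shown in the sample; output is "<total> <controller> <dimm>".
count_ce_per_dimm() {
    awk '
        {
            for (i = 1; i + 5 <= NF; i++) {
                # Match "MCn: <count> CE error on <dimm>" anywhere in the line,
                # so syslog timestamps or dmesg prefixes before it do not matter.
                if ($i ~ /^MC[0-9]+:$/ && $(i + 2) == "CE" && $(i + 3) == "error" && $(i + 4) == "on") {
                    mc = $i
                    sub(/:$/, "", mc)
                    total[mc " " $(i + 5)] += $(i + 1)
                    break
                }
            }
        }
        END { for (k in total) print total[k], k }
    ' | sort -rn   # DIMM with the most errors first
}
```

Usage: `dmesg | count_ce_per_dimm` or `grep EDAC /var/log/messages | count_ce_per_dimm`. Fed the four sample lines above, it prints `4 MC1 CPU#1Channel#0_DIMM#0`.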

The first step is to check the EDAC (Error Detection and Correction) output. On newer systems (kernel 2.6.18 and later) with sysfs, the error counts are under /sys/devices/system/edac/mc/mc0. The file to check is ce_count. On the server I checked, it was 2; anything above 24 is dangerous for a single DIMM bank. ue_count should be 0, because uncorrected errors mean the module is faulty and should be replaced.

[root@server ~]# ls -s /sys/devices/system/edac/mc/mc0
total 0
0 ce_count 0 max_location 0 rank3 0 seconds_since_reset
0 ce_noinfo_count 0 mc_name 0 rank4 0 size_mb
0 csrow0 0 power 0 rank5 0 subsystem
0 csrow1 0 rank0 0 rank6 0 ue_count
0 csrow2 0 rank1 0 rank7 0 ue_noinfo_count
0 csrow3 0 rank2 0 reset_counters 0 uevent

ce_count : The total count of correctable errors that have occurred on this memory controller (attribute file).

ce_noinfo_count : The total count of correctable errors on this memory controller, but with no information as to which DIMM slot is experiencing errors (attribute file).

mc_name : The type of memory controller being utilized (attribute file).

reset_counters : A write-only control file that zeroes out all of the statistical counters for correctable and uncorrectable errors on this memory controller and resets the timer indicating how long it has been since the last reset (counter zero). The basic command is echo <anything> > /sys/devices/system/edac/mc/mc0/reset_counters, where <anything> is literally anything (just use 0 to make things easy).

sdram_scrub_rate : An attribute file that controls memory scrubbing. The scrubbing rate is set by writing a minimum bandwidth in bytes per second to the attribute file. The rate is translated to an internal value that gives at least the specified rate. If the configuration fails or memory scrubbing is not implemented, the value of the attribute file will be -1.

seconds_since_reset : An attribute file that displays how many seconds have elapsed since the last counter reset. This can be used with the error counters to measure error rates.

size_mb : An attribute file that contains the size (MB) of memory that this memory controller manages.

ue_count : An attribute file that contains the total number of uncorrectable errors that have occurred on this memory controller.

ue_noinfo_count : The total count of uncorrectable errors on this memory controller, but with no information as to which DIMM slot is experiencing errors (attribute file).
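ce_count and seconds_since_reset together give an error rate, which says more than a raw count. A minimal sketch, assuming integer counters; ce_rate_per_hour is a hypothetical helper name, and the memory controller directory is a parameter so mc1, mc2 and so on can be checked the same way.

```shell
# Print corrected errors per hour for one memory controller, using the
# ce_count and seconds_since_reset attributes.
# Assumption: default directory is mc0; pass another mcN path to override.
ce_rate_per_hour() {
    mc_dir=${1:-/sys/devices/system/edac/mc/mc0}
    ce=$(cat "$mc_dir/ce_count") || return 1
    secs=$(cat "$mc_dir/seconds_since_reset") || return 1
    if [ "$secs" -eq 0 ]; then
        echo "counters were just reset; no rate yet" >&2
        return 1
    fi
    # Integer arithmetic: errors per hour, rounded down.
    echo $(( ce * 3600 / secs ))
}

# To start a fresh measurement window (as root), zero the counters first:
#   echo 0 > /sys/devices/system/edac/mc/mc0/reset_counters
```

Resetting the counters and checking the rate again after a day or so shows whether the errors are still accumulating.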

[root@server ~]# ls -s /sys/devices/system/edac/mc/mc0/csrow0
total 0
0 ce_count 0 ch1_ce_count 0 edac_mode 0 size_mb 0 uevent
0 ch0_ce_count 0 ch1_dimm_label 0 mem_type 0 subsystem
0 ch0_dimm_label 0 dev_type 0 power 0 ue_count

ce_count : The total count of correctable errors that have occurred on this csrow (attribute file).

ch0_ce_count : The total count of correctable errors on this DIMM in channel 0 (attribute file).

ch0_dimm_label : The control file that labels this DIMM. This can be very useful for panic events to isolate the cause of the uncorrectable error. Note that DIMM labels must be assigned after booting, with information that correctly identifies the physical slot with its silk screen label on the board itself.

dev_type : An attribute file that displays the type of DRAM device used on this DIMM. Typically this is x1, x2, x4, or x8.

edac_mode : An attribute file that displays the type of error detection and correction being utilized.

mem_type : An attribute file that displays the type of memory currently on a csrow.

size_mb : An attribute file that contains the size (MB) of memory a csrow contains.

ue_count : An attribute file that contains the total number of uncorrectable errors that have occurred on this csrow.
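Because DIMM labels have to be written after every boot, a small script run at startup can apply them. This sketch assumes a hypothetical mapping file with one "<csrow> <channel> <label>" entry per line; the csrow-to-slot mapping itself must come from your board documentation, and writing to the real sysfs files requires root. apply_dimm_labels is a hypothetical name.

```shell
# Write silkscreen labels into the chN_dimm_label control files.
# Assumption: mapping file lines look like "0 0 DIMM_A0"
# (csrow, channel, label); lines starting with # are comments.
apply_dimm_labels() {
    map_file=$1
    mc_dir=${2:-/sys/devices/system/edac/mc/mc0}
    while read -r csrow channel label; do
        # Skip blank lines and comments in the mapping file.
        case $csrow in ''|'#'*) continue ;; esac
        echo "$label" > "$mc_dir/csrow$csrow/ch${channel}_dimm_label"
    done < "$map_file"
}
```

Once labels match the silkscreen, any later error report that names a DIMM label points directly at a physical slot.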

[root@server ~]# cat /sys/devices/system/edac/mc/mc0/ce_count
2
[root@server ~]# cat /sys/devices/system/edac/mc/mc0/ue_count
0
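On a machine with several memory controllers, the same two reads can be repeated for each one. A minimal sketch, assuming the sysfs layout above; check_edac_counts is a hypothetical name, and the default threshold of 24 is this article's rule of thumb rather than a kernel or vendor limit.

```shell
# Flag every memory controller (mc0, mc1, ...) whose counters need attention.
# Assumption: threshold defaults to 24 corrected errors (article rule of thumb).
check_edac_counts() {
    base=${1:-/sys/devices/system/edac/mc}
    threshold=${2:-24}
    for mc in "$base"/mc[0-9]*; do
        [ -d "$mc" ] || continue   # glob did not match: no EDAC driver loaded
        ce=$(cat "$mc/ce_count")
        ue=$(cat "$mc/ue_count")
        if [ "$ue" -gt 0 ]; then
            status="REPLACE (uncorrected errors)"
        elif [ "$ce" -gt "$threshold" ]; then
            status="WARNING (corrected errors above $threshold)"
        else
            status="ok"
        fi
        echo "${mc##*/} ce=$ce ue=$ue $status"
    done
}
```

On the server above, `check_edac_counts` would report `mc0 ce=2 ue=0 ok`.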

If ue_count is more than 0, you have to drill down to find out which slot is faulty. The commands below list the DIMM label and the error count for each row (csrow) of a memory controller (mc). A system can have two or more memory controllers, identified as mc0, mc1, and so on.

[root@server ~]# cat /sys/devices/system/edac/mc/mc0/csrow*/ch0_dimm_label
mc#0csrow#0channel#0
mc#0csrow#1channel#0
mc#0csrow#2channel#0
mc#0csrow#3channel#0

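This means I have 4 csrows (chip select rows). Only channel 0 of each row is read here; the csrow0 listing above also shows ch1_ce_count and ch1_dimm_label, which can be checked the same way.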

[root@server ~]# cat /sys/devices/system/edac/mc/mc0/csrow*/ch0_ce_count
0
0
0
0
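To see labels and counts side by side, including the ch1_* files and any additional memory controllers, the two commands above can be combined. A minimal sketch; list_dimm_errors is a hypothetical helper name, and a channel whose label file is missing prints as "unlabeled".

```shell
# Print "<dimm label><TAB><corrected errors>" for every memory controller,
# csrow and channel found under the EDAC sysfs tree.
list_dimm_errors() {
    base=${1:-/sys/devices/system/edac/mc}
    for count_file in "$base"/mc*/csrow*/ch*_ce_count; do
        [ -e "$count_file" ] || continue   # no matches: glob left unexpanded
        # ch0_ce_count -> ch0_dimm_label in the same csrow directory
        label_file=${count_file%_ce_count}_dimm_label
        label=$(cat "$label_file" 2>/dev/null || echo "unlabeled")
        printf '%s\t%s\n' "$label" "$(cat "$count_file")"
    done
}
```

Any line with a non-zero count names the row and channel to investigate.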

Alternatively, you can use edac-util, a program that reports EDAC (Error Detection and Correction) information by reading the files the kernel's EDAC drivers export in sysfs. You may need to install it separately.

dmidecode gives information about each DIMM slot and the size of the RAM installed in it; lshw is another command that helps here. You may need to install them if they are not present. If you run dmidecode as a normal user, you get output like the one below, so switch to root first.

$ dmidecode
# dmidecode 3.0
Scanning /dev/mem for entry point.
/dev/mem: Permission denied

Running dmidecode with no options shows all the hardware information; the -t option restricts the output to one type of hardware. For memory devices, the type is 17.

[root@server ~]# dmidecode -t 17
# dmidecode 3.0
Scanning /dev/mem for entry point.
SMBIOS 2.6 present.

Handle 0x0056, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x0057
Error Information Handle: 0x005A
Total Width: 128 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: ChannelA-DIMM0
Bank Locator: BANK 0
Type: DDR3
Type Detail: Synchronous
Speed: 1333 MHz
Manufacturer: Kingston
Serial Number: D104061E
Asset Tag: 9876543210
Part Number: 9965525-058.A00LF
Rank: 2

Handle 0x005B, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x0057
Error Information Handle: No Error
Total Width: 128 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: ChannelA-DIMM1
Bank Locator: BANK 1
Type: DDR3
Type Detail: Synchronous
Speed: 1333 MHz
Manufacturer: Kingston
Serial Number: CB040A1E
Asset Tag: 9876543210
Part Number: 9965525-058.A00LF
Rank: 2

Handle 0x005C, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x0057
Error Information Handle: 0x005F
Total Width: 128 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: ChannelB-DIMM0
Bank Locator: BANK 2
Type: DDR3
Type Detail: Synchronous
Speed: 1333 MHz
Manufacturer: Kingston
Serial Number: CE040A1E
Asset Tag: 9876543210
Part Number: 9965525-058.A00LF
Rank: 2

Handle 0x0061, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x0057
Error Information Handle: No Error
Total Width: 128 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: ChannelB-DIMM1
Bank Locator: BANK 3
Type: DDR3
Type Detail: Synchronous
Speed: 1333 MHz
Manufacturer: Kingston
Serial Number: CF04DC1D
Asset Tag: 9876543210
Part Number: 9965525-058.A00LF
Rank: 2

In the output above there are 4 DIMM slots, each filled with an 8 GB module. The fields that identify a module for replacement are Locator and Bank Locator (the slot), plus Size, Manufacturer, Part Number and Serial Number. More details can be found at the URLs below. Even though the docs are a bit old, they are classics!
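With several modules installed, condensing the dmidecode output to one line per slot makes it easier to match each slot against the physical board. A minimal sketch, assuming dmidecode's usual "Field: value" layout; summarize_dimms is a hypothetical name. `lshw -short -C memory` gives a similar compact view of the memory banks.

```shell
# Condense "dmidecode -t 17" text (read on stdin) to one line per memory
# device: Locator | Bank Locator | Size | Serial Number.
summarize_dimms() {
    awk '
        function flush() {
            if (loc != "") printf "%s | %s | %s | %s\n", loc, bank, size, serial
            loc = bank = size = serial = ""
        }
        {
            line = $0
            sub(/^[ \t]+/, "", line)   # dmidecode normally indents fields with a tab
            if (line == "Memory Device") { flush(); next }
            split(line, kv, ": ")
            key = kv[1]
            val = substr(line, length(key) + 3)   # text after "key: "
            if (key == "Locator") loc = val
            else if (key == "Bank Locator") bank = val
            else if (key == "Size") size = val
            else if (key == "Serial Number") serial = val
        }
        END { flush() }
    '
}
```

Usage, as root: `dmidecode -t 17 | summarize_dimms`. For the output above, the first line would be `ChannelA-DIMM0 | BANK 0 | 8192 MB | D104061E`.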

https://docs.oracle.com/cd/E19121-01/sf.x4440/820-3067-14/dimms.html

https://docs.oracle.com/cd/E19150-01/820-4213-11/dimms.html

System monitoring and administration are critical to the successful operation of most businesses. But don’t worry: SupportSages is always happy to help you.


Posts by Smith Nevil

Smith is always ready to learn new technologies and explore new territories. His never-ending passion for technological advancement, unyielding affinity for perfection and excitement in exploring new areas help him stay on top of everything he is involved in. He currently works as a System Engineer at SupportSages.