'\" t
.TH "mcelog.triggers" 5 "mcelog"
.SH NAME
mcelog.triggers \- mcelog trigger scripts reference
.SH SYNOPSIS
.B /etc/mcelog/bus-error-trigger
.br
.B /etc/mcelog/cache-error-trigger
.br
.B /etc/mcelog/dimm-error-trigger
.br
.B /etc/mcelog/iomca-error-trigger
.br
.B /etc/mcelog/page-error-trigger
.br
.B /etc/mcelog/socket-memory-error-trigger
.br
.B /etc/mcelog/unknown-error-trigger
.br
.SH DESCRIPTION
.BR mcelog(8) 
maintains thresholds of errors using a 
.I leaky-bucket
algorithm.
When the number of errors in a specific
time window exceeds a pre-configured threshold, a
.I trigger
is executed. Triggers are usually shell scripts in the
.B /etc/mcelog
directory,
but can also be other internal actions. Thresholds and triggers
can be configured in
.BR mcelog.conf (5).
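.PP
The leaky-bucket accounting mentioned above can be illustrated with a minimal shell sketch.
It shows the generic technique only and is not mcelog's actual implementation:
the names CAPACITY, AGETIME and bucket_account are hypothetical, and the sketch
assumes that a trigger fires as soon as the count reaches the capacity.
.PP
.nf
```shell
# Generic leaky-bucket sketch; NOT mcelog's actual code.
# CAPACITY and AGETIME are hypothetical stand-ins for a configured
# threshold such as "10 errors per 24 hours".
CAPACITY=10
AGETIME=86400
count=0
tstamp=0

# bucket_account NOW
# Accounts one error seen at time NOW (in seconds). Returns 0 (true)
# when the bucket is full, which is where a trigger would be run.
bucket_account() {
    now=$1
    diff=$((now - tstamp))
    if [ "$diff" -ge "$AGETIME" ]; then
        # Drain CAPACITY errors for every full AGETIME that has passed.
        leak=$((diff / AGETIME * CAPACITY))
        if [ "$leak" -ge "$count" ]; then
            count=0
        else
            count=$((count - leak))
        fi
        tstamp=$now
    fi
    count=$((count + 1))
    # Assumption of this sketch: reaching CAPACITY counts as overflow.
    [ "$count" -ge "$CAPACITY" ]
}
```
.fi
.PP
In this sketch, errors that arrive in quick succession fill the bucket, while errors
spread over more than one AGETIME period drain away before the capacity is reached.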

Triggers run as the user configured for mcelog
in
.IR mcelog.conf ,
by default root. The default trigger action can
be overridden by specifying a different trigger script in the configuration file.
Actions in addition to the default trigger
(such as notifying an administrator) can be put into the respective
.I /etc/mcelog/*.local
script, which is executed after the default action. This allows updating the default
scripts without overwriting local actions. All trigger actions are also
logged to syslog.
.PP
.B "The DIMM and socket memory error triggers"
.PP
The 
.B /etc/mcelog/dimm-error-trigger
and 
.B /etc/mcelog/socket-memory-error-trigger
scripts are executed when a DIMM or a CPU socket exceeds
a configured corrected or uncorrected memory error threshold.
The thresholds are configured in the 
.B mcelog.conf
.I [dimm]
and
.I [socket]
sections.
The default triggers log a warning message in the system log.
The triggers are only executed when mcelog runs as a daemon.

Arguments are passed as environment variables:
.TS
tab(:);
l l.
THRESHOLD:Human readable threshold status
MESSAGE:Human readable consolidated error message
TOTALCOUNT:Total corrected or uncorrected error count for the current DIMM, depending on what triggered the event
LOCATION:Consolidated location as a single string
DMI_LOCATION:DIMM location from DMI/SMBIOS if available
DMI_NAME:DIMM identifier from DMI/SMBIOS if available
DIMM:DIMM number reported by hardware
CHANNEL:Channel number reported by hardware
SOCKETID:Socket ID of CPU that includes the memory controller with the DIMM
CECOUNT:Total corrected error count for DIMM
UCCOUNT:Total uncorrected error count for DIMM
LASTEVENT:Time stamp of event that triggered threshold (in time_t format, seconds)
THRESHOLD_COUNT:Total number of events of the specific type in the current threshold time period
.TE

After the default action, local actions in
.B /etc/mcelog/dimm-error-trigger.local
or
.BR /etc/mcelog/socket-memory-error-trigger.local ,
respectively, are executed.
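.PP
As an illustration of how a local action might use the variables listed above, the following is a
minimal sketch of a helper for a
.I dimm-error-trigger.local
script. The function name dimm_alert_text and the wording of the message are hypothetical;
only the environment variable names come from the table above.
.PP
.nf
```shell
# Hypothetical helper for /etc/mcelog/dimm-error-trigger.local.
# dimm_alert_text is an illustrative name, not part of mcelog.
dimm_alert_text() {
    # Prefer the DMI name when one was exported, otherwise fall back
    # to the consolidated location string.
    where=${DMI_NAME:-${LOCATION:-unknown location}}
    echo "mcelog: ${THRESHOLD:-threshold exceeded} on ${where} (CE=${CECOUNT:-0} UC=${UCCOUNT:-0})"
}
```
.fi
.PP
A local script could call dimm_alert_text and hand the resulting line to
.BR logger (1)
or a mail command of the administrator's choice.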

.PP
.B "The page error trigger"
.PP
The 
.B /etc/mcelog/page-error-trigger 
script is 
executed by mcelog in daemon mode when a page
in memory exceeds a pre-configured corrected or uncorrected error threshold.
mcelog internally also implements offlining the page through the kernel.
This is configured through the 
.I [page]
section of 
.BR mcelog.conf (5).
.PP
The environment arguments are the same as for the
.I dimm-error-trigger
script.
.PP
After the default action, local actions in
.I /etc/mcelog/page-error-trigger.local
are executed.

.PP
.B "The cache error trigger"
.PP
The
.I /etc/mcelog/cache-error-trigger
shell script is called for cache error handling in daemon mode
when a CPU reports excessive corrected cache errors.
This could be an indication of future uncorrected errors.
.PP
This trigger is configured through the 
.B [cache]
section in the 
.BR mcelog.conf(5) 
configuration file. The threshold is defined by the CPU.  The default trigger offlines the affected CPU cores, unless it is the last core running. 
.PP
Arguments are passed as environment variables:
.TS
tab(:);
l l.
MESSAGE:Human readable error message
CPU:Linux CPU number that triggered the error
LEVEL:Cache level affected by error
TYPE:Cache type affected by error (Data,Instruction,Generic)
AFFECTED_CPUS:List of CPUs sharing the affected cache
SOCKETID:Socket ID of affected CPU
.TE
.PP
After the default action, local actions in
.I /etc/mcelog/cache-error-trigger.local
are executed.
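.PP
Because the default trigger offlines CPU cores, a local action may want to record the resulting
state. The sketch below is one hypothetical way to do that in a
.I cache-error-trigger.local
script. It assumes that AFFECTED_CPUS holds whitespace-separated Linux CPU numbers, which the
table above does not specify, and the function name affected_cpu_state is made up for this example.
.PP
.nf
```shell
# Hypothetical helper for /etc/mcelog/cache-error-trigger.local.
# Assumption: AFFECTED_CPUS is a whitespace-separated list of
# Linux CPU numbers.
affected_cpu_state() {
    for cpu in $AFFECTED_CPUS; do
        f=/sys/devices/system/cpu/cpu$cpu/online
        if [ -r "$f" ]; then
            # sysfs reports 1 for an online CPU and 0 for an offline one.
            echo "cpu$cpu online=$(cat "$f")"
        else
            echo "cpu$cpu online=unknown"
        fi
    done
}
```
.fi
.PP
The function only reads from sysfs and prints one line per CPU, so it does not interfere
with the offlining done by the default action.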
.PP
.B "The bus-uc-threshold-trigger"
.PP
The
.B bus-uc-threshold-trigger
runs on uncorrected errors on an I/O bus. It is configured through the
.B bus-uc-threshold-trigger
and
.B bus-uc-threshold-trigger-threshold
options in
.IR /etc/mcelog.conf .
By default it logs a message with the error location to the system log.
After the default action, local actions in
.I /etc/mcelog/bus-uc-error-trigger.local
are executed.
.PP
Arguments are passed as environment variables:
.TS
tab(:);
l l.
MESSAGE:Human readable consolidated error message
LOCATION:Consolidated location as a single string
SOCKETID:Socket ID of CPU that includes the memory controller with the DIMM
LEVEL:Interconnect level
PARTICIPATION:Processor participation (Originator, Responder or Observer)
REQUEST:Request type (read, write, prefetch, etc.)
ORIGIN:Memory or IO
TIMEOUT:Whether the request timed out
.TE
.PP
.B "The iomca-error-trigger"
.PP
The 
.B iomca-error-trigger
runs when a socket receives bus or interconnect errors.
It is configured through the 
.B iomca-error-trigger 
and 
.B iomca-error-trigger-threshold
options in
.IR /etc/mcelog.conf .
By default it logs a message with the error location to the system log.
After the default action, local actions in
.I /etc/mcelog/iomca-error-trigger.local
are executed.
.PP
Arguments are passed as environment variables:
.TS
tab(:);
l l.
MESSAGE:Human readable consolidated error message
LOCATION:Consolidated location as a single string
SOCKETID:Socket ID of CPU that includes the memory controller with the DIMM
CPU:Linux CPU number that triggered the error
SET:PCI segment number
BUS:PCI bus number
DEVICE:PCI device number
FUNCTION:PCI function number
.TE
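.PP
The PCI variables above can be combined into the segment:bus:device.function notation used by
tools such as
.BR lspci (8).
The following sketch for an
.I iomca-error-trigger.local
script assumes that SET, BUS, DEVICE and FUNCTION are passed as decimal numbers, which is not
stated in the table; the function name iomca_pci_address is hypothetical.
.PP
.nf
```shell
# Hypothetical helper for /etc/mcelog/iomca-error-trigger.local.
# Assumption: SET, BUS, DEVICE and FUNCTION are decimal numbers.
iomca_pci_address() {
    # Format as ssss:bb:dd.f in hexadecimal, then end the line.
    printf '%04x:%02x:%02x.%x' "${SET:-0}" "${BUS:-0}" "${DEVICE:-0}" "${FUNCTION:-0}"
    echo
}
```
.fi
.PP
If the values turn out to be passed in another radix, the conversion has to be adjusted accordingly.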
.PP
.B "The unknown-error-trigger"
.PP
The 
.B unknown-error-trigger
runs on any errors not otherwise categorized.
It is configured through the 
.B unknown-error-trigger
and
.B unknown-error-trigger-threshold
options in
.IR /etc/mcelog.conf .
By default it logs a message to the system log.
After the default action, local actions in
.I /etc/mcelog/unknown-error-trigger.local
are executed.
.PP
Arguments are passed as environment variables:
.TS
tab(:);
l l.
MESSAGE:Human readable consolidated error message
LOCATION:Consolidated location as a single string
SOCKETID:Socket ID of CPU that includes the memory controller with the DIMM
CPU:Linux CPU number that triggered the error
STATUS:IA32_MCi_STATUS register value
ADDR:IA32_MCi_ADDR register value
MISC:IA32_MCi_MISC register value
MCGSTATUS:IA32_MCG_STATUS register value
MCGCAP:IA32_MCG_CAP register value
.TE
.SH SEE ALSO
http://www.mcelog.org

.B mcelog(8),
.B mcelog.conf(5)