Cisco images failure after few minutes

Moderator: mike

bgp-lu
Posts: 4
Joined: Thu Jan 09, 2020 7:26 pm

Post by bgp-lu » Wed May 11, 2022 7:17 pm

Hello there,

I'm trying to bring up a topology that contains several nodes. Until now, I had never tried to bring up all of the nodes at once. I already know that the hardware is limited.

Server specs:
Hypervisor: VMware ESXi, 6.5.0, 5969303
Model: UCSC-C220-M4S
Type of processor: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Logical Processors: 32
NIC: 4
Total memory: 128 GB

Images used:
XRv9-k9full 7.6.1 x8
XRv-6.6.2 x2
CSR1000v 17.03.05 x4
NE40E V800R011C00SPC607B607 x15
vMX 18.4R1.8 x1

The NE40, vMX and XRv nodes work flawlessly, but the XRv9 and CSR1000v nodes develop problems a couple of minutes after they are initialized. The XRv9 crashes, or appears to shut itself down, and the CSR1000v displays CPU-related log messages and freezes for a few minutes before it comes back again.
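A rough way to sanity-check whether the whole lab fits on this host is to add up what the nodes ask for and compare it with the 32 logical processors and 128 GB listed above. The Python sketch below does that. The node counts come from the image list; the per-node vCPU and RAM values are placeholder assumptions for illustration only, not the real template settings, so they should be replaced with what each node is actually configured with in the lab.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    count: int     # taken from the image list in the post
    vcpus: int     # ASSUMPTION: placeholder per-node vCPU setting
    ram_gb: float  # ASSUMPTION: placeholder per-node RAM setting


HOST_LOGICAL_CPUS = 32  # from the server specs in the post
HOST_RAM_GB = 128       # from the server specs in the post

# Counts are from the post; vcpus and ram_gb are hypothetical values.
NODES = [
    NodeType("XRv9", 8, vcpus=4, ram_gb=16),
    NodeType("XRv", 2, vcpus=1, ram_gb=3),
    NodeType("CSR1000v", 4, vcpus=1, ram_gb=4),
    NodeType("NE40E", 15, vcpus=2, ram_gb=2),
    NodeType("vMX", 1, vcpus=2, ram_gb=4),
]


def totals(nodes):
    """Return (total vCPUs, total RAM in GB) requested by all nodes."""
    vcpus = sum(n.count * n.vcpus for n in nodes)
    ram_gb = sum(n.count * n.ram_gb for n in nodes)
    return vcpus, ram_gb


def oversubscription(nodes, host_cpus=HOST_LOGICAL_CPUS, host_ram_gb=HOST_RAM_GB):
    """Compare the requested resources with what the host offers."""
    vcpus, ram_gb = totals(nodes)
    return {
        "vcpus": vcpus,
        "ram_gb": ram_gb,
        "cpu_ratio": vcpus / host_cpus,
        "ram_ratio": ram_gb / host_ram_gb,
    }


if __name__ == "__main__":
    for key, value in oversubscription(NODES).items():
        print(f"{key}: {value}")
```

A CPU ratio well above 1 means the nodes compete for the same logical processors, which would be consistent with heartbeat failures and CPUHOG messages on the heavier images, although the numbers alone do not prove that this is the cause.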

XRv9

Code:

0/RP0/ADMIN0:May 11 18:10:25.150 UTC: vm_manager[3262]: %INFRA-VM_MANAGER-3-MSG_HEARTBEAT_FAILURE : VM default-sdr--1 failed to maintain heartbe 
0/RP0/ADMIN0:May 11 18:10:25.169 UTC: sdr_mgr[3216]: %SM-SDR_MANAGER-3-MSG_VM_RELOAD_ON_HB_FAILURE : Info :SDR NM : VM Reload on HB failure, sdr 
0/RP0/ADMIN0:May 11 18:10:25.170 UTC: sdr_mgr[3216]: %SM-SDR_MANAGER-3-MSG_VM_UNGRACEFUL_RELOAD_TOO_OFTEN : Info :sdr default-sdr vm_id 1 ungrac 
[18:10:44.777] Sending KILL signal to processmgr..
[18:10:44.777] Sending KILL signal to ds..
PM disconnect successStopping OpenBSD Secure Shell server: sshdinitctl: Unknown instance: 
The audit system is disabled
Stopping system message bus: dbus.
Stopping random number generator daemon.
Stopping system log daemon...0
Stopping kernel log daemon...0
Stopping internet superserver: xinetd.
Stopping crond: OK
Stopping rpcbind daemon...
done.
Libvirt not initialized for container instance
Deconfiguring network interfaces... done.
Sending all processes the KILL signal...
Unmounting remote filesystems...
Deactivating swap...
Unmounting local filesystems...
Connection closed by foreign host.
Wed May 11 18:11:24 UTC 2022 (/opt/cisco/hostos/bin/xr_con_telnet_wrapper.sh): XR console connection lost to port 9001

CSR1000v

Code:

*May 11 18:24:10.581: %PLATFORM-4-ELEMENT_WARNING: R0/0: smand: RP/0: 5-Minute Load Average value 9.49 exceeds warning level 8.00.
*May 11 18:24:44.445: %EVENTLIB-3-CPUHOG: R0/0: hman: undefined: 1311ms, Traceback=1#08ca21ba637c850b75436450ffff3b6d   c:7FA1A2665000+37370 c:7FA1A2665000+15BC9C :564DD7383000+2CDCA :564DD7383000+2D518 :564DD7383000+49343 uipeer:7FA1ACCD2000+3F6A9 uipeer:7FA1ACCD2000+1ED06 evlib:7FA1AE2F7000+9145 evlib:7FA1AE2F7000+9A9C orchestrator_lib:7FA1A94CC000+CE31 orchestrator_lib:7FA1A94CC000+CDB4
*May 11 18:24:44.472: %EVENTLIB-3-CPUHOG: R0/0: hman: undefined: 1135ms, Traceback=1#08ca21ba637c850b75436450ffff3b6d   c:7FA1A2665000+37370 c:7FA1A2665000+EACA4 c:7FA1A2665000+7BCFB c:7FA1A2665000+7BE9D c:7FA1A2665000+6FFA2 procmib_lib:7FA1A7581000+6472 :564DD7383000+4FAB4 evlib:7FA1AE2F7000+9145 evlib:7FA1AE2F7000+9A9C orchestrator_lib:7FA1A94CC000+CE31 orchestrator_lib:7FA1A94CC000+CDB4
*May 11 18:25:00.072: %EVENTLIB-3-CPUHOG: R0/0: smd: write asyncon 0x55df3a8908e8: 136ms, Traceback=1#aacc8f6f6ff3ee394cf2c4311553234a   c:7F38368EF000+37370 pthread:7F3836AAF000+117FA bipc:7F384D54A000+5192 evutil:7F385A7E4000+9CD2 evlib:7F385B6CB000+8D8E evlib:7F385B6CB000+9A9C orchestrator_lib:7F385B4A7000+CE31 orchestrator_lib:7F385B4A7000+CDB4 luajit:7F3837461000+7C696 luajit:7F3837461000+35C44 luajit:7F3837461000+BFF9
Has anyone tried using newer Cisco XRv9 and CSR1000v images?
Is it possible that the problems are related to storage? (The EVE VM is located on a VMware datastore, not on local storage.)
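To see how long and how often the CSR1000v stalls, the CPUHOG durations and load-average warnings can be pulled out of a saved console log. This is a minimal Python sketch, assuming the messages keep the format shown in the CSR1000v log above. The regular expressions were written against those few lines only, and the sample lines have their tracebacks trimmed for brevity.

```python
import re

# Patterns are assumptions based only on the log lines quoted in this post.
CPUHOG_RE = re.compile(
    r"%EVENTLIB-3-CPUHOG: \S+ (?P<proc>\w+): .*?: (?P<ms>\d+)ms,"
)
LOAD_RE = re.compile(
    r"5-Minute Load Average value (?P<load>\d+\.\d+) "
    r"exceeds warning level (?P<limit>\d+\.\d+)"
)

# Lines copied from the post, with the traceback part cut off.
SAMPLE = [
    "*May 11 18:24:10.581: %PLATFORM-4-ELEMENT_WARNING: R0/0: smand: RP/0: "
    "5-Minute Load Average value 9.49 exceeds warning level 8.00.",
    "*May 11 18:24:44.445: %EVENTLIB-3-CPUHOG: R0/0: hman: undefined: 1311ms, Traceback=...",
    "*May 11 18:24:44.472: %EVENTLIB-3-CPUHOG: R0/0: hman: undefined: 1135ms, Traceback=...",
    "*May 11 18:25:00.072: %EVENTLIB-3-CPUHOG: R0/0: smd: write asyncon 0x55df3a8908e8: "
    "136ms, Traceback=...",
]


def summarize(lines):
    """Collect (process, milliseconds) CPUHOG events and (load, limit) warnings."""
    summary = {"cpuhog": [], "load": []}
    for line in lines:
        hog = CPUHOG_RE.search(line)
        if hog:
            summary["cpuhog"].append((hog.group("proc"), int(hog.group("ms"))))
            continue
        load = LOAD_RE.search(line)
        if load:
            summary["load"].append(
                (float(load.group("load")), float(load.group("limit")))
            )
    return summary


if __name__ == "__main__":
    result = summarize(SAMPLE)
    for proc, ms in result["cpuhog"]:
        print(f"CPUHOG in {proc}: {ms} ms")
    for load, limit in result["load"]:
        print(f"load average {load} above warning level {limit}")
```

Running the same function over a longer capture, taken once with all nodes started and once with only a few, would show whether the stalls grow with the number of running nodes.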

I need to run some tests with IS-IS (migration from OSPF), the MVPN control plane and, if possible, SR-MPLS.

Regards

Uldis (UD)
Posts: 4477
Joined: Wed Mar 15, 2017 4:44 pm
Location: London

Re: Cisco images failure after few minutes

Post by Uldis (UD) » Fri May 13, 2022 6:40 am

How many CPUs are assigned to your EVE VM in total?

Please post the output of this EVE CLI command:

Code:

eve-info

bgp-lu
Posts: 4
Joined: Thu Jan 09, 2020 7:26 pm

Re: Cisco images failure after few minutes

Post by bgp-lu » Fri May 13, 2022 1:00 pm

Hello Uldis,

Here is the output of that command:

Code:

---------------Packages Installed----------------
ii eve-ng 2.0.3-112
ii eve-ng-addons-ostinato-drone 2.0.3-61
ii eve-ng-dynamips 2.0.2-2
ii eve-ng-guacamole 2.0.3-112
ii eve-ng-qemu 2.0.5-24
ii eve-ng-schema 2.0.6-14
ii eve-ng-vpcs 1.0-eve-ng
ii linux-headers-4.9.40-eve-ng-ukms+ 4.9.40-eve-ng-ukms-brctl
ii linux-image-4.20.17-eve-ng-ukms+ 4.20.17-eve-ng-ukms-brctl

---------------Hostname--------------------------
   Static hostname: cochambre
    Virtualization: vmware
  Operating System: Ubuntu 16.04.7 LTS
            Kernel: Linux 4.20.17-eve-ng-ukms+
      Architecture: x86-64
---------------Disk Usage------------------------
Filesystem                    Size  Used Avail Use% Mounted on
udev                           61G     0   61G   0% /dev
tmpfs                          13G   24M   13G   1% /run
/dev/mapper/eve--ng--vg-root  228G  126G   93G  58% /
tmpfs                          61G     0   61G   0% /dev/shm
tmpfs                         5.0M     0  5.0M   0% /run/lock
tmpfs                          61G     0   61G   0% /sys/fs/cgroup
/dev/sda1                     472M  118M  330M  27% /boot

---------------CPU Info--------------------------
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             16
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               2394.230
BogoMIPS:              4788.46
Virtualization:        VT-x
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-15
NUMA node1 CPU(s):     16-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti tpr_shadow vnmi ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid xsaveopt arat

---------------Memory Info-----------------------
              total        used        free      shared  buff/cache   available
Mem:           121G         27G         89G         34M        4.9G         92G
Swap:          8.0G          0B        8.0G

---------------Nic Info--------------------------
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master pnet0 state UP mode DEFAULT group default qlen 1000

---------------IP Info---------------------------
*        State: n/a

---------------Bridge Info-----------------------
pnet0           8000.0050568adc47       no              eth0
pnet1           8000.000000000000       no
pnet2           8000.000000000000       no
pnet3           8000.000000000000       no
pnet4           8000.000000000000       no
pnet5           8000.000000000000       no
pnet6           8000.000000000000       no
pnet7           8000.000000000000       no
pnet8           8000.000000000000       no
pnet9           8000.000000000000       no

---------------H/W Accel-------------------------
INFO: /dev/kvm exists
KVM acceleration can be used
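The CPU question can be answered directly from the CPU Info section above. As a small sketch, the Python below parses a few of those `key: value` lines and checks that sockets × cores × threads matches the reported CPU count. The excerpt is copied from the output above; the helper names `parse_fields` and `topology` are made up for this example.

```python
# Excerpt copied from the CPU Info section of the eve-info output above.
LSCPU_EXCERPT = """\
CPU(s):                32
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             16
NUMA node(s):          2
"""


def parse_fields(text):
    """Turn 'key: value' lines into a dict, ignoring lines without a colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def topology(fields):
    """Summarize the vCPU layout that the EVE VM sees."""
    sockets = int(fields["Socket(s)"])
    cores = int(fields["Core(s) per socket"])
    threads = int(fields["Thread(s) per core"])
    return {
        "cpus": int(fields["CPU(s)"]),
        "computed": sockets * cores * threads,
        "numa_nodes": int(fields["NUMA node(s)"]),
    }


if __name__ == "__main__":
    print(topology(parse_fields(LSCPU_EXCERPT)))
```

For this output the VM sees 32 vCPUs, presented as 16 sockets with 2 cores each across 2 NUMA nodes, which is the same number as the 32 logical processors of the ESXi host listed in the first post.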
