sata controller for non-x86

Postby ivelegacy on Fri May 03, 2019 4:31 pm

Architecture | Machine               | CPU                  | PCI                           | Kernel
PA-RISC 2.0  | HP C3600              | 1x PA8600 @ 550 MHz  | several PCI32 and PCI64 slots | 4.16 .. 5.1
PowerPC      | Apple PowerMac G4 MDD | 2x PPC7450 @ 1.2 GHz | four PCI64 slots              | 4.16, 4.20
MIPS4/BE     | SGI Octane2/IP30      | 2x R14000 @ 600 MHz  | three PCI64 slots via XIO-PCI | _?_


Note:
PCI-X is not the same as PCI64: PCI-X differs from PCI64 in interrupt handling.


Our team is using these machines. We haven't yet found a stable and decent SATA controller.
Brand/model                 | Chip/driver           | Specs          | Tested on (machine/kernel) | Result
HighPoint RocketRAID 1640   | HPT374                | 4ch, PCI32     | C3600/v4.16                | DMA problems: files >500MB get corrupted
HighPoint RocketRAID 2224   | Marvell MV88SX6081    | 4ch+e*, PCI-X  | C3600/v4.16, v5.1          | under testing
HighPoint RocketRAID 2224   | Marvell MV88SX6081    | 4ch+e*, PCI-X  | PowerMacG4/v4.16           | working!!!
HighPoint RocketRAID 3220   | Marvell xxx           | 4ch, PCI-X     | C3600/v4.16, x86/v4.20     | 3.3V-only, x86-only
Adaptec 1210SA              | Silicon Image SiI3112 | 2ch, PCI32     | C3600/v4.16                | working!!!
Adaptec 2410SA              | Intel i960            | 4ch, PCI64     | C3600/v4.16, x86/v4.20     | x86-only
(wanted) Adaptec AAR-1420SA | __                    | 4ch, PCI-X     | __                         | __
(wanted) SYBA SY-PCI40010   | Silicon Image SiI??   | 4ch, PCI32     | __                         | __
SYBA SY-PCX40009            | Silicon Image SiI3124 | 4ch, PCI-X     | C3600/v4.20                | __
SYBA SY-PCX40009            | Silicon Image SiI3124 | 4ch, PCI-X     | PowerMacG4/v4.16           | working!!!
VIA generic card            | VIA VT6421            | 2ch, PCI32     | __                         | __
(wanted) LSI Fuel/Tezro     | LSI SAS3041X          | 4ch SAS, PCI-X | __                         | __
(wanted) LINDY SATA-II Multilane | _                | e*, PCI-X      | __                         | __
(wanted) Sonnet Tempo™ SATA X4i | SiI?!?            | 4ch, PCI-X     | __                         | __


Note:
The SY-PCX40009 PCI-X card supports two modes: 32-bit at 66 MHz and 64-bit at 133 MHz.
It is backward compatible with PCI 2.3 and known to work with the C8000.

e*: InfiniBand Multilane (SFF-8470)
like being a monkey coloring within the lines



Houston, we have a serious problem

Postby ivelegacy on Fri May 03, 2019 4:33 pm

Adaptec AAR-2410SA was tested on x86, C3600, and Apple PowerMac G4 MDD. It worked only on x86.

Code:
0001:10:15.0 RAID bus controller: Adaptec AAC-RAID (rev 01)
        Subsystem: Adaptec AAR-2410SA PCI SATA 4ch (Jaguar II)
        Flags: 66MHz, slow devsel, IRQ 58
        Memory at 84000000 (32-bit, prefetchable) [size=64M]
        Expansion ROM at 80088000 [disabled] [size=32K]
        Capabilities: [80] Power Management version 2


Alan Cox explained some of the reasons for this. His email is long and not public, so here is a summary of what we understood.

As soon as the computer boots, its BIOS firmware scans every PCI peripheral for a BIOS extension. It finds a BIOS-extension ROM on the SATA card, and it loads and executes it: the flash chip on the card contains x86 opcodes! The ROM code initializes some features on the SATA card, then loads and starts the card's own firmware (the firmware is stored in the flash, but it somehow has to be launched by the PC; we don't know how). The PC then goes ahead and starts the OS loader (GRUB, LILO, and so on), which loads and starts the Linux kernel. When the running kernel probes for the SATA controller, the driver finds the controller already configured and running its own firmware, so the card responds properly to the commands the kernel issues.

So, if you put the Adaptec AAR-2410SA SATA card into a non-x86 computer, the firmware there does not run the x86 BIOS extension, and the Linux kernel does not find the card correctly configured; in fact, the kernel complains that the card's own firmware is not even running. This can't be fixed unless someone fully reverse-engineers the flash code in order to write a new kernel driver that initializes the card directly, instead of relying on the job done by the PC BIOS.

Hardware RAID cards are usually problematic for the same reason.
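To check in advance whether a card's option ROM only carries x86 code, one can dump the ROM and read the code-type byte of each image. Below is a minimal sketch, assuming the ROM layout from the PCI Firmware specification (0x55 0xAA signature, a pointer at offset 0x18 to the "PCIR" structure, code type at PCIR+0x14, last-image flag in bit 7 of PCIR+0x15, image length in 512-byte units at PCIR+0x10); the function names are hypothetical.

```bash
# read_u8 FILE OFFSET -> decimal value of one byte (empty past end of file)
read_u8() {
    od -An -tu1 -j "$2" -N 1 "$1" 2>/dev/null | tr -d ' \n'
}

# read_u16le FILE OFFSET -> decimal value of a little-endian 16-bit word
read_u16le() {
    local lo hi
    lo=$(read_u8 "$1" "$2")
    hi=$(read_u8 "$1" $(( $2 + 1 )))
    echo $(( lo | (hi << 8) ))
}

# rom_code_types FILE -> one line per ROM image: "<offset> <code type>"
# returns 1 if the dump does not look like a valid expansion ROM
rom_code_types() {
    local rom=$1 base=0 pcir type ind len
    while :; do
        # every image must start with the 0x55 0xAA signature
        [ "$(read_u8 "$rom" $base)" = 85 ] && [ "$(read_u8 "$rom" $(( base + 1 )))" = 170 ] || return 1
        # offset 0x18 points to the PCI Data Structure, which starts with "PCIR"
        pcir=$(( base + $(read_u16le "$rom" $(( base + 0x18 ))) ))
        [ "$(dd if="$rom" bs=1 skip=$pcir count=4 2>/dev/null)" = PCIR ] || return 1
        type=$(read_u8 "$rom" $(( pcir + 0x14 )))
        case $type in
            0) echo "$base x86 (PC-AT compatible)" ;;
            1) echo "$base Open Firmware" ;;
            2) echo "$base HP PA-RISC" ;;
            3) echo "$base EFI" ;;
            *) echo "$base other ($type)" ;;
        esac
        ind=$(read_u8 "$rom" $(( pcir + 0x15 )))
        (( ind & 0x80 )) && return 0                    # bit 7 set: last image
        len=$(read_u16le "$rom" $(( pcir + 0x10 )))     # length in 512-byte units
        (( len > 0 )) || return 1                       # avoid looping forever
        base=$(( base + len * 512 ))
    done
}
```

Usage sketch on Linux (as root, using the sysfs `rom` attribute; the address is the one from the lspci output above): `echo 1 > /sys/bus/pci/devices/0001:10:15.0/rom; cat /sys/bus/pci/devices/0001:10:15.0/rom > card.rom; echo 0 > /sys/bus/pci/devices/0001:10:15.0/rom; rom_code_types card.rom`. A card that only reports "x86" images is likely to behave like the AAR-2410SA on a non-x86 host.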


Serious disk corruption with HPT374 on C3600

Postby LordCrimson on Fri May 03, 2019 8:15 pm

What's happening is similar to Bug 2271, which appeared in 2004.

Ivelegacy, Madame and I tried a HighPoint HPT374 on a C3600 workstation running kernel v4.16 in 64-bit mode. It didn't panic, but during a file copy the DMA corrupted the file. The filesystem itself was not corrupted.

Code:
# lssize data1.bin
400 Mbyte
# cp data1.bin data2.bin
# md5sum data1.bin data2.bin
6004eb9dd9189770655f8b49a1d688a8  data1.bin
6004eb9dd9189770655f8b49a1d688a8  data2.bin


Code:
# lssize bigone1
5 Gbyte
# cp bigone1 bigone2
# md5sum bigone1 bigone2
f60a9f7ff4bcec465ea47e0f009354fd  bigone1
5e1fdedc560cfe82a5d59b740a980091  bigone2 <---- corrupted


Digging deeper, it only happens with big files.
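Since only big files are affected, it may help to know where a bad copy first diverges from the original, for example whether the damage always starts at the same offset or only past the 4 GiB mark. A small sketch built on the standard `cmp -l`, which lists differing bytes with 1-based byte numbers; the function names are hypothetical.

```bash
# first_diff FILE1 FILE2 -> 0-based offset of the first differing byte,
# or nothing if the files match over their common length
first_diff() {
    local n
    # cmp -l prints "<1-based byte number> <octal byte 1> <octal byte 2>"
    n=$(cmp -l "$1" "$2" 2>/dev/null | head -n 1 | awk '{print $1}')
    [ -n "$n" ] && echo $(( n - 1 ))
}

# describe_diff FILE1 FILE2 -> one-line summary of where the copy diverges
describe_diff() {
    local off
    off=$(first_diff "$1" "$2")
    if [ -z "$off" ]; then
        echo "no differing byte in the common length"
    else
        echo "first difference at byte $off ($(( off / 1048576 )) MiB)"
    fi
}
```

For example, `describe_diff bigone1 bigone2` on the corrupted 5 Gbyte copy would report the offset of the first damaged byte.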
Mystères of the invisible, Loa are intermediaries between Bondye the Supreme Creator, who is distant from the world—and humanity.



copy test

Postby ivelegacy on Sat May 04, 2019 12:22 pm

/usr/bin/safecp
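Code:
#!/bin/bash
# safecp SRC DST: copy SRC to DST, then compare the MD5 sums of both files
if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: safecp SRC DST" >&2
    exit 1
fi

if ! cp -- "$1" "$2"; then
    echo "copy error, l1"
    exit 1
fi

# flush the copy to disk; md5sum may still read DST back from the page cache
# unless caches are dropped (as root: echo 3 > /proc/sys/vm/drop_caches)
sync

# md5sum prints "<hash>  -" when reading stdin; keep only the hash
check1=$(md5sum < "$1" | cut -d ' ' -f 1)
check2=$(md5sum < "$2" | cut -d ' ' -f 1)

if [ "$check1" != "$check2" ]; then
    echo "copy error, l2"
    exit 1
fi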


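Code:
# iflag=fullblock (GNU dd): /dev/urandom may return short reads for large bs
dd if=/dev/urandom of=data_512MB.bin count=512 bs=1M iflag=fullblock
dd if=/dev/urandom of=data_01GB.bin count=1024 bs=1M iflag=fullblock
dd if=/dev/urandom of=data_02GB.bin count=1024 bs=2M iflag=fullblock
dd if=/dev/urandom of=data_04GB.bin count=1024 bs=4M iflag=fullblock
dd if=/dev/urandom of=data_08GB.bin count=1024 bs=8M iflag=fullblock
dd if=/dev/urandom of=data_16GB.bin count=1024 bs=16M iflag=fullblock
dd if=/dev/urandom of=data_32GB.bin count=1024 bs=32M iflag=fullblock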
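Before running the copy tests, it may be worth confirming that each generated file really has the intended size: dd counts a short read as a partial block, so it can silently produce a smaller file. A sketch; the function name is hypothetical, and sizes are given in MiB to match dd's `M` suffix (1048576 bytes in GNU dd).

```bash
# expect_size FILE MIB -> exit 0 if FILE is exactly MIB mebibytes long,
# otherwise report the actual size on stderr and exit 1
expect_size() {
    local want=$(( $2 * 1048576 )) have
    have=$(wc -c < "$1" | tr -d ' ')
    if [ "$have" -eq "$want" ]; then
        return 0
    fi
    echo "$1: $have bytes, expected $want" >&2
    return 1
}
```

For example: `expect_size data_16GB.bin 16384 && expect_size data_32GB.bin 32768`.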


Code:
safecp data_512MB.bin copy.bin
safecp data_01GB.bin copy.bin
safecp data_02GB.bin copy.bin
safecp data_04GB.bin copy.bin
safecp data_08GB.bin copy.bin
safecp data_16GB.bin copy.bin


The last two stress both the PCI bus and the DMA, so they are a good worst-case test: all of our machines fail it.


Highpoint RR3220 is 3.3V-only and x86-only

Postby ivelegacy on Sat May 04, 2019 12:32 pm

The HighPoint RocketRAID RR3220 comes with two mini-SAS connectors, and it is not keyed as "3.3V-only". In a 3.3V-only PCI-X slot it has no problems and works as expected. We tried it on a SuperMicro Xeon motherboard that offers both 3.3V-only and 5V-tolerant PCI-X slots: in the 5V-tolerant slot, it triggers an over-current alarm.

The card must be keyed as "3.3V-only" by blocking the 5V notch on its edge connector, so that it cannot be inserted into a 5V slot. Anyway, its firmware is x86-only, so it cannot be used on non-x86 machines.


Drivers with BLOB are a red flag

Postby ivelegacy on Thu May 09, 2019 3:16 pm

Basically, *anything* that uses a binary "BLOB" firmware loaded by the driver is usually a big red flag, for a whole bunch of reasons. This applies not only to SATA cards but also to Ethernet NICs that support features like layer 2/3 offloading, to intelligent serial cards, and so on.
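One way to spot such drivers on a running system is to ask `modinfo` for the firmware files each loaded module declares. A sketch, with the caveat that it only sees firmware declared through the module's metadata (firmware kept in the card's own flash, as on the AAR-2410SA, will not show up); the function name is hypothetical.

```bash
# list_blob_firmware MODULE... -> "module: firmware-file" for every firmware
# file a module declares in its metadata (modinfo's "firmware" field)
list_blob_firmware() {
    local mod fw
    for mod in "$@"; do
        # unknown modules, or a missing modinfo, simply produce no output
        modinfo -F firmware "$mod" 2>/dev/null | while read -r fw; do
            echo "$mod: $fw"
        done
    done
}
```

Usage: `list_blob_firmware $(lsmod | awk 'NR > 1 {print $1}')` checks every currently loaded module.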


HPPA, C3600 panics with Marvell MV88SX6081

Postby ivelegacy on Fri May 10, 2019 9:16 am

Tested on an HPPA C3600 with git/deller/parisc-linux.git, parisc-5.2. I've been using kgcc v7.3, and because of the kernel size issue the kernel has been compiled with CONFIG_MLONGCALLS. This issue is long-standing: the binutils linker lacks support for long branch stubs when linking 64-bit code, so I cannot compile the kernel without MLONGCALLS, and the final kernel is about 21 Mbyte.
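Since the 64-bit kernel only links with CONFIG_MLONGCALLS, a quick sanity check of the .config before a long build may save some time. A sketch, assuming the option names CONFIG_64BIT and CONFIG_MLONGCALLS from arch/parisc/Kconfig; the function name is hypothetical.

```bash
# check_parisc64_config CONFIG -> exit 0 if the .config describes a 64-bit
# build with CONFIG_MLONGCALLS enabled; otherwise print each missing option
check_parisc64_config() {
    local cfg=$1 opt rc=0
    for opt in CONFIG_64BIT CONFIG_MLONGCALLS; do
        if ! grep -q "^${opt}=y\$" "$cfg"; then
            echo "missing: ${opt}=y"
            rc=1
        fi
    done
    return $rc
}
```

If the check fails, the kernel tree's `scripts/config --enable MLONGCALLS` followed by `make olddefconfig` is one way to turn the option on.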

Code:
# mycp data_08GB.bin copy.bin


Performance is about 15 Mbyte/sec, which is poor for a device that should reach at least 50 Mbyte/sec, and it's not stable.
When copying files bigger than 4 Gbyte, I experience DMA issues: usually the machine halts and reboots.

Hard Fail vs. Soft Fail on PCI Master Abort
Master Abort means the MMIO transaction timed out, usually because the device did not respond to an MMIO read. We would like Hard Fail (HF) to be enabled to find driver problems, though it means the system will crash with an HPMC. In Soft Fail mode, "~0L" is returned as the result of a timeout on the PCI bus, which is how PCI buses on x86 and most other architectures behave. In order to increase compatibility with existing (x86) PCI hardware and existing Linux drivers, we now enable Soft Fail mode on PA-RISC too.
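In Soft Fail mode a read that ends in a Master Abort returns all ones, so a vendor ID of 0xffff is the usual sign that a device did not answer. As a user-space illustration, a sketch that reads the vendor ID from a PCI configuration-space dump such as the Linux sysfs `config` file; the function names are hypothetical. Bytes are read one at a time because config space is little-endian, while PA-RISC and PowerPC hosts are big-endian.

```bash
# pci_vendor_id CONFIG_FILE -> vendor ID as 4 hex digits, taken from the first
# two bytes of a config-space dump (little-endian order)
pci_vendor_id() {
    local b
    b=$(od -An -tx1 -N 2 "$1" 2>/dev/null)
    set -- $b                 # $1 = low byte, $2 = high byte
    echo "$2$1"
}

# pci_responds CONFIG_FILE -> exit 0 if the device answered; an all-ones
# vendor ID (ffff) is what a soft-fail Master Abort returns
pci_responds() {
    local id
    id=$(pci_vendor_id "$1")
    [ ${#id} -eq 4 ] && [ "$id" != ffff ]
}
```

Example: `pci_vendor_id /sys/bus/pci/devices/<address>/config` should print `11ab` for the Marvell controller. Per the explanation above, in Hard Fail mode a read from a device that does not respond would instead end in an HPMC.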




Code:
c3600 ~ # uname -r
5.1.0-deller-5.2-c3600-64bit
c3600 ~ # cat /proc/cpuinfo | grep failmode
PCI failmode    : soft


Mode     | Command                     | Result  | Notes
softfail | mycp data_08GB.bin copy.bin | success |
softfail | mycp data_16GB.bin copy.bin | failure | Kernel panic - not syncing: High Priority Machine Check (HPMC)
hardfail | mycp data_08GB.bin copy.bin | success | 13m44.431s, 3m24.382s, 7m39.433s, ~10 Mbyte/sec
hardfail | mycp data_16GB.bin copy.bin | failure | Kernel panic - not syncing: High Priority Machine Check (HPMC)
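The ~10 Mbyte/sec figure for the 8 GB hard-fail run can be re-derived from the elapsed time. A tiny awk-based sketch, assuming the first timing (13m44.431s) is the `real` value reported by the shell's `time`, and that data_08GB.bin is 8192 MiB as created by `dd ... count=1024 bs=8M`; the function name is hypothetical.

```bash
# mib_per_sec SIZE_MIB ELAPSED -> average throughput in MiB/s, where ELAPSED
# is a bash "time" value such as 13m44.431s or 50.2s
mib_per_sec() {
    awk -v size="$1" -v t="$2" 'BEGIN {
        n = split(t, p, "m")                 # "13m44.431s" -> "13", "44.431s"
        if (n == 1) { m = 0; s = p[1] } else { m = p[1]; s = p[2] }
        sub(/s$/, "", s)                     # drop the trailing "s"
        secs = m * 60 + s
        printf "%.1f\n", size / secs
    }'
}
```

`mib_per_sec 8192 13m44.431s` gives 9.9, which matches the ~10 Mbyte/sec noted in the table.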


Code:
# gcc-config -l
[1] hppa2.0-unknown-linux-gnu-5.4.0
[2] hppa2.0-unknown-linux-gnu-6.4.0
[3] hppa2.0-unknown-linux-gnu-7.3.0
[4] hppa2.0-unknown-linux-gnu-8.2.0 *
[5] hppa64-unknown-linux-gnu-7.3.0 * <-------- using this to compile the kernel
[6] hppa64-unknown-linux-gnu-8.2.0


Marvell 88SX60xx

Postby TheHalloween on Fri May 10, 2019 7:35 pm

The driver is drivers/ata/sata_mv.c, which supports per-device queues and full SATA control, including hotplug.
  • The 88SX50xx "GEN_I" series supports TCQ, but not NCQ or PM.
  • The 88SX6xxx "GEN_II" series (6040, 6041, 6080, and 6081) supports TCQ, NCQ, and PM.
  • The 88SX7xxx "GEN_IIE" series (6042, 7042, and various system-on-chip hosts) supports TCQ, NCQ, FBS, and PM.

The 88SExxxx series of chips presents an AHCI interface. Some of the recent HighPoint cards are based on the Marvell 88SX50xx and 88SX60xx chips; these are supported by the Marvell libata driver (sata_mv). However, although the Marvell controllers driven by sata_mv are well supported, various Marvell AHCI controllers suffer from incomplete and/or buggy support: Marvell doesn't seem to allocate any resources to upstream Linux support, and communication between Marvell and the libata developers is weak.
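To see which generation a given Marvell controller belongs to, its PCI device ID can be mapped onto the GEN_I / GEN_II / GEN_IIE list above. A sketch; the ID-to-generation table is an assumption based on the sata_mv.c PCI ID table (5040/5041/5080/5081, 6040/6041/6080/6081, 6042/7042) and should be double-checked against the kernel source in use. The function name is hypothetical.

```bash
# mv_generation DEVICE_ID -> generation and features of a Marvell (vendor 11ab)
# SATA controller; accepts "6081" or the sysfs form "0x6081"
mv_generation() {
    local id=${1#0x}
    case $id in
        504[01]|508[01]) echo "GEN_I (TCQ)" ;;
        604[01]|608[01]) echo "GEN_II (TCQ, NCQ, PM)" ;;
        6042|7042)       echo "GEN_IIE (TCQ, NCQ, FBS, PM)" ;;
        *)               echo "unknown" ;;
    esac
}
```

Usage: `mv_generation "$(cat /sys/bus/pci/devices/<address>/device)"`; the MV88SX6081 on the RocketRAID 2224 maps to GEN_II.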



Marvell 88SX60xx on HPPA, HPMC log

Postby ivelegacy on Sat May 11, 2019 11:39 am

Code:
Main Menu: Enter command > ser pim

PROCESSOR PIM INFORMATION

-----------------  Processor 0 HPMC Information ------------------

Timestamp =
  Sat May  11 11:37:53 GMT 2019    (20:19:05:11:11:37:53)

HPMC Chassis Codes = 2cbf0  2500b  27825  2cbfb

General Registers 0 - 31
00-03   0000000000000000  0000000040cc4360  00000000406dce7c  0000000049159530
04-07   0000000040c0bb60  000000012ec37020  00000000000a4000  0000000000000001
08-11   0000000000000001  000000012ec37020  0000000049159258  000000012ec33920
12-15   000000012ec38920  0000000040c36360  0000000000002218  0000000000000002
16-19   0000000000000001  000000012f1f17b0  0000000040c36360  00000000491677a0
20-23   0000000008ff0000  0000000049165d70  0000000000000007  0000000000000001
24-27   0000000000000000  0000000000000000  0000000000000010  0000000040c0bb60
28-31   000008d1a8d10800  0000000049159b30  00000000491595c0  00000008d1080008

<Press any key to continue (q to quit)>

Control Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   00000000000017c6  0000000000000000  00000000000000c0  000000000000003f
12-15   0000000000000000  0000000000000000  0000000000183000  fe00000000000000
16-19   000002f2cd0df74f  0000000000000000  00000000406dcea8  0000000048dc0048
20-23   00000000a627ffdc  00000000090a4024  000000000804000e  8800000000000000
24-27   00000000010f9000  00000000dafe6000  00000000ffffffff  00000000f8f00480
28-31   00000000ffffffff  00000000ffffffff  0000000040fc3000  00000000ffffffff
Space Registers 0 - 7

00-03   005f1800          00000000          00000000          005f1800
04-07   00000000          00000000          00000000          00000000

<Press any key to continue (q to quit)>

IIA Space                    = 0x0000000000000000
IIA Offset                   = 0x00000000406dceac
Check Type                   = 0x20000000
CPU State                    = 0x9e000004
Cache Check                  = 0x00000000
TLB Check                    = 0x00000000
Bus Check                    = 0x0030103b
Assists Check                = 0x00000000
Assist State                 = 0x00000000
Path Info                    = 0x00000000
System Responder Address     = 0x000000fff7024024
System Requestor Address     = 0xfffffffffffa0000

Floating-Point Registers 0 - 31
00-03   0000001f00000000  0000000000000000  0000000000000000  0000000000000000
04-07   41d735a7c10ed08d  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  0000000000000000  0000000000000000
12-15   0000000000000000  0000000000000000  0000000000000000  0000000000000000
16-19   0000000000000000  0000000000000000  0000000000000000  0000000000000000
20-23   0000000000000000  0000000000000000  0000000000000000  0000000000000000
24-27   0000000000000000  0000000000000000  0000000000000000  0000000000000000
28-31   0000000000000000  0000000000000000  0000000000000000  0000000000000000

<Press any key to continue (q to quit)>


'9000/785 B,C,J Workstation Unarchitected (per-CPU)', rev 1, 140 bytes:

Check Summary                = 0xcb81045028000000
Available Memory             = 0x0000000200000000
CPU Diagnose Register 2      = 0x0301000000802004
CPU Status Register 0        = 0x2420c20000000000
CPU Status Register 1        = 0x8002000000000000
SADD LOG                     = 0x00c0000400000000
Read Short LOG               = 0xc1a0f0fff7024024
ERROR_STATUS                 = 0x0000000000500050
MEM_ADDR                     = 0x000001ff3fffffff
MEM_SYND                     = 0x0000000000000000
MEM_ADDR_CORR                = 0x0000018a00db1dad
MEM_SYND_CORR                = 0x0000000000000094
RUN_DATA_HIGH                = 0xc1bff0fffed08040
RUN_DATA_LOW                 = 0xc1bff0fffed08040
RUN_CTRL                     = 0x0000021c00001418
RUN_ADDR                     = 0xc1bff0fffed08040
System Responder Path        = 0x00ffffff0a060200


HPMC PIM Analysis Information:

Timestamp =
  Sat May  11 11:37:53 GMT 2019    (20:19:05:11:11:37:53)


'9000/785 B,C,J Workstation HPMC PIM Analysis (per-CPU)', rev 0, 1304 bytes:

A Data I/O Fetch Timeout occurred while CPU 0 was
requesting information from a device at the path 10/6/2/0 (PCI slot 2).


Memory/IO Controller Error Analysis Information:

There were multiple correctable memory errors.  See 'Memory Error Log Info'.

<Press any key to continue (q to quit)>

-----------------  Processor 0 LPMC Information ------------------

Check Type                   = 0x00000000
I/D Cache Parity Info        = 0x00000000
Cache Check                  = 0x00000000
TLB Check                    = 0x00000000
Bus Check                    = 0x00000000
Assists Check                = 0x00000000
Assist State                 = 0x00000000
Path Info                    = 0x00000000
System Responder Address     = 0x0000000000000000
System Requestor Address     = 0x0000000000000000


-----------------  Processor 0 TOC Information -------------------

General Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  0000000000000000  0000000000000000
12-15   0000000000000000  0000000000000000  0000000000000000  0000000000000000
16-19   0000000000000000  0000000000000000  0000000000000000  0000000000000000
20-23   0000000000000000  0000000000000000  0000000000000000  0000000000000000
24-27   0000000000000000  0000000000000000  0000000000000000  0000000000000000
28-31   0000000000000000  0000000000000000  0000000000000000  0000000000000000

<Press any key to continue (q to quit)>

Control Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  0000000000000000  0000000000000000
12-15   0000000000000000  0000000000000000  0000000000000000  0000000000000000
16-19   0000000000000000  0000000000000000  0000000000000000  0000000000000000
20-23   0000000000000000  0000000000000000  0000000000000000  0000000000000000
24-27   0000000000000000  0000000000000000  0000000000000000  0000000000000000
28-31   0000000000000000  0000000000000000  0000000000000000  0000000000000000
Space Registers 0 - 7

00-03   00000000          00000000          00000000          00000000
04-07   00000000          00000000          00000000          00000000

IIA Space                    = 0x0000000000000000
IIA Offset                   = 0x0000000000000000
CPU State                    = 0x00000000


<Press any key to continue (q to quit)>

Memory Error Log Information:

Timestamp =
  Sat May  11 11:37:53 GMT 2019    (20:19:05:11:11:37:53)


'9000/785 B,C,J Workstation Memory Error Log', rev 0, 64 bytes:

This log displays the contents of memory specific registers when the
HPMC occurred.  If there are multiple memory errors, the order they are
listed is not indicative of the order they occurred.

                                   Trans  Addr
   Memory Error Type(s)  OV  MID    ID    par  CP   DIMM       Runway Address
   --------------------  --  ---  -----  ----  --  -------  -------------------
1) Correctable Mem       1   0x6  0x a   na    na  05       0x       0036c76b40

                                                Syndrome
                                           ------------------
                                        1) 0x94
<Press any key to continue (q to quit)>

I/O Module Error Log Information:

Timestamp =
  Sat May  11 11:37:53 GMT 2019    (20:19:05:11:11:37:53)


'9000/785 B,C,J Workstation IO Error Log', rev 0, 228 bytes:

Rope     Word1        Word2            Word3
------ ------------ ------------
   0    0x00000000   0x0e0cc009   0x00000000fed30048
   1    0x00000000   0x1e0cc009   0x00000000fed32048
   2    ----------   0x2e0cc009   ------------------
   3    ----------   0x3e0cc009   ------------------
   4    0x00000000   0x4e0cc009   0x00000000fed38048
   5    ----------   0x5e0cc009   ------------------
   6    0x00000000   0x6e0cc2a9   0x00000000fed3c048
   7    ----------   0x7e0cc009   ------------------


A Data I/O Fetch Timeout occurred while CPU 0 was requesting information from a device at the path 10/6/2/0 (PCI slot 2).


Code:
c3600 / # lspci -nn
00:0c.0 Ethernet controller [0200]: Digital Equipment Corporation DECchip 21142/43 [1011:0019] (rev 41)
00:0d.0 Multimedia audio controller [0401]: Analog Devices Device [11d4:1889]
00:0e.0 IDE interface [0101]: National Semiconductor Corporation 87415/87560 IDE [100b:0002] (rev 03)
00:0e.1 Bridge [0680]: National Semiconductor Corporation 87560 Legacy I/O [100b:000e] (rev 01)
00:0e.2 USB controller [0c03]: National Semiconductor Corporation USB Controller [100b:0012] (rev 02)
00:0f.0 SCSI storage controller [0100]: LSI Logic / Symbios Logic 53C896/897 [1000:000b] (rev 07)
00:0f.1 SCSI storage controller [0100]: LSI Logic / Symbios Logic 53C896/897 [1000:000b] (rev 07)
01:04.0 RAID bus controller [0104]: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller [1095:3124] (rev 01)
03:02.0 SCSI storage controller [0100]: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller [11ab:60)


Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller <--- not in use
Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller <--- in use


C3600 PCI

Postby madame on Sun May 12, 2019 6:37 pm

  • One PCI 64-bit/66 MHz, 3.3 V slot (1)
  • Three PCI 64-bit/33 MHz, 5 V slots
  • Two PCI 32-bit/33 MHz, 5 V slots

Code:
S1: PCI-64/33, pci0, 5 V <--------- testing 64bit cards here
S2: PCI-64/66, pci1, 3.3 V
S3: PCI-64/33, pci0, 5 V
S4: PCI-64/33, pci2, 5 V
S5: PCI-32/33, pci3, 5 V
S6: PCI-32/33, pci3, 5 V <--------- testing 32bit cards here



(1) clocked at 100 MHz on C3750
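To put the observed 10 to 15 Mbyte/sec into perspective, the nominal peak bandwidth of each slot type is the bus width in bytes times the clock. A sketch with integer MHz values (the real PCI clock is 33.33 MHz, so the true peaks are slightly higher, e.g. about 266 MB/s for PCI-64/33); the function name is hypothetical.

```bash
# pci_peak_mbs WIDTH_BITS CLOCK_MHZ -> nominal peak transfer rate in MB/s
# (one data word per clock, no protocol overhead, integer arithmetic)
pci_peak_mbs() {
    echo $(( $1 / 8 * $2 ))
}
```

For the C3600 slots: `pci_peak_mbs 64 33` gives 264, `pci_peak_mbs 64 66` gives 528 and `pci_peak_mbs 32 33` gives 132, so even the 32-bit slots have far more headroom than the measured copy rates.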
youse guys have got to turn your world around. chinese stuff is deadly and crap.


