diff mbox series

scsi: libsas: Check and update the link rate during discovery

Message ID 20221102100555.3537275-1-liyihang9@huawei.com
State New
Headers show
Series scsi: libsas: Check and update the link rate during discovery | expand

Commit Message

Yihang Li Nov. 2, 2022, 10:05 a.m. UTC
+----------+              +----------+
   |          |              |          |
   |          |--- 12.0 G ---|          |--- 12.0 G --- SAS  disk
   |initiator |              | Expander |
   | device   |--- 12.0 G ---|          |--- 12.0 G --- SAS  disk
   |          |              |          |
   |          |--- 12.0 G ---|          |--- 6.0 G --- SATA disk
   |          |              |          |
   |      phy0|--- 12.0 G ---|          |--- 6.0 G --- SATA disk
   |          |              |          |
   +----------+              +----------+

In the scenario where the expander device is connected to a wide port,
the preceding figure shows the link topology after initialization.
All physical PHYs negotiate connections at the rate of 12 Gbit, and
the expander SATA PHY negotiates connections at the rate of 6 Gbit.

We found that when the FIO was running, if phy0 was link down due to link
instability, and the link connection was reestablished after a period of
time. During the physical link disconnection, the physical PHY gradually
decreases the link rate, attempts to renegotiate the link rate and
establish the connection. This is an HW behavior. When the physical PHY
try to re-establish the link at a link rate of 3 Gbit and the physical
link is successfully established, the negotiated link rate is 3 Gbit.
Like this:

   +----------+              +----------+
   |          |              |          |
   |          |--- 12.0 G ---|          |--- 12.0 G --- SAS  disk
   |initiator |              | Expander |
   | device   |--- 12.0 G ---|          |--- 12.0 G --- SAS  disk
   |          |              |          |
   |          |--- 12.0 G ---|          |--- 6.0 G --- SATA disk
   |          |              |          |
   |      phy0|--- 3.0 G ----|          |--- 6.0 G --- SATA disk
   |          |              |          |
   +----------+              +----------+

SATA disk which connected to expander PHY maybe reject IO request due to
the connection setup error (OPEN_REJECT-CONNECTION RATE NOT SUPPORTED).
The log as follows:

[175712.419423] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=2985 task=00000000268357f1 dev id=10 exp 0x500e004aaaaaaa1f phy9 addr=500e004aaaaaaa09 CQ hdr: 0x102b 0xa0ba9 0x1000 0x20000 Error info: 0x200 0x0 0x0 0x0

After analysis, it is concluded that: when one of the physical links
connected on the wide port is re-established, the link rate of the port
and expander device and the expander SATA PHY are not updated. As a
result, the expander PHY attached to a SATA PHY is using link rate
(6.0 Gbit) greater than the physical PHY link rate (3.0 Gbit).

Therefore, add function sas_check_port_linkrate() to check whether the
link rate of physical PHY which is connected to the wide port changes
after the phy up occur, if the link rate of the newly established
physical phy is lower than the link rate of the port, a smaller link rate
value is transmitted to port->linkrate.

Use the sas_update_linkrate_root_expander() function to update the root
expander link rate. Traverse all expanders connected to the port, check
and update expander PHYs that need to be updated and triggers revalidation.

Signed-off-by: Yihang Li <liyihang9@huawei.com>
---
 drivers/scsi/libsas/sas_discover.c | 19 ++++++-
 drivers/scsi/libsas/sas_expander.c | 79 ++++++++++++++++++++++++++++++
 drivers/scsi/libsas/sas_internal.h |  1 +
 3 files changed, 98 insertions(+), 1 deletion(-)

Comments

John Garry Nov. 2, 2022, 11:55 a.m. UTC | #1
On 02/11/2022 10:05, Yihang Li wrote:

note: This is not discovery in which this erroneous condition occurs. 
Discovery is the phase in which the device is found initially.

>     +----------+              +----------+
>     |          |              |          |
>     |          |--- 12.0 G ---|          |--- 12.0 G --- SAS  disk
>     |initiator |              | Expander |
>     | device   |--- 12.0 G ---|          |--- 12.0 G --- SAS  disk
>     |          |              |          |
>     |          |--- 12.0 G ---|          |--- 6.0 G --- SATA disk
>     |          |              |          |
>     |      phy0|--- 12.0 G ---|          |--- 6.0 G --- SATA disk
>     |          |              |          |
>     +----------+              +----------+
> 
> In the scenario where the expander device is connected to a wide port,
> the preceding figure shows the link topology after initialization.
> All physical PHYs negotiate connections at the rate of 12 Gbit, and
> the expander SATA PHY negotiates connections at the rate of 6 Gbit.
> 
> We found that when the FIO was running, if phy0 was link down due to link
> instability, and the link connection was reestablished after a period of
> time. During the physical link disconnection, the physical PHY gradually
> decreases the link rate, attempts to renegotiate the link rate and
> establish the connection. This is an HW behavior. When the physical PHY
> try to re-establish the link at a link rate of 3 Gbit and the physical
> link is successfully established, the negotiated link rate is 3 Gbit.
> Like this:
> 
>     +----------+              +----------+
>     |          |              |          |
>     |          |--- 12.0 G ---|          |--- 12.0 G --- SAS  disk
>     |initiator |              | Expander |
>     | device   |--- 12.0 G ---|          |--- 12.0 G --- SAS  disk
>     |          |              |          |
>     |          |--- 12.0 G ---|          |--- 6.0 G --- SATA disk
>     |          |              |          |
>     |      phy0|--- 3.0 G ----|          |--- 6.0 G --- SATA disk
>     |          |              |          |
>     +----------+              +----------+
> 
> SATA disk which connected to expander PHY maybe reject IO request due to
> the connection setup error (OPEN_REJECT-CONNECTION RATE NOT SUPPORTED).
> The log as follows:
> 
> [175712.419423] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=2985 task=00000000268357f1 dev id=10 exp 0x500e004aaaaaaa1f phy9 addr=500e004aaaaaaa09 CQ hdr: 0x102b 0xa0ba9 0x1000 0x20000 Error info: 0x200 0x0 0x0 0x0
> 
> After analysis, it is concluded that: when one of the physical links
> connected on the wide port is re-established, the link rate of the port
> and expander device and the expander SATA PHY are not updated. As a
> result, the expander PHY attached to a SATA PHY is using link rate
> (6.0 Gbit) greater than the physical PHY link rate (3.0 Gbit).

Please mention the SAS spec section in which min pathway is described.

> 
> Therefore, add function sas_check_port_linkrate() to check whether the
> link rate of physical PHY which is connected to the wide port changes
> after the phy up occur, if the link rate of the newly established
> physical phy is lower than the link rate of the port, a smaller link rate
> value is transmitted to port->linkrate.
> 
> Use the sas_update_linkrate_root_expander() function to update the root
> expander link rate. Traverse all expanders connected to the port, check
> and update expander PHYs that need to be updated and triggers revalidation.

So are you saying that you want to lower the linkrate on all pathways to 
the SATA disk? In your example, that would be 3Gbps. If so, won't that 
affect the end-to-end linkrate of all other devices attached (and lower 
throughput drastically)?
kernel test robot Nov. 3, 2022, 5:04 a.m. UTC | #2
Hi Yihang,

https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yihang-Li/scsi-libsas-Check-and-update-the-link-rate-during-discovery/20221102-180734
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next
patch link:    https://lore.kernel.org/r/20221102100555.3537275-1-liyihang9%40huawei.com
patch subject: [PATCH] scsi: libsas: Check and update the link rate during discovery
config: m68k-randconfig-m041-20221102
compiler: m68k-linux-gcc (GCC) 12.1.0

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>

New smatch warnings:
drivers/scsi/libsas/sas_expander.c:1962 sas_ex_update_linkrate() warn: iterator used outside loop: 'child'

Old smatch warnings:
drivers/scsi/libsas/sas_expander.c:253 sas_set_ex_phy() error: potential null dereference 'phy->phy'.  (sas_phy_alloc returns null)
drivers/scsi/libsas/sas_expander.c:253 sas_set_ex_phy() error: we previously assumed 'phy->phy' could be null (see line 189)
drivers/scsi/libsas/sas_expander.c:1974 sas_ex_update_linkrate() warn: iterator used outside loop: 'child'

vim +/child +1962 drivers/scsi/libsas/sas_expander.c

39d64046e3dfc7 Yihang Li       2022-11-02  1947  static void sas_ex_update_linkrate(struct domain_device *parent)
39d64046e3dfc7 Yihang Li       2022-11-02  1948  {
39d64046e3dfc7 Yihang Li       2022-11-02  1949  	struct expander_device *ex = &parent->ex_dev;
39d64046e3dfc7 Yihang Li       2022-11-02  1950  	int i = 0, end = ex->num_phys;
39d64046e3dfc7 Yihang Li       2022-11-02  1951  
39d64046e3dfc7 Yihang Li       2022-11-02  1952  	for ( ; i < end; i++) {
39d64046e3dfc7 Yihang Li       2022-11-02  1953  		struct ex_phy *ex_phy = &ex->ex_phy[i];
39d64046e3dfc7 Yihang Li       2022-11-02  1954  		struct domain_device *child;
39d64046e3dfc7 Yihang Li       2022-11-02  1955  
39d64046e3dfc7 Yihang Li       2022-11-02  1956  		list_for_each_entry(child, &parent->ex_dev.children, siblings)
                                                                                    ^^^^^
Imagine this loop exits without finding the correct child.

39d64046e3dfc7 Yihang Li       2022-11-02  1957  			if (SAS_ADDR(child->sas_addr) ==
39d64046e3dfc7 Yihang Li       2022-11-02  1958  			    SAS_ADDR(ex_phy->attached_sas_addr))
39d64046e3dfc7 Yihang Li       2022-11-02  1959  				break;
39d64046e3dfc7 Yihang Li       2022-11-02  1960  
39d64046e3dfc7 Yihang Li       2022-11-02  1961  		if (dev_is_sata(child)) {
                                                                                ^^^^^
That means "child" is not a valid pointer.  Not sure why the warning is
only triggered on the line below instead of here.

39d64046e3dfc7 Yihang Li       2022-11-02 @1962  			if (child->linkrate > parent->min_linkrate) {
39d64046e3dfc7 Yihang Li       2022-11-02  1963  				struct sas_phy_linkrates rates = {
39d64046e3dfc7 Yihang Li       2022-11-02  1964  					.maximum_linkrate = parent->min_linkrate,
39d64046e3dfc7 Yihang Li       2022-11-02  1965  					.minimum_linkrate = parent->min_linkrate,
39d64046e3dfc7 Yihang Li       2022-11-02  1966  				};
39d64046e3dfc7 Yihang Li       2022-11-02  1967  
39d64046e3dfc7 Yihang Li       2022-11-02  1968  				sas_smp_phy_control(parent, i,
39d64046e3dfc7 Yihang Li       2022-11-02  1969  						    PHY_FUNC_LINK_RESET, &rates);
39d64046e3dfc7 Yihang Li       2022-11-02  1970  				ex_phy->phy_change_count = -1;
39d64046e3dfc7 Yihang Li       2022-11-02  1971  			}
39d64046e3dfc7 Yihang Li       2022-11-02  1972  		}
39d64046e3dfc7 Yihang Li       2022-11-02  1973  
39d64046e3dfc7 Yihang Li       2022-11-02  1974  		if (dev_is_expander(child->dev_type)) {
39d64046e3dfc7 Yihang Li       2022-11-02  1975  			child->min_linkrate = min(parent->min_linkrate,
39d64046e3dfc7 Yihang Li       2022-11-02  1976  						  ex_phy->linkrate);
39d64046e3dfc7 Yihang Li       2022-11-02  1977  			child->max_linkrate = max(parent->max_linkrate,
39d64046e3dfc7 Yihang Li       2022-11-02  1978  						  ex_phy->linkrate);
39d64046e3dfc7 Yihang Li       2022-11-02  1979  			child->linkrate = min(ex_phy->linkrate,
39d64046e3dfc7 Yihang Li       2022-11-02  1980  					      child->max_linkrate);
39d64046e3dfc7 Yihang Li       2022-11-02  1981  			ex_phy->phy_change_count = -1;
39d64046e3dfc7 Yihang Li       2022-11-02  1982  		}
39d64046e3dfc7 Yihang Li       2022-11-02  1983  	}
39d64046e3dfc7 Yihang Li       2022-11-02  1984  }
Yihang Li Nov. 7, 2022, 11:52 a.m. UTC | #3
On 2022/11/2 19:55, John Garry wrote:
> On 02/11/2022 10:05, Yihang Li wrote:
> 
> note: This is not discovery in which this erroneous condition occurs. Discovery is the phase in which the device is found initially.

ok, I will change the description in the next version.

> 
>>     +----------+              +----------+
>>     |          |              |          |
>>     |          |--- 12.0 G ---|          |--- 12.0 G --- SAS  disk
>>     |initiator |              | Expander |
>>     | device   |--- 12.0 G ---|          |--- 12.0 G --- SAS  disk
>>     |          |              |          |
>>     |          |--- 12.0 G ---|          |--- 6.0 G --- SATA disk
>>     |          |              |          |
>>     |      phy0|--- 12.0 G ---|          |--- 6.0 G --- SATA disk
>>     |          |              |          |
>>     +----------+              +----------+
>>
>> In the scenario where the expander device is connected to a wide port,
>> the preceding figure shows the link topology after initialization.
>> All physical PHYs negotiate connections at the rate of 12 Gbit, and
>> the expander SATA PHY negotiates connections at the rate of 6 Gbit.
>>
>> We found that when the FIO was running, if phy0 was link down due to link
>> instability, and the link connection was reestablished after a period of
>> time. During the physical link disconnection, the physical PHY gradually
>> decreases the link rate, attempts to renegotiate the link rate and
>> establish the connection. This is an HW behavior. When the physical PHY
>> try to re-establish the link at a link rate of 3 Gbit and the physical
>> link is successfully established, the negotiated link rate is 3 Gbit.
>> Like this:
>>
>>     +----------+              +----------+
>>     |          |              |          |
>>     |          |--- 12.0 G ---|          |--- 12.0 G --- SAS  disk
>>     |initiator |              | Expander |
>>     | device   |--- 12.0 G ---|          |--- 12.0 G --- SAS  disk
>>     |          |              |          |
>>     |          |--- 12.0 G ---|          |--- 6.0 G --- SATA disk
>>     |          |              |          |
>>     |      phy0|--- 3.0 G ----|          |--- 6.0 G --- SATA disk
>>     |          |              |          |
>>     +----------+              +----------+
>>
>> SATA disk which connected to expander PHY maybe reject IO request due to
>> the connection setup error (OPEN_REJECT-CONNECTION RATE NOT SUPPORTED).
>> The log as follows:
>>
>> [175712.419423] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=2985 task=00000000268357f1 dev id=10 exp 0x500e004aaaaaaa1f phy9 addr=500e004aaaaaaa09 CQ hdr: 0x102b 0xa0ba9 0x1000 0x20000 Error info: 0x200 0x0 0x0 0x0
>>
>> After analysis, it is concluded that: when one of the physical links
>> connected on the wide port is re-established, the link rate of the port
>> and expander device and the expander SATA PHY are not updated. As a
>> result, the expander PHY attached to a SATA PHY is using link rate
>> (6.0 Gbit) greater than the physical PHY link rate (3.0 Gbit).
> 
> Please mention the SAS spec section in which min pathway is described.

Do you mean the original text of the SAS spec section needs to be added here?

Like this:

According to Serial Attached SCSI:
If an STP initiator port discovers a SATA device behind an STP/SATA bridge with a physical link rate greater
than the maximum connection rate supported by the pathway from the STP initiator port, the STP initiator port
should use the SMP PHY CONTROL function (see 10.4.3.10) to set the MAXIMUM PHYSICAL LINK RATE field of
the expander phy attached to the SATA device to the maximum connection rate supported by the pathway.

> 
>>
>> Therefore, add function sas_check_port_linkrate() to check whether the
>> link rate of physical PHY which is connected to the wide port changes
>> after the phy up occur, if the link rate of the newly established
>> physical phy is lower than the link rate of the port, a smaller link rate
>> value is transmitted to port->linkrate.
>>
>> Use the sas_update_linkrate_root_expander() function to update the root
>> expander link rate. Traverse all expanders connected to the port, check
>> and update expander PHYs that need to be updated and triggers revalidation.
> 
> So are you saying that you want to lower the linkrate on all pathways to the SATA disk? In your example, that would be 3Gbps. If so, won't that affect the end-to-end linkrate of all other devices attached (and lower throughput drastically)?

Yes, this will lower performance drastically, but I consider the following two things:

a. Ensure that all disks work properly when the issue we discussed occurs.

b. When the user limits the linkrate to a lower level through the sysfs interface, the linkrate on all pathways to the SATA disk should be reduced.

[root@localhost phy-5:0]# lsscsi
[5:0:0:0]    disk    HUAWEI   HWE32SS3008M001N 2774  /dev/sda
[5:0:1:0]    disk    ATA      ST4000NM0035-1V4 TN03  /dev/sdb
[5:0:12:0]   enclosu HUAWEI   Expander 12Gx16  131   -
[root@localhost phy-5:0]# cat maximum_linkrate
12.0 Gbit
[root@localhost phy-5:0]# cat minimum_linkrate
1.5 Gbit
[root@localhost phy-5:0]# echo 1.5 Gbit > maximum_linkrate
[root@localhost phy-5:0]# cat negotiated_linkrate
1.5 Gbit
[root@localhost phy-5:0]# lsscsi
[5:0:0:0]    disk    HUAWEI   HWE32SS3008M001N 2774  /dev/sda
[5:0:12:0]   enclosu HUAWEI   Expander 12Gx16  131   -
[5:0:13:0]   disk    ATA      ST4000NM0035-1V4 TN03  /dev/sdb
[root@localhost phy-5:0]# cd ../phy-5\:0:1
[root@localhost phy-5:0:1]# cat negotiated_linkrate
1.5 Gbit


Thanks,
Yihang

> 
> 
> 
> .
>
John Garry Nov. 7, 2022, 12:44 p.m. UTC | #4
>>>
>>> SATA disk which connected to expander PHY maybe reject IO request due to
>>> the connection setup error (OPEN_REJECT-CONNECTION RATE NOT SUPPORTED).
>>> The log as follows:
>>>
>>> [175712.419423] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=2985 task=00000000268357f1 dev id=10 exp 0x500e004aaaaaaa1f phy9 addr=500e004aaaaaaa09 CQ hdr: 0x102b 0xa0ba9 0x1000 0x20000 Error info: 0x200 0x0 0x0 0x0
>>>
>>> After analysis, it is concluded that: when one of the physical links
>>> connected on the wide port is re-established, the link rate of the port
>>> and expander device and the expander SATA PHY are not updated. As a
>>> result, the expander PHY attached to a SATA PHY is using link rate
>>> (6.0 Gbit) greater than the physical PHY link rate (3.0 Gbit).
>>
>> Please mention the SAS spec section in which min pathway is described.
> 
> Do you mean the original text of the SAS spec section needs to be added here?
> 

I mean to at least mention the spec version and section number and title 
(in that spec version).

> Like this:
> 
> According to Serial Attached SCSI:
> If an STP initiator port discovers a SATA device behind an STP/SATA bridge with a physical link rate greater
> than the maximum connection rate supported by the pathway from the STP initiator port, the STP initiator port
> should use the SMP PHY CONTROL function (see 10.4.3.10) to set the MAXIMUM PHYSICAL LINK RATE field of
> the expander phy attached to the SATA device to the maximum connection rate supported by the pathway.

I think that this condition in the spec is a flaw. Or at least annoying.

> 
>>
>>>
>>> Therefore, add function sas_check_port_linkrate() to check whether the
>>> link rate of physical PHY which is connected to the wide port changes
>>> after the phy up occur, if the link rate of the newly established
>>> physical phy is lower than the link rate of the port, a smaller link rate
>>> value is transmitted to port->linkrate.
>>>
>>> Use the sas_update_linkrate_root_expander() function to update the root
>>> expander link rate. Traverse all expanders connected to the port, check
>>> and update expander PHYs that need to be updated and triggers revalidation.
>>
>> So are you saying that you want to lower the linkrate on all pathways to the SATA disk? In your example, that would be 3Gbps. If so, won't that affect the end-to-end linkrate of all other devices attached (and lower throughput drastically)?
> 
> Yes, this will lower performance drastically, but I consider the following two things:
> 
> a. Ensure that all disks work properly when the issue we discussed occurs.
> 
> b. When the user limits the linkrate to a lower level through the sysfs interface, the linkrate on all pathways to the SATA disk should be reduced.
> 
> [root@localhost phy-5:0]# lsscsi
> [5:0:0:0]    disk    HUAWEI   HWE32SS3008M001N 2774  /dev/sda
> [5:0:1:0]    disk    ATA      ST4000NM0035-1V4 TN03  /dev/sdb
> [5:0:12:0]   enclosu HUAWEI   Expander 12Gx16  131   -
> [root@localhost phy-5:0]# cat maximum_linkrate
> 12.0 Gbit
> [root@localhost phy-5:0]# cat minimum_linkrate
> 1.5 Gbit
> [root@localhost phy-5:0]# echo 1.5 Gbit > maximum_linkrate
> [root@localhost phy-5:0]# cat negotiated_linkrate
> 1.5 Gbit
> [root@localhost phy-5:0]# lsscsi
> [5:0:0:0]    disk    HUAWEI   HWE32SS3008M001N 2774  /dev/sda
> [5:0:12:0]   enclosu HUAWEI   Expander 12Gx16  131   -
> [5:0:13:0]   disk    ATA      ST4000NM0035-1V4 TN03  /dev/sdb
> [root@localhost phy-5:0]# cd ../phy-5\:0:1
> [root@localhost phy-5:0:1]# cat negotiated_linkrate

So do we reset the linkrate of the SATA-attached phy, right? Could that 
cause the disk to be lost and found again? If so, doesn't seem useful if 
that disk had a filesystem mounted...

> 1.5 Gbit
> 
> 

I just wonder if it is better to disable that phy altogether rather than 
drag every other pathway down to this lower linkrate:

a. that would be simpler than trying to maintain this min pathway
b. the condition that gives rise to issue is very rare (so don't need to 
burden libsas with supporting it according to the spec).

Thanks,
John
Yihang Li Nov. 8, 2022, 11:33 a.m. UTC | #5
On 2022/11/7 20:44, John Garry wrote:
> 
>>>>
>>>> SATA disk which connected to expander PHY maybe reject IO request due to
>>>> the connection setup error (OPEN_REJECT-CONNECTION RATE NOT SUPPORTED).
>>>> The log as follows:
>>>>
>>>> [175712.419423] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=2985 task=00000000268357f1 dev id=10 exp 0x500e004aaaaaaa1f phy9 addr=500e004aaaaaaa09 CQ hdr: 0x102b 0xa0ba9 0x1000 0x20000 Error info: 0x200 0x0 0x0 0x0
>>>>
>>>> After analysis, it is concluded that: when one of the physical links
>>>> connected on the wide port is re-established, the link rate of the port
>>>> and expander device and the expander SATA PHY are not updated. As a
>>>> result, the expander PHY attached to a SATA PHY is using link rate
>>>> (6.0 Gbit) greater than the physical PHY link rate (3.0 Gbit).
>>>
>>> Please mention the SAS spec section in which min pathway is described.
>>
>> Do you mean the original text of the SAS spec section needs to be added here?
>>
> 
> I mean to at least mention the spec version and section number and title (in that spec version).

ok

> 
>> Like this:
>>
>> According to Serial Attached SCSI:
>> If an STP initiator port discovers a SATA device behind an STP/SATA bridge with a physical link rate greater
>> than the maximum connection rate supported by the pathway from the STP initiator port, the STP initiator port
>> should use the SMP PHY CONTROL function (see 10.4.3.10) to set the MAXIMUM PHYSICAL LINK RATE field of
>> the expander phy attached to the SATA device to the maximum connection rate supported by the pathway.
> 
> I think that this condition in the spec is a flaw. Or at least annoying.
> 
>>
>>>
>>>>
>>>> Therefore, add function sas_check_port_linkrate() to check whether the
>>>> link rate of physical PHY which is connected to the wide port changes
>>>> after the phy up occur, if the link rate of the newly established
>>>> physical phy is lower than the link rate of the port, a smaller link rate
>>>> value is transmitted to port->linkrate.
>>>>
>>>> Use the sas_update_linkrate_root_expander() function to update the root
>>>> expander link rate. Traverse all expanders connected to the port, check
>>>> and update expander PHYs that need to be updated and triggers revalidation.
>>>
>>> So are you saying that you want to lower the linkrate on all pathways to the SATA disk? In your example, that would be 3Gbps. If so, won't that affect the end-to-end linkrate of all other devices attached (and lower throughput drastically)?
>>
>> Yes, this will lower performance drastically, but I consider the following two things:
>>
>> a. Ensure that all disks work properly when the issue we discussed occurs.
>>
>> b. When the user limits the linkrate to a lower level through the sysfs interface, the linkrate on all pathways to the SATA disk should be reduced.
>>
>> [root@localhost phy-5:0]# lsscsi
>> [5:0:0:0]    disk    HUAWEI   HWE32SS3008M001N 2774  /dev/sda
>> [5:0:1:0]    disk    ATA      ST4000NM0035-1V4 TN03  /dev/sdb
>> [5:0:12:0]   enclosu HUAWEI   Expander 12Gx16  131   -
>> [root@localhost phy-5:0]# cat maximum_linkrate
>> 12.0 Gbit
>> [root@localhost phy-5:0]# cat minimum_linkrate
>> 1.5 Gbit
>> [root@localhost phy-5:0]# echo 1.5 Gbit > maximum_linkrate
>> [root@localhost phy-5:0]# cat negotiated_linkrate
>> 1.5 Gbit
>> [root@localhost phy-5:0]# lsscsi
>> [5:0:0:0]    disk    HUAWEI   HWE32SS3008M001N 2774  /dev/sda
>> [5:0:12:0]   enclosu HUAWEI   Expander 12Gx16  131   -
>> [5:0:13:0]   disk    ATA      ST4000NM0035-1V4 TN03  /dev/sdb
>> [root@localhost phy-5:0]# cd ../phy-5\:0:1
>> [root@localhost phy-5:0:1]# cat negotiated_linkrate
> 
> So do we reset the linkrate of the SATA-attached phy, right? Could that cause the disk to be lost and found again? If so, doesn't seem useful if that disk had a filesystem mounted...

Yes, the disk will be lost and found again, and I don't think of a good solution for disk with a filesystem mounted, perhaps the way you mentioned disable that phy does solve this issue.

> 
>> 1.5 Gbit
>>
>>
> 
> I just wonder if it is better to disable that phy altogether rather than drag every other pathway down to this lower linkrate:
> 
> a. that would be simpler than trying to maintain this min pathway
> b. the condition that gives rise to issue is very rare (so don't need to burden libsas with supporting it according to the spec).

However, if the user decreases the linkrate through the sysfs interface, the corresponding phy is disabled, which seems to violate the user's intention.

So I think the libsas layer lacks a mechanism to detect linkrate changes and reduce the linkrate of all SATA-attached phys, whether due to active or passive linkrate changes, and my patch amounts to adding this mechanism.

In other words, it is better to disable the phy in abnormal scenarios. On the other hand, a new mechanism needs to be established to detect and reset the phy if the user actively changes the link rate?

Thanks,
Yihang
diff mbox series

Patch

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 110549030bcf..57b5446a3a21 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -164,6 +164,20 @@  static int sas_get_port_device(struct asd_sas_port *port)
 	return 0;
 }
 
+static void sas_check_port_linkrate(struct asd_sas_port *port)
+{
+	u32 link_rate = port->linkrate;
+	struct asd_sas_phy *phy;
+
+	list_for_each_entry(phy, &port->phy_list, port_phy_el)
+		link_rate = min(link_rate, phy->linkrate);
+
+	if (port->linkrate != link_rate) {
+		port->linkrate = link_rate;
+		sas_update_linkrate_root_expander(port->port_dev);
+	}
+}
+
 /* ---------- Discover and Revalidate ---------- */
 
 int sas_notify_lldd_dev_found(struct domain_device *dev)
@@ -434,8 +448,11 @@  static void sas_discover_domain(struct work_struct *work)
 
 	clear_bit(DISCE_DISCOVER_DOMAIN, &port->disc.pending);
 
-	if (port->port_dev)
+	if (port->port_dev) {
+		if (dev_is_expander(port->port_dev->dev_type))
+			sas_check_port_linkrate(port);
 		return;
+	}
 
 	error = sas_get_port_device(port);
 	if (error)
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 5ce251830104..6f067cc6f2e2 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1951,6 +1951,75 @@  static int sas_discover_new(struct domain_device *dev, int phy_id)
 	return res;
 }
 
+static void sas_ex_update_linkrate(struct domain_device *parent)
+{
+	struct expander_device *ex = &parent->ex_dev;
+	int i = 0, end = ex->num_phys;
+
+	for ( ; i < end; i++) {
+		struct ex_phy *ex_phy = &ex->ex_phy[i];
+		struct domain_device *child;
+
+		list_for_each_entry(child, &parent->ex_dev.children, siblings)
+			if (SAS_ADDR(child->sas_addr) ==
+			    SAS_ADDR(ex_phy->attached_sas_addr))
+				break;
+
+		if (dev_is_sata(child)) {
+			if (child->linkrate > parent->min_linkrate) {
+				struct sas_phy_linkrates rates = {
+					.maximum_linkrate = parent->min_linkrate,
+					.minimum_linkrate = parent->min_linkrate,
+				};
+
+				sas_smp_phy_control(parent, i,
+						    PHY_FUNC_LINK_RESET, &rates);
+				ex_phy->phy_change_count = -1;
+			}
+		}
+
+		if (dev_is_expander(child->dev_type)) {
+			child->min_linkrate = min(parent->min_linkrate,
+						  ex_phy->linkrate);
+			child->max_linkrate = max(parent->max_linkrate,
+						  ex_phy->linkrate);
+			child->linkrate = min(ex_phy->linkrate,
+					      child->max_linkrate);
+			ex_phy->phy_change_count = -1;
+		}
+	}
+}
+
+static void sas_ex_level_update(struct asd_sas_port *port, const int level)
+{
+	struct domain_device *dev;
+
+	list_for_each_entry(dev, &port->dev_list, dev_list_node) {
+		if (dev_is_expander(dev->dev_type)) {
+			struct sas_expander_device *ex =
+				rphy_to_expander_device(dev->rphy);
+
+			if (level == ex->level)
+				sas_ex_update_linkrate(dev);
+		}
+	}
+}
+
+void sas_update_linkrate_root_expander(struct domain_device *dev)
+{
+	struct asd_sas_port *port = dev->port;
+	struct sas_expander_device *ex = rphy_to_expander_device(dev->rphy);
+	int level;
+
+	dev->linkrate = port->linkrate;
+	dev->min_linkrate = port->linkrate;
+	dev->max_linkrate = port->linkrate;
+	dev->ex_dev.ex_change_count = -1;
+
+	for (level = ex->level; level <= port->disc.max_level; level++)
+		sas_ex_level_update(port, level);
+}
+
 static bool dev_type_flutter(enum sas_device_type new, enum sas_device_type old)
 {
 	if (old == new)
@@ -2013,6 +2082,7 @@  static int sas_rediscover_dev(struct domain_device *dev, int phy_id,
 	} else if (SAS_ADDR(sas_addr) == SAS_ADDR(phy->attached_sas_addr) &&
 		   dev_type_flutter(type, phy->attached_dev_type)) {
 		struct domain_device *ata_dev = sas_ex_to_ata(dev, phy_id);
+		enum sas_linkrate linkrate = phy->linkrate;
 		char *action = "";
 
 		sas_ex_phy_discover(dev, phy_id);
@@ -2021,6 +2091,15 @@  static int sas_rediscover_dev(struct domain_device *dev, int phy_id,
 			action = ", needs recovery";
 		pr_debug("ex %016llx phy%02d broadcast flutter%s\n",
 			 SAS_ADDR(dev->sas_addr), phy_id, action);
+
+		if (linkrate != phy->linkrate) {
+			pr_debug("ex %016llx phy%d linkrate changed from %d to %d\n",
+				 SAS_ADDR(dev->sas_addr), phy_id,
+				 linkrate, phy->linkrate);
+			sas_unregister_devs_sas_addr(dev, phy_id, last);
+			phy->phy_change_count = -1;
+			ex->ex_change_count = -1;
+		}
 		return res;
 	}
 
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index b54bcf3c9a9d..d26ef24370f6 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -48,6 +48,7 @@  int sas_show_oob_mode(enum sas_oob_mode oob_mode, char *buf);
 
 int  sas_register_phys(struct sas_ha_struct *sas_ha);
 void sas_unregister_phys(struct sas_ha_struct *sas_ha);
+void sas_update_linkrate_root_expander(struct domain_device *dev);
 
 struct asd_sas_event *sas_alloc_event(struct asd_sas_phy *phy, gfp_t gfp_flags);
 void sas_free_event(struct asd_sas_event *event);