mbox series

[0/6] Miscellaneous FWU fixes

Message ID 20240830114057.891069-1-sughosh.ganu@linaro.org
Headers show
Series Miscellaneous FWU fixes | expand

Message

Sughosh Ganu Aug. 30, 2024, 11:40 a.m. UTC
The following set of patches are miscellaneous fixes and some
hardening of the FWU update logic.

Sughosh Ganu (6):
  fwu: v2: perform some checks before reading metadata
  fwu: v2: try reading both copies of metadata
  fwu: v1: do a version check for the metadata
  fwu: check all images for transitioning out of Trial State
  fwu: add dependency checks for selecting FWU metadata version
  fwu: do not allow capsule processing on exceeding Trial Counter
    threshold

 include/fwu.h            | 11 ++++++
 lib/fwu_updates/Kconfig  |  1 +
 lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
 lib/fwu_updates/fwu_v1.c | 18 +++++++--
 lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
 5 files changed, 99 insertions(+), 42 deletions(-)

Comments

Michal Simek Sept. 4, 2024, 8:15 a.m. UTC | #1
On 8/30/24 13:40, Sughosh Ganu wrote:
> 
> The following set of patches are miscellaneous fixes and some
> hardening of the FWU update logic.
> 
> Sughosh Ganu (6):
>    fwu: v2: perform some checks before reading metadata
>    fwu: v2: try reading both copies of metadata
>    fwu: v1: do a version check for the metadata
>    fwu: check all images for transitioning out of Trial State
>    fwu: add dependency checks for selecting FWU metadata version
>    fwu: do not allow capsule processing on exceeding Trial Counter
>      threshold
> 
>   include/fwu.h            | 11 ++++++
>   lib/fwu_updates/Kconfig  |  1 +
>   lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
>   lib/fwu_updates/fwu_v1.c | 18 +++++++--
>   lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
>   5 files changed, 99 insertions(+), 42 deletions(-)
> 

I think there is still issue with returning values.
Take a look at my log. I am in trial state - I accepted both images already 
that's why fwu has everything accepted. But I can still apply empty capsules 
which are passing.
1. if they are accepted status should be reflected and visible via fwu
2. if they are not accepted error from command should be returned.

Thanks,
Michal

ZynqMP> fwu
	FWU Metadata
crc32: 0x12fd554e
version: 0x2
size: 0xda
active_index: 0x0
previous_active_index: 0x1
bank_state[0]: 0xfc
bank_state[1]: 0xfc
bank_state[2]: 0xff
bank_state[3]: 0xff
	Image Info

Image Type Guid: DE6066E8-0256-4FAD-8238-E406E274C4CF
Location Guid: D7CE8A58-CE2C-11ED-81CD-D324E93AC223
Image Guid:  F64A0548-2CCE-ED11-8F66-7BC4531CFE6B
Image Acceptance: yes
Image Guid:  3E9C814B-2CCE-ED11-BEC8-23DE4C6D2CF2
Image Acceptance: yes

Image Type Guid: CF9ECFD4-938B-41C5-8551-1F883AB7DC18
Location Guid: D7CE8A58-CE2C-11ED-81CD-D324E93AC223
Image Guid:  52DA04FB-9D0E-EE11-A57F-637805837C3F
Image Acceptance: yes
Image Guid:  46926007-9E0E-EE11-A23A-A38980B779A1
Image Acceptance: yes
Custom fields covered by CRC 0x12
CUSTOM 00000000: 31 32 33 34 35 36 37 38 64 65 61 64 62 65 65 66  12345678deadbeef
CUSTOM 00000010: 0a 0b                                            ..
0xffd80050 = 0xb002c001
0xffd80054 = 0xb0020ff0
0xffd80058 = 0x1d000048
0xffd8005c = 0x0
ZynqMP>   tftpboot 0x100000 192.168.0.105:capsule1-revert.bin && efidebug 
capsule update -v 0x100000
Set rate id 48/125000000
Using ethernet@ff0e0000 device
TFTP from server 192.168.0.105; our IP address is 192.168.0.155
Filename 'capsule1-revert.bin'.
Load address: 0x100000
Loading: #
	 3.9 KiB/s
done
Bytes transferred = 28 (1c hex)
Capsule guid: acd58b4b-c0e8-475f-99b5-6b3f7e07aaf0
Capsule flags: 0x0
Capsule header size: 0x1c
Capsule image size: 0x1c
ZynqMP> echo $?
0
ZynqMP> fwu
	FWU Metadata
crc32: 0x4aef4913
version: 0x2
size: 0xda
active_index: 0x1
previous_active_index: 0x0
bank_state[0]: 0xfc
bank_state[1]: 0xfc
bank_state[2]: 0xff
bank_state[3]: 0xff
	Image Info

Image Type Guid: DE6066E8-0256-4FAD-8238-E406E274C4CF
Location Guid: D7CE8A58-CE2C-11ED-81CD-D324E93AC223
Image Guid:  F64A0548-2CCE-ED11-8F66-7BC4531CFE6B
Image Acceptance: yes
Image Guid:  3E9C814B-2CCE-ED11-BEC8-23DE4C6D2CF2
Image Acceptance: yes

Image Type Guid: CF9ECFD4-938B-41C5-8551-1F883AB7DC18
Location Guid: D7CE8A58-CE2C-11ED-81CD-D324E93AC223
Image Guid:  52DA04FB-9D0E-EE11-A57F-637805837C3F
Image Acceptance: yes
Image Guid:  46926007-9E0E-EE11-A23A-A38980B779A1
Image Acceptance: yes
Custom fields covered by CRC 0x12
CUSTOM 00000000: 31 32 33 34 35 36 37 38 64 65 61 64 62 65 65 66  12345678deadbeef
CUSTOM 00000010: 0a 0b                                            ..
0xffd80050 = 0xb002c001
0xffd80054 = 0xb0020ff0
0xffd80058 = 0x1d000048
0xffd8005c = 0x0
ZynqMP>   tftpboot 0x100000 192.168.0.105:capsule2-accept.bin && efidebug 
capsule update -v 0x100000
Set rate id 48/125000000
Using ethernet@ff0e0000 device
TFTP from server 192.168.0.105; our IP address is 192.168.0.155
Filename 'capsule2-accept.bin'.
Load address: 0x100000
Loading: #
	 5.9 KiB/s
done
Bytes transferred = 44 (2c hex)
Capsule guid: 0c996046-bcc0-4d04-85ec-e1fcedf1c6f8
Capsule flags: 0x0
Capsule header size: 0x1c
Capsule image size: 0x2c
ZynqMP> echo $?
0
ZynqMP> fwu
	FWU Metadata
crc32: 0x4aef4913
version: 0x2
size: 0xda
active_index: 0x1
previous_active_index: 0x0
bank_state[0]: 0xfc
bank_state[1]: 0xfc
bank_state[2]: 0xff
bank_state[3]: 0xff
	Image Info

Image Type Guid: DE6066E8-0256-4FAD-8238-E406E274C4CF
Location Guid: D7CE8A58-CE2C-11ED-81CD-D324E93AC223
Image Guid:  F64A0548-2CCE-ED11-8F66-7BC4531CFE6B
Image Acceptance: yes
Image Guid:  3E9C814B-2CCE-ED11-BEC8-23DE4C6D2CF2
Image Acceptance: yes

Image Type Guid: CF9ECFD4-938B-41C5-8551-1F883AB7DC18
Location Guid: D7CE8A58-CE2C-11ED-81CD-D324E93AC223
Image Guid:  52DA04FB-9D0E-EE11-A57F-637805837C3F
Image Acceptance: yes
Image Guid:  46926007-9E0E-EE11-A23A-A38980B779A1
Image Acceptance: yes
Custom fields covered by CRC 0x12
CUSTOM 00000000: 31 32 33 34 35 36 37 38 64 65 61 64 62 65 65 66  12345678deadbeef
CUSTOM 00000010: 0a 0b                                            ..
0xffd80050 = 0xb002c001
0xffd80054 = 0xb0020ff0
0xffd80058 = 0x1d000048
0xffd8005c = 0x0
ZynqMP>
Sughosh Ganu Sept. 4, 2024, 10:48 a.m. UTC | #2
On Wed, 4 Sept 2024 at 13:45, Michal Simek <michal.simek@amd.com> wrote:
>
>
>
> On 8/30/24 13:40, Sughosh Ganu wrote:
> >
> > The following set of patches are miscellaneous fixes and some
> > hardening of the FWU update logic.
> >
> > Sughosh Ganu (6):
> >    fwu: v2: perform some checks before reading metadata
> >    fwu: v2: try reading both copies of metadata
> >    fwu: v1: do a version check for the metadata
> >    fwu: check all images for transitioning out of Trial State
> >    fwu: add dependency checks for selecting FWU metadata version
> >    fwu: do not allow capsule processing on exceeding Trial Counter
> >      threshold
> >
> >   include/fwu.h            | 11 ++++++
> >   lib/fwu_updates/Kconfig  |  1 +
> >   lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
> >   lib/fwu_updates/fwu_v1.c | 18 +++++++--
> >   lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
> >   5 files changed, 99 insertions(+), 42 deletions(-)
> >
>
> I think there is still issue with returning values.
> Take a look at my log. I am in trial state - I accepted both images already
> that's why fwu has everything accepted. But I can still apply empty capsules
> which are passing.
> 1. if they are accepted status should be reflected and visible via fwu
> 2. if they are not accepted error from command should be returned.

The code currently, does not behave incorrectly, but we can indeed
return an error status if the empty capsule checks fail, like is done
for normal capsules. I will add this change in the next version.
Thanks.

-sughosh

>
> Thanks,
> Michal
>
> ZynqMP> fwu
>         FWU Metadata
> crc32: 0x12fd554e
> version: 0x2
> size: 0xda
> active_index: 0x0
> previous_active_index: 0x1
> bank_state[0]: 0xfc
> bank_state[1]: 0xfc
> bank_state[2]: 0xff
> bank_state[3]: 0xff
>         Image Info
>
> Image Type Guid: DE6066E8-0256-4FAD-8238-E406E274C4CF
> Location Guid: D7CE8A58-CE2C-11ED-81CD-D324E93AC223
> Image Guid:  F64A0548-2CCE-ED11-8F66-7BC4531CFE6B
> Image Acceptance: yes
> Image Guid:  3E9C814B-2CCE-ED11-BEC8-23DE4C6D2CF2
> Image Acceptance: yes
>
> Image Type Guid: CF9ECFD4-938B-41C5-8551-1F883AB7DC18
> Location Guid: D7CE8A58-CE2C-11ED-81CD-D324E93AC223
> Image Guid:  52DA04FB-9D0E-EE11-A57F-637805837C3F
> Image Acceptance: yes
> Image Guid:  46926007-9E0E-EE11-A23A-A38980B779A1
> Image Acceptance: yes
> Custom fields covered by CRC 0x12
> CUSTOM 00000000: 31 32 33 34 35 36 37 38 64 65 61 64 62 65 65 66  12345678deadbeef
> CUSTOM 00000010: 0a 0b                                            ..
> 0xffd80050 = 0xb002c001
> 0xffd80054 = 0xb0020ff0
> 0xffd80058 = 0x1d000048
> 0xffd8005c = 0x0
> ZynqMP>   tftpboot 0x100000 192.168.0.105:capsule1-revert.bin && efidebug
> capsule update -v 0x100000
> Set rate id 48/125000000
> Using ethernet@ff0e0000 device
> TFTP from server 192.168.0.105; our IP address is 192.168.0.155
> Filename 'capsule1-revert.bin'.
> Load address: 0x100000
> Loading: #
>          3.9 KiB/s
> done
> Bytes transferred = 28 (1c hex)
> Capsule guid: acd58b4b-c0e8-475f-99b5-6b3f7e07aaf0
> Capsule flags: 0x0
> Capsule header size: 0x1c
> Capsule image size: 0x1c
> ZynqMP> echo $?
> 0
> ZynqMP> fwu
>         FWU Metadata
> crc32: 0x4aef4913
> version: 0x2
> size: 0xda
> active_index: 0x1
> previous_active_index: 0x0
> bank_state[0]: 0xfc
> bank_state[1]: 0xfc
> bank_state[2]: 0xff
> bank_state[3]: 0xff
>         Image Info
>
> Image Type Guid: DE6066E8-0256-4FAD-8238-E406E274C4CF
> Location Guid: D7CE8A58-CE2C-11ED-81CD-D324E93AC223
> Image Guid:  F64A0548-2CCE-ED11-8F66-7BC4531CFE6B
> Image Acceptance: yes
> Image Guid:  3E9C814B-2CCE-ED11-BEC8-23DE4C6D2CF2
> Image Acceptance: yes
>
> Image Type Guid: CF9ECFD4-938B-41C5-8551-1F883AB7DC18
> Location Guid: D7CE8A58-CE2C-11ED-81CD-D324E93AC223
> Image Guid:  52DA04FB-9D0E-EE11-A57F-637805837C3F
> Image Acceptance: yes
> Image Guid:  46926007-9E0E-EE11-A23A-A38980B779A1
> Image Acceptance: yes
> Custom fields covered by CRC 0x12
> CUSTOM 00000000: 31 32 33 34 35 36 37 38 64 65 61 64 62 65 65 66  12345678deadbeef
> CUSTOM 00000010: 0a 0b                                            ..
> 0xffd80050 = 0xb002c001
> 0xffd80054 = 0xb0020ff0
> 0xffd80058 = 0x1d000048
> 0xffd8005c = 0x0
> ZynqMP>   tftpboot 0x100000 192.168.0.105:capsule2-accept.bin && efidebug
> capsule update -v 0x100000
> Set rate id 48/125000000
> Using ethernet@ff0e0000 device
> TFTP from server 192.168.0.105; our IP address is 192.168.0.155
> Filename 'capsule2-accept.bin'.
> Load address: 0x100000
> Loading: #
>          5.9 KiB/s
> done
> Bytes transferred = 44 (2c hex)
> Capsule guid: 0c996046-bcc0-4d04-85ec-e1fcedf1c6f8
> Capsule flags: 0x0
> Capsule header size: 0x1c
> Capsule image size: 0x2c
> ZynqMP> echo $?
> 0
> ZynqMP> fwu
>         FWU Metadata
> crc32: 0x4aef4913
> version: 0x2
> size: 0xda
> active_index: 0x1
> previous_active_index: 0x0
> bank_state[0]: 0xfc
> bank_state[1]: 0xfc
> bank_state[2]: 0xff
> bank_state[3]: 0xff
>         Image Info
>
> Image Type Guid: DE6066E8-0256-4FAD-8238-E406E274C4CF
> Location Guid: D7CE8A58-CE2C-11ED-81CD-D324E93AC223
> Image Guid:  F64A0548-2CCE-ED11-8F66-7BC4531CFE6B
> Image Acceptance: yes
> Image Guid:  3E9C814B-2CCE-ED11-BEC8-23DE4C6D2CF2
> Image Acceptance: yes
>
> Image Type Guid: CF9ECFD4-938B-41C5-8551-1F883AB7DC18
> Location Guid: D7CE8A58-CE2C-11ED-81CD-D324E93AC223
> Image Guid:  52DA04FB-9D0E-EE11-A57F-637805837C3F
> Image Acceptance: yes
> Image Guid:  46926007-9E0E-EE11-A23A-A38980B779A1
> Image Acceptance: yes
> Custom fields covered by CRC 0x12
> CUSTOM 00000000: 31 32 33 34 35 36 37 38 64 65 61 64 62 65 65 66  12345678deadbeef
> CUSTOM 00000010: 0a 0b                                            ..
> 0xffd80050 = 0xb002c001
> 0xffd80054 = 0xb0020ff0
> 0xffd80058 = 0x1d000048
> 0xffd8005c = 0x0
> ZynqMP>
>
Michal Simek Sept. 4, 2024, 11:59 a.m. UTC | #3
On 8/30/24 13:40, Sughosh Ganu wrote:
> 
> The following set of patches are miscellaneous fixes and some
> hardening of the FWU update logic.
> 
> Sughosh Ganu (6):
>    fwu: v2: perform some checks before reading metadata
>    fwu: v2: try reading both copies of metadata
>    fwu: v1: do a version check for the metadata
>    fwu: check all images for transitioning out of Trial State
>    fwu: add dependency checks for selecting FWU metadata version
>    fwu: do not allow capsule processing on exceeding Trial Counter
>      threshold
> 
>   include/fwu.h            | 11 ++++++
>   lib/fwu_updates/Kconfig  |  1 +
>   lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
>   lib/fwu_updates/fwu_v1.c | 18 +++++++--
>   lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
>   5 files changed, 99 insertions(+), 42 deletions(-)
> 

I found one more thing.

I did this change

diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
index fbc2067bc12d..dab9530e499c 100644
--- a/tools/mkfwumdata.c
+++ b/tools/mkfwumdata.c
@@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct 
fwu_mdata_object *mobj)
         struct fwu_mdata *mdata = mobj->mdata;

         mdata->metadata_size = mobj->size;
-       mdata->desc_offset = sizeof(struct fwu_mdata);
+       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);

         for (i = 0; i < MAX_BANKS_V2; i++)
                 mdata->bank_state[i] = i < mobj->banks ?


to break desc_offset location. I generated mdata structure and write it to 
primary mdata partition.

This has been spotted by (my debug) which is the reason why I did it.

	if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
		printf("mdata offset is not 0x20\n");
		return false;
	}

But I got one more message below which

mdata offset is not 0x20
Both FWU metadata copies are valid but do not match. Restoring the secondary 
partition from the primary

But this is wrong. 0x21 there is wrong as been detected but sync code detects it 
as correct one and by this break also backup copy.

Thanks,
Michal
Sughosh Ganu Sept. 5, 2024, 6:24 a.m. UTC | #4
On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.simek@amd.com> wrote:
>
>
>
> On 8/30/24 13:40, Sughosh Ganu wrote:
> >
> > The following set of patches are miscellaneous fixes and some
> > hardening of the FWU update logic.
> >
> > Sughosh Ganu (6):
> >    fwu: v2: perform some checks before reading metadata
> >    fwu: v2: try reading both copies of metadata
> >    fwu: v1: do a version check for the metadata
> >    fwu: check all images for transitioning out of Trial State
> >    fwu: add dependency checks for selecting FWU metadata version
> >    fwu: do not allow capsule processing on exceeding Trial Counter
> >      threshold
> >
> >   include/fwu.h            | 11 ++++++
> >   lib/fwu_updates/Kconfig  |  1 +
> >   lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
> >   lib/fwu_updates/fwu_v1.c | 18 +++++++--
> >   lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
> >   5 files changed, 99 insertions(+), 42 deletions(-)
> >
>
> I found one more thing.
>
> I did this change
>
> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
> index fbc2067bc12d..dab9530e499c 100644
> --- a/tools/mkfwumdata.c
> +++ b/tools/mkfwumdata.c
> @@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct
> fwu_mdata_object *mobj)
>          struct fwu_mdata *mdata = mobj->mdata;
>
>          mdata->metadata_size = mobj->size;
> -       mdata->desc_offset = sizeof(struct fwu_mdata);
> +       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);
>
>          for (i = 0; i < MAX_BANKS_V2; i++)
>                  mdata->bank_state[i] = i < mobj->banks ?
>
>
> to break desc_offset location. I generated mdata structure and write it to
> primary mdata partition.
>
> This has been spotted by (my debug) which is the reason why I did it.
>
>         if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
>                 printf("mdata offset is not 0x20\n");
>                 return false;
>         }
>
> But I got one more message below which
>
> mdata offset is not 0x20
> Both FWU metadata copies are valid but do not match. Restoring the secondary
> partition from the primary

The reason why this logic is in place is to handle the scenario
whereby, during the metadata update, the primary copy gets updated
correctly, but due to some reason (like a power cut), the secondary
copy does not -- this is because the primary copy gets updated first.
In such a scenario, the secondary copy of metadata can be then
restored from the primary copy. In the scenario where the primary copy
is corrupted, it will show up in the crc check, and will get restored
from the secondary copy.

-sughosh

>
> But this is wrong. 0x21 there is wrong as been detected but sync code detects it
> as correct one and by this break also backup copy.
>
> Thanks,
> Michal
Michal Simek Sept. 5, 2024, 7:33 a.m. UTC | #5
On 9/5/24 08:24, Sughosh Ganu wrote:
> On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.simek@amd.com> wrote:
>>
>>
>>
>> On 8/30/24 13:40, Sughosh Ganu wrote:
>>>
>>> The following set of patches are miscellaneous fixes and some
>>> hardening of the FWU update logic.
>>>
>>> Sughosh Ganu (6):
>>>     fwu: v2: perform some checks before reading metadata
>>>     fwu: v2: try reading both copies of metadata
>>>     fwu: v1: do a version check for the metadata
>>>     fwu: check all images for transitioning out of Trial State
>>>     fwu: add dependency checks for selecting FWU metadata version
>>>     fwu: do not allow capsule processing on exceeding Trial Counter
>>>       threshold
>>>
>>>    include/fwu.h            | 11 ++++++
>>>    lib/fwu_updates/Kconfig  |  1 +
>>>    lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
>>>    lib/fwu_updates/fwu_v1.c | 18 +++++++--
>>>    lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
>>>    5 files changed, 99 insertions(+), 42 deletions(-)
>>>
>>
>> I found one more thing.
>>
>> I did this change
>>
>> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
>> index fbc2067bc12d..dab9530e499c 100644
>> --- a/tools/mkfwumdata.c
>> +++ b/tools/mkfwumdata.c
>> @@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct
>> fwu_mdata_object *mobj)
>>           struct fwu_mdata *mdata = mobj->mdata;
>>
>>           mdata->metadata_size = mobj->size;
>> -       mdata->desc_offset = sizeof(struct fwu_mdata);
>> +       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);
>>
>>           for (i = 0; i < MAX_BANKS_V2; i++)
>>                   mdata->bank_state[i] = i < mobj->banks ?
>>
>>
>> to break desc_offset location. I generated mdata structure and write it to
>> primary mdata partition.
>>
>> This has been spotted by (my debug) which is the reason why I did it.
>>
>>          if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
>>                  printf("mdata offset is not 0x20\n");
>>                  return false;
>>          }
>>
>> But I got one more message below which
>>
>> mdata offset is not 0x20
>> Both FWU metadata copies are valid but do not match. Restoring the secondary
>> partition from the primary
> 
> The reason why this logic is in place is to handle the scenario
> whereby, during the metadata update, the primary copy gets updated
> correctly, but due to some reason (like a power cut), the secondary
> copy does not -- this is because the primary copy gets updated first.
> In such a scenario, the secondary copy of metadata can be then
> restored from the primary copy. In the scenario where the primary copy
> is corrupted, it will show up in the crc check, and will get restored
> from the secondary copy.

But in this case - CRC of primary is correct but data inside is wrong because 
offset is at 0x21 not at 0x20.
That likely means that U-Boot is not actually using this field to find where 
data is but I think that's fine to say not supported mdata structure but u-boot 
should never copy this "primary broken" description to secondary one as syncup.

M
Sughosh Ganu Sept. 5, 2024, 7:38 a.m. UTC | #6
On Thu, 5 Sept 2024 at 13:03, Michal Simek <michal.simek@amd.com> wrote:
>
>
>
> On 9/5/24 08:24, Sughosh Ganu wrote:
> > On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.simek@amd.com> wrote:
> >>
> >>
> >>
> >> On 8/30/24 13:40, Sughosh Ganu wrote:
> >>>
> >>> The following set of patches are miscellaneous fixes and some
> >>> hardening of the FWU update logic.
> >>>
> >>> Sughosh Ganu (6):
> >>>     fwu: v2: perform some checks before reading metadata
> >>>     fwu: v2: try reading both copies of metadata
> >>>     fwu: v1: do a version check for the metadata
> >>>     fwu: check all images for transitioning out of Trial State
> >>>     fwu: add dependency checks for selecting FWU metadata version
> >>>     fwu: do not allow capsule processing on exceeding Trial Counter
> >>>       threshold
> >>>
> >>>    include/fwu.h            | 11 ++++++
> >>>    lib/fwu_updates/Kconfig  |  1 +
> >>>    lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
> >>>    lib/fwu_updates/fwu_v1.c | 18 +++++++--
> >>>    lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
> >>>    5 files changed, 99 insertions(+), 42 deletions(-)
> >>>
> >>
> >> I found one more thing.
> >>
> >> I did this change
> >>
> >> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
> >> index fbc2067bc12d..dab9530e499c 100644
> >> --- a/tools/mkfwumdata.c
> >> +++ b/tools/mkfwumdata.c
> >> @@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct
> >> fwu_mdata_object *mobj)
> >>           struct fwu_mdata *mdata = mobj->mdata;
> >>
> >>           mdata->metadata_size = mobj->size;
> >> -       mdata->desc_offset = sizeof(struct fwu_mdata);
> >> +       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);
> >>
> >>           for (i = 0; i < MAX_BANKS_V2; i++)
> >>                   mdata->bank_state[i] = i < mobj->banks ?
> >>
> >>
> >> to break desc_offset location. I generated mdata structure and write it to
> >> primary mdata partition.
> >>
> >> This has been spotted by (my debug) which is the reason why I did it.
> >>
> >>          if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
> >>                  printf("mdata offset is not 0x20\n");
> >>                  return false;
> >>          }
> >>
> >> But I got one more message below which
> >>
> >> mdata offset is not 0x20
> >> Both FWU metadata copies are valid but do not match. Restoring the secondary
> >> partition from the primary
> >
> > The reason why this logic is in place is to handle the scenario
> > whereby, during the metadata update, the primary copy gets updated
> > correctly, but due to some reason (like a power cut), the secondary
> > copy does not -- this is because the primary copy gets updated first.
> > In such a scenario, the secondary copy of metadata can be then
> > restored from the primary copy. In the scenario where the primary copy
> > is corrupted, it will show up in the crc check, and will get restored
> > from the secondary copy.
>
> But in this case - CRC of primary is correct but data inside is wrong because
> offset is at 0x21 not at 0x20.

How will this event play out ? Am I missing something ? When the
metadata is getting updated, the offset will be written as 0x20, and
not 0x21. And in case the metadata gets corrupted, the CRC check would
fail and the primary partition would get restored from the secondary
(assuming that the secondary copy is correct).

-sughosh

> That likely means that U-Boot is not actually using this field to find where
> data is but I think that's fine to say not supported mdata structure but u-boot
> should never copy this "primary broken" description to secondary one as syncup.
>
> M
>
Michal Simek Sept. 5, 2024, 7:39 a.m. UTC | #7
On 9/5/24 09:38, Sughosh Ganu wrote:
> On Thu, 5 Sept 2024 at 13:03, Michal Simek <michal.simek@amd.com> wrote:
>>
>>
>>
>> On 9/5/24 08:24, Sughosh Ganu wrote:
>>> On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.simek@amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 8/30/24 13:40, Sughosh Ganu wrote:
>>>>>
>>>>> The following set of patches are miscellaneous fixes and some
>>>>> hardening of the FWU update logic.
>>>>>
>>>>> Sughosh Ganu (6):
>>>>>      fwu: v2: perform some checks before reading metadata
>>>>>      fwu: v2: try reading both copies of metadata
>>>>>      fwu: v1: do a version check for the metadata
>>>>>      fwu: check all images for transitioning out of Trial State
>>>>>      fwu: add dependency checks for selecting FWU metadata version
>>>>>      fwu: do not allow capsule processing on exceeding Trial Counter
>>>>>        threshold
>>>>>
>>>>>     include/fwu.h            | 11 ++++++
>>>>>     lib/fwu_updates/Kconfig  |  1 +
>>>>>     lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
>>>>>     lib/fwu_updates/fwu_v1.c | 18 +++++++--
>>>>>     lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
>>>>>     5 files changed, 99 insertions(+), 42 deletions(-)
>>>>>
>>>>
>>>> I found one more thing.
>>>>
>>>> I did this change
>>>>
>>>> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
>>>> index fbc2067bc12d..dab9530e499c 100644
>>>> --- a/tools/mkfwumdata.c
>>>> +++ b/tools/mkfwumdata.c
>>>> @@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct
>>>> fwu_mdata_object *mobj)
>>>>            struct fwu_mdata *mdata = mobj->mdata;
>>>>
>>>>            mdata->metadata_size = mobj->size;
>>>> -       mdata->desc_offset = sizeof(struct fwu_mdata);
>>>> +       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);
>>>>
>>>>            for (i = 0; i < MAX_BANKS_V2; i++)
>>>>                    mdata->bank_state[i] = i < mobj->banks ?
>>>>
>>>>
>>>> to break desc_offset location. I generated mdata structure and write it to
>>>> primary mdata partition.
>>>>
>>>> This has been spotted by (my debug) which is the reason why I did it.
>>>>
>>>>           if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
>>>>                   printf("mdata offset is not 0x20\n");
>>>>                   return false;
>>>>           }
>>>>
>>>> But I got one more message below which
>>>>
>>>> mdata offset is not 0x20
>>>> Both FWU metadata copies are valid but do not match. Restoring the secondary
>>>> partition from the primary
>>>
>>> The reason why this logic is in place is to handle the scenario
>>> whereby, during the metadata update, the primary copy gets updated
>>> correctly, but due to some reason (like a power cut), the secondary
>>> copy does not -- this is because the primary copy gets updated first.
>>> In such a scenario, the secondary copy of metadata can be then
>>> restored from the primary copy. In the scenario where the primary copy
>>> is corrupted, it will show up in the crc check, and will get restored
>>> from the secondary copy.
>>
>> But in this case - CRC of primary is correct but data inside is wrong because
>> offset is at 0x21 not at 0x20.
> 
> How will this event play out ? Am I missing something ? When the
> metadata is getting updated, the offset will be written as 0x20, and
> not 0x21. And in case the metadata gets corrupted, the CRC check would
> fail and the primary partition would get restored from the secondary
> (assuming that the secondary copy is correct).

It is not after update. It is initial state with incorrect mdata structure.

M
Sughosh Ganu Sept. 5, 2024, 7:43 a.m. UTC | #8
On Thu, 5 Sept 2024 at 13:09, Michal Simek <michal.simek@amd.com> wrote:
>
>
>
> On 9/5/24 09:38, Sughosh Ganu wrote:
> > On Thu, 5 Sept 2024 at 13:03, Michal Simek <michal.simek@amd.com> wrote:
> >>
> >>
> >>
> >> On 9/5/24 08:24, Sughosh Ganu wrote:
> >>> On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.simek@amd.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 8/30/24 13:40, Sughosh Ganu wrote:
> >>>>>
> >>>>> The following set of patches are miscellaneous fixes and some
> >>>>> hardening of the FWU update logic.
> >>>>>
> >>>>> Sughosh Ganu (6):
> >>>>>      fwu: v2: perform some checks before reading metadata
> >>>>>      fwu: v2: try reading both copies of metadata
> >>>>>      fwu: v1: do a version check for the metadata
> >>>>>      fwu: check all images for transitioning out of Trial State
> >>>>>      fwu: add dependency checks for selecting FWU metadata version
> >>>>>      fwu: do not allow capsule processing on exceeding Trial Counter
> >>>>>        threshold
> >>>>>
> >>>>>     include/fwu.h            | 11 ++++++
> >>>>>     lib/fwu_updates/Kconfig  |  1 +
> >>>>>     lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
> >>>>>     lib/fwu_updates/fwu_v1.c | 18 +++++++--
> >>>>>     lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
> >>>>>     5 files changed, 99 insertions(+), 42 deletions(-)
> >>>>>
> >>>>
> >>>> I found one more thing.
> >>>>
> >>>> I did this change
> >>>>
> >>>> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
> >>>> index fbc2067bc12d..dab9530e499c 100644
> >>>> --- a/tools/mkfwumdata.c
> >>>> +++ b/tools/mkfwumdata.c
> >>>> @@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct
> >>>> fwu_mdata_object *mobj)
> >>>>            struct fwu_mdata *mdata = mobj->mdata;
> >>>>
> >>>>            mdata->metadata_size = mobj->size;
> >>>> -       mdata->desc_offset = sizeof(struct fwu_mdata);
> >>>> +       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);
> >>>>
> >>>>            for (i = 0; i < MAX_BANKS_V2; i++)
> >>>>                    mdata->bank_state[i] = i < mobj->banks ?
> >>>>
> >>>>
> >>>> to break desc_offset location. I generated mdata structure and write it to
> >>>> primary mdata partition.
> >>>>
> >>>> This has been spotted by (my debug) which is the reason why I did it.
> >>>>
> >>>>           if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
> >>>>                   printf("mdata offset is not 0x20\n");
> >>>>                   return false;
> >>>>           }
> >>>>
> >>>> But I got one more message below which
> >>>>
> >>>> mdata offset is not 0x20
> >>>> Both FWU metadata copies are valid but do not match. Restoring the secondary
> >>>> partition from the primary
> >>>
> >>> The reason why this logic is in place is to handle the scenario
> >>> whereby, during the metadata update, the primary copy gets updated
> >>> correctly, but due to some reason (like a power cut), the secondary
> >>> copy does not -- this is because the primary copy gets updated first.
> >>> In such a scenario, the secondary copy of metadata can be then
> >>> restored from the primary copy. In the scenario where the primary copy
> >>> is corrupted, it will show up in the crc check, and will get restored
> >>> from the secondary copy.
> >>
> >> But in this case - CRC of primary is correct but data inside is wrong because
> >> offset is at 0x21 not at 0x20.
> >
> > How will this event play out ? Am I missing something ? When the
> > metadata is getting updated, the offset will be written as 0x20, and
> > not 0x21. And in case the metadata gets corrupted, the CRC check would
> > fail and the primary partition would get restored from the secondary
> > (assuming that the secondary copy is correct).
>
> It is not after update. It is initial state with incorrect mdata structure.

Well, if the metadata partition is being written with incorrect data,
not sure how we combat that (or even if we should). After all, the
spec clearly states that the metadata cannot be protected against
malicious writes. The logic that you pointed out earlier is to handle
the scenario where the primary partition got updated, but the
secondary did not.

-sughosh
Michal Simek Sept. 5, 2024, 7:50 a.m. UTC | #9
On 9/5/24 09:43, Sughosh Ganu wrote:
> On Thu, 5 Sept 2024 at 13:09, Michal Simek <michal.simek@amd.com> wrote:
>>
>>
>>
>> On 9/5/24 09:38, Sughosh Ganu wrote:
>>> On Thu, 5 Sept 2024 at 13:03, Michal Simek <michal.simek@amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 9/5/24 08:24, Sughosh Ganu wrote:
>>>>> On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.simek@amd.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 8/30/24 13:40, Sughosh Ganu wrote:
>>>>>>>
>>>>>>> The following set of patches are miscellaneous fixes and some
>>>>>>> hardening of the FWU update logic.
>>>>>>>
>>>>>>> Sughosh Ganu (6):
>>>>>>>       fwu: v2: perform some checks before reading metadata
>>>>>>>       fwu: v2: try reading both copies of metadata
>>>>>>>       fwu: v1: do a version check for the metadata
>>>>>>>       fwu: check all images for transitioning out of Trial State
>>>>>>>       fwu: add dependency checks for selecting FWU metadata version
>>>>>>>       fwu: do not allow capsule processing on exceeding Trial Counter
>>>>>>>         threshold
>>>>>>>
>>>>>>>      include/fwu.h            | 11 ++++++
>>>>>>>      lib/fwu_updates/Kconfig  |  1 +
>>>>>>>      lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
>>>>>>>      lib/fwu_updates/fwu_v1.c | 18 +++++++--
>>>>>>>      lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
>>>>>>>      5 files changed, 99 insertions(+), 42 deletions(-)
>>>>>>>
>>>>>>
>>>>>> I found one more thing.
>>>>>>
>>>>>> I did this change
>>>>>>
>>>>>> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
>>>>>> index fbc2067bc12d..dab9530e499c 100644
>>>>>> --- a/tools/mkfwumdata.c
>>>>>> +++ b/tools/mkfwumdata.c
>>>>>> @@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct
>>>>>> fwu_mdata_object *mobj)
>>>>>>             struct fwu_mdata *mdata = mobj->mdata;
>>>>>>
>>>>>>             mdata->metadata_size = mobj->size;
>>>>>> -       mdata->desc_offset = sizeof(struct fwu_mdata);
>>>>>> +       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);
>>>>>>
>>>>>>             for (i = 0; i < MAX_BANKS_V2; i++)
>>>>>>                     mdata->bank_state[i] = i < mobj->banks ?
>>>>>>
>>>>>>
>>>>>> to break desc_offset location. I generated mdata structure and write it to
>>>>>> primary mdata partition.
>>>>>>
>>>>>> This has been spotted by (my debug) which is the reason why I did it.
>>>>>>
>>>>>>            if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
>>>>>>                    printf("mdata offset is not 0x20\n");
>>>>>>                    return false;
>>>>>>            }
>>>>>>
>>>>>> But I got one more message below which
>>>>>>
>>>>>> mdata offset is not 0x20
>>>>>> Both FWU metadata copies are valid but do not match. Restoring the secondary
>>>>>> partition from the primary
>>>>>
>>>>> The reason why this logic is in place is to handle the scenario
>>>>> whereby, during the metadata update, the primary copy gets updated
>>>>> correctly, but due to some reason (like a power cut), the secondary
>>>>> copy does not -- this is because the primary copy gets updated first.
>>>>> In such a scenario, the secondary copy of metadata can be then
>>>>> restored from the primary copy. In the scenario where the primary copy
>>>>> is corrupted, it will show up in the crc check, and will get restored
>>>>> from the secondary copy.
>>>>
>>>> But in this case - CRC of primary is correct but data inside is wrong because
>>>> offset is at 0x21 not at 0x20.
>>>
>>> How will this event play out ? Am I missing something ? When the
>>> metadata is getting updated, the offset will be written as 0x20, and
>>> not 0x21. And in case the metadata gets corrupted, the CRC check would
>>> fail and the primary partition would get restored from the secondary
>>> (assuming that the secondary copy is correct).
>>
>> It is not after update. It is initial state with incorrect mdata structure.
> 
> Well, if the metadata partition is being written with incorrect data,
> not sure how we combat that (or even if we should). After all, the
> spec clearly states that the metadata cannot be protected against
> malicious writes. The logic that you pointed out earlier is to handle
> the scenario where the primary partition got updated, but the
> secondary did not.

I don't disagree with you.

Scenario here is that primary partition is updated with incorrect data content 
(but still correct CRC). This is correctly detected that data is not right.
It means primary partition is wrong and shouldn't be used and it is not used.

But because it has correct CRC syncup code logic is doing syncup.

That's why I think we have two codes which have pretty much independent logic 
what to do. If primary is wrong sync should be from secondary to primary but 
that's not what code is doing.

Thanks,
Michal
Sughosh Ganu Sept. 5, 2024, 8:12 a.m. UTC | #10
On Thu, 5 Sept 2024 at 13:20, Michal Simek <michal.simek@amd.com> wrote:
>
>
>
> On 9/5/24 09:43, Sughosh Ganu wrote:
> > On Thu, 5 Sept 2024 at 13:09, Michal Simek <michal.simek@amd.com> wrote:
> >>
> >>
> >>
> >> On 9/5/24 09:38, Sughosh Ganu wrote:
> >>> On Thu, 5 Sept 2024 at 13:03, Michal Simek <michal.simek@amd.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 9/5/24 08:24, Sughosh Ganu wrote:
> >>>>> On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.simek@amd.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 8/30/24 13:40, Sughosh Ganu wrote:
> >>>>>>>
> >>>>>>> The following set of patches are miscellaneous fixes and some
> >>>>>>> hardening of the FWU update logic.
> >>>>>>>
> >>>>>>> Sughosh Ganu (6):
> >>>>>>>       fwu: v2: perform some checks before reading metadata
> >>>>>>>       fwu: v2: try reading both copies of metadata
> >>>>>>>       fwu: v1: do a version check for the metadata
> >>>>>>>       fwu: check all images for transitioning out of Trial State
> >>>>>>>       fwu: add dependency checks for selecting FWU metadata version
> >>>>>>>       fwu: do not allow capsule processing on exceeding Trial Counter
> >>>>>>>         threshold
> >>>>>>>
> >>>>>>>      include/fwu.h            | 11 ++++++
> >>>>>>>      lib/fwu_updates/Kconfig  |  1 +
> >>>>>>>      lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
> >>>>>>>      lib/fwu_updates/fwu_v1.c | 18 +++++++--
> >>>>>>>      lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
> >>>>>>>      5 files changed, 99 insertions(+), 42 deletions(-)
> >>>>>>>
> >>>>>>
> >>>>>> I found one more thing.
> >>>>>>
> >>>>>> I did this change
> >>>>>>
> >>>>>> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
> >>>>>> index fbc2067bc12d..dab9530e499c 100644
> >>>>>> --- a/tools/mkfwumdata.c
> >>>>>> +++ b/tools/mkfwumdata.c
> >>>>>> @@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct
> >>>>>> fwu_mdata_object *mobj)
> >>>>>>             struct fwu_mdata *mdata = mobj->mdata;
> >>>>>>
> >>>>>>             mdata->metadata_size = mobj->size;
> >>>>>> -       mdata->desc_offset = sizeof(struct fwu_mdata);
> >>>>>> +       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);
> >>>>>>
> >>>>>>             for (i = 0; i < MAX_BANKS_V2; i++)
> >>>>>>                     mdata->bank_state[i] = i < mobj->banks ?
> >>>>>>
> >>>>>>
> >>>>>> to break desc_offset location. I generated mdata structure and write it to
> >>>>>> primary mdata partition.
> >>>>>>
> >>>>>> This has been spotted by (my debug) which is the reason why I did it.
> >>>>>>
> >>>>>>            if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
> >>>>>>                    printf("mdata offset is not 0x20\n");
> >>>>>>                    return false;
> >>>>>>            }
> >>>>>>
> >>>>>> But I got one more message below which
> >>>>>>
> >>>>>> mdata offset is not 0x20
> >>>>>> Both FWU metadata copies are valid but do not match. Restoring the secondary
> >>>>>> partition from the primary
> >>>>>
> >>>>> The reason why this logic is in place is to handle the scenario
> >>>>> whereby, during the metadata update, the primary copy gets updated
> >>>>> correctly, but due to some reason (like a power cut), the secondary
> >>>>> copy does not -- this is because the primary copy gets updated first.
> >>>>> In such a scenario, the secondary copy of metadata can be then
> >>>>> restored from the primary copy. In the scenario where the primary copy
> >>>>> is corrupted, it will show up in the crc check, and will get restored
> >>>>> from the secondary copy.
> >>>>
> >>>> But in this case - CRC of primary is correct but data inside is wrong because
> >>>> offset is at 0x21 not at 0x20.
> >>>
> >>> How will this event play out ? Am I missing something ? When the
> >>> metadata is getting updated, the offset will be written as 0x20, and
> >>> not 0x21. And in case the metadata gets corrupted, the CRC check would
> >>> fail and the primary partition would get restored from the secondary
> >>> (assuming that the secondary copy is correct).
> >>
> >> It is not after update. It is initial state with incorrect mdata structure.
> >
> > Well, if the metadata partition is being written with incorrect data,
> > not sure how we combat that (or even if we should). After all, the
> > spec clearly states that the metadata cannot be protected against
> > malicious writes. The logic that you pointed out earlier is to handle
> > the scenario where the primary partition got updated, but the
> > secondary did not.
>
> I don't disagree with you.
>
> Scenario here is that primary partition is updated with incorrect data content
> (but still correct CRC). This is correctly detected that data is not right.
> It means primary partition is wrong and shouldn't be used and it is not used.
>
> But because it has correct CRC syncup code logic is doing syncup.
>
> That's why I think we have two codes which have pretty much independent logic
> what to do. If primary is wrong sync should be from secondary to primary but
> that's not what code is doing.

The scenario of "primary is wrong" will only play out in case of
corruption in the metadata. The scenario that you highlight, that of
the metadata being wrong, but the CRC check succeeds will only happen
when the metadata has been maliciously written -- if not, then the
secondary partition should also be written with the wrong metadata,
and the checks would then fail. In the case of the primary metadata
getting corrupted, it does get restored from the secondary partition,
assuming that the secondary partition is intact.

-sughosh

>
> Thanks,
> Michal
>
Michal Simek Sept. 5, 2024, 8:37 a.m. UTC | #11
On 9/5/24 10:12, Sughosh Ganu wrote:
> On Thu, 5 Sept 2024 at 13:20, Michal Simek <michal.simek@amd.com> wrote:
>>
>>
>>
>> On 9/5/24 09:43, Sughosh Ganu wrote:
>>> On Thu, 5 Sept 2024 at 13:09, Michal Simek <michal.simek@amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 9/5/24 09:38, Sughosh Ganu wrote:
>>>>> On Thu, 5 Sept 2024 at 13:03, Michal Simek <michal.simek@amd.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 9/5/24 08:24, Sughosh Ganu wrote:
>>>>>>> On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.simek@amd.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/30/24 13:40, Sughosh Ganu wrote:
>>>>>>>>>
>>>>>>>>> The following set of patches are miscellaneous fixes and some
>>>>>>>>> hardening of the FWU update logic.
>>>>>>>>>
>>>>>>>>> Sughosh Ganu (6):
>>>>>>>>>        fwu: v2: perform some checks before reading metadata
>>>>>>>>>        fwu: v2: try reading both copies of metadata
>>>>>>>>>        fwu: v1: do a version check for the metadata
>>>>>>>>>        fwu: check all images for transitioning out of Trial State
>>>>>>>>>        fwu: add dependency checks for selecting FWU metadata version
>>>>>>>>>        fwu: do not allow capsule processing on exceeding Trial Counter
>>>>>>>>>          threshold
>>>>>>>>>
>>>>>>>>>       include/fwu.h            | 11 ++++++
>>>>>>>>>       lib/fwu_updates/Kconfig  |  1 +
>>>>>>>>>       lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
>>>>>>>>>       lib/fwu_updates/fwu_v1.c | 18 +++++++--
>>>>>>>>>       lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
>>>>>>>>>       5 files changed, 99 insertions(+), 42 deletions(-)
>>>>>>>>>
>>>>>>>>
>>>>>>>> I found one more thing.
>>>>>>>>
>>>>>>>> I did this change
>>>>>>>>
>>>>>>>> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
>>>>>>>> index fbc2067bc12d..dab9530e499c 100644
>>>>>>>> --- a/tools/mkfwumdata.c
>>>>>>>> +++ b/tools/mkfwumdata.c
>>>>>>>> @@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct
>>>>>>>> fwu_mdata_object *mobj)
>>>>>>>>              struct fwu_mdata *mdata = mobj->mdata;
>>>>>>>>
>>>>>>>>              mdata->metadata_size = mobj->size;
>>>>>>>> -       mdata->desc_offset = sizeof(struct fwu_mdata);
>>>>>>>> +       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);
>>>>>>>>
>>>>>>>>              for (i = 0; i < MAX_BANKS_V2; i++)
>>>>>>>>                      mdata->bank_state[i] = i < mobj->banks ?
>>>>>>>>
>>>>>>>>
>>>>>>>> to break desc_offset location. I generated mdata structure and write it to
>>>>>>>> primary mdata partition.
>>>>>>>>
>>>>>>>> This has been spotted by (my debug) which is the reason why I did it.
>>>>>>>>
>>>>>>>>             if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
>>>>>>>>                     printf("mdata offset is not 0x20\n");
>>>>>>>>                     return false;
>>>>>>>>             }
>>>>>>>>
>>>>>>>> But I got one more message below which
>>>>>>>>
>>>>>>>> mdata offset is not 0x20
>>>>>>>> Both FWU metadata copies are valid but do not match. Restoring the secondary
>>>>>>>> partition from the primary
>>>>>>>
>>>>>>> The reason why this logic is in place is to handle the scenario
>>>>>>> whereby, during the metadata update, the primary copy gets updated
>>>>>>> correctly, but due to some reason (like a power cut), the secondary
>>>>>>> copy does not -- this is because the primary copy gets updated first.
>>>>>>> In such a scenario, the secondary copy of metadata can be then
>>>>>>> restored from the primary copy. In the scenario where the primary copy
>>>>>>> is corrupted, it will show up in the crc check, and will get restored
>>>>>>> from the secondary copy.
>>>>>>
>>>>>> But in this case - CRC of primary is correct but data inside is wrong because
>>>>>> offset is at 0x21 not at 0x20.
>>>>>
>>>>> How will this event play out ? Am I missing something ? When the
>>>>> metadata is getting updated, the offset will be written as 0x20, and
>>>>> not 0x21. And in case the metadata gets corrupted, the CRC check would
>>>>> fail and the primary partition would get restored from the secondary
>>>>> (assuming that the secondary copy is correct).
>>>>
>>>> It is not after update. It is initial state with incorrect mdata structure.
>>>
>>> Well, if the metadata partition is being written with incorrect data,
>>> not sure how we combat that (or even if we should). After all, the
>>> spec clearly states that the metadata cannot be protected against
>>> malicious writes. The logic that you pointed out earlier is to handle
>>> the scenario where the primary partition got updated, but the
>>> secondary did not.
>>
>> I don't disagree with you.
>>
>> Scenario here is that primary partition is updated with incorrect data content
>> (but still correct CRC). This is correctly detected that data is not right.
>> It means primary partition is wrong and shouldn't be used and it is not used.
>>
>> But because it has correct CRC syncup code logic is doing syncup.
>>
>> That's why I think we have two codes which have pretty much independent logic
>> what to do. If primary is wrong sync should be from secondary to primary but
>> that's not what code is doing.
> 
> The scenario of "primary is wrong" will only play out in case of
> corruption in the metadata. The scenario that you highlight, that of
> the metadata being wrong, but the CRC check succeeds will only happen
> when the metadata has been maliciously written --

yes. And I did it on purpose to check your code and all error condition you are 
checking to make sure that code traps them.

> if not, then the
> secondary partition should also be written with the wrong metadata,
> and the checks would then fail. In the case of the primary metadata
> getting corrupted, it does get restored from the secondary partition,
> assuming that the secondary partition is intact.

When primary is cleared/has incorrect CRC there is no problem and recovery 
happens but if CRC is right but data inside wrong code doesn't check it.

Thanks,
Michal
Sughosh Ganu Sept. 5, 2024, 9:17 a.m. UTC | #12
On Thu, 5 Sept 2024 at 14:07, Michal Simek <michal.simek@amd.com> wrote:
>
>
>
> On 9/5/24 10:12, Sughosh Ganu wrote:
> > On Thu, 5 Sept 2024 at 13:20, Michal Simek <michal.simek@amd.com> wrote:
> >>
> >>
> >>
> >> On 9/5/24 09:43, Sughosh Ganu wrote:
> >>> On Thu, 5 Sept 2024 at 13:09, Michal Simek <michal.simek@amd.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 9/5/24 09:38, Sughosh Ganu wrote:
> >>>>> On Thu, 5 Sept 2024 at 13:03, Michal Simek <michal.simek@amd.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 9/5/24 08:24, Sughosh Ganu wrote:
> >>>>>>> On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.simek@amd.com> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 8/30/24 13:40, Sughosh Ganu wrote:
> >>>>>>>>>
> >>>>>>>>> The following set of patches are miscellaneous fixes and some
> >>>>>>>>> hardening of the FWU update logic.
> >>>>>>>>>
> >>>>>>>>> Sughosh Ganu (6):
> >>>>>>>>>        fwu: v2: perform some checks before reading metadata
> >>>>>>>>>        fwu: v2: try reading both copies of metadata
> >>>>>>>>>        fwu: v1: do a version check for the metadata
> >>>>>>>>>        fwu: check all images for transitioning out of Trial State
> >>>>>>>>>        fwu: add dependency checks for selecting FWU metadata version
> >>>>>>>>>        fwu: do not allow capsule processing on exceeding Trial Counter
> >>>>>>>>>          threshold
> >>>>>>>>>
> >>>>>>>>>       include/fwu.h            | 11 ++++++
> >>>>>>>>>       lib/fwu_updates/Kconfig  |  1 +
> >>>>>>>>>       lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
> >>>>>>>>>       lib/fwu_updates/fwu_v1.c | 18 +++++++--
> >>>>>>>>>       lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
> >>>>>>>>>       5 files changed, 99 insertions(+), 42 deletions(-)
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I found one more thing.
> >>>>>>>>
> >>>>>>>> I did this change
> >>>>>>>>
> >>>>>>>> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
> >>>>>>>> index fbc2067bc12d..dab9530e499c 100644
> >>>>>>>> --- a/tools/mkfwumdata.c
> >>>>>>>> +++ b/tools/mkfwumdata.c
> >>>>>>>> @@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct
> >>>>>>>> fwu_mdata_object *mobj)
> >>>>>>>>              struct fwu_mdata *mdata = mobj->mdata;
> >>>>>>>>
> >>>>>>>>              mdata->metadata_size = mobj->size;
> >>>>>>>> -       mdata->desc_offset = sizeof(struct fwu_mdata);
> >>>>>>>> +       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);
> >>>>>>>>
> >>>>>>>>              for (i = 0; i < MAX_BANKS_V2; i++)
> >>>>>>>>                      mdata->bank_state[i] = i < mobj->banks ?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> to break desc_offset location. I generated mdata structure and write it to
> >>>>>>>> primary mdata partition.
> >>>>>>>>
> >>>>>>>> This has been spotted by (my debug) which is the reason why I did it.
> >>>>>>>>
> >>>>>>>>             if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
> >>>>>>>>                     printf("mdata offset is not 0x20\n");
> >>>>>>>>                     return false;
> >>>>>>>>             }
> >>>>>>>>
> >>>>>>>> But I got one more message below which
> >>>>>>>>
> >>>>>>>> mdata offset is not 0x20
> >>>>>>>> Both FWU metadata copies are valid but do not match. Restoring the secondary
> >>>>>>>> partition from the primary
> >>>>>>>
> >>>>>>> The reason why this logic is in place is to handle the scenario
> >>>>>>> whereby, during the metadata update, the primary copy gets updated
> >>>>>>> correctly, but due to some reason (like a power cut), the secondary
> >>>>>>> copy does not -- this is because the primary copy gets updated first.
> >>>>>>> In such a scenario, the secondary copy of metadata can be then
> >>>>>>> restored from the primary copy. In the scenario where the primary copy
> >>>>>>> is corrupted, it will show up in the crc check, and will get restored
> >>>>>>> from the secondary copy.
> >>>>>>
> >>>>>> But in this case - CRC of primary is correct but data inside is wrong because
> >>>>>> offset is at 0x21 not at 0x20.
> >>>>>
> >>>>> How will this event play out ? Am I missing something ? When the
> >>>>> metadata is getting updated, the offset will be written as 0x20, and
> >>>>> not 0x21. And in case the metadata gets corrupted, the CRC check would
> >>>>> fail and the primary partition would get restored from the secondary
> >>>>> (assuming that the secondary copy is correct).
> >>>>
> >>>> It is not after update. It is initial state with incorrect mdata structure.
> >>>
> >>> Well, if the metadata partition is being written with incorrect data,
> >>> not sure how we combat that (or even if we should). After all, the
> >>> spec clearly states that the metadata cannot be protected against
> >>> malicious writes. The logic that you pointed out earlier is to handle
> >>> the scenario where the primary partition got updated, but the
> >>> secondary did not.
> >>
> >> I don't disagree with you.
> >>
> >> Scenario here is that primary partition is updated with incorrect data content
> >> (but still correct CRC). This is correctly detected that data is not right.
> >> It means primary partition is wrong and shouldn't be used and it is not used.
> >>
> >> But because it has correct CRC syncup code logic is doing syncup.
> >>
> >> That's why I think we have two codes which have pretty much independent logic
> >> what to do. If primary is wrong sync should be from secondary to primary but
> >> that's not what code is doing.
> >
> > The scenario of "primary is wrong" will only play out in case of
> > corruption in the metadata. The scenario that you highlight, that of
> > the metadata being wrong, but the CRC check succeeds will only happen
> > when the metadata has been maliciously written --
>
> yes. And I did it on purpose to check your code and all error condition you are
> checking to make sure that code traps them.

The code should detect and fix any genuine error conditions that might
come up in the field. Even the check for the case where the two
metadata copies are valid but different is a scenario that can
actually happen when updating the metadata copies, and there is a
reason why we are assuming that the primary is the correct copy and
should be used to restore the secondary. If there are any valid
scenarios that are not being handled, they should be identified and
fixed.

>
> > if not, then the
> > secondary partition should also be written with the wrong metadata,
> > and the checks would then fail. In the case of the primary metadata
> > getting corrupted, it does get restored from the secondary partition,
> > assuming that the secondary partition is intact.
>
> When primary is cleared/has incorrect CRC there is no problem and recovery
> happens but if CRC is right but data inside wrong code doesn't check it.

This is where I think that this is not a genuine "error" case. So, if
someone has written a metadata copy with wrong data but right CRC, and
only written this bad copy to the primary partition, how do we trust
that the secondary copy is actually correct ? One can generate a
secondary metadata copy where the primary checks pass, but the image
information in the metadata is incorrect.

-sughosh

>
> Thanks,
> Michal
>
Michal Simek Sept. 6, 2024, 6:35 a.m. UTC | #13
On 9/5/24 11:17, Sughosh Ganu wrote:
> On Thu, 5 Sept 2024 at 14:07, Michal Simek <michal.simek@amd.com> wrote:
>>
>>
>>
>> On 9/5/24 10:12, Sughosh Ganu wrote:
>>> On Thu, 5 Sept 2024 at 13:20, Michal Simek <michal.simek@amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 9/5/24 09:43, Sughosh Ganu wrote:
>>>>> On Thu, 5 Sept 2024 at 13:09, Michal Simek <michal.simek@amd.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 9/5/24 09:38, Sughosh Ganu wrote:
>>>>>>> On Thu, 5 Sept 2024 at 13:03, Michal Simek <michal.simek@amd.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 9/5/24 08:24, Sughosh Ganu wrote:
>>>>>>>>> On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.simek@amd.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 8/30/24 13:40, Sughosh Ganu wrote:
>>>>>>>>>>>
>>>>>>>>>>> The following set of patches are miscellaneous fixes and some
>>>>>>>>>>> hardening of the FWU update logic.
>>>>>>>>>>>
>>>>>>>>>>> Sughosh Ganu (6):
>>>>>>>>>>>         fwu: v2: perform some checks before reading metadata
>>>>>>>>>>>         fwu: v2: try reading both copies of metadata
>>>>>>>>>>>         fwu: v1: do a version check for the metadata
>>>>>>>>>>>         fwu: check all images for transitioning out of Trial State
>>>>>>>>>>>         fwu: add dependency checks for selecting FWU metadata version
>>>>>>>>>>>         fwu: do not allow capsule processing on exceeding Trial Counter
>>>>>>>>>>>           threshold
>>>>>>>>>>>
>>>>>>>>>>>        include/fwu.h            | 11 ++++++
>>>>>>>>>>>        lib/fwu_updates/Kconfig  |  1 +
>>>>>>>>>>>        lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
>>>>>>>>>>>        lib/fwu_updates/fwu_v1.c | 18 +++++++--
>>>>>>>>>>>        lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
>>>>>>>>>>>        5 files changed, 99 insertions(+), 42 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I found one more thing.
>>>>>>>>>>
>>>>>>>>>> I did this change
>>>>>>>>>>
>>>>>>>>>> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
>>>>>>>>>> index fbc2067bc12d..dab9530e499c 100644
>>>>>>>>>> --- a/tools/mkfwumdata.c
>>>>>>>>>> +++ b/tools/mkfwumdata.c
>>>>>>>>>> @@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct
>>>>>>>>>> fwu_mdata_object *mobj)
>>>>>>>>>>               struct fwu_mdata *mdata = mobj->mdata;
>>>>>>>>>>
>>>>>>>>>>               mdata->metadata_size = mobj->size;
>>>>>>>>>> -       mdata->desc_offset = sizeof(struct fwu_mdata);
>>>>>>>>>> +       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);
>>>>>>>>>>
>>>>>>>>>>               for (i = 0; i < MAX_BANKS_V2; i++)
>>>>>>>>>>                       mdata->bank_state[i] = i < mobj->banks ?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> to break desc_offset location. I generated mdata structure and write it to
>>>>>>>>>> primary mdata partition.
>>>>>>>>>>
>>>>>>>>>> This has been spotted by (my debug) which is the reason why I did it.
>>>>>>>>>>
>>>>>>>>>>              if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
>>>>>>>>>>                      printf("mdata offset is not 0x20\n");
>>>>>>>>>>                      return false;
>>>>>>>>>>              }
>>>>>>>>>>
>>>>>>>>>> But I got one more message below which
>>>>>>>>>>
>>>>>>>>>> mdata offset is not 0x20
>>>>>>>>>> Both FWU metadata copies are valid but do not match. Restoring the secondary
>>>>>>>>>> partition from the primary
>>>>>>>>>
>>>>>>>>> The reason why this logic is in place is to handle the scenario
>>>>>>>>> whereby, during the metadata update, the primary copy gets updated
>>>>>>>>> correctly, but due to some reason (like a power cut), the secondary
>>>>>>>>> copy does not -- this is because the primary copy gets updated first.
>>>>>>>>> In such a scenario, the secondary copy of metadata can be then
>>>>>>>>> restored from the primary copy. In the scenario where the primary copy
>>>>>>>>> is corrupted, it will show up in the crc check, and will get restored
>>>>>>>>> from the secondary copy.
>>>>>>>>
>>>>>>>> But in this case - CRC of primary is correct but data inside is wrong because
>>>>>>>> offset is at 0x21 not at 0x20.
>>>>>>>
>>>>>>> How will this event play out ? Am I missing something ? When the
>>>>>>> metadata is getting updated, the offset will be written as 0x20, and
>>>>>>> not 0x21. And in case the metadata gets corrupted, the CRC check would
>>>>>>> fail and the primary partition would get restored from the secondary
>>>>>>> (assuming that the secondary copy is correct).
>>>>>>
>>>>>> It is not after update. It is initial state with incorrect mdata structure.
>>>>>
>>>>> Well, if the metadata partition is being written with incorrect data,
>>>>> not sure how we combat that (or even if we should). After all, the
>>>>> spec clearly states that the metadata cannot be protected against
>>>>> malicious writes. The logic that you pointed out earlier is to handle
>>>>> the scenario where the primary partition got updated, but the
>>>>> secondary did not.
>>>>
>>>> I don't disagree with you.
>>>>
>>>> Scenario here is that primary partition is updated with incorrect data content
>>>> (but still correct CRC). This is correctly detected that data is not right.
>>>> It means primary partition is wrong and shouldn't be used and it is not used.
>>>>
>>>> But because it has correct CRC syncup code logic is doing syncup.
>>>>
>>>> That's why I think we have two codes which have pretty much independent logic
>>>> what to do. If primary is wrong sync should be from secondary to primary but
>>>> that's not what code is doing.
>>>
>>> The scenario of "primary is wrong" will only play out in case of
>>> corruption in the metadata. The scenario that you highlight, that of
>>> the metadata being wrong, but the CRC check succeeds will only happen
>>> when the metadata has been maliciously written --
>>
>> yes. And I did it on purpose to check your code and all error condition you are
>> checking to make sure that code traps them.
> 
> The code should detect and fix any genuine error conditions that might
> come up in the field. Even the check for the case where the two
> metadata copies are valid but different is a scenario that can
> actually happen when updating the metadata copies, and there is a
> reason why we are assuming that the primary is the correct copy and
> should be used to restore the secondary. If there are any valid
> scenarios that are not being handled, they should be identified and
> fixed.
> 
>>
>>> if not, then the
>>> secondary partition should also be written with the wrong metadata,
>>> and the checks would then fail. In the case of the primary metadata
>>> getting corrupted, it does get restored from the secondary partition,
>>> assuming that the secondary partition is intact.
>>
>> When primary is cleared/has incorrect CRC there is no problem and recovery
>> happens but if CRC is right but data inside wrong code doesn't check it.
> 
> This is where I think that this is not a genuine "error" case. So, if
> someone has written a metadata copy with wrong data but right CRC, and
> only written this bad copy to the primary partition, how do we trust
> that the secondary copy is actually correct ? One can generate a
> secondary metadata copy where the primary checks pass, but the image
> information in the metadata is incorrect.

If U-Boot knows that data is not correct, based on it's check (in this case one 
incorrect field), you should never use it. If second also won't pass this check 
U-Boot shouldn't use it too.

If secondary copy pass all your tests you trust that it is fine. But if not, 
then you don't trust it.

And that's exactly what it is happening here. Code knows that primary data is 
wrong and properly detect it but another code blindly not checking it and just 
copy it incorrect data over correct one.

Thanks,
Michal
Michal Simek Sept. 6, 2024, 6:46 a.m. UTC | #14
On 9/6/24 08:35, Michal Simek wrote:
> 
> 
> On 9/5/24 11:17, Sughosh Ganu wrote:
>> On Thu, 5 Sept 2024 at 14:07, Michal Simek <michal.simek@amd.com> wrote:
>>>
>>>
>>>
>>> On 9/5/24 10:12, Sughosh Ganu wrote:
>>>> On Thu, 5 Sept 2024 at 13:20, Michal Simek <michal.simek@amd.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 9/5/24 09:43, Sughosh Ganu wrote:
>>>>>> On Thu, 5 Sept 2024 at 13:09, Michal Simek <michal.simek@amd.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 9/5/24 09:38, Sughosh Ganu wrote:
>>>>>>>> On Thu, 5 Sept 2024 at 13:03, Michal Simek <michal.simek@amd.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 9/5/24 08:24, Sughosh Ganu wrote:
>>>>>>>>>> On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.simek@amd.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 8/30/24 13:40, Sughosh Ganu wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> The following set of patches are miscellaneous fixes and some
>>>>>>>>>>>> hardening of the FWU update logic.
>>>>>>>>>>>>
>>>>>>>>>>>> Sughosh Ganu (6):
>>>>>>>>>>>>         fwu: v2: perform some checks before reading metadata
>>>>>>>>>>>>         fwu: v2: try reading both copies of metadata
>>>>>>>>>>>>         fwu: v1: do a version check for the metadata
>>>>>>>>>>>>         fwu: check all images for transitioning out of Trial State
>>>>>>>>>>>>         fwu: add dependency checks for selecting FWU metadata version
>>>>>>>>>>>>         fwu: do not allow capsule processing on exceeding Trial Counter
>>>>>>>>>>>>           threshold
>>>>>>>>>>>>
>>>>>>>>>>>>        include/fwu.h            | 11 ++++++
>>>>>>>>>>>>        lib/fwu_updates/Kconfig  |  1 +
>>>>>>>>>>>>        lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
>>>>>>>>>>>>        lib/fwu_updates/fwu_v1.c | 18 +++++++--
>>>>>>>>>>>>        lib/fwu_updates/fwu_v2.c | 80 
>>>>>>>>>>>> ++++++++++++++++++++++------------------
>>>>>>>>>>>>        5 files changed, 99 insertions(+), 42 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I found one more thing.
>>>>>>>>>>>
>>>>>>>>>>> I did this change
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
>>>>>>>>>>> index fbc2067bc12d..dab9530e499c 100644
>>>>>>>>>>> --- a/tools/mkfwumdata.c
>>>>>>>>>>> +++ b/tools/mkfwumdata.c
>>>>>>>>>>> @@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct
>>>>>>>>>>> fwu_mdata_object *mobj)
>>>>>>>>>>>               struct fwu_mdata *mdata = mobj->mdata;
>>>>>>>>>>>
>>>>>>>>>>>               mdata->metadata_size = mobj->size;
>>>>>>>>>>> -       mdata->desc_offset = sizeof(struct fwu_mdata);
>>>>>>>>>>> +       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);
>>>>>>>>>>>
>>>>>>>>>>>               for (i = 0; i < MAX_BANKS_V2; i++)
>>>>>>>>>>>                       mdata->bank_state[i] = i < mobj->banks ?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> to break desc_offset location. I generated mdata structure and write 
>>>>>>>>>>> it to
>>>>>>>>>>> primary mdata partition.
>>>>>>>>>>>
>>>>>>>>>>> This has been spotted by (my debug) which is the reason why I did it.
>>>>>>>>>>>
>>>>>>>>>>>              if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
>>>>>>>>>>>                      printf("mdata offset is not 0x20\n");
>>>>>>>>>>>                      return false;
>>>>>>>>>>>              }
>>>>>>>>>>>
>>>>>>>>>>> But I got one more message below which
>>>>>>>>>>>
>>>>>>>>>>> mdata offset is not 0x20
>>>>>>>>>>> Both FWU metadata copies are valid but do not match. Restoring the 
>>>>>>>>>>> secondary
>>>>>>>>>>> partition from the primary
>>>>>>>>>>
>>>>>>>>>> The reason why this logic is in place is to handle the scenario
>>>>>>>>>> whereby, during the metadata update, the primary copy gets updated
>>>>>>>>>> correctly, but due to some reason (like a power cut), the secondary
>>>>>>>>>> copy does not -- this is because the primary copy gets updated first.
>>>>>>>>>> In such a scenario, the secondary copy of metadata can be then
>>>>>>>>>> restored from the primary copy. In the scenario where the primary copy
>>>>>>>>>> is corrupted, it will show up in the crc check, and will get restored
>>>>>>>>>> from the secondary copy.
>>>>>>>>>
>>>>>>>>> But in this case - CRC of primary is correct but data inside is wrong 
>>>>>>>>> because
>>>>>>>>> offset is at 0x21 not at 0x20.
>>>>>>>>
>>>>>>>> How will this event play out ? Am I missing something ? When the
>>>>>>>> metadata is getting updated, the offset will be written as 0x20, and
>>>>>>>> not 0x21. And in case the metadata gets corrupted, the CRC check would
>>>>>>>> fail and the primary partition would get restored from the secondary
>>>>>>>> (assuming that the secondary copy is correct).
>>>>>>>
>>>>>>> It is not after update. It is initial state with incorrect mdata structure.
>>>>>>
>>>>>> Well, if the metadata partition is being written with incorrect data,
>>>>>> not sure how we combat that (or even if we should). After all, the
>>>>>> spec clearly states that the metadata cannot be protected against
>>>>>> malicious writes. The logic that you pointed out earlier is to handle
>>>>>> the scenario where the primary partition got updated, but the
>>>>>> secondary did not.
>>>>>
>>>>> I don't disagree with you.
>>>>>
>>>>> Scenario here is that primary partition is updated with incorrect data content
>>>>> (but still correct CRC). This is correctly detected that data is not right.
>>>>> It means primary partition is wrong and shouldn't be used and it is not used.
>>>>>
>>>>> But because it has correct CRC syncup code logic is doing syncup.
>>>>>
>>>>> That's why I think we have two codes which have pretty much independent logic
>>>>> what to do. If primary is wrong sync should be from secondary to primary but
>>>>> that's not what code is doing.
>>>>
>>>> The scenario of "primary is wrong" will only play out in case of
>>>> corruption in the metadata. The scenario that you highlight, that of
>>>> the metadata being wrong, but the CRC check succeeds will only happen
>>>> when the metadata has been maliciously written --
>>>
>>> yes. And I did it on purpose to check your code and all error condition you are
>>> checking to make sure that code traps them.
>>
>> The code should detect and fix any genuine error conditions that might
>> come up in the field. Even the check for the case where the two
>> metadata copies are valid but different is a scenario that can
>> actually happen when updating the metadata copies, and there is a
>> reason why we are assuming that the primary is the correct copy and
>> should be used to restore the secondary. If there are any valid
>> scenarios that are not being handled, they should be identified and
>> fixed.
>>
>>>
>>>> if not, then the
>>>> secondary partition should also be written with the wrong metadata,
>>>> and the checks would then fail. In the case of the primary metadata
>>>> getting corrupted, it does get restored from the secondary partition,
>>>> assuming that the secondary partition is intact.
>>>
>>> When primary is cleared/has incorrect CRC there is no problem and recovery
>>> happens but if CRC is right but data inside wrong code doesn't check it.
>>
>> This is where I think that this is not a genuine "error" case. So, if
>> someone has written a metadata copy with wrong data but right CRC, and
>> only written this bad copy to the primary partition, how do we trust
>> that the secondary copy is actually correct ? One can generate a
>> secondary metadata copy where the primary checks pass, but the image
>> information in the metadata is incorrect.
> 
> If U-Boot knows that data is not correct, based on it's check (in this case one 
> incorrect field), you should never use it. If second also won't pass this check 
> U-Boot shouldn't use it too.
> 
> If secondary copy pass all your tests you trust that it is fine. But if not, 
> then you don't trust it.
> 
> And that's exactly what it is happening here. Code knows that primary data is 
> wrong and properly detect it but another code blindly not checking it and just 
> copy it incorrect data over correct one.

here is where the problem is. Code reads mdata structure and checks only CRC 
over it and that's it. There should be additional checking perform before you 
can say that data is correct.

diff --git a/lib/fwu_updates/fwu.c b/lib/fwu_updates/fwu.c
index 983f232bd179..9ec750b51da0 100644
--- a/lib/fwu_updates/fwu.c
+++ b/lib/fwu_updates/fwu.c
@@ -313,10 +313,13 @@ int fwu_get_mdata(struct fwu_mdata *mdata)
                 err = fwu_read_mdata(g_dev, parts_mdata[i], !i, mdata_size);
                 if (!err) {
                         err = mdata_crc_check(parts_mdata[i]);
-                       if (!err)
-                               parts_ok[i] = true;
-                       else
-                               log_debug("mdata : %s crc32 failed\n", i ? 
"secondary" : "primary");
+                       if (!err) {
+                               if (check that format is corrrect)
+                                       parts_ok[i] = true;
+
+                               continue;
+                       }
+                       log_debug("mdata : %s crc32 failed\n", i ? "secondary" : 
"primary");
                 }
         }

Because issue is that CRC check only cause that part_ok[0] = true but that's wrong.

Thanks,
Michal
Sughosh Ganu Sept. 6, 2024, 9:47 a.m. UTC | #15
On Fri, 6 Sept 2024 at 12:05, Michal Simek <michal.simek@amd.com> wrote:
>
>
>
> On 9/5/24 11:17, Sughosh Ganu wrote:
> > On Thu, 5 Sept 2024 at 14:07, Michal Simek <michal.simek@amd.com> wrote:
> >>
> >>
> >>
> >> On 9/5/24 10:12, Sughosh Ganu wrote:
> >>> On Thu, 5 Sept 2024 at 13:20, Michal Simek <michal.simek@amd.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 9/5/24 09:43, Sughosh Ganu wrote:
> >>>>> On Thu, 5 Sept 2024 at 13:09, Michal Simek <michal.simek@amd.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 9/5/24 09:38, Sughosh Ganu wrote:
> >>>>>>> On Thu, 5 Sept 2024 at 13:03, Michal Simek <michal.simek@amd.com> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 9/5/24 08:24, Sughosh Ganu wrote:
> >>>>>>>>> On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.simek@amd.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 8/30/24 13:40, Sughosh Ganu wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> The following set of patches are miscellaneous fixes and some
> >>>>>>>>>>> hardening of the FWU update logic.
> >>>>>>>>>>>
> >>>>>>>>>>> Sughosh Ganu (6):
> >>>>>>>>>>>         fwu: v2: perform some checks before reading metadata
> >>>>>>>>>>>         fwu: v2: try reading both copies of metadata
> >>>>>>>>>>>         fwu: v1: do a version check for the metadata
> >>>>>>>>>>>         fwu: check all images for transitioning out of Trial State
> >>>>>>>>>>>         fwu: add dependency checks for selecting FWU metadata version
> >>>>>>>>>>>         fwu: do not allow capsule processing on exceeding Trial Counter
> >>>>>>>>>>>           threshold
> >>>>>>>>>>>
> >>>>>>>>>>>        include/fwu.h            | 11 ++++++
> >>>>>>>>>>>        lib/fwu_updates/Kconfig  |  1 +
> >>>>>>>>>>>        lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
> >>>>>>>>>>>        lib/fwu_updates/fwu_v1.c | 18 +++++++--
> >>>>>>>>>>>        lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
> >>>>>>>>>>>        5 files changed, 99 insertions(+), 42 deletions(-)
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> I found one more thing.
> >>>>>>>>>>
> >>>>>>>>>> I did this change
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
> >>>>>>>>>> index fbc2067bc12d..dab9530e499c 100644
> >>>>>>>>>> --- a/tools/mkfwumdata.c
> >>>>>>>>>> +++ b/tools/mkfwumdata.c
> >>>>>>>>>> @@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct
> >>>>>>>>>> fwu_mdata_object *mobj)
> >>>>>>>>>>               struct fwu_mdata *mdata = mobj->mdata;
> >>>>>>>>>>
> >>>>>>>>>>               mdata->metadata_size = mobj->size;
> >>>>>>>>>> -       mdata->desc_offset = sizeof(struct fwu_mdata);
> >>>>>>>>>> +       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);
> >>>>>>>>>>
> >>>>>>>>>>               for (i = 0; i < MAX_BANKS_V2; i++)
> >>>>>>>>>>                       mdata->bank_state[i] = i < mobj->banks ?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> to break desc_offset location. I generated mdata structure and write it to
> >>>>>>>>>> primary mdata partition.
> >>>>>>>>>>
> >>>>>>>>>> This has been spotted by (my debug) which is the reason why I did it.
> >>>>>>>>>>
> >>>>>>>>>>              if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
> >>>>>>>>>>                      printf("mdata offset is not 0x20\n");
> >>>>>>>>>>                      return false;
> >>>>>>>>>>              }
> >>>>>>>>>>
> >>>>>>>>>> But I got one more message below which
> >>>>>>>>>>
> >>>>>>>>>> mdata offset is not 0x20
> >>>>>>>>>> Both FWU metadata copies are valid but do not match. Restoring the secondary
> >>>>>>>>>> partition from the primary
> >>>>>>>>>
> >>>>>>>>> The reason why this logic is in place is to handle the scenario
> >>>>>>>>> whereby, during the metadata update, the primary copy gets updated
> >>>>>>>>> correctly, but due to some reason (like a power cut), the secondary
> >>>>>>>>> copy does not -- this is because the primary copy gets updated first.
> >>>>>>>>> In such a scenario, the secondary copy of metadata can be then
> >>>>>>>>> restored from the primary copy. In the scenario where the primary copy
> >>>>>>>>> is corrupted, it will show up in the crc check, and will get restored
> >>>>>>>>> from the secondary copy.
> >>>>>>>>
> >>>>>>>> But in this case - CRC of primary is correct but data inside is wrong because
> >>>>>>>> offset is at 0x21 not at 0x20.
> >>>>>>>
> >>>>>>> How will this event play out ? Am I missing something ? When the
> >>>>>>> metadata is getting updated, the offset will be written as 0x20, and
> >>>>>>> not 0x21. And in case the metadata gets corrupted, the CRC check would
> >>>>>>> fail and the primary partition would get restored from the secondary
> >>>>>>> (assuming that the secondary copy is correct).
> >>>>>>
> >>>>>> It is not after update. It is initial state with incorrect mdata structure.
> >>>>>
> >>>>> Well, if the metadata partition is being written with incorrect data,
> >>>>> not sure how we combat that (or even if we should). After all, the
> >>>>> spec clearly states that the metadata cannot be protected against
> >>>>> malicious writes. The logic that you pointed out earlier is to handle
> >>>>> the scenario where the primary partition got updated, but the
> >>>>> secondary did not.
> >>>>
> >>>> I don't disagree with you.
> >>>>
> >>>> Scenario here is that primary partition is updated with incorrect data content
> >>>> (but still correct CRC). This is correctly detected that data is not right.
> >>>> It means primary partition is wrong and shouldn't be used and it is not used.
> >>>>
> >>>> But because it has correct CRC syncup code logic is doing syncup.
> >>>>
> >>>> That's why I think we have two codes which have pretty much independent logic
> >>>> what to do. If primary is wrong sync should be from secondary to primary but
> >>>> that's not what code is doing.
> >>>
> >>> The scenario of "primary is wrong" will only play out in case of
> >>> corruption in the metadata. The scenario that you highlight, that of
> >>> the metadata being wrong, but the CRC check succeeds will only happen
> >>> when the metadata has been maliciously written --
> >>
> >> yes. And I did it on purpose to check your code and all error condition you are
> >> checking to make sure that code traps them.
> >
> > The code should detect and fix any genuine error conditions that might
> > come up in the field. Even the check for the case where the two
> > metadata copies are valid but different is a scenario that can
> > actually happen when updating the metadata copies, and there is a
> > reason why we are assuming that the primary is the correct copy and
> > should be used to restore the secondary. If there are any valid
> > scenarios that are not being handled, they should be identified and
> > fixed.
> >
> >>
> >>> if not, then the
> >>> secondary partition should also be written with the wrong metadata,
> >>> and the checks would then fail. In the case of the primary metadata
> >>> getting corrupted, it does get restored from the secondary partition,
> >>> assuming that the secondary partition is intact.
> >>
> >> When primary is cleared/has incorrect CRC there is no problem and recovery
> >> happens but if CRC is right but data inside wrong code doesn't check it.
> >
> > This is where I think that this is not a genuine "error" case. So, if
> > someone has written a metadata copy with wrong data but right CRC, and
> > only written this bad copy to the primary partition, how do we trust
> > that the secondary copy is actually correct ? One can generate a
> > secondary metadata copy where the primary checks pass, but the image
> > information in the metadata is incorrect.
>
> If U-Boot knows that data is not correct, based on it's check (in this case one
> incorrect field), you should never use it. If second also won't pass this check
> U-Boot shouldn't use it too.

The fact that we are trying to counter incorrect provisioning of
metadata, which can well be a malicious act of writing incorrect
data/correct CRC copy only to one partition seems like an anomaly
which should not be handled. I would still be okay if this were a
fool-proof way of fixing the board, but it isn't. Like I mentioned
earlier, we can still have a secondary copy of the metadata with wrong
data/correct CRC, and which does not get detected in the initial
checks, as the top level structure fields are correct, but the image
information is not.

>
> If secondary copy pass all your tests you trust that it is fine. But if not,
> then you don't trust it.

But we cannot detect that the secondary copy is passing all the tests,
as the issue could well be with the image description, and that would
be detected only when an update is attempted, or the user identifies
it (possibly) using the fwu_mdata_read command. And before that, the
primary copy would be restored with the incorrect secondary copy, as
part of the metadata read function.

What I think can be done is, as a middle ground, we can instead have
this as a platform policy. So we add a function pointer for this, and
when such a scenario is detected, the platform can have a callback
which then restores the primary copy from the secondary. What are your
thoughts on this ?

-sughosh

>
> And that's exactly what it is happening here. Code knows that primary data is
> wrong and properly detect it but another code blindly not checking it and just
> copy it incorrect data over correct one.
>
> Thanks,
> Michal
Michal Simek Sept. 6, 2024, 3:03 p.m. UTC | #16
On 9/6/24 11:47, Sughosh Ganu wrote:
> On Fri, 6 Sept 2024 at 12:05, Michal Simek <michal.simek@amd.com> wrote:
>>
>>
>>
>> On 9/5/24 11:17, Sughosh Ganu wrote:
>>> On Thu, 5 Sept 2024 at 14:07, Michal Simek <michal.simek@amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 9/5/24 10:12, Sughosh Ganu wrote:
>>>>> On Thu, 5 Sept 2024 at 13:20, Michal Simek <michal.simek@amd.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 9/5/24 09:43, Sughosh Ganu wrote:
>>>>>>> On Thu, 5 Sept 2024 at 13:09, Michal Simek <michal.simek@amd.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 9/5/24 09:38, Sughosh Ganu wrote:
>>>>>>>>> On Thu, 5 Sept 2024 at 13:03, Michal Simek <michal.simek@amd.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 9/5/24 08:24, Sughosh Ganu wrote:
>>>>>>>>>>> On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.simek@amd.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 8/30/24 13:40, Sughosh Ganu wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The following set of patches are miscellaneous fixes and some
>>>>>>>>>>>>> hardening of the FWU update logic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sughosh Ganu (6):
>>>>>>>>>>>>>          fwu: v2: perform some checks before reading metadata
>>>>>>>>>>>>>          fwu: v2: try reading both copies of metadata
>>>>>>>>>>>>>          fwu: v1: do a version check for the metadata
>>>>>>>>>>>>>          fwu: check all images for transitioning out of Trial State
>>>>>>>>>>>>>          fwu: add dependency checks for selecting FWU metadata version
>>>>>>>>>>>>>          fwu: do not allow capsule processing on exceeding Trial Counter
>>>>>>>>>>>>>            threshold
>>>>>>>>>>>>>
>>>>>>>>>>>>>         include/fwu.h            | 11 ++++++
>>>>>>>>>>>>>         lib/fwu_updates/Kconfig  |  1 +
>>>>>>>>>>>>>         lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
>>>>>>>>>>>>>         lib/fwu_updates/fwu_v1.c | 18 +++++++--
>>>>>>>>>>>>>         lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
>>>>>>>>>>>>>         5 files changed, 99 insertions(+), 42 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I found one more thing.
>>>>>>>>>>>>
>>>>>>>>>>>> I did this change
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
>>>>>>>>>>>> index fbc2067bc12d..dab9530e499c 100644
>>>>>>>>>>>> --- a/tools/mkfwumdata.c
>>>>>>>>>>>> +++ b/tools/mkfwumdata.c
>>>>>>>>>>>> @@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct
>>>>>>>>>>>> fwu_mdata_object *mobj)
>>>>>>>>>>>>                struct fwu_mdata *mdata = mobj->mdata;
>>>>>>>>>>>>
>>>>>>>>>>>>                mdata->metadata_size = mobj->size;
>>>>>>>>>>>> -       mdata->desc_offset = sizeof(struct fwu_mdata);
>>>>>>>>>>>> +       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);
>>>>>>>>>>>>
>>>>>>>>>>>>                for (i = 0; i < MAX_BANKS_V2; i++)
>>>>>>>>>>>>                        mdata->bank_state[i] = i < mobj->banks ?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> to break desc_offset location. I generated mdata structure and write it to
>>>>>>>>>>>> primary mdata partition.
>>>>>>>>>>>>
>>>>>>>>>>>> This has been spotted by (my debug) which is the reason why I did it.
>>>>>>>>>>>>
>>>>>>>>>>>>               if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
>>>>>>>>>>>>                       printf("mdata offset is not 0x20\n");
>>>>>>>>>>>>                       return false;
>>>>>>>>>>>>               }
>>>>>>>>>>>>
>>>>>>>>>>>> But I got one more message below which
>>>>>>>>>>>>
>>>>>>>>>>>> mdata offset is not 0x20
>>>>>>>>>>>> Both FWU metadata copies are valid but do not match. Restoring the secondary
>>>>>>>>>>>> partition from the primary
>>>>>>>>>>>
>>>>>>>>>>> The reason why this logic is in place is to handle the scenario
>>>>>>>>>>> whereby, during the metadata update, the primary copy gets updated
>>>>>>>>>>> correctly, but due to some reason (like a power cut), the secondary
>>>>>>>>>>> copy does not -- this is because the primary copy gets updated first.
>>>>>>>>>>> In such a scenario, the secondary copy of metadata can be then
>>>>>>>>>>> restored from the primary copy. In the scenario where the primary copy
>>>>>>>>>>> is corrupted, it will show up in the crc check, and will get restored
>>>>>>>>>>> from the secondary copy.
>>>>>>>>>>
>>>>>>>>>> But in this case - CRC of primary is correct but data inside is wrong because
>>>>>>>>>> offset is at 0x21 not at 0x20.
>>>>>>>>>
>>>>>>>>> How will this event play out ? Am I missing something ? When the
>>>>>>>>> metadata is getting updated, the offset will be written as 0x20, and
>>>>>>>>> not 0x21. And in case the metadata gets corrupted, the CRC check would
>>>>>>>>> fail and the primary partition would get restored from the secondary
>>>>>>>>> (assuming that the secondary copy is correct).
>>>>>>>>
>>>>>>>> It is not after update. It is initial state with incorrect mdata structure.
>>>>>>>
>>>>>>> Well, if the metadata partition is being written with incorrect data,
>>>>>>> not sure how we combat that (or even if we should). After all, the
>>>>>>> spec clearly states that the metadata cannot be protected against
>>>>>>> malicious writes. The logic that you pointed out earlier is to handle
>>>>>>> the scenario where the primary partition got updated, but the
>>>>>>> secondary did not.
>>>>>>
>>>>>> I don't disagree with you.
>>>>>>
>>>>>> Scenario here is that primary partition is updated with incorrect data content
>>>>>> (but still correct CRC). This is correctly detected that data is not right.
>>>>>> It means primary partition is wrong and shouldn't be used and it is not used.
>>>>>>
>>>>>> But because it has correct CRC syncup code logic is doing syncup.
>>>>>>
>>>>>> That's why I think we have two codes which have pretty much independent logic
>>>>>> what to do. If primary is wrong sync should be from secondary to primary but
>>>>>> that's not what code is doing.
>>>>>
>>>>> The scenario of "primary is wrong" will only play out in case of
>>>>> corruption in the metadata. The scenario that you highlight, that of
>>>>> the metadata being wrong, but the CRC check succeeds will only happen
>>>>> when the metadata has been maliciously written --
>>>>
>>>> yes. And I did it on purpose to check your code and all error condition you are
>>>> checking to make sure that code traps them.
>>>
>>> The code should detect and fix any genuine error conditions that might
>>> come up in the field. Even the check for the case where the two
>>> metadata copies are valid but different is a scenario that can
>>> actually happen when updating the metadata copies, and there is a
>>> reason why we are assuming that the primary is the correct copy and
>>> should be used to restore the secondary. If there are any valid
>>> scenarios that are not being handled, they should be identified and
>>> fixed.
>>>
>>>>
>>>>> if not, then the
>>>>> secondary partition should also be written with the wrong metadata,
>>>>> and the checks would then fail. In the case of the primary metadata
>>>>> getting corrupted, it does get restored from the secondary partition,
>>>>> assuming that the secondary partition is intact.
>>>>
>>>> When primary is cleared/has incorrect CRC there is no problem and recovery
>>>> happens but if CRC is right but data inside wrong code doesn't check it.
>>>
>>> This is where I think that this is not a genuine "error" case. So, if
>>> someone has written a metadata copy with wrong data but right CRC, and
>>> only written this bad copy to the primary partition, how do we trust
>>> that the secondary copy is actually correct ? One can generate a
>>> secondary metadata copy where the primary checks pass, but the image
>>> information in the metadata is incorrect.
>>
>> If U-Boot knows that data is not correct, based on it's check (in this case one
>> incorrect field), you should never use it. If second also won't pass this check
>> U-Boot shouldn't use it too.
> 
> The fact that we are trying to counter incorrect provisioning of
> metadata, which can well be a malicious act of writing incorrect
> data/correct CRC copy only to one partition seems like an anomaly
> which should not be handled. I would still be okay if this were a
> fool-proof way of fixing the board, but it isn't. Like I mentioned
> earlier, we can still have a secondary copy of the metadata with wrong
> data/correct CRC, and which does not get detected in the initial
> checks, as the top level structure fields are correct, but the image
> information is not.

Partially agree. You are checking in fwu_mdata_sanity_checks and even data is 
matching u-boot configuration. You are checking now num_banks, num_images. guid 
should be also known to u-boot and I can't see the reason why this functions 
can't check it too.

And more and more I am looking at the code more things that current 
fwu_mdata_sanity_checks should be called before you mark parts_ok[] = true.



>>
>> If secondary copy pass all your tests you trust that it is fine. But if not,
>> then you don't trust it.
> 
> But we cannot detect that the secondary copy is passing all the tests,
> as the issue could well be with the image description, and that would
> be detected only when an update is attempted, or the user identifies
> it (possibly) using the fwu_mdata_read command. And before that, the
> primary copy would be restored with the incorrect secondary copy, as
> part of the metadata read function.
> 
> What I think can be done is, as a middle ground, we can instead have
> this as a platform policy. So we add a function pointer for this, and
> when such a scenario is detected, the platform can have a callback
> which then restores the primary copy from the secondary. What are your
> thoughts on this ?

I have to think about this more (not on Friday afternoon).

Cheers,
Michal