Message ID | 20210119183428.556706-1-richard.henderson@linaro.org |
---|---|
Headers | show |
Series | tcg: Dynamically allocate temporaries | expand |
On Tue, 19 Jan 2021, Richard Henderson wrote: > My recent change for caching tcg constants has, in a number of cases, > overflowed the statically allocated array of temporaries. Change to > dynamic allocation. This seems to work for me so Tested-by: BALATON Zoltan <balaton@eik.bme.hu> but have you done any performance tests to check that this actually improves emulation speed? To mee it seems slower. Booting AmigaOS on sam460ex with c0dd6654f207 (just before your TCG series) takes: real 0m33.829s user 0m34.432s sys 0m0.296s but on HEAD with this series: real 0m44.381s user 0m46.058s sys 0m0.532s This is noticable decrease in speed also without measuring it. With just increasing the TCG_MAX_TEMPS to 2048 on 7c79721606be without this series I get: real 0m42.681s user 0m44.208s sys 0m0.435s So the performance regression is somewhere in the original series not in this fix up series. > I'll note that nothing in check-acceptance triggers this overflow. > Anyone care to add some more test cases there? The proposed test for the upcoming pegasos2 machine may also catch this (when that will be merged, its dependencies are still under review) or the sam460ex test that currently only checks the firmware could be enhanced to try to boot AROS if somebody wants to do that. The drawback is that it needs an external iso whereas the current test doesn't need any additional images but it did not catch problems with IRQ and neither this problem with TCG temps. This problem was also found with riscv and mips I think but don't know if those would be easier to test. Regards, BALATON Zoltan
On 1/20/21 12:06 AM, BALATON Zoltan wrote: > On Tue, 19 Jan 2021, Richard Henderson wrote: >> My recent change for caching tcg constants has, in a number of cases, >> overflowed the statically allocated array of temporaries. Change to >> dynamic allocation. > > This seems to work for me so > > Tested-by: BALATON Zoltan <balaton@eik.bme.hu> > > but have you done any performance tests to check that this actually > improves emulation speed? To mee it seems slower. Booting AmigaOS on > sam460ex with c0dd6654f207 (just before your TCG series) takes: > > real 0m33.829s > user 0m34.432s > sys 0m0.296s > > but on HEAD with this series: > > real 0m44.381s > user 0m46.058s > sys 0m0.532s > > This is noticable decrease in speed also without measuring it. With just > increasing the TCG_MAX_TEMPS to 2048 on 7c79721606be without this series > I get: > > real 0m42.681s > user 0m44.208s > sys 0m0.435s > > So the performance regression is somewhere in the original series not in > this fix up series. Cc'ing Lukas for the performance part, as he is investigating how to catch such regressions. >> I'll note that nothing in check-acceptance triggers this overflow. >> Anyone care to add some more test cases there? > > The proposed test for the upcoming pegasos2 machine may also catch this > (when that will be merged, its dependencies are still under review) What are your running on pegasos2? > or > the sam460ex test that currently only checks the firmware could be > enhanced to try to boot AROS if somebody wants to do that. The drawback > is that it needs an external iso whereas the current test doesn't need > any additional images but it did not catch problems with IRQ and neither > this problem with TCG temps. So this other option is not very useful, right?
On Wed, 20 Jan 2021, Philippe Mathieu-Daudé wrote: > On 1/20/21 12:06 AM, BALATON Zoltan wrote: >> On Tue, 19 Jan 2021, Richard Henderson wrote: >>> My recent change for caching tcg constants has, in a number of cases, >>> overflowed the statically allocated array of temporaries. Change to >>> dynamic allocation. >> >> This seems to work for me so >> >> Tested-by: BALATON Zoltan <balaton@eik.bme.hu> >> >> but have you done any performance tests to check that this actually >> improves emulation speed? To mee it seems slower. Booting AmigaOS on >> sam460ex with c0dd6654f207 (just before your TCG series) takes: >> >> real 0m33.829s >> user 0m34.432s >> sys 0m0.296s >> >> but on HEAD with this series: >> >> real 0m44.381s >> user 0m46.058s >> sys 0m0.532s >> >> This is noticable decrease in speed also without measuring it. With just >> increasing the TCG_MAX_TEMPS to 2048 on 7c79721606be without this series >> I get: >> >> real 0m42.681s >> user 0m44.208s >> sys 0m0.435s >> >> So the performance regression is somewhere in the original series not in >> this fix up series. > > Cc'ing Lukas for the performance part, as he is investigating > how to catch such regressions. I think there was a GSoC last year that resulted in some scripts to do performance testing and even bisecting regressions. I've seen a few reports posted about those but maybe the project should also use the results and run these on some dedicated test machines regularly to be useful. The GSoC has ended, so the student has left and I think the mentor was Aleksandar M. who may also be doing something else now so these benchmarking scripts seem to be abandoned. But maybe could be picked up as a starting point or inspiration for any similar activity to build on those results. >>> I'll note that nothing in check-acceptance triggers this overflow. >>> Anyone care to add some more test cases there? >> >> The proposed test for the upcoming pegasos2 machine may also catch this >> (when that will be merged, its dependencies are still under review) > > What are your running on pegasos2? I've sent you before what test I think we could do for pegasos2, see: https://lists.nongnu.org/archive/html/qemu-ppc/2021-01/msg00112.html but I could not write the script for that and have no way to test it so some help would be needed with that. By the way, before that there are also the vt82c686 patches still waiting for review. I hope you haven't forgot and will eventually come back to them. >> or >> the sam460ex test that currently only checks the firmware could be >> enhanced to try to boot AROS if somebody wants to do that. The drawback >> is that it needs an external iso whereas the current test doesn't need >> any additional images but it did not catch problems with IRQ and neither >> this problem with TCG temps. > > So this other option is not very useful, right? It's still useful to test if the machine is working at all but the firmware does not seem to use interrupts and if you don't boot anything it won't access disks so some parts will not be tested by only firmware level testing. Basically only CPU, RAM, ROM, serial would be tested and that the machine could be created which would still catch some bugs but IRQs and IDE probably would only be tested by trying to boot an OS. I think Guenter runs Linux kernel boot tests but with -kernel option so disks would still not be tested by that therefore it may be more useful to run some other OS booted from CD just to increase coverage. Since I plan to use MorphOS for pegasos2 as test, sam460ex could use AROS (AmigaOS is not freely available so that cannot be used but I also test with that). The AROS isos are free but may be somewhat unstable so instead of using the latest, one should use a known working version and only update after manual testing, otherwise the test may break due to change in AROS. Regards, BALATON Zoltan
On Wed, 20 Jan 2021, BALATON Zoltan wrote: > On Tue, 19 Jan 2021, Richard Henderson wrote: >> My recent change for caching tcg constants has, in a number of cases, >> overflowed the statically allocated array of temporaries. Change to >> dynamic allocation. > > This seems to work for me so > > Tested-by: BALATON Zoltan <balaton@eik.bme.hu> > > but have you done any performance tests to check that this actually improves > emulation speed? To mee it seems slower. Booting AmigaOS on sam460ex with > c0dd6654f207 (just before your TCG series) takes: > > real 0m33.829s > user 0m34.432s > sys 0m0.296s > > but on HEAD with this series: > > real 0m44.381s > user 0m46.058s > sys 0m0.532s > > This is noticable decrease in speed also without measuring it. With just > increasing the TCG_MAX_TEMPS to 2048 on 7c79721606be without this series I > get: > > real 0m42.681s > user 0m44.208s > sys 0m0.435s > > So the performance regression is somewhere in the original series not in this > fix up series. I've tried to do more measurements to identify where it got slower but I could not reproduce it today. I'm now getting around 42 seconds both before and after the series so not sure what made it faster before but it's probably not because of a code change then. Regards, BALATON Zoltan
On 1/21/21 10:09 AM, BALATON Zoltan wrote: > On Wed, 20 Jan 2021, BALATON Zoltan wrote: >> On Tue, 19 Jan 2021, Richard Henderson wrote: >>> My recent change for caching tcg constants has, in a number of cases, >>> overflowed the statically allocated array of temporaries. Change to >>> dynamic allocation. >> >> This seems to work for me so >> >> Tested-by: BALATON Zoltan <balaton@eik.bme.hu> >> >> but have you done any performance tests to check that this actually improves >> emulation speed? To mee it seems slower. Booting AmigaOS on sam460ex with >> c0dd6654f207 (just before your TCG series) takes: >> >> real 0m33.829s >> user 0m34.432s >> sys 0m0.296s >> >> but on HEAD with this series: >> >> real 0m44.381s >> user 0m46.058s >> sys 0m0.532s >> >> This is noticable decrease in speed also without measuring it. With just >> increasing the TCG_MAX_TEMPS to 2048 on 7c79721606be without this series I get: >> >> real 0m42.681s >> user 0m44.208s >> sys 0m0.435s >> >> So the performance regression is somewhere in the original series not in this >> fix up series. > > I've tried to do more measurements to identify where it got slower but I could > not reproduce it today. I'm now getting around 42 seconds both before and after > the series so not sure what made it faster before but it's probably not because > of a code change then. That's reassuring. I hadn't been able to measure a performance regression myself. (The kind of TB that caused this SEGV thread and creates oodles of temps seems to be an outlier. Otherwise there should be very little change to non-vector code.) r~