Message ID | 1492679620-12792-1-git-send-email-m.szyprowski@samsung.com |
---|---|
Headers | show |
Series | Exynos DRM: add Picture Processor extension | expand |
Hello everyone, Marek Szyprowski wrote: > Hi Laurent, > > On 2017-04-20 12:25, Laurent Pinchart wrote: >> Hi Marek, >> >> (CC'ing Sakari Ailus) >> >> Thank you for the patches. >> >> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: >>> Dear all, >>> >>> This is an updated proposal for extending EXYNOS DRM API with generic >>> support for hardware modules, which can be used for processing image >>> data >>> from the one memory buffer to another. Typical memory-to-memory >>> operations >>> are: rotation, scaling, colour space conversion or mix of them. This >>> is a >>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer >>> processors", which has been rejected as "not really needed in the DRM >>> core": >>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>> >>> >>> In this proposal I moved all the code to Exynos DRM driver, so now this >>> will be specific only to Exynos DRM. I've also changed the name from >>> framebuffer processor (fbproc) to picture processor (pp) to avoid >>> confusion >>> with fbdev API. >>> >>> Here is a bit more information what picture processors are: >>> >>> Embedded SoCs are known to have a number of hardware blocks, which >>> perform >>> such operations. They can be used in paralel to the main GPU module to >>> offload CPU from processing grapics or video data. One of example use of >>> such modules is implementing video overlay, which usually requires color >>> space conversion from NV12 (or similar) to RGB32 color space and >>> scaling to >>> target window size. >>> >>> The proposed API is heavily inspired by atomic KMS approach - it is also >>> based on DRM objects and their properties. A new DRM object is >>> introduced: >>> picture processor (called pp for convenience). Such objects have a >>> set of >>> standard DRM properties, which describes the operation to be >>> performed by >>> respective hardware module. In typical case those properties are a >>> source >>> fb id and rectangle (x, y, width, height) and destination fb id and >>> rectangle. Optionally a rotation property can be also specified if >>> supported by the given hardware. To perform an operation on image data, >>> userspace provides a set of properties and their values for given fbproc >>> object in a similar way as object and properties are provided for >>> performing atomic page flip / mode setting. >>> >>> The proposed API consists of the 3 new ioctls: >>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture >>> processors, >>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture >>> processor, >>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given >>> property set. >>> >>> The proposed API is extensible. Drivers can attach their own, custom >>> properties to add support for more advanced picture processing (for >>> example >>> blending). >>> >>> This proposal aims to replace Exynos DRM IPP (Image Post Processing) >>> subsystem. IPP API is over-engineered in general, but not really >>> extensible >>> on the other side. It is also buggy, with significant design flaws - the >>> biggest issue is the fact that the API covers memory-2-memory picture >>> operations together with CRTC writeback and duplicating features, which >>> belongs to video plane. Comparing with IPP subsystem, the PP >>> framework is >>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos >>> rotator driver smaller by over 200 lines). >> This seems to be the kind of hardware that is typically supported by >> V4L2. >> Stupid question, why DRM ? > > Let me elaborate a bit on the reasons for implementing it in Exynos DRM: > > 1. we want to replace existing Exynos IPP subsystem: > - it is used only in some internal/vendor trees, not in open-source > - we want it to have sane and potentially extensible userspace API > - but we don't want to loose its functionality > > 2. we want to have simple API for performing single image processing > operation: > - typically it will be used by compositing window manager, this means that > some parameters of the processing might change on each vblank (like > destination rectangle for example). This api allows such change on each > operation without any additional cost. V4L2 requires to reinitialize > queues with new configuration on such change, what means that a bunch of > ioctls has to be called. > - validating processing parameters in V4l2 API is really complicated, > because the parameters (format, src&dest rectangles, rotation) are being > set incrementally, so we have to either allow some impossible, > transitional > configurations or complicate the configuration steps even more (like > calling some ioctls multiple times for both input and output). In the > end > all parameters have to be again validated just before performing the > operation. > > 3. generic approach (to add it to DRM core) has been rejected: > http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html > > 4. this api can be considered as extended 'blit' operation, other DRM > drivers > (MGA, R128, VIA) already have ioctls for such operation, so there is > also > place in DRM for it > I just wanted to say that I like the proposed API. I could do some reviewing, if the API isn't discarded quickly like last time (when it was still proposed as part of DRM core). To comment about why this should be in DRM land. I personally could never get comfortable with the V4L2 API. It seems to be overengineered for the context of a blitting op, hence what Marek already said. I prefer to work with one API, so I don't have to worry about potential problems when two of them interact. For V4L2 this would be exchange of buffers, where I always have to think where I have to create some buffer, because I might be able to export it, but not import it on the other side. In this case DRM is way simpler, I just pass around my GEM handle and that's it. Stuff like that. Also, I still haven't dropped this small project of mine, using such an API for an Exynos renderer backend for the mpv media player. With best wishes, Tobias >>> Open questions: >>> - How to expose pp capabilities and supported formats? Currently this is >>> done with a drm_exynos_pp_get structure and DRM_IOCTL_EXYNOS_PP_GET >>> ioctl. >>> However one can try to use IMMUTABLE properties for capabilities and >>> src/dst format set. Rationale: recently Rob Clark proposed to create >>> a DRM >>> property with supported pixelformats and modifiers: >>> http://www.spinics.net/lists/dri-devel/msg137380.html >>> - Is it okay to use DRM objects and properties API >>> (DRM_IOCTL_MODE_GETPROPERTY and DRM_IOCTL_MODE_OBJ_GETPROPERTIES ioctls) >>> for this purpose? >>> >>> TODO: >>> - convert remaining Exynos DRM IPP drivers (FIMC, GScaller) >>> - remove Exynos DRM IPP subsystem >>> - (optional) provide virtual V4L2 mem2mem device on top of Exynos PP >>> framework >>> >>> Patches were tested on Exynos 4412-based Odroid U3 board, on top of >>> Linux >>> next-20170420 kernel. >>> >>> Best regards >>> Marek Szyprowski >>> Samsung R&D Institute Poland >>> >>> >>> Changelog: >>> v1: >>> - moved this feature from DRM core to Exynos DRM driver >>> - changed name from framebuffer processor to picture processor >>> - simplified code to cover only things needed by Exynos drivers >>> - implemented simple fifo task scheduler >>> - cleaned up rotator driver conversion (removed IPP remainings) >>> >>> >>> v0: >>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>> >>> - initial post of "[RFC 0/2] New feature: Framebuffer processors" >>> - generic approach implemented in DRM core, rejected >>> >>> >>> Patch summary: >>> >>> Marek Szyprowski (4): >>> drm: Export functions to create custom DRM objects >>> drm: Add support for vendor specific DRM objects with custom >>> properties >>> drm/exynos: Add Picture Processor framework >>> drm/exynos: Convert Exynos Rotator driver to Picture Processor >>> interface >>> >>> drivers/gpu/drm/drm_crtc_internal.h | 4 - >>> drivers/gpu/drm/drm_mode_object.c | 11 +- >>> drivers/gpu/drm/drm_property.c | 2 +- >>> drivers/gpu/drm/exynos/Kconfig | 1 - >>> drivers/gpu/drm/exynos/Makefile | 3 +- >>> drivers/gpu/drm/exynos/exynos_drm_drv.c | 9 + >>> drivers/gpu/drm/exynos/exynos_drm_drv.h | 15 + >>> drivers/gpu/drm/exynos/exynos_drm_pp.c | 775 >>> +++++++++++++++++++++++++ >>> drivers/gpu/drm/exynos/exynos_drm_pp.h | 155 ++++++ >>> drivers/gpu/drm/exynos/exynos_drm_rotator.c | 513 +++++------------- >>> drivers/gpu/drm/exynos/exynos_drm_rotator.h | 19 - >>> include/drm/drm_mode_object.h | 6 + >>> include/drm/drm_property.h | 7 + >>> include/uapi/drm/drm_mode.h | 1 + >>> include/uapi/drm/exynos_drm.h | 62 +++ >>> 15 files changed, 1166 insertions(+), 417 deletions(-) >>> create mode 100644 drivers/gpu/drm/exynos/exynos_drm_pp.c >>> create mode 100644 drivers/gpu/drm/exynos/exynos_drm_pp.h >>> delete mode 100644 drivers/gpu/drm/exynos/exynos_drm_rotator.h > > Best regards -- To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Dave, On 2017-04-20 21:02, Dave Airlie wrote: > On 20 April 2017 at 19:13, Marek Szyprowski <m.szyprowski@samsung.com> wrote: >> This is an updated proposal for extending EXYNOS DRM API with generic support >> for hardware modules, which can be used for processing image data from the >> one memory buffer to another. Typical memory-to-memory operations are: >> rotation, scaling, colour space conversion or mix of them. This is >> a follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer >> processors", which has been rejected as "not really needed in the DRM core": >> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >> >> In this proposal I moved all the code to Exynos DRM driver, so now this >> will be specific only to Exynos DRM. I've also changed the name from >> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion >> with fbdev API. >> >> Here is a bit more information what picture processors are: >> >> Embedded SoCs are known to have a number of hardware blocks, which perform >> such operations. They can be used in paralel to the main GPU module to >> offload CPU from processing grapics or video data. One of example use of >> such modules is implementing video overlay, which usually requires color >> space conversion from NV12 (or similar) to RGB32 color space and scaling to >> target window size. >> >> The proposed API is heavily inspired by atomic KMS approach - it is also >> based on DRM objects and their properties. A new DRM object is introduced: >> picture processor (called pp for convenience). Such objects have a set of >> standard DRM properties, which describes the operation to be performed by >> respective hardware module. In typical case those properties are a source >> fb id and rectangle (x, y, width, height) and destination fb id and >> rectangle. Optionally a rotation property can be also specified if >> supported by the given hardware. To perform an operation on image data, >> userspace provides a set of properties and their values for given fbproc >> object in a similar way as object and properties are provided for >> performing atomic page flip / mode setting. >> >> The proposed API consists of the 3 new ioctls: >> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture >> processors, >> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture >> processor, >> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given >> property set. >> >> The proposed API is extensible. Drivers can attach their own, custom >> properties to add support for more advanced picture processing (for example >> blending). > So this looks more like a command submission API like we have for other drivers. > > Is there an overarching reason why it needs to reuse the core drm > object tracking > and properties, or was that just a nice to have, vs something like > amdgpu chunks. Thanks for your comment. DRM objects and properties were my first choice when designing this new api, but I'm still a bit new to DRM at all. I was also a bit fascinated by the atomic KMS approach, though. I selected them simply to reuse the code for managing objects and enumerating their properties from userspace. If this is not the preferred approach, I will rewrite the code to use something custom. I didn't know about amdgpu chunks, but from the quick look they are just a structure to store a set of ids and data for them. Maybe there is no need to use strings for enumerating the properties and limiting the API to the known set of IDs will be more than enough in this case. > My worry about exposing objects and properties to the drivers is I'm > sure someone > could get quite inventive and end up with a forked atomic API that we don't see, > or undocumented things. Okay. I will try different approach then. Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland -- To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Marek, On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote: > Hi Laurent, > > On 2017-04-20 12:25, Laurent Pinchart wrote: > >Hi Marek, > > > >(CC'ing Sakari Ailus) > > > >Thank you for the patches. > > > >On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: > >>Dear all, > >> > >>This is an updated proposal for extending EXYNOS DRM API with generic > >>support for hardware modules, which can be used for processing image data > >>from the one memory buffer to another. Typical memory-to-memory operations > >>are: rotation, scaling, colour space conversion or mix of them. This is a > >>follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer > >>processors", which has been rejected as "not really needed in the DRM > >>core": > >>http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html > >> > >>In this proposal I moved all the code to Exynos DRM driver, so now this > >>will be specific only to Exynos DRM. I've also changed the name from > >>framebuffer processor (fbproc) to picture processor (pp) to avoid confusion > >>with fbdev API. > >> > >>Here is a bit more information what picture processors are: > >> > >>Embedded SoCs are known to have a number of hardware blocks, which perform > >>such operations. They can be used in paralel to the main GPU module to > >>offload CPU from processing grapics or video data. One of example use of > >>such modules is implementing video overlay, which usually requires color > >>space conversion from NV12 (or similar) to RGB32 color space and scaling to > >>target window size. > >> > >>The proposed API is heavily inspired by atomic KMS approach - it is also > >>based on DRM objects and their properties. A new DRM object is introduced: > >>picture processor (called pp for convenience). Such objects have a set of > >>standard DRM properties, which describes the operation to be performed by > >>respective hardware module. In typical case those properties are a source > >>fb id and rectangle (x, y, width, height) and destination fb id and > >>rectangle. Optionally a rotation property can be also specified if > >>supported by the given hardware. To perform an operation on image data, > >>userspace provides a set of properties and their values for given fbproc > >>object in a similar way as object and properties are provided for > >>performing atomic page flip / mode setting. > >> > >>The proposed API consists of the 3 new ioctls: > >>- DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture > >> processors, > >>- DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture > >> processor, > >>- DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given > >> property set. > >> > >>The proposed API is extensible. Drivers can attach their own, custom > >>properties to add support for more advanced picture processing (for example > >>blending). > >> > >>This proposal aims to replace Exynos DRM IPP (Image Post Processing) > >>subsystem. IPP API is over-engineered in general, but not really extensible > >>on the other side. It is also buggy, with significant design flaws - the > >>biggest issue is the fact that the API covers memory-2-memory picture > >>operations together with CRTC writeback and duplicating features, which > >>belongs to video plane. Comparing with IPP subsystem, the PP framework is > >>smaller (1807 vs 778 lines) and allows driver simplification (Exynos > >>rotator driver smaller by over 200 lines). > >This seems to be the kind of hardware that is typically supported by V4L2. > >Stupid question, why DRM ? > > Let me elaborate a bit on the reasons for implementing it in Exynos DRM: > > 1. we want to replace existing Exynos IPP subsystem: > - it is used only in some internal/vendor trees, not in open-source > - we want it to have sane and potentially extensible userspace API > - but we don't want to loose its functionality > > 2. we want to have simple API for performing single image processing > operation: > - typically it will be used by compositing window manager, this means that > some parameters of the processing might change on each vblank (like > destination rectangle for example). This api allows such change on each > operation without any additional cost. V4L2 requires to reinitialize > queues with new configuration on such change, what means that a bunch of > ioctls has to be called. What do you mean by re-initialising the queue? Format, buffers or something else? If you need a larger buffer than what you have already allocated, you'll need to re-allocate, V4L2 or not. We also do lack a way to destroy individual buffers in V4L2. It'd be up to implementing that and some work in videobuf2. Another thing is that V4L2 is very stream oriented. For most devices that's fine as a lot of the parameters are not changeable during streaming, especially if the pipeline is handled by multiple drivers. That said, for devices that process data from memory to memory performing changes in the media bus formats and pipeline configuration is not very efficient currently, largely for the same reason. The request API that people have been working for a bit different use cases isn't in mainline yet. It would allow more efficient per-request configuration than what is currently possible, but it has turned out to be far from trivial to implement. > - validating processing parameters in V4l2 API is really complicated, > because the parameters (format, src&dest rectangles, rotation) are being > set incrementally, so we have to either allow some impossible, > transitional > configurations or complicate the configuration steps even more (like > calling some ioctls multiple times for both input and output). In the end > all parameters have to be again validated just before performing the > operation. You have to validate the parameters in any case. In a MC pipeline this takes place when the stream is started. > > 3. generic approach (to add it to DRM core) has been rejected: > http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html For GPUs I generally understand the reasoning: there's a very limited number of users of this API --- primarily because it's not an application interface. If you have a device that however falls under the scope of V4L2 (at least API-wise), does this continue to be the case? Will there be only one or two (or so) users for this API? Is it the case here? Using a device specific interface definitely has some benefits: there's no need to think how would you generalise the interface for other similar devices. There's no need to consider backwards compatibility as it's not a requirement. The drawback is that the applications that need to support similar devices will bear the burden of having to support different APIs. I don't mean to say that you should ram whatever under V4L2 / MC independently of how unworkable that might be, but there are also clear advantages in using a standardised interface such as V4L2. V4L2 has a long history behind it and if it was designed today, I bet it would look quite different from what it is now. > > 4. this api can be considered as extended 'blit' operation, other DRM > drivers > (MGA, R128, VIA) already have ioctls for such operation, so there is also > place in DRM for it Added LMML to cc. -- Kind regards, Sakari Ailus e-mail: sakari.ailus@iki.fi XMPP: sailus@retiisi.org.uk -- To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 26 avril 2017 à 01:21 +0300, Sakari Ailus a écrit : > Hi Marek, > > On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote: > > Hi Laurent, > > > > On 2017-04-20 12:25, Laurent Pinchart wrote: > > > Hi Marek, > > > > > > (CC'ing Sakari Ailus) > > > > > > Thank you for the patches. > > > > > > On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: > > > > Dear all, > > > > > > > > This is an updated proposal for extending EXYNOS DRM API with generic > > > > support for hardware modules, which can be used for processing image data > > > > from the one memory buffer to another. Typical memory-to-memory operations > > > > are: rotation, scaling, colour space conversion or mix of them. This is a > > > > follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer > > > > processors", which has been rejected as "not really needed in the DRM > > > > core": > > > > http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html > > > > > > > > In this proposal I moved all the code to Exynos DRM driver, so now this > > > > will be specific only to Exynos DRM. I've also changed the name from > > > > framebuffer processor (fbproc) to picture processor (pp) to avoid confusion > > > > with fbdev API. > > > > > > > > Here is a bit more information what picture processors are: > > > > > > > > Embedded SoCs are known to have a number of hardware blocks, which perform > > > > such operations. They can be used in paralel to the main GPU module to > > > > offload CPU from processing grapics or video data. One of example use of > > > > such modules is implementing video overlay, which usually requires color > > > > space conversion from NV12 (or similar) to RGB32 color space and scaling to > > > > target window size. > > > > > > > > The proposed API is heavily inspired by atomic KMS approach - it is also > > > > based on DRM objects and their properties. A new DRM object is introduced: > > > > picture processor (called pp for convenience). Such objects have a set of > > > > standard DRM properties, which describes the operation to be performed by > > > > respective hardware module. In typical case those properties are a source > > > > fb id and rectangle (x, y, width, height) and destination fb id and > > > > rectangle. Optionally a rotation property can be also specified if > > > > supported by the given hardware. To perform an operation on image data, > > > > userspace provides a set of properties and their values for given fbproc > > > > object in a similar way as object and properties are provided for > > > > performing atomic page flip / mode setting. > > > > > > > > The proposed API consists of the 3 new ioctls: > > > > - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture > > > > processors, > > > > - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture > > > > processor, > > > > - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given > > > > property set. > > > > > > > > The proposed API is extensible. Drivers can attach their own, custom > > > > properties to add support for more advanced picture processing (for example > > > > blending). > > > > > > > > This proposal aims to replace Exynos DRM IPP (Image Post Processing) > > > > subsystem. IPP API is over-engineered in general, but not really extensible > > > > on the other side. It is also buggy, with significant design flaws - the > > > > biggest issue is the fact that the API covers memory-2-memory picture > > > > operations together with CRTC writeback and duplicating features, which > > > > belongs to video plane. Comparing with IPP subsystem, the PP framework is > > > > smaller (1807 vs 778 lines) and allows driver simplification (Exynos > > > > rotator driver smaller by over 200 lines). Just a side note, we have written code in GStreamer using the Exnynos 4 FIMC IPP driver. I don't know how many, if any, deployment still exist (Exynos 4 is relatively old now), but there exist userspace for the FIMC driver. We use this for color transformation (from tiled to linear) and scaling. The FIMC driver is in fact quite stable in upstream kernel today. The GScaler V4L2 M2M driver on Exynos 5 is largely based on it and has received some maintenance to properly work in GStreamer. unlike this DRM API, you can reuse the same userspace code across multiple platforms (which we do already). We have also integrated this driver in Chromium in the past (not upstream though). I am well aware that the blitter driver has not got much attention though. But again, V4L2 offers a generic interface to userspace application. Fixing this driver could enable some work like this one: https://bugzilla.gnome.org/show_bug.cgi?id=772766 This work in progress feature is a generic hardware accelerated video mixer. It has been tested with IMX.6 v4l2 m2m blitter driver (which I believe is in staging right now). Again, unlike the exynos/drm, this code could be reused between platforms. In general, the problem with the DRM approach is that it only targets displays. We often need to use these IP block for stream pre/post processing outside a "playback" use case. What I'd like so see instead here, is an approach that helps both world instead of trying to win the control over the IP block. Renesas development seems to lead toward the right direction by creating drivers that can be both interfaced in DRM and V4L2. For IPP and GScaler on Exynos, this would be a greater benefit and finally the code could be shared, having a single place to fix when we find bugs. > > > > > > This seems to be the kind of hardware that is typically supported by V4L2. > > > Stupid question, why DRM ? > > > > Let me elaborate a bit on the reasons for implementing it in Exynos DRM: > > > > 1. we want to replace existing Exynos IPP subsystem: > > - it is used only in some internal/vendor trees, not in open-source > > - we want it to have sane and potentially extensible userspace API > > - but we don't want to loose its functionality > > > > 2. we want to have simple API for performing single image processing > > operation: > > - typically it will be used by compositing window manager, this means that > > some parameters of the processing might change on each vblank (like > > destination rectangle for example). This api allows such change on each > > operation without any additional cost. V4L2 requires to reinitialize > > queues with new configuration on such change, what means that a bunch of > > ioctls has to be called. > > What do you mean by re-initialising the queue? Format, buffers or something > else? > > If you need a larger buffer than what you have already allocated, you'll > need to re-allocate, V4L2 or not. > > We also do lack a way to destroy individual buffers in V4L2. It'd be up to > implementing that and some work in videobuf2. > > Another thing is that V4L2 is very stream oriented. For most devices that's > fine as a lot of the parameters are not changeable during streaming, > especially if the pipeline is handled by multiple drivers. That said, for > devices that process data from memory to memory performing changes in the > media bus formats and pipeline configuration is not very efficient > currently, largely for the same reason. > > The request API that people have been working for a bit different use cases > isn't in mainline yet. It would allow more efficient per-request > configuration than what is currently possible, but it has turned out to be > far from trivial to implement. > > > - validating processing parameters in V4l2 API is really complicated, > > because the parameters (format, src&dest rectangles, rotation) are being > > set incrementally, so we have to either allow some impossible, > > transitional > > configurations or complicate the configuration steps even more (like > > calling some ioctls multiple times for both input and output). In the end > > all parameters have to be again validated just before performing the > > operation. > > You have to validate the parameters in any case. In a MC pipeline this takes > place when the stream is started. > > > > > 3. generic approach (to add it to DRM core) has been rejected: > > http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html > > For GPUs I generally understand the reasoning: there's a very limited number > of users of this API --- primarily because it's not an application > interface. > > If you have a device that however falls under the scope of V4L2 (at least > API-wise), does this continue to be the case? Will there be only one or two > (or so) users for this API? Is it the case here? > > Using a device specific interface definitely has some benefits: there's no > need to think how would you generalise the interface for other similar > devices. There's no need to consider backwards compatibility as it's not a > requirement. The drawback is that the applications that need to support > similar devices will bear the burden of having to support different APIs. > > I don't mean to say that you should ram whatever under V4L2 / MC > independently of how unworkable that might be, but there are also clear > advantages in using a standardised interface such as V4L2. > > V4L2 has a long history behind it and if it was designed today, I bet it > would look quite different from what it is now. > > > > > 4. this api can be considered as extended 'blit' operation, other DRM > > drivers > > (MGA, R128, VIA) already have ioctls for such operation, so there is also > > place in DRM for it Note that I am convince that using these custom IOCTL within a "compositor" implementation is much easier and uniform compared to using a v4l2 driver. It probably offers lower latency. But these are non-generic and are not a great fit for streaming purpose. Request API and probably explicit fence may mitigate this though. Meanwhile, there is some indication that even though complex, there is already some people that do think implementing a compositor combining V4L2 and DRM is feasible. http://events.linuxfoundation.org/sites/events/files/slides/als2015_way land_weston_v2.pdf > > Added LMML to cc. Thanks.
Le mercredi 26 avril 2017 à 18:52 +0200, Tobias Jakobi a écrit : > Hello again, > > > Nicolas Dufresne wrote: > > Le mercredi 26 avril 2017 à 01:21 +0300, Sakari Ailus a écrit : > > > Hi Marek, > > > > > > On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote: > > > > Hi Laurent, > > > > > > > > On 2017-04-20 12:25, Laurent Pinchart wrote: > > > > > Hi Marek, > > > > > > > > > > (CC'ing Sakari Ailus) > > > > > > > > > > Thank you for the patches. > > > > > > > > > > On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: > > > > > > Dear all, > > > > > > > > > > > > This is an updated proposal for extending EXYNOS DRM API with generic > > > > > > support for hardware modules, which can be used for processing image data > > > > > > from the one memory buffer to another. Typical memory-to-memory operations > > > > > > are: rotation, scaling, colour space conversion or mix of them. This is a > > > > > > follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer > > > > > > processors", which has been rejected as "not really needed in the DRM > > > > > > core": > > > > > > http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html > > > > > > > > > > > > In this proposal I moved all the code to Exynos DRM driver, so now this > > > > > > will be specific only to Exynos DRM. I've also changed the name from > > > > > > framebuffer processor (fbproc) to picture processor (pp) to avoid confusion > > > > > > with fbdev API. > > > > > > > > > > > > Here is a bit more information what picture processors are: > > > > > > > > > > > > Embedded SoCs are known to have a number of hardware blocks, which perform > > > > > > such operations. They can be used in paralel to the main GPU module to > > > > > > offload CPU from processing grapics or video data. One of example use of > > > > > > such modules is implementing video overlay, which usually requires color > > > > > > space conversion from NV12 (or similar) to RGB32 color space and scaling to > > > > > > target window size. > > > > > > > > > > > > The proposed API is heavily inspired by atomic KMS approach - it is also > > > > > > based on DRM objects and their properties. A new DRM object is introduced: > > > > > > picture processor (called pp for convenience). Such objects have a set of > > > > > > standard DRM properties, which describes the operation to be performed by > > > > > > respective hardware module. In typical case those properties are a source > > > > > > fb id and rectangle (x, y, width, height) and destination fb id and > > > > > > rectangle. Optionally a rotation property can be also specified if > > > > > > supported by the given hardware. To perform an operation on image data, > > > > > > userspace provides a set of properties and their values for given fbproc > > > > > > object in a similar way as object and properties are provided for > > > > > > performing atomic page flip / mode setting. > > > > > > > > > > > > The proposed API consists of the 3 new ioctls: > > > > > > - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture > > > > > > processors, > > > > > > - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture > > > > > > processor, > > > > > > - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given > > > > > > property set. > > > > > > > > > > > > The proposed API is extensible. Drivers can attach their own, custom > > > > > > properties to add support for more advanced picture processing (for example > > > > > > blending). > > > > > > > > > > > > This proposal aims to replace Exynos DRM IPP (Image Post Processing) > > > > > > subsystem. IPP API is over-engineered in general, but not really extensible > > > > > > on the other side. It is also buggy, with significant design flaws - the > > > > > > biggest issue is the fact that the API covers memory-2-memory picture > > > > > > operations together with CRTC writeback and duplicating features, which > > > > > > belongs to video plane. Comparing with IPP subsystem, the PP framework is > > > > > > smaller (1807 vs 778 lines) and allows driver simplification (Exynos > > > > > > rotator driver smaller by over 200 lines). > > > > Just a side note, we have written code in GStreamer using the Exnynos 4 > > FIMC IPP driver. I don't know how many, if any, deployment still exist > > (Exynos 4 is relatively old now), but there exist userspace for the > > FIMC driver. > > I was searching for this code, but I didn't find anything. Are you sure > you really mean the FIMC IPP in Exynos DRM, and not just the FIMC driver > from the V4L2 subsystem? Oops, I manage to be unclear. Having two drivers on the same IP isn't helping. We wrote code around the FIMC driver on V4L2 side. This driver: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/platform/exynos4-is/fimc-m2m.c And this code: https://cgit.freedesktop.org/gstreamer/gst-plugins-good/tree/sys/v4l2/gstv4l2transform.c Unless I have miss-read, the proposal here is to deprecate the V4L side and improve the DRM side (which I stand against in my reply). > > > With best wishes, > Tobias > > > > > We use this for color transformation (from tiled to > > linear) and scaling. The FIMC driver is in fact quite stable in > > upstream kernel today. The GScaler V4L2 M2M driver on Exynos 5 is > > largely based on it and has received some maintenance to properly work > > in GStreamer. unlike this DRM API, you can reuse the same userspace > > code across multiple platforms (which we do already). We have also > > integrated this driver in Chromium in the past (not upstream though). > > > > I am well aware that the blitter driver has not got much attention > > though. But again, V4L2 offers a generic interface to userspace > > application. Fixing this driver could enable some work like this one: > > > > https://bugzilla.gnome.org/show_bug.cgi?id=772766 > > > > This work in progress feature is a generic hardware accelerated video > > mixer. It has been tested with IMX.6 v4l2 m2m blitter driver (which I > > believe is in staging right now). Again, unlike the exynos/drm, this > > code could be reused between platforms. > > > > In general, the problem with the DRM approach is that it only targets > > displays. We often need to use these IP block for stream pre/post > > processing outside a "playback" use case. > > > > What I'd like so see instead here, is an approach that helps both world > > instead of trying to win the control over the IP block. Renesas > > development seems to lead toward the right direction by creating > > drivers that can be both interfaced in DRM and V4L2. For IPP and > > GScaler on Exynos, this would be a greater benefit and finally the code > > could be shared, having a single place to fix when we find bugs. > > > > > > > > > > > > This seems to be the kind of hardware that is typically supported by V4L2. > > > > > Stupid question, why DRM ? > > > > > > > > Let me elaborate a bit on the reasons for implementing it in Exynos DRM: > > > > > > > > 1. we want to replace existing Exynos IPP subsystem: > > > > - it is used only in some internal/vendor trees, not in open-source > > > > - we want it to have sane and potentially extensible userspace API > > > > - but we don't want to loose its functionality > > > > > > > > 2. we want to have simple API for performing single image processing > > > > operation: > > > > - typically it will be used by compositing window manager, this means that > > > > some parameters of the processing might change on each vblank (like > > > > destination rectangle for example). This api allows such change on each > > > > operation without any additional cost. V4L2 requires to reinitialize > > > > queues with new configuration on such change, what means that a bunch of > > > > ioctls has to be called. > > > > > > What do you mean by re-initialising the queue? Format, buffers or something > > > else? > > > > > > If you need a larger buffer than what you have already allocated, you'll > > > need to re-allocate, V4L2 or not. > > > > > > We also do lack a way to destroy individual buffers in V4L2. It'd be up to > > > implementing that and some work in videobuf2. > > > > > > Another thing is that V4L2 is very stream oriented. For most devices that's > > > fine as a lot of the parameters are not changeable during streaming, > > > especially if the pipeline is handled by multiple drivers. That said, for > > > devices that process data from memory to memory performing changes in the > > > media bus formats and pipeline configuration is not very efficient > > > currently, largely for the same reason. > > > > > > The request API that people have been working for a bit different use cases > > > isn't in mainline yet. It would allow more efficient per-request > > > configuration than what is currently possible, but it has turned out to be > > > far from trivial to implement. > > > > > > > - validating processing parameters in V4l2 API is really complicated, > > > > because the parameters (format, src&dest rectangles, rotation) are being > > > > set incrementally, so we have to either allow some impossible, > > > > transitional > > > > configurations or complicate the configuration steps even more (like > > > > calling some ioctls multiple times for both input and output). In the end > > > > all parameters have to be again validated just before performing the > > > > operation. > > > > > > You have to validate the parameters in any case. In a MC pipeline this takes > > > place when the stream is started. > > > > > > > > > > > 3. generic approach (to add it to DRM core) has been rejected: > > > > http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html > > > > > > For GPUs I generally understand the reasoning: there's a very limited number > > > of users of this API --- primarily because it's not an application > > > interface. > > > > > > If you have a device that however falls under the scope of V4L2 (at least > > > API-wise), does this continue to be the case? Will there be only one or two > > > (or so) users for this API? Is it the case here? > > > > > > Using a device specific interface definitely has some benefits: there's no > > > need to think how would you generalise the interface for other similar > > > devices. There's no need to consider backwards compatibility as it's not a > > > requirement. The drawback is that the applications that need to support > > > similar devices will bear the burden of having to support different APIs. > > > > > > I don't mean to say that you should ram whatever under V4L2 / MC > > > independently of how unworkable that might be, but there are also clear > > > advantages in using a standardised interface such as V4L2. > > > > > > V4L2 has a long history behind it and if it was designed today, I bet it > > > would look quite different from what it is now. > > > > > > > > > > > 4. this api can be considered as extended 'blit' operation, other DRM > > > > drivers > > > > (MGA, R128, VIA) already have ioctls for such operation, so there is also > > > > place in DRM for it > > > > Note that I am convince that using these custom IOCTL within a > > "compositor" implementation is much easier and uniform compared to > > using a v4l2 driver. It probably offers lower latency. But these are > > non-generic and are not a great fit for streaming purpose. Request API > > and probably explicit fence may mitigate this though. Meanwhile, there > > is some indication that even though complex, there is already some > > people that do think implementing a compositor combining V4L2 and DRM > > is feasible. > > > > http://events.linuxfoundation.org/sites/events/files/slides/als2015_way > > land_weston_v2.pdf > > > > > > > > Added LMML to cc. > > > > Thanks. > > > >
Hey, Nicolas Dufresne wrote: > Le mercredi 26 avril 2017 à 18:52 +0200, Tobias Jakobi a écrit : >> Hello again, >> >> >> Nicolas Dufresne wrote: >>> Le mercredi 26 avril 2017 à 01:21 +0300, Sakari Ailus a écrit : >>>> Hi Marek, >>>> >>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote: >>>>> Hi Laurent, >>>>> >>>>> On 2017-04-20 12:25, Laurent Pinchart wrote: >>>>>> Hi Marek, >>>>>> >>>>>> (CC'ing Sakari Ailus) >>>>>> >>>>>> Thank you for the patches. >>>>>> >>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: >>>>>>> Dear all, >>>>>>> >>>>>>> This is an updated proposal for extending EXYNOS DRM API with generic >>>>>>> support for hardware modules, which can be used for processing image data >>>>>>> from the one memory buffer to another. Typical memory-to-memory operations >>>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a >>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer >>>>>>> processors", which has been rejected as "not really needed in the DRM >>>>>>> core": >>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>>>>>> >>>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this >>>>>>> will be specific only to Exynos DRM. I've also changed the name from >>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion >>>>>>> with fbdev API. >>>>>>> >>>>>>> Here is a bit more information what picture processors are: >>>>>>> >>>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform >>>>>>> such operations. They can be used in paralel to the main GPU module to >>>>>>> offload CPU from processing grapics or video data. One of example use of >>>>>>> such modules is implementing video overlay, which usually requires color >>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to >>>>>>> target window size. >>>>>>> >>>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also >>>>>>> based on DRM objects and their properties. A new DRM object is introduced: >>>>>>> picture processor (called pp for convenience). Such objects have a set of >>>>>>> standard DRM properties, which describes the operation to be performed by >>>>>>> respective hardware module. In typical case those properties are a source >>>>>>> fb id and rectangle (x, y, width, height) and destination fb id and >>>>>>> rectangle. Optionally a rotation property can be also specified if >>>>>>> supported by the given hardware. To perform an operation on image data, >>>>>>> userspace provides a set of properties and their values for given fbproc >>>>>>> object in a similar way as object and properties are provided for >>>>>>> performing atomic page flip / mode setting. >>>>>>> >>>>>>> The proposed API consists of the 3 new ioctls: >>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture >>>>>>> processors, >>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture >>>>>>> processor, >>>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given >>>>>>> property set. >>>>>>> >>>>>>> The proposed API is extensible. Drivers can attach their own, custom >>>>>>> properties to add support for more advanced picture processing (for example >>>>>>> blending). >>>>>>> >>>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing) >>>>>>> subsystem. IPP API is over-engineered in general, but not really extensible >>>>>>> on the other side. It is also buggy, with significant design flaws - the >>>>>>> biggest issue is the fact that the API covers memory-2-memory picture >>>>>>> operations together with CRTC writeback and duplicating features, which >>>>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is >>>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos >>>>>>> rotator driver smaller by over 200 lines). >>> >>> Just a side note, we have written code in GStreamer using the Exnynos 4 >>> FIMC IPP driver. I don't know how many, if any, deployment still exist >>> (Exynos 4 is relatively old now), but there exist userspace for the >>> FIMC driver. >> >> I was searching for this code, but I didn't find anything. Are you sure >> you really mean the FIMC IPP in Exynos DRM, and not just the FIMC driver >> from the V4L2 subsystem? > > Oops, I manage to be unclear. Having two drivers on the same IP isn't > helping. We wrote code around the FIMC driver on V4L2 side. This > driver: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/platform/exynos4-is/fimc-m2m.c > > And this code: > > https://cgit.freedesktop.org/gstreamer/gst-plugins-good/tree/sys/v4l2/gstv4l2transform.c > > Unless I have miss-read, the proposal here is to deprecate the V4L side > and improve the DRM side (which I stand against in my reply). I'm pretty sure you have misread Marek's description of the patchset. The picture processor API should replaced/deprecate the IPP API that is currently implemented in the Exynos DRM. In particular this affects the following files: - drivers/gpu/drm/exynos/exynos_drm_ipp.{c,h} - drivers/gpu/drm/exynos/exynos_drm_fimc.{c,h} - drivers/gpu/drm/exynos/exynos_drm_gsc.{c,h} - drivers/gpu/drm/exynos/exynos_drm_rotator.{c,h} I know only two places where the IPP API is actually used. Tizen and my experimental mpv backend. With best wishes, Tobias > >> >> >> With best wishes, >> Tobias >> >> >> >>> We use this for color transformation (from tiled to >>> linear) and scaling. The FIMC driver is in fact quite stable in >>> upstream kernel today. The GScaler V4L2 M2M driver on Exynos 5 is >>> largely based on it and has received some maintenance to properly work >>> in GStreamer. unlike this DRM API, you can reuse the same userspace >>> code across multiple platforms (which we do already). We have also >>> integrated this driver in Chromium in the past (not upstream though). >>> >>> I am well aware that the blitter driver has not got much attention >>> though. But again, V4L2 offers a generic interface to userspace >>> application. Fixing this driver could enable some work like this one: >>> >>> https://bugzilla.gnome.org/show_bug.cgi?id=772766 >>> >>> This work in progress feature is a generic hardware accelerated video >>> mixer. It has been tested with IMX.6 v4l2 m2m blitter driver (which I >>> believe is in staging right now). Again, unlike the exynos/drm, this >>> code could be reused between platforms. >>> >>> In general, the problem with the DRM approach is that it only targets >>> displays. We often need to use these IP block for stream pre/post >>> processing outside a "playback" use case. >>> >>> What I'd like so see instead here, is an approach that helps both world >>> instead of trying to win the control over the IP block. Renesas >>> development seems to lead toward the right direction by creating >>> drivers that can be both interfaced in DRM and V4L2. For IPP and >>> GScaler on Exynos, this would be a greater benefit and finally the code >>> could be shared, having a single place to fix when we find bugs. >>> >>>>>> >>>>>> This seems to be the kind of hardware that is typically supported by V4L2. >>>>>> Stupid question, why DRM ? >>>>> >>>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM: >>>>> >>>>> 1. we want to replace existing Exynos IPP subsystem: >>>>> - it is used only in some internal/vendor trees, not in open-source >>>>> - we want it to have sane and potentially extensible userspace API >>>>> - but we don't want to loose its functionality >>>>> >>>>> 2. we want to have simple API for performing single image processing >>>>> operation: >>>>> - typically it will be used by compositing window manager, this means that >>>>> some parameters of the processing might change on each vblank (like >>>>> destination rectangle for example). This api allows such change on each >>>>> operation without any additional cost. V4L2 requires to reinitialize >>>>> queues with new configuration on such change, what means that a bunch of >>>>> ioctls has to be called. >>>> >>>> What do you mean by re-initialising the queue? Format, buffers or something >>>> else? >>>> >>>> If you need a larger buffer than what you have already allocated, you'll >>>> need to re-allocate, V4L2 or not. >>>> >>>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to >>>> implementing that and some work in videobuf2. >>>> >>>> Another thing is that V4L2 is very stream oriented. For most devices that's >>>> fine as a lot of the parameters are not changeable during streaming, >>>> especially if the pipeline is handled by multiple drivers. That said, for >>>> devices that process data from memory to memory performing changes in the >>>> media bus formats and pipeline configuration is not very efficient >>>> currently, largely for the same reason. >>>> >>>> The request API that people have been working for a bit different use cases >>>> isn't in mainline yet. It would allow more efficient per-request >>>> configuration than what is currently possible, but it has turned out to be >>>> far from trivial to implement. >>>> >>>>> - validating processing parameters in V4l2 API is really complicated, >>>>> because the parameters (format, src&dest rectangles, rotation) are being >>>>> set incrementally, so we have to either allow some impossible, >>>>> transitional >>>>> configurations or complicate the configuration steps even more (like >>>>> calling some ioctls multiple times for both input and output). In the end >>>>> all parameters have to be again validated just before performing the >>>>> operation. >>>> >>>> You have to validate the parameters in any case. In a MC pipeline this takes >>>> place when the stream is started. >>>> >>>>> >>>>> 3. generic approach (to add it to DRM core) has been rejected: >>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>>> >>>> For GPUs I generally understand the reasoning: there's a very limited number >>>> of users of this API --- primarily because it's not an application >>>> interface. >>>> >>>> If you have a device that however falls under the scope of V4L2 (at least >>>> API-wise), does this continue to be the case? Will there be only one or two >>>> (or so) users for this API? Is it the case here? >>>> >>>> Using a device specific interface definitely has some benefits: there's no >>>> need to think how would you generalise the interface for other similar >>>> devices. There's no need to consider backwards compatibility as it's not a >>>> requirement. The drawback is that the applications that need to support >>>> similar devices will bear the burden of having to support different APIs. >>>> >>>> I don't mean to say that you should ram whatever under V4L2 / MC >>>> independently of how unworkable that might be, but there are also clear >>>> advantages in using a standardised interface such as V4L2. >>>> >>>> V4L2 has a long history behind it and if it was designed today, I bet it >>>> would look quite different from what it is now. >>>> >>>>> >>>>> 4. this api can be considered as extended 'blit' operation, other DRM >>>>> drivers >>>>> (MGA, R128, VIA) already have ioctls for such operation, so there is also >>>>> place in DRM for it >>> >>> Note that I am convince that using these custom IOCTL within a >>> "compositor" implementation is much easier and uniform compared to >>> using a v4l2 driver. It probably offers lower latency. But these are >>> non-generic and are not a great fit for streaming purpose. Request API >>> and probably explicit fence may mitigate this though. Meanwhile, there >>> is some indication that even though complex, there is already some >>> people that do think implementing a compositor combining V4L2 and DRM >>> is feasible. >>> >>> http://events.linuxfoundation.org/sites/events/files/slides/als2015_way >>> land_weston_v2.pdf >>> >>>> >>>> Added LMML to cc. >>> >>> Thanks. >>> >> -- To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Sakari, On 2017-04-26 00:21, Sakari Ailus wrote: > Hi Marek, > > On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote: >> Hi Laurent, >> >> On 2017-04-20 12:25, Laurent Pinchart wrote: >>> Hi Marek, >>> >>> (CC'ing Sakari Ailus) >>> >>> Thank you for the patches. >>> >>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: >>>> Dear all, >>>> >>>> This is an updated proposal for extending EXYNOS DRM API with generic >>>> support for hardware modules, which can be used for processing image data >>> >from the one memory buffer to another. Typical memory-to-memory operations >>>> are: rotation, scaling, colour space conversion or mix of them. This is a >>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer >>>> processors", which has been rejected as "not really needed in the DRM >>>> core": >>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>>> >>>> In this proposal I moved all the code to Exynos DRM driver, so now this >>>> will be specific only to Exynos DRM. I've also changed the name from >>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion >>>> with fbdev API. >>>> >>>> Here is a bit more information what picture processors are: >>>> >>>> Embedded SoCs are known to have a number of hardware blocks, which perform >>>> such operations. They can be used in paralel to the main GPU module to >>>> offload CPU from processing grapics or video data. One of example use of >>>> such modules is implementing video overlay, which usually requires color >>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to >>>> target window size. >>>> >>>> The proposed API is heavily inspired by atomic KMS approach - it is also >>>> based on DRM objects and their properties. A new DRM object is introduced: >>>> picture processor (called pp for convenience). Such objects have a set of >>>> standard DRM properties, which describes the operation to be performed by >>>> respective hardware module. In typical case those properties are a source >>>> fb id and rectangle (x, y, width, height) and destination fb id and >>>> rectangle. Optionally a rotation property can be also specified if >>>> supported by the given hardware. To perform an operation on image data, >>>> userspace provides a set of properties and their values for given fbproc >>>> object in a similar way as object and properties are provided for >>>> performing atomic page flip / mode setting. >>>> >>>> The proposed API consists of the 3 new ioctls: >>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture >>>> processors, >>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture >>>> processor, >>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given >>>> property set. >>>> >>>> The proposed API is extensible. Drivers can attach their own, custom >>>> properties to add support for more advanced picture processing (for example >>>> blending). >>>> >>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing) >>>> subsystem. IPP API is over-engineered in general, but not really extensible >>>> on the other side. It is also buggy, with significant design flaws - the >>>> biggest issue is the fact that the API covers memory-2-memory picture >>>> operations together with CRTC writeback and duplicating features, which >>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is >>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos >>>> rotator driver smaller by over 200 lines). >>> This seems to be the kind of hardware that is typically supported by V4L2. >>> Stupid question, why DRM ? >> Let me elaborate a bit on the reasons for implementing it in Exynos DRM: >> >> 1. we want to replace existing Exynos IPP subsystem: >> - it is used only in some internal/vendor trees, not in open-source >> - we want it to have sane and potentially extensible userspace API >> - but we don't want to loose its functionality >> >> 2. we want to have simple API for performing single image processing >> operation: >> - typically it will be used by compositing window manager, this means that >> some parameters of the processing might change on each vblank (like >> destination rectangle for example). This api allows such change on each >> operation without any additional cost. V4L2 requires to reinitialize >> queues with new configuration on such change, what means that a bunch of >> ioctls has to be called. > What do you mean by re-initialising the queue? Format, buffers or something > else? In case of compositor use case, the parameter that is being changed most frequently is source and/or destination rectangle position and/or size. > If you need a larger buffer than what you have already allocated, you'll > need to re-allocate, V4L2 or not. > > We also do lack a way to destroy individual buffers in V4L2. It'd be up to > implementing that and some work in videobuf2. Well if we would use V4l2, buffers will always come as dmabuf objects. There is a hard limit of the number of buffers that can be imported to v4l2/vb2 queue to get buffer ids. This also limits easy processing of the buffers in the compositor, because you would need to reinitialize the v4l2 queues to get new set of v4l2/vb2 buffer ids. > Another thing is that V4L2 is very stream oriented. For most devices that's > fine as a lot of the parameters are not changeable during streaming, > especially if the pipeline is handled by multiple drivers. That said, for > devices that process data from memory to memory performing changes in the > media bus formats and pipeline configuration is not very efficient > currently, largely for the same reason. > > The request API that people have been working for a bit different use cases > isn't in mainline yet. It would allow more efficient per-request > configuration than what is currently possible, but it has turned out to be > far from trivial to implement. > >> - validating processing parameters in V4l2 API is really complicated, >> because the parameters (format, src&dest rectangles, rotation) are being >> set incrementally, so we have to either allow some impossible, >> transitional >> configurations or complicate the configuration steps even more (like >> calling some ioctls multiple times for both input and output). In the end >> all parameters have to be again validated just before performing the >> operation. > You have to validate the parameters in any case. In a MC pipeline this takes > place when the stream is started. Well, in case of v4l2 one would need to stop and restart 'steaming' on both queues of mem2mem device just to change some transformation parameters... >> 3. generic approach (to add it to DRM core) has been rejected: >> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html > For GPUs I generally understand the reasoning: there's a very limited number > of users of this API --- primarily because it's not an application > interface. > > If you have a device that however falls under the scope of V4L2 (at least > API-wise), does this continue to be the case? Will there be only one or two > (or so) users for this API? Is it the case here? > > Using a device specific interface definitely has some benefits: there's no > need to think how would you generalise the interface for other similar > devices. There's no need to consider backwards compatibility as it's not a > requirement. The drawback is that the applications that need to support > similar devices will bear the burden of having to support different APIs. > > I don't mean to say that you should ram whatever under V4L2 / MC > independently of how unworkable that might be, but there are also clear > advantages in using a standardised interface such as V4L2. > > V4L2 has a long history behind it and if it was designed today, I bet it > would look quite different from what it is now. IMHO V4l2 becomes both a bit over-engineered because of the 'backwards compatibility' and too limited on the other hand to support really complex hardware (iirc none of the top mobile Android HW vendors use V4l2 for their cameras subsystems). This is however a completely separate topic. >> 4. this api can be considered as extended 'blit' operation, other DRM >> drivers >> (MGA, R128, VIA) already have ioctls for such operation, so there is also >> place in DRM for it > Added LMML to cc. > Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland -- To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2017년 04월 26일 07:21에 Sakari Ailus 이(가) 쓴 글: > Hi Marek, > > On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote: >> Hi Laurent, >> >> On 2017-04-20 12:25, Laurent Pinchart wrote: >>> Hi Marek, >>> >>> (CC'ing Sakari Ailus) >>> >>> Thank you for the patches. >>> >>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: >>>> Dear all, >>>> >>>> This is an updated proposal for extending EXYNOS DRM API with generic >>>> support for hardware modules, which can be used for processing image data >>> >from the one memory buffer to another. Typical memory-to-memory operations >>>> are: rotation, scaling, colour space conversion or mix of them. This is a >>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer >>>> processors", which has been rejected as "not really needed in the DRM >>>> core": >>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>>> >>>> In this proposal I moved all the code to Exynos DRM driver, so now this >>>> will be specific only to Exynos DRM. I've also changed the name from >>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion >>>> with fbdev API. >>>> >>>> Here is a bit more information what picture processors are: >>>> >>>> Embedded SoCs are known to have a number of hardware blocks, which perform >>>> such operations. They can be used in paralel to the main GPU module to >>>> offload CPU from processing grapics or video data. One of example use of >>>> such modules is implementing video overlay, which usually requires color >>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to >>>> target window size. >>>> >>>> The proposed API is heavily inspired by atomic KMS approach - it is also >>>> based on DRM objects and their properties. A new DRM object is introduced: >>>> picture processor (called pp for convenience). Such objects have a set of >>>> standard DRM properties, which describes the operation to be performed by >>>> respective hardware module. In typical case those properties are a source >>>> fb id and rectangle (x, y, width, height) and destination fb id and >>>> rectangle. Optionally a rotation property can be also specified if >>>> supported by the given hardware. To perform an operation on image data, >>>> userspace provides a set of properties and their values for given fbproc >>>> object in a similar way as object and properties are provided for >>>> performing atomic page flip / mode setting. >>>> >>>> The proposed API consists of the 3 new ioctls: >>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture >>>> processors, >>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture >>>> processor, >>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given >>>> property set. >>>> >>>> The proposed API is extensible. Drivers can attach their own, custom >>>> properties to add support for more advanced picture processing (for example >>>> blending). >>>> >>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing) >>>> subsystem. IPP API is over-engineered in general, but not really extensible >>>> on the other side. It is also buggy, with significant design flaws - the >>>> biggest issue is the fact that the API covers memory-2-memory picture >>>> operations together with CRTC writeback and duplicating features, which >>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is >>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos >>>> rotator driver smaller by over 200 lines). >>> This seems to be the kind of hardware that is typically supported by V4L2. >>> Stupid question, why DRM ? >> >> Let me elaborate a bit on the reasons for implementing it in Exynos DRM: >> >> 1. we want to replace existing Exynos IPP subsystem: >> - it is used only in some internal/vendor trees, not in open-source >> - we want it to have sane and potentially extensible userspace API >> - but we don't want to loose its functionality >> >> 2. we want to have simple API for performing single image processing >> operation: >> - typically it will be used by compositing window manager, this means that >> some parameters of the processing might change on each vblank (like >> destination rectangle for example). This api allows such change on each >> operation without any additional cost. V4L2 requires to reinitialize >> queues with new configuration on such change, what means that a bunch of >> ioctls has to be called. > > What do you mean by re-initialising the queue? Format, buffers or something > else? > > If you need a larger buffer than what you have already allocated, you'll > need to re-allocate, V4L2 or not. > > We also do lack a way to destroy individual buffers in V4L2. It'd be up to > implementing that and some work in videobuf2. > > Another thing is that V4L2 is very stream oriented. For most devices that's > fine as a lot of the parameters are not changeable during streaming, > especially if the pipeline is handled by multiple drivers. That said, for > devices that process data from memory to memory performing changes in the > media bus formats and pipeline configuration is not very efficient > currently, largely for the same reason. > > The request API that people have been working for a bit different use cases > isn't in mainline yet. It would allow more efficient per-request > configuration than what is currently possible, but it has turned out to be > far from trivial to implement. > >> - validating processing parameters in V4l2 API is really complicated, >> because the parameters (format, src&dest rectangles, rotation) are being >> set incrementally, so we have to either allow some impossible, >> transitional >> configurations or complicate the configuration steps even more (like >> calling some ioctls multiple times for both input and output). In the end >> all parameters have to be again validated just before performing the >> operation. > > You have to validate the parameters in any case. In a MC pipeline this takes > place when the stream is started. > >> >> 3. generic approach (to add it to DRM core) has been rejected: >> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html > > For GPUs I generally understand the reasoning: there's a very limited number > of users of this API --- primarily because it's not an application > interface. > > If you have a device that however falls under the scope of V4L2 (at least > API-wise), does this continue to be the case? Will there be only one or two > (or so) users for this API? Is it the case here? > > Using a device specific interface definitely has some benefits: there's no > need to think how would you generalise the interface for other similar > devices. There's no need to consider backwards compatibility as it's not a > requirement. The drawback is that the applications that need to support > similar devices will bear the burden of having to support different APIs. > > I don't mean to say that you should ram whatever under V4L2 / MC > independently of how unworkable that might be, but there are also clear > advantages in using a standardised interface such as V4L2. > > V4L2 has a long history behind it and if it was designed today, I bet it > would look quite different from what it is now. It's true. There is definitely a benefit with V4L2 because V4L2 provides Linux standard ABI - for DRM as of now not. However, I think that is a only benefit we could get through V4L2. Using V4L2 makes software stack of Platform to be complicated - We have to open video device node and card device node to display a image on the screen scaling or converting color space of the image and also we need to export DMA buffer from one side and import it to other side using DMABUF. It may not related to this but even V4L2 has performance problem - every QBUF/DQBUF requests performs mapping/unmapping DMA buffer you already know this. :) In addition, recently Display subsystems on ARM SoC tend to include pre/post processing hardware in Display controller - OMAP, Exynos8895 and MSM as long as I know. Thanks, Inki Dae > >> >> 4. this api can be considered as extended 'blit' operation, other DRM >> drivers >> (MGA, R128, VIA) already have ioctls for such operation, so there is also >> place in DRM for it > > Added LMML to cc. > -- To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Everyone, On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote: > > > 2017년 04월 26일 07:21에 Sakari Ailus 이(가) 쓴 글: >> Hi Marek, >> >> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote: >>> Hi Laurent, >>> >>> On 2017-04-20 12:25, Laurent Pinchart wrote: >>>> Hi Marek, >>>> >>>> (CC'ing Sakari Ailus) >>>> >>>> Thank you for the patches. >>>> >>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: >>>>> Dear all, >>>>> >>>>> This is an updated proposal for extending EXYNOS DRM API with generic >>>>> support for hardware modules, which can be used for processing image data >>>> >from the one memory buffer to another. Typical memory-to-memory operations >>>>> are: rotation, scaling, colour space conversion or mix of them. This is a >>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer >>>>> processors", which has been rejected as "not really needed in the DRM >>>>> core": >>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>>>> >>>>> In this proposal I moved all the code to Exynos DRM driver, so now this >>>>> will be specific only to Exynos DRM. I've also changed the name from >>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion >>>>> with fbdev API. >>>>> >>>>> Here is a bit more information what picture processors are: >>>>> >>>>> Embedded SoCs are known to have a number of hardware blocks, which perform >>>>> such operations. They can be used in paralel to the main GPU module to >>>>> offload CPU from processing grapics or video data. One of example use of >>>>> such modules is implementing video overlay, which usually requires color >>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to >>>>> target window size. >>>>> >>>>> The proposed API is heavily inspired by atomic KMS approach - it is also >>>>> based on DRM objects and their properties. A new DRM object is introduced: >>>>> picture processor (called pp for convenience). Such objects have a set of >>>>> standard DRM properties, which describes the operation to be performed by >>>>> respective hardware module. In typical case those properties are a source >>>>> fb id and rectangle (x, y, width, height) and destination fb id and >>>>> rectangle. Optionally a rotation property can be also specified if >>>>> supported by the given hardware. To perform an operation on image data, >>>>> userspace provides a set of properties and their values for given fbproc >>>>> object in a similar way as object and properties are provided for >>>>> performing atomic page flip / mode setting. >>>>> >>>>> The proposed API consists of the 3 new ioctls: >>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture >>>>> processors, >>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture >>>>> processor, >>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given >>>>> property set. >>>>> >>>>> The proposed API is extensible. Drivers can attach their own, custom >>>>> properties to add support for more advanced picture processing (for example >>>>> blending). >>>>> >>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing) >>>>> subsystem. IPP API is over-engineered in general, but not really extensible >>>>> on the other side. It is also buggy, with significant design flaws - the >>>>> biggest issue is the fact that the API covers memory-2-memory picture >>>>> operations together with CRTC writeback and duplicating features, which >>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is >>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos >>>>> rotator driver smaller by over 200 lines). >>>> This seems to be the kind of hardware that is typically supported by V4L2. >>>> Stupid question, why DRM ? >>> >>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM: >>> >>> 1. we want to replace existing Exynos IPP subsystem: >>> - it is used only in some internal/vendor trees, not in open-source >>> - we want it to have sane and potentially extensible userspace API >>> - but we don't want to loose its functionality >>> >>> 2. we want to have simple API for performing single image processing >>> operation: >>> - typically it will be used by compositing window manager, this means that >>> some parameters of the processing might change on each vblank (like >>> destination rectangle for example). This api allows such change on each >>> operation without any additional cost. V4L2 requires to reinitialize >>> queues with new configuration on such change, what means that a bunch of >>> ioctls has to be called. >> >> What do you mean by re-initialising the queue? Format, buffers or something >> else? >> >> If you need a larger buffer than what you have already allocated, you'll >> need to re-allocate, V4L2 or not. >> >> We also do lack a way to destroy individual buffers in V4L2. It'd be up to >> implementing that and some work in videobuf2. >> >> Another thing is that V4L2 is very stream oriented. For most devices that's >> fine as a lot of the parameters are not changeable during streaming, >> especially if the pipeline is handled by multiple drivers. That said, for >> devices that process data from memory to memory performing changes in the >> media bus formats and pipeline configuration is not very efficient >> currently, largely for the same reason. >> >> The request API that people have been working for a bit different use cases >> isn't in mainline yet. It would allow more efficient per-request >> configuration than what is currently possible, but it has turned out to be >> far from trivial to implement. >> >>> - validating processing parameters in V4l2 API is really complicated, >>> because the parameters (format, src&dest rectangles, rotation) are being >>> set incrementally, so we have to either allow some impossible, >>> transitional >>> configurations or complicate the configuration steps even more (like >>> calling some ioctls multiple times for both input and output). In the end >>> all parameters have to be again validated just before performing the >>> operation. >> >> You have to validate the parameters in any case. In a MC pipeline this takes >> place when the stream is started. >> >>> >>> 3. generic approach (to add it to DRM core) has been rejected: >>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >> >> For GPUs I generally understand the reasoning: there's a very limited number >> of users of this API --- primarily because it's not an application >> interface. >> >> If you have a device that however falls under the scope of V4L2 (at least >> API-wise), does this continue to be the case? Will there be only one or two >> (or so) users for this API? Is it the case here? >> >> Using a device specific interface definitely has some benefits: there's no >> need to think how would you generalise the interface for other similar >> devices. There's no need to consider backwards compatibility as it's not a >> requirement. The drawback is that the applications that need to support >> similar devices will bear the burden of having to support different APIs. >> >> I don't mean to say that you should ram whatever under V4L2 / MC >> independently of how unworkable that might be, but there are also clear >> advantages in using a standardised interface such as V4L2. >> >> V4L2 has a long history behind it and if it was designed today, I bet it >> would look quite different from what it is now. > > It's true. There is definitely a benefit with V4L2 because V4L2 provides Linux standard ABI - for DRM as of now not. > > However, I think that is a only benefit we could get through V4L2. Using V4L2 makes software stack of Platform to be complicated - We have to open video device node and card device node to display a image on the screen scaling or converting color space of the image and also we need to export DMA buffer from one side and import it to other side using DMABUF. > > It may not related to this but even V4L2 has performance problem - every QBUF/DQBUF requests performs mapping/unmapping DMA buffer you already know this. :) > > In addition, recently Display subsystems on ARM SoC tend to include pre/post processing hardware in Display controller - OMAP, Exynos8895 and MSM as long as I know. > I agree with many of the arguments given by Inki above and earlier by Marek. However, they apply to already existing V4L2 implementation, not V4L2 as the idea in general, and I believe a comparison against a complete new API that doesn't even exist in the kernel tree and userspace yet (only in terms of patches on the list) is not fair. I strongly (if that's of any value) stand on Sakari's side and also agree with DRM maintainers. V4L2 is already there, provides a general interface for the userspace and already support the kind of devices Marek mention. Sure, it might have several issues, but please give me an example of a subsystem/interface/code that doesn't have any. Instead of taking the easy (for short term) path, with a bit more effort we can get something than in long run should end up much better. Best regards, Tomasz > > Thanks, > Inki Dae > >> >>> >>> 4. this api can be considered as extended 'blit' operation, other DRM >>> drivers >>> (MGA, R128, VIA) already have ioctls for such operation, so there is also >>> place in DRM for it >> >> Added LMML to cc. >> -- To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Tomasz, 2017년 05월 10일 14:38에 Tomasz Figa 이(가) 쓴 글: > Hi Everyone, > > On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote: >> >> >> 2017년 04월 26일 07:21에 Sakari Ailus 이(가) 쓴 글: >>> Hi Marek, >>> >>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote: >>>> Hi Laurent, >>>> >>>> On 2017-04-20 12:25, Laurent Pinchart wrote: >>>>> Hi Marek, >>>>> >>>>> (CC'ing Sakari Ailus) >>>>> >>>>> Thank you for the patches. >>>>> >>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: >>>>>> Dear all, >>>>>> >>>>>> This is an updated proposal for extending EXYNOS DRM API with generic >>>>>> support for hardware modules, which can be used for processing image data >>>>> >from the one memory buffer to another. Typical memory-to-memory operations >>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a >>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer >>>>>> processors", which has been rejected as "not really needed in the DRM >>>>>> core": >>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>>>>> >>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this >>>>>> will be specific only to Exynos DRM. I've also changed the name from >>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion >>>>>> with fbdev API. >>>>>> >>>>>> Here is a bit more information what picture processors are: >>>>>> >>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform >>>>>> such operations. They can be used in paralel to the main GPU module to >>>>>> offload CPU from processing grapics or video data. One of example use of >>>>>> such modules is implementing video overlay, which usually requires color >>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to >>>>>> target window size. >>>>>> >>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also >>>>>> based on DRM objects and their properties. A new DRM object is introduced: >>>>>> picture processor (called pp for convenience). Such objects have a set of >>>>>> standard DRM properties, which describes the operation to be performed by >>>>>> respective hardware module. In typical case those properties are a source >>>>>> fb id and rectangle (x, y, width, height) and destination fb id and >>>>>> rectangle. Optionally a rotation property can be also specified if >>>>>> supported by the given hardware. To perform an operation on image data, >>>>>> userspace provides a set of properties and their values for given fbproc >>>>>> object in a similar way as object and properties are provided for >>>>>> performing atomic page flip / mode setting. >>>>>> >>>>>> The proposed API consists of the 3 new ioctls: >>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture >>>>>> processors, >>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture >>>>>> processor, >>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given >>>>>> property set. >>>>>> >>>>>> The proposed API is extensible. Drivers can attach their own, custom >>>>>> properties to add support for more advanced picture processing (for example >>>>>> blending). >>>>>> >>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing) >>>>>> subsystem. IPP API is over-engineered in general, but not really extensible >>>>>> on the other side. It is also buggy, with significant design flaws - the >>>>>> biggest issue is the fact that the API covers memory-2-memory picture >>>>>> operations together with CRTC writeback and duplicating features, which >>>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is >>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos >>>>>> rotator driver smaller by over 200 lines). >>>>> This seems to be the kind of hardware that is typically supported by V4L2. >>>>> Stupid question, why DRM ? >>>> >>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM: >>>> >>>> 1. we want to replace existing Exynos IPP subsystem: >>>> - it is used only in some internal/vendor trees, not in open-source >>>> - we want it to have sane and potentially extensible userspace API >>>> - but we don't want to loose its functionality >>>> >>>> 2. we want to have simple API for performing single image processing >>>> operation: >>>> - typically it will be used by compositing window manager, this means that >>>> some parameters of the processing might change on each vblank (like >>>> destination rectangle for example). This api allows such change on each >>>> operation without any additional cost. V4L2 requires to reinitialize >>>> queues with new configuration on such change, what means that a bunch of >>>> ioctls has to be called. >>> >>> What do you mean by re-initialising the queue? Format, buffers or something >>> else? >>> >>> If you need a larger buffer than what you have already allocated, you'll >>> need to re-allocate, V4L2 or not. >>> >>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to >>> implementing that and some work in videobuf2. >>> >>> Another thing is that V4L2 is very stream oriented. For most devices that's >>> fine as a lot of the parameters are not changeable during streaming, >>> especially if the pipeline is handled by multiple drivers. That said, for >>> devices that process data from memory to memory performing changes in the >>> media bus formats and pipeline configuration is not very efficient >>> currently, largely for the same reason. >>> >>> The request API that people have been working for a bit different use cases >>> isn't in mainline yet. It would allow more efficient per-request >>> configuration than what is currently possible, but it has turned out to be >>> far from trivial to implement. >>> >>>> - validating processing parameters in V4l2 API is really complicated, >>>> because the parameters (format, src&dest rectangles, rotation) are being >>>> set incrementally, so we have to either allow some impossible, >>>> transitional >>>> configurations or complicate the configuration steps even more (like >>>> calling some ioctls multiple times for both input and output). In the end >>>> all parameters have to be again validated just before performing the >>>> operation. >>> >>> You have to validate the parameters in any case. In a MC pipeline this takes >>> place when the stream is started. >>> >>>> >>>> 3. generic approach (to add it to DRM core) has been rejected: >>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>> >>> For GPUs I generally understand the reasoning: there's a very limited number >>> of users of this API --- primarily because it's not an application >>> interface. >>> >>> If you have a device that however falls under the scope of V4L2 (at least >>> API-wise), does this continue to be the case? Will there be only one or two >>> (or so) users for this API? Is it the case here? >>> >>> Using a device specific interface definitely has some benefits: there's no >>> need to think how would you generalise the interface for other similar >>> devices. There's no need to consider backwards compatibility as it's not a >>> requirement. The drawback is that the applications that need to support >>> similar devices will bear the burden of having to support different APIs. >>> >>> I don't mean to say that you should ram whatever under V4L2 / MC >>> independently of how unworkable that might be, but there are also clear >>> advantages in using a standardised interface such as V4L2. >>> >>> V4L2 has a long history behind it and if it was designed today, I bet it >>> would look quite different from what it is now. >> >> It's true. There is definitely a benefit with V4L2 because V4L2 provides Linux standard ABI - for DRM as of now not. >> >> However, I think that is a only benefit we could get through V4L2. Using V4L2 makes software stack of Platform to be complicated - We have to open video device node and card device node to display a image on the screen scaling or converting color space of the image and also we need to export DMA buffer from one side and import it to other side using DMABUF. >> >> It may not related to this but even V4L2 has performance problem - every QBUF/DQBUF requests performs mapping/unmapping DMA buffer you already know this. :) >> >> In addition, recently Display subsystems on ARM SoC tend to include pre/post processing hardware in Display controller - OMAP, Exynos8895 and MSM as long as I know. >> > > I agree with many of the arguments given by Inki above and earlier by > Marek. However, they apply to already existing V4L2 implementation, > not V4L2 as the idea in general, and I believe a comparison against a > complete new API that doesn't even exist in the kernel tree and > userspace yet (only in terms of patches on the list) is not fair. Below is a user space who uses Exynos DRM post processor driver, IPP driver. https://review.tizen.org/git/?p=platform/adaptation/samsung_exynos/libtdm-exynos.git;a=blob;f=src/tdm_exynos_pp.c;h=db20e6f226d313672d1d468e06d80526ea30121c;hb=refs/heads/tizen Marek patch series is just a new version of this driver which is specific to Exynos DRM. Marek is trying to enhance this driver. Ps. other DRM drivers in mainline already have such or similar API. We will also open the user space who uses new API later. Thanks, Inki Dae > > I strongly (if that's of any value) stand on Sakari's side and also > agree with DRM maintainers. V4L2 is already there, provides a general > interface for the userspace and already support the kind of devices > Marek mention. Sure, it might have several issues, but please give me > an example of a subsystem/interface/code that doesn't have any. > Instead of taking the easy (for short term) path, with a bit more > effort we can get something than in long run should end up much > better. > > Best regards, > Tomasz > >> >> Thanks, >> Inki Dae >> >>> >>>> >>>> 4. this api can be considered as extended 'blit' operation, other DRM >>>> drivers >>>> (MGA, R128, VIA) already have ioctls for such operation, so there is also >>>> place in DRM for it >>> >>> Added LMML to cc. >>> > -- > To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 10, 2017 at 2:27 PM, Inki Dae <inki.dae@samsung.com> wrote: > Hi Tomasz, > > 2017년 05월 10일 14:38에 Tomasz Figa 이(가) 쓴 글: >> Hi Everyone, >> >> On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote: >>> >>> >>> 2017년 04월 26일 07:21에 Sakari Ailus 이(가) 쓴 글: >>>> Hi Marek, >>>> >>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote: >>>>> Hi Laurent, >>>>> >>>>> On 2017-04-20 12:25, Laurent Pinchart wrote: >>>>>> Hi Marek, >>>>>> >>>>>> (CC'ing Sakari Ailus) >>>>>> >>>>>> Thank you for the patches. >>>>>> >>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: >>>>>>> Dear all, >>>>>>> >>>>>>> This is an updated proposal for extending EXYNOS DRM API with generic >>>>>>> support for hardware modules, which can be used for processing image data >>>>>> >from the one memory buffer to another. Typical memory-to-memory operations >>>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a >>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer >>>>>>> processors", which has been rejected as "not really needed in the DRM >>>>>>> core": >>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>>>>>> >>>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this >>>>>>> will be specific only to Exynos DRM. I've also changed the name from >>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion >>>>>>> with fbdev API. >>>>>>> >>>>>>> Here is a bit more information what picture processors are: >>>>>>> >>>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform >>>>>>> such operations. They can be used in paralel to the main GPU module to >>>>>>> offload CPU from processing grapics or video data. One of example use of >>>>>>> such modules is implementing video overlay, which usually requires color >>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to >>>>>>> target window size. >>>>>>> >>>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also >>>>>>> based on DRM objects and their properties. A new DRM object is introduced: >>>>>>> picture processor (called pp for convenience). Such objects have a set of >>>>>>> standard DRM properties, which describes the operation to be performed by >>>>>>> respective hardware module. In typical case those properties are a source >>>>>>> fb id and rectangle (x, y, width, height) and destination fb id and >>>>>>> rectangle. Optionally a rotation property can be also specified if >>>>>>> supported by the given hardware. To perform an operation on image data, >>>>>>> userspace provides a set of properties and their values for given fbproc >>>>>>> object in a similar way as object and properties are provided for >>>>>>> performing atomic page flip / mode setting. >>>>>>> >>>>>>> The proposed API consists of the 3 new ioctls: >>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture >>>>>>> processors, >>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture >>>>>>> processor, >>>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given >>>>>>> property set. >>>>>>> >>>>>>> The proposed API is extensible. Drivers can attach their own, custom >>>>>>> properties to add support for more advanced picture processing (for example >>>>>>> blending). >>>>>>> >>>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing) >>>>>>> subsystem. IPP API is over-engineered in general, but not really extensible >>>>>>> on the other side. It is also buggy, with significant design flaws - the >>>>>>> biggest issue is the fact that the API covers memory-2-memory picture >>>>>>> operations together with CRTC writeback and duplicating features, which >>>>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is >>>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos >>>>>>> rotator driver smaller by over 200 lines). >>>>>> This seems to be the kind of hardware that is typically supported by V4L2. >>>>>> Stupid question, why DRM ? >>>>> >>>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM: >>>>> >>>>> 1. we want to replace existing Exynos IPP subsystem: >>>>> - it is used only in some internal/vendor trees, not in open-source >>>>> - we want it to have sane and potentially extensible userspace API >>>>> - but we don't want to loose its functionality >>>>> >>>>> 2. we want to have simple API for performing single image processing >>>>> operation: >>>>> - typically it will be used by compositing window manager, this means that >>>>> some parameters of the processing might change on each vblank (like >>>>> destination rectangle for example). This api allows such change on each >>>>> operation without any additional cost. V4L2 requires to reinitialize >>>>> queues with new configuration on such change, what means that a bunch of >>>>> ioctls has to be called. >>>> >>>> What do you mean by re-initialising the queue? Format, buffers or something >>>> else? >>>> >>>> If you need a larger buffer than what you have already allocated, you'll >>>> need to re-allocate, V4L2 or not. >>>> >>>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to >>>> implementing that and some work in videobuf2. >>>> >>>> Another thing is that V4L2 is very stream oriented. For most devices that's >>>> fine as a lot of the parameters are not changeable during streaming, >>>> especially if the pipeline is handled by multiple drivers. That said, for >>>> devices that process data from memory to memory performing changes in the >>>> media bus formats and pipeline configuration is not very efficient >>>> currently, largely for the same reason. >>>> >>>> The request API that people have been working for a bit different use cases >>>> isn't in mainline yet. It would allow more efficient per-request >>>> configuration than what is currently possible, but it has turned out to be >>>> far from trivial to implement. >>>> >>>>> - validating processing parameters in V4l2 API is really complicated, >>>>> because the parameters (format, src&dest rectangles, rotation) are being >>>>> set incrementally, so we have to either allow some impossible, >>>>> transitional >>>>> configurations or complicate the configuration steps even more (like >>>>> calling some ioctls multiple times for both input and output). In the end >>>>> all parameters have to be again validated just before performing the >>>>> operation. >>>> >>>> You have to validate the parameters in any case. In a MC pipeline this takes >>>> place when the stream is started. >>>> >>>>> >>>>> 3. generic approach (to add it to DRM core) has been rejected: >>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>>> >>>> For GPUs I generally understand the reasoning: there's a very limited number >>>> of users of this API --- primarily because it's not an application >>>> interface. >>>> >>>> If you have a device that however falls under the scope of V4L2 (at least >>>> API-wise), does this continue to be the case? Will there be only one or two >>>> (or so) users for this API? Is it the case here? >>>> >>>> Using a device specific interface definitely has some benefits: there's no >>>> need to think how would you generalise the interface for other similar >>>> devices. There's no need to consider backwards compatibility as it's not a >>>> requirement. The drawback is that the applications that need to support >>>> similar devices will bear the burden of having to support different APIs. >>>> >>>> I don't mean to say that you should ram whatever under V4L2 / MC >>>> independently of how unworkable that might be, but there are also clear >>>> advantages in using a standardised interface such as V4L2. >>>> >>>> V4L2 has a long history behind it and if it was designed today, I bet it >>>> would look quite different from what it is now. >>> >>> It's true. There is definitely a benefit with V4L2 because V4L2 provides Linux standard ABI - for DRM as of now not. >>> >>> However, I think that is a only benefit we could get through V4L2. Using V4L2 makes software stack of Platform to be complicated - We have to open video device node and card device node to display a image on the screen scaling or converting color space of the image and also we need to export DMA buffer from one side and import it to other side using DMABUF. >>> >>> It may not related to this but even V4L2 has performance problem - every QBUF/DQBUF requests performs mapping/unmapping DMA buffer you already know this. :) >>> >>> In addition, recently Display subsystems on ARM SoC tend to include pre/post processing hardware in Display controller - OMAP, Exynos8895 and MSM as long as I know. >>> >> >> I agree with many of the arguments given by Inki above and earlier by >> Marek. However, they apply to already existing V4L2 implementation, >> not V4L2 as the idea in general, and I believe a comparison against a >> complete new API that doesn't even exist in the kernel tree and >> userspace yet (only in terms of patches on the list) is not fair. > > Below is a user space who uses Exynos DRM post processor driver, IPP driver. > https://review.tizen.org/git/?p=platform/adaptation/samsung_exynos/libtdm-exynos.git;a=blob;f=src/tdm_exynos_pp.c;h=db20e6f226d313672d1d468e06d80526ea30121c;hb=refs/heads/tizen > Right, but the API is really Exynos-specific, while V4L2 is designed from the start as a generic one and maintained as such. > Marek patch series is just a new version of this driver which is specific to Exynos DRM. Marek is trying to enhance this driver. > Ps. other DRM drivers in mainline already have such or similar API. This kind of contradicts with response Marek received from DRM community about his proposal. Which drivers in particular you have in mind? > > We will also open the user space who uses new API later. There is also already user space which uses V4L2 for this and V4L2 drivers for hardware similar to the one targeted by Marek's proposal, including GStreamer support and iMX6 devices that Nicolas mentioned before. Best regards, Tomasz > > > Thanks, > Inki Dae > >> >> I strongly (if that's of any value) stand on Sakari's side and also >> agree with DRM maintainers. V4L2 is already there, provides a general >> interface for the userspace and already support the kind of devices >> Marek mention. Sure, it might have several issues, but please give me >> an example of a subsystem/interface/code that doesn't have any. >> Instead of taking the easy (for short term) path, with a bit more >> effort we can get something than in long run should end up much >> better. >> >> Best regards, >> Tomasz >> >>> >>> Thanks, >>> Inki Dae >>> >>>> >>>>> >>>>> 4. this api can be considered as extended 'blit' operation, other DRM >>>>> drivers >>>>> (MGA, R128, VIA) already have ioctls for such operation, so there is also >>>>> place in DRM for it >>>> >>>> Added LMML to cc. >>>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> -- To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 10, 2017 at 03:27:02PM +0900, Inki Dae wrote: > Hi Tomasz, > > 2017년 05월 10일 14:38에 Tomasz Figa 이(가) 쓴 글: > > Hi Everyone, > > > > On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote: > >> > >> > >> 2017년 04월 26일 07:21에 Sakari Ailus 이(가) 쓴 글: > >>> Hi Marek, > >>> > >>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote: > >>>> Hi Laurent, > >>>> > >>>> On 2017-04-20 12:25, Laurent Pinchart wrote: > >>>>> Hi Marek, > >>>>> > >>>>> (CC'ing Sakari Ailus) > >>>>> > >>>>> Thank you for the patches. > >>>>> > >>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: > >>>>>> Dear all, > >>>>>> > >>>>>> This is an updated proposal for extending EXYNOS DRM API with generic > >>>>>> support for hardware modules, which can be used for processing image data > >>>>> >from the one memory buffer to another. Typical memory-to-memory operations > >>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a > >>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer > >>>>>> processors", which has been rejected as "not really needed in the DRM > >>>>>> core": > >>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html > >>>>>> > >>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this > >>>>>> will be specific only to Exynos DRM. I've also changed the name from > >>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion > >>>>>> with fbdev API. > >>>>>> > >>>>>> Here is a bit more information what picture processors are: > >>>>>> > >>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform > >>>>>> such operations. They can be used in paralel to the main GPU module to > >>>>>> offload CPU from processing grapics or video data. One of example use of > >>>>>> such modules is implementing video overlay, which usually requires color > >>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to > >>>>>> target window size. > >>>>>> > >>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also > >>>>>> based on DRM objects and their properties. A new DRM object is introduced: > >>>>>> picture processor (called pp for convenience). Such objects have a set of > >>>>>> standard DRM properties, which describes the operation to be performed by > >>>>>> respective hardware module. In typical case those properties are a source > >>>>>> fb id and rectangle (x, y, width, height) and destination fb id and > >>>>>> rectangle. Optionally a rotation property can be also specified if > >>>>>> supported by the given hardware. To perform an operation on image data, > >>>>>> userspace provides a set of properties and their values for given fbproc > >>>>>> object in a similar way as object and properties are provided for > >>>>>> performing atomic page flip / mode setting. > >>>>>> > >>>>>> The proposed API consists of the 3 new ioctls: > >>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture > >>>>>> processors, > >>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture > >>>>>> processor, > >>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given > >>>>>> property set. > >>>>>> > >>>>>> The proposed API is extensible. Drivers can attach their own, custom > >>>>>> properties to add support for more advanced picture processing (for example > >>>>>> blending). > >>>>>> > >>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing) > >>>>>> subsystem. IPP API is over-engineered in general, but not really extensible > >>>>>> on the other side. It is also buggy, with significant design flaws - the > >>>>>> biggest issue is the fact that the API covers memory-2-memory picture > >>>>>> operations together with CRTC writeback and duplicating features, which > >>>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is > >>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos > >>>>>> rotator driver smaller by over 200 lines). > >>>>> This seems to be the kind of hardware that is typically supported by V4L2. > >>>>> Stupid question, why DRM ? > >>>> > >>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM: > >>>> > >>>> 1. we want to replace existing Exynos IPP subsystem: > >>>> - it is used only in some internal/vendor trees, not in open-source > >>>> - we want it to have sane and potentially extensible userspace API > >>>> - but we don't want to loose its functionality > >>>> > >>>> 2. we want to have simple API for performing single image processing > >>>> operation: > >>>> - typically it will be used by compositing window manager, this means that > >>>> some parameters of the processing might change on each vblank (like > >>>> destination rectangle for example). This api allows such change on each > >>>> operation without any additional cost. V4L2 requires to reinitialize > >>>> queues with new configuration on such change, what means that a bunch of > >>>> ioctls has to be called. > >>> > >>> What do you mean by re-initialising the queue? Format, buffers or something > >>> else? > >>> > >>> If you need a larger buffer than what you have already allocated, you'll > >>> need to re-allocate, V4L2 or not. > >>> > >>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to > >>> implementing that and some work in videobuf2. > >>> > >>> Another thing is that V4L2 is very stream oriented. For most devices that's > >>> fine as a lot of the parameters are not changeable during streaming, > >>> especially if the pipeline is handled by multiple drivers. That said, for > >>> devices that process data from memory to memory performing changes in the > >>> media bus formats and pipeline configuration is not very efficient > >>> currently, largely for the same reason. > >>> > >>> The request API that people have been working for a bit different use cases > >>> isn't in mainline yet. It would allow more efficient per-request > >>> configuration than what is currently possible, but it has turned out to be > >>> far from trivial to implement. > >>> > >>>> - validating processing parameters in V4l2 API is really complicated, > >>>> because the parameters (format, src&dest rectangles, rotation) are being > >>>> set incrementally, so we have to either allow some impossible, > >>>> transitional > >>>> configurations or complicate the configuration steps even more (like > >>>> calling some ioctls multiple times for both input and output). In the end > >>>> all parameters have to be again validated just before performing the > >>>> operation. > >>> > >>> You have to validate the parameters in any case. In a MC pipeline this takes > >>> place when the stream is started. > >>> > >>>> > >>>> 3. generic approach (to add it to DRM core) has been rejected: > >>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html > >>> > >>> For GPUs I generally understand the reasoning: there's a very limited number > >>> of users of this API --- primarily because it's not an application > >>> interface. > >>> > >>> If you have a device that however falls under the scope of V4L2 (at least > >>> API-wise), does this continue to be the case? Will there be only one or two > >>> (or so) users for this API? Is it the case here? > >>> > >>> Using a device specific interface definitely has some benefits: there's no > >>> need to think how would you generalise the interface for other similar > >>> devices. There's no need to consider backwards compatibility as it's not a > >>> requirement. The drawback is that the applications that need to support > >>> similar devices will bear the burden of having to support different APIs. > >>> > >>> I don't mean to say that you should ram whatever under V4L2 / MC > >>> independently of how unworkable that might be, but there are also clear > >>> advantages in using a standardised interface such as V4L2. > >>> > >>> V4L2 has a long history behind it and if it was designed today, I bet it > >>> would look quite different from what it is now. > >> > >> It's true. There is definitely a benefit with V4L2 because V4L2 provides Linux standard ABI - for DRM as of now not. > >> > >> However, I think that is a only benefit we could get through V4L2. Using V4L2 makes software stack of Platform to be complicated - We have to open video device node and card device node to display a image on the screen scaling or converting color space of the image and also we need to export DMA buffer from one side and import it to other side using DMABUF. > >> > >> It may not related to this but even V4L2 has performance problem - every QBUF/DQBUF requests performs mapping/unmapping DMA buffer you already know this. :) > >> > >> In addition, recently Display subsystems on ARM SoC tend to include pre/post processing hardware in Display controller - OMAP, Exynos8895 and MSM as long as I know. > >> > > > > I agree with many of the arguments given by Inki above and earlier by > > Marek. However, they apply to already existing V4L2 implementation, > > not V4L2 as the idea in general, and I believe a comparison against a > > complete new API that doesn't even exist in the kernel tree and > > userspace yet (only in terms of patches on the list) is not fair. > > Below is a user space who uses Exynos DRM post processor driver, IPP driver. > https://review.tizen.org/git/?p=platform/adaptation/samsung_exynos/libtdm-exynos.git;a=blob;f=src/tdm_exynos_pp.c;h=db20e6f226d313672d1d468e06d80526ea30121c;hb=refs/heads/tizen > > Marek patch series is just a new version of this driver which is specific to Exynos DRM. Marek is trying to enhance this driver. > Ps. other DRM drivers in mainline already have such or similar API. > > We will also open the user space who uses new API later. Those drivers are different, because they just expose a hw-specific abi. Like the current IPP interfaces exposed by drm/exynos. I think you have 2 options: - Extend the current IPP interface in drm/exynos with whatever new pixel processor modes you want. Of course this still means you need to have the userspace side open-source, but otherwise it's all private to exynos hardware and software. - If you want something standardized otoh, go with v4l2. And the issues you point out in v4l2 aren't uapi issues, but implementation details of the current vbuf helpers, which can be fixed. At least that's my understanding. And it should be fairly easy to fix that, simply switch from doing a map/unmap for every q/deqbuf to caching the mappings and use the stream dma-api interfaces to only do the flush (if needed at all, should turn into a no-op) on q/deqbuf. Trying to come up with a generic drm api has imo not much chance of getting accepted anytime soon (since for the simple pixel processor pipeline it's just duplicating v4l, and for something more generic/faster a generic interfaces is alwas too slow). -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch -- To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2017년 05월 10일 15:38에 Tomasz Figa 이(가) 쓴 글: > On Wed, May 10, 2017 at 2:27 PM, Inki Dae <inki.dae@samsung.com> wrote: >> Hi Tomasz, >> >> 2017년 05월 10일 14:38에 Tomasz Figa 이(가) 쓴 글: >>> Hi Everyone, >>> >>> On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote: >>>> >>>> >>>> 2017년 04월 26일 07:21에 Sakari Ailus 이(가) 쓴 글: >>>>> Hi Marek, >>>>> >>>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote: >>>>>> Hi Laurent, >>>>>> >>>>>> On 2017-04-20 12:25, Laurent Pinchart wrote: >>>>>>> Hi Marek, >>>>>>> >>>>>>> (CC'ing Sakari Ailus) >>>>>>> >>>>>>> Thank you for the patches. >>>>>>> >>>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: >>>>>>>> Dear all, >>>>>>>> >>>>>>>> This is an updated proposal for extending EXYNOS DRM API with generic >>>>>>>> support for hardware modules, which can be used for processing image data >>>>>>> >from the one memory buffer to another. Typical memory-to-memory operations >>>>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a >>>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer >>>>>>>> processors", which has been rejected as "not really needed in the DRM >>>>>>>> core": >>>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>>>>>>> >>>>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this >>>>>>>> will be specific only to Exynos DRM. I've also changed the name from >>>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion >>>>>>>> with fbdev API. >>>>>>>> >>>>>>>> Here is a bit more information what picture processors are: >>>>>>>> >>>>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform >>>>>>>> such operations. They can be used in paralel to the main GPU module to >>>>>>>> offload CPU from processing grapics or video data. One of example use of >>>>>>>> such modules is implementing video overlay, which usually requires color >>>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to >>>>>>>> target window size. >>>>>>>> >>>>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also >>>>>>>> based on DRM objects and their properties. A new DRM object is introduced: >>>>>>>> picture processor (called pp for convenience). Such objects have a set of >>>>>>>> standard DRM properties, which describes the operation to be performed by >>>>>>>> respective hardware module. In typical case those properties are a source >>>>>>>> fb id and rectangle (x, y, width, height) and destination fb id and >>>>>>>> rectangle. Optionally a rotation property can be also specified if >>>>>>>> supported by the given hardware. To perform an operation on image data, >>>>>>>> userspace provides a set of properties and their values for given fbproc >>>>>>>> object in a similar way as object and properties are provided for >>>>>>>> performing atomic page flip / mode setting. >>>>>>>> >>>>>>>> The proposed API consists of the 3 new ioctls: >>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture >>>>>>>> processors, >>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture >>>>>>>> processor, >>>>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given >>>>>>>> property set. >>>>>>>> >>>>>>>> The proposed API is extensible. Drivers can attach their own, custom >>>>>>>> properties to add support for more advanced picture processing (for example >>>>>>>> blending). >>>>>>>> >>>>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing) >>>>>>>> subsystem. IPP API is over-engineered in general, but not really extensible >>>>>>>> on the other side. It is also buggy, with significant design flaws - the >>>>>>>> biggest issue is the fact that the API covers memory-2-memory picture >>>>>>>> operations together with CRTC writeback and duplicating features, which >>>>>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is >>>>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos >>>>>>>> rotator driver smaller by over 200 lines). >>>>>>> This seems to be the kind of hardware that is typically supported by V4L2. >>>>>>> Stupid question, why DRM ? >>>>>> >>>>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM: >>>>>> >>>>>> 1. we want to replace existing Exynos IPP subsystem: >>>>>> - it is used only in some internal/vendor trees, not in open-source >>>>>> - we want it to have sane and potentially extensible userspace API >>>>>> - but we don't want to loose its functionality >>>>>> >>>>>> 2. we want to have simple API for performing single image processing >>>>>> operation: >>>>>> - typically it will be used by compositing window manager, this means that >>>>>> some parameters of the processing might change on each vblank (like >>>>>> destination rectangle for example). This api allows such change on each >>>>>> operation without any additional cost. V4L2 requires to reinitialize >>>>>> queues with new configuration on such change, what means that a bunch of >>>>>> ioctls has to be called. >>>>> >>>>> What do you mean by re-initialising the queue? Format, buffers or something >>>>> else? >>>>> >>>>> If you need a larger buffer than what you have already allocated, you'll >>>>> need to re-allocate, V4L2 or not. >>>>> >>>>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to >>>>> implementing that and some work in videobuf2. >>>>> >>>>> Another thing is that V4L2 is very stream oriented. For most devices that's >>>>> fine as a lot of the parameters are not changeable during streaming, >>>>> especially if the pipeline is handled by multiple drivers. That said, for >>>>> devices that process data from memory to memory performing changes in the >>>>> media bus formats and pipeline configuration is not very efficient >>>>> currently, largely for the same reason. >>>>> >>>>> The request API that people have been working for a bit different use cases >>>>> isn't in mainline yet. It would allow more efficient per-request >>>>> configuration than what is currently possible, but it has turned out to be >>>>> far from trivial to implement. >>>>> >>>>>> - validating processing parameters in V4l2 API is really complicated, >>>>>> because the parameters (format, src&dest rectangles, rotation) are being >>>>>> set incrementally, so we have to either allow some impossible, >>>>>> transitional >>>>>> configurations or complicate the configuration steps even more (like >>>>>> calling some ioctls multiple times for both input and output). In the end >>>>>> all parameters have to be again validated just before performing the >>>>>> operation. >>>>> >>>>> You have to validate the parameters in any case. In a MC pipeline this takes >>>>> place when the stream is started. >>>>> >>>>>> >>>>>> 3. generic approach (to add it to DRM core) has been rejected: >>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>>>> >>>>> For GPUs I generally understand the reasoning: there's a very limited number >>>>> of users of this API --- primarily because it's not an application >>>>> interface. >>>>> >>>>> If you have a device that however falls under the scope of V4L2 (at least >>>>> API-wise), does this continue to be the case? Will there be only one or two >>>>> (or so) users for this API? Is it the case here? >>>>> >>>>> Using a device specific interface definitely has some benefits: there's no >>>>> need to think how would you generalise the interface for other similar >>>>> devices. There's no need to consider backwards compatibility as it's not a >>>>> requirement. The drawback is that the applications that need to support >>>>> similar devices will bear the burden of having to support different APIs. >>>>> >>>>> I don't mean to say that you should ram whatever under V4L2 / MC >>>>> independently of how unworkable that might be, but there are also clear >>>>> advantages in using a standardised interface such as V4L2. >>>>> >>>>> V4L2 has a long history behind it and if it was designed today, I bet it >>>>> would look quite different from what it is now. >>>> >>>> It's true. There is definitely a benefit with V4L2 because V4L2 provides Linux standard ABI - for DRM as of now not. >>>> >>>> However, I think that is a only benefit we could get through V4L2. Using V4L2 makes software stack of Platform to be complicated - We have to open video device node and card device node to display a image on the screen scaling or converting color space of the image and also we need to export DMA buffer from one side and import it to other side using DMABUF. >>>> >>>> It may not related to this but even V4L2 has performance problem - every QBUF/DQBUF requests performs mapping/unmapping DMA buffer you already know this. :) >>>> >>>> In addition, recently Display subsystems on ARM SoC tend to include pre/post processing hardware in Display controller - OMAP, Exynos8895 and MSM as long as I know. >>>> >>> >>> I agree with many of the arguments given by Inki above and earlier by >>> Marek. However, they apply to already existing V4L2 implementation, >>> not V4L2 as the idea in general, and I believe a comparison against a >>> complete new API that doesn't even exist in the kernel tree and >>> userspace yet (only in terms of patches on the list) is not fair. >> >> Below is a user space who uses Exynos DRM post processor driver, IPP driver. >> https://review.tizen.org/git/?p=platform/adaptation/samsung_exynos/libtdm-exynos.git;a=blob;f=src/tdm_exynos_pp.c;h=db20e6f226d313672d1d468e06d80526ea30121c;hb=refs/heads/tizen >> > > Right, but the API is really Exynos-specific, while V4L2 is designed > from the start as a generic one and maintained as such. As I mentioned before, I think this is only benefit we could get through V4L2. > >> Marek patch series is just a new version of this driver which is specific to Exynos DRM. Marek is trying to enhance this driver. >> Ps. other DRM drivers in mainline already have such or similar API. > > This kind of contradicts with response Marek received from DRM > community about his proposal. Which drivers in particular you have in > mind? You can check vmw_overlay_ioctl of vmwgfx driver and intel_overlay_put_image_ioctl of i915 driver. These was all I could find in mainline. Seems the boundaries of whether we have to implement pre/post post processing mem2mem driver in V4L2 or DRM are really vague. Thanks, Inki Dae > >> >> We will also open the user space who uses new API later. > > There is also already user space which uses V4L2 for this and V4L2 > drivers for hardware similar to the one targeted by Marek's proposal, > including GStreamer support and iMX6 devices that Nicolas mentioned > before. > > Best regards, > Tomasz > >> >> >> Thanks, >> Inki Dae >> >>> >>> I strongly (if that's of any value) stand on Sakari's side and also >>> agree with DRM maintainers. V4L2 is already there, provides a general >>> interface for the userspace and already support the kind of devices >>> Marek mention. Sure, it might have several issues, but please give me >>> an example of a subsystem/interface/code that doesn't have any. >>> Instead of taking the easy (for short term) path, with a bit more >>> effort we can get something than in long run should end up much >>> better. >>> >>> Best regards, >>> Tomasz >>> >>>> >>>> Thanks, >>>> Inki Dae >>>> >>>>> >>>>>> >>>>>> 4. this api can be considered as extended 'blit' operation, other DRM >>>>>> drivers >>>>>> (MGA, R128, VIA) already have ioctls for such operation, so there is also >>>>>> place in DRM for it >>>>> >>>>> Added LMML to cc. >>>>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> >>> > > > -- To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2017년 05월 10일 16:55에 Daniel Vetter 이(가) 쓴 글: > On Wed, May 10, 2017 at 03:27:02PM +0900, Inki Dae wrote: >> Hi Tomasz, >> >> 2017년 05월 10일 14:38에 Tomasz Figa 이(가) 쓴 글: >>> Hi Everyone, >>> >>> On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote: >>>> >>>> >>>> 2017년 04월 26일 07:21에 Sakari Ailus 이(가) 쓴 글: >>>>> Hi Marek, >>>>> >>>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote: >>>>>> Hi Laurent, >>>>>> >>>>>> On 2017-04-20 12:25, Laurent Pinchart wrote: >>>>>>> Hi Marek, >>>>>>> >>>>>>> (CC'ing Sakari Ailus) >>>>>>> >>>>>>> Thank you for the patches. >>>>>>> >>>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: >>>>>>>> Dear all, >>>>>>>> >>>>>>>> This is an updated proposal for extending EXYNOS DRM API with generic >>>>>>>> support for hardware modules, which can be used for processing image data >>>>>>> >from the one memory buffer to another. Typical memory-to-memory operations >>>>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a >>>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer >>>>>>>> processors", which has been rejected as "not really needed in the DRM >>>>>>>> core": >>>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>>>>>>> >>>>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this >>>>>>>> will be specific only to Exynos DRM. I've also changed the name from >>>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion >>>>>>>> with fbdev API. >>>>>>>> >>>>>>>> Here is a bit more information what picture processors are: >>>>>>>> >>>>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform >>>>>>>> such operations. They can be used in paralel to the main GPU module to >>>>>>>> offload CPU from processing grapics or video data. One of example use of >>>>>>>> such modules is implementing video overlay, which usually requires color >>>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to >>>>>>>> target window size. >>>>>>>> >>>>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also >>>>>>>> based on DRM objects and their properties. A new DRM object is introduced: >>>>>>>> picture processor (called pp for convenience). Such objects have a set of >>>>>>>> standard DRM properties, which describes the operation to be performed by >>>>>>>> respective hardware module. In typical case those properties are a source >>>>>>>> fb id and rectangle (x, y, width, height) and destination fb id and >>>>>>>> rectangle. Optionally a rotation property can be also specified if >>>>>>>> supported by the given hardware. To perform an operation on image data, >>>>>>>> userspace provides a set of properties and their values for given fbproc >>>>>>>> object in a similar way as object and properties are provided for >>>>>>>> performing atomic page flip / mode setting. >>>>>>>> >>>>>>>> The proposed API consists of the 3 new ioctls: >>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture >>>>>>>> processors, >>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture >>>>>>>> processor, >>>>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given >>>>>>>> property set. >>>>>>>> >>>>>>>> The proposed API is extensible. Drivers can attach their own, custom >>>>>>>> properties to add support for more advanced picture processing (for example >>>>>>>> blending). >>>>>>>> >>>>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing) >>>>>>>> subsystem. IPP API is over-engineered in general, but not really extensible >>>>>>>> on the other side. It is also buggy, with significant design flaws - the >>>>>>>> biggest issue is the fact that the API covers memory-2-memory picture >>>>>>>> operations together with CRTC writeback and duplicating features, which >>>>>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is >>>>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos >>>>>>>> rotator driver smaller by over 200 lines). >>>>>>> This seems to be the kind of hardware that is typically supported by V4L2. >>>>>>> Stupid question, why DRM ? >>>>>> >>>>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM: >>>>>> >>>>>> 1. we want to replace existing Exynos IPP subsystem: >>>>>> - it is used only in some internal/vendor trees, not in open-source >>>>>> - we want it to have sane and potentially extensible userspace API >>>>>> - but we don't want to loose its functionality >>>>>> >>>>>> 2. we want to have simple API for performing single image processing >>>>>> operation: >>>>>> - typically it will be used by compositing window manager, this means that >>>>>> some parameters of the processing might change on each vblank (like >>>>>> destination rectangle for example). This api allows such change on each >>>>>> operation without any additional cost. V4L2 requires to reinitialize >>>>>> queues with new configuration on such change, what means that a bunch of >>>>>> ioctls has to be called. >>>>> >>>>> What do you mean by re-initialising the queue? Format, buffers or something >>>>> else? >>>>> >>>>> If you need a larger buffer than what you have already allocated, you'll >>>>> need to re-allocate, V4L2 or not. >>>>> >>>>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to >>>>> implementing that and some work in videobuf2. >>>>> >>>>> Another thing is that V4L2 is very stream oriented. For most devices that's >>>>> fine as a lot of the parameters are not changeable during streaming, >>>>> especially if the pipeline is handled by multiple drivers. That said, for >>>>> devices that process data from memory to memory performing changes in the >>>>> media bus formats and pipeline configuration is not very efficient >>>>> currently, largely for the same reason. >>>>> >>>>> The request API that people have been working for a bit different use cases >>>>> isn't in mainline yet. It would allow more efficient per-request >>>>> configuration than what is currently possible, but it has turned out to be >>>>> far from trivial to implement. >>>>> >>>>>> - validating processing parameters in V4l2 API is really complicated, >>>>>> because the parameters (format, src&dest rectangles, rotation) are being >>>>>> set incrementally, so we have to either allow some impossible, >>>>>> transitional >>>>>> configurations or complicate the configuration steps even more (like >>>>>> calling some ioctls multiple times for both input and output). In the end >>>>>> all parameters have to be again validated just before performing the >>>>>> operation. >>>>> >>>>> You have to validate the parameters in any case. In a MC pipeline this takes >>>>> place when the stream is started. >>>>> >>>>>> >>>>>> 3. generic approach (to add it to DRM core) has been rejected: >>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>>>> >>>>> For GPUs I generally understand the reasoning: there's a very limited number >>>>> of users of this API --- primarily because it's not an application >>>>> interface. >>>>> >>>>> If you have a device that however falls under the scope of V4L2 (at least >>>>> API-wise), does this continue to be the case? Will there be only one or two >>>>> (or so) users for this API? Is it the case here? >>>>> >>>>> Using a device specific interface definitely has some benefits: there's no >>>>> need to think how would you generalise the interface for other similar >>>>> devices. There's no need to consider backwards compatibility as it's not a >>>>> requirement. The drawback is that the applications that need to support >>>>> similar devices will bear the burden of having to support different APIs. >>>>> >>>>> I don't mean to say that you should ram whatever under V4L2 / MC >>>>> independently of how unworkable that might be, but there are also clear >>>>> advantages in using a standardised interface such as V4L2. >>>>> >>>>> V4L2 has a long history behind it and if it was designed today, I bet it >>>>> would look quite different from what it is now. >>>> >>>> It's true. There is definitely a benefit with V4L2 because V4L2 provides Linux standard ABI - for DRM as of now not. >>>> >>>> However, I think that is a only benefit we could get through V4L2. Using V4L2 makes software stack of Platform to be complicated - We have to open video device node and card device node to display a image on the screen scaling or converting color space of the image and also we need to export DMA buffer from one side and import it to other side using DMABUF. >>>> >>>> It may not related to this but even V4L2 has performance problem - every QBUF/DQBUF requests performs mapping/unmapping DMA buffer you already know this. :) >>>> >>>> In addition, recently Display subsystems on ARM SoC tend to include pre/post processing hardware in Display controller - OMAP, Exynos8895 and MSM as long as I know. >>>> >>> >>> I agree with many of the arguments given by Inki above and earlier by >>> Marek. However, they apply to already existing V4L2 implementation, >>> not V4L2 as the idea in general, and I believe a comparison against a >>> complete new API that doesn't even exist in the kernel tree and >>> userspace yet (only in terms of patches on the list) is not fair. >> >> Below is a user space who uses Exynos DRM post processor driver, IPP driver. >> https://review.tizen.org/git/?p=platform/adaptation/samsung_exynos/libtdm-exynos.git;a=blob;f=src/tdm_exynos_pp.c;h=db20e6f226d313672d1d468e06d80526ea30121c;hb=refs/heads/tizen >> >> Marek patch series is just a new version of this driver which is specific to Exynos DRM. Marek is trying to enhance this driver. >> Ps. other DRM drivers in mainline already have such or similar API. >> >> We will also open the user space who uses new API later. > > Those drivers are different, because they just expose a hw-specific abi. > Like the current IPP interfaces exposed by drm/exynos. > > I think you have 2 options: > - Extend the current IPP interface in drm/exynos with whatever new pixel > processor modes you want. Of course this still means you need to have Yes, this is only thing we could select as of now. Thanks, Inki Dae > the userspace side open-source, but otherwise it's all private to exynos > hardware and software. > > - If you want something standardized otoh, go with v4l2. And the issues > you point out in v4l2 aren't uapi issues, but implementation details of > the current vbuf helpers, which can be fixed. At least that's my > understanding. And it should be fairly easy to fix that, simply switch > from doing a map/unmap for every q/deqbuf to caching the mappings and > use the stream dma-api interfaces to only do the flush (if needed at > all, should turn into a no-op) on q/deqbuf. > > Trying to come up with a generic drm api has imo not much chance of > getting accepted anytime soon (since for the simple pixel processor > pipeline it's just duplicating v4l, and for something more generic/faster > a generic interfaces is alwas too slow). > -Daniel > -- To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Daniel, On 2017-05-10 09:55, Daniel Vetter wrote: > On Wed, May 10, 2017 at 03:27:02PM +0900, Inki Dae wrote: >> Hi Tomasz, >> >> 2017년 05월 10일 14:38에 Tomasz Figa 이(가) 쓴 글: >>> Hi Everyone, >>> >>> On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote: >>>> >>>> 2017년 04월 26일 07:21에 Sakari Ailus 이(가) 쓴 글: >>>>> Hi Marek, >>>>> >>>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote: >>>>>> Hi Laurent, >>>>>> >>>>>> On 2017-04-20 12:25, Laurent Pinchart wrote: >>>>>>> Hi Marek, >>>>>>> >>>>>>> (CC'ing Sakari Ailus) >>>>>>> >>>>>>> Thank you for the patches. >>>>>>> >>>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: >>>>>>>> Dear all, >>>>>>>> >>>>>>>> This is an updated proposal for extending EXYNOS DRM API with generic >>>>>>>> support for hardware modules, which can be used for processing image data >>>>>>> >from the one memory buffer to another. Typical memory-to-memory operations >>>>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a >>>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer >>>>>>>> processors", which has been rejected as "not really needed in the DRM >>>>>>>> core": >>>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>>>>>>> >>>>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this >>>>>>>> will be specific only to Exynos DRM. I've also changed the name from >>>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion >>>>>>>> with fbdev API. >>>>>>>> >>>>>>>> Here is a bit more information what picture processors are: >>>>>>>> >>>>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform >>>>>>>> such operations. They can be used in paralel to the main GPU module to >>>>>>>> offload CPU from processing grapics or video data. One of example use of >>>>>>>> such modules is implementing video overlay, which usually requires color >>>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to >>>>>>>> target window size. >>>>>>>> >>>>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also >>>>>>>> based on DRM objects and their properties. A new DRM object is introduced: >>>>>>>> picture processor (called pp for convenience). Such objects have a set of >>>>>>>> standard DRM properties, which describes the operation to be performed by >>>>>>>> respective hardware module. In typical case those properties are a source >>>>>>>> fb id and rectangle (x, y, width, height) and destination fb id and >>>>>>>> rectangle. Optionally a rotation property can be also specified if >>>>>>>> supported by the given hardware. To perform an operation on image data, >>>>>>>> userspace provides a set of properties and their values for given fbproc >>>>>>>> object in a similar way as object and properties are provided for >>>>>>>> performing atomic page flip / mode setting. >>>>>>>> >>>>>>>> The proposed API consists of the 3 new ioctls: >>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture >>>>>>>> processors, >>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture >>>>>>>> processor, >>>>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given >>>>>>>> property set. >>>>>>>> >>>>>>>> The proposed API is extensible. Drivers can attach their own, custom >>>>>>>> properties to add support for more advanced picture processing (for example >>>>>>>> blending). >>>>>>>> >>>>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing) >>>>>>>> subsystem. IPP API is over-engineered in general, but not really extensible >>>>>>>> on the other side. It is also buggy, with significant design flaws - the >>>>>>>> biggest issue is the fact that the API covers memory-2-memory picture >>>>>>>> operations together with CRTC writeback and duplicating features, which >>>>>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is >>>>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos >>>>>>>> rotator driver smaller by over 200 lines). >>>>>>> This seems to be the kind of hardware that is typically supported by V4L2. >>>>>>> Stupid question, why DRM ? >>>>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM: >>>>>> >>>>>> 1. we want to replace existing Exynos IPP subsystem: >>>>>> - it is used only in some internal/vendor trees, not in open-source >>>>>> - we want it to have sane and potentially extensible userspace API >>>>>> - but we don't want to loose its functionality >>>>>> >>>>>> 2. we want to have simple API for performing single image processing >>>>>> operation: >>>>>> - typically it will be used by compositing window manager, this means that >>>>>> some parameters of the processing might change on each vblank (like >>>>>> destination rectangle for example). This api allows such change on each >>>>>> operation without any additional cost. V4L2 requires to reinitialize >>>>>> queues with new configuration on such change, what means that a bunch of >>>>>> ioctls has to be called. >>>>> What do you mean by re-initialising the queue? Format, buffers or something >>>>> else? >>>>> >>>>> If you need a larger buffer than what you have already allocated, you'll >>>>> need to re-allocate, V4L2 or not. >>>>> >>>>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to >>>>> implementing that and some work in videobuf2. >>>>> >>>>> Another thing is that V4L2 is very stream oriented. For most devices that's >>>>> fine as a lot of the parameters are not changeable during streaming, >>>>> especially if the pipeline is handled by multiple drivers. That said, for >>>>> devices that process data from memory to memory performing changes in the >>>>> media bus formats and pipeline configuration is not very efficient >>>>> currently, largely for the same reason. >>>>> >>>>> The request API that people have been working for a bit different use cases >>>>> isn't in mainline yet. It would allow more efficient per-request >>>>> configuration than what is currently possible, but it has turned out to be >>>>> far from trivial to implement. >>>>> >>>>>> - validating processing parameters in V4l2 API is really complicated, >>>>>> because the parameters (format, src&dest rectangles, rotation) are being >>>>>> set incrementally, so we have to either allow some impossible, >>>>>> transitional >>>>>> configurations or complicate the configuration steps even more (like >>>>>> calling some ioctls multiple times for both input and output). In the end >>>>>> all parameters have to be again validated just before performing the >>>>>> operation. >>>>> You have to validate the parameters in any case. In a MC pipeline this takes >>>>> place when the stream is started. >>>>> >>>>>> 3. generic approach (to add it to DRM core) has been rejected: >>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html >>>>> For GPUs I generally understand the reasoning: there's a very limited number >>>>> of users of this API --- primarily because it's not an application >>>>> interface. >>>>> >>>>> If you have a device that however falls under the scope of V4L2 (at least >>>>> API-wise), does this continue to be the case? Will there be only one or two >>>>> (or so) users for this API? Is it the case here? >>>>> >>>>> Using a device specific interface definitely has some benefits: there's no >>>>> need to think how would you generalise the interface for other similar >>>>> devices. There's no need to consider backwards compatibility as it's not a >>>>> requirement. The drawback is that the applications that need to support >>>>> similar devices will bear the burden of having to support different APIs. >>>>> >>>>> I don't mean to say that you should ram whatever under V4L2 / MC >>>>> independently of how unworkable that might be, but there are also clear >>>>> advantages in using a standardised interface such as V4L2. >>>>> >>>>> V4L2 has a long history behind it and if it was designed today, I bet it >>>>> would look quite different from what it is now. >>>> It's true. There is definitely a benefit with V4L2 because V4L2 provides Linux standard ABI - for DRM as of now not. >>>> >>>> However, I think that is a only benefit we could get through V4L2. Using V4L2 makes software stack of Platform to be complicated - We have to open video device node and card device node to display a image on the screen scaling or converting color space of the image and also we need to export DMA buffer from one side and import it to other side using DMABUF. >>>> >>>> It may not related to this but even V4L2 has performance problem - every QBUF/DQBUF requests performs mapping/unmapping DMA buffer you already know this. :) >>>> >>>> In addition, recently Display subsystems on ARM SoC tend to include pre/post processing hardware in Display controller - OMAP, Exynos8895 and MSM as long as I know. >>>> >>> I agree with many of the arguments given by Inki above and earlier by >>> Marek. However, they apply to already existing V4L2 implementation, >>> not V4L2 as the idea in general, and I believe a comparison against a >>> complete new API that doesn't even exist in the kernel tree and >>> userspace yet (only in terms of patches on the list) is not fair. >> Below is a user space who uses Exynos DRM post processor driver, IPP driver. >> https://review.tizen.org/git/?p=platform/adaptation/samsung_exynos/libtdm-exynos.git;a=blob;f=src/tdm_exynos_pp.c;h=db20e6f226d313672d1d468e06d80526ea30121c;hb=refs/heads/tizen >> >> Marek patch series is just a new version of this driver which is specific to Exynos DRM. Marek is trying to enhance this driver. >> Ps. other DRM drivers in mainline already have such or similar API. >> >> We will also open the user space who uses new API later. > Those drivers are different, because they just expose a hw-specific abi. > Like the current IPP interfaces exposed by drm/exynos. > > I think you have 2 options: > - Extend the current IPP interface in drm/exynos with whatever new pixel > processor modes you want. Of course this still means you need to have > the userspace side open-source, but otherwise it's all private to exynos > hardware and software. That's what I proposed in RFC v2 posted 2 days ago - everything is kept in Exynos DRM driver, no changes to DRM-core at all. Maybe calling it Exynos IPP v2 would have been a better idea. > - If you want something standardized otoh, go with v4l2. And the issues > you point out in v4l2 aren't uapi issues, but implementation details of > the current vbuf helpers, which can be fixed. At least that's my > understanding. And it should be fairly easy to fix that, simply switch > from doing a map/unmap for every q/deqbuf to caching the mappings and > use the stream dma-api interfaces to only do the flush (if needed at > all, should turn into a no-op) on q/deqbuf. > > Trying to come up with a generic drm api has imo not much chance of > getting accepted anytime soon (since for the simple pixel processor > pipeline it's just duplicating v4l, and for something more generic/faster > a generic interfaces is alwas too slow). Okay, I don't want to make it DRM-wide. For us it is enough to have it at Exynos DRM, like existing IPP ioctls. Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland -- To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html