Hi,
while I haven't got much time to work on the actual code right now, I think it might still be useful if we could get the device tree binding to a point where everybody is happy with it. That'll also save me some time once I get to writing the code because I won't have to redo it over again. =)
So here's the current proposal:
host1x {
	compatible = "nvidia,tegra20-host1x", "simple-bus";
	reg = <0x50000000 0x00024000>;
	interrupts = <0 64 0x04   /* cop syncpt */
		      0 65 0x04   /* mpcore syncpt */
		      0 66 0x04   /* cop general */
		      0 67 0x04>; /* mpcore general */

	#address-cells = <1>;
	#size-cells = <1>;

	ranges = <0x54000000 0x54000000 0x04000000>;

	status = "disabled";

	gart = <&gart>;

	/* video-encoding/decoding */
	mpe {
		reg = <0x54040000 0x00040000>;
		interrupts = <0 68 0x04>;
		status = "disabled";
	};

	/* video input */
	vi {
		reg = <0x54080000 0x00040000>;
		interrupts = <0 69 0x04>;
		status = "disabled";
	};

	/* EPP */
	epp {
		reg = <0x540c0000 0x00040000>;
		interrupts = <0 70 0x04>;
		status = "disabled";
	};

	/* ISP */
	isp {
		reg = <0x54100000 0x00040000>;
		interrupts = <0 71 0x04>;
		status = "disabled";
	};

	/* 2D engine */
	gr2d {
		reg = <0x54140000 0x00040000>;
		interrupts = <0 72 0x04>;
		status = "disabled";
	};

	/* 3D engine */
	gr3d {
		reg = <0x54180000 0x00040000>;
		status = "disabled";
	};

	/* display controllers */
	dc1: dc@54200000 {
		compatible = "nvidia,tegra20-dc";
		reg = <0x54200000 0x00040000>;
		interrupts = <0 73 0x04>;
		status = "disabled";
	};

	dc2: dc@54240000 {
		compatible = "nvidia,tegra20-dc";
		reg = <0x54240000 0x00040000>;
		interrupts = <0 74 0x04>;
		status = "disabled";
	};

	/* outputs */
	rgb {
		compatible = "nvidia,tegra20-rgb";
		status = "disabled";
	};

	hdmi {
		compatible = "nvidia,tegra20-hdmi";
		reg = <0x54280000 0x00040000>;
		interrupts = <0 75 0x04>;
		status = "disabled";
	};

	tvo {
		compatible = "nvidia,tegra20-tvo";
		reg = <0x542c0000 0x00040000>;
		interrupts = <0 76 0x04>;
		status = "disabled";
	};

	dsi {
		compatible = "nvidia,tegra20-dsi";
		reg = <0x54300000 0x00040000>;
		status = "disabled";
	};
};
This really isn't anything new but merely brings the Tegra DRM binding in sync with other devices in tegra20.dtsi (disable devices by default, leave out unit addresses for unique nodes). The only actual change is that host1x clients are now children of the host1x node. The children are instantiated by the initial call to of_platform_populate() since the host1x node is also marked compatible with "simple-bus".
An alternative would be to call of_platform_populate() from the host1x driver. This has the advantage that it could integrate better with the host1x bus implementation that Terje is working on, but it also needs additional code to tear down the devices when the host1x driver is unloaded because a module reload would try to create duplicate devices otherwise.
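To make the trade-off concrete, here is a rough sketch of that alternative (not actual code, all names are illustrative): of_platform_populate() is called from the host1x driver's probe and the children are unregistered again on remove, so that a module reload does not create duplicates:

#include <linux/device.h>
#include <linux/of_platform.h>
#include <linux/platform_device.h>

static int host1x_unregister_child(struct device *dev, void *data)
{
	platform_device_unregister(to_platform_device(dev));
	return 0;
}

static int host1x_probe(struct platform_device *pdev)
{
	/* instantiate the client nodes below the host1x node */
	return of_platform_populate(pdev->dev.of_node, NULL, NULL,
				    &pdev->dev);
}

static int host1x_remove(struct platform_device *pdev)
{
	/* tear the children down again so a reload can repopulate them */
	device_for_each_child(&pdev->dev, NULL, host1x_unregister_child);
	return 0;
}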
The rgb node is something that I don't quite know how to handle yet. Since it is really part of the display controller and uses its register space, it isn't quite correct to represent it as a separate device. But we will need a separate node to make it available as a connector, which will become more obvious below.
Perhaps the ranges property can also be used to remap the reg properties of the child nodes so that they can be specified as an offset into the host1x aperture instead of an address in the CPU address space. But that's just a minor issue because the OF code should be able to handle it transparently.
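As an illustration, the remapped variant could look roughly like this (only the mpe node is shown; the offsets are derived from the proposal above):

host1x {
	#address-cells = <1>;
	#size-cells = <1>;
	/* child bus address 0x0 maps to CPU address 0x54000000 */
	ranges = <0x00000000 0x54000000 0x04000000>;

	/* video-encoding/decoding */
	mpe {
		reg = <0x00040000 0x00040000>; /* 0x54040000 in the CPU address space */
		interrupts = <0 68 0x04>;
		status = "disabled";
	};
};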
Board DTS files could then extend this with board-specific requirements and connectors. The following describes the Medcom Wide:
host1x {
	carveout = <0x0e000000 0x02000000>;
	display-controllers = <&dc1>;
	status = "okay";

	dc@54200000 {
		status = "okay";
	};

	lvds: rgb {
		status = "okay";
	};
};

connectors {
	#address-cells = <1>;
	#size-cells = <0>;

	connector@0 {
		reg = <0>;
		edid = /incbin/("tegra-medcom.edid");
		output = <&lvds>;
	};
};
Basically this turns on the first display controller and the RGB/LVDS output and hooks up a static EDID block to the LVDS output. There is also a carveout property, which might be a better replacement for the "crippled" GART on Tegra20. Alternatively, CMA might work just as well.
The Plutux can be described like this:
host1x {
	carveout = <0x0e000000 0x02000000>;
	display-controllers = <&dc1 &dc2>;
	status = "okay";

	dc@54200000 {
		status = "okay";
	};

	hdmi: hdmi {
		vdd-supply = <&ldo7_reg>;
		pll-supply = <&ldo8_reg>;
		status = "okay";
	};
};

connectors {
	#address-cells = <1>;
	#size-cells = <0>;

	connector@0 {
		reg = <0>;
		output = <&hdmi>;
		ddc = <&ddc>;
		hpd-gpio = <&gpio 111 0>; /* PN7 */
	};
};
This is similar to the Medcom Wide, except that it uses the HDMI output and hooks it up to one of the Tegra's I2C controllers to do EDID probing via DDC. Additionally, it can detect whether an HDMI cable is connected using the GPIO specified by the hpd-gpio property.
Thierry
Hello Thierry,
I still haven't found the time to look at device tree things in detail, but still some comments inline. Overall I think the tree looks ok and is a great thing to get started.
On Tuesday, 26.06.2012, at 12:55 +0200, Thierry Reding wrote:
Hi,
while I haven't got much time to work on the actual code right now, I think it might still be useful if we could get the device tree binding to a point where everybody is happy with it. That'll also save me some time once I get to writing the code because I won't have to redo it over again. =)
So here's the current proposal:
host1x {
	compatible = "nvidia,tegra20-host1x", "simple-bus";
	reg = <0x50000000 0x00024000>;
	interrupts = <0 64 0x04   /* cop syncpt */
		      0 65 0x04   /* mpcore syncpt */
		      0 66 0x04   /* cop general */
		      0 67 0x04>; /* mpcore general */
	#address-cells = <1>;
	#size-cells = <1>;
	ranges = <0x54000000 0x54000000 0x04000000>;
	status = "disabled";
	gart = <&gart>;
Do we really need this here? IMHO the GART is one kind of IOMMU managing a part of the bus that the host1x driver is hanging off of. So I can't see why we would need a pointer to the gart device directly. Though I may be off here, as I don't grok everything related to the initialization order yet, so please correct me if I'm wrong.
	[...]
};
This really isn't anything new but merely brings the Tegra DRM binding in sync with other devices in tegra20.dtsi (disable devices by default, leave out unit addresses for unique nodes). The only actual change is that host1x clients are now children of the host1x node. The children are instantiated by the initial call to of_platform_populate() since the host1x node is also marked compatible with "simple-bus".
We should not rely on OF code doing the instantiation of the children, as expressed by Grant Likely. [1]
An alternative would be to call of_platform_populate() from the host1x driver. This has the advantage that it could integrate better with the host1x bus implementation that Terje is working on, but it also needs additional code to tear down the devices when the host1x driver is unloaded because a module reload would try to create duplicate devices otherwise.
[snip]
[1] http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg28044.html
Thanks, Lucas
On Tue, Jun 26, 2012 at 02:47:56PM +0200, Lucas Stach wrote:
Hello Thierry,
I still haven't found the time to look at device tree things in detail, but still some comments inline. Overall I think the tree looks ok and is a great thing to get started.
On Tuesday, 26.06.2012, at 12:55 +0200, Thierry Reding wrote:
Hi,
while I haven't got much time to work on the actual code right now, I think it might still be useful if we could get the device tree binding to a point where everybody is happy with it. That'll also save me some time once I get to writing the code because I won't have to redo it over again. =)
So here's the current proposal:
host1x {
	compatible = "nvidia,tegra20-host1x", "simple-bus";
	reg = <0x50000000 0x00024000>;
	interrupts = <0 64 0x04   /* cop syncpt */
		      0 65 0x04   /* mpcore syncpt */
		      0 66 0x04   /* cop general */
		      0 67 0x04>; /* mpcore general */
	#address-cells = <1>;
	#size-cells = <1>;
	ranges = <0x54000000 0x54000000 0x04000000>;
	status = "disabled";
	gart = <&gart>;
Do we really need this here? IMHO the GART is one kind of IOMMU managing a part of the bus that the host1x driver is hanging off of. So I can't see why we would need a pointer to the gart device directly. Though I may be off here, as I don't grok everything related to the initialization order yet, so please correct me if I'm wrong.
If the information about this can be obtained in another way, then we certainly don't need this here. Hiroshi already pointed out that this can be handled by the IOMMU API now (I don't think it did back when I first wrote the code), so I'm all for stripping unneeded properties from the binding.
One thing we need to keep in mind here is that the DRM driver is not the only one needing access to memory. There may be other users (V4L, ...) that need memory from the same pool. Maybe the memory handling should be done in the host1x driver instead, and the DRM driver could be one of many users obtaining buffers from it?
	[...]
};
This really isn't anything new but merely brings the Tegra DRM binding in sync with other devices in tegra20.dtsi (disable devices by default, leave out unit addresses for unique nodes). The only actual change is that host1x clients are now children of the host1x node. The children are instantiated by the initial call to of_platform_populate() since the host1x node is also marked compatible with "simple-bus".
We should not rely on OF code doing the instantiation of the children, as expressed by Grant Likely. [1]
At the end of that email, Grant specifically mentions that using the simple-bus is one option to instantiate children. Or to add the bus compatible (nvidia,tegra20-host1x in our case) to the list of busses when calling of_platform_populate().
However I think if we have a custom bus_type implementation for host1x anyway, then it makes sense to instantiate the children from that instead to allow for easier integration.
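Just to illustrate that second option (the table and function names are made up):

#include <linux/init.h>
#include <linux/of.h>
#include <linux/of_platform.h>

/* treat the host1x node as a bus in addition to plain "simple-bus" nodes */
static const struct of_device_id tegra_dt_bus_match[] __initconst = {
	{ .compatible = "simple-bus", },
	{ .compatible = "nvidia,tegra20-host1x", },
	{ /* sentinel */ }
};

static void __init tegra_dt_init(void)
{
	of_platform_populate(NULL, tegra_dt_bus_match, NULL, NULL);
}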
Thierry
An alternative would be to call of_platform_populate() from the host1x driver. This has the advantage that it could integrate better with the host1x bus implementation that Terje is working on, but it also needs additional code to tear down the devices when the host1x driver is unloaded because a module reload would try to create duplicate devices otherwise.
[snip]
[1] http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg28044.html
Thanks, Lucas
On 26.06.2012 13:55, Thierry Reding wrote:
host1x { compatible = "nvidia,tegra20-host1x", "simple-bus"; reg = <0x50000000 0x00024000>; interrupts = <0 64 0x04 /* cop syncpt */ 0 65 0x04 /* mpcore syncpt */ 0 66 0x04 /* cop general */ 0 67 0x04>; /* mpcore general */
We're only interested in interrupts 65 and 67. The COP interrupts are not routed to the CPU. I guess we could just delete those lines here.
#address-cells = <1>; #size-cells = <1>; ranges = <0x54000000 0x54000000 0x04000000>;
I'm a newbie on device trees, so I need to ask. Why isn't host1x register space covered by "ranges" property?
status = "disabled";
gart = <&gart>;

/* video-encoding/decoding */
mpe {
	reg = <0x54040000 0x00040000>;
	interrupts = <0 68 0x04>;
	status = "disabled";
};
The client device interrupts are not very interesting, so they could be left out, too. The display controller interrupts are probably an exception to this.
(...)
Otherwise the proposal looked good.
We also assign certain host1x common resources per device by convention, f.ex. sync points, channels etc. We currently encode that information in the device node (3D uses sync point number X, 2D uses numbers Y and Z). The information is not actually describing hardware, as it just describes the convention, so I'm not sure if device tree is the proper place for it.
This really isn't anything new but merely brings the Tegra DRM binding in sync with other devices in tegra20.dtsi (disable devices by default, leave out unit addresses for unique nodes). The only actual change is that host1x clients are now children of the host1x node. The children are instantiated by the initial call to of_platform_populate() since the host1x node is also marked compatible with "simple-bus".
An alternative would be to call of_platform_populate() from the host1x driver. This has the advantage that it could integrate better with the host1x bus implementation that Terje is working on, but it also needs additional code to tear down the devices when the host1x driver is unloaded because a module reload would try to create duplicate devices otherwise.
Yes, we already have a bus_type for nvhost, and we have nvhost_device and nvhost_driver that derive from device and device_driver respectively. They accommodate some common host1x client device behavior and data that we need to store. We also use the bus_type to match each device and driver together, but the matching is version sensitive. For example, Tegra2 3D needs a different driver than Tegra3 3D.
The rgb node is something that I don't quite know how to handle yet. Since it is really part of the display controller and uses its register space, it isn't quite correct to represent it as a separate device. But we will need a separate node to make it available as a connector, which will become more obvious below.
I'm not familiar enough with display controller to be able to comment.
Perhaps the ranges property can also be used to remap the reg properties of the child nodes so that they can be specified as an offset into the host1x aperture instead of an address in the CPU address space. But that's just a minor issue because the OF code should be able to handle it transparently.
Either way is fine for me. The full addresses are more familiar to me as we tend to use them internally.
Basically this turns on the first display controller and the RGB/LVDS output and hooks up a static EDID block with the LVDS output. There is also a carveout property which might be a better replacement for the "crippled" GART on Tegra20. Alternatively the CMA might work just as well instead.
We use a carveout for Tegra2. Memory management is still a big question mark for tegradrm that I'm trying to find a solution for.
Terje
On Tue, Jun 26, 2012 at 04:01:05PM +0300, Terje Bergström wrote:
On 26.06.2012 13:55, Thierry Reding wrote:
host1x { compatible = "nvidia,tegra20-host1x", "simple-bus"; reg = <0x50000000 0x00024000>; interrupts = <0 64 0x04 /* cop syncpt */ 0 65 0x04 /* mpcore syncpt */ 0 66 0x04 /* cop general */ 0 67 0x04>; /* mpcore general */
We're only interested in interrupts 65 and 67. The COP interrupts are not routed to CPU. I guess we could just delete those lines here.
I remember you mentioned that before. I'll drop them.
#address-cells = <1>; #size-cells = <1>; ranges = <0x54000000 0x54000000 0x04000000>;
I'm a newbie on device trees, so I need to ask. Why isn't host1x register space covered by "ranges" property?
The ranges property defines the memory aperture that is forwarded to the children.
status = "disabled";
gart = <&gart>;

/* video-encoding/decoding */
mpe {
	reg = <0x54040000 0x00040000>;
	interrupts = <0 68 0x04>;
	status = "disabled";
};
The client device interrupts are not very interesting, so they could be left out, too. Display controller related are probably an exception to this.
If the interrupts aren't used at all we should drop them.
(...)
Otherwise the proposal looked good.
We also assign certain host1x common resources per device by convention, f.ex. sync points, channels etc. We currently encode that information in the device node (3D uses sync point number X, 2D uses numbers Y and Z). The information is not actually describing hardware, as it just describes the convention, so I'm not sure if device tree is the proper place for it.
Are they configurable? If so I think we should provide for them being specified in the device tree. They are still hardware resources being assigned to devices.
An alternative would be to call of_platform_populate() from the host1x driver. This has the advantage that it could integrate better with the host1x bus implementation that Terje is working on, but it also needs additional code to tear down the devices when the host1x driver is unloaded because a module reload would try to create duplicate devices otherwise.
Yes, we already have a bus_type for nvhost, and we have nvhost_device and nvhost_driver that derive from device and device_driver respectively. They accommodate some common host1x client device behavior and data that we need to store. We also use the bus_type to match each device and driver together, but the matching is version sensitive. For example, Tegra2 3D needs a different driver than Tegra3 3D.
We'll have to figure out the best place to put this driver. The driver will need some code to instantiate its children from the DT and fill the nvhost_device structures with the data parsed from it.
BTW, what's the reason for calling it nvhost and not host1x?
Perhaps the ranges property can also be used to remap the reg properties of the child nodes so that they can be specified as an offset into the host1x aperture instead of an address in the CPU address space. But that's just a minor issue because the OF code should be able to handle it transparently.
Either way is fine for me. The full addresses are more familiar to me as we tend to use them internally.
Using the OF mechanism for translating the host1x bus addresses, relative to the host1x base address, to CPU addresses seems "purer", but either way should work fine.
Basically this turns on the first display controller and the RGB/LVDS output and hooks up a static EDID block with the LVDS output. There is also a carveout property which might be a better replacement for the "crippled" GART on Tegra20. Alternatively the CMA might work just as well instead.
We use carveout for Tegra2. Memory management is a big question mark still for tegradrm that I'm trying to find a solution for.
AIUI CMA is one particular implementation of the carveout concept, so I think we should use it, or extend it if it doesn't suit us.
Thierry
On 26.06.2012 16:41, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 04:01:05PM +0300, Terje Bergström wrote:
We also assign certain host1x common resources per device by convention, f.ex. sync points, channels etc. We currently encode that information in the device node (3D uses sync point number X, 2D uses numbers Y and Z). The information is not actually describing hardware, as it just describes the convention, so I'm not sure if device tree is the proper place for it.
Are they configurable? If so I think we should provide for them being specified in the device tree. They are still hardware resources being assigned to devices.
Yes, they're configurable, and there's nothing hardware specific in the assignment of a sync point to a particular use. It's all just a software agreement. That's why I'm a bit hesitant on putting it in device trees, which are supposed to only describe hardware.
Yes, we already have a bus_type for nvhost, and we have nvhost_device and nvhost_driver that derive from device and device_driver respectively. They accommodate some common host1x client device behavior and data that we need to store. We also use the bus_type to match each device and driver together, but the matching is version sensitive. For example, Tegra2 3D needs a different driver than Tegra3 3D.
We'll have to figure out the best place to put this driver. The driver will need some code to instantiate its children from the DT and fill the nvhost_device structures with the data parsed from it.
True. We could say that the host1x driver is the "father", and will have to instantiate the nvhost device structs for the children. We just have to ensure the correct ordering at boot-up.
BTW, what's the reason for calling it nvhost and not host1x?
When I started, there was only one driver and one device, and all client devices were just hidden as internal implementation details. Thus the driver wasn't really a "host1x" driver. Now we refer to the collection of drivers for host1x and client devices as nvhost.
Either way is fine for me. The full addresses are more familiar to me as we tend to use them internally.
Using the OF mechanism for translating the host1x bus addresses, relative to the host1x base address, to CPU addresses seems "purer", but either way should work fine.
I'll let you decide, as I don't have a strong opinion either way. I guess whatever is the more common way wins.
We use carveout for Tegra2. Memory management is a big question mark still for tegradrm that I'm trying to find a solution for.
AIUI CMA is one particular implementation of the carveout concept, so I think we should use it, or extend it if it doesn't suit us.
Here I'd refer to Hiroshi's message: host1x driver doesn't need to know the details of what memory management we use. We'll just hide that fact behind one of the memory management APIs that nvhost uses.
Terje
On 06/26/2012 08:02 AM, Terje Bergström wrote:
On 26.06.2012 16:41, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 04:01:05PM +0300, Terje Bergström wrote:
We also assign certain host1x common resources per device by convention, f.ex. sync points, channels etc. We currently encode that information in the device node (3D uses sync point number X, 2D uses numbers Y and Z). The information is not actually describing hardware, as it just describes the convention, so I'm not sure if device tree is the proper place for it.
Are they configurable? If so I think we should provide for them being specified in the device tree. They are still hardware resources being assigned to devices.
Yes, they're configurable, and there's nothing hardware specific in the assignment of a sync point to a particular use. It's all just a software agreement. That's why I'm a bit hesitant on putting it in device trees, which are supposed to only describe hardware.
So I think that the DT can describe the existence of sync-points (presumably include a node for the sync-point HW device if it's separate). However, since the usage of each sync-point is entirely arbitrary, that seems like something which should be either assigned dynamically at run-time, or at least managed/assigned in SW at runtime somehow, rather than hard-coded into DT; it's more policy than HW.
Either way is fine for me. The full addresses are more familiar to me as we tend to use them internally.
Using the OF mechanism for translating the host1x bus addresses, relative to the host1x base address, to CPU addresses seems "purer", but either way should work fine.
I'll let you decide, as I don't have a strong opinion either way. I guess whatever is the more common way wins.
I'd certainly prefer all the nodes to use the full/absolute address. That way, the DT will exactly match the addresses in the documentation.
On Tue, Jun 26, 2012 at 11:41:43AM -0600, Stephen Warren wrote:
On 06/26/2012 08:02 AM, Terje Bergström wrote:
On 26.06.2012 16:41, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 04:01:05PM +0300, Terje Bergström wrote:
We also assign certain host1x common resources per device by convention, f.ex. sync points, channels etc. We currently encode that information in the device node (3D uses sync point number X, 2D uses numbers Y and Z). The information is not actually describing hardware, as it just describes the convention, so I'm not sure if device tree is the proper place for it.
Are they configurable? If so I think we should provide for them being specified in the device tree. They are still hardware resources being assigned to devices.
Yes, they're configurable, and there's nothing hardware specific in the assignment of a sync point to a particular use. It's all just a software agreement. That's why I'm a bit hesitant on putting it in device trees, which are supposed to only describe hardware.
So I think that the DT can describe the existence of sync-points (presumably include a node for the sync-point HW device if it's separate). However, since the usage of each sync-point is entirely arbitrary, that seems like something which should be either assigned dynamically at run-time, or at least managed/assigned in SW at runtime somehow, rather than hard-coded into DT; it's more policy than HW.
The sync-points are part of the host1x device as I understand it. If their usage is truly generic, then we can probably ignore them safely. Maybe it'd make sense to carry a property that defines the number of sync points available for the host1x hardware represented by the DT?
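Just as an illustration, something like this (the property name is invented and the count is only what I believe Tegra20 provides):

host1x {
	compatible = "nvidia,tegra20-host1x", "simple-bus";
	...
	/* hypothetical property, not part of the proposal above */
	nvidia,num-syncpts = <32>;
};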
Either way is fine for me. The full addresses are more familiar to me as we tend to use them internally.
Using the OF mechanism for translating the host1x bus addresses, relative to the host1x base address, to CPU addresses seems "purer", but either way should work fine.
I'll let you decide, as I don't have a strong opinion either way. I guess whatever is the more common way wins.
I'd certainly prefer all the nodes to use the full/absolute address. That way, the DT will exactly match the addresses in the documentation.
Okay, I'll leave the ranges property as it is now.
Thierry
On 06/26/2012 01:27 PM, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 11:41:43AM -0600, Stephen Warren wrote:
On 06/26/2012 08:02 AM, Terje Bergström wrote:
On 26.06.2012 16:41, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 04:01:05PM +0300, Terje Bergström wrote:
We also assign certain host1x common resources per device by convention, f.ex. sync points, channels etc. We currently encode that information in the device node (3D uses sync point number X, 2D uses numbers Y and Z). The information is not actually describing hardware, as it just describes the convention, so I'm not sure if device tree is the proper place for it.
Are they configurable? If so I think we should provide for them being specified in the device tree. They are still hardware resources being assigned to devices.
Yes, they're configurable, and there's nothing hardware specific in the assignment of a sync point to a particular use. It's all just a software agreement. That's why I'm a bit hesitant on putting it in device trees, which are supposed to only describe hardware.
So I think that the DT can describe the existence of sync-points (presumably include a node for the sync-point HW device if it's separate). However, since the usage of each sync-point is entirely arbitrary, that seems like something which should be either assigned dynamically at run-time, or at least managed/assigned in SW at runtime somehow, rather than hard-coded into DT; it's more policy than HW.
The sync-points are part of the host1x device as I understand it. If their usage is truly generic, then we can probably ignore them safely. Maybe it'd make sense to carry a property that defines the number of sync points available for the host1x hardware represented by the DT?
I would assume this can safely be inferred from the compatible value; nvidia,tegra20-host1x v.s. nvidia,tegra30-host1x, and so there's no need to represent this in DT. I would assume (and it's certainly just an assumption) that there are numerous other small details that are different between the two SoCs, and so the driver will need to have some table mapping from compatible value to various information anyway. Terje, can you confirm/deny this (and hopefully use your knowledge of any future chips to guide the decision without giving anything away)?
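Something like the following, purely as a sketch (the structure, field names and numbers are made up, not taken from any existing driver): the compatible value selects a per-SoC info structure, so counts like the number of sync points never need to appear in the DT.

#include <linux/of.h>

struct host1x_info {
	unsigned int num_syncpts;
	unsigned int num_channels;
	/* ... other SoC-specific details ... */
};

static const struct host1x_info tegra20_host1x_info = {
	.num_syncpts = 32,
	.num_channels = 8,
};

static const struct host1x_info tegra30_host1x_info = {
	.num_syncpts = 32,
	.num_channels = 8,
};

static const struct of_device_id host1x_of_match[] = {
	{ .compatible = "nvidia,tegra20-host1x", .data = &tegra20_host1x_info },
	{ .compatible = "nvidia,tegra30-host1x", .data = &tegra30_host1x_info },
	{ /* sentinel */ }
};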
On 26.06.2012 22:38, Stephen Warren wrote:
I would assume this can safely be inferred from the compatible value; nvidia,tegra20-host1x v.s. nvidia,tegra30-host1x, and so there's no need to represent this in DT. I would assume (and it's certainly just an assumption) that there are numerous other small details that are different between the two SoCs, and so the driver will need to have some table mapping from compatible value to various information anyway. Terje, can you confirm/deny this (and hopefully use your knowledge of any future chips to guide the decision without giving anything away)?
I'll now discuss what is going on in our internal driver, which will be visible in our public git tree at some point.
Regarding just host1x, I have parameterized the following properties per SoC in the driver:
* number of sync points
* number of channels
* number of wait bases
* number of mlocks
In addition to this, per SoC we parameterize the mapping of sync points to channels, the names of sync points (for debugging purposes) and whether a sync point is stream managed or client managed.
For host1x and other devices, we have parameters indicating the ability to power gate, idle times before clock gating and power gating, power domain IDs (also used for resetting), associated clocks and some other stuff. Among the driver parameters we also have function pointers for context management and clock scaling.
Of these, the ability to power gate, the power domain IDs and the number of resources are hardware properties, and the rest are more or less agreements on how software uses the resources. Context management is borderline: each SoC has its own sequence to save and restore a context for 3D and MPE.
We will also have different register layouts for host1x in different SoCs and I have recently made nvhost ready for that.
Everything else except the register layout and the context management could be encoded into device trees. Those remaining things ensure that we will need to have some SoC-specific code in the driver.
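Purely as an illustration (these are not our actual structures, all names are made up), that split might look roughly like this, with the counts in data and the context handling as code:

#include <linux/types.h>

struct nvhost_channel; /* forward declaration, illustrative only */

struct nvhost_soc_info {
	/* resource counts; these could come from DT or a per-SoC table */
	unsigned int num_syncpts;
	unsigned int num_channels;
	unsigned int num_waitbases;
	unsigned int num_mlocks;

	/* power management capabilities */
	bool can_powergate;
	int powergate_id;

	/* register layout and context save/restore stay as per-SoC code */
	void (*ctx_save)(struct nvhost_channel *ch);
	void (*ctx_restore)(struct nvhost_channel *ch);
};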
Terje
On 06/26/2012 07:41 AM, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 04:01:05PM +0300, Terje Bergström wrote:
On 26.06.2012 13:55, Thierry Reding wrote:
...
status = "disabled";
gart = <&gart>;
/* video-encoding/decoding */ mpe { reg = <0x54040000 0x00040000>; interrupts = <0 68 0x04>; status = "disabled"; };
The client device interrupts are not very interesting, so they could be left out, too. Display controller related are probably an exception to this.
If the interrupts aren't used at all we should drop them.
I disagree here; "used" is most likely something specific to a particular OS's drivers. The HW always has the interrupts, and hence they should be described in DT.
On Tue, Jun 26, 2012 at 11:43:38AM -0600, Stephen Warren wrote:
On 06/26/2012 07:41 AM, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 04:01:05PM +0300, Terje Bergström wrote:
On 26.06.2012 13:55, Thierry Reding wrote:
...
status = "disabled";
gart = <&gart>;
/* video-encoding/decoding */ mpe { reg = <0x54040000 0x00040000>; interrupts = <0 68 0x04>; status = "disabled"; };
The client device interrupts are not very interesting, so they could be left out, too. Display controller related are probably an exception to this.
If the interrupts aren't used at all we should drop them.
I disagree here; "used" is most likely something specific to a particular OS's drivers. The HW always has the interrupts, and hence they should be described in DT.
Okay, I see. Does the same apply to the COP interrupts of the host1x node in your opinion? I don't know if it makes sense to describe something that's not reachable from the CPU. Yet it is defined in the GIC.
Thierry
On 06/26/2012 01:31 PM, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 11:43:38AM -0600, Stephen Warren wrote:
On 06/26/2012 07:41 AM, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 04:01:05PM +0300, Terje Bergström wrote:
On 26.06.2012 13:55, Thierry Reding wrote:
...
status = "disabled";
gart = <&gart>;
/* video-encoding/decoding */ mpe { reg = <0x54040000 0x00040000>; interrupts = <0 68 0x04>; status = "disabled"; };
The client device interrupts are not very interesting, so they could be left out, too. Display controller related are probably an exception to this.
If the interrupts aren't used at all we should drop them.
I disagree here; "used" is most likely something specific to a particular OS's drivers. The HW always has the interrupts, and hence they should be described in DT.
Okay, I see. Does the same apply to the COP interrupts of the host1x node in your opinion? I don't know if it makes sense to describe something that's not reachable from the CPU. Yet it is defined in the GIC.
This probably applies to the interrupts too. The TRM does indicate that the GIC has 4 interrupt IDs allocated to host1x. I recall Terje saying that two of them weren't usable by the CPU though. Those two points seem inconsistent. Terje, can you please explain further?
On 26.06.2012 22:31, Thierry Reding wrote:
Okay, I see. Does the same apply to the COP interrupts of the host1x node in your opinion? I don't know if it makes sense to describe something that's not reachable from the CPU. Yet it is defined in the GIC.
Hi,
I wasn't sure, so I had to check with the hardware people. The COP interrupts are not usable in a real system. They're usable in our internal simulation environment.
The client module interrupts are only for flagging error conditions.
Terje
On 06/26/2012 07:01 AM, Terje Bergström wrote:
On 26.06.2012 13:55, Thierry Reding wrote:
...
An alternative would be to call of_platform_populate() from the host1x
[alternative to making the host1x node contain compatible="simple-bus".]
driver. This has the advantage that it could integrate better with the host1x bus implementation that Terje is working on, but it also needs additional code to tear down the devices when the host1x driver is unloaded because a module reload would try to create duplicate devices otherwise.
Yes, we already have a bus_type for nvhost, and we have nvhost_device and nvhost_driver that derive from device and device_driver respectively. They accommodate some common host1x client device behavior and data that we need to store. We also use the bus_type to match each device and driver together, but the matching is version sensitive. For example, Tegra2 3D needs a different driver than Tegra3 3D.
I'd certainly like to see some upstream discussion re: why exactly we have a custom bus type here. What does it do that a regular platform bus doesn't do? Are those features something that would be generally useful to add to the existing platform bus? Can we instead add hooks into platform bus rather than creating a new bus? I recall you saying that the nvhost_bus duplicated a lot of code from the existing platform bus.
On Tue, Jun 26, 2012 at 11:46:42AM -0600, Stephen Warren wrote:
On 06/26/2012 07:01 AM, Terje Bergström wrote:
On 26.06.2012 13:55, Thierry Reding wrote:
...
An alternative would be to call of_platform_populate() from the host1x
[alternative to making the host1x node contain compatible="simple-bus".]
driver. This has the advantage that it could integrate better with the host1x bus implementation that Terje is working on, but it also needs additional code to tear down the devices when the host1x driver is unloaded because a module reload would try to create duplicate devices otherwise.
Yes, we already have a bus_type for nvhost, and we have nvhost_device and nvhost_driver that derive from device and device_driver respectively. They accommodate some common host1x client device behavior and data that we need to store. We also use the bus_type to match each device and driver together, but the matching is version sensitive. For example, Tegra2 3D needs a different driver than Tegra3 3D.
I'd certainly like to see some upstream discussion re: why exactly we have a custom bus type here. What does it do that a regular platform bus doesn't do? Are those features something that would be generally useful to add to the existing platform bus? Can we instead add hooks into platform bus rather than creating a new bus? I recall you saying that the nvhost_bus duplicated a lot of code from the existing platform bus.
I don't like the idea of duplicating the code either, but host1x does have some properties that might be convenient to have part of a bus abstraction (sync points and channels). This could all be implemented using a library of functions instead of a bus-type midlayer, though. I think we've also had that discussion before.
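Just to sketch what I mean by a library of functions (this API is hypothetical, all names are made up): host1x would export helpers that the client drivers, which remain ordinary platform drivers, call directly:

#include <linux/types.h>

struct device;
struct host1x_syncpt;
struct host1x_channel;
struct host1x_job;

/* hypothetical helpers exported by host1x */
struct host1x_syncpt *host1x_syncpt_request(struct device *dev);
u32 host1x_syncpt_read(struct host1x_syncpt *sp);
int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout);

struct host1x_channel *host1x_channel_get(struct device *dev);
int host1x_channel_submit(struct host1x_channel *ch, struct host1x_job *job);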
Thierry
On 26.06.2012 20:46, Stephen Warren wrote:
I'd certainly like to see some upstream discussion re: why exactly we have a custom bus type here. What does it do that a regular platform bus doesn't do? Are those features something that would be generally useful to add to the existing platform bus? Can we instead add hooks into platform bus rather than creating a new bus? I recall you saying that the nvhost_bus duplicated a lot of code from the existing platform bus.
Yes, the code is largely a copy of the platform bus. One big difference is type safety - all the nvhost_bus APIs take in and give out nvhost_device or nvhost_driver pointers. At registration time I also enforce the parent-child relationship between host1x and the bus clients.
Another is that I am able to enumerate all host1x related devices. The latter could of course be done via listing the children of host1x.
And third is that I was able to add a version field to the device to distinguish between Tegra2 and Tegra3 3D (for example) and make the bus use that information for binding driver and device. As the sysfs entry path is determined by the device name, this allows keeping the old sysfs APIs.
As you can see, most of the reasons are more related to convenience than to a custom bus_type being the only way to implement them.
Terje
On Wed, Jun 27, 2012 at 09:13:52AM +0300, Terje Bergström wrote:
On 26.06.2012 20:46, Stephen Warren wrote:
I'd certainly like to see some upstream discussion re: why exactly we have a custom bus type here. What does it do that a regular platform bus doesn't do? Are those features something that would be generally useful to add to the existing platform bus? Can we instead add hooks into platform bus rather than creating a new bus? I recall you saying that the nvhost_bus duplicated a lot of code from the existing platform bus.
Yes, the code is largely a copy of platform bus. One big difference is type safety - all the nvhost_bus APIs take in and give out nvhost_device or nvhost_driver pointers. At the registration time I also enforce parent-child relationship to host1x and bus clients.
I don't know where NVIDIA wants to go with respect to device trees in their downstream kernels, but at least for mainline (with DT-only support on Tegra) the parent-child relationship will be rooted in the DT anyway.
Another is that I am able to enumerate all host1x related devices. The latter could of course be done via listing the children of host1x.
And third is that I was able to add version field to device to distinguish between Tegra2 and Tegra3 3D (for example) and make the bus use that information for binding driver and device. As the sysfs entry path is determined by the device name, this allows keeping the old sysfs APIs.
Again this can easily be represented by the DT compatible property.
Thierry
Hi Thierry,
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
Hi,
while I haven't got much time to work on the actual code right now, I think it might still be useful if we could get the device tree binding to a point where everybody is happy with it. That'll also save me some time once I get to writing the code because I won't have to redo it over again. =)
So here's the current proposal:
host1x { compatible = "nvidia,tegra20-host1x", "simple-bus"; reg = <0x50000000 0x00024000>; interrupts = <0 64 0x04 /* cop syncpt */ 0 65 0x04 /* mpcore syncpt */ 0 66 0x04 /* cop general */ 0 67 0x04>; /* mpcore general */
#address-cells = <1>; #size-cells = <1>; ranges = <0x54000000 0x54000000 0x04000000>; status = "disabled"; gart = <&gart>;
...
output and hooks up a static EDID block with the LVDS output. There is also a carveout property which might be a better replacement for the "crippled" GART on Tegra20. Alternatively the CMA might work just as well instead.
The Plutux can be described like this:
host1x { carveout = <0x0e000000 0x02000000>;
As discussed previously in the following ML thread, the necessary info related to the "gart" could be obtained from the standard IOMMU API (or some layer above it, DMABUF or TTM?). So I don't think that we need to refer to "gart" and "carveout" here in the end.
http://lists.linuxfoundation.org/pipermail/iommu/2012-June/004266.html
On Tue, Jun 26, 2012 at 04:02:24PM +0300, Hiroshi Doyu wrote:
Hi Thierry,
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
Hi,
while I haven't got much time to work on the actual code right now, I think it might still be useful if we could get the device tree binding to a point where everybody is happy with it. That'll also save me some time once I get to writing the code because I won't have to redo it over again. =)
So here's the current proposal:
host1x { compatible = "nvidia,tegra20-host1x", "simple-bus"; reg = <0x50000000 0x00024000>; interrupts = <0 64 0x04 /* cop syncpt */ 0 65 0x04 /* mpcore syncpt */ 0 66 0x04 /* cop general */ 0 67 0x04>; /* mpcore general */
#address-cells = <1>; #size-cells = <1>; ranges = <0x54000000 0x54000000 0x04000000>; status = "disabled"; gart = <&gart>;
...
output and hooks up a static EDID block with the LVDS output. There is also a carveout property which might be a better replacement for the "crippled" GART on Tegra20. Alternatively the CMA might work just as well instead.
The Plutux can be described like this:
host1x { carveout = <0x0e000000 0x02000000>;
As discussed in the following ML thread previously, the necessary info related to the "gart" would be got from the standard IOMMU API(or something above layers, DMABUF or TTM?). So I don't think that we need to refer to "gart" and "carveout" here in the end.
http://lists.linuxfoundation.org/pipermail/iommu/2012-June/004266.html
Yes, if IOMMU or some layer above can provide the same information, then that is certainly better than explicitly referencing it in the DT.
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
Thierry
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
Hi,
while I haven't got much time to work on the actual code right now, I think it might still be useful if we could get the device tree binding to a point where everybody is happy with it. That'll also save me some time once I get to writing the code because I won't have to redo it over again. =)
So here's the current proposal:
host1x { compatible = "nvidia,tegra20-host1x", "simple-bus"; reg = <0x50000000 0x00024000>; interrupts = <0 64 0x04 /* cop syncpt */ 0 65 0x04 /* mpcore syncpt */ 0 66 0x04 /* cop general */ 0 67 0x04>; /* mpcore general */
#address-cells = <1>; #size-cells = <1>; ranges = <0x54000000 0x54000000 0x04000000>; status = "disabled"; gart = <&gart>;
...
output and hooks up a static EDID block with the LVDS output. There is also a carveout property which might be a better replacement for the "crippled" GART on Tegra20. Alternatively the CMA might work just as well instead.
The Plutux can be described like this:
host1x { carveout = <0x0e000000 0x02000000>;
As discussed in the following ML thread previously, the necessary info related to the "gart" would be got from the standard IOMMU API(or something above layers, DMABUF or TTM?). So I don't think that we need to refer to "gart" and "carveout" here in the end.
http://lists.linuxfoundation.org/pipermail/iommu/2012-June/004266.html
Yes, if IOMMU or some layer above can provide the same information, then that is certainly better than explicitly referencing it in the DT.
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar to the current gart implementation. Define the carveout as:

carveout {
	compatible = "nvidia,tegra20-carveout";
	size = <0x10000000>;
};

Then create a file such as "tegra-carveout.c" to pick up these definitions and register itself as the platform device's IOMMU instance.
Thierry
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar with current gart implementation. Define carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such like "tegra-carveout.c" to get these definitions and register itself as platform device's iommu instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar with current gart implementation. Define carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such like "tegra-carveout.c" to get these definitions and
register itself as platform device's iommu instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of the carveout as a configurable item. So we need to define this somewhere. How about defining the carveout as a property of the gart?
On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar with current gart implementation. Define carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such like "tegra-carveout.c" to get these definitions and
register itself as platform device's iommu instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of carveout as a configurable item. So we need to define this somewhere. How about define carveout as a property of gart?
There already exists a way of preventing Linux from using certain chunks of memory; the /memreserve/ syntax. From a brief look at the dtc source, it looks like /memreserve/ entries can have labels, which implies that a property in the GART node could refer to the /memreserve/ entry by phandle in order to know what memory regions to use.
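For reference, the syntax is along these lines (the label is what I'm inferring from the dtc source; how a property would then reference it is the part that would need checking):

/* reserve 32 MiB at 0x0e000000 so the kernel won't use it */
carveout: /memreserve/ 0x0e000000 0x02000000;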
On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar with current gart implementation. Define carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such like "tegra-carveout.c" to get these definitions and
register itself as platform device's iommu instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of carveout as a configurable item. So we need to define this somewhere. How about define carveout as a property of gart?
There already exists a way of preventing Linux from using certain chunks of memory; the /memreserve/ syntax. From a brief look at the dtc source, it looks like /memreserve/ entries can have labels, which implies that a property in the GART node could refer to the /memreserve/ entry by phandle in order to know what memory regions to use.
Wasn't the whole point of using a carveout supposed to be a replacement for the GART? As such I'd think the carveout should rather be a property of the host1x device.
AIUI what we want to do is have a large contiguous region of memory that a central component (host1x) manages as a pool from which clients (DRM, V4L, ...) can allocate buffers as needed. Since all of this memory will be contiguous anyway there isn't much use for the GART anymore.
But maybe I'm misunderstanding.
Thierry
On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar with current gart implementation. Define
carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such like "tegra-carveout.c" to get these definitions and
register itself as platform device's iommu instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of carveout as a configurable item. So we need to define this somewhere. How about define carveout as a
property of gart?
There already exists a way of preventing Linux from using certain chunks of memory; the /memreserve/ syntax. From a brief look at the dtc source, it looks like /memreserve/ entries can have labels, which implies that a property in the GART node could refer to the /memreserve/ entry by phandle in order to know what memory regions to use.
Wasn't the whole point of using a carveout supposed to be a replacement for the GART? As such I'd think the carveout should rather be a property of the host1x device.
AIUI what we want to do is have a large contiguous region of memory that a central component (host1x) manages as a pool from which clients (DRM, V4L, ...) can allocate buffers as needed. Since all of this memory will be contiguous anyway there isn't much use for the GART anymore.
I have the same understanding. We don't need the GART anymore if the carveout is enabled. I'm wondering why we need to define a property referencing the global /memreserve/ in the GART or host1x node? We can just define a label for the /memreserve/ entry, so we can already distinguish these memory reservations in code.
But maybe I'm misunderstanding.
Thierry
Hi all,
I'm not sure what your exact plans are for the direction in which the DRM driver should head, as I'm still a bit out of the loop as many of those matters were only discussed internally at NVIDIA or with some NDA developers. But I'll still try to get into the discussion.
On Wednesday, 27.06.2012, at 07:14 +0200, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar with current gart implementation. Define carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such like "tegra-carveout.c" to get these definitions and
register itself as platform device's iommu instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of carveout as a configurable item. So we need to define this somewhere. How about define carveout as a property of gart?
There already exists a way of preventing Linux from using certain chunks of memory; the /memreserve/ syntax. From a brief look at the dtc source, it looks like /memreserve/ entries can have labels, which implies that a property in the GART node could refer to the /memreserve/ entry by phandle in order to know what memory regions to use.
Wasn't the whole point of using a carveout supposed to be a replacement for the GART? As such I'd think the carveout should rather be a property of the host1x device.
In my understanding carveout is neither a hardware nor a software component. It's just a somewhat special pool of memory. As I pointed out in one of the older mails, carveout can not completely replace the GART. While normal allocations for graphics use should be done contiguously, the GART allows us to link normal scattered sysram buffers into the GPU address space, which is a nice thing to have. IMHO if the carveout is to be used exclusively by the GPU (i.e. the DRM driver) it should be a property of the host1x device.
AIUI what we want to do is have a large contiguous region of memory that a central component (host1x) manages as a pool from which clients (DRM, V4L, ...) can allocate buffers as needed. Since all of this memory will be contiguous anyway there isn't much use for the GART anymore.
I think this is the wrong way to go. Having a special memory pool managed by some driver adds one more allocator to the kernel, which is clearly not desirable. If we want a special mem region for GPU use, we should not share this memory pool with other components.
But if we want a mem region for contiguous allocations used by many components, which seems to be the consensus here, CMA is the way to go. In this case I think we don't want to bother with the carveout property at all at the DRM driver level. Such a shared mem region managed by CMA should be defined at a higher level of the device tree.
Thanks, Lucas
On Wed, Jun 27, 2012 at 10:13:56AM +0200, Lucas Stach wrote:
Hi all,
I'm not sure what your exact plans are for the direction in which the DRM driver should head, as I'm still a bit out of the loop as many of those matters were only discussed internally at NVIDIA or with some NDA developers. But I'll still try to get into the discussion.
On Wednesday, 27.06.2012, at 07:14 +0200, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar to the current GART implementation. Define the carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such as "tegra-carveout.c" to pick up these definitions and register itself as the platform device's IOMMU instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of the carveout as a configurable item. So we need to define this somewhere. How about defining the carveout as a property of the GART?
There already exists a way of preventing Linux from using certain chunks of memory; the /memreserve/ syntax. From a brief look at the dtc source, it looks like /memreserve/ entries can have labels, which implies that a property in the GART node could refer to the /memreserve/ entry by phandle in order to know what memory regions to use.
Wasn't the whole point of using a carveout supposed to be a replacement for the GART? As such I'd think the carveout should rather be a property of the host1x device.
In my understanding carveout is neither a hardware nor a software component. It's just a somewhat special pool of memory. As I pointed out in one of the older mails, carveout cannot completely replace the GART. While normal allocations for graphics use should be contiguous, the GART allows us to link normally scattered sysram buffers into the GPU address space, which is a nice thing to have.
Do you agree that this will likely not be a problem for a more or less stupid framebuffer DRM driver? I recall you mentioning that this matters in a 3D context, where the scattered buffers are, for example, geometry provided by OpenGL, right?
I think we need to put some thoughts into that once we start to implement more advanced features. At that point we may also want to think about how to integrate that with TTM.
IMHO if carveout is to be used exclusively by the GPU (i.e. the DRM driver) it should be a property of the host1x device.
AIUI what we want to do is have a large contiguous region of memory that a central component (host1x) manages as a pool from which clients (DRM, V4L, ...) can allocate buffers as needed. Since all of this memory will be contiguous anyway there isn't much use for the GART anymore.
I think this is the wrong way to go. Having a special memory pool managed by some driver adds one more allocator to the kernel, which is clearly not desirable. If we want a special mem region for GPU use, we should not share this memory pool with other components.
But if we want a mem region for contig allocations used by many components, which seems to be consensus here, CMA is the way to go. In this case I think we don't want to bother with the carveout property at all at the DRM driver level. Such a shared mem region managed by CMA should be defined at a higher level of the device tree.
Okay, this pretty much matches what we've agreed on in another subthread. CMA looks like the best option for now as it should cover everything we need at present.
Thierry
On Wed, 27 Jun 2012 07:14:18 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar to the current GART implementation. Define the carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such as "tegra-carveout.c" to pick up these definitions and register itself as the platform device's IOMMU instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of the carveout as a configurable item. So we need to define this somewhere. How about defining the carveout as a property of the GART?
There already exists a way of preventing Linux from using certain chunks of memory; the /memreserve/ syntax. From a brief look at the dtc source, it looks like /memreserve/ entries can have labels, which implies that a property in the GART node could refer to the /memreserve/ entry by phandle in order to know what memory regions to use.
Wasn't the whole point of using a carveout supposed to be a replacement for the GART?
Mostly agree. IIUC, we use both carveout/gart allocated buffers in android/tegra2.
As such I'd think the carveout should rather be a property of the host1x device.
Rather than introducing a new property, how about using "coherent_pool=??M" in the kernel command line if necessary? I think that this carveout size depends on the system usage/load.
AIUI what we want to do is have a large contiguous region of memory that a central component (host1x) manages as a pool from which clients (DRM, V4L, ...) can allocate buffers as needed. Since all of this memory will be contiguous anyway there isn't much use for the GART anymore.
Right.
I'd think that the amount of contiguous memory might depend on the system usage/load.
On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
On Wed, 27 Jun 2012 07:14:18 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar to the current GART implementation. Define the carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such as "tegra-carveout.c" to pick up these definitions and register itself as the platform device's IOMMU instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of the carveout as a configurable item. So we need to define this somewhere. How about defining the carveout as a property of the GART?
There already exists a way of preventing Linux from using certain chunks of memory; the /memreserve/ syntax. From a brief look at the dtc source, it looks like /memreserve/ entries can have labels, which implies that a property in the GART node could refer to the /memreserve/ entry by phandle in order to know what memory regions to use.
Wasn't the whole point of using a carveout supposed to be a replacement for the GART?
Mostly agree. IIUC, we use both carveout/gart allocated buffers in android/tegra2.
As such I'd think the carveout should rather be a property of the host1x device.
Rather than introducing a new property, how about using "coherent_pool=??M" in the kernel command line if necessary? I think that this carveout size depends on the system usage/load.
I was hoping that we could get away with using the CMA and perhaps initialize it based on device tree content. I agree that the carveout size depends on the use-case, but I still think it makes sense to specify it on a per-board basis.
Thierry
On Wed, 27 Jun 2012 16:08:10 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
On Wed, 27 Jun 2012 07:14:18 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar to the current GART implementation. Define the carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such as "tegra-carveout.c" to pick up these definitions and register itself as the platform device's IOMMU instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of the carveout as a configurable item. So we need to define this somewhere. How about defining the carveout as a property of the GART?
There already exists a way of preventing Linux from using certain chunks of memory; the /memreserve/ syntax. From a brief look at the dtc source, it looks like /memreserve/ entries can have labels, which implies that a property in the GART node could refer to the /memreserve/ entry by phandle in order to know what memory regions to use.
Wasn't the whole point of using a carveout supposed to be a replacement for the GART?
Mostly agree. IIUC, we use both carveout/gart allocated buffers in android/tegra2.
As such I'd think the carveout should rather be a property of the host1x device.
Rather than introducing a new property, how about using "coherent_pool=??M" in the kernel command line if necessary? I think that this carveout size depends on the system usage/load.
I was hoping that we could get away with using the CMA and perhaps initialize it based on device tree content. I agree that the carveout size depends on the use-case, but I still think it makes sense to specify it on a per-board basis.
DRM driver doesn't know if it uses CMA or not, because DRM only uses DMA API. I think that "coherent_pool" can be used only when the amount of contiguous memory is short in your system. Otherwise even unnecessary.
Could you explain a bit more why you want carveout size on per-board basis?
On Wed, Jun 27, 2012 at 05:29:14PM +0300, Hiroshi Doyu wrote:
On Wed, 27 Jun 2012 16:08:10 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
On Wed, 27 Jun 2012 07:14:18 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar to the current GART implementation. Define the carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such as "tegra-carveout.c" to pick up these definitions and register itself as the platform device's IOMMU instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of the carveout as a configurable item. So we need to define this somewhere. How about defining the carveout as a property of the GART?
There already exists a way of preventing Linux from using certain chunks of memory; the /memreserve/ syntax. From a brief look at the dtc source, it looks like /memreserve/ entries can have labels, which implies that a property in the GART node could refer to the /memreserve/ entry by phandle in order to know what memory regions to use.
Wasn't the whole point of using a carveout supposed to be a replacement for the GART?
Mostly agree. IIUC, we use both carveout/gart allocated buffers in android/tegra2.
As such I'd think the carveout should rather be a property of the host1x device.
Rather than introducing a new property, how about using "coherent_pool=??M" in the kernel command line if necessary? I think that this carveout size depends on the system usage/load.
I was hoping that we could get away with using the CMA and perhaps initialize it based on device tree content. I agree that the carveout size depends on the use-case, but I still think it makes sense to specify it on a per-board basis.
DRM driver doesn't know if it uses CMA or not, because DRM only uses DMA API.
So how is the DRM supposed to allocate buffers? Does it call the dma_alloc_from_contiguous() function to do that? I can see how it is used by arm_dma_ops but how does it end up in the driver?
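(For reference, my reading of the call path with CMA enabled on ARM; this is only a sketch of how I understand the code, not an authoritative description.)

/*
 * Sketch of the call path as I understand the ARM/CMA integration; the
 * driver never calls dma_alloc_from_contiguous() itself:
 *
 *   driver: dma_alloc_coherent(dev, size, &handle, GFP_KERNEL)
 *     -> dma_map_ops (arm_dma_ops) .alloc()
 *       -> CMA-backed allocation, which is where
 *          dma_alloc_from_contiguous() gets called when CONFIG_CMA is set
 *
 * So from the driver's point of view it is just the DMA API:
 */
#include <linux/dma-mapping.h>

static void *alloc_contiguous_buffer(struct device *dev, size_t size,
                                     dma_addr_t *handle)
{
        /* contiguous memory, served from the CMA region if one is set up */
        return dma_alloc_coherent(dev, size, handle, GFP_KERNEL);
}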
I think that "coherent_pool" can be used only when the amount of contiguous memory is short in your system. Otherwise even unnecessary.
Could you explain a bit more why you want carveout size on per-board basis?
In the ideal case I would want to not have a carveout size at all. However there may be situations where you need to make sure some driver can allocate a given amount of memory. Having to specify this using a kernel command-line parameter is cumbersome because it may require changes to the bootloader or whatever. So if you know that a particular board always needs 128 MiB of carveout, then it makes sense to specify it on a per-board basis.
Thierry
On Wednesday, 27 Jun 2012, 16:44 +0200, Thierry Reding wrote:
On Wed, Jun 27, 2012 at 05:29:14PM +0300, Hiroshi Doyu wrote:
On Wed, 27 Jun 2012 16:08:10 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
On Wed, 27 Jun 2012 07:14:18 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar to the current GART implementation. Define the carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such as "tegra-carveout.c" to pick up these definitions and register itself as the platform device's IOMMU instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of the carveout as a configurable item. So we need to define this somewhere. How about defining the carveout as a property of the GART?
There already exists a way of preventing Linux from using certain chunks of memory; the /memreserve/ syntax. From a brief look at the dtc source, it looks like /memreserve/ entries can have labels, which implies that a property in the GART node could refer to the /memreserve/ entry by phandle in order to know what memory regions to use.
Wasn't the whole point of using a carveout supposed to be a replacement for the GART?
Mostly agree. IIUC, we use both carveout/gart allocated buffers in android/tegra2.
As such I'd think the carveout should rather be a property of the host1x device.
Rather than introducing a new property, how about using "coherent_pool=??M" in the kernel command line if necessary? I think that this carveout size depends on the system usage/load.
I was hoping that we could get away with using the CMA and perhaps initialize it based on device tree content. I agree that the carveout size depends on the use-case, but I still think it makes sense to specify it on a per-board basis.
DRM driver doesn't know if it uses CMA or not, because DRM only uses DMA API.
So how is the DRM supposed to allocate buffers? Does it call the dma_alloc_from_contiguous() function to do that? I can see how it is used by arm_dma_ops but how does it end up in the driver?
As I said before the DMA API is not a good fit for graphics drivers. Most of the DMA buffers used by graphics cores are long lived and big, so we need a special pool to alloc from to avoid eating all contiguous address space, as DMA API does not provide shrinker callbacks for clients using large amount of memory.
I think that "coherent_pool" can be used only when the amount of contiguous memory is short in your system. Otherwise even unnecessary.
Could you explain a bit more why you want carveout size on per-board basis?
In the ideal case I would want to not have a carveout size at all. However there may be situations where you need to make sure some driver can allocate a given amount of memory. Having to specify this using a kernel command-line parameter is cumbersome because it may require changes to the bootloader or whatever. So if you know that a particular board always needs 128 MiB of carveout, then it makes sense to specify it on a per-board basis.
If we go with CMA, this is a non-issue, as CMA allows to use the contig area for normal allocations and only purges them if it really needs the space for contig allocs.
Thierry
Hi Lucas,
On Wed, 27 Jun 2012 17:59:55 +0200 Lucas Stach dev@lynxeye.de wrote:
Rather than introducing a new property, how about using "coherent_pool=??M" in the kernel command line if necessary? I think that this carveout size depends on the system usage/load.
I was hoping that we could get away with using the CMA and perhaps initialize it based on device tree content. I agree that the carveout size depends on the use-case, but I still think it makes sense to specify it on a per-board basis.
DRM driver doesn't know if it uses CMA or not, because DRM only uses DMA API.
So how is the DRM supposed to allocate buffers? Does it call the dma_alloc_from_contiguous() function to do that? I can see how it is used by arm_dma_ops but how does it end up in the driver?
As I said before the DMA API is not a good fit for graphics drivers. Most of the DMA buffers used by graphics cores are long lived and big, so we need a special pool to alloc from to avoid eating all contiguous address space, as DMA API does not provide shrinker callbacks for clients using large amount of memory.
For the contiguous address space shortage issue in the DMA API, I think the DMABUF framework can handle that?
If so, is there any good example for DMABUF used in DRM?
On Thursday, 28 Jun 2012, 09:06 +0300, Hiroshi Doyu wrote:
Hi Lucas,
On Wed, 27 Jun 2012 17:59:55 +0200 Lucas Stach dev@lynxeye.de wrote:
Rather than introducing a new property, how about using "coherent_pool=??M" in the kernel command line if necessary? I think that this carveout size depends on the system usage/load.
I was hoping that we could get away with using the CMA and perhaps initialize it based on device tree content. I agree that the carveout size depends on the use-case, but I still think it makes sense to specify it on a per-board basis.
DRM driver doesn't know if it uses CMA or not, because DRM only uses DMA API.
So how is the DRM supposed to allocate buffers? Does it call the dma_alloc_from_contiguous() function to do that? I can see how it is used by arm_dma_ops but how does it end up in the driver?
As I said before the DMA API is not a good fit for graphics drivers. Most of the DMA buffers used by graphics cores are long lived and big, so we need a special pool to alloc from to avoid eating all contiguous address space, as DMA API does not provide shrinker callbacks for clients using large amount of memory.
For the contiguous address space shortage issue in the DMA API, I think the DMABUF framework can handle that?
No, DMABUF is only about sharing DMA buffers between different hardware blocks. It has no address space management at all.
All DRM drivers manage their address space on their own, either through GEM or TTM. On the desktop the main pool for contiguous memory is the card's VRAM or an area of system memory set aside for use by the DRM driver. Having a carveout area only for DRM use would be the same thing, but it makes the split between graphics and system memory a bit inflexible. Currently we have no sane way of integrating DRM memory managers with the normal DMA API. There was some discussion about DMA pools and/or shrinkers for DMA clients, but it has not led to any written code.
DMABUF only makes a limited integration possible between, for example, V4L and DRM, so that they can share buffers. But we still have the situation that DRM devices allocate from their own pool, for the reasons explained above, and do not use the standard DMA API.
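(To illustrate the "sharing only" point, here is a rough sketch of the importer side of dma-buf; the wrapper function and how the fd reaches it are made up for illustration, only the dma_buf_* calls themselves are real.)

#include <linux/device.h>
#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/*
 * Importer side of dma-buf: the buffer was allocated by some other driver
 * (the exporter) from whatever pool it manages; all we get is an sg_table
 * describing it for our device. The wrapper is illustrative, not an
 * existing Tegra function.
 */
static struct sg_table *import_shared_buffer(struct device *dev, int fd,
                                              struct dma_buf_attachment **out)
{
        struct dma_buf *buf;
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;

        buf = dma_buf_get(fd);          /* fd handed over from userspace */
        if (IS_ERR(buf))
                return ERR_CAST(buf);

        attach = dma_buf_attach(buf, dev);
        if (IS_ERR(attach)) {
                dma_buf_put(buf);
                return ERR_CAST(attach);
        }

        /* the exporter maps its pages (contiguous or not) for our device */
        sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt)) {
                dma_buf_detach(buf, attach);
                dma_buf_put(buf);
                return ERR_CAST(sgt);
        }

        *out = attach;
        return sgt;
}

The reverse path (unmap, detach, put) undoes this when the importer no longer needs the buffer; none of it says anything about how the exporter allocated or manages the memory.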
On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
On Wednesday, 27 Jun 2012, 16:44 +0200, Thierry Reding wrote:
On Wed, Jun 27, 2012 at 05:29:14PM +0300, Hiroshi Doyu wrote:
On Wed, 27 Jun 2012 16:08:10 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
On Wed, 27 Jun 2012 07:14:18 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar to the current GART implementation. Define the carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such as "tegra-carveout.c" to pick up these definitions and register itself as the platform device's IOMMU instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of the carveout as a configurable item. So we need to define this somewhere. How about defining the carveout as a property of the GART?
There already exists a way of preventing Linux from using certain chunks of memory; the /memreserve/ syntax. From a brief look at the dtc source, it looks like /memreserve/ entries can have labels, which implies that a property in the GART node could refer to the /memreserve/ entry by phandle in order to know what memory regions to use.
Wasn't the whole point of using a carveout supposed to be a replacement for the GART?
Mostly agree. IIUC, we use both carveout/gart allocated buffers in android/tegra2.
As such I'd think the carveout should rather be a property of the host1x device.
Rather than introducing a new property, how about using "coherent_pool=??M" in the kernel command line if necessary? I think that this carveout size depends on the system usage/load.
I was hoping that we could get away with using the CMA and perhaps initialize it based on device tree content. I agree that the carveout size depends on the use-case, but I still think it makes sense to specify it on a per-board basis.
DRM driver doesn't know if it uses CMA or not, because DRM only uses DMA API.
So how is the DRM supposed to allocate buffers? Does it call the dma_alloc_from_contiguous() function to do that? I can see how it is used by arm_dma_ops but how does it end up in the driver?
As I said before the DMA API is not a good fit for graphics drivers. Most of the DMA buffers used by graphics cores are long lived and big, so we need a special pool to alloc from to avoid eating all contiguous address space, as DMA API does not provide shrinker callbacks for clients using large amount of memory.
I recall you mentioning TTM as a better alternative several times in the past. How does it fit in with this? Does it have the capability of using a predefined chunk of contiguous memory as a pool to allocate from?
One problem that all of these solutions don't address is that not all devices below host1x are DRM related. At least for the CSI and VI blocks I expect there to be V4L2 drivers eventually, so what we really need is to manage allocations outside of the DRM. host1x is the most logical choice here.
Perhaps we can put host1x code somewhere below drivers/gpu (mm subdirectory?), drivers/memory or perhaps some other or new location that could eventually host similar drivers for other SoCs.
Then again, maybe it'd be easier for now to put everything below the drivers/gpu/drm/tegra directory and cross that bridge when we get to it.
I think that "coherent_pool" can be used only when the amount of contiguous memory is short in your system. Otherwise even unnecessary.
Could you explain a bit more why you want carveout size on per-board basis?
In the ideal case I would want to not have a carveout size at all. However there may be situations where you need to make sure some driver can allocate a given amount of memory. Having to specify this using a kernel command-line parameter is cumbersome because it may require changes to the bootloader or whatever. So if you know that a particular board always needs 128 MiB of carveout, then it makes sense to specify it on a per-board basis.
If we go with CMA, this is a non-issue, as CMA allows to use the contig area for normal allocations and only purges them if it really needs the space for contig allocs.
CMA certainly sounds like the most simple approach. While it may not be suited for 3D graphics or multimedia processing later on, I think we could use it at a starting point to get basic framebuffer and X support up and running. We can always move to something more advanced like TTM later.
Thierry
On 06/28/2012 05:12 AM, Thierry Reding wrote:
On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
On Wednesday, 27 Jun 2012, 16:44 +0200, Thierry Reding wrote:
...
In the ideal case I would want to not have a carveout size at all. However there may be situations where you need to make sure some driver can allocate a given amount of memory. Having to specify this using a kernel command-line parameter is cumbersome because it may require changes to the bootloader or whatever. So if you know that a particular board always needs 128 MiB of carveout, then it makes sense to specify it on a per-board basis.
If we go with CMA, this is a non-issue, as CMA allows to use the contig area for normal allocations and only purges them if it really needs the space for contig allocs.
CMA certainly sounds like the most simple approach. While it may not be suited for 3D graphics or multimedia processing later on, I think we could use it at a starting point to get basic framebuffer and X support up and running. We can always move to something more advanced like TTM later.
I thought the whole purpose of CMA was to act as the infra-structure to provide buffers to 3D, camera, etc. in particular allowing sharing of buffers between them. In other words, isn't CMA the memory manager? If there's some deficiency with CMA for 3D graphics, it seems like that should be raised with those designing CMA. Or, am I way off base with my expectations of CMA?
On Thursday, 28 Jun 2012, 10:51 -0600, Stephen Warren wrote:
On 06/28/2012 05:12 AM, Thierry Reding wrote:
On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
On Wednesday, 27 Jun 2012, 16:44 +0200, Thierry Reding wrote:
...
In the ideal case I would want to not have a carveout size at all. However there may be situations where you need to make sure some driver can allocate a given amount of memory. Having to specify this using a kernel command-line parameter is cumbersome because it may require changes to the bootloader or whatever. So if you know that a particular board always needs 128 MiB of carveout, then it makes sense to specify it on a per-board basis.
If we go with CMA, this is a non-issue, as CMA allows to use the contig area for normal allocations and only purges them if it really needs the space for contig allocs.
CMA certainly sounds like the most simple approach. While it may not be suited for 3D graphics or multimedia processing later on, I think we could use it at a starting point to get basic framebuffer and X support up and running. We can always move to something more advanced like TTM later.
I thought the whole purpose of CMA was to act as the infra-structure to provide buffers to 3D, camera, etc. in particular allowing sharing of buffers between them. In other words, isn't CMA the memory manager? If there's some deficiency with CMA for 3D graphics, it seems like that should be raised with those designing CMA. Or, am I way off base with my expectations of CMA?
CMA is just a way of providing large contiguous address space blocks in a dynamic fashion. The problem CMA solves is: we have a system with relatively low amounts of sysmem (like 512MB), now to ensure we can always get large contiguous buffers for use by GPU or VIDEO blocks, we need to set aside a relatively large contiguous pool (like 128MB). So we are stealing 128MB of memory from the system while we may or may not use it, which is bad. Now CMA allows to say: I may need 128MB of contig space, but the system is free to use it as normal memory as long as I don't really need it. If the space is really needed, CMA purges pages from the area and may even swap them out. So yes CMA is a memory allocator for contig memory.
TTM though solves more advanced matters, like buffer synchronisation between 3D and 2D block of hardware or syncing buffer access between GPU and CPU. One of the most interesting things of TTM is the ability to purge the GPU DMA buffers to scattered sysmem or even swap them out, if they are not currently used by the GPU. It then makes sure to move them in the contig space again when the GPU really needs them and fix up the GPU command stream with the new buffer address.
IMHO the best solution would be to use CMA as a flexible replacement of the static carveout area and put TTM on top of this to solve the needs of graphics drivers. We certainly don't want to reinvent the wheel inside CMA. We have solutions for all those things in the kernel right now, we just have to glue them together in a sane way.
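(To make that layering concrete, a rough sketch of what a CMA-backed GEM object could look like; all names are illustrative rather than an existing Tegra API, and TTM could later replace the simple DMA API backing shown here.)

#include <drm/drmP.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/slab.h>

/*
 * Sketch of a CMA-backed GEM object: CMA (through the DMA API) provides
 * the contiguous backing store, and the DRM memory manager object sits on
 * top of it. Names are made up for illustration.
 */
struct tegra_bo {
        struct drm_gem_object base;
        void *vaddr;
        dma_addr_t paddr;
};

static struct tegra_bo *tegra_bo_create(struct drm_device *drm, size_t size)
{
        struct tegra_bo *bo;
        int err;

        size = round_up(size, PAGE_SIZE);       /* GEM wants page-aligned sizes */

        bo = kzalloc(sizeof(*bo), GFP_KERNEL);
        if (!bo)
                return ERR_PTR(-ENOMEM);

        err = drm_gem_object_init(drm, &bo->base, size);
        if (err < 0)
                goto free;

        /* contiguous backing store, served from the CMA area if configured */
        bo->vaddr = dma_alloc_coherent(drm->dev, size, &bo->paddr,
                                       GFP_KERNEL);
        if (!bo->vaddr) {
                err = -ENOMEM;
                goto release;
        }

        return bo;

release:
        drm_gem_object_release(&bo->base);
free:
        kfree(bo);
        return ERR_PTR(err);
}

The matching free path would just call dma_free_coherent() and drm_gem_object_release() in reverse order.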
Thanks, Lucas
On 06/28/2012 11:19 AM, Lucas Stach wrote: ...
CMA is just a way of providing large contiguous address space blocks in a dynamic fashion. ...
TTM though solves more advanced matters, like buffer synchronisation between 3D and 2D block of hardware ...
IMHO the best solution would be to use CMA as a flexible replacement of the static carveout area and put TTM on top of this ...
Ah right, thanks for the explanation. That makes sense to me now.
On Thu, Jun 28, 2012 at 11:33:56AM -0600, Stephen Warren wrote:
On 06/28/2012 11:19 AM, Lucas Stach wrote: ...
CMA is just a way of providing large contiguous address space blocks in a dynamic fashion. ...
TTM though solves more advanced matters, like buffer synchronisation between 3D and 2D block of hardware ...
IMHO the best solution would be to use CMA as a flexible replacement of the static carveout area and put TTM on top of this ...
Ah right, thanks for the explanation. That makes sense to me now.
Okay. I think that resolves the last open issue. I'll try to get some more work done on the DT and corresponding code soonish.
For those who don't know yet I've requested the creation of a project on freedesktop.org for Tegra graphics drivers[0]. I plan to have the DRM code hosted there once the project has been approved. Furthermore if we ever get to write a corresponding X driver it can be hosted there as well. We should also use the wiki for coordination once things get started.
Thierry
On Thursday, 28 Jun 2012, 10:51 -0600, Stephen Warren wrote:
On 06/28/2012 05:12 AM, Thierry Reding wrote:
On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
On Wednesday, 27 Jun 2012, 16:44 +0200, Thierry Reding wrote:
...
In the ideal case I would want to not have a carveout size at all. However there may be situations where you need to make sure some driver can allocate a given amount of memory. Having to specify this using a kernel command-line parameter is cumbersome because it may require changes to the bootloader or whatever. So if you know that a particular board always needs 128 MiB of carveout, then it makes sense to specify it on a per-board basis.
If we go with CMA, this is a non-issue, as CMA allows to use the contig area for normal allocations and only purges them if it really needs the space for contig allocs.
CMA certainly sounds like the most simple approach. While it may not be suited for 3D graphics or multimedia processing later on, I think we could use it at a starting point to get basic framebuffer and X support up and running. We can always move to something more advanced like TTM later.
I thought the whole purpose of CMA was to act as the infra-structure to provide buffers to 3D, camera, etc. in particular allowing sharing of buffers between them. In other words, isn't CMA the memory manager? If there's some deficiency with CMA for 3D graphics, it seems like that should be raised with those designing CMA. Or, am I way off base with my expectations of CMA?
CMA is just a way of providing large contiguous address space blocks in a dynamic fashion. The problem CMA solves is: we have a system with relatively low amounts of sysmem (like 512MB), now to ensure we can always get large contiguous buffers for use by GPU or VIDEO blocks, we need to set aside a relatively large contiguous pool (like 128MB). So we are stealing 128MB of memory from the system while we may or may not use it, which is bad. Now CMA allows to say: I may need 128MB of contig space, but the system is free to use it as normal memory as long as I don't really need it. If the space is really needed, CMA purges pages from the area and may even swap them out. So yes CMA is a memory allocator for contig memory.
TTM though solves more advanced matters, like buffer synchronisation between 3D and 2D block of hardware or syncing buffer access between GPU and CPU. One of the most interesting things of TTM is the ability to purge the GPU DMA buffers to scattered sysmem or even swap them out, if they are not currently used by the GPU. It then makes sure to move them in the contig space again when the GPU really needs them and fix up the GPU command stream with the new buffer address.
IMHO the best solution would be to use CMA as a flexible replacement of the static carveout area and put TTM on top of this to solve the needs of graphics drivers. We certainly don't want to reinvent the wheel inside CMA. We have solutions for all those things in the kernel right now, we just have to glue them together in a sane way.
That is a great explanation. So could you explain what the relation is between the IOMMU API and TTM (or GEM)? Terje said the DMABUF API sits on top of the IOMMU API, so normal device drivers (such as DRM) can forget about the IOMMU APIs and just use the DMABUF API. If so, I want to know whether TTM/GEM and the IOMMU are related, or whether TTM/GEM uses the DMABUF APIs, which call the IOMMU API to do memory allocation/mapping?
Thanks, Lucas
On Thursday, 28 Jun 2012, 10:51 -0600, Stephen Warren wrote:
On 06/28/2012 05:12 AM, Thierry Reding wrote:
On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
On Wednesday, 27 Jun 2012, 16:44 +0200, Thierry Reding wrote:
...
In the ideal case I would want to not have a carveout size at all. However there may be situations where you need to make sure some driver can allocate a given amount of memory. Having to specify this using a kernel command-line parameter is cumbersome because it may require changes to the bootloader or whatever. So if you know that a particular board always needs 128 MiB of carveout, then it makes sense to specify it on a per-board basis.
If we go with CMA, this is a non-issue, as CMA allows to use the contig area for normal allocations and only purges them if it really needs the space for contig allocs.
CMA certainly sounds like the most simple approach. While it may not be suited for 3D graphics or multimedia processing later on, I think we could use it at a starting point to get basic framebuffer and X support up and running. We can always move to something more advanced like TTM later.
I thought the whole purpose of CMA was to act as the infra-structure to provide buffers to 3D, camera, etc. in particular allowing sharing of buffers between them. In other words, isn't CMA the memory manager? If there's some deficiency with CMA for 3D graphics, it seems like that should be raised with those designing CMA. Or, am I way off base with my expectations of CMA?
CMA is just a way of providing large contiguous address space blocks in a dynamic fashion. The problem CMA solves is: we have a system with relatively low amounts of sysmem (like 512MB), now to ensure we can always get large contiguous buffers for use by GPU or VIDEO blocks, we need to set aside a relatively large contiguous pool (like 128MB). So we are stealing 128MB of memory from the system while we may or may not use it, which is bad. Now CMA allows to say: I may need 128MB of contig space, but the system is free to use it as normal memory as long as I don't really need it. If the space is really needed, CMA purges pages from the area and may even swap them out. So yes CMA is a memory allocator for contig memory.
TTM though solves more advanced matters, like buffer synchronisation between 3D and 2D block of hardware or syncing buffer access between GPU and CPU. One of the most interesting things of TTM is the ability to purge the GPU DMA buffers to scattered sysmem or even swap them out, if they are not currently used by the GPU. It then makes sure to move them in the contig space again when the GPU really needs them and fix up the GPU command stream with the new buffer address.
IMHO the best solution would be to use CMA as a flexible replacement of the static carveout area and put TTM on top of this to solve the needs of graphics drivers. We certainly don't want to reinvent the wheel inside CMA. We have solutions for all those things in the kernel right now, we just have to glue them together in a sane way.
That is a great explanation. So could you explain what the relation is between the IOMMU API and TTM (or GEM)? Terje said the DMABUF API sits on top of the IOMMU API, so normal device drivers (such as DRM) can forget about the IOMMU APIs and just use the DMABUF API. If so, I want to know whether TTM/GEM and the IOMMU are related, or whether TTM/GEM uses the DMABUF APIs, which call the IOMMU API to do memory allocation/mapping?
Sorry for my stupid question. DMA mapping api sits on top of IOMMU api, not DMABUF. DMABUF is used to share buffers. So please ignore what I said.
Thanks, Lucas
On 28.06.2012 20:19, Lucas Stach wrote:
TTM though solves more advanced matters, like buffer synchronisation between 3D and 2D block of hardware or syncing buffer access between GPU and CPU. One of the most interesting things of TTM is the ability to purge the GPU DMA buffers to scattered sysmem or even swap them out, if they are not currently used by the GPU. It then makes sure to move them in the contig space again when the GPU really needs them and fix up the GPU command stream with the new buffer address.
We preferably should choose dma_buf as a common interface towards buffers. That way whatever we choose as the memory manager, all dma_buf aware drivers will be able to use buffers allocated by other drivers.
We probably need to accommodate multiple memory managers to take care of legacy and new drivers. If V4L2 and DRM projects all move to dma_buf, we have the possibility to do zero-copy video without forcing everybody to use the same memory manager.
As I understand, TTM is good for platforms where we have a separate frame buffer memory, as is the case with most of the graphics cards. In Tegra, graphics and CPU occupy the same memory, so I'm not sure if we require the level of functionality that TTM provides. I guess the level of functionality and the complexity that it brings is one reason why TTM hasn't really caught on in the ARM world.
The synchronization primitives attached to TTM are slightly confusing. At the bottom level, it's operations which need to be synchronized between each other. That's the API level that we should export from kernel to user space. It's then up to the libdrm level (or whatever is doing the rendering in user space) to decide which operations it wants to have completed before a buffer can be reused/read/passed on to the next stage.
Anyway, if we hide the memory manager behind dma_buf, we're free to muck around with multiple of them and see what works best.
Terje
On Fri, Jun 29, 2012 at 04:20:31PM +0300, Terje Bergström wrote:
On 28.06.2012 20:19, Lucas Stach wrote:
TTM though solves more advanced matters, like buffer synchronisation between 3D and 2D block of hardware or syncing buffer access between GPU and CPU. One of the most interesting things of TTM is the ability to purge the GPU DMA buffers to scattered sysmem or even swap them out, if they are not currently used by the GPU. It then makes sure to move them in the contig space again when the GPU really needs them and fix up the GPU command stream with the new buffer address.
We preferably should choose dma_buf as a common interface towards buffers. That way whatever we choose as the memory manager, all dma_buf aware drivers will be able to use buffers allocated by other drivers.
We probably need to accommodate multiple memory managers to take care of legacy and new drivers. If V4L2 and DRM projects all move to dma_buf, we have the possibility to do zero-copy video without forcing everybody to use the same memory manager.
I agree. Supporting DMA BUF also doesn't seem very difficult.
As I understand, TTM is good for platforms where we have a separate frame buffer memory, as is the case with most of the graphics cards. In Tegra, graphics and CPU occupy the same memory, so I'm not sure if we require the level of functionality that TTM provides. I guess the level of functionality and the complexity that it brings is one reason why TTM hasn't really caught on in the ARM world.
The synchronization primitives attached to TTM are slightly confusing. At the bottom level, it's operations which need to be synchronized between each other. That's the API level that we should export from kernel to user space. It's then up to the libdrm level (or whatever is doing the rendering in user space) to decide which operations it wants to have completed before a buffer can be reused/read/passed on to the next stage.
Anyway, if we hide the memory manager behind dma_buf, we're free to muck around with multiple of them and see what works best.
Exactly. Other subthreads echo this as well. Using CMA seems the easiest and most flexible for now but still covers everything we need. If it turns out that it isn't suited for more advanced stuff once we start supporting 3D then we can still opt for something like TTM.
Thierry
On Saturday, 30 Jun 2012, 20:01 +0200, Thierry Reding wrote:
On Fri, Jun 29, 2012 at 04:20:31PM +0300, Terje Bergström wrote:
On 28.06.2012 20:19, Lucas Stach wrote:
TTM though solves more advanced matters, like buffer synchronisation between 3D and 2D block of hardware or syncing buffer access between GPU and CPU. One of the most interesting things of TTM is the ability to purge the GPU DMA buffers to scattered sysmem or even swap them out, if they are not currently used by the GPU. It then makes sure to move them in the contig space again when the GPU really needs them and fix up the GPU command stream with the new buffer address.
We preferably should choose dma_buf as a common interface towards buffers. That way whatever we choose as the memory manager, all dma_buf aware drivers will be able to use buffers allocated by other drivers.
We probably need to accommodate multiple memory managers to take care of legacy and new drivers. If V4L2 and DRM projects all move to dma_buf, we have the possibility to do zero-copy video without forcing everybody to use the same memory manager.
I agree. Supporting DMA BUF also doesn't seem very difficult.
As I understand, TTM is good for platforms where we have a separate frame buffer memory, as is the case with most of the graphics cards. In Tegra, graphics and CPU occupy the same memory, so I'm not sure if we require the level of functionality that TTM provides. I guess the level of functionality and the complexity that it brings is one reason why TTM hasn't really caught on in the ARM world.
The synchronization primitives attached to TTM are slightly confusing. At the bottom level, it's operations which need to be synchronized between each other. That's the API level that we should export from kernel to user space. It's then up to the libdrm level (or whatever is doing the rendering in user space) to decide which operations it wants to have completed before a buffer can be reused/read/passed on to the next stage.
Anyway, if we hide the memory manager behind dma_buf, we're free to muck around with multiple of them and see what works best.
Exactly. Other subthreads echo this as well. Using CMA seems the easiest and most flexible for now but still covers everything we need. If it turns out that it isn't suited for more advanced stuff once we start supporting 3D then we can still opt for something like TTM.
As working code is the primary goal, I would say go for it. I still think TTM is the way to go, even for simple things like a DRM framebuffer driver, but as CMA and TTM won't collide in their goals it should be easy to put TTM in there after we have something going with CMA.
Lucas
On Friday, 29 Jun 2012, 16:20 +0300, Terje Bergström wrote:
On 28.06.2012 20:19, Lucas Stach wrote:
TTM though solves more advanced matters, like buffer synchronisation between 3D and 2D block of hardware or syncing buffer access between GPU and CPU. One of the most interesting things of TTM is the ability to purge the GPU DMA buffers to scattered sysmem or even swap them out, if they are not currently used by the GPU. It then makes sure to move them in the contig space again when the GPU really needs them and fix up the GPU command stream with the new buffer address.
We preferably should choose dma_buf as a common interface towards buffers. That way whatever we choose as the memory manager, all dma_buf aware drivers will be able to use buffers allocated by other drivers.
We probably need to accommodate multiple memory managers to take care of legacy and new drivers. If V4L2 and DRM projects all move to dma_buf, we have the possibility to do zero-copy video without forcing everybody to use the same memory manager.
As I understand, TTM is good for platforms where we have a separate frame buffer memory, as is the case with most of the graphics cards. In Tegra, graphics and CPU occupy the same memory, so I'm not sure if we require the level of functionality that TTM provides. I guess the level of functionality and the complexity that it brings is one reason why TTM hasn't really caught on in the ARM world.
I understand that TTM looks like a big complex beast at first sight, but trying to understand how it works avoids reinventing the wheel over and over again. We still have to solve problems like cache invalidation, synchronization and swap-out of dma buffers, which is a lot easier if we go with a common framework.
The synchronization primitives attached to TTM are slightly confusing. At the bottom level, it's operations which need to be synchronized between each other. That's the API level that we should export from kernel to user space. It's then up to the libdrm level (or whatever is doing the rendering in user space) to decide which operations it wants to have completed before a buffer can be reused/read/passed on to the next stage.
That's exactly the level we are providing to userspace from other drivers using TTM like radeon or nouveau.
Anyway, if we hide the memory manager behind dma_buf, we're free to muck around with multiple of them and see what works best.
dma_buf at the current level is only a way to share buffers and does not provide enough information about the buffer to be useful as an abstraction level on top of multiple memory managers. But I agree that we should try to get the dma_buf integration right from the start, as zero-copy sharing is a very useful thing to have.
Lucas
Hi Thierry,
On Thursday, 28 Jun 2012, 13:12 +0200, Thierry Reding wrote:
On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
On Wednesday, 27 Jun 2012, 16:44 +0200, Thierry Reding wrote:
On Wed, Jun 27, 2012 at 05:29:14PM +0300, Hiroshi Doyu wrote:
On Wed, 27 Jun 2012 16:08:10 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
On Wed, 27 Jun 2012 07:14:18 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar to the current GART implementation. Define the carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such as "tegra-carveout.c" to pick up these definitions and register itself as the platform device's IOMMU instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of the carveout as a configurable item. So we need to define this somewhere. How about defining the carveout as a property of the GART?
There already exists a way of preventing Linux from using certain chunks of memory; the /memreserve/ syntax. From a brief look at the dtc source, it looks like /memreserve/ entries can have labels, which implies that a property in the GART node could refer to the /memreserve/ entry by phandle in order to know what memory regions to use.
Wasn't the whole point of using a carveout supposed to be a replacement for the GART?
Mostly agree. IIUC, we use both carveout/gart allocated buffers in android/tegra2.
As such I'd think the carveout should rather be a property of the host1x device.
Rather than introducing a new property, how about using "coherent_pool=??M" in the kernel command line if necessary? I think that this carveout size depends on the system usage/load.
I was hoping that we could get away with using the CMA and perhaps initialize it based on device tree content. I agree that the carveout size depends on the use-case, but I still think it makes sense to specify it on a per-board basis.
DRM driver doesn't know if it uses CMA or not, because DRM only uses DMA API.
So how is the DRM supposed to allocate buffers? Does it call the dma_alloc_from_contiguous() function to do that? I can see how it is used by arm_dma_ops but how does it end up in the driver?
As I said before the DMA API is not a good fit for graphics drivers. Most of the DMA buffers used by graphics cores are long lived and big, so we need a special pool to alloc from to avoid eating all contiguous address space, as DMA API does not provide shrinker callbacks for clients using large amount of memory.
I recall you mentioning TTM as a better alternative several times in the past. How does it fit in with this? Does it have the capability of using a predefined chunk of contiguous memory as a pool to allocate from?
One problem that all of these solutions don't address is that not all devices below host1x are DRM related. At least for the CSI and VI blocks I expect there to be V4L2 drivers eventually, so what we really need is to manage allocations outside of the DRM. host1x is the most logical choice here.
I think you are right here. We might want to move all of that buffer/memory management into the host1x code and provide contig memory to the host1x clients from there.
TTM has the ability to manage a chunk of memory for contig allocations. Also I think TTM does not depend too heavily on DRM, so we may even be able to use TTM as the general allocator for host1x clients, including VI and others. The more advanced stuff in TTM like swapping and moving buffers might be a bit of overkill for simple stuff like V4L, where you basically just want something like: "give me a contig buffer and pin it in address space so it won't ever move", but it should do no harm.
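(As a strawman for that client interface, something along the following lines; every name here is hypothetical, and the simple DMA API/CMA backing could later be swapped for TTM without changing the interface.)

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/slab.h>

/*
 * Hypothetical sketch of the kind of interface host1x could offer its
 * clients (DRM, V4L2, ...) if buffer management moves into the host1x
 * code: "give me a contig buffer and pin it so it won't ever move".
 * None of these names exist today; this is only an illustration.
 */
struct host1x_buffer {
        struct device *dev;     /* client device the buffer is mapped for */
        void *vaddr;
        dma_addr_t paddr;       /* pinned, contiguous bus address */
        size_t size;
};

static struct host1x_buffer *host1x_buffer_alloc(struct device *client,
                                                 size_t size)
{
        struct host1x_buffer *buf;

        buf = kzalloc(sizeof(*buf), GFP_KERNEL);
        if (!buf)
                return ERR_PTR(-ENOMEM);

        buf->dev = client;
        buf->size = size;
        buf->vaddr = dma_alloc_coherent(client, size, &buf->paddr,
                                        GFP_KERNEL);
        if (!buf->vaddr) {
                kfree(buf);
                return ERR_PTR(-ENOMEM);
        }

        return buf;
}

static void host1x_buffer_free(struct host1x_buffer *buf)
{
        dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->paddr);
        kfree(buf);
}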
Perhaps we can put host1x code somewhere below drivers/gpu (mm subdirectory?), drivers/memory or perhaps some other or new location that could eventually host similar drivers for other SoCs.
Then again, maybe it'd be easier for now to put everything below the drivers/gpu/drm/tegra directory and cross that bridge when we get to it.
I think that "coherent_pool" can be used only when the amount of contiguous memory is short in your system. Otherwise even unnecessary.
Could you explain a bit more why you want carveout size on per-board basis?
In the ideal case I would want to not have a carveout size at all. However there may be situations where you need to make sure some driver can allocate a given amount of memory. Having to specify this using a kernel command-line parameter is cumbersome because it may require changes to the bootloader or whatever. So if you know that a particular board always needs 128 MiB of carveout, then it makes sense to specify it on a per-board basis.
If we go with CMA, this is a non-issue, as CMA allows to use the contig area for normal allocations and only purges them if it really needs the space for contig allocs.
CMA certainly sounds like the most simple approach. While it may not be suited for 3D graphics or multimedia processing later on, I think we could use it at a starting point to get basic framebuffer and X support up and running. We can always move to something more advanced like TTM later.
Thierry
On Wed, 27 Jun 2012 16:44:14 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
I think that "coherent_pool" can be used only when the amount of contiguous memory is short in your system. Otherwise even unnecessary.
Could you explain a bit more why you want carveout size on per-board basis?
In the ideal case I would want to not have a carveout size at all. However there may be situations where you need to make sure some driver can allocate a given amount of memory. Having to specify this using a kernel command-line parameter is cumbersome because it may require changes to the bootloader or whatever. So if you know that a particular board always needs 128 MiB of carveout, then it makes sense to specify it on a per-board basis.
Hm...I could understand somewhat ;) but DT can also specify "bootargs" in the dts file, which can support a per-board-wide spec too, like the above sum of carveout needed from all drivers. I just want to avoid introducing a new parameter additionally if we can make use of the existing mechanism.
On 06/28/2012 12:18 AM, Hiroshi Doyu wrote:
On Wed, 27 Jun 2012 16:44:14 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
I think that "coherent_pool" can be used only when the amount of contiguous memory is short in your system. Otherwise even unnecessary.
Could you explain a bit more why you want carveout size on per-board basis?
In the ideal case I would want to not have a carveout size at all. However there may be situations where you need to make sure some driver can allocate a given amount of memory. Having to specify this using a kernel command-line parameter is cumbersome because it may require changes to the bootloader or whatever. So if you know that a particular board always needs 128 MiB of carveout, then it makes sense to specify it on a per-board basis.
Hm...I could understand somewhat ;) but DT can also specify "bootargs" in the dts file, which can support a per-board-wide spec too, like the above sum of carveout needed from all drivers. I just want to avoid introducing a new parameter additionally if we can make use of the existing mechanism.
The bootargs in the DT file is usually provided by (over-written by) the bootloader. If we start requiring lots of random kernel command-line arguments, that makes it more effort for the user of the bootloader (e.g. distribution bootloader scripts, etc.) to create the kernel command-line. I'd prefer to avoid that as much as possible. That said, using a standardized command-line option that is (or will be) used by all (ARM?) SoCs for the same purpose is reasonable, because there's commonality there.
On Thu, Jun 28, 2012 at 10:46:58AM -0600, Stephen Warren wrote:
On 06/28/2012 12:18 AM, Hiroshi Doyu wrote:
On Wed, 27 Jun 2012 16:44:14 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
I think that "coherent_pool" can be used only when the amount of contiguous memory is short in your system. Otherwise even unnecessary.
Could you explain a bit more why you want carveout size on per-board basis?
In the ideal case I would want to not have a carveout size at all. However there may be situations where you need to make sure some driver can allocate a given amount of memory. Having to specify this using a kernel command-line parameter is cumbersome because it may require changes to the bootloader or whatever. So if you know that a particular board always needs 128 MiB of carveout, then it makes sense to specify it on a per-board basis.
Hm... I can understand that somewhat ;) but DT can also specify "bootargs" in the dts file, which can support a per-board spec too, like the above sum of the carveout needed by all drivers. I just want to avoid additionally introducing a new parameter if we can make use of the existing mechanism.
The bootargs in the DT file is usually provided by (over-written by) the bootloader. If we start requiring lots of random kernel command-line arguments, that makes it more effort for the user of the bootloader (e.g. distribution bootloader scripts, etc.) to create the kernel command-line. I'd prefer to avoid that as much as possible. That said, using a standardized command-line option that is (or will be) used by all (ARM?) SoCs for the same purpose is reasonable, because there's commonality there.
I propose we stick with the default CMA settings for now and see where we end up. No need to jump the gun.
Thierry
On 06/27/2012 08:29 AM, Hiroshi Doyu wrote:
Could you explain a bit more why you want carveout size on per-board basis?
Different boards have different amounts of memory, and are sometimes targeted at different use-cases (e.g. server with simple display buffer, vs. consumer-oriented device intended to play games with OpenGL allocating lots of textures).
On Wed, 27 Jun 2012 19:56:35 +0200 Stephen Warren swarren@wwwdotorg.org wrote:
On 06/27/2012 08:29 AM, Hiroshi Doyu wrote:
Could you explain a bit more why you want carveout size on per-board basis?
Different boards have different amounts of memory, and are sometimes targeted at different use-cases (e.g. server with simple display buffer, vs. consumer-oriented device intended to play games with OpenGL allocating lots of textures).
May I ask a bit more? If the above requirement can be satisfied by one of the kernel command-line parameters, wouldn't that be enough, since DT can also specify "bootargs"? Or is using bootargs in a dts not encouraged?
On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar to the current gart implementation. Define the carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such as "tegra-carveout.c" to pick up these definitions and register itself as the platform device's IOMMU instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of the carveout as a configurable item. So we need to define this somewhere. How about defining the carveout as a property of the gart?
There already exists a way of preventing Linux from using certain chunks of memory; the /memreserve/ syntax. From a brief look at the dtc source, it looks like /memreserve/ entries can have labels, which implies that a property in the GART node could refer to the /memreserve/ entry by phandle in order to know what memory regions to use.
That doesn't work, unfortunately. The /memreserve/ label isn't even stored in the DTB. Even DTC throws an error when you try to reference the /memreserve/ by label.
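For illustration, this is roughly the construct under discussion; the property name is made up, and as noted above dtc rejects the reference because a /memreserve/ entry is not a node and has no phandle:

/memreserve/ carveout: 0x0e000000 0x02000000;   /* the label parses, but isn't stored in the DTB */

/ {
        host1x {
                /* hypothetical property; dtc errors out on this reference */
                carveout-region = <&carveout>;
        };
};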
Thierry
On Wed, 27 Jun 2012 04:48:18 +0200 Stephen Warren swarren@nvidia.com wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar to the current gart implementation. Define the carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such as "tegra-carveout.c" to pick up these definitions and register itself as the platform device's IOMMU instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of the carveout as a configurable item. So we need to define this somewhere. How about defining the carveout as a property of the gart?
There already exists a way of preventing Linux from using certain chunks of memory; the /memreserve/ syntax. From a brief look at the dtc source, it looks like /memreserve/ entries can have labels, which implies that a property in the GART node could refer to the /memreserve/ entry by phandle in order to know what memory regions to use.
I think that we don't need the starting address of the carveout, only its size. Carveout memory is just an anonymous, physically contiguous buffer.
On Wed, 27 Jun 2012 04:32:07 +0200 Mark Zhang markz@nvidia.com wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that can be similar to the current gart implementation. Define the carveout as:
carveout { compatible = "nvidia,tegra20-carveout"; size = <0x10000000>; };
Then create a file such as "tegra-carveout.c" to pick up these definitions and register itself as the platform device's IOMMU instance.
The carveout isn't a HW object, so it doesn't seem appropriate to define a DT node to represent it.
Yes. But I think it's better to export the size of the carveout as a configurable item. So we need to define this somewhere. How about defining the carveout as a property of the gart?
I agree that the carveout size should be configurable. But it may not be related to gart.
On Tue, 26 Jun 2012 16:00:33 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
On Tue, Jun 26, 2012 at 04:02:24PM +0300, Hiroshi Doyu wrote:
Hi Thierry,
On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding thierry.reding@avionic-design.de wrote:
Hi,
while I haven't got much time to work on the actual code right now, I think it might still be useful if we could get the device tree binding to a point where everybody is happy with it. That'll also save me some time once I get to writing the code because I won't have to redo it over again. =)
So here's the current proposal:
host1x { compatible = "nvidia,tegra20-host1x", "simple-bus"; reg = <0x50000000 0x00024000>; interrupts = <0 64 0x04 /* cop syncpt */ 0 65 0x04 /* mpcore syncpt */ 0 66 0x04 /* cop general */ 0 67 0x04>; /* mpcore general */
#address-cells = <1>; #size-cells = <1>; ranges = <0x54000000 0x54000000 0x04000000>; status = "disabled"; gart = <&gart>;
...
output and hooks up a static EDID block with the LVDS output. There is also a carveout property which might be a better replacement for the "crippled" GART on Tegra20. Alternatively the CMA might work just as well instead.
The Plutux can be described like this:
host1x { carveout = <0x0e000000 0x02000000>;
As discussed previously in the following ML thread, the necessary info related to the "gart" would be obtained from the standard IOMMU API (or some layer above it, DMA-BUF or TTM?). So I don't think that we need to refer to "gart" and "carveout" here in the end.
http://lists.linuxfoundation.org/pipermail/iommu/2012-June/004266.html
Yes, if IOMMU or some layer above can provide the same information, then that is certainly better than explicitly referencing it in the DT.
I'm not sure I understand how information about the carveout would be obtained from the IOMMU API, though.
I think that there are 2 cases:
(1) discontiguous memory with IOMMU
(2) contiguous memory without IOMMU (called "carveout" in general?)
For (1), the necessary info should be passed via IOMMU API as mentioned in the previous reply, like DT->{GART,SMMU}->IOMMU->DRM.
[PATCH 0/5] IOMMU: Make IOMMU-API ready for GART-like hardware https://lkml.org/lkml/2012/1/19/170
For (2), although the memory is mostly anonymous, we may need to know how much to allocate; that is, we only need a "size". This size is not a h/w feature but depends on the system load/usage, so I think that this size could be passed on the kernel command line. For example, we can specify how much contiguous memory is necessary by putting "coherent_pool=??M" on the kernel command line as below:
coherent_pool=nn[KMG] [ARM,KNL] Sets the size of memory pool for coherent, atomic dma allocations.
On 06/27/2012 06:44 AM, Hiroshi Doyu wrote: ...
I think that there are 2 cases:
(1) discontiguous memory with IOMMU
(2) contiguous memory without IOMMU (called "carveout" in general?)
...
For (2), although the memory is mostly anonymous, we may need to know how much to allocate; that is, we only need a "size". This size is not a h/w feature but depends on the system load/usage, so I think that this size could be passed on the kernel command line. For example, we can specify how much contiguous memory is necessary by putting "coherent_pool=??M" on the kernel command line as below:
coherent_pool=nn[KMG] [ARM,KNL] Sets the size of memory pool for coherent, atomic dma allocations.
I guess if that's the standard way of initializing CMA, then that's fine.
It'd be nice if there were a way to specify that from the DT too; that way the user/distro/bootloader constructing the kernel command line wouldn't have to remember to add "random" (potentially Tegra-/board-specific) extra arguments onto the command line; the Tegra command line in the upstream kernel is quite clean right now, especially compared to the enormous number of options we require downstream :-(
On Wed, 27 Jun 2012 20:02:46 +0200 Stephen Warren swarren@wwwdotorg.org wrote:
On 06/27/2012 06:44 AM, Hiroshi Doyu wrote: ...
I think that there are 2 cases:
(1) discontiguous memory with IOMMU
(2) contiguous memory without IOMMU (called "carveout" in general?)
...
For (2), although the memory is mostly anonymous, we may need to know how much to allocate; that is, we only need a "size". This size is not a h/w feature but depends on the system load/usage, so I think that this size could be passed on the kernel command line. For example, we can specify how much contiguous memory is necessary by putting "coherent_pool=??M" on the kernel command line as below:
coherent_pool=nn[KMG] [ARM,KNL] Sets the size of memory pool for coherent, atomic dma allocations.
I guess if that's the standard way of initializing CMA, then that's fine.
It'd be nice if there were a way to specify that from the DT too; that way the user/distro/bootloader constructing the kernel command line wouldn't have to remember to add "random" (potentially Tegra-/board-specific) extra arguments onto the command line; the Tegra command line in the upstream kernel is quite clean right now, especially compared to the enormous number of options we require downstream :-(
DT can specify the kernel command line as "bootargs" in a dts file, and the bootloader can still modify it after the DTB has been loaded into memory.
On Wed, Jun 27, 2012 at 12:02:46PM -0600, Stephen Warren wrote:
On 06/27/2012 06:44 AM, Hiroshi Doyu wrote: ...
I think that there are 2 cases:
(1) discontiguous memory with IOMMU
(2) contiguous memory without IOMMU (called "carveout" in general?)
...
For (2), although the memory is mostly anonymous, we may need to know how much to allocate; that is, we only need a "size". This size is not a h/w feature but depends on the system load/usage, so I think that this size could be passed on the kernel command line. For example, we can specify how much contiguous memory is necessary by putting "coherent_pool=??M" on the kernel command line as below:
coherent_pool=nn[KMG] [ARM,KNL] Sets the size of memory pool for coherent, atomic dma allocations.
I guess if that's the standard way of initializing CMA, then that's fine.
It'd be nice if there were a way to specify that from the DT too; that way the user/distro/bootloader constructing the kernel command line wouldn't have to remember to add "random" (potentially Tegra-/board-specific) extra arguments onto the command line; the Tegra command line in the upstream kernel is quite clean right now, especially compared to the enormous number of options we require downstream :-(
Looking at Documentation/kernel-parameters.txt it seems the canonical way to initialize CMA is using the "cma" kernel command-line parameter. For device tree we could extend the "chosen" node to include something like "carveout = <0x04000000>;". Or "contiguous-memory" or whatever.
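As a sketch of what that chosen-node extension might look like (the property name is just the placeholder from the sentence above and doesn't exist anywhere yet); the command-line route would instead be something like "cma=64M" in bootargs:

chosen {
        /* hypothetical: size of the contiguous region reserved for graphics */
        carveout = <0x04000000>;
};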
Thierry
On 06/26/2012 04:55 AM, Thierry Reding wrote:
Hi,
while I haven't got much time to work on the actual code right now, I think it might still be useful if we could get the device tree binding to a point where everybody is happy with it. That'll also save me some time once I get to writing the code because I won't have to redo it over again. =)
So here's the current proposal:
host1x { compatible = "nvidia,tegra20-host1x", "simple-bus"; reg = <0x50000000 0x00024000>; interrupts = <0 64 0x04 /* cop syncpt */ 0 65 0x04 /* mpcore syncpt */ 0 66 0x04 /* cop general */ 0 67 0x04>; /* mpcore general */
#address-cells = <1>; #size-cells = <1>; ranges = <0x54000000 0x54000000 0x04000000>; status = "disabled";
The idea behind status="disabled" is that some HW only makes sense to use on particular boards. This concept really only applies to HW modules that drive external interfaces on the SoC, which in turn the board can choose whether to connect anything to (or whether to even connect to any external pins using the pinmux, or not).
As such, I don't think it makes sense to set status="disabled" on host1x, nor many other internal-only engines such as say mpe, epp, i2sp, gr2d, gr3d, dc1, dc2.
However it does make sense for the output resources rgb, hdmi, tvo, dsi.
/* outputs */ rgb { compatible = "nvidia,tegra20-rgb"; status = "disabled"; };
...
The rgb node is something that I don't quite know how to handle yet. Since it is really part of the display controller and uses its register space, it isn't quite correct to represent it as a separate device. But we will need a separate node to make it available as a connector, which will become more obvious below.
Are you referring to the DC_COM_PIN_OUTPUT* registers? Sorry, I'm not at all familiar with our display HW yet.
Some possible solutions spring to mind:
a) The output nodes don't have to be direct children of host1x. Instead, each DC could have an rgb child node that represents its own individual output capability.
b) If the RGB-related registers in DC are completely independent of any other DC registers and grouped together well enough, we can just carve a chunk out of the DC register space and give that to the RGB node instead:
i.e. not:
dc1: dc@54200000 { compatible = "nvidia,tegra20-dc"; reg = <0x54200000 0x00040000>; interrupts = <0 73 0x04>; status = "disabled"; };
but something more like (the completely made up example):
dc1: dc@54200000 { compatible = "nvidia,tegra20-dc"; reg = <0x54200000 0x00020000 0x54203000 0x10000>; interrupts = <0 73 0x04>; status = "disabled"; };
rgb { compatible = "nvidia,tegra20-rgb"; reg = <0x54220000 0x00010000>; status = "disabled"; };
c) The rgb node could simply reference the dc nodes using a phandle, and call into the dc driver to obtain RGB configuration services from it:
rgb { compatible = "nvidia,tegra20-rgb"; status = "disabled"; nvidia,dcs = <&dc1 &dc2>; };
By the way, if the RGB registers are in the DC, aren't there two separate RGB outputs? Certainly the TRM implies that both DCs can be driving LCDs, by reducing the width of the LCD signals that each DC uses (lower bits-per-pixel, or perhaps DDR encoding on the data lines).
Board DTS files could then extend this with board-specific requirements and connectors. The following describes the Medcom Wide:
connectors { #address-cells = <1>; #size-cells = <0>;
};
The connector seems to be a property of the individual output resources. I'd expect to see the connector configuration be a child of the outputs that a particular board had enabled; something more like:
host1x { rgb { status = "okay";
connector@0 { nvidia,edid = /incbin/("tegra-medcom.edid"); }; }; hdmi { status = "okay";
connector@0 { nvidia,ddc-i2c-bus = <&tegra_i2c1>; }; }; };
Perhaps even completely omit the connector node, and put the properties directly within the rgb/hdmi node itself. After all the HDMI output really is the connector as far as Tegra goes.
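To make that concrete, a sketch of the variant without connector nodes, reusing the properties from the example above:

host1x {
        rgb {
                status = "okay";
                nvidia,edid = /incbin/("tegra-medcom.edid");
        };

        hdmi {
                status = "okay";
                nvidia,ddc-i2c-bus = <&tegra_i2c1>;
        };
};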
On Tue, Jun 26, 2012 at 12:10:42PM -0600, Stephen Warren wrote:
On 06/26/2012 04:55 AM, Thierry Reding wrote:
Hi,
while I haven't got much time to work on the actual code right now, I think it might still be useful if we could get the device tree binding to a point where everybody is happy with it. That'll also save me some time once I get to writing the code because I won't have to redo it over again. =)
So here's the current proposal:
host1x { compatible = "nvidia,tegra20-host1x", "simple-bus"; reg = <0x50000000 0x00024000>; interrupts = <0 64 0x04 /* cop syncpt */ 0 65 0x04 /* mpcore syncpt */ 0 66 0x04 /* cop general */ 0 67 0x04>; /* mpcore general */
#address-cells = <1>; #size-cells = <1>; ranges = <0x54000000 0x54000000 0x04000000>; status = "disabled";
The idea behind status="disabled" is that some HW only makes sense to use on particular boards. This concept really only applies to HW modules that drive external interfaces on the SoC, which in turn the board can choose whether to connect anything to (or whether to even connect to any external pins using the pinmux, or not).
As such, I don't think it makes sense to set status="disabled" on host1x, nor many other internal-only engines such as say mpe, epp, i2sp, gr2d, gr3d, dc1, dc2.
What about power management and resource usage? If a board, for instance, doesn't need gr3d at all, it could just leave it at status = "disabled" so that the corresponding driver isn't loaded and doesn't waste power and resources.
However it does make sense for the output resources rgb, hdmi, tvo, dsi.
/* outputs */ rgb { compatible = "nvidia,tegra20-rgb"; status = "disabled"; };
...
The rgb node is something that I don't quite know how to handle yet. Since it is really part of the display controller and uses its register space, it isn't quite correct to represent it as a separate device. But we will need a separate node to make it available as a connector, which will become more obvious below.
Are you referring to the DC_COM_PIN_OUTPUT* registers? Sorry, I'm not at all familiar with our display HW yet.
Some possible solutions spring to mind:
a) The output nodes don't have to be direct children of host1x. Instead, each DC could have an rgb child node that represents its own individual output capability.
Yes, that idea had sprung to my mind as well. I rather like it, too.
b) If the RGB-related registers in DC are completely independent of any other DC registers and grouped together well enough, we can just carve a chunk out of the DC register space and give that to the RGB node instead:
i.e. not:
dc1: dc@54200000 { compatible = "nvidia,tegra20-dc"; reg = <0x54200000 0x00040000>; interrupts = <0 73 0x04>; status = "disabled"; };
but something more like (the completely made up example):
dc1: dc@54200000 { compatible = "nvidia,tegra20-dc"; reg = <0x54200000 0x00020000 0x54203000 0x10000>; interrupts = <0 73 0x04>; status = "disabled"; }; rgb { compatible = "nvidia,tegra20-rgb"; reg = <0x54220000 0x00010000>; status = "disabled"; };
c) The rgb node could simply reference the dc nodes using a phandle, and call into the dc driver to obtain RGB configuration services from it:
rgb { compatible = "nvidia,tegra20-rgb"; status = "disabled"; nvidia,dcs = <&dc1 &dc2>; };
By the way, if the RGB registers are in the DC, aren't there two separate RGB outputs? Certainly the TRM implies that both DCs can be driving LCDs, by reducing the width of the LCD signals that each DC uses (lower bits-per-pixel, or perhaps DDR encoding on the data lines).
Yes, there are two RGB outputs. Using alternative a) above should be able to represent this quite well.
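For reference, alternative a) might then look roughly like this, with one rgb child per display controller (register values taken from the proposal above):

dc1: dc@54200000 {
        compatible = "nvidia,tegra20-dc";
        reg = <0x54200000 0x00040000>;
        interrupts = <0 73 0x04>;

        rgb {
                status = "disabled";
        };
};

dc2: dc@54240000 {
        compatible = "nvidia,tegra20-dc";
        reg = <0x54240000 0x00040000>;
        interrupts = <0 74 0x04>;

        rgb {
                status = "disabled";
        };
};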
Board DTS files could then extend this with board-specific requirements and connectors. The following describes the Medcom Wide:
connectors { #address-cells = <1>; #size-cells = <0>;
};
The connector seems to be a property of the individual output resources. I'd expect to see the connector configuration be a child of the outputs that a particular board had enabled; something more like:
host1x { rgb { status = "okay"; connector@0 { nvidia,edid = /incbin/("tegra-medcom.edid"); }; }; hdmi { status = "okay"; connector@0 { nvidia,ddc-i2c-bus = <&tegra_i2c1>; }; }; };
Perhaps even completely omit the connector node, and put the properties directly within the rgb/hdmi node itself. After all the HDMI output really is the connector as far as Tegra goes.
Heh. I seem to remember you objecting to this in a previous series[0] which is actually the reason that I moved them to the top-level in the first place. =)
Thierry
On 06/26/2012 01:51 PM, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 12:10:42PM -0600, Stephen Warren wrote:
On 06/26/2012 04:55 AM, Thierry Reding wrote:
Hi,
while I haven't got much time to work on the actual code right now, I think it might still be useful if we could get the device tree binding to a point where everybody is happy with it. That'll also save me some time once I get to writing the code because I won't have to redo it over again. =)
So here's the current proposal:
host1x { compatible = "nvidia,tegra20-host1x", "simple-bus"; reg = <0x50000000 0x00024000>; interrupts = <0 64 0x04 /* cop syncpt */ 0 65 0x04 /* mpcore syncpt */ 0 66 0x04 /* cop general */ 0 67 0x04>; /* mpcore general */
#address-cells = <1>; #size-cells = <1>;
ranges = <0x54000000 0x54000000 0x04000000>;
status = "disabled";
The idea behind status="disabled" is that some HW only makes sense to use on particular boards. This concept really only applies to HW modules that drive external interfaces on the SoC, which in turn the board can choose whether to connect anything to (or whether to even connect to any external pins using the pinmux, or not).
As such, I don't think it makes sense to set status="disabled" on host1x, nor many other internal-only engines such as say mpe, epp, i2sp, gr2d, gr3d, dc1, dc2.
What about power management and resource usage? If a board, for instance, doesn't need gr3d at all, it could just leave it at status = "disabled" so that the corresponding driver isn't loaded and doesn't waste power and resources.
The driver should be turning off all the clocks and power if the devices aren't actually used (or not turning them on in the first place). I guess there will be some slight overhead for the device/driver instantiation.
However, in all likelihood, all/most boards will enable this feature once it's in place. For very resource-constrained scenarios without display, presumably one would be building a custom kernel without the display driver enabled, so the DT content for display wouldn't ever come into play.
Board DTS files could then extend this with board-specific requirements and connectors. The following describes the Medcom Wide:
connectors { #address-cells = <1>; #size-cells = <0>;
};
The connector seems to be a property of the individual output resources. I'd expect to see the connector configuration be a child of the outputs that a particular board had enabled; something more like:
host1x { rgb { status = "okay";
connector@0 { nvidia,edid = /incbin/("tegra-medcom.edid"); }; }; hdmi { status = "okay";
connector@0 { nvidia,ddc-i2c-bus = <&tegra_i2c1>; }; }; };
Perhaps even completely omit the connector node, and put the properties directly within the rgb/hdmi node itself. After all the HDMI output really is the connector as far as Tegra goes.
Heh. I seem to remember you objecting to this in a previous series[0] which is actually the reason that I moved them to the top-level in the first place. =)
Thierry
I don't think I was objecting to exactly what I wrote above; in that email, there were already separate connector nodes, but they were placed directly inside the host1x node (at that time called graphics), so still rather disconnected from the HW they represent.
The argument for sharing e.g. an HDMI port between both video and audio still exists though. That said, I think now I'd still be inclined to put all the connector information for video into the hdmi/rgb/tvo nodes. If DT does grow to represent the user-visible connectors at the top level, perhaps in conjunction with drivers/extcon, the hdmi node could point at the extcon node with a phandle, or vice versa, to set up the link between the components of the user-visible port.
Still, the decision here is possibly a little arbitrary; many schemes would work. I think at this point I don't care /too/ strongly about which is used, so the separate-connectors-at-top-level concept in your email is probably OK. I wonder, though, whether the hdmi node needs a phandle pointing at the connector node, so they can both "find" each other?
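A rough sketch of that kind of cross-reference (the nvidia,connector property is made up, just to show the direction of the link):

connectors {
        #address-cells = <1>;
        #size-cells = <0>;

        hdmi_con: connector@0 {
                nvidia,ddc-i2c-bus = <&tegra_i2c1>;
        };
};

host1x {
        hdmi {
                status = "okay";
                /* hypothetical back-reference so output and connector can find each other */
                nvidia,connector = <&hdmi_con>;
        };
};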
On Tue, Jun 26, 2012 at 04:48:14PM -0600, Stephen Warren wrote:
On 06/26/2012 01:51 PM, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 12:10:42PM -0600, Stephen Warren wrote:
On 06/26/2012 04:55 AM, Thierry Reding wrote:
Hi,
while I haven't got much time to work on the actual code right now, I think it might still be useful if we could get the device tree binding to a point where everybody is happy with it. That'll also save me some time once I get to writing the code because I won't have to redo it over again. =)
So here's the current proposal:
host1x { compatible = "nvidia,tegra20-host1x", "simple-bus"; reg = <0x50000000 0x00024000>; interrupts = <0 64 0x04 /* cop syncpt */ 0 65 0x04 /* mpcore syncpt */ 0 66 0x04 /* cop general */ 0 67 0x04>; /* mpcore general */
#address-cells = <1>; #size-cells = <1>;
ranges = <0x54000000 0x54000000 0x04000000>;
status = "disabled";
The idea behind status="disabled" is that some HW only makes sense to use on particular boards. This concept really only applies to HW modules that drive external interfaces on the SoC, which in turn the board can choose whether to connect anything to (or whether to even connect to any external pins using the pinmux, or not).
As such, I don't think it makes sense to set status="disabled" on host1x, nor many other internal-only engines such as say mpe, epp, i2sp, gr2d, gr3d, dc1, dc2.
What about power management and resource usage? If a board, for instance, doesn't need gr3d at all, it could just leave it at status = "disabled" so that the corresponding driver isn't loaded and doesn't waste power and resources.
The driver should be turning off all the clocks and power if the devices aren't actually used (or not turning them on in the first place). I guess there will be some slight overhead for the device/driver instantiation.
I think it would be great if we provided true runtime power-management for the host1x children at some point. Then this should all work pretty much automatically.
However, in all likelihood, all/most boards will enable this feature once it's in place. For very resource-constrained scenarios without display, presumably one would be building a custom kernel without the display driver enabled, so the DT content for display wouldn't ever come into play.
Resource-constrained systems could always choose to override the default and set the corresponding nodes to status = "disabled". Anyway, I agree that this functionality will likely be used by most boards so I'll enable everything except the outputs by default.
Board DTS files could then extend this with board-specific requirements and connectors. The following describes the Medcom Wide:
connectors { #address-cells = <1>; #size-cells = <0>;
};
The connector seems to be a property of the individual output resources. I'd expect to see the connector configuration be a child of the outputs that a particular board had enabled; something more like:
host1x { rgb { status = "okay";
connector@0 { nvidia,edid = /incbin/("tegra-medcom.edid"); }; }; hdmi { status = "okay";
connector@0 { nvidia,ddc-i2c-bus = <&tegra_i2c1>; }; }; };
Perhaps even completely omit the connector node, and put the properties directly within the rgb/hdmi node itself. After all the HDMI output really is the connector as far as Tegra goes.
Heh. I seem to remember you objecting to this in a previous series[0] which is actually the reason that I moved them to the top-level in the first place. =)
Thierry
I don't think I was objecting to exactly what I wrote above; in that email, there were already separate connector nodes, but they were placed directly inside the host1x node (at that time called graphics), so still rather disconnected from the HW they represent.
The argument for sharing e.g. an HDMI port between both video and audio still exists though. That said, I think now I'd still be inclined to put all the connector information for video into the hdmi/rgb/tvo nodes. If DT does grow to represent the user-visible connectors at the top level, perhaps in conjunction with drivers/extcon, the hdmi node could point at the extcon node with a phandle, or vice versa, to set up the link between the components of the user-visible port.
Still, the decision here is possibly a little arbitrary; many schemes would work. I think at this point I don't care /too/ strongly about which is used, so the separate-connectors-at-top-level concept in your email is probably OK. I wonder, though, whether the hdmi node needs a phandle pointing at the connector node, so they can both "find" each other?
I actually like what you proposed above a lot, so if you don't mind either way I'll go with that proposal. Keeping the connector nodes as children of the outputs has the advantage of being able to reference them if we need it at some point. But it is also redundant in that a single output doesn't usually (ever?) drive more than one connector.
The same issue will have to be addressed for the CSI and VI nodes, but as I currently use neither of those I don't feel qualified to propose a binding for them. Also for the VI part we're completely missing documentation. Maybe somebody could push this to be released as well?
If I understand correctly, most of the host1x children can also be chained in a processing pipeline to do postprocessing on video input, for example. I suppose that's generic and doesn't need to be represented in DT either, right?
Thierry
On 06/26/2012 11:07 PM, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 04:48:14PM -0600, Stephen Warren wrote:
...
I actually like what you proposed above a lot, so if you don't mind either way I'll go with that proposal. Keeping the connector nodes as children of the outputs has the advantage of being able to reference them if we need it at some point. But it is also redundant in that a single output doesn't usually (ever?) drive more than one connector.
Yes, I believe that each output is 1:1 with (the video portion of) a connector. The display controllers obviously aren't 1:1.
The same issue will have to be addressed for the CSI and VI nodes, but as I currently use neither of those I don't feel qualified to propose a binding for them. Also for the VI part we're completely missing documentation. Maybe somebody could push this to be released as well?
I did file a bug noting the request for VI documentation. At this point in time, it's too early to say what, if anything, will come of that.
If I understand correctly, most of the host1x children can also be chained in a processing pipeline to do postprocessing on video input, for example. I suppose that's generic and doesn't need to be represented in DT either, right?
Yes, I believe that's something internal to the driver.
On Wed, Jun 27, 2012 at 11:49:43AM -0600, Stephen Warren wrote:
On 06/26/2012 11:07 PM, Thierry Reding wrote:
On Tue, Jun 26, 2012 at 04:48:14PM -0600, Stephen Warren wrote:
...
I actually like what you proposed above a lot, so if you don't mind either way I'll go with that proposal. Keeping the connector nodes as children of the outputs has the advantage of being able to reference them if we need it at some point. But it is also redundant in that a single output doesn't usually (ever?) drive more than one connector.
Yes, I believe that each output is 1:1 with (the video portion of) a connector. The display controllers obviously aren't 1:1.
Yes, the display controllers are only 1:1 for the RGB outputs. I'll merge your proposed changes and see if I can come up with some code to parse it.
The same issue will have to be addressed for the CSI and VI nodes, but as I currently use neither of those I don't feel qualified to propose a binding for them. Also for the VI part we're completely missing documentation. Maybe somebody could push this to be released as well?
I did file a bug noting the request for VI documentation. At this point in time, it's too early to say what, if anything, will come of that.
I think we have some raw register documentation for VI but it's next to impossible to really understand the hardware block just by looking at the registers.
If I understand correctly, most of the host1x children can also be chained in a processing pipeline to do postprocessing on video input, for example. I suppose that's generic and doesn't need to be represented in DT either, right?
Yes, I believe that's something internal to the driver.
Okay. So apart from the carveout topic in the other subthread, I think we've settled on solutions for the remaining points. I'll try to find some time to work all the changes into a new binding proposal and get working on the code.
Thierry
Here's a new proposal that should address all issues collected during the discussion of the previous one:
From tegra20.dtsi:
host1x { compatible = "nvidia,tegra20-host1x", "simple-bus"; reg = <0x50000000 0x00024000>; interrupts = <0 65 0x04 /* mpcore syncpt */ 0 67 0x04>; /* mpcore general */
#address-cells = <1>; #size-cells = <1>;
ranges = <0x54000000 0x54000000 0x04000000>;
/* video-encoding/decoding */ mpe { reg = <0x54040000 0x00040000>; interrupts = <0 68 0x04>; };
/* video input */ vi { reg = <0x54080000 0x00040000>; interrupts = <0 69 0x04>; };
/* EPP */ epp { reg = <0x540c0000 0x00040000>; interrupts = <0 70 0x04>; };
/* ISP */ isp { reg = <0x54100000 0x00040000>; interrupts = <0 71 0x04>; };
/* 2D engine */ gr2d { reg = <0x54140000 0x00040000>; interrupts = <0 72 0x04>; };
/* 3D engine */ gr3d { reg = <0x54180000 0x00040000>; };
/* display controllers */ dc@54200000 { compatible = "nvidia,tegra20-dc"; reg = <0x54200000 0x00040000>; interrupts = <0 73 0x04>;
rgb { status = "disabled"; }; };
dc@54240000 { compatible = "nvidia,tegra20-dc"; reg = <0x54240000 0x00040000>; interrupts = <0 74 0x04>;
rgb { status = "disabled"; }; };
/* outputs */ hdmi { compatible = "nvidia,tegra20-hdmi"; reg = <0x54280000 0x00040000>; interrupts = <0 75 0x04>; status = "disabled"; };
tvo { compatible = "nvidia,tegra20-tvo"; reg = <0x542c0000 0x00040000>; interrupts = <0 76 0x04>; status = "disabled"; };
dsi { compatible = "nvidia,tegra20-dsi"; reg = <0x54300000 0x00040000>; status = "disabled"; }; };
From tegra20-medcom.dts:
host1x { dc@54200000 { rgb { nvidia,edid = /incbin/("tegra20-medcom.edid"); status = "okay"; }; }; };
From tegra20-plutux.dts:
host1x { hdmi { vdd-supply = <&ldo7_reg>; pll-supply = <&ldo8_reg>;
nvidia,hpd-gpio = <&gpio 111 0>; /* PN7 */ nvidia,ddc-i2c-bus = <&ddc>;
status = "okay"; }; };
One problem I've come across when trying to get some rudimentary code working with this is that there's no longer a device which the DRM driver can bind to, because the top-level device (host1x) now has a separate driver.
I've briefly discussed this with Dave Airlie on IRC and there are basically two solutions:
a) Use a fake platform device, instantiated from the device tree or at module initialization time, so that the drm_platform_init() function can be used.
b) Implement a custom drm_bus and reimplement most of the functionality provided by drm_platform_init().
Alternative a) is what Sascha has been doing for i.MX and is the most straightforward. One variant of this would be to add another virtual device node below host1x that groups gr2d, gr3d, both dcs and the three output nodes. The downside of this is that it doesn't describe the hardware, of course, and therefore really shouldn't go into the device tree. Instantiating at module initialization time has the disadvantage that the parent-child relationship is lost.
Using b) isn't optimal either because DRM requires at least a struct device which can be associated with the drm_device. So the question is which struct device that should be?
Thierry
On Thursday, July 05, 2012 8:15 PM, Thierry Reding wrote:
Here's a new proposal that should address all issues collected during the discussion of the previous one:
From tegra20.dtsi:
host1x { compatible = "nvidia,tegra20-host1x", "simple-bus"; reg = <0x50000000 0x00024000>; interrupts = <0 65 0x04 /* mpcore syncpt */ 0 67 0x04>; /* mpcore general */
#address-cells = <1>; #size-cells = <1>;
ranges = <0x54000000 0x54000000 0x04000000>;
/* video-encoding/decoding */ mpe { reg = <0x54040000 0x00040000>; interrupts = <0 68 0x04>; };
/* video input */ vi { reg = <0x54080000 0x00040000>; interrupts = <0 69 0x04>; };
/* EPP */ epp { reg = <0x540c0000 0x00040000>; interrupts = <0 70 0x04>; };
/* ISP */ isp { reg = <0x54100000 0x00040000>; interrupts = <0 71 0x04>; };
/* 2D engine */ gr2d { reg = <0x54140000 0x00040000>; interrupts = <0 72 0x04>; };
/* 3D engine */ gr3d { reg = <0x54180000 0x00040000>; };
/* display controllers */ dc@54200000 { compatible = "nvidia,tegra20-dc"; reg = <0x54200000 0x00040000>; interrupts = <0 73 0x04>;
rgb { status = "disabled"; }; };
dc@54240000 { compatible = "nvidia,tegra20-dc"; reg = <0x54240000 0x00040000>; interrupts = <0 74 0x04>;
rgb { status = "disabled"; }; };
/* outputs */ hdmi { compatible = "nvidia,tegra20-hdmi"; reg = <0x54280000 0x00040000>; interrupts = <0 75 0x04>; status = "disabled"; };
tvo { compatible = "nvidia,tegra20-tvo"; reg = <0x542c0000 0x00040000>; interrupts = <0 76 0x04>; status = "disabled"; };
dsi { compatible = "nvidia,tegra20-dsi"; reg = <0x54300000 0x00040000>; status = "disabled"; };
};
From tegra20-medcom.dts:
host1x { dc@54200000 { rgb { nvidia,edid = /incbin/("tegra20-medcom.edid"); status = "okay"; }; }; };
From tegra20-plutux.dts:
host1x { hdmi { vdd-supply = <&ldo7_reg>; pll-supply = <&ldo8_reg>;
nvidia,hpd-gpio = <&gpio 111 0>; /* PN7 */ nvidia,ddc-i2c-bus = <&ddc>; status = "okay"; };
};
One problem I've come across when trying to get some rudimentary code working with this is that there's no longer a device which the DRM driver can bind to, because the top-level device (host1x) now has a separate driver.
I've briefly discussed this with Dave Airlie on IRC and there are basically two solutions:
a) Use a fake platform device, instantiated from the device tree or at module initialization time, so that the drm_platform_init() function can be used.
If we change the dts to this version, that means we should register all drivers (dc/rgb/dsi/...) during host1x driver initialization. In that case, I think we can register the DRM driver as well. It's not a good idea to add drm into the dts because, just as you said, it's not a real device at all.
b) Implement a custom drm_bus and reimplement most of the functionality provided by drm_platform_init().
Alternative a) is what Sascha has been doing for i.MX and is the most straightforward. One variant of this would be to add another virtual device node below host1x that groups gr2d, gr3d, both dcs and the three output nodes. The downside of this is that it doesn't describe the hardware, of course, and therefore really shouldn't go into the device tree. Instantiating at module initialization time has the disadvantage that the parent-child relationship is lost.
Using b) isn't optimal either because DRM requires at least a struct device which can be associated with the drm_device. So the question is which struct device that should be?
Thierry
On 07/05/2012 06:15 AM, Thierry Reding wrote:
Here's a new proposal that should address all issues collected during the discussion of the previous one:
From tegra20.dtsi:
...
At a quick glance, that all seems pretty reasonable now.
One problem I've come across when trying to get some rudimentary code working with this is that there's no longer a device which the DRM driver can bind to, because the top-level device (host1x) now has a separate driver.
Can't you just have the host1x driver trigger the instantiation of the DRM driver? In other words, the host1x node is both the plain host1x driver and the DRM driver. So, after initializing anything for host1x itself, just end host1x's probe() with something like a call to drm_platform_init(), passing in host1x's own pdev.
After all, it seems like the whole point of representing host1x in the DT is to expose the graphics HW, so you'd need the DRM device in that case.
On Fri, Jul 06, 2012 at 01:59:13PM -0600, Stephen Warren wrote:
On 07/05/2012 06:15 AM, Thierry Reding wrote:
Here's a new proposal that should address all issues collected during the discussion of the previous one:
From tegra20.dtsi:
...
At a quick glance, that all seems pretty reasonable now.
One problem I've come across when trying to get some rudimentary code working with this is that there's no longer a device which the DRM driver can bind to, because the top-level device (host1x) now has a separate driver.
Can't you just have the host1x driver trigger the instantiation of the DRM driver? In other words, the host1x node is both the plain host1x driver and the DRM driver. So, after initializing anything for host1x itself, just end host1x's probe() with something like a call to drm_platform_init(), passing in host1x's own pdev.
After all, it seems like the whole point of representing host1x in the DT is to expose the graphics HW, so you'd need the DRM device in that case.
The reason that I've been hesitating is that host1x isn't related only to graphics HW but is also required to make e.g. video input work. So I can imagine a setup where, for instance, you need only video input and send the captured data over the network. If host1x registered the DRM driver, it would imply that setups that don't require any output would still have the whole of DRM initialized.
I did post a proposal to solve this problem, which doesn't seem Tegra specific, to dri-devel last week. But if nobody else thinks it's a problem to use host1x then I'll just use that as the easy way out. If this happens to cause problems at some point in the future we can always address it then. If host1x is tightly coupled to DRM then I think we can also keep the driver with the rest of the DRM code.
Thierry