Adding dri-devel and a few others because an i915 patch contributed to the regression.
On Mon, Jul 02, 2012 at 03:32:15PM +0100, Mel Gorman wrote:
On Mon, Jul 02, 2012 at 02:32:26AM -0400, Christoph Hellwig wrote:
It increases the CPU overhead (dirty_inode can be called up to 4 times per write(2) call, IIRC), so with limited numbers of threads/limited CPU power it will result in lower performance. Where you have lots of CPU power, there will be little difference in performance...
When I checked it, it could only be called twice, and we'd already optimize away the second call. I'd definitely like to track down where the performance changes happened, at least to a major version but even better to a -rc or git commit.
By all means feel free to run the test yourself and run the bisection :)
It's rare, but on this occasion the test machine is idle so I started an automated git bisection. As you know, the mileage with an automated bisect varies, so it may or may not find the right commit. Test machine is sandy so http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-... is the report of interest. The script is doing a full search between v3.3 and v3.4 for a point where average files/sec for fsmark-single drops below 25000. I did not limit the search to fs/xfs on the off-chance that it is an apparently unrelated patch that caused the problem.
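To make the bisection criterion concrete, here is a minimal sketch of the good/bad decision such a script has to make. It is not the script used for this bisection: the function names are hypothetical, and only the 25000 files/sec cut-off comes from the text above. It relies on the documented `git bisect run` convention that exit status 0 means good, 125 means skip, and other statuses from 1 to 127 mean bad.

```python
# Hypothetical sketch of the decision step for a `git bisect run` helper.
# Only the 25000 files/sec threshold is taken from the mail; the rest is
# an assumption made for illustration.

GOOD, BAD, SKIP = 0, 1, 125  # exit statuses understood by `git bisect run`


def mean(values):
    """Arithmetic mean of the per-iteration files/sec samples."""
    return sum(values) / len(values)


def bisect_exit_code(samples, threshold=25000.0):
    """Map fsmark files/sec samples to a `git bisect run` exit status."""
    if not samples:
        return SKIP  # no usable measurement, e.g. the kernel failed to boot
    return BAD if mean(samples) < threshold else GOOD


# Example with made-up samples on either side of the threshold.
print(bisect_exit_code([37400.0, 37600.0]))  # prints 0 (good)
print(bisect_exit_code([24500.0, 24600.0]))  # prints 1 (bad)
```

A wrapper that builds, boots and benchmarks each candidate kernel would pass the measured samples to `bisect_exit_code()` and hand the result to `sys.exit()`.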
It was obvious very quickly that there were two distinct regressions so I ran two bisections. One led to an XFS patch and the other led to an i915 patch that enables RC6 to reduce power usage.
[c999a223: xfs: introduce an allocation workqueue]
[aa464191: drm/i915: enable plain RC6 on Sandy Bridge by default]
gdm was running on the machine so i915 would have been in use. In case it is of interest this is the log of the bisection. Lines beginning with # are notes I made and all other lines are from the bisection script. The second-last column is the files/sec recorded by fsmark.
# MARK v3.3..v3.4 Search for BAD files/sec -lt 28000
# BAD 16536
# GOOD 34757
Mon Jul 2 15:46:13 IST 2012 sandy xfsbisect 141124c02059eee9dbc5c86ea797b1ca888e77f7 37454 good
Mon Jul 2 15:56:06 IST 2012 sandy xfsbisect 55a320308902f7a0746569ee57eeb3f254e6ed16 25192 bad
Mon Jul 2 16:08:34 IST 2012 sandy xfsbisect 281b05392fc2cb26209b4d85abaf4889ab1991f3 38807 good
Mon Jul 2 16:18:02 IST 2012 sandy xfsbisect a8364d5555b2030d093cde0f07951628e55454e1 37553 good
Mon Jul 2 16:27:22 IST 2012 sandy xfsbisect d2a2fc18d98d8ee2dec1542efc7f47beec256144 36676 good
Mon Jul 2 16:36:48 IST 2012 sandy xfsbisect 2e7580b0e75d771d93e24e681031a165b1d31071 37756 good
Mon Jul 2 16:46:36 IST 2012 sandy xfsbisect 532bfc851a7475fb6a36c1e953aa395798a7cca7 25416 bad
Mon Jul 2 16:56:10 IST 2012 sandy xfsbisect 0c9aac08261512d70d7d4817bd222abca8b6bdd6 38486 good
Mon Jul 2 17:05:40 IST 2012 sandy xfsbisect 0fc9d1040313047edf6a39fd4d7c7defdca97c62 37970 good
Mon Jul 2 17:16:01 IST 2012 sandy xfsbisect 5a5881cdeec2c019b5c9a307800218ee029f7f61 24493 bad
Mon Jul 2 17:21:15 IST 2012 sandy xfsbisect f616137519feb17b849894fcbe634a021d3fa7db 24405 bad
Mon Jul 2 17:26:16 IST 2012 sandy xfsbisect 5575acc7807595687288b3bbac15103f2a5462e1 37336 good
Mon Jul 2 17:31:25 IST 2012 sandy xfsbisect c999a223c2f0d31c64ef7379814cea1378b2b800 24552 bad
Mon Jul 2 17:36:34 IST 2012 sandy xfsbisect 1a1d772433d42aaff7315b3468fef5951604f5c6 36872 good
# c999a223c2f0d31c64ef7379814cea1378b2b800 is the first bad commit
# [c999a223: xfs: introduce an allocation workqueue]
#
# MARK c999a223c2f0d31c64ef7379814cea1378b2b800..v3.4 Search for BAD files/sec -lt 20000
# BAD 16536
# GOOD 24552
Mon Jul 2 17:48:39 IST 2012 sandy xfsbisect b2094ef840697bc8ca5d17a83b7e30fad5f1e9fa 37435 good
Mon Jul 2 17:58:12 IST 2012 sandy xfsbisect d2a2fc18d98d8ee2dec1542efc7f47beec256144 38303 good
Mon Jul 2 18:08:18 IST 2012 sandy xfsbisect 5d32c88f0b94061b3af2e3ade92422407282eb12 16718 bad
Mon Jul 2 18:18:02 IST 2012 sandy xfsbisect 2f7fa1be66dce77608330c5eb918d6360b5525f2 24964 good
Mon Jul 2 18:24:14 IST 2012 sandy xfsbisect 923f79743c76583ed4684e2c80c8da51a7268af3 24963 good
Mon Jul 2 18:33:49 IST 2012 sandy xfsbisect b61c37f57988567c84359645f8202a7c84bc798a 24824 good
Mon Jul 2 18:40:20 IST 2012 sandy xfsbisect 20a2a811602b16c42ce88bada3d52712cdfb988b 17155 bad
Mon Jul 2 18:50:12 IST 2012 sandy xfsbisect 78fb72f7936c01d5b426c03a691eca082b03f2b9 38494 good
Mon Jul 2 19:00:24 IST 2012 sandy xfsbisect e1a7eb08ee097e97e928062a242b0de5b2599a11 25033 good
Mon Jul 2 19:10:24 IST 2012 sandy xfsbisect 97effadb65ed08809e1720c8d3ee80b73a93665c 16520 bad
Mon Jul 2 19:16:16 IST 2012 sandy xfsbisect 25e341cfc33d94435472983825163e97fe370a6c 16748 bad
Mon Jul 2 19:21:52 IST 2012 sandy xfsbisect 7dd4906586274f3945f2aeaaa5a33b451c3b4bba 24957 good
Mon Jul 2 19:27:35 IST 2012 sandy xfsbisect aa46419186992e6b8b8010319f0ca7f40a0d13f5 17088 bad
Mon Jul 2 19:32:54 IST 2012 sandy xfsbisect 83b7f9ac9126f0532ca34c14e4f0582c565c6b0d 25667 good
# aa46419186992e6b8b8010319f0ca7f40a0d13f5 is the first bad commit
# [aa464191: drm/i915: enable plain RC6 on Sandy Bridge by default]
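For anyone who wants to re-check the log, each script line ends with the commit hash, the files/sec figure and the verdict, so the fields can be peeled off from the right. The following is a rough sketch with hypothetical helper names (`parse_step`, `consistent`). It parses two lines copied from the log above and checks each verdict against the 28000 cut-off named in the first MARK note.

```python
from typing import NamedTuple, Optional


class BisectStep(NamedTuple):
    commit: str
    files_per_sec: int
    verdict: str


def parse_step(line: str) -> Optional[BisectStep]:
    """Parse one script line; return None for '#' notes and blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # The last three whitespace-separated fields are hash, files/sec, verdict.
    commit, rate, verdict = line.split()[-3:]
    return BisectStep(commit, int(rate), verdict)


def consistent(step: BisectStep, threshold: int) -> bool:
    """True if the recorded verdict matches 'bad when files/sec < threshold'."""
    expected = "bad" if step.files_per_sec < threshold else "good"
    return step.verdict == expected


# Two entries and one note copied from the first bisection in the log.
LOG = """\
# MARK v3.3..v3.4 Search for BAD files/sec -lt 28000
Mon Jul 2 17:26:16 IST 2012 sandy xfsbisect 5575acc7807595687288b3bbac15103f2a5462e1 37336 good
Mon Jul 2 17:31:25 IST 2012 sandy xfsbisect c999a223c2f0d31c64ef7379814cea1378b2b800 24552 bad
"""

steps = [s for s in map(parse_step, LOG.splitlines()) if s is not None]
for step in steps:
    print(step.commit[:8], step.files_per_sec, step.verdict, consistent(step, 28000))
```

The second bisection uses a different cut-off (20000), so the threshold has to follow whichever MARK note precedes the lines being checked.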
I tested plain reverts of the patches individually and together and got the following results
FS-Mark Single Threaded
                          3.4.0                  3.4.0                  3.4.0                  3.4.0
                        vanilla        revert-aa464191        revert-c999a223            revert-both
Files/s  min       14176.40 ( 0.00%)      17830.60 (25.78%)      24186.70 (70.61%)      25108.00 (77.11%)
Files/s  mean      16783.35 ( 0.00%)      25029.69 (49.13%)      37513.72 (123.52%)     38169.97 (127.43%)
Files/s  stddev     1007.26 ( 0.00%)       2644.87 (162.58%)      5344.99 (430.65%)      5599.65 (455.93%)
Files/s  max       18475.40 ( 0.00%)      27966.10 (51.37%)      45564.60 (146.62%)     47918.10 (159.36%)
Overhead min      593978.00 ( 0.00%)     386173.00 (34.99%)     253812.00 (57.27%)     247396.00 (58.35%)
Overhead mean     637782.80 ( 0.00%)     429229.33 (32.70%)     322868.20 (49.38%)     287141.73 (54.98%)
Overhead stddev    72440.72 ( 0.00%)     100056.96 (-38.12%)    175001.08 (-141.58%)   102018.14 (-40.83%)
Overhead max      855637.00 ( 0.00%)     753541.00 (11.93%)     880531.00 (-2.91%)     637932.00 (25.44%)

MMTests Statistics: duration
Sys Time Running Test (seconds)             44.06     32.25     24.19     23.99
User+Sys Time Running Test (seconds)        50.19     36.35     27.24     26.7
Total Elapsed Time (seconds)                59.21     44.76     34.95     34.14
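The percentages in parentheses can be reproduced from the raw figures. The sketch below assumes, based only on matching the numbers in the table, that Files/s rows are treated as higher-is-better and Overhead rows as lower-is-better, both relative to the vanilla column. The function names are hypothetical.

```python
def gain_higher_is_better(baseline: float, value: float) -> float:
    """Percentage change where a larger value is an improvement (Files/s)."""
    return (value - baseline) / baseline * 100.0


def gain_lower_is_better(baseline: float, value: float) -> float:
    """Percentage change where a smaller value is an improvement (Overhead)."""
    return (baseline - value) / baseline * 100.0


# Mean rows copied from the table; the first entry is the vanilla baseline.
files_mean = {
    "vanilla": 16783.35,
    "revert-aa464191": 25029.69,
    "revert-c999a223": 37513.72,
    "revert-both": 38169.97,
}
overhead_mean = {
    "vanilla": 637782.80,
    "revert-aa464191": 429229.33,
    "revert-c999a223": 322868.20,
    "revert-both": 287141.73,
}

for name in files_mean:
    files = gain_higher_is_better(files_mean["vanilla"], files_mean[name])
    overhead = gain_lower_is_better(overhead_mean["vanilla"], overhead_mean[name])
    print(f"{name:16s} files/s {files:7.2f}%  overhead {overhead:6.2f}%")
```

The same two formulas also account for the stddev rows, which is why a larger Overhead stddev shows up as a negative percentage.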
Individually reverting either patch makes a difference to both files/sec and overhead. Reverting both is not as dramatic as reverting each individual patch would indicate but it's still a major improvement.
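As a rough illustration of that last point, the individual mean Files/s gains can be combined under two simple models and compared with the measured revert-both figure. Both models (gains adding up, or speedups multiplying) are assumptions made here for illustration and are not part of the original analysis.

```python
# Mean Files/s values copied from the results table.
VANILLA = 16783.35
REVERT_I915 = 25029.69   # revert-aa464191
REVERT_XFS = 37513.72    # revert-c999a223
REVERT_BOTH = 38169.97


def pct_gain(baseline: float, value: float) -> float:
    """Percentage improvement of value over baseline."""
    return (value - baseline) / baseline * 100.0


gain_i915 = pct_gain(VANILLA, REVERT_I915)
gain_xfs = pct_gain(VANILLA, REVERT_XFS)
observed = pct_gain(VANILLA, REVERT_BOTH)

# Assumption 1: the two gains simply add up.
additive = gain_i915 + gain_xfs
# Assumption 2: the two speedup factors multiply.
multiplicative = ((REVERT_I915 / VANILLA) * (REVERT_XFS / VANILLA) - 1.0) * 100.0

print(f"observed revert-both gain: {observed:.2f}%")
print(f"additive estimate:         {additive:.2f}%")
print(f"multiplicative estimate:   {multiplicative:.2f}%")
```

Under either assumption the estimate is well above the measured gain for revert-both, which is consistent with the observation that the two reverts do not stack.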