CCCP Project Forums


Author Topic: my vsfilter mod  (Read 79866 times)

x_xy_y2

  • CCCP Lieutenant
  • **
  • Posts: 21
my vsfilter mod
« on: September 15, 2011, 03:27:59 PM »

We have implemented a scale function that allows subtitles to be rendered at arbitrary resolutions. The build can be found here: xy-VSFilter_Scale_Test_Build_1.7z.
--------------------------------
Downloads Page: Downloads (Older versions can be found here too.)
ReleaseNotes: ReleaseNotes
--------------------------------
Test version for the next stable: 3.0.0.144
Update:
1. A faster floating-point \blur. \blur now uses floating-point intermediates. Floating point makes SSE optimization practical, which was hard in the old fixed-point version, so the floating-point version is faster than the old fixed-point one.
2. Some bugfixes. details

Latest stable version: 3.0.0.65
1. Some bugfixes. details

--------------------------------
Hi, everyone.
I've been working on improving vsfilter's performance since last year. The background is quite similar to gommorah's (http://www.cccp-project.net/forums/index.php?topic=5776.0), but at the beginning I had no plan to share my mod with anyone else (so I didn't have to worry about issues such as compatibility or rare bugs). Recently, the 10-bit playback issue brought me to this forum, and unexpectedly I found that gommorah was making a vsfilter mod here too. It's great to see someone else focusing on the same thing as me. Here's my mod, based on vsfilter 2.39. Hope it helps.

binary http://code.google.com/p/xy-vsfilter/downloads/detail?name=xy_vsfilter.7z&can=2&q=
source http://code.google.com/p/xy-vsfilter/downloads/detail?name=xy_vsfilter_source_20110916.7z&can=2&q=

What I've done in this mod:
1. Disable prebuffering. It's too difficult to test.
2. Improve performance.
    1). Alpha blending on the dirty area only. Instead of using a single rectangle to represent the dirty area of a subpic as VSFilter does, I use a list of rectangles. Hence in cases such as two subtitle lines, one at the top and one at the bottom, alpha blending of the subpic and the video frame can be done on a relatively small area.
    2). No full-subpic RGB to YUV conversion. Subpics are drawn in YUV/RGB directly, unlike vsfilter, which always draws subpics in RGB and converts to YUV afterwards (if necessary), so the overhead of RGB to YUV conversion is avoided.
    3). Cache of intermediate data. Crucial for speeding up rendering of animated effects. The downside is increased memory load.
    4). More efficient blur code (stolen from libass).
On my system (win7+i5-2410+2G ram), a simple test shows that it outperforms both threaded-vsfilter and the mpc-hc built-in sub renderer.
3. Faster loading of big ASS files (with tens of thousands of lines).
4.BT709 support.
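
To illustrate the rectangle-list idea from 2.1), here is a minimal C++ sketch. It is not code from the mod: the Rect type and the boundingBox, dirtyArea and blendDirty functions are hypothetical names made up for this example, and it assumes a single 8-bit plane with per-pixel alpha and non-overlapping rectangles.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical rectangle type (not VSFilter's): half-open,
// [left, right) x [top, bottom).
struct Rect {
    int left, top, right, bottom;
    long long area() const {
        return static_cast<long long>(right - left) * (bottom - top);
    }
};

// The single bounding rectangle that one dirty rect would have to cover.
// Assumes `rects` is non-empty.
Rect boundingBox(const std::vector<Rect>& rects) {
    Rect box = rects.front();
    for (const Rect& r : rects) {
        box.left = std::min(box.left, r.left);
        box.top = std::min(box.top, r.top);
        box.right = std::max(box.right, r.right);
        box.bottom = std::max(box.bottom, r.bottom);
    }
    return box;
}

// Pixels touched when each rectangle is blended on its own.
// Assumes the rectangles do not overlap.
long long dirtyArea(const std::vector<Rect>& rects) {
    long long total = 0;
    for (const Rect& r : rects) total += r.area();
    return total;
}

// Blend one 8-bit plane of the subpicture over the frame, visiting only
// the pixels inside the listed rectangles. `stride` is the row length.
void blendDirty(std::vector<std::uint8_t>& frame,
                const std::vector<std::uint8_t>& sub,
                const std::vector<std::uint8_t>& alpha,
                int stride, const std::vector<Rect>& rects) {
    for (const Rect& r : rects) {
        for (int y = r.top; y < r.bottom; ++y) {
            for (int x = r.left; x < r.right; ++x) {
                const std::size_t i = static_cast<std::size_t>(y) * stride + x;
                const int a = alpha[i];
                frame[i] = static_cast<std::uint8_t>(
                    (sub[i] * a + frame[i] * (255 - a) + 127) / 255);
            }
        }
    }
}
```

With made-up numbers for a 1920x1080 frame, a top line at {400, 40, 1520, 100} and a bottom line at {300, 980, 1620, 1050} cover 159,600 pixels as a list, while their bounding box covers 1,333,200 pixels.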
--------------------------------
« Last Edit: September 11, 2012, 08:54:54 PM by x_xy_y2 »

cyberbeing

  • DirectShow Mage
  • *****
  • Posts: 340
Re: my vsfilter mod
« Reply #1 on: September 15, 2011, 04:28:14 PM »

A quick test on an old AMD X2 looks positive, so nice work x_xy_y2.

There is a rather significant bug in that build though. Anywhere that the \c tag is used in a script to specify a color, the element turns black instead. Colors specified in styles are fine though.

Edit: Fixed. See Below.
« Last Edit: September 16, 2011, 06:35:00 AM by cyberbeing »

gommorah

  • DirectShow Mage
  • *****
  • Posts: 137
Re: my vsfilter mod
« Reply #2 on: September 15, 2011, 06:34:15 PM »

Looks interesting, I'm glad someone else is working on VSFilter as well :). I haven't had a chance to look at the source (and probably won't for a while), but if there's anything nice in there, I may try to integrate it into threaded-vsfilter. If you keep working on it, keep us updated, especially with the source.

cyberbeing

  • DirectShow Mage
  • *****
  • Posts: 340
Re: my vsfilter mod
« Reply #3 on: September 15, 2011, 09:15:48 PM »

He makes use of the Boost library and his builds work on WinXP, so those are two interesting things. ;)

It appears that, just like other VSFilter builds, enabling SSE2 and PGO results in a measurable speed-up as well: xy-vsfilter_sse2_pgo_20110916 (test-build)
x_xy_y2, I hope you have luck fixing the \c color tag turning things black, since that breaks a large number of scripts in circulation.

x_xy_y2

  • CCCP Lieutenant
  • **
  • Posts: 21
Re: my vsfilter mod
« Reply #4 on: September 15, 2011, 09:46:12 PM »

I believe replacing src\subtitles\RTS.cpp with the attachment below can fix the bug. But I have neither the time nor the environment necessary for a rebuild right now.
PS: Your post reminds me that I forgot to pack the Boost library with the source. I use boost::flyweights in several places.

cyberbeing

  • DirectShow Mage
  • *****
  • Posts: 340
Re: my vsfilter mod
« Reply #5 on: September 15, 2011, 10:21:59 PM »

That fixed it. :))

Here is a build containing your RTS.cpp fix with \c tag working:
xy-vsfilter_sse2_openMP_20110916 (fixed) OLD

No PGO this time, and I won't have time to make another PGO build for a few days, since I'm going out of town. That build is SSE2 only, but OpenMP still disabled (since I forgot which things had it enabled...). Did you have OpenMP enabled for any particular reason in your project files? I needed to disable OpenMP to build with PGO, so hopefully there is no problem.

Edit: Replaced link with a build without OpenMP disabled.

Edit2: Grab recent builds from http://code.google.com/p/xy-vsfilter/downloads/list
« Last Edit: September 22, 2011, 09:33:01 PM by cyberbeing »

x_xy_y2

  • CCCP Lieutenant
  • **
  • Posts: 21
Re: my vsfilter mod
« Reply #6 on: September 15, 2011, 10:43:18 PM »

I use OpenMP only in the box blur code, so it's OK to turn OpenMP off. In fact, I tried adding OpenMP support in other places (Rasterizer::Draw); it does increase CPU usage, but overall performance seems to drop. So I don't believe the OpenMP option will significantly affect performance.
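
For readers unfamiliar with the setup, here is a rough C++ sketch of what OpenMP in a box blur can look like. It is an illustration only, not the mod's blur code: boxBlurRows is a hypothetical function, it handles just the horizontal pass on a float image, and it treats pixels outside the row as zero.

```cpp
#include <cstddef>
#include <vector>

// Horizontal box blur of radius `r` over a width x height float image.
// Rows are independent of each other, so the outer loop can be split
// across threads. When the compiler is not run with OpenMP enabled,
// the pragma is skipped and the loop simply runs serially.
std::vector<float> boxBlurRows(const std::vector<float>& src,
                               int width, int height, int r) {
    std::vector<float> dst(src.size());
    const float norm = 1.0f / static_cast<float>(2 * r + 1);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int y = 0; y < height; ++y) {
        const float* in = src.data() + static_cast<std::size_t>(y) * width;
        float* out = dst.data() + static_cast<std::size_t>(y) * width;
        // Running sum over the window [x - r, x + r]; pixels outside the
        // row count as zero.
        float sum = 0.0f;
        for (int x = 0; x <= r && x < width; ++x) sum += in[x];
        for (int x = 0; x < width; ++x) {
            out[x] = sum * norm;
            if (x + r + 1 < width) sum += in[x + r + 1];  // pixel entering the window
            if (x - r >= 0) sum -= in[x - r];             // pixel leaving the window
        }
    }
    return dst;
}
```

Because the result is the same with or without the pragma, turning OpenMP off changes only how the rows are scheduled, which fits the point above that disabling it is safe.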

gommorah

  • DirectShow Mage
  • *****
  • Posts: 137
Re: my vsfilter mod
« Reply #7 on: September 16, 2011, 01:38:03 PM »

Curiosity got the better of me so instead of doing school work like I should be doing, I ended up looking through some of your code. I was wondering if you saw any measurable memory savings using the flyweight pattern?

x_xy_y2

  • CCCP Lieutenant
  • **
  • Posts: 21
Re: my vsfilter mod
« Reply #8 on: September 16, 2011, 03:22:45 PM »

No, I didn't see any measurable memory savings after converting some objects (CString, STSStyle, CMyFont, etc.) to the flyweight pattern. Given the heavy memory load caused by other parts, such as the cache or prebuffered subpics, those objects have little influence on total memory usage, so I never expected an obvious reduction in memory consumption from the flyweight pattern. In fact, I use it mainly for speed. It provides:
1 A simple way to cache objects that are frequently used and inefficient to construct, e.g. the gaussian table I use and the CMyFont object, whose constructor looks super inefficient, so that they are not constructed repeatedly.
2 Extremely fast equality comparison between two flyweight objects, with which I hope comparison of some complex objects, e.g. STSStyle, can be sped up.
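
The two points above can be sketched with a small hand-rolled flyweight in standard C++. This is not boost::flyweight and not the mod's code: Style and StyleFlyweight are hypothetical names, the pool is not thread-safe, and pooled entries are never released.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an object that is costly to build and is
// compared often (think of a style or font description).
struct Style {
    std::string fontName;
    int fontSize;
};

class StyleFlyweight {
public:
    // Looks the value up in a shared pool and builds it only on a miss.
    explicit StyleFlyweight(const Style& s) : ptr_(intern(s)) {}

    const Style& get() const { return *ptr_; }

    // Equal values share one pooled object, so equality is a pointer compare.
    bool operator==(const StyleFlyweight& other) const {
        return ptr_ == other.ptr_;
    }

    static std::size_t poolSize() { return pool().size(); }

private:
    using Pool = std::unordered_map<std::string, std::unique_ptr<const Style>>;

    static Pool& pool() {
        static Pool p;
        return p;
    }

    static const Style* intern(const Style& s) {
        // Simple textual key; assumes font names contain no newline.
        const std::string key = s.fontName + '\n' + std::to_string(s.fontSize);
        auto it = pool().find(key);
        if (it == pool().end()) {
            it = pool().emplace(key, std::unique_ptr<const Style>(new Style(s))).first;
        }
        return it->second.get();
    }

    const Style* ptr_;
};
```

Two StyleFlyweight objects built from equal Style values end up pointing at the same pooled object, so comparing them costs one pointer comparison regardless of how large the underlying object is.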
« Last Edit: September 16, 2011, 03:34:32 PM by x_xy_y2 »

gommorah

  • DirectShow Mage
  • *****
  • Posts: 137
Re: my vsfilter mod
« Reply #9 on: September 16, 2011, 06:04:37 PM »

Ah ok, seems reasonable.

It's a good thing you came along when you did xy. I'd been trying to figure out a way to reduce the penalty of CreateWidenedRegion calls for a long time now (since Ekyu brought up Samurai Girls), but for some reason, it never occurred to me that it would be possible to cache the results. Consequently, I spent all today implementing a similar caching system to the one you had, but restricted to the outlines since the conditions for being able to use the cached results are less strict than with caching the overlay buffer itself (at least, that's my gut feeling). At the very least, threaded-vsfilter no longer loses out to your build in Samurai Girls :P.

x_xy_y2

  • CCCP Lieutenant
  • **
  • Posts: 21
Re: my vsfilter mod
« Reply #10 on: September 16, 2011, 09:35:43 PM »

Looks like I must go steal some code to win this competition  :].
For the cache thing, I've been thinking of implementing an LRU-like cache mechanism to get memory usage under control. With the help of boost::multi_index, maybe it won't be a difficult job.

gommorah

  • DirectShow Mage
  • *****
  • Posts: 137
Re: my vsfilter mod
« Reply #11 on: September 17, 2011, 06:51:52 AM »

Yea, the fact that the cache is global makes me a little nervous. I was originally thinking it could be implemented at the CLine level, but it seems CLines get recreated pretty frequently. I'll need to actually watch the memory usage as I watch episodes of anime and stuff. One of the benefits of caching just the outline rather than the overlay is that eventually, the cache will start behaving like a flyweights factory since subtitles usually have most of their text styled uniformly. But, I bet there's some edge case out there that will make the memory usage explode. The LRU might not be a bad idea, but that will probably be harder for me to implement since I use a hashmap, and I need to deal with the new positioning bugs I seem to have added with the caching :(.

Also, there's no need for us to compete. If I could contact you via IM and get a feel for the kind of developer you are, I wouldn't mind having you work on threaded-vsfilter as well, assuming you're interested. I don't think it makes much sense to have competing performance-oriented VSFilter forks, and from looking through your code, you don't particularly strike me as being a bad programmer. I guess the only thing is that I don't support Windows XP (people need to move on...), so if that's something you care about, then working on threaded-vsfilter might not be something you're interested in. I'd definitely welcome another active developer since school will be taking priority over this for the foreseeable future.

Edit: Never mind about the outline cache behaving like a flyweights factory. It seems that CLine generates a single large CWord that contains the entire text of one contiguous line (no line break). This means in the general case, the cache is unlikely to be used very often, but in those edge cases where it is used, the speed-up is significant. Seems like an LRU scheme will be necessary after all to keep the memory in check.
« Last Edit: September 17, 2011, 08:46:38 AM by gommorah »

x_xy_y2

  • CCCP Lieutenant
  • **
  • Posts: 21
Re: my vsfilter mod
« Reply #12 on: September 19, 2011, 04:34:31 AM »

1. IM: msn:yuzhuohuang@hotmail.com. We all use qq, an IM tool, in our country, and I have never used any other IM tool before. Moreover, I am a non-native English speaker and particularly slow (the words particularly slow here do mean particularly slow) when writing. (Hope I haven't made too many mistakes in my posts here.)
2. I don't really care about XP support, although having it is better than not. (What I care about most is my own usage! :evil: ) It would be great if I could offer any help on your project. But I don't have much free time, and rather than sticking to vsfilter and making performance patches for it, I would prefer switching to libass, whose framework is much better designed and much more efficient. I've been thinking of making a libass-filter for days but have not yet started on it, for fear that there won't be enough time for me to finish it.

gommorah

  • DirectShow Mage
  • *****
  • Posts: 137
Re: my vsfilter mod
« Reply #13 on: September 19, 2011, 06:02:21 AM »

Ha, I forgot Tencent and Baidu basically own everything Internet related in China. Your fluency is very good for a non-native English speaker.

I'd rather see you use your limited free time to investigate writing a DirectShow wrapper for libass as well. It's not like there's any rush to get a DShow libass filter out any time soon; we've been stuck with VSFilter for this long, what's another few months or more.

taulin

  • CCCP Sr. Lieutenant
  • ***
  • Posts: 30
Re: my vsfilter mod
« Reply #14 on: September 23, 2011, 02:17:46 PM »

small issue
http://i.imgur.com/TaTjJ.jpg

the line as it appears in the .ass file is:
Dialogue: 0,0:16:04.74,0:16:08.58,Default,,0000,0000,0000,,U–Um, that's what we thought,

it doesn't seem to occur all the time, i've seen it happen a couple times already but i am not sure what the trigger is

it also doesn't seem to handle negative or out of frame positions for a \move very well/at all