The Code Deposit<br />It seemed like a good idea at the time<br />OneEightHundred (http://www.blogger.com/profile/15917532861521845279, noreply@blogger.com)<br />Feed updated 2018-12-05T06:06:58.105-05:00<br /><br />
BC7 endpoint search using endpoint extrapolation<br />Published 2018-03-30T04:26:00.002-04:00, updated 2018-03-30T04:26:32.683-04:00<br /><br /><a href="https://github.com/elasota/cvtt">Convection Texture Tools</a> is now roughly on par with NVTT in quality when compressing BC7 textures, despite being about 140 times faster, making it one of the fastest and highest-quality BC7 compressors.<br /><br />How this was accomplished turned out to be simpler than expected. Recall that Squish became the gold standard of S3TC compressors by implementing a "cluster fit" algorithm that <a href="http://sjbrown.co.uk/2006/01/19/dxt-compression-techniques/">ordered all of the input colors on a line and tried every possible grouping of them to least-squares fit them</a>.<br /><br />Unfortunately, this technique isn't practical for BC7 because the number of orderings scales extremely quickly with index size. While 2-bit indices have a few hundred possible orderings, 4-bit indices have millions. Most BC7 mode indices are 3 bits, and some are 4.<br /><br />With that option gone, most BC7 compressors until now have tried to solve endpoints using various types of endpoint perturbation, which tends to require a lot of iterations.<br /><br />Convection just uses 2 rounds of K-means clustering and a much simpler technique based on a guess about why Squish's cluster fit algorithm is actually useful: It can create endpoint mappings that don't use some of the indices at the terminal ends of the endpoint line, causing the endpoints to be extrapolated outward, possibly to a point that loses less accuracy to quantization.<br /><br />Convection just tries cutting off 1 index at each end in turn, then 1 index at both ends.
Those few extra candidates turned out to be enough to place it near the top of the quality benchmarks.<br /><br />Now I just need to add color weighting and alpha weighting and it'll be time to move on to other formats.<br /><br />
Spherical harmonics self-shadowing<br />Published 2011-12-02T12:22:00.000-05:00, updated 2011-12-07T18:39:10.216-05:00<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-scRIghLBJN4/TtkMif23TKI/AAAAAAAAAEE/EG6jOa4CgD8/s1600/sh-selfshadow.jpg"><img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px; height: 112px;" src="http://4.bp.blogspot.com/-scRIghLBJN4/TtkMif23TKI/AAAAAAAAAEE/EG6jOa4CgD8/s200/sh-selfshadow.jpg" alt="" id="BLOGGER_PHOTO_ID_5681586191711292578" border="0" /></a><a href="http://www.valvesoftware.com/publications/2007/SIGGRAPH2007_EfficientSelfShadowedRadiosityNormalMapping.pdf">Valve's self-shadowing radiosity normal maps</a> concept can be used with spherical harmonics in approximately the same way: Integrate over a sphere based on how much light will reach a sample when arriving from numerous sample directions, accounting for occlusion by other samples due to elevation.<br /><br />You can store this as three DXT1 textures, though you can improve quality by packing channels with similar spatial coherence. Coefficients 0, 2, and 6 in particular tend to pack well, since they're all dominated primarily by directions aimed perpendicular to the texture.<br /><br />I use the following packing:<br />Texture 1: Coefs 0, 2, 6<br />Texture 2: Coefs 1, 4, 5<br />Texture 3: Coefs 3, 7, 8<br /><br />You can <a href="http://codedeposit.blogspot.com/2010/01/spherical-harmonics-spoilers.html">reference an earlier post on this blog</a> for code on how to rotate an SH vector by a matrix, in turn allowing you to get it into texture space.
Once you've done that, simply multiply each SH coefficient from the self-shadowing map by the corresponding SH coefficient created from your light source (also covered in that earlier post) and add the products together.<br /><br />
Introducing RDX<br />Published 2011-10-18T23:55:00.000-04:00, updated 2011-10-19T00:37:29.437-04:00<br /><br />Has it really been a year since the last update?<br /><br />Well, things have been chugging along with less discovery and more actual work. However, development on TDP is largely on hold due to the likely impending release of the Doom 3 source code, which has numerous architectural improvements like rigid-body physics and much better customization of entity networking.<br /><br /><br />In the meantime, however, a component of TDP has been spun off into its own project: The RDX extension language. Initially planned as a resource manager, it has evolved into a full-fledged programmability API. The main goal was to have a runtime with very straightforward integration, to the point that you can easily use it for managing your C++ resources, while also being much faster than dynamically-typed interpreted languages, especially when dealing with complex data types such as float vectors.<br /><br />Features are still being implemented, but the compiler seems to be stable and load-time conversion to native x86 code is functional.
Expect a real release in a month or two.<br /><br /><a href="http://code.google.com/p/rdx-extension-language/">The project now has a home on Google Code</a>.<br /><br />
YCoCg DXT5 - Stripped down and simplified<br />Published 2010-10-07T20:36:00.001-04:00, updated 2010-10-11T02:21:28.508-04:00<br /><br />You'll recall some improvements I proposed to the YCoCg DXT5 algorithm a while back.<br /><br />There's another realization about it I made recently: As a YUV-style color space, the Co and Cg channels are constrained to a range that's directly proportional to the Y channel. The scalar in the blue channel was mainly introduced to deal with resolution issues that caused banding artifacts on colored objects changing value, but the entire issue there can be sidestepped by simply using the Y channel as a multiplier for the Co and Cg channels, causing them to only represent tone and saturation while the Y channel becomes fully responsible for intensity.<br /><br />This is <span style="font-weight:bold;">not</span> a quality improvement; in fact, it nearly doubles the measured error in testing. However, it does result in considerable simplification of the algorithm, both on the encode and decode sides, and the perceptual loss compared to the old algorithm is minimal.<br /><br />The encode step becomes:<br /><br /><pre><br />// px[] holds 8-bit R, G, B on input and Co, Cg, 0, Y on output<br />int iY = px[0] + 2*px[1] + px[2]; // 0..1020<br />int iCo, iCg;<br /><br />if (iY == 0)<br />{<br /> // Black pixel: avoid dividing by zero<br /> iCo = 0;<br /> iCg = 0;<br />}<br />else<br />{<br /> iCo = (px[0] + px[1]) * 255 / iY;<br /> iCg = (px[1] * 2) * 255 / iY;<br />}<br /><br />px[0] = (unsigned char)iCo;<br />px[1] = (unsigned char)iCg;<br />px[2] = 0;<br />px[3] = (unsigned char)((iY + 2) / 4);<br /></pre><br /><br />...
And to decode:<br /><br /><pre><br />// inColor.r = Co, inColor.g = Cg, inColor.a = Y, as written by the encoder<br />float3 DecodeYCoCgRel(float4 inColor)<br />{<br /> return (float3(4.0, 0.0, -4.0) * inColor.r<br /> + float3(-2.0, 2.0, -2.0) * inColor.g<br /> + float3(0.0, 0.0, 4.0)) * inColor.a;<br />}<br /></pre><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_p7tnlbl0cTs/TK5sFtPUqfI/AAAAAAAAADg/qhnCNDrhESo/s1600/coloration.jpg"><img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 320px; height: 179px;" src="http://3.bp.blogspot.com/_p7tnlbl0cTs/TK5sFtPUqfI/AAAAAAAAADg/qhnCNDrhESo/s320/coloration.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5525472638129187314" /></a><br /><br />While this does the job with much less perceptual loss than DXT1, and eliminates banding artifacts almost entirely, it is not quite as precise as the old algorithm, so the old algorithm is still recommended if you need the quality.<br /><br />
... and they're still compressable<br />Published 2010-06-04T19:23:00.000-04:00, updated 2010-06-04T19:32:13.374-04:00<br /><br />As a corollary to the last entry, an orthogonal tangent basis is commonly compressed by storing the normal and one of the texture axis vectors, along with a "handedness" multiplier which is either -1 or 1. The second texture axis is regenerated by taking the cross product of the normal and the stored axis, and multiplying it by the handedness.<br /><br />The method I proposed was faulted for breaking this scheme, but there's no break at all.
Since the two texture axes are on the triangle plane, and the normal is perpendicular to it, you can use the same compression scheme by simply storing the two texture axis vectors and regenerating the normal by taking their cross product, multiplying it by a handedness multiplier, and normalizing it.<br /><br />This does not address mirroring concerns if you use my "snap-to-normal" recommendation, though you could detect those cases in a vertex shader by using special handedness values.<br /><br />
Tangent-space basis vectors: Don't assume your texture projection is orthogonal<br />Published 2010-04-22T19:49:00.000-04:00, updated 2012-01-08T00:23:05.193-05:00<br /><br />How do you generate the tangent vectors, which represent which way the texture axes on a textured triangle are facing?<br /><br />Hitting up Google tends to produce articles like <a href="http://www.terathon.com/code/tangent.html">this one</a>, or maybe even that exact one. I've seen others linked too; the basic formulae tend to be the same. Have you looked at what you're pasting into your code though? Have you noticed that you're using the T coordinates to calculate the S vector, and vice versa? Well, you can look at the underlying math, and you'll find that it's because that's what happens when you assume the normal, S vector, and T vector form an orthonormal matrix and attempt to invert it. In a sense, you're not really using the S and T vectors but rather vectors perpendicular to them.<br /><br />But that's fine, right? I mean, this is an orthogonal matrix, and they are perpendicular to each other, right?
Well, does your texture project onto the triangle with the texture axes at right angles to each other, like a grid?<br /><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_p7tnlbl0cTs/S9Dheqg82bI/AAAAAAAAAB4/R2a1-akHNas/s1600/tgen-coords.png"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 150px;" src="http://1.bp.blogspot.com/_p7tnlbl0cTs/S9Dheqg82bI/AAAAAAAAAB4/R2a1-akHNas/s200/tgen-coords.png" alt="" id="BLOGGER_PHOTO_ID_5463114264925231538" border="0" /></a>... Not always? Well, you might have a problem then!<br /><br />So, what's the real answer?<br /><br />Well, what do we know? First, translating the vertex positions will not affect the axial directions. Second, scrolling the texture will not affect the axial directions.<br /><br />So, for triangle (A,B,C), with coordinates (x,y,z,t), we can create a new triangle (LA,LB,LC) and the directions will be the same:<br /><br /><div style="text-align: left;"><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_p7tnlbl0cTs/S9EBF8Z9rCI/AAAAAAAAACA/zt7MAmJM9Lk/s1600/tgen-formula1.png"><img style="margin: 0pt 10px 10px 0pt; cursor: pointer; width: 88px; height: 60px;" src="http://4.bp.blogspot.com/_p7tnlbl0cTs/S9EBF8Z9rCI/AAAAAAAAACA/zt7MAmJM9Lk/s200/tgen-formula1.png" alt="" id="BLOGGER_PHOTO_ID_5463149024603122722" border="0" /></a></div>We also know that both axis directions are on the same plane as the points, so to take advantage of that, we can convert this into a local coordinate system and force one axis to zero.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_p7tnlbl0cTs/TAHaQ8Z3bnI/AAAAAAAAADA/LEntYl_Bk9c/s1600/tgen-formula2.png"><img style="cursor: pointer; width: 170px; height: 67px;" src="http://2.bp.blogspot.com/_p7tnlbl0cTs/TAHaQ8Z3bnI/AAAAAAAAADA/LEntYl_Bk9c/s400/tgen-formula2.png" alt=""
id="BLOGGER_PHOTO_ID_5476898606486613618" border="0" /></a><br /><br />Now we need triangle (Origin, PLB, PLC) in this local coordinate space. We know PLB[y] is zero since LB was used as the X axis.<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_p7tnlbl0cTs/TAHcPPe_B2I/AAAAAAAAADQ/CQhHj3zQaRI/s1600/tgen-formula3.png"><img style="margin: 0pt 10px 10px 0pt; cursor: pointer; width: 294px; height: 115px;" src="http://3.bp.blogspot.com/_p7tnlbl0cTs/TAHcPPe_B2I/AAAAAAAAADQ/CQhHj3zQaRI/s400/tgen-formula3.png" alt="" id="BLOGGER_PHOTO_ID_5476900776271873890" border="0" /></a><br /><br />Now we can solve this. Remember that PLB[y] is zero, so...<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_p7tnlbl0cTs/S9EHHOg_G_I/AAAAAAAAACw/TyvxD3sS5j0/s1600/tgen-formula4.png"><img style="cursor: pointer; width: 400px; height: 335px;" src="http://3.bp.blogspot.com/_p7tnlbl0cTs/S9EHHOg_G_I/AAAAAAAAACw/TyvxD3sS5j0/s400/tgen-formula4.png" alt="" id="BLOGGER_PHOTO_ID_5463155643714051058" border="0" /></a><br /><br />Do this for both axes and you have your correct texture axis vectors, regardless of the texture projection. 
You can then multiply your tangent-space normal map samples by the resulting basis, normalize the result, and have a proper world-space surface normal.<br /><br />As always, the source code spoilers:<br /><br /><pre>// Edge vectors and texture coordinate deltas relative to vertex 0<br />terVec3 lb = ti->points[1] - ti->points[0];<br />terVec3 lc = ti->points[2] - ti->points[0];<br />terVec2 lbt = ti->texCoords[1] - ti->texCoords[0];<br />terVec2 lct = ti->texCoords[2] - ti->texCoords[0];<br /><br />// Generate local space for the triangle plane<br />terVec3 localX = lb.Normalize2();<br />terVec3 localZ = lb.Cross(lc).Normalize2();<br />terVec3 localY = localX.Cross(localZ).Normalize2();<br /><br />// Determine X/Y vectors in local space<br />float plbx = lb.DotProduct(localX);<br />terVec2 plc = terVec2(lc.DotProduct(localX), lc.DotProduct(localY));<br /><br />terVec2 tsvS, tsvT;<br /><br />// Solve for each texture axis in local space (PLB[y] is zero, so the X term comes from LB alone)<br />tsvS[0] = lbt[0] / plbx;<br />tsvS[1] = (lct[0] - tsvS[0]*plc[0]) / plc[1];<br />tsvT[0] = lbt[1] / plbx;<br />tsvT[1] = (lct[1] - tsvT[0]*plc[0]) / plc[1];<br /><br />// Convert back to 3D and normalize<br />ti->svec = (localX*tsvS[0] + localY*tsvS[1]).Normalize2();<br />ti->tvec = (localX*tsvT[0] + localY*tsvT[1]).Normalize2();<br /></pre><br /><br />There's an additional special case to be aware of: Mirroring.<br /><br />Mirroring across an edge can cause wild changes in a vector's direction, possibly even degenerating it. There isn't a clear-cut solution to this, but you can work around the problem by snapping the vector to the normal, effectively cancelling it out on the mirroring edge.<br /><br />Personally, I check the angle between the two vectors, and if they're more than 90 degrees apart, I cancel them; otherwise I merge them.<br /><br />
Volumetric fog spoilers<br />Published 2010-02-11T11:02:00.000-05:00, updated 2010-02-11T20:03:14.764-05:00<br /><br />Okay, so you want to make volumetric fog.
Volumetric fog has descended from its days largely as a gimmick to being situationally useful, and there are still some difficulties: It's really difficult to model changes in the light inside the fog. There are techniques you can use for volumetric shadows within the fog, like rendering the depths of the front and back sides of non-overlapping volumes into a pair of accumulation textures, and using the difference between the two to determine the amount of distance penetrated.<br /><br />Let's focus on a simpler implementation though: Planar, infinite, and with a linear transitional region. A transitional region is nice because it means the fog appears to gradually taper off instead of being conspicuously contained entirely below a flat plane.<br /><br />In practice, there is one primary factor that needs to be determined: The amount of fog penetrated by the line from the viewpoint to the surface. In determining that, the transitional layer and the surface layer actually need to be calculated separately:<br /><br /><span style="font-size:180%;">Transition layer</span><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_p7tnlbl0cTs/S3ShYV8eH1I/AAAAAAAAABY/2Vp0JOH1AOg/s1600-h/fog-intermediate.png"><img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px; height: 161px;" src="http://2.bp.blogspot.com/_p7tnlbl0cTs/S3ShYV8eH1I/AAAAAAAAABY/2Vp0JOH1AOg/s200/fog-intermediate.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5437148089722740562" /></a>For the transition layer, what you want to do is multiply the distance traveled through the transition layer by the average density of the fog. Fortunately, due to some quirks of the math involved, there's a very easy way to get this: The midpoint of the entry and exit points of the transitional region will be located at a point where the fog density is equal to the average density passed through. 
The entry and exit points can be found by taking the viewpoint and target distances along the fog plane normal and clamping them to the entry and exit planes.<br /><br /><span style="font-size:180%;">Full-density layer</span><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_p7tnlbl0cTs/S3Sj7oQeEmI/AAAAAAAAABg/WAMfyBjgv94/s1600-h/fog-exterior.png"><img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px; height: 129px;" src="http://3.bp.blogspot.com/_p7tnlbl0cTs/S3Sj7oQeEmI/AAAAAAAAABg/WAMfyBjgv94/s200/fog-exterior.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5437150894957138530" /></a><br />The full-density layer is a bit more complex, since it behaves differently depending on whether the camera is inside or outside of the fog. For a camera inside the fog, the fogged portion is represented by the distance from the camera to the fog plane. For a camera outside of the fog, the fogged portion is represented by the distance from the object to the fog plane. If you want to do it in one pass, both of these modes can be represented by dividing one linearly interpolated value by the linearly interpolated camera-to-point distance, measured along the fog plane normal.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_p7tnlbl0cTs/S3SkEkY6z9I/AAAAAAAAABo/usNXP0qyczc/s1600-h/fog-interior.png"><img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px; height: 161px;" src="http://3.bp.blogspot.com/_p7tnlbl0cTs/S3SkEkY6z9I/AAAAAAAAABo/usNXP0qyczc/s200/fog-interior.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5437151048537657298" /></a><br />Since whether the camera is inside or outside the fog can be determined in advance, you can easily make shader permutations based on it and skip a branch in the shader. With a deferred renderer, you can use depth information and the fog plane to determine all of the distances.
With a forward renderer, most of the distance factors interpolate linearly, allowing you to do some clamps and divides entirely in the shader.<br /><br />Regardless of which you use, once you have the complete distance traveled, the most physically accurate determination of the amount still visible is:<br /><br /><span style="font-size:150%;">min(1, e</span><span style="vertical-align:top;font-size:75%;">-(length(cameraToVert) * coverage * density)</span><span style="font-size:150%;">)</span><br /><br />You don't have to use e as the base though: Using 2 is a bit faster, and you can rescale the density coefficient to achieve any behavior you could have attained using e.<br /><br /><br />As usual, the shader code spoilers:<br /><br /><pre><code><br />// VEncodeFog : Encodes a 4-component vector containing fraction components used<br />// to calculate fog factor<br />float4 VEncodeFog(float3 cameraPos, float3 vertPos, float4 fogPlane, float fogTransitionDepth)<br />{<br /> float cameraDist, pointDist;<br /><br /> cameraDist = dot(cameraPos, fogPlane.xyz);<br /> pointDist = dot(vertPos, fogPlane.xyz);<br /><br /> return float4(cameraDist, fogPlane.w, fogPlane.w - fogTransitionDepth, pointDist);<br />}<br /><br />// PDecodeFog : Returns the fraction of the original scene to display given<br />// an encoded fog fraction and the camera-to-vertex vector<br />// rcpFogTransitionDepth = 1/fogTransitionDepth<br />float PDecodeFog(float4 fogFactors, float3 cameraToVert, float fogDensityScalar, float rcpFogTransitionDepth)<br />{<br /> // x = cameraDist, y = shallowFogPlaneDist, z = deepFogPlaneDist (< shallow), w = pointDist<br /> float3 diffs = fogFactors.wzz - fogFactors.xxw;<br /><br /> float cameraToPointDist = diffs.x;<br /> float cameraToFogDist = diffs.y;<br /> float nPointToFogDist = diffs.z;<br /><br /> float rAbsCameraToPointDist = 1.0 / abs(cameraToPointDist);<br /><br /> // Calculate the average density of the transition zone fog<br /> // Since density is linear, this will be the same as the density at the midpoint of the ray,<br /> // clamped to the boundaries of the transition zone<br /> float clampedCameraTransitionPoint = max(fogFactors.z, min(fogFactors.y, fogFactors.x));<br /> float clampedPointTransitionPoint = max(fogFactors.z, min(fogFactors.y, fogFactors.w));<br /> float transitionPointAverage = (clampedPointTransitionPoint + clampedCameraTransitionPoint) * 0.5;<br /><br /> float transitionAverageDensity = (fogFactors.y - transitionPointAverage) * rcpFogTransitionDepth;<br /><br /> // Determine a coverage factor based on the density and the fraction of the ray that passed through the transition zone<br /> float transitionCoverage = transitionAverageDensity *<br /> abs(clampedCameraTransitionPoint - clampedPointTransitionPoint) * rAbsCameraToPointDist;<br /><br /> // Calculate coverage for the full-density portion of the volume as the fraction of the ray intersecting<br /> // the bottom part of the transition zone<br />#ifdef CAMERA_IN_FOG<br /> float fullCoverage = cameraToFogDist * rAbsCameraToPointDist;<br /> if(nPointToFogDist >= 0.0)<br /> fullCoverage = 1.0;<br />#else<br /> float fullCoverage = max(0.0, nPointToFogDist * rAbsCameraToPointDist);<br />#endif<br /><br /> float totalCoverage = fullCoverage + transitionCoverage;<br /><br /> // Use inverse exponential scaling with distance<br /> // fogDensityScalar is pre-negated<br /> return min(1.0, exp2(length(cameraToVert) * totalCoverage * fogDensityScalar));<br />}<br /><br /></code></pre><br /><br />
Self-shadowing textures using radial horizon maps<br />Published 2010-02-10T23:10:00.000-05:00, updated 2010-02-11T18:29:26.963-05:00<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_p7tnlbl0cTs/S3OTp7HbqbI/AAAAAAAAABQ/d_NxZCCK8f8/s1600-h/fourier_horizon_mapping.jpg"><img
style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px; height: 134px;" src="http://4.bp.blogspot.com/_p7tnlbl0cTs/S3OTp7HbqbI/AAAAAAAAABQ/d_NxZCCK8f8/s200/fourier_horizon_mapping.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5436851523619105202" /></a><br />I don't think I need to extol the virtues of self-shadowing textures, so instead, I'll simply present a method for doing them:<br /><br />Any given point on a heightmap can represent all of the shadowing from the surrounding heightmap as a waveform h = f(t), where t is a radial angle across the base plane, and h is the sine of the angle from the base plane to the highest-angled heightmap sample in that direction. h, in other words, is the horizon level for a light coming from that direction.<br /><br />Anyway, the method for precomputing this should be pretty obvious: Fire traces from each sample and find the shallowest angle that will clear all heightmap samples. Once you have it, it's simply a matter of encoding it, which can be easily done using Fourier series. Each Fourier band requires 2 coefficients, except the first, which requires one since the sine band is zero. I use 5 coefficients (stored as an RGBA8 and A8), but 3 works acceptably. More coefficients require more storage, but produce sharper shadows.<br /><br />Fourier series are really straightforward, but here are the usual Cartesian coordinate spoilers:<br /><br />Assuming x = sin(t) and y = cos(t)<br /><br />[sin(0t)] = 0<br />[cos(0t)] = 1<br />[sin(t)] = x<br />[cos(t)] = y<br />[sin(2t)] = 2*x*y<br />[cos(2t)] = y*y - x*x<br /><br />The constant band squared averages to 1 over a circle, but all other bands squared average to 0.5. That means you need to double the coefficients you get from those other bands to reproduce the original waveform.<br /><br />There's one other problem, which is that the horizon will move rapidly at shallow angles, right where precision breaks down.
You need more precision at low values, and my recommended way of getting it is to store the sign-preserving square root of the original value, then multiply the stored value by its own absolute value in the shader.<br /><br />For added effect, rather than simply doing a comparison of the light angle to the horizon level, you can take the difference of the two, multiply it by some constant, and saturate. This will produce a softer shadow edge.<br /><br />Shader code to decode this, taking a texture coordinate and a tangent-space direction to the light source:<br /><br /><pre><code>float SelfShadowing(float3 tspaceLightDir, float2 tc)<br />{<br /> float3 nLightDir = normalize(tspaceLightDir);<br /> float4 lqCoefs = tex2D(pHorizonLQMap, tc) * 4.0 - 2.0; // Premultiply by 2<br /> float cCoef = tex2D(pHorizonCCUMap, tc).x * 2.0 - 1.0;<br /><br /> lqCoefs *= abs(lqCoefs); // Undo dynamic range compression<br /> cCoef *= abs(cCoef);<br /><br /> float2 tspaceRadial = normalize(float3(tspaceLightDir.xy, 0.0)).xy;<br /><br /> float3 qmultipliers = tspaceRadial.xyx * tspaceRadial.yyx * float3(2.0, 1.0, -1.0);<br /><br /> float horizonLevel = cCoef + dot(lqCoefs.rg, tspaceRadial)<br /> + dot(lqCoefs.baa, qmultipliers);<br /><br /> // The horizon is stored as a sine, so compare against the normalized direction's Z<br /> return saturate(0.5 + 20.0*(nLightDir.z - horizonLevel));<br />}</code></pre><br /><br />An additional optimization would be to exploit the fact that the range of the coefficients other than the constant band is not -1..1, but rather, -2/pi..2/pi.
Expanding this to -1..1 gives you a good deal of additional precision.<br /><br />
Spherical harmonics spoilers<br />Published 2010-01-18T06:06:00.000-05:00, updated 2011-12-02T12:22:32.323-05:00<br /><br />Spherical harmonics seem to have some impenetrable level of difficulty, especially among the indie scene, which has little to go off of other than a few presentations and whitepapers, some of which even contain incorrect information (e.g. one of the formulas in the Sony paper on the topic is incorrect), and most of which are still using ZYZ rotations because it's so hard to find out how to do a matrix rotation.<br /><br />Hao Chen and Xinguo Liu did a presentation at SIGGRAPH '08 and the <a href="http://developer.amd.com/documentation/presentations/legacy/S2008-Chen-Lighting_and_Material_of_Halo3.pdf">slides from it</a> contain a good deal of useful stuff, not to mention one of the ONLY easy-to-find rotate-by-matrix functions. It also treats the Z axis a bit awkwardly, so I patched the rotation code up a bit and added a pre-integrated cosine convolution filter so you can easily get SH coefs for a directional light.<br /><br />There was also gratuitous use of sqrt(3) multipliers, which can be completely eliminated by simply premultiplying or predividing coef #6 by it, which incidentally causes all of the constants and multipliers to resolve to rational numbers.<br /><br />As always, you can include multiple lights by simply adding the SH coefs for them together. If you want specular, you can approximate a directional light by using the linear component to determine the direction, and the constant component to determine the color.
You can do this per-channel, or use the average values to determine the direction and do it once.<br /><br />Here are the spoilers:<br /><br /><pre><code>#define SH_AMBIENT_FACTOR (0.25f)<br />#define SH_LINEAR_FACTOR (0.5f)<br />#define SH_QUADRATIC_FACTOR (0.3125f)<br /><br />void LambertDiffuseToSHCoefs(const terVec3 &dir, float out[9])<br />{<br /> // Constant<br /> out[0] = 1.0f * SH_AMBIENT_FACTOR;<br /><br /> // Linear<br /> out[1] = dir[1] * SH_LINEAR_FACTOR;<br /> out[2] = dir[2] * SH_LINEAR_FACTOR;<br /> out[3] = dir[0] * SH_LINEAR_FACTOR;<br /><br /> // Quadratics<br /> out[4] = ( dir[0]*dir[1] ) * 3.0f*SH_QUADRATIC_FACTOR;<br /> out[5] = ( dir[1]*dir[2] ) * 3.0f*SH_QUADRATIC_FACTOR;<br /> out[6] = ( 1.5f*( dir[2]*dir[2] ) - 0.5f ) * SH_QUADRATIC_FACTOR;<br /> out[7] = ( dir[0]*dir[2] ) * 3.0f*SH_QUADRATIC_FACTOR;<br /> out[8] = 0.5f*( dir[0]*dir[0] - dir[1]*dir[1] ) * 3.0f*SH_QUADRATIC_FACTOR;<br />}<br /><br /><br />void RotateCoefsByMatrix(float outCoefs[9], const float pIn[9], const terMat3x3 &rMat)<br />{<br /> // DC<br /> outCoefs[0] = pIn[0];<br /><br /> // Linear<br /> outCoefs[1] = rMat[1][0]*pIn[3] + rMat[1][1]*pIn[1] + rMat[1][2]*pIn[2];<br /> outCoefs[2] = rMat[2][0]*pIn[3] + rMat[2][1]*pIn[1] + rMat[2][2]*pIn[2];<br /> outCoefs[3] = rMat[0][0]*pIn[3] + rMat[0][1]*pIn[1] + rMat[0][2]*pIn[2];<br /><br /> // Quadratics<br /> outCoefs[4] = (<br /> ( rMat[0][0]*rMat[1][1] + rMat[0][1]*rMat[1][0] ) * ( pIn[4] )<br /> + ( rMat[0][1]*rMat[1][2] + rMat[0][2]*rMat[1][1] ) * ( pIn[5] )<br /> + ( rMat[0][2]*rMat[1][0] + rMat[0][0]*rMat[1][2] ) * ( pIn[7] )<br /> + ( rMat[0][0]*rMat[1][0] ) * ( pIn[8] )<br /> + ( rMat[0][1]*rMat[1][1] ) * ( -pIn[8] )<br /> + ( rMat[0][2]*rMat[1][2] ) * ( 3.0f*pIn[6] )<br /> );<br /><br /> outCoefs[5] = (<br /> ( rMat[1][0]*rMat[2][1] + rMat[1][1]*rMat[2][0] ) * ( pIn[4] )<br /> + ( rMat[1][1]*rMat[2][2] + rMat[1][2]*rMat[2][1] ) * ( pIn[5] )<br /> + ( rMat[1][2]*rMat[2][0] + rMat[1][0]*rMat[2][2] ) * ( pIn[7] )<br /> + ( 
rMat[1][0]*rMat[2][0] ) * ( pIn[8] )<br /> + ( rMat[1][1]*rMat[2][1] ) * ( -pIn[8] )<br /> + ( rMat[1][2]*rMat[2][2] ) * ( 3.0f*pIn[6] )<br /> );<br /><br /> outCoefs[6] = (<br /> ( rMat[2][1]*rMat[2][0] ) * ( pIn[4] )<br /> + ( rMat[2][2]*rMat[2][1] ) * ( pIn[5] )<br /> + ( rMat[2][0]*rMat[2][2] ) * ( pIn[7] )<br /> + 0.5f*( rMat[2][0]*rMat[2][0] ) * ( pIn[8])<br /> + 0.5f*( rMat[2][1]*rMat[2][1] ) * ( -pIn[8])<br /> + 1.5f*( rMat[2][2]*rMat[2][2] ) * ( pIn[6] )<br /> - 0.5f * ( pIn[6] )<br /> );<br /><br /> outCoefs[7] = (<br /> ( rMat[0][0]*rMat[2][1] + rMat[0][1]*rMat[2][0] ) * ( pIn[4] )<br /> + ( rMat[0][1]*rMat[2][2] + rMat[0][2]*rMat[2][1] ) * ( pIn[5] )<br /> + ( rMat[0][2]*rMat[2][0] + rMat[0][0]*rMat[2][2] ) * ( pIn[7] )<br /> + ( rMat[0][0]*rMat[2][0] ) * ( pIn[8] )<br /> + ( rMat[0][1]*rMat[2][1] ) * ( -pIn[8] )<br /> + ( rMat[0][2]*rMat[2][2] ) * ( 3.0f*pIn[6] )<br /> );<br /><br /> outCoefs[8] = (<br /> ( rMat[0][1]*rMat[0][0] - rMat[1][1]*rMat[1][0] ) * ( pIn[4] )<br /> + ( rMat[0][2]*rMat[0][1] - rMat[1][2]*rMat[1][1] ) * ( pIn[5] )<br /> + ( rMat[0][0]*rMat[0][2] - rMat[1][0]*rMat[1][2] ) * ( pIn[7] )<br /> +0.5f*( rMat[0][0]*rMat[0][0] - rMat[1][0]*rMat[1][0] ) * ( pIn[8] )<br /> +0.5f*( rMat[0][1]*rMat[0][1] - rMat[1][1]*rMat[1][1] ) * ( -pIn[8] )<br /> +0.5f*( rMat[0][2]*rMat[0][2] - rMat[1][2]*rMat[1][2] ) * ( 3.0f*pIn[6] )<br /> );<br />}<br /></code></pre><br /><br />... 
and to sample it in the shader ...<br /><br /><pre><code><br />float3 SampleSHQuadratic(float3 dir, float3 shVector[9])<br />{<br /> float3 ds1 = dir.xyz*dir.xyz;<br /> float3 ds2 = dir*dir.yzx; // xy, zy, xz<br /><br /> float3 v = shVector[0];<br /><br /> v += dir.y * shVector[1];<br /> v += dir.z * shVector[2];<br /> v += dir.x * shVector[3];<br /><br /> v += ds2.x * shVector[4];<br /> v += ds2.y * shVector[5];<br /> v += (ds1.z * 1.5 - 0.5) * shVector[6];<br /> v += ds2.z * shVector[7];<br /> v += (ds1.x - ds1.y) * 0.5 * shVector[8];<br /><br /> return v;<br />}<br /></code></pre><br /><br />For Monte Carlo integration, take sampling points, feed direction "dir" to the following function to get multipliers for each coefficient, then multiply by the intensity in that direction. Divide the total by the number of sampling points:<br /><br /><pre><code><br />void SHForDirection(const terVec3 &dir, float out[9])<br />{<br /> // Constant<br /> out[0] = 1.0f;<br /><br /> // Linear<br /> out[1] = dir[1] * 3.0f;<br /> out[2] = dir[2] * 3.0f;<br /> out[3] = dir[0] * 3.0f;<br /><br /> // Quadratics<br /> out[4] = ( dir[0]*dir[1] ) * 15.0f;<br /> out[5] = ( dir[1]*dir[2] ) * 15.0f;<br /> out[6] = ( 1.5f*( dir[2]*dir[2] ) - 0.5f ) * 5.0f;<br /> out[7] = ( dir[0]*dir[2] ) * 15.0f;<br /> out[8] = 0.5f*( dir[0]*dir[0] - dir[1]*dir[1] ) * 15.0f;<br />}<br /></code></pre><br /><br />... 
and finally, for a uniformly-distributed random point on a sphere ...<br /><br /><pre><code><br />terVec3 RandomDirection(int (*randomFunc)(), int randMax)<br />{<br /> float u = (((float)randomFunc()) / (float)(randMax - 1))*2.0f - 1.0f;<br /> float n = sqrtf(1.0f - u*u);<br /><br /> float theta = 2.0f * M_PI * (((float)randomFunc()) / (float)(randMax));<br /><br /> return terVec3(n * cos(theta), n * sin(theta), u);<br />}</code></pre>OneEightHundredhttp://www.blogger.com/profile/15917532861521845279noreply@blogger.com2tag:blogger.com,1999:blog-790389648997972680.post-2090233152359063032010-01-18T05:21:00.000-05:002010-10-11T00:32:07.568-04:002.0 gamma textures and full-range scalars in YCoCg DXT5A few years back there was a <a href="http://developer.nvidia.com/object/real-time-ycocg-dxt-compression.html">publication on real-time YCoCg DXT5 texture compression</a>. There are two improvements on the technique I feel I should present:<br /><br />There's a pretty clear problem right off the bat: It's not particularly friendly to linear textures. If you simply attempt to convert sRGB values into linear space and store the result in YCoCg, you will experience severe banding owing largely to the loss of precision at lower values. 
Gamma space provides a lot of precision at lower intensity values, where the human visual system is more sensitive.<br /><br />sRGB texture modes exist as a cheap way to convert from gamma space to linear, and they're fast because GPUs can just use a look-up table to get the linear values. YCoCg data can't be treated as an sRGB texture, though, and decoding sRGB in the shader is fairly slow since it involves a divide, a power raise, and a conditional.<br /><br />The first fix is to convert from the 2.2-ish sRGB gamma ramp to a 2.0 gamma ramp. This preserves most of the original range: 255 input values map to 240 output values, low-intensity values keep most of their precision, and the result can be linearized by simply squaring it in the shader.<br /><br /><br />The other concern is the limited scale factor. It isn't much of a problem if you're aiming for speed and compressing in real time, but it is if you're considering the technique for offline processing. DXT5's 5-bit blue channel provides enough resolution for 32 possible scale factor values, so there isn't any reason to limit it to 1, 2, or 4 if you don't have to. 
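<br /><br />As a rough illustration of those 32 levels, the sketch below quantizes a scale factor in the 1 to 4 range to 5 bits, expands it to the 8-bit value stored in the blue channel, and shows the matching decode. It mirrors the arithmetic in the sample code that follows, but the helper names (QuantizeScale, UnquantizeScale, ExpandTo8Bits, ShaderScale) are made up for this example:<br /><br />

```cpp
#include <cmath>

// Hypothetical helpers; the names are illustrative only.

// Quantize a scale factor in [1, 4] to one of 32 levels, rounding up.
int QuantizeScale(float scaleFactor)
{
    int q = (int)std::ceil((scaleFactor - 1.0f) * 31.0f / 3.0f);
    if (q < 0) q = 0;
    if (q > 31) q = 31;
    return q;
}

// Recover the scale factor the encoder should actually divide chroma by.
float UnquantizeScale(int q)
{
    return 1.0f + ((float)q / 31.0f) * 3.0f;
}

// Replicate the top bits so 0..31 spans the full 0..255 range.
unsigned char ExpandTo8Bits(int q)
{
    return (unsigned char)((q << 3) | (q >> 2));
}

// What the shader computes from blue: 0.25..1.0, i.e. the encoder's scale divided by 4.
float ShaderScale(unsigned char blue)
{
    return ((float)blue / 255.0f) * 0.75f + 0.25f;
}
```

<br /><br />Rounding up means the stored scale is never smaller than the one requested, so quantizing the scale factor does not push chroma values out of range.<br /><br />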
Using the full range gives you more color resolution to work with.<br /><br /><br />Here's some sample code:<br /><br /><br /><pre><code>#include <math.h><br /><br />// Converts an sRGB byte to a 2.0 gamma ramp (the shader squares it to get linear)<br />unsigned char Linearize(unsigned char inByte)<br />{<br /> float srgbVal = ((float)inByte) / 255.0f;<br /> float linearVal;<br /><br /> if(srgbVal < 0.04045)<br /> linearVal = srgbVal / 12.92f;<br /> else<br /> linearVal = pow( (srgbVal + 0.055f) / 1.055f, 2.4f);<br /><br /> return (unsigned char)(floor(sqrt(linearVal)* 255.0 + 0.5));<br />}<br /><br />void ConvertBlockToYCoCg(const unsigned char inPixels[16*3], unsigned char outPixels[16*4])<br />{<br /> unsigned char linearizedPixels[16*3]; // sRGB input converted to the 2.0 gamma ramp<br /><br /> for(int i=0;i<16*3;i++)<br /> linearizedPixels[i] = Linearize(inPixels[i]);<br /><br /> // Calculate Co and Cg extents<br /> int extents = 0;<br /> int n = 0;<br /> int iY, iCo, iCg;<br /> int blockCo[16];<br /> int blockCg[16];<br /> const unsigned char *px = linearizedPixels;<br /> for(int i=0;i<16;i++)<br /> {<br /> iCo = (px[0]<<1) - (px[2]<<1);<br /> iCg = (px[1]<<1) - px[0] - px[2];<br /> if(-iCo > extents) extents = -iCo;<br /> if( iCo > extents) extents = iCo;<br /> if(-iCg > extents) extents = -iCg;<br /> if( iCg > extents) extents = iCg;<br /><br /> blockCo[n] = iCo;<br /> blockCg[n++] = iCg;<br /><br /> px += 3;<br /> }<br /><br /> // Co = -510..510<br /> // Cg = -510..510<br /> float scaleFactor = 1.0f;<br /> if(extents > 127)<br /> scaleFactor = (float)extents * 4.0f / 510.0f;<br /><br /> // Convert to quantized scalefactor<br /> unsigned char scaleFactorQuantized = (unsigned char)(ceil((scaleFactor - 1.0f) * 31.0f / 3.0f));<br /><br /> // Unquantize<br /> scaleFactor = 1.0f + (float)(scaleFactorQuantized / 31.0f) * 3.0f;<br /><br /> unsigned char bVal = (unsigned char)((scaleFactorQuantized << 3) | (scaleFactorQuantized >> 2));<br /><br /> unsigned char *outPx = outPixels;<br /><br /> n = 0;<br /> px = linearizedPixels;<br /> for(int i=0;i<16;i++)<br /> {<br /> // Calculate components<br /> iY 
= ( px[0] + (px[1]<<1) + px[2] + 2 ) / 4;<br /> iCo = ((blockCo[n] / scaleFactor) + 128);<br /> iCg = ((blockCg[n] / scaleFactor) + 128);<br /><br /> if(iCo < 0) iCo = 0; else if(iCo > 255) iCo = 255;<br /> if(iCg < 0) iCg = 0; else if(iCg > 255) iCg = 255;<br /> if(iY < 0) iY = 0; else if(iY > 255) iY = 255;<br /><br /> px += 3;<br /> n++; // advance to the next pixel's cached Co/Cg<br /><br /> outPx[0] = (unsigned char)iCo;<br /> outPx[1] = (unsigned char)iCg;<br /> outPx[2] = bVal;<br /> outPx[3] = (unsigned char)iY;<br /><br /> outPx += 4;<br /> }<br />}<br /><br /><br /></code></pre><br /><br />... and to decode it in the shader ...<br /><br /><pre><code><br /><br />float3 DecodeYCoCg(float4 inColor)<br />{<br /> float3 base = inColor.arg + float3(0, -0.5, -0.5);<br /> float scale = (inColor.b*0.75 + 0.25);<br /> float4 multipliers = float4(1.0, 0.0, scale, -scale);<br /> float3 result;<br /><br /> result.r = dot(base, multipliers.xzw);<br /> result.g = dot(base, multipliers.xyz);<br /> result.b = dot(base, multipliers.xww);<br /><br /> // Convert from 2.0 gamma to linear<br /> return result*result;<br />}<br /><br /></code></pre>OneEightHundredhttp://www.blogger.com/profile/15917532861521845279noreply@blogger.com0tag:blogger.com,1999:blog-790389648997972680.post-65807875051086462312009-06-07T21:46:00.000-04:002009-06-18T00:06:13.866-04:00Designing TDP's material systemTDP is very much a work in progress, but several months were spent on the new renderer before it was even capable of displaying anything. TDP's asset manager was a large part of this, but a larger part was the material system. A lot of work went into it to create a powerful material system that would minimize work in the future by making material definitions "smarter." 
It would also create a highly scalable and easy-to-refactor system capable of hitting on all cylinders.<br /><br />A "hard" material system, which is what I was trying to avoid, works something like this: A material is either left to defaults, in which case the engine uses a hard-coded mechanism to determine the best way to render it, or it's defined as a "custom" material. A custom definition generally needs to be written for every non-default material. These used to require things very close to the metal. Now they're often just done by declaring values to throw at hardware shaders, which is a bit higher-level but still pretty close to the metal.<br /><br />There are some flaws with this: It forces artists to deal with rendering concepts that are normally programmer territory, and the only way to deal with a shader customization that gets used repeatedly is to either hard-code a new "default" into the engine or copy and paste a lot of material specifications.<br /><br />TDP's material system was designed around a few goals, largely to address these issues:<br /><ul><li>Artists should only have to describe the type of surface (or layer) and the assets needed for it, not how to render it.</li><li>Material customizations should be reusable.</li><li>Rendering paths should be data-driven, not hard-coded.</li><li>Asset importation should be automatic, not an extra step.</li></ul><br />The third point was the most difficult one, but all contributed to the creation of a script-based "mediator" which I called "profiles."<br /><br /><span style="font-weight: bold;font-size:130%;" >Using profiles to describe rendering paths</span><br /><br />Radically-different rendering paths became less of an issue as programmable shaders became standard fare, but they're still necessary: Some features (e.g. texture access in vertex shaders) have major performance differences between hardware, some are not available at all, some (e.g. 
blends into sRGB framebuffers) behave differently, etc.<br /><br />Profiles needed to be "smart" and consistently formulate the best solution. At the same time, script VMs tend to be slow, and executing them every frame is a great way to kill performance. My solution was to use Lua scripts to create rendering definitions, but also to use a caching scheme to avoid needing to re-execute them unless the rendering environment had changed.<br /><br />Profiles are consequently aware of two types of parameters: Static and dynamic. Static parameters include things that will never change over a material's lifetime: What textures it refers to, graphics settings, etc. Dynamic parameters include things that might change over its lifetime: How many lights are affecting it, whether it's being viewed from inside a fog volume, etc.<br /><br />I call each rendering path generated by an execution of the profile script a "run." Caching runs was a design challenge of its own. They needed to be cacheable using only relevant information, but they also needed to be deterministic:<br /><ul><li>If the static parameters are the same for two materials, they should be able to use the same runs.</li><li>Runs should be grouped by their static parameters, but only relevant ones. This means the same set of static parameters should always be relevant for every run on a material.</li></ul>These requirements led to a simple restriction that kept the static parameter reference list deterministic: A static parameter could never be checked by a profile script after a dynamic one. Of course, static parameters could depend on each other; doing so would drive the permutation count down while still keeping them the same across all runs.<br /><br />The dynamic parameter checks were then used as a lookup tree: Each combination of static parameters had one run tree associated with it. 
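<br /><br />A minimal C++ sketch of one such run tree might look like the following. Everything in it is an assumption made for illustration: the real profiles are Lua scripts, and Run, RunNode, RunBuilder, GetRun, and the integer-valued DynamicState are invented names rather than TDP's actual types. One tree like this would exist per combination of static parameters:<br /><br />

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins; none of these names come from TDP itself.
struct Run { int maxLights = 0; };
using DynamicState = std::map<std::string, int>;

// One node per dynamic parameter check, one child per result seen so far.
struct RunNode {
    bool hasRun = false;   // true once a finished run is stored here
    Run run;
    std::string param;     // which dynamic parameter this node checks
    std::map<int, std::unique_ptr<RunNode>> children;
};

// Handed to the profile so that every DynamicParameter call descends the tree.
class RunBuilder {
public:
    RunBuilder(RunNode& root, const DynamicState& state)
        : cursor_(&root), state_(state) {}

    int DynamicParameter(const std::string& name) {
        cursor_->param = name;
        int value = state_.at(name);
        std::unique_ptr<RunNode>& child = cursor_->children[value];
        if (!child) child = std::make_unique<RunNode>();  // result not seen yet
        cursor_ = child.get();
        return value;
    }

    // Stores the finished run at the node the profile ended on.
    const Run& Finish(const Run& run) {
        cursor_->hasRun = true;
        cursor_->run = run;
        return cursor_->run;
    }

private:
    RunNode* cursor_;
    const DynamicState& state_;
};

using Profile = std::function<Run(RunBuilder&)>;

// Walks the tree with the current dynamic state and only executes the
// profile when the walk hits an empty node.
const Run& GetRun(RunNode& root, const DynamicState& state,
                  const Profile& profile, int& executions) {
    RunNode* node = &root;
    while (node && !node->hasRun) {
        if (node->param.empty()) { node = nullptr; break; }  // nothing cached yet
        auto it = node->children.find(state.at(node->param));
        node = (it == node->children.end()) ? nullptr : it->second.get();
    }
    if (node) return node->run;

    ++executions;
    RunBuilder builder(root, state);
    return builder.Finish(profile(builder));
}
```

<br /><br />In this sketch a cached lookup only evaluates the dynamic parameters recorded along one path, so a parameter the profile never checked on that path cannot cause a cache miss.<br /><br />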
Rendering attempts on a material would traverse the tree, and if they hit an empty node, the profile script would be executed to create a new run. During that execution, every DynamicParameter function call would attempt to descend the tree, creating a new node if the result hadn't been seen yet, or going down an existing one if it had. Each node recorded which parameter was checked and branched based on the result.<br /><br />To the engine, the product of a run was a set of techniques, one for each of the various reasons a surface could be drawn, each containing several draw layers, along with the capabilities and limitations of the run. For example, lightmapped surfaces in TDP have a "lightmapBase" technique which is drawn once for every lightmapped surface. Runs can specify a "maxLights" value which determines how many lights can be merged into that first pass, and if there are too many, they can be rendered using the "pointLight" technique instead and blended over it.<br /><br />The techniques contain several layers, each layer corresponding to one pass, and each pass containing a reference to a shader, as well as declarations (which are determined by the run) to decide which permutation of that shader to use. Of course, permutations can vary by asset availability, which means profiles will choose shader permutations that assign appropriate default values and avoid unnecessary computation if some component (e.g. a glow overlay) isn't available.<br /><br /><span style="font-weight: bold;font-size:130%;" >Using templates to describe multiple materials</span><br /><br />The highest-level material definitions in TDP are templates. Instead of materials needing to be explicitly defined each time, TDP looks for a material definition matching the material name, and if that fails, it looks for a "default.mpt" file in each directory up the tree, ending at the root directory. Templates are also Lua scripts, and they're aware of the material name. 
This lets them do all kinds of things with it: They can attempt to derive information from the material name itself, and they can form a list of other assets to try loading, like a normal map.<br /><br />This makes things pretty easy for material creation: You don't need to create a material, import a normal map, and assign that normal map to the material inside an asset browser. Just drop an image with the material name and "_nm" attached to it in the same directory, and it will be automatically detected, imported, and associated with the material. TDP's terrain uses this quite liberally: To the renderer, terrain is nothing more than a vertex-lit model, but the material system associates numerous other assets, like the detail maps and lightmaps, with everything in the "terrain" directory and renders it like actual terrain.<br /><br />Importation and processing jobs are another thing recycled heavily using templates. Most textures are converted to YCoCg DXT5 textures and color-corrected into linear space, which is simply specified as a processing job in one template and then repeated for every material that template applies to.OneEightHundredhttp://www.blogger.com/profile/15917532861521845279noreply@blogger.com0