rksnyp
Notes
30 posts
Just writing some stuff I feel like sharing. Sometimes a bit incoherent. [My Github]
rksnyp · 6 years ago
Random Sampling
One of the things that had been sitting in the back of my mind, and yet I never quite delved into, is how you really sample things (numbers, or whatever) according to some specified distribution. What's the technique/algorithm for doing this kind of thing? The C++ standard library has many callable classes like std::normal_distribution, std::uniform_real_distribution, etc., whereas JS only provides the uniformly distributed Math.random() in its standard library. This post is just a rumination on that.
So, without further ado: it turns out you can use uniformly distributed random numbers to produce random numbers characterized by some other distribution of your choice. This is kind of cool, but I think most people who have been programming for a while have done this, knowingly or unknowingly.
If I want a function to output 'A', 'B', or 'C' with probabilities 10/100, 50/100, and 40/100 respectively, how would I do it? I would chunk 100 into 3 intervals - [0 .. 10), [10 .. 60), and [60 .. 100). I will then generate a random number from a uniform distribution and see which interval it falls into. Given a random_number in range [0 .. 100), this function can be written as
def get_abc(random_number):
    if random_number < 10:
        return 'a'
    if random_number < 60:
        return 'b'
    if random_number < 100:
        return 'c'
    assert False
Now, I deliberately didn't write the probabilities with reduced numerators and denominators, but I could have, and then instead of chunking 100 into 3 pieces, I would be chunking 1.0. We wanted to output some values with a particular distribution which is not uniform (A, B, and C have different probabilities), and what we have just done is construct the corresponding cumulative distribution function (CDF). And after that, it's easy. We assume that the given random_number (which is from a uniform distribution) is actually an output of the CDF, and we find which particular value x produced random_number (i.e. for which x, CDF(x) = random_number).
From a mathematical point of view, the CDF is strictly increasing if you assume no outcome has 0 probability. Even if some outcome does have 0 probability, the only thing you need to do is not put an if statement for that outcome.
Anyway, by virtue of being monotonically increasing, the CDF is invertible. Why is that important? We have just seen an example of a discrete probability distribution, with very few outcomes the random variable can take (X = A, B, or C). For a continuous probability distribution, this interval-by-interval check won't work. For those, like the power distribution, you have to work out the closed form of the inverse CDF by hand, and then use that directly. Well, if your calculus is a bit rusty and you need to implement one, the formulas are already listed in a lot of places, I'm sure. This method is called the inversion method.
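To make the continuous case concrete, here's a small sketch (mine, not from any particular library) of sampling an exponential distribution by inverting its CDF, F(x) = 1 - e^(-lambda*x):

// Inversion method for a continuous distribution: exponential with rate lambda.
// CDF: F(x) = 1 - exp(-lambda * x), so F^-1(u) = -ln(1 - u) / lambda.
#include <cmath>
#include <random>

double sample_exponential(std::mt19937 &rng, double lambda) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0); // u in [0, 1)
    double u = uniform(rng);
    return -std::log(1.0 - u) / lambda; // Plug the uniform sample into the inverse CDF
}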
Even then, some distributions simply don't have a closed-form inverse, like the normal distribution, and there are alternative approaches for those.
And of course, if everything fails, you can always turn to good old rejection sampling. I will now leave the reader to explore these at their own leisure. x)
rksnyp · 6 years ago
Pixel Centers
A few articles explain an odd thing related to Direct3D 9 pixel and texel alignment. The best one I found is by Aras P.
Although this 'problem', as he points out, was fixed long ago starting with Direct3D 10, and never shows up in OpenGL, this serves as a good place to learn about pixel coordinates.
Graphics APIs expose almost all coordinates as floats, not integers. Although the screen, and therefore the viewport, is ultimately going to be shown as a grid of pixels, you can treat the area as being continuous anyway, and therefore assign a position to each of the pixel centers in continuous screen (or viewport, whatever) coordinates.
Texel centers
A not fully accurate, but intuitive, way to think of the 'center' of a texel is as the position inside it which, on sampling, even with bilinear filtering, will return the color of that very texel, with no mixing of colors (or whatever value we're storing) with surrounding texels - i.e. the interpolant equals 0 along both the x and y directions.
If you have a texture of size (w, h), where by size I mean the number of texels along the x and y directions, each texel center is situated at a displacement of 1/w and 1/h from the previous texel's center along the x and y directions (in normalized texture coordinates).
But it all depends on where the top-left texel's center is located. And, intuitively enough, it's located at (1/w * 0.5, 1/h * 0.5). So the center of the (i, j)-th texel, in normalized texture coordinates, is at
(i + 0.5, j + 0.5) * (1/w, 1/h)
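Or, spelled out as a tiny helper (just a sketch):

// Normalized texture coordinate of the center of texel (i, j) in a w x h texture.
struct UV { float u, v; };

UV texel_center(int i, int j, int w, int h) {
    return {(i + 0.5f) / w, (j + 0.5f) / h};
}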
Pixel coordinates
If you render a full-screen quad, the screen-space continuous coordinate (0, 0) maps to the NDC coordinate (-1, -1), be it D3D or OpenGL.
Let's ignore the y axis. You obtain the texture coordinate at this NDC coordinate by a usual scale-shift, i.e. mapping the range [-1, 1] to [0, 1].
Note, in d3d, that scale accounts for inverting the y coordinate to top-down increasing, but yeah, doesn't really matter for this discussion.
Anyway, so you have gotten (0.0, 0.0) as the texture coordinate to sample at for pixel (0, 0) (in discrete screen-space coordinates). But that is not going to be the position of the center of texel (0, 0). You have to add an offset of half a texel - (0.5/w, 0.5/h) in texture coordinates - in your pixel shader during sampling.
Why it's a non-issue in OpenGL and D3D10+
In D3D10, the position of the center of the (0, 0) texel is the same as in D3D9: (0.5, 0.5). The fix is in what pixel the NDC-space top-left coordinate (-1, 1) maps to. The NDC-to-viewport (and then screen) transform will map the (-1, 1) NDC point to the screen-space continuous coordinate (0.5, 0.5). So SV_POSITION.xy (or gl_FragCoord), assuming MSAA is off, will be (0.5, 0.5) at the top-left corner (or bottom-left in OpenGL). So the texture coordinate can be obtained by just scaling down -
(0.5, 0.5) / (w, h) = (1/w * 0.5, 1/h * 0.5)
So you don't need to add that 0.5 offset to your texture coordinates, just use gl_FragCoord or SV_POSITION directly.
gl_FragCoord.xy / vec2(w, h)
Fix in d3d9 vertex shader
Although not super relevant to most, it's interesting to see how the problem can be solved in one go right in the vertex shader (as the article I linked to shows). Here, I will just write in plain English what it is doing.
The pixel shader fix of adding 0.5 offset to SV_POSITION works, but only during certain usages like UI rendering, full-screen passes, etc. as Aras P. points out.
All we need is to forcefully shift the mapping from NDC to screen space (viewport, more accurately) such that the pesky (-1, 1) NDC coordinate maps to viewport-space (0.5, 0.5), and then we can simply say that the top-left pixel center (which is where a fragment goes if it passes the depth/stencil/etc. tests) is at (0.5, 0.5) instead of viewport-space (0.0, 0.0).
And that can be done in the vertex shader, and essentially make the pixel shader's SV_POSITION (or gl_FragCoord) behave like opengl or d3d10 pixel shaders.
If you have a clip space position clip_pos.xyzw, you know that clip_pos.xy / clip_pos.w will give the NDC xy coordinates. Add an offset vector there, call it o
(clip_pos.xy + o.xy) / clip_pos.w
This should be equal to (-1 + 1/w, -1 + 1/h) when clip_pos.xy / clip_pos.w is (-1, -1) - a half-pixel is 1/w in NDC, since the NDC range of 2 spans w pixels. Solve the equation for o, and you get o = clip_pos.w * (1/w, 1/h).
rksnyp · 6 years ago
I've been tinkering with an SSAO implementation https://en.wikipedia.org/wiki/Screen_space_ambient_occlusion
This effect, to me, seems like the posterchild of real-time graphics in the late 2000s. Partly because it's a post-processing effect, and the implementation is rather simple. (Post-processing addons like ENB and Reshade come bundled with SSAO). It has also spawned quite a few other techniques like 'HBAO' and 'SSDO'.
Ambient occlusion is an approximation of global illumination, and SSAO is an approximation of even that. It's kind of like when a pencil artist shades the edges and creases of objects in a drawing: those areas have less chance of light reaching them. So SSAO simply samples multiple surrounding points and checks whether their depth (view-space z, not the non-linear NDC-space z) is less than the fragment's own depth - but not too much less either, so some thresholding is needed - and if enough of them are, decides that the fragment in question is lying inside a fold or crease-like region. That's my tl;dr of the technique. Without going into deep details (quite a few good examples are available on the web): the number of direction samples is usually not enough at real-time rates, and even with a variation technique - there are at least two I've seen (reflect off a random plane vs. create a random tangent basis) - the aliasing is visible, so for 'proper SSAO' you gotta blur the resulting occlusion factor texture.
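As a rough illustration of that depth comparison - a CPU-side sketch with made-up names, not an actual shader from my implementation:

// Rough CPU-side sketch of the core SSAO test described above. depth_at() stands in for a
// view-space linear depth lookup; the offsets would come from a (randomized) sampling kernel.
#include <vector>

struct Vec2 { float x, y; };

template <typename DepthFn>
float occlusion_factor(Vec2 p, float frag_depth, const std::vector<Vec2> &offsets,
                       float max_range, DepthFn depth_at) {
    int occluded = 0;
    for (const Vec2 &o : offsets) {
        float sample_depth = depth_at(Vec2{p.x + o.x, p.y + o.y});
        // The occluder must be closer than the fragment, but not too far in front of it
        if (sample_depth < frag_depth && (frag_depth - sample_depth) < max_range)
            ++occluded;
    }
    return 1.0f - float(occluded) / float(offsets.size()); // 1 = fully open, 0 = fully occluded
}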
In fact, there was a silly bug in my blur shader, which really amounted to not blurring at all. This creates an interesting newspaper-print like look. And if you don't vary your direction-sampling kernel at all, you get another look which kind of resembles water paint.
rksnyp · 6 years ago
A small note on the 'optimized' Gaussian blur implementations that come up in a few places on doing a Google search. Since I couldn't find a proper derivation of the maths behind the method, simple as it may be, here's the general way to do it.
Basically, instead of sampling some N texels (N should be odd), you sample N/2 + 1 positions, most of them in between adjacent texels, relying on the linear sampler to give you linearly interpolated values. So suppose you have texel0 and texel1; you would originally sample both. But if you sample in between them, at an offset t from texel0, you get the value
Sample0 = texel0 * (1 - t) + texel1 * t
You can equate a weighted version of this sample with the contribution of the original Gaussian kernel weights, call them G0 and G1:
NewWeight * Sample0 = G0 * texel0 + G1 * texel1
You will get 2 equations from here by matching coefficients of texel0 and texel1.
And then solve for NewWeight and the offset t: matching gives NewWeight * (1 - t) = G0 and NewWeight * t = G1, so NewWeight = G0 + G1 and t = G1 / (G0 + G1). Calculate these weights and offsets for each pair - for a kernel of size 5 the pairs would be (texel0, texel1), texel2 alone, and (texel3, texel4) - and use them in the shader.
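Here's a small sketch of precomputing the merged weights and offsets on the CPU (my own illustration, assuming an odd-sized 1D kernel with an even number of taps on each side of the center, like the size-5 example above):

// Collapse adjacent taps of a 1D Gaussian kernel into single linear-filtered taps:
// merged weight = G0 + G1, sample offset within the pair = G1 / (G0 + G1) texels past the first.
#include <cassert>
#include <cstddef>
#include <vector>

struct Tap { float offset; float weight; }; // offset measured in texels from the kernel's first texel

std::vector<Tap> merge_taps(const std::vector<float> &g) {
    std::size_t c = g.size() / 2; // Center tap index
    assert(g.size() % 2 == 1 && c % 2 == 0);
    std::vector<Tap> taps;
    for (std::size_t i = 0; i + 1 < c; i += 2)              // Pairs to the left of the center
        taps.push_back({float(i) + g[i + 1] / (g[i] + g[i + 1]), g[i] + g[i + 1]});
    taps.push_back({float(c), g[c]});                       // Center tap sampled on its own texel
    for (std::size_t i = c + 1; i + 1 < g.size(); i += 2)   // Pairs to the right of the center
        taps.push_back({float(i) + g[i + 1] / (g[i] + g[i + 1]), g[i] + g[i + 1]});
    return taps;
}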
rksnyp · 6 years ago
Recently I thought of adding #include <...> support to my shader system. This has been a chore for a while. Initially I thought implementing it would not be too time-consuming: just match the pattern with a regex and paste in the included files. Huh. But not really. Gotta track include cycles. Gotta avoid parsing commented-out sections. All in all, a full-fledged tokenizer is required for this.
No thanks. So, I was already aware of shaderc, and you can actually use it to just preprocess GLSL files instead of doing a full compilation. At first glance through the examples I thought it only handled the latest (>= 4.5) GLSL in Vulkan mode, but there is support for OpenGL too. Went with that. It does add a bit of stuff to the dependencies, but it's not that much of an issue.
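For reference, here's roughly what the preprocessing call looks like through shaderc's C++ API, with a bare-bones includer - a hedged sketch, not my actual wrapper, and error handling is mostly omitted:

// Preprocess a GLSL source with shaderc, resolving #include via a minimal includer that just
// reads the requested path from disk. A real includer would resolve relative paths, cache, etc.
#include <shaderc/shaderc.hpp>
#include <fstream>
#include <memory>
#include <sstream>
#include <string>

struct SimpleIncluder : shaderc::CompileOptions::IncluderInterface {
    struct Held { std::string name, content; shaderc_include_result result; };

    shaderc_include_result *GetInclude(const char *requested_source, shaderc_include_type,
                                       const char * /*requesting_source*/, size_t /*depth*/) override {
        auto *held = new Held;
        held->name = requested_source;
        std::stringstream ss;
        ss << std::ifstream(held->name).rdbuf();
        held->content = ss.str();
        held->result = {held->name.c_str(), held->name.size(), held->content.c_str(),
                        held->content.size(), held};
        return &held->result;
    }

    void ReleaseInclude(shaderc_include_result *result) override {
        delete static_cast<Held *>(result->user_data);
    }
};

std::string preprocess_glsl(const std::string &source, const char *file_name) {
    shaderc::Compiler compiler;
    shaderc::CompileOptions options;
    options.SetIncluder(std::make_unique<SimpleIncluder>());
    auto result = compiler.PreprocessGlsl(source, shaderc_glsl_vertex_shader, file_name, options);
    if (result.GetCompilationStatus() != shaderc_compilation_status_success)
        return {}; // Would log result.GetErrorMessage() in real code
    return {result.cbegin(), result.cend()};
}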
rksnyp · 6 years ago
C++’s std::variant
In my "experiments framework", I have implemented quite a few 'tidbits' that make coding with C++ somewhat more pleasant or at the very least, easier. (Don't get me wrong, I love C++ as a language, but it can get rather convoluted at times). So I thought I would write about one of these facilities here.
C++17's standard library has std::variant, which is a tagged union. But of course, the standard variant is very bare-bones, and meant to be extended, from what I can tell. Suppose that I have a variant of 3 possible shapes that looks something like
using Object = std::variant<Sphere, Cube, Mesh>;
when I actually want to inspect an object of this type (i.e. Object) and switch-case on it, I have to remember the index of each type: Sphere has index 0, Cube has index 1, and Mesh has index 2. This is rather messy if you are working with many variants and have to keep these indices in your brain at all times. If I later inserted another type into the variant, and not at the end, for whatever reason, I would have to change all my switch-cases. So the obvious thing I could do is to implement, or rather extend, std::variant so that it also keeps a mapping (compile-time, of course) from the parameter types to their indices. I call this, for lack of a better name, VariantTable.
Now I can write a switch case, or just extract the index directly as follows
using Object = VariantTable<Sphere, Cube, Mesh>;

Object v = ...; // Fill with some value

switch (v.index()) {
case Object::index<Sphere>:
    // Handle the case when it's containing a Sphere
    break;
case Object::index<Cube>:
    // Handle the case when it's containing a Cube
    break;
// ... and so on
}
--
Doing things like these requires a bit of template programming, and I've realized that the best 'data structure' to use in fairly heavy template programming is the linked list - or rather, the 'cons cell' based linked lists from Lisp. C++'s variadic templates are good for short, self-contained bits of implementation, but I think once you start using cons-based lists, they are much more tractable mentally (especially if you are familiar with some functional programming idioms). Unlike variadic parameter packs, which you must construct immediately and pass as template parameters, cons-based lists can be "stored" and then operated on. For example, using MyList = unpack_t<Constantbuffer, Storagebuffer, Counterbuffer>; - this simply creates a list of 3 cons cells.
I do prefer to just sometimes 'unpack' all given variadic template parameters into a cons list, and operate on them.
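To make that concrete, here's roughly what such cons cells can look like - a bare sketch with made-up names, not my actual implementation:

// Bare-bones compile-time cons list. unpack_t turns a parameter pack into nested Cons cells,
// which can then be stored in a `using` alias and chewed on by further metafunctions.
#include <cstddef>

struct Nil {};

template <typename Head, typename Tail> struct Cons {
    using head = Head;
    using tail = Tail;
};

template <typename... Ts> struct unpack;
template <> struct unpack<> { using type = Nil; };
template <typename T, typename... Rest> struct unpack<T, Rest...> {
    using type = Cons<T, typename unpack<Rest...>::type>;
};
template <typename... Ts> using unpack_t = typename unpack<Ts...>::type;

// Example of a metafunction operating on a stored list: its length.
template <typename List> struct length;
template <> struct length<Nil> { static constexpr std::size_t value = 0; };
template <typename H, typename T> struct length<Cons<H, T>> {
    static constexpr std::size_t value = 1 + length<T>::value;
};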
--
Of course, there are more expressive ways to switch on a variant, like a match function that takes a list of lambdas and resembles some other languages like Haskell/OCaml, but I found the debugger to be finicky when jumping inside these lambdas. I think the switch-case with the automatic index map is a good balance.
--
Implementation - This uses an AssocArray type
// Subclass of std::variant (or ::variant) which provides an assoc array from type to index, so you don't have
// to remember it yourself when doing a switch-case. So use this one instead of original variant.
template <typename... Types> struct VariantTable : public ::variant<Types...> {
    template <size_t n, typename... T> struct Rec;

    template <size_t n, typename T> struct Rec<n, T> {
        using assoc_array = make_assoc_array_t<T, ConsUint<n>>;
    };

    template <size_t n, typename T, typename... Rest> struct Rec<n, T, Rest...> {
        using assoc_array = assoc_array_set_t<typename Rec<n + 1, Rest...>::assoc_array, T, ConsUint<n>>;
    };

    using assoc_array = typename Rec<0, Types...>::assoc_array;

    // Static variable that contains the index of T in the variant
    template <typename T> static constexpr size_t index = (size_t)assoc_array_get_t<assoc_array, T>::value;

    // Return reference to inner value
    template <typename T> T &as() { return get_value<T>(*this); }
    template <typename T> const T &as() const { return get_value<T>(*this); }

    // Still have to define copy, move, ctor and assign just for the conversion from one of the internal types
    // during assignment. Here goes.
    using Base = ::variant<Types...>;

    VariantTable() : Base() {}
    VariantTable(const VariantTable &) = default;
    VariantTable(VariantTable &&) = default;
    template <typename T> VariantTable(T &&t) : Base(std::forward<T>(t)) {}
    template <typename T> VariantTable(const T &t) : Base(t) {}

    template <typename T> VariantTable &operator=(T &&t) {
        Base::operator=(std::forward<T>(t));
        return *this;
    }
    template <typename T> VariantTable &operator=(const T &t) {
        Base::operator=(t);
        return *this;
    }
};
rksnyp · 6 years ago
Timing OpenGL commands
In all(?) GL implementations, the GL commands are executed by the driver asynchronously, i.e. every function call pertaining to rendering (like glDraw*) or copying data, especially asynchronous transfers (via PBOs), is executed at a later time than when the app calls the function. For certain operations, GL takes care of the dependencies between operations, and for certain operations, the app needs to explicitly synchronize to ensure data is available to the consumer side (GPU -> CPU or CPU -> GPU).
So to time the actual completion of some GL command, for example a glDraw, the only way is to either call glFinish (which flushes all pending commands and also waits until they are complete), or use timer query objects. Query objects in GL extract some particular information from the GPU. Without going into too much detail, here's how "scoped" queries are used.
glBeginQuery(GL_TIME_ELAPSED, query_object);

// Issue commands here like
glUseProgram(...);       // Set shader program
glBufferSubData(...);    // Source uniforms
glDrawArrays(GL_TRIANGLES, 0, 6);

glEndQuery(GL_TIME_ELAPSED);
Since query objects get 'filled' with the info asynchronously, we need to synchronize and make sure the info is actually available before we call glGetQueryObject. So after the glEndQuery, we associate a 'fence' with the query, which will be signaled when the query (and therefore the commands it is scoped around) has finished executing on the GPU side.
...
glEndQuery(GL_TIME_ELAPSED);
fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
And then we can wait for the fence to be ready, with glClientWaitSync.
glClientWaitSync(fence, 0, 0); // Last 0 means return if result isn't ready, don't wait
But if we did this every frame, we would lose any performance benefit of the GPU rendering frames asynchronously. Given the commands issued in a particular frame by the CPU, the GPU might actually be executing the commands of the frame prior to it (or even 2 frames older). In fact, triple buffering is the norm: the CPU processing and issuing commands for one frame, the driver processing the previous frame, and the GPU executing the frame before that.
All this means that, to be on the safe side, we should create 3 instances of the timer (and the fence to wait on). So, like a ring buffer, the CPU can for example issue the timer query and create a fence at slot i, and wait on the fence (and get the query result) at slot (i - 2) (modulo 3).
So you can create multiple timer query objects for timing multiple (non-overlapping) GL command sequences.
Of course, there are some gotchas here and there, and it's important to wrap these functions in a nice interface. The current system I have implemented is rather simple and just keeps the 10 (or so, it can be configured) max or min durations per timer. This is rather useful.
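A rough sketch of that ring-buffer idea from above (not the actual system I have, which differs; substitute whichever GL loader header you use):

// Triple-buffered GL_TIME_ELAPSED queries: begin/end at slot i each frame, read back the result
// of the query issued two frames earlier, which should be done by now.
#include <glad/glad.h> // Or whichever GL loader is in use

struct GpuTimer {
    GLuint queries[3] = {};
    GLsync fences[3] = {nullptr, nullptr, nullptr};
    int slot = 0;

    void init() { glGenQueries(3, queries); }

    // Wrap the GL commands to be timed, once per frame.
    void begin() { glBeginQuery(GL_TIME_ELAPSED, queries[slot]); }
    void end() {
        glEndQuery(GL_TIME_ELAPSED);
        fences[slot] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    }

    // Call once per frame after end(). Returns the duration (in ns) of the query issued two
    // frames ago, or 0 if that slot hasn't been used yet. Also advances the ring.
    GLuint64 collect() {
        slot = (slot + 1) % 3; // Oldest slot; it's also the one begin() reuses next frame
        if (!fences[slot]) return 0;
        glClientWaitSync(fences[slot], 0, 0); // Should already be signaled two frames later
        glDeleteSync(fences[slot]);
        fences[slot] = nullptr;
        GLuint64 ns = 0;
        glGetQueryObjectui64v(queries[slot], GL_QUERY_RESULT, &ns); // Waits if somehow not ready
        return ns;
    }
};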
rksnyp · 7 years ago
Live editing shaders
Since there are quite a few online shader playgrounds, I thought I would try implementing one myself, but on Linux. The idea is pretty simple: I've created a FileMonitor that one registers listeners with. The FileMonitor object will poll changes to files when you call poll_changes() on it. Underneath, I use Linux's inotify family of functions. The thing works pretty nicely, I'd say - https://i.imgur.com/J9kfy2L.gifv
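The gist of the inotify usage, as a hedged sketch (the real FileMonitor differs in the details, and the watched filename here is made up):

// Minimal non-blocking inotify poll: watch a file for modifications and report when it changes.
#include <sys/inotify.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = inotify_init1(IN_NONBLOCK);
    int wd = inotify_add_watch(fd, "shader.frag", IN_MODIFY | IN_CLOSE_WRITE);

    alignas(inotify_event) char buf[4096];
    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf)); // Returns -1 (EAGAIN) when nothing has happened
        for (ssize_t i = 0; i < len;) {
            auto *ev = reinterpret_cast<const inotify_event *>(buf + i);
            if (ev->wd == wd) std::puts("shader.frag changed, recompile it");
            i += ssize_t(sizeof(inotify_event) + ev->len);
        }
        usleep(100 * 1000); // Poll ~10 times per second
    }
}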
rksnyp · 7 years ago
Small note. Now I use this expression in Python to create an 'empty' list of some N Nones
l = [None] * N
Then stuff works as expected. Now, if you instead want to initialize every element to an equal-valued object, the following doesn't work as intended.
l = [A()] * N
This will put N references to one object of type A in the list. So if A were a dict, you would be updating the same dict from different elements of the list. sigh. One way to get the desired behaviour is to use
l = [A() for i in range(N)]
That's it really.
rksnyp · 7 years ago
Barycentric coordinates
This is an incomplete dump of all that I've come to know about barycentric coordinates, looking at them from a few points of view. Mostly a mental dump, not really well written, and contains no diagrams. There are some fine articles on the topic which I have read, and I will refer to them at the end.
The ‘twice_signed_area’ and 'is_counter_clockwise’ primitives
First of all, I will talk mostly about 2D real-vector space. One ubiquitous operation we perform on vectors is the cross product. Let's have 3 points A, B, and C. We can write a function that checks if this sequence of points lies counter-clockwise on the plane -
def cross(A, B, C):
    return (B.x - A.x) * (C.y - A.y) - (B.y - A.y) * (C.x - A.x)

def is_ccw(A, B, C):
    return cross(A, B, C) > 0.0
This is the same as asking: if A and B describe an infinite line, is C lying to the left of this line or not? So note that given any 3 points A, B, C, we can pass them to this function in any one of the 3 cyclic orders - A, B, C, or B, C, A, or C, A, B - and get the same answer.
Turns out given three points A, B, C, the absolute value of the cross product as computed above is equal to twice the area of the triangle ABC. The cross product is also alternatively defined as
def cross(A, B, C):
    AB = B - A
    AC = C - A
    return length(AB) * length(AC) * sin(theta)

where theta is the angle between the vectors AB and AC.
This definition makes it clearer to see that the length of the cross product is indeed twice the area of the triangle ABC.
The sign is positive if the triangle is wound counter-clockwise, zero if the points are collinear, and negative otherwise. Since we are in a 2D vector space, the vector represented by the cross product is normal to the plane the points are lying in - in our case, along the z-axis.
Barycentric coordinates are defined directly in terms of the cross product, and in particular the coordinate expression above is the one used. We will see that it yields the same value as the length(AB) * length(AC) * sin(theta) definition.
Signed distance of a point to a line
One way of describing a line is to have a point P and a unit-length vector v, and together they define the line L(t) = P + t * v. This is an ‘explicit parameterisation’, the parameter being t.
But there's yet another way, a so-called implicit one. We can have a point P and a unit-length normal vector n. The line they describe has the point P on it and has n as its normal. Obviously we can have 2 normal vectors describing the same line. Using the normal n we can come up with an equation for the line.
What is the distance of any given point X to this line? It's the dot product of X - P with n. So we can write d = dot(n, X - P). Points on the line have d = 0. That is, the equation of the line is L(X) ≡ dot(n, X) - dot(n, P) = 0. Since a line divides 2D space into 2 so-called half-planes, d is positive when the point X lies in the half-plane the normal points toward, and negative when it's in the opposite half-plane.
Barycentric coordinates are just a 2D affine space
Given a triangle ABC, choose the origin of the "barycentric space" to be A. We could choose any of the three vertices, but let's go with A. We have two vectors, B - A and C - A. Barycentric coordinates are just a 2D vector space where these two vectors form the basis. So let's write the barycentric-basis-to-standard-basis transform as a matrix. The x-axis becomes the vector B - A, the y-axis becomes the vector C - A, and the origin shifts to A.
      T(x_axis)    T(y_axis)    T(origin)
    +--                                --+
    |  B_x - A_x   C_x - A_x    A_x      |
    |                                    |
    |  B_y - A_y   C_y - A_y    A_y      |
    |                                    |
    |  0           0            1        |
    +--                                --+
So the transformation matrix from standard basis to barycentric coordinates is the inverse of this matrix. Which turns out to be
                      +--                                              --+
                      |  C_y - A_y   A_x - C_x   A_y * C_x - A_x * C_y   |
  1/cross(A, B, C) *  |                                                  |
                      |  A_y - B_y   B_x - A_x   A_x * B_y - A_y * B_x   |
                      |                                                  |
                      |  0           0           cross(A, B, C)          |
                      +--                                              --+
So, given a point in 2D space, we can find its barycentric coordinates by multiplying it by this matrix. We add an implicit 1 as the 'third' component of the point. For vectors, this third component is 0. At this point you pretty much know what barycentric coordinates are, but I will look at them from another point of view just for fun.
Signed Height
Ignore the division by the cross product for now. It's a division by the signed area of the triangle (times 2), of course. Also, assume that ABC is wound counter-clockwise, so that area is positive. Let's call the matrix M, and the result of multiplying M with P equal to [u, v] - where u and v are the barycentric coordinates of the point P, not yet divided by the area.
Let's see what the top row of the result is. If we are given a point P = [X, Y, 1] in the standard basis, the u component of its barycentric coordinate representation is
u = X * (C_y - A_y) + Y * (A_x - C_x) + (A_y * C_x - A_x * C_y)
Let's call the vector n = [C_y - A_y, A_x - C_x] the 'normal', although it is not of unit length in general. This n is just the vector C - A rotated by -90 degrees about the z-axis - remember that a vector [x, y] rotated by -90 degrees about the z-axis is [y, -x]. This vector points to the right of the vector AC.
u = dot(n, P) - d,    where d = A_x * C_y - A_y * C_x
  = length(AC) * dot(normalize(n), P) - d    (since n is just the vector AC rotated by -90 degrees, so length(n) = length(AC))
Not showing it here, but d = dot(n, A) = length(AC) * dot(normalize(n), A). What we have here is therefore,
= length(AC) * dot(normalize(n), P - A)
This expression denotes two things. It's the signed distance of P from the line AC, scaled by the length of AC - that's the first thing. It's positive if P is to the right of the line, 0 if it's on the line, and negative otherwise. But length(AC) is equal to the length of the base of the triangle PCA, and that dot product is the (signed) height of this triangle. We know that area of a triangle = 1/2 * base * height. So what we have calculated is twice the area of the triangle PCA - signed, of course. So u is nothing but twice the signed area of the triangle PCA.
Similarly, v is twice the signed area of the triangle PAB. Ultimately, these two are divided by cross(A, B, C), which is twice the area of triangle ABC, and we get the barycentric coordinates of the given point P as [area(PCA) / area(ABC), area(PAB) / area(ABC)]. If P is inside the triangle, both components must be between 0 and 1. The reason is that for the fraction area(PCA) / area(ABC), the numerator and denominator are length(CA) * signed_distance(P, CA) and length(CA) * signed_distance(B, CA). So the base lengths cancel out, and we are left with the ratio of the signed heights of P and B from the base.
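Putting the signed-area view into code (a small sketch of my own, assuming ABC is wound counter-clockwise):

// Barycentric coordinates of P wrt triangle ABC, via signed areas:
// u = area(PCA) / area(ABC), v = area(PAB) / area(ABC), matching the derivation above.
struct Vec2 { float x, y; };

// Twice the signed area of triangle ABC (the 2D 'cross product' from earlier).
float twice_signed_area(Vec2 A, Vec2 B, Vec2 C) {
    return (B.x - A.x) * (C.y - A.y) - (B.y - A.y) * (C.x - A.x);
}

struct Bary { float u, v; };

Bary barycentric(Vec2 P, Vec2 A, Vec2 B, Vec2 C) {
    float inv = 1.0f / twice_signed_area(A, B, C);
    return {twice_signed_area(P, C, A) * inv, twice_signed_area(P, A, B) * inv};
}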
rksnyp · 7 years ago
`pmr` namespace in C++
The pmr namespace in the C++ standard library provides an interface called memory_resource. It's a little weirdly named, perhaps - it could just have been called memory_allocator - but the Allocator concept has already been in use in the standard library for quite a while, so I think that's why the committee chose this name, to avoid confusion.
Anyway, so the memory_resource is a pretty simple abstract class that is meant to be used as the interface to memory allocators. Classes that derive from it are required to override 3 virtual functions do_allocate, do_deallocate, and do_is_equal. The third one is simply for checking if 2 allocators are the same and should usually be implemented as return this == &other;. The do_deallocate receives a pointer with a size and alignment. Your allocator can do something with it, like free the memory, or maybe do nothing depending on what type of allocator you have.
The do_allocate function takes a size and an alignment and should allocate memory and return a void * pointer. This is the main difference between memory_resource based allocators and the Allocator concept. The stdlib containers use allocators that must satisfy the Allocator concept, and this concept is rather awkward.
A memory allocator should just allocate memory. It doesn't really need to know the type of the object or objects that will reside there - just the size it needs to allocate and the alignment. But an Allocator requires a type T specified for it at creation, and gets the size and alignment information from sizeof(T) and alignof(T). It doesn't need T for anything else. Also, when you create a vector, you instantiate the vector template with an element type and an allocator type. So you cannot copy-assign a vector that was created with a different allocator type to another vector, since the allocator types are different, despite the two vectors having the same element type.
This issue is what pmr's memory_resource interface and the "polymorphic allocator" model seek to solve. The stdlib made a mistake, I think, in designing the allocator interface - we shouldn't need the allocator as a template parameter of the container at all. So, to circumvent this, the pmr namespace also provides a class called polymorphic_allocator, which is an Allocator, but one that takes a memory_resource during construction. So your vector templates are instantiated with the same polymorphic_allocator type, but the allocators themselves use the particular memory_resource they are holding to allocate and deallocate memory.
As an example, I adapted the TempAllocator from the foundation library to be compatible with the pmr::memory_resource and here's how it looks - https://bitbucket.org/snippets/snyp/KAKeXG/pmr-based-tempallocator
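For a flavor of what implementing the interface looks like, here's a much smaller bump-allocator sketch (not the TempAllocator itself, just an illustration):

// A trivial std::pmr::memory_resource: bumps a pointer through a fixed buffer and never frees.
#include <cstddef>
#include <memory>
#include <memory_resource>
#include <new>

class BumpResource : public std::pmr::memory_resource {
  public:
    BumpResource(void *buffer, std::size_t size)
        : _cur(static_cast<char *>(buffer)), _end(_cur + size) {}

  private:
    void *do_allocate(std::size_t bytes, std::size_t alignment) override {
        void *p = _cur;
        std::size_t space = std::size_t(_end - _cur);
        if (!std::align(alignment, bytes, p, space)) throw std::bad_alloc{};
        _cur = static_cast<char *>(p) + bytes;
        return p;
    }

    void do_deallocate(void *, std::size_t, std::size_t) override {} // Everything is freed at once

    bool do_is_equal(const std::pmr::memory_resource &other) const noexcept override {
        return this == &other;
    }

    char *_cur;
    char *_end;
};

// Usage: the container only sees a polymorphic_allocator, no extra template parameter.
// char backing[1024];
// BumpResource arena(backing, sizeof(backing));
// std::pmr::vector<int> v(&arena);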
rksnyp · 8 years ago
A discussion has recently been made popular by the Go community, and in particular by Rob Pike's talk 'Concurrency is not Parallelism'. Since listening to the talk, I have tried to make the distinction clear in my mind, but it has almost always been too hand-wavy. So today, while reading a book, I found a nice definition of concurrency that makes the distinction really clear. A software system is concurrent when it consists of more than one stream of operations that are active and can make progress at one time. Parallelism is achieved when multiple such streams are executing simultaneously. And thus, concurrency 'enables' parallelism this way.
rksnyp · 8 years ago
Stack Frames
I was studying a coroutine library, which made me inspect the SysV calling convention. So, without much explanation whatsoever, I will dump this ASCII drawing of how the frame of a function looks.
Assuming you have pushed the 'previous' rbp, the n-th memory-argument eightbyte is at location [rbp + 16 + 8*n].
The end of the memory argument area, including the return address, is aligned to a 16-byte boundary (unless a __m256 is passed, in which case it is aligned to 32 bytes).
Locations [rsp .. rbp) hold locals. marg denotes arguments passed via memory as opposed to registers, chunked into eightbytes according to the rules in the SysV specification. Ret is the return address; it's always passed via memory. loc1 is just an example of a local variable. RED is a 128-byte region called the 'red zone' - a region that the function can use for data that doesn't have to persist across function calls.
        +-------+  --------------------+
        | margN |                      |
        +~~~~~~~+ <--- rbp + 16 + 8*N  |
        |       |                      |
        +-------+                      |  Caller's frame
        | marg0 |                      |
        +-------+ <--- rbp + 16        |
        |  Ret  |                      |
        +-------+ <--- rbp + 8 (aligned to 16 bytes)  ----+ <---- rsp on entry
        | rbp'  |                      |
        +-------+ <--- rbp             |
        | loc1  |                      |
        +-------+                      |
        |       |                      |
        +~~~~~~~+                      |  Callee's frame
        |       |                      |
        +-------+ <--- rsp             |
        |       |                      |
        |  RED  |                      |
        |       |                      |
        +-------+ <--- rsp - 128  -----+
Notes -
That push rbp, then mov rbp, rsp, sequence is called the 'prologue'. It's what enables a debugger to step through parent function frames.
If you do not modify rsp on entry, i.e. don't push rbp and/or expand the frame by decrementing rsp, then rsp keeps pointing at the return address, as you can see. So there is no need for an 'epilogue' like
mov rsp, rbp
pop rbp
before the ret instruction.
A leaf function can use the red zone as its entire frame (if size is enough).
rksnyp · 8 years ago
I believe writing a note is almost always better than not writing it. I am forgetful in general, and with a lot of information it can be difficult to keep everything in my head at all times. Hence, even if there's a mental voice telling me "Oh, you don't need to write a note for something as simple/small as that!", I try my hardest to justify writing it. So yeah, writing a note, sometimes in whatever cryptic personal way, on however small a concept, is a good thing. Notes don't have to be understandable to everyone who reads them; being understandable to the author suffices.
rksnyp · 8 years ago
Not much to write. Thought I would share a script I wrote to do really simple algebraic expression expansion in the command line. I really make a lot of mistakes when expanding formulas involving lots of multiplications. Well, the script is really meant to be a poor man's formula expander supporting only multiplication, additions and subtractions. http://paste.debian.net/925282/
I am slowly writing a new one with more operators and a few more useful but simple features, like variables that can store expressions and evaluation of the numbers in the terms. An example - and the one which made me realize that I could perhaps write an automatic evaluator for this sort of formula - is this expression, which is part of the determinant of a 3 by 3 matrix:
> (m00 - lam) * ((m11 - lam) * (m22 - lam) - m21 * m22) - m01 * (m10 * (m22 - lam) - m20 * m12) + m02 * (m10 * m21 - m20 * (m11 - lam))
+ 'm00'm11'm22 - 'm00'm11'lam - 'm00'lam'm22 + 'm00('lam**2) - 'm00'm21'm22 - 'lam'm11'm22 + ('lam**2)'m11 + ('lam**2)'m22 - ('lam**3) + 'lam'm21'm22 - 'm22'm10'm01 + 'lam'm10'm01 + 'm20'm12'm01 + 'm10'm21'm02 - 'm11'm20'm02 + 'lam'm20'm02
Well, a simpler example would be
> (a - b) * (d - c) * (x + z)
+ 'a'd'x + 'a'd'z - 'a'c'x - 'a'c'z - 'b'd'x - 'b'd'z + 'b'c'x + 'b'c'z
--
I have been listening to this channel a lot, and I love this album they've posted https://www.youtube.com/watch?v=Gsi2b6HJUd4.
rksnyp · 8 years ago
I was thinking about writing a CSON parser for easily configuring my demos as I delve deeper in programming graphics. But I found Niklas Frykholm's fine library
https://github.com/niklasfrykholm/nflibs
and I'm using just that. Something to learn from this design is to consider making data structures easily relocatable in memory. For the most part this means using a contiguous buffer for the whole data structure and using offsets instead of pointers so that internal references don't need to be patched up.
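The offsets-instead-of-pointers idea in miniature (my own illustration, not nflibs' actual layout):

// A relocatable blob: everything lives in one contiguous buffer and internal references are byte
// offsets from the start of the buffer, so the whole thing can be memcpy'd or dumped to disk as-is.
#include <cstdint>

struct StringRef { uint32_t offset; }; // Offset of a nul-terminated string within the blob

const char *resolve(const char *base, StringRef ref) { return base + ref.offset; }

struct Header { StringRef name; uint32_t value; };

// const char *base = ...one buffer holding a Header followed by string data...;
// const Header *h = reinterpret_cast<const Header *>(base);
// printf("%s = %u\n", resolve(base, h->name), h->value);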
rksnyp · 8 years ago
Change of basis is an operation that comes up in graphics very often. Without going into too much detail about vector spaces, I will write about this operation as a note.
A basis can also be called a 'frame'. Same thing, it just sounds better to me in certain contexts. A matrix encodes a linear transformation; multiplying a vector by the matrix produces the resulting vector. A vector, by itself, is just a tuple of scalars. To 'know' what point or direction the vector is representing, we need to keep in mind an associated frame. For now, I will only talk about linear transformations that map from a three-dimensional vector space to that same three-dimensional vector space. (These linear transforms are also called 'operators'. Don't know why.)
The frame with respect to which we should interpret the resulting vector after a matrix multiplication has to be specified. Here's what the matrix looks like for a linear transformation T that takes vectors expressed in frame (b1, b2, b3), where each bi is a basis vector, to vectors expressed in frame (c1, c2, c3), each ci being a basis vector. Of course, for each basis vector we also have to keep in the back of our mind what frame we are supposed to interpret it with respect to.
The column vector i of the matrix is the vector T(bi) interpreted in basis (c1, c2, c3). That is, the column i contains the coordinates of T(bi) with respect to basis (c1, c2, c3)
          T(b1)   T(b2)   T(b3)
       +--                   --+
   c1  |                       |
       |                       |
   c2  |                       |
       |                       |
   c3  |                       |
       +--                   --+
The basis (b1, b2, b3) is called the input basis of the matrix, and (c1, c2, c3) the output basis. After multiplying and getting the result, we must interpret the result with respect to the frame (c1, c2, c3). Two specific cases of this kind of matrix are when
The input and output bases are the same.
I.e the matrix looks like,
          T(b1)   T(b2)   T(b3)
       +--                   --+
   b1  |                       |
       |                       |
   b2  |                       |
       |                       |
   b3  |                       |
       +--                   --+
The resulting vector is to be interpreted with respect to the same basis you interpreted the input vector.
The transformation T is the identity transform.
And this is the so-called change of basis matrix. It represents the identity linear transformation. But it's not going to be the identity matrix in general, since the output basis is not, usually, same as the input basis (Or why would we call it change of basis matrix, right?).
           b1      b2      b3
       +--                   --+
   c1  |                       |
       |                       |
   c2  |                       |
       |                       |
   c3  |                       |
       +--                   --+
Suppose that we have begun with the basis (b1, b2, b3), each vector bi interpreted with respect to the standard basis.
We apply a transformation T to each basis vector to obtain another basis (T(b1), T(b2), T(b3)). (Keep in mind we are interpreting all these vectors in the standard basis. When no frame is mentioned, it is assumed you are to interpret vectors with respect to the standard basis). So the matrix of this transformation with input basis and output basis both (b1, b2, b3) is, again,
          T(b1)   T(b2)   T(b3)
       +--                   --+
   b1  |                       |
       |                       |
   b2  |                       |
       |                       |
   b3  |                       |
       +--                   --+
The inverse of this matrix would be interpreted like this
              b1      b2      b3
          +--                   --+
   T(b1)  |                       |
          |                       |
   T(b2)  |                       |
          |                       |
   T(b3)  |                       |
          +--                   --+
This is just the change of basis matrix from basis (b1, b2, b3) to basis (T(b1), T(b2), T(b3)).
Therefore, the result vector of multiplying a change of basis matrix like that to a vector v, interpreted wrt the input basis (b1, b2, b3), can be seen in two ways.
The result vector is the inverse of T applied to v, interpreted wrt the same frame (i.e the input frame/basis).
The result vector is how v looks like wrt the transformed frame (T(b1), T(b2), T(b3)).
Yep. That's it really. (b1, b2, b3) is often the standard frame itself. So to see how a vector v looks in some other frame, we can first construct the matrix that transforms the standard frame to this other frame, then invert it to get the change of basis matrix from the standard frame to the other frame. If the transformation is orthogonal, i.e. the columns of the matrix - which are the basis vectors of the 'other' basis - are mutually perpendicular and of magnitude 1, we can simply transpose the transformation matrix to get the change of basis matrix.
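As a closing sketch (my own illustration, using a column-major 3x3 matrix type):

// Build the matrix whose columns are the frame's basis vectors (the transform taking the standard
// frame to that frame). For an orthonormal frame, its inverse - the change of basis matrix into
// that frame - is just the transpose.
#include <array>

using Vec3 = std::array<float, 3>;
using Mat3 = std::array<Vec3, 3>; // Column-major: m[c] is column c

Mat3 frame_matrix(Vec3 b1, Vec3 b2, Vec3 b3) { return {b1, b2, b3}; }

Mat3 transpose(const Mat3 &m) {
    Mat3 t{};
    for (int c = 0; c < 3; ++c)
        for (int r = 0; r < 3; ++r)
            t[r][c] = m[c][r];
    return t;
}

Vec3 mul(const Mat3 &m, Vec3 v) {
    Vec3 r{0.0f, 0.0f, 0.0f};
    for (int c = 0; c < 3; ++c)
        for (int i = 0; i < 3; ++i)
            r[i] += m[c][i] * v[c];
    return r;
}

// Coordinates of v (given in the standard frame) wrt an orthonormal frame (b1, b2, b3):
// Vec3 v_in_frame = mul(transpose(frame_matrix(b1, b2, b3)), v);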