Text
~IT DOESN’T HAVE TO BE PERFECT~
Of course you’re going to give up if your brain insists on turning every small task into a huge simultaneous equation
To be as clichéd as possible; Just fucking do the thing
#I seem to have this revelation every other week but it never sticks#why assume that everyone else has everything perfectly mapped out and calculated to the nth degree?#that their brains are all somehow capable of computer level complex calculations#Occam’s razor would say that’s a stupid assumption#why does my brain insist on having everything interlinked and predicted and planned for?#it’s too much to hold in one brain#Sometimes we just have to focus on the key points and let the rest fall into place.#I do my best work when I’ve given up or I don’t care or there’s impossibly little time before the deadline#shoutout to the time I started an essay worth 30% of the mark 6 hours before the deadline thought it was shit#was genuinely scared to look at the results because it was so bad#and then got 98%#got so mad at the time but it was a good lesson that sometimes less is more
Photo
I like the ambiguity of "knowing" it's "you". the way I'm interpreting it is their mind somehow senses that this encounter is one with the same ~entity~ as previous times? like if someone doesn't know you they can still realise the lizard they saw the other day had the same identity™ or sense of ego as that of the scooter they're seeing right now. so you're a cryptid that can present as and be whatever they like, except everyone has a special Recognising You sense, that doesn't necessarily tell them where you are if they're not looking at/hearing you, or your name(s), or things you've done that they haven't witnessed.
ok now here's my plan: I shapeshift into dust and spread myself across the entire atmosphere, right? then locally I can be whatever I want wherever I want, and no one can know since they also get a false-positive feeling that I'm there all the time. or maybe they can recognise the person in front of them as Also-Me even when the atmosphere is me too? in which case my plan is to be more indirect; maybe if I became a human they'd know it was me, but if I create a puppet-human to interact through,,, there has to be a point at which I'm far enough removed that they don't know about it.
assuming there's no limit to the number of sensory inputs I can receive simultaneously (i.e. if I have a million eyes I can choose to be able to see out of all of them), the main obstacle is my mental capacity to interpret these inputs and act on them - the way I understand shapeshifting, my thoughts are produced by some metaphysical/spiritual "mind", disconnected from any specific brain (otherwise once I turn into dust I'm literally dead lol. shapeshifting in general would probably be very painful), with the same capabilities as those my usual brain would otherwise have. the solution: become a computer! specifically, have some parts of me be analogue (or preferably electrical, if possible) machines running complex calculations so that I don't need to hold them in my proverbial head; I'd keep these machines far away from possible human tampering, of course. ideally, I'd work to make them sophisticated enough to plan subtle physical manipulations on the molecular, perhaps even chemical, level. I'd probably like to be able to make some crude predictions about macro events like politics, economics, climate etc. maybe have an intuitive pattern-recognition module as well, if I can manage it. best-case scenario is I host a self-improving AGI.
now I can do way more complex things than humans can comprehend. this includes me, as my actual consciousness is still pretty limited, so the plan would have to include a way for me to get summaries of the information my system gets and monitor its actions - otherwise I might become just as dependent on the whims of the singularity I created as the rest of the world.
if I somehow do all of that, I've basically cracked the code - I can interact with people through any number of totems and physical manipulations, without any of them being recognised as me. I can make it so no one knows I exist, or so everyone reveres me as a goddess, or just live my life normally except looking however I want and ensuring things like world peace and equality and stuff. y'all are welcome.
WHAT IS THE FUCKING POINT
Text
Q-learning flappy bird game demystified (part 1)
Let's take our first baby steps into reinforcement learning. This is the first part of a three-part series. Across these posts I would like to cover Q-learning in reinforcement learning, starting from a pure tabular implementation, moving through a neural-network-based modification, and finally adding computer vision to our game-playing system.
What we are going to do:
We will use the popular game "flappy bird" for demonstration. In this article, we will cover the essentials needed to code the pure Q-learning algorithm, which is wired directly into the game to access its internal data. We will create an independent, computer-vision-based version of Q-learning in the later parts.
Prerequisite: We will do all our experiments in 'Processing', a Java-based language and environment. You need a working knowledge of Processing and basic knowledge of Java.
Reinforcement learning:
Here's the scenario: one or more agents wander around in an environment. An agent can be anything that can perform an action, chosen from a set of actions (say A), based on its current state. The states come from the environment, drawn from a set of possible states (say S), and depend on the most recent action the agent took. The agent also receives reinforcements, as rewards or punishments, either from the environment or from within itself. Put simply: you are an agent in the environment called planet Earth. Time moves forward and you want to do things in this short lifetime. You take actions at each moment, and at the very next moment you are in a new state because of the action you took. In each state you are rewarded or punished, either by your surroundings or by your own thoughts. That's reinforcement learning; the basic concept really is that simple. To build a model and train it for our purpose, we create an environment that suits our needs. As the teacher, we only have to decide what reward or punishment to give the agent to shape it towards our requirements.
Let's hack:
How are we going to represent this mathematically? Even if you skipped everything above, the upcoming sections will help you break the algorithm down, even if you have a primitive brain like mine. Hold on..
The first question is: what is the algorithm, i.e. what steps do we perform to get reinforcement learning (RL) running? Here it is:
for each moment of time in the agent's lifetime, do the steps given below
perform the right action, the one suitable for the current state.
learn from the reward/punishment whether the last action performed in the last situation was right or wrong.
The RL algorithm is simple, right? We can add any logical sub-steps to it as long as the high-level steps stay the same. We will use Q-learning to fill in the sub-steps of the above algorithm. We will examine each of the two steps under separate sub-headings. Each part has three sub-parts: thought process (how we approach the problem), data structures, and solution (the conclusion of everything).
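Purely as an illustration of that two-step loop (this is not the flappy bird code; the toy one-dimensional "walk to the goal" world and every name here are made up), a minimal Java sketch could look like this, with the two steps left as stubs for Q-learning to fill in over the next sections:

```java
import java.util.Random;

// Minimal sketch of the generic RL loop with a toy 1-D "walk to the goal" world.
// chooseAction() and learn() are deliberately empty-headed placeholders here;
// Q-learning is what will give them real content.
public class RLLoopSketch {
    static Random rng = new Random();

    static int chooseAction(int state) {
        // step 1: pick the action suitable for the current state.
        // For now: pick randomly (0 = move left, 1 = move right).
        return rng.nextInt(2);
    }

    static void learn(int state, int action, double reward, int nextState) {
        // step 2: learn whether the last action was right or wrong.
        // Left empty for now; the Q-learning update will go here.
    }

    public static void main(String[] args) {
        int state = 0;   // start at position 0, goal at position 10
        for (int t = 0; t < 100 && state < 10; t++) {
            int action = chooseAction(state);
            int nextState = Math.max(0, state + (action == 1 ? 1 : -1));
            double reward = (nextState == 10) ? 1.0 : -0.01;   // small cost per step, reward at the goal
            learn(state, action, reward, nextState);
            System.out.println("t=" + t + " state=" + state + " action=" + action + " reward=" + reward);
            state = nextState;
        }
    }
}
```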
Step 1 - choose the right action:
thought process:
how do we select an action from a set of actions? hmm.. We know that in any selection process, if we can somehow assign a score to each candidate, we have a justifiable method for selection.
we also know that the action selection depends on the current state, so for each state we will have the same set of actions but with different score values.
now we have scores attached to actions, and we can select an action in different ways depending on the problem we're trying to solve: calculating a probability distribution over actions { (score of ith action)/(sum of scores over all actions) }, using more complex probability functions like this, or simply choosing the action with the highest score :) (a sketch of the proportional option is just below).
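Here is a minimal, hedged sketch of that score-proportional option (it assumes the scores are non-negative; the example scores are made up, and the actual flappy bird sketch uses the simpler highest-score rule described in the solution below):

```java
import java.util.Random;

// Roulette-wheel selection: pick action i with probability score[i] / sum(scores).
// Assumes non-negative scores; purely an illustration of the idea above.
public class ProportionalSelection {
    static int sampleAction(double[] scores, Random rng) {
        double total = 0;
        for (double s : scores) total += s;
        double r = rng.nextDouble() * total;   // a point on the "roulette wheel"
        double running = 0;
        for (int i = 0; i < scores.length; i++) {
            running += scores[i];
            if (r <= running) return i;
        }
        return scores.length - 1;              // numerical fallback
    }

    public static void main(String[] args) {
        double[] scoresForThisState = {0.2, 0.7, 0.1};   // hypothetical scores for 3 actions
        Random rng = new Random();
        System.out.println("chosen action: " + sampleAction(scoresForThisState, rng));
    }
}
```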
data structures:
if we keep these scores in memory, we can reuse them (they will also be used in step 2). It would be cool if we could access them as a multi-dimensional array, like QScore[states][actions].
solution:
We need a states x actions matrix (states or actions may themselves be multi-dimensional) to keep our scores in memory; let's call it QMatrix.
For a specific situation, consider the row of actions for that state. Perform the action with the highest score (we chose the simplest action selection function); a minimal code sketch follows below.
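A minimal sketch of that solution, assuming made-up state and action counts: the QMatrix is a plain 2-D array, and step 1 just takes the highest-scoring action in the row for the current state:

```java
// Sketch of the QMatrix idea: rows are states, columns are actions.
// The numbers of states/actions are arbitrary placeholders.
public class QMatrixSketch {
    static int NUM_STATES = 100;
    static int NUM_ACTIONS = 2;                       // e.g. jump / no jump
    static double[][] qMatrix = new double[NUM_STATES][NUM_ACTIONS];

    // Step 1: pick the action with the highest score in the row for this state.
    static int bestAction(int state) {
        int best = 0;
        for (int a = 1; a < NUM_ACTIONS; a++) {
            if (qMatrix[state][a] > qMatrix[state][best]) best = a;
        }
        return best;
    }

    public static void main(String[] args) {
        qMatrix[42][1] = 0.5;                         // pretend we've learned something for state 42
        System.out.println("best action in state 42: " + bestAction(42));
    }
}
```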
Step 2 - learn from reward/punishment:
thought process:
We need an equation that updates our scores based on what we achieved, so we get better results in the future. Under the next sub-heading we will arrive at this equation through a very basic thought process.
data structures:
same as above.
Deriving update equation for Q learning:
We would like to teach our agent, through reinforcements, to tackle different situations in the environment, so we want those reinforcements to affect the new score calculations. The agent's every action is based purely on those scores. Let's say Qnew is our new score. Then we could simply say Qnew = reinforcement. Okay, that will sort of work: the agent looks for the maximum score among the available actions, and since we give a positive number as reward and a big negative number as punishment, the agent will choose the action with the positive value, i.e. the one that already succeeded.
It doesn't stop here. We want our agent to be future-conscious: it must take its current actions with future advantages in mind. In the previous case the agent is blind; it acts based only on its last experience and only looks for local advantages. As an example, consider our agent as an ant: it should explore the world for better food sources rather than staying localized with limited, bad food. So, what is our modification? We look one step further, at the advantage available if we take an action. Mathematically, we add the maximum score available in the next state the action leads to. That is, Qnew = reinforcement + (Max. future score available in the new state). We also want to be more flexible with our agent: depending on the problem, we want control over this foreseeing capacity, since it's not a good approach to give the same capabilities to every creature in the world we're creating. So we introduce a discount parameter to control the agent's ability to look ahead. Then, Qnew = reinforcement + (discount) * (Max. future score available in the new state). The value of discount lies between 0 and 1.
Wow.. our equation is getting better and better. Now what? Yes.. we want our agent to be less error-prone. There may be sources of uncertainty, including the agent's perception errors (misinterpretations of its sensory inputs) and special cases in the environment (since the agent is still learning, it should be able to tell special cases apart). So we need the agent to keep the old information it has learned so far alongside the new. We can modify our equation as: Qnew = [Qold] + [reinforcement + (discount) * (Max. future score available in the new state)].
As we did for future vision, we introduce another parameter to control the agent, this time limiting how it learns; more specifically, how fast it should adapt to new changes in the environment. This new parameter is called the learning rate. The equation becomes: Qnew = {[1 - learning rate]*[Qold]} + {[learning rate]*[reinforcement + (discount) * (Max. future score available in the new state)]}.
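Translated into code, the final update could look like the sketch below (the variable names and the particular learning-rate/discount values are my placeholders, not necessarily the ones used in the actual sketch):

```java
// Q-learning update:
// Qnew = (1 - learningRate) * Qold
//      + learningRate * (reinforcement + discount * maxFutureScore)
public class QUpdateSketch {
    static double learningRate = 0.7;   // how fast new experience overwrites old knowledge
    static double discount = 0.9;       // how much the agent cares about future scores

    static double maxScore(double[] row) {
        double m = row[0];
        for (double v : row) if (v > m) m = v;
        return m;
    }

    static void update(double[][] q, int state, int action, double reinforcement, int nextState) {
        double maxFuture = maxScore(q[nextState]);   // Max. future score available in the new state
        q[state][action] = (1 - learningRate) * q[state][action]
                         + learningRate * (reinforcement + discount * maxFuture);
    }

    public static void main(String[] args) {
        double[][] q = new double[10][2];
        update(q, 3, 1, 1.0, 4);   // pretend we got a reward of 1.0 going from state 3 to state 4
        System.out.println("Q[3][1] after one update: " + q[3][1]);
    }
}
```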
We're good. Whoo.. it's over. For more information, you could look here.
example problem - a self-playing flappy bird game
Since we've explained almost everything already, I'll only provide the specifics needed to break this problem down. Here's an example I did in Processing:
[embedded YouTube video: demo of the self-playing flappy bird]
find the Processing sketch on GitHub: source code
States: This is the trickiest part. This wasn't my idea; I got it from the internet. For a flappy bird, the only requirement is to stay in the air and jump over the pipes. Whatever the pipe's height, the bird should only worry about its alignment with the pipe's tip. We can represent that with two variables: width (current horizontal distance from the pipe's tip) and height (current vertical distance from the pipe's tip). So, whatever the pipe's height, each state is a relative position from the nearest pipe's tip. The states (relative positions) near the pipe help the bird jump over the tip, while the far-away states help the bird stay alive and keep an average height.
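To make that concrete, here is one possible way to turn the relative position into a state index for QMatrix[states][actions]; the bucket size and grid dimensions are assumptions of mine, not necessarily what the actual sketch uses:

```java
// Discretise the bird's position relative to the next pipe's tip into a state index.
// GRID, W and H are illustrative assumptions.
public class FlappyStateSketch {
    static int GRID = 10;   // bucket size in pixels (assumed)
    static int W = 30;      // number of horizontal buckets (assumed)
    static int H = 40;      // number of vertical buckets (assumed)

    static int stateIndex(float dx, float dy) {
        // dx: horizontal distance to the pipe tip, dy: vertical distance (can be negative)
        int col = Math.min(W - 1, Math.max(0, (int) (dx / GRID)));
        int row = Math.min(H - 1, Math.max(0, (int) (dy / GRID) + H / 2));   // shift so negative dy fits
        return row * W + col;   // flatten (row, col) into a single index
    }

    public static void main(String[] args) {
        System.out.println("state for dx=120, dy=-35: " + stateIndex(120, -35));
    }
}
```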
Actions: jump or no jump.
Reinforcements (a code sketch of this scheme follows the list):
for each action, check whether the bird is still alive; if so, give a positive reward, otherwise a large negative one. This teaches the bird about the death traps.
if the bird crosses a pipe, reward it with a relatively big positive value; this teaches the bird how to score.
give a small negative reward for each jump (energy wasted), and a small positive value for free fall / no jump (energy saved). This encourages the bird to jump only when necessary.
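Collecting the three rules into a single function, a sketch of the reward scheme could look like this (the exact magnitudes are placeholders I picked, not necessarily the actual sketch's values):

```java
// Reward scheme sketched from the three rules above; the magnitudes are assumptions.
public class FlappyRewardSketch {
    static double reward(boolean alive, boolean crossedPipe, boolean jumped) {
        if (!alive) return -1000.0;          // death trap: large punishment
        double r = 1.0;                      // still alive: positive reward
        if (crossedPipe) r += 10.0;          // crossing a pipe: relatively big reward
        r += jumped ? -0.5 : 0.1;            // jumping costs energy, free fall gains a little
        return r;
    }

    public static void main(String[] args) {
        System.out.println("alive, crossed, no jump: " + reward(true, true, false));
        System.out.println("dead: " + reward(false, false, true));
    }
}
```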
future parts
In the future parts we will look at
a way to replace the QMatrix with feed forward neural networks (part 2) [currently working on it].
a way to introduce computer vision to play by looking, using convolutional neural networks (part 3).