Autonomous Car Parking using ML-Agents

Raju K · XRPractices · Aug 21, 2020

Using Unity’s ML-Agents and simulation environment to train AI for autonomous car parking

Gone are the days when we saw self-driving cars only in science fiction. With leading car manufacturers investing heavily in the technology, self-driving cars are likely to become part of our daily traffic within the next decade.

Self-driving cars involve a wide range of problems to be solved, such as lane detection, lane following, signal detection, obstacle detection and many others. In this article we are going to tackle one of them, autonomous parking, using Unity's simulation environment and the ML-Agents package.

If you are new to ML-Agents and wondering how to set it up, please read my earlier article on training a 6-axis robot arm.

Why Simulation?

We have seen autonomous vehicles doing trials on the road, but that is the last phase. Training an AI for autonomous vehicles goes through roughly three major phases:

  1. Simulation / Supervised Learning — Using Computer or Lab
  2. Controlled real world environment learning — Pre-built within campus
  3. Real world learning — Hit the road

What we are going to cover in this article is the first phase, i.e. simulation learning, where the AI is trained against a physics simulation in a computer model. In this phase, everything exists as data and software. The trained AI model only produces signals as output, such as how much throttle, brake and steering to apply; nothing is linked to the hardware of a physical car. It is in the second phase that the output from the AI is passed to hardware.

Simulation Setup

This simulation has two key parts: the “Car”, which we will refer to as the agent from here on, and the environment against which we are training it. In this project, the car is created as a prefab with a Rigidbody physics component attached. The car we are simulating is set up with the following values (a minimal configuration sketch follows the list below):

  • A total body mass of about 1500kg
  • A Car controller with Front wheel drive
  • Brake on all 4 wheels
  • Motor Torque of 400
  • Brake Torque of 200 and
  • Max. steering angle of 30 degrees
Car Controller Configuration
  • Ray cast sensors on all four sides of the car. In the real world, these would be LIDAR-like vision sensors.
Raycast sensors on all sides of the car to detect obstacles
Ray cast sensors in action during simulation
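The CarController referenced in this project is a plain MonoBehaviour, not part of ML-Agents. The following is only a minimal sketch of what such a controller might look like, assuming the front-wheel drive, 400 motor torque, 200 brake torque and 30-degree steering limit listed above; the actual component in the repository may differ in detail.

using UnityEngine;

// Minimal, assumed sketch of a front-wheel-drive car controller (not the exact class from the repo).
// CurrentSteeringAngle, CurrentAcceleration and CurrentBrakeTorque are fed by the agent's actions,
// which ML-Agents delivers as continuous values in roughly the [-1, 1] range.
public class CarController : MonoBehaviour
{
    public WheelCollider frontLeft, frontRight, rearLeft, rearRight;

    public float maxMotorTorque = 400f;    // front-wheel drive
    public float maxBrakeTorque = 200f;    // brake on all 4 wheels
    public float maxSteeringAngle = 30f;   // degrees

    public float CurrentSteeringAngle { get; set; }  // -1..1
    public float CurrentAcceleration { get; set; }   // -1..1
    public float CurrentBrakeTorque { get; set; }    // 0..1

    private void FixedUpdate()
    {
        float steer = Mathf.Clamp(CurrentSteeringAngle, -1f, 1f) * maxSteeringAngle;
        float motor = Mathf.Clamp(CurrentAcceleration, -1f, 1f) * maxMotorTorque;
        float brake = Mathf.Clamp01(Mathf.Abs(CurrentBrakeTorque)) * maxBrakeTorque;

        // Steering and drive torque on the front wheels only.
        frontLeft.steerAngle = steer;
        frontRight.steerAngle = steer;
        frontLeft.motorTorque = motor;
        frontRight.motorTorque = motor;

        // Braking on all four wheels.
        foreach (var wheel in new[] { frontLeft, frontRight, rearLeft, rearRight })
            wheel.brakeTorque = brake;
    }
}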

Reinforcement Learning

Unity’s ML-Agents uses reinforcement learning (modelled as a Markov Decision Process) to train agents, so we need a mechanism to let the agent know when it has properly occupied a parking slot. For this, each parking slot is set up with a trigger collider.

Green colliders in the parking lot to detect and reward correct parking

The agent will receive a penalty if it runs into obstacles.
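The OnTriggerEnter handler shown later in this article lives on a per-slot component and uses a fullEndCollider and an IsOccupied flag that are not defined there. A minimal, assumed skeleton of such a slot component might look like this (names beyond those used later are illustrative, not taken from the repository):

using UnityEngine;

// Hypothetical skeleton of a parking-slot component: the slot's own collider is a trigger the
// car can drive into, while fullEndCollider marks the inner region a "fully parked" car must reach.
[RequireComponent(typeof(BoxCollider))]
public class ParkingSlot : MonoBehaviour
{
    public BoxCollider fullEndCollider;  // inner region that defines a fully parked car
    public bool IsOccupied;              // true when a static car prefab already occupies the slot

    private void Awake()
    {
        GetComponent<BoxCollider>().isTrigger = true;
    }
}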

Funny AI

During the development of this project, I had set up a trigger collider at the entry of each parking slot to reward the agent for a successful entry into the slot. But the clever AI started collecting all the entry incentives by driving across them in a straight line instead of entering any slot.

Clever AI collecting parking entry incentives

Agent Overrides

Here are the key method overrides implemented for the agent. At the beginning of each training episode, the parking lot occupancy is reset to a different pattern and the agent is placed at a random location in the parking lot.

public override void OnEpisodeBegin()
{
    // Re-randomise slot occupancy and reposition the agent for the new episode.
    _simulationManager.ResetSimulation();
    _simulationManager.InitializeSimulation();
    _nearestLot = null;
}

Three signals are received from the AI at each simulation step: the first is the amount of steering, the second the amount of throttle, and the third the amount of brake to apply.

public override void OnActionReceived(float[] vectorAction)
{
    // vectorAction holds three continuous values: steering, throttle, brake.
    _lastActions = vectorAction;
    _controller.CurrentSteeringAngle = vectorAction[0];
    _controller.CurrentAcceleration = vectorAction[1];
    _controller.CurrentBrakeTorque = vectorAction[2];
}

The state of the simulation is collected in CollectObservations. The agent's velocity alignment with the direction to the nearest empty parking slot is also calculated here and added as a small shaping reward. Six 3-component vectors are observed (18 floats in total), which is why the else branch pads the observation with a zero-filled float[18] when the simulation is not yet initialised. It should be noted that the ray cast sensor values are added to the observation automatically at each step.

public override void CollectObservations(VectorSensor sensor)
{
    if (_lastActions != null && _simulationManager.InitComplete)
    {
        if (_nearestLot == null)
            _nearestLot = _simulationManager.GetRandomEmptyParkingSlot();

        Vector3 dirToTarget = (_nearestLot.transform.position - transform.position).normalized;

        // Six 3-component observations = 18 floats; ray cast sensor values are appended automatically.
        sensor.AddObservation(transform.position.normalized);
        sensor.AddObservation(
            this.transform.InverseTransformPoint(_nearestLot.transform.position));
        sensor.AddObservation(
            this.transform.InverseTransformVector(_rigitBody.velocity.normalized));
        sensor.AddObservation(
            this.transform.InverseTransformDirection(dirToTarget));
        sensor.AddObservation(transform.forward);
        sensor.AddObservation(transform.right);
        // sensor.AddObservation(StepCount / MaxStep);

        // Shaping reward: positive when the car is moving towards the target slot.
        float velocityAlignment = Vector3.Dot(dirToTarget, _rigitBody.velocity);
        AddReward(0.001f * velocityAlignment);
    }
    else
    {
        // Keep the observation size constant before the simulation is initialised.
        sensor.AddObservation(new float[18]);
    }
}

Apart from the above, there is an accident detection routine which gives a penalty if the car runs into a barrier, a tree or another car. The penalty value should be big enough that the agent does not repeat the mistake too often, yet small enough that the agent is still encouraged to explore; too large a penalty prevents the agent from learning at all.

private void OnCollisionEnter(Collision other)
{
    // Small penalty for hitting an obstacle, then restart the episode.
    if (other.gameObject.CompareTag("barrier") || other.gameObject.CompareTag("car") ||
        other.gameObject.CompareTag("tree"))
    {
        AddReward(-0.01f);
        EndEpisode();
    }
}

When the agent successfully enters a parking slot, the slot's trigger collider fires and a jackpot reward is given to the agent. A bonus value is calculated based on how well the agent is aligned with the slot: a minimum bonus if the car is parked facing the wall, and a maximum bonus if it is parked facing the road.

private void OnTriggerEnter(Collider other)
{
    if (other.CompareTag("agent"))
    {
        // The car must be fully inside the slot, and the slot must be free.
        if (fullEndCollider.bounds.Intersects(other.bounds))
        {
            if (!IsOccupied)
            {
                // Alignment bonus: facing the road earns a larger factor than facing the wall.
                float bonusfactor = 0.2f;
                float alignment = Vector3.Dot(gameObject.transform.right,
                    other.gameObject.transform.up);
                if (alignment > 0)
                    bonusfactor = 0.8f;
                float bonus = bonusfactor * Mathf.Abs(alignment);
                other.gameObject.transform.parent.GetComponent<AutoParkAgent>().JackpotReward(bonus);
            }
        }
    }
}
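JackpotReward itself is not shown above. A minimal sketch of what it might do, following the convention used elsewhere in this project of ending the episode after a terminal reward, could look like this (the exact reward value is an assumption, not taken from the repository):

// Hypothetical implementation on AutoParkAgent: a large terminal reward plus the
// alignment bonus computed by the parking slot, then the episode ends.
public void JackpotReward(float bonus)
{
    AddReward(1.0f + bonus);
    EndEpisode();
}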

The training config YAML is set up with 4 hidden layers of 512 nodes each, and the Autopark behaviour is trained for 10 million steps.

behaviors:
  default:
    trainer_type: ppo
    hyperparameters:
      batch_size: 512
      buffer_size: 5120
      learning_rate_schedule: linear
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 512
      normalize: false
      num_layers: 4
      vis_encode_type: simple
      memory:
        memory_size: 512
        sequence_length: 512
    max_steps: 10.0e5
    time_horizon: 64
    summary_freq: 10000
    reward_signals:
      extrinsic:
        strength: 1.0
        gamma: 0.99
  Autopark:
    trainer_type: ppo
    hyperparameters:
      batch_size: 512
      buffer_size: 5120
    network_settings:
      hidden_units: 512
      num_layers: 4
    max_steps: 10.0e6
    time_horizon: 128
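Training is then started the usual ML-Agents way: save the configuration to a file (the name trainer_config.yaml below is just an assumption) and run the trainer from the command line before pressing Play in the Unity editor, for example:

mlagents-learn trainer_config.yaml --run-id=AutoPark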

During training it was found that the agent does not learn properly, even after 5 million steps, if it is always placed at the entry of the parking lot. Randomly placing the agent near an empty parking slot increased its chances of collecting the reward and thereby added useful variety to the experience it learns from. The simulation manager was altered to do this randomisation 50% of the time; the rest of the time the agent is placed at the parking entry.

public void PositionAtSafePlace(GameObject nearestLotGameObject)
{
    float[] ang = new float[] { -90f, 90f, 180f, -180f, 0f };

    if (agent != null)
    {
        // Stop the car completely before teleporting it.
        agent.GetComponent<Rigidbody>().velocity = Vector3.zero;
        agent.GetComponent<Rigidbody>().angularVelocity = Vector3.zero;
        agent.GetComponent<CarController>().CurrentSteeringAngle = 0f;
        agent.GetComponent<CarController>().CurrentAcceleration = 0f;
        agent.GetComponent<CarController>().CurrentBrakeTorque = 0f;

        // Drop the car a few metres to the side of the target slot, with a small offset along it.
        Vector3 newPosition = nearestLotGameObject.transform.position +
            nearestLotGameObject.transform.right * Random.Range(-3f, -7f) +
            nearestLotGameObject.transform.forward * Random.Range(-1f, 1f);
        agent.transform.position = newPosition;

        // Random heading. Note: the integer Random.Range excludes the upper bound,
        // so ang.Length is used here to make the 0f rotation reachable as well.
        agent.transform.Rotate(agent.transform.up, ang[Random.Range(0, ang.Length)]);
    }
}
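The 50/50 split between the safe-place spawn and the entry spawn is handled by the simulation manager. A minimal sketch of that branching, assuming hypothetical PlaceAgentForNewEpisode and PositionAtEntry helpers alongside the GetRandomEmptyParkingSlot method used earlier (these names are illustrative, not necessarily those used in the repository):

// Called during the episode reset, after slot occupancy has been re-randomised.
public void PlaceAgentForNewEpisode()
{
    GameObject emptySlot = GetRandomEmptyParkingSlot();

    // Half the time spawn near an empty slot, otherwise spawn at the parking entry.
    if (Random.value < 0.5f)
        PositionAtSafePlace(emptySlot);
    else
        PositionAtEntry();
}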

The TensorBoard graphs after 5 million steps looked like this:

TensorBoard

Not only has the agent increased its reward, it has also reduced the number of steps it needs to reach the optimal solution (right-side graph). On average it took about 40 action signals (accelerate, steer, brake) to get from the entrance to the empty parking slot.

As a token of appreciation for reading this far, the entire source code of this project is available at https://github.com/xrpractice/AutonomousParkingMLUnity. We used ML-Agents version 1.0.3 for this article; you may need to import the package using the Unity Package Manager.

Interesting Learning Captures

Agent learned to park on both sides of the lot
Agent learned to make elegant manoeuvres by reversing
Agent when placed at a random location in the parking lot
