Following to control camera movement #32

Open
Ahelsamahy wants to merge 278 commits into master from CameraMovement_AM

Conversation

@Ahelsamahy
Collaborator

No description provided.

OrestisTS and others added 30 commits February 22, 2024 16:08
We moved pilot to core 1 and following to cores 2 and 3 for faster processing and a consistent load
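On Linux this core pinning can be done from Python without touching the service files; a minimal sketch (the PID variables are illustrative, not from the repo):

```python
import os

def pin_to_cores(pid: int, cores: set) -> None:
    """Pin a process to a fixed set of CPU cores (Linux only)."""
    os.sched_setaffinity(pid, cores)

# Hypothetical layout from this commit: pilot on core 1,
# following on cores 2 and 3 (PIDs are illustrative):
# pin_to_cores(pilot_pid, {1})
# pin_to_cores(following_pid, {2, 3})
```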
The linear function to control steering and throttle was changed
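The shape of such a linear control function, sketched with illustrative names and gains (not the actual implementation in the repo):

```python
def linear_command(error: float, gain: float, limit: float = 1.0) -> float:
    """Map a normalized tracking error to a steering/throttle command
    through a linear function, clamped to the actuator range."""
    return max(-limit, min(limit, gain * error))

# e.g. a person 25% right of image center with a hypothetical gain of 0.8
# yields a steering command of 0.2; large errors saturate at the limit.
```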
It has an average speed of 90 ms per image. More details can be found with OT through the .txt explanation of steps I sent to him
As the working directory where app.py runs has changed (it runs inside the py env), the directory where the ultralytics dependencies are installed is different
The Docker image has a base Python version of 3.6, but a Python 3.8 environment is installed inside it, and this is where FOL runs with YOLOv8. By default, the py env is activated.
As the old function could only be called on the button press and couldn't be exported, I had to wrap it in a function I can export. But the function was doing more than one thing (changing the button text and sending a POST with the command), so I split it into a class.
…or the button

There was an implementation for hiding the button on an immediate brake (part of the smooth-braking RFC16), and now it needs another way to access it through the class
For hiding and showing the button while the user is interacting with the active area
It will read the current state of the button before it starts drawing the triangles where the operator's fingers are placed
…ualMode

RFC0017 switching between FOL and manual mode
Now the user can enter numbers for the scale and offset of the robot directly from the mobile controller page
These functions are used when the user presses the triangles, to hide the input fields, and when the user is not interacting with the screen, to make the input fields appear again
The input fields appear when the user is not pressing triangles
They disappear when the user presses triangles
The JS side now checks whether the data entered in the input boxes of the HTML page is valid.
If it is valid, the confirm button is enabled
The HTML page now has two input boxes for the speed and steering offset of the robot. It also validates the entered data and has a confirm button to send the data to the backend
The text now states the accepted values for the text box: real numbers between 1 and 10
The data entered in the input boxes can now be sent to the Python backend to be applied. Empty text boxes are ignored
Now, when the user is not pressing the triangles, and on first load of the HTML, the JS side receives the scale and steering values already in use on the robot and shows them in the text boxes
Added POST/GET functions to communicate with teleop and receive the scale and steering currently in use
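A backend-side sketch of this exchange, under stated assumptions: the endpoint URL and field names are hypothetical, and the "empty text boxes are ignored" rule is applied when building the payload.

```python
import json
from urllib import request

TELEOP_URL = "http://localhost:8080/teleop/calibration"  # hypothetical endpoint

def build_payload(scale_text: str, steering_text: str) -> dict:
    """Empty text boxes are ignored; only filled-in fields are sent."""
    payload = {}
    if scale_text.strip():
        payload["scale"] = float(scale_text)
    if steering_text.strip():
        payload["steering_offset"] = float(steering_text)
    return payload

def send_calibration(scale_text: str, steering_text: str) -> None:
    """POST the calibration values to teleop as JSON."""
    data = json.dumps(build_payload(scale_text, steering_text)).encode()
    req = request.Request(TELEOP_URL, data=data,
                          headers={"Content-Type": "application/json"})
    request.urlopen(req)
```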
Removed it from inside the redraw() function, because it would trigger too often, preventing the user from changing the values
The entry point for the service won't need to run inside a Python env anymore. I had to add the BSP directly to the Docker image, as balena devices don't come with it installed by default
To make it run at the same Hz as the FOL service. The idea is to send and receive commands from TEL as fast as the YOLO model works
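One common way to match a loop to a target frequency is to sleep off whatever time is left in each period after the work is done; a minimal sketch (not the repo's actual loop):

```python
import time

def run_at_rate(step, hz: float, iterations: int) -> None:
    """Call `step()` at a fixed frequency by sleeping off the time
    left in each period after the work completes."""
    period = 1.0 / hz
    for _ in range(iterations):
        start = time.perf_counter()
        step()
        leftover = period - (time.perf_counter() - start)
        if leftover > 0:
            time.sleep(leftover)
```

If the model runs at roughly 90 ms per image, the matching loop rate would be around 11 Hz.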
I retrained the model from epoch 400 to 485. I changed some of the parameters and had to retrain on these epochs again. The model has been tested on 5Earl and works fine.

I think it is now a matter of trying to solve the cameras problem
@Ahelsamahy Ahelsamahy changed the title Camera movement am Following to control camera movement Jul 10, 2024
@Ahelsamahy
Collaborator Author

The new model performs quite well and is also faster than the older model. It didn't have the false negatives that the standard yolov8n model had. I will test it on 6Frank soon. In the meantime, I will add extra utils functions, like getting the model's speed, and tweak some numbers in the camera movement speed

The speed is calculated per frame. There was an older implementation that stored the images in a deque and then calculated the speed from the accumulated amount, but I don't want to store data that isn't important
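The per-frame approach only needs the previous timestamp; a minimal sketch of that idea (names are illustrative):

```python
import time

class FrameTimer:
    """Per-frame speed without a deque: only the previous timestamp
    is kept, so no unimportant frame data is stored."""
    def __init__(self) -> None:
        self._last = None
        self.ms_per_frame = 0.0

    def tick(self) -> float:
        """Call once per processed frame; returns ms since the last frame."""
        now = time.perf_counter()
        if self._last is not None:
            self.ms_per_frame = (now - self._last) * 1000.0
        self._last = now
        return self.ms_per_frame
```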
that would limit how fast inference is working
because it was trying to access the ID of a detected person, and the tracker takes a while to assign an ID to the tracked person.
As it was running in a loop, the numbers kept dividing themselves, trending toward ever smaller values. I had to make them static numbers
…n active

I noticed that the camera stream just goes black after I have sent lots of pan commands. Maybe the camera has a limit on the number of commands it can receive in a given amount of time? No clue.

Will try to implement a thread lock and see how it will behave
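A thread lock around the pan call would serialize command bursts so only one is on the wire at a time; a minimal sketch, where `send_fn` stands in for the actual camera call:

```python
import threading

class CameraPanner:
    """Serialize pan commands so overlapping bursts can't hit the
    camera at the same time (`send_fn` is the real camera call)."""
    def __init__(self, send_fn) -> None:
        self._send = send_fn
        self._lock = threading.Lock()

    def pan(self, value: float) -> None:
        with self._lock:   # one command on the wire at a time
            self._send(value)
```

This alone won't reduce the total number of commands; if the camera is rate-limited, the lock could be combined with a minimum interval between sends.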
@Ahelsamahy
Collaborator Author

The BoT-SORT tracker takes a while to assign an ID to a newly tracked human. It was mentioned here: ultralytics/ultralytics#10314 (comment)
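Until the tracker assigns an ID, the detection's `id` field is empty, so any lookup needs a guard; a small sketch, where `box` stands in for an ultralytics result box:

```python
def tracked_id(box):
    """BoT-SORT returns detections before an ID is assigned; treat a
    missing `id` as 'not tracked yet' instead of crashing on None."""
    track_id = getattr(box, "id", None)
    return None if track_id is None else int(track_id)
```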

As I had tweaked them at the beginning to make up for the model's slow frame rate. It is faster now, so I will experiment with the default parameters
There is no need to calculate the steering and pan twice when the calibration flag is true
Added comments and some inner thoughts for the code flow.

I'm debating whether I should go back to the old implementation of calibrating the vehicle while the camera is following the user, or stick with the current calibration-flag implementation.

The calibration flag won't work if the user keeps making small turns around the robot; the camera will see the bars eventually.
As it had the same if condition as in calc_cam_pan
To make sure all the code has proper error handling
I noticed it wouldn't move at all after I implemented the safety feature. The user might be close to the robot and just making a turn around it; it would be wise to make the camera follow them without the robot moving.

This will need extra logic implementation in the future
@Ahelsamahy
Collaborator Author

Test (2024-07-15)

  • It can send calibration values (camera pan and steering) when the person is close
  • It takes harsh turns when doing calibration.
  • Maybe it should send the go-to-preset command from TEL when it detects the value to switch the state of FOL.
  • If the camera's azimuth is far from x_center while the user is close, then when the user moves away, it should send a higher steering value to bring the user back to the middle.
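The last point could be sketched as steering correction proportional to the camera azimuth once the user is out of the close range; all names, thresholds, and gains below are illustrative, not from the repo:

```python
def corrective_steering(base: float, azimuth: float, distance: float,
                        near: float = 1.5, gain: float = 0.5) -> float:
    """When the camera is panned far from x_center and the user moves
    away, add steering proportional to the azimuth so the body turns
    and re-centers them. Thresholds and gain are illustrative."""
    if distance <= near:
        return base          # user is close: only the camera follows
    return base + gain * azimuth
```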

TEL wouldn't move the camera to preset 1 even though I could see in the logs that FOL sent it. I made sure that TEL sends the value from its side when there is a change in the FOL state
This one was trained with augmentation for 485 epochs, then with much harsher augmentations (angle of 100 degrees) for another 60 epochs.

It did perform worse than the standard yolov8 and yolov10 models, but this is because the validation is done with respect to the other detection classes as well (the 79 neglected classes), which drags down the metric

I will try to improve the model by fine-tuning its hyperparameters.
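A quick sanity check of the class-averaging effect described above: a single-class model scoring ~0.341 on 'person' and 0 on COCO's other 79 classes averages out to ~0.004 across all classes, which is exactly the all-class mAP50-95 shape seen in the test logs in this thread.

```python
# The stripped-down model detects only 'person', so the other 79 COCO
# classes score 0 and drag the 80-class mean toward zero.
per_class = [0.34074] + [0.0] * 79   # per-class mAP50-95 shape from the logs
overall = sum(per_class) / len(per_class)   # ~0.00426
person_only = per_class[0]                  # ~0.341
```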
@Ahelsamahy
Collaborator Author

Ahelsamahy commented Jul 18, 2024

As the goal for the project changed to "Follow for 5 min in the park" or "Follow black pants on a white background (and vice versa)", this doesn't necessarily need camera movement. I did a test today and the model was working fast, but not as well as the original yolov8n model.

Here are the test results:


# My stripped-down model
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 313/313 [00:27<00:00, 11.20it/s]
                   all       5000      36335    0.00887    0.00588    0.00696    0.00426
                person       2693      10777      0.709      0.471      0.557      0.341
Speed: 0.2ms preprocess, 1.5ms inference, 0.0ms loss, 0.8ms postprocess per image
Saving runs/detect/val3/predictions.json...

loading annotations into memory...
Done (t=0.31s)
creating index...
index created!
Loading and preparing results...
DONE (t=2.22s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=33.89s).
Accumulating evaluation results...
DONE (t=4.85s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.004
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.007
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.005
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.002
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.005
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.007
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.002
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.005
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.007
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.008
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.010
Results saved to runs/detect/val3
Results for model loaded from ./training_runs/yolov8_run_240717_T_0933/weights/last.pt:
mAP50-95: 0.004259269502738342
mAP50: 0.006957906997456405
mAP75: 0.00438899912665478
mAPs by category: [    0.34074           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0
           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0
           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0           0
           0           0]

# YoloV8n

WARNING ⚠️ updating to 'imgsz=640'. 'train' and 'val' imgsz must be an integer, while 'predict' and 'export' imgsz may be a [h, w] list or an integer, i.e. 'yolo export imgsz=640,480' or 'yolo export imgsz=640'
Ultralytics YOLOv8.2.54  Python-3.10.12 torch-2.3.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3090, 24260MiB)
YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs
val: Scanning /mnt/sdb1/users-data/ahmedmahfouz/following/Inf-FOL/datasets/coco/labels/val2017.cache... 4952 images, 48 backgrounds, 0 corrupt: 100%|██████████| 5000/5000 [00:00<?, ?it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 313/313 [00:53<00:00,  5.84it/s]
                   all       5000      36335      0.632      0.475      0.521      0.371
                person       2693      10777      0.753      0.673      0.745      0.514
Speed: 0.2ms preprocess, 0.9ms inference, 0.0ms loss, 0.7ms postprocess per image
Saving runs/detect/val4/predictions.json...

loading annotations into memory...
Done (t=0.33s)
creating index...
index created!
Loading and preparing results...
DONE (t=4.71s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=84.95s).
Accumulating evaluation results...
DONE (t=19.77s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.374
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.526
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.405
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.188
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.410
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.535
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.320
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.533
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.589
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.369
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.654
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.768
Results saved to runs/detect/val4
Results for model loaded from yolov8n.pt:
mAP50-95: 0.3709853852916213
mAP50: 0.5209078049243206
mAP75: 0.40311046304685466
mAPs by category: [    0.51427     0.26433     0.36388     0.41327     0.65279     0.62003     0.64572     0.29313     0.21026     0.21124     0.60849     0.63023     0.44099     0.19329     0.27781      0.6516     0.59118     0.52444     0.45966     0.48716     0.62998      0.6893     0.65905     0.68315     0.10033     0.35931
    0.084863      0.2684     0.34218      0.5841       0.188     0.26651     0.32832     0.37976      0.2157     0.30202     0.45131     0.30924     0.39737     0.29755     0.26955     0.35063     0.26368     0.10589    0.098828     0.39031     0.23341     0.15666     0.34894     0.28057     0.20968     0.18901
     0.36425      0.5018     0.40772     0.29238     0.25713     0.43462     0.22562     0.42644     0.29289     0.64187     0.55252     0.57991      0.5266     0.16011     0.48176     0.27999      0.5139     0.34916     0.31403     0.33706     0.51057    0.096766     0.45648     0.32102     0.27798     0.42001
   0.0037661     0.16513]

# YoloV10n

Ultralytics YOLOv8.2.54  Python-3.10.12 torch-2.3.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3090, 24260MiB)
YOLOv10n summary (fused): 285 layers, 2762608 parameters, 0 gradients, 8.6 GFLOPs
val: Scanning /mnt/sdb1/users-data/ahmedmahfouz/following/Inf-FOL/datasets/coco/labels/val2017.cache... 4952 images, 48 backgrounds, 0 corrupt: 100%|██████████| 5000/5000 [00:00<?, ?it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 313/313 [00:51<00:00,  6.03it/s]
                   all       5000      36335      0.644      0.488      0.534      0.383
                person       2693      10777      0.766      0.655      0.744      0.519

Speed: 0.2ms preprocess, 1.4ms inference, 0.0ms loss, 0.1ms postprocess per image
Saving runs/detect/val5/predictions.json...

loading annotations into memory...
Done (t=0.30s)
creating index...
index created!
Loading and preparing results...
DONE (t=4.07s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=77.10s).
Accumulating evaluation results...
DONE (t=15.71s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.385
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.538
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.417
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.190
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.423
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.546
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.539
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.603
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.379
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.659
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.779
Results saved to runs/detect/val5
Results for model loaded from yolov10n.pt:
mAP50-95: 0.38328865898217773
mAP50: 0.5339398378888086
mAP75: 0.4160252082713042
mAPs by category: [    0.51903     0.27192     0.37043     0.43695     0.68256      0.6436     0.67419     0.32235     0.22041     0.22019       0.647     0.59678     0.48338      0.2146     0.28719     0.69614     0.62605     0.57865     0.49503     0.50215     0.64005     0.72744       0.682     0.69003     0.10932     0.38091
     0.09737     0.27565     0.35804     0.60554     0.19323     0.28987     0.33906     0.41467     0.26256     0.30475     0.48174     0.31947     0.40777     0.30268     0.27433     0.36249     0.27121     0.11984       0.107     0.38433     0.22398     0.15883     0.36528     0.28305     0.21529     0.17351
     0.34031       0.514     0.41937     0.34501     0.25927       0.454     0.23237     0.45435     0.29694     0.66075     0.57145     0.58748     0.52857     0.19731     0.48848     0.28449     0.46431     0.36347     0.23365     0.34155     0.56282    0.099099       0.451     0.33139     0.26728     0.45347
   0.0010717        0.15]

in a nutshell

1. Custom-Trained Model

  • Precision: 0.709
  • Recall: 0.471
  • mAP50: 0.557
  • mAP50-95: 0.341

2. YOLOv8 Model

  • Precision: 0.753
  • Recall: 0.673
  • mAP50: 0.745
  • mAP50-95: 0.514

3. YOLOv10 Model

  • Precision: 0.766
  • Recall: 0.655
  • mAP50: 0.744
  • mAP50-95: 0.519

Comparative Analysis:
Precision: YOLOv10 has the highest precision for detecting humans, followed closely by YOLOv8. Your custom model has notably lower precision.
Recall: YOLOv8 excels in recall, suggesting it is better at identifying relevant instances across the dataset for human detection.
mAP50 (Average Precision at 50% IOU): Both YOLOv10 and YOLOv8 perform similarly and significantly better than your custom model. This indicates a strong ability to detect humans when the criterion is less strict (50% intersection over union).
mAP50-95 (Average Precision across IOUs from 50% to 95%): Again, YOLOv8 leads, showing it consistently identifies humans accurately across various levels of strictness in overlap, followed very closely by YOLOv10.

I posted a detailed question on the Ultralytics repo to find out more about the default parameters
