notes.txt
--------------
🎥 Types of Streaming
1️⃣ On-Demand Streaming (VOD – Video on Demand)
-> Content is pre-recorded.
-> Stored on servers and delivered in chunks (HLS, DASH).
-> User can pause, rewind, skip ahead.
👉 Examples: YouTube videos, Netflix, Disney+, Spotify songs.
2️⃣ Live Streaming
-> Content is captured and transmitted in real time.
-> Viewers watch as it happens, with small delay (latency).
-> Uses RTMP → HLS/DASH/WebRTC to reach viewers.
👉 Examples: Twitch, YouTube Live, football match broadcast, live webinars.
3️⃣ Audio Streaming
-> Focused only on audio data (music, podcasts, radio).
-> Works similarly to video streaming but lighter.
👉 Examples: Spotify, Apple Music, online radio.
4️⃣ Game/Screen Streaming
-> Captures your screen/gameplay and streams it live to others.
-> Requires low latency so viewers don’t see big delays.
👉 Examples: Twitch gaming streams, Discord screen share, Google Stadia (cloud gaming).
5️⃣ Real-Time Communication (RTC)
-> Interactive streaming where people send/receive audio + video both ways.
-> Requires ultra-low latency (<1s).
-> Usually built on WebRTC.
👉 Examples: Zoom, Google Meet, video calls, online classes.
6️⃣ Progressive Download (not true streaming, but often confused with it)
-> Video/audio file is downloaded from start → end.
-> Can start playing before fully downloaded.
-> No adaptive bitrate, less efficient than HLS/DASH.
👉 Example: Watching an MP4 directly from a web link.
--------------
-------------
🎬 Full Flow of VOD System
1️⃣ Upload Stage (Content Creation)
User/Streamer uploads a video file (.mp4, .mov, etc.) to your system.
This is just the raw file — not yet optimized for streaming.
👉 Example: You upload mygame.mp4.
2️⃣ Processing Stage (Transcoding & Packaging)
FFmpeg (or another encoder) takes the raw file and:
1) Transcodes it into standard codecs:
-> Video → H.264 (for wide compatibility)
-> Audio → AAC
2) Creates multiple bitrates/resolutions:
-> 1080p (high quality)
-> 720p (medium)
-> 480p (low, for slow internet)
3) Splits the video into chunks (small .ts files, e.g., 4–10 seconds each).
4) Generates playlist(s) (.m3u8 for HLS or .mpd for DASH).
👉 Output looks like this:
master.m3u8 ← points to all qualities
1080p.m3u8 ← lists 1080p chunks
720p.m3u8
480p.m3u8
segment0.ts
segment1.ts
segment2.ts
...
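The master.m3u8 above is just a text file listing the available qualities. A minimal Python sketch of how one could be generated (the BANDWIDTH/RESOLUTION values here are illustrative assumptions, not required settings):

```python
# Build a minimal HLS master playlist string.
# The bitrate/resolution ladder below is an assumed example.
RENDITIONS = [
    (5_000_000, "1920x1080", "1080p.m3u8"),
    (3_000_000, "1280x720",  "720p.m3u8"),
    (1_500_000, "854x480",   "480p.m3u8"),
]

def build_master_playlist(renditions):
    lines = ["#EXTM3U"]
    for bandwidth, resolution, uri in renditions:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(uri)  # the variant playlist that lists that quality's chunks
    return "\n".join(lines) + "\n"

print(build_master_playlist(RENDITIONS))
```

The player reads this file first, then picks one of the variant playlists (720p.m3u8, etc.) based on measured bandwidth.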
3️⃣ Storage & Distribution
- All these files (chunks + playlists) are saved on:
-> Your server, or
-> A CDN (Content Delivery Network) like AWS CloudFront, Cloudflare, Akamai.
- Since it’s just HTTP files, it can be delivered at scale easily.
4️⃣ Playback (Viewer Side)
- The user opens your app/website.
- The video player (like hls.js in a React/Next.js frontend) loads the master.m3u8.
- The player:
-> Reads which qualities (1080p, 720p, etc.) are available.
-> Picks one based on internet speed.
-> Starts downloading small .ts chunks in order.
-> If bandwidth drops, it auto-switches to lower quality (adaptive bitrate).
👉 To the viewer, it looks like a smooth YouTube-style experience.
5️⃣ User Experience
-> Can pause, rewind, seek (because the whole file exists).
-> Smooth playback because only small chunks are streamed at a time.
-> Adaptive streaming ensures minimal buffering.
✅ Summary Flow (Step by Step)
[Uploader]
↓ (raw video)
[Server: FFmpeg]
- Transcode → H.264 + AAC
- Multi-quality → 1080p/720p/480p
- Chunking → segment0.ts, segment1.ts…
- Playlist → .m3u8 (HLS)
↓
[Storage/CDN]
- Stores chunks + playlists
↓
[Viewer Player (hls.js)]
- Loads master.m3u8
- Fetches .ts chunks over HTTP
- Plays video, adapts quality
👉 In one sentence:
VOD = Upload video → Server processes into streaming format → Store as chunks + playlist → Player downloads & plays smoothly.
-------------
-----------
Video Formats
🎬 What is an MP4 file?
MP4 = “MPEG-4 Part 14”
It’s a container format → like a box that holds different types of media.
Inside an MP4, you can have:
-> Video (usually H.264, H.265, VP9…)
-> Audio (usually AAC, MP3…)
-> Subtitles (captions)
-> Metadata (title, duration, thumbnail, etc.)
👉 Think of MP4 as a folder in one file that packs video + audio + extras together neatly.
🎞 What is a MOV file?
-> MOV = Apple QuickTime Movie format.
-> Also a container format, just like MP4.
-> Developed by Apple → works best in Mac/iOS/QuickTime ecosystem.
-> Stores the same kinds of things: video, audio, subtitles, metadata.
🔑 Difference Between MP4 and MOV
- Origin: MP4 → standard (MPEG group); MOV → Apple (QuickTime).
- Compatibility: MP4 works everywhere (web, PC, mobile, TV); MOV works best on Apple devices/software.
- File size: MP4 is usually smaller (more compressed); MOV is usually larger (less compressed, higher quality).
- Usage: MP4 for streaming, sharing, web videos; MOV for professional video editing (Final Cut, iMovie).
👉 Example:
If you upload to YouTube/Netflix → use MP4 (universal).
If you’re editing in Final Cut Pro on Mac → MOV is common.
When FFmpeg finishes encoding the raw video + audio into compressed streams (like H.264 video + AAC audio), those streams still need a “box” to live in. That box is called a container format.
A container is like a folder/box that holds:
- the video stream,
- the audio stream,
- subtitles (optional),
- metadata (title, duration, codec info, etc.).
Common Containers:
- .mp4 → Most popular, widely supported (H.264/H.265 + AAC audio).
- .mkv → More flexible, can hold almost any codec (used for movies, supports multiple audio/subtitle tracks).
- .ts (MPEG Transport Stream) → Often used in broadcasting, streaming (e.g., live TV, HLS).
👉 Example:
If you have H.264 video + AAC audio, you could wrap them into:
- movie.mp4 (standard for web/mobile)
- movie.mkv (for flexibility, e.g., multiple subtitles/audio)
- movie.ts (for live streaming chunks).
So the line you asked about means: ➡️ After FFmpeg encodes, it wraps the streams inside one of these containers to produce the final playable file.
1. MP4 & MOV
- These are file-based containers.
- They hold the entire compressed video + audio from start to end in one file.
- Example: movie.mp4 → contains the full 2-hour movie.
2. .TS (MPEG Transport Stream)
- Designed for broadcasting and streaming.
- Instead of one big file, it can be broken into small chunks (e.g., 2–10 seconds each).
- That’s why .ts is often used in HLS (HTTP Live Streaming) → video is split into many .ts parts, and a playlist file (.m3u8) tells the player which chunk to play next.
👉 But .ts can also store a whole video in one file (like .mp4); it’s just more common to see it split into chunks for streaming.
each .ts (Transport Stream) file can contain:
- 🎥 Video (usually H.264, H.265, etc.)
- 🔊 Audio (AAC, MP3, etc.)
- 📝 Subtitles/Closed Captions (if included)
- ℹ️ Metadata (timestamps, program info, error correction, etc.)
But here’s the key point: 👉 Each .ts chunk is self-contained. That means even if you take one .ts file (say, a 5-second chunk), it has everything needed (video + audio + metadata) to play that 5 seconds properly.
That’s why streaming works:
- Your video player downloads .ts chunks one by one.
- Each chunk has its own mini package of audio + video + metadata.
- The player stitches them together smoothly using the playlist (.m3u8).
So, every .ts chunk is like a little standalone media container for a short duration.
So:
- MP4 / MOV = whole video in one piece (good for download/playback).
- TS = usually smaller pieces (good for live/streaming), but can also be one big file.
-----------
-----------
Bitrate And Bandwidth
🎚 What is Bitrate?
-> Bitrate = the amount of data per second needed to play a video or audio.
-> Measured in kbps (kilobits per second) or Mbps (megabits per second).
👉 Example: A video encoded at 5 Mbps means every second of video requires 5 megabits of data.
Why it matters:
Higher bitrate → better quality (more detail, less compression).
Lower bitrate → smaller size, loads faster, but may look blurry or pixelated.
📡 What is Bandwidth?
-> Bandwidth = the maximum amount of data your internet connection can transfer per second.
-> Also measured in Mbps.
-> Think of it like the width of a water pipe:
- Bigger pipe (more bandwidth) → more water (data) flows at once.
- Smaller pipe → less water flows, might choke if too much is sent.
🔄 How Bitrate Consumes Bandwidth
To watch a video, your internet must handle at least the bitrate of the stream.
👉 Example:
- If a video is encoded at 5 Mbps and your internet bandwidth is 10 Mbps, playback is smooth.
- If your internet bandwidth is only 2 Mbps, you’ll see buffering (can’t keep up).
So:
- Bitrate is demand (how much data the video needs).
- Bandwidth is supply (how much your network can deliver).
⚡ Real-world Example (YouTube/Netflix)
- 1080p HD → ~5 Mbps bitrate
- 4K Ultra HD → ~15–25 Mbps bitrate
- Audio only → ~128 kbps bitrate
- If your Wi-Fi bandwidth is 20 Mbps: You can easily watch 4K video.
- If your Wi-Fi bandwidth is 3 Mbps: You’ll have to drop to 480p or 360p to avoid buffering.
✅ In short:
Bitrate = video’s hunger for data per second.
Bandwidth = how much food (data) your internet pipe can deliver per second.
The video will only play smoothly if bandwidth ≥ bitrate.
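The supply-vs-demand rule can be captured in a few lines of Python. This is a toy model, and the bitrate ladder below is an assumed example, but the selection logic mimics what an adaptive player does:

```python
# Toy model of bitrate (demand) vs bandwidth (supply).
# The bitrate ladder below is an assumed example, not a standard.
RENDITIONS_MBPS = {"1080p": 5.0, "720p": 3.0, "480p": 1.5, "360p": 0.8}

def plays_smoothly(bitrate_mbps, bandwidth_mbps):
    # Smooth playback requires bandwidth >= bitrate.
    return bandwidth_mbps >= bitrate_mbps

def pick_rendition(bandwidth_mbps):
    # Adaptive-player behavior: highest bitrate the connection can sustain.
    playable = {name: b for name, b in RENDITIONS_MBPS.items() if b <= bandwidth_mbps}
    return max(playable, key=playable.get) if playable else None
```

With 3.5 Mbps of bandwidth this picks "720p"; with 0.5 Mbps nothing in the ladder fits and a real player would buffer or fail.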
-----------
-----------
📦 What is Object Storage?
-> Object storage is a way of storing data (like videos, images, documents) in the cloud as objects, not as files in folders or rows in a database.
Each object has:
- Data (the actual video, image, etc.)
- Metadata (info about the file — size, type, created date, etc.)
- Unique ID (key) to find it
👉 Example: Instead of saying “video.mp4 is in folder X”, you say “give me object with ID 12345”.
🔹 How It Works
- You upload your file → object storage saves it as an object.
- You don’t care where it’s physically stored — the cloud provider handles it.
- You access it via a URL or API.
✅ Why It’s Useful
- Scalable: Can store billions of objects (good for YouTube-like apps).
- Cheap: You only pay for the space used.
- Durable: Data is stored in multiple locations, so it rarely gets lost.
- Easy access: Files can be fetched over HTTP.
🌍 Examples of Object Storage
- Amazon S3 (Simple Storage Service)
- Google Cloud Storage
- Azure Blob Storage
- MinIO (open-source, S3-compatible)
🆚 File System vs Database vs Object Storage
- File system → good for local small-scale use (folders, files).
- Database → good for structured data (users, transactions).
- Object storage → good for unstructured, huge data (videos, images, backups).
🎯 In Your VOD System
- When users upload videos → store the original file in object storage (like S3).
- After FFmpeg processes them into chunks + playlists → also save them in object storage.
- Then connect a CDN (CloudFront, Cloudflare) to deliver them worldwide.
👉 In one line:
Object storage = cloud “bucket” where you drop your files (as objects) and fetch them later via URL.
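The data + metadata + unique key idea can be modeled with a toy in-memory store. This is a sketch of the concept only — real systems like S3 add durability, replication, auth, and HTTP access on top:

```python
# Toy in-memory "object storage": each object = data + metadata, found by a key.
# Illustrative only; method names mirror the S3-style put/get/head idea.
class ObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        metadata.setdefault("size", len(data))
        self._objects[key] = {"data": data, "metadata": metadata}

    def get(self, key):
        # Fetch the object's payload by its unique key.
        return self._objects[key]["data"]

    def head(self, key):
        # Metadata only, no payload (like an S3 HEAD request).
        return self._objects[key]["metadata"]

store = ObjectStore()
store.put("videos/12345/segment0.ts", b"\x47...", content_type="video/mp2t")
```

Note how the caller never says "which folder/disk" — only the key ("videos/12345/segment0.ts") matters, which is exactly the object-storage access pattern.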
-----------
-----------
🎥 Video Codecs
-> A codec = COmpressor + DECompressor → it compresses video so it’s smaller to store/send, then decompresses it for playback.
🎬 Why do we use Codecs?
- Raw video and audio are huge. A single minute of uncompressed 1080p video can be several GBs.
👉 That’s impossible to store, upload, or stream efficiently.
- So, codecs compress the data while keeping quality good enough.
In your system (VOD or Live):
- Streamer (or uploader): The video is encoded (compressed) before sending — OBS, WebRTC, or phone camera usually does this.
- Server (processing with FFmpeg): If needed, re-encode into different bitrates/qualities (1080p, 720p, 480p).
- Viewer (player/browser): The codec decompresses the stream so it can play smoothly.
👉 So codecs are used at all three stages: encode → transmit/store → decode.
🎥 Where Video is Converted into H.264 (or other codecs)
1️⃣ At the Source (Client Side)
- Most of the time, video is already compressed before leaving your device.
- Your webcam, screen capture, or OBS software will encode the raw video into H.264 (video) + AAC (audio) before sending.
- Reason: Raw video is HUGE (gigabytes per minute). You can’t send that over the internet.
👉 Example:
- OBS encodes your gameplay in H.264 + AAC and then pushes it to the server using RTMP.
- WebRTC in the browser uses built-in codecs (VP8/VP9 or H.264) before sending.
2️⃣ On the Server (Transcoding Stage)
Even if the client already sends H.264, the server often re-encodes:
- To generate multiple resolutions/bitrates (1080p, 720p, 480p).
- To ensure compatibility across devices (phones, TVs, browsers).
- To apply compression settings (reduce bandwidth).
👉 This step is usually handled by FFmpeg or a media server (e.g., Wowza, Ant Media, Janus, LiveKit).
But in case of VOD:
Client Side:
- When you upload to YouTube, your browser/app just sends the video file as it is over HTTP/HTTPS to YouTube’s servers.
- There is no special compression done before upload (other than whatever codec/format your file already has — e.g., your screen recorder may have already encoded it as MP4/H.264). So if your video file is 1 GB on disk, that 1 GB is uploaded.
Once the file is uploaded, YouTube’s backend takes over:
- Decode → Read your uploaded file (whatever codec/container it uses: MP4, MOV, etc.).
- Re-encode (transcode) into standard formats:
- Video → H.264/VP9/AV1 (multiple qualities: 1080p, 720p, 480p, etc.)
- Audio → AAC/Opus
- Package into streaming formats (HLS/DASH).
- Store the processed chunks/playlists on their servers/CDNs.
- Make it available to viewers with adaptive bitrate streaming.
Different Codecs (video compression standard):
1. H.264 (AVC – Advanced Video Coding)
- The most common video codec today.
- Used by YouTube, Netflix, Zoom, and almost every device.
- Balance: Good quality + reasonable file size.
- Supported everywhere (browsers, phones, TVs).
👉 Think of it as the default video format for the internet.
2. H.265 (HEVC – High Efficiency Video Coding)
- Successor to H.264.
- Same quality at ~50% smaller size.
- Great for 4K/8K videos (saves bandwidth).
- But: More CPU/GPU power needed and not all devices support it (licensing issues).
👉 Used in 4K streaming, Blu-ray, and some Apple devices.
3. VP9
- Made by Google as an open-source alternative to H.265.
- Also gives smaller file sizes than H.264.
- Widely used in YouTube 4K/HD streaming.
- Supported in Chrome, Firefox, Android, but not all hardware.
👉 Google’s “free” competitor to H.265.
4. AV1
- AV1 (newest, by Alliance for Open Media: Google, Netflix, Amazon, Microsoft, etc.) → free and even more efficient than H.265.
- Downside: needs more CPU/GPU power to encode.
🎵 Audio Codec
- Audio codecs compress sound like video codecs compress video.
1. AAC (Advanced Audio Coding)
- Standard audio codec for streaming.
- Successor to MP3 (better quality at same bitrate).
- Used in YouTube, Spotify, iTunes, and all video streaming.
- Works well with H.264/H.265/VP9.
👉 Think of it as the MP3 of video platforms.
✅ Simple Analogy
- H.264 / H.265 / VP9 = different ways of packing your video clothes into a suitcase → smaller suitcase = cheaper to ship.
- AAC = the way you pack your audio clothes into a smaller bag.
Together, they make videos streamable without eating too much bandwidth.
⚡ Example:
- A 10-minute uncompressed 1080p video = hundreds of GBs ❌
- Same video with H.264 + AAC = maybe 100 MB ✅
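The savings can be checked with rough arithmetic, assuming 3-byte RGB pixels, 30 fps, and an assumed modest H.264 target bitrate (the exact numbers depend heavily on content and settings):

```python
# Rough size arithmetic for 10 minutes of 1080p video.
width, height = 1920, 1080
bytes_per_pixel = 3            # uncompressed RGB, 8 bits per channel
fps, seconds = 30, 10 * 60

raw_bytes = width * height * bytes_per_pixel * fps * seconds
print(raw_bytes / 1e9)         # ~112 GB of raw frames

h264_bitrate_bps = 1_300_000   # an assumed modest H.264 target bitrate
encoded_bytes = h264_bitrate_bps / 8 * seconds
print(encoded_bytes / 1e6)     # ~97.5 MB — roughly the "100 MB" figure above
```

That is a compression ratio of over 1000:1, which is why raw video never travels over the internet.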
-----------
-----------
Raw Video and Audio
🎞 What are Raw Video Frames?
- A video is just a sequence of images shown very fast (like 30 or 60 per second).
- Each of those images is called a frame.
- A raw video frame = the full uncompressed image data for that moment in time.
👉 Example:
- A single 1920×1080 (1080p) raw frame ≈ 6 MB (RGB).
- 30 frames per second → 180 MB per second of raw video 😲
- That’s why raw video is huge and needs compression (H.264, VP9, etc.).
🎵 What is Raw Audio?
- Audio is a continuous waveform of sound.
- A raw audio sample = the uncompressed digital values representing the waveform (often called PCM – Pulse Code Modulation).
👉 Example:
- CD-quality raw audio (44.1 kHz, 16-bit, stereo) = 1.4 Mbps.
- Compressed as AAC/MP3 → ~128 kbps (much smaller).
🔹 Why Do We Decode to Raw?
- Because codecs (H.264, AAC, etc.) are like secret codes (compressed formats).
- If you want to edit, resize, re-encode, or transcode the video/audio → you must first decode it into its raw form (frames + samples).
- Once raw, FFmpeg can:
- Apply filters (resize, crop, add watermark).
- Change codecs (e.g., VP9 → H.264).
- Create different bitrates/resolutions.
👉 Analogy:
- A compressed video (H.264) is like a zipped file.
- To change or repack it, you must unzip (decode) to the original contents → make edits → then zip (encode) again.
-----------
-----------
🎬 FFmpeg Decoding
🎬 Why Do We Need to Decode in FFmpeg?
1️⃣ Different Input Formats
- Users upload all kinds of files: MP4, MOV, MKV, AVI…
- Inside, these can have different codecs: H.264, VP9, HEVC, ProRes, etc.
- To standardize them, FFmpeg must understand (decode) whatever codec is used → into raw frames/audio.
👉 Example: Your phone records in HEVC, another camera records in ProRes → FFmpeg can’t mix/convert them unless it first decodes them to raw.
2️⃣ Apply Processing or Filters
- You can’t resize, crop, watermark, or adjust brightness on compressed video directly.
- Those operations need raw frames (like editing photos).
- So FFmpeg decodes to raw → applies changes → then re-encodes.
- 👉 Example: If you want 1080p and 720p versions: Decode original → resize raw frames → re-encode at new resolutions.
3️⃣ Change Codec or Bitrate (Transcoding)
- If the input codec ≠ output codec, you must decode first.
- Example:
- Input: VP9
- Output: H.264 (for better compatibility)
- Step: Decode VP9 → raw → encode H.264
👉 Same goes for changing bitrate (say 8 Mbps → 2 Mbps).
4️⃣ Guarantee Compatibility
- Even if you upload H.264, it might have weird settings (wrong profile, unusual GOP size, bad keyframes).
- Platforms (YouTube, Netflix, etc.) re-decode and re-encode everything to enforce their standard pipeline.
- 👉 That’s why YouTube always says “Processing…” after upload.
✅ In Short
We need to decode because:
- Uploaded videos come in many codecs and formats.
- To edit, filter, or resize, we must work with raw frames.
- To transcode into new codecs/bitrates, we need to uncompress first.
- To ensure consistency and compatibility across all devices.
💡 Analogy:
- A compressed video is like a zipped folder.
- If you want to reorganize files inside, you must unzip (decode) → make changes → zip (encode) again
🔹 Uploaded video ≠ raw
- When you upload a video (MP4, MOV, MKV, etc.), it is already compressed by some codec (H.264, H.265, VP9, etc.).
- That’s why the file size is manageable (e.g., 500 MB instead of 50 GB).
- Your camera, screen recorder, or editing software already encoded it before saving.
🔹 What “Decode the input” really means
- The uploaded video is in a compressed format (H.264, AAC, etc.).
- FFmpeg must decode (unpack) those compressed streams into raw frames + raw audio samples inside memory.
- Once raw, FFmpeg can process, filter, resize, or re-encode.
👉 The uploaded file itself is not uncompressed, but FFmpeg temporarily uncompresses it in memory to work on it.
🔹 Why not just copy without decoding?
Sometimes we can!
- If the uploaded video is already in H.264 + AAC (the exact format we want), FFmpeg can skip decoding and just re-package (called remuxing).
- Example: ffmpeg -i input.mp4 -c copy output.mkv
- Here FFmpeg doesn’t decode/encode, it just puts the compressed streams into a new container.
- But in most streaming platforms (like YouTube):
- They re-encode everything to guarantee compatibility, multiple resolutions, and consistent settings.
- So decoding → raw → re-encode is necessary.
✅ In Short:
- Uploaded video = already compressed (H.264, VP9, etc.).
- FFmpeg decode step = temporarily uncompresses it into raw frames/audio in memory.
- Then re-encodes into the platform’s preferred codec/resolutions.
🎬 Example: User Uploads an MP4 Video
📤 Step 1: Upload
- User uploads video.mp4 to your server.
- Inside that MP4:
- Video track → H.264 (compressed frames)
- Audio track → AAC (compressed audio samples)
- The file is already compressed (not raw).
⚙️ Step 2: FFmpeg Reads & Decodes
- FFmpeg opens the MP4 file and decodes:
- H.264 video → expanded into raw video frames (bitmaps of each frame).
- AAC audio → expanded into raw PCM samples (waveform data).
👉 Example (dumping raw video to disk): ffmpeg -i video.mp4 -f rawvideo -pix_fmt yuv420p output.yuv
- This would literally dump raw frames (huge files). Normally, you don’t save raw — you just hold it in memory.
- Raw video format inside FFmpeg → often YUV420p (pixel data).
- Raw audio format inside FFmpeg → often PCM (Pulse Code Modulation) samples.
🔄 Step 3: Processing (Optional)
While FFmpeg has the raw frames in memory, it can:
- Resize (1080p → 720p, 480p).
- Apply filters (watermark, brightness, crop).
- Change frame rate (60 fps → 30 fps).
👉 Example: ffmpeg -i video.mp4 -vf scale=1280:720 output.mp4
- Decodes → resizes raw frames → re-encodes.
📦 Step 4: Re-Encode
- After processing, FFmpeg re-encodes the raw frames/audio into your target format.
- For streaming systems:
- Video → H.264 (universal)
- Audio → AAC (universal)
👉 Example:
ffmpeg -i video.mp4 \
-c:v libx264 -b:v 3000k -c:a aac -b:a 128k output.m3u8
- This decodes input → re-encodes into H.264/AAC → packages as HLS.
✅ Final Output
Now you have streaming-friendly files: Video chunks (segment0.ts, segment1.ts …) and Playlist (.m3u8) All encoded in H.264 (video) + AAC (audio).
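The decode → re-encode → package step is usually driven by one ffmpeg invocation per quality. A sketch that assembles such a command for use with subprocess (flag values like -hls_time 6 are common choices, not requirements):

```python
import shlex

def hls_transcode_cmd(src, height, v_bitrate, a_bitrate, playlist):
    """Build an ffmpeg command: decode src, scale, re-encode to H.264/AAC, package as HLS."""
    return shlex.split(
        f"ffmpeg -i {src} "
        f"-vf scale=-2:{height} "   # -2 = auto width, kept even and aspect-correct
        f"-c:v libx264 -b:v {v_bitrate} "
        f"-c:a aac -b:a {a_bitrate} "
        f"-hls_time 6 -hls_playlist_type vod {playlist}"
    )

cmd = hls_transcode_cmd("video.mp4", 720, "3000k", "128k", "720p.m3u8")
# Run with: subprocess.run(cmd, check=True)
```

Running this once per target height (1080, 720, 480, …) produces the per-quality playlists and .ts chunks described above.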
-----------
-----------
what is meant by 720p, 1080p, ...
When you hear 720p, 1080p, 4K, etc., it’s talking about video resolution — basically how many pixels (tiny dots) make up the picture.
720p (HD):
- The “720” means 720 pixels tall (height).
- Standard width is 1280 pixels.
- So resolution = 1280 × 720 pixels.
1080p (Full HD):
- Height = 1080 pixels.
- Width = 1920 pixels.
- So resolution = 1920 × 1080 pixels.
👉 Bigger numbers = more pixels = sharper video (if your screen and internet can handle it).
The “p” in 720p or 1080p stands for progressive scan.
- In progressive scan, every frame of the video is drawn line by line from top to bottom in one go.
- This is different from “i” (interlaced scan), like 1080i, where the picture is drawn in two passes: first the odd lines, then the even lines.
So:
- 720p → video has 720 horizontal lines, shown progressively (all at once per frame).
- 1080p → video has 1080 horizontal lines, also shown progressively.
👉 The “p” basically means the image looks smoother and sharper, especially for fast motion, because the whole frame updates at once.
When we say 720 lines or 1080 lines, we are talking about the number of horizontal rows of pixels that make up the picture.
Think of your screen like graph paper:
- Each little square is a pixel.
- The rows across the screen (from left to right) are the lines.
So:
- 720p means the image has 720 rows of pixels stacked from top to bottom.
- 1080p means it has 1080 rows of pixels.
Example:
- 720p usually = 1280 × 720 pixels (1280 columns × 720 rows).
- 1080p = 1920 × 1080 pixels.
👉 More lines (rows) = more detail, because the screen has more pixels to show the picture.
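The pixel counts multiply out as follows:

```python
# Total pixels per frame at each resolution.
hd = 1280 * 720        # 921,600 pixels (720p)
full_hd = 1920 * 1080  # 2,073,600 pixels (1080p)
print(full_hd / hd)    # 2.25 -> 1080p has 2.25x the pixels of 720p
```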
-----------
-----------
🎬 FFmpeg Processing
Processing step (happens on raw data, because compressed data is hard to modify directly):
- Resize video (e.g., 1080p → 720p).
- Change frame rate (e.g., 60fps → 30fps).
- Apply filters (watermark, brightness, crop, etc.).
- Adjust audio (volume, remove noise, sync).
When we talk about resizing (scaling) a video, you can actually go both ways:
🔽 Downscaling (common case):
- Convert higher resolution → lower resolution.
- Example: 4K (3840×2160) → 1080p (1920×1080).
- This saves bandwidth and storage. Most streaming services do this to provide multiple qualities (adaptive streaming).
🔼 Upscaling (possible, but limited):
- Convert lower resolution → higher resolution.
- Example: 720p (1280×720) → 1080p (1920×1080).
- But here, you don’t magically get extra detail — FFmpeg just stretches and interpolates pixels, so it looks bigger but not sharper. Some advanced AI upscalers (like Topaz Video Enhance AI) try to add detail, but normal FFmpeg upscaling can look blurry.
📌 Where resolution changes
- Changing resolution (480p → 720p → 1080p) = resizing filter.
- This belongs to the processing stage (because it’s a transformation of raw frames).
- But it’s always paired with re-encoding, because raw frames are too huge to store/stream.
So the flow is:
- Decode → raw frames.
- Processing (resize) → raw frames resized to target resolution.
- Encoding → compress those resized frames into H.264/VP9/etc.
✅ That’s why we usually say “resolution conversion happens during encoding,”
because in practice, FFmpeg does resize + encode in one command.
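Downscaling preserves the aspect ratio, which is where widths like 854 and 640 come from. A sketch of the width calculation (rounding up to an even width — an assumption added here, since H.264 with yuv420p needs even dimensions):

```python
def scaled_width(src_w, src_h, target_h):
    """Width that preserves aspect ratio at target_h, rounded to an even number
    (H.264 with yuv420p requires even dimensions)."""
    w = round(src_w * target_h / src_h)
    return w if w % 2 == 0 else w + 1

# From a 1920x1080 (16:9) source:
print(scaled_width(1920, 1080, 720))  # 1280
print(scaled_width(1920, 1080, 480))  # 854
print(scaled_width(1920, 1080, 360))  # 640
```

This is also why ffmpeg’s scale filter accepts -2 for one dimension: it computes the aspect-correct value and keeps it divisible by 2 for you.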
-----------
-----------
YUV420p and PCM
🎨 YUV420p (raw video format)
- A way to represent video frames.
- Splits image into:
- Y = brightness (luma)
- U & V = color (chroma)
- "420" means: color info is stored at 1/4 the resolution of brightness → saves space without hurting quality much.
- This is how raw video frames are usually handled inside video tools.
👉 Think of it as: a raw picture of every frame, stored efficiently.
🎵 PCM (Pulse Code Modulation, raw audio format)
- A way to represent sound as numbers.
- Records the amplitude of sound waves many times per second (samples).
- Example: CD audio = 44,100 samples per second.
- It’s uncompressed → very big, but simple and accurate.
👉 Think of it as: a raw recording of the sound wave.
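The space saving from 4:2:0 subsampling is concrete: Y is stored per pixel, while U and V are each stored once per 2×2 block of pixels:

```python
# Bytes per 1080p frame: RGB vs YUV420p.
pixels = 1920 * 1080

rgb_bytes = pixels * 3              # 3 bytes per pixel
y_bytes = pixels                    # 1 byte of luma per pixel
uv_bytes = 2 * (pixels // 4)        # U and V each at 1/4 resolution
yuv420p_bytes = y_bytes + uv_bytes  # 1.5 bytes per pixel overall

print(rgb_bytes, yuv420p_bytes)     # 6220800 vs 3110400 -> half the size
```

Half the bytes before any codec even runs — which is why raw frames inside video tools are almost always YUV420p rather than RGB.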
-----------
-----------
🎬 FFmpeg Encoding
When FFmpeg does the encoding step, it usually converts the raw video/audio into widely supported codecs so that almost any device/browser can play them.
Common Codecs After Encoding:
🎥 Video:
- H.264 (AVC) → most common, supported everywhere (phones, browsers, TVs).
- H.265 (HEVC) → newer, smaller file size, but licensing + less browser support.
- VP9 / AV1 → open-source alternatives (Google’s VP9, newer AV1).
🔊 Audio:
- AAC → most common audio codec for MP4/HLS/DASH.
- MP3 (older, less efficient).
- Opus (great quality, used in WebRTC, some streaming).
👉 So, if you upload an .mp4 file to YouTube:
1) YouTube decodes it to raw frames & samples.
2) Then re-encodes it into multiple versions like:
- 1080p H.264 + AAC
- 720p H.264 + AAC
- 480p H.264 + AAC
- … maybe also VP9/AV1 for better compression.
3) Packs them into containers (.mp4 for VOD, .ts for HLS chunks).
That way, your video is playable across all browsers, TVs, and phones.
⚡ So to answer you simply:
The encoding step usually outputs H.264 video + AAC audio, because that pair is universal and safe.
🎬 Step 1: Upload
You upload video.mp4 (say, H.264 codec, AAC audio, 1080p resolution).
The server receives the file and stores it temporarily.
🎬 Step 2: FFmpeg starts processing
FFmpeg pipeline will look like this:
(a) Decode
- FFmpeg reads video.mp4.
- It decodes the compressed H.264 video into raw frames (YUV420p format usually).
- Audio is decoded into raw PCM samples.
👉 Now we have the video as raw images + raw audio samples.
(b) Processing (resizing into 4 versions)
FFmpeg duplicates the raw video stream into 4 branches:
- 1080p branch → keep same resolution.
- 720p branch → scale down raw frames from 1920×1080 → 1280×720.
- 480p branch → scale down to 854×480.
- 360p branch → scale down to 640×360.
⚡ This happens via FFmpeg’s scale filter (e.g., -vf scale=1280:720).
(c) Encoding
Each branch is encoded (compressed) again into a codec.
- Video → H.264 (common for playback).
- Audio → AAC (common for playback).
Bitrate is set for each version (e.g., 1080p = 5 Mbps, 720p = 3 Mbps, 480p = 1.5 Mbps, 360p = 800 kbps).
So now you have 4 compressed streams:
- 1080p_h264_aac.mp4
- 720p_h264_aac.mp4
- 480p_h264_aac.mp4
- 360p_h264_aac.mp4
(d) Packaging
Now, depending on what you want:
1) If VOD (progressive download): Just keep the 4 MP4 files.
2) If adaptive streaming (HLS/DASH):
- Split each MP4 into chunks (e.g., .ts files of 6s each).
- Generate a playlist (.m3u8 for HLS or .mpd for DASH) that tells the player which resolution versions are available.
- Player can switch between resolutions dynamically.
Example:
- 1080p_000.ts, 1080p_001.ts, …
- 720p_000.ts, 720p_001.ts, …
- master.m3u8 (index file that lists all 4 versions).
📌 Final Flow (stepwise)
- Upload: video.mp4 (1080p).
- Decode: compressed H.264 → raw frames (YUV) + raw PCM audio.
- Process: apply filters → resize into 1080p, 720p, 480p, 360p.
- Encode: compress each with H.264 + AAC, assign different bitrates.
- Package: save as MP4 (for direct VOD) OR chunk + playlist (for HLS/DASH).
1) FFmpeg decodes your uploaded MP4 into raw frames (images in YUV420p) and raw audio (PCM).
- These raw frames are not written to disk (too huge!).
- They stay in RAM (memory).
2) Filters (processing stage) run in memory.
- FFmpeg takes each decoded frame in RAM.
- Then it applies the scale filter multiple times:
- One copy stays 1080p.
- Another copy gets resized to 720p.
- Another to 480p.
- Another to 360p.
So at this moment, 4 different versions of the same raw frame exist in memory.
3) Encoders then compress each branch (H.264 + AAC) immediately while still in RAM.
- Encoded chunks or MP4 files are written out to disk/network.
🔄 How it actually works:
1) FFmpeg reads a small chunk of the video file.
2) It decodes that part → raw video frame + raw audio samples.
3) That raw frame goes through the filters (resizing → 1080p, 720p, 480p, 360p).
4) Each resized copy exists only briefly in memory.
5) Each copy is immediately sent to the encoder, compressed, and written to output (disk or network).
6) Once a frame is finished, FFmpeg discards it from RAM and moves to the next frame.
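The split-into-branches step above maps onto ffmpeg’s split and scale filters. A sketch that builds the -filter_complex graph string (the pad labels like [v0out] are arbitrary names chosen here, not ffmpeg requirements):

```python
def split_and_scale_filter(heights):
    """Build an ffmpeg -filter_complex graph: split the decoded stream
    into one branch per target height, then scale each branch."""
    n = len(heights)
    graph = [f"[0:v]split={n}" + "".join(f"[v{i}]" for i in range(n))]
    for i, h in enumerate(heights):
        graph.append(f"[v{i}]scale=-2:{h}[v{i}out]")
    return ";".join(graph)

print(split_and_scale_filter([1080, 720, 480, 360]))
# [0:v]split=4[v0][v1][v2][v3];[v0]scale=-2:1080[v0out];...
```

Each labeled output pad ([v0out], [v1out], …) is then mapped to its own encoder, so all four renditions come out of a single decode pass, exactly as described above.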
-----------
-----------
🎥 What is Progressive Download and how it works?
- Imagine you upload one MP4 file (say a 720p video).
- When a user clicks play, the video starts downloading from the server just like any other file (like downloading a PDF).
- BUT — modern video players (like HTML5 <video> tag) don’t wait for the whole file to finish downloading.
- Instead:
- As soon as the first part of the file arrives, playback begins.
- While the user watches, the rest of the file keeps downloading in the background.
- This is why it’s called progressive → because playback progresses as download progresses.
⚡ Key Characteristics:
✅ Simple: Only one MP4 per resolution.
✅ Works on most browsers without special setup.
❌ Not adaptive: If a user’s internet slows down, the video may buffer (pause).
❌ Higher bandwidth waste: If a user only watches the first 2 minutes, the player might still download the whole file.
🔄 Comparison with Adaptive Streaming (HLS/DASH):
- In adaptive streaming, the video is cut into small chunks (e.g., 6s .ts files).
- The player chooses which resolution chunk to download based on your internet speed (720p if fast, 360p if slow).
- If network conditions change mid-playback, the player can switch resolution seamlessly.
Example:
- Progressive: One full MP4 file → plays start to finish.
- Adaptive: Many small chunks → player selects best chunk quality dynamically.
🧩 How Video is Divided into Chunks (for adaptive streaming)
Let’s say you have a 1-minute video:
- If chunk size = 10 seconds → the encoder splits it into 6 chunks.
- Each chunk is a self-contained mini video segment (with its own audio+video+metadata).
- Plus, a playlist file (.m3u8 for HLS) is created which lists the order of chunks.
- The player just reads this playlist and fetches chunks one by one.
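The chunk arithmetic is simple; the ceiling accounts for a final chunk that may be shorter than the rest:

```python
import math

def chunk_count(duration_s, chunk_s):
    # Number of segments a video of duration_s splits into.
    return math.ceil(duration_s / chunk_s)

print(chunk_count(60, 10))  # 6 chunks for a 1-minute video
print(chunk_count(65, 10))  # 7 -> the last chunk is only 5 seconds long
```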
👉 So, in short:
- Progressive download = one big MP4, downloaded + played progressively.
- Adaptive streaming = many small chunks, playlist tells which chunk to play, and resolution can change mid-stream.
🔹 How Progressive Download Works