Recently I learned how to stream a YouTube video’s audio in MP3 format using nodejs. I wanted to publish my findings and show how to accomplish this.
First, let’s build this the non-streaming way, and then I will show you how to upgrade to a streaming method. You must install the powerful ffmpeg via your package management system of choice in order to play along.
I’ll be using nodejs to do this because it offers us a very easy way to stream bits around. Despite the recent controversy, I’m still quite partial to coffee-script due to its terse syntax and clean output, so I’m going to use that for most of the new code. Let’s also use the wonderful expressjs framework to get us started really quickly. This will set up our project skeleton.
You can now convert the skeleton into coffee-script if you like, but I leave this as an exercise for the reader. For the sake of time I simply spawn another shell and do the following in it:
Then in yet another shell we can:
And we’re rocking the “hello world!” page.
Now, we have to add another dependency to the project. It turns out that getting a good, uncompressed FLV that ffmpeg can digest into something we can use is a giant pain. There are cookies that need setting, HTML that needs parsing, and language detection to accomplish. Luckily for us, misery loves company and the hard work of turning a normal youtube video url ( http://www.youtube.com/watch?v=:youtube_video_id ) into an ffmpeg-edible FLV can be done by youtube-dl. It requires python, but is otherwise dependency free. This means that we can focus on building something quickly. Assuming you’ve got a sane python accessible on your system, install it to the root of the project like so:
Let’s use it from the command line and see how it works. I’ve plucked a random short video clip that’s holiday themed for the demonstration. Remember, we just want the audio, and we want it in MP3.
Playing the resulting mayCvk2P4f0.mp3 results in Arnold’s voice demanding that we release the cookie. Excellent! Let’s make an API for this!
The only variable in all of this is the alphanumeric video id at the end of the youtube video URL. I’m going to call this the :youtube_video_id from now on. Since we’ll be using a child process to launch youtube-dl, and we’re going to need to write the resulting file back to the browser, we’ll need the native nodejs modules ‘child_process’ and ‘fs’. Knowing this, let’s add the following code to routes/coffee/index.coffee:
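For anyone following along in plain JavaScript instead of coffee-script, here is a minimal sketch of what such a non-streaming route could look like. Treat it as an illustration under assumptions, not as the exact listing: `buildDownloadArgs` and `youtubeMp3` are names made up for this sketch, the flags are youtube-dl’s `--extract-audio`, `--audio-format` and `--output`, and the handler assumes `./youtube-dl` lives in the project root and writes `<id>.mp3` into the working directory.

```javascript
// Non-streaming sketch: let youtube-dl produce the MP3, then send the file.
const { execFile } = require("child_process");
const fs = require("fs");
const path = require("path");

// Build the youtube-dl arguments for "audio only, as MP3".
// Assumption: the output template names the file after the video id.
function buildDownloadArgs(videoId) {
  return [
    "--extract-audio",
    "--audio-format", "mp3",
    "--output", "%(id)s.%(ext)s",
    "http://www.youtube.com/watch?v=" + videoId,
  ];
}

// Express-style handler; expects req.params.youtube_video_id to be set.
function youtubeMp3(req, res) {
  const videoId = req.params.youtube_video_id;

  // Assumption: youtube-dl was installed to the root of the project.
  execFile("./youtube-dl", buildDownloadArgs(videoId), (err) => {
    if (err) {
      res.writeHead(500, { "Content-Type": "text/plain" });
      return res.end("conversion failed");
    }

    // Read the finished MP3 in one go and write it to the response.
    const mp3Path = path.join(process.cwd(), videoId + ".mp3");
    fs.readFile(mp3Path, (readErr, data) => {
      if (readErr) {
        res.writeHead(500, { "Content-Type": "text/plain" });
        return res.end("could not read mp3");
      }
      res.writeHead(200, { "Content-Type": "audio/mpeg" });
      res.end(data);
    });
  });
}

module.exports = { buildDownloadArgs, youtubeMp3 };
```

Hooking it into express would then be something like `app.get('/youtube_mp3/:youtube_video_id', youtubeMp3)`. Note that nothing is sent to the browser until both the download and the conversion have finished.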
Let’s add this to our routes in app.js with this line:
Now restart our development server and go to http://localhost:3000/youtube_mp3/mayCvk2P4f0 to see the following in the log. It will probably differ slightly, and as you can see, the internet connection I’m on is terribly slow. If you have something in your browser capable of playing MP3s, you should hear audio.
So, what just happened? We take the id of a video and feed it to youtube-dl, which finds us an appropriate FLV file. This is then fed into ffmpeg, which outputs a static mp3 file, which we then open and read. We should probably upgrade this to cleanly delete the mp3 when it’s done, but it will be better to do this as streaming instead. Then not only will we have no intermediary file to worry about, it will also be a better user experience.
To see what I mean about a better user experience, instead of using a short video, let’s try a long one. I’m going to demonstrate this with an 8:44 long video of New Years 2011 in Times Square. Go to http://localhost:3000/youtube_mp3/GKpRXswgDwU. If you have a fast internet connection, this video may download fully and play before your browser times out, but I doubt you’ll be impressed with the performance of our first attempted solution. I won’t paste the log here, but you’ll see the video is ~133 MB! Way too much data to download all at once and expect reasonable performance. Now, youtube-dl has some options for setting the max quality, and since it defaults to the highest available we could probably do well to turn that down. In fact, if you do end up using this code in production, I would suggest you do that anyway, if only to lower the bandwidth bill.
In order to make this work with larger videos, we can’t just download the whole video up front. We need to be able to simultaneously stream the video from youtube into ffmpeg, and the output of ffmpeg directly into the response.
Let’s step away from the code and get something working in the shell first. To solve the first problem of streaming the FLV into ffmpeg, we can get ffmpeg to take input from stdin, and use a unix pipe to stream in the data output from GET’ing the URL of the FLV. We’re going to need to get the URL that’s used internally by youtube-dl. Lucky for us, ./youtube-dl --help lists a combination of options that outputs this for us. Here’s the incantation to get the FLV URL from a youtube video URL without downloading any video:
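The same lookup can be driven from nodejs. The following is a small sketch of that idea in plain JavaScript, with some assumptions: it uses youtube-dl’s `--simulate` and `--get-url` options, it expects `./youtube-dl` in the project root, and `buildGetUrlArgs`, `parseMediaUrl` and `getMediaUrl` are hypothetical helper names invented for this example.

```javascript
const { execFile } = require("child_process");

// Arguments that ask youtube-dl to print the media URL instead of downloading.
// Assumption: --simulate plus --get-url is the combination of options meant here.
function buildGetUrlArgs(videoId) {
  return ["--simulate", "--get-url", "http://www.youtube.com/watch?v=" + videoId];
}

// youtube-dl prints the URL on its own line; take the first non-empty line.
function parseMediaUrl(stdout) {
  const lines = stdout
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean);
  return lines.length > 0 ? lines[0] : null;
}

// Look up the media URL for a video id and hand it to a node-style callback.
function getMediaUrl(videoId, callback) {
  execFile("./youtube-dl", buildGetUrlArgs(videoId), (err, stdout) => {
    if (err) return callback(err);
    const url = parseMediaUrl(stdout);
    if (!url) return callback(new Error("youtube-dl printed no URL"));
    callback(null, url);
  });
}

module.exports = { buildGetUrlArgs, parseMediaUrl, getMediaUrl };
```

Using `execFile` with an argument array keeps the video id out of a shell command line, which matters once the id comes from a request URL.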
Now with the URL, we can write the new streaming code by first retrieving that URL using a youtube-dl child process. Then we can GET the FLV, and as we receive the file from youtube, we can pipe the data directly into an ffmpeg subprocess. With ffmpeg set to output the data on stdout, we can pipe this data directly to the response.
The two-subprocess, streaming-to-response setup can be done using pure nodejs, but why do that when we have the excellent request library at our disposal? As you will see, this library makes the code a lot shorter and easier to read. Add this as a dependency to the project in the package.json file, and then install it. Once that’s done, let’s replace the whole file of routes/coffee/index.coffee with this:
There are a couple of bits of “magic” that need to be covered in the above code. Digging around in the source of youtube-dl, you’ll see the request to get the FLV has the header ‘Youtubedl-no-compression’ set to ‘True’. I mimic this behavior in order to ensure ffmpeg gets uncompressed FLV data from youtube, as ffmpeg does not support compressed FLVs. A possible upgrade later is to decipher the FLV compression and decompress this on the fly to ffmpeg, so we can use less bandwidth downloading the video. The other bit of magic is the command line args we’re passing to ffmpeg. They are a combination of the same args used by youtube-dl to produce an mp3, as well as telling ffmpeg to take data in from stdin (the ‘-i’ and ‘pipe:0’) and output to stdout (the ‘-’). We’re also using LAME to re-encode the audio to mp3. You may need to adjust this to an encoder available on your system, or install LAME if it isn’t installed.
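To make the piping concrete, here is a rough sketch of the ffmpeg half in plain JavaScript. It is not the listing discussed above: it swaps the request library for node’s core `http`/`https` modules so it has no dependencies, it asks for an uncompressed body with the standard `Accept-Encoding: identity` header, and the bitrate is an assumed value. `buildFfmpegArgs` and `streamAsMp3` are names invented for this sketch.

```javascript
const { spawn } = require("child_process");
const http = require("http");
const https = require("https");

// ffmpeg arguments: read from stdin, drop the video, encode MP3, write to stdout.
// Assumption: the bitrate is illustrative; adjust the encoder to what you have.
function buildFfmpegArgs() {
  return [
    "-i", "pipe:0",          // take input from stdin
    "-vn",                   // ignore the video stream
    "-acodec", "libmp3lame", // re-encode the audio with LAME
    "-ab", "128k",           // audio bitrate (assumed value)
    "-f", "mp3",             // force the output format, since stdout has no extension
    "-",                     // write the result to stdout
  ];
}

// Stream the media at mediaUrl through ffmpeg and into an HTTP response.
function streamAsMp3(mediaUrl, res) {
  const ffmpeg = spawn("ffmpeg", buildFfmpegArgs());
  const client = mediaUrl.startsWith("https:") ? https : http;

  res.writeHead(200, { "Content-Type": "audio/mpeg" });
  ffmpeg.stdout.pipe(res);

  // Ask the server not to compress the body so ffmpeg sees raw media bytes.
  const options = { headers: { "Accept-Encoding": "identity" } };
  const request = client.get(mediaUrl, options, (upstream) => {
    upstream.pipe(ffmpeg.stdin);
  });

  // Minimal cleanup: stop everything if either side fails or the client leaves.
  request.on("error", () => ffmpeg.kill());
  ffmpeg.on("error", () => res.end());
  ffmpeg.stdin.on("error", () => {}); // ignore EPIPE if ffmpeg exits early
  res.on("close", () => {
    request.destroy();
    ffmpeg.kill();
  });
}

module.exports = { buildFfmpegArgs, streamAsMp3 };
```

Because every stage is connected with `pipe`, audio starts reaching the browser as soon as ffmpeg has encoded its first chunk, and no intermediary file is written.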
A few other comments: I dropped the dependency on ‘fs’ as we aren’t reading a file anymore, and I use ‘child_process.exec’ for getting the URL, as the output is short and we fully depend on the completion of this process before moving on. I use ‘child_process.spawn’ for creating the ffmpeg child process because it allows access to the stdin and stdout streams. There are also lots of things that can go wrong that I’m not checking for, but this is a proof of concept code sample anyway, so use at your own risk.
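One cheap check worth sketching is validating the :youtube_video_id before it gets anywhere near a child process, since ‘child_process.exec’ runs its command through a shell. The snippet below is only an illustration: the pattern assumes that video ids are 11 characters made of letters, digits, ‘-’ and ‘_’, which matches the ids used in this post but is not something I have seen formally documented, and `isValidVideoId` and `withValidVideoId` are hypothetical names.

```javascript
// Assumption: ids are 11 characters from letters, digits, "-" and "_".
const VIDEO_ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

function isValidVideoId(videoId) {
  return typeof videoId === "string" && VIDEO_ID_PATTERN.test(videoId);
}

// Wrap an express-style handler so it only runs for plausible video ids.
function withValidVideoId(handler) {
  return (req, res) => {
    if (!isValidVideoId(req.params.youtube_video_id)) {
      res.writeHead(400, { "Content-Type": "text/plain" });
      return res.end("invalid video id");
    }
    return handler(req, res);
  };
}

module.exports = { isValidVideoId, withValidVideoId };
```

A route handler can then be wrapped as `withValidVideoId(handler)` before it is registered, so malformed ids get a 400 response instead of reaching youtube-dl.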
After restarting your development server, go to http://localhost:3000/youtube_mp3/GKpRXswgDwU. What once either timed out or sat for a long time no longer does! Provided everything went OK, we get the expected behavior of streaming the audio through the whole process.
A note about scaling: this setup would be expensive to scale due to the re-encoding, which is CPU bound, and with the code above you’re limited to a single machine. To scale this further, I would use ZeroMQ. On each request for a youtube video to be encoded as audio, use a ZMQ_PUSH socket to give an FLV URL worker a :youtube_video_id to look up the FLV URL for. This worker could then ZMQ_PUSH to another set of workers that download the FLV and re-encode the audio to mp3, while publishing with ZMQ_PUB a response_id and a chunk of data to write to it. This setup will scale horizontally very well, since additional horsepower can be added very easily to the re-encoding layer. An alternate scaling plan could be to use HAProxy to round-robin the full HTTP requests to different machines running the code here, but this would need to be separate from the rest of the web application’s code.
I hope that the exercise was as educational for you as it was for me. Some other things to think about: How do we enhance this to accept “Range” requests so we can use the streamed audio with jPlayer? We’ll need better error handling to use this in production, so where are some good places to add error handling, and what’s worthwhile to validate? Is there any way we can decompress the FLVs on the fly and then stream this output to ffmpeg, to save bandwidth? What’s a good caching strategy for looking up the URLs of the FLVs from youtube? Is it worthwhile to move the URL retrieval process into JS so we can do this in nodejs?