
Hey everyone, after being selected for GSoC 2020, I have a great summer full of open source development ahead. This is the second weekly check-in, where I share my experience of getting involved in development.

Hello everyone, I'm back with a new weekly check-in to share my experience of the ongoing Community Bonding Period of Google Summer of Code 2020. As you may know, I'm working with Kiwix on improving the new Python scrapers, which make the best content on the internet easily accessible by archiving it. So let's get started.

What am I up to this week?

This week, I iterated a bit on the video module in zimscraperlib, and it should now be ready to make it to master. We made a lot of improvements to it, making it more general and easier to use. I also had a discussion about the state of the Project Gutenberg scraper, and we devised a plan to move forward with it. Finally, I tried to improve the language info module in zimscraperlib so that language filtering works better in scrapers like ted2zim.

The number of FFmpeg options and the number of languages in the world are quite comparable.

As for the concrete work done, I have already opened pull requests for the improvements to language detection and to the video module in zimscraperlib, and I am waiting for them to be reviewed. Meanwhile, I've also worked on the ted and youtube scrapers locally so that they support the newer changes, and I now have a script to create multiple ZIM files.

What's ahead now?

Once all our recent changes make it to master, we're planning a new release of zimscraperlib and of the scrapers that use it. We're also planning to test the TED scraper that I rewrote on the zimfarm, so that we can see how it behaves in a real-world scenario. In the long term, we plan to work on improving the Project Gutenberg scraper.

Challenges I faced

This week was a great learning experience, and I came to know two things: 1. FFmpeg has a lot of options, and 2. FFmpeg is not an easy beast to please (these were my mentor's exact words).

I came to know that the best way to deal with a wrapper for FFmpeg in Python is to keep it simple and let the user play with FFmpeg itself.
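To make the "keep it simple" idea a bit more concrete, here is a minimal sketch of what such a thin wrapper could look like. This is not the actual zimscraperlib code: the function names `build_ffmpeg_command` and `reencode`, the file names, and the example option list are all hypothetical and only illustrate the approach. The wrapper adds the input, the output, and the overwrite flag, and every encoding option comes straight from the caller.

```python
import pathlib
import subprocess
from typing import List, Sequence


def build_ffmpeg_command(src, dst, ffmpeg_args: Sequence[str]) -> List[str]:
    """Assemble an ffmpeg command line (hypothetical helper, not zimscraperlib's API).

    The wrapper only adds the input file, the output file and -y (overwrite);
    all encoding options are passed through untouched from the caller.
    """
    return ["ffmpeg", "-y", "-i", str(src), *ffmpeg_args, str(dst)]


def reencode(src, dst, ffmpeg_args: Sequence[str]):
    """Run ffmpeg; raises subprocess.CalledProcessError on a non-zero exit status."""
    command = build_ffmpeg_command(src, dst, ffmpeg_args)
    return subprocess.run(command, check=True, capture_output=True, text=True)


# Example options picked by the user, not by the wrapper (assumed values):
# VP8 video at a low bitrate and Vorbis audio, written to a WebM container.
low_quality_webm = [
    "-codec:v", "libvpx",
    "-b:v", "300k",
    "-codec:a", "libvorbis",
    "-ar", "44100",
]

command = build_ffmpeg_command(
    pathlib.Path("talk.mp4"), pathlib.Path("talk.webm"), low_quality_webm
)
print(" ".join(command))
```

Calling `reencode("talk.mp4", "talk.webm", low_quality_webm)` would then actually run FFmpeg, provided it is installed and the input file exists. Since the wrapper never interprets the options, anything FFmpeg accepts can be passed through without changing the wrapper.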

Verdict

So, overall, this week was quite exciting and taught me a lot of new things. I'm very thankful to Kiwix for entrusting me with this responsibility, and to my mentors, who have always shown me the right approach to take. GSoC is an exciting journey and has already impacted me in multiple awesome ways. I'm super excited to experience what's coming.