Fansubbing has, since I stopped actively subbing, heavily migrated from an FTP-focused workflow to a Git-based one. I have long harbored suspicions that this was a bad idea on net, but having a recent opportunity to watch a group sub a show convinced me that a longer explanation of my position might be useful.

History

First, I have to explain the historical workflow. Everyone in a group has access to an FTP server, where the workraw[1] and premux[2] are both uploaded to a folder for the episode as they are encoded. The translator will generally type out the initial script into an Etherpad clone[3], along with all the signs and timestamps. The typesetter(s) will complete the signs in the pad, marking them off as they go, and upload the file containing the signs to the FTP when finished, with a filename like showname_06[ts-ryan].ass, along with all the fonts they used. The timer will take the dialog from the pad and time it in Aegisub, splitting where appropriate, and upload a showname_06[time].ass file along with their chosen dialog fonts, usually copying over the OP and ED files from previous episodes as part of this. Editing can happen either in the pad (my preference) or after timing, in which case there’ll also be a showname_06[edit].ass file dumped in the folder. Translation checking, when applicable, is handled both ways as well. Then the QC will merge the typesetting and finalized dialog and upload a showname_06[merged].ass, and they or others will upload showname_06[qc].ass and potentially successive versions with changes in response to the QC pass[4][5]. Finally, someone takes the final version, muxes everything together in mkvtoolnix, makes a torrent, and throws it up on Nyaa. Show released!

Changes

Since then, there have been two major changes to the workflow.

The first major change is migrating to Git for a lot of the process. This was, as best I can tell, done in response to the host of different versioned files that would be dumped on the FTP. Some people felt that version control was a better solution, and Git was chosen for that purpose as it was by this point effectively the standard in software engineering.

The second is automating the actual release process much more heavily, in an effort to stamp out issues like forgetting to mux fonts, incorrect naming, and merging the wrong scripts. This workflow change seems conceptually positive to me; even if I have gripes with some of the specific tools used for it[6], we were previously at a suboptimal level of release automation. I want to be clear that this is theoretically separate from the migration to Git, and the benefits of automated releases can be accrued independently. The existing tools tie the two together, but I consider them overcomplicated and don’t believe the marriage is intrinsic. There’s nothing stopping people from using a standardized naming scheme on FTP and a more straightforward tool to simplify the merge, muxing, and release processes.[7]

My core contention is that because the same person made the general tooling improvements and some of the tooling that enabled a Git workflow, people mentally link them and adopt the less useful portion (Git) unnecessarily. I also claim that the existing tools are overly generic and difficult for most people to use.

Strengths

I’ll start with Git’s theoretical strengths: it offers a single latest version of any file going into a release, with a clear history of changes to it, and enables concurrent editing of a file. If you’re using GitHub, the pull request process also offers a way to review QC changes, creating a middle ground between the QC purely taking notes and the QC applying changes directly. It also creates nice web-based diffs automatically, which makes it generally easier to review others’ work.

Unfortunately, I’m not convinced these are particularly great benefits. Having a single latest version of a file is useful, but wasn’t a big issue in practice on the FTP, even if showname_ep06[qc][final][v3][no but for real this time].ass looks goofy, and could have been dealt with by standardizing naming (enforced by a release bot). Similarly, the concurrent editing is nice in theory but in practice does not come up very often if you’re separating dialog and TS.
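
As a rough illustration of what enforcing a naming scheme could look like, here is a minimal Python sketch. The pattern, the list of stages, and the parse_script_name helper are all assumptions made up for this example, loosely modeled on the showname_06[ts-ryan].ass style above. None of it reflects how any existing bot behaves.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical set of stages; a real group would pick its own.
ALLOWED_STAGES = {"ts", "time", "edit", "merged", "qc"}

# Accepts show_EP[stage].ass or show_EP[stage-person].ass,
# with an optional trailing version tag such as [v2].
NAME_RE = re.compile(
    r"^(?P<show>[a-z0-9]+)_(?P<episode>\d{2})"
    r"\[(?P<stage>[a-z]+)(?:-(?P<person>[a-z0-9]+))?\]"
    r"(?:\[v(?P<version>\d+)\])?\.ass$"
)


class ScriptName(NamedTuple):
    show: str
    episode: int
    stage: str
    person: Optional[str]
    version: int


def parse_script_name(filename: str) -> Optional[ScriptName]:
    """Return the parsed parts, or None if the name breaks the scheme."""
    m = NAME_RE.match(filename)
    if m is None or m["stage"] not in ALLOWED_STAGES:
        return None
    return ScriptName(
        show=m["show"],
        episode=int(m["episode"]),
        stage=m["stage"],
        person=m["person"],
        # A missing version tag is treated as version 1.
        version=int(m["version"] or 1),
    )
```

A bot could call parse_script_name on every upload and reject anything that comes back as None, so the goofy filename above would simply bounce with a message asking for a proper version tag.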

I’m also not sold that the version history matters much at all for subbing. I’ve almost never seen it referenced, due to the fundamentally short-term nature of these releases. In the scenario where you’re looking to change something for the Blu-rays, it’s usually not long after the project, and the staff can just search in the channel to read the discussion, if one ensued. Git does ensure that the version history is never accidentally overwritten, which is much easier to do by accident with an FTP, but if the history doesn’t matter much either way, I don’t consider the occasional mistakes to be of much consequence.

Finally, the pull request process seems mildly beneficial but again not all that great. Most of the changes aren’t actually discussed in context on the site, and instead are discussed in the staff channel in Discord, and even a competent QC will probably leave some changes to the original typesetter, so you’ll end up with the good ol’ process of pinging the typesetter to push changes anyway. This, and similar workflows born of the diffs being intrinsic to Git, is probably the strongest argument for Git, though I personally know a lot of people that don’t like it and don’t take advantage of this at all. It seems great for some groups, particularly ones composed heavily of programmers comfortable with GitHub, but probably not great for the median project.

Problems

Now that I’ve talked about the benefits and why I consider them minimal, what about the downsides?

The biggest one is that Git is really difficult to use. If you think otherwise, you probably don’t understand it very well. Software engineers are overrepresented in fansubbing, so basic familiarity is not uncommon for many of the staff, but even then in practice every group has someone whose part-time job is unfucking the inevitable messes Git produces. And for people who don’t work in software, which includes many of the translators[8], Git means punching in arcane commands or dealing with an extremely complicated GUI, and occasionally getting into a bad state and begging a team member to help[9].

It also necessitates further mandatory tooling setup to contribute to a project. In practice, everyone still uses an FTP for the premuxes[10], so getting Git installed locally along with a GUI, making a GitHub account, and getting this all running is an extra step for every non-programmer to muddle through. You’re also putting your questionably legal fan translations on GitHub, which has so far been fine, but which I personally have reservations about. Also, unlike when typesetting gained increased tooling requirements, these offer no improvement (hopefully) to the final output. What you get is a theoretical minor improvement in efficiency and consistency, in exchange for a more complicated toolchain pushed onto the entire group. The tradeoff is quite different.

It also makes it surprisingly easy to leak personal information if a group wants to make a repo public. There’s an extra manual step of checking to see if anyone used identifying information in their commits, and rewriting the history if so.
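
To make that manual check concrete, here is a minimal Python sketch of listing every name and email recorded in a repo's commits. It assumes git is installed and on the PATH, and the function names are hypothetical. It only looks at the author and committer fields, so identifying details in commit messages or file contents would still need a separate look.

```python
import subprocess
from typing import Set, Tuple

# %an/%ae = author name/email, %cn/%ce = committer name/email, %x09 = tab.
LOG_FORMAT = "%an%x09%ae%x09%cn%x09%ce"


def identities_from_log(log_text: str) -> Set[Tuple[str, str]]:
    """Collect every distinct (name, email) pair from the formatted log text."""
    found = set()
    for line in log_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        found.add((fields[0], fields[1]))  # author
        found.add((fields[2], fields[3]))  # committer
    return found


def audit_repo(repo_path: str) -> Set[Tuple[str, str]]:
    """Run git log over all refs in repo_path and return the identities used."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", f"--format={LOG_FORMAT}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return identities_from_log(result.stdout)
```

Someone would still have to read the resulting list and decide whether any entry is a real name or a personal email, and rewriting the history remains its own separate chore.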

Finally, as mentioned earlier, Git adoption seems to have been tied originally to SubKt[11], and more recently also muxtools. Both of these are monsters, and are so generic that they force every group to have someone comfortable in Kotlin or Python, respectively, to set up and maintain them for every project. In practice this means that, as with Git, every project ends up needing someone whose part-time job is tech support. These tools are difficult to set up and CLI-only, so good luck to any non-programmer who has to interact with them!

Looking forward

I think the situation has gotten sufficiently bad that groups have actually ended up in a worse place than the historical FTP-based workflow. I view the current situation as programmers responding to their dislike of manual processes by pushing incredibly complex tools on non-technical users for very marginal benefits (and very likely negative time saved).

What would a better world look like? A more opinionated tool, with YAML or some equivalent for configuration, running as a bot on a server with an FTP and enforcing a standardized naming scheme, and with less magical mega-merging and more checks to make sure everything necessary is present as part of a release, alongside the more rote aspects like muxing and uploading. It should opt for manual processes over automated ones for anything too complicated, and try hard to be straightforward for a nontechnical user. This means it has to be a single binary on Windows/macOS, packaged in an installer, and have a GUI for editing the configuration and generating a release. No more “just learn python lol”. There’s a world where we have tooling that’s strictly better than the old workflow, but we aren’t there today.
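
To give a flavor of the "more checks" part, here is a small Python sketch of a pre-release check. Everything in it is hypothetical: the config keys, the _premux.mkv naming, and the preflight function are invented for illustration, and a plain dict stands in for the YAML file so the example needs nothing beyond the standard library.

```python
from pathlib import Path
from typing import Dict, List

# Stand-in for what would live in a YAML file; every key here is hypothetical.
CONFIG = {
    "show": "showname",
    "required": [
        "{show}_{episode:02d}[merged].ass",
        "{show}_{episode:02d}_premux.mkv",
    ],
    "font_extensions": [".ttf", ".otf"],
}


def preflight(config: Dict, episode: int, filenames: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means ready to mux."""
    problems = []
    present = set(filenames)
    for template in config["required"]:
        expected = template.format(show=config["show"], episode=episode)
        if expected not in present:
            problems.append(f"missing file: {expected}")
    # Only checks that some font was uploaded, not that it is the right one.
    extensions = tuple(config["font_extensions"])
    if not any(name.lower().endswith(extensions) for name in filenames):
        problems.append("no font files found")
    return problems


def preflight_folder(config: Dict, episode: int, folder: Path) -> List[str]:
    """Same check, reading the file list from an episode folder on disk."""
    names = [p.name for p in folder.iterdir() if p.is_file()]
    return preflight(config, episode, names)
```

The point of the design is that the tool refuses to release and says exactly what is missing, rather than trying to guess or repair anything on its own.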


Thanks to Petzku for reading a draft of this post.

  1. Initial encode designed to be run off quickly for the subbers in the earlier parts of the process

  2. Proper encode, ready to be muxed into the finished product

  3. Etherpad is a minimal collaborative online editor that highlights changes. publishwith.me was historically used for this purpose

  4. Some groups had the QC apply changes directly, and others didn’t allow it at all. You’d end up with greater variations on the file name depending on how many people had to touch it to make the changes suggested by QC.

  5. For the purposes of this piece, I’m counting stuff like running ASSWipe as part of the merge/QC process.

  6. To be addressed later

  7. This actually existed in the form of Servrhe, which was a Commie-specific bot for automating status updates on the site and releases. However, it was never actually made suitable for public consumption, so adoption outside of Commie was nonexistent

  8. Which I will note is also the hardest role to fill, by a significant margin

  9. This is not some hypothetical issue; I don’t sub anything and I’ve been in multiple project channels where this happened, to the obvious frustration of the poor TL

  10. GitHub’s LFS is an obviously poor option for pirated content

  11. Made by the wonderful Myaamori, whose choices I am implicitly complaining about in this article but who I cannot stress enough is absolutely the best