In the last few weeks I have been working on a database of Delta Green scenarios. You can find it here: https://www.dg-scenario-database.com/
I already posted this on Discord and Reddit, but I totally forgot to put something about it on this blog.
I started building the database because for my current campaign I wanted to use mostly shotgun scenarios with a common theme. But there are a lot of shotgun scenarios, over 400 by now. So how do you find scenarios that share elements, without having to read all of them? It would be cool if each scenario had a few tags by which we could find the ones we are most interested in. And maybe there is a way to assign the tags automatically.
I scraped the shotgun scenario texts from the Fairfield Project Wiki and started experimenting with automatic tag extraction. For this I applied named-entity recognition (NER), in the hope that named entities in the text correspond to important elements of the scenario. I used the spaCy library to do the NER.
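To give an idea of what the extraction step can look like, here is a minimal sketch. It is not the exact code behind the database: the model name `en_core_web_sm` and the helper names `load_pipeline` and `extract_entities` are assumptions made for illustration.

```python
from collections import Counter


def load_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy pipeline. The model name is an assumption."""
    import spacy  # imported lazily so the helper below works without spaCy installed
    return spacy.load(model_name)


def extract_entities(text, nlp):
    """Count how often each (entity text, NER label) pair occurs in a text."""
    doc = nlp(text)
    return Counter((ent.text.strip(), ent.label_) for ent in doc.ents)


# Usage (requires spaCy and the model to be installed):
# nlp = load_pipeline()
# extract_entities(scenario_text, nlp).most_common(10)
```

Counting the mentions instead of just collecting them gives a rough hint of how prominent an entity is in a scenario.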
This approach worked okay-ish. There was a lot of noise in the results. I had to filter out certain NER labels that were not useful at all, like quantities, cardinal numbers, etc. And the remaining entities had to be cleaned up as well. E.g., almost every scenario contained the "Delta Green" entity, which makes it a pretty useless tag. Other entities had to be normalized, because they referred to the same thing but were written differently, like "MJ-12" and "Majestic-12".
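The cleanup described above could be sketched roughly like this. The ignored labels are label names used by spaCy's English models, but which ones to drop is my assumption here, and the stop list, the alias table and the `clean_tags` helper are hypothetical examples, not the actual rules used for the database.

```python
# NER labels that rarely make useful tags (an assumed selection).
IGNORED_LABELS = {"CARDINAL", "ORDINAL", "QUANTITY", "PERCENT", "MONEY", "DATE", "TIME"}

# Entities that show up in almost every scenario (hypothetical stop list).
STOP_ENTITIES = {"delta green"}

# Spelling variants mapped to one canonical tag (hypothetical alias table).
ALIASES = {"mj-12": "Majestic-12", "majestic-12": "Majestic-12"}


def clean_tags(entities):
    """Turn (text, label) pairs into a sorted list of unique, normalized tags."""
    tags = set()
    for text, label in entities:
        if label in IGNORED_LABELS:
            continue
        name = " ".join(text.split())  # collapse stray whitespace
        key = name.lower()
        if not key or key in STOP_ENTITIES:
            continue
        tags.add(ALIASES.get(key, name))
    return sorted(tags)


entities = [
    ("Delta Green", "ORG"),
    ("MJ-12", "ORG"),
    ("Majestic-12", "ORG"),
    ("three", "CARDINAL"),
    ("Boston", "GPE"),
]
print(clean_tags(entities))  # ['Boston', 'Majestic-12']
```

A hand-maintained alias table like this only covers the variants somebody has already noticed, which is one reason why manual cleaning was still necessary.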
I built a small web UI to make the manual cleaning of the tags easier, at which point I decided I could turn this into an actual website that could also be used by other people. And thus, the DG Scenario Database was born.
I added functionality for user registration and login, so that registered users can add new scenarios, add or remove tags, and even vote for their favorite scenarios.
There are still a few issues that have to be addressed:
- The database contains all official and all shotgun scenarios, but there are still a lot of other fan-made scenarios that should be added.
- Some tags might be wrong, or at least not useful. The NER might have identified entities that are technically in the scenario text, but are not really central. Ideally, all tags correspond to elements that are actually important to a scenario.
- The NER might have failed to identify an entity, which means that a corresponding tag is missing from the scenario.
- The database needs more tags that correspond to concepts or types of scenarios. E.g., I already added a "disease" tag for scenarios that deal with outbreaks of infectious diseases. We need more tags of this kind.
The database has been running for a few weeks now. After my Reddit post there was an influx of traffic, but not a lot of people have registered and contributed to the tags or voted. I hope that this changes over time and that this database can grow and become even more useful. A big thanks to all those that have already contributed!
This is an incredible and super useful piece of work! Seriously awesome! Did you know in advance about all the DG scenarios available on Google Docs or did the app scrape the whole web? I posted a note with a link to your database on both Yog-Sothoth.com and BRP Central. I will also sign up and help with adding a few scenario tags!
The Google Docs scenarios are mostly submissions to contests that were run on the Night at the Opera Discord server, which is where I got them. Thanks for linking the database elsewhere and for any contributed tags, it's appreciated!