Potential for WikiData in creating and consuming content? 🖨

WikiData is the massively powerful database behind Wikipedia.

It contains objects (Wikidata calls them items), identified by a Q followed by a number (e.g., Q12345), and properties, identified by a P followed by a number (e.g., P1234).

When creating content, it makes it easy to insert stable references to crucial aspects of a methodology. For example, if I use a z-test to analyse my data, I could reference Q897827 in my methods section.
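A minimal sketch of what such an inserted reference could look like, assuming we simply pair the human-readable term with a link to the item page (item pages live at `https://www.wikidata.org/wiki/<QID>`). The function name `wikidata_reference` and the output format are my own illustrative choices, not an existing tool.

```python
def wikidata_reference(label: str, qid: str) -> str:
    """Render a human-readable term followed by a stable link to its Wikidata item.

    Hypothetical helper: the "label (URL)" output format is an assumption.
    The QID is meant to stay stable even if the label is edited or translated.
    """
    return f"{label} (https://www.wikidata.org/wiki/{qid})"


print(wikidata_reference("z-test", "Q897827"))
# z-test (https://www.wikidata.org/wiki/Q897827)
```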

Once content is created this way, consumption could be made easier by providing translations. For example, for Q897827 the English label is z-test, the German one Gauß test, and the French one Test Z. This might help improve accessibility.
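A sketch of how a reader-side tool might pick a label in the reader's preferred language, falling back down a list of languages and finally to the bare QID. The sample dictionary is trimmed down to the shape of the JSON served at `https://www.wikidata.org/wiki/Special:EntityData/Q897827.json`, but its label values are the ones quoted in this post, not a live fetch; the function name `label_for` is hypothetical.

```python
from typing import Sequence

# Trimmed sample shaped like Wikidata's entity JSON; values taken from this post.
SAMPLE_ENTITY_JSON = {
    "entities": {
        "Q897827": {
            "labels": {
                "en": {"language": "en", "value": "z-test"},
                "de": {"language": "de", "value": "Gauß test"},
                "fr": {"language": "fr", "value": "Test Z"},
            }
        }
    }
}


def label_for(entity_json: dict, qid: str, languages: Sequence[str]) -> str:
    """Return the first available label in order of preference, else the bare QID."""
    labels = entity_json["entities"][qid].get("labels", {})
    for lang in languages:
        if lang in labels:
            return labels[lang]["value"]
    return qid  # no label in any requested language: show the identifier itself


print(label_for(SAMPLE_ENTITY_JSON, "Q897827", ["de", "en"]))  # Gauß test
print(label_for(SAMPLE_ENTITY_JSON, "Q897827", ["li", "nl", "en"]))  # z-test
```

The second call shows the fallback: a Limburgish reader gets Dutch if available, otherwise English.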

Another option is to provide identifiers for the modules people post. For example, a hypothesis module could be identified directly using Q41719 in its metadata.
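A minimal sketch of such module metadata, assuming the module type is stored as a Wikidata QID and exposed as the canonical concept URI (`http://www.wikidata.org/entity/<QID>`). The `Module` class and its field names are hypothetical, not an existing schema.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Hypothetical metadata record for a posted module; field names are illustrative."""

    title: str
    module_type: str  # Wikidata item ID describing what kind of module this is

    def type_uri(self) -> str:
        # Canonical Wikidata concept URI for the module type
        return f"http://www.wikidata.org/entity/{self.module_type}"


hypothesis = Module(title="Example hypothesis", module_type="Q41719")  # Q41719: hypothesis
print(hypothesis.type_uri())  # http://www.wikidata.org/entity/Q41719
```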

I would love to discuss this further and see what value using WikiData can add to content. How to implement the production of content can be discussed too.

To continue work on this, I applied to attend WikidataCon in October 2019. The details of my application are below.

Proposal

Participant Background

  • What Wikidata or Wikibase related skills, experience or knowledge could you share with the other participants during the conference? Some programming, systems thinking, inclusivity.
  • In what ways are you involved in Wikidata and/or Wikibase? Reuse of WikiData to provide richer and easier ways to produce and consume written content.
  • Are you part of an organization or a project that is already contributing to or analyzing Wikidata’s data, or could you consider doing it in the near future? Not yet, but I can imagine a feedback mechanism that lets users easily provide translations of objects once Wikidata objects become more commonly used to translate texts (for technical terms in more niche languages, I find these give better translations than Google Translate).
  • Are you reusing Wikidata’s data? Yes, by providing an “insert Wikidata” tool for writing (scholarly) content.
  • Are you part of an organization that is already using their own Wikibase instance or could consider doing it in the near future? Not yet, but would be interested in exploring this option.
  • What other related projects are you involved in that make use of Wikidata/Wikibase? None (getting started)
  • How do you see yourself using Wikidata in 10 years? I envision Wikidata as the main source of semantic, easily text-minable content. It will be part of everyday content production across the web, used to insert references and the relations between those references. As consumers of information, anyone will be able to filter a prespecified set of pages by Wikidata properties and objects in an intuitive way, to deal with the excess of information being produced.
  • What languages do you speak? At what level/how do you use them? Dutch (mother tongue), Limburgish (mother tongue), English (bilingual), German (advanced)
  • Would you like to volunteer during the event? If yes, for which role would you be ready to commit to? Yes, as a room angel, Accessibility unicorn 🦄

Program submissions #1

  • Do you want to submit sessions to the program? No

Event Attendance Information

  • Days you plan to attend: Friday, October 25, and Saturday, October 26
  • Have you attended previous international Wikimedia events? No
  • Dietary preference: Vegan
  • Your cellphone number: +491735738788
  • Emergency contact: -

Visa

  • Do you need a visa to travel to Berlin? No

Scholarship

  • Do you want to apply to a scholarship? No
  • If you don’t get a scholarship from the WikidataCon organization, do you have other organizations you can ask for funding? If yes, which ones? No, but I live in Berlin, so none is needed 🙂
  • After attending WikidataCon, how do you plan to share the knowledge and experience you gained within your organization or local community? I will document everything for the Liberate Science community in the newsletter and contribute key points to the ongoing discussion on how to implement WD in content production (https://talk.libscie.org).

Update 2019-06-12: Proposal for WikiDataCon rejected

Each module that is communicated can contain a different type of information. Octopus uses a set of eight predefined categories to make it easier to understand what steps are being taken:

Visually, this results in something like this:

[Screenshot, 2019-05-15]
Image from Alexandra Freeman

I am personally no fan of setting up predefined categories, because they restrict how modules may be used and bind them to the conceptual framework of the designers (e.g., how do the humanities fit in here?). I would like content producers to be able to choose for themselves.

This is where WikiData might be valuable as a query service for content producers. Each of the types outlined above can be mapped onto an object in that database. Moreover, if modules turn out to be a valuable way to organize some other form of information production (e.g., for music producers), that flexibility needs to be available.
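A sketch of that flexibility, assuming the module type is stored as any well-formed Wikidata item ID rather than one of a fixed list of categories. The format check (a `Q` followed by a positive integer without leading zeros) only validates syntax; whether the item exists would need a lookup against Wikidata. `set_module_type` is a hypothetical name.

```python
import re

# Item IDs: "Q" plus a positive integer with no leading zeros (e.g. Q41719).
QID = re.compile(r"Q[1-9][0-9]*")


def set_module_type(module: dict, qid: str) -> dict:
    """Attach any well-formed Wikidata item ID as the module type.

    There is deliberately no whitelist of allowed types: a musician can pick
    whatever item describes their module, just as a researcher can pick Q41719.
    Existence of the item is NOT checked here.
    """
    if not QID.fullmatch(qid):
        raise ValueError(f"expected a Wikidata item ID like 'Q41719', got {qid!r}")
    return {**module, "type": qid}


print(set_module_type({"title": "Draft hypothesis"}, "Q41719"))
# {'title': 'Draft hypothesis', 'type': 'Q41719'}
```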

A foreseeable (?) issue is that content producers are faced with multiple candidate WikiData objects to choose from. This might make it harder to harmonize consumption, because exact matching would overlook modules tagged with similar objects. For example, it is easy to imagine users choosing different ones of these when creating an interpretation module:
[Screenshot, 2019-05-15]

Differing semantics may also have an effect: do you use hypothesis or prediction? Then again, maybe there is a substantial difference between the two. These are issues on both the production and the consumption side.
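On the consumption side, one way to soften the exact-match problem is to also match every type that is (transitively) a subclass of the requested one, in the spirit of Wikidata's "subclass of" (P279) statements; in the Wikidata Query Service this is usually written as a `wdt:P279*` path. The sketch below does the same over an in-memory graph. It does not fetch anything from Wikidata, and the type names in the example are made-up stand-ins for real QIDs.

```python
from collections import deque


def expand_subclasses(root: str, subclass_of: dict[str, set[str]]) -> set[str]:
    """Return root plus every type that is (transitively) a subclass of it.

    `subclass_of` maps a type to its direct parents, mirroring P279 statements.
    Cycles are tolerated because visited types are tracked.
    """
    # Invert child -> parents into parent -> children for a downward walk.
    children: dict[str, set[str]] = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first traversal
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def matching_modules(modules: list[dict], root: str, subclass_of: dict[str, set[str]]) -> list[dict]:
    """Modules whose type is the root type or any of its subclasses."""
    wanted = expand_subclasses(root, subclass_of)
    return [m for m in modules if m["type"] in wanted]


# Made-up type names standing in for real QIDs.
graph = {"conclusion": {"interpretation"}, "discussion": {"interpretation"}}
modules = [
    {"id": 1, "type": "interpretation"},
    {"id": 2, "type": "conclusion"},
    {"id": 3, "type": "data"},
]
print([m["id"] for m in matching_modules(modules, "interpretation", graph)])  # [1, 2]
```

Whether, say, prediction should count as a subclass of hypothesis is exactly the semantic question above; the code only follows whatever graph it is given.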

Anyway, long enough ramble for now.

I see your point and partially agree. For example, a category “theory” is missing, in which contributors can present and discuss theories without needing to turn them into hypotheses or predictions or whatsoever.

On the other hand, I do think such categories could really help in ordering modules and making sense of them. For example, if I want to know which datasets exist that address a particular hypothesis (or prediction; I think prediction is more inclusive?), it would be very helpful to be able to select the modules that contain data. And when posting a hypothesis, it would be helpful to be able to go through other hypothesis modules to check whether it already exists. Without such categories, search costs will increase greatly.
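The two searches described above boil down to filtering by module type and by what a module links to. A minimal sketch, assuming each module record carries a `type` and a hypothetical `links_to` list of module IDs; `Q_DATA` is a placeholder, not a real Wikidata item, since no item for "data module" has been agreed on here.

```python
def modules_of_type(modules: list[dict], module_type: str) -> list[dict]:
    """All modules tagged with the given type (e.g. Q41719 for hypotheses)."""
    return [m for m in modules if m["type"] == module_type]


def data_for_hypothesis(modules: list[dict], hypothesis_id: str, data_type: str) -> list[dict]:
    """Data modules that link to one specific hypothesis module."""
    return [m for m in modules_of_type(modules, data_type) if hypothesis_id in m.get("links_to", ())]


DATA = "Q_DATA"  # placeholder type ID, hypothetical
modules = [
    {"id": "h1", "type": "Q41719"},
    {"id": "d1", "type": DATA, "links_to": ["h1"]},
    {"id": "d2", "type": DATA, "links_to": ["h2"]},
]
print([m["id"] for m in data_for_hypothesis(modules, "h1", DATA)])  # ['d1']
print([m["id"] for m in modules_of_type(modules, "Q41719")])  # ['h1']
```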

I’m not entirely sure how WikiData could replace this: if a Q is used to identify modules, then we still need to agree on which Qs to use, which still amounts to coming up with categories.

So to reduce search costs I think categories are very helpful, but they need to be comprehensive. Couldn’t we come up with a more comprehensive set of categories that fits all disciplines, with subcategories where relevant? Linking those to a Q is of course still possible. Furthermore, such categories can stay fluid, with an “other” category that contributors can use and through which they can propose new categories.
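A sketch of what a fluid category list with an "other" escape hatch could look like. The promotion rule (a proposed label becomes a real category after `promote_after` independent uses) is my own assumption for illustration; the post does not say how proposals would be accepted.

```python
from collections import Counter


class CategoryRegistry:
    """Hypothetical fluid category list: unknown labels land in 'other' and count as proposals."""

    def __init__(self, categories, promote_after: int = 3):
        self.categories = set(categories)
        self.proposals: Counter = Counter()
        self.promote_after = promote_after  # assumed threshold for accepting a proposal

    def assign(self, label: str) -> str:
        if label in self.categories:
            return label
        self.proposals[label] += 1
        if self.proposals[label] >= self.promote_after:
            # Enough contributors asked for this label: promote it to a category.
            self.categories.add(label)
            del self.proposals[label]
            return label
        return "other"


registry = CategoryRegistry({"hypothesis", "data"}, promote_after=2)
print(registry.assign("theory"))  # other
print(registry.assign("theory"))  # theory
```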


Thanks for the reply 🙂

I agree that some form of sense-making across modules is worthwhile; this is exactly why I think WikiData is the place to go. If we build a taxonomy ourselves, it will homogenize the inputs again, simply because we are fallible and our efforts are limited by our own context.

This infrastructure of public project management is designed within a scholarly research context, but need not be limited to it. I want to be able to use it to manage my music projects as well.

WikiData is a well-maintained database with many entries. The Qs are defined not by us but by the WikiData community (which makes it important to contribute back to that community if this project is to succeed). It is more likely to be encompassing than any effort we could start ourselves.

As a side note: when I was on the PsyArXiv steering committee, we discussed setting up a taxonomy for preprints in psychology (fields, subfields, etc.). This was not an easy task, especially since we needed to agree on both the categories and the subcategories.

I think it is best to use an emergent structure based on WikiData Qs, instead of imposing a structure by selecting a few Qs (or making up categories ourselves). We can always provide personalized suggestions to users based on the modules posted by peers in their network (which would also get around the scientists vs. musicians issue).
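One simple way such personalized suggestions could work, assuming we know which module types (QIDs) a user's peers have used: rank them by frequency and skip the ones the user already uses. This is a frequency-count sketch under that assumption, not a recommender design; the readable type names in the example stand in for real QIDs.

```python
from collections import Counter


def suggest_types(peer_modules: list[dict], already_used: set[str], k: int = 3) -> list[str]:
    """Rank module types by how often peers used them, skipping ones the user already uses."""
    counts = Counter(m["type"] for m in peer_modules if m["type"] not in already_used)
    return [qid for qid, _ in counts.most_common(k)]


# Readable stand-ins for QIDs; a musician's peers might use "score" where scientists use "data".
peers = [{"type": "data"}] * 3 + [{"type": "hypothesis"}] * 2 + [{"type": "score"}]
print(suggest_types(peers, already_used={"data"}, k=2))  # ['hypothesis', 'score']
```

Because suggestions come from each user's own network, scientists and musicians would each see the types their peers actually use, without a shared taxonomy.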


Even if the backend uses a free structure (good idea, on balance, for my money), the actual structure will be codified by intermediate tools which allow novice users to use the system (‘power of defaults’).

This is obviously okay, but it also means that, insofar as we’re in charge of developing those initial tools, we’re setting that structure tacitly. So it’s worth thinking about what that structure will be, even if we’re recommending rather than imposing it.


Hi, big fan of Wikidata here <3

You probably know about Scholia; I think it’s far from ideal, but it’s interesting to look at. It works on data about articles uploaded to Wikidata (there is even a tool to automate migration from arXiv to Wikidata through QuickStatements), and then shows “aspects” of publications, authors, and topics. There is even one for “uses” (basically, when the article’s Q in Wikidata has the property “uses”, it displays info on, e.g., a method).

I understand the tension between proposing structures and allowing flexibility (e.g. my thesis in social studies of tech would hardly fit in the graph @chjh shared). What I like about WD is precisely that multiple perspectives are allowed and as long as there’s a reason and a reference, it’s ok.

What I was wondering, since the list of available properties is usually not very flexible in WD: how were you thinking of structuring the statements? Something like “my work/paper/project/whatever (Q) --> uses (property) --> z-test (Q) / qualifier: methods (Q)”? It would be like linking keywords to WD items, but qualifying them to specify aspects (which could be the ones proposed by Octopus).
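A statement like that could be batch-uploaded with QuickStatements, whose V1 syntax puts item, property and value on one tab-separated line, with qualifier property/value pairs appended on the same line. The sketch below only builds such a line. Assumptions to check before running anything: P2283 is, to my knowledge, the "uses" property, and Q4115189 is the Wikidata Sandbox item, used here instead of a real paper; no qualifier property for "methods" is chosen, so the qualifier slot is left to the caller.

```python
def quickstatements_line(subject: str, prop: str, value: str,
                         qualifiers: tuple[tuple[str, str], ...] = ()) -> str:
    """Build one QuickStatements V1 command: subject, property, value, then qualifier pairs.

    Fields are tab-separated; IDs are passed through unvalidated.
    """
    fields = [subject, prop, value]
    for q_prop, q_value in qualifiers:
        fields += [q_prop, q_value]
    return "\t".join(fields)


# Sandbox item "uses" z-test (P2283 assumed to be "uses"; check before uploading).
print(quickstatements_line("Q4115189", "P2283", "Q897827"))
```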

Just thinking aloud, sorry for all the blah blah

Thanks for the “blah blah” 😊

I also checked out Scholia a few times, and it’s interesting. Also interesting how WikiCite is sort of taking over the largest part of WikiData 🤷‍♂️

For the modules themselves, I was thinking of including only properties and no relational statements (at first). If I look at the scope of these, I also think many of them don’t really apply or clarify the situation? Open to suggestions though!

For text within modules, citing Wikidata objects could include relational statements as well (e.g., z-test (Q) is a test of (P) depression (Q); I don’t know whether these actually exist). This would require a good specification for managing these relational triplets (or an existing one) and an easy-to-use relational citation manager 💾. This could then be part of a WYSIWYG editor?
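A sketch of the core of such a relational citation manager, assuming a citation is a (subject item, property, object item) triple and that each distinct triple gets a number the first time it is cited. Class and method names are hypothetical, and the example uses the placeholder IDs P1234 and Q12345 from the top of this post, since I don’t know which property and item would express “is a test of” and “depression”.

```python
import re
from dataclasses import dataclass

QID = re.compile(r"Q[1-9][0-9]*")
PID = re.compile(r"P[1-9][0-9]*")


@dataclass(frozen=True)
class Triple:
    subject: str  # item ID, e.g. the z-test
    prop: str     # property ID relating subject and object
    obj: str      # item ID

    def __post_init__(self):
        if not (QID.fullmatch(self.subject) and PID.fullmatch(self.prop) and QID.fullmatch(self.obj)):
            raise ValueError(f"malformed triple: {self}")


class RelationalCitations:
    """Hypothetical citation manager: numbers each distinct triple in order of first citation."""

    def __init__(self):
        self._index: dict[Triple, int] = {}

    def cite(self, triple: Triple) -> int:
        # Re-citing the same triple returns its existing number.
        return self._index.setdefault(triple, len(self._index) + 1)

    def bibliography(self) -> list[str]:
        return [f"[{n}] {t.subject} {t.prop} {t.obj}" for t, n in self._index.items()]


cites = RelationalCitations()
cites.cite(Triple("Q897827", "P1234", "Q12345"))  # placeholder property and object
print(cites.bibliography())  # ['[1] Q897827 P1234 Q12345']
```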

Yup, WikiCite is really big, weirdly enough 😛

I think I got a bit confused about the goals: linking modules to Wikidata would certainly make them easier to find and to link with other data, but in which context and for what purpose?

Yup, agreed on the need for a good specification. Shape Expressions (ShEx) now work in WD and could be super useful for modeling and validation.
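To give a feel for the kind of constraint a shape expresses (which fields a record must have, how many values each may take, and what kind of value), here is a plain-Python stand-in for a module record. It is not ShEx and does not use Wikidata’s schema tooling; the field names and cardinalities in `MODULE_SHAPE` are hypothetical.

```python
import re

QID = re.compile(r"Q[1-9][0-9]*")

# Hypothetical shape for a module record: field -> (min count, max count or None for unbounded).
MODULE_SHAPE = {"type": (1, 1), "cites": (0, None)}


def conforms(record: dict[str, list[str]], shape: dict = MODULE_SHAPE) -> list[str]:
    """Return a list of problems; an empty list means the record fits the shape."""
    problems = []
    for prop, (lo, hi) in shape.items():
        values = record.get(prop, [])
        if len(values) < lo or (hi is not None and len(values) > hi):
            bound = hi if hi is not None else "*"
            problems.append(f"{prop}: expected {lo}..{bound} values, got {len(values)}")
        # Every value must look like a Wikidata item ID.
        problems += [f"{prop}: {v!r} is not an item ID" for v in values if not QID.fullmatch(v)]
    return problems


print(conforms({"type": ["Q41719"], "cites": ["Q897827"]}))  # []
print(conforms({}))  # ['type: expected 1..1 values, got 0']
```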

Also, I saw you applied to WikidataCon… Wikimania 2019 is next week in Sweden. There are quite a few Wikidata sessions this year, though I haven’t checked them yet.

We’re now starting to integrate this into Hypergraph. See also

We integrated this list into Hypergraph 🙂

This is just a basic list to get started, but we’d be happy to add anything people need directly, so feel free to make a Pull Request with any WikiData identifiers you might need in your daily work.