Property talk:P274

From Wikidata

Documentation

chemical formula
description of chemical compound giving element symbols and counts
Description: description of chemical compound based on element symbols
Represents: chemical formula (Q83147)
Data type: String
Domain: chemical compounds (note: this should be moved to the property statements)
Allowed values: ([αβγδφωλμπ]-)?([([]*[A-Z☐][ub]?[a-z]?[₁₂₃₄₅₆₇₈₉₀.]*(\)?[¹²³⁴⁵⁶⁷⁸⁹⁰]*[⁺⁻]?)?[\])|,₁₂₃₄₅₆₇₈₉₀]*(·\(?[-0-9.]*n?\)?)?)+
Example: water (Q283) → H₂O
carbon dioxide (Q1997) → CO₂
ethylene (Q151313) → C₂H₄
Tracking (same): no label (Q28046688)
Tracking (differences): no label (Q20636209)
Tracking (usage): Category:Pages using Wikidata property P274 (Q20636211)
Tracking (local yes, WD no): Category:Chemical formula not in Wikidata, but available on Wikipedia (Q20636201)
See also: general formula (P1673)
Proposal discussion: not applicable
Current uses
Total: 1,425,095
Main statement: 1,425,091 (>99.9% of uses)
Qualifier: 3 (<0.1% of uses)
Reference: 1 (<0.1% of uses)
Single value: this property generally contains a single value. (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P274#Single value, SPARQL
Format “([αβγδφωλμπ]-)?([([]*[A-Z☐][ub]?[a-z]?[₁₂₃₄₅₆₇₈₉₀.ₓ]*(\)?[¹²³⁴⁵⁶⁷⁸⁹⁰]*[⁺⁻]?)?[\])|,₁₂₃₄₅₆₇₈₉₀ₓ]*(·\(?[-0-9.]*n?\)?)?)+”: value must be formatted using this pattern (PCRE syntax). (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P274#Format, SPARQL
Conflicts with “general formula (P1673)”: this property must not be used with the listed properties and values. (Help)
List of violations of this constraint: Database reports/Constraint violations/P274#Conflicts with P1673, hourly updated report, search, SPARQL
Allowed entity types are Wikibase item (Q29934200): the property may only be used on a certain entity type (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P274#Entity types
Scope is as main value (Q54828448): the property must be used in the specified way only (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P274#Scope, SPARQL
This property is being used by:

Please notify projects that use this property before big changes (renaming, deletion, merge with another property, etc.)


Subscript and superscript


Shouldn't that be written like C<sub>2</sub>H<sub>6</sub>O? --Goldzahn (talk) 06:26, 13 March 2013 (UTC)

Good question, but I think it comes to the same thing in the end: a template can put all numbers in the right format for display. I prefer to avoid mixing data and display features in the Wikidata DB. Snipre (talk) 14:34, 13 March 2013 (UTC)
The subscripts are a notation, not a display feature. /Esquilo (talk) 16:00, 15 March 2013 (UTC)
OK, but it's a notation for HTML display, and depending on the use this notation will be useless. Again, Wikidata provides raw data, and data users will do what they want with it according to their programming language or display features. Snipre (talk) 16:46, 15 March 2013 (UTC)
I think it would be very difficult to write a template that will output C₂O₄²⁻ for oxalate. I suggest allowing <sub></sub> and <sup></sup> in any string. HenkvD (talk) 12:59, 16 March 2013 (UTC)
But in Wikidata syntax, subscript is not defined by the HTML format. So the HTML format has to be avoided because it is meaningless in Wikidata. I agree about the convention, but for format reasons there is no way to describe subscript in a unique format. So each time you try to use Wikidata data, you first have to convert the subscript information. So in the end it is simpler to write without subscript formatting, in my opinion. Snipre (talk) 11:16, 17 April 2013 (UTC)
It does seem like a string is too simple a data format for something as complex as a chemical formula. Something like MathML would be needed that allows complex formulas to be displayed correctly. --Tobias1984 (talk) 11:34, 17 April 2013 (UTC)
I tried H<sub>2</sub>, H&#8322; in the demo system but {{#property|p274}} results in H<sub>2</sub>, H&#8322; instead of H2, H₂. So that does not work either. HenkvD (talk) 08:08, 21 April 2013 (UTC)
User:Pyfisch‎ uses Unicode (Q3513021) --Chris.urs-o (talk) 08:17, 27 May 2013 (UTC)
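
For readers weighing the "data versus display" argument in this thread, here is a minimal Python sketch of how a data consumer might turn a formula stored with Unicode sub- and superscripts into HTML markup at display time. The function name `unicode_to_html` and the translation tables are hypothetical, not part of any Wikidata tooling, and the sketch only handles digits plus the ⁺ and ⁻ charge signs.

```python
import re

# Hypothetical helper tables: Unicode sub/superscripts mapped to ASCII.
SUB_TO_ASCII = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")
SUP_TO_ASCII = str.maketrans("⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻", "0123456789+-")


def unicode_to_html(formula: str) -> str:
    """Wrap runs of Unicode subscripts/superscripts in <sub>/<sup> tags."""
    # Each run of subscript digits becomes one <sub>...</sub> element.
    formula = re.sub(
        "[₀₁₂₃₄₅₆₇₈₉]+",
        lambda m: "<sub>" + m.group().translate(SUB_TO_ASCII) + "</sub>",
        formula,
    )
    # Each run of superscript digits and charge signs becomes one <sup>.
    formula = re.sub(
        "[⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻]+",
        lambda m: "<sup>" + m.group().translate(SUP_TO_ASCII) + "</sup>",
        formula,
    )
    return formula


print(unicode_to_html("C₂O₄²⁻"))
```

Under these assumptions the oxalate case raised above needs no special handling: the stored string stays plain Unicode, and the markup is produced only where it is displayed.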

Different way


Wouldn't it be better to construct the chemical formula from the elements? So H2SO4 would be "2 x H + 1 x S + 4 x O". That way queries for substances containing H could be constructed, instead of queries for "H" in the chemical formula also returning "He". --Tobias1984 (talk) 08:26, 17 April 2013 (UTC)

We are thinking about adding new properties for atom definitions, which will be used to calculate the molecular mass. But as the molecular formula is a well-known identifier for molecules, it is better for query reasons to have it already defined and to avoid decomposing the query. Snipre (talk) 11:11, 17 April 2013 (UTC)
WolframAlpha does it pretty neatly (e.g. https://www.wolframalpha.com/input/?i=glucose): when you click on the formula, it resolves it into number of atoms and mass fraction. I think this is the level of database intelligence we should strive for. --Tobias1984 (talk) 11:38, 17 April 2013 (UTC)
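
To illustrate the "H" versus "He" concern, here is a minimal Python sketch that splits a flat formula string into per-element counts on the client side. The names `element_counts` and `contains_element` are hypothetical. The sketch assumes one- or two-letter element symbols, and it ignores parentheses, hydrates and charges.

```python
import re
from collections import Counter

SUB_TO_ASCII = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")
# An element symbol (one uppercase letter, optional lowercase letter)
# followed by an optional count in ASCII or Unicode subscript digits.
TOKEN = re.compile(r"([A-Z][a-z]?)([0-9₀-₉]*)")


def element_counts(formula: str) -> Counter:
    """Count atoms per element in a flat formula such as 'H₂SO₄'."""
    counts = Counter()
    for symbol, digits in TOKEN.findall(formula):
        counts[symbol] += int(digits.translate(SUB_TO_ASCII)) if digits else 1
    return counts


def contains_element(formula: str, symbol: str) -> bool:
    """True only if the formula contains that exact element symbol."""
    return element_counts(formula)[symbol] > 0


print(dict(element_counts("H₂SO₄")))
print(contains_element("He", "H"))
```

Because matching happens on whole element symbols rather than on substrings, a search for hydrogen does not pick up helium.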

Unique value


This property doesn't have unique values. There are minerals with the same chemical formula. A chemical formula is a summary, and it is only a repeating unit for networks and chains. --Chris.urs-o (talk) 20:11, 25 May 2013 (UTC)

  • I added this constraint here as an experiment: {{Constraint:Unique value}}

I plan to remove it tomorrow. — Ivan A. Krestinin (talk) 22:05, 15 January 2014 (UTC)

format constraint


i propose to use a format constraint with the pattern (([A-Z][ub]?[a-z]?(<sub>[0-9]+</sub>)?(<sup>[0-9]+[+-]?</sup>)?)+|([A-Z][ub]?[a-z]?[₀₁₂₃₄₅₆₇₈₉]*([⁰¹²³⁴⁵⁶⁷⁸⁹]+[⁺⁻]?)?)+). these values wouldn't match it:

is there a need to expand the pattern? --Akkakk 23:26, 5 June 2013 (UTC)

stripped the html-variant and changed it to "([([]?[A-Z][ub]?[a-z]?[₀₁₂₃₄₅₆₇₈₉]*([⁰¹²³⁴⁵⁶⁷⁸⁹]+[⁺⁻]?)?[])|,₀₁₂₃₄₅₆₇₈₉]*(·[0-9]+)?)+". can you give an example for ☐? is it instead of an element symbol? the following wouldn't match. --Akkakk 13:13, 6 June 2013 (UTC)
  • Q104692 (Ca2(Mg,Fe)5[OH|Si4O11]2)
  • Q355615 (CrO<sub>4</sub><sup>2-</sup>, Cr<sub>2</sub>O<sub>7</sub><sup>2-</sup>)
  • Q411314 (C<sub>3</sub>H<sub>3</sub>N<sub>3</sub>O<sub>3</sub>)
  • Q411876 (P<sub>2</sub>O<sub>3</sub>, P<sub>4</sub>O<sub>6</sub>)
  • Q422642 (H<sub>2</sub>CrO<sub>4</sub>, H<sub>2</sub>Cr<sub>2</sub>O<sub>7</sub>)
As I understand it, <sup> and <sub> are invalid on Wikidata, right?
As I understand it, sometimes a site in a unit cell of a crystal of a mineral group is empty, and the charge is compensated at another site. --Chris.urs-o (talk) 18:29, 6 June 2013 (UTC)
i don't know any rule that prohibits <sup> and <sub>, but we should use one form and i think unicode is better. added the box as alternative for chemical element. --Akkakk 00:06, 7 June 2013 (UTC)
Is the vacancy symbol ☐ an official symbol for chemical formulas? Because when I look at the Wikipedia articles, there is no symbol like this. Snipre (talk) 00:31, 7 June 2013 (UTC)
I don't know about IUPAC nomenclature of inorganic chemistry (Red Book), but scientific literature, rruff.info/ima/ and mindat.org use it throughout. If I remember it right, mindat.org used '{}' for vacancy a while ago.
The chemical formula of minerals comes from rruff.info or mineralienatlas.de (a form of secondary literature). You can have a look at the end of the page of tremolite (mineralienatlas.de). --Chris.urs-o (talk) 02:02, 7 June 2013 (UTC)
I think Greek letters should be allowed too.
Belite: α-Ca2SiO4, β-Ca2SiO4, γ-Ca2SiO4; α-, β-, γ-, δ- cycloheptasulfur; φ-, ω-, λ-, μ-, π-sulfur. --Chris.urs-o (talk) 07:19, 8 June 2013 (UTC)
Could we dump unique value? Some sources use a different format; rruff.info and mineralienatlas.de use different values. Mineralienatlas.de is more correct, but rruff.info is more up to date. --Chris.urs-o (talk) 04:22, 7 June 2013 (UTC)

changed the pattern, assuming the number after the "·" is optional. --Akkakk 23:55, 9 June 2013 (UTC)

We have ·xH₂O and ·nH₂O, as well. --Chris.urs-o (talk) 15:12, 20 June 2013 (UTC)
added [nx]? --Akkakk 13:41, 21 June 2013 (UTC)
allowed ⁻ without number. --Akkakk 16:21, 21 June 2013 (UTC)
Thx. It's tempting to use only '·nH2O', but I'm not so bold. --Chris.urs-o (talk) 18:17, 21 June 2013 (UTC)
Note: autunite (Q407345): Ca(UO₂)₂(PO₄)₂·(10-12)H₂O, thomsonite-Sr (Q655464): NaSr₂Al₅Si₅O₂₀·(6-7)H₂O, parsonsite (Q1067103): Pb₂(UO₂)(PO₄)₂·(0-2)H₂O Regards --Chris.urs-o (talk) 05:03, 22 June 2013 (UTC)
added --Akkakk 10:23, 22 June 2013 (UTC)
I changed my mind, deleted [x]?, standard is [n]? now --Chris.urs-o (talk) 07:42, 26 June 2013 (UTC)
then the [] aren't needed ;) --Akkakk 11:18, 26 June 2013 (UTC)
changed pattern to match (KAl₃[(OH)₆(SO₄)₂]). should (Cu₄(AsO₄)₂(OH)₂·2.5H₂O) be valid? --Akkakk 10:16, 7 July 2013 (UTC)
Yup, it should, you see it on newer formulas. --Chris.urs-o (talk) 13:26, 7 July 2013 (UTC)

Hello, the bot stopped updating the report due to an error in the pattern. The regexp parser says "range out of order in character class". — Ivan A. Krestinin (talk) 15:14, 11 August 2013 (UTC)

Sorry, I'll ask Akkakk to fix it. --Chris.urs-o (talk) 02:32, 12 August 2013 (UTC)
reverted to last version by me and added . to match (Cu₄(AsO₄)₂(OH)₂·2.5H₂O) --Akkakk 10:57, 14 August 2013 (UTC)
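
A quick way to check candidate values against the pattern before changing the constraint is sketched below in Python. It copies the pattern from the Format constraint listed at the top of this page and tests formulas quoted in this discussion. Note the assumptions: the constraint is documented as PCRE, while this sketch uses Python's `re` module as a stand-in, and the helper name `is_valid_formula` is hypothetical.

```python
import re

# Pattern copied from the Format constraint at the top of this page.
# Assumption: Python's `re` handles it the same way the PCRE-based
# constraint checker does for these simple inputs.
P274_FORMAT = re.compile(
    r"([αβγδφωλμπ]-)?([([]*[A-Z☐][ub]?[a-z]?[₁₂₃₄₅₆₇₈₉₀.ₓ]*"
    r"(\)?[¹²³⁴⁵⁶⁷⁸⁹⁰]*[⁺⁻]?)?[\])|,₁₂₃₄₅₆₇₈₉₀ₓ]*"
    r"(·\(?[-0-9.]*n?\)?)?)+"
)


def is_valid_formula(value: str) -> bool:
    """True if the whole value matches the format pattern."""
    return P274_FORMAT.fullmatch(value) is not None


# Formulas quoted in this discussion.
for value in [
    "H₂O",
    "Ca(UO₂)₂(PO₄)₂·(10-12)H₂O",
    "Cu₄(AsO₄)₂(OH)₂·2.5H₂O",
    "KAl₃[(OH)₆(SO₄)₂]",
    "H2O",  # ASCII digit instead of a Unicode subscript
]:
    print(value, is_valid_formula(value))
```

Using `fullmatch` rather than `search` matters here: the constraint is meant to cover the entire value, not just a fragment of it.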

Errors


Hi, I compared chemical formula (P274) + PubChem CID (P662) with the PubChem database and generated a list of disagreements: User:Ivan A. Krestinin/Chemical compounds. It would be great if somebody helped with fixing the errors. — Ivan A. Krestinin (talk) 22:05, 31 January 2014 (UTC)

Q419714: '(CH₃COO)₂Cd' differs from new value 'C₄H₁₀CdO₆'.
  • Cadmium acetate: anhydrous CAS 543-90-8; dihydrate CAS 5743-04-4
Q1014242: '(NH₄)₃[AlF₆]' differs from new value 'AlF₆H₁₂N₃'.
These might not be real errors, but two versions of the same substance; take care. Three chemical formulas of the same substance might be acceptable, so don't overwrite them. The qualifiers/references are important, though. Simple empirical formulas aren't so good in organic chemistry. The PubChem ID might be wrong. The CAS hyperlink isn't working. --Chris.urs-o (talk) 04:39, 2 February 2014 (UTC)
This is an autogenerated list; false positives are present. The list was generated to check data consistency and for manual error fixing, not for automatic replacement. '(NH₄)₃[AlF₆]' was a bug in the checker, now fixed. The CAS links were fixed. Wrong PubChem IDs need to be fixed too. — Ivan A. Krestinin (talk) 09:06, 3 February 2014 (UTC)

Multiple variants of chemical formulas in organic chemistry


Should we add chemical formulas of organic compounds as (using butyl propionate as an example):

  1. molecular formula (summary), e.g. C₇H₁₄O₂
  2. condensed / semi-structural formula, e.g. CH₃CH₂COO(CH₂)₃CH₃
  3. both

See: Condensed formulas in organic chemistry implying molecular geometry and structural formulas. Variant 1 does not convey much information but could be useful for searching. Shouldn't we have qualifiers to show which variant of the chemical formula is used? --Pabouk (talk) 23:55, 12 February 2014 (UTC)

I used variant 2 where possible and variant 1 in other cases. Variant 3 is redundant. If some application needs the summary, it can calculate it automatically. — Ivan A. Krestinin (talk) 03:48, 13 February 2014 (UTC)
I would vote for option #1. --Leyo 23:00, 11 April 2015 (UTC)
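
To make the "calculate it automatically" point concrete, here is a minimal Python sketch that derives a summary formula from a condensed one. The function names `count_atoms` and `summary_formula` are hypothetical. The sketch assumes one- or two-letter element symbols and supports ( ) and [ ] groups, but not hydrates (·), charges, Greek prefixes or vacancies. The output ordering is the Hill convention (C, then H, then the rest alphabetically), which is the editor's choice and not something this discussion settled on.

```python
import re
from collections import Counter

TO_ASCII = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")
TO_SUBSCRIPT = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
# Tokens: element symbol | opening bracket | closing bracket | number.
TOKEN = re.compile(r"([A-Z][a-z]?)|([(\[])|([)\]])|([0-9]+)")


def count_atoms(formula: str) -> Counter:
    """Count atoms in a condensed formula such as 'CH₃COO(CH₂)₃CH₃'."""
    text = formula.translate(TO_ASCII)
    stack = [Counter()]   # one Counter per open bracket level
    last = None           # the atom or group a following number applies to
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unsupported character at position {pos}")
        pos = m.end()
        symbol, opener, closer, number = m.groups()
        if symbol:
            last = Counter({symbol: 1})
            stack[-1].update(last)
        elif opener:
            stack.append(Counter())
            last = None
        elif closer:
            if len(stack) == 1:
                raise ValueError("unbalanced closing bracket")
            last = stack.pop()          # bracket types are not cross-checked
            stack[-1].update(last)
        else:
            if last is None:
                raise ValueError("number without a preceding atom or group")
            # The atom/group was already added once; add the remainder.
            for element, n in last.items():
                stack[-1][element] += n * (int(number) - 1)
            last = None
    if len(stack) != 1:
        raise ValueError("unbalanced opening bracket")
    return stack[0]


def summary_formula(formula: str) -> str:
    """Return the summary formula in Hill order with Unicode subscripts."""
    counts = count_atoms(formula)
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(
        e + (str(counts[e]).translate(TO_SUBSCRIPT) if counts[e] > 1 else "")
        for e in order
    )


print(summary_formula("CH₃CH₂COO(CH₂)₃CH₃"))
```

The same routine can also serve as a composition check for the comparison in the Errors section: for '(NH₄)₃[AlF₆]' it should yield the 'AlF₆H₁₂N₃' quoted there, so two notations of one substance can be recognized as equivalent instead of being reported as a mismatch.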

Checking for 0 or 1


A 0 or a 1 can never be alone in sub- or superscript (see example fix in Wikipedia). Is there a way to add such errors to Wikidata:Database reports/Constraint violations/P274#Format? --Leyo 23:13, 11 April 2015 (UTC)
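
One possible shape for such a check, as a minimal Python sketch: flag any subscript or superscript run that consists of a single 0 or 1. The pattern names and the function `has_lone_zero_or_one` are hypothetical, and this is not how the constraint reports are actually generated. The superscript digits are listed one by one because they are not contiguous in Unicode.

```python
import re

# A subscript 0 or 1 with no other subscript digit on either side.
LONE_SUB = re.compile(r"(?<![₀-₉])[₀₁](?![₀-₉])")
# Same idea for superscripts; the digits are enumerated explicitly.
LONE_SUP = re.compile(r"(?<![⁰¹²³⁴⁵⁶⁷⁸⁹])[⁰¹](?![⁰¹²³⁴⁵⁶⁷⁸⁹])")


def has_lone_zero_or_one(formula: str) -> bool:
    """True if a 0 or 1 stands alone in a sub- or superscript."""
    return bool(LONE_SUB.search(formula) or LONE_SUP.search(formula))


for value in ["C₂H₆O₁", "Na¹⁺", "C₁₀H₈", "Fe²⁺"]:
    print(value, has_lone_zero_or_one(value))
```

Multi-digit counts such as ₁₀ are not flagged, because the lookarounds require that the 0 or 1 has no neighbouring digit of the same kind.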

mhchem


There is a new tag on Wikipedia, the <ce> tag, which renders input such as <ce>H2O</ce> using the syntax defined by the mhchem package. Please join the discussion at https://phabricator.wikimedia.org/T126862 to decide whether that should become a new datatype. --Physikerwelt (talk) 17:47, 15 February 2016 (UTC)


Format Constraint


Is there a way to make the regexp constraint optional? I can't add the formula "(C12H20O29S6)n" to dextran sulfate (Q50350128).

@Gstupp: maybe it would be better to use general formula (P1673)? dextran sulfate (Q50350128) is not a molecule where the chemical formula is precisely known, but a polymer which in fact is a mixture of macromolecules. Wostr (talk) 00:08, 6 March 2018 (UTC)

REGEX has 'ballot box' character?


At the moment, the regex string has the character 'BALLOT BOX' (U+2610) (see "([α-γδφλμπω]-)?([([]*[A-Z☐][ub]?[a-z]?[₁₂₃₄₅₆₇₈₉₀]*(\)?[¹²³⁴⁵⁶⁷⁸⁹⁰]*[⁺⁻]?)?[])|,₁₂₃₄₅₆₇₈₉₀]*(·\(?[-0-9.]*n?\)?)?)+", right after the element's first, required, uppercase A-Z character). Is that OK? -DePiep (talk) 17:54, 14 October 2019 (UTC)

@DePiep: see anthophyllite (Q413322) (w:Anthophyllite) for an example.--GZWDer (talk) 16:34, 19 October 2019 (UTC)
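
When reviewing a pattern like this, it can help to list every non-ASCII character it contains together with its code point and Unicode name, so that an unexpected symbol stands out. Here is a minimal Python sketch using the standard `unicodedata` module. The function name `describe_non_ascii` is hypothetical, and the input is just the character-class fragment quoted above.

```python
import unicodedata


def describe_non_ascii(pattern: str) -> dict:
    """Map each non-ASCII character to its (code point, Unicode name)."""
    seen = {}
    for ch in pattern:
        if ord(ch) > 127 and ch not in seen:
            seen[ch] = (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
    return seen


for ch, (code_point, name) in describe_non_ascii("[A-Z☐][ub]?[a-z]?").items():
    print(ch, code_point, name)
```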

Citation required


@Wostr: complicated chemical formulas get redefined/revised, at least in mineralogy. Some additions are very questionable. Regards --Chris.urs-o (talk) 01:49, 24 December 2021 (UTC)