I am new to chemistry and I find it fascinating. I am trying to learn about chemical reactions, and I was wondering whether there is an easy way to tell quickly if a given combination of chemical substances will react and, if so, what product(s) might be formed.
For example, if I pick any two random substances $\ce{A}$ and $\ce{B}$, can I determine if a reaction will occur and predict the products?
$$\ce{A + B -> \ ?}$$
More specifically, let's say I just learned that chlorine bleach (sodium hypochlorite) can be made by a reaction of sodium hydroxide and chlorine, with sodium chloride and water as byproducts:
$$\ce{2NaOH(aq) + Cl2(g) -> NaOCl(aq) + NaCl(aq) + H2O(l)}$$
Is there a way that I could have predicted this reaction (and any other) before I learned about it? I do not want to memorize the outcome of every combination so that I can answer questions about chemical reactions. I am hoping there is a short list of simple rules that govern all chemical reactions that I can commit to memory and then apply to any combination of substances. I might also like to develop a simple computer program built around an algorithm for these reactivity rules that can sample databases of known substances and predict new reactions.
In theory, yes!
Every substance has characteristic reactivity behavior. Likewise, pairs and sets of substances have characteristic behavior. For example, the following combinations of substances each have only one likely outcome:
$$ \ce{HCl + NaOH -> NaCl + H2O} \\[2ex] \ce{CH3CH2CH2OH->[$1.$ (COCl)2, (CH3)2SO][$2.$ Et3N] CH3CH2CHO} $$
However, it is not a problem suited to brute force or exhaustive approaches
There are millions or perhaps billions of known or possible substances. Let's take the lower estimate of 1 million substances. There are $999\,999\,000\,000$ possible pairwise combinations when order is counted (about half that many when it is not). Any brute force method (in other words, a database that has an answer for all possible combinations) would be large and potentially resource prohibitive. Likewise, you would not want to memorize the nearly 1 trillion combinations.
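The pair count above is easy to recheck. A minimal sketch, assuming the lower estimate of 1 million substances; the variable names are only illustrative:

```python
import math

n_substances = 1_000_000  # the lower estimate used above

# Ordered pairs (A then B counted separately from B then A): n * (n - 1)
ordered_pairs = math.perm(n_substances, 2)

# Unordered pairs (A + B is the same as B + A): n * (n - 1) / 2
unordered_pairs = math.comb(n_substances, 2)

print(f"ordered pairs:   {ordered_pairs:,}")
print(f"unordered pairs: {unordered_pairs:,}")
```

Either way, the count is on the order of a trillion, which is the point of the estimate.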
If more substances are given, the combination space gets bigger. In the second example reaction above, there are four substances combined: $\ce{CH3CH2CH2OH}$, $\ce{(COCl)2}$, $\ce{(CH3)2SO}$, and $\ce{Et3N}$. Pulling four substances at random from the substance space generates a reaction space on the order of $1\times 10^{24}$ possible combinations. And that does not factor in order of addition. In the second reaction above, there is an implied order of addition:
- $\ce{CH3CH2CH2OH}$
- $\ce{(COCl)2}$, $\ce{(CH3)2SO}$
- $\ce{Et3N}$
However, there are $4!=24$ different orders of addition for four substances, some of which might not generate the same result. Our reaction space is up to $24\times 10^{24}$, a bewildering number of combinations. And this space does not include other variables, like time, temperature, irradiation, agitation, concentration, pressure, control of environment, etc. If each reaction in the space could somehow be stored in as little as 100 kB of memory, then the whole space of combinations of up to 4 substances would require $2.4 \times 10^{30}$ bytes of data, or $2.4\times 10^9$ ZB (zettabytes), or $2.4\times 10^6$ trillion terabytes. The total digital data generated by the human species was estimated recently (Nov. 2015) to be 4.4 ZB. We would need about $5.5\times 10^8$ times more data in the world to hold such a database. And that does not even count the program written to search it, the humans needed to populate it, the bandwidth required to access it, or the time investment of any of these steps.
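A storage estimate like this can be rechecked with a few lines of arithmetic. This sketch assumes decimal units (100 kB = $10^5$ bytes, 1 ZB = $10^{21}$ bytes) and takes $24\times 10^{24}$ as the size of the reaction space; the names are only illustrative:

```python
n_substances = 10**6              # lower estimate of known substances
sets_of_four = n_substances**4    # order-of-magnitude count: 10^24
orders_of_addition = 24           # 4! orderings of four substances
reactions = sets_of_four * orders_of_addition

BYTES_PER_REACTION = 100 * 10**3  # assumption: 100 kB = 10^5 bytes (decimal kB)
BYTES_PER_ZB = 10**21             # 1 zettabyte in bytes
WORLD_DATA_ZB = 4.4               # the 2015 estimate quoted above

total_bytes = reactions * BYTES_PER_REACTION
total_zb = total_bytes / BYTES_PER_ZB
ratio = total_zb / WORLD_DATA_ZB  # multiples of the world's data

print(f"{total_bytes:.1e} bytes")
print(f"{total_zb:.1e} ZB")
print(f"{ratio:.1e} times the 2015 estimate")
```

The exact figure matters less than its scale: under any reasonable choice of units, the database would be many orders of magnitude larger than all the data that exists.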
In practice, it can be manageable!
Even though the reaction space is bewilderingly huge, chemistry is an orderly, predictable business. Folks in the natural product total synthesis world do not resort to random combinations and alchemical mumbo jumbo. They can predict with some certainty which types of reactions do what to which substances and then act on that prediction.
When we learn chemistry, we are taught to recognize if a molecule belongs to a certain class with characteristic behavior. In the first example above, we can identify $\ce{HCl}$ as an acid and $\ce{NaOH}$ as a base, and then predict an outcome that is common to all acid-base reactions. In the second example above, we are taught to recognize $\ce{CH3CH2CH2OH}$ as a primary alcohol and the given reagents as an oxidant (these are the conditions of the Swern oxidation). The outcome is an aldehyde.
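The classify-then-predict idea can be sketched as a lookup table. This is a toy illustration, not a real reactivity engine: the two rules, the hand-assigned classes, and the `predict` function are all hypothetical, and a real program would have to derive the classes from molecular structure.

```python
# Hypothetical, deliberately tiny rule table:
# (class of substance 1, class of substance 2) -> class of product.
RULES = {
    ("acid", "base"): "salt + water",
    ("primary alcohol", "mild oxidant"): "aldehyde",
}

# Hypothetical hand-assigned classes for the two examples above.
CLASSES = {
    "HCl": "acid",
    "NaOH": "base",
    "CH3CH2CH2OH": "primary alcohol",
    "(COCl)2/(CH3)2SO, then Et3N": "mild oxidant",
}


def predict(substance_1, substance_2):
    """Return the predicted product class, or None if no rule applies."""
    class_1 = CLASSES.get(substance_1)
    class_2 = CLASSES.get(substance_2)
    if class_1 is None or class_2 is None:
        return None  # unknown substance: no prediction
    # Try both orders so that acid + base and base + acid agree.
    return RULES.get((class_1, class_2)) or RULES.get((class_2, class_1))


print(predict("HCl", "NaOH"))
print(predict("CH3CH2CH2OH", "(COCl)2/(CH3)2SO, then Et3N"))
print(predict("HCl", "CH3CH2CH2OH"))
```

The table stores one entry per pair of classes rather than one per pair of substances, which is what makes the approach tractable.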
These examples are simple ones in which the molecules easily fit into one class predominantly. More complex molecules may belong to many categories. Organic chemistry calls these categories “Functional Groups”. The ability to predict synthetic outcomes then begins and ends with identifying functional groups within a compound's structure. For example, even though the following compound has a more complex structure, it contains a primary alcohol, which will be oxidized to an aldehyde using the same reagents presented above. We can also be reasonably confident that no unpleasant side reactions will occur.
If the reagents in the previous reaction had been $\ce{LiAlH4}$ followed by $\ce{H3O+}$, then more than one outcome is possible since more than one functional group in the starting compound will react. Controlling the reaction to give one of the possible outcomes is possible, but requires further careful thought.
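The selectivity question can be sketched the same way: list the functional groups a reagent transforms and intersect that list with the groups present in the compound. The compound below is hypothetical (it is not the one discussed above), and the reagent lists are partial and only illustrative.

```python
# Functional groups each reagent is known to transform (partial, illustrative lists).
REACTS_WITH = {
    "Swern oxidation": {"primary alcohol", "secondary alcohol"},
    "LiAlH4": {"aldehyde", "ketone", "ester", "carboxylic acid", "amide"},
}


def reactive_groups(groups, reagent):
    """Return the groups in `groups` that `reagent` would transform, sorted by name."""
    return sorted(set(groups) & REACTS_WITH[reagent])


# Hypothetical compound carrying three functional groups.
compound = {"primary alcohol", "ester", "ketone"}

for reagent in REACTS_WITH:
    hits = reactive_groups(compound, reagent)
    note = "single outcome expected" if len(hits) == 1 else "selectivity needs thought"
    print(f"{reagent}: {hits} ({note})")
```

More than one hit for a reagent signals that several outcomes are possible and that controlling the reaction requires further thought.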
There are rules, but they are not few in number. There are too many classes of compounds to list here. Likewise, even one class, like primary alcohols (a hydroxyl group at the end of a hydrocarbon chain), has too many characteristic reactions to list here. If there are 30 classes of compounds (an underestimate) and 30 types of reactions (an underestimate), then there are 900 reaction types (an underestimate). The number of viable reaction types is more manageable than the total reaction space, but would still be difficult to commit to memory quickly. And new reaction types are being discovered all the time.
Folks who learn how to analyze combinations of compounds spend years taking courses and reading books and research articles to accumulate the knowledge and wisdom necessary. It can be done. Computer programs can be (and have been) designed to do the same analysis, but they were designed by people who learned all of the characteristic combinations. There is no shortcut.