Objective
Create a function to extract all individual words from a given camel-cased string.
Description
Consider the string 'helloWorld'. As we know, it is in camel case.
Camel case is a casing convention whereby words are joined next to each other, without any space in between, and the first letter of every word except the first one is uppercased.
So if we have the two words hello and world, then representing these together in camel case would give us the word helloWorld.
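To make the convention concrete, here is a minimal sketch that joins a list of words into camel case. The name toCamelCase is hypothetical and not part of this exercise; the sketch assumes that every word is a non-empty string and that the first word is already in lowercase.

```javascript
// Hypothetical helper, for illustration only: joins words into camel case.
// Assumes each word is non-empty and the first word is already lowercase.
function toCamelCase(words) {
  var result = '';
  for (var i = 0; i < words.length; i++) {
    var word = words[i];
    // Every word except the first gets its first letter uppercased.
    result += (i === 0) ? word : word[0].toUpperCase() + word.slice(1);
  }
  return result;
}

console.log(toCamelCase(['hello', 'world'])); // helloWorld
console.log(toCamelCase(['inner', 'HTML']));  // innerHTML
```

The function getWords() in this exercise performs roughly the reverse operation, except that it leaves the casing of each extracted word untouched.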
HTMLElement, obtained from the words HTML and Element, is an example of Pascal case, not camel case. Hence, we don't consider words here that begin with an uppercase character.
In this exercise, you have to create a function getWords() that takes in a camel-cased string and returns an array containing all the individual words in the string.
If the given string is empty (''), you MUST return an empty array.
Note that you don't need to worry about the grammar of the given string, i.e. whether or not it's a valid camel-cased string. You should assume that it's a well-formed camel-cased string.
For example, 'to There' is not a well-formed camel-cased string, but thanks to our assumption, we can rest assured that we won't be dealing with such strings in our function.
In addition to this, you should also assume that there are no digits in the given string. This assumption greatly simplifies the underlying logic.
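These assumptions can be written down as a small check. This is only a sketch: isWellFormedCamelCase is a hypothetical name, and the exercise does not ask getWords() to perform any such validation. Note that the empty string fails this check, since the exercise treats it as a separate special case.

```javascript
// Hypothetical helper, not required by the exercise: returns true if str
// starts with a lowercase letter and contains only the letters a-z and A-Z.
function isWellFormedCamelCase(str) {
  return /^[a-z][a-zA-Z]*$/.test(str);
}

console.log(isWellFormedCamelCase('helloWorld'));  // true
console.log(isWellFormedCamelCase('to There'));    // false (contains a space)
console.log(isWellFormedCamelCase('HTMLElement')); // false (starts with uppercase)
console.log(isWellFormedCamelCase('item2'));       // false (contains a digit)
```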
Shown below are a couple of examples of the usage of the function:
getWords('helloWorld')
['hello', 'World']
getWords('innerHTML')
['inner', 'HTML']
getWords('lastHTMLElement')
['last', 'HTML', 'Element']
getWords('insertAdjacentElement')
['insert', 'Adjacent', 'Element']
getWords('insertAdjacentHTML')
['insert', 'Adjacent', 'HTML']
getWords('')
[]
Note that there are two special cases to deal with, as shown above:
- If a sequence of characters is in uppercase, then it constitutes a single word. For instance, innerHTML consists of the two words inner and HTML, where the latter is a sequence of uppercase characters and hence constitutes a single word.
- There is an exception to this rule, and that's when a sequence of uppercase characters is followed by a lowercase character. In this case, the sequence, excluding its last character, constitutes a single word. For instance, 'lastHTMLElement' consists of the words 'last', 'HTML', and 'Element'. Here the sequence HTMLE is followed by l, and hence it yields the word 'HTML'.
Once again, note that in none of the strings passed to getWords() above do we have the first letter in uppercase; the first character is always in lowercase, as per the camel case convention.
New file
Inside the directory you created for this course on JavaScript, create a new folder called Exercise-17-Camel-Words and put the .html solution files for this exercise within it.
Solution
So where should we start from?
Well, first let's set up the basic wireframe of the function and then reason about its implementation.
This is done below:
function getWords(str) {
}
Actually, we could define more here, before we start writing the real code. That is, if str is '', we could immediately return []:
function getWords(str) {
  if (str === '') {
    return [];
  }
}
So far, so good.
Let's now get to the real deal: thinking about how to extract all of the words from str. Well, it's really not that difficult to accomplish.
Here's what we could do:
- Iterate over each character of str, while keeping an index-tracking variable at hand (let's call it index, initially set to 0) to help us slice a given substring once we are sure that it represents a single word.
- In the iteration, we check whether the current character is uppercase.
- If it is, this means that the preceding substring is a word.
- This preceding substring starts at index index (inclusive) and ends at the index of the current character (exclusive).
- To extract this word, we use the string slice() method, and then update index to the index of the current character.
- In the end, we call slice() once more to slice the string from index up to its very end.
And this is just the algorithm that we need.
Let's take the help of an example to understand this better.
Suppose that str is 'firstChild'. Moreover, index = 0.
Iteration begins from index 0 in 'firstChild'.
f is not uppercase, i is not uppercase, r is again not uppercase, s is not uppercase, and t is not uppercase as well. However, C is an uppercase character and so the preceding substring is a word.
str.slice(index, 5), where 5 is the index of the current character C, helps us extract this word. This gives 'first'.
The variable index is updated to 5, which is precisely where the next word should begin.
Moving on, h is lowercase, i is lowercase, so is l, and so is d. Iteration over str completes; however, we haven't yet extracted the second word, i.e. 'Child'. To do this, str.slice(index) is called after the iteration ends, which extracts a word from index index up to the very end of str. This gives 'Child'.
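The two slice() calls from this walkthrough can be tried out directly. The snippet below is just a sketch of those two steps: slice(start, end) returns the characters from start up to, but not including, end, and leaving out end slices all the way to the end of the string.

```javascript
var str = 'firstChild';
var index = 0;

// 5 is the index of 'C'; the end index is exclusive, so 'C' is not included.
var first = str.slice(index, 5);
index = 5;

// With no end index, slice() runs up to the end of the string.
var second = str.slice(index);

console.log(first);  // first
console.log(second); // Child
```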
Simple?
In the code below, we implement this algorithm in getWords(), adding each extracted word onto an array words, which is returned by the function in the end:
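function getWords(str) {
  if (str === '') {
    return [];
  }
  var words = [];
  var index = 0; // start of the word currently being scanned
  for (var i = 0, len = str.length; i < len; i++) {
    // An uppercase character marks the end of the preceding word.
    if ('A' <= str[i] && str[i] <= 'Z') {
      words.push(str.slice(index, i));
      index = i;
    }
  }
  // The last word runs up to the end of the string.
  words.push(str.slice(index));
  return words;
}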
Now the question is: can we be sure that this code works correctly for every input as shown in the exercise's description above?
Well, the best way is to just go and check it:
getWords('helloWorld')
['hello', 'World']
getWords('innerHTML')
['inner', 'H', 'T', 'M', 'L']
Oops! The second call here, i.e. getWords('innerHTML'), produces the wrong result. It splits HTML into four words, whereas it's just one single word.
Clearly, something has to be changed/added to our getWords() function.
But what?
Well, let's walk through the algorithm with str set to 'innerHTML'. This will help us see exactly how our current approach leads to the wrong output for 'innerHTML' and then think about how to solve it.
Suppose that str is 'innerHTML', and that index = 0.
Iteration begins from index 0 in 'innerHTML'.
i is not uppercase, n is not uppercase, n is again not uppercase, e is not uppercase, and r is not uppercase. However, H is an uppercase character and so the preceding substring is a word.
str.slice(index, 5) is invoked, giving us the first word 'inner'. Thereby, index is updated to 5.
Moving on, T is also an uppercase character and so the preceding substring is again a word. str.slice(index, 6) is invoked (where index = 5), giving us the second word 'H'. index is updated to 6.
Going further, M is again an uppercase character and so the preceding substring is again a word. str.slice(index, 7) is invoked (where index = 6), giving us the third word 'T'. index is updated to 7.
Finally, L is yet another uppercase character, and so str.slice(index, 8) is invoked (where index = 7), giving us the fourth word 'M'. index is updated to 8.
Iteration ends, and so the last word extraction is made via str.slice(index). This gives us 'L'.
The problem is pretty apparent. Whenever we encounter an uppercase character, we just perform the preceding word's extraction without even checking if the uppercase character is preceded by another uppercase character.
The question remains: how to solve this?
Well, a pretty straightforward way is to check the preceding character each time an uppercase character is encountered.
- If the preceding character is NOT uppercase, i.e. it's lowercase, this means that it's part of that preceding word, and so we should extract that word.
- Otherwise, we shouldn't do anything.
Simple?
Adding this notion into our current approach, we get the following code:
function getWords(str) {
  if (str === '') {
    return [];
  }
  var words = [];
  var index = 0;
  for (var i = 0, len = str.length; i < len; i++) {
    // Split only at an uppercase character that follows a lowercase one.
    if ('A' <= str[i] && str[i] <= 'Z'
    && 'a' <= str[i - 1] && str[i - 1] <= 'z') {
      words.push(str.slice(index, i));
      index = i;
    }
  }
  words.push(str.slice(index));
  return words;
}
To our surprise, even this code fails to address an edge case.
Let's see if you could figure out that edge case. Hint: it's already there in the console snippet in the exercise's description.
The problem with the approach above is that it only checks the preceding character when an uppercase character is encountered, not the next character.
Consider the third statement below:
getWords('helloWorld')
['hello', 'World']
getWords('innerHTML')
['inner', 'HTML']
getWords('lastHTMLElement')
['last', 'HTMLElement']
As stated in the description of this exercise, if a sequence of uppercase characters is followed by a lowercase character, then the sequence, excluding the last character, constitutes a word.
Hence, the sequence of uppercase characters HTMLE in 'lastHTMLElement' constitutes the word 'HTML', not 'HTMLE', as it's followed by l.
So how to address this case?
Well, we just need to add another check to our existing if conditional. With this check added, the if statement would read as follows:
If the current character is uppercase AND (the preceding character is lowercase OR the next character is lowercase), then extract the preceding word.
The 'next character is lowercase' check here is the new condition, while the parentheses represent a grouped expression.
Converting this idea into code, we get the following:
function getWords(str) {
  if (str === '') {
    return [];
  }
  var words = [];
  var index = 0;
  for (var i = 0, len = str.length; i < len; i++) {
    // Split at an uppercase character if it follows a lowercase character
    // or if it is itself followed by a lowercase character.
    if ('A' <= str[i] && str[i] <= 'Z'
    && ('a' <= str[i - 1] && str[i - 1] <= 'z'
    || 'a' <= str[i + 1] && str[i + 1] <= 'z'))
    {
      words.push(str.slice(index, i));
      index = i;
    }
  }
  words.push(str.slice(index));
  return words;
}
Note that the opening curly brace ({) of the if statement above is placed on a new line.
And now, the function getWords() works just as desired, without leaving out any edge cases.
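One detail worth confirming is the lookup str[i + 1] when i is the last index of the string, as happens with the final L of 'innerHTML'. In JavaScript, reading past the end of a string yields undefined rather than throwing an error, and comparing a letter with undefined using <= evaluates to false, so the extra check simply fails. The snippet below is a small sketch to verify this behavior.

```javascript
var str = 'innerHTML';
var last = str.length - 1;

// Reading past the end of a string yields undefined, not an error.
var next = str[last + 1];
console.log(next); // undefined

// Both operands are converted to numbers here, giving NaN on each side,
// and any relational comparison involving NaN is false.
console.log('a' <= next); // false
console.log(next <= 'z'); // false
```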
Below we try it out on a couple of strings:
getWords('helloWorld')
['hello', 'World']
getWords('innerHTML')
['inner', 'HTML']
getWords('lastHTMLElement')
['last', 'HTML', 'Element']
getWords('insertAdjacentElement')
['insert', 'Adjacent', 'Element']
getWords('insertAdjacentHTML')
['insert', 'Adjacent', 'HTML']
getWords('')
[]
Superb!
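As a cross-check, the same splitting rules can also be expressed with a regular expression. This is an alternative technique, not the loop-based approach developed in this solution, and getWordsRegex is a hypothetical name. The sketch assumes the same well-formed input containing letters only.

```javascript
// Alternative sketch using a regular expression (not the loop-based solution).
// [A-Z]+(?![a-z])  matches a run of uppercase letters that is not followed by
//                  a lowercase letter, e.g. 'HTML' in 'lastHTMLElement'.
// [A-Z]?[a-z]+     matches an optional uppercase letter followed by lowercase
//                  letters, e.g. 'last' or 'Element'.
function getWordsRegex(str) {
  // match() returns null when nothing matches, e.g. for ''.
  return str.match(/[A-Z]+(?![a-z])|[A-Z]?[a-z]+/g) || [];
}

console.log(getWordsRegex('lastHTMLElement'));    // ['last', 'HTML', 'Element']
console.log(getWordsRegex('insertAdjacentHTML')); // ['insert', 'Adjacent', 'HTML']
console.log(getWordsRegex(''));                   // []
```

The regular expression is shorter, but the loop makes each decision explicit, which is arguably easier to follow when learning how the edge cases arise.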