Hacker News | seanharr11's comments

I just took the index of the data point (i.e. 1, 250) as the y-axis with the intention of "stretching out" the data set along the y-axis. Otherwise the data would be illegibly compressed on a 1-D number line.

In retrospect, I could have better represented the data with 2 overlying histograms, but this (somewhat) captures the intent of showing that "more expensive houses tend to have more than 2 bathrooms".


I agree with you that 2 histograms would be better. Check out the grouped barchart on this link

http://nvd3.org/examples/multiBar.html


I am a programmer trying to learn math, and so are my intended readers.

That said, I should include facts related to convergence, and maybe even speed compared to SGD.

As to the reciprocal -> inverse generalization, do you have any resources you could point me towards to better understand this?

Additionally, a concrete answer to "Why would following the tangent repeatedly be a good idea?" has been hard to come by for me. I am able to visualize this, but if you have resources that explain this well please share.


In general, it’s not a good idea. And in general, Newton’s method won’t converge.

Newton’s method boils down to replacing your function by a first-order approximation. For a differentiable function, in a small neighbourhood(!), that’s a good approximation (by definition), though, and the zero of the model function will be very close to the zero of the original function (if it lies in that neighbourhood).
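To make the "replace the function by its first-order model" idea concrete, here is a minimal sketch in one dimension. The function name `newton_root` and the test function x^2 - 2 are illustrative assumptions, not taken from the blog post. Each step jumps to the zero of the tangent line at the current point, which is only a good guess when the starting point is already close to the true zero.

```python
def newton_root(f, df, x0, tol=1e-12, max_iter=50):
    """Find a zero of f by repeatedly jumping to the zero of the tangent line."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        slope = df(x)
        if slope == 0:
            raise ZeroDivisionError("flat tangent: the linear model has no zero")
        # Zero of the first-order model m(t) = f(x) + f'(x) * (t - x)
        x = x - fx / slope
    raise RuntimeError("no convergence within max_iter")


# Illustrative example: the positive zero of x^2 - 2, starting nearby at x0 = 1
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

The `RuntimeError` branch is there on purpose: from a poor starting point the iteration has no guarantee of settling down.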

PS: I did not expect the poster and author to be the same person, otherwise I would’ve phrased my criticism differently. A SHOW HN would have helped.

PPS: basically the whole reciprocal/inverse confusion only arises because you start the multidimensional case from your iteration formula. If you go back to its derivation, and start again from there, you can avoid that.
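A sketch of that derivation, using standard notation that is assumed here rather than taken from the post (F is the function whose zero is sought, J_F its Jacobian). In both cases the step comes from setting the first-order model to zero; the "reciprocal" in 1-D and the "inverse" in n dimensions are the same operation, solving a linear equation for the step.

```latex
% One dimension: set the tangent-line model to zero and solve for x_{n+1}
0 = f(x_n) + f'(x_n)\,(x_{n+1} - x_n)
\quad\Longrightarrow\quad
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}

% n dimensions, F : \mathbb{R}^n \to \mathbb{R}^n with Jacobian J_F
0 = F(x_n) + J_F(x_n)\,(x_{n+1} - x_n)
\quad\Longrightarrow\quad
J_F(x_n)\,\Delta_n = -F(x_n), \qquad x_{n+1} = x_n + \Delta_n

% For optimisation of a loss \ell, take F = \nabla \ell, so J_F is the Hessian H
H(x_n)\,\Delta_n = -\nabla \ell(x_n)
```

In practice the linear system is solved directly rather than forming the inverse matrix explicitly.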


> In general, it’s not a good idea. And in general, Newton’s method won’t converge.

Right, but this blog post isn't about the general case of using Newton's method to find roots, it's about using Newton's method for solving logistic regression for which it is perfectly suited, though there are better methods as well, of course.
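For readers who want to see the logistic-regression case spelled out, here is a small sketch with one feature plus an intercept, in pure Python. The function name, the toy data, and the 2x2 solve by Cramer's rule are assumptions made for illustration; this is not the blog post's code. The data is deliberately not perfectly separable, since otherwise the coefficients would grow without bound.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_newton(xs, ys, iterations=25, tol=1e-10):
    """Fit p(y=1|x) = sigmoid(b0 + b1*x) with Newton's method (one feature)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # Hessian of the negative log-likelihood
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system H * step = g by Cramer's rule
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0 += s0
        b1 += s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1


# Hypothetical toy data: larger x is more often labelled 1, with some overlap
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic_newton(xs, ys)
print(b0, b1)
```

The log-likelihood here is concave, which is why the method is a good fit for this problem even though Newton's method has no convergence guarantee in general.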


Newton's method with a line search is the go-to algorithm for convex optimisation if the dimension of the problem is not too large.
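A minimal one-dimensional sketch of Newton's method with a backtracking (Armijo) line search follows. The function names, the parameters `alpha`, `beta` and `min_step`, and the test function log(cosh(x)) are all assumptions chosen for illustration. That function is convex, yet the undamped Newton step x - sinh(2x)/2 overshoots further and further when started at x0 = 2; the line search shrinks the step until the objective actually decreases.

```python
import math


def newton_line_search(f, grad, hess, x0, tol=1e-10, max_iter=100,
                       alpha=0.25, beta=0.5, min_step=1e-12):
    """Minimise a strictly convex 1-D function with damped Newton steps."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x
        step = -g / hess(x)  # Newton direction; hess(x) > 0 is assumed
        t = 1.0
        # Backtracking: halve t until the Armijo sufficient-decrease test passes
        while t > min_step and f(x + t * step) > f(x) + alpha * t * g * step:
            t *= beta
        x = x + t * step
    raise RuntimeError("no convergence within max_iter")


def f(x):
    return math.log(math.cosh(x))


def hess(x):
    return 1.0 - math.tanh(x) ** 2


# The gradient of log(cosh(x)) is tanh(x); the minimum is at x = 0
x_min = newton_line_search(f, math.tanh, hess, x0=2.0)
print(x_min)
```

Close to the minimum the full step (t = 1) is accepted, so the fast local convergence of plain Newton is kept.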


Thank you. I erred on the side of caution to (try to) reduce information overload, but I also didn't want to assume too much prior knowledge. Delicate balance...

Maybe a linked post that dives deeper into the 5 steps of Newton's Method? Would that be more approachable?


Yeah, that would work. Honestly, when I studied logistic regression, we used gradient descent, so seeing it with Newton's method was the draw for me. I can go find the resources myself of course, but since I was reading the article I thought it would be convenient there :)


I wrote this post (first) on Gradient Descent that will be helpful if you liked the OP.

http://thelaziestprogrammer.com/sharrington/math-of-machine-...

This looks at the algorithm the same way that the OP does (mathematically, visually, programmatically), only it applies it to a simpler Linear Regression model.


Interesting. That's also been my experience so it's good to hear confirmation. It seems to be consistent with the posts I've read on how "rank" is calculated.

I wonder what the penalty is for "re-posting", especially because timing seems to be key. You post at 8:30AM and get very few looks from your target audience, and then post again at 10AM and coincidentally reach a large amount of your target audience.

I wonder where the line is drawn between "spam" and "persistence"...


For those of you experiencing a GitHub outage (like myself), you can read about the software package here: http://thelaziestprogrammer.com/sharrington/databases/migrat...


Down in Boston, MA


Harold Martin held without bail (high risk of flight) accused of theft of 20 years worth of government (NSA) tools/data, Trump stating he will not concede the election, tens of millions of IoT devices used in DDOS attack, Assange (wikileaks originator) cut off from internet, DNC hacked and exposed.

A conspiracy theorist's dream.


1. We (an NFL team) had an Oracle 9i database (which Oracle pulled support for in 2010) storing NFL players' scouting information for > 20 years.

2. We needed to make the switch to a modern solution because as of June 2016, we had been running w/o support for 6 years (I got here in 2016), and the RDBMS (9i) was quite slow and not compatible with modern drivers and applications.

3. Our biggest technical hurdles were unique to the tool that we used for migration (etlalchemy, described in (4) below). i.) Finding the correct cx_Oracle (Python) driver to communicate with the correct Oracle Instant Client version, and then finding the exact version of Python libraries which supported that driver (b/c Oracle pulled support for 9i so long ago, finding the old instantclient binaries was nasty work). ii.) Handling FK constraint violations when importing data into PostgreSQL. Because you can't turn off constraint checks like you can on MySQL (SET FOREIGN_KEY_CHECKS=0;), data that violates constraints must be resolved before the migration. As a solution, I added support to etlalchemy (the next bullet below) to "dump bad rows" to STDOUT, and skip the foreign key that is violated during the "constraint migration" step.

4. Regarding tools, we used https://github.com/seanharr11/etlalchemy to facilitate the migration. I created this open source tool 2 years ago, published it to GitHub, and used it to carry out 99% of this project. The only things that had to be migrated manually were the Oracle functions (most of which we tossed out as they were used for ColdFusion/Report Generation). The most powerful feature of this tool is its ability to migrate schema (including column types) between 2 different RDBMSs. It also loads data quite fast, leveraging PostgreSQL's 'COPY FROM' bulk import and MySQL's 'LOAD DATA INFILE'. (Constraints and indexes are also migrated with etlalchemy out-of-the-box.)

5. From a business standpoint, we are just completing the project now and are likely going to choose MySQL as our final destination. This is mainly because our database is very OLTP-oriented. Our applications perform very small SELECTs, INSERTs and UPDATEs against our database at high frequency (think football scouts updating player statuses, live stats feeds from the NFL, etc...), and InnoDB (MySQL storage engine) happens to do a very good job of facilitating this type of database (high-frequency simple updates). PostgreSQL would have been a better choice if we were to run less frequent queries involving more computation (OLAP-oriented), but we don't (yet) have a huge Machine Learning/Data Mining requirement here. From experience, PostgreSQL has a huge toolkit of functions, very cool indexes, and other nice-to-haves; we just don't need them for our business case. Both choices have a much stronger community, and answers are found via StackOverflow/Google rather than reading the Oracle manual until your eyes bleed.

6. If you want to read about my experience, I have posted a quick write-up here, including some more background info on etlalchemy: thelaziestprogrammer.com/sharrington/databases/migrating-between-databases-with-etlalchemy

If you decide to give the tool a try (it is battle-tested), please let me know how it works for you. And please, if you have any questions/feature-requests/issues, reach out to me. I am trying to get some momentum going for this project, as it would solve lots of people's database migration problems as it solved mine.
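As a rough illustration of the foreign-key hurdle in point 3, the sketch below finds rows that would violate a foreign key before the constraint is created on the target. It is not etlalchemy's implementation: the `team`/`player` tables, the helper `find_orphans`, and the use of an in-memory SQLite database are all assumptions made so the example runs on its own.

```python
import sqlite3

# Hypothetical schema and data: player 12 points at a team that does not exist
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT, team_id INTEGER);
    INSERT INTO team VALUES (1, 'A'), (2, 'B');
    INSERT INTO player VALUES
        (10, 'p1', 1), (11, 'p2', 2), (12, 'p3', 99), (13, 'p4', NULL);
""")


def find_orphans(conn, child, fk_col, parent, pk_col):
    """Return child rows whose foreign key matches no parent row."""
    # Identifiers are interpolated directly: pass trusted names only.
    query = (
        f"SELECT c.* FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON c.{fk_col} = p.{pk_col} "
        f"WHERE c.{fk_col} IS NOT NULL AND p.{pk_col} IS NULL"
    )
    return conn.execute(query).fetchall()


bad_rows = find_orphans(conn, "player", "team_id", "team", "id")
for row in bad_rows:
    print("FK violation:", row)
```

Rows with a NULL foreign key are skipped because they do not violate the constraint; only rows pointing at a missing parent are reported.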


In horror I see a lot of business application stacks driven by Oracle. Probably because of some nifty set feature early on. Again and again we pay dearly for it.

A small and growing part of me wants to cut through the bullshit and look into eliminating it wherever I can sell the idea.


My experience with my current employer is that Oracle/SQL Server/Windows are used because of the lack of trust in open source solutions, usually by management who grew up in the Stone Age of Software.

All 3 of the aforementioned slow EVERY process in the building down, especially when it comes to application development.


Did you get a chance to try out etlalchemy? Please feel free to share the tool with friends/colleagues that may benefit from it!

